Evan Hubinger
E1038739
Evan Hubinger is an AI safety researcher known for his work on alignment and interpretability, and as one of the early technical leaders at Anthropic.
Statements (44)
| Predicate | Object |
|---|---|
| instanceOf | person |
| associatedWithCommunity | AI safety research community; effective altruism |
| associatedWithOrganization | Machine Intelligence Research Institute (MIRI) |
| basedIn | United States of America (surface form: United States) |
| coAuthorOf | Risks from Learned Optimization in Advanced Machine Learning Systems |
| contributedTo | formalization of the inner vs. outer alignment distinction; taxonomy of AI alignment failures |
| earlyMemberOf | Anthropic technical leadership |
| educatedAt | Harvey Mudd College |
| employer | Anthropic |
| fieldOfStudy | computer science; mathematics |
| fieldOfWork | AI alignment; AI interpretability |
| focusesOn | alignment of large-scale machine learning systems; technical AI safety; understanding learned optimization in neural networks |
| hasBlog | https://www.alignmentforum.org |
| hasGitHubProfile | https://github.com/evhub |
| hasOnlineHandle | evhub |
| hasPersonalWebsite | https://evhub.github.io |
| hasPresentationOn | inner alignment failures; mesa-optimization |
| hasResearchInterest | deceptive alignment; interpretability tools for large models; scalable oversight; training dynamics of advanced AI systems |
| hasTalkOn | AI alignment at EAG conferences; AI safety at technical workshops |
| knownAs | Evan Hubinger |
| languageSpoken | English |
| mainSubjectOfWork | AI safety threat models; inner alignment; mesa-optimizers in machine learning systems; outer alignment |
| notableFor | research on mesa-optimization; work on AI alignment; work on AI interpretability |
| occupation | AI safety researcher |
| positionHeld | research scientist at Anthropic |
| publishesPreprintsOn | arXiv |
| writesFor | Alignment Forum; LessWrong |
Referenced by (1)
Full triples: surface form annotated when it differs from this entity's canonical label.