Evan Hubinger
E1038739
Evan Hubinger is an AI safety researcher known for his work on alignment and interpretability, and as one of the early technical leaders at Anthropic.
Statements (44)
| Predicate | Object |
|---|---|
| instanceOf | person |
| associatedWithCommunity | AI safety research community; effective altruism |
| associatedWithOrganization | Machine Intelligence Research Institute (MIRI) |
| basedIn | United States of America (surface form: United States) |
| coAuthorOf | Risks from Learned Optimization in Advanced Machine Learning Systems |
| contributedTo | formalization of the inner vs. outer alignment distinction; taxonomy of AI alignment failures |
| earlyMemberOf | Anthropic technical leadership |
| educatedAt | Harvey Mudd College |
| employer | Anthropic |
| fieldOfStudy | computer science; mathematics |
| fieldOfWork | AI alignment; AI interpretability |
| focusesOn | alignment of large-scale machine learning systems; technical AI safety; understanding learned optimization in neural networks |
| hasBlog | https://www.alignmentforum.org |
| hasGitHubProfile | https://github.com/evhub |
| hasOnlineHandle | evhub |
| hasPersonalWebsite | https://evhub.github.io |
| hasPresentationOn | inner alignment failures; mesa-optimization |
| hasResearchInterest | deceptive alignment; interpretability tools for large models; scalable oversight; training dynamics of advanced AI systems |
| hasTalkOn | AI alignment at EAG conferences; AI safety at technical workshops |
| knownAs | Evan Hubinger |
| languageSpoken | English |
| mainSubjectOfWork | AI safety threat models; inner alignment; mesa-optimizers in machine learning systems; outer alignment |
| notableFor | research on mesa-optimization; work on AI alignment; work on AI interpretability |
| occupation | AI safety researcher |
| positionHeld | research scientist at Anthropic |
| publishesPreprintsOn | arXiv |
| writesFor | Alignment Forum; LessWrong |
Referenced by (1)
Full triples: surface form annotated when it differs from this entity's canonical label.