RESOURCES
Get started
A collection of resources to get you started with mechanistic interpretability and AI red teaming. Updated regularly — work down the list, from orientation to hands-on practice.
Request access to the Discord community
Roadmaps
Courses
Blogs
How to become a mechanistic interpretability researcher (Neel Nanda, Head of Alignment, Google DeepMind)
A Mathematical Framework for Transformer Circuits (Anthropic)
Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance (Alignment Forum)
An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 (Neel Nanda)
Interpretability Dreams (Chris Olah)
How to hack AI apps (Joseph Thacker)
Papers
YouTube
GitHub
Tools
Low-refusal models
