Interpretability
Peeking inside the black box of machine learning algorithms to build a robust understanding of what they do and why.
Current Projects
Releases
Featured
A library implementing the Tuned Lens, along with other tools for extracting, manipulating, and studying the learned representations of transformers across layers.
Publications
Featured