Find all our models, codebases, and datasets
Featured
A 55 billion token dataset of mathematical and scientific documents, created for training the Llemma models.
A 14.7B token dataset of high-quality English mathematical text.
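Both datasets are distributed through the Hugging Face Hub and can be streamed with the `datasets` library. The sketch below is illustrative only: the Hub IDs (`EleutherAI/proof-pile-2`, `hoskinson-center/proof-pile`) and the subset name are assumptions, so check each dataset card for the canonical identifiers.

```python
from datasets import load_dataset

# Hub IDs and the subset name are assumptions; see the dataset cards.
# Streaming avoids downloading the full 55B-token corpus up front.
pp2 = load_dataset("EleutherAI/proof-pile-2", "algebraic-stack",
                   split="train", streaming=True)
print(next(iter(pp2))["text"][:300])

# The smaller mathematical-text dataset, streamed the same way.
pp1 = load_dataset("hoskinson-center/proof-pile", split="train", streaming=True)
print(next(iter(pp1))["text"][:300])
```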
A library implementing the Tuned Lens, along with other tools for extracting, manipulating, and studying the learned representations of transformers across layers.
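As a rough usage sketch based on the `tuned-lens` package's documented interface (the constructor name, the choice of base model, and the hidden-state indexing convention are all assumptions to verify against the project docs), a pretrained lens can be loaded alongside its model and used to decode an intermediate hidden state into vocabulary logits:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from tuned_lens.nn.lenses import TunedLens

# Model choice is illustrative; pretrained lenses exist for several EleutherAI models.
model_id = "EleutherAI/pythia-160m-deduped"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Downloads the lens trained for this model (constructor name per the project docs).
lens = TunedLens.from_model_and_pretrained(model)

# A lens maps an intermediate hidden state to final-layer logits:
# lens(hidden_state, layer_index) -> vocabulary logits.
inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs, output_hidden_states=True).hidden_states

layer = 4  # layer/index pairing is an assumption; see the library docs
logits = lens(hidden[layer], layer)
print(tokenizer.decode(logits[0, -1].argmax().item()))
```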
A diffusion-based model for upscaling images to higher resolution, trained by Katherine Crowson in collaboration with Stability AI.
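A minimal sketch of running the upscaler through `diffusers`, assuming the checkpoint is the one published on the Hugging Face Hub as `stabilityai/sd-x2-latent-upscaler` (an assumption; confirm against the model card):

```python
import torch
from diffusers import StableDiffusionLatentUpscalePipeline
from PIL import Image

# Hub ID is an assumption; the pipeline performs 2x latent upscaling.
upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
    "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = Image.open("input.png").convert("RGB")
upscaled = upscaler(
    prompt="",             # an empty prompt works for generic upscaling
    image=low_res,
    num_inference_steps=20,
    guidance_scale=0,      # the upscaler is typically run without guidance
).images[0]
upscaled.save("output_2x.png")
```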
A series of Korean autoregressive language models developed by the EleutherAI Polyglot team. To date we have trained and released 1.3B, 3.8B, and 5.8B parameter models.
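A short sketch of loading one of these checkpoints with `transformers`; the Hub ID `EleutherAI/polyglot-ko-1.3b` is inferred from the release naming and should be checked against the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub ID assumed from the naming scheme; the 3.8B and 5.8B variants follow it too.
model_id = "EleutherAI/polyglot-ko-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short Korean continuation ("Hello. Today's weather is ...").
inputs = tokenizer("안녕하세요. 오늘 날씨는", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```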