Library Curtis Huebner

Mesh Transformer Jax

A JAX-based library for training transformer language models on TPUs, developed by Ben Wang. The library has been used to train GPT-J.

https://github.com/kingoflolz/mesh-transformer-jax

Model Stella Biderman

VQGAN-CLIP

A technique for cheap text-to-image synthesis using pretrained CLIP and VQGAN models.

VQGAN-CLIP is a methodology for using multimodal embedding models such as CLIP to guide text-to-image generative algorithms without additional training. While the results tend to be worse than those of pretrained text-to-image generative models, they are orders of magnitude cheaper and can often be assembled out of pre-existing, independently valuable models. Our core approach has been adapted to a variety of domains including text-to-3D and audio-to-image synthesis, as well as to develop novel synthetic materials.
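The guidance loop behind this approach can be sketched with toy stand-ins for the pretrained models (the linear maps `G` and `E` and the vector `text_emb` below are hypothetical placeholders, not the real VQGAN or CLIP): a latent is optimized by gradient ascent on image-text embedding similarity, and no model is retrained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not the real models):
# G plays the role of a VQGAN decoder, latent (4,) -> "image" (8,);
# E plays the role of a CLIP image encoder, "image" -> embedding (3,);
# text_emb plays the role of the CLIP embedding of the text prompt.
G = rng.normal(size=(8, 4))
E = rng.normal(size=(3, 8))
text_emb = rng.normal(size=3)
text_emb /= np.linalg.norm(text_emb)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def similarity(z):
    # How well the "image" generated from latent z matches the prompt.
    return cosine(E @ (G @ z), text_emb)

# Guide the latent by gradient ascent on the similarity score; this is
# the role the CLIP gradient plays in VQGAN-CLIP.
z = rng.normal(size=4)
lr, eps = 0.1, 1e-5
start = similarity(z)
for _ in range(200):
    grad = np.zeros_like(z)
    for i in range(len(z)):  # finite-difference gradient, for brevity
        dz = np.zeros_like(z)
        dz[i] = eps
        grad[i] = (similarity(z + dz) - similarity(z - dz)) / (2 * eps)
    z += lr * grad

print(start, similarity(z))  # similarity increases as z is optimized
```

In the real method the gradient flows through CLIP and the VQGAN decoder via backpropagation rather than finite differences, but the structure of the loop is the same.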

Model Stella Biderman

GPT-Neo

A set of 3 decoder-only LLMs with 125M, 1.3B, and 2.7B parameters trained on the Pile.

A series of large language models trained on the Pile, and our first attempt to produce GPT-3-like language models. It comes in 125M, 1.3B, and 2.7B parameter variants.

Library Curtis Huebner

GPT-Neo Library

A library for training language models written in Mesh TensorFlow. This library was used to train the GPT-Neo models, but has since been retired and is no longer maintained. We currently recommend the GPT-NeoX library for LLM training.

https://github.com/EleutherAI/gpt-neo

Dataset Stella Biderman

The Pile

A large-scale corpus for training language models, composed of 22 smaller sources. The Pile is publicly available and freely downloadable, and has been used by a number of organizations to train large language models.

The Pile is a curated collection of 22 diverse, high-quality datasets for training large language models.
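The Pile is distributed as jsonlines shards (zstandard-compressed, `.jsonl.zst`), with one JSON object per document carrying the text and the name of the component dataset it came from. A minimal sketch of parsing that layout, using a hypothetical two-record sample rather than a real shard:

```python
import json

# Hypothetical sample in the Pile's jsonlines layout: one JSON object
# per line, with the document text and its source under
# meta.pile_set_name. Real shards are zstandard-compressed.
sample = "\n".join([
    json.dumps({"text": "def add(a, b): return a + b",
                "meta": {"pile_set_name": "GitHub"}}),
    json.dumps({"text": "The mitochondrion is an organelle ...",
                "meta": {"pile_set_name": "PubMed Abstracts"}}),
])

# Parse one document per line and pull out each document's source.
docs = [json.loads(line) for line in sample.splitlines()]
sources = [d["meta"]["pile_set_name"] for d in docs]
print(sources)  # ['GitHub', 'PubMed Abstracts']
```

Reading an actual shard is the same loop, with the lines streamed through a zstandard decompressor first.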

Dataset Stella Biderman

OpenWebText2

OpenWebText2 is an enhanced version of the original OpenWebTextCorpus, covering all Reddit submissions from 2005 up until April 2020. It was developed primarily to be included in the Pile.
