Library Curtis Huebner

Mesh Transformer Jax

A JAX-based library for training transformer language models on TPUs, developed by Ben Wang. The library has been used to train GPT-J.

https://github.com/kingoflolz/mesh-transformer-jax

Model Stella Biderman

VQGAN-CLIP

A technique for cheap text-to-image synthesis using pretrained CLIP and VQGAN models.

VQGAN-CLIP is a methodology for using multimodal embedding models such as CLIP to guide text-to-image generative algorithms without additional training. While the results tend to be worse than those of pretrained text-to-image generative models, they are orders of magnitude cheaper and can often be assembled out of pre-existing, independently valuable models. Our core approach has been adapted to a variety of domains including text-to-3D and audio-to-image synthesis, as well as to develop novel synthetic materials.
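The guidance loop behind this approach can be sketched with toy stand-ins for the pretrained models (the linear maps `G` and `E` and the vector `text_emb` below are hypothetical placeholders, not the real VQGAN or CLIP): a latent is optimized by gradient ascent on image-text embedding similarity, and no model is retrained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not the real models):
# G plays the role of a VQGAN decoder, latent (4,) -> "image" (8,);
# E plays the role of a CLIP image encoder, "image" -> embedding (3,);
# text_emb plays the role of the CLIP embedding of the text prompt.
G = rng.normal(size=(8, 4))
E = rng.normal(size=(3, 8))
text_emb = rng.normal(size=3)
text_emb /= np.linalg.norm(text_emb)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def similarity(z):
    # How well the "image" generated from latent z matches the prompt.
    return cosine(E @ (G @ z), text_emb)

# Guide the latent by gradient ascent on the similarity score; this is
# the role the CLIP gradient plays in VQGAN-CLIP.
z = rng.normal(size=4)
lr, eps = 0.1, 1e-5
start = similarity(z)
for _ in range(200):
    grad = np.zeros_like(z)
    for i in range(len(z)):  # finite-difference gradient, for brevity
        dz = np.zeros_like(z)
        dz[i] = eps
        grad[i] = (similarity(z + dz) - similarity(z - dz)) / (2 * eps)
    z += lr * grad

print(start, similarity(z))  # similarity increases as z is optimized
```

In the real method the gradient flows through CLIP and the VQGAN decoder via backpropagation rather than finite differences, but the structure of the loop is the same.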

Model Stella Biderman

GPT-Neo

A set of 3 decoder-only LLMs with 125M, 1.3B, and 2.7B parameters trained on the Pile.

A series of large language models trained on the Pile, and our first attempt to produce GPT-3-like language models. It comes in 125M, 1.3B, and 2.7B parameter variants.

Library Curtis Huebner

GPT-Neo Library

A library for training language models written in Mesh TensorFlow. This library was used to train the GPT-Neo models, but has since been retired and is no longer maintained. We currently recommend the GPT-NeoX library for LLM training.

https://github.com/EleutherAI/gpt-neo

Dataset Stella Biderman

The Pile

A large-scale corpus for training language models, composed of 22 smaller sources. The Pile is publicly available and freely downloadable, and has been used by a number of organizations to train large language models.

The Pile is a curated collection of 22 diverse, high-quality datasets for training large language models.
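The Pile is distributed as jsonlines shards (zstandard-compressed, `.jsonl.zst`), with one JSON object per document carrying the text and the name of the component dataset it came from. A minimal sketch of parsing that layout, using a hypothetical two-record sample rather than a real shard:

```python
import json

# Hypothetical sample in the Pile's jsonlines layout: one JSON object
# per line, with the document text and its source under
# meta.pile_set_name. Real shards are zstandard-compressed.
sample = "\n".join([
    json.dumps({"text": "def add(a, b): return a + b",
                "meta": {"pile_set_name": "GitHub"}}),
    json.dumps({"text": "The mitochondrion is an organelle ...",
                "meta": {"pile_set_name": "PubMed Abstracts"}}),
])

# Parse one document per line and pull out each document's source.
docs = [json.loads(line) for line in sample.splitlines()]
sources = [d["meta"]["pile_set_name"] for d in docs]
print(sources)  # ['GitHub', 'PubMed Abstracts']
```

Reading an actual shard is the same loop, with the lines streamed through a zstandard decompressor first.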

Dataset Stella Biderman

OpenWebText2

OpenWebText2 is an enhanced version of the original OpenWebTextCorpus, covering all Reddit submissions from 2005 up until April 2020. It was developed primarily to be included in the Pile.
