SD Upscaler
A diffusion-based model for upscaling images to higher resolution, trained by Katherine Crowson in collaboration with Stability AI. It is capable of upscaling both generated and non-generated images.
https://colab.research.google.com/drive/1o1qYJcFeywzCIdkfKJy7cTpgZTCM2EI4
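Beyond the notebook above, a minimal usage sketch follows, assuming the model is the latent upscaler later published to Hugging Face diffusers as stabilityai/sd-x2-latent-upscaler (the checkpoint name and pipeline class are assumptions based on that public release):

```python
# Minimal sketch: running the upscaler via Hugging Face diffusers.
# Checkpoint name and pipeline class assumed from the public release.
import torch
from diffusers import StableDiffusionLatentUpscalePipeline
from PIL import Image

pipe = StableDiffusionLatentUpscalePipeline.from_pretrained(
    "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = Image.open("input.png").convert("RGB")  # generated or photographic
upscaled = pipe(
    prompt="a photo",       # a rough description of the image helps guide the upscale
    image=low_res,
    num_inference_steps=20,
    guidance_scale=0,       # the upscaler works well unguided
).images[0]
upscaled.save("output_2x.png")
```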
Polyglot-Ko
Polyglot-Ko is a series of Korean autoregressive language models made by the EleutherAI polyglot team. To date, we have trained and released 1.3B, 3.8B, and 5.8B parameter models.
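A minimal usage sketch with Hugging Face transformers, assuming the publicly released EleutherAI/polyglot-ko-1.3b checkpoint:

```python
# Minimal sketch: loading the 1.3B checkpoint with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/polyglot-ko-1.3b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/polyglot-ko-1.3b")

# Prompt: "Korean language models are..."
inputs = tokenizer("한국어 언어 모델은", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```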
CLIP-Guided Diffusion
A technique for doing text-to-image synthesis cheaply using pretrained CLIP and diffusion models.
https://colab.research.google.com/drive/12a_Wrfi2_gwwAuN3VvMTwVMz9TfqctNj#scrollTo=1YwMUyt9LHG1
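The core of the technique is a guidance function that nudges each denoising step toward the prompt. A minimal sketch, where clip_model and diffusion (and its predict_x0 helper) are hypothetical stand-ins for the pretrained models, with CLIP image preprocessing omitted for brevity:

```python
# Sketch of the core CLIP-guidance step. `clip_model` and `diffusion` are
# hypothetical stand-ins, not a specific library API.
import torch
import torch.nn.functional as F

def cond_fn(x_t, t, text_embed, clip_model, diffusion, scale=1000.0):
    """Gradient of CLIP text-image similarity w.r.t. the noisy sample x_t."""
    with torch.enable_grad():
        x = x_t.detach().requires_grad_(True)
        # Estimate the denoised image x0 from the current noisy sample,
        # then embed it with CLIP's image encoder (resize/normalize omitted).
        x0 = diffusion.predict_x0(x, t)          # hypothetical helper
        image_embed = clip_model.encode_image(x0)
        sim = F.cosine_similarity(image_embed, text_embed, dim=-1).sum()
        # The sampler adds this gradient to each step's mean, steering the
        # sample toward images CLIP scores as matching the prompt.
        return torch.autograd.grad(sim, x)[0] * scale
```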
Cloob-Conditioned Latent Diffusion
A highly efficient text-to-image model that can be trained without captioned images.
John David Pressman, Katherine Crowson
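A rough sketch of the caption-free trick: because CLOOB embeds images and text in a shared space, training can condition on image embeddings alone, and a text embedding is swapped in at sampling time. Here cloob and latent_diffusion are hypothetical stand-ins for the actual models:

```python
# Why captions are unnecessary: CLOOB puts images and text in one embedding
# space, so training conditions on image embeddings and inference swaps in
# text embeddings. `cloob` and `latent_diffusion` are hypothetical stand-ins.
import torch

def training_step(images, cloob, latent_diffusion):
    # No captions needed: condition on the CLOOB embedding of the image itself.
    cond = cloob.encode_image(images)
    return latent_diffusion.denoising_loss(images, cond)

@torch.no_grad()
def generate(prompt, cloob, latent_diffusion):
    # At inference, a text embedding stands in for an image embedding.
    cond = cloob.encode_text(prompt)
    return latent_diffusion.sample(cond)
```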
RWKV
RWKV is an RNN with transformer-level performance at some language modeling tasks. Unlike other RNNs, it can be scaled to tens of billions of parameters efficiently.
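At RWKV's heart is a WKV attention substitute that can be computed as a recurrence with a constant-size state, which is what makes inference RNN-cheap. A simplified, numerically naive sketch (the real implementation tracks a max-exponent term for stability):

```python
# Simplified WKV recurrence from RWKV's time mixing. w > 0 and u are learned
# per-channel decay/bonus parameters; numerical stabilization is omitted.
import torch

def wkv_recurrence(k, v, w, u):
    """k, v: (T, C) key/value sequences; w, u: (C,) learned parameters.
    Runs in O(T) with a constant-size state, so the model can be served
    like an RNN rather than a transformer."""
    T, C = k.shape
    num = torch.zeros(C)   # running decayed sum of exp(k_i) * v_i
    den = torch.zeros(C)   # running decayed sum of exp(k_i)
    out = torch.empty(T, C)
    for t in range(T):
        e_k = torch.exp(k[t])
        bonus = torch.exp(u) * e_k  # the current token gets an extra boost
        out[t] = (num + bonus * v[t]) / (den + bonus)
        num = torch.exp(-w) * num + e_k * v[t]
        den = torch.exp(-w) * den + e_k
    return out
```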
GPT-NeoX-20B
GPT-NeoX-20B is an open source English autoregressive language model trained on the Pile. At the time of its release, it was the largest publicly available language model in the world.
CARP
A CLIP-like model trained on (text, critique) pairs with the goal of learning the relationships between passages of text and natural language feedback on those passages.
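Training follows the CLIP recipe: a symmetric contrastive loss over a batch of embedded (passage, critique) pairs. A minimal sketch, assuming the two text encoders have already produced the embeddings:

```python
# CLIP-style symmetric InfoNCE over a batch of (passage, critique) pairs.
# The passage and critique encoders themselves are assumed; only the
# contrastive objective is sketched here.
import torch
import torch.nn.functional as F

def contrastive_loss(passage_embeds, critique_embeds, temperature=0.07):
    p = F.normalize(passage_embeds, dim=-1)
    c = F.normalize(critique_embeds, dim=-1)
    logits = p @ c.T / temperature            # (N, N) similarity matrix
    labels = torch.arange(p.shape[0])         # matching pairs on the diagonal
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2
```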
GPT-J
GPT-J is a six billion parameter open source English autoregressive language model trained on the Pile. At the time of its release, it was the largest publicly available GPT-3-style language model in the world.
VQGAN-CLIP
VQGAN-CLIP is a methodology for using multimodal embedding models such as CLIP to guide text-to-image generative algorithms without additional training. While the results tend to be worse than pretrained text-to-image generative models, they are orders of magnitude cheaper and can often be assembled out of pre-existing independently valuable models. Our core approach has been adapted to a variety of domains including text-to-3D and audio-to-image synthesis, as well as to develop novel synthetic materials.
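A rough sketch of the core loop: no training at all, just gradient descent on CLIP similarity with respect to VQGAN's latent image. vqgan and clip_model are hypothetical stand-ins for the pretrained models, with tokenization and CLIP preprocessing omitted:

```python
# Sketch of the VQGAN-CLIP loop: optimize the latent grid so the decoded
# image matches the prompt under CLIP. `vqgan` and `clip_model` are
# hypothetical stand-ins; augmentations and cutouts are omitted.
import torch
import torch.nn.functional as F

def vqgan_clip(prompt, vqgan, clip_model, steps=300, lr=0.1):
    text_embed = F.normalize(clip_model.encode_text(prompt), dim=-1)
    # Optimize the latent image representation directly.
    z = torch.randn(1, 256, 16, 16, requires_grad=True)  # example latent shape
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        image = vqgan.decode(z)                    # latents -> RGB image
        image_embed = F.normalize(clip_model.encode_image(image), dim=-1)
        loss = -(image_embed * text_embed).sum()   # maximize CLIP similarity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return vqgan.decode(z).detach()
```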