Proof-Pile-2 (Dataset, Stella Biderman, 16/10/2023)
A 55-billion-token dataset of mathematical and scientific documents, created for training the Llemma models.
OpenWebMath (Dataset, Stella Biderman, 10/10/2023)
A 14.7B-token dataset of high-quality English mathematical text.