Datasheet for the Pile
This datasheet describes the Pile, a 825 GiB dataset of human-authored text compiled by EleutherAI for use in large-scale language modeling.
This datasheet describes the Pile, a 825 GiB dataset of human-authored text compiled by EleutherAI for use in large-scale language modeling.