CLIP-Guided Diffusion
A technique for doing text-to-image synthesis cheaply using pretrained CLIP and diffusion models.
https://colab.research.google.com/drive/12a_Wrfi2_gwwAuN3VvMTwVMz9TfqctNj#scrollTo=1YwMUyt9LHG1
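The core trick is classifier-style guidance: at each denoising step, the gradient of the CLIP text-image similarity with respect to the noisy image is added to the sampler's predicted mean, nudging the sample toward the prompt without retraining either model. Below is a minimal sketch of such a guidance function, assuming a sampler that accepts a cond_fn-style callback (as OpenAI's guided-diffusion does); the prompt and clip_guidance_scale are illustrative.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

prompt = "a watercolor painting of a lighthouse at dawn"
with torch.no_grad():
    text_features = clip_model.encode_text(clip.tokenize([prompt]).to(device))
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

clip_guidance_scale = 1000  # illustrative strength of the CLIP gradient

def cond_fn(x, t, **kwargs):
    """Gradient of CLIP text-image similarity w.r.t. the noisy image x.

    A sampler with classifier-style guidance adds this gradient (scaled by the
    step's variance) to the predicted mean at every denoising step.
    """
    with torch.enable_grad():
        x = x.detach().requires_grad_(True)
        # Map from the diffusion convention [-1, 1] to CLIP's expected input.
        image = (x + 1) / 2
        image = torch.nn.functional.interpolate(image, size=224, mode="bilinear")
        mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=x.device)
        std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=x.device)
        image = (image - mean[None, :, None, None]) / std[None, :, None, None]
        image_features = clip_model.encode_image(image)
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        similarity = (image_features * text_features).sum()
        return torch.autograd.grad(similarity, x)[0] * clip_guidance_scale
```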
Cloob-Conditioned Latent Diffusion
A highly efficient text-to-image model that can be trained without captioned images.
John David Pressman, Katherine Crowson
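The reason no captions are needed: CLOOB, like CLIP, embeds images and text into one shared space, so the diffusion model can be trained to condition on CLOOB image embeddings (available for any image) and then, at sampling time, be handed a CLOOB text embedding of the prompt instead. The toy sketch below illustrates only that conditioning swap; every component is a runnable stand-in, not the real CLOOB encoders, autoencoder, or noise schedule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM, LATENT_DIM = 64, 16

# Stand-ins for the frozen CLOOB image and text towers, which share one embedding space.
cloob_image_encoder = nn.Linear(3 * 32 * 32, EMBED_DIM)
cloob_text_encoder = nn.Linear(77, EMBED_DIM)

class Denoiser(nn.Module):
    """Stand-in for a U-Net over autoencoder latents, conditioned on a CLOOB embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + 1 + EMBED_DIM, 128), nn.ReLU(),
            nn.Linear(128, LATENT_DIM),
        )

    def forward(self, noisy_latent, t, cond):
        return self.net(torch.cat([noisy_latent, t[:, None].float(), cond], dim=-1))

model = Denoiser()

# Training: only images are needed; the condition is the CLOOB *image* embedding.
images = torch.rand(8, 3, 32, 32)
latents = torch.randn(8, LATENT_DIM)           # would come from the autoencoder
noise = torch.randn_like(latents)
t = torch.randint(0, 1000, (8,))
cond = cloob_image_encoder(images.flatten(1))  # no captions anywhere in training
loss = F.mse_loss(model(latents + noise, t, cond), noise)  # schematic noise schedule
loss.backward()

# Inference: swap in the CLOOB *text* embedding of the prompt. Because CLOOB aligns
# text and images in one space, the model accepts it in place of an image embedding.
prompt_tokens = torch.rand(1, 77)              # stands in for a tokenized prompt
with torch.no_grad():
    cond = cloob_text_encoder(prompt_tokens)
    x = torch.randn(1, LATENT_DIM)
    for step in reversed(range(1000)):
        x = x - model(x, torch.full((1,), step), cond)  # schematic sampling update
```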
Simulacra Aesthetic Captions
A dataset of prompts, synthetic AI generated images, and aesthetic ratings of those images.
Simulacra Aesthetic Captions is a dataset of over 238,000 synthetic images generated with AI models such as CompVis latent GLIDE and Stable Diffusion from more than forty thousand user-submitted prompts. Users rate the images on their aesthetic value from 1 to 10, yielding caption, image, and rating triplets. In addition, each user agreed to release all of their work with the bot (prompts, outputs, and ratings) into the public domain under the CC0 1.0 Universal Public Domain Dedication. The result is a high-quality, royalty-free dataset with over 176,000 ratings.
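As a rough sketch of how the caption, image, and rating triplets might be represented and filtered in code, consider the following; the field names, file paths, and loading shown here are illustrative assumptions, not the dataset's actual on-disk format.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class SimulacraSample:
    """One (prompt, image, ratings) triplet; field names are hypothetical."""
    prompt: str
    image_path: str
    ratings: list[int]  # 1-10 aesthetic scores from one or more users

    @property
    def mean_rating(self) -> float:
        return mean(self.ratings)

def high_quality_subset(samples, threshold=7.0):
    """Keep generations whose average aesthetic rating clears a threshold,
    e.g. to train an aesthetic predictor or filter a fine-tuning set."""
    return [s for s in samples if s.mean_rating >= threshold]

samples = [
    SimulacraSample("a matte painting of a castle in the mist", "imgs/000001.png", [8, 7, 9]),
    SimulacraSample("low effort doodle of a cat", "imgs/000002.png", [3, 4]),
]
print([s.prompt for s in high_quality_subset(samples)])
```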
VQGAN-CLIP
A technique for doing text-to-image synthesis cheaply using pretrained CLIP and VQGAN models.
VQGAN-CLIP is a methodology for using multimodal embedding models such as CLIP to guide text-to-image generative algorithms without additional training. While the results tend to be worse than those of pretrained text-to-image generative models, they are orders of magnitude cheaper to produce and the pipeline can often be assembled from pre-existing, independently valuable models. Our core approach has been adapted to a variety of domains, including text-to-3D and audio-to-image synthesis, as well as to develop novel synthetic materials.
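Concretely, both pretrained models stay frozen: a VQGAN latent is optimized so that the decoded image's CLIP embedding matches the prompt's CLIP embedding. The sketch below shows that optimization loop with OpenAI's clip package; vqgan_decode is a runnable stand-in for a pretrained taming-transformers decoder (the real decoder is what keeps outputs on a natural-image manifold), and the step count and learning rate are illustrative.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

def vqgan_decode(z):
    # Stand-in for a pretrained VQGAN decoder; here it just squashes the latent
    # into an RGB image in [0, 1] so the loop below actually runs.
    return torch.sigmoid(z)

def to_clip_input(images):
    # Resize to 224x224 and apply CLIP's input normalization.
    mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=images.device)
    std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=images.device)
    images = torch.nn.functional.interpolate(images, size=224, mode="bilinear")
    return (images - mean[None, :, None, None]) / std[None, :, None, None]

def vqgan_clip(prompt, z, steps=300, lr=0.05):
    """Optimize the latent z so the decoded image matches the prompt under CLIP."""
    with torch.no_grad():
        text = clip_model.encode_text(clip.tokenize([prompt]).to(device))
        text = text / text.norm(dim=-1, keepdim=True)
    z = z.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        image = vqgan_decode(z)                     # (1, 3, H, W) in [0, 1]
        feats = clip_model.encode_image(to_clip_input(image))
        feats = feats / feats.norm(dim=-1, keepdim=True)
        loss = (1 - (feats * text).sum(-1)).mean()  # cosine distance to the prompt
        opt.zero_grad()
        loss.backward()
        opt.step()
    return vqgan_decode(z).detach()

result = vqgan_clip("a fantasy castle on a hill", torch.randn(1, 3, 256, 256, device=device))
```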