Stella Biderman 16/12/2023

Detecting Backdoors with Meta-Models

It is widely known that it is possible to implant backdoors into neural networks, by which an attacker can choose an input to produce a particular undesirable output (e.g. misclassify an image). We propose to use meta-models, neural networks that take another network's parameters as input, to detect backdoors directly from model weights. To this end we present a meta-model architecture and train it on a dataset of ~4000 clean and backdoored CNNs trained on CIFAR-10. Our approach is simple and scalable, and is able to detect the presence of a backdoor with accuracy when the test trigger pattern is i.i.d., with some success even on out-of-distribution backdoors.
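As a rough illustration of the idea of classifying networks directly from their weights, the sketch below flattens each model's parameters into one vector and trains a tiny classifier on those vectors. It is not the architecture from the paper: logistic regression stands in for the meta-model, and the `toy_model` generator, with its obvious shift in one weight, is a hypothetical stand-in for the real dataset of clean and backdoored CNNs.

```python
import math
import random

def flatten_params(params):
    """Flatten a dict of nested lists (layer name -> weights) into one vector."""
    flat = []
    def walk(x):
        if isinstance(x, (list, tuple)):
            for item in x:
                walk(item)
        else:
            flat.append(float(x))
    for name in sorted(params):  # fixed layer order so features line up across models
        walk(params[name])
    return flat

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_meta_model(features, labels, lr=0.5, epochs=200):
    """Logistic regression over flattened weights (assumed stand-in for a meta-model)."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict_backdoored(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5

def toy_model(backdoored, rng):
    """Hypothetical 'network': small random weights, plus a planted shift if backdoored."""
    conv = [[rng.gauss(0, 0.1) for _ in range(3)] for _ in range(2)]
    fc = [rng.gauss(0, 0.1) for _ in range(2)]
    if backdoored:
        fc[0] += 1.0  # assumption for the toy only; real backdoors leave subtler traces
    return {"conv": conv, "fc": fc}

rng = random.Random(0)
labels = [i % 2 for i in range(60)]  # 0 = clean, 1 = backdoored
feats = [flatten_params(toy_model(bool(y), rng)) for y in labels]
w, b = train_meta_model(feats[:40], labels[:40])
correct = sum(predict_backdoored(w, b, x) == bool(y)
              for x, y in zip(feats[40:], labels[40:]))
accuracy = correct / 20  # held-out accuracy on the last 20 toy models
```

The point of the sketch is only the data flow: the classifier never sees images or triggers, just parameter vectors with a clean/backdoored label.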

Stella Biderman 18/05/2022

VQGAN-CLIP: Open domain image generation and editing

Katherine Crowson*, Stella Biderman*, Daniel Kornis, Dashiell Stander, Eric Hallahan, Louis Castricato, and Edward Raff. “VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance.” In Proceedings of the European Conference on Computer Vision (ECCV), 2022.

Generating and editing images from open domain text prompts is a challenging task that heretofore has required expensive and specially trained models. We demonstrate a novel methodology for both tasks which is capable of producing images of high visual quality from text prompts of significant semantic complexity without any training by using a multimodal encoder to guide image generations. We demonstrate on a variety of tasks how using CLIP [37] to guide VQGAN [11] produces higher visual quality outputs than prior, less flexible approaches like DALL-E [38], GLIDE [33] and Open-Edit [24], despite not being trained for the tasks presented. Our code is available in a public repository.
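The guidance idea, keeping a generator and an encoder frozen and optimizing only a latent so that the encoded image moves toward a text embedding, can be sketched with a toy example. Everything here is an assumption made for illustration: the two small matrices stand in for VQGAN's decoder and CLIP's image encoder, the text embedding is a hand-picked vector, and finite differences replace the backpropagation a real implementation would use.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical frozen models: neither matrix is updated during guidance.
GENERATOR = [[1.0, 0.0], [0.5, 1.0], [0.0, -1.0]]     # latent (2) -> "image" (3)
IMAGE_ENCODER = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]   # "image" (3) -> embedding (2)

def score(latent, text_embedding):
    """Cosine similarity between the encoded generated image and the text embedding."""
    image = matvec(GENERATOR, latent)
    return cosine(matvec(IMAGE_ENCODER, image), text_embedding)

def guide(latent, text_embedding, steps=200, lr=0.1, eps=1e-5):
    """Gradient ascent on the latent only, using a forward-difference gradient."""
    latent = list(latent)
    for _ in range(steps):
        base = score(latent, text_embedding)
        grad = []
        for i in range(len(latent)):
            bumped = list(latent)
            bumped[i] += eps
            grad.append((score(bumped, text_embedding) - base) / eps)
        latent = [z + lr * g for z, g in zip(latent, grad)]
    return latent

text_embedding = [0.0, 1.0]   # assumed embedding of the prompt
start = [1.0, 0.0]            # initial latent
final = guide(start, text_embedding)
```

No model weights change in this loop, which mirrors the abstract's claim that the method needs no additional training; only the latent that the generator decodes is updated.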