Eliciting Latent Knowledge
Stella Biderman

As models become more capable, humans won't always be able to independently check whether a model's claims are true or false. We aim to circumvent this issue by directly eliciting latent knowledge (ELK) from the model's activations.

Alignment MineTest
Stella Biderman

Alignment-MineTest is a research project that uses the open-source Minetest voxel engine as a platform for studying AI alignment.

Mesaoptimization
Stella Biderman

This project studies how auxiliary optimization objectives arise in models.
