- `train.py` — trains the character counting model and generates its training data
- `intervention.py` — activation patching experiments
- `visualize.ipynb` — results visualization code
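Since `train.py` both trains the counting model and synthesizes its own data, here is a minimal sketch of what synthetic character-counting data could look like. The alphabet, lengths, and output format below are placeholder assumptions for illustration, not the repo's actual scheme:

```python
import random
import string

def make_counting_example(rng, max_len=10):
    """Toy character-counting example: a random string, a query
    character drawn from it, and that character's occurrence count.
    (Alphabet and length are placeholder choices.)"""
    s = "".join(rng.choices(string.ascii_lowercase[:5], k=rng.randint(1, max_len)))
    ch = rng.choice(s)
    return s, ch, s.count(ch)

rng = random.Random(0)
examples = [make_counting_example(rng) for _ in range(3)]
```

Each example is self-checking by construction: the label is just `s.count(ch)`.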
`./ioi`
- `make_decoder_train_data.py` — generates the training data
- `DLA.py` — implements the DLA (direct logit attribution) experiments
`./addition`
- `train.py` — trains the 3-digit addition model and generates its training data
- `intervention.py` — main activation patching experiments
- `interventionPlus.py` — activation patching experiments for the "+" sign
- `visualize.ipynb` — results visualization code
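As with counting, the addition model's `train.py` makes up its own data. A minimal sketch of 3-digit addition examples as prompt/answer string pairs follows; the actual tokenization and format used by `train.py` may differ:

```python
import random

def make_addition_example(rng):
    """Toy 3-digit addition example: a prompt like '123+456=' and its
    answer string. (Formatting is an assumption, not the repo's format.)"""
    a = rng.randint(100, 999)
    b = rng.randint(100, 999)
    return f"{a}+{b}=", str(a + b)

rng = random.Random(0)
pairs = [make_addition_example(rng) for _ in range(3)]
```

Because both operands are sampled in [100, 999], every prompt has exactly three digits per operand and the label is always the true sum.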
`./factual`
- `find_heads_attribution.py` — finds the 25 most important heads in the upper layers
- `make_data_part1.py` — selects text from COUNTERFACT and BEAR that "activates" each head at the END position (i.e., the head does not attend too heavily to the BOS token)
- `make_data_part2.py` — selects text from miniPile that "activates" each head
- `cal_freq.py` — calculates token frequencies over miniPile
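For intuition, a frequency pass like `cal_freq.py` boils down to accumulating token counts over a corpus. The sketch below assumes pre-tokenized sequences of token ids; the actual script's interface, tokenizer, and corpus handling are in the repo:

```python
from collections import Counter

def count_token_freq(token_id_sequences):
    """Accumulate token frequencies over an iterable of token-id lists."""
    freq = Counter()
    for ids in token_id_sequences:
        freq.update(ids)
    return freq

# Toy pre-tokenized corpus (placeholder ids, not miniPile).
corpus = [[5, 7, 5], [7, 7, 9]]
freq = count_token_freq(corpus)
```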
`./decoder`
- `model.py` — defines the model architecture
- `train.py` — trains the decoder
- `cache_generation.py` — generates samples with the decoder in raw (non-visualized) form; the output is meant to be loaded into the Streamlit app
- `run.sh` — commands to train the decoder and generate samples
- `utils.py`, `generate.py` — helper functions used by the other files
- `cache_attention.py` — saves attention patterns
- `scatter_completeness.py`, `scatter_completeness_plot.py` — draw scatter plots to verify completeness
`./training_outputs` — contains the checkpoints of the probed models for the counting and addition tasks, so the results are reproducible
`./LLM` — contains the prompts and code used to automatically generate interpretations with LLMs
`./webAPP` — contains the source code for our web application
Steps to run
Go to `./ioi` and run `make_decoder_train_data.py` to generate data for the IOI task. This step is not needed for the counting and addition tasks. To run the factual recall experiment, first download the COUNTERFACT and BEAR data (the links are in `./factual/make_data_part1.py`), then go to `./factual` and run `make_data_part1.py` and `make_data_part2.py` sequentially.
Go to `./decoder`, check `run.sh`, pick a task you are interested in, and train the decoder. For example: `python train.py --probed_task counting --rebalance 6.0 --save_dir $dir_name --batch_size 256 --num_epoch 100 --data_per_epoch 1000000 --num_test_rollout 200 > ./data_and_model/counting.txt`
`run.sh` also contains commands for generating preimage samples with the decoder, e.g. `python cache_generation.py --probed_task counting`; the generated samples appear in `./training_outputs`. The best way to inspect them is to go into the `./webAPP` folder and run `streamlit run InversionView.py`.