This repo contains the source code for the paper TypeT5: Seq2seq Type Inference using Static Analysis.
```bibtex
@inproceedings{Wei2023TypeT5,
    title={TypeT5: Seq2seq Type Inference using Static Analysis},
    author={Jiayi Wei and Greg Durrett and Isil Dillig},
    booktitle={International Conference on Learning Representations},
    year={2023},
    url={https://openreview.net/forum?id=4TyNEhI2GdN}
}
```

This project uses pipenv to manage the package dependencies. Pipenv tracks the exact package versions and manages the (project-specific) virtual environment for you. To install all dependencies, make sure you have pipenv and Python 3.10 installed, then, at the project root, run the following two commands:
```bash
pipenv --python <path-to-your-python-3.10>  # create a new environment for this project
pipenv sync --dev                           # install all specified dependencies
```

More about pipenv:
- To add new dependencies into the virtual environment, you can either add them via `pipenv install ..` (using pipenv) or `pipenv run pip install ..` (using pip from within the virtual environment).
- If your pytorch installation is not working properly, you might need to reinstall it via the `pipenv run pip install` approach rather than `pipenv install`.
- All `.py` scripts below can be run via `pipenv run python <script-name.py>`. For `.ipynb` notebooks, make sure you select the pipenv environment as the kernel. You can run all unit tests by running `pipenv run pytest` at the project root.
If you are not using pipenv:
- Make sure to add the environment variables in the `.env` file to your shell environment when you run the scripts (needed by the parsing library); see the sketch after this list for one way to load them from Python instead.
- We also provide a `requirements.txt` file so you can install the dependencies via `pip install -r requirements.txt`.
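If you would rather load those environment variables from Python than from your shell, one common option is python-dotenv. It is not a stated dependency of this project, so treat this as a sketch:

```python
# Not a project dependency; install separately: pip install python-dotenv
from dotenv import load_dotenv

# Read the key=value pairs from the .env file at the project root and add
# them to os.environ before importing the modules that need them.
load_dotenv(".env")
```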
The notebook `scripts/run_typet5.ipynb` shows you how to download the TypeT5 model from Huggingface and then use it to make type predictions for a specified codebase.
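The notebook drives everything through the project's own wrapper API, which also builds the static-analysis context around each prediction. As a rough sketch of just the checkpoint-loading step using only the standard transformers API: the model id below is a placeholder (take the real id from the notebook), and the masked-span input format is an assumption based on how T5-style models mask spans.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Placeholder -- use the Hugging Face model id given in scripts/run_typet5.ipynb.
model_id = "<typet5-model-id>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# TypeT5 predicts masked type annotations. The real pipeline constructs the
# model input from static analysis (callers, callees, usages), which this
# bare call omits, so expect lower-quality predictions than the notebook's.
# The <extra_id_*> sentinel format is an assumption, not taken from the repo.
code = "def add(x: <extra_id_0>, y: <extra_id_1>) -> <extra_id_2>:\n    return x + y"
inputs = tokenizer(code, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```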
To train a new TypeT5 model:
- First, run the notebook `scripts/collect_dataset.ipynb` to download and split the BetterTypes4Py dataset used in our paper.
  - The exact list of repos we used for the experiments in the paper can be loaded from `data/repos_split.pkl` using `pickle.load` (see the sketch after this list). They can also be downloaded via this Google Drive link.
- Then, run `scripts/train_model.py` to train a new TypeT5 model. Training takes about 11 hours on a single Quadro RTX 8000 GPU with 48GB memory.
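For reference, loading the repo split is just a standard pickle call. The structure of the loaded object is not documented here, so inspect it after loading:

```python
import pickle
from pathlib import Path

# Load the list of repos (train/valid/test split) used in the paper's experiments.
# The exact structure of the pickled object is an assumption; print it to check.
with Path("data/repos_split.pkl").open("rb") as f:
    repos_split = pickle.load(f)

print(type(repos_split))
```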
Development:
- Formatter: We use `black` for formatting with the default options.
- Type Checker: We use Pylance to type check this codebase. It's the built-in type checker shipped with the VSCode Python extension and can be enabled by setting `Python > Analysis > Type Checking Mode` to `basic`.
