Project of Machine Learning (CS3308 SJTU)
Clone the repository and install the required dependencies.
# Clone the repository git clone https://github.com/mybearyZhang/ML-for-IC-Design.git # Navigate to the project directory cd ML-for-IC-Design # Install dependencies pip install -r requirements.txtMoreover, you need to install the abc, abc_py, yosys packages for logic synthesis. You can install it refer to the following repositories:
Before running the tasks, ensure your datasets are properly prepared and located in the designated directories. Datas can be found at https://jbox.sjtu.edu.cn/l/01O2vH.
You should have the following directory structure:
ML-for-IC-Design/ ├── project_data/ │ ├── adder_0.pkl │ ├── adder_1.pkl │ └── ... ├── project_data2/ │ ├── adder_0.pkl │ ├── adder_1.pkl │ └── ... ├── InitialAIG/ │ ├── test/ │ │ └── ... │ └── train/ │ └── ... ├── lib/ │ └── 7nm/ │ └── 7nm.lib └──src └── ... This task involves predicting the evaluation metrics of a logic synthesis process using machine learning techniques.
-
Run read.py to Read and Cache Data:
Execute the
read.pyscript to read data from theproject_datadirectory and cache it for further processing.python -m src.task1.read
-
Run split.py to Prepare Train-Test Splits:
Execute the
split.pyscript to convert the cached data into AIG format and split it into train and test sets.python -m src.task1.split
-
Run train.py to Train the Model:
Execute the
train.pyscript to train the model using the train set.python -m src.task1.train [arguments]
Additional parameters can be specified to customize the training process:
- --data_dir: Directory containing the data files.
- --max_train_samples: Maximum number of training samples to load.
- --batch_size: Batch size for training and evaluation.
- --num_workers: Number of workers for data loading.
- --hidden_channels: Number of hidden channels in the GCN.
- --learning_rate: Learning rate for the optimizer.
- --weight_decay: Weight decay for the optimizer.
- --num_epochs: Number of epochs to train.
- --best_model_path: Path to save the best model.
- --project_name: Project name for wandb logging.
In this task, we aim to predict the logic synthesis decision based on the input AIGs.
First we train a GNN network to predict the future rewards of the logic synthesis process. The rewards are calculated based on the evaluation metrics of the AIGs.
Similar to Task 1, we first read and cache the data, then split the data into train and test sets. Finally, we train the model to predict the future rewards. The execution commands are as follows:
python -m src.task2.read python -m src.task2.split python -m src.task2.train_reward [arguments]The arguments are similar to Task 1, please refer to the previous section for more details.
To train the BCQ algorithm, you need to generate a replay buffer first. Please run the following command (Ensure you have run src.task1.read):
python src/task2/gen_buffer.pyAfter you get the replay buffer, you can set the replay buffer path in src/task2/bcq/train.py and train the BCQ algorithm by running the following command:
cd src/task2/bcq python train.pyTo search for the best decision, you can run the following command:
cd src/task2 python main.py --aig AIG --model_path MODEL_PATH [--data_root DATA_ROOT] [--method {astar,greedy,rl}]You can easily choose different search algorithms by passing the --method argument. The available methods are astar, greedy, and rl. --aig argument is the name of the aig file.
If you want to evaluate on a series of AIGs, you can run the following script:
bash eval.sh [method]In this task, we introduce the prompting process utilized for generating Verilog code using Large Language Models (LLMs). Two prompting techniques are employed: naive prompting and in-context learning.
Naive prompting involves providing task descriptions directly to LLMs to generate corresponding Verilog code. Prompts outline the task scope, granting LLMs creative freedom. Sample prompts include:
- Row Multiplication: Write a Verilog code that generates the results of row multiplication.
- Finite State Machine: Write a Verilog code of a finite state machine that encompasses more than 2 stages.
- Memory Unit Design: Write a Verilog code to generate a memory-like storage structure
In Context Learning entails dynamic model adaptation based on task-specific context. Initial solutions, generated without parameter fine-tuning, are iteratively refined using original samples as prompts. The iterative refinement process involves:
- First Prompting: Write a Verilog code that generates the results of row multiplication.
- Second Prompting: I have a generated row multiplication code as follows {last-version-of-code}, refine it by changing the dimension of matrix A and B into 5*5
- Third Prompting: I have a generated row multiplication code as follows {last-version-of-code}, refine it by adding another matrix.
- Fourth Prompting: I have a generated row multiplication code as follows {last-version-of-code}, refine it by adding error handling to the original code.
- Fifth Prompting: I have a generated row multiplication code as follows {last-version-of-code}, refine it by adding test benches.