# LooGLE v2: A novel real-world benchmark for long-dependency understanding
First, create a conda environment and install the required dependencies:
```bash
conda create -n loogle python=3.10
conda activate loogle
pip install vllm
```

Then, clone the benchmark repository:
```bash
git clone https://github.com/GraphPKU/LooGLE-v2.git
cd LooGLE-v2
```

You can download the benchmark dataset into the `./datasets` directory with the following command:
```bash
git clone https://huggingface.co/datasets/GraphPKU/LooGLE-v2 ./datasets/LooGLE-v2
```

We take Llama-3.1-8B-Instruct as an example for inference.
First, launch the model server using vllm serve:
```bash
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --api-key GraphPKU \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.95 \
    --max_model_len 131072 \
    --trust-remote-code
```

Note:
- `--tensor-parallel-size` should be set to the number of available GPUs.
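Once the server is up, it can be worth sanity-checking it before launching the full benchmark. `vllm serve` exposes an OpenAI-compatible API (by default at `http://localhost:8000/v1`); the sketch below only builds a chat-completion request for that endpoint — the host, port, and prompt are illustrative, and the API key must match the `--api-key` value above.

```python
import json

# vllm serve exposes an OpenAI-compatible API; port 8000 is the vLLM default.
BASE_URL = "http://localhost:8000/v1"  # adjust if you changed host/port
API_KEY = "GraphPKU"                   # must match --api-key passed to vllm serve

def build_chat_request(prompt: str, model: str = "meta-llama/Llama-3.1-8B-Instruct"):
    """Build an OpenAI-style chat-completion request: (url, headers, JSON body)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 32,
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    }
    return f"{BASE_URL}/chat/completions", headers, json.dumps(body).encode()

# Send this with urllib, requests, or the openai client once the server is running.
url, headers, data = build_chat_request("Reply with the single word: ready")
print(url)
```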
To run predictions on the benchmark using your model:
```bash
python predict.py \
    --model Llama-3.1-8B-Instruct \
    --data_dir ./datasets/LooGLE-v2
```

After inference is complete, run the evaluation script:
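Long-context inference runs can take a while and may be interrupted, so before evaluating it helps to check that the predictions file is complete, well-formed JSONL. A minimal sketch — the output path follows the evaluation command's `--input_path`, and no assumption is made about the per-record schema `predict.py` emits:

```python
import json
from pathlib import Path

def check_jsonl(path):
    """Count records in a JSONL predictions file, verifying each line parses."""
    n = 0
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate trailing blank lines
            json.loads(line)  # raises on a corrupt or truncated line
            n += 1
    return n

# e.g. check_jsonl("./results/Llama-3.1-8B-Instruct.jsonl")
```

If the count is lower than the number of benchmark examples, the run was likely interrupted.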
```bash
python eval/eval.py \
    --input_path ./results/Llama-3.1-8B-Instruct.jsonl
```

This will compute accuracy and other metrics for the model's performance on LooGLE-v2.
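`eval/eval.py` computes the metrics for you; for intuition, exact-match accuracy over a list of prediction records reduces to something like the sketch below. The `pred` and `answer` field names are hypothetical illustrations, not LooGLE-v2's actual schema, which is defined by the repo's scripts.

```python
def exact_match_accuracy(records):
    """Fraction of records whose normalized prediction equals the gold answer.

    The `pred`/`answer` keys are illustrative, not LooGLE-v2's real field names.
    """
    if not records:
        return 0.0
    # Normalize: strip, lowercase, collapse internal whitespace.
    norm = lambda s: " ".join(str(s).strip().lower().split())
    correct = sum(norm(r["pred"]) == norm(r["answer"]) for r in records)
    return correct / len(records)

print(exact_match_accuracy([
    {"pred": "Paris", "answer": "paris"},
    {"pred": "42", "answer": "43"},
]))  # → 0.5
```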