# LooGLE v2: A novel real-world benchmark for long-dependency understanding
First, create a conda environment and install the required dependencies:
```bash
conda create -n loogle python=3.10
conda activate loogle
pip install vllm
```

Then, clone the benchmark repository:
```bash
git clone https://github.com/GraphPKU/LooGLE-v2.git
cd LooGLE-v2
```

You can download the benchmark dataset into the `./datasets` directory with the following command:
```bash
git clone https://huggingface.co/datasets/GraphPKU/LooGLE-v2 ./datasets/LooGLE-v2
```

We take Llama-3.1-8B-Instruct as an example for inference.
First, launch the model server using vllm serve:
```bash
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --api-key GraphPKU \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.95 \
    --max_model_len 131072 \
    --trust-remote-code
```

Note:
- `--tensor-parallel-size` should be set to the number of available GPUs.
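Once the server is up, it can be worth sanity-checking it before launching the full benchmark. `vllm serve` exposes an OpenAI-compatible API (by default at `http://localhost:8000/v1`); the sketch below only builds a chat-completion request for that endpoint — the host, port, and prompt are illustrative, and the API key must match the `--api-key` value above.

```python
import json

# vllm serve exposes an OpenAI-compatible API; port 8000 is the vLLM default.
BASE_URL = "http://localhost:8000/v1"  # adjust if you changed host/port
API_KEY = "GraphPKU"                   # must match --api-key passed to vllm serve

def build_chat_request(prompt: str, model: str = "meta-llama/Llama-3.1-8B-Instruct"):
    """Build an OpenAI-style chat-completion request: (url, headers, JSON body)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 32,
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    }
    return f"{BASE_URL}/chat/completions", headers, json.dumps(body).encode()

# Send this with urllib, requests, or the openai client once the server is running.
url, headers, data = build_chat_request("Reply with the single word: ready")
print(url)
```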
To run predictions on the benchmark using your model:
```bash
python predict.py \
    --model Llama-3.1-8B-Instruct \
    --data_dir ./datasets/LooGLE-v2
```

After inference is complete, run the evaluation script:
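Long-context inference runs can take a while and may be interrupted, so before evaluating it helps to check that the predictions file is complete, well-formed JSONL. A minimal sketch — the output path follows the evaluation command's `--input_path`, and no assumption is made about the per-record schema `predict.py` emits:

```python
import json
from pathlib import Path

def check_jsonl(path):
    """Count records in a JSONL predictions file, verifying each line parses."""
    n = 0
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate trailing blank lines
            json.loads(line)  # raises on a corrupt or truncated line
            n += 1
    return n

# e.g. check_jsonl("./results/Llama-3.1-8B-Instruct.jsonl")
```

If the count is lower than the number of benchmark examples, the run was likely interrupted.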
```bash
python eval/eval.py \
    --input_path ./results/Llama-3.1-8B-Instruct.jsonl
```

This will compute accuracy and other metrics for the model's performance on LooGLE-v2.
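`eval/eval.py` computes the metrics for you; for intuition, exact-match accuracy over a list of prediction records reduces to something like the sketch below. The `pred` and `answer` field names are hypothetical illustrations, not LooGLE-v2's actual schema, which is defined by the repo's scripts.

```python
def exact_match_accuracy(records):
    """Fraction of records whose normalized prediction equals the gold answer.

    The `pred`/`answer` keys are illustrative, not LooGLE-v2's real field names.
    """
    if not records:
        return 0.0
    # Normalize: strip, lowercase, collapse internal whitespace.
    norm = lambda s: " ".join(str(s).strip().lower().split())
    correct = sum(norm(r["pred"]) == norm(r["answer"]) for r in records)
    return correct / len(records)

print(exact_match_accuracy([
    {"pred": "Paris", "answer": "paris"},
    {"pred": "42", "answer": "43"},
]))  # → 0.5
```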