Supported Models and Datasets

The table below lists the models integrated with ms-swift:

  • Model ID: the model's ID on ModelScope

  • HF Model ID: the model's ID on Hugging Face

  • Model Type: the model type name used by ms-swift

  • Default Template: the default chat template used for the model

  • Requires: additional dependencies required to use the model

  • Tags: tags associated with the model
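
The Model ID column is the value passed to ms-swift when training or running inference; Model Type and Default Template can usually be left for ms-swift to infer from it. The snippet below is a minimal sketch of a LoRA fine-tuning call using the ms-swift 3.x Python entry points; the argument names and the example dataset ID follow the project's examples and should be checked against your installed version.

```python
# Minimal sketch (assumes ms-swift 3.x): fine-tune a listed model with LoRA.
# `sft_main`/`TrainArguments` and the argument names below follow the ms-swift
# examples and may differ in other versions.
from swift.llm import sft_main, TrainArguments

result = sft_main(TrainArguments(
    model='Qwen/Qwen2.5-7B-Instruct',    # "Model ID" column (ModelScope ID)
    # model_type='qwen2_5',              # "Model Type" column; normally inferred from the model
    # template='qwen2_5',                # "Default Template" column; normally inferred as well
    train_type='lora',
    dataset=['AI-ModelScope/alpaca-gpt4-data-zh#500'],
))
```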

Large Language Models

| Model ID | Model Type | Default Template | Requires | Tags | HF Model ID |
| -------- | ---------- | ---------------- | -------- | ---- | ----------- |
| Qwen/Qwen-1_8B-Chat | qwen | qwen | - | - | Qwen/Qwen-1_8B-Chat |
| Qwen/Qwen-7B-Chat | qwen | qwen | - | - | Qwen/Qwen-7B-Chat |
| Qwen/Qwen-14B-Chat | qwen | qwen | - | - | Qwen/Qwen-14B-Chat |
| Qwen/Qwen-72B-Chat | qwen | qwen | - | - | Qwen/Qwen-72B-Chat |
| Qwen/Qwen-1_8B | qwen | qwen | - | - | Qwen/Qwen-1_8B |
| Qwen/Qwen-7B | qwen | qwen | - | - | Qwen/Qwen-7B |
| Qwen/Qwen-14B | qwen | qwen | - | - | Qwen/Qwen-14B |
| Qwen/Qwen-72B | qwen | qwen | - | - | Qwen/Qwen-72B |
| Qwen/Qwen-1_8B-Chat-Int4 | qwen | qwen | - | - | Qwen/Qwen-1_8B-Chat-Int4 |
| Qwen/Qwen-7B-Chat-Int4 | qwen | qwen | - | - | Qwen/Qwen-7B-Chat-Int4 |
| Qwen/Qwen-14B-Chat-Int4 | qwen | qwen | - | - | Qwen/Qwen-14B-Chat-Int4 |
| Qwen/Qwen-72B-Chat-Int4 | qwen | qwen | - | - | Qwen/Qwen-72B-Chat-Int4 |
| Qwen/Qwen-1_8B-Chat-Int8 | qwen | qwen | - | - | Qwen/Qwen-1_8B-Chat-Int8 |
| Qwen/Qwen-7B-Chat-Int8 | qwen | qwen | - | - | Qwen/Qwen-7B-Chat-Int8 |
| Qwen/Qwen-14B-Chat-Int8 | qwen | qwen | - | - | Qwen/Qwen-14B-Chat-Int8 |
| Qwen/Qwen-72B-Chat-Int8 | qwen | qwen | - | - | Qwen/Qwen-72B-Chat-Int8 |
| TongyiFinance/Tongyi-Finance-14B-Chat | qwen | qwen | - | financial | jxy/Tongyi-Finance-14B-Chat |
| TongyiFinance/Tongyi-Finance-14B | qwen | qwen | - | financial | - |
| TongyiFinance/Tongyi-Finance-14B-Chat-Int4 | qwen | qwen | - | financial | jxy/Tongyi-Finance-14B-Chat-Int4 |
| Qwen/Qwen1.5-0.5B-Chat | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat |
| Qwen/Qwen1.5-1.8B-Chat | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat |
| Qwen/Qwen1.5-4B-Chat | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat |
| Qwen/Qwen1.5-7B-Chat | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat |
| Qwen/Qwen1.5-14B-Chat | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat |
| Qwen/Qwen1.5-32B-Chat | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-32B-Chat |
| Qwen/Qwen1.5-72B-Chat | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat |
| Qwen/Qwen1.5-110B-Chat | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-110B-Chat |
| Qwen/Qwen1.5-0.5B | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-0.5B |
| Qwen/Qwen1.5-1.8B | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-1.8B |
| Qwen/Qwen1.5-4B | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-4B |
| Qwen/Qwen1.5-7B | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-7B |
| Qwen/Qwen1.5-14B | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-14B |
| Qwen/Qwen1.5-32B | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-32B |
| Qwen/Qwen1.5-72B | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-72B |
| Qwen/Qwen1.5-110B | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-110B |
| Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-4B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-7B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-14B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-32B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-32B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-72B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-110B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-110B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 |
| Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 |
| Qwen/Qwen1.5-4B-Chat-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat-GPTQ-Int8 |
| Qwen/Qwen1.5-7B-Chat-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat-GPTQ-Int8 |
| Qwen/Qwen1.5-14B-Chat-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat-GPTQ-Int8 |
| Qwen/Qwen1.5-72B-Chat-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat-GPTQ-Int8 |
| Qwen/Qwen1.5-0.5B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat-AWQ |
| Qwen/Qwen1.5-1.8B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat-AWQ |
| Qwen/Qwen1.5-4B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat-AWQ |
| Qwen/Qwen1.5-7B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat-AWQ |
| Qwen/Qwen1.5-14B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat-AWQ |
| Qwen/Qwen1.5-32B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-32B-Chat-AWQ |
| Qwen/Qwen1.5-72B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat-AWQ |
| Qwen/Qwen1.5-110B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen1.5-110B-Chat-AWQ |
| Qwen/CodeQwen1.5-7B | qwen2 | qwen | transformers>=4.37 | coding | Qwen/CodeQwen1.5-7B |
| Qwen/CodeQwen1.5-7B-Chat | qwen2 | qwen | transformers>=4.37 | coding | Qwen/CodeQwen1.5-7B-Chat |
| Qwen/CodeQwen1.5-7B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | coding | Qwen/CodeQwen1.5-7B-Chat-AWQ |
| Qwen/Qwen2-0.5B-Instruct | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-0.5B-Instruct |
| Qwen/Qwen2-1.5B-Instruct | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-1.5B-Instruct |
| Qwen/Qwen2-7B-Instruct | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-7B-Instruct |
| Qwen/Qwen2-72B-Instruct | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-72B-Instruct |
| Qwen/Qwen2-0.5B | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-0.5B |
| Qwen/Qwen2-1.5B | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-1.5B |
| Qwen/Qwen2-7B | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-7B |
| Qwen/Qwen2-72B | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-72B |
| Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2-7B-Instruct-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-7B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2-72B-Instruct-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-72B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2-7B-Instruct-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-7B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2-72B-Instruct-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-72B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2-0.5B-Instruct-AWQ | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-0.5B-Instruct-AWQ |
| Qwen/Qwen2-1.5B-Instruct-AWQ | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-1.5B-Instruct-AWQ |
| Qwen/Qwen2-7B-Instruct-AWQ | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-7B-Instruct-AWQ |
| Qwen/Qwen2-72B-Instruct-AWQ | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2-72B-Instruct-AWQ |
| Qwen/Qwen2-Math-1.5B-Instruct | qwen2 | qwen | transformers>=4.37 | math | Qwen/Qwen2-Math-1.5B-Instruct |
| Qwen/Qwen2-Math-7B-Instruct | qwen2 | qwen | transformers>=4.37 | math | Qwen/Qwen2-Math-7B-Instruct |
| Qwen/Qwen2-Math-72B-Instruct | qwen2 | qwen | transformers>=4.37 | math | Qwen/Qwen2-Math-72B-Instruct |
| Qwen/Qwen2-Math-1.5B | qwen2 | qwen | transformers>=4.37 | math | Qwen/Qwen2-Math-1.5B |
| Qwen/Qwen2-Math-7B | qwen2 | qwen | transformers>=4.37 | math | Qwen/Qwen2-Math-7B |
| Qwen/Qwen2-Math-72B | qwen2 | qwen | transformers>=4.37 | math | Qwen/Qwen2-Math-72B |
| Qwen/Qwen2.5-7B-Instruct-1M | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2.5-7B-Instruct-1M |
| Qwen/Qwen2.5-14B-Instruct-1M | qwen2 | qwen | transformers>=4.37 | - | Qwen/Qwen2.5-14B-Instruct-1M |
| PowerInfer/SmallThinker-3B-Preview | qwen2 | qwen | transformers>=4.37 | - | PowerInfer/SmallThinker-3B-Preview |
| Qwen/Qwen2.5-0.5B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-0.5B-Instruct |
| Qwen/Qwen2.5-1.5B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-1.5B-Instruct |
| Qwen/Qwen2.5-3B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-3B-Instruct |
| Qwen/Qwen2.5-7B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-7B-Instruct |
| Qwen/Qwen2.5-14B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-14B-Instruct |
| Qwen/Qwen2.5-32B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-32B-Instruct |
| Qwen/Qwen2.5-72B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-72B-Instruct |
| Qwen/Qwen2.5-0.5B | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-0.5B |
| Qwen/Qwen2.5-1.5B | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-1.5B |
| Qwen/Qwen2.5-3B | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-3B |
| Qwen/Qwen2.5-7B | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-7B |
| Qwen/Qwen2.5-14B | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-14B |
| Qwen/Qwen2.5-32B | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-32B |
| Qwen/Qwen2.5-72B | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-72B |
| Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-3B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-3B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-3B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-3B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-0.5B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-0.5B-Instruct-AWQ |
| Qwen/Qwen2.5-1.5B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-1.5B-Instruct-AWQ |
| Qwen/Qwen2.5-3B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-3B-Instruct-AWQ |
| Qwen/Qwen2.5-7B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-7B-Instruct-AWQ |
| Qwen/Qwen2.5-14B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-14B-Instruct-AWQ |
| Qwen/Qwen2.5-32B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-32B-Instruct-AWQ |
| Qwen/Qwen2.5-72B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | - | Qwen/Qwen2.5-72B-Instruct-AWQ |
| Qwen/Qwen2.5-Coder-0.5B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-0.5B-Instruct |
| Qwen/Qwen2.5-Coder-1.5B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-1.5B-Instruct |
| Qwen/Qwen2.5-Coder-3B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-3B-Instruct |
| Qwen/Qwen2.5-Coder-7B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-7B-Instruct |
| Qwen/Qwen2.5-Coder-14B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-14B-Instruct |
| Qwen/Qwen2.5-Coder-32B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-32B-Instruct |
| Qwen/Qwen2.5-Coder-0.5B | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-0.5B |
| Qwen/Qwen2.5-Coder-1.5B | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-1.5B |
| Qwen/Qwen2.5-Coder-3B | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-3B |
| Qwen/Qwen2.5-Coder-7B | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-7B |
| Qwen/Qwen2.5-Coder-14B | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-14B |
| Qwen/Qwen2.5-Coder-32B | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-32B |
| Qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ |
| Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ |
| Qwen/Qwen2.5-Coder-3B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-3B-Instruct-AWQ |
| Qwen/Qwen2.5-Coder-7B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-7B-Instruct-AWQ |
| Qwen/Qwen2.5-Coder-14B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-14B-Instruct-AWQ |
| Qwen/Qwen2.5-Coder-32B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-32B-Instruct-AWQ |
| Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | coding | Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8 |
| moonshotai/Kimi-Dev-72B | qwen2_5 | qwen2_5 | transformers>=4.37 | - | moonshotai/Kimi-Dev-72B |
| Qwen/Qwen2.5-Math-1.5B-Instruct | qwen2_5_math | qwen2_5_math | transformers>=4.37 | math | Qwen/Qwen2.5-Math-1.5B-Instruct |
| Qwen/Qwen2.5-Math-7B-Instruct | qwen2_5_math | qwen2_5_math | transformers>=4.37 | math | Qwen/Qwen2.5-Math-7B-Instruct |
| Qwen/Qwen2.5-Math-72B-Instruct | qwen2_5_math | qwen2_5_math | transformers>=4.37 | math | Qwen/Qwen2.5-Math-72B-Instruct |
| Qwen/Qwen2.5-Math-1.5B | qwen2_5_math | qwen2_5_math | transformers>=4.37 | math | Qwen/Qwen2.5-Math-1.5B |
| Qwen/Qwen2.5-Math-7B | qwen2_5_math | qwen2_5_math | transformers>=4.37 | math | Qwen/Qwen2.5-Math-7B |
| Qwen/Qwen2.5-Math-72B | qwen2_5_math | qwen2_5_math | transformers>=4.37 | math | Qwen/Qwen2.5-Math-72B |
| Qwen/Qwen1.5-MoE-A2.7B-Chat | qwen2_moe | qwen | transformers>=4.40 | - | Qwen/Qwen1.5-MoE-A2.7B-Chat |
| Qwen/Qwen1.5-MoE-A2.7B | qwen2_moe | qwen | transformers>=4.40 | - | Qwen/Qwen1.5-MoE-A2.7B |
| Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 | qwen2_moe | qwen | transformers>=4.40 | - | Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 |
| Qwen/Qwen2-57B-A14B-Instruct | qwen2_moe | qwen | transformers>=4.40 | - | Qwen/Qwen2-57B-A14B-Instruct |
| Qwen/Qwen2-57B-A14B | qwen2_moe | qwen | transformers>=4.40 | - | Qwen/Qwen2-57B-A14B |
| Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4 | qwen2_moe | qwen | transformers>=4.40 | - | Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4 |
| Qwen/QwQ-32B-Preview | qwq_preview | qwq_preview | transformers>=4.37 | - | Qwen/QwQ-32B-Preview |
| Qwen/QwQ-32B | qwq | qwq | transformers>=4.37 | - | Qwen/QwQ-32B |
| Qwen/QwQ-32B-AWQ | qwq | qwq | transformers>=4.37 | - | Qwen/QwQ-32B-AWQ |
| Qwen/Qwen3-0.6B-Base | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-0.6B-Base |
| Qwen/Qwen3-1.7B-Base | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-1.7B-Base |
| Qwen/Qwen3-4B-Base | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-4B-Base |
| Qwen/Qwen3-8B-Base | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-8B-Base |
| Qwen/Qwen3-14B-Base | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-14B-Base |
| Qwen/Qwen3-0.6B | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-0.6B |
| Qwen/Qwen3-1.7B | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-1.7B |
| Qwen/Qwen3-4B | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-4B |
| Qwen/Qwen3-8B | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-8B |
| Qwen/Qwen3-14B | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-14B |
| Qwen/Qwen3-32B | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-32B |
| Qwen/Qwen3-0.6B-FP8 | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-0.6B-FP8 |
| Qwen/Qwen3-1.7B-FP8 | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-1.7B-FP8 |
| Qwen/Qwen3-4B-FP8 | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-4B-FP8 |
| Qwen/Qwen3-8B-FP8 | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-8B-FP8 |
| Qwen/Qwen3-14B-FP8 | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-14B-FP8 |
| Qwen/Qwen3-32B-FP8 | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-32B-FP8 |
| Qwen/Qwen3-4B-AWQ | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-4B-AWQ |
| Qwen/Qwen3-8B-AWQ | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-8B-AWQ |
| Qwen/Qwen3-14B-AWQ | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-14B-AWQ |
| Qwen/Qwen3-32B-AWQ | qwen3 | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-32B-AWQ |
| swift/Qwen3-32B-AWQ | qwen3 | qwen3 | transformers>=4.51 | - | - |
| Qwen/Qwen3Guard-Gen-0.6B | qwen3_guard | qwen3_guard | transformers>=4.51 | - | Qwen/Qwen3Guard-Gen-0.6B |
| Qwen/Qwen3Guard-Gen-4B | qwen3_guard | qwen3_guard | transformers>=4.51 | - | Qwen/Qwen3Guard-Gen-4B |
| Qwen/Qwen3Guard-Gen-8B | qwen3_guard | qwen3_guard | transformers>=4.51 | - | Qwen/Qwen3Guard-Gen-8B |
| Qwen/Qwen3-4B-Thinking-2507 | qwen3_thinking | qwen3_thinking | transformers>=4.51 | - | Qwen/Qwen3-4B-Thinking-2507 |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | qwen3_thinking | qwen3_thinking | transformers>=4.51 | - | Qwen/Qwen3-4B-Thinking-2507-FP8 |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | qwen3_nothinking | qwen3_nothinking | transformers>=4.51 | - | Qwen/Qwen3-30B-A3B-Instruct-2507 |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | qwen3_nothinking | qwen3_nothinking | transformers>=4.51 | - | Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | qwen3_nothinking | qwen3_nothinking | transformers>=4.51 | - | Qwen/Qwen3-235B-A22B-Instruct-2507 |
| Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 | qwen3_nothinking | qwen3_nothinking | transformers>=4.51 | - | Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 |
| swift/Qwen3-235B-A22B-Instruct-2507-AWQ | qwen3_nothinking | qwen3_nothinking | transformers>=4.51 | - | - |
| Qwen/Qwen3-4B-Instruct-2507 | qwen3_nothinking | qwen3_nothinking | transformers>=4.51 | - | Qwen/Qwen3-4B-Instruct-2507 |
| Qwen/Qwen3-4B-Instruct-2507-FP8 | qwen3_nothinking | qwen3_nothinking | transformers>=4.51 | - | Qwen/Qwen3-4B-Instruct-2507-FP8 |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | qwen3_coder | qwen3_coder | transformers>=4.51 | coding | Qwen/Qwen3-Coder-30B-A3B-Instruct |
| Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 | qwen3_coder | qwen3_coder | transformers>=4.51 | coding | Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 |
| Qwen/Qwen3-Coder-480B-A35B-Instruct | qwen3_coder | qwen3_coder | transformers>=4.51 | coding | Qwen/Qwen3-Coder-480B-A35B-Instruct |
| Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 | qwen3_coder | qwen3_coder | transformers>=4.51 | coding | Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 |
| swift/Qwen3-Coder-480B-A35B-Instruct-AWQ | qwen3_coder | qwen3_coder | transformers>=4.51 | coding | - |
| Qwen/Qwen3-30B-A3B-Base | qwen3_moe | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-30B-A3B-Base |
| Qwen/Qwen3-30B-A3B | qwen3_moe | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-30B-A3B |
| Qwen/Qwen3-235B-A22B | qwen3_moe | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-235B-A22B |
| Qwen/Qwen3-30B-A3B-FP8 | qwen3_moe | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-30B-A3B-FP8 |
| Qwen/Qwen3-235B-A22B-FP8 | qwen3_moe | qwen3 | transformers>=4.51 | - | Qwen/Qwen3-235B-A22B-FP8 |
| swift/Qwen3-30B-A3B-AWQ | qwen3_moe | qwen3 | transformers>=4.51 | - | cognitivecomputations/Qwen3-30B-A3B-AWQ |
| swift/Qwen3-235B-A22B-AWQ | qwen3_moe | qwen3 | transformers>=4.51 | - | cognitivecomputations/Qwen3-235B-A22B-AWQ |
| iic/Tongyi-DeepResearch-30B-A3B | qwen3_moe | qwen3 | transformers>=4.51 | - | Alibaba-NLP/Tongyi-DeepResearch-30B-A3B |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | qwen3_moe_thinking | qwen3_thinking | transformers>=4.51 | - | Qwen/Qwen3-30B-A3B-Thinking-2507 |
| Qwen/Qwen3-30B-A3B-Thinking-2507-FP8 | qwen3_moe_thinking | qwen3_thinking | transformers>=4.51 | - | Qwen/Qwen3-30B-A3B-Thinking-2507-FP8 |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | qwen3_moe_thinking | qwen3_thinking | transformers>=4.51 | - | Qwen/Qwen3-235B-A22B-Thinking-2507 |
| Qwen/Qwen3-235B-A22B-Thinking-2507-FP8 | qwen3_moe_thinking | qwen3_thinking | transformers>=4.51 | - | Qwen/Qwen3-235B-A22B-Thinking-2507-FP8 |
| swift/Qwen3-235B-A22B-Thinking-2507-AWQ | qwen3_moe_thinking | qwen3_thinking | transformers>=4.51 | - | - |
| Qwen/Qwen3-Next-80B-A3B-Instruct | qwen3_next | qwen3_nothinking | transformers>=4.57 | - | - |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | qwen3_next | qwen3_nothinking | transformers>=4.57 | - | - |
| Qwen/Qwen3-Next-80B-A3B-Thinking | qwen3_next_thinking | qwen3_thinking | transformers>=4.57 | - | - |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | qwen3_next_thinking | qwen3_thinking | transformers>=4.57 | - | - |
| Qwen/Qwen3-Embedding-0.6B | qwen3_emb | qwen3_emb | - | - | Qwen/Qwen3-Embedding-0.6B |
| Qwen/Qwen3-Embedding-4B | qwen3_emb | qwen3_emb | - | - | Qwen/Qwen3-Embedding-4B |
| Qwen/Qwen3-Embedding-8B | qwen3_emb | qwen3_emb | - | - | Qwen/Qwen3-Embedding-8B |
| iic/gte_Qwen2-1.5B-instruct | qwen2_gte | dummy | - | - | Alibaba-NLP/gte-Qwen2-1.5B-instruct |
| iic/gte_Qwen2-7B-instruct | qwen2_gte | dummy | - | - | Alibaba-NLP/gte-Qwen2-7B-instruct |
| codefuse-ai/CodeFuse-QWen-14B | codefuse_qwen | codefuse | - | coding | codefuse-ai/CodeFuse-QWen-14B |
| iic/ModelScope-Agent-7B | modelscope_agent | modelscope_agent | - | - | - |
| iic/ModelScope-Agent-14B | modelscope_agent | modelscope_agent | - | - | - |
| AIDC-AI/Marco-o1 | marco_o1 | marco_o1 | transformers>=4.37 | - | AIDC-AI/Marco-o1 |
| modelscope/Llama-2-7b-ms | llama | llama | - | - | meta-llama/Llama-2-7b-hf |
| modelscope/Llama-2-13b-ms | llama | llama | - | - | meta-llama/Llama-2-13b-hf |
| modelscope/Llama-2-70b-ms | llama | llama | - | - | meta-llama/Llama-2-70b-hf |
| modelscope/Llama-2-7b-chat-ms | llama | llama | - | - | meta-llama/Llama-2-7b-chat-hf |
| modelscope/Llama-2-13b-chat-ms | llama | llama | - | - | meta-llama/Llama-2-13b-chat-hf |
| modelscope/Llama-2-70b-chat-ms | llama | llama | - | - | meta-llama/Llama-2-70b-chat-hf |
| AI-ModelScope/chinese-llama-2-1.3b | llama | llama | - | - | hfl/chinese-llama-2-1.3b |
| AI-ModelScope/chinese-llama-2-7b | llama | llama | - | - | hfl/chinese-llama-2-7b |
| AI-ModelScope/chinese-llama-2-7b-16k | llama | llama | - | - | hfl/chinese-llama-2-7b-16k |
| AI-ModelScope/chinese-llama-2-7b-64k | llama | llama | - | - | hfl/chinese-llama-2-7b-64k |
| AI-ModelScope/chinese-llama-2-13b | llama | llama | - | - | hfl/chinese-llama-2-13b |
| AI-ModelScope/chinese-llama-2-13b-16k | llama | llama | - | - | hfl/chinese-llama-2-13b-16k |
| AI-ModelScope/chinese-alpaca-2-1.3b | llama | llama | - | - | hfl/chinese-alpaca-2-1.3b |
| AI-ModelScope/chinese-alpaca-2-7b | llama | llama | - | - | hfl/chinese-alpaca-2-7b |
| AI-ModelScope/chinese-alpaca-2-7b-16k | llama | llama | - | - | hfl/chinese-alpaca-2-7b-16k |
| AI-ModelScope/chinese-alpaca-2-7b-64k | llama | llama | - | - | hfl/chinese-alpaca-2-7b-64k |
| AI-ModelScope/chinese-alpaca-2-13b | llama | llama | - | - | hfl/chinese-alpaca-2-13b |
| AI-ModelScope/chinese-alpaca-2-13b-16k | llama | llama | - | - | hfl/chinese-alpaca-2-13b-16k |
| AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf | llama | llama | transformers>=4.38, aqlm, torch>=2.2.0 | - | ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf |
| LLM-Research/Meta-Llama-3-8B-Instruct | llama3 | llama3 | - | - | meta-llama/Meta-Llama-3-8B-Instruct |
| LLM-Research/Meta-Llama-3-70B-Instruct | llama3 | llama3 | - | - | meta-llama/Meta-Llama-3-70B-Instruct |
| LLM-Research/Meta-Llama-3-8B | llama3 | llama3 | - | - | meta-llama/Meta-Llama-3-8B |
| LLM-Research/Meta-Llama-3-70B | llama3 | llama3 | - | - | meta-llama/Meta-Llama-3-70B |
| swift/Meta-Llama-3-8B-Instruct-GPTQ-Int4 | llama3 | llama3 | - | - | study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4 |
| swift/Meta-Llama-3-8B-Instruct-GPTQ-Int8 | llama3 | llama3 | - | - | study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8 |
| swift/Meta-Llama-3-8B-Instruct-AWQ | llama3 | llama3 | - | - | study-hjt/Meta-Llama-3-8B-Instruct-AWQ |
| swift/Meta-Llama-3-70B-Instruct-GPTQ-Int4 | llama3 | llama3 | - | - | study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4 |
| swift/Meta-Llama-3-70B-Instruct-GPTQ-Int8 | llama3 | llama3 | - | - | study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8 |
| swift/Meta-Llama-3-70B-Instruct-AWQ | llama3 | llama3 | - | - | study-hjt/Meta-Llama-3-70B-Instruct-AWQ |
| ChineseAlpacaGroup/llama-3-chinese-8b-instruct | llama3 | llama3 | - | - | hfl/llama-3-chinese-8b-instruct |
| ChineseAlpacaGroup/llama-3-chinese-8b | llama3 | llama3 | - | - | hfl/llama-3-chinese-8b |
| LLM-Research/Meta-Llama-3.1-8B-Instruct | llama3_1 | llama3_2 | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-8B-Instruct |
| LLM-Research/Meta-Llama-3.1-70B-Instruct | llama3_1 | llama3_2 | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-70B-Instruct |
| LLM-Research/Meta-Llama-3.1-405B-Instruct | llama3_1 | llama3_2 | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-405B-Instruct |
| LLM-Research/Meta-Llama-3.1-8B | llama3_1 | llama3_2 | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-8B |
| LLM-Research/Meta-Llama-3.1-70B | llama3_1 | llama3_2 | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-70B |
| LLM-Research/Meta-Llama-3.1-405B | llama3_1 | llama3_2 | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-405B |
| LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8 | llama3_1 | llama3_2 | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-70B-Instruct-FP8 |
| LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8 | llama3_1 | llama3_2 | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 |
| LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4 | llama3_1 | llama3_2 | transformers>=4.43 | - | hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4 |
| LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit | llama3_1 | llama3_2 | transformers>=4.43 | - | unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit |
| LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4 | llama3_1 | llama3_2 | transformers>=4.43 | - | hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4 |
| LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4 | llama3_1 | llama3_2 | transformers>=4.43 | - | hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4 |
| LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 | llama3_1 | llama3_2 | transformers>=4.43 | - | hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 |
| LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4 | llama3_1 | llama3_2 | transformers>=4.43 | - | hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4 |
| LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 | llama3_1 | llama3_2 | transformers>=4.43 | - | hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 |
| LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 | llama3_1 | llama3_2 | transformers>=4.43 | - | hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 |
| LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 | llama3_1 | llama3_2 | transformers>=4.43 | - | hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 |
| AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF | llama3_1 | llama3_2 | transformers>=4.43 | - | nvidia/Llama-3.1-Nemotron-70B-Instruct-HF |
| LLM-Research/Llama-3.2-1B | llama3_2 | llama3_2 | transformers>=4.43 | - | meta-llama/Llama-3.2-1B |
| LLM-Research/Llama-3.2-3B | llama3_2 | llama3_2 | transformers>=4.43 | - | meta-llama/Llama-3.2-3B |
| LLM-Research/Llama-3.2-1B-Instruct | llama3_2 | llama3_2 | transformers>=4.43 | - | meta-llama/Llama-3.2-1B-Instruct |
| LLM-Research/Llama-3.2-3B-Instruct | llama3_2 | llama3_2 | transformers>=4.43 | - | meta-llama/Llama-3.2-3B-Instruct |
| LLM-Research/Llama-3.3-70B-Instruct | llama3_2 | llama3_2 | transformers>=4.43 | - | meta-llama/Llama-3.3-70B-Instruct |
| unsloth/Llama-3.3-70B-Instruct-bnb-4bit | llama3_2 | llama3_2 | transformers>=4.43 | - | unsloth/Llama-3.3-70B-Instruct-bnb-4bit |
| LLM-Research/Reflection-Llama-3.1-70B | reflection | reflection | transformers>=4.43 | - | mattshumer/Reflection-Llama-3.1-70B |
| InfiniAI/Megrez-3b-Instruct | megrez | megrez | - | - | Infinigence/Megrez-3B-Instruct |
| 01ai/Yi-6B | yi | chatml | - | - | 01-ai/Yi-6B |
| 01ai/Yi-6B-200K | yi | chatml | - | - | 01-ai/Yi-6B-200K |
| 01ai/Yi-6B-Chat | yi | chatml | - | - | 01-ai/Yi-6B-Chat |
| 01ai/Yi-6B-Chat-4bits | yi | chatml | - | - | 01-ai/Yi-6B-Chat-4bits |
| 01ai/Yi-6B-Chat-8bits | yi | chatml | - | - | 01-ai/Yi-6B-Chat-8bits |
| 01ai/Yi-9B | yi | chatml | - | - | 01-ai/Yi-9B |
| 01ai/Yi-9B-200K | yi | chatml | - | - | 01-ai/Yi-9B-200K |
| 01ai/Yi-34B | yi | chatml | - | - | 01-ai/Yi-34B |
| 01ai/Yi-34B-200K | yi | chatml | - | - | 01-ai/Yi-34B-200K |
| 01ai/Yi-34B-Chat | yi | chatml | - | - | 01-ai/Yi-34B-Chat |
| 01ai/Yi-34B-Chat-4bits | yi | chatml | - | - | 01-ai/Yi-34B-Chat-4bits |
| 01ai/Yi-34B-Chat-8bits | yi | chatml | - | - | 01-ai/Yi-34B-Chat-8bits |
| 01ai/Yi-1.5-6B | yi | chatml | - | - | 01-ai/Yi-1.5-6B |
| 01ai/Yi-1.5-6B-Chat | yi | chatml | - | - | 01-ai/Yi-1.5-6B-Chat |
| 01ai/Yi-1.5-9B | yi | chatml | - | - | 01-ai/Yi-1.5-9B |
| 01ai/Yi-1.5-9B-Chat | yi | chatml | - | - | 01-ai/Yi-1.5-9B-Chat |
| 01ai/Yi-1.5-9B-Chat-16K | yi | chatml | - | - | 01-ai/Yi-1.5-9B-Chat-16K |
| 01ai/Yi-1.5-34B | yi | chatml | - | - | 01-ai/Yi-1.5-34B |
| 01ai/Yi-1.5-34B-Chat | yi | chatml | - | - | 01-ai/Yi-1.5-34B-Chat |
| 01ai/Yi-1.5-34B-Chat-16K | yi | chatml | - | - | 01-ai/Yi-1.5-34B-Chat-16K |
| AI-ModelScope/Yi-1.5-6B-Chat-GPTQ | yi | chatml | - | - | modelscope/Yi-1.5-6B-Chat-GPTQ |
| AI-ModelScope/Yi-1.5-6B-Chat-AWQ | yi | chatml | - | - | modelscope/Yi-1.5-6B-Chat-AWQ |
| AI-ModelScope/Yi-1.5-9B-Chat-GPTQ | yi | chatml | - | - | modelscope/Yi-1.5-9B-Chat-GPTQ |
| AI-ModelScope/Yi-1.5-9B-Chat-AWQ | yi | chatml | - | - | modelscope/Yi-1.5-9B-Chat-AWQ |
| AI-ModelScope/Yi-1.5-34B-Chat-GPTQ | yi | chatml | - | - | modelscope/Yi-1.5-34B-Chat-GPTQ |
| AI-ModelScope/Yi-1.5-34B-Chat-AWQ | yi | chatml | - | - | modelscope/Yi-1.5-34B-Chat-AWQ |
| 01ai/Yi-Coder-1.5B | yi_coder | yi_coder | - | coding | 01-ai/Yi-Coder-1.5B |
| 01ai/Yi-Coder-9B | yi_coder | yi_coder | - | coding | 01-ai/Yi-Coder-9B |
| 01ai/Yi-Coder-1.5B-Chat | yi_coder | yi_coder | - | coding | 01-ai/Yi-Coder-1.5B-Chat |
| 01ai/Yi-Coder-9B-Chat | yi_coder | yi_coder | - | coding | 01-ai/Yi-Coder-9B-Chat |
| SUSTC/SUS-Chat-34B | sus | sus | - | - | SUSTech/SUS-Chat-34B |
| openai-mirror/gpt-oss-20b | gpt_oss | gpt_oss | transformers>=4.55 | - | openai/gpt-oss-20b |
| openai-mirror/gpt-oss-120b | gpt_oss | gpt_oss | transformers>=4.55 | - | openai/gpt-oss-120b |
| ByteDance-Seed/Seed-OSS-36B-Instruct | seed_oss | seed_oss | transformers>=4.56 | - | ByteDance-Seed/Seed-OSS-36B-Instruct |
| ByteDance-Seed/Seed-OSS-36B-Base | seed_oss | seed_oss | transformers>=4.56 | - | ByteDance-Seed/Seed-OSS-36B-Base |
| ByteDance-Seed/Seed-OSS-36B-Base-woSyn | seed_oss | seed_oss | transformers>=4.56 | - | ByteDance-Seed/Seed-OSS-36B-Base-woSyn |
| codefuse-ai/CodeFuse-CodeLlama-34B | codefuse_codellama | codefuse_codellama | - | coding | codefuse-ai/CodeFuse-CodeLlama-34B |
| langboat/Mengzi3-13B-Base | mengzi3 | mengzi | - | - | Langboat/Mengzi3-13B-Base |
| Fengshenbang/Ziya2-13B-Base | ziya | ziya | - | - | IDEA-CCNL/Ziya2-13B-Base |
| Fengshenbang/Ziya2-13B-Chat | ziya | ziya | - | - | IDEA-CCNL/Ziya2-13B-Chat |
| AI-ModelScope/NuminaMath-7B-TIR | numina | numina | - | math | AI-MO/NuminaMath-7B-TIR |
| FlagAlpha/Atom-7B | atom | atom | - | - | FlagAlpha/Atom-7B |
| FlagAlpha/Atom-7B-Chat | atom | atom | - | - | FlagAlpha/Atom-7B-Chat |
| ZhipuAI/chatglm2-6b | chatglm2 | chatglm2 | transformers<4.42 | - | zai-org/chatglm2-6b |
| ZhipuAI/chatglm2-6b-32k | chatglm2 | chatglm2 | transformers<4.42 | - | zai-org/chatglm2-6b-32k |
| ZhipuAI/codegeex2-6b | chatglm2 | chatglm2 | transformers<4.34 | coding | zai-org/codegeex2-6b |
| ZhipuAI/chatglm3-6b | chatglm3 | glm4 | transformers<4.42 | - | zai-org/chatglm3-6b |
| ZhipuAI/chatglm3-6b-base | chatglm3 | glm4 | transformers<4.42 | - | zai-org/chatglm3-6b-base |
| ZhipuAI/chatglm3-6b-32k | chatglm3 | glm4 | transformers<4.42 | - | zai-org/chatglm3-6b-32k |
| ZhipuAI/chatglm3-6b-128k | chatglm3 | glm4 | transformers<4.42 | - | zai-org/chatglm3-6b-128k |
| ZhipuAI/glm-4-9b-chat | glm4 | glm4 | transformers>=4.42 | - | zai-org/glm-4-9b-chat |
| ZhipuAI/glm-4-9b | glm4 | glm4 | transformers>=4.42 | - | zai-org/glm-4-9b |
| ZhipuAI/glm-4-9b-chat-1m | glm4 | glm4 | transformers>=4.42 | - | zai-org/glm-4-9b-chat-1m |
| ZhipuAI/LongWriter-glm4-9b | glm4 | glm4 | transformers>=4.42 | - | zai-org/LongWriter-glm4-9b |
| ZhipuAI/GLM-4-9B-0414 | glm4_0414 | glm4_0414 | transformers>=4.51 | - | zai-org/GLM-4-9B-0414 |
| ZhipuAI/GLM-4-32B-0414 | glm4_0414 | glm4_0414 | transformers>=4.51 | - | zai-org/GLM-4-32B-0414 |
| ZhipuAI/GLM-4-32B-Base-0414 | glm4_0414 | glm4_0414 | transformers>=4.51 | - | zai-org/GLM-4-32B-Base-0414 |
| ZhipuAI/GLM-Z1-9B-0414 | glm4_0414 | glm4_0414 | transformers>=4.51 | - | zai-org/GLM-Z1-9B-0414 |
| ZhipuAI/GLM-Z1-32B-0414 | glm4_0414 | glm4_0414 | transformers>=4.51 | - | zai-org/GLM-Z1-32B-0414 |
| ZhipuAI/GLM-4.5-Air-Base | glm4_5 | glm4_5 | transformers>=4.54 | - | zai-org/GLM-4.5-Air-Base |
| ZhipuAI/GLM-4.5-Air | glm4_5 | glm4_5 | transformers>=4.54 | - | zai-org/GLM-4.5-Air |
| ZhipuAI/GLM-4.5-Air-FP8 | glm4_5 | glm4_5 | transformers>=4.54 | - | zai-org/GLM-4.5-Air-FP8 |
| ZhipuAI/GLM-4.5-Base | glm4_5 | glm4_5 | transformers>=4.54 | - | zai-org/GLM-4.5-Base |
| ZhipuAI/GLM-4.5 | glm4_5 | glm4_5 | transformers>=4.54 | - | zai-org/GLM-4.5 |
| ZhipuAI/GLM-4.5-FP8 | glm4_5 | glm4_5 | transformers>=4.54 | - | zai-org/GLM-4.5-FP8 |
| ZhipuAI/GLM-4.6 | glm4_5 | glm4_5 | transformers>=4.54 | - | zai-org/GLM-4.6 |
| ZhipuAI/GLM-Z1-Rumination-32B-0414 | glm4_z1_rumination | glm4_z1_rumination | transformers>4.51 | - | zai-org/GLM-Z1-Rumination-32B-0414 |
| ZhipuAI/glm-edge-1.5b-chat | glm_edge | glm4 | transformers>=4.46 | - | zai-org/glm-edge-1.5b-chat |
| ZhipuAI/glm-edge-4b-chat | glm_edge | glm4 | transformers>=4.46 | - | zai-org/glm-edge-4b-chat |
| codefuse-ai/CodeFuse-CodeGeeX2-6B | codefuse_codegeex2 | codefuse | transformers<4.34 | coding | codefuse-ai/CodeFuse-CodeGeeX2-6B |
| ZhipuAI/codegeex4-all-9b | codegeex4 | codegeex4 | transformers<4.42 | coding | zai-org/codegeex4-all-9b |
| ZhipuAI/LongWriter-llama3.1-8b | longwriter_llama3_1 | longwriter_llama | transformers>=4.43 | - | zai-org/LongWriter-llama3.1-8b |
| Shanghai_AI_Laboratory/internlm-chat-7b | internlm | internlm | - | - | internlm/internlm-chat-7b |
| Shanghai_AI_Laboratory/internlm-7b | internlm | internlm | - | - | internlm/internlm-7b |
| Shanghai_AI_Laboratory/internlm-chat-7b-8k | internlm | internlm | - | - | - |
| Shanghai_AI_Laboratory/internlm-20b | internlm | internlm | - | - | internlm/internlm-20b |
| Shanghai_AI_Laboratory/internlm-chat-20b | internlm | internlm | - | - | internlm/internlm-chat-20b |
| Shanghai_AI_Laboratory/internlm2-chat-1_8b | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2-chat-1_8b |
| Shanghai_AI_Laboratory/internlm2-1_8b | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2-1_8b |
| Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2-chat-1_8b-sft |
| Shanghai_AI_Laboratory/internlm2-base-7b | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2-base-7b |
| Shanghai_AI_Laboratory/internlm2-7b | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2-7b |
| Shanghai_AI_Laboratory/internlm2-chat-7b | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2-chat-7b |
| Shanghai_AI_Laboratory/internlm2-chat-7b-sft | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2-chat-7b-sft |
| Shanghai_AI_Laboratory/internlm2-base-20b | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2-base-20b |
| Shanghai_AI_Laboratory/internlm2-20b | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2-20b |
| Shanghai_AI_Laboratory/internlm2-chat-20b | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2-chat-20b |
| Shanghai_AI_Laboratory/internlm2-chat-20b-sft | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2-chat-20b-sft |
| Shanghai_AI_Laboratory/internlm2-math-7b | internlm2 | internlm2 | transformers>=4.38 | math | internlm/internlm2-math-7b |
| Shanghai_AI_Laboratory/internlm2-math-base-7b | internlm2 | internlm2 | transformers>=4.38 | math | internlm/internlm2-math-base-7b |
| Shanghai_AI_Laboratory/internlm2-math-base-20b | internlm2 | internlm2 | transformers>=4.38 | math | internlm/internlm2-math-base-20b |
| Shanghai_AI_Laboratory/internlm2-math-20b | internlm2 | internlm2 | transformers>=4.38 | math | internlm/internlm2-math-20b |
| Shanghai_AI_Laboratory/internlm2_5-1_8b-chat | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2_5-1_8b-chat |
| Shanghai_AI_Laboratory/internlm2_5-1_8b | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2_5-1_8b |
| Shanghai_AI_Laboratory/internlm2_5-7b | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2_5-7b |
| Shanghai_AI_Laboratory/internlm2_5-7b-chat | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2_5-7b-chat |
| Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2_5-7b-chat-1m |
| Shanghai_AI_Laboratory/internlm2_5-20b | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2_5-20b |
| Shanghai_AI_Laboratory/internlm2_5-20b-chat | internlm2 | internlm2 | transformers>=4.38 | - | internlm/internlm2_5-20b-chat |
| Shanghai_AI_Laboratory/internlm3-8b-instruct | internlm3 | internlm2 | transformers>=4.48 | - | internlm/internlm3-8b-instruct |
| deepseek-ai/deepseek-llm-7b-base | deepseek | deepseek | - | - | deepseek-ai/deepseek-llm-7b-base |
| deepseek-ai/deepseek-llm-7b-chat | deepseek | deepseek | - | - | deepseek-ai/deepseek-llm-7b-chat |
| deepseek-ai/deepseek-llm-67b-base | deepseek | deepseek | - | - | deepseek-ai/deepseek-llm-67b-base |
| deepseek-ai/deepseek-llm-67b-chat | deepseek | deepseek | - | - | deepseek-ai/deepseek-llm-67b-chat |
| deepseek-ai/deepseek-math-7b-base | deepseek | deepseek | - | math | deepseek-ai/deepseek-math-7b-base |
| deepseek-ai/deepseek-math-7b-instruct | deepseek | deepseek | - | math | deepseek-ai/deepseek-math-7b-instruct |
| deepseek-ai/deepseek-math-7b-rl | deepseek | deepseek | - | math | deepseek-ai/deepseek-math-7b-rl |
| deepseek-ai/deepseek-coder-1.3b-base | deepseek | deepseek | - | coding | deepseek-ai/deepseek-coder-1.3b-base |
| deepseek-ai/deepseek-coder-1.3b-instruct | deepseek | deepseek | - | coding | deepseek-ai/deepseek-coder-1.3b-instruct |
| deepseek-ai/deepseek-coder-6.7b-base | deepseek | deepseek | - | coding | deepseek-ai/deepseek-coder-6.7b-base |
| deepseek-ai/deepseek-coder-6.7b-instruct | deepseek | deepseek | - | coding | deepseek-ai/deepseek-coder-6.7b-instruct |
| deepseek-ai/deepseek-coder-33b-base | deepseek | deepseek | - | coding | deepseek-ai/deepseek-coder-33b-base |
| deepseek-ai/deepseek-coder-33b-instruct | deepseek | deepseek | - | coding | deepseek-ai/deepseek-coder-33b-instruct |
| deepseek-ai/deepseek-moe-16b-chat | deepseek_moe | deepseek | - | - | deepseek-ai/deepseek-moe-16b-chat |
| deepseek-ai/deepseek-moe-16b-base | deepseek_moe | deepseek | - | - | deepseek-ai/deepseek-moe-16b-base |
| deepseek-ai/DeepSeek-Coder-V2-Instruct | deepseek_v2 | deepseek | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-Coder-V2-Instruct |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | deepseek_v2 | deepseek | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct |
| deepseek-ai/DeepSeek-Coder-V2-Base | deepseek_v2 | deepseek | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-Coder-V2-Base |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Base | deepseek_v2 | deepseek | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-Coder-V2-Lite-Base |
| deepseek-ai/DeepSeek-V2-Lite | deepseek_v2 | deepseek | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-V2-Lite |
| deepseek-ai/DeepSeek-V2-Lite-Chat | deepseek_v2 | deepseek | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-V2-Lite-Chat |
| deepseek-ai/DeepSeek-V2 | deepseek_v2 | deepseek | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-V2 |
| deepseek-ai/DeepSeek-V2-Chat | deepseek_v2 | deepseek | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-V2-Chat |
| deepseek-ai/DeepSeek-V2.5 | deepseek_v2_5 | deepseek_v2_5 | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-V2.5 |
| deepseek-ai/DeepSeek-V2.5-1210 | deepseek_v2_5 | deepseek_v2_5 | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-V2.5-1210 |
| deepseek-ai/DeepSeek-V3-Base | deepseek_v2_5 | deepseek_v2_5 | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-V3-Base |
| deepseek-ai/DeepSeek-V3 | deepseek_v2_5 | deepseek_v2_5 | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-V3 |
| deepseek-ai/DeepSeek-V3-0324 | deepseek_v2_5 | deepseek_v2_5 | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-V3-0324 |
| cognitivecomputations/DeepSeek-V3-awq | deepseek_v2_5 | deepseek_v2_5 | transformers>=4.39.3 | - | cognitivecomputations/DeepSeek-V3-AWQ |
| cognitivecomputations/DeepSeek-V3-0324-AWQ | deepseek_v2_5 | deepseek_v2_5 | transformers>=4.39.3 | - | cognitivecomputations/DeepSeek-V3-0324-AWQ |
| deepseek-ai/DeepSeek-Prover-V2-7B | deepseek_v2_5 | deepseek_v2_5 | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-Prover-V2-7B |
| deepseek-ai/DeepSeek-Prover-V2-671B | deepseek_v2_5 | deepseek_v2_5 | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-Prover-V2-671B |
| unsloth/DeepSeek-V3-bf16 | deepseek_v2_5 | deepseek_v2_5 | transformers>=4.39.3 | - | unsloth/DeepSeek-V3-bf16 |
| unsloth/DeepSeek-V3-0324-BF16 | deepseek_v2_5 | deepseek_v2_5 | transformers>=4.39.3 | - | unsloth/DeepSeek-V3-0324-BF16 |
| unsloth/DeepSeek-Prover-V2-671B-BF16 | deepseek_v2_5 | deepseek_v2_5 | transformers>=4.39.3 | - | unsloth/DeepSeek-Prover-V2-671B-BF16 |
| deepseek-ai/DeepSeek-R1 | deepseek_r1 | deepseek_r1 | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-R1 |
| deepseek-ai/DeepSeek-R1-Zero | deepseek_r1 | deepseek_r1 | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-R1-Zero |
| deepseek-ai/DeepSeek-R1-0528 | deepseek_r1 | deepseek_r1 | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-R1-0528 |
| cognitivecomputations/DeepSeek-R1-awq | deepseek_r1 | deepseek_r1 | transformers>=4.39.3 | - | cognitivecomputations/DeepSeek-R1-AWQ |
| cognitivecomputations/DeepSeek-R1-0528-AWQ | deepseek_r1 | deepseek_r1 | transformers>=4.39.3 | - | cognitivecomputations/DeepSeek-R1-0528-AWQ |
| unsloth/DeepSeek-R1-BF16 | deepseek_r1 | deepseek_r1 | transformers>=4.39.3 | - | unsloth/DeepSeek-R1-BF16 |
| unsloth/DeepSeek-R1-Zero-BF16 | deepseek_r1 | deepseek_r1 | transformers>=4.39.3 | - | unsloth/DeepSeek-R1-Zero-BF16 |
| unsloth/DeepSeek-R1-0528-BF16 | deepseek_r1 | deepseek_r1 | transformers>=4.39.3 | - | unsloth/DeepSeek-R1-0528-BF16 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | deepseek_r1_distill | deepseek_r1 | transformers>=4.37 | - | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | deepseek_r1_distill | deepseek_r1 | transformers>=4.37 | - | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | deepseek_r1_distill | deepseek_r1 | transformers>=4.37 | - | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | deepseek_r1_distill | deepseek_r1 | transformers>=4.37 | - | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B |
| iic/QwenLong-L1-32B | deepseek_r1_distill | deepseek_r1 | transformers>=4.37 | - | Tongyi-Zhiwen/QwenLong-L1-32B |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | deepseek_r1_distill | deepseek_r1 | - | - | deepseek-ai/DeepSeek-R1-Distill-Llama-8B |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B | deepseek_r1_distill | deepseek_r1 | - | - | deepseek-ai/DeepSeek-R1-Distill-Llama-70B |
| deepseek-ai/DeepSeek-R1-0528-Qwen3-8B | deepseek_r1_distill | deepseek_r1 | - | - | deepseek-ai/DeepSeek-R1-0528-Qwen3-8B |
| deepseek-ai/DeepSeek-V3.1-Base | deepseek_v3_1 | deepseek_v3_1 | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-V3.1-Base |
| deepseek-ai/DeepSeek-V3.1 | deepseek_v3_1 | deepseek_v3_1 | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-V3.1 |
| deepseek-ai/DeepSeek-V3.1-Terminus | deepseek_v3_1 | deepseek_v3_1 | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-V3.1-Terminus |
| deepseek-ai/DeepSeek-V3.2 | deepseek_v3_2 | deepseek_v3_1 | - | - | deepseek-ai/DeepSeek-V3.2 |
| deepseek-ai/DeepSeek-V3.2-Speciale | deepseek_v3_2 | deepseek_v3_1 | - | - | deepseek-ai/DeepSeek-V3.2-Speciale |
| deepseek-ai/DeepSeek-V3.2-Exp | deepseek_v3_2 | deepseek_v3_1 | - | - | deepseek-ai/DeepSeek-V3.2-Exp |
| deepseek-ai/DeepSeek-V3.2-Exp-Base | deepseek_v3_2 | deepseek_v3_1 | - | - | deepseek-ai/DeepSeek-V3.2-Exp-Base |
| deepseek-ai/DeepSeek-Math-V2 | deepseek_v3_2 | deepseek_v3_1 | - | - | deepseek-ai/DeepSeek-Math-V2 |
| OpenBuddy/openbuddy-llama-65b-v8-bf16 | openbuddy_llama | openbuddy | - | - | OpenBuddy/openbuddy-llama-65b-v8-bf16 |
| OpenBuddy/openbuddy-llama2-13b-v8.1-fp16 | openbuddy_llama | openbuddy | - | - | OpenBuddy/openbuddy-llama2-13b-v8.1-fp16 |
| OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 | openbuddy_llama | openbuddy | - | - | OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 |
| OpenBuddy/openbuddy-deepseek-67b-v15.2 | openbuddy_llama | openbuddy | - | - | OpenBuddy/openbuddy-deepseek-67b-v15.2 |
| OpenBuddy/openbuddy-llama3-8b-v21.1-8k | openbuddy_llama3 | openbuddy2 | - | - | OpenBuddy/openbuddy-llama3-8b-v21.1-8k |
| OpenBuddy/openbuddy-llama3-70b-v21.1-8k | openbuddy_llama3 | openbuddy2 | - | - | OpenBuddy/openbuddy-llama3-70b-v21.1-8k |
| OpenBuddy/openbuddy-yi1.5-34b-v21.3-32k | openbuddy_llama3 | openbuddy2 | - | - | OpenBuddy/openbuddy-yi1.5-34b-v21.3-32k |
| OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k | openbuddy_llama3 | openbuddy2 | transformers>=4.43 | - | OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k |
| OpenBuddy/openbuddy-nemotron-70b-v23.2-131k | openbuddy_llama3 | openbuddy2 | transformers>=4.43 | - | OpenBuddy/openbuddy-nemotron-70b-v23.2-131k |
| OpenBuddy/openbuddy-llama3.3-70b-v24.3-131k | openbuddy_llama3 | openbuddy2 | transformers>=4.45 | - | OpenBuddy/openbuddy-llama3.3-70b-v24.3-131k |
| OpenBuddy/openbuddy-mistral-7b-v17.1-32k | openbuddy_mistral | openbuddy | transformers>=4.34 | - | OpenBuddy/openbuddy-mistral-7b-v17.1-32k |
| OpenBuddy/openbuddy-zephyr-7b-v14.1 | openbuddy_mistral | openbuddy | transformers>=4.34 | - | OpenBuddy/openbuddy-zephyr-7b-v14.1 |
| OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k | openbuddy_mixtral | openbuddy | transformers>=4.36 | - | OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k |
| baichuan-inc/Baichuan-13B-Chat | baichuan | baichuan | transformers<4.34 | - | baichuan-inc/Baichuan-13B-Chat |
| baichuan-inc/Baichuan-13B-Base | baichuan | baichuan | transformers<4.34 | - | baichuan-inc/Baichuan-13B-Base |
| baichuan-inc/baichuan-7B | baichuan | baichuan | transformers<4.34 | - | baichuan-inc/Baichuan-7B |
| baichuan-inc/Baichuan2-7B-Chat | baichuan2 | baichuan | - | - | baichuan-inc/Baichuan2-7B-Chat |
| baichuan-inc/Baichuan2-7B-Base | baichuan2 | baichuan | - | - | baichuan-inc/Baichuan2-7B-Base |
| baichuan-inc/Baichuan2-13B-Chat | baichuan2 | baichuan | - | - | baichuan-inc/Baichuan2-13B-Chat |
| baichuan-inc/Baichuan2-13B-Base | baichuan2 | baichuan | - | - | baichuan-inc/Baichuan2-13B-Base |
| baichuan-inc/Baichuan2-7B-Chat-4bits | baichuan2 | baichuan | bitsandbytes<0.41.2, accelerate<0.26 | - | baichuan-inc/Baichuan2-7B-Chat-4bits |
| baichuan-inc/Baichuan2-13B-Chat-4bits | baichuan2 | baichuan | bitsandbytes<0.41.2, accelerate<0.26 | - | baichuan-inc/Baichuan2-13B-Chat-4bits |
| baichuan-inc/Baichuan-M1-14B-Instruct | baichuan_m1 | baichuan_m1 | transformers>=4.48 | - | baichuan-inc/Baichuan-M1-14B-Instruct |
| OpenBMB/MiniCPM-2B-sft-fp32 | minicpm | minicpm | transformers>=4.36.0 | - | openbmb/MiniCPM-2B-sft-fp32 |
| OpenBMB/MiniCPM-2B-dpo-fp32 | minicpm | minicpm | transformers>=4.36.0 | - | openbmb/MiniCPM-2B-dpo-fp32 |
| OpenBMB/MiniCPM-1B-sft-bf16 | minicpm | minicpm | transformers>=4.36.0 | - | openbmb/MiniCPM-1B-sft-bf16 |
| OpenBMB/MiniCPM-2B-128k | minicpm_chatml | chatml | transformers>=4.36 | - | openbmb/MiniCPM-2B-128k |
| OpenBMB/MiniCPM4-0.5B | minicpm_chatml | chatml | transformers>=4.36 | - | openbmb/MiniCPM4-0.5B |
| OpenBMB/MiniCPM4-8B | minicpm_chatml | chatml | transformers>=4.36 | - | openbmb/MiniCPM4-8B |
| OpenBMB/MiniCPM3-4B | minicpm3 | chatml | transformers>=4.36 | - | openbmb/MiniCPM3-4B |
| OpenBMB/MiniCPM-MoE-8x2B | minicpm_moe | minicpm | transformers>=4.36 | - | openbmb/MiniCPM-MoE-8x2B |
| TeleAI/TeleChat-7B | telechat | telechat | - | - | Tele-AI/telechat-7B |
| TeleAI/TeleChat-12B | telechat | telechat | - | - | Tele-AI/TeleChat-12B |
| TeleAI/TeleChat-12B-v2 | telechat | telechat | - | - | Tele-AI/TeleChat-12B-v2 |
| TeleAI/TeleChat-52B | telechat | telechat | - | - | TeleAI/TeleChat-52B |
| swift/TeleChat-12B-V2-GPTQ-Int4 | telechat | telechat | - | - | - |
| TeleAI/TeleChat2-35B | telechat | telechat | - | - | Tele-AI/TeleChat2-35B |
| TeleAI/TeleChat2-115B | telechat | telechat | - | - | Tele-AI/TeleChat2-115B |
| TeleAI/TeleChat2-3B | telechat2 | telechat2 | - | - | Tele-AI/TeleChat2-3B |
| TeleAI/TeleChat2-7B-32K | telechat2 | telechat2 | - | - | Tele-AI/TeleChat2-7B-32K |
| TeleAI/TeleChat2-35B-32K | telechat2 | telechat2 | - | - | Tele-AI/TeleChat2-35B-32K |
| TeleAI/TeleChat2-35B-Nov | telechat2 | telechat2 | - | - | Tele-AI/TeleChat2-35B-Nov |
| AI-ModelScope/Mistral-7B-Instruct-v0.1 | mistral | llama | transformers>=4.34 | - | mistralai/Mistral-7B-Instruct-v0.1 |
| AI-ModelScope/Mistral-7B-Instruct-v0.2 | mistral | llama | transformers>=4.34 | - | mistralai/Mistral-7B-Instruct-v0.2 |
| LLM-Research/Mistral-7B-Instruct-v0.3 | mistral | llama | transformers>=4.34 | - | mistralai/Mistral-7B-Instruct-v0.3 |
| AI-ModelScope/Mistral-7B-v0.1 | mistral | llama | transformers>=4.34 | - | mistralai/Mistral-7B-v0.1 |
| AI-ModelScope/Mistral-7B-v0.2-hf | mistral | llama | transformers>=4.34 | - | alpindale/Mistral-7B-v0.2-hf |
| swift/Codestral-22B-v0.1 | mistral | llama | transformers>=4.34 | - | mistralai/Codestral-22B-v0.1 |
| mistralai/Devstral-Small-2505 | devstral | devstral | transformers>=4.43, mistral-common>=1.5.5 | - | mistralai/Devstral-Small-2505 |
| modelscope/zephyr-7b-beta | zephyr | zephyr | transformers>=4.34 | - | HuggingFaceH4/zephyr-7b-beta |
| AI-ModelScope/Mixtral-8x7B-Instruct-v0.1 | mixtral | llama | transformers>=4.36 | - | mistralai/Mixtral-8x7B-Instruct-v0.1 |
| AI-ModelScope/Mixtral-8x7B-v0.1 | mixtral | llama | transformers>=4.36 | - | mistralai/Mixtral-8x7B-v0.1 |
| AI-ModelScope/Mixtral-8x22B-v0.1 | mixtral | llama | transformers>=4.36 | - | mistral-community/Mixtral-8x22B-v0.1 |
| AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf | mixtral | llama | transformers>=4.38, aqlm, torch>=2.2.0 | - | ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf |
| AI-ModelScope/Mistral-Small-Instruct-2409 | mistral_nemo | mistral_nemo | transformers>=4.43 | - | mistralai/Mistral-Small-Instruct-2409 |
| LLM-Research/Mistral-Large-Instruct-2407 | mistral_nemo | mistral_nemo | transformers>=4.43 | - | mistralai/Mistral-Large-Instruct-2407 |
| AI-ModelScope/Mistral-Nemo-Base-2407 | mistral_nemo | mistral_nemo | transformers>=4.43 | - | mistralai/Mistral-Nemo-Base-2407 |
| AI-ModelScope/Mistral-Nemo-Instruct-2407 | mistral_nemo | mistral_nemo | transformers>=4.43 | - | mistralai/Mistral-Nemo-Instruct-2407 |
| AI-ModelScope/Ministral-8B-Instruct-2410 | mistral_nemo | mistral_nemo | transformers>=4.46 | - | mistralai/Ministral-8B-Instruct-2410 |
| mistralai/Mistral-Small-24B-Base-2501 | mistral_2501 | mistral_2501 | - | - | mistralai/Mistral-Small-24B-Base-2501 |
| mistralai/Mistral-Small-24B-Instruct-2501 | mistral_2501 | mistral_2501 | - | - | mistralai/Mistral-Small-24B-Instruct-2501 |
| AI-ModelScope/WizardLM-2-7B-AWQ | wizardlm2 | wizardlm2 | transformers>=4.34 | - | MaziyarPanahi/WizardLM-2-7B-AWQ |
| AI-ModelScope/WizardLM-2-8x22B | wizardlm2_moe | wizardlm2_moe | transformers>=4.36 | - | alpindale/WizardLM-2-8x22B |
| AI-ModelScope/phi-2 | phi2 | default | - | - | microsoft/phi-2 |
| LLM-Research/Phi-3-small-8k-instruct | phi3_small | phi3 | transformers>=4.36 | - | microsoft/Phi-3-small-8k-instruct |
| LLM-Research/Phi-3-small-128k-instruct | phi3_small | phi3 | transformers>=4.36 | - | microsoft/Phi-3-small-128k-instruct |
| LLM-Research/Phi-3-mini-4k-instruct | phi3 | phi3 | transformers>=4.36 | - | microsoft/Phi-3-mini-4k-instruct |
| LLM-Research/Phi-3-mini-128k-instruct | phi3 | phi3 | transformers>=4.36 | - | microsoft/Phi-3-mini-128k-instruct |
| LLM-Research/Phi-3-medium-4k-instruct | phi3 | phi3 | transformers>=4.36 | - | microsoft/Phi-3-medium-4k-instruct |
| LLM-Research/Phi-3-medium-128k-instruct | phi3 | phi3 | transformers>=4.36 | - | microsoft/Phi-3-medium-128k-instruct |
| LLM-Research/Phi-3.5-mini-instruct | phi3 | phi3 | transformers>=4.36 | - | microsoft/Phi-3.5-mini-instruct |
| LLM-Research/Phi-4-mini-instruct | phi3 | phi3 | transformers>=4.36 | - | microsoft/Phi-4-mini-instruct |
| LLM-Research/Phi-3.5-MoE-instruct | phi3_moe | phi3 | transformers>=4.36 | - | microsoft/Phi-3.5-MoE-instruct |
| LLM-Research/phi-4 | phi4 | phi4 | transformers>=4.36 | - | microsoft/phi-4 |
| MiniMax/MiniMax-Text-01 | minimax | minimax | - | - | MiniMaxAI/MiniMax-Text-01 |
| MiniMax/MiniMax-M1-40k | minimax_m1 | minimax_m1 | - | - | MiniMaxAI/MiniMax-M1-40k |
| MiniMax/MiniMax-M1-80k | minimax_m1 | minimax_m1 | - | - | MiniMaxAI/MiniMax-M1-80k |
| MiniMax/MiniMax-M2 | minimax_m2 | minimax_m2 | - | - | MiniMaxAI/MiniMax-M2 |
| AI-ModelScope/gemma-2b-it | gemma | gemma | transformers>=4.38 | - | google/gemma-2b-it |
| AI-ModelScope/gemma-2b | gemma | gemma | transformers>=4.38 | - | google/gemma-2b |
| AI-ModelScope/gemma-7b | gemma | gemma | transformers>=4.38 | - | google/gemma-7b |
| AI-ModelScope/gemma-7b-it | gemma | gemma | transformers>=4.38 | - | google/gemma-7b-it |
| LLM-Research/gemma-2-2b-it | gemma2 | gemma | transformers>=4.42 | - | google/gemma-2-2b-it |
| LLM-Research/gemma-2-2b | gemma2 | gemma | transformers>=4.42 | - | google/gemma-2-2b |
| LLM-Research/gemma-2-9b | gemma2 | gemma | transformers>=4.42 | - | google/gemma-2-9b |
| LLM-Research/gemma-2-9b-it | gemma2 | gemma | transformers>=4.42 | - | google/gemma-2-9b-it |
| LLM-Research/gemma-2-27b | gemma2 | gemma | transformers>=4.42 | - | google/gemma-2-27b |
| LLM-Research/gemma-2-27b-it | gemma2 | gemma | transformers>=4.42 | - | google/gemma-2-27b-it |
| LLM-Research/gemma-3-1b-pt | gemma3_text | gemma3_text | transformers>=4.49 | - | google/gemma-3-1b-pt |
| LLM-Research/gemma-3-1b-it | gemma3_text | gemma3_text | transformers>=4.49 | - | google/gemma-3-1b-it |
| google/gemma-3-270m | gemma3_text | gemma3_text | transformers>=4.49 | - | google/gemma-3-270m |
| google/gemma-3-270m-it | gemma3_text | gemma3_text | transformers>=4.49 | - | google/gemma-3-270m-it |
| skywork/Skywork-13B-base | skywork | skywork | - | - | skywork/Skywork-13B-base |
| skywork/Skywork-13B-chat | skywork | skywork | - | - | - |
| AI-ModelScope/Skywork-o1-Open-Llama-3.1-8B | skywork_o1 | skywork_o1 | transformers>=4.43 | - | Skywork/Skywork-o1-Open-Llama-3.1-8B |
| inclusionAI/Ling-lite | ling | ling | - | - | inclusionAI/Ling-lite |
| inclusionAI/Ling-plus | ling | ling | - | - | inclusionAI/Ling-plus |
| inclusionAI/Ling-lite-base | ling | ling | - | - | inclusionAI/Ling-lite-base |
| inclusionAI/Ling-plus-base | ling | ling | - | - | inclusionAI/Ling-plus-base |
| inclusionAI/Ling-mini-2.0 | ling2 | ling2 | - | - | inclusionAI/Ling-mini-2.0 |
| inclusionAI/Ling-mini-base-2.0 | ling2 | ling2 | - | - | inclusionAI/Ling-mini-base-2.0 |
| inclusionAI/Ring-mini-2.0 | ring2 | ring2 | - | - | inclusionAI/Ring-mini-2.0 |
| IEITYuan/Yuan2.0-2B-hf | yuan2 | yuan | - | - | IEITYuan/Yuan2-2B-hf |
| IEITYuan/Yuan2.0-51B-hf | yuan2 | yuan | - | - | IEITYuan/Yuan2-51B-hf |
| IEITYuan/Yuan2.0-102B-hf | yuan2 | yuan | - | - | IEITYuan/Yuan2-102B-hf |
| IEITYuan/Yuan2-2B-Janus-hf | yuan2 | yuan | - | - | IEITYuan/Yuan2-2B-Janus-hf |
| IEITYuan/Yuan2-M32-hf | yuan2 | yuan | - | - | IEITYuan/Yuan2-M32-hf |
| OrionStarAI/Orion-14B-Chat | orion | orion | - | - | OrionStarAI/Orion-14B-Chat |
| OrionStarAI/Orion-14B-Base | orion | orion | - | - | OrionStarAI/Orion-14B-Base |
| xverse/XVERSE-7B-Chat | xverse | xverse | - | - | xverse/XVERSE-7B-Chat |
| xverse/XVERSE-7B | xverse | xverse | - | - | xverse/XVERSE-7B |
| xverse/XVERSE-13B | xverse | xverse | - | - | xverse/XVERSE-13B |
| xverse/XVERSE-13B-Chat | xverse | xverse | - | - | xverse/XVERSE-13B-Chat |
| xverse/XVERSE-65B | xverse | xverse | - | - | xverse/XVERSE-65B |
| xverse/XVERSE-65B-2 | xverse | xverse | - | - | xverse/XVERSE-65B-2 |
| xverse/XVERSE-65B-Chat | xverse | xverse | - | - | xverse/XVERSE-65B-Chat |
| xverse/XVERSE-13B-256K | xverse | xverse | - | - | xverse/XVERSE-13B-256K |
| xverse/XVERSE-MoE-A4.2B | xverse_moe | xverse | - | - | xverse/XVERSE-MoE-A4.2B |
| damo/nlp_seqgpt-560m | seggpt | default | - | - | DAMO-NLP/SeqGPT-560M |
| vivo-ai/BlueLM-7B-Chat-32K | bluelm | bluelm | - | - | vivo-ai/BlueLM-7B-Chat-32K |
| vivo-ai/BlueLM-7B-Chat | bluelm | bluelm | - | - | vivo-ai/BlueLM-7B-Chat |
| vivo-ai/BlueLM-7B-Base-32K | bluelm | bluelm | - | - | vivo-ai/BlueLM-7B-Base-32K |
| vivo-ai/BlueLM-7B-Base | bluelm | bluelm | - | - | vivo-ai/BlueLM-7B-Base |
| AI-ModelScope/c4ai-command-r-v01 | c4ai | c4ai | transformers>=4.39 | - | CohereForAI/c4ai-command-r-v01 |
| AI-ModelScope/c4ai-command-r-plus | c4ai | c4ai | transformers>=4.39 | - | CohereForAI/c4ai-command-r-plus |
| AI-ModelScope/dbrx-base | dbrx | dbrx | transformers>=4.36 | - | databricks/dbrx-base |
| AI-ModelScope/dbrx-instruct | dbrx | dbrx | transformers>=4.36 | - | databricks/dbrx-instruct |
| colossalai/grok-1-pytorch | grok | default | - | - | hpcai-tech/grok-1 |
| AI-ModelScope/mamba-130m-hf | mamba | default | transformers>=4.39.0 | - | state-spaces/mamba-130m-hf |
| AI-ModelScope/mamba-370m-hf | mamba | default | transformers>=4.39.0 | - | state-spaces/mamba-370m-hf |
| AI-ModelScope/mamba-390m-hf | mamba | default | transformers>=4.39.0 | - | state-spaces/mamba-390m-hf |
| AI-ModelScope/mamba-790m-hf | mamba | default | transformers>=4.39.0 | - | state-spaces/mamba-790m-hf |
| AI-ModelScope/mamba-1.4b-hf | mamba | default | transformers>=4.39.0 | - | state-spaces/mamba-1.4b-hf |
| AI-ModelScope/mamba-2.8b-hf | mamba | default | transformers>=4.39.0 | - | state-spaces/mamba-2.8b-hf |
| damo/nlp_polylm_13b_text_generation | polylm | default | - | - | DAMO-NLP-MT/polylm-13b |
| AI-ModelScope/aya-expanse-8b | aya | aya | transformers>=4.44.0 | - | CohereForAI/aya-expanse-8b |
| AI-ModelScope/aya-expanse-32b | aya | aya | transformers>=4.44.0 | - | CohereForAI/aya-expanse-32b |
| moonshotai/Moonlight-16B-A3B | moonlight | moonlight | transformers<4.49 | - | moonshotai/Moonlight-16B-A3B |
| moonshotai/Moonlight-16B-A3B-Instruct | moonlight | moonlight | transformers<4.49 | - | moonshotai/Moonlight-16B-A3B-Instruct |
| moonshotai/Kimi-K2-Base | kimi_k2 | kimi_k2 | - | - | moonshotai/Kimi-K2-Base |
| moonshotai/Kimi-K2-Instruct | kimi_k2 | kimi_k2 | - | - | moonshotai/Kimi-K2-Instruct |
| moonshotai/Kimi-K2-Instruct-0905 | kimi_k2 | kimi_k2 | - | - | moonshotai/Kimi-K2-Instruct-0905 |
| moonshotai/Kimi-K2-Thinking | kimi_k2 | kimi_k2 | - | - | moonshotai/Kimi-K2-Thinking |
XiaomiMiMo/MiMo-7B-Base

mimo

qwen

transformers>=4.37

-

XiaomiMiMo/MiMo-7B-Base

XiaomiMiMo/MiMo-7B-SFT

mimo

qwen

transformers>=4.37

-

XiaomiMiMo/MiMo-7B-SFT

XiaomiMiMo/MiMo-7B-RL-Zero

mimo

qwen

transformers>=4.37

-

XiaomiMiMo/MiMo-7B-RL-Zero

XiaomiMiMo/MiMo-7B-RL

mimo

qwen

transformers>=4.37

-

XiaomiMiMo/MiMo-7B-RL

XiaomiMiMo/MiMo-7B-RL-0530

mimo_rl

mimo_rl

transformers>=4.37

-

XiaomiMiMo/MiMo-7B-RL-0530

rednote-hilab/dots.llm1.base

dots1

dots1

transformers>=4.53

-

rednote-hilab/dots.llm1.base

rednote-hilab/dots.llm1.inst

dots1

dots1

transformers>=4.53

-

rednote-hilab/dots.llm1.inst

Tencent-Hunyuan/Hunyuan-A13B-Instruct

hunyuan_moe

hunyuan_moe

-

-

tencent/Hunyuan-A13B-Instruct

Tencent-Hunyuan/Hunyuan-0.5B-Instruct

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-0.5B-Instruct

Tencent-Hunyuan/Hunyuan-1.8B-Instruct

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-1.8B-Instruct

Tencent-Hunyuan/Hunyuan-4B-Instruct

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-4B-Instruct

Tencent-Hunyuan/Hunyuan-7B-Instruct

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-7B-Instruct

Tencent-Hunyuan/Hunyuan-0.5B-Pretrain

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-0.5B-Pretrain

Tencent-Hunyuan/Hunyuan-1.8B-Pretrain

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-1.8B-Pretrain

Tencent-Hunyuan/Hunyuan-4B-Pretrain

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-4B-Pretrain

Tencent-Hunyuan/Hunyuan-7B-Pretrain

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-7B-Pretrain

Tencent-Hunyuan/Hunyuan-0.5B-Instruct-FP8

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-0.5B-Instruct-FP8

Tencent-Hunyuan/Hunyuan-1.8B-Instruct-FP8

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-1.8B-Instruct-FP8

Tencent-Hunyuan/Hunyuan-4B-Instruct-FP8

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-4B-Instruct-FP8

Tencent-Hunyuan/Hunyuan-7B-Instruct-FP8

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-7B-Instruct-FP8

Tencent-Hunyuan/Hunyuan-0.5B-Instruct-AWQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-0.5B-Instruct-AWQ-Int4

Tencent-Hunyuan/Hunyuan-1.8B-Instruct-AWQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-1.8B-Instruct-AWQ-Int4

Tencent-Hunyuan/Hunyuan-4B-Instruct-AWQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-4B-Instruct-AWQ-Int4

Tencent-Hunyuan/Hunyuan-7B-Instruct-AWQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-7B-Instruct-AWQ-Int4

Tencent-Hunyuan/Hunyuan-0.5B-Instruct-GPTQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-0.5B-Instruct-GPTQ-Int4

Tencent-Hunyuan/Hunyuan-1.8B-Instruct-GPTQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-1.8B-Instruct-GPTQ-Int4

Tencent-Hunyuan/Hunyuan-4B-Instruct-GPTQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-4B-Instruct-GPTQ-Int4

Tencent-Hunyuan/Hunyuan-7B-Instruct-GPTQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-7B-Instruct-GPTQ-Int4

PaddlePaddle/ERNIE-4.5-0.3B-Base-PT

ernie

ernie

-

-

baidu/ERNIE-4.5-0.3B-Base-PT

PaddlePaddle/ERNIE-4.5-0.3B-PT

ernie

ernie

-

-

baidu/ERNIE-4.5-0.3B-PT

PaddlePaddle/ERNIE-4.5-21B-A3B-Base-PT

ernie

ernie

-

-

baidu/ERNIE-4.5-21B-A3B-Base-PT

PaddlePaddle/ERNIE-4.5-21B-A3B-PT

ernie

ernie

-

-

baidu/ERNIE-4.5-21B-A3B-PT

PaddlePaddle/ERNIE-4.5-300B-A47B-Base-PT

ernie

ernie

-

-

baidu/ERNIE-4.5-300B-A47B-Base-PT

PaddlePaddle/ERNIE-4.5-300B-A47B-PT

ernie

ernie

-

-

baidu/ERNIE-4.5-300B-A47B-PT

google/embeddinggemma-300m

gemma_emb

dummy

-

-

google/embeddinggemma-300m

PaddlePaddle/ERNIE-4.5-21B-A3B-Thinking

ernie_thinking

ernie_thinking

-

-

baidu/ERNIE-4.5-21B-A3B-Thinking

meituan-longcat/LongCat-Flash-Chat

longchat

longchat

transformers>=4.54,<4.56

-

meituan-longcat/LongCat-Flash-Chat

meituan-longcat/LongCat-Flash-Chat-FP8

longchat

longchat

transformers>=4.54,<4.56

-

meituan-longcat/LongCat-Flash-Chat-FP8

answerdotai/ModernBERT-base

modern_bert

dummy

transformers>=4.48

bert

answerdotai/ModernBERT-base

answerdotai/ModernBERT-large

modern_bert

dummy

transformers>=4.48

bert

answerdotai/ModernBERT-large

iic/gte-modernbert-base

modern_bert_gte

dummy

transformers>=4.48

bert, embedding

Alibaba-NLP/gte-modernbert-base

iic/nlp_structbert_backbone_base_std

bert

dummy

-

bert

-

Shanghai_AI_Laboratory/internlm2-1_8b-reward

internlm2_reward

internlm2_reward

transformers>=4.38

-

internlm/internlm2-1_8b-reward

Shanghai_AI_Laboratory/internlm2-7b-reward

internlm2_reward

internlm2_reward

transformers>=4.38

-

internlm/internlm2-7b-reward

Shanghai_AI_Laboratory/internlm2-20b-reward

internlm2_reward

internlm2_reward

transformers>=4.38

-

internlm/internlm2-20b-reward

Qwen/Qwen2-Math-RM-72B

qwen2_reward

qwen

transformers>=4.37

-

Qwen/Qwen2-Math-RM-72B

Qwen/Qwen2.5-Math-PRM-7B

qwen2_5_prm

qwen2_5_math_prm

transformers>=4.37

-

Qwen/Qwen2.5-Math-PRM-7B

Qwen/Qwen2.5-Math-7B-PRM800K

qwen2_5_prm

qwen2_5_math_prm

transformers>=4.37

-

Qwen/Qwen2.5-Math-7B-PRM800K

Qwen/Qwen2.5-Math-PRM-72B

qwen2_5_prm

qwen2_5_math_prm

transformers>=4.37

-

Qwen/Qwen2.5-Math-PRM-72B

Qwen/Qwen2.5-Math-RM-72B

qwen2_5_math_reward

qwen2_5_math

transformers>=4.37

-

Qwen/Qwen2.5-Math-RM-72B

AI-ModelScope/Skywork-Reward-Llama-3.1-8B

llama3_2_reward

llama3_2

transformers>=4.43

-

Skywork/Skywork-Reward-Llama-3.1-8B

AI-ModelScope/Skywork-Reward-Llama-3.1-8B-v0.2

llama3_2_reward

llama3_2

transformers>=4.43

-

Skywork/Skywork-Reward-Llama-3.1-8B-v0.2

AI-ModelScope/GRM_Llama3.1_8B_rewardmodel-ft

llama3_2_reward

llama3_2

transformers>=4.43

-

Ray2333/GRM_Llama3.1_8B_rewardmodel-ft

AI-ModelScope/GRM-llama3.2-3B-rewardmodel-ft

llama3_2_reward

llama3_2

transformers>=4.43

-

Ray2333/GRM-llama3.2-3B-rewardmodel-ft

AI-ModelScope/Skywork-Reward-Gemma-2-27B

gemma_reward

gemma

transformers>=4.42

-

Skywork/Skywork-Reward-Gemma-2-27B

AI-ModelScope/Skywork-Reward-Gemma-2-27B-v0.2

gemma_reward

gemma

transformers>=4.42

-

Skywork/Skywork-Reward-Gemma-2-27B-v0.2

BAAI/bge-reranker-base

bge_reranker

bge_reranker

-

-

BAAI/bge-reranker-base

BAAI/bge-reranker-v2-m3

bge_reranker

bge_reranker

-

-

BAAI/bge-reranker-v2-m3

BAAI/bge-reranker-large

bge_reranker

bge_reranker

-

-

BAAI/bge-reranker-large

iic/gte-reranker-modernbert-base

modern_bert_gte_reranker

bert

transformers>=4.48

bert, reranker

Alibaba-NLP/gte-reranker-modernbert-base

Qwen/Qwen3-Reranker-0.6B

qwen3_reranker

qwen3_reranker

-

-

Qwen/Qwen3-Reranker-0.6B

Qwen/Qwen3-Reranker-4B

qwen3_reranker

qwen3_reranker

-

-

Qwen/Qwen3-Reranker-4B

Qwen/Qwen3-Reranker-8B

qwen3_reranker

qwen3_reranker

-

-

Qwen/Qwen3-Reranker-8B

Multimodal Large Models

Model ID

Model Type

Default Template

Requires

Support Megatron

Tags

HF Model ID

Qwen/Qwen-VL-Chat

qwen_vl

qwen_vl

-

vision

Qwen/Qwen-VL-Chat

Qwen/Qwen-VL

qwen_vl

qwen_vl

-

vision

Qwen/Qwen-VL

Qwen/Qwen-VL-Chat-Int4

qwen_vl

qwen_vl

-

vision

Qwen/Qwen-VL-Chat-Int4

Qwen/Qwen-Audio-Chat

qwen_audio

qwen_audio

-

audio

Qwen/Qwen-Audio-Chat

Qwen/Qwen-Audio

qwen_audio

qwen_audio

-

audio

Qwen/Qwen-Audio

Qwen/Qwen2-VL-2B-Instruct

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-2B-Instruct

Qwen/Qwen2-VL-7B-Instruct

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-7B-Instruct

Qwen/Qwen2-VL-72B-Instruct

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-72B-Instruct

Qwen/Qwen2-VL-2B

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-2B

Qwen/Qwen2-VL-7B

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-7B

Qwen/Qwen2-VL-72B

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-72B

Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4

Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4

Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4

Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8

Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8

Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8

Qwen/Qwen2-VL-2B-Instruct-AWQ

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-2B-Instruct-AWQ

Qwen/Qwen2-VL-7B-Instruct-AWQ

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-7B-Instruct-AWQ

Qwen/Qwen2-VL-72B-Instruct-AWQ

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-72B-Instruct-AWQ

bytedance-research/UI-TARS-2B-SFT

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

bytedance-research/UI-TARS-2B-SFT

bytedance-research/UI-TARS-7B-SFT

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

bytedance-research/UI-TARS-7B-SFT

bytedance-research/UI-TARS-7B-DPO

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

bytedance-research/UI-TARS-7B-DPO

bytedance-research/UI-TARS-72B-SFT

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

bytedance-research/UI-TARS-72B-SFT

bytedance-research/UI-TARS-72B-DPO

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

bytedance-research/UI-TARS-72B-DPO

allenai/olmOCR-7B-0225-preview

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

allenai/olmOCR-7B-0225-preview

Qwen/Qwen2.5-VL-3B-Instruct

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-3B-Instruct

Qwen/Qwen2.5-VL-7B-Instruct

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-7B-Instruct

Qwen/Qwen2.5-VL-32B-Instruct

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-32B-Instruct

Qwen/Qwen2.5-VL-72B-Instruct

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-72B-Instruct

Qwen/Qwen2.5-VL-3B-Instruct-AWQ

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-3B-Instruct-AWQ

Qwen/Qwen2.5-VL-7B-Instruct-AWQ

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-7B-Instruct-AWQ

Qwen/Qwen2.5-VL-32B-Instruct-AWQ

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-32B-Instruct-AWQ

Qwen/Qwen2.5-VL-72B-Instruct-AWQ

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-72B-Instruct-AWQ

Qwen/Qwen2.5-Omni-3B

qwen2_5_omni

qwen2_5_omni

transformers>=4.50, soundfile, qwen_omni_utils, decord

vision, video, audio

Qwen/Qwen2.5-Omni-3B

Qwen/Qwen2.5-Omni-7B

qwen2_5_omni

qwen2_5_omni

transformers>=4.50, soundfile, qwen_omni_utils, decord

vision, video, audio

Qwen/Qwen2.5-Omni-7B

Qwen/Qwen3-Omni-30B-A3B-Instruct

qwen3_omni

qwen3_omni

transformers>=4.57.dev0, soundfile, decord, qwen_omni_utils

vision, video, audio

Qwen/Qwen3-Omni-30B-A3B-Instruct

Qwen/Qwen3-Omni-30B-A3B-Thinking

qwen3_omni

qwen3_omni

transformers>=4.57.dev0, soundfile, decord, qwen_omni_utils

vision, video, audio

Qwen/Qwen3-Omni-30B-A3B-Thinking

Qwen/Qwen3-Omni-30B-A3B-Captioner

qwen3_omni

qwen3_omni

transformers>=4.57.dev0, soundfile, decord, qwen_omni_utils

vision, video, audio

Qwen/Qwen3-Omni-30B-A3B-Captioner

Qwen/Qwen2-Audio-7B-Instruct

qwen2_audio

qwen2_audio

transformers>=4.45,<4.49, librosa

audio

Qwen/Qwen2-Audio-7B-Instruct

Qwen/Qwen2-Audio-7B

qwen2_audio

qwen2_audio

transformers>=4.45,<4.49, librosa

audio

Qwen/Qwen2-Audio-7B

Qwen/Qwen3-VL-2B-Instruct

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-2B-Instruct

Qwen/Qwen3-VL-2B-Thinking

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-2B-Thinking

Qwen/Qwen3-VL-2B-Instruct-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-2B-Instruct-FP8

Qwen/Qwen3-VL-2B-Thinking-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-2B-Thinking-FP8

Qwen/Qwen3-VL-4B-Instruct

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-4B-Instruct

Qwen/Qwen3-VL-4B-Thinking

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-4B-Thinking

Qwen/Qwen3-VL-4B-Instruct-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-4B-Instruct-FP8

Qwen/Qwen3-VL-4B-Thinking-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-4B-Thinking-FP8

Qwen/Qwen3-VL-8B-Instruct

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-8B-Instruct

Qwen/Qwen3-VL-8B-Thinking

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-8B-Thinking

Qwen/Qwen3-VL-8B-Instruct-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-8B-Instruct-FP8

Qwen/Qwen3-VL-8B-Thinking-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-8B-Thinking-FP8

Qwen/Qwen3-VL-32B-Instruct

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-32B-Instruct

Qwen/Qwen3-VL-32B-Thinking

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-32B-Thinking

Qwen/Qwen3-VL-32B-Instruct-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-32B-Instruct-FP8

Qwen/Qwen3-VL-32B-Thinking-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-32B-Thinking-FP8

Qwen/Qwen3-VL-30B-A3B-Instruct

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-30B-A3B-Instruct

Qwen/Qwen3-VL-30B-A3B-Thinking

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-30B-A3B-Thinking

Qwen/Qwen3-VL-30B-A3B-Instruct-FP8

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-30B-A3B-Instruct-FP8

Qwen/Qwen3-VL-30B-A3B-Thinking-FP8

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-30B-A3B-Thinking-FP8

Qwen/Qwen3-VL-235B-A22B-Instruct

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-235B-A22B-Instruct

Qwen/Qwen3-VL-235B-A22B-Thinking

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-235B-A22B-Thinking

Qwen/Qwen3-VL-235B-A22B-Instruct-FP8

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-235B-A22B-Instruct-FP8

Qwen/Qwen3-VL-235B-A22B-Thinking-FP8

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-235B-A22B-Thinking-FP8

Qwen/QVQ-72B-Preview

qvq

qvq

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/QVQ-72B-Preview

iic/gme-Qwen2-VL-2B-Instruct

qwen2_gme

qwen2_gme

-

vision

Alibaba-NLP/gme-Qwen2-VL-2B-Instruct

iic/gme-Qwen2-VL-7B-Instruct

qwen2_gme

qwen2_gme

-

vision

Alibaba-NLP/gme-Qwen2-VL-7B-Instruct

AIDC-AI/Ovis1.6-Gemma2-9B

ovis1_6

ovis1_6

transformers>=4.42

vision

AIDC-AI/Ovis1.6-Gemma2-9B

AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4

ovis1_6

ovis1_6

transformers>=4.42

vision

AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4

AIDC-AI/Ovis1.6-Gemma2-27B

ovis1_6

ovis1_6

transformers>=4.42

vision

AIDC-AI/Ovis1.6-Gemma2-27B

AIDC-AI/Ovis1.6-Llama3.2-3B

ovis1_6_llama3

ovis1_6_llama3

-

vision

AIDC-AI/Ovis1.6-Llama3.2-3B

AIDC-AI/Ovis2-1B

ovis2

ovis2

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2-1B

AIDC-AI/Ovis2-2B

ovis2

ovis2

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2-2B

AIDC-AI/Ovis2-4B

ovis2

ovis2

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2-4B

AIDC-AI/Ovis2-8B

ovis2

ovis2

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2-8B

AIDC-AI/Ovis2-16B

ovis2

ovis2

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2-16B

AIDC-AI/Ovis2-34B

ovis2

ovis2

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2-34B

AIDC-AI/Ovis2.5-2B

ovis2_5

ovis2_5

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2.5-2B

AIDC-AI/Ovis2.5-9B

ovis2_5

ovis2_5

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2.5-9B

XiaomiMiMo/MiMo-VL-7B-SFT

mimo_vl

mimo_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

XiaomiMiMo/MiMo-VL-7B-SFT

XiaomiMiMo/MiMo-VL-7B-RL

mimo_vl

mimo_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

XiaomiMiMo/MiMo-VL-7B-RL

mispeech/midashenglm-7b

midashenglm

midashenglm

transformers>=4.52, soundfile

audio

mispeech/midashenglm-7b

ZhipuAI/glm-4v-9b

glm4v

glm4v

transformers>=4.42,<4.45

-

zai-org/glm-4v-9b

ZhipuAI/cogagent-9b-20241220

glm4v

glm4v

transformers>=4.42

-

zai-org/cogagent-9b-20241220

ZhipuAI/GLM-4.1V-9B-Base

glm4_1v

glm4_1v

transformers>=4.53

-

zai-org/GLM-4.1V-9B-Base

ZhipuAI/GLM-4.1V-9B-Thinking

glm4_1v

glm4_1v

transformers>=4.53

-

zai-org/GLM-4.1V-9B-Thinking

ZhipuAI/Glyph

glm4_1v

glm4_1v

transformers>=4.57

-

zai-org/Glyph

ZhipuAI/GLM-4.5V

glm4_5v

glm4_5v

transformers>=4.56

-

zai-org/GLM-4.5V

ZhipuAI/GLM-4.5V-FP8

glm4_5v

glm4_5v

transformers>=4.56

-

zai-org/GLM-4.5V-FP8

ZhipuAI/glm-edge-v-2b

glm_edge_v

glm_edge_v

transformers>=4.46

vision

zai-org/glm-edge-v-2b

ZhipuAI/glm-edge-4b-chat

glm_edge_v

glm_edge_v

transformers>=4.46

vision

zai-org/glm-edge-4b-chat

ZhipuAI/cogvlm-chat

cogvlm

cogvlm

transformers<4.42

-

zai-org/cogvlm-chat-hf

ZhipuAI/cogagent-vqa

cogagent_vqa

cogagent_vqa

transformers<4.42

-

zai-org/cogagent-vqa-hf

ZhipuAI/cogagent-chat

cogagent_chat

cogagent_chat

transformers<4.42, timm

-

zai-org/cogagent-chat-hf

ZhipuAI/cogvlm2-llama3-chat-19B

cogvlm2

cogvlm2

transformers<4.42

-

zai-org/cogvlm2-llama3-chat-19B

ZhipuAI/cogvlm2-llama3-chinese-chat-19B

cogvlm2

cogvlm2

transformers<4.42

-

zai-org/cogvlm2-llama3-chinese-chat-19B

ZhipuAI/cogvlm2-video-llama3-chat

cogvlm2_video

cogvlm2_video

decord, pytorchvideo, transformers>=4.42

video

zai-org/cogvlm2-video-llama3-chat

OpenGVLab/Mini-InternVL-Chat-2B-V1-5

internvl

internvl

transformers>=4.35, timm

vision

OpenGVLab/Mini-InternVL-Chat-2B-V1-5

AI-ModelScope/InternVL-Chat-V1-5

internvl

internvl

transformers>=4.35, timm

vision

OpenGVLab/InternVL-Chat-V1-5

AI-ModelScope/InternVL-Chat-V1-5-int8

internvl

internvl

transformers>=4.35, timm

vision

OpenGVLab/InternVL-Chat-V1-5-int8

OpenGVLab/Mini-InternVL-Chat-4B-V1-5

internvl_phi3

internvl_phi3

transformers>=4.35,<4.42, timm

vision

OpenGVLab/Mini-InternVL-Chat-4B-V1-5

OpenGVLab/InternVL2-1B

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-1B

OpenGVLab/InternVL2-2B

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-2B

OpenGVLab/InternVL2-8B

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-8B

OpenGVLab/InternVL2-26B

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-26B

OpenGVLab/InternVL2-40B

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-40B

OpenGVLab/InternVL2-Llama3-76B

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Llama3-76B

OpenGVLab/InternVL2-2B-AWQ

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-2B-AWQ

OpenGVLab/InternVL2-8B-AWQ

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-8B-AWQ

OpenGVLab/InternVL2-26B-AWQ

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-26B-AWQ

OpenGVLab/InternVL2-40B-AWQ

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-40B-AWQ

OpenGVLab/InternVL2-Llama3-76B-AWQ

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Llama3-76B-AWQ

OpenGVLab/InternVL2-8B-MPO

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-8B-MPO

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-1B-Pretrain

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-1B-Pretrain

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-2B-Pretrain

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-2B-Pretrain

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-4B-Pretrain

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-4B-Pretrain

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-8B-Pretrain

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-8B-Pretrain

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-26B-Pretrain

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-26B-Pretrain

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-40B-Pretrain

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-40B-Pretrain

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-Llama3-76B-Pretrain

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-Llama3-76B-Pretrain

OpenGVLab/InternVL2-4B

internvl2_phi3

internvl2_phi3

transformers>=4.36,<4.42, timm

vision, video

OpenGVLab/InternVL2-4B

OpenGVLab/InternVL2_5-1B

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-1B

OpenGVLab/InternVL2_5-2B

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-2B

OpenGVLab/InternVL2_5-4B

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-4B

OpenGVLab/InternVL2_5-8B

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-8B

OpenGVLab/InternVL2_5-26B

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-26B

OpenGVLab/InternVL2_5-38B

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-38B

OpenGVLab/InternVL2_5-78B

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-78B

OpenGVLab/InternVL2_5-4B-AWQ

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-4B-AWQ

OpenGVLab/InternVL2_5-8B-AWQ

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-8B-AWQ

OpenGVLab/InternVL2_5-26B-AWQ

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-26B-AWQ

OpenGVLab/InternVL2_5-38B-AWQ

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-38B-AWQ

OpenGVLab/InternVL2_5-78B-AWQ

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-78B-AWQ

OpenGVLab/InternVL2_5-1B-MPO

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-1B-MPO

OpenGVLab/InternVL2_5-2B-MPO

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-2B-MPO

OpenGVLab/InternVL2_5-4B-MPO

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-4B-MPO

OpenGVLab/InternVL2_5-8B-MPO

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-8B-MPO

OpenGVLab/InternVL2_5-26B-MPO

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-26B-MPO

OpenGVLab/InternVL2_5-38B-MPO

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-38B-MPO

OpenGVLab/InternVL2_5-78B-MPO

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-78B-MPO

OpenGVLab/InternVL3-1B-Pretrained

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-1B-Pretrained

OpenGVLab/InternVL3-2B-Pretrained

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-2B-Pretrained

OpenGVLab/InternVL3-8B-Pretrained

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-8B-Pretrained

OpenGVLab/InternVL3-9B-Pretrained

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-9B-Pretrained

OpenGVLab/InternVL3-14B-Pretrained

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-14B-Pretrained

OpenGVLab/InternVL3-38B-Pretrained

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-38B-Pretrained

OpenGVLab/InternVL3-78B-Pretrained

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-78B-Pretrained

OpenGVLab/InternVL3-1B-Instruct

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-1B-Instruct

OpenGVLab/InternVL3-2B-Instruct

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-2B-Instruct

OpenGVLab/InternVL3-8B-Instruct

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-8B-Instruct

OpenGVLab/InternVL3-9B-Instruct

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-9B-Instruct

OpenGVLab/InternVL3-14B-Instruct

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-14B-Instruct

OpenGVLab/InternVL3-38B-Instruct

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-38B-Instruct

OpenGVLab/InternVL3-78B-Instruct

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-78B-Instruct

OpenGVLab/InternVL3-1B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-1B

OpenGVLab/InternVL3-2B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-2B

OpenGVLab/InternVL3-8B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-8B

OpenGVLab/InternVL3-9B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-9B

OpenGVLab/InternVL3-14B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-14B

OpenGVLab/InternVL3-38B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-38B

OpenGVLab/InternVL3-78B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-78B

OpenGVLab/InternVL3-1B-AWQ

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-1B-AWQ

OpenGVLab/InternVL3-2B-AWQ

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-2B-AWQ

OpenGVLab/InternVL3-8B-AWQ

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-8B-AWQ

OpenGVLab/InternVL3-9B-AWQ

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-9B-AWQ

OpenGVLab/InternVL3-14B-AWQ

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-14B-AWQ

OpenGVLab/InternVL3-38B-AWQ

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-38B-AWQ

OpenGVLab/InternVL3-78B-AWQ

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-78B-AWQ

SenseNova/SenseNova-SI-InternVL3-2B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

sensenova/SenseNova-SI-InternVL3-2B

SenseNova/SenseNova-SI-InternVL3-8B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

sensenova/SenseNova-SI-InternVL3-8B

SenseNova/SenseNova-SI-1.1-InternVL3-2B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

sensenova/SenseNova-SI-1.1-InternVL3-2B

SenseNova/SenseNova-SI-1.1-InternVL3-8B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

sensenova/SenseNova-SI-1.1-InternVL3-8B

OpenGVLab/InternVL3-1B-hf

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3-1B-hf

OpenGVLab/InternVL3-2B-hf

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3-2B-hf

OpenGVLab/InternVL3-8B-hf

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3-8B-hf

OpenGVLab/InternVL3-9B-hf

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3-9B-hf

OpenGVLab/InternVL3-14B-hf

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3-14B-hf

OpenGVLab/InternVL3-38B-hf

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3-38B-hf

OpenGVLab/InternVL3-78B-hf

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3-78B-hf

OpenGVLab/InternVL3_5-1B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-1B-HF

OpenGVLab/InternVL3_5-2B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-2B-HF

OpenGVLab/InternVL3_5-4B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-4B-HF

OpenGVLab/InternVL3_5-8B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-8B-HF

OpenGVLab/InternVL3_5-14B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-14B-HF

OpenGVLab/InternVL3_5-38B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-38B-HF

OpenGVLab/InternVL3_5-30B-A3B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-30B-A3B-HF

OpenGVLab/InternVL3_5-241B-A28B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-241B-A28B-HF

OpenGVLab/InternVL3_5-1B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-1B-Pretrained

OpenGVLab/InternVL3_5-2B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-2B-Pretrained

OpenGVLab/InternVL3_5-4B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-4B-Pretrained

OpenGVLab/InternVL3_5-8B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-8B-Pretrained

OpenGVLab/InternVL3_5-14B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-14B-Pretrained

OpenGVLab/InternVL3_5-38B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-38B-Pretrained

OpenGVLab/InternVL3_5-30B-A3B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-30B-A3B-Pretrained

OpenGVLab/InternVL3_5-241B-A28B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-241B-A28B-Pretrained

OpenGVLab/InternVL3_5-1B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-1B-Instruct

OpenGVLab/InternVL3_5-2B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-2B-Instruct

OpenGVLab/InternVL3_5-4B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-4B-Instruct

OpenGVLab/InternVL3_5-8B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-8B-Instruct

OpenGVLab/InternVL3_5-14B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-14B-Instruct

OpenGVLab/InternVL3_5-38B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-38B-Instruct

OpenGVLab/InternVL3_5-30B-A3B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-30B-A3B-Instruct

OpenGVLab/InternVL3_5-241B-A28B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-241B-A28B-Instruct

OpenGVLab/InternVL3_5-1B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-1B-MPO

OpenGVLab/InternVL3_5-2B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-2B-MPO

OpenGVLab/InternVL3_5-4B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-4B-MPO

OpenGVLab/InternVL3_5-8B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-8B-MPO

OpenGVLab/InternVL3_5-14B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-14B-MPO

OpenGVLab/InternVL3_5-38B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-38B-MPO

OpenGVLab/InternVL3_5-30B-A3B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-30B-A3B-MPO

OpenGVLab/InternVL3_5-241B-A28B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-241B-A28B-MPO

OpenGVLab/InternVL3_5-1B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-1B

OpenGVLab/InternVL3_5-2B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-2B

OpenGVLab/InternVL3_5-4B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-4B

OpenGVLab/InternVL3_5-8B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-8B

OpenGVLab/InternVL3_5-14B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-14B

OpenGVLab/InternVL3_5-38B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-38B

OpenGVLab/InternVL3_5-30B-A3B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-30B-A3B

OpenGVLab/InternVL3_5-241B-A28B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-241B-A28B

OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview

internvl3_5_gpt

internvl3_5_gpt

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview

OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF

internvl_gpt_hf

internvl_hf

transformers>=4.55.0, timm

vision, video

OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF

Shanghai_AI_Laboratory/Intern-S1-mini

interns1

interns1

transformers>=4.55.2,<4.56

vision, video

internlm/Intern-S1-mini

Shanghai_AI_Laboratory/Intern-S1

interns1

interns1

transformers>=4.55.2,<4.56

vision, video

internlm/Intern-S1

Shanghai_AI_Laboratory/Intern-S1-mini-FP8

interns1

interns1

transformers>=4.55.2,<4.56

vision, video

internlm/Intern-S1-mini-FP8

Shanghai_AI_Laboratory/Intern-S1-FP8

interns1

interns1

transformers>=4.55.2,<4.56

vision, video

internlm/Intern-S1-FP8

Shanghai_AI_Laboratory/internlm-xcomposer2-7b

xcomposer2

ixcomposer2

-

vision

internlm/internlm-xcomposer2-7b

Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b

xcomposer2_4khd

ixcomposer2

-

vision

internlm/internlm-xcomposer2-4khd-7b

Shanghai_AI_Laboratory/internlm-xcomposer2d5-7b

xcomposer2_5

xcomposer2_5

decord

vision

internlm/internlm-xcomposer2d5-7b

Shanghai_AI_Laboratory/internlm-xcomposer2d5-ol-7b:base

xcomposer2_5

xcomposer2_5

decord

vision

internlm/internlm-xcomposer2d5-ol-7b:base

Shanghai_AI_Laboratory/internlm-xcomposer2d5-ol-7b:audio

xcomposer2_5_ol_audio

qwen2_audio

transformers>=4.45

audio

internlm/internlm-xcomposer2d5-ol-7b:audio

LLM-Research/Llama-3.2-11B-Vision-Instruct

llama3_2_vision

llama3_2_vision

transformers>=4.45

vision

meta-llama/Llama-3.2-11B-Vision-Instruct

LLM-Research/Llama-3.2-90B-Vision-Instruct

llama3_2_vision

llama3_2_vision

transformers>=4.45

vision

meta-llama/Llama-3.2-90B-Vision-Instruct

LLM-Research/Llama-3.2-11B-Vision

llama3_2_vision

llama3_2_vision

transformers>=4.45

vision

meta-llama/Llama-3.2-11B-Vision

LLM-Research/Llama-3.2-90B-Vision

llama3_2_vision

llama3_2_vision

transformers>=4.45

vision

meta-llama/Llama-3.2-90B-Vision

LLM-Research/Llama-4-Scout-17B-16E

llama4

llama4

transformers>=4.51

vision

meta-llama/Llama-4-Scout-17B-16E

LLM-Research/Llama-4-Maverick-17B-128E

llama4

llama4

transformers>=4.51

vision

meta-llama/Llama-4-Maverick-17B-128E

LLM-Research/Llama-4-Scout-17B-16E-Instruct

llama4

llama4

transformers>=4.51

vision

meta-llama/Llama-4-Scout-17B-16E-Instruct

LLM-Research/Llama-4-Maverick-17B-128E-Instruct-FP8

llama4

llama4

transformers>=4.51

vision

meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8

LLM-Research/Llama-4-Maverick-17B-128E-Instruct

llama4

llama4

transformers>=4.51

vision

meta-llama/Llama-4-Maverick-17B-128E-Instruct

ICTNLP/Llama-3.1-8B-Omni

llama3_1_omni

llama3_1_omni

openai-whisper

audio

ICTNLP/Llama-3.1-8B-Omni

llava-hf/llava-1.5-7b-hf

llava1_5_hf

llava1_5_hf

transformers>=4.36

vision

llava-hf/llava-1.5-7b-hf

llava-hf/llava-1.5-13b-hf

llava1_5_hf

llava1_5_hf

transformers>=4.36

vision

llava-hf/llava-1.5-13b-hf

llava-hf/llava-v1.6-mistral-7b-hf

llava1_6_mistral_hf

llava1_6_mistral_hf

transformers>=4.39

vision

llava-hf/llava-v1.6-mistral-7b-hf

llava-hf/llava-v1.6-vicuna-7b-hf

llava1_6_vicuna_hf

llava1_6_vicuna_hf

transformers>=4.39

vision

llava-hf/llava-v1.6-vicuna-7b-hf

llava-hf/llava-v1.6-vicuna-13b-hf

llava1_6_vicuna_hf

llava1_6_vicuna_hf

transformers>=4.39

vision

llava-hf/llava-v1.6-vicuna-13b-hf

llava-hf/llava-v1.6-34b-hf

llava1_6_yi_hf

llava1_6_yi_hf

transformers>=4.39

vision

llava-hf/llava-v1.6-34b-hf

llava-hf/llama3-llava-next-8b-hf

llama3_llava_next_hf

llama3_llava_next_hf

transformers>=4.39

vision

llava-hf/llama3-llava-next-8b-hf

llava-hf/llava-next-72b-hf

llava_next_qwen_hf

llava_next_qwen_hf

transformers>=4.39

vision

llava-hf/llava-next-72b-hf

llava-hf/llava-next-110b-hf

llava_next_qwen_hf

llava_next_qwen_hf

transformers>=4.39

vision

llava-hf/llava-next-110b-hf

llava-hf/LLaVA-NeXT-Video-7B-DPO-hf

llava_next_video_hf

llava_next_video_hf

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-7B-DPO-hf

llava-hf/LLaVA-NeXT-Video-7B-32K-hf

llava_next_video_hf

llava_next_video_hf

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-7B-32K-hf

llava-hf/LLaVA-NeXT-Video-7B-hf

llava_next_video_hf

llava_next_video_hf

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-7B-hf

llava-hf/LLaVA-NeXT-Video-34B-hf

llava_next_video_yi_hf

llava_next_video_hf

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-34B-hf

llava-hf/llava-onevision-qwen2-0.5b-ov-hf

llava_onevision_hf

llava_onevision_hf

transformers>=4.45

vision, video

llava-hf/llava-onevision-qwen2-0.5b-ov-hf

llava-hf/llava-onevision-qwen2-7b-ov-hf

llava_onevision_hf

llava_onevision_hf

transformers>=4.45

vision, video

llava-hf/llava-onevision-qwen2-7b-ov-hf

llava-hf/llava-onevision-qwen2-72b-ov-hf

llava_onevision_hf

llava_onevision_hf

transformers>=4.45

vision, video

llava-hf/llava-onevision-qwen2-72b-ov-hf

01ai/Yi-VL-6B

yi_vl

yi_vl

transformers>=4.34

vision

01-ai/Yi-VL-6B

01ai/Yi-VL-34B

yi_vl

yi_vl

transformers>=4.34

vision

01-ai/Yi-VL-34B

PaddlePaddle/ERNIE-4.5-VL-28B-A3B-PT

ernie_vl

ernie_vl

transformers>=4.52, moviepy

-

baidu/ERNIE-4.5-VL-28B-A3B-PT

PaddlePaddle/ERNIE-4.5-VL-424B-A47B-PT

ernie_vl

ernie_vl

transformers>=4.52, moviepy

-

baidu/ERNIE-4.5-VL-424B-A47B-PT

PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Base-PT

ernie_vl

ernie_vl

transformers>=4.52, moviepy

-

baidu/ERNIE-4.5-VL-28B-A3B-Base-PT

PaddlePaddle/ERNIE-4.5-VL-424B-A47B-Base-PT

ernie_vl

ernie_vl

transformers>=4.52, moviepy

-

baidu/ERNIE-4.5-VL-424B-A47B-Base-PT

PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking

ernie_vl_thinking

ernie_vl_thinking

transformers>=4.52, moviepy

-

baidu/ERNIE-4.5-VL-28B-A3B-Thinking

swift/llava-llama3.1-8b

llava_llama3_1_hf

llava_llama3_1_hf

transformers>=4.41

vision

-

AI-ModelScope/llava-llama-3-8b-v1_1-transformers

llava_llama3_hf

llava_llama3_hf

transformers>=4.36

vision

xtuner/llava-llama-3-8b-v1_1-transformers

AI-ModelScope/llava-v1.6-mistral-7b

llava1_6_mistral

llava1_6_mistral

transformers>=4.34

vision

liuhaotian/llava-v1.6-mistral-7b

AI-ModelScope/llava-v1.6-34b

llava1_6_yi

llava1_6_yi

transformers>=4.34

vision

liuhaotian/llava-v1.6-34b

AI-ModelScope/llava-next-72b

llava_next_qwen

llava_next_qwen

transformers>=4.42, av

vision

lmms-lab/llava-next-72b

AI-ModelScope/llava-next-110b

llava_next_qwen

llava_next_qwen

transformers>=4.42, av

vision

lmms-lab/llava-next-110b

AI-ModelScope/llama3-llava-next-8b

llama3_llava_next

llama3_llava_next

transformers>=4.42, av

vision

lmms-lab/llama3-llava-next-8b

lmms-lab/LLaVA-OneVision-1.5-4B-Instruct

llava_onevision1_5

llava_onevision1_5

transformers>=4.53.0, qwen_vl_utils

vision

lmms-lab/LLaVA-OneVision-1.5-4B-Instruct

lmms-lab/LLaVA-OneVision-1.5-8B-Instruct

llava_onevision1_5

llava_onevision1_5

transformers>=4.53.0, qwen_vl_utils

vision

lmms-lab/LLaVA-OneVision-1.5-8B-Instruct

lmms-lab/LLaVA-OneVision-1.5-4B-Base

llava_onevision1_5

llava_onevision1_5

transformers>=4.53.0, qwen_vl_utils

vision

lmms-lab/LLaVA-OneVision-1.5-4B-Base

lmms-lab/LLaVA-OneVision-1.5-8B-Base

llava_onevision1_5

llava_onevision1_5

transformers>=4.53.0, qwen_vl_utils

vision

lmms-lab/LLaVA-OneVision-1.5-8B-Base

deepseek-ai/deepseek-vl-1.3b-chat

deepseek_vl

deepseek_vl

-

vision

deepseek-ai/deepseek-vl-1.3b-chat

deepseek-ai/deepseek-vl-7b-chat

deepseek_vl

deepseek_vl

-

vision

deepseek-ai/deepseek-vl-7b-chat

deepseek-ai/deepseek-vl2-tiny

deepseek_vl2

deepseek_vl2

transformers<4.42

vision

deepseek-ai/deepseek-vl2-tiny

deepseek-ai/deepseek-vl2-small

deepseek_vl2

deepseek_vl2

transformers<4.42

vision

deepseek-ai/deepseek-vl2-small

deepseek-ai/deepseek-vl2

deepseek_vl2

deepseek_vl2

transformers<4.42

vision

deepseek-ai/deepseek-vl2

deepseek-ai/Janus-1.3B

deepseek_janus

deepseek_janus

-

vision

deepseek-ai/Janus-1.3B

deepseek-ai/Janus-Pro-1B

deepseek_janus_pro

deepseek_janus_pro

-

vision

deepseek-ai/Janus-Pro-1B

deepseek-ai/Janus-Pro-7B

deepseek_janus_pro

deepseek_janus_pro

-

vision

deepseek-ai/Janus-Pro-7B

deepseek-ai/DeepSeek-OCR

deepseek_ocr

deepseek_ocr

transformers==4.46.3, easydict

vision

deepseek-ai/DeepSeek-OCR

OpenBMB/MiniCPM-V

minicpmv

minicpmv

timm, transformers<4.42

vision

openbmb/MiniCPM-V

OpenBMB/MiniCPM-V-2

minicpmv

minicpmv

timm, transformers<4.42

vision

openbmb/MiniCPM-V-2

OpenBMB/MiniCPM-Llama3-V-2_5

minicpmv2_5

minicpmv2_5

timm, transformers>=4.36

vision

openbmb/MiniCPM-Llama3-V-2_5

OpenBMB/MiniCPM-V-2_6

minicpmv2_6

minicpmv2_6

timm, transformers>=4.36, decord

vision, video

openbmb/MiniCPM-V-2_6

OpenBMB/MiniCPM-o-2_6

minicpmo2_6

minicpmo2_6

timm, transformers>=4.36, decord, soundfile

vision, video, omni, audio

openbmb/MiniCPM-o-2_6

OpenBMB/MiniCPM-V-4

minicpmv4

minicpmv4

timm, transformers>=4.36, decord

vision, video

openbmb/MiniCPM-V-4

OpenBMB/MiniCPM-V-4_5

minicpmv4_5

minicpmv4_5

timm, transformers>=4.36, decord

vision, video

openbmb/MiniCPM-V-4_5

MiniMax/MiniMax-VL-01

minimax_vl

minimax_vl

-

vision

MiniMaxAI/MiniMax-VL-01

iic/mPLUG-Owl2

mplug_owl2

mplug_owl2

transformers<4.35, icecream

vision

MAGAer13/mplug-owl2-llama2-7b

iic/mPLUG-Owl2.1

mplug_owl2_1

mplug_owl2

transformers<4.35, icecream

vision

Mizukiluke/mplug_owl_2_1

iic/mPLUG-Owl3-1B-241014

mplug_owl3

mplug_owl3

transformers>=4.36, icecream, decord

vision, video

mPLUG/mPLUG-Owl3-1B-241014

iic/mPLUG-Owl3-2B-241014

mplug_owl3

mplug_owl3

transformers>=4.36, icecream, decord

vision, video

mPLUG/mPLUG-Owl3-2B-241014

iic/mPLUG-Owl3-7B-240728

mplug_owl3

mplug_owl3

transformers>=4.36, icecream, decord

vision, video

mPLUG/mPLUG-Owl3-7B-240728

iic/mPLUG-Owl3-7B-241101

mplug_owl3_241101

mplug_owl3_241101

transformers>=4.36, icecream

vision, video

mPLUG/mPLUG-Owl3-7B-241101

iic/DocOwl2

doc_owl2

doc_owl2

transformers>=4.36, icecream

vision

mPLUG/DocOwl2

BAAI/Emu3-Gen

emu3_gen

emu3_gen

-

t2i

BAAI/Emu3-Gen

BAAI/Emu3-Chat

emu3_chat

emu3_chat

transformers>=4.44.0

vision

BAAI/Emu3-Chat

stepfun-ai/GOT-OCR2_0

got_ocr2

got_ocr2

-

vision

stepfun-ai/GOT-OCR2_0

stepfun-ai/GOT-OCR-2.0-hf

got_ocr2_hf

got_ocr2_hf

-

vision

stepfun-ai/GOT-OCR-2.0-hf

stepfun-ai/Step-Audio-Chat

step_audio

step_audio

funasr, sox, conformer, openai-whisper, librosa

audio

stepfun-ai/Step-Audio-Chat

stepfun-ai/Step-Audio-2-mini

step_audio2_mini

step_audio2_mini

transformers==4.53.3, torchaudio, librosa

audio

stepfun-ai/Step-Audio-2-mini

moonshotai/Kimi-VL-A3B-Instruct

kimi_vl

kimi_vl

transformers<4.49

-

moonshotai/Kimi-VL-A3B-Instruct

moonshotai/Kimi-VL-A3B-Thinking

kimi_vl

kimi_vl

transformers<4.49

-

moonshotai/Kimi-VL-A3B-Thinking

moonshotai/Kimi-VL-A3B-Thinking-2506

kimi_vl

kimi_vl

transformers<4.49

-

moonshotai/Kimi-VL-A3B-Thinking-2506

Kwai-Keye/Keye-VL-8B-Preview

keye_vl

keye_vl

keye_vl_utils

vision

Kwai-Keye/Keye-VL-8B-Preview

Kwai-Keye/Keye-VL-1_5-8B

keye_vl_1_5

keye_vl_1_5

keye_vl_utils>=1.5.2

vision

Kwai-Keye/Keye-VL-1_5-8B

rednote-hilab/dots.ocr

dots_ocr

dots_ocr

transformers>=4.51.0

-

rednote-hilab/dots.ocr

BytedanceDouyinContent/SAIL-VL2-2B

sail_vl2

sail_vl2

transformers<=4.51.3

vision

BytedanceDouyinContent/SAIL-VL2-2B

BytedanceDouyinContent/SAIL-VL2-2B-Thinking

sail_vl2

sail_vl2

transformers<=4.51.3

vision

BytedanceDouyinContent/SAIL-VL2-2B-Thinking

BytedanceDouyinContent/SAIL-VL2-8B

sail_vl2

sail_vl2

transformers<=4.51.3

vision

BytedanceDouyinContent/SAIL-VL2-8B

BytedanceDouyinContent/SAIL-VL2-8B-Thinking

sail_vl2

sail_vl2

transformers<=4.51.3

vision

BytedanceDouyinContent/SAIL-VL2-8B-Thinking

LLM-Research/Phi-3-vision-128k-instruct

phi3_vision

phi3_vision

transformers>=4.36

vision

microsoft/Phi-3-vision-128k-instruct

LLM-Research/Phi-3.5-vision-instruct

phi3_vision

phi3_vision

transformers>=4.36

vision

microsoft/Phi-3.5-vision-instruct

LLM-Research/Phi-4-multimodal-instruct

phi4_multimodal

phi4_multimodal

transformers>=4.36,<4.49, backoff, soundfile

vision, audio

microsoft/Phi-4-multimodal-instruct

AI-ModelScope/Florence-2-base-ft

florence

florence

-

vision

microsoft/Florence-2-base-ft

AI-ModelScope/Florence-2-base

florence

florence

-

vision

microsoft/Florence-2-base

AI-ModelScope/Florence-2-large

florence

florence

-

vision

microsoft/Florence-2-large

AI-ModelScope/Florence-2-large-ft

florence

florence

-

vision

microsoft/Florence-2-large-ft

AI-ModelScope/Idefics3-8B-Llama3

idefics3

idefics3

transformers>=4.45

vision

HuggingFaceM4/Idefics3-8B-Llama3

AI-ModelScope/paligemma-3b-pt-224

paligemma

paligemma

transformers>=4.41

vision

google/paligemma-3b-pt-224

AI-ModelScope/paligemma-3b-pt-448

paligemma

paligemma

transformers>=4.41

vision

google/paligemma-3b-pt-448

AI-ModelScope/paligemma-3b-pt-896

paligemma

paligemma

transformers>=4.41

vision

google/paligemma-3b-pt-896

AI-ModelScope/paligemma-3b-mix-224

paligemma

paligemma

transformers>=4.41

vision

google/paligemma-3b-mix-224

AI-ModelScope/paligemma-3b-mix-448

paligemma

paligemma

transformers>=4.41

vision

google/paligemma-3b-mix-448

AI-ModelScope/paligemma2-3b-pt-224

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-3b-pt-224

AI-ModelScope/paligemma2-3b-pt-448

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-3b-pt-448

AI-ModelScope/paligemma2-3b-pt-896

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-3b-pt-896

AI-ModelScope/paligemma2-10b-pt-224

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-10b-pt-224

AI-ModelScope/paligemma2-10b-pt-448

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-10b-pt-448

AI-ModelScope/paligemma2-10b-pt-896

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-10b-pt-896

AI-ModelScope/paligemma2-28b-pt-224

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-28b-pt-224

AI-ModelScope/paligemma2-28b-pt-448

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-28b-pt-448

AI-ModelScope/paligemma2-28b-pt-896

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-28b-pt-896

AI-ModelScope/paligemma2-3b-ft-docci-448

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-3b-ft-docci-448

AI-ModelScope/paligemma2-10b-ft-docci-448

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-10b-ft-docci-448

LLM-Research/Molmo-7B-O-0924

molmo

molmo

transformers>=4.45

vision

allenai/Molmo-7B-O-0924

LLM-Research/Molmo-7B-D-0924

molmo

molmo

transformers>=4.45

vision

allenai/Molmo-7B-D-0924

LLM-Research/Molmo-72B-0924

molmo

molmo

transformers>=4.45

vision

allenai/Molmo-72B-0924

LLM-Research/MolmoE-1B-0924

molmoe

molmo

transformers>=4.45

vision

allenai/MolmoE-1B-0924

AI-ModelScope/pixtral-12b

pixtral

pixtral

transformers>=4.45

vision

mistral-community/pixtral-12b

InfiniAI/Megrez-3B-Omni

megrez_omni

megrez_omni

-

vision, audio

Infinigence/Megrez-3B-Omni

bytedance-research/Valley-Eagle-7B

valley

valley

transformers>=4.42, av

vision

-

LLM-Research/gemma-3-4b-pt

gemma3_vision

gemma3_vision

transformers>=4.49

-

google/gemma-3-4b-pt

LLM-Research/gemma-3-4b-it

gemma3_vision

gemma3_vision

transformers>=4.49

-

google/gemma-3-4b-it

LLM-Research/gemma-3-12b-pt

gemma3_vision

gemma3_vision

transformers>=4.49

-

google/gemma-3-12b-pt

LLM-Research/gemma-3-12b-it

gemma3_vision

gemma3_vision

transformers>=4.49

-

google/gemma-3-12b-it

LLM-Research/gemma-3-27b-pt

gemma3_vision

gemma3_vision

transformers>=4.49

-

google/gemma-3-27b-pt

LLM-Research/gemma-3-27b-it

gemma3_vision

gemma3_vision

transformers>=4.49

-

google/gemma-3-27b-it

google/gemma-3n-E2B

gemma3n

gemma3n

transformers>=4.53.1

-

google/gemma-3n-E2B

google/gemma-3n-E4B

gemma3n

gemma3n

transformers>=4.53.1

-

google/gemma-3n-E4B

google/gemma-3n-E2B-it

gemma3n

gemma3n

transformers>=4.53.1

-

google/gemma-3n-E2B-it

google/gemma-3n-E4B-it

gemma3n

gemma3n

transformers>=4.53.1

-

google/gemma-3n-E4B-it

mistralai/Mistral-Small-3.1-24B-Base-2503

mistral_2503

mistral_2503

transformers>=4.49

-

mistralai/Mistral-Small-3.1-24B-Base-2503

mistralai/Mistral-Small-3.1-24B-Instruct-2503

mistral_2503

mistral_2503

transformers>=4.49

-

mistralai/Mistral-Small-3.1-24B-Instruct-2503

mistralai/Mistral-Small-3.2-24B-Instruct-2506

mistral_2506

mistral_2506

transformers>=4.49

-

mistralai/Mistral-Small-3.2-24B-Instruct-2506

PaddlePaddle/PaddleOCR-VL

paddle_ocr

paddle_ocr

-

-

PaddlePaddle/PaddleOCR-VL

JinaAI/jina-reranker-m0

jina_reranker_m0

jina_reranker_m0

-

reranker, vision

JinaAI/jina-reranker-m0

Datasets

The table below introduces the datasets integrated with ms-swift:

  • Dataset ID: ModelScope dataset ID

  • HF Dataset ID: Hugging Face dataset ID

  • Subset Name: Name of the subset

  • Dataset Size: Size of the dataset

  • Statistic: Token statistics for the dataset, reported as mean±std together with the minimum and maximum token counts, which help when adjusting the max_length hyperparameter. The datasets are tokenized with the qwen2.5 tokenizer, so the counts will differ for other tokenizers. If you need token statistics for another model's tokenizer, you can obtain them using the script; a minimal sketch is shown after this list.

  • Tags: Tags associated with the dataset

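As a rough illustration of how such token statistics can be computed with a different tokenizer, here is a minimal sketch (not the ms-swift script itself). The tokenizer name, dataset name, and the problem/solution field keys are example assumptions and should be adapted to the dataset you are measuring.

```python
# Minimal sketch (not the ms-swift script): token-length statistics for a
# dataset computed with an arbitrary Hugging Face tokenizer.
import numpy as np
from datasets import load_dataset
from transformers import AutoTokenizer

# Example choices; swap in the tokenizer and dataset you actually care about.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
dataset = load_dataset("AI-MO/NuminaMath-TIR", split="train")

def count_tokens(example):
    # Assumption: this dataset exposes "problem" and "solution" text fields;
    # adjust the keys to the schema of the dataset being measured.
    text = example.get("problem", "") + example.get("solution", "")
    return {"n_tokens": len(tokenizer(text)["input_ids"])}

lengths = np.array(dataset.map(count_tokens)["n_tokens"])
print(f"{lengths.mean():.1f}±{lengths.std():.1f}, "
      f"min={lengths.min()}, max={lengths.max()}")
```

The printed line follows the same mean±std, min, max format used in the Statistic column of the table below.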
Dataset ID Subset Name Dataset Size Statistic (token) Tags HF Dataset ID
AI-MO/NuminaMath-1.5 default 896215 116.1±80.8, min=31, max=5064 grpo, math AI-MO/NuminaMath-1.5
AI-MO/NuminaMath-CoT default 859494 113.1±60.2, min=35, max=2120 grpo, math AI-MO/NuminaMath-CoT
AI-MO/NuminaMath-TIR default 72441 100.9±52.2, min=36, max=1683 grpo, math, 🔥 AI-MO/NuminaMath-TIR
AI-ModelScope/COIG-CQIA chinese_traditional
coig_pc
exam
finance
douban
human_value
logi_qa
ruozhiba
segmentfault
wiki
wikihow
xhs
zhihu
44694 331.2±693.8, min=34, max=19288 general, 🔥 -
AI-ModelScope/CodeAlpaca-20k default 20022 99.3±57.6, min=30, max=857 code, en HuggingFaceH4/CodeAlpaca_20K
AI-ModelScope/DISC-Law-SFT default 166758 1799.0±474.9, min=769, max=3151 chat, law, 🔥 ShengbinYue/DISC-Law-SFT
AI-ModelScope/DISC-Med-SFT default 464885 426.5±178.7, min=110, max=1383 chat, medical, 🔥 Flmc/DISC-Med-SFT
AI-ModelScope/Duet-v0.5 default 5000 1157.4±189.3, min=657, max=2344 CoT, en G-reen/Duet-v0.5
AI-ModelScope/GuanacoDataset default 31563 250.3±70.6, min=95, max=987 chat, zh JosephusCheung/GuanacoDataset
AI-ModelScope/LLaVA-Instruct-150K default 623302 630.7±143.0, min=301, max=1166 chat, multi-modal, vision -
AI-ModelScope/LLaVA-Pretrain default huge dataset - chat, multi-modal, quality liuhaotian/LLaVA-Pretrain
AI-ModelScope/LaTeX_OCR default
human_handwrite
human_handwrite_print
synthetic_handwrite
small
162149 117.6±44.9, min=41, max=312 chat, ocr, multi-modal, vision linxy/LaTeX_OCR
AI-ModelScope/LongAlpaca-12k default 11998 9941.8±3417.1, min=4695, max=25826 long-sequence, QA Yukang/LongAlpaca-12k
AI-ModelScope/M3IT coco
vqa-v2
shapes
shapes-rephrased
coco-goi-rephrased
snli-ve
snli-ve-rephrased
okvqa
a-okvqa
viquae
textcap
docvqa
science-qa
imagenet
imagenet-open-ended
imagenet-rephrased
coco-goi
clevr
clevr-rephrased
nlvr
coco-itm
coco-itm-rephrased
vsr
vsr-rephrased
mocheg
mocheg-rephrased
coco-text
fm-iqa
activitynet-qa
msrvtt
ss
coco-cn
refcoco
refcoco-rephrased
multi30k
image-paragraph-captioning
visual-dialog
visual-dialog-rephrased
iqa
vcr
visual-mrc
ivqa
msrvtt-qa
msvd-qa
gqa
text-vqa
ocr-vqa
st-vqa
flickr8k-cn
huge dataset - chat, multi-modal, vision -
AI-ModelScope/MATH-lighteval default 7500 104.4±92.8, min=36, max=1683 grpo, math DigitalLearningGmbH/MATH-lighteval
AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese default 200000 448.4±223.5, min=87, max=4098 chat, sft, 🔥, zh Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese
AI-ModelScope/Magpie-Qwen2-Pro-200K-English default 200000 609.9±277.1, min=257, max=4098 chat, sft, 🔥, en Magpie-Align/Magpie-Qwen2-Pro-200K-English
AI-ModelScope/Magpie-Qwen2-Pro-300K-Filtered default 300000 556.6±288.6, min=175, max=4098 chat, sft, 🔥 Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered
AI-ModelScope/MathInstruct default 262040 253.3±177.4, min=42, max=2193 math, cot, en, quality TIGER-Lab/MathInstruct
AI-ModelScope/MovieChat-1K-test default 162 39.7±2.0, min=32, max=43 chat, multi-modal, video Enxin/MovieChat-1K-test
AI-ModelScope/Open-Platypus default 24926 389.0±256.4, min=55, max=3153 chat, math, quality garage-bAInd/Open-Platypus
AI-ModelScope/OpenO1-SFT default 125894 1080.7±622.9, min=145, max=11637 chat, general, o1 O1-OPEN/OpenO1-SFT
AI-ModelScope/OpenOrca default
3_5M
huge dataset - chat, multilingual, general -
AI-ModelScope/OpenOrca-Chinese default huge dataset - QA, zh, general, quality yys/OpenOrca-Chinese
AI-ModelScope/SFT-Nectar default 131201 441.9±307.0, min=45, max=3136 cot, en, quality AstraMindAI/SFT-Nectar
AI-ModelScope/ShareGPT-4o image_caption 57289 599.8±140.4, min=214, max=1932 vqa, multi-modal OpenGVLab/ShareGPT-4o
AI-ModelScope/ShareGPT4V ShareGPT4V
ShareGPT4V-PT
huge dataset - chat, multi-modal, vision -
AI-ModelScope/SkyPile-150B default huge dataset - pretrain, quality, zh Skywork/SkyPile-150B
AI-ModelScope/WizardLM_evol_instruct_V2_196k default 109184 483.3±338.4, min=27, max=3735 chat, en WizardLM/WizardLM_evol_instruct_V2_196k
AI-ModelScope/alpaca-cleaned default 51760 170.1±122.9, min=29, max=1028 chat, general, bench, quality yahma/alpaca-cleaned
AI-ModelScope/alpaca-gpt4-data-en default 52002 167.6±123.9, min=29, max=607 chat, general, 🔥 vicgalle/alpaca-gpt4
AI-ModelScope/alpaca-gpt4-data-zh default 48818 157.2±93.2, min=27, max=544 chat, general, 🔥 llm-wizard/alpaca-gpt4-data-zh
AI-ModelScope/blossom-math-v2 default 10000 175.4±59.1, min=35, max=563 chat, math, 🔥 Azure99/blossom-math-v2
AI-ModelScope/captcha-images default 8000 47.0±0.0, min=47, max=47 chat, multi-modal, vision -
AI-ModelScope/chartqa_digit_r1v_format default 11399 48.3±5.1, min=37, max=82 grpo zyang39/chartqa_digit_r1v_format
AI-ModelScope/clevr_cogen_a_train default 70000 67.0±0.0, min=67, max=67 qa, math, vision, grpo leonardPKU/clevr_cogen_a_train
AI-ModelScope/coco default huge dataset - multi-modal, en, vqa, quality detection-datasets/coco
AI-ModelScope/databricks-dolly-15k default 15011 199.0±268.8, min=26, max=5987 multi-task, en, quality databricks/databricks-dolly-15k
AI-ModelScope/deepctrl-sft-data default
en
huge dataset - chat, general, sft, multi-round -
AI-ModelScope/egoschema default
cls
101 191.6±80.7, min=96, max=435 chat, multi-modal, video lmms-lab/egoschema
AI-ModelScope/firefly-train-1.1M default 1649399 204.3±365.3, min=28, max=9306 chat, general YeungNLP/firefly-train-1.1M
AI-ModelScope/function-calling-chatml default 112958 465.3±320.1, min=36, max=6106 agent, en, sft, 🔥 Locutusque/function-calling-chatml
AI-ModelScope/generated_chat_0.4M default 396004 272.7±51.1, min=78, max=579 chat, character-dialogue BelleGroup/generated_chat_0.4M
AI-ModelScope/guanaco_belle_merge_v1.0 default 693987 133.8±93.5, min=30, max=1872 QA, zh Chinese-Vicuna/guanaco_belle_merge_v1.0
AI-ModelScope/hh-rlhf helpful-base
helpful-online
helpful-rejection-sampled
huge dataset - rlhf, dpo -
AI-ModelScope/hh_rlhf_cn hh_rlhf
harmless_base_cn
harmless_base_en
helpful_base_cn
helpful_base_en
362909 142.3±107.5, min=25, max=1571 rlhf, dpo, 🔥 -
AI-ModelScope/lawyer_llama_data default 21476 224.4±83.9, min=69, max=832 chat, law Skepsun/lawyer_llama_data
AI-ModelScope/leetcode-solutions-python default 2359 723.8±233.5, min=259, max=2117 chat, coding, 🔥 -
AI-ModelScope/lmsys-chat-1m default 166211 545.8±3272.8, min=22, max=219116 chat, em lmsys/lmsys-chat-1m
AI-ModelScope/math-trn-format default 11500 102.2±88.9, min=36, max=1683 math -
AI-ModelScope/ms_agent_for_agentfabric default
addition
30000 615.7±198.7, min=251, max=2055 chat, agent, multi-round, 🔥 -
AI-ModelScope/orpo-dpo-mix-40k default 43666 938.1±694.2, min=36, max=8483 dpo, orpo, en, quality mlabonne/orpo-dpo-mix-40k
AI-ModelScope/pile default huge dataset - pretrain EleutherAI/pile
AI-ModelScope/ruozhiba post-annual
title-good
title-norm
85658 40.0±18.3, min=22, max=559 pretrain, 🔥 -
AI-ModelScope/school_math_0.25M default 248481 158.8±73.4, min=39, max=980 chat, math, quality BelleGroup/school_math_0.25M
AI-ModelScope/sharegpt_gpt4 default
V3_format
zh_38K_format
103329 3476.6±5959.0, min=33, max=115132 chat, multilingual, general, multi-round, gpt4, 🔥 -
AI-ModelScope/sql-create-context default 78577 82.7±31.5, min=36, max=282 chat, sql, 🔥 b-mc2/sql-create-context
AI-ModelScope/stack-exchange-paired default huge dataset - hfrl, dpo, pairwise lvwerra/stack-exchange-paired
AI-ModelScope/starcoderdata default huge dataset - pretrain, quality bigcode/starcoderdata
AI-ModelScope/synthetic_text_to_sql default 100000 221.8±69.9, min=64, max=616 nl2sql, en gretelai/synthetic_text_to_sql
AI-ModelScope/texttosqlv2_25000_v2 default 25000 277.3±328.3, min=40, max=1971 chat, sql Clinton/texttosqlv2_25000_v2
AI-ModelScope/the-stack default huge dataset - pretrain, quality bigcode/the-stack
AI-ModelScope/tigerbot-law-plugin default 55895 104.9±51.0, min=43, max=1087 text-generation, law, pretrained TigerResearch/tigerbot-law-plugin
AI-ModelScope/train_0.5M_CN default 519255 128.4±87.4, min=31, max=936 common, zh, quality BelleGroup/train_0.5M_CN
AI-ModelScope/train_1M_CN default huge dataset - common, zh, quality BelleGroup/train_1M_CN
AI-ModelScope/train_2M_CN default huge dataset - common, zh, quality BelleGroup/train_2M_CN
AI-ModelScope/tulu-v2-sft-mixture default 326154 523.3±439.3, min=68, max=2549 chat, multilingual, general, multi-round allenai/tulu-v2-sft-mixture
AI-ModelScope/ultrafeedback-binarized-preferences-cleaned-kto default 230720 471.5±274.3, min=27, max=2232 rlhf, kto -
AI-ModelScope/webnovel_cn default 50000 1455.2±12489.4, min=524, max=490480 chat, novel zxbsmk/webnovel_cn
AI-ModelScope/wikipedia-cn-20230720-filtered default huge dataset - pretrain, quality pleisto/wikipedia-cn-20230720-filtered
AI-ModelScope/zhihu_rlhf_3k default 3460 594.5±365.9, min=31, max=1716 rlhf, dpo, zh liyucheng/zhihu_rlhf_3k
DAMO_NLP/jd default
cls
45012 66.9±87.0, min=41, max=1699 text-generation, classification, 🔥 -
FreedomIntelligence/medical-o1-reasoning-SFT en
zh
50143 98.0±53.6, min=36, max=1508 medical, o1, 🔥 FreedomIntelligence/medical-o1-reasoning-SFT
- default huge dataset - pretrain, quality HuggingFaceFW/fineweb
- auto_math_text
khanacademy
openstax
stanford
stories
web_samples_v1
web_samples_v2
wikihow
huge dataset - multi-domain, en, qa HuggingFaceTB/cosmopedia
HumanLLMs/Human-Like-DPO-Dataset default 10884 47.5±7.9, min=32, max=85 rlhf, dpo HumanLLMs/Human-Like-DPO-Dataset
LLM-Research/xlam-function-calling-60k default
grpo
120000 453.7±219.5, min=164, max=2779 agent, grpo, 🔥 Salesforce/xlam-function-calling-60k
MTEB/scidocs-reranking default 39193 41.9±5.8, min=31, max=107 rerank, 🔥 mteb/scidocs-reranking
MTEB/stackoverflowdupquestions-reranking default 26485 39.9±4.6, min=31, max=77 rerank, 🔥 mteb/stackoverflowdupquestions-reranking
OmniData/Zhihu-KOL default huge dataset - zhihu, qa wangrui6/Zhihu-KOL
OmniData/Zhihu-KOL-More-Than-100-Upvotes default 271261 1003.4±1826.1, min=28, max=52541 zhihu, qa bzb2023/Zhihu-KOL-More-Than-100-Upvotes
PowerInfer/LONGCOT-Refine-500K default 521921 296.5±158.4, min=39, max=4634 chat, sft, 🔥, cot PowerInfer/LONGCOT-Refine-500K
PowerInfer/QWQ-LONGCOT-500K default 498082 310.7±303.1, min=35, max=22941 chat, sft, 🔥, cot PowerInfer/QWQ-LONGCOT-500K
ServiceNow-AI/R1-Distill-SFT v0
v1
1850809 164.2±438.0, min=30, max=32469 chat, sft, cot, r1 ServiceNow-AI/R1-Distill-SFT
TIGER-Lab/MATH-plus train 893929 301.4±196.7, min=50, max=1162 qa, math, en, quality TIGER-Lab/MATH-plus
Tongyi-DataEngine/SA1B-Dense-Caption default huge dataset - zh, multi-modal, vqa -
Tongyi-DataEngine/SA1B-Paired-Captions-Images default 7736284 106.4±18.5, min=48, max=193 zh, multi-modal, vqa -
YorickHe/CoT default 74771 141.6±45.5, min=58, max=410 chat, general -
YorickHe/CoT_zh default 74771 129.1±53.2, min=51, max=401 chat, general -
ZhipuAI/LongWriter-6k default 6000 5009.0±2932.8, min=117, max=30354 long, chat, sft, 🔥 zai-org/LongWriter-6k
- default huge dataset - pretrain, quality allenai/c4
bespokelabs/Bespoke-Stratos-17k default 16710 480.7±236.1, min=266, max=3556 chat, sft, cot, r1 bespokelabs/Bespoke-Stratos-17k
- default huge dataset - pretrain, quality cerebras/SlimPajama-627B
clip-benchmark/wds_voc2007_multilabel default 2501 112.0±0.0, min=112, max=112 multilabel, multi-modal clip-benchmark/wds_voc2007_multilabel
codefuse-ai/CodeExercise-Python-27k default 27224 337.3±154.2, min=90, max=2826 chat, coding, 🔥 -
codefuse-ai/Evol-instruction-66k default 66862 440.1±208.4, min=46, max=2661 chat, coding, 🔥 -
damo/MSAgent-Bench default
mini
638149 859.2±460.1, min=38, max=3479 chat, agent, multi-round -
damo/nlp_polylm_multialpaca_sft ar
de
es
fr
id
ja
ko
pt
ru
th
vi
131867 101.6±42.5, min=30, max=1029 chat, general, multilingual -
damo/zh_cls_fudan-news default 4959 3234.4±2547.5, min=91, max=19548 chat, classification -
damo/zh_ner-JAVE default 1266 118.3±45.5, min=44, max=223 chat, ner -
hjh0119/shareAI-Llama3-DPO-zh-en-emoji default 2449 334.0±162.8, min=36, max=1801 rlhf, dpo shareAI/DPO-zh-en-emoji
huangjintao/AgentInstruct_copy alfworld
db
kg
mind2web
os
webshop
1866 1144.3±635.5, min=206, max=6412 chat, agent, multi-round -
iic/100PoisonMpts default 906 150.6±80.8, min=39, max=656 poison-management, zh -
iic/DocQA-RL-1.6K default 1591 8307.3±7748.9, min=202, max=32563 docqa, rl, long-sequence Tongyi-Zhiwen/DocQA-RL-1.6K
iic/MSAgent-MultiRole default 543 413.0±79.7, min=70, max=936 chat, agent, multi-round, role-play, multi-agent -
iic/MSAgent-Pro default 21910 1978.1±747.9, min=339, max=8064 chat, agent, multi-round, 🔥 -
iic/ms_agent default 30000 645.8±218.0, min=199, max=2070 chat, agent, multi-round, 🔥 -
iic/ms_bench default 316820 353.4±424.5, min=29, max=2924 chat, general, multi-round, 🔥 -
liucong/Chinese-DeepSeek-R1-Distill-data-110k-SFT default 110000 72.1±60.9, min=29, max=2315 chat, sft, cot, r1, 🔥 Congliu/Chinese-DeepSeek-R1-Distill-data-110k-SFT
- default huge dataset - multi-modal, en, vqa, quality lmms-lab/GQA
- 0_30_s_academic_v0_1
0_30_s_youtube_v0_1
1_2_m_academic_v0_1
1_2_m_youtube_v0_1
2_3_m_academic_v0_1
2_3_m_youtube_v0_1
30_60_s_academic_v0_1
30_60_s_youtube_v0_1
1335486 273.7±78.8, min=107, max=638 chat, multi-modal, video lmms-lab/LLaVA-Video-178K
lmms-lab/multimodal-open-r1-8k-verified default 7689 74.0±24.8, min=41, max=214 grpo, vision, 🔥 lmms-lab/multimodal-open-r1-8k-verified
lvjianjin/AdvertiseGen default 97484 130.9±21.9, min=73, max=232 text-generation, 🔥 shibing624/AdvertiseGen
mapjack/openwebtext_dataset default huge dataset - pretrain, zh, quality -
modelscope/DuReader_robust-QG default 17899 242.0±143.1, min=75, max=1416 text-generation, 🔥 -
modelscope/MathR default
clean
6089 188.7±75.3, min=64, max=3341 qa, math -
modelscope/MathR-32B-Distill data 25921 209.4±63.1, min=121, max=3407 qa, math -
modelscope/chinese-poetry-collection default 1710 58.1±8.1, min=31, max=71 text-generation, poetry -
modelscope/clue cmnli 391783 81.6±16.0, min=54, max=157 text-generation, classification clue
modelscope/coco_2014_caption train
validation
454617 389.6±68.4, min=70, max=587 chat, multi-modal, vision, 🔥 -
modelscope/gsm8k main 7473 88.6±21.6, min=41, max=241 qa, math -
open-r1/DAPO-Math-17k-Processed all 17398 122.3±65.2, min=41, max=1517 math, rlvr open-r1/DAPO-Math-17k-Processed
open-r1/verifiable-coding-problems-python default 35735 559.0±255.2, min=74, max=6191 grpo, code open-r1/verifiable-coding-problems-python
open-r1/verifiable-coding-problems-python-10k default 1800 581.6±233.4, min=136, max=2022 grpo, code open-r1/verifiable-coding-problems-python-10k
open-r1/verifiable-coding-problems-python-10k_decontaminated default 1574 575.7±234.3, min=136, max=2022 grpo, code open-r1/verifiable-coding-problems-python-10k_decontaminated
open-r1/verifiable-coding-problems-python_decontaminated default 27839 561.9±252.2, min=74, max=6191 grpo, code open-r1/verifiable-coding-problems-python_decontaminated
open-thoughts/OpenThoughts-114k default 113957 413.2±186.9, min=265, max=13868 chat, sft, cot, r1 open-thoughts/OpenThoughts-114k
swift/self-cognition default
qwen3
empty_think
108 58.9±20.3, min=32, max=131 chat, self-cognition, 🔥 modelscope/self-cognition
sentence-transformers/stsb default
positive
generate
reg
5748 21.0±0.0, min=21, max=21 similarity, 🔥 sentence-transformers/stsb
shenweizhou/alpha-umi-toolbench-processed-v2 backbone
caller
planner
summarizer
huge dataset - chat, agent, 🔥 -
simpleai/HC3 finance
finance_cls
medicine
medicine_cls
11021 296.0±153.3, min=65, max=2267 text-generation, classification, 🔥 Hello-SimpleAI/HC3
simpleai/HC3-Chinese baike
baike_cls
open_qa
open_qa_cls
nlpcc_dbqa
nlpcc_dbqa_cls
finance
finance_cls
medicine
medicine_cls
law
law_cls
psychology
psychology_cls
39781 179.9±70.2, min=90, max=1070 text-generation, classification, 🔥 Hello-SimpleAI/HC3-Chinese
speech_asr/speech_asr_aishell1_trainsets train
validation
test
141600 40.8±3.3, min=33, max=53 chat, multi-modal, audio -
swift/A-OKVQA default 18201 43.5±7.9, min=27, max=94 multi-modal, en, vqa, quality HuggingFaceM4/A-OKVQA
swift/ChartQA default 28299 36.8±6.5, min=26, max=74 en, vqa, quality HuggingFaceM4/ChartQA
swift/Chinese-Qwen3-235B-2507-Distill-data-110k-SFT default 110000 72.1±60.9, min=29, max=2315 🔥, distill, sft -
swift/Chinese-Qwen3-235B-Thinking-2507-Distill-data-110k-SFT default 110000 72.1±60.9, min=29, max=2315 🔥, distill, sft, cot, r1, thinking -
swift/GRIT caption
grounding
vqa
huge dataset - multi-modal, en, caption-grounding, vqa, quality zzliang/GRIT
swift/GenQA default huge dataset - qa, quality, multi-task tomg-group-umd/GenQA
swift/Infinity-Instruct 3M
7M
0625
Gen
7M_domains
huge dataset - qa, quality, multi-task BAAI/Infinity-Instruct
swift/Mantis-Instruct birds-to-words
chartqa
coinstruct
contrastive_caption
docvqa
dreamsim
dvqa
iconqa
imagecode
llava_665k_multi
lrv_multi
multi_vqa
nextqa
nlvr2
spot-the-diff
star
visual_story_telling
988115 619.9±156.6, min=243, max=1926 chat, multi-modal, vision -
swift/MideficsDataset default 3800 201.3±70.2, min=60, max=454 medical, en, vqa WinterSchool/MideficsDataset
swift/Multimodal-Mind2Web default 1009 293855.4±331149.5, min=11301, max=3577519 agent, multi-modal osunlp/Multimodal-Mind2Web
swift/OCR-VQA default 186753 32.3±5.8, min=27, max=80 multi-modal, en, ocr-vqa howard-hou/OCR-VQA
swift/OK-VQA_train default 9009 31.7±3.4, min=25, max=56 multi-modal, en, vqa, quality Multimodal-Fatima/OK-VQA_train
swift/OpenHermes-2.5 default huge dataset - cot, en, quality teknium/OpenHermes-2.5
swift/RLAIF-V-Dataset default 83132 99.6±54.8, min=30, max=362 rlhf, dpo, multi-modal, en openbmb/RLAIF-V-Dataset
swift/RedPajama-Data-1T default huge dataset - pretrain, quality togethercomputer/RedPajama-Data-1T
swift/RedPajama-Data-V2 default huge dataset - pretrain, quality togethercomputer/RedPajama-Data-V2
swift/ScienceQA default 16967 101.7±55.8, min=32, max=620 multi-modal, science, vqa, quality derek-thomas/ScienceQA
swift/SlimOrca default 517982 405.5±442.1, min=47, max=8312 quality, en Open-Orca/SlimOrca
swift/TextCaps default
emb
rerank
huge dataset - multi-modal, en, caption, quality HuggingFaceM4/TextCaps
swift/ToolBench default 124345 2251.7±1039.8, min=641, max=9451 chat, agent, multi-round -
swift/VQAv2 default huge dataset - en, vqa, quality HuggingFaceM4/VQAv2
swift/VideoChatGPT Generic
Temporal
Consistency
3206 87.4±48.3, min=31, max=398 chat, multi-modal, video, 🔥 lmms-lab/VideoChatGPT
swift/WebInstructSub default huge dataset - qa, en, math, quality, multi-domain, science TIGER-Lab/WebInstructSub
swift/aya_collection aya_dataset 202364 474.6±1539.1, min=25, max=71312 multi-lingual, qa CohereForAI/aya_collection
swift/chinese-c4 default huge dataset - pretrain, zh, quality shjwudp/chinese-c4
swift/cinepile default huge dataset - vqa, en, youtube, video tomg-group-umd/cinepile
swift/classical_chinese_translate default 6655 349.3±77.1, min=61, max=815 chat, play-ground -
swift/cosmopedia-100k default 100000 1037.0±254.8, min=339, max=2818 multi-domain, en, qa HuggingFaceTB/cosmopedia-100k
swift/dolma v1_7 huge dataset - pretrain, quality allenai/dolma
swift/dolphin flan1m-alpaca-uncensored
flan5m-alpaca-uncensored
huge dataset - en cognitivecomputations/dolphin
swift/github-code default huge dataset - pretrain, quality codeparrot/github-code
swift/gpt4v-dataset default huge dataset - en, caption, multi-modal, quality laion/gpt4v-dataset
swift/llava-data llava_instruct 624255 369.7±143.0, min=40, max=905 sft, multi-modal, quality TIGER-Lab/llava-data
swift/llava-instruct-mix-vsft default 13640 178.8±119.8, min=34, max=951 multi-modal, en, vqa, quality HuggingFaceH4/llava-instruct-mix-vsft
swift/llava-med-zh-instruct-60k default 56649 207.9±67.7, min=42, max=594 zh, medical, vqa, multi-modal BUAADreamer/llava-med-zh-instruct-60k
swift/lnqa default huge dataset - multi-modal, en, ocr-vqa, quality vikhyatk/lnqa
swift/longwriter-6k-filtered default 666 4108.9±2636.9, min=1190, max=17050 long, chat, sft, 🔥 -
swift/medical_zh en
zh
2068589 256.4±87.3, min=39, max=1167 chat, medical -
swift/moondream2-coyo-5M-captions default huge dataset - caption, pretrain, quality isidentical/moondream2-coyo-5M-captions
swift/no_robots default 9485 300.0±246.2, min=40, max=6739 multi-task, quality, human-annotated HuggingFaceH4/no_robots
swift/orca_dpo_pairs default 12859 364.9±248.2, min=36, max=2010 rlhf, quality Intel/orca_dpo_pairs
swift/path-vqa default 19654 34.2±6.8, min=28, max=85 multi-modal, vqa, medical flaviagiammarino/path-vqa
swift/pile-val-backup default 214661 1831.4±11087.5, min=21, max=516620 text-generation, awq mit-han-lab/pile-val-backup
swift/pixelprose default huge dataset - caption, multi-modal, vision tomg-group-umd/pixelprose
swift/refcoco caption
grounding
92430 45.4±3.0, min=37, max=63 multi-modal, en, grounding jxu124/refcoco
swift/refcocog caption
grounding
89598 50.3±4.6, min=39, max=91 multi-modal, en, grounding jxu124/refcocog
swift/sharegpt common-zh
unknow-zh
common-en
194063 820.5±366.1, min=25, max=2221 chat, general, multi-round -
swift/swift-sft-mixture sharegpt
firefly
codefuse
metamathqa
huge dataset - chat, sft, general, 🔥 -
swift/tagengo-gpt4 default 76437 468.1±276.8, min=28, max=1726 chat, multi-lingual, quality lightblue/tagengo-gpt4
swift/train_3.5M_CN default huge dataset - common, zh, quality BelleGroup/train_3.5M_CN
swift/ultrachat_200k default 207843 1188.0±571.1, min=170, max=4068 chat, en, quality HuggingFaceH4/ultrachat_200k
swift/wikipedia default huge dataset - pretrain, quality wikipedia
tany0699/garbage265 default 132673 39.0±0.0, min=39, max=39 cls, 🔥, multi-modal -
tastelikefeet/competition_math default 12000 101.9±87.3, min=36, max=1683 qa, math -
- default huge dataset - pretrain, quality tiiuae/falcon-refinedweb
wyj123456/GPT4all default 806199 97.3±20.9, min=62, max=414 chat, general -
wyj123456/code_alpaca_en default 20022 99.3±57.6, min=30, max=857 chat, coding sahil2801/CodeAlpaca-20k
wyj123456/finance_en default 68912 264.5±207.1, min=30, max=2268 chat, financial ssbuild/alpaca_finance_en
wyj123456/instinwild default
subset
103695 125.1±43.7, min=35, max=801 chat, general -
wyj123456/instruct default 888970 271.0±333.6, min=34, max=3967 chat, general -
zouxuhong/Countdown-Tasks-3to4 default 490364 126.6±2.0, min=122, max=130 math -