[Feature]: vllm-ascend does not support quantized weights generated by llmcompressor #547

🚀 The feature, motivation and pitch

1. Obtain a model quantized with llmcompressor
Install llmcompressor:

pip install llmcompressor

Download the calibration dataset (https://huggingface.co/datasets/HuggingFaceH4/no_robots).
Clone https://github.com/vllm-project/llm-compressor.git.
Quantize the model with the llm-compressor/examples/quantization_w8a8_int8/llama3_example.py script.
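The `llama3_example.py` script targets the W8A8 INT8 scheme (8-bit integer weights and 8-bit integer activations), and the model name used in step 2 indicates that activations are quantized dynamically, per token. As a rough illustration of that idea only, not the llm-compressor or vllm-ascend implementation, the sketch below computes one symmetric INT8 scale per token (row) at runtime from the activation values themselves. The function names `quantize_per_token_int8` and `dequantize` are hypothetical.

```python
from typing import List, Tuple


def quantize_per_token_int8(
    x: List[List[float]],
) -> Tuple[List[List[int]], List[float]]:
    """Symmetric INT8 quantization with one scale per token (row).

    Hypothetical helper for illustration: the scale is derived from the
    current values at runtime ("dynamic"), one scale per row ("per token").
    """
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in x:
        max_abs = max(abs(v) for v in row)
        # Map the largest magnitude in this token to 127; avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Round to nearest and clamp to the symmetric INT8 range [-127, 127].
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Recover approximate float values by multiplying back by each row's scale."""
    return [[v * s for v in row] for row, s in zip(q_rows, scales)]


if __name__ == "__main__":
    activations = [[0.5, -1.0, 0.25], [10.0, 2.0, -5.0]]
    q, scales = quantize_per_token_int8(activations)
    print("int8 values:", q)
    print("per-token scales:", scales)
    print("dequantized:", dequantize(q, scales))
```

Because each token gets its own scale, a token with large activations (the second row) does not reduce the precision of a token with small ones (the first row); the reconstruction error per element stays within half of that row's scale.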
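2. Run the quantized model with a vLLM offline inference script
Build the Docker container environment following https://vllm-ascend.readthedocs.io/en/latest/installation.html, then install vllm and vllm-ascend.
Test with the following inference script: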

```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# The first run will take about 3-5 mins (10 MB/s) to download models
# llm = LLM(model="/data/models/llama3-8b-instruct")
llm = LLM(model="/data/models/llama3-8b-instruct-W8A8-Dynamic-Per-Token-llmcompressor")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

3. Result: vllm-ascend currently does not support this quantized model.
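When diagnosing this kind of failure, it can help to check which quantization format the checkpoint declares. Hugging Face-style checkpoints record this in `config.json` under `quantization_config.quant_method`; checkpoints saved by llm-compressor are typically tagged `compressed-tensors`, though that should be confirmed against the actual file. The helper below (`get_quant_method`, a hypothetical name) is a minimal stdlib-only sketch that reads that field and returns `None` if it is absent.

```python
import json
from pathlib import Path
from typing import Optional


def get_quant_method(model_dir: str) -> Optional[str]:
    """Return quantization_config.quant_method from <model_dir>/config.json.

    Hypothetical helper: returns None when the checkpoint declares no
    quantization_config (e.g. an unquantized model).
    """
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    quant_cfg = config.get("quantization_config")
    if not isinstance(quant_cfg, dict):
        return None
    return quant_cfg.get("quant_method")


if __name__ == "__main__":
    # Path taken from the inference script in step 2.
    model_dir = "/data/models/llama3-8b-instruct-W8A8-Dynamic-Per-Token-llmcompressor"
    if (Path(model_dir) / "config.json").exists():
        print("quant_method:", get_quant_method(model_dir))
```

Comparing this value with the quantization methods that the installed vllm-ascend version lists as supported shows whether the failure is a format mismatch rather than a problem with the inference script itself.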

Alternatives

No response

Additional context

No response
