[Bug]: v0.12.0rc1 running ds-v3.2 intermittently hits an operator error and the service then goes down -- KvRmsNormRopeCache tiling split is incorrect #6243

@fool7367

Description

Your current environment

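v0.12.0rc1, Deepseek V3.2 deployed across two 800I A2 machines. After running for a while (more than a hundred requests had already completed successfully), the following error is reported: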

File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py", line 804, in forward
    self.exec_kv(kv_no_split, cos, sin, kv_cache, slot_mapping,
File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py", line 559, in exec_kv
    torch_npu.npu_kv_rmsnorm_rope_cache(
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/_ops.py", line 1243, in __call__
    return self._op(*args, **kwargs)
RuntimeError: npu_kv_rmsnorm_rope_cache:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:47 NPU function error: call aclnnKvRmsNormRopeCache failed, error code is 561002
[ERROR] 2026-01-15-06:35:20 (PID:108300, Device:3, RankID:-1) ERR00100 PTA call acl api failed.
EZ9999: Inner Error!
EZ9999[PID: 108300] 2026-01-15-06:35:20.425.815 (EZ9999): cos or sin shape is invalid.[FUNC:GetShapeAttrsInfo][FILE:kv_rms_norm_rope_cache_base_tiling.cpp][LINE:291]
TraceBack (most recent call last):
KvRmsNormRopeCache do tiling failed, ret is -1.
Check NnopbaseExecutorDoTiling(executor) failed
Check NnopbaseExecutorTilingAndUpdateBinInfo(executor) failed
Check NnopbaseExecutorMatchCache(executor) failed
Check NnopbaseRunForWorkspace(*executor, workspaceSize) failed
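Since the failure is intermittent and the tiling check only says "cos or sin shape is invalid", it may help to record the actual argument shapes at the moment the call fails. The following is a minimal sketch, not part of vllm-ascend or torch_npu: `describe` and `log_shapes_on_error` are hypothetical helper names, and the wrapper only assumes that tensor-like arguments expose `.shape` and `.dtype`.

```python
import functools
import logging

logger = logging.getLogger("kv_rope_debug")  # hypothetical logger name


def describe(arg):
    """Summarize a tensor-like argument by shape and dtype, else by type name."""
    shape = getattr(arg, "shape", None)
    if shape is None:
        return type(arg).__name__
    return f"shape={tuple(shape)} dtype={getattr(arg, 'dtype', '?')}"


def log_shapes_on_error(fn):
    """Wrap fn so that a RuntimeError is logged with argument shapes, then re-raised."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except RuntimeError:
            summary = [describe(a) for a in args]
            summary += [f"{k}: {describe(v)}" for k, v in kwargs.items()]
            name = getattr(fn, "__name__", repr(fn))
            logger.error("%s failed with args: %s", name, summary)
            raise  # keep the original error and traceback intact
    return wrapper
```

One possible use, as an assumption about the call site, is to wrap the operator inside `exec_kv`, e.g. `log_shapes_on_error(torch_npu.npu_kv_rmsnorm_rope_cache)(...)`, so that the shapes of `cos` and `sin` appear in the log next to the 561002 error the next time it occurs.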

🐛 Describe the bug

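In our internal documentation I found a wiki page about problems encountered while adapting pangu72B, which includes this same error, and made the following modification: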
