[Doc][0.7.3] Add performance tuning docs#878
wangxiyuan merged 11 commits into vllm-project:v0.7.3-dev
Signed-off-by: 申杉杉 <467638484@qq.com>
TLDR: https://docs.google.com/spreadsheets/d/1Z6KIp54n2NUhubMPImQrVtKnV5ZaUyu0FwP9NYtXdrA/edit?usp=sharing

Baseline
- Prepare
- Results: vLLM Ascend v0.7.3
- Results: vLLM Ascend v0.7.3 + MindIE Turbo

Optimized
- Prepare
- Results: vLLM Ascend v0.7.3 + Optimized Python
- Prepare torch_npu and torch
- Results: vLLM Ascend v0.7.3 + Optimized Python / Torch / Torch NPU
- Prepare
- Results: vLLM Ascend v0.7.3 + Optimized Python / Torch / Torch NPU + MindIE Turbo
- Prepare TCMalloc
- Results: vLLM Ascend v0.7.3 + Optimized Python / Torch / Torch NPU + MindIE Turbo + TCMalloc
- Prepare PYTORCH_NPU_ALLOC_CONF
- Results: vLLM Ascend v0.7.3 + Optimized Python / Torch / Torch NPU + MindIE Turbo + TCMalloc + PYTORCH_NPU_ALLOC_CONF="max_split_size_mb:250"
- Prepare PYTORCH_NPU_ALLOC_CONF
- Results: vLLM Ascend v0.7.3 + Optimized Python / Torch / Torch NPU + MindIE Turbo + TCMalloc + PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
- Prepare TASK_QUEUE_ENABLE
- Results: vLLM Ascend v0.7.3 + Optimized Python / Torch / Torch NPU + MindIE Turbo + TCMalloc + TASK_QUEUE_ENABLE=2
- Prepare CPU_AFFINITY_CONF
- Results: vLLM Ascend v0.7.3 + Optimized Python / Torch / Torch NPU + MindIE Turbo + TCMalloc + CPU_AFFINITY_CONF=1
- Prepare Ascend Scheduler
- Results: vLLM Ascend v0.7.3 + Optimized Python / Torch / Torch NPU + MindIE Turbo + TCMalloc + TASK_QUEUE_ENABLE=2 + Ascend Scheduler
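The environment-variable steps in the outline can be sketched as a small helper that builds the process environment for one experiment. This is only an illustration, not code from the PR: the helper name `build_tuning_env` and the short keys (`task_queue`, `cpu_affinity`, `alloc_split`, `alloc_expandable`) are hypothetical, while the variable names and values are the ones listed in the outline. The two `PYTORCH_NPU_ALLOC_CONF` values appear as separate experiments, so the sketch assumes they should not be combined.

```python
import os

# Variable names and values as listed in the outline; the short keys on the
# left are hypothetical labels introduced only for this sketch.
TUNING_VARS = {
    "task_queue": ("TASK_QUEUE_ENABLE", "2"),
    "cpu_affinity": ("CPU_AFFINITY_CONF", "1"),
    "alloc_split": ("PYTORCH_NPU_ALLOC_CONF", "max_split_size_mb:250"),
    "alloc_expandable": ("PYTORCH_NPU_ALLOC_CONF", "expandable_segments:True"),
}


def build_tuning_env(selected, base=None):
    """Return a copy of `base` (default: os.environ) with the selected settings applied."""
    env = dict(os.environ if base is None else base)
    applied = {}
    for key in selected:
        name, value = TUNING_VARS[key]
        # Assumption: two different values for the same variable are a mistake,
        # because the outline treats them as separate experiments.
        if name in applied and applied[name] != value:
            raise ValueError(f"conflicting values for {name}")
        applied[name] = value
    env.update(applied)
    return env


if __name__ == "__main__":
    env = build_tuning_env(["task_queue", "alloc_expandable"], base={})
    for name in sorted(env):
        print(f"{name}={env[name]}")
```

The resulting dictionary could be passed as `env=` to `subprocess.run` when launching a benchmark, so each experiment runs with exactly one combination of settings.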
wangxiyuan left a comment
The doc doesn't explain why each step is required or how it works. Please add more content. Thanks.
docs/source/developer_guide/performance/optimization_and_tuning.md
## Optimizations

### 1. Compiler Optimization
Is this step correct? AFAIK, users should install the package first, then run the models, and then recompile the package. The guide below just installs the already compiled package; is that enough?
The compilation is too complex and time-consuming for users, so after discussing with @Yikun, we decided to provide the twice-compiled packages to users directly.
@Yikun The benchmark results should be included in the doc as well, right?
@wangxiyuan Yes. Significant speedups should be included in the doc; for the others, I believe some notes are enough. The specific numbers differ between hardware series, so maybe we should hide them or just link to this issue? I think the percentages can be recorded in the doc as a reference.
I'm fine with this change. Let's merge first. Feel free to update the content later if needed.
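One way to record a hardware-independent percentage, as suggested in the discussion, is to report each optimized run relative to the baseline. The sketch below assumes a higher-is-better metric such as throughput; the function name `speedup_percent` is hypothetical, and the numbers in the usage line are placeholders, not measurements from this PR.

```python
def speedup_percent(baseline, optimized):
    """Relative change of `optimized` over `baseline`, in percent.

    Assumes a higher-is-better metric (for example, throughput).
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (optimized - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(speedup_percent(100.0, 125.0))
```

For a lower-is-better metric such as latency, the sign of the result would have the opposite meaning, so the doc should state which metric each percentage refers to.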

What this PR does / why we need it?
Add performance tuning docs.
Does this PR introduce any user-facing change?
How was this patch tested?