add correct arch info for hopper platform #149

yiakwy-xpu-ml-framework-team · 2025-09-05T09:41:51Z

Pull Request Template

Description

fix version info

update pyproject.toml

pyproject.toml should add building tool and torch info otherwise an exception will be thrown

Type of Change

Please check the relevant option(s):

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Performance optimization
CUDA kernel improvement
Code refactoring

Related Issues

Please link any related issues:

Fixes #(issue number)
Related to #(issue number)

Changes Made

Please describe the changes you made:

Code Changes

Modified Python API
Updated CUDA kernels
Changed build system
Updated dependencies

Documentation

Updated README
Updated API documentation
Added examples
Updated benchmarks

Testing

Please describe the tests you ran to verify your changes:

Existing tests pass: python -m pytest tests/ -v
Added new tests for new functionality
Benchmarks show no performance regression
Tested on multiple GPU architectures (if applicable)

Test Configuration

OS: ubuntu 22.04
Python: 3.12
PyTorch: 2.8
CUDA: 12.8
GPU: H800

Performance Impact

If this change affects performance, please provide benchmarks:

Before

# Benchmark results before your changes

After

# Benchmark results after your changes

Breaking Changes

If this PR introduces breaking changes, please describe:

What breaks
How users can migrate their code
Why the breaking change is necessary

Checklist

Please check all that apply:

My code follows the project's style guidelines
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published

CUDA-specific (if applicable)

CUDA kernels compile without warnings
Tested on SM 8.0+ architectures
Tested on SM 9.0+ architectures
Memory usage has been profiled
No memory leaks detected

Additional Notes

Any additional information that reviewers should know:

Facilitate tests in Hopper 95% accuracy, but failed to pass test:

test command

python benchmarks/forward_equivalence.py 2>&1 | tee accuracy.log

Screenshots (if applicable)

If your changes include visual elements or performance improvements, please add screenshots or graphs.

yiakwy-xpu-ml-framework-team · 2025-09-05T09:48:09Z

@LoserCheems could you have a look at it ? B.t.w accuracy test in hopper platform failed.

LoserCheems · 2025-09-05T13:32:40Z

I'm very sorry @yiakwy-xpu-ml-framework-team, this was my mistake. I wrongly wrote ; instead of ,.😵

LoserCheems · 2025-09-05T13:33:03Z

Let's merge

add correct arch info for hopper platform

8cbaac2

yiakwy-xpu-ml-framework-team mentioned this pull request Sep 5, 2025

[PERFORMANCE] Accerlerating KV-Cache during Inference #141

Closed

LoserCheems merged commit c432cc0 into flash-algo:main Sep 5, 2025

LoserCheems mentioned this pull request Sep 6, 2025

Installation Error #150

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

add correct arch info for hopper platform #149

add correct arch info for hopper platform #149

Uh oh!

yiakwy-xpu-ml-framework-team commented Sep 5, 2025 •

edited

Loading

yiakwy-xpu-ml-framework-team commented Sep 5, 2025

LoserCheems commented Sep 5, 2025

LoserCheems commented Sep 5, 2025

Labels

2 participants

add correct arch info for hopper platform #149

add correct arch info for hopper platform #149

Uh oh!

Conversation

yiakwy-xpu-ml-framework-team commented Sep 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Template

Description

fix version info

update pyproject.toml

Type of Change

Related Issues

Changes Made

Code Changes

Documentation

Testing

Test Configuration

Performance Impact

Before

After

Breaking Changes

Checklist

CUDA-specific (if applicable)

Additional Notes

test command

Screenshots (if applicable)

yiakwy-xpu-ml-framework-team commented Sep 5, 2025

LoserCheems commented Sep 5, 2025

LoserCheems commented Sep 5, 2025

Labels

2 participants

yiakwy-xpu-ml-framework-team commented Sep 5, 2025 •

edited

Loading