Skip to content

Conversation

@yiakwy-xpu-ml-framework-team
Copy link
Contributor

@yiakwy-xpu-ml-framework-team yiakwy-xpu-ml-framework-team commented Sep 5, 2025

Pull Request Template

Description

fix version info

截屏2025-09-05 17 28 45

update pyproject.toml

pyproject.toml should add building tool and torch info otherwise an exception will be thrown

Type of Change

Please check the relevant option(s):

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance optimization
  • CUDA kernel improvement
  • Code refactoring

Related Issues

Please link any related issues:

  • Fixes #(issue number)
  • Related to #(issue number)

Changes Made

Please describe the changes you made:

Code Changes

  • Modified Python API
  • Updated CUDA kernels
  • Changed build system
  • Updated dependencies

Documentation

  • Updated README
  • Updated API documentation
  • Added examples
  • Updated benchmarks

Testing

Please describe the tests you ran to verify your changes:

  • Existing tests pass: python -m pytest tests/ -v
  • Added new tests for new functionality
  • Benchmarks show no performance regression
  • Tested on multiple GPU architectures (if applicable)

Test Configuration

  • OS: ubuntu 22.04
  • Python: 3.12
  • PyTorch: 2.8
  • CUDA: 12.8
  • GPU: H800

Performance Impact

If this change affects performance, please provide benchmarks:

Before

# Benchmark results before your changes 

After

# Benchmark results after your changes 

Breaking Changes

If this PR introduces breaking changes, please describe:

  • What breaks
  • How users can migrate their code
  • Why the breaking change is necessary

Checklist

Please check all that apply:

  • My code follows the project's style guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published

CUDA-specific (if applicable)

  • CUDA kernels compile without warnings
  • Tested on SM 8.0+ architectures
  • Tested on SM 9.0+ architectures
  • Memory usage has been profiled
  • No memory leaks detected

Additional Notes

Any additional information that reviewers should know:

Facilitate tests in Hopper 95% accuracy, but failed to pass test:

final results

test command

python benchmarks/forward_equivalence.py 2>&1 | tee accuracy.log

Screenshots (if applicable)

If your changes include visual elements or performance improvements, please add screenshots or graphs.

facilitate 95% accuracy pass
@yiakwy-xpu-ml-framework-team
Copy link
Contributor Author

@LoserCheems could you have a look at it ? B.t.w accuracy test in hopper platform failed.

@LoserCheems
Copy link
Collaborator

I'm very sorry @yiakwy-xpu-ml-framework-team, this was my mistake. I wrongly wrote ; instead of ,.😵

@LoserCheems
Copy link
Collaborator

Let's merge

@LoserCheems LoserCheems merged commit c432cc0 into flash-algo:main Sep 5, 2025
@LoserCheems LoserCheems mentioned this pull request Sep 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants