@ricardoV94 ricardoV94 commented Apr 30, 2025

This PR makes two major changes:

  1. Adds a branch with intermediate accumulators when reducing over a contiguous dimension that is SIMD friendly.
  2. Allocates the output buffer with a layout aligned to the input dimensions, instead of always allocating in C-order.
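
For readers unfamiliar with the first change, here is a minimal sketch of the general idea, not the code this PR generates. It keeps several independent partial sums while walking a contiguous buffer, so the compiler can map them onto SIMD lanes instead of serializing every addition through a single scalar. The function name `sum_contiguous` and the count `N_ACC` are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical number of independent partial sums; a real
   implementation would pick this based on dtype and SIMD width. */
#define N_ACC 8

/* Sum n contiguous doubles using N_ACC independent accumulators. */
double sum_contiguous(const double *x, size_t n) {
    assert(x != NULL || n == 0);

    double acc[N_ACC] = {0.0};
    size_t i = 0;

    /* Main loop: each accumulator only depends on its own previous
       value, so the inner loop has no cross-iteration dependency
       and is a good candidate for auto-vectorization. */
    for (; i + N_ACC <= n; i += N_ACC) {
        for (size_t k = 0; k < N_ACC; k++) {
            acc[k] += x[i + k];
        }
    }

    /* Combine the partial sums into a single result. */
    double total = 0.0;
    for (size_t k = 0; k < N_ACC; k++) {
        total += acc[k];
    }

    /* Tail: leftover elements when n is not a multiple of N_ACC. */
    for (; i < n; i++) {
        total += x[i];
    }
    return total;
}
```

Note that splitting the sum this way changes the order of floating-point additions, so results can differ in the last bits from a strictly sequential loop.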

Performance is now comparable to or better than NumPy.


📚 Documentation preview 📚: https://pytensor--1385.org.readthedocs.build/en/1385/
