Model CI Report: ❌ 14 new failed tests from this PR
What does this PR do?
As per the title: this PR finally makes mamba layer caches first-class citizens and adds native support for them.
It supports the following layer combinations:
To do so, it adds the following two layer classes:
Everything integrates smoothly with the existing cache machinery in the case of hybrid attention/mamba architectures, i.e. functions such as
`get_seq_length` and `get_mask_sizes` (used notably for mask creation) will always look at attention layers.
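To illustrate the idea, here is a minimal, hypothetical sketch of why such queries must consult attention layers only: mamba layers keep a fixed-size recurrent state rather than a growing key/value history, so they have no meaningful sequence length. All class and method names below are illustrative, not the actual transformers API.

```python
class AttentionCacheLayer:
    """Grows along the sequence dimension as tokens are appended."""
    def __init__(self):
        self.seen_tokens = 0

    def update(self, num_new_tokens):
        self.seen_tokens += num_new_tokens

    def get_seq_length(self):
        return self.seen_tokens


class MambaCacheLayer:
    """Holds a fixed-size recurrent state; has no notion of sequence length."""
    def update(self, num_new_tokens):
        pass  # state is overwritten in place, nothing grows


class HybridCache:
    """Toy hybrid cache: delegates length queries to attention layers only."""
    def __init__(self, layers):
        self.layers = layers

    def update(self, num_new_tokens):
        for layer in self.layers:
            layer.update(num_new_tokens)

    def get_seq_length(self):
        # Only attention layers can answer this; mamba layers are skipped.
        for layer in self.layers:
            if isinstance(layer, AttentionCacheLayer):
                return layer.get_seq_length()
        return 0


cache = HybridCache([MambaCacheLayer(), AttentionCacheLayer()])
cache.update(5)
cache.update(3)
print(cache.get_seq_length())  # 8
```

The same dispatch pattern applies to mask-size queries: in a hybrid architecture, only the attention layers determine how large the attention mask must be.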