Record: GDN-Hybrid (Gated DeltaNet + Sliding Window Attention) - quantized_bpb 1.02046 #1562

Closed
joshkmartinez wants to merge 8 commits into openai:main from joshkmartinez:submission-gdn-hynrid

@joshkmartinez

Summary

3-Seed Results

| Seed | Steps | EMA BPB | Quantized BPB | XSA BPB | Artifact bytes |
|---|---|---|---|---|---|
| 42 | 1864 | 1.017723 | 1.026791 | 1.031731 | 15,313,984 |
| 1337 | 2239 | 1.007375 | 1.016586 | 1.020691 | 15,830,308 |
| 2024 | 2241 | 1.008736 | 1.017995 | 1.023138 | 15,820,201 |
| Mean | | 1.011278 | 1.02045733 | 1.025187 | 15,654,831.00 |
| Std (sample) | | | 0.00553017 | | |
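
The summary rows can be checked against the per-seed values. The sketch below recomputes the mean and sample standard deviation of the quantized BPB column using only the Python standard library; the variable names are illustrative and not taken from the submission's code.

```python
from statistics import mean, stdev

# Per-seed quantized BPB values copied from the results table.
quantized_bpb = {42: 1.026791, 1337: 1.016586, 2024: 1.017995}

values = list(quantized_bpb.values())
q_mean = mean(values)   # arithmetic mean over the three seeds
q_std = stdev(values)   # sample standard deviation (n - 1 denominator)

print(f"mean={q_mean:.8f} std={q_std:.8f}")
```

`statistics.stdev` uses the n - 1 denominator, which matches the "Std (sample)" label in the table.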

Architecture

This submission uses an SP1024-tokenized GDN-Hybrid backbone with the following high-level structure:

[GDN×5] → SWA → [GDN×5] → SWA_shared
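
As a rough illustration of the layout above, the following sketch expands the pattern into an ordered list of block labels. It only models the ordering; `build_layout` and the label strings are hypothetical, and reading `SWA_shared` as a block that reuses the first SWA block's parameters is an assumption rather than something stated in this description.

```python
def build_layout(gdn_per_group: int = 5) -> list[str]:
    """Expand [GDN x N] -> SWA -> [GDN x N] -> SWA_shared into block labels."""
    layout = []
    layout += ["GDN"] * gdn_per_group   # first group of Gated DeltaNet blocks
    layout.append("SWA")                # sliding-window attention block
    layout += ["GDN"] * gdn_per_group   # second group of Gated DeltaNet blocks
    layout.append("SWA_shared")         # assumed to reuse the earlier SWA parameters
    return layout


layout = build_layout()
print(len(layout), layout)
```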

Key components:

  1. SP1024 tokenizer
  2. Gated DeltaNet hybrid backbone
  3. Sliding-window attention side path
  4. MuonEq-R + AdamW training mix
  5. EMA = 0.997
  6. Late QAT threshold = 0.15
  7. GPTQ int6 + zstd-22 packaging
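
For the EMA component, a minimal sketch of a standard exponential moving average over weights with decay 0.997 is shown below. It uses plain Python lists instead of tensors, and `ema_update` is a hypothetical helper; the submission's actual update code is not shown in this description.

```python
EMA_DECAY = 0.997  # decay value listed in the key components


def ema_update(ema_weights, weights, decay=EMA_DECAY):
    """Return decay * ema + (1 - decay) * current, element-wise."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


# Toy example: the live weights stay at 1.0 while the EMA copy catches up.
ema = [0.0, 1.0]
for _ in range(3):
    ema = ema_update(ema, [1.0, 1.0])

print(ema)
```

With a decay this close to 1, the averaged copy moves slowly: after three updates the first entry has covered under 1% of the distance to the live value.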

Credits
