
Non-record: GDN-Hybrid (Gated DeltaNet + SWA) — val_bpb 1.209735 #1553

Open

Abhishek8108 wants to merge 1 commit into openai:main from Abhishek8108:submission/gdn-hybrid-architecture

Conversation

@Abhishek8108

Summary

Non-record submission introducing the GDN-Hybrid architecture: a Griffin-style hybrid that replaces the transformer backbone with Gated DeltaNet (delta-rule linear recurrence) + Sliding Window Attention.

Corrected val_bpb: 1.209735 (3-artifact mean, stride=512)
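For context, a stride-512 rescore typically slides a fixed-length window over the token stream and scores only the last `stride` positions of each window, so every scored token keeps long left context. The sketch below assumes that standard scheme; `seq_len`, `nll_fn`, and the per-token byte counts are assumptions, not the PR's actual rescore script.

```python
import math

# Hedged sketch of a stride-512 sliding-window BPB rescore; names and the
# model interface (nll_fn) are illustrative, not the PR's rescore code.
def strided_bpb(token_ids, token_bytes, nll_fn, seq_len=1024, stride=512):
    """nll_fn(window) returns per-position NLL in nats for one window.
    Windows advance by `stride`; only the last `stride` positions of each
    window are scored, so scored tokens keep >= seq_len - stride context."""
    total_nll, total_bytes = 0.0, 0
    for start in range(0, len(token_ids), stride):
        end = start + stride
        window = token_ids[max(0, end - seq_len):end]
        scored = min(stride, len(token_ids) - start)  # positions scored this step
        total_nll += sum(nll_fn(window)[-scored:])
        total_bytes += sum(token_bytes[start:start + scored])
    return total_nll / (math.log(2) * total_bytes)    # nats -> bits, per byte
```

The reported number would then be the plain mean of this score over the three saved artifacts.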

Layout: [GDN×5] → [SWA] → [GDN×5] → [SWA_shared] — 33.86M params, SP1024, int6 GPTQ + zstd-22. No TTT. Fixed predictor.
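The block pattern above can be sketched as a simple layer list; the block names and builder below are illustrative assumptions, not the submission's actual module code.

```python
# Illustrative layer-pattern builder for the GDN-Hybrid layout described
# above; the string tags are assumptions, not the PR's actual class names.
GDN, SWA, SWA_SHARED = "gdn", "swa", "swa_shared"

def build_layout():
    # [GDN x5] -> [SWA] -> [GDN x5] -> [SWA_shared]: 12 blocks total, with
    # the final attention block reusing (sharing) the first SWA's weights.
    return [GDN] * 5 + [SWA] + [GDN] * 5 + [SWA_SHARED]
```

Under this reading, the SWA_shared slot points at the same parameters as the first SWA block, which keeps the attention parameter count to a single layer.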

BPB Correction (from closed PR #1545)

The original submission was closed after a double-counting bug was found in build_sentencepiece_luts: the leading-space byte was included in base_bytes AND added again conditionally in the eval loop, inflating byte_count and producing an artificially low BPB.

Fix: remove +1 from base_bytes to match the canonical train_gpt.py. The training itself was unaffected. The three saved artifacts were rescored with the corrected formula (EVAL_STRIDE=512) to produce the numbers above. Full results in rescore_results.tsv.
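A minimal reconstruction of the double count, using hypothetical names (pieces, make_base_bytes, eval_byte_count) rather than the actual build_sentencepiece_luts / train_gpt.py code:

```python
# Illustrative reconstruction of the double-count bug; all names and values
# here are assumptions, not the actual build_sentencepiece_luts code.
pieces = ["▁the", "▁cat", "s"]  # SentencePiece-style tokens; "▁" marks a leading space

def make_base_bytes(buggy: bool):
    lut = []
    for p in pieces:
        n = len(p.replace("▁", "").encode("utf-8"))  # bytes of the piece text
        if buggy and p.startswith("▁"):
            n += 1   # BUG: the leading-space byte pre-counted in base_bytes...
        lut.append(n)
    return lut

def eval_byte_count(lut):
    total = 0
    for p, base in zip(pieces, lut):
        # ...and the eval loop conditionally adds the same space byte again,
        # so the buggy LUT counts it twice:
        total += base + (1 if p.startswith("▁") else 0)
    return total

nll_bits = 30.0                                                 # pretend summed loss in bits
buggy_bpb = nll_bits / eval_byte_count(make_base_bytes(True))   # inflated byte_count
fixed_bpb = nll_bits / eval_byte_count(make_base_bytes(False))  # canonical count
```

Removing the +1 from base_bytes leaves the eval loop's conditional add as the single place the space byte is counted, which is why the corrected BPB is higher than the originally reported value.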

Compliance

  • Fixed predictor, no eval-time adaptation
  • TTT_ENABLED=0
  • No SLOT, RLS, or n-gram mixer at eval
  • GPTQ calibration on model-generated sequences only (no val data)
  • All artifacts < 16MB ✓
  • Training: 590s on 8×H100 SXM per seed ✓
…09735 Moves GDN-Hybrid to track_non_record_16mb with corrected BPB calculation. Fixes double-count bug in build_sentencepiece_luts (leading-space +1 was counted in base_bytes and again in the eval loop). Corrected 3-artifact mean: 1.209735 BPB (stride=512 rescore of saved artifacts). Refs PR openai#1545.
