Record: SP8192 + Improved Parallel Residuals + Muon 0.97 + TTT 5ep + N-gram Tilt + Hessian SDClip (val_bpb 1.07730) #1557

Open
ndokutovich wants to merge 1 commit into openai:main from ndokutovich:submission-s8-ha

Record: SP8192 + Improved Parallel Residuals + Score-First TTT + Causal N-gram Tilt + Hessian SDClip

val_bpb = 1.07730 (3-seed mean, std 0.00040) | ~15.97 MB | 8xH100 SXM

3-Seed Results

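| Seed | Sliding BPB | TTT BPB | Artifact (bytes) |
|------|-------------|---------|------------------|
| 42   | 1.07880     | 1.07684 | 15,965,495       |
| 314  | 1.07959     | 1.07748 | 15,965,495       |
| 999  | 1.07963     | 1.07757 | 15,965,495       |
| Mean | 1.07934     | 1.07730 |                  |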

Merged SOTA (PR #1493): 1.0810. Delta: -0.00370 BPB (1.07730 - 1.0810).
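The headline numbers can be re-derived from the per-seed values. The snippet below is a small sanity check, not part of the submission. It assumes the reported std is the sample standard deviation (n - 1 denominator), which is the convention that reproduces 0.00040, and it treats the delta as the plain difference of two val_bpb values. The variable names are illustrative only.

```python
from statistics import mean, stdev

# Per-seed values copied from the 3-seed results table.
sliding_bpb = {42: 1.07880, 314: 1.07959, 999: 1.07963}
ttt_bpb = {42: 1.07684, 314: 1.07748, 999: 1.07757}
merged_sota = 1.0810  # PR #1493, as quoted in this description

sliding_mean = mean(sliding_bpb.values())
ttt_mean = mean(ttt_bpb.values())
ttt_std = stdev(ttt_bpb.values())  # sample std (n - 1); an assumption
delta = ttt_mean - merged_sota     # difference in val_bpb

print(f"sliding mean: {sliding_mean:.5f}")  # 1.07934
print(f"TTT mean:     {ttt_mean:.5f}")      # 1.07730
print(f"TTT std:      {ttt_std:.5f}")       # 0.00040
print(f"delta:        {delta:+.5f}")        # -0.00370
```

The population standard deviation (`statistics.pstdev`) of the same three values is about 0.00033, so it does not match the reported figure.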

Techniques

  • Architecture: SP8192, 11L x 512d, 8H/4KV, MLP 4x, improved parallel residuals (L7+)
  • Training: Muon 0.97 (row-normalized), Matrix LR 0.03, EMA 0.997, 3-layer depth recurrence (L3-5)
  • Quantization: GPTQ int6 (attn+MLP) + int8 (embeddings) + Hessian-Aware SDClip (lambda=0.175)
  • Eval: Score-first TTT (SGD, 5 epochs, lr=0.005) + Causal n-gram tilt (beta=2.0, agree=0.1)
  • Compression: Brotli quality=11
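The ordering behind "score-first" TTT can be illustrated with a toy model. This sketch assumes the term means that each validation chunk is scored with the current weights before any gradient step is taken on it, so no symbol is ever scored by a model that has already trained on it. Only `epochs=5` and `lr=0.005` come from the list above. The single-logit Bernoulli "model", the chunking, and the name `score_first_ttt` are hypothetical stand-ins, not the submission's code.

```python
import math


def sigmoid(w):
    return 1.0 / (1.0 + math.exp(-w))


def score_first_ttt(stream, chunk_size, epochs=5, lr=0.005):
    """Toy score-first test-time training on a stream of 0/1 symbols.

    The model is one logit w with p(symbol == 1) = sigmoid(w).
    Returns (average bits per symbol, final w).
    """
    w = 0.0
    total_bits = 0.0
    for start in range(0, len(stream), chunk_size):
        chunk = stream[start:start + chunk_size]

        # 1) Score the chunk with the weights as they are right now.
        p1 = sigmoid(w)
        for x in chunk:
            total_bits += -math.log2(p1 if x == 1 else 1.0 - p1)

        # 2) Only afterwards adapt on the chunk that was just scored.
        for _ in range(epochs):
            p1 = sigmoid(w)
            grad = sum(p1 - x for x in chunk)  # d(NLL)/dw, summed over chunk
            w -= lr * grad

    return total_bits / len(stream), w


# A stream that is 90% ones; the model starts at p = 0.5.
stream = ([1] * 9 + [0]) * 32
first_chunk_bits, _ = score_first_ttt(stream[:32], chunk_size=32)
full_bits, final_w = score_first_ttt(stream, chunk_size=32)
print(first_chunk_bits)  # 1.0: adaptation cannot help the chunk it trains on
print(full_bits < 1.0, final_w > 0.0)
```

The first chunk always costs exactly 1 bit per symbol here because it is scored before any update. Gains appear only on later chunks, which is the property that keeps this evaluation order causal.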

Compliance

Attribution

Compute

Funded by an OpenAI Advanced Competitor grant ($500 RunPod credit). Hardware: 8xH100 SXM; ~3 runs to cover the 3 seeds.
