Record: SP8192 + VarLen Attention + LoRA TTT + Fused MLP — val_bpb 1.0777 (3-seed mean) #1540
Conversation
I may be missing something, but there looks to be one remaining Issue #1017 review risk here. In the VarLen + doc-TTT path, `_build_ttt_global_batches` sorts validation documents by length before batching via `global_doc_entries = sorted(doc_entries, key=lambda x: x[1][1])`. My reading of Issue #1017 is that validation order should be preserved, and this seems to reorder the validation set before scoring. Even if the LoRA state is reset per batch and the score may be invariant in practice, I think this is still worth clarifying, because under a strict reading it is no longer the default validation order. If I am missing something about why this remains guideline-clean, a short note in the PR would help reviewers a lot.
Good question! The sort is just for batching efficiency, so documents of similar length get grouped together and there's less padding waste. It doesn't affect any document's score, though. The LoRA adapter is reset to zero before every batch (`reusable_lora.reset()`), so no state is carried between batches. Each document's BPB is computed completely independently of what other docs happen to share its batch or what order the batches run in. The final val_bpb is a global sum of per-token losses divided by total bytes, which is order-invariant. PR #1530 (samacqua) uses the same length-sorting approach for doc-independent LoRA TTT. I can add a clarifying comment in the code if that would help reviewers.
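The order-invariance claim above can be checked with a minimal sketch (hypothetical stand-ins, not the PR's code): with a per-batch adapter reset, each document's loss is a function of that document alone, so length-sorting before batching cannot change any per-doc score or the global BPB.

```python
import math

def doc_loss(doc):
    # Stand-in for per-token loss under a freshly reset (zero) adapter:
    # depends only on the document itself, never on batchmates.
    return sum((b % 7) * 0.1 for b in doc)

def val_bpb(docs, batch_size):
    total_loss, total_bytes = 0.0, 0
    for i in range(0, len(docs), batch_size):
        # reusable_lora.reset() analogue: adapter zeroed before every batch,
        # so no state leaks between batches.
        for doc in docs[i:i + batch_size]:
            total_loss += doc_loss(doc)
            total_bytes += len(doc)
    return total_loss / (total_bytes * math.log(2))  # global sum / total bytes

docs = [bytes(range(n)) for n in (5, 40, 12, 33, 7, 21)]
length_sorted = sorted(docs, key=len)  # the length sort in question
# Same per-doc losses, same totals: sorting and rebatching change nothing.
assert abs(val_bpb(docs, 2) - val_bpb(length_sorted, 3)) < 1e-12
```

Any ordering or batch size yields the same aggregate, since the score is a ratio of two sums over the same multiset of documents.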
…RA TTT doc-independent legal; BPB bug alert

- PR openai#1541 (bigbag, 1.07785): Improved Parallel Residuals cross-lane + Muon 0.97 — open, hash embed flag pending
- PR openai#1540 (aryanbhosale, 1.0777): VarLen Attention + Doc-Independent LoRA TTT rank-96 (score-first, resets per batch) — appears legal
- PR openai#1539 confirmed illegal (Pre-Quant AdamW TTT, same ruling as openai#771)
- PR openai#1545 BPB double-counting bug: claimed score ~1.028, actual ~1.18
- PR openai#758 effectively dead: TTT contradiction + unnormalized n-gram both flagged
- Session 10 lessons: MATRIX_LR=0.03 pairs with Muon 0.97; doc-independent LoRA TTT is adoptable
- No merged SOTA change (still 1.0810); target remains ≤1.0760

https://claude.ai/code/session_01LgqwEDyFnyHsBbyJiSFUjK
Community Review — SP8192 + VarLen Attention + LoRA TTT + Fused MLP

Thanks @aryanbhosale — this is a substantial novel piece of work. I want to flag one import-path issue I hit during the smoke test, then a deeper compliance read on the LoRA TTT path. What I found in the code (head SHA
Import issue. The top of the decoded payload does … This is a CPU-stub limitation, not a PR defect — FA3 is available on H100s, where this script is intended to run. But it does mean no CPU import/HP/forward verification from my side; the structural review below is static-code only.

Doc-independent TTT & LoRA compliance — the actual question. Per-batch independent LoRA slots are an interesting answer to the Issue #1336 / #1017 causal-dependence question: if the LoRA adapter that scores token …

Three things I wasn't able to fully verify from a single static read:
Questions:
Verdict: LOOKS INTERESTING, NEEDS DEEPER AUDIT — I'm not ready to call this clean or to flag it without the answers above, and the doc-independent framing is a plausible path to a legal TTT variant that the community SLOT cluster could borrow from.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: HOLD pending author clarification on the per-slot / per-doc scoring-before-adaptation ordering in …

Reviewed by @MatoTeziTanka — The Agora. CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL due to …
Record: SP8192 + VarLen Attention + Doc-Independent LoRA TTT + Fused MLP
val_bpb = 1.0777 (3-seed mean, std 0.0003) | ~15.99 MB | 8×H100 SXM
3-Seed Results
Merged SOTA (PR #1493): 1.0810. Delta: −0.0082 nats. Clears 0.005 by 0.0032.
Novel: Fused Triton MLP via importlib Wrapper
Integrates PR #1523's Triton TMA fused MLP into PR #1536's VarLen + LoRA TTT stack using an importlib-based code loader that solves Triton's
`inspect.getsourcelines()` requirement for compressed wrappers. Writes the decompressed code to a temp file, then loads it via `importlib.util.spec_from_file_location` as `__main__`.

Full Stack
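The importlib-based loader described above can be sketched as follows. This is a hedged illustration, not the PR's code: the helper name and payload are invented, but `zlib`, `tempfile`, `importlib.util.spec_from_file_location`, and `inspect.getsourcelines` are real stdlib APIs. Triton's JIT calls `inspect.getsourcelines()` on kernel functions, which fails for code `exec()`'d from a string; writing the decompressed source to a real file and importing it gives `inspect` a file to read.

```python
import importlib.util
import inspect
import os
import sys
import tempfile
import zlib

def load_decompressed_module(compressed: bytes, module_name: str):
    # The PR reportedly registers the module as "__main__"; here we use a
    # distinct name so the demo doesn't shadow the running script.
    source = zlib.decompress(compressed).decode()
    # Persist to a real .py file so inspect/linecache can locate the source.
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as f:
        f.write(source)
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register before executing
    spec.loader.exec_module(module)
    return module

payload = zlib.compress(b"def kernel():\n    return 42\n")
mod = load_decompressed_module(payload, "fused_mlp_payload")
assert mod.kernel() == 42
assert inspect.getsourcelines(mod.kernel)[1] == 1  # source is now inspectable
```

Because the function's `co_filename` points at the temp file, source-introspecting consumers like Triton's JIT see an ordinary file-backed module rather than an uninspectable string.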
Compliance (Track B)
Score-first LoRA TTT. No SLOT, no hash embed, no pre-quant TTT, no n-gram. All conditions from Issue #1017 satisfied.
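The score-first, doc-independent ordering can be illustrated with a toy sketch (invented numbers and update rule, not the PR's model): each document gets a freshly reset adapter slot, and every token is scored with the slot state *before* the slot adapts on that token, so a token's score depends only on earlier tokens of the same document, never on other documents or batch order.

```python
import math

def score_document(tokens):
    slot = 0.0                         # per-doc LoRA slot, reset to zero
    loss = 0.0
    for tok in tokens:
        loss += abs(tok - slot)        # score with the pre-update slot state
        slot = 0.9 * slot + 0.1 * tok  # then adapt on the token just scored
    return loss

def val_bpb(docs):
    # Aggregate exactly as described: global loss sum over total bytes.
    total = sum(score_document(d) for d in docs)
    n_bytes = sum(len(d) for d in docs)
    return total / (n_bytes * math.log(2))

docs = [[1, 2, 3], [5, 5], [2, 4, 6, 8]]
# Doc independence: shuffling document order cannot change the score.
assert abs(val_bpb(docs) - val_bpb(list(reversed(docs)))) < 1e-12
```

The key property is that adaptation happens strictly after scoring within each step, so no token is ever evaluated by weights that were trained on it.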
Doc sorting note:
`_build_ttt_global_batches` sorts documents by length for batching efficiency only. The LoRA adapter is reset to zero before every batch (`reusable_lora.reset()`), so there is no inter-batch state. Each document's BPB is computed independently of batch composition and processing order. The final val_bpb is a global sum of per-token losses divided by total bytes, which is order-invariant. Same approach as PR #1530.

Credits
PR #1536 @dexhunter, PR #1523 @abaybektursun, PR #1530 @samacqua, PR #1394 @clarkkev, PR #1493 @bigbag