Non-Record: JEPA-NTP Auxiliary Losses (Negative Result) by sidhanth97 · Pull Request #1556 · openai/parameter-golf

sidhanth97 · 2026-04-12T01:11:13Z

This PR adds a non-record submission under:

records/track_non_record_16mb/2026-04-11_JEPA_NTP_Auxiliary_Losses_Negative_Result

Negative result: JEPA-style auxiliary losses (spectral variance floor + cosine-MSE latent prediction from LeWM/LeWorldModel) do not improve next-token prediction in the parameter golf regime.

Results (1 epoch, 2xRTX PRO 6000 Blackwell, torch.compile enabled)

Experiment	val_bpb (post-quant, delta vs. baseline)	Throughput	Int8+zlib size
Baseline	1.4326	2,119K tok/s	9.90 MB
JEPA exp4 (spectral + cosine-MSE, layers 2-5)	1.4352 (+0.003)	1,703K tok/s	9.90 MB
MQA + Value Embeds	1.4439 (+0.011)	2,201K tok/s	9.68 MB
MQA + VE + 3x MLP	1.4364 (+0.004)	1,936K tok/s	12.12 MB
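For readers unfamiliar with the last column, here is a minimal sketch of how an int8+zlib artifact size could be measured. It assumes symmetric per-tensor quantization, zlib level 9, and MB meaning 10**6 bytes; the function names and the toy tensors are hypothetical, and the scripts in this submission may quantize and pack weights differently.

```python
import zlib

import numpy as np


def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization (assumed scheme, not the PR's)."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 codes back to float32 values."""
    return q.astype(np.float32) * scale


def compressed_size_mb(tensors: dict[str, np.ndarray], level: int = 9) -> float:
    """Size in MB (10**6 bytes) after int8 quantization and zlib compression."""
    payload = bytearray()
    for name in sorted(tensors):  # fixed order keeps the size reproducible
        q, scale = quantize_int8(tensors[name])
        payload += np.float32(scale).tobytes()  # one float32 scale per tensor
        payload += q.tobytes()
    return len(zlib.compress(bytes(payload), level)) / 1e6


# Toy stand-in for a model state dict (random weights, illustration only).
rng = np.random.default_rng(0)
toy = {
    "embed": rng.normal(size=(1024, 64)).astype(np.float32),
    "mlp": rng.normal(size=(64, 256)).astype(np.float32),
}
size_mb = compressed_size_mb(toy)

# Round-trip error of symmetric rounding is bounded by half a quantization step.
q, scale = quantize_int8(toy["mlp"])
max_err = float(np.max(np.abs(dequantize_int8(q, scale) - toy["mlp"])))
print(f"compressed size: {size_mb:.4f} MB, max round-trip error: {max_err:.5f}")
```

Under this scheme the size depends on parameter count and on how compressible the quantized values are, which is consistent with the 3x MLP variant being the only row whose size grows noticeably.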

Key findings

Spectral floor loss mechanically prevents dimensional collapse (effective rank rises from 424 to 445 out of 512), but this does not translate into better language modeling at 17M params
Cosine-MSE predictor was essentially inert (loss values of 0.001-0.003, negligible gradient contribution)
An initial apparent improvement was traced to a torch.compile confound: the compiled JEPA run had been compared against an uncompiled baseline
MQA at this scale is harmful (MQA + Value Embeds: +0.011 val_bpb vs. baseline) -- 4 KV heads are too few to spare one
Includes full experimental framework with WandB diagnostics for reproducibility
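To make the first finding concrete, here is a rough sketch of an effective-rank diagnostic and a spectral floor penalty. It assumes the entropy-based definition of effective rank (exponential of the Shannon entropy of the normalized singular values) and a hinge penalty on per-direction standard deviations; both are assumptions, the function names are hypothetical, and the code in losses/ and metrics/ may define these quantities differently. NumPy is used here only to compute values; a training loss would need a differentiable framework.

```python
import numpy as np


def singular_spectrum(latents: np.ndarray) -> np.ndarray:
    """Singular values of the mean-centered (n_samples, dim) latent matrix."""
    centered = latents - latents.mean(axis=0, keepdims=True)
    return np.linalg.svd(centered, compute_uv=False)


def effective_rank(latents: np.ndarray, eps: float = 1e-12) -> float:
    """exp(entropy) of the normalized singular values (assumed definition)."""
    s = singular_spectrum(latents)
    p = s / (s.sum() + eps)
    entropy = -np.sum(p * np.log(p + eps))
    return float(np.exp(entropy))


def spectral_floor_penalty(latents: np.ndarray, floor: float = 1.0) -> float:
    """Mean hinge penalty on principal-direction std devs below `floor`."""
    n = latents.shape[0]
    std = singular_spectrum(latents) / np.sqrt(max(n - 1, 1))
    return float(np.mean(np.maximum(floor - std, 0.0)))


rng = np.random.default_rng(0)
# Latents that use all 64 dimensions.
full = rng.normal(size=(2048, 64))
# Collapsed latents: 64 dimensions, but only 8 independent directions.
collapsed = rng.normal(size=(2048, 8)) @ rng.normal(size=(8, 64))

rank_full = effective_rank(full)
rank_collapsed = effective_rank(collapsed)
pen_full = spectral_floor_penalty(full)
pen_collapsed = spectral_floor_penalty(collapsed)
print(f"effective rank: full={rank_full:.1f}, collapsed={rank_collapsed:.1f}")
print(f"floor penalty:  full={pen_full:.3f}, collapsed={pen_collapsed:.3f}")
```

In this toy setup the collapsed latents score an effective rank of at most 8 and a much larger penalty, which is the behavior a floor loss is meant to discourage. The finding above is that raising effective rank this way did not lower val_bpb.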

Submission contents

README.md -- detailed results, methodology, and analysis
submission.json -- leaderboard metadata
train_jepa_ntp.py -- JEPA training script with spectral + cosine-MSE losses
train_modded.py -- MQA + Value Embeddings training script
config.py -- experiment configurations
losses/ -- spectral variance floor, cosine-MSE loss implementations
metrics/ -- effective rank, singular spectrum, latent curvature diagnostics
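One way to guard against the torch.compile confound mentioned in the key findings is to check that two experiment configurations differ only in the fields under study. The sketch below is illustrative: `ExperimentConfig`, its fields, and the loss weights are hypothetical placeholders and are not taken from the submission's config.py (only the layer range 2-5 comes from the results table).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExperimentConfig:
    # Hypothetical fields for illustration; the real config.py may differ.
    name: str
    use_torch_compile: bool = True
    jepa_layers: tuple[int, ...] = ()
    spectral_weight: float = 0.0
    cosine_mse_weight: float = 0.0


def differing_fields(a: ExperimentConfig, b: ExperimentConfig,
                     ignore: tuple[str, ...] = ("name",)) -> set[str]:
    """Names of fields whose values differ between two configs."""
    return {
        f.name
        for f in fields(a)
        if f.name not in ignore and getattr(a, f.name) != getattr(b, f.name)
    }


def assert_controlled(baseline: ExperimentConfig, variant: ExperimentConfig,
                      allowed: set[str]) -> None:
    """Raise if the variant differs from the baseline outside `allowed`."""
    unexpected = differing_fields(baseline, variant) - allowed
    if unexpected:
        raise ValueError(f"uncontrolled differences: {sorted(unexpected)}")


allowed = {"jepa_layers", "spectral_weight", "cosine_mse_weight"}
baseline = ExperimentConfig(name="baseline")
# The weights 0.1 are placeholders, not values from the PR.
jepa = ExperimentConfig(name="jepa_exp4", jepa_layers=(2, 3, 4, 5),
                        spectral_weight=0.1, cosine_mse_weight=0.1)

assert_controlled(baseline, jepa, allowed)  # both compiled: passes
controlled_ok = True

# Comparing a compiled variant against an uncompiled baseline is flagged.
eager_baseline = ExperimentConfig(name="baseline_eager", use_torch_compile=False)
confound_detected = False
try:
    assert_controlled(eager_baseline, jepa, allowed)
except ValueError as err:
    confound_detected = True
    print(err)
```

Running such a check before comparing val_bpb keeps a compilation mismatch from being read as a modeling gain.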

JEPA-style auxiliary losses (spectral variance floor + cosine-MSE latent prediction from LeWM) do not improve next-token prediction in the parameter golf regime. Baseline val_bpb 1.4326 beats all variants. Includes full experimental framework with losses, metrics, and WandB diagnostics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sidhanth97 changed the title from "Add non-record 16MB submission: JEPA-NTP Auxiliary Losses (Negative Result)" to "Non-Record: JEPA-NTP Auxiliary Losses (Negative Result)" on Apr 12, 2026

sidhanth97 wants to merge 1 commit into openai:main from sidhanth97:submission/jepa-ntp-experiments
