This project explores the ability of transformer models to learn core linear algebra operations directly from tokenized matrix inputs. Using a nanoGPT-style architecture and the P1000 floating point tokenization scheme, we train transformers on three categories of tasks: classification, regression, and sequence-to-sequence prediction.
Transformers are trained to perform fundamental linear algebra operations on square matrices with entries uniformly sampled from [-1, 1]. The models achieve perfect accuracy on classification tasks, near-perfect performance on scalable regression tasks, and high token-level accuracy on sequence-to-sequence RREF prediction. For detailed analysis and experimental methodology, see the report.
Transformers achieve 100% accuracy on both invertibility and symmetry detection across all tested matrix dimensions.
| Task | 6×6 | 8×8 | 11×11 |
|---|---|---|---|
| Invertibility | 100% | 100% | 100% |
| Symmetry | 100% | 100% | 100% |
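The classification data described above (square matrices with entries uniformly sampled from [-1, 1], labeled for invertibility and symmetry) could be generated along these lines. This is an illustrative sketch, not the repo's `create_dataset.py`; the label rules (numerical rank for invertibility, `allclose` for symmetry) are assumptions:

```python
import numpy as np

def sample_matrix(d, rng):
    # Entries drawn uniformly from [-1, 1], as in the paper's setup
    return rng.uniform(-1.0, 1.0, size=(d, d))

rng = np.random.default_rng(0)
A = sample_matrix(6, rng)

# A random continuous matrix is almost surely full-rank and asymmetric
is_symmetric = int(np.allclose(A, A.T))
is_invertible = int(np.linalg.matrix_rank(A) == A.shape[0])
```

Symmetric positives would need explicit construction (e.g. `(A + A.T) / 2`), since a uniformly sampled matrix is symmetric with probability zero.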
Regression performance (determinant, Frobenius norm, trace) is measured with a relative error ≤ 10% threshold.
| Property | 2×2 | 3×3 | 4×4 | 6×6 | 7×7 | 9×9 | 11×11 |
|---|---|---|---|---|---|---|---|
| Determinant | 96.3% | 95.5% | 86.6% | — | — | — | — |
| Frobenius Norm | — | — | — | 94.6% | 95.7% | 95.4% | 96.0% |
| Trace | — | — | — | 95.1% | 93.9% | 95.4% | 92.7% |
A sequence-to-sequence transformer predicts the row-reduced echelon form (RREF) with high token-level accuracy.
| Matrix Size | Token Accuracy |
|---|---|
| 4×4 | 96.72% |
| 5×5 | 94.60% |
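RREF targets for the seq2seq task can be computed with standard Gauss-Jordan elimination. The helper below is a minimal sketch (with partial pivoting for numerical stability), not necessarily how the repo generates its targets:

```python
import numpy as np

def rref(A, tol=1e-10):
    """Reduce A to reduced row echelon form via Gauss-Jordan elimination."""
    R = A.astype(float).copy()
    rows, cols = R.shape
    r = 0
    for c in range(cols):
        if r == rows:
            break
        # Partial pivoting: pick the largest entry in column c below row r
        pivot = r + np.argmax(np.abs(R[r:, c]))
        if abs(R[pivot, c]) < tol:
            continue  # no pivot in this column
        R[[r, pivot]] = R[[pivot, r]]
        R[r] /= R[r, c]
        for i in range(rows):
            if i != r:
                R[i] -= R[i, c] * R[r]
        r += 1
    return R

A = np.array([[1.0, 2.0], [3.0, 4.0]])
# Invertible matrix -> RREF is the identity
```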
- Base: nanoGPT
- Tokenization: P1000 scheme
- Input: Flattened matrix → `[+, 314, E-2, ...]` tokens
- Output Head:
  - Classification: `Linear(embd → 2)`
  - Regression: `Linear(embd → 1)`
  - Seq2Seq (RREF): Full transformer decoder
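A P1000-style encoding splits each float into a sign token, a three-digit mantissa token, and a base-10 exponent token. The sketch below reproduces the `[+, 314, E-2]` example (3.14 = 314 × 10⁻²); `p1000_tokenize` is a hypothetical helper, not the repo's implementation:

```python
def p1000_tokenize(x: float):
    """Encode a float as [sign, mantissa, exponent] tokens, P1000-style."""
    if x == 0.0:
        return ["+", "0", "E0"]
    sign = "+" if x > 0 else "-"
    # Scientific notation with 2 decimals, e.g. 3.14 -> "3.14e+00"
    m_str, e_str = f"{abs(x):.2e}".split("e")
    mantissa = round(float(m_str) * 100)   # three-digit mantissa in [100, 999]
    exponent = int(e_str) - 2              # 3.14 = 314 * 10^-2
    return [sign, str(mantissa), f"E{exponent}"]
```

A d×d matrix is then flattened row-major and each entry is tokenized this way, giving 3·d² input tokens.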
| Task | Layers | Heads | Emb Dim | Params |
|---|---|---|---|---|
| Classification | 1 | 2 | 16 | ~17.8K |
| Regression | 6 | 8 | 256 | ~4.96M |
| Gaussian Elimination | 6 | 8 | 432 | ~14.2M |
- Curriculum Learning: Start small (2×2), scale up
- Normalization: Targets normalized by train set mean/std
- Loss: MSE for regression, Cross-Entropy for classification
- Accuracy Metric: Relative error ≤ 10% counts as correct
- Hardware: Trained on 2× NVIDIA T4 GPUs
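The relative-error accuracy metric above can be sketched as follows; the function name and the small denominator guard are assumptions for illustration:

```python
import numpy as np

def relative_accuracy(preds, targets, tol=0.10):
    """Fraction of predictions within `tol` relative error of the target."""
    # Guard against division by zero for (near-)zero targets
    rel_err = np.abs(preds - targets) / np.maximum(np.abs(targets), 1e-12)
    return float(np.mean(rel_err <= tol))
```

For example, with targets `[1.0, 2.0]` and predictions `[1.05, 2.5]`, only the first prediction is within 10%, so the accuracy is 0.5.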
1. Create the dataset:

```shell
python3 create_dataset.py --num_samples 100000 --d 4 --method "row_echelon"
```

Available options:
- `num_samples`: Dataset size
- `d`: Dimension of the matrices
- `method`: one of `invertible`, `symmetric`, `determinant`, `frobenius`, `trace`, `row_echelon`
- `range`: Range of the matrix element values (default `[-1, 1]`)
2. Train the model for the chosen task:

```shell
python3 train_classification.py   # invertibility / symmetry
python3 train_regression.py       # determinant / Frobenius norm / trace
python3 train_gaussian.py         # RREF (seq2seq)
```