
Transformer Based Learning of Fundamental Linear Algebra Operations


This project explores the ability of transformer models to learn core linear algebra operations directly from tokenized matrix inputs. Using a nanoGPT-style architecture and the P1000 floating-point tokenization scheme, we train transformers on three categories of tasks: classification, regression, and sequence-to-sequence prediction.

Transformers are trained to perform fundamental linear algebra operations on square matrices with entries uniformly sampled from [-1, 1]. The models achieve perfect accuracy on the classification tasks, near-perfect performance on the scalable regression tasks, and high token-level accuracy on sequence-to-sequence prediction of the reduced row echelon form (RREF). For detailed analysis and experimental methodology, see the report.


Results Summary

Classification

Transformers achieve 100% accuracy on both invertibility and symmetry detection across all tested matrix dimensions.

| Task          | 6×6  | 8×8  | 11×11 |
|---------------|------|------|-------|
| Invertibility | 100% | 100% | 100%  |
| Symmetry      | 100% | 100% | 100%  |

Regression

Performance is measured as the fraction of predictions within a 10% relative-error threshold. Matrix sizes of 2×2, 3×3, 4×4, 6×6, 7×7, 9×9, and 11×11 were tested, depending on the property.

| Property       | Accuracy across tested sizes (smallest to largest) |
|----------------|-----------------------------------------------------|
| Determinant    | 96.3%, 95.5%, 86.6%                                  |
| Frobenius Norm | 94.6%, 95.7%, 95.4%, 96.0%                           |
| Trace          | 95.1%, 93.9%, 95.4%, 92.7%                           |
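
For reference, here is a minimal sketch of how such a thresholded relative-error accuracy can be computed; this is an illustration, not necessarily the repository's exact implementation:

```python
import numpy as np

def relative_error_accuracy(pred: np.ndarray, target: np.ndarray, tol: float = 0.10) -> float:
    """Fraction of predictions whose relative error |pred - target| / |target| is at most tol."""
    rel_err = np.abs(pred - target) / np.maximum(np.abs(target), 1e-8)  # guard against division by zero
    return float(np.mean(rel_err <= tol))

# Example: the middle prediction misses the 10% threshold, so accuracy is 2/3.
print(relative_error_accuracy(np.array([1.05, 0.5, 2.0]), np.array([1.0, 0.6, 2.1])))
```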

Gaussian Elimination (RREF Prediction)

A sequence-to-sequence transformer predicts the reduced row echelon form (RREF) with high token-level accuracy.

| Matrix Size | Token Accuracy |
|-------------|----------------|
| 4×4         | 96.72%         |
| 5×5         | 94.60%         |
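
A token-level accuracy of this kind might be computed roughly as follows; the function name, tensor shapes, and padding-token handling below are assumptions, not the repository's actual code:

```python
import torch

def token_accuracy(logits: torch.Tensor, targets: torch.Tensor, pad_id: int = 0) -> float:
    """Fraction of non-padding target tokens predicted correctly.
    logits: (batch, seq_len, vocab_size), targets: (batch, seq_len)."""
    preds = logits.argmax(dim=-1)          # greedy token predictions
    mask = targets != pad_id               # ignore padded positions
    correct = (preds == targets) & mask
    return (correct.sum() / mask.sum()).item()

logits = torch.randn(2, 10, 1100)          # dummy batch: 2 sequences of 10 tokens
targets = torch.randint(1, 1100, (2, 10))
print(token_accuracy(logits, targets))
```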

Architecture

  • Base: nanoGPT
  • Tokenization: P1000 scheme (each matrix entry becomes a sign, mantissa, and exponent token; see the sketch after this list)
  • Input: Flattened matrix → [+, 314, E-2, ...] tokens
  • Output Head:
    • Classification: Linear(embd → 2)
    • Regression: Linear(embd → 1)
    • Seq2Seq (RREF): Full transformer decoder
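
As a concrete illustration, here is a minimal sketch of a P1000-style encoder that flattens a matrix into sign/mantissa/exponent tokens; the function names are illustrative and not the repository's actual API:

```python
import math

def p1000_tokens(x: float) -> list[str]:
    """Encode one float as [sign, 3-digit mantissa, exponent] tokens, e.g. 3.14 -> ['+', '314', 'E-2']."""
    sign = '+' if x >= 0 else '-'
    x = abs(x)
    if x == 0.0:
        return [sign, '0', 'E0']
    exponent = math.floor(math.log10(x)) - 2    # shift so the mantissa has three digits
    mantissa = round(x / 10 ** exponent)        # integer in [100, 999]
    if mantissa == 1000:                        # rounding overflow, e.g. x = 0.9999
        mantissa, exponent = 100, exponent + 1
    return [sign, str(mantissa), f'E{exponent}']

def tokenize_matrix(matrix) -> list[str]:
    """Flatten the matrix row by row and concatenate the per-entry tokens."""
    return [tok for row in matrix for x in row for tok in p1000_tokens(x)]

print(tokenize_matrix([[3.14, -0.25], [0.5, 0.0]]))
```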

Model Sizes

| Task                 | Layers | Heads | Emb Dim | Params |
|----------------------|--------|-------|---------|--------|
| Classification       | 1      | 2     | 16      | ~17.8K |
| Regression           | 6      | 8     | 256     | ~4.96M |
| Gaussian Elimination | 6      | 8     | 432     | ~14.2M |
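
These sizes map directly onto nanoGPT-style configurations. The sketch below uses nanoGPT's GPTConfig field names, while the block size and vocabulary size shown are assumptions rather than the repository's actual values:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    n_layer: int
    n_head: int
    n_embd: int
    block_size: int = 512    # assumed maximum sequence length
    vocab_size: int = 1104   # assumed: 1000 mantissas + sign tokens + exponent tokens + specials

classification_cfg = GPTConfig(n_layer=1, n_head=2, n_embd=16)    # ~17.8K parameters
regression_cfg     = GPTConfig(n_layer=6, n_head=8, n_embd=256)   # ~4.96M parameters
gaussian_cfg       = GPTConfig(n_layer=6, n_head=8, n_embd=432)   # ~14.2M parameters
```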

Training Details

  • Curriculum Learning: Start with small matrices (2×2) and scale up to larger sizes
  • Normalization: Regression targets are normalized by the training set's mean and standard deviation (see the sketch after this list)
  • Loss: MSE for regression, cross-entropy for classification
  • Accuracy Metric: A prediction with relative error ≤ 10% counts as correct
  • Hardware: Trained on 2× NVIDIA T4 GPUs
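
A minimal sketch of the mean/std target normalization, assuming precomputed regression targets for the training split (the variable names and workflow are illustrative):

```python
import numpy as np

# Hypothetical regression targets (e.g. determinants) computed over the training split.
train_targets = np.random.uniform(-1.0, 1.0, size=100_000)

mu, sigma = train_targets.mean(), train_targets.std()

def normalize(y):
    """Applied to targets before the MSE loss is computed."""
    return (y - mu) / sigma

def denormalize(y_hat):
    """Applied to model outputs when measuring relative error at evaluation time."""
    return y_hat * sigma + mu
```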

Usage

1. Create dataset:

python3 create_dataset.py --num_samples 100000 --d 4 --method "row_echelon" 

Available options:

  • num_samples: Dataset size
  • d: Dimension of the matrices
  • method: invertible, symmetric, determinant, frobenius, trace, row_echelon
  • range: Range of values of the matrix elements (default is [-1, 1])
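
For illustration, one generated sample for the determinant task might look roughly like the sketch below; the actual generation and tokenization logic lives in create_dataset.py:

```python
import numpy as np

d = 4                                           # matches --d 4
A = np.random.uniform(-1.0, 1.0, size=(d, d))   # entries uniformly sampled from [-1, 1]
target = np.linalg.det(A)                       # regression target for --method "determinant"
print(A, target)
```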

2. Train the model depending on the task:

python3 train_classification.py
python3 train_regression.py
python3 train_gaussian.py