This project explores the ability of transformer models to learn core linear algebra operations directly from tokenized matrix inputs. Using a nanoGPT-style architecture and the P1000 floating point tokenization scheme, we train transformers on three categories of tasks: classification, regression, and sequence-to-sequence prediction.
Transformers are trained to perform fundamental linear algebra operations on square matrices with entries uniformly sampled from [-1, 1]. The models achieve perfect accuracy on classification tasks, near-perfect performance on scalable regression tasks, and high token-level accuracy on sequence-to-sequence RREF prediction. For detailed analysis and experimental methodology, see the report.
Transformers achieve 100% accuracy on both invertibility and symmetry detection across all tested matrix dimensions.
| Task | 6×6 | 8×8 | 11×11 |
|---|---|---|---|
| Invertibility | 100% | 100% | 100% |
| Symmetry | 100% | 100% | 100% |
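The classification data described above (square matrices with entries uniformly sampled from [-1, 1], labeled for invertibility and symmetry) could be generated along these lines. This is an illustrative sketch, not the repo's `create_dataset.py`; the label rules (numerical rank for invertibility, `allclose` for symmetry) are assumptions:

```python
import numpy as np

def sample_matrix(d, rng):
    # Entries drawn uniformly from [-1, 1], as in the paper's setup
    return rng.uniform(-1.0, 1.0, size=(d, d))

rng = np.random.default_rng(0)
A = sample_matrix(6, rng)

# A random continuous matrix is almost surely full-rank and asymmetric
is_symmetric = int(np.allclose(A, A.T))
is_invertible = int(np.linalg.matrix_rank(A) == A.shape[0])
```

Symmetric positives would need explicit construction (e.g. `(A + A.T) / 2`), since a uniformly sampled matrix is symmetric with probability zero.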
Regression performance (determinant, Frobenius norm, trace) is measured with a relative error ≤ 10% threshold.
| Property | 2×2 | 3×3 | 4×4 | 6×6 | 7×7 | 9×9 | 11×11 |
|---|---|---|---|---|---|---|---|
| Determinant | 96.3% | 95.5% | 86.6% | — | — | — | — |
| Frobenius Norm | — | — | — | 94.6% | 95.7% | 95.4% | 96.0% |
| Trace | — | — | — | 95.1% | 93.9% | 95.4% | 92.7% |
A sequence-to-sequence transformer predicts the row-reduced echelon form (RREF) with high token-level accuracy.
| Matrix Size | Token Accuracy |
|---|---|
| 4×4 | 96.72% |
| 5×5 | 94.60% |
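RREF targets for the seq2seq task can be computed with standard Gauss-Jordan elimination. The helper below is a minimal sketch (with partial pivoting for numerical stability), not necessarily how the repo generates its targets:

```python
import numpy as np

def rref(A, tol=1e-10):
    """Reduce A to reduced row echelon form via Gauss-Jordan elimination."""
    R = A.astype(float).copy()
    rows, cols = R.shape
    r = 0
    for c in range(cols):
        if r == rows:
            break
        # Partial pivoting: pick the largest entry in column c below row r
        pivot = r + np.argmax(np.abs(R[r:, c]))
        if abs(R[pivot, c]) < tol:
            continue  # no pivot in this column
        R[[r, pivot]] = R[[pivot, r]]
        R[r] /= R[r, c]
        for i in range(rows):
            if i != r:
                R[i] -= R[i, c] * R[r]
        r += 1
    return R

A = np.array([[1.0, 2.0], [3.0, 4.0]])
# Invertible matrix -> RREF is the identity
```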
- Base: nanoGPT
- Tokenization: P1000 scheme
- Input: Flattened matrix → `[+, 314, E-2, ...]` tokens
- Output Head:
  - Classification: `Linear(embd → 2)`
  - Regression: `Linear(embd → 1)`
  - Seq2Seq (RREF): Full transformer decoder
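A P1000-style encoding splits each float into a sign token, a three-digit mantissa token, and a base-10 exponent token. The sketch below reproduces the `[+, 314, E-2]` example (3.14 = 314 × 10⁻²); `p1000_tokenize` is a hypothetical helper, not the repo's implementation:

```python
def p1000_tokenize(x: float):
    """Encode a float as [sign, mantissa, exponent] tokens, P1000-style."""
    if x == 0.0:
        return ["+", "0", "E0"]
    sign = "+" if x > 0 else "-"
    # Scientific notation with 2 decimals, e.g. 3.14 -> "3.14e+00"
    m_str, e_str = f"{abs(x):.2e}".split("e")
    mantissa = round(float(m_str) * 100)   # three-digit mantissa in [100, 999]
    exponent = int(e_str) - 2              # 3.14 = 314 * 10^-2
    return [sign, str(mantissa), f"E{exponent}"]
```

A d×d matrix is then flattened row-major and each entry is tokenized this way, giving 3·d² input tokens.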
| Task | Layers | Heads | Emb Dim | Params |
|---|---|---|---|---|
| Classification | 1 | 2 | 16 | ~17.8K |
| Regression | 6 | 8 | 256 | ~4.96M |
| Gaussian Elimination | 6 | 8 | 432 | ~14.2M |
- Curriculum Learning: Start small (2×2), scale up
- Normalization: Targets normalized by train set mean/std
- Loss: MSE for regression, Cross-Entropy for classification
- Accuracy Metric: Relative error ≤ 10% counts as correct
- Hardware: Trained on 2× NVIDIA T4 GPUs
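The relative-error accuracy metric above can be sketched as follows; the function name and the small denominator guard are assumptions for illustration:

```python
import numpy as np

def relative_accuracy(preds, targets, tol=0.10):
    """Fraction of predictions within `tol` relative error of the target."""
    # Guard against division by zero for (near-)zero targets
    rel_err = np.abs(preds - targets) / np.maximum(np.abs(targets), 1e-12)
    return float(np.mean(rel_err <= tol))
```

For example, with targets `[1.0, 2.0]` and predictions `[1.05, 2.5]`, only the first prediction is within 10%, so the accuracy is 0.5.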
1. Create the dataset:

```shell
python3 create_dataset.py --num_samples 100000 --d 4 --method "row_echelon"
```

Available options:
- `num_samples`: Dataset size
- `d`: Dimension of the matrices
- `method`: one of `invertible`, `symmetric`, `determinant`, `frobenius`, `trace`, `row_echelon`
- `range`: Range of the matrix element values (default `[-1, 1]`)
2. Train the model for the chosen task:

```shell
python3 train_classification.py   # invertibility / symmetry
python3 train_regression.py       # determinant / Frobenius norm / trace
python3 train_gaussian.py         # RREF (seq2seq)
```