Recreating every milestone in Machine Learning and Artificial Intelligence, from Transformers to Perceptrons.
ReplicateAI is an open initiative to rebuild and verify every major paper in ML/AI history,
starting from modern foundation models (2023–2025) and tracing backward to the origins of AI.
We believe that understanding AI means rebuilding it: line by line, layer by layer.
"Because science means reproducibility."
- Goal: Faithfully re-implement influential ML/AI papers with open code, datasets, and experiments
- Scope: From Qwen2.5 (2025) back to the Perceptron (1958)
- Approach: Reverse timeline — start with foundation models, then trace history backward
- Output: Each paper becomes a self-contained, reproducible module with reports and experiments
### Stage 1: Foundation Models (Modern LLM & Multimodal)

The golden age of open-source foundation models.
| Year | Paper / Model | Organization | Why It Matters | Replicate Goal | Status |
|---|---|---|---|---|---|
| 2025 | Qwen2.5 | Alibaba | Fully open multimodal model (text + image) | Rebuild text/image pipeline | 🧭 Planned |
| 2024 | DeepSeek-V2 | DeepSeek | MoE + RLHF efficiency breakthrough | Replicate expert routing and reward pipeline | 🧭 Planned |
| 2024 | Claude 3 Family | Anthropic | Leading alignment via Constitutional AI | Explore rule-based alignment principles | 🧭 Planned |
| 2024 | LLaMA 3 | Meta | Open foundation model standard | Implement scaled transformer + tokenizer | 🧭 Planned |
| 2024 | Mixtral 8×7B | Mistral | Sparse Mixture-of-Experts architecture | Implement routing + expert parallelism | 🧭 Planned |
| 2024 | Phi-2 / Phi-3 | Microsoft | Small but high-quality model; data-centric training | Rebuild synthetic data pipeline | 🧭 Planned |
| 2024 | Gemini 1 / 1.5 | Google DeepMind | Vision + text + reasoning | Prototype multimodal reasoning pipeline | 🧭 Planned |
| 2023 | Qwen-VL | Alibaba | Vision-language alignment model | Replicate visual encoder + text fusion | 🧭 Planned |
| 2023 | BLIP-2 / MiniGPT-4 | Salesforce / KAUST | Lightweight multimodal bridging | Implement pretrained connector | 🧭 Planned |
| 2023 | LLaMA 1 / 2 | Meta | Open LLM baseline | Implement tokenizer + attention stack | 🧭 Planned |
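Several replicate goals above (Mixtral, DeepSeek-V2) center on sparse expert routing. As a taste of what such a module might contain, here is a minimal NumPy sketch of Mixtral-style top-2 gating; all names and sizes are illustrative assumptions, and real experts are feed-forward blocks rather than single matrices.

```python
import numpy as np

def top2_moe_layer(x, w_gate, experts):
    """Route each token to its top-2 experts and mix their outputs (sketch)."""
    logits = x @ w_gate                          # (tokens, n_experts) gating scores
    top2 = np.argsort(logits, axis=-1)[:, -2:]   # indices of the 2 best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        idx = top2[t]
        # softmax over the two selected logits only, as in Mixtral's router
        w = np.exp(logits[t, idx] - logits[t, idx].max())
        w /= w.sum()
        for weight, e in zip(w, idx):
            out[t] += weight * experts[e](x[t])
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 5
w_gate = rng.normal(size=(d, n_experts))
# each "expert" is a tiny linear map here, purely for illustration
mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda v, M=M: M @ v for M in mats]
x = rng.normal(size=(tokens, d))
y = top2_moe_layer(x, w_gate, experts)
print(y.shape)  # (5, 8)
```

Only two of the four experts run per token, which is the source of the efficiency gain the table refers to.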
### Stage 2: Representation Era (Transformers & Embeddings)

| Year | Paper | Author | Goal | Status |
|---|---|---|---|---|
| 2021 | CLIP | Radford et al. | Align vision and language in a shared embedding space via contrastive learning | 🔬 Replicating |
| 2020 | ViT | Dosovitskiy et al. | Image recognition with a pure transformer | ✅ Done |
| 2018 | BERT | Devlin et al. | Masked language modeling | 🔬 Replicating |
| 2017 | Transformer | Vaswani et al. | "Attention Is All You Need" | ✅ Done |
| 2015 | Bahdanau Attention | Bahdanau et al. | RNN encoder-decoder with attention | 🧭 Planned |
| 2014 | Seq2Seq | Sutskever et al. | Encoder-decoder translation | 🧭 Planned |
| 2013 | Word2Vec | Mikolov et al. | Learn word embeddings | 🧭 Planned |
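The mechanism at the heart of this stage is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, from "Attention Is All You Need". A minimal NumPy sketch (shapes are illustrative, and a full replication adds multi-head projections and masking):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 queries of dimension 4
K = rng.normal(size=(5, 4))   # 5 keys
V = rng.normal(size=(5, 4))   # 5 values
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

Each output row is a convex combination of the value rows, with weights given by query-key similarity.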
### Stage 3: Deep Renaissance (CNN Era)

| Year | Paper | Author | Goal | Status |
|---|---|---|---|---|
| 2015 | ResNet | He et al. | Residual learning | 🧭 Planned |
| 2014 | VGG | Simonyan & Zisserman | Deep CNN architectures | 🧭 Planned |
| 2012 | AlexNet | Krizhevsky et al. | GPU-based CNN | 🧭 Planned |
| 2006 | DBN / RBM | Hinton et al. | Layer-wise pretraining | 🧭 Planned |
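"Residual learning" in the ResNet row means reformulating a block as y = x + F(x), so the layers only learn the residual F and the identity path carries the signal. A hedged NumPy sketch, using plain matrices instead of convolutions for brevity:

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = ReLU(x + F(x)), where F(x) = W2 @ ReLU(W1 @ x)."""
    h = np.maximum(0, W1 @ x)          # inner transform + ReLU
    return np.maximum(0, x + W2 @ h)   # add the identity shortcut, then ReLU

rng = np.random.default_rng(0)
d = 16
W1 = rng.normal(size=(d, d)) * 0.1
W2 = rng.normal(size=(d, d)) * 0.1
x = rng.normal(size=d)
y = residual_block(x, W1, W2)
print(y.shape)  # (16,)
```

Note that if the residual branch outputs zero (W2 = 0), the block reduces to ReLU(x): a block can fall back to (near-)identity, which is why very deep stacks remain trainable.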
### Stage 4: Statistical Learning (Classical ML)

| Year | Paper | Author | Goal | Status |
|---|---|---|---|---|
| 2001 | Random Forests | Breiman | Ensemble learning | 🧭 Planned |
| 1997 | AdaBoost | Freund & Schapire | Boosting algorithms | 🧭 Planned |
| 1995 | SVM | Cortes & Vapnik | Maximum-margin classifier | 🧭 Planned |
| 1977 | EM Algorithm | Dempster et al. | Expectation-Maximization | 🧭 Planned |
### Stage 5: Neural Origins

| Year | Paper | Author | Goal | Status |
|---|---|---|---|---|
| 1986 | Backpropagation | Rumelhart et al. | Gradient-based learning | 🧭 Planned |
| 1985 | Boltzmann Machine | Ackley, Hinton & Sejnowski | Generative stochastic model | 🧭 Planned |
| 1982 | Hopfield Network | Hopfield | Associative memory | 🧭 Planned |
| 1958 | Perceptron | Rosenblatt | Linear separability | 🧭 Planned |
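The endpoint of the timeline fits in a few lines: Rosenblatt's perceptron updates its weights only on misclassified samples and is guaranteed to converge on linearly separable data. A hedged NumPy sketch on a toy dataset (data and hyperparameters are illustrative):

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Rosenblatt's rule: w += lr * y_i * x_i whenever sample i is misclassified."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):          # labels in {-1, +1}
            if yi * (w @ xi + b) <= 0:    # misclassified (or on the boundary)
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:                   # converged: the data is linearly separable
            break
    return w, b

# toy linearly separable points in the plane
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # [ 1.  1. -1. -1.]
```

On data that is not linearly separable (e.g. XOR), this loop never reaches zero errors, which is exactly the limitation that motivated multi-layer networks and backpropagation.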
Status legend: 🧭 Planned → 🔬 In Reproduction → 🧪 Under Evaluation → Verified → 🧾 Documented → 🧰 Extended (optional)

### Repository Structure

```
ReplicateAI/
├── stage1_foundation/
│   ├── 2025_Qwen2.5/
│   ├── 2024_LLaMA3/
│   └── 2023_CLIP/
├── stage2_representation/
│   ├── 2018_BERT/
│   ├── 2017_Transformer/
│   └── 2013_Word2Vec/
├── stage3_deep_renaissance/
│   ├── 2015_ResNet/
│   ├── 2012_AlexNet/
│   └── 2006_DBN/
├── stage4_statistical/
│   ├── 2001_RandomForest/
│   └── 1995_SVM/
└── stage5_foundations/
    ├── 1986_Backprop/
    └── 1958_Perceptron/
```

Each paper module includes:
- `README.md` – Paper summary & objective
- `report.md` – Reproduction results & analysis
- `notebook/` – Interactive demo
- `src/` – Core implementation
- `references.bib` – Original citation

### Contributing

We welcome contributions from researchers, engineers, and students who believe in reproducibility.
- Fork the repo
- Pick a paper or model not yet implemented
- Follow the Paper Template
- Submit a PR with your code and report
Please include:
- clear code (PyTorch / JAX / NumPy)
- short experiment or visualization
- reproducibility notes or deviations
| Stage | Era | Progress |
|---|---|---|
| Foundation (2023–2025) | Modern LLM & Multimodal | ░░░░░░░░░░░░░░ 0% |
| Representation (2013–2020) | Transformers & Embeddings | ░░░░░░░░░░░░░░ 0% |
| Deep Renaissance (2006–2014) | CNN Era | ░░░░░░░░░░░░░░ 0% |
| Statistical (1990s–2000s) | Classical ML | ░░░░░░░░░░░░░░ 0% |
| Foundations (1950s–1980s) | Neural Origins | ░░░░░░░░░░░░░░ 0% |
If you use or reference this project, please cite:
```bibtex
@misc{replicateai2025,
  author = {ReplicateAI Contributors},
  title  = {ReplicateAI: Rebuilding the History of Machine Learning and Artificial Intelligence},
  year   = {2025},
  url    = {https://github.com/duoan/ReplicateAI}
}
```

"Replicate. Verify. Understand."
⭐️ Star this repo if you believe reproducibility is the foundation of true intelligence.
