arvindcr4/awesome_agents_papers

Awesome Agents Papers Collection

A comprehensive collection of papers and presentation slides on LLM agents, reasoning, and AI systems.

Sources:

Quick Stats

  • Papers: 88 PDFs (organized in 12 folders)
  • Slides: 93 presentation decks (~504 MB)
  • Topics: 15 categories
  • Audio Overviews: See NOTEBOOKLM_LINKS.md for AI-generated podcast summaries
  • Resources: See DEEP_RL_RESOURCES.md for comprehensive RL learning materials

Folder Structure

papers/
├── agent-frameworks/   # 10 papers - ReAct, AutoGen, DSPy, etc.
├── benchmarks/         # 6 papers - SWE-bench, WorkArena, evals
├── computer-use/       # 5 papers - OSWorld, DigiRL, SWE-agent
├── memory-rag/         # 3 papers - HippoRAG, retrieval systems
├── multi-agent/        # 2 papers - AgentNet, MasRouter
├── planning/           # 5 papers - Tree search, optimization
├── reasoning/          # 9 papers - Chain-of-thought, reasoning
├── rl-finetuning/      # 16 papers - DeepSeek R1, GRPO, DPO
├── robotics/           # 6 papers - Eureka, Voyager, GR00T
├── security/           # 10 papers - Prompt injection, red-teaming
├── theorem-proving/    # 9 papers - LeanDojo, AlphaGeometry
└── web-agents/         # 7 papers - WebArena, Mind2Web
slides/                 # 92 presentation decks (504 MB)
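The per-folder paper counts listed in the folder structure can be checked against a local clone. The following is a minimal sketch, not a script shipped with this repository: the function name `count_pdfs` is hypothetical, and it assumes the PDFs sit directly inside each topic folder under `papers/` rather than in nested subfolders.

```python
from pathlib import Path


def count_pdfs(root: str = "papers") -> dict[str, int]:
    """Return {folder name: number of .pdf files directly inside it}.

    Assumption: PDFs are stored one level below `root`, one folder per topic.
    Returns an empty dict if `root` does not exist.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return {}
    return {
        folder.name: sum(
            1
            for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() == ".pdf"
        )
        for folder in sorted(root_path.iterdir())
        if folder.is_dir()
    }


if __name__ == "__main__":
    # Run from the repository root; prints one line per topic folder.
    counts = count_pdfs("papers")
    for name, n in counts.items():
        print(f"{name:<20} {n}")
    print(f"{'total':<20} {sum(counts.values())}")
```

Run from the repository root, the total should match the paper count given in Quick Stats if the assumption about the layout holds.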

Inference-Time Techniques

Paper | Slides | Code | Media
Large Language Models as Optimizers | CS839 Prompting II | GitHub | 🖼️
Large Language Models Cannot Self-Correct Reasoning Yet | - | - | 🎨 🖼️
Teaching Large Language Models to Self-Debug | - | - | 🎨 🖼️ 🎧
Chain-of-Thought Reasoning Without Prompting | CoT Princeton Lecture, CoT Toronto, CoT SJTU, CoT Interpretable ML, Concise CoT | GitHub (unofficial) | 🎨 🖼️
Premise Order Matters in Reasoning with LLMs | - | - | 🎨 🖼️
Chain-of-Thought Empowers Transformers | CoT Slides | - | 🎨 🖼️

Post-Training & Alignment

Paper | Slides | Code | Media
Direct Preference Optimization (DPO) | DPO CMU, DPO UT Austin, DPO Toronto, DPO Jinen | GitHub | 🎨 🖼️
Iterative Reasoning Preference Optimization | - | - | 🎨 🖼️
Chain-of-Verification Reduces Hallucination | - | GitHub (unofficial) | 🎨 🖼️
Unpacking DPO and PPO | DPO Slides | GitHub | 🖼️
RLHF Background | RLHF UT Austin | - | -

Memory & Planning

Paper | Slides | Code | Media
Grokked Transformers are Implicit Reasoners | - | GitHub | 🎨 🖼️
HippoRAG: Neurobiologically Inspired Long-Term Memory | HippoRAG NeurIPS | GitHub | 🎨 🖼️
Is Your LLM Secretly a World Model of the Internet | - | GitHub | 🖼️
Tree Search for Language Model Agents | - | GitHub | 🖼️

Agent Frameworks

Paper | Slides | Code | Media
ReAct: Synergizing Reasoning and Acting | ReAct UVA Lecture | GitHub | 🎨 🖼️ 🎧
AutoGen: Multi-Agent Conversation | - | GitHub | 🎨 🖼️ 🎧
StateFlow: Enhancing LLM Task-Solving | - | GitHub | 🖼️ 🎧
DSPy: Compiling Declarative Language Model | - | GitHub | 🎨 🖼️ 🎧
LLM Agents Tutorials | EMNLP 2024 Tutorial, WWW 2024 Tutorial, Berkeley Training Agents | - | -

Code Generation & Software Agents

Paper | Slides | Code | Media
SWE-agent: Agent-Computer Interfaces | Software Agents (Neubig) | GitHub | 🎨 🖼️
OpenHands: AI Software Developers | Software Agents (Neubig) | GitHub | 🖼️ 🎧
Interactive Tools Assist LM Agents Security Vulnerabilities | Code Agents & Vulnerability Detection | GitHub | -
Big Sleep: LLM Vulnerabilities Real-World | Code Agents & Vulnerability Detection | - | -
SWE-bench Verified | - | GitHub | 🖼️ 🎧

Web & Multimodal Agents

Paper | Slides | Code | Media
WebShop: Scalable Real-World Web Interaction | Multimodal Agents Berkeley | GitHub | -
Mind2Web: Generalist Agent for the Web | Multimodal Agents Berkeley | GitHub | -
WebArena: Realistic Web Environment | Multimodal Agents Berkeley, Web Agent Evaluation | GitHub | -
VisualWebArena | Multimodal Agents Berkeley | GitHub | -
AGUVIS: Unified Pure Vision Agents GUI | - | GitHub | -
BrowseComp: Web Browsing Benchmark | - | GitHub | -

Enterprise & Workflow Agents

Paper | Slides | Code | Media
WorkArena: Common Knowledge Work Tasks | - | GitHub | 🖼️ 🎧
WorkArena++: Compositional Planning | - | GitHub | 🖼️ 🎧
TapeAgents: Holistic Framework Agent Development | TapeAgents Slides | GitHub | 🖼️ 🎧

Mathematics & Theorem Proving

Paper | Slides | Code | Media
LeanDojo: Theorem Proving Retrieval-Augmented | LeanDojo AITP, LeanDojo NeurIPS, Theorem Proving ML | GitHub | 🎨
Autoformalization with Large Language Models | - | - | 🎨
Autoformalizing Euclidean Geometry | - | GitHub | 🎨
Draft, Sketch and Prove: Formal Theorem Provers | Theorem Proving ML | GitHub | 🎨
miniCTX: Neural Theorem Proving Long-Contexts | - | GitHub | 🎨
Lean-STaR: Interleave Thinking and Proving | Berkeley Slides | GitHub Website | 🎨
ImProver: Agent-Based Automated Proof Optimization | - | GitHub | 🎨
In-Context Learning Agent Formal Theorem-Proving | - | GitHub | -
Symbolic Regression: Learned Concept Library | - | GitHub | 🖼️
AlphaGeometry: Solving Olympiad Geometry | - | GitHub | -

Robotics & Embodied Agents

Paper | Slides | Code | Media
Voyager: Open-Ended Embodied Agent | Voyager UT Austin | GitHub | 🎨
Eureka: Human-Level Reward Design | Eureka Paper/Slides | GitHub | 🎨 🖼️
DrEureka: Language Model Guided Sim-To-Real | - | GitHub | 🎨 🖼️
Gran Turismo: Deep Reinforcement Learning | - | - | 🖼️
GR00T N1: Foundation Model Humanoid | - | GitHub | 🎨 🖼️
SLAC: Simulation-Pretrained Latent Action | - | - | -

Scientific Discovery

Paper | Slides | Code | Media
Paper2Agent: Research Papers as AI Agents | - | GitHub | 🖼️ 🎧
OpenScholar: Synthesizing Scientific Literature | - | GitHub | 🖼️

Safety & Security

Paper | Slides | Code | Media
DataSentinel: Game-Theoretic Detection Prompt Injection | Prompt Injection Duke | GitHub | -
AgentPoison: Red-teaming LLM Agents | Prompt Injection Duke | GitHub | 🎨
Progent: Programmable Privilege Control | - | - | -
DecodingTrust: Trustworthiness GPT Models | - | GitHub | -
Representation Engineering: AI Transparency | - | GitHub | -
Extracting Training Data from LLMs | - | - | -
The Secret Sharer: Unintended Memorization | - | - | -
Privtrans: Privilege Separation | - | - | -

Evaluation & Benchmarking

Paper | Slides | Code | Media
Survey: Evaluation LLM-based Agents | AgentBench Multi-Turn NeurIPS | - | 🖼️ 🎧
Adding Error Bars to Evals | - | GitHub | 🖼️ 🎧
Tau2-Bench: Conversational Agents Dual-Control | - | GitHub | 🖼️ 🎧
Data Science Agents | Data Science Agents Benchmark | - | -

Neural & Symbolic Reasoning

Paper | Slides | Code | Media
Beyond A-Star: Better Planning Transformers | - | GitHub | 🖼️
Dualformer: Controllable Fast and Slow Thinking | - | GitHub | 🖼️
Composing Global Optimizers: Algebraic Objects | - | - | 🖼️
SurCo: Learning Linear Surrogates | - | - | 🖼️

Agentic Reasoning & RL Fine-Tuning

Source: redhat-et/agentic-reasoning-reinforcement-fine-tuning

DeepSeek R1 & Reasoning Models

Paper | Slides | Code | Media
DeepSeek-R1: Reasoning via RL | DeepSeek R1 Intro, DeepSeek R1 Toronto, DeepSeek R1 CMU, DeepSeek R1 Seoul | GitHub | 🎨 🖼️
DeepSeek R1: Implications for AI | DeepSeek R1 Intro | - | 🎨 🖼️
DeepSeek R1: Are Reasoning Models Faithful? | - | - | 🎨 🖼️
OpenAI O1 Replication Journey | - | GitHub | 🎨 🖼️
Qwen QwQ Reasoning Model | - | HuggingFace | 🎨 🖼️
Sky-T1: Training Small Reasoning LLMs | - | GitHub | 🖼️
s1: Simple Test-Time Scaling | - | GitHub | 🖼️

GRPO & RL Fine-Tuning

Paper | Slides | Code | Media
DeepSeekMath: GRPO Algorithm | Stanford RL for Reasoning | GitHub | 🎨 🖼️
Guided GRPO: Adaptive Guidance | PTA-GRPO Planning | GitHub | 🖼️
R-Search: Multi-Step Reasoning | Stanford RL for Reasoning | GitHub | 🖼️
RL Fine-tuning: Instruction Following | - | - | 🖼️
RFT Powers Multimodal Reasoning | - | - | 🖼️
STILL-2: Distilling Reasoning | - | - | 🖼️

Agentic RL

Paper | Slides | Code | Media
WebAgent-R1: Multi-Turn RL for Web Agents | - | GitHub | -
ARTIST: Agentic Reasoning & Tool Integration | ARTIST Microsoft | GitHub | 🎨 🖼️ 🎧

Agentic Architectures & Coordination

Papers on multi-agent systems, decentralized coordination, and agentic frameworks

Decentralized Multi-Agent Systems

Paper | Slides | Code | Media
AgentNet: Decentralized Multi-Agent Coordination | - | GitHub | 🎨 🖼️
MasRouter: Multi-Agent Routing | MasRouter ACL 2025 | GitHub | 🎨 🖼️
Multi-Agent RL Overview | Edinburgh MARL Intro | - | -

Device & Computer Control

Paper | Slides | Code | Media
DigiRL: Device Control Agents | DigiRL NeurIPS 2024 | GitHub | 🖼️ 🎧
OSWorld: Multimodal Agents Benchmark | - | GitHub | 🖼️
OS-Harm: Computer Use Safety | OS-Harm Benchmark | GitHub | 🖼️ 🎧

Agent Fine-Tuning & Tool Use

Paper | Slides | Code | Media
FireAct: Language Agent Fine-tuning | LLM Agents Tool Learning | GitHub | 🖼️ 🎧
DeepSeek Janus Pro: Multimodal | - | GitHub | 🎨 🖼️
PTA-GRPO: High-Level Planning | PTA-GRPO Planning | - | -
Stanford RL for Agents | Stanford RL Agents 2025 | - | -
CMU LM Agents | CMU Language Models as Agents | - | -
Mannheim Tool Use | Mannheim LLM Agents Tool Use | - | -

Enterprise & Industry Guides

Resource | Description | Code
Intel AI Agents Architecture | AI agents resource guide | -
Cisco Agentic Frameworks | Overview of agentic frameworks | -

Deep Reinforcement Learning

See Full Deep RL Resources Guide - Comprehensive collection with 100+ resources and 92 slides

Value-Based Methods (DQN Family)

Paper | arXiv | Slides | Code | Media
Playing Atari with Deep RL (DQN) | 1312.5602 | CMU, CVUT, NTHU, Waterloo | OpenAI Baselines | -
Deep RL with Double Q-learning | 1509.06461 | CMU DQN | OpenAI Baselines | -
Dueling Network Architectures | 1511.06581 | Buffalo | OpenAI Baselines | -
Prioritized Experience Replay | 1511.05952 | Buffalo, Julien Vitay, ICML 2020 | OpenAI Baselines | -
Rainbow: Combining Improvements | 1710.02298 | Prague, Berkeley, Wisconsin | Dopamine | -

Policy Gradient Methods

Paper | arXiv | Slides | Code | Media
Policy Gradient Methods | - | Toronto, Berkeley CS285, REINFORCE Stanford | Stable-Baselines3 | -
Proximal Policy Optimization (PPO) | 1707.06347 | Waterloo, NTU Taiwan | OpenAI Baselines | -
Trust Region Policy Optimization (TRPO) | 1502.05477 | FAU, UT Austin, CMU Natural PG, Toronto PAIR | OpenAI Baselines | -
High-Dimensional Continuous Control (GAE) | 1506.02438 | Berkeley CS285 | OpenAI Baselines | -

Actor-Critic Methods

Paper | arXiv | Slides | Code | Media
Asynchronous Methods (A3C) | 1602.01783 | WPI, Buffalo, NTU, UIUC, Julien Vitay | OpenAI Baselines | -
Continuous Control (DDPG) | 1509.02971 | Paderborn, FAU, Julien Vitay, Buffalo | Stable-Baselines3 | -
Addressing Function Approximation (TD3) | 1802.09477 | Prague | Stable-Baselines3 | -
Soft Actor-Critic (SAC) | 1801.01290 | Toronto PAIR, Purdue, Stanford CS231n, Prague | Stable-Baselines3 | -

Temporal Difference & Q-Learning

Paper | arXiv | Slides | Code | Media
TD Learning Fundamentals | - | CMU, Michigan, Sutton & Barto | - | -
Q-Learning | - | Northeastern, CMU TD | - | -

Model-Based RL

Paper | arXiv | Slides | Code | Media
Model-Based RL | - | FAU, Toronto, Berkeley, CMU | MBRL-Lib | -

Imitation & Inverse RL

Paper | arXiv | Slides | Code | Media
Imitation Learning | - | WPI, EPFL | imitation | -
Inverse Reinforcement Learning | - | TU Darmstadt, Berkeley CS285 | imitation | -

Introductory Lectures

Topic | Slides
Deep RL Introduction | Berkeley CS294, Berkeley 2017

Frameworks & Tools

Tool | Link | Description
OpenAI Gym | GitHub | RL environments
Gymnasium | GitHub | Maintained fork of Gym
Stable-Baselines3 | GitHub | RL algorithms in PyTorch
Unity ML-Agents | GitHub | 3D environments
PyTorch | pytorch.org | Deep learning framework
Google Dopamine | GitHub | RL research framework
CleanRL | GitHub | Single-file RL implementations
RLlib | GitHub | Scalable RL library

View all 100+ resources in DEEP_RL_RESOURCES.md


Recommended Study Path

Beginner

  1. Start with WWW 2024 LLM Agents Tutorial - comprehensive overview
  2. Read ReAct paper + slides + code
  3. Study Chain-of-Thought with CoT Princeton Lecture

Intermediate

  1. Software Agents (Neubig) for code agents + SWE-agent code
  2. DPO CMU Lecture for alignment + DPO code
  3. Multimodal Agents Berkeley for web agents + WebArena code

Advanced

  1. LeanDojo slides for theorem proving + code
  2. HippoRAG NeurIPS for memory systems + code
  3. Prompt Injection Duke for security

Reasoning & RL Fine-Tuning Path

  1. DeepSeek-R1 paper + DeepSeek R1 CMU slides + code
  2. DeepSeekMath GRPO + Stanford RL for Reasoning + code
  3. ARTIST paper for agentic reasoning with tools

License

Papers are property of their respective authors. This collection is for educational purposes.
