Vision-Language-Action Models

EMMA: End-to-End Multimodal Model for Autonomous Driving
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Trajeglish: Traffic Modeling as Next-Token Prediction
SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction
Diffusion-Based Planning for Autonomous Driving with Flexible Guidance
Robust Autonomy Emerges from Self-Play
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model
ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills