# Hey, I'm Yuqing Xia 👋

I'm obsessed with one thing: making LLMs ridiculously fast.

Every wasted microsecond on the GPU is a personal offense to me. I work at the intersection of LLM inference systems, GPU kernel wizardry, and AI compilers — turning "that's theoretically possible" into shipped code.


## 🔥 TileRT — LLM Inference, Absurdly Fast


Most inference engines optimize for throughput. We chose the harder problem: per-request latency.

TileRT is a tile-based runtime built for scenarios where every millisecond counts — AI-assisted coding, real-time conversation, high-frequency decision making. No batching tricks, no latency hiding. Just raw speed.

- ⚡ 600 tok/s on DeepSeek-V3.2 | 500 tok/s on GLM-5-FP8
- 🧠 Multi-Token Prediction — why generate one token when you can do three?
- 🧩 Compiler-driven tile-level scheduling with dynamic rescheduling across devices
- 🚀 `pip install tilert` | Try it live at tilert.ai

## 🧱 The tile-ai Ecosystem

TileRT doesn't exist in a vacuum. It's part of tile-ai — a full stack we're building from scratch around one simple idea: tiles are the right abstraction for AI compute.

| Project | What it does |
| --- | --- |
| 🗣️ tilelang | The language. Write tile programs, get optimized GPU kernels. Simple as that. |
| 🌐 TileScale | The scale-out. Multi-GPU, multi-node — one mega-device, zero headaches. |
| ⚙️ TileOPs | The operators. FlashAttention, MLA, DSA — battle-tested, auto-tuned. |

πŸ›οΈ Previously

- NNFusion — A DNN compiler that turns model descriptions into framework-free, high-performance executables. Built at Microsoft Research. We were doing AI compilers before it was cool. ⭐ 1000+

πŸ› οΈ Tech Stack

CUDA · C++ · Python · PyTorch · CUTLASS


## 📫 Get in Touch

Building something latency-critical? Want to push LLM inference to the edge? Let's talk.
