LLM Infra Tutorial
An LLM infrastructure course for intermediate engineers, covering GPU memory, distributed parallelism, inference systems, and RLHF.
From GPU memory hierarchy to NCCL communication primitives — the two pillars of LLM Infra optimization.
From DDP to hybrid parallelism — a systematic guide to every parallelism strategy in large model training.
A deep dive into PagedAttention and RadixAttention — understanding the core design of modern LLM inference engines.
From the four-model RLHF architecture to verl’s system design — understanding why RLHF is fundamentally a systems problem.
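As a taste of the GPU-memory module, here is a back-of-the-envelope sketch of why large-model training forces parallelism in the first place. It assumes the common mixed-precision Adam setup (bf16 weights and gradients, fp32 master weights plus Adam first/second moments), ignores activations and framework overhead, and the function name is illustrative, not from any library:

```python
def training_memory_gb(num_params: float,
                       bytes_per_param: int = 2,   # bf16 weights
                       bytes_per_grad: int = 2,    # bf16 gradients
                       bytes_per_opt: int = 12):   # fp32 master copy + Adam m, v
    """Rough memory for model states in mixed-precision Adam training,
    excluding activations and framework overhead (illustrative only)."""
    total_bytes = num_params * (bytes_per_param + bytes_per_grad + bytes_per_opt)
    return total_bytes / 1024**3

# A 7B-parameter model needs on the order of 100 GiB for model states alone,
# far beyond a single 80 GB GPU -- hence ZeRO, tensor, and pipeline parallelism.
print(f"{training_memory_gb(7e9):.0f} GiB")
```

The 16 bytes/parameter figure (2 + 2 + 12) is the standard accounting used in the ZeRO paper's analysis of optimizer-state sharding.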