LLM Infra Tutorial

An LLM infrastructure course for intermediate engineers, covering GPU memory, distributed parallelism, inference systems, and RLHF.

GPU Memory Model and Distributed Communication Fundamentals

From the GPU memory hierarchy to NCCL communication primitives — the two pillars of LLM Infra optimization.

March 15, 2026 · 23 min · Zhanfeng Mo

The Landscape of Distributed Parallelism Strategies

From DDP to hybrid parallelism — a systematic guide to every parallelism strategy in large model training.

March 16, 2026 · 22 min · Zhanfeng Mo

LLM Inference System Architecture (SGLang as Case Study)

A deep dive into PagedAttention and RadixAttention — understanding the core design of modern LLM inference engines.

March 17, 2026 · 25 min · Zhanfeng Mo

Introduction to RLHF System Design

From the four-model RLHF architecture to verl’s system design — understanding why RLHF is fundamentally a systems problem.

March 17, 2026 · 22 min · Zhanfeng Mo