LLM Infra Tutorial

An LLM infrastructure course for intermediate engineers, covering GPU memory, distributed parallelism, inference systems, and RLHF.

GPU Memory Model and Distributed Communication Fundamentals

From the GPU memory hierarchy to NCCL communication primitives — the two pillars of LLM Infra optimization.

March 15, 2026 · 23 min · Zhanfeng Mo

The Landscape of Distributed Parallelism Strategies

From DDP to hybrid parallelism — a systematic guide to every parallelism strategy in large model training.

March 16, 2026 · 22 min · Zhanfeng Mo

LLM Inference System Architecture (SGLang as Case Study)

A deep dive into PagedAttention and RadixAttention — understanding the core design of modern LLM inference engines.

March 17, 2026 · 25 min · Zhanfeng Mo

Introduction to RLHF System Design

From the four-model RLHF architecture to verl’s system design — understanding why RLHF is fundamentally a systems problem.

March 17, 2026 · 22 min · Zhanfeng Mo