LLM Infra Tutorial

https://mzf666.github.io/llm-infra/en/
Recent content on LLM Infra Tutorial (Tue, 17 Mar 2026)

GPU Memory Model and Distributed Communication Fundamentals
Sun, 15 Mar 2026
https://mzf666.github.io/llm-infra/en/posts/01-gpu-memory-distributed/
From GPU memory hierarchy to NCCL communication primitives: the two pillars of LLM Infra optimization.

The Landscape of Distributed Parallelism Strategies
Mon, 16 Mar 2026
https://mzf666.github.io/llm-infra/en/posts/02-parallel-strategies/
From DDP to hybrid parallelism: a systematic guide to every parallelism strategy in large model training.

LLM Inference System Architecture (SGLang as Case Study)
Tue, 17 Mar 2026
https://mzf666.github.io/llm-infra/en/posts/03-inference-sglang/
A deep dive into PagedAttention and RadixAttention: understanding the core design of modern LLM inference engines.

Introduction to RLHF System Design
Tue, 17 Mar 2026
https://mzf666.github.io/llm-infra/en/posts/04-rlhf-system/
From the four-model RLHF architecture to verl's system design: understanding why RLHF is fundamentally a systems problem.