GPU Memory Model and Distributed Communication Fundamentals

From the GPU memory hierarchy to NCCL communication primitives: the two pillars of LLM infrastructure optimization.

March 15, 2026 · 23 min · Zhanfeng Mo