GPU Memory Model and Distributed Communication Fundamentals
From the GPU memory hierarchy to NCCL communication primitives: the two pillars of LLM infrastructure optimization.