NVIDIA AI Open Day Tech Talk Videos Now Available

LLM Training / Inference / CUDA Optimization Special Session
Download technical materials here:
https://scrm.nvidia.cn/mF/cms/none/FuceFYmFh5SGkhdaTzeC7N/e8WT66RnGUn6y5SwaaLx9F1

CUDA Optimization Core: Throughput • Latency

Session Overview

This session covers CUDA optimization techniques for GPUs: maximizing compute throughput and memory bandwidth utilization while minimizing latency. It traces the co-evolution of GPU hardware and the CUDA programming model alongside the underlying optimization principles, showing how hardware architecture and algorithm design work together. Practical examples using high-performance libraries such as CUTLASS help developers accelerate AI training and inference in key scenarios (e.g., DeepSeek V3/R1 LLM optimization) and make full use of the GPU.

GPU Computing & Programming Model Evolution: Balancing throughput and latency in asynchronous computing
GPU Memory System Evolution: Techniques for maximizing bandwidth utilization and latency hiding
CUDA Abstraction Evolution: From C++ templates to Python CUTLASS development

Watch Full Video:
https://space.bilibili.com/1320140761/lists/5626365?type=season

LLM Training Special Session

Large-scale Mixture-of-Experts (MoE) models such as DeepSeek-V3 are driving a new wave in AI and pose new challenges for existing training frameworks. This session examines performance breakthroughs for fine-grained MoE models and covers optimizations in Megatron-Core, including:

Efficient memory management
Compute-communication overlap
Low-precision quantization
Parallelism strategy optimization

Key Topics:

Megatron-Core MoE in 2025: Architecture, features, performance optimizations, and best practices for DeepSeek-V3
FP8 Mixed-Precision Training: Methodology and performance analysis
FSDP Architecture Design in Megatron-Core

Watch Full Video:
https://space.bilibili.com/1320140761/lists/5626365?type=season

LLM Inference Special Session

As LLMs demonstrate powerful capabilities across applications, deploying them efficiently and cost-effectively has become a key industry focus. This session explores the latest advances in LLM inference, including:

TensorRT-LLM’s development roadmap
PyTorch workflow best practices
Collaborative optimizations with DeepSeek and the open-source community

Key Topics:

TensorRT-LLM Product Strategy Update
TensorRT-LLM × PyTorch: A new paradigm for high-efficiency LLM inference
Pushing DeepSeek’s Limits with TensorRT-LLM: Joint optimization with Tencent

Watch Full Video:
https://space.bilibili.com/1320140761/lists/5626365?type=season
