NVIDIA AI Open Day Tech Talk Videos Now Available


LLM Training / Inference / CUDA Optimization Special Session
Download technical materials here:
https://scrm.nvidia.cn/mF/cms/none/FuceFYmFh5SGkhdaTzeC7N/e8WT66RnGUn6y5SwaaLx9F1
CUDA Optimization Special Session: Throughput and Latency

Session Overview
This session focuses on GPU CUDA optimization techniques for maximizing computational performance and memory bandwidth utilization while minimizing latency. By exploring the co-evolution of GPU hardware and the CUDA programming model alongside core optimization principles, it demonstrates the synergy between hardware architecture and algorithm design. Practical examples built on high-performance frameworks such as CUTLASS help developers accelerate AI training and inference in key scenarios (e.g., DeepSeek V3/R1 LLM optimization) and unlock the GPU's full potential; a small latency-hiding sketch follows the topic list below.
- GPU Computing & Programming Model Evolution: Balancing throughput and latency in asynchronous computing
- GPU Memory System Evolution: Techniques for maximizing bandwidth utilization and latency hiding
- CUDA Abstraction Evolution: From C++ templates to Python CUTLASS development
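As a minimal illustration of the asynchronous-execution and latency-hiding themes above (not material from the talk), the sketch below uses plain PyTorch CUDA streams and pinned host memory to overlap host-to-device copies with computation; the tensor sizes and the `torch.sin` placeholder kernel are illustrative assumptions.

```python
# Minimal latency-hiding sketch (illustrative, not from the talk):
# overlap host-to-device copies with GPU compute using two CUDA streams.
# Assumes a CUDA-capable GPU and PyTorch; sizes and ops are placeholders.
import torch

assert torch.cuda.is_available()

n_chunks, chunk_elems = 8, 1 << 20
# Pinned (page-locked) host memory enables truly asynchronous H2D copies.
host_chunks = [torch.randn(chunk_elems, pin_memory=True) for _ in range(n_chunks)]
device_chunks = [torch.empty(chunk_elems, device="cuda") for _ in range(n_chunks)]
results = [None] * n_chunks

copy_stream = torch.cuda.Stream()
compute_stream = torch.cuda.Stream()
events = [torch.cuda.Event() for _ in range(n_chunks)]

for i in range(n_chunks):
    # Stage each copy on its own stream so it can run ahead of compute.
    with torch.cuda.stream(copy_stream):
        device_chunks[i].copy_(host_chunks[i], non_blocking=True)
        events[i].record(copy_stream)
    # Compute waits only on the chunk it needs, not on all outstanding copies.
    with torch.cuda.stream(compute_stream):
        compute_stream.wait_event(events[i])
        results[i] = torch.sin(device_chunks[i]).sum()  # placeholder "kernel"

torch.cuda.synchronize()
print(sum(r.item() for r in results))
```

Inside a single kernel the same idea shows up as software pipelining with asynchronous copies (e.g., cp.async or TMA on recent architectures), which is the hardware-level form of latency hiding the memory-system discussion covers.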
Watch Full Video:
https://space.bilibili.com/1320140761/lists/5626365?type=season
LLM Training Special Session
Large-scale MoE models like DeepSeek-V3 are driving a new wave in AI, presenting unprecedented challenges to existing frameworks. This talk dives into performance breakthroughs for fine-grained MoE models, covering innovative optimizations in Megatron-Core, including:
- Memory-efficient management
- Compute-communication overlap (see the sketch after this list)
- Low-precision quantization
- Parallel strategy optimization
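To make the compute-communication overlap item concrete, here is a minimal, hedged sketch that uses plain torch.distributed rather than Megatron-Core's actual mechanism; the process-group setup, bucket size, and the dummy computation are assumptions chosen only to show the async pattern.

```python
# Illustrative compute-communication overlap with torch.distributed
# (a generic sketch, not Megatron-Core's implementation).
# Launch with e.g.: torchrun --nproc_per_node=2 overlap_sketch.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    grad_bucket = torch.randn(1 << 22, device="cuda")      # gradients to reduce
    activations = torch.randn(4096, 4096, device="cuda")   # independent work

    # Kick off the all-reduce asynchronously ...
    work = dist.all_reduce(grad_bucket, op=dist.ReduceOp.SUM, async_op=True)

    # ... and run computation that does not depend on the reduced gradients
    # while NCCL moves data, hiding communication latency behind compute.
    for _ in range(4):
        activations = torch.tanh(activations @ activations.T)

    work.wait()  # block only when the reduced gradients are actually needed
    grad_bucket /= dist.get_world_size()

    if rank == 0:
        print("overlapped step done")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Real frameworks bucket gradients and schedule these overlaps automatically; the point here is only the async_op=True pattern of starting communication early and waiting late.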

Key Topics:
- Megatron Core MoE in 2025: Architecture, features, performance optimizations, and best practices for DeepSeek-V3
- FP8 Mixed-Precision Training: Methodology and performance analysis (see the scaling sketch after this list)
- FSDP Architecture Design in Megatron-Core
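The FP8 topic revolves around choosing per-tensor scaling factors so that values fit the narrow E4M3 range. Below is a generic, hedged sketch of amax-based scaling in PyTorch; it assumes a build that exposes torch.float8_e4m3fn and is not the Transformer Engine or Megatron-Core recipe.

```python
# Generic per-tensor FP8 (E4M3) scaling sketch, not the Megatron-Core recipe.
# Assumes a PyTorch build that exposes torch.float8_e4m3fn (2.1+).
import torch

E4M3_MAX = 448.0  # largest finite magnitude representable in E4M3

def quantize_fp8(x: torch.Tensor):
    """Scale so the tensor's amax maps onto the FP8 range, then cast."""
    amax = x.abs().max().clamp(min=1e-12)
    scale = E4M3_MAX / amax
    x_fp8 = (x * scale).to(torch.float8_e4m3fn)
    return x_fp8, scale

def dequantize_fp8(x_fp8: torch.Tensor, scale: torch.Tensor):
    return x_fp8.to(torch.float32) / scale

x = torch.randn(4096) * 0.05          # activations are often small-magnitude
x_fp8, scale = quantize_fp8(x)
x_rec = dequantize_fp8(x_fp8, scale)

rel_err = (x - x_rec).abs().mean() / x.abs().mean()
print(f"scale={scale.item():.1f}  mean relative error={rel_err.item():.4f}")
```

Production recipes typically track amax over a history window (delayed scaling) rather than recomputing it at every cast; the talk covers the actual methodology and its measured performance impact.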
Watch Full Video:
https://space.bilibili.com/1320140761/lists/5626365?type=season
LLM Inference Special Session
As LLMs demonstrate powerful capabilities across applications, deploying them efficiently and cost-effectively has become a key industry focus. This session explores the latest advances in LLM inference, including:
- TensorRT-LLM’s development roadmap
- PyTorch workflow best practices
- Collaborative optimizations with DeepSeek and the open-source community
Key Topics:
- TensorRT-LLM Product Strategy Update
- TensorRT-LLM × PyTorch: A new paradigm for high-efficiency LLM inference (see the sketch after this list)
- Pushing DeepSeek’s Limits with TensorRT-LLM: Joint optimization with Tencent
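As a rough sketch of what the PyTorch-centric workflow looks like from user code (a hedged illustration of the high-level LLM API quickstart pattern, not an excerpt from the talk), the snippet below loads a placeholder Hugging Face checkpoint and generates text; the model name and sampling values are assumptions, and the exact API surface can differ between TensorRT-LLM releases.

```python
# Minimal sketch of the TensorRT-LLM high-level LLM API (PyTorch workflow).
# Model name and sampling values are placeholders; treat this as illustrative.
from tensorrt_llm import LLM, SamplingParams

def main():
    prompts = [
        "Explain the difference between throughput and latency in one sentence.",
        "What is a KV cache?",
    ]
    sampling = SamplingParams(temperature=0.8, top_p=0.95)

    # Hugging Face checkpoints can be passed directly; backend setup is
    # handled by TensorRT-LLM rather than by hand-written build scripts.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    for out in llm.generate(prompts, sampling):
        print(out.prompt)
        print(out.outputs[0].text)

if __name__ == "__main__":
    main()
```

The appeal of this workflow is staying in familiar Python/PyTorch code while the runtime handles engine and backend details; the session video covers the roadmap and the DeepSeek-specific optimizations in depth.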
Watch Full Video:
https://space.bilibili.com/1320140761/lists/5626365?type=season
Top Headlines
- German Computer Game Awards 2026: game association congratulates the winners
- Vietnam’s Mobile Games Rank 2nd Globally in 2025 Downloads, Export-Oriented Industry Rises Strongly
- gamescom biz launches enhanced all-in-one platform for 2026 attendees
- CEO of UK Screen Alliance Makes the Case for Employer Engagement to Protect the Screen Industry’s Talent Pipeline
- Housemarque Brings SAROS Insights to Nordic Game 2026
- Honor of Kings x Ne Zha 2 Crossover Officially Launched
- 2026 Gaming App Insights: Adjust Report Signals Shift to Precision & Retention
- German Games PR Veteran Marchsreiter Communications Announces Closure