NVIDIA AI Open Day Tech Talk Videos Now Available

LLM Training / Inference / CUDA Optimization Special Session
Download technical materials here:
https://scrm.nvidia.cn/mF/cms/none/FuceFYmFh5SGkhdaTzeC7N/e8WT66RnGUn6y5SwaaLx9F1

CUDA Optimization Core: Throughput • Latency

Session Overview

This session focuses on GPU CUDA optimization techniques for maximizing compute performance and memory bandwidth utilization while minimizing latency. By examining the co-evolution of GPU hardware and the CUDA programming model alongside core optimization principles, it demonstrates the synergy between hardware architecture and algorithm design. Practical examples built on high-performance frameworks such as CUTLASS help developers accelerate AI training and inference in key scenarios (e.g., DeepSeek V3/R1 LLM optimization) and unlock the GPU's full potential.

  • GPU Computing & Programming Model Evolution: Balancing throughput and latency in asynchronous computing (see the overlap sketch after this list)
  • GPU Memory System Evolution: Techniques for maximizing bandwidth utilization and latency hiding
  • CUDA Abstraction Evolution: From C++ templates to Python CUTLASS development
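
To make the throughput/latency balance in the first bullet concrete, here is a minimal PyTorch sketch (illustrative only, not taken from the talk; shapes and chunk counts are made up) that hides host-to-device copy latency behind GEMM compute by double-buffering on a dedicated CUDA stream:

```python
import torch

# Minimal latency-hiding sketch (illustrative, not from the talk):
# overlap host-to-device copies with GEMM compute by double-buffering
# input chunks on a dedicated copy stream.

assert torch.cuda.is_available()

copy_stream = torch.cuda.Stream()
weight = torch.randn(4096, 4096, device="cuda")

# Pinned host memory is required for truly asynchronous H2D copies.
chunks = [torch.randn(4096, 4096).pin_memory() for _ in range(8)]
bufs = [torch.empty(4096, 4096, device="cuda") for _ in range(2)]

with torch.cuda.stream(copy_stream):            # prefetch the first chunk
    bufs[0].copy_(chunks[0], non_blocking=True)

results = []
compute_stream = torch.cuda.current_stream()
for i in range(len(chunks)):
    compute_stream.wait_stream(copy_stream)     # chunk i must be resident
    if i + 1 < len(chunks):
        # The target buffer is free only once GEMM i-1 has read it.
        copy_stream.wait_stream(compute_stream)
        with torch.cuda.stream(copy_stream):    # copy i+1 overlaps GEMM i
            bufs[(i + 1) % 2].copy_(chunks[i + 1], non_blocking=True)
    results.append(bufs[i % 2] @ weight)        # GEMM i on the compute stream

torch.cuda.synchronize()
```

The same overlap idea, expressed at the CUDA C++ level with asynchronous copies and pipelines, is the kind of latency-hiding technique the session covers in depth.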

Watch Full Video:
https://space.bilibili.com/1320140761/lists/5626365?type=season

LLM Training Special Session

Large-scale MoE models like DeepSeek-V3 are driving a new wave in AI, presenting unprecedented challenges to existing frameworks. This talk dives into performance breakthroughs for fine-grained MoE models, covering innovative optimizations in Megatron-Core, including:

  • Memory-efficient management
  • Compute-communication overlap (sketched after this list)
  • Low-precision quantization
  • Parallel strategy optimization
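
As a rough illustration of the compute-communication overlap item above, the sketch below uses plain PyTorch gradient hooks to launch asynchronous all-reduces while the backward pass is still running. This is generic PyTorch, not Megatron-Core's actual implementation; it assumes an initialized NCCL process group and PyTorch 2.1+ for register_post_accumulate_grad_hook:

```python
import torch
import torch.distributed as dist

# Compute-communication overlap sketch (generic PyTorch, not the actual
# Megatron-Core implementation). Each parameter's gradient all-reduce is
# launched as soon as that gradient is ready, so NCCL communication runs
# while backward() is still computing earlier layers' gradients.
# Assumes dist.init_process_group("nccl", ...) was called, one GPU per rank.

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 1024)
).cuda()

pending = []

def reduce_grad_async(param):
    # async_op=True returns a work handle immediately; the all-reduce
    # overlaps with the remainder of the backward pass.
    pending.append(dist.all_reduce(param.grad, op=dist.ReduceOp.SUM,
                                   async_op=True))

for p in model.parameters():
    p.register_post_accumulate_grad_hook(reduce_grad_async)

x = torch.randn(32, 1024, device="cuda")
model(x).sum().backward()        # hooks fire during the backward pass

for work in pending:             # drain outstanding communication
    work.wait()
for p in model.parameters():     # convert the sum into an average
    p.grad /= dist.get_world_size()
```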

Key Topics:

  • Megatron-Core MoE in 2025: Architecture, features, performance optimizations, and best practices for DeepSeek-V3
  • FP8 Mixed-Precision Training: Methodology and performance analysis (see the scaling sketch after this list)
  • FSDP Architecture Design in Megatron-Core
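
The core arithmetic behind FP8 mixed-precision training is per-tensor scaling into the narrow E4M3 range. The sketch below shows only that arithmetic; real training recipes (e.g., in Transformer Engine) add delayed scaling, amax histories, and E5M2 for gradients. It assumes a PyTorch build with torch.float8_e4m3fn support:

```python
import torch

# Per-tensor FP8 (E4M3) quantization sketch: scale into the representable
# range, cast down, and rescale on the way back up. Illustrative only.

E4M3_MAX = 448.0                      # largest finite value in E4M3

def quantize_fp8(x: torch.Tensor):
    scale = E4M3_MAX / x.abs().max().clamp(min=1e-12)
    x_fp8 = (x * scale).clamp(-E4M3_MAX, E4M3_MAX).to(torch.float8_e4m3fn)
    return x_fp8, scale

def dequantize_fp8(x_fp8: torch.Tensor, scale: torch.Tensor):
    return x_fp8.to(torch.float32) / scale

x = torch.randn(4, 4) * 10
x_fp8, scale = quantize_fp8(x)
x_round_trip = dequantize_fp8(x_fp8, scale)
print((x - x_round_trip).abs().max())   # quantization error
```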

Watch Full Video:
https://space.bilibili.com/1320140761/lists/5626365?type=season

LLM Inference Special Session

As LLMs demonstrate powerful capabilities across applications, deploying them efficiently and cost-effectively has become a key industry focus. This session explores the latest advances in LLM inference, including:

  • TensorRT-LLM’s development roadmap
  • PyTorch workflow best practices (see the sketch after this list)
  • Collaborative optimizations with DeepSeek and the open-source community
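
As a taste of the PyTorch-centric workflow mentioned above, the snippet below uses TensorRT-LLM's high-level LLM API. The model name and sampling values are illustrative, and the exact API surface may differ across releases, so treat it as a sketch:

```python
from tensorrt_llm import LLM, SamplingParams

# Minimal TensorRT-LLM high-level API sketch (model and sampling values
# are illustrative; check the docs for your installed version).
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

prompts = ["What makes MoE inference hard to serve efficiently?"]
params = SamplingParams(max_tokens=128, temperature=0.7)

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```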

Key Topics:

  • TensorRT-LLM Product Strategy Update
  • TensorRT-LLM × PyTorch: A new paradigm for high-efficiency LLM inference
  • Pushing DeepSeek’s Limits with TensorRT-LLM: Joint optimization with Tencent

Watch Full Video:
https://space.bilibili.com/1320140761/lists/5626365?type=season
