Build Speech AI in Multiple Languages and Train Large Language Models with the Latest from Riva and NeMo Megatron

Major updates to Riva, an SDK for building speech AI applications, and a paid Riva Enterprise offering were announced at NVIDIA GTC 2022 last week. Several key updates to NeMo Megatron, a framework for training Large Language Models, were also announced.

Riva 2.0 general availability

Riva offers world-class accuracy for real-time automatic speech recognition (ASR) and text-to-speech (TTS) skills across multiple languages and can be deployed on-prem or in any cloud. Industry leaders such as Snap, T-Mobile, RingCentral, and Kore.ai use Riva in customer care center applications, transcription, and virtual assistants.

The latest Riva version includes:

ASR in multiple languages: English, Spanish, German, Russian, and Mandarin.
High-quality TTS voices customizable for unique voice fonts.
Domain-specific customization with TAO Toolkit or NVIDIA NeMo for unparalleled accuracy in accent, domain, and country-specific jargon.
Support to run in cloud, on-prem, and on embedded platforms.

Figure: NVIDIA Riva controllable text-to-speech makes it easy to adjust pitch and speed using SSML tags.

Try Riva automatic speech recognition on the Riva product page.
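
To make the SSML-based pitch and speed control more concrete, here is a minimal sketch that builds an SSML string using the standard <prosody> element. The helper name build_ssml is hypothetical, and the specific attribute values that Riva accepts are an assumption; check the Riva TTS documentation for the supported ranges before sending a string like this to a TTS service.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, pitch: str = "default", rate: str = "default") -> str:
    """Wrap plain text in SSML <speak>/<prosody> tags (hypothetical helper)."""
    # escape() protects &, <, > in the spoken text;
    # quoteattr() returns the attribute value already wrapped in quotes.
    return (
        "<speak>"
        f"<prosody pitch={quoteattr(pitch)} rate={quoteattr(rate)}>"
        f"{escape(text)}"
        "</prosody>"
        "</speak>"
    )


# "high" and "slow" are standard SSML prosody labels; whether a given
# TTS voice honors them is an assumption to verify against its docs.
ssml = build_ssml("Riva & NeMo were updated at GTC.", pitch="high", rate="slow")
print(ssml)
```

The resulting string is what would be passed as the text input of a synthesis request in place of plain text.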

Defined.ai has collaborated with NVIDIA to provide a smooth workflow for enterprises looking to purchase speech training and validation data across languages, domains, and recording types.

Riva Enterprise

NVIDIA also introduced Riva Enterprise, a paid offering for enterprises deploying Riva at scale with business-standard support from NVIDIA experts.

Benefits include:

Unlimited use of ASR and TTS services on any cloud and on-prem platforms.
Access to NVIDIA AI experts during local business hours for guidance on configurations and performance.
Long-term support, with control over maintenance and upgrade schedules.
Priority access to new releases and features.

Riva Enterprise is available as a free trial on NVIDIA LaunchPad for enterprises to evaluate and prototype their applications.

Riva Enterprise on LaunchPad includes guided labs to:

Interact with Real-Time Speech AI APIs.
Add Speech AI Capabilities to a Conversational AI Application.
Fine-Tune a Speech AI Pipeline on Custom Data for Higher Accuracy.

NeMo Megatron

NVIDIA announced new updates to NVIDIA NeMo Megatron, a framework for training large language models (LLMs) with up to trillions of parameters. Built on innovations from the Megatron paper, NeMo Megatron lets research institutions and enterprises train any LLM to convergence. It provides data preprocessing, parallelism (data, tensor, and pipeline), orchestration and scheduling, and auto-precision adaptation.
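
As a rough illustration of how the three kinds of parallelism fit together, the sketch below shows the usual arithmetic: the tensor-parallel and pipeline-parallel sizes multiply to give the number of GPUs holding one model replica, and the remaining factor of the total GPU count is the data-parallel size. The class and field names are hypothetical and are not NeMo Megatron configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical description of a training job's GPU layout."""
    world_size: int         # total number of GPUs in the job
    tensor_parallel: int    # GPUs that split each layer's weight matrices
    pipeline_parallel: int  # stages that the layer stack is cut into

    @property
    def model_parallel(self) -> int:
        # GPUs needed to hold one full copy of the model
        return self.tensor_parallel * self.pipeline_parallel

    @property
    def data_parallel(self) -> int:
        # Number of model replicas, each fed a different shard of the batch
        if self.world_size % self.model_parallel != 0:
            raise ValueError(
                "world_size must be divisible by "
                "tensor_parallel * pipeline_parallel"
            )
        return self.world_size // self.model_parallel


layout = ParallelLayout(world_size=64, tensor_parallel=8, pipeline_parallel=2)
print(layout.data_parallel)  # 64 // (8 * 2) = 4
```

With 64 GPUs, 8-way tensor parallelism, and 2 pipeline stages, each replica spans 16 GPUs and 4 replicas train in parallel on different data.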

It consists of thoroughly tested recipes, popular LLM architecture implementations, and necessary tools for organizations to quickly start their LLM journey.

AI Sweden, JD.com, Naver, and the University of Florida are early adopters of NVIDIA technologies for building large language models.

The latest version includes:

Hyperparameter tuning tool: automatically creates recipes based on customers' needs and infrastructure limitations.
Reference recipes for T5 and mT5 models.
Support for training LLMs in the cloud, starting with Azure.
Distributed data preprocessing scripts to shorten end-to-end training time.
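
The idea behind distributed data preprocessing can be sketched generically: split the input files across workers so that each one processes a disjoint share in parallel. The example below is not taken from the NeMo Megatron scripts; the function name, file names, and round-robin scheme are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def shard_for_worker(
    files: Sequence[str], worker_rank: int, num_workers: int
) -> List[str]:
    """Return the files one worker should preprocess (round-robin split)."""
    if not 0 <= worker_rank < num_workers:
        raise ValueError("worker_rank must be in [0, num_workers)")
    # Sort first so every worker sees the same order and shards never overlap.
    return sorted(files)[worker_rank::num_workers]


# Hypothetical corpus of 10 files divided among 4 workers.
files = [f"corpus_{i:02d}.jsonl" for i in range(10)]
shards = [shard_for_worker(files, rank, num_workers=4) for rank in range(4)]
for rank, shard in enumerate(shards):
    print(rank, shard)
```

Because each worker derives its share from its own rank, no coordination between workers is needed beyond agreeing on the file list.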

Learn more about interesting applications of LLMs and best practices to deploy them in the Natural Language Understanding in Practice: Lessons Learned from Successful Enterprise Deployments GTC session.

About the Authors

About Siddharth Sharma

Siddharth Sharma is a Senior Technical Marketing Manager for Accelerated Computing at NVIDIA. Before joining NVIDIA, Siddharth was a product marketing manager for Simulink and Stateflow at MathWorks, working closely with automotive and aerospace companies to adopt model-based design for creating control software.

About Gordana Neskovic

Gordana Neskovic is on the AI / DL product marketing team responsible for NVIDIA Maxine. Gordana has held various product marketing, data scientist, AI architect, and engineering roles at VMware, Wells Fargo, Pinterest, SFO-ITT, and KLA-Tencor before joining NVIDIA. She holds a Ph.D. from Santa Clara University and master’s and bachelor’s degrees in electrical engineering from the University of Belgrade, Serbia.

About Sirisha Rella

Sirisha Rella is a technical product marketing manager at NVIDIA focused on computer vision, speech, and language-based deep learning applications. Sirisha received her master’s degree in computer science from the University of Missouri-Kansas City and was a graduate research assistant at the National Science Foundation – Center for Big Learning.

Source: Siddharth Sharma, Gordana Neskovic, and Sirisha Rella/NVIDIA
