NVIDIA Speeds AI Language Training toward Real-Time Conversational AI


NVIDIA’s recent developments in AI model size and training speeds have advanced the language understanding of models sufficiently to allow businesses to engage more naturally with customers using real-time, conversational AI.

NVIDIA's AI platform has been used to train one of the most widely recognised AI language models, Bidirectional Encoder Representations from Transformers (BERT), in less than an hour, and to carry out AI inference in just over two milliseconds. This level of performance makes it possible for developers to use language comprehension in large-scale applications that they can make available to large groups of consumers around the world.

Basic conversational AI services have existed for several years. But until this point, it has been quite difficult for chatbots, intelligent personal assistants and search engines to operate with human-level comprehension, because extremely large AI models could not be tapped in real time. NVIDIA has addressed this problem by adding targeted optimisations to its AI platform - accelerating AI training and inference, and building one of the largest language models seen so far.

"Large language models are now making AI practical for natural language," said Bryan Catanzaro, vice president of Applied Deep Learning Research at NVIDIA. "They are helping us solve extremely difficult language problems, bringing us closer to the goal of true, conversational AI. Acceleration of these models allows organisations to create new services that can assist and entertain their customers in ways we hadn't considered before."

NVIDIA expects AI services powered by natural language understanding to grow at an increasing pace in the near future. Consequently, NVIDIA has fine-tuned its AI platform, resulting in three new performance records in natural language understanding.

In a training speed test running the large version of BERT, an NVIDIA DGX SuperPOD comprising 92 NVIDIA DGX-2H systems with 1,472 NVIDIA V100 GPUs cut the typical training time for BERT-Large from several days to 53 minutes. NVIDIA also trained BERT-Large on just one NVIDIA DGX-2 system in 2.8 days - demonstrating the GPUs' scalability for conversational AI.
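The quoted figures can be sanity-checked with a little arithmetic - 1,472 GPUs across 92 systems works out to 16 GPUs per DGX-2H node, and the 2.8-day single-system run implies a roughly 76x speed-up for the full pod:

```python
# Back-of-the-envelope check of the training figures quoted above.
num_systems = 92            # NVIDIA DGX-2H systems in the SuperPOD
total_gpus = 1472           # NVIDIA V100 GPUs across the pod
gpus_per_system = total_gpus // num_systems

superpod_minutes = 53                 # record BERT-Large training time
single_dgx2_minutes = 2.8 * 24 * 60   # 2.8 days on one DGX-2, in minutes

print(f"GPUs per DGX-2H system: {gpus_per_system}")                       # 16
print(f"Single-system run: {single_dgx2_minutes:.0f} minutes")            # 4032
print(f"Pod speed-up over one DGX-2: "
      f"{single_dgx2_minutes / superpod_minutes:.0f}x")                   # 76x
```

The near-linear relationship between GPU count and training time is the scalability claim the paragraph above is making.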


In an inference speed test using NVIDIA T4 GPUs running NVIDIA TensorRT, NVIDIA performed inference with BERT-Base on the SQuAD question-answering dataset in only 2.2 milliseconds. This is well under the 10-millisecond processing threshold for many real-time applications, and is a marked improvement over the 40-plus milliseconds measured with highly optimised CPU code.
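A quick latency-budget calculation shows why the 2.2 ms figure matters: it leaves most of a 10 ms real-time budget for the rest of the application pipeline, and represents roughly an 18x speed-up over the cited CPU baseline (the budget split below is illustrative, using only the numbers quoted in the text):

```python
# Latency-budget sketch for the inference figures quoted above.
bert_latency_ms = 2.2     # BERT-Base inference on a T4 GPU with TensorRT
cpu_latency_ms = 40.0     # cited highly optimised CPU baseline
budget_ms = 10.0          # real-time threshold cited in the text

headroom_ms = budget_ms - bert_latency_ms
speedup = cpu_latency_ms / bert_latency_ms

print(f"Headroom left in a 10 ms budget: {headroom_ms:.1f} ms")   # 7.8 ms
print(f"Speed-up over CPU baseline: {speedup:.1f}x")              # 18.2x
```

On a CPU at 40+ ms, the model alone would blow the 10 ms budget four times over, which is why the CPU path was impractical for real-time services.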

With a focus on developers' ongoing need for larger models, NVIDIA Research built and trained one of the world's largest language models based on Transformer, the neural network architecture used for BERT itself and other natural language AI models. NVIDIA's custom model, with 8.3 billion parameters, is 24 times the size of BERT-Large.
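A rough parameter count shows why 8.3 billion parameters is about 24 times BERT-Large. The sketch below assumes the standard BERT-Large configuration (24 layers, hidden size 1,024, WordPiece vocabulary of 30,522) and omits the masked-language-model head, which is why it lands slightly under the commonly quoted ~340M figure:

```python
# Rough Transformer-encoder parameter estimate (a sketch; exact totals
# depend on implementation details such as the pre-training heads).
def encoder_params(layers: int, hidden: int,
                   vocab: int = 30522, max_pos: int = 512) -> int:
    per_layer = 12 * hidden**2 + 13 * hidden        # attention + FFN + biases/LN
    embeddings = (vocab + max_pos + 2) * hidden + 2 * hidden  # + embedding LN
    pooler = hidden**2 + hidden
    return layers * per_layer + embeddings + pooler

bert_large = encoder_params(layers=24, hidden=1024)
print(f"BERT-Large estimate: {bert_large / 1e6:.0f}M parameters")  # ~335M
print(f"8.3B / BERT-Large: {8.3e9 / bert_large:.1f}x")             # ~24.8x
```

The dominant term is the 12H² weights per layer, which is why parameter counts grow quadratically with hidden size - the main lever NVIDIA Research pulled to reach 8.3 billion parameters.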

Language Understanding in Action

Developers around the world are currently using NVIDIA's AI platform in their own language understanding research and to create new services. Early adopters include Microsoft and smaller, innovative companies building responsive language-based services.

Microsoft Bing is using its Azure AI platform with the NVIDIA platform to run BERT and produce more accurate search results. Rangan Majumder, group program manager at Microsoft Bing said, "Bing further optimised the inferencing of BERT using NVIDIA GPUs, part of Azure AI infrastructure, which led to the largest improvement in ranking search quality Bing deployed in the last year. We achieved two times the latency reduction and five times throughput improvement during inference using Azure NVIDIA GPUs compared with a CPU-based platform, enabling Bing to carry out more relevant, cost-effective, real-time searches for customers."

Several startups in NVIDIA's Inception program, including Clinc, Passage AI and Recordsure, are also using NVIDIA's AI platform to build conversational AI services for banks, car manufacturers, retailers, healthcare providers and travel and hospitality companies.

Clinc has made NVIDIA GPU-enabled conversational AI systems accessible to more than 30 million people globally through a customer roster including car manufacturers, healthcare organisations and some of the world's major financial institutions, including Barclays, USAA and Turkey's Isbank.  


"Clinc's leading AI platform understands complex questions and transforms them into actionable insights for brands and organisations," said Jason Mars, CEO of Clinc. "The performance of the NVIDIA AI platform has allowed us to push the boundaries of conversational AI and deliver revolutionary services that help our clients use technology to engage with their customers in powerful, more meaningful ways."

Developer Optimisations

NVIDIA has made the software optimisations used in these achievements in conversational AI available to developers:

NVIDIA GitHub BERT training code with PyTorch

NGC model scripts and checkpoints for TensorFlow

TensorRT optimized BERT Sample on GitHub

Faster Transformer: C++ API, TensorRT plugin, and TensorFlow OP

MXNet Gluon-NLP with AMP support for BERT (training and inference)

TensorRT optimized BERT Jupyter notebook on AI Hub

Megatron-LM: PyTorch code for training massive Transformer models