On December 18th, the annual NVIDIA GTC China conference opened in Suzhou. This year, NVIDIA founder and CEO Jensen Huang focused on four major themes: artificial intelligence (AI), automotive, gaming and HPC.
Jensen Huang said that this is the largest GTC China to date, with 6,100 attendees, a 250% increase in three years.
Jensen Huang announced a series of new NVIDIA products and partnerships. The highlights are as follows:
1. Baidu and Alibaba use the NVIDIA AI platform for their recommendation systems;
2. Launched TensorRT 7, the seventh generation of NVIDIA's inference optimization software, to further optimize real-time conversational AI. Inference latency on a T4 GPU is 1/10 of that on a CPU;
3. The NVIDIA AI inference platform has been widely adopted worldwide;
4. Launched a software-defined AV platform and Orin, a next-generation SoC for autonomous driving and robotics with 200 TOPS of computing power, scheduled to start production in 2022;
5. Open-sourced the NVIDIA DRIVE autonomous vehicle deep neural networks to the transportation industry and launched NVIDIA DRIVE pre-trained models on NGC;
6. Didi will use NVIDIA GPUs in the data center to train machine learning algorithms, and NVIDIA DRIVE to provide inference for its L4 self-driving cars;
7. Launched a new version of the NVIDIA Isaac software development kit (SDK), which provides updated AI perception and simulation capabilities for robots;
8. Announced six games supporting RTX technology;
9. Tencent partnered with NVIDIA to launch the START cloud gaming service, bringing the PC gaming experience to the cloud in China;
10. Announced that Rayvision (Fox Renderfarm), the largest cloud rendering platform in Asia, will be equipped with NVIDIA RTX GPUs, with the first batch of 5,000 RTX GPUs going live in 2020;
11. Released Omniverse, an open 3D design collaboration platform, for the architecture, engineering and construction (AEC) industry;
12. For genome sequencing, released NVIDIA Parabricks, a CUDA-accelerated genome analysis toolkit.
AI: Powering Baidu and Alibaba recommendation systems and launching next-generation TensorRT software
Since Alex Krizhevsky used NVIDIA GPUs to win the ImageNet competition in 2012, NVIDIA has improved training performance 300-fold in five years.
By combining the Volta Tensor Core GPU, chip-on-wafer packaging, HBM 3D stacked memory, NVLink and DGX systems, NVIDIA is helping to advance AI research.
AI will scale from the cloud to the edge. NVIDIA is building a platform for each of the following use cases: DGX for training, HGX for hyperscale clouds, EGX for the edge, and AGX for autonomous systems.
1. Baidu and Alibaba recommendation systems use NVIDIA GPUs
Jensen Huang said that the recommendation system is one of the most important machine learning models on the Internet.
Without recommendation systems, people could not find what they need among hundreds of millions of web search results, billions of Taobao products, billions of TikTok short videos, and countless news articles, tweets and photos.
Deep learning enables automatic feature learning and supports unstructured content data, while GPU acceleration reduces latency and increases throughput.
Generally speaking, building a recommendation system poses two major challenges: the complex model processing required by massive amounts of data, and the real-time requirement that users see recommendations immediately.
To address these challenges, Baidu proposed AI-Box, a solution for training advanced large-scale recommendation systems.
Baidu AI-Box uses a Wide & Deep architecture. It relies on the NVIDIA AI platform to train on terabytes of data with NVIDIA GPUs. It is faster than CPU-based training, costs only 1/10 as much, and supports larger-scale models.
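A Wide & Deep model adds the output of a linear "wide" component over sparse cross features to that of a "deep" component built from embeddings and a multilayer perceptron, and passes the sum through a sigmoid. The toy forward pass below illustrates that structure in pure Python. It is not Baidu's AI-Box code; the class name, sizes, random weights and feature IDs are hypothetical, and no training is performed.

```python
import math
import random
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class WideAndDeep:
    """Toy Wide & Deep click-probability model with random (untrained) weights."""

    def __init__(self, n_cross_features: int, n_items: int,
                 emb_dim: int = 4, hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)

        def init() -> float:
            return rng.uniform(-0.1, 0.1)

        # Wide part: one weight per sparse cross feature (memorizes co-occurrences).
        self.wide_w = [init() for _ in range(n_cross_features)]
        # Deep part: item embedding table plus a one-hidden-layer MLP (generalizes).
        self.emb = [[init() for _ in range(emb_dim)] for _ in range(n_items)]
        self.w_hidden = [[init() for _ in range(emb_dim)] for _ in range(hidden)]
        self.w_out = [init() for _ in range(hidden)]
        self.bias = 0.0

    def predict(self, active_cross_features: Sequence[int], item_ids: Sequence[int]) -> float:
        # Wide logit: sum the weights of the active (one-hot) cross features.
        wide_logit = sum(self.wide_w[i] for i in active_cross_features)
        # Deep logit: mean-pool the item embeddings (item_ids must be non-empty), then the MLP.
        pooled = [sum(dim) / len(item_ids) for dim in zip(*(self.emb[i] for i in item_ids))]
        hidden = [max(0.0, sum(w * x for w, x in zip(row, pooled))) for row in self.w_hidden]
        deep_logit = sum(w * h for w, h in zip(self.w_out, hidden))
        # The two logits are summed before a single sigmoid.
        return sigmoid(wide_logit + deep_logit + self.bias)
```

For example, `WideAndDeep(100, 50).predict([3, 17], [5, 9, 12])` returns a click probability between 0 and 1. In a real Wide & Deep model both components are trained jointly on the same loss, which is what lets the wide part memorize and the deep part generalize at the same time.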
Alibaba's recommendation system likewise runs on the NVIDIA AI platform.
On this year's Singles' Day, Alibaba sold more than 38 billion US dollars of goods, or 268.4 billion yuan, in a single day. About 2 billion products were listed on the site, 500 million users were shopping, and the system handled one billion recommendation requests.
If a user spent one second viewing each product, it would take 32 years to see them all.
To cope with this, Alibaba uses NVIDIA T4 GPUs to run its recommendation system, so that every time a user clicks on a product, related products are recommended.
On CPUs, throughput was only 3 QPS (queries per second); NVIDIA GPUs raised it to 780 QPS.
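Expressed as a single ratio, the reported throughput figures correspond to a 260-fold speedup. This is plain arithmetic on the two numbers above; the article does not state the model or batch size behind the measurement.

```latex
\text{speedup} = \frac{780\ \text{QPS (GPU)}}{3\ \text{QPS (CPU)}} = 260
```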
2. Introduced TensorRT 7, the seventh-generation inference optimization software
On site, Jensen Huang announced the official launch of TensorRT 7, the seventh-generation inference optimization compiler, which supports RNNs, Transformers and CNNs.
TensorRT is NVIDIA's acceleration software for neural network inference. It greatly improves performance by optimizing trained AI models.
TensorRT 5, released at last year's GTC China, supported only CNNs and about 30 kinds of transformations. TensorRT 7 adds extensive optimizations for Transformers and RNNs, runs them efficiently with less memory, and supports more than 1,000 computational transformations and optimizations.
TensorRT 7 can fuse operations both horizontally and vertically. It can automatically generate code for the large number of RNN configurations that developers design, fuse LSTM cells point by point and even across multiple time steps, and automatically use low-precision inference wherever possible.
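Low-precision inference stores tensors as 8-bit integers plus a floating-point scale. The sketch below shows plain symmetric per-tensor INT8 quantization and an integer dot product. It is a generic illustration with hypothetical function names; TensorRT's own calibration and kernel selection are considerably more involved.

```python
from typing import List, Sequence, Tuple


def quantize_int8(values: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization: x is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: Sequence[int], scale: float) -> List[float]:
    return [qi * scale for qi in q]


def int8_dot(qa: Sequence[int], sa: float, qb: Sequence[int], sb: float) -> float:
    # Integer multiply-accumulate (what INT8 hardware does), then one rescale to float.
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * sa * sb
```

With `a = [0.5, -1.2, 3.3]` and `b = [1.0, 0.25, -0.75]`, the exact dot product is -2.275 and the INT8 result is within a few thousandths of it. The design point is that all the heavy arithmetic happens on small integers, and floating point is needed only for the final rescale.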
In addition, TensorRT 7 introduces a kernel generation feature that can produce an optimized kernel for any RNN.
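Point-by-point fusion can be illustrated with the elementwise half of an LSTM cell. Assuming the four gate pre-activations have already been produced by the preceding matrix multiplications, the unfused version below makes one pass per elementwise operation and materializes a temporary each time, while the fused version computes the same result in a single pass. This is a pure-Python illustration of the idea with hypothetical function names, not TensorRT's generated CUDA kernel.

```python
import math
import operator
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_pointwise_unfused(gi, gf, gg, go, c_prev) -> Tuple[List[float], List[float], int]:
    """One pass ("kernel") per elementwise op; every op writes a temporary array."""
    passes = 0

    def ew(fn, *arrays: Sequence[float]) -> List[float]:
        nonlocal passes
        passes += 1
        return [fn(*xs) for xs in zip(*arrays)]

    i, f = ew(sigmoid, gi), ew(sigmoid, gf)      # input and forget gates
    g, o = ew(math.tanh, gg), ew(sigmoid, go)    # candidate and output gate
    c = ew(operator.add, ew(operator.mul, f, c_prev), ew(operator.mul, i, g))
    h = ew(operator.mul, o, ew(math.tanh, c))
    return h, c, passes


def lstm_pointwise_fused(gi, gf, gg, go, c_prev) -> Tuple[List[float], List[float], int]:
    """All elementwise ops in one pass: each element is read once and written once."""
    h, c = [], []
    for a, b, d, e, cp in zip(gi, gf, gg, go, c_prev):
        c_new = sigmoid(b) * cp + sigmoid(a) * math.tanh(d)
        c.append(c_new)
        h.append(sigmoid(e) * math.tanh(c_new))
    return h, c, 1
```

Both functions return the same hidden and cell states, but the unfused path takes 9 passes over memory versus 1. On a GPU, each pass is a kernel launch plus a round trip through memory, which is the overhead that fusion removes.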
Conversational AI is a typical showcase for TensorRT 7's capabilities.
It is a very complex task. Suppose a user speaks a sentence in English and wants it translated into Chinese: the system must convert the spoken English into text, understand the text, translate it into the target language, and then synthesize the translated text into speech.
An end-to-end conversational AI pipeline may consist of 20 or 30 models, using a variety of structures such as CNNs, RNNs, Transformers and autoencoders, along with NLP models.
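To make the chaining concrete, here is a minimal sketch of a sequential speech-translation pipeline. The three stage functions (`speech_to_text`, `translate_en_zh`, `text_to_speech`) are hypothetical stubs standing in for what would really be one or more neural networks each; the point is only that stages run in order, so end-to-end latency is the sum of the per-stage latencies.

```python
import time
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[object], object]]


def speech_to_text(audio: str) -> str:
    # Placeholder ASR: pretend the "audio" is already a transcript.
    return audio.lower()


def translate_en_zh(text: str) -> str:
    # Placeholder machine translation: a tiny word-for-word dictionary.
    lexicon = {"hello": "你好", "world": "世界"}
    return "".join(lexicon.get(word, word) for word in text.split())


def text_to_speech(text: str) -> List[str]:
    # Placeholder TTS: one "audio frame" label per character.
    return [f"frame:{ch}" for ch in text]


def run_pipeline(stages: List[Stage], request: object):
    """Run stages in order; return the final output and per-stage latency in seconds."""
    timings = []
    data = request
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        timings.append((name, time.perf_counter() - start))
    return data, timings


PIPELINE: List[Stage] = [("asr", speech_to_text),
                         ("translate", translate_en_zh),
                         ("tts", text_to_speech)]
```

Calling `run_pipeline(PIPELINE, "Hello World")` yields four audio-frame labels for "你好世界" plus one timing per stage. Because every request passes through every stage, a single slow model dominates the total, which is why the whole chain, not just one network, has to be accelerated.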
For conversational AI inference, latency on a CPU is 3 seconds; with TensorRT 7 on a T4 GPU it is only 0.3 seconds, 10 times faster.
3. The NVIDIA AI platform is widely used
In addition, Kuaishou, Meituan and other Internet companies use the NVIDIA AI platform for deep recommendation systems to improve click-through rates, reduce latency, increase throughput, and better understand and meet user needs.
For example, when Meituan users look for a restaurant or hotel, everything starts with a search.
Conversational AI requires programmability, a rich software stack and low latency on the GPU. The NVIDIA AI platform, which covers these models, will support intelligent cloud services.
NVIDIA EGX is an all-in-one AI cloud built for edge AI applications. It is designed for streaming AI applications, Kubernetes container orchestration, and security of data both in motion and at rest, and it connects to all IoT clouds.
For example, Walmart uses it for smart checkout, the US Postal Service sorts mail with computer vision on EGX, and Ericsson will run 5G vRAN and AI IoT on EGX servers.
Automotive: Launching a new generation of automotive SoC with 200 TOPS of computing power
NVIDIA DRIVE is an end-to-end autonomous vehicle (AV) platform. It is software-defined rather than built on fixed-function chips, enabling large numbers of developers to collaborate in a continuous integration and continuous delivery (CI/CD) development model.
Jensen Huang said NVIDIA would open-source the deep neural networks used in NVIDIA DRIVE self-driving cars to the transportation industry through the NGC container registry.
1. Orin, the next-generation autonomous driving processor, has 7 times the computing power of Xavier
NVIDIA released NVIDIA DRIVE AGX Orin, a new generation of SoC for autonomous driving and robotics that meets system safety standards such as ISO 26262 ASIL-D. It will come in a range of configurations based on a single architecture and is scheduled to begin production in 2022.
Orin is the product of four years of work by the NVIDIA team. It processes data from multiple high-speed sensors, perceives the environment, builds a model of its surroundings and localizes itself within it, and plans appropriate actions toward specific goals.
It uses an 8-core 64-bit Arm Hercules CPU, contains 17 billion transistors, and adds new deep learning and computer vision accelerators. Its performance reaches 200 TOPS, almost 7 times that of the previous generation (Xavier).
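The "almost 7 times" comparison can be checked against Xavier's commonly cited rating of 30 TOPS, a figure assumed here because the article itself does not give it:

```latex
\frac{200\ \text{TOPS (Orin)}}{30\ \text{TOPS (Xavier)}} \approx 6.7
```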
It is easy to program, is supported by rich tools and software libraries, and has new functional safety features, such as CPU and GPU lockstep operation, that improve fault tolerance.
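In general, lockstep operation means two redundant units execute the same work while a checker compares their outputs, so a fault in one unit is detected instead of silently propagating. The sketch below assumes two hypothetical channel functions and compares results once per step; real hardware lockstep compares at a much finer granularity, and this is not Orin's implementation.

```python
from typing import Callable, List, Sequence


class LockstepFault(Exception):
    """Raised when the two redundant channels disagree."""


def run_lockstep(channel_a: Callable[[float], float],
                 channel_b: Callable[[float], float],
                 inputs: Sequence[float]) -> List[float]:
    """Feed every input to both channels and accept a result only if they agree."""
    outputs = []
    for step, x in enumerate(inputs):
        a, b = channel_a(x), channel_b(x)
        if a != b:
            # Divergence means one channel is faulty; stop rather than act on bad data.
            raise LockstepFault(f"mismatch at step {step}: {a!r} != {b!r}")
        outputs.append(a)
    return outputs
```

With two healthy copies of `lambda x: 2 * x + 1`, `run_lockstep` returns `[3, 5, 7]` for inputs `[1, 2, 3]`; if one copy is corrupted on the second input, it raises `LockstepFault` at step 1. In a vehicle, a detected mismatch would typically trigger a transition to a safe state; here it simply raises.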
The Orin family scales from L2 to L5 and is compatible with Xavier, so existing software can be fully reused and developers can invest once in products that span multiple generations.
A new addition is a low-cost version for OEMs that want a single-camera L2 AV system while still using the same software stack across their entire AV product line.
Beyond the chips, NVIDIA's platform and software technologies can also be applied in vehicles, helping customers customize applications and further improve product performance.
2. Introduced NVIDIA DRIVE pre-trained models
Jensen Huang also announced the launch of NVIDIA DRIVE pre-trained models on NGC.
Safe autonomous driving requires many AI models, with diverse and redundant algorithms.
NVIDIA has developed advanced perception models for detection, classification, tracking and trajectory prediction, as well as models for localization, planning and mapping.
These pre-trained models can be downloaded from NGC after registration.
3. Didi selects NVIDIA autonomous driving and cloud infrastructure
Didi Chuxing will use NVIDIA GPUs and other technologies to develop autonomous driving and cloud computing solutions.
Didi will use NVIDIA GPUs in the data center to train machine learning algorithms, and NVIDIA DRIVE to provide inference for its L4 self-driving cars.
In August this year, Didi spun off its autonomous driving unit as an independent company and began broad cooperation with partners across the industry chain.
As part of Didi's autonomous driving AI processing, NVIDIA DRIVE uses multiple deep neural networks to fuse data from various sensors (cameras, lidar, radar, etc.), building a 360-degree understanding of the car's surroundings and planning a safe driving path.
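NVIDIA DRIVE fuses sensor data with deep neural networks. As a much simpler stand-in that shows why combining redundant sensors helps, the sketch below merges independent distance estimates by inverse-variance weighting, a classic fusion rule. The function name and the readings are hypothetical illustrative values, not DRIVE data.

```python
from typing import List, Tuple


def fuse_estimates(readings: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Inverse-variance weighting of independent (value, variance) measurements.

    Returns the fused value and its variance; more certain sensors get more weight.
    """
    weights = [1.0 / variance for _, variance in readings]
    total = sum(weights)
    fused = sum(w * value for w, (value, _) in zip(weights, readings)) / total
    return fused, 1.0 / total
```

With hypothetical readings from a camera (20.0 m, variance 4.0), lidar (19.0 m, variance 0.25) and radar (19.5 m, variance 1.0), the fused distance is about 19.14 m with variance about 0.19. The fused variance is always smaller than that of the best single sensor, which is the quantitative reason redundant cameras, lidar and radar increase confidence.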
To train safer and more efficient deep neural networks, Didi will use NVIDIA GPU data center servers.
Didi Cloud will adopt a new vGPU licensing model, aiming to offer users a better experience, richer application scenarios, and more efficient, innovative and flexible GPU cloud computing services.
4. Released the NVIDIA Isaac robotics SDK
For robotics, Jensen Huang announced the new NVIDIA Isaac robotics SDK, which greatly speeds up robot development and testing. Through simulation, robots gain AI-driven perception and training capabilities and can be tested and validated in a wide range of environments and situations, which saves costs.
The Isaac SDK includes the Isaac Robotics Engine (an application framework), Isaac GEMs (pre-built deep neural network models, algorithms, libraries, drivers and APIs), a reference application for indoor logistics, and Isaac Sim for training robots. The resulting software can be deployed on real robots operating in the real world.
These include camera-based perception deep neural networks for object detection, free-space segmentation, 3D pose estimation and 2D human pose estimation.
Object detection in the new SDK has been updated with a ResNet-based deep neural network that can be trained with NVIDIA's transfer learning toolkit, making it easier to add new objects to detect and to train new models.
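Transfer learning reuses a network trained on a large dataset and retrains only part of it for new objects. The sketch keeps a hypothetical "pretrained" feature extractor frozen and fits only a new logistic-regression output head on a handful of toy examples. It illustrates the general technique, not the API of NVIDIA's toolkit; `frozen_backbone`, `train_head` and the data are assumptions.

```python
import math
from typing import List, Sequence


def frozen_backbone(x: Sequence[float]) -> List[float]:
    # Stand-in for a pretrained trunk (e.g. a ResNet); its weights are never updated.
    return [x[0], x[1], x[0] * x[1], 1.0]  # last entry acts as a bias feature


def train_head(data, labels, backbone, epochs: int = 500, lr: float = 0.5) -> List[float]:
    """Fit only a new logistic-regression head on top of frozen features."""
    feats = [backbone(x) for x in data]  # computed once, since the backbone is frozen
    w = [0.0] * len(feats[0])
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = 1.0 / (1.0 + math.exp(-sum(wi * fi for wi, fi in zip(w, f))))
            # Stochastic gradient step on the head weights only.
            w = [wi - lr * (p - y) * fi for wi, fi in zip(w, f)]
    return w


def predict(w: Sequence[float], backbone, x: Sequence[float]) -> int:
    z = sum(wi * fi for wi, fi in zip(w, backbone(x)))
    return 1 if z > 0 else 0
```

Because only the small head is trained, a few labeled examples of a new object class can be enough, which is the practical appeal of retraining a pretrained detector instead of starting from scratch.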
In addition, the SDK supports multi-robot simulation. Developers can place multiple robots in a simulated environment for testing, and each robot can run its own instance of the Isaac navigation stack while moving in the shared virtual environment.
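The multi-robot setup can be sketched as one shared world plus one independent planner object per robot. Everything here is hypothetical: the grid world, the `NavigationStack` class (a breadth-first-search planner standing in for the Isaac navigation stack), and the rule that robots treat each other as obstacles and replan every tick.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


class SharedWorld:
    """One virtual environment shared by every simulated robot."""

    def __init__(self, width: int, height: int, walls: Set[Cell]):
        self.width, self.height, self.walls = width, height, walls
        self.positions: Dict[str, Cell] = {}  # robot name -> current cell

    def is_free(self, cell: Cell, me: str) -> bool:
        x, y = cell
        if not (0 <= x < self.width and 0 <= y < self.height) or cell in self.walls:
            return False
        # Other robots count as obstacles; a robot never blocks itself.
        return all(pos != cell for name, pos in self.positions.items() if name != me)


class NavigationStack:
    """Hypothetical per-robot planner; each robot owns a separate instance."""

    def __init__(self, name: str, goal: Cell):
        self.name, self.goal = name, goal

    def next_step(self, world: SharedWorld) -> Optional[Cell]:
        start = world.positions[self.name]
        if start == self.goal:
            return None
        # Breadth-first search over currently free cells, replanned every tick.
        parent: Dict[Cell, Optional[Cell]] = {start: None}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if cur == self.goal:
                break
            x, y = cur
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nxt not in parent and world.is_free(nxt, self.name):
                    parent[nxt] = cur
                    queue.append(nxt)
        if self.goal not in parent:
            return None  # no path this tick: wait in place
        # Walk back from the goal to find the first move.
        step = self.goal
        while parent[step] != start:
            step = parent[step]
        return step


def simulate(world: SharedWorld, robots: List[NavigationStack], max_ticks: int = 50) -> int:
    """Move each robot one cell per tick; return the tick when all are at their goals, else max_ticks."""
    for tick in range(max_ticks):
        if all(world.positions[r.name] == r.goal for r in robots):
            return tick
        for r in robots:
            step = r.next_step(world)
            if step is not None:
                world.positions[r.name] = step
    return max_ticks
```

Keeping a separate `NavigationStack` per robot mirrors the idea that each simulated robot runs its own copy of the software, while the single `SharedWorld` is the only state they have in common.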
The new SDK also integrates NVIDIA DeepStream. Developers can deploy DeepStream on NVIDIA GPUs at the edge to process video streams for robotics applications.
Robot developers who already have their own code can connect their software stacks to the Isaac SDK and access Isaac functionality through the C API, which greatly reduces programming-language conversion work. C API access also lets developers use the Isaac SDK from other programming languages.
According to Jensen Huang, universities in China use Isaac to teach and research robotics.
5. NVIDIA's automotive ecosystem
NVIDIA has been in the automotive field for more than 10 years and has worked extensively with partners so that AI can better understand, and even "drive", vehicles.
Only after continuous simulation, testing and verification confirm that the system works do NVIDIA and its partners deploy it on real roads.
Whether they are truck makers, carmakers or taxi companies, customers can use this platform to customize software for specific vehicle models.
NVIDIA provides transfer learning tools that let users train models in-house and re-optimize them with TensorRT.
In addition, NVIDIA has developed a federated learning system, which is especially useful for industries that value data privacy.
Hospitals, laboratories and car companies can train neural networks locally and upload only the processed results to global servers, keeping the raw data on site to protect privacy.
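The article does not name the algorithm behind NVIDIA's federated learning system. Federated averaging (FedAvg), a standard approach, is swapped in here to show the idea: each participant trains on data that never leaves its site, and only model parameters are sent to a server that averages them, weighted by sample count. The linear model, learning rate and the two toy "hospital" datasets are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the client


def local_update(w: float, b: float, data: Dataset,
                 lr: float = 0.25, epochs: int = 20) -> Tuple[float, float]:
    """Fit y = w*x + b on one client's private data with gradient descent on MSE."""
    n = len(data)
    for _ in range(epochs):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * gw, b - lr * gb
    return w, b


def federated_averaging(clients: Sequence[Dataset], rounds: int = 50) -> Tuple[float, float]:
    """FedAvg: clients share only (w, b); the server averages them by sample count."""
    w, b = 0.0, 0.0
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        updates = [local_update(w, b, d) for d in clients]  # runs on-site at each client
        w = sum(len(d) * uw for d, (uw, _) in zip(clients, updates)) / total
        b = sum(len(d) * ub for d, (_, ub) in zip(clients, updates)) / total
    return w, b
```

If both toy sites hold points from y = 2x + 1, the averaged model converges to w close to 2 and b close to 1 even though the server never sees a single data point, only the two parameters from each round.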
Gaming: Launching the START cloud gaming service with Tencent
"Minecraft" is the world's best-selling video game and has recently reached 300 million registered users in China. NVIDIA and Microsoft jointly announced that "Minecraft" will support real-time ray tracing (RTX). NVIDIA RTX technology is already supported by several of the industry's most popular rendering platforms.
People want lighter and thinner gaming notebooks. For them, NVIDIA created the Max-Q design, which combines very high GPU performance with system-wide optimization so that powerful GPUs can be used in thin and light notebooks.
This year, China shipped more than 5 million gaming notebooks, a four-fold increase in five years. GeForce RTX Max-Q notebooks are the fastest-growing gaming platform.
In addition, Jensen Huang announced that Tencent and NVIDIA have launched the START cloud gaming service, bringing the PC gaming experience to the cloud in China.
NVIDIA GPUs will power Tencent's START cloud gaming platform. Tencent plans to scale the platform to millions of players, giving them the same gaming experience as a local game console even on devices whose own performance would be insufficient.
The NVIDIA RTX platform includes more than 40 products for content creators, ranging from Max-Q thin and light notebooks with the GeForce RTX 2060 to workstations with 4-way SLI Quadro RTX 8000 and servers with 8-way RTX 8000.
Jensen Huang announced that the Rayvision (Fox Renderfarm) cloud rendering platform will be equipped with NVIDIA RTX GPUs, with the first batch of 5,000 RTX GPUs going live in 2020.
Rayvision is the largest cloud rendering platform in Asia. It has rendered three of China's most popular films, "Wolf Warrior 2", "Nezha" (2019) and "The Wandering Earth", and more than 85% of Chinese film studios are its customers.
Jensen Huang also released Omniverse, an open 3D design collaboration platform for the architecture, engineering and construction (AEC) industry. It adds real-time collaboration to AEC workflows, both locally and in the cloud, and will support mainstream AEC applications such as Autodesk Revit, Trimble SketchUp and McNeel Rhino.
A demo of Omniverse AEC was shown on site: the China Resources Tower, designed by KPF architects, was rendered in real time on a server equipped with 8-way RTX 8000.
HPC: CUDA-accelerated genome analysis toolkit for genome sequencing
NVIDIA's HPC applications are also extensive. For example, NASA, which plans to send humans to Mars in 2030, runs hundreds of thousands of simulations of Mars landing scenarios on NVIDIA GPUs with the FUN3D fluid dynamics software, generating 150 TB of data.
For genome sequencing, Jensen Huang released NVIDIA Parabricks, a CUDA-accelerated genome analysis toolkit.
Parabricks can be used to discover variants. It produces results consistent with the industry-standard GATK Best Practices pipeline while running 30-50 times faster, and it provides DeepVariant, a tool that uses deep learning to detect genetic variants.
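At its simplest, finding a single-nucleotide variant means piling up aligned reads at each reference position and flagging positions where a non-reference base appears often enough. The deliberately naive caller below illustrates only that idea; GATK's pipeline and DeepVariant are far more sophisticated, and the function name and thresholds (`min_depth`, `min_alt_fraction`) are arbitrary assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Read = Tuple[int, str]  # (0-based alignment start on the reference, read bases)


def call_snvs(reference: str, reads: List[Read], min_depth: int = 3,
              min_alt_fraction: float = 0.3) -> List[Tuple[int, str, str, float]]:
    """Naive pileup SNV caller returning (position, ref_base, alt_base, alt_fraction)."""
    pileup: Dict[int, Counter] = {}
    for start, bases in reads:
        # Reads are assumed to be already aligned, ungapped, and within the reference.
        for offset, base in enumerate(bases):
            pileup.setdefault(start + offset, Counter())[base] += 1
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        ref_base = reference[pos]
        alts = [(n, base) for base, n in counts.items() if base != ref_base]
        if depth < min_depth or not alts:
            continue
        n_alt, alt_base = max(alts)  # most frequent non-reference base
        if n_alt / depth >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, n_alt / depth))
    return calls
```

On a whole genome this loop runs over billions of positions and tens of times more read bases, which is why a pipeline that takes many CPU hours benefits so much from GPU acceleration.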
Jensen Huang said he was pleased to announce that BGI has adopted Parabricks: with a few GPU servers, BGI can process genomes at the rate its sequencers generate data.
According to him, NVIDIA added two new mainstream applications to CUDA this year, 5G vRAN and genomics, which are supported by industry leaders such as Ericsson and BGI.
In addition, Jensen Huang again mentioned NVIDIA's previously announced collaboration with Arm: CUDA now accelerates Arm servers, and NVIDIA has announced its first Arm-based reference architecture, NVIDIA HPC for Arm, which supports a variety of Arm-based HPC server configurations.
TensorFlow now supports acceleration on Arm. With NVIDIA CUDA on Arm, TensorFlow can achieve world-class performance and scalability.
As 2019 comes to an end, NVIDIA used this conference not only to show off its AI, automotive, gaming and HPC capabilities but also to showcase its wide circle of partners across many fields.
Jensen Huang said that with the end of Moore's Law, GPU-accelerated computing, which is now widely recognized, will be the direction of future development.
NVIDIA accelerates both single-threaded and parallel multi-threaded processing and optimizes across the entire software stack, enabling multi-GPU and multi-node systems to deliver exceptional performance. NVIDIA has shipped 1.5 billion GPUs, all compatible with the same CUDA architecture.
As the biggest beneficiary of the deep learning boom so far, NVIDIA continues to build out its AI and autonomous driving ecosystem on high-performance software, hardware and systems. Finding core use cases to accelerate real-world deployment remains its current priority.