A New Speed Record for the AI Era
In the world of Artificial Intelligence, speed is everything. It determines how quickly your chatbot responds, how fast code is suggested, and ultimately, how powerful and user-friendly an AI service can be.
Today, Microsoft Azure, in partnership with NVIDIA, has just raised the bar—dramatically.
The new Azure ND GB300 v6 Virtual Machines, powered by the NVIDIA GB300 NVL72 rack-scale systems, have achieved an unprecedented performance milestone: 1,100,000 tokens per second (or 1.1 million tokens/s) on the widely used Llama2 70B AI model.
This isn’t just a big number; it’s a 27% speed boost over Azure’s previous record of 865,000 tokens per second!
What Is a “Token,” and Why Does Azure’s Million-Token Milestone Matter?
Think of a token as a piece of a word or sentence the AI brain uses to “think” and generate responses.
The ability to process 1.1 million of these tokens every second means a single Azure rack can now handle an immense volume of AI work with lightning speed.
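To make the idea concrete, here is a minimal tokenization sketch using the open-source tiktoken library as a stand-in (an illustrative assumption only; Llama 2 ships its own tokenizer, so its exact token counts will differ):

```python
# pip install tiktoken
import tiktoken

# Load a general-purpose BPE tokenizer. This is a stand-in for
# illustration; Llama 2 uses its own tokenizer with different counts.
enc = tiktoken.get_encoding("cl100k_base")

text = "Azure just set a new AI inference speed record."
tokens = enc.encode(text)

print(tokens)  # a list of integer token IDs
print(len(tokens), "tokens for", len(text.split()), "words")
```

Notice that tokens and words don’t map one-to-one: a token is often a word fragment, which is why AI throughput is measured in tokens per second rather than words per second.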
For the Tech Experts: The new GB300 v6 VMs deliver an astonishing 5× higher throughput per GPU than the previous-generation ND H100 v5 virtual machines. This kind of efficiency is a game-changer.
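For the arithmetic-minded, the headline figures check out. Here is a quick back-of-envelope sketch using only the numbers quoted in this post, plus the fact that a GB300 NVL72 rack houses 72 GPUs:

```python
# Back-of-envelope math using the figures quoted in this post.
new_record = 1_100_000   # tokens/s per GB300 NVL72 rack
old_record = 865_000     # tokens/s, Azure's previous record

speedup = new_record / old_record
print(f"Speed boost: {speedup:.2f}x (~{(speedup - 1) * 100:.0f}%)")  # ~27%

# An NVL72 rack contains 72 Blackwell GPUs, so per-GPU throughput is roughly:
gpus_per_rack = 72
print(f"Per-GPU throughput: ~{new_record / gpus_per_rack:,.0f} tokens/s")
```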
Explanation for a 5-Year-Old: The Super Speedy Computer
Imagine you have a super-smart robot brain that can talk and answer questions really, really fast. This super-smart brain uses words, and we count how many words it can say in just one second.
The people at Microsoft and NVIDIA built a new, bigger, and faster computer engine called Azure ND GB300 v6.
- Before: The fastest engine could say 865,000 words in one second.
- Now: The new engine can say 1,100,000 words in one second! That’s more than a million! 🤩
It’s like they made a race car that goes much, much faster than their old one. This makes the super-smart robot brain answer your questions and finish its homework much quicker!
The Impact: What This Means For Everyone
This achievement isn’t just for tech experts; it delivers significant benefits for the everyday user and business leader.
1. For the General Public (Faster, Smoother AI) 🌎
You will experience less waiting while the AI does more thinking. Any application built on these new Azure machines—from conversational AI assistants to powerful summarization tools—will feel faster, more responsive, and smoother in your daily life. It means a superior, near-instant user experience every time you interact with a major AI service.
2. For AI Product Managers (New Product Possibilities) 💡
This new performance curve radically changes the economic and technical feasibility of deploying AI features. Product Managers can now prioritize:
- True Real-Time Features: Launching high-demand, low-latency AI features that were previously too slow or too expensive (e.g., instant document analysis, live translation).
- Massive Scale & Lower Cost: The 5× higher throughput per GPU allows the same hardware to serve many more users simultaneously. This drastically lowers the operational cost per user (see the cost sketch just below this list), improving the product’s margin and enabling access to more powerful AI models for everyone.
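To see why throughput translates directly into unit economics, here is a hypothetical cost model. The hourly price below is an invented placeholder, not an actual Azure rate; only the throughput figure comes from this post:

```python
# Hypothetical serving-cost model. The hourly price is a made-up
# placeholder, NOT a real Azure quote.
rack_throughput = 1_100_000   # tokens/s (from the benchmark)
hourly_cost_usd = 500.0       # HYPOTHETICAL $/hour for the rack

tokens_per_hour = rack_throughput * 3600
cost_per_million_tokens = hourly_cost_usd / (tokens_per_hour / 1_000_000)
print(f"~${cost_per_million_tokens:.4f} per million tokens")

# A 5x throughput gain at the same hourly cost divides this figure by 5,
# which is why per-user serving cost drops so sharply.
```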
The Technical Edge
Azure achieved this through a combination of cutting-edge hardware and software optimization: the NVIDIA Blackwell architecture (which provides 50% more GPU memory than the previous generation), highly efficient FP4 precision, and the optimized NVIDIA TensorRT-LLM inference library.
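For intuition on what FP4 precision means, here is a minimal, self-contained sketch that rounds values onto the FP4 (E2M1) grid of representable numbers. This illustrates the concept only; it is not how TensorRT-LLM implements quantization internally:

```python
import numpy as np

# The representable magnitudes of the FP4 E2M1 format (sign handled
# separately): 0, 0.5, 1, 1.5, 2, 3, 4, 6 and their negatives.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray) -> np.ndarray:
    """Round each value to the nearest FP4 (E2M1) representable number.

    Conceptual illustration only; real FP4 inference also applies
    per-block scaling factors so weights fit this tiny range.
    """
    signs = np.sign(x)
    mags = np.abs(x)
    # Pick the nearest grid point for every element.
    idx = np.abs(mags[..., None] - FP4_GRID).argmin(axis=-1)
    return signs * FP4_GRID[idx]

weights = np.array([0.23, -1.7, 3.4, 5.9, -0.04])
print(quantize_fp4(weights))  # [ 0.  -1.5  3.   6.  -0. ]
```

Storing each weight in just 4 bits (instead of 16) shrinks memory traffic dramatically, which is a major reason per-GPU throughput climbs so high on Blackwell hardware.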
This achievement confirms that the performance required for large-scale, transformative AI is now available as a reliable, efficient utility, setting a new industry standard.

