- On Wednesday, Google shared details about one of its AI supercomputers, which it says outperforms competing Nvidia systems in speed and efficiency.
- While Nvidia dominates the market for AI model training and deployment with over 90% share, Google has been building its Tensor Processing Unit chips for AI since 2016, primarily for internal use.
- Google's AI supercomputer is built around the Tensor Processing Unit and can process large amounts of data faster than competing Nvidia systems.
Google shared details about one of its AI supercomputers on Wednesday, stating that it is more efficient and faster than competing Nvidia systems, as the demand for power-intensive machine learning models continues to drive growth in the tech industry.
Despite Nvidia's market dominance in AI model training and deployment, with a share of over 90%, Google has been developing and deploying its own AI chips, known as Tensor Processing Units or TPUs, since 2016.
Google has been a significant trailblazer in AI, and its researchers have made some of the field's most noteworthy advances over the past decade. Nevertheless, some suggest the company has lagged behind in commercializing its innovations. Internally, there is a sense of urgency to launch products and demonstrate that Google has not lost its edge, a "code red" situation within the company.
Training AI models and products that rely on Nvidia's A100 chips, such as Google's Bard or OpenAI's ChatGPT, requires a large number of computers and hundreds or thousands of chips working in tandem, with the machines running non-stop for weeks or even months.
Google revealed that it has built a system of more than 4,000 TPUs joined with custom components engineered to run and train AI models. The system has been running since 2020 and was used over more than 50 days to train Google's PaLM model, which competes with OpenAI's GPT model.
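To illustrate what "working in tandem" means in practice, here is a minimal, hypothetical sketch in JAX, the open-source framework Google develops for programming TPUs. This is not Google's actual training code; the toy linear model, learning rate, and array shapes are invented for illustration. The idea is that each chip computes gradients on its own slice of the batch, then averages them with every other chip so all replicas stay in step:

```python
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    # Hypothetical linear model: mean squared error on one device's data slice.
    inputs, targets = batch
    preds = inputs @ params
    return jnp.mean((preds - targets) ** 2)

def train_step(params, batch):
    grads = jax.grad(loss_fn)(params, batch)
    # Average gradients across every chip so all replicas stay identical.
    grads = jax.lax.pmean(grads, axis_name="devices")
    return params - 0.01 * grads  # one SGD update

# pmap runs the step on every available accelerator in parallel.
p_train_step = jax.pmap(train_step, axis_name="devices")

n = jax.local_device_count()
params = jnp.broadcast_to(jnp.zeros(8), (n, 8))  # one parameter copy per chip
inputs = jnp.ones((n, 4, 8))                     # each chip gets its own batch slice
targets = jnp.ones((n, 4))
params = p_train_step(params, (inputs, targets))
```

Scaled to pods of thousands of chips, and combined with sharding the model itself across devices, this data-parallel pattern is what keeps a weeks-long training run synchronized.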
Google's researchers wrote that their TPU-based supercomputer, TPU v4, "is 1.2x-1.7x faster and uses 1.3x-1.9x less power than the Nvidia A100."
"Given its performance, scalability, and availability, TPU v4 supercomputers are the backbone of large language models," added the researchers.
The Google researchers explained that they did not compare TPU v4 against Nvidia's latest flagship AI chip, the H100, because the H100 came to market after Google's chip and employs more advanced manufacturing technology.
Results and rankings from an industry-wide AI chip test called MLPerf were also released on Wednesday, and Nvidia CEO Jensen Huang said that the company's most recent chip, the H100, performed significantly faster than the previous generation.
In a blog post, Huang wrote that "MLPerf 3.0 shows that Hopper delivers four times the performance of A100" and that "to reach the next stage of Generative AI, new AI infrastructure is required to train Large Language Models with high energy efficiency."
AI's heavy computing requirements are costly, so many industry players are focused on developing new chips, components such as optical connections, or software techniques that reduce the amount of computing power needed.
Cloud providers such as Google, Microsoft, and Amazon also benefit from AI's computing requirements, since they can rent out processing time by the hour and offer credits or computing time to startups to build relationships. (Google's cloud also sells computing time on Nvidia chips.) For instance, Midjourney, an AI image generator, was trained on Google's TPU chips.