Elon Musk plans to expand the Colossus AI supercomputer tenfold


Elon Musk’s artificial intelligence start-up xAI has pledged to expand its Colossus supercomputer tenfold to include more than a million graphics processing units, in a bid to leapfrog rivals such as Google, OpenAI and Anthropic.

Colossus, built in just three months earlier this year, is believed to be the world’s largest supercomputer, running a cluster of more than 100,000 interconnected Nvidia GPUs. The chips are used to train Musk’s chatbot Grok, which is less advanced and has fewer users than market leaders such as OpenAI’s ChatGPT and Google’s Gemini.

Work to increase the size of the Memphis, Tennessee, facility has already begun, the Greater Memphis Chamber said in a statement on Wednesday. Nvidia, Dell and Super Micro Computer would also establish operations in Memphis to support the expansion, the chamber of commerce said, adding that it was setting up an “xAI special operations team” to “provide a 24/7 concierge service to the company”.

The cost of acquiring that many GPUs would be significant. The latest generation of Nvidia GPUs typically costs tens of thousands of dollars per chip, although older versions are cheaper. Musk’s planned expansion of Colossus would therefore likely require an investment of tens of billions of dollars, on top of the high costs of building, powering and cooling the massive servers on which the chips would sit. xAI has raised about $11 billion in capital from investors this year.
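For a rough sense of scale, a minimal back-of-envelope sketch of the GPU bill implied by those figures is below; the per-chip prices are illustrative assumptions rather than quoted prices from Nvidia or xAI.

```python
# Back-of-envelope estimate of the GPU hardware bill alone, based on the
# article's rough figures. Per-chip prices are illustrative assumptions,
# not quoted prices from Nvidia or xAI.

GPU_COUNT = 1_000_000                   # planned cluster size: "more than a million" GPUs
PRICE_LOW, PRICE_HIGH = 25_000, 40_000  # assumed cost per latest-generation Nvidia GPU, in dollars

low = GPU_COUNT * PRICE_LOW
high = GPU_COUNT * PRICE_HIGH

print(f"GPU hardware alone: ${low / 1e9:.0f}bn to ${high / 1e9:.0f}bn")
# Roughly $25bn to $40bn before the cost of buildings, power and cooling,
# consistent with the article's "tens of billions of dollars".
```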

AI companies are scrambling to secure GPUs and access to data centers to supply the computing power needed to train and run their frontier large language models.

OpenAI, maker of ChatGPT, has a nearly $14 billion partnership with Microsoft that includes credits for computing power. Anthropic, maker of the Claude chatbot, has received $8 billion in investment from Amazon, which is set to give it access to a new cluster of more than 100,000 of Amazon’s own specialized AI chips.

Instead of forging partnerships, Musk, the world’s richest man, has used his power and influence in the tech sector to build his own supercomputing capabilities, though he’s playing catch-up after founding xAI just over a year ago. The trajectory has been steep – the start-up is valued at $45 billion and recently raised another $5 billion.

Musk is in fierce competition with OpenAI, which he helped found with Sam Altman, among others, in 2015. The pair later fell out and Musk is now suing OpenAI, seeking to block its conversion from a nonprofit into a for-profit company.

An xAI investor said the speed with which Musk created Colossus was a “feather in the cap” of the AI company, despite its limited commercial product offerings. “He built the world’s most powerful supercomputer in three months,” the investor said.

Jensen Huang, Nvidia’s CEO, said in October that “there’s only one person in the world who can do this.” Huang called Colossus “easily the fastest supercomputer on the planet as a single cluster,” and said a data center of this size would typically take three years to build.

The Colossus project has caused controversy because of the speed with which it was built. Some have accused xAI of circumventing building-permit requirements and criticized the demands the facility places on the regional power grid.

“We don’t just lead from the front; we’re accelerating progress at an unprecedented pace while ensuring grid stability using Megapack technology,” Brent Mayo, xAI’s senior manager of site building and infrastructure, said at the Memphis event, according to a statement.


