How did DeepSeek build its artificial intelligence with less money?

Last month, U.S. financial markets tumbled after a Chinese start-up called DeepSeek said it had built one of the world's most powerful artificial intelligence systems using far fewer computer chips than many experts thought possible.

A.I. companies typically train their chatbots using supercomputers packed with 16,000 specialized chips or more. But DeepSeek said it needed only about 2,000.

As DeepSeek's engineers detailed in a research paper published just after Christmas, the start-up used several technological tricks to significantly reduce the cost of building its system. Its engineers needed only about $6 million in raw computing power, roughly a tenth of what Meta spent building its latest A.I. technology.

What exactly did DeepSeek do? Here is a guide.

The leading A.I. technologies are based on what scientists call neural networks, mathematical systems that learn their skills by analyzing enormous amounts of data.

The most powerful systems spend months analyzing just about all the English text on the internet, as well as many images, sounds and other multimedia. That requires enormous amounts of computing power.

About 15 years ago, A.I. researchers realized that specialized computer chips called graphics processing units, or GPUs, were an effective way to do this kind of data analysis. Companies like the Silicon Valley chipmaker Nvidia originally designed these chips to render graphics for computer video games. But GPUs also had a knack for running the math that powers neural networks.

As companies packed more GPUs into their computer data centers, their A.I. systems could analyze more data.

But the best GPUs cost around $40,000, and they need huge amounts of electricity. Sending the data between chips can use more electrical power than running the chips themselves.

So how did DeepSeek cut costs? It did many things. Most notably, it embraced a method called "mixture of experts."

Companies usually created a single neural network that learned all the patterns in all the data on the internet. This was expensive, because it required enormous amounts of data to travel between GPU chips.

If one chip was learning how to write a poem and another was learning how to write a computer program, they still needed to talk to each other, even though there was little overlap between poetry and programming.

With the mixture-of-experts method, researchers tried to solve this problem by splitting the system into many neural networks: one for poetry, one for computer programming, one for biology, one for physics and so on. There might be 100 of these smaller "expert" systems. Each expert could concentrate on its particular field.

Many companies have struggled with this method, but DeepSeek was able to do it well. Its trick was to pair those smaller "expert" systems with a "generalist" system.

The experts still needed to trade some information with one another, and the generalist, which had a decent but not detailed understanding of each subject, could help coordinate the interactions between the experts.

It is a bit like an editor who oversees a newsroom filled with specialist reporters.
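To make the idea concrete, here is a minimal Python sketch of how a mixture-of-experts layer can route work: a small "gating" function plays the generalist, scoring each input, and only the chosen expert does the heavy math. The experts and gate below are toy stand-ins for illustration, not DeepSeek's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "experts": each is just a small weight matrix specializing in one kind of input.
NUM_EXPERTS, DIM = 4, 8
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]

# Toy "generalist" gate: scores how relevant each expert is for a given input.
gate_weights = rng.normal(size=(DIM, NUM_EXPERTS))

def moe_forward(x):
    """Route the input to the single highest-scoring expert (top-1 routing)."""
    scores = x @ gate_weights              # one score per expert
    chosen = int(np.argmax(scores))        # pick the best expert for this input
    return experts[chosen] @ x, chosen     # only that expert's math actually runs

x = rng.normal(size=DIM)
output, expert_id = moe_forward(x)
print(f"input routed to expert {expert_id}; output shape {output.shape}")
```

Because only one expert runs per input, most of the network sits idle most of the time, which is where the savings in computation and chip-to-chip communication come from.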

That saved a great deal of computing power. But it is not the only thing DeepSeek did. The company also mastered a simple trick involving decimals that anyone who remembers an elementary school math class can understand.

Think back to your math teacher explaining the concept of pi. Pi, also written as π, is a number that never ends: 3.14159265358979…

You can use π to do useful calculations, like determining the circumference of a circle. When you do those calculations, you shorten π to just a few decimals: 3.14. If you use this simpler number, you still get a pretty good estimate of a circle's circumference.
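In code, the gap between full-precision π and the rounded 3.14 is tiny for everyday purposes. This small Python check is just an illustration of that point, not anything taken from DeepSeek's paper:

```python
import math

radius = 10.0
exact = 2 * math.pi * radius   # circumference with full-precision pi
rough = 2 * 3.14 * radius      # circumference with pi rounded to two decimals

print(exact)                       # 62.83185307179586
print(rough)                       # 62.800000000000004
print(abs(exact - rough) / exact)  # relative error of roughly 0.05%
```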

DeepSeek did something similar, but on a much larger scale, in training its A.I. technology.

The math that allows a neural network to identify patterns in text is really just multiplication: lots and lots and lots of multiplication. We are talking months of multiplication across thousands of computer chips.
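To see why multiplication dominates, consider that one "layer" of a neural network is essentially a big multiplication of an input against a table of learned numbers. A minimal illustration, with made-up sizes rather than DeepSeek's, might look like this:

```python
import numpy as np

rng = np.random.default_rng(1)

inputs = rng.normal(size=(32, 1024))     # a batch of 32 text snippets encoded as numbers
weights = rng.normal(size=(1024, 1024))  # the layer's learned parameters

outputs = inputs @ weights               # roughly 33 million multiplications in one call
print(outputs.shape)                     # (32, 1024)
```

A full system stacks many such layers and repeats this over trillions of words, which is why the choice of how each number is stored matters so much.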

Typically, chips multiply numbers that fit into 16 bits of memory. But DeepSeek squeezed each number into only 8 bits of memory, half the space. In essence, it lopped several decimals off each number.

This meant that each calculation was less accurate. But that didn't matter. The calculations were accurate enough to produce a really powerful neural network.
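The trade-off is easy to demonstrate. NumPy has no 8-bit floating-point type, so as an analogy the sketch below halves precision from 32 bits to 16 bits instead; the principle of giving up decimals to save memory is the same one described above.

```python
import numpy as np

rng = np.random.default_rng(2)
full = rng.normal(size=1_000_000).astype(np.float32)  # "full precision" numbers
half = full.astype(np.float16)                        # the same numbers in half the memory

print(full.nbytes, half.nbytes)                 # 4,000,000 bytes vs 2,000,000 bytes
error = np.abs(full - half.astype(np.float32)).mean()
print(error)                                    # a tiny average rounding error per number
```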

It was not quite that simple, though. DeepSeek's engineers added another trick.

After squeezing each number into 8 bits of memory, DeepSeek took a different route when multiplying those numbers together. When determining the answer to each multiplication problem, a key calculation that would help decide how the neural network would operate, it stretched the answer across 32 bits of memory. In other words, it kept many more decimals, making the answer more precise.
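A rough sketch of that second idea: the individual numbers are stored in low precision, but the running answer of the multiply-and-add is kept in a wider format. Again as an analogy (16-bit inputs with a 32-bit accumulator, since NumPy has no 8-bit float), the wider accumulator stays noticeably closer to the exact answer:

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.normal(size=10_000).astype(np.float16)   # low-precision inputs
b = rng.normal(size=10_000).astype(np.float16)

# Reference answer computed entirely in high precision.
exact = np.dot(a.astype(np.float64), b.astype(np.float64))

low_acc = np.float16(0.0)
wide_acc = np.float32(0.0)
for x, y in zip(a, b):
    prod = x * y                              # product of two low-precision numbers
    low_acc = np.float16(low_acc + prod)      # accumulate in 16 bits: error builds up
    wide_acc = np.float32(wide_acc + prod)    # accumulate in 32 bits: stays close to exact

print(abs(low_acc - exact), abs(wide_acc - exact))
```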

This is not something just anyone could pull off. DeepSeek's engineers showed in their paper that they were also very good at writing the very complicated computer code that tells GPUs what to do. They knew how to squeeze even more efficiency out of these chips.

Few people have that kind of skill. But serious A.I. labs have the talented engineers needed to match what DeepSeek has done.

Some A.I. labs may already be using at least some of the same tricks. Companies like OpenAI do not always reveal what they are doing behind closed doors.

But others were clearly surprised by DeepSeek's work. Doing what the start-up did is not easy. The experimentation needed to find a breakthrough like this involves millions of dollars, if not billions, in electrical power.

In other words, it requires enormous amounts of risk.

"You have to put a lot of money on the line to try new things, and often, they fail," said Tim Dettmers, a researcher at the Allen Institute for Artificial Intelligence in Seattle who specializes in building efficient A.I. systems and previously worked as an A.I. researcher at Meta.

"That is why we don't see much innovation: People are afraid to lose many millions just to try something that doesn't work," he added.

Many experts pointed out that DeepSeek's $6 million covered only what the start-up spent when training the final version of the system. In their paper, DeepSeek's engineers said they had spent additional funds on research and experimentation before the final training run. But the same is true of any cutting-edge A.I. project.

DeepSeek experimented, and it paid off. Now, because the Chinese start-up has shared its methods with other A.I. researchers, its technological tricks are poised to significantly reduce the cost of building A.I.
