Four highlights on the race to amass data for artificial intelligence

admin • 08.04.2024 15:25 • Updated: 08.04.2024 15:25

Online data has long been a valuable commodity. For years, Meta and Google have used data to target their online advertising. Netflix and Spotify have used it to recommend more movies and music. Political candidates have turned to data to understand which groups of voters to target.

Over the last 18 months, it has become increasingly clear that digital data is also crucial for the development of artificial intelligence. Here's what to know.

The more data, the better.

The success of AI depends on data. This is because AI models become more accurate and more human-like with more data.

In the same way that a student learns by reading more books, essays, and other information, large language models – the systems that underpin chatbots – also become more accurate and more powerful if they are fed more data.

Some large language models, such as OpenAI's GPT-3, released in 2020, have been trained on hundreds of billions of “tokens,” which are essentially words or pieces of words. The most recent large language models have been trained on more than three trillion tokens.

Online data is a precious and limited resource.

Technology companies are using up publicly available online data to develop their artificial intelligence models faster than new data is being produced. High-quality digital data is predicted to run out by 2026.

Tech companies are doing everything they can to get more data.

In the race for more data, OpenAI, Google and Meta are turning to new tools, changing their terms of service and engaging in internal debates.

At OpenAI, researchers created a program in 2021 that converted the audio of YouTube videos into text and then fed the transcripts into one of the company's AI models, going against YouTube's terms of service, people familiar with the matter said.

(The New York Times sued OpenAI and Microsoft for using copyrighted news articles without permission to develop artificial intelligence. OpenAI and Microsoft said they used news articles in transformative ways that did not violate copyright law.)

Google, which owns YouTube, also used YouTube data to develop its artificial intelligence models, entering a legal gray area of copyright, people familiar with the matter said. And Google revised its privacy policy last year so it could use publicly available material to develop more AI products.

Last year at Meta, executives and lawyers debated how to get more data for AI development and discussed buying a major publisher like Simon & Schuster. In private meetings, they explored the possibility of feeding copyrighted works into their AI model, even if it meant they would be sued later, according to recordings of the meetings obtained by the Times.

One solution could be “synthetic” data.

OpenAI, Google and other companies are exploring using their artificial intelligence to create more data. The result would be so-called “synthetic” data. The idea is that AI models generate new text that can then be used to build better AI.

Synthetic data is risky because AI models can make mistakes. Relying on such data can exacerbate these errors.
