The race to lead artificial intelligence has become a desperate hunt for the digital data needed to advance the technology. To obtain such data, tech companies including OpenAI, Google and Meta cut corners, ignored company policies and debated how to change the law, according to a New York Times analysis.
Last year, managers, lawyers and engineers at Meta, the owner of Facebook and Instagram, discussed buying the publishing house Simon & Schuster to procure long-form works, according to recordings of internal meetings obtained by the Times. They also agreed to collect copyrighted data from across the Internet, even if it meant facing legal action. Negotiating licenses with publishers, artists, musicians and the news industry would take too long, they said.
Like OpenAI, Google transcribed YouTube videos to gather text for its artificial intelligence models, five people familiar with the company's practices said. This potentially violated the videos' copyrights, which belong to their creators.
Last year, Google also expanded its terms of service. One reason for the change, according to members of the company's privacy team and an internal message seen by the Times, was to give Google access to publicly available Google Docs, restaurant reviews on Google Maps and other online material for more of its artificial intelligence products.
The companies' actions illustrate how online information – news, works of fiction, message board posts, Wikipedia articles, computer programs, photos, podcasts and movies – has increasingly become the lifeblood of the burgeoning artificial intelligence sector. Creating innovative systems depends on having enough data to teach technologies to instantly produce text, images, sounds and videos that resemble what a human creates.