The White Clam Pizza at Frank Pepe Pizzeria Napoletana in New Haven, Connecticut, is a revelation. The crust, kissed by the intense heat of the charcoal oven, achieves a perfect balance between crunchiness and chewiness. Topped with freshly shucked clams, garlic, oregano and a sprinkling of grated cheese, it's a testament to the magic that simple, high-quality ingredients can conjure.
Does that sound like me? It's not. The entire paragraph, except the name of the pizzeria and the city, was generated by GPT-4 in response to a simple prompt requesting a Pete Wells-style restaurant review.
I have a few quibbles. I would never call any food a revelation, nor would I describe heat as a kiss. I don't believe in magic and rarely call something perfect without using “almost” or some other hedge. But these lazy descriptors are so common in food writing that I imagine many readers barely notice them. I'm unusually attuned to them because every time I commit a cliché in my copy, I get punched by my editor.
He wouldn't be fooled by the fake Pete. Neither would I. But as much as it pains me to admit it, I imagine a lot of people would say it's a four-star fake.
The person responsible for Phony Me is Balazs Kovacs, professor of organizational behavior at the Yale School of Management. In a recent study, he fed a large number of Yelp reviews to GPT-4, the technology behind ChatGPT, and asked it to imitate them. His test subjects – people – couldn't distinguish between genuine reviews and those produced by artificial intelligence. In fact, they were more likely to think the AI-written reviews were real. (The phenomenon of computer-generated fakes that are more convincing than reality is so well known that there's a name for it: AI hyperrealism.)
Dr. Kovacs' study belongs to a growing body of research suggesting that the latest versions of generative artificial intelligence can pass the Turing test, a scientifically fuzzy but culturally resonant standard. When a computer can trick us into believing that the language it produces was written by a human, we say it has passed the Turing test.
It has long been assumed that artificial intelligence would eventually pass the test, first proposed by mathematician Alan Turing in 1950. But even some experts are surprised by how quickly the technology is improving. “It's happening faster than people expected,” Dr. Kovacs said.
The first time Dr. Kovacs asked GPT-4 to mimic Yelp, few were fooled. The prose was too perfect. That changed when Dr. Kovacs instructed the program to use colloquial spelling, to emphasize some words in capital letters, and to insert typos, one or two in each review. This time GPT-4 passed the Turing test.
In addition to marking a threshold in machine learning, AI's ability to sound just like us has the potential to undermine any trust we still have in verbal communications, especially shorter ones. Text messages, emails, comment sections, news articles, social media posts, and user reviews will be even more suspect than they already are. Who would believe a Yelp post about a croissant pizza or a glowing OpenTable dispatch about a $400 sushi omakase tasting knowing that its author might be a machine that can neither chew nor swallow?
“With consumer-generated reviews, it's always been a big question about who's behind the screen,” said Phoebe Ng, a restaurant communications strategist in New York City. “Now it's about what's behind the screen.”
Online opinions are the grease in the wheels of modern commerce. In a 2018 survey by the Pew Research Center, 57% of Americans surveyed said they always or almost always read reviews and ratings on the Internet before purchasing a product or service for the first time. Another 36% said they do so sometimes.
For businesses, a few points in a star rating on Google or Yelp can mean the difference between making money and going out of business. “We live on reviews,” the manager of an Enterprise Rent-a-Car location in Brooklyn told me last week as I picked up a car.
A business traveler who needs a ride that won't break down on the New Jersey Turnpike might be more affected by a negative report than, say, someone who's simply looking for brunch. Yet for restaurant owners and chefs, Yelp, Google, TripAdvisor and other sites that let customers have their say are a source of endless worry and occasional anger.
A particular cause of frustration is the large number of people who don't bother to eat at the place they write about. Before an article in Eater highlighted it last week, the first New York location of Taiwanese dim sum chain Din Tai Fung was targeted by one-star Google reviews, dragging its average rating down to 3.9 out of a possible 5. The restaurant has not opened yet.
Some ghost critics are more sinister. Restaurants have been inundated with one-star reviews, followed by an email offering to remove them in exchange for gift cards.
To combat bad-faith criticism, some owners enlist their nearest and dearest to flood the zone with positive comments. “One question is, how many aliases do all of us in the restaurant industry have?” said Steven Hall, the owner of a New York public relations firm.
A step up from an organized ballot-stuffing campaign, or perhaps a step down, is the practice of exchanging free meals or cash for positive comments. Beyond this looms the vast, dark realm of reviewers who don't exist.
To promote their business or bring rivals to heel, companies can hire brokers who have created small armies of fictitious reviewers. According to Kay Dean, a consumer advocate who researches online review fraud, these accounts are typically given a long history of past reviews that serve as camouflage for their pay-for-play output.
In two recent videos, she highlighted a chain of mental health clinics that had received glowing Yelp reviews apparently submitted by satisfied patients whose accounts were filled with restaurant reviews taken word for word from TripAdvisor.
“It's an ocean of falsehoods, and it's much worse than people realize,” Ms. Dean said. “Consumers are being deceived, honest businesses are being damaged and trust is eroding.”
All of this is being done by mere humans. But as Dr. Kovacs writes in his study, “the situation now changes substantially because humans will not be required to write authentic-looking reviews.”
Ms. Dean said that if AI-generated content infiltrates Yelp, Google and other sites, it will be “even more difficult for consumers to make informed decisions.”
Major sites claim to have methods for ferreting out Potemkin accounts and other forms of deceit. Yelp encourages users to report questionable reviews and, after an investigation, will remove any that violate its policies. It also hides reviews that its algorithm deems less reliable. Last year, according to its most recent Trust & Safety Report, the company stepped up its use of artificial intelligence “to even better detect and not recommend less helpful and less trustworthy reviews.”
Dr. Kovacs believes that sites will have to work harder now to demonstrate that they are not regularly publishing the thoughts of robots. They could, for example, adopt something like the “Verified Purchase” label that Amazon sticks on reviews of products purchased or streamed through its site. If readers become even more suspicious of crowdsourced restaurant reviews than they already are, it could be an opportunity for OpenTable and Resy, which accept feedback only from those diners who show up for their reservations.
One thing that probably won't work is asking computers to analyze the language alone. Dr. Kovacs ran his real and fabricated Yelp reviews through programs that are supposed to identify artificial intelligence. Like his test subjects, he said, the software “thought the fake ones were real.”
This didn't surprise me. I took Dr. Kovacs' survey myself, confident that I would be able to spot the small, factual details that a real diner would mention. After clicking a box to certify that I wasn't a robot, I immediately found myself lost in a desert of exclamation marks and frowning faces. By the time I reached the end of the test, I was just guessing. I correctly identified seven out of 20 reviews, a result somewhere between flipping a coin and asking a monkey.
What tripped me up was the fact that GPT-4 didn't make up its opinions out of thin air. It put them together from snippets of Yelpers' descriptions of afternoon snacks and Sunday brunches.
“It's not totally made up in terms of things that people value and what they care about,” Dr. Kovacs said. “What's scary is that it can create an experience that looks and smells like a real experience, but isn't.”
By the way, Dr. Kovacs told me that he gave the first draft of his paper to an AI editing program and incorporated many of its suggestions into the final copy.
It probably won't be long before the idea of a purely human review seems bizarre. The robots will be invited to read over our shoulders, alerting us when we have used the same adjective too many times, pushing us towards a more active verb. The machines will be our teachers, our editors, our collaborators. They will even help us look human.