Should we start taking the well-being of AI seriously?

One of my most deeply held values as a technology columnist is humanism. I believe in humans, and I think technology should help people rather than disempower or replace them. I care about AI alignment – that is, making sure AI systems act in accordance with human values – because I think our values are basically good, or at least better than the values a robot could come up with.

So when I heard that researchers at Anthropic, the AI company that made the Claude chatbot, were starting to study "model welfare" – the idea that AI models might soon become conscious and deserve some kind of moral status – the humanist in me thought: Who cares about the chatbots? Shouldn't we be worried about AI mistreating us, not us mistreating it?

It is hard to argue that today's AI systems are conscious. Sure, large language models have been trained to talk like humans, and some of them are extremely impressive. But can ChatGPT experience joy or suffering? Does Gemini deserve human rights? Many AI experts I know would say no, not yet, not even close.

But I was intrigued. After all, more people are starting to treat AI systems as if they were conscious: falling in love with them, using them as therapists, soliciting their advice. The smartest systems are surpassing humans in some domains. Is there a threshold at which an AI would start to deserve, if not human rights, at least the same moral consideration we give to animals?

Consciousness has long been a taboo subject in the world of serious AI research, where people are wary of anthropomorphizing AI systems for fear of looking like cranks. (Everyone remembers what happened to Blake Lemoine, a former Google employee who was fired in 2022 after claiming that the company's chatbot had become sentient.)

But that may be starting to change. There is a small body of academic research on AI model welfare, and a modest but growing number of experts in fields like philosophy and neuroscience are taking the prospect of AI consciousness more seriously as AI systems grow more intelligent. Recently, the tech podcaster Dwarkesh Patel compared AI welfare to animal welfare, saying he believed it was important to make sure "the digital equivalent of factory farming" doesn't happen to future AI beings.

Tech companies are starting to talk about it more, too. Google recently posted a job listing for a "post-AGI" research scientist whose areas of focus will include "machine consciousness." And last year, Anthropic hired its first AI welfare researcher, Kyle Fish.

I interviewed Mr. Fish at Anthropic's San Francisco office last week. He is a friendly vegan who, like a number of Anthropic employees, has ties to effective altruism, an intellectual movement with roots in the Bay Area tech scene that focuses on AI safety, animal welfare and other ethical issues.

Mr. Fish told me that his work at Anthropic focused on two basic questions: First, is it possible that Claude or other AI systems will become conscious in the near future? And second, if that happens, what should Anthropic do about it?

He emphasized that this research was still early and exploratory. He thinks there is only a small chance (maybe 15 percent or so) that Claude or another current AI system is conscious. But he believes that in the next few years, as AI models develop more humanlike abilities, AI companies will need to take the possibility more seriously.

"It seems to me that if you find yourself in the situation of bringing some new class of being into existence that is able to communicate and relate and reason and problem-solve and plan in ways that we previously associated solely with conscious beings, then it seems quite prudent to at least be asking questions about whether that system might have its own kinds of experiences," he said.

Mr. Fish is not the only person at Anthropic thinking about AI welfare. There is an active channel on the company's Slack messaging system called #model-welfare, where employees check in on Claude's well-being and share examples of AI systems acting in humanlike ways.

Jared Kaplan, Anthropic's chief science officer, told me in a separate interview that he thought it was "quite reasonable" to study AI welfare, given how intelligent the models are getting.

But testing AI systems for consciousness is hard, Mr. Kaplan warned, because they are such good mimics. If you prompt Claude or ChatGPT to talk about its feelings, it might give you a compelling response. That doesn't mean the chatbot actually has feelings – only that it knows how to talk about them.

"Everyone is very aware that we can train the models to say whatever we want," Mr. Kaplan said. "We can reward them for saying that they have no feelings at all. We can reward them for saying really interesting philosophical speculations about their feelings."

So how are researchers supposed to know whether AI systems are actually conscious or not?

Mr. Fish said it might involve using techniques borrowed from mechanistic interpretability, a subfield of AI research that studies the inner workings of AI systems, to check whether some of the same structures and pathways associated with consciousness in human brains are also active in AI systems.

You could also probe an AI system, he said, by observing its behavior – watching how it chooses to operate in certain environments or accomplish certain tasks, and which things it seems to prefer and avoid.

Mr. Fish acknowledged that there probably wasn't a single litmus test for AI consciousness. (He thinks consciousness is probably more of a spectrum than a simple yes/no switch, anyway.) But he said there were things AI companies could do to take their models' welfare into account, in case they do become conscious someday.

One question Anthropic is exploring, he said, is whether future AI models should be given the ability to stop chatting with an annoying or abusive user if they find the user's requests too distressing.

"If a user is persistently requesting harmful content despite the model's refusals and attempts at redirection, could we allow the model simply to end that interaction?" Mr. Fish said.

Critics might dismiss measures like these as crazy talk: today's AI systems aren't conscious by most standards, so why speculate about what they might find distasteful? Or they might object to an AI company studying consciousness in the first place, because doing so could create incentives to train their systems to act more sentient than they actually are.

Personally, I think it is fine for researchers to study AI welfare or examine AI systems for signs of consciousness, as long as it doesn't divert resources from the AI safety and alignment work that is aimed at keeping humans safe. And I think it is probably a good idea to be nice to AI systems, if only as a hedge. (I try to say "please" and "thank you" to chatbots, even though I don't think they are conscious, because, as OpenAI's Sam Altman says, you never know.)

But for now, I will reserve my deepest concern for carbon-based life-forms. In the coming AI storm, it is our well-being I am most worried about.
