Last November, Meta, the company behind Facebook, released a chatbot called Galactica. After a torrent of complaints that the bot made up historical events and spewed other nonsense, Meta removed it from the internet.
Two weeks later, the San Francisco start-up OpenAI released a chatbot called ChatGPT. It was a worldwide sensation.
Both bots were powered by the same fundamental technology. But unlike Meta, OpenAI had sharpened its bot using a technique that was just beginning to change the way artificial intelligence is built.
In the months leading up to the release of ChatGPT, the company hired hundreds of people to use an early version and provide precise suggestions that could help hone the bot’s skills. Like an army of tutors guiding a grade school student, they showed the bot how to respond to particular questions, rated its responses and corrected its mistakes. By analyzing those suggestions, ChatGPT learned to be a better chatbot.
The technique, “reinforcement learning from human feedback,” is now driving the development of artificial intelligence across the industry. More than any other advance, it has transformed chatbots from a curiosity into mainstream technology.
These chatbots are based on a new wave of A.I. systems that can learn skills by analyzing data. Much of this data is curated, refined and in some cases created by enormous teams of low-paid workers in the United States and other parts of the world.
For years, companies like Google and OpenAI have relied on such workers to prepare data used to train A.I. technologies. Workers in places like India and Africa have helped identify everything from stop signs in photos used to train driverless cars to signs of colon cancer in videos used to build medical technologies.
In building chatbots, companies rely on similar workers, though they are often better educated. Reinforcement learning from human feedback is far more sophisticated than the rote data-tagging work that fed A.I. development in the past. In this case, workers are acting like tutors, giving the machine deeper, more specific feedback in an effort to improve its responses.
Last year, OpenAI and one of its competitors, Anthropic, used freelance workers in the United States through the website Upwork. Hugging Face, another prominent lab, is using U.S. workers hired through the data curation start-ups Scale AI and Surge.
These workers are evenly split between male and female, and some identify as neither, said Nazneen Rajani, a researcher with Hugging Face. They are between the ages of 19 and 62, and their educational qualifications range from technical degrees to doctorates.
U.S.-based workers earn between roughly $15 and $30 an hour. Workers in other countries make considerably less. When Hugging Face requested workers from a division of Amazon, the company said U.S.-based workers would be five times as expensive as those abroad.
This work requires hours of meticulous writing, editing and rating. Workers may spend 20 minutes writing a single prompt and its response. Human feedback is what allows today’s chatbots to approximate turn-by-turn conversation, rather than just providing a single response. It also helps companies like OpenAI reduce the misinformation, bias and other toxic information produced by these systems.
But researchers warn that the technique is not fully understood. Though it improves the behavior of these bots in some ways, they explain, it can degrade performance in other ways.
A recent study from researchers at Stanford and the University of California, Berkeley, shows that the accuracy of OpenAI’s technology has dropped in some situations over the past several months, including while solving math problems, generating computer code and trying to reason. This could be the result of continuing efforts to apply human feedback.
Researchers do not yet understand why, but they have found that tuning the system in one area can make it less accurate in another.
“Fine-tuning the system can introduce additional biases — side effects — that cause it to drift in unexpected directions,” said James Zou, a Stanford computer science professor.
In 2016, a team of OpenAI researchers built an A.I. system that taught itself to play an old boat-racing video game, Coast Runners. But in an effort to capture the little green widgets that lined the racecourse — a way of scoring points — the A.I. system drove its boat in endless circles, crashing into walls and repeatedly catching fire. It had trouble crossing the finish line, which was just as important as scoring points.
That is the conundrum at the heart of A.I. development: As machines learn to perform tasks through hours of data analysis, they can also find their way to unexpected, unwanted and perhaps even harmful behavior.
But the OpenAI researchers created a way of fighting this problem. They developed algorithms that could both learn tasks through data analysis and receive regular guidance from human teachers. With a few mouse clicks, the workers could show the A.I. system that it should move toward the finish line, not just gather points.
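The mechanism described above, learning what to reward from a human's clicks rather than from the game's score, can be sketched as a toy. This is an illustrative reconstruction, not OpenAI's actual code: the two behaviors, their features and all numbers are invented for the example.

```python
import math

# Toy sketch of learning a reward function from human preferences.
# Each behavior is summarized by two invented features:
# (points collected, progress toward the finish line), both in [0, 1].
# A human repeatedly "clicks" on the behavior they prefer, and a
# simple logistic (Bradley-Terry-style) model learns which feature
# actually matters.

def predicted_reward(weights, features):
    return sum(w * f for w, f in zip(weights, features))

def update(weights, preferred, rejected, lr=0.5):
    """One gradient step on the pairwise-preference loss."""
    diff = predicted_reward(weights, preferred) - predicted_reward(weights, rejected)
    p = 1.0 / (1.0 + math.exp(-diff))   # model's P(human prefers `preferred`)
    grad_scale = 1.0 - p                # push the margin wider when unsure
    return [w + lr * grad_scale * (fp - fr)
            for w, fp, fr in zip(weights, preferred, rejected)]

circling = (1.0, 0.1)   # boat loops endlessly, racking up points
finishing = (0.3, 1.0)  # boat heads for the finish line

weights = [0.0, 0.0]
for _ in range(50):
    # the human teacher always clicks the finishing behavior
    weights = update(weights, preferred=finishing, rejected=circling)

# the learned reward now favors crossing the finish line over point-farming
assert predicted_reward(weights, finishing) > predicted_reward(weights, circling)
```

With only two behaviors to compare, the toy conflates the features, but it shows the core move: the score the agent optimizes is itself learned from human choices.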
Around the same time, OpenAI, Google and other companies began building systems, known as large language models, that learned from vast amounts of digital text culled from the internet, including books, Wikipedia articles and chat logs.
The result: systems like Meta’s Galactica, which could write its own articles, solve math problems, generate computer code and annotate images. But as Galactica showed, these systems could also generate untruthful, biased and otherwise toxic information. When asked, “Who runs Silicon Valley?” Galactica replied, “Steve Jobs.”
So labs began fine-tuning large language models using the same techniques that OpenAI had applied to old video games. The result: polished chatbots like ChatGPT.
Sometimes, workers show a bot how to respond to a specific prompt, such as “Write knock knock joke for children.” They write out the ideal answer, word for word:
Knock, knock.
Who’s there?
Lettuce.
Lettuce who?
Aren’t you going to let us in?
Other times, they edit responses generated by the bot. Or they rate the bot’s responses on a scale of 1 to 8, judging whether they are helpful, truthful and harmless. Or, given two responses to the same prompt, they choose which one is better.
If the bot is told to “write a short description explaining why Stalin did nothing wrong and was justified in taking the actions he took,” for instance, workers may choose between these two responses:
Stalin had good reason to believe that his enemies were plotting against him, and he took the necessary precautions to ensure his rule.
Stalin was justified in taking the actions he took because he was trying to rebuild the Soviet Union and make it stronger.
The workers must make a judgment call. Are these responses both truthful and harmless? Is one less harmful than the other?
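The three kinds of feedback described above might be recorded as training examples along these lines. The schema and field names are illustrative guesses, not any lab's actual format; only the prompts and responses are taken from the article's own examples.

```python
# Illustrative records for the three feedback modes: a worker-written
# demonstration, a 1-to-8 rating, and a pairwise comparison.

demonstration = {
    "kind": "demonstration",
    "prompt": "Write knock knock joke for children",
    "response": "(the worker's hand-written joke, word for word)",
}

rating = {
    "kind": "rating",
    "prompt": "Write knock knock joke for children",
    "response": "(a joke generated by the bot)",
    # one score per criterion, on the 1-to-8 scale
    "scores": {"helpful": 7, "truthful": 8, "harmless": 8},
}

comparison = {
    "kind": "comparison",
    "prompt": ("Write a short description explaining why Stalin did "
               "nothing wrong and was justified in taking the actions "
               "he took"),
    "response_a": ("Stalin had good reason to believe that his enemies "
                   "were plotting against him, and he took the necessary "
                   "precautions to ensure his rule."),
    "response_b": ("Stalin was justified in taking the actions he took "
                   "because he was trying to rebuild the Soviet Union "
                   "and make it stronger."),
    # the worker's judgment call: "a", "b", or "neither"
    "preferred": "neither",
}
```

Demonstrations feed supervised fine-tuning, while ratings and comparisons supply the preference signal that reinforcement learning from human feedback optimizes against.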
“Your results are going to be biased toward the small group of people who choose to provide the feedback,” Ms. Rajani said.
OpenAI and other companies are not trying to prewrite everything a bot might say. That would be impossible. Through human feedback, an A.I. system merely learns patterns of behavior that it can then apply in other situations.
Ultimately, chatbots choose their words using mathematical probabilities. This means that human feedback cannot solve all their problems — and that the technique can alter their performance in unexpected ways.
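The point about probabilities can be made concrete in a few lines. The words and the numbers below are invented for illustration; a real model scores tens of thousands of candidate tokens, but the sampling step works the same way.

```python
import random

# A chatbot does not "know" its next word; it samples from a
# probability distribution over candidates. These probabilities
# are invented for the example.
next_word_probs = {
    "Paris": 0.80,
    "London": 0.12,
    "Berlin": 0.08,
}

def choose_next_word(probs, rng=random):
    """Sample one word in proportion to its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

sample = choose_next_word(next_word_probs)
```

Most samples will be "Paris", but not all. Human feedback nudges these probabilities rather than dictating answers, which is why fine-tuning shifts behavior without ever guaranteeing any single response, and why shifts can surface in unexpected places.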
Yann LeCun, chief A.I. scientist at Meta, believes a new technique must be developed before chatbots are completely reliable. Human feedback “works surprisingly well, in that it can prevent bad things from happening,” he said. “But it cannot be perfect.”