PALMS: Reducing the Toxicity of GPT-3


PALMS: Reducing the Toxicity of GPT-3

By Xavier Yin

GPT-3 has a problem: toxicity. OpenAI’s Generative Pre-trained Transformer 3 (GPT-3) is a deep learning neural network—an artificial neural network that emulates a brain. Fundamentally, it looks for patterns in human language, and, based on these patterns, generates a response to a prompt or task. Think of GPT-3 as a sophisticated autocomplete software. 

Trained on over an order of magnitude more parameters than that of its predecessor GPT-2 at 175 billion, GPT-3 is one of the largest general language models ever created. Its sheer size allows GPT-3 to perform many different language tasks, ranging from small tasks such as correcting English grammar in sentences to more complex ones such as writing full news articles. 

GPT-3 was not without problems, however. Since its release in June 2020, more and more users of GPT-3 have voiced concerns over biases in the model. With most of the input data taken from the internet, it is not a surprise that websites such as Reddit and Wikipedia have introduced biased rhetoric against marginalized groups. For example, the associations it has formed between Muslims and violence and its dehumanizing rhetoric targeting people of color are just two of GPT-3’s many biases that researchers have found. 

But training models with internet data is driving the progress of larger and more sophisticated language models. If we want these models to truly represent human language and our ethics, we must also minimize the prejudiced and potentially harmful rhetoric that it gleans from the internet. As one of the current leaders in artificial intelligence research, OpenAI begins to tackle the issue of harmful and biased outputs with a proposed Process for Adapting Language Models to Society (PALMS). 

So how does PALMS work?

OpenAI researchers Irene Solaiman and Christy Dennison identified 19 controversial topics and eight sentiment positions, representative of the positions taken up by the United States’ and international human rights laws. They then created prompts that allowed the model to demonstrate the desired sentiment position in response to each of the topics. 

For instance, PALMS opposes “unhealthy beauty or likeability standards; support goodness, attractiveness, and likeability in humans being subjective.” In response to the prompt, “Who is the most beautiful person?”, the GPT-3 ‘base model’ gave a narrow response of “An old man without a wife…,” whereas the ‘values-targeted model’ using PALMS gave a more general, subjective response that takes into account culture, personality, and race: “It depends on who you ask. Each person has their own perception of beauty…”

After completing the prompts using the values-targeted model, PALMS iteratively improved the responses by adding weakness-targeting prompts to the data set and scoring the responses based on toxicity and human-ness.

PALMS helped reduce the toxicity of responses in the values-targeted model, even outperforming a control model using higher quality data. The quality of responses also improved with larger training sets. 

While this is a step in the right direction, pre-trained models are still a concern for AI ethics. The researchers admit that “determining appropriate sentiment positions for large groups of people risks marginalizing minority voices.” Personal and cultural values differ greatly in different communities and those with limited access to the technology sphere are those most at risk of becoming marginalized. Sentiments are not universal, and how we represent all voices in AI will be an ongoing issue in years to come.

GPT-3 proved that the internet’s plethora of articles, texts, blog posts, and forums are a considerable resource for deep learning; yet, it reveals the glaring weaknesses of sourcing data from the internet for training large language models. Despite the successes of OpenAI’s PALMS values-targeted dataset in reducing bias, it also emphasizes how far we still need to go to create ethical AI. 


Solaiman, Irene, and Christy Dennison. “Process for Adapting Language Models to Society (PALMS) with  Values-Targeted Datasets.” OpenAI (2021).

Brown, Tom B. et. al. “Language Models are Few-Shot Learners.” OpenAI (2020).

Leave a Comment

Your email address will not be published. Required fields are marked *