The Spooky Side of AI: Can We Fix Political Bias in LLMs?
As the October weather turns brisk and we go from chugging iced tea to wrapping our hands around a pumpkin spice latte, it’s worth thinking about ghost stories in the world of AI. The most popular of these is the existential threat of machines becoming self-aware, but the more likely evil, bias, is harder to see and perhaps more pernicious than any Terminator-style horror story.
The deep learning techniques and algorithms that power generative AI solutions like GPT-3 and GPT-4 from OpenAI, LLaMA from Meta, and PaLM 2 from Google have been frightening some observers since their introduction. How, they ask, can we be sure bias isn’t creeping in through gaps in the massive datasets large language models (LLMs) are trained on? The terrifying fact is, we can’t. An even scarier question might be: can we do anything about it?
Bias Haunts LLMs
Two recent studies confirm that political bias of one stripe or another demonstrably exists in various LLMs. One study from the University of Washington, Carnegie Mellon University, and Xi’an Jiaotong University measured the responses of various LLMs to 62 political statements and found that different models do exhibit different ideological leanings. In general, the researchers found Google’s BERT models were more socially conservative than OpenAI’s GPT models. The study suggests this difference could arise from the fact that BERT models were originally trained on books, which use more conservative language than the web text used to train GPT models.
The other study, undertaken by researchers at Stanford University, quantified how well different models’ responses to subjective queries aligned with the opinions of U.S. demographic groups. Since models are trained on huge datasets of what other people have written, their answers to open-ended questions should broadly reflect popular opinion. The researchers found this was not the case.
“Models trained on the internet alone tend to be biased toward less educated, lower income, or conservative points of view,” Shibani Santurkar, a former postdoctoral scholar at Stanford University and first author of the study, says in an interview with Stanford’s Institute for Human-Centered Artificial Intelligence. “Newer models, on the other hand, further refined through curated human feedback tend to be biased toward more liberal, higher educated, and higher income audiences.”
If the underlying models are biased toward one ideology or another (and don’t even accurately reflect what humans actually think), the products and services they are being integrated into could produce harmful outcomes, depending on the software.
“Our findings demonstrate that political bias can lead to significant issues of fairness,” the authors of the University of Washington study wrote. “Models with different political biases have different predictions regarding what constitutes as offensive or not, and what is considered misinformation or not.”
Tricks and Treats That Could Limit Bias
It’s difficult to pinpoint the exact reasons LLMs exhibit political bias because the data and methods used to train them are proprietary. As far back as 2020, OpenAI’s own researchers acknowledged that “internet-trained models have internet-scale biases; models tend to reflect stereotypes present in their training data.” OpenAI’s co-founder Sam Altman admitted in February that “ChatGPT has shortcomings around bias” and the company is “working to improve the default settings to be more neutral.”
But working toward being more neutral and achieving a bias-free model are different, researchers say. According to Carnegie Mellon’s Chan Park, part of the UW study team, researchers “believe no language model can be entirely free from political bias.”
So, if tech companies can’t eliminate political bias from their models, is there anything they can do to limit it?
The UW team suggests two strategies. The first, partisan ensemble, combines multiple language models representing different political perspectives. The second, strategic pretraining, involves creating models tailored to specific scenarios; for a task focused on detecting hate speech from white supremacist groups, for example, the model could be given additional pretraining on sources that are critical of white supremacy.
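The paper doesn’t prescribe any particular code, but a minimal Python sketch can illustrate the partisan-ensemble idea: query several models that lean in different directions and take a majority vote, so no single model’s leaning decides the outcome on its own. The stub classifiers, labels, and the `make_stub_classifier` helper below are hypothetical placeholders for real model calls.

```python
from collections import Counter
from typing import Callable, List

# Stand-ins for real LLM classifiers. In practice each would wrap a model
# pretrained or fine-tuned on a corpus with a different ideological slant;
# the labels and helper here are purely illustrative.
def make_stub_classifier(label: str) -> Callable[[str], str]:
    return lambda text: label  # a real version would call the model here

partisan_models: List[Callable[[str], str]] = [
    make_stub_classifier("offensive"),      # e.g., model leaning one way
    make_stub_classifier("not offensive"),  # e.g., model leaning another way
    make_stub_classifier("offensive"),      # e.g., a third perspective
]

def partisan_ensemble(text: str) -> str:
    # Collect one label per model and return the majority vote, so no single
    # model's political leaning dominates the final decision.
    votes = Counter(model(text) for model in partisan_models)
    return votes.most_common(1)[0][0]

print(partisan_ensemble("An example statement to screen"))  # -> "offensive"
```

In practice the aggregation could weight models by calibration or abstain when they disagree; a simple majority vote keeps the sketch minimal.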
Other experts have advocated:
- Incorporating more diverse data sources for a wider representation of perspectives
- Establishing ethical guidelines for LLM development and deployment
- Using regularization techniques that can help control the complexity of a model and prevent overfitting to biased data (see the sketch after this list)
- Implementing adversarial training methods
- Applying reinforced calibration
- Emphasizing logical rather than stereotypical training
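To make one of these items concrete, here is a minimal, hypothetical sketch of the regularization bullet: adding an L2 penalty (weight decay) while fine-tuning a small classifier head, which discourages the model from memorizing quirks, including biases, of its training data. The layer sizes, data, and hyperparameters are illustrative and not drawn from any of the studies above.

```python
import torch
from torch import nn

# A tiny classifier head standing in for the final layers of a fine-tuned LLM;
# dimensions and data are placeholders.
model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 2))

# weight_decay adds an L2 penalty on the weights, one common regularization
# technique for limiting overfitting to idiosyncrasies of the training set.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
loss_fn = nn.CrossEntropyLoss()

embeddings = torch.randn(32, 768)    # stand-in for sentence embeddings
labels = torch.randint(0, 2, (32,))  # stand-in for task labels

for _ in range(3):                   # a few illustrative training steps
    optimizer.zero_grad()
    loss = loss_fn(model(embeddings), labels)
    loss.backward()
    optimizer.step()
```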
The Ongoing Specter of Bias
It’s not clear how effective these techniques will prove in what seems like an unwinnable battle against political bias in LLMs, but the implications could be harrowing if at least some progress is not made. Language models are becoming increasingly integrated into our daily lives, influencing how we communicate, access information, and form opinions. If biased, they can skew public discourse, escalate social divisions, manipulate political outcomes, perpetuate misinformation and bias, or even marginalize certain viewpoints. While the goal of eliminating bias may currently be out of reach, it is one industry stakeholders should continue to strive for.
It should not be lost on anyone that this is an industry grounded in respect for clean data and well acquainted with the problems that “garbage in, garbage out” creates for organizations, product development, and business insights. That we are not yet measuring the full cyclical impact of LLM-disseminated bias on society as these tools move into the mainstream may be the scariest part of this story.