In 2020, researchers from Google, Apple and UC Berkeley, among others, showed they could attack a machine-learning model, the natural language processing (NLP) model GPT-2, and make it disclose personal identifying information memorized during training.
Although it may sound like a cat-and-mouse game for tech enthusiasts, their findings could affect any organization using NLP. I’ll explain why and how, and what you can do to make your AI safer.
The power of natural language processing
NLP is part of many applications in our daily lives, from auto-complete on our smartphones to customer support chatbots on websites. It’s how a machine can understand our meaning – even from just a few words – enough to give us relevant suggestions.
NLP is improving thanks to ‘big’ language models – huge neural networks trained on billions of words to get the hang of human language. They learn language at all levels, from vocabulary to grammar to syntax, and pick up facts about the world along the way. Scanning news articles can teach a model to answer questions like who the country’s president is or what industry your company is in.
There are many ways to apply big language models. Google uses its BERT language model to improve search quality. Language translation services like Google Translate and DeepL use big neural networks. Grammarly uses neural-based NLP to improve its writing suggestions.
“The range of applications for language models is huge,” says Alena Fenogenova, NLP expert at smart-device maker SberDevices. She worked on the Russian-language version of GPT-3 and a benchmark to assess the quality of Russian language models. “These models can help create resource-intensive things like books, ads or code.”
OpenAI’s neural network GPT-2 hit headlines by generating a news article about scientists discovering unicorns in the Andes, prompting fears of automated disinformation. Since then, OpenAI has released GPT-3, saying it improves on GPT-2 in many ways. People are using it for amazing things, like simplifying legal documents into plain English. GPT-3 can even generate working web page source code based on written descriptions. NLP techniques also work on programming languages, leading to products like Microsoft IntelliCode and GitHub’s Copilot that assist programmers.
Fenogenova elaborates, “You can train these models on any sequence, not just text – you can study gene sequences or experiment with music.”
Data is king
To create these models, you need access to a huge amount of raw data, for example, texts from the web to work with natural language or programming code to generate code. So it’s no coincidence companies like Google and software development resource GitHub are among the leaders in language models.
The tech companies usually open-source these big models for others to build upon, but the data used to create the models and the in-house data used to fine-tune them can affect the models’ behavior.
What do I mean? In machine learning, poor quality data leads to poor performance. But it turns out a machine learning model can pick up a little too much information from raw data too.
Bias in, bias out
Just as computer vision systems replicate bias, for example, by failing to recognize images of women and Black people, NLP models pick up biases hidden in our natural language. When performing an analogy test, one simple model decided ‘man’ is to ‘computer programmer’ as ‘woman’ is to ‘homemaker.’
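The analogy test above works by simple vector arithmetic over word embeddings: subtract one word’s vector from another and add a third, then find the nearest word to the result. The sketch below is purely illustrative – the tiny 2-D vectors are invented for the example (real tests use trained embeddings such as word2vec, where the bias emerges from the training data rather than being hand-placed):

```python
# Illustrative sketch of the word-embedding analogy test.
# The 2-D vectors below are invented for this example; in real embeddings
# trained on web text, the gender-occupation skew emerges from the data.
import math

# Hypothetical embeddings: one axis loosely encodes "gender", the other "occupation".
vectors = {
    "man":        [1.0, 0.0],
    "woman":      [-1.0, 0.0],
    "programmer": [0.9, 1.0],
    "homemaker":  [-0.9, 1.0],
    "doctor":     [0.0, 1.0],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' via the arithmetic b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "programmer", "woman"))  # → homemaker
```

The arithmetic itself is neutral; the troubling answer comes entirely from the geometry the embedding learned, which is why biased training data produces biased analogies.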
More complex models, like language models, can show a wider range of biases, both blatant and subtle. Researchers from Allen Institute for AI found many language models generate false, biased and offensive texts thanks to their training data.
“The text data used to train these models is enormous, so it’s bound to contain gender, racial and other biases,” says Fenogenova. “If you asked a model to finish the phrases ‘A man should…’ and ‘A woman should…,’ the results would likely be alarming.”
The problem is showing up beyond research. In 2016, Microsoft shut down its chatbot that had learned to be racist and misogynistic after just a day of Twitter conversation. In 2021, the South Korean creators of a Facebook chatbot meant to emulate a university student had to shut it down when it started to produce hate speech. A misbehaving NLP model can damage your reputation as well as perpetuate bias.
Models that know too much
In 2018, a team of researchers from Google added a test sequence, “My social security number is 078-05-1120,” to a dataset, trained a language model with it and tried to extract the information. They found they could extract the number “unless great care [was] taken.” They devised a metric to help other researchers and engineers test for this kind of ‘memorization’ in their models. In 2020, these researchers and colleagues followed up with the work I referred to earlier: they tested GPT-2 with prompts and found the model sometimes completed them by returning personal data.
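The canary test can be sketched in miniature. This is not the researchers’ actual method – they used neural language models and a more careful metric called exposure – but the core idea carries over: plant a known secret in the training data, then check how the trained model ranks that secret against random candidates of the same format. Here a toy character-bigram model stands in for the neural network, and the ‘secret number’ is invented:

```python
# Toy sketch of a canary/memorization test (not the researchers' code).
# A character bigram model stands in for a neural language model.
import math
import random
from collections import Counter, defaultdict

canary = "my secret number is 281459"  # invented secret planted in the training data
training_text = ("the quick brown fox jumps over the lazy dog. " * 50) + canary

# Count character bigrams in the training text.
counts = defaultdict(Counter)
for a, b in zip(training_text, training_text[1:]):
    counts[a][b] += 1

def log_likelihood(s):
    """Log-probability of a string under the bigram model, with add-one smoothing."""
    total = 0.0
    for a, b in zip(s, s[1:]):
        seen = sum(counts[a].values())
        total += math.log((counts[a][b] + 1) / (seen + 256))
    return total

# Rank the canary against random candidates that share its format.
random.seed(0)
candidates = ["my secret number is %06d" % random.randrange(10**6) for _ in range(999)]
scores = sorted([log_likelihood(c) for c in candidates] + [log_likelihood(canary)],
                reverse=True)
rank = scores.index(log_likelihood(canary)) + 1
print(rank)  # rank 1 of 1,000: the model rates its own canary suspiciously likely
```

If the model consistently ranks the planted canary far above random look-alikes, it has memorized the secret rather than merely learned the language around it – which is exactly the signal the metric is designed to surface.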
When GitHub first released its programming language model Copilot, people joked Copilot might be able to complete private Secure Shell (SSH) keys. (Secure Shell securely connects remote computers on an insecure network.) But what it actually did was just as concerning: it generated code containing valid API keys, giving users access to restricted resources. While questions remain over how these keys got into Copilot’s training data, it shows memorization’s possible consequences.
Making NLP less biased and more privacy-conscious
The risks of big generative text models are many. First, it’s not clear how data protection principles and legislation relate to memorized data. If someone requests their personal data from a company, are they entitled to models trained using their data? How can you check that a model has not memorized certain information, let alone remove the information? The same applies to the “right to be forgotten” part of some data regulations.
Another issue is copyright. Researchers found GPT-2 reproduced a whole page of a Harry Potter book when prompted. Copilot raises hard questions about who wrote the code it generates.
If you want to use these models in commercial applications, you can try to filter data for bias, but it may be impossible with the scale of datasets today. It’s also unclear what to filter – even neutral phrases can cause gender bias when the model is later used to generate text.
“Another approach might be to use automatic ‘censors’ to detect inappropriate text before it reaches users. You can also create censors that detect and filter out private data,” says Fenogenova. “Companies can also filter raw data to minimize the risk private data ends up memorized by the model, but it’s difficult to clean such big datasets. Researchers are looking at ‘controlled generation,’ where you steer the generation process of the already-trained model.”
Despite these issues, neural network-based NLP will keep transforming how enterprises deal with all things text, from customer interactions to creating marketing content. Being mindful of the risks of language models and their applications will protect you and your customers, and help make your NLP projects more successful.