Facebook's GIPHY acquisition is genius
Helping Facebook's AI understand hidden meaning
This is a Sonder Scheme newsletter, written by me, Helen Edwards. Artificiality is about how AI is being used to make and break our world. I write this because, while I am optimistic about the technology, I am pessimistic about the power structures that command it.
If you haven’t signed up yet, you can do that here.
This summer we’ve carved out time for something really important and close to our hearts: a virtual summer camp for middle and high schoolers who want to learn how to build an AI start-up. We’re looking for enthusiastic self-starters who are interested in what’s possible at the intersection of entrepreneurialism, AI and design. If you can, we’d love it if you could share in your network and help us bring human-centered AI to a Sonder Scheme Junior “Shark Tank.”
We’re also looking for someone to sponsor a couple of scholarships in each class for kids who would otherwise not be able to attend so if you have any ideas, get in touch.
If necessity is the mother of invention, then Facebook’s AI has been on extra duty since Covid as it tries to identify and understand hidden meaning.
AI now proactively detects 88.8 percent of the hate speech content we remove, up from 80.2 percent the previous quarter. - Facebook, May 2020
Moderators, as contractors, can’t work from home due to security concerns, so removing hate speech has relied more on AI while human moderators prioritize Covid misinformation. Covid is new, so there’s relatively little data available for AI to learn from, which makes humans even more important. But humans are not a sustainable strategy for Facebook. Facebook is simply too big, too fast and too diverse.
Content moderation is a frontier for AI research. Developments in content moderation will drive a wave of new AI capability, which will be of huge value in Facebook’s core business of micro-targeting ads.
Hate speech is complex for AI. It’s a small proportion of content compared to the billions of posts that are not problematic. It’s often multimodal. It can be ironic or sarcastic, which is tough for AI. And the people posting it are deliberately trying to avoid detection by doing things such as manipulating words or creating ambiguous text within a broader context. Haters use dog whistles to hide the meaning from anyone who doesn’t understand the codes, the hidden language.
Of course, if Facebook can use AI to solve hate speech, then it can use the same AI for a lot more. Which is a big incentive.
This week the company made a couple of interesting moves. It announced the acquisition of GIPHY:
GIPHY, a leader in visual expression and creation, is joining the Facebook company today as part of the Instagram team. GIPHY makes everyday conversations more entertaining, and so we plan to further integrate their GIF library into Instagram and our other apps so that people can find just the right way to express themselves. - Facebook
And it published a blog post on the direction of its AI research for hate speech. The details help us understand Facebook’s AI strategy in the context of moderation as a pain point, and they also give insight into the potential reach of this AI in the future.
To better protect people, we have AI tools to quickly—and often proactively—detect this content. - Facebook
I think there’s a way to put these two things together because GIPHY isn’t just about giving “people meaningful and creative ways to express themselves,” it’s also about giving direct access to data for AI to learn about the creative ways that humans express themselves in non-explicit ways—irony, humor, sarcasm, sleight-of-hand, juxtaposition.
And the work from Facebook’s AI team lays out how and where this data could be made even more valuable.
The unique challenge of hate speech
The real challenge for AI as a moderator is that humans are incredibly adept at manipulating language within the context of other media, especially when they don’t want to get caught. For humans, all media is mixed: vision, language, sound. Memes can use text and images or video together. The text alone can be ambiguous, but when it’s combined with the image, the statement takes on another meaning.
But Facebook is showing how slick it is at hybridizing AI and developing ways to recognize usage in multimodal situations.
Facebook’s hybridization of leading-edge AI
In 2019, researchers took Google’s BERT language model and combined it with a vision system to create Vision-and-Language BERT (ViLBERT). They pre-trained the combined model on images paired with text, essentially building a joint visual-linguistic representation. The innovation is in how the models are linked: vision and language are processed in separate streams that communicate with each other through transformer layers, so the model reasons jointly across both.
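To make the "two streams that talk to each other" idea concrete, here is a minimal sketch of cross-attention in plain Python. It is my own toy illustration, not Facebook's or the ViLBERT authors' code: the vectors are made up, and it leaves out the learned projections, multiple heads and residual connections a real transformer layer would have. Each text token builds a weighted summary of the image regions, and each image region builds a weighted summary of the text tokens.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, other_stream):
    """Each query vector attends over the other stream's vectors.

    Simplification (assumption): the other stream's vectors serve as both
    keys and values, with no learned projection matrices.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in other_stream
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, other_stream)) for j in range(dim)]
        )
    return outputs

# Hypothetical 2-dimensional features: two text tokens, three image regions.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Language stream enriched with vision, and vision stream enriched with language.
text_with_vision = cross_attend(text_tokens, image_regions)
vision_with_text = cross_attend(image_regions, text_tokens)
```

The point of the sketch is the symmetry: the same function runs in both directions, so neither modality is just a caption for the other.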
The most important development from an AI perspective is the introduction of self-supervised learning models into production. The best way to think of self-supervised learning is as a kind of “fill in the blanks” learning style. It’s fairly new and was pioneered by Facebook’s chief AI scientist, Yann LeCun. What makes self-supervised learning different is that the goal is to have an AI learn to reason. AI learns by gradient-based learning, and self-supervised learning makes reasoning compatible with gradient-based learning. AI learns by filling in missing information, so it develops a representation of the world that can then be used to reason more generally about a task, like classifying language. Just as babies learn through observing the world before they undertake a task, self-supervised learning has AI do the same.
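Here is a small sketch of the "fill in the blanks" idea, under heavy assumptions. The corpus is a made-up handful of sentences, and I use simple counting in place of the gradient-based neural networks Facebook actually trains. What it does share with the real thing is the training signal: nobody labels anything, and the "answer" for each blank is the word that was already in the text.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: raw sentences with no human-provided labels.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat slept on the mat",
    "the dog slept on the rug",
]

def build_context_counts(sentences):
    """For every word, record which (left, right) neighbours surrounded it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            # The hidden word itself is the training target.
            counts[(tokens[i - 1], tokens[i + 1])][tokens[i]] += 1
    return counts

def fill_blank(counts, left, right):
    """Guess the most frequently seen word between `left` and `right`."""
    candidates = counts.get((left, right))
    if not candidates:
        return None  # this context never appeared in the corpus
    return candidates.most_common(1)[0][0]

counts = build_context_counts(CORPUS)
guess = fill_blank(counts, "on", "mat")       # blank in "on ___ mat"
unseen = fill_blank(counts, "purple", "moon")  # context the corpus never showed
```

A production model replaces the lookup table with a network that generalizes to contexts it has never seen, which is exactly where the counting version gives up and returns nothing.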
If AI can understand English, it can understand everything
Another theme that’s fascinating is language-universal structures, where AI can detect meanings that are similar in different languages. To be honest, this is kind of spooky—even with languages that have very little training data, the AI can build a model based on the structure of other, more prevalent languages. This graphic illustrates how hate speech in different languages is represented in a single, shared embedding space.
Facebook now really does have the Babel fish. It’s like having a translator, one who has their own agenda, involved in every conversation. It’s a strange feeling to see all of human language boiled down to common structures in high-dimensional space.
This allows models like XLM-R to learn in a language-agnostic fashion, taking advantage of transfer learning to learn from data in one language (e.g., Hindi) and use it in other languages (e.g., Spanish and Bulgarian). - Facebook
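A toy sketch of what learning in one language and using it in another can look like once words share an embedding space. Everything here is an assumption for illustration: the three-dimensional vectors are hand-written by me (a model like XLM-R learns its own, in far more dimensions), the word list is tiny, and the nearest-centroid classifier is a stand-in I chose for simplicity. The labels come from English words only, yet Spanish words get classified because they sit near their English counterparts.

```python
import math

# Hypothetical hand-written "shared" embeddings, not taken from any real model.
EMBEDDINGS = {
    "friend": [0.9, 0.1, 0.2],     # English
    "welcome": [0.8, 0.2, 0.1],    # English
    "enemy": [0.1, 0.9, 0.2],      # English
    "threat": [0.2, 0.8, 0.1],     # English
    "amigo": [0.85, 0.15, 0.2],    # Spanish, no labels provided
    "enemigo": [0.15, 0.85, 0.2],  # Spanish, no labels provided
}

# Labeled training data exists in English only.
ENGLISH_TRAIN = {
    "friendly": ["friend", "welcome"],
    "hostile": ["enemy", "threat"],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Average a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# One centroid per label, built from English examples only.
CENTROIDS = {
    label: centroid([EMBEDDINGS[word] for word in words])
    for label, words in ENGLISH_TRAIN.items()
}

def classify(word):
    """Assign the label whose English centroid is closest in the shared space."""
    vector = EMBEDDINGS[word]
    return max(CENTROIDS, key=lambda label: cosine(vector, CENTROIDS[label]))

spanish_results = {word: classify(word) for word in ("amigo", "enemigo")}
```

The classifier never saw a Spanish label. All the work is done by the shared space, which is why languages with very little training data can borrow from languages with a lot.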
Fusion models: making a superintelligence and not calling it one
Self-supervised learning has enabled Facebook to build a vast array of fusion models that combine images, video, text, people, interactions and external content which then can all be mathematically represented across all languages, regions and countries.
It really is the ultimate way to combine all of Facebook’s data. Actually all data, period. Which, of course, is a much bigger opportunity, by an order of billions.
This allows us to learn from many more examples, both of hate speech and benign content, unlocking the value of unlabeled data. - Facebook
If they called it the “brain of the world,” we’d all be terrified. But instead it’s AI for removing hate speech. And GIPHY, the data set that represents memes as a new form of human communication, is a perfect resource to add to AI’s reasoning. Lucky it’s called meaningful and creative sharing.
Facebook’s AI frontier is formidable. Right now, it’s targeted at hate speech, which we all agree is a good thing. But AI that works on hate speech, which represents a relatively small but technically sophisticated portion of content on the platform, will be massively more powerful on the billions of posts across the world, especially when it can understand all the ways that humans try to stay opaque to the machine.
Where will they aim it next? And once Facebook’s AI understands how humans communicate with hidden meaning—through things like hate speech and sarcasm—how will the AI’s communication with humans change? Will we be unknowingly influenced by the AI because it’s expressing things to us in hidden ways?
Also this week:
Since the pandemic and lockdown we have launched our Studio for human-centered AI design, re-crafted all our workshops for use by distributed teams, innovated on design thinking for couples with our couple-centered design system for COVID confinement (free books), and doubled down on ethical implementation of AI in general and in HR in particular. If you haven’t checked out our products and services recently, please do.
Clearview AI in NZ, from RNZ. NZ’s experience mirrors the fundamental concern: individuals in law enforcement can trial this without “higher ups” knowing about it. This is the new “move fast and break things,” where privacy commissioners or other watchdogs for democratic public decision-making aren’t part of any trials. “Official emails released to RNZ show how police first used the technology: by submitting images of wanted people who police say looked "to be of Māori or Polynesian ethnicity", as well as "Irish roof contractors".” Plus, an interview between Kim Hill and Kashmir Hill, a technology reporter for the New York Times who has been following the company.
An essay in Nautilus from the inventor of the Roomba on how designers of robots need to be creative at understanding what a human is doing to complete a task and why they do it. It’s a delight to read. “Robots and people may accomplish the same task in completely different ways. This makes deciding which tasks are robot-appropriate both difficult and, from my perspective, great fun. Every potential task must be reimagined from the ground up.”
ICYMI, an article in NYmag that everyone’s talking about: future of higher ed, by Prof Galloway.
Great discussion on All Tech Is Human on emotional AI with Rana el Kaliouby and Pamela Pavliscak.
Article in the MIT Tech Review about how humans need to step in for failing AI. “Machine-learning models that run behind the scenes in inventory management, fraud detection, and marketing rely on a cycle of normal human behavior. But what counts as normal has changed, and now some are no longer working.” Called it.