People are not apes. This was news to Google Photos’ high-tech image recognition and photo categorization software, which tagged a photo of Haitian-American Jacky Alciné and his friend as gorillas next to other seemingly innocuous categories—graduation, airplanes, skyscrapers. Their reaction was, understandably, a mix of anger, hurt, and confusion. Of all the derogatory words, of all the deeply rooted reminders of historical injustices against black people in America, how did an unfeeling, impersonal, and cold calculation system come up with the result “gorilla”? And what does this incident say about the relationship between humans—dizzyingly complex and endlessly diverse—and technology?
In response to the error, Alciné tweeted out: “Google, ya’ll fucked up. My friend’s not a gorilla.” The tweet received hundreds of favorites and retweets, and a Google engineer tweeted back: “This is 100% not okay.” Fourteen hours later, Google Photos had removed the gorilla tag.
Google, Facebook, Apple, and other tech companies use something called “deep learning,” which allows computers to recognize patterns in words, images, and activities. It’s how Google search knows that when you search for “bat,” you’re probably looking for bats of the baseball variety and not the animal. It’s how Facebook knows you might like a certain post. And it’s how Google Photos knows a car is a car but doesn’t know a human is not a gorilla. Because for something so advanced, this technology is also surprisingly dumb. In order for deep learning to work, it needs to be shown thousands, if not millions, of examples before it can recognize a pattern and accurately “learn” what something is and what something isn’t. Yoshua Bengio, a deep learning expert and professor at the University of Montreal, noted that “people overestimate the intelligence that is currently in these machines” (6:30).
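For readers curious about the mechanics, the core of the problem can be sketched in a few lines of Python. This toy nearest-centroid classifier is not a deep network, and the feature vectors here are invented for illustration, but it shows the same closed-set behavior: a model trained on a fixed set of labels is forced to answer with one of them, even for an input unlike anything it was trained on.

```python
def centroid(points):
    """Average of a list of feature vectors, one coordinate at a time."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def classify(x, centroids):
    # Pick the nearest known label -- there is no "none of the above".
    return min(centroids,
               key=lambda label: sum((a - b) ** 2
                                     for a, b in zip(x, centroids[label])))

# Hypothetical 2-D "features" for two training classes.
training = {
    "car":     [(0.9, 0.1), (0.8, 0.2), (0.95, 0.15)],
    "gorilla": [(0.1, 0.9), (0.2, 0.8), (0.15, 0.95)],
}
centroids = {label: centroid(pts) for label, pts in training.items()}

# A "person" feature vector the model never saw during training:
# it must still choose between the only two labels it knows.
print(classify((0.3, 0.7), centroids))  # → gorilla
```

The failure mode isn’t malice in the math; it’s that the model’s entire world is its training labels, which is why the makeup of the training data (and of the people choosing it) matters so much.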
Hollywood has been churning out foreboding movies about artificial intelligence and evil robots taking over the human world for decades now, but we’re actually nowhere near that. Computers don’t see what we see. Where we see people, a computer sees pixels (14:17). Flickr also made headlines after its auto-tagging system, in an unfortunate series of errors, labeled a concentration camp a “jungle gym” and a black man an “ape.” So when a program mistakenly labels a black person as a gorilla, is it being racist? Or is it just stupid? Can it be both?
Here’s what Note to Self did really well this week: First, host Manoush Zomorodi asked effective questions that elicited interesting and informative responses. For example, “How do you possibly code for how complex people are?” (9:38). Second, the distinctive tones and pacing of the voices on this week’s show added rich texture to the listening experience. Third, I liked that we heard a lot from Alciné. When speaking about an experience, you can’t do better than hearing about it from the person who went through it himself.
After listening to this episode of Note to Self, I came away understanding that the responsibility for preventing offensive computer errors falls on the shoulders of the humans who program these machines to recognize certain objects, make certain distinctions, and not make certain mistakes. This is no small task. We should also think about the kinds of errors that computers make, and what that might say about the people who are programming them. As Zomorodi put it, “If the computers are going to recognize diversity, they need to be trained by diverse people” (18:11).
This post is available under a Creative Commons Attribution NoDerivatives license. That means you can republish this post and others on the site for free, as long as you credit Audiologue and the author in accordance with our republishing guidelines.