Yesterday, I attended a series of presentations, organized by CCU, on AI and data from a feminist perspective. Or rather, from the acknowledgement that datasets are never neutral.
I especially enjoyed listening to Caroline Sinders and Hannah Davis. They shared interesting thoughts and examples that I’d like to share with you here.
"I am a machine learning designer/user researcher, artist, and digital anthropologist obsessed with language, culture and images." - Caroline Sinders
She showed screen captures from two Google searches: one for the term "professional haircut", the other for "unprofessional haircut". You can probably guess which images the unprofessional search returned.
Caroline’s own project, Feminist Data Set, is also well worth investigating.
During the Q&A, she got more specific about using AI to counter online abuse on the big social networking platforms. Caroline made a very good point that AI will not offer a solution, since the real problem lies with content moderation. Every post that gets flagged is reviewed by a moderator who works without context and often from a different world view than the person who flagged the harassment, simply because the moderator probably lives in a different country and culture. You can’t blame the moderators for that; they just lack the right tools to do their job properly.
"I’m a generative musician and researcher based in NYC. My work generally falls along the lines of music generation, composing, machine learning, and natural language processing." - Hannah Davis
Hannah explained that she has been using data sets for her project TransProse, in which she translates literature into music. For instance, this is the basic emotional arc captured in Le Petit Prince:
When she dug into the data sets she found this:
Childbirth was classified as carrying no emotion at all. This is a very obvious example of how data sets are not created neutrally. Hannah raised the question of what kind of world view a data set creates.
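To make the problem concrete, here is a toy sketch of lexicon-based emotion tagging, the general technique behind a project like TransProse. The mini-lexicon below is invented for illustration (the real project relies on a much larger word-emotion lexicon); the point is that any word missing from the lexicon, like "childbirth" here, silently contributes zero emotion.

```python
from collections import Counter

# Hypothetical mini-lexicon mapping words to basic emotions.
# Note what is absent: "childbirth" has no entry at all.
EMOTION_LEXICON = {
    "alone": "sadness",
    "laughed": "joy",
    "flower": "joy",
    "lost": "sadness",
    "snake": "fear",
}

def emotion_profile(text: str) -> Counter:
    """Count lexicon hits per emotion in a passage of text."""
    words = text.lower().split()
    return Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)

profile = emotion_profile("she laughed after the childbirth but felt lost and alone")
print(profile["joy"], profile["sadness"], profile["fear"])  # 1 2 0
# "childbirth" scores nothing: the lexicon's gaps become the music's gaps.
```

Whatever the lexicon's creators left out, or mislabeled, flows straight through to the output, which is exactly the bias Hannah found.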
She argues that all data sets are created with bias, especially those created at one point in history and still in use half a century later. Classifications from fifty years ago might not reflect current world views.
The problem with many data sets is that they take a lot of effort to create, so we keep using them for a long time, sometimes without proper updating. Hannah pleads for two things to be attached to every data set: a list of ingredients (like the one on our food) and an expiration date.
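Hannah's proposal could be sketched as a small metadata record attached to a data set. Everything here is a hypothetical illustration, not an existing standard: the field names and the example data set are mine, not hers.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DatasetLabel:
    """Hypothetical 'food label' for a data set: ingredients plus expiry."""
    name: str
    ingredients: list[str]  # sources, annotation method, who annotated it
    created: date
    expires: date           # after this date, re-validate before reuse

    def is_expired(self, today: date) -> bool:
        return today >= self.expires

# Illustrative example, not a real data set's metadata.
label = DatasetLabel(
    name="word-emotion lexicon",
    ingredients=["crowdsourced annotations", "English web text"],
    created=date(2010, 6, 1),
    expires=date(2020, 6, 1),
)
print(label.is_expired(date(2024, 1, 1)))  # True: time to re-check the labels
```

The check is trivial on purpose; the value is in forcing the question "is this still valid?" at the moment of reuse rather than decades later.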