Making more impact on fewer people through storytelling.

On biased data sets & AI

Yesterday, I attended a series of presentations, organized by CCU, on AI and data from a feminist perspective. Or rather, from the acknowledgement that datasets are never neutral.

I especially enjoyed listening to Caroline Sinders and Hannah Davis. They offered thoughts and examples that I’d like to share with you here.

Caroline Sinders

I am a machine learning designer/user researcher, artist, and digital anthropologist obsessed with language, culture and images.

Caroline Sinders

Caroline referred to The Library of Missing Datasets, an art project by Mimi Onuoha. It’s a filing cabinet filled with the data sets that are missing in an otherwise data-saturated environment. I love this project!

She showed screen captures from two Google searches: one for the term “professional haircut”, the other for “unprofessional haircut”. You can probably guess which images represent the unprofessional results.

Screen captures from two Google searches: “unprofessional haircut” versus “professional haircut”

Caroline’s own project, Feminist Data Set, is also very cool and worth investigating.

During the Q&A, she got a bit more specific about using AI to counter online abuse on the big social networking platforms. Caroline made a very good point that AI will not offer a solution, since the real problem lies with content moderation. Every post that gets flagged is reviewed by a moderator, who does so without context and often from a different world view than the person who flagged it as harassment, simply because the moderator probably lives in a different country and culture. You can’t blame the moderators for that; they just lack the right tools to do their job properly.

Hannah Davis

I’m a generative musician and researcher based in NYC. My work generally falls along the lines of music generation, composing, machine learning, and natural language processing.

Hannah Davis

Hannah explained that she has been using data sets for her project TransProse, in which she translates literature into music. For instance, this is the basic emotion captured in Le Petit Prince:

When she dug into the data sets she found this:

Childbirth is classified as an emotionless event in the data set

Childbirth was classified as being without any emotion. This is a very obvious example of how data sets are not neutrally created. Hannah raised the question: what type of world view does a data set create?

She argues that all data sets are created with bias, especially data sets that were created at one point in history and are still being used half a century later. Classifications from fifty years ago might not reflect current world views.

The problem with many data sets is that it takes a lot of effort to create them, so we use them for a long time, sometimes without properly updating them. Hannah pleads for two things to be attached to every data set: a list of ingredients (like we have on our food) and an expiration date.
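A minimal sketch of what such a label could look like in code, assuming Python. The class name, the fields and the example values are all made up for illustration; this is not something Hannah presented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetLabel:
    """Hypothetical label for a data set: ingredients plus an expiration date."""

    name: str
    ingredients: list[str]  # e.g. who labelled the data, when, from which sources
    created_on: date
    expires_on: date  # after this date the data set should be reviewed or retired

    def is_expired(self, today: date) -> bool:
        # A data set counts as expired once today is past its expiration date.
        return today > self.expires_on


# Made-up example values, purely for illustration.
label = DatasetLabel(
    name="emotion lexicon (fictional)",
    ingredients=["labelled by a small group of annotators", "English words only"],
    created_on=date(1970, 1, 1),
    expires_on=date(1980, 1, 1),
)
```

With a label like this, a tool could refuse to load, or at least warn about, a data set for which `label.is_expired(date.today())` is true.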

That moment…

…you’re on the brink of publishing something, but don’t really want to, because you feel self-aware, uncomfortable, exposed, vulnerable, an imposter, and at the same time realize you will hit the publish button soon enough, if not today then tomorrow, or next week, because you’ve learned how to cross mental barriers one baby-step at a time until the final click is inevitable.

Oh, so this is it (probably)

This week I had already noticed that the children at Daughter’s daycare were extra rowdy. Elja, a mother with more experience than I have, describes the phenomenon of end-of-year fatigue in children and their parents:

After a number of years of experience, I dare say that the last weeks before the Christmas holidays can be the most intense weeks of the year for parents. And that is when your children normally cope reasonably well with excitement and a disrupted routine. If your children also have ADHD or autism or something else that makes change affect them strongly (hands up 🙋🏼‍♀️!), then those last weeks of the year are an outright disaster.

Elja Daee, Voor de vaders en moeders

Ah. So that’s probably it. Daughter shows symptoms of it too, but mainly because of the intense cold she developed on the day of Sinterklaas evening. I notice the phenomenon mostly because the playgroup is more chaotic than usual this week.

That Sinterklaas stress is a bit of a thing, by the way. For example, I have very deliberately not let Daughter watch the Sinterklaasjournaal yet. A build-up of suspense starting on 11 November. Almost four weeks! That seemed far too intense for my three(-and-a-half)-year-old. I expect it will be a must next year, since she’ll be at school by then and it will surely be the ‘talk of the town’ among the kindergartners. I’m not looking forward to it.

Experimenting with MainWP

At one point I decided to combine all my sites into one. Then I decided to stop posting to Facebook, which resulted in posting more on my blog, both private and businessy stuff. Then a friend told me that mixing too much private stuff into a business site felt wrong. Then another friend told me he would appreciate reading more family stuff. Then I started a new blog for personal thoughts. Then I started a new project. The project needs a new website. I’m back to three. So I’ve been looking for one blog to rule them all.

Today I installed MainWP, from which I’m posting right now. Installation was flawless. The only effort it took was choosing the domain and host to run it from.

I’m curious how it will make the flow of maintaining multiple sites easier. I’ll update you in a month or so.

Daughter and her big cousin start on the first sack. Sinterklaas had listened well to Daughter. There was also an Elsa dress in it. With a cape. And a necklace.


