This is what clubbing did in NL

Now that the UK is opening night-clubs, I’ll show you what to expect over the next few weeks. My government ran an experiment for fifteen days. The experiment was: do whatever you like, but keep your distance and get tested for activities where social distancing is not possible. Oh, and when you get your Janssen shot, your Corona pass is active immediately (also called the ‘Dansen met Janssen’ pass, or ‘Dancing with Janssen’). Oh, and a test is valid for forty hours. That will get you going all weekend! Delta variant? Yeah, we heard about that, but it’ll be dominant by September, not earlier.

This is what happened

Number of people who tested positive per 100,000 inhabitants between June 14 and July 19.
First white dot on the blue line beneath the graph: June 26, opening up. Second dot: July 10, closing night-clubs and curfew for all cafes at midnight.
Number of people who tested positive between June 14 and July 19 per age group
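
For readers who want to redo the first graph's normalisation on their own numbers, here is a minimal sketch of the "per 100,000 inhabitants" arithmetic. The function name `per_100k` is my own invention for illustration and the inputs are whatever counts you have at hand; nothing here comes from the official dashboards.

```python
def per_100k(positive_tests: int, population: int) -> float:
    """Return the number of positive tests per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so that round numbers stay exact in floating point.
    return positive_tests * 100_000 / population
```

Normalising like this is what makes regions or age groups of different sizes comparable in a single graph.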

In other words: let the unvaccinated groups dance all night long and give long covid to some of them, and to their parents too (many under-25s still live with their parents), who are still one shot and/or two weeks away from being fully vaccinated.

And what about Delta? Yep. Taking us by storm.

Covid-19 variants measured between June 28 and July 4

But the Brits are better vaccinated than the Dutch, right? Well…not really.

Vaccination rollout in the UK, source: The Guardian July 19
Vaccination rollout NL, source: European Centre for Disease Prevention and Control July 19

And don’t forget, the Dutch are mainly vaccinated with mRNA vaccines, which are better at fighting off the Delta variant.

I can’t wait to see the UK graphs in a few weeks’ time.

19 July 2021 | datascience, flow | 0 comments

The roles in data

For those who are not familiar with the variety of roles in the field of data, this is what I’ve learned about them so far. You have data analysts, data architects, data engineers and data scientists.

A data analyst works on, no surprise, analysing data. Often these are people preparing reports (for instance on sales performance) for management. When working in Microsoft Azure, you would spend a lot of time using Power BI to create reports, using polished data sets to perform transformations and calculations. Your goal is to create visualizations that anyone else in your organisation can easily understand.

To create those polished data sets for analysts to work with, you need a data infrastructure. That’s where the data architect comes in. An architect thinks through what the data needs are on one end, knows the quirks of the raw data coming in at the other end and then designs the infrastructure in between.

The data engineer then uses the input from the data architect to build the infrastructure.

Sometimes you want to dig deeper into your data and discover more complex patterns. That’s where the data scientist comes in. These math wizards apply statistical, machine learning and AI models to data and are able to tweak these models using their mathematical knowledge.

I was rather surprised to learn that working in data, a relatively new field, has already split into so many roles. And I’m not even talking about all the specialisations that hard-core programmers can choose within these roles. For instance, my trainer spent many years doing nothing but optimising SQL statements for a living.

My education prepares me for two roles mainly: the analyst and the engineer. I’m most definitely happy with all the skills I learned about using Power BI. That will help me a lot when I start digging for stories using data sets. The data engineering part is absolutely not my cup of tea. It’s really theoretical and an in-depth crash course on database management and Azure cloud infrastructure. You can compare it to fitting electrical pipes in your home. It needs to be done, otherwise you can’t live in your home, but it’s not as exciting as decorating your home. At least I get more excited about decoration and design than fitting pipes. That said, I’m still very happy to get a solid understanding of the inner workings of databases and the cloud tools one needs to create usable data from raw data. It helps me to be able to instruct others to click the right buttons, write the proper SQL statements and build the pipelines for me, so that I can dig for the data story gold 😉

23 June 2021 | datascience, flow | 0 comments

Women remain invisible

It is, of course, perfectly simple to record whether a participant in a clinical study is physiologically male or female. And when you get to test a newly developed vaccine on a pandemic scale, the numbers become statistically significant pretty quickly. How logical would it be, then, to also include an m/f column in your database of reported side effects? I thought every medical researcher had a copy of Invisible Women on their nightstand. How naive of me.

[…] the vaccine makers have largely ignored the element of ‘sex’ in the vaccine research and the treatment methods for Covid-19. For instance, none of the published clinical trials of five coronavirus vaccines broke down the side effects that occurred by sex.

How women were forgotten in Covid-19 research (source: Trouw)
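
To show how little it takes once that m/f column exists, here is a minimal sketch of a breakdown by sex. The records below are invented toy values, not real trial data, and the function name `side_effect_rate_by_sex` is hypothetical.

```python
from collections import Counter

# Invented toy records, NOT real trial data: (sex, side effect reported?)
reports = [
    ("f", True), ("f", False), ("f", True),
    ("m", False), ("m", True), ("m", False),
]

def side_effect_rate_by_sex(records):
    """Return {sex: share of participants who reported a side effect}."""
    totals = Counter()
    with_effect = Counter()
    for sex, reported in records:
        totals[sex] += 1
        if reported:
            with_effect[sex] += 1
    return {sex: with_effect[sex] / totals[sex] for sex in totals}

rates = side_effect_rate_by_sex(reports)
```

Without the first field in each record, this breakdown is simply impossible to compute afterwards, which is the whole point.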
8 April 2021 | datascience, vrouw | 0 comments

Mailchimp gives me more than I want

I discovered a disturbing thing yesterday when exporting my Mailchimp e-mail contacts for IFF. Mailchimp has more personal data than I asked for. When someone signs up for my weekly digest, all I want to know is an e-mail address. It’s the only thing I need to know for sending an e-mail to a person. I deliberately do not ask for names or other details. The less I know, the better, considering the chances of data breaches and GDPR legislation. After exporting my contact list from Mailchimp I found out the service has more data on record than I asked for. For instance, for one person who registered for my e-mail subscription list I have a first name, a last name and a birthday on record. I’ve never had input fields for that information in my subscription form, so how do those personal details end up in my contact list on Mailchimp?

The only explanation I can think of is that Mailchimp keeps unique IDs based on an e-mail address. The data that a person discloses to one Mailchimp mailing list then ends up in all other mailing lists that person subscribes to using the same e-mail address. A hint in favour of that explanation is the pair of ID numbers I also received in my data export from Mailchimp. All users are assigned two ID numbers, a LEID and an EUID. This is Mailchimp’s explanation of these IDs:

LEID is the unique identifier for a contact, specific to an audience. EUID is the unique identifier for a contact on the account level, across all audiences.

Whatever the reason and mechanics behind Mailchimp’s user IDs, I’m really shocked to find out I hold personal data I never asked for. Even worse, the person subscribing to my mailing list never agreed to my having that data. A serious breach of trust (and the law).

There is more data in the export file. The IP address is logged at opt-in and again when confirming one’s subscription. So are latitude and longitude. And based on that information, country and province are logged. I was under the assumption I only asked for an e-mail address and that would be the only thing on record. I was wrong.

My conclusion is that I should never have used Mailchimp in the first place for sending my automated blog digest. I exposed my readers to a data collector. I deeply apologize for that.

I am now switching to a WordPress plugin called MailPoet. The sign-up data is stored in the database belonging to this blog. The only data I transferred from Mailchimp to my blog are e-mail addresses, as subscribers agreed to when signing up. The only extra information that is logged is the IP address when signing up and confirming. With that data I can prove consent for signing up to anyone asking (and a subscriber can prove that an e-mail address was misused for signing up). I will delete my account for IFF with Mailchimp and remove the export files on my computer.
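
For anyone wanting to do the same clean-up, here is a minimal sketch of how one could strip an export down to e-mail addresses only before importing it anywhere else. It assumes the export is a CSV file with a header row and that the address column is called `Email Address`; that column name and the function name `extract_emails` are my assumptions, so check the header of your own export first.

```python
import csv
import io

def extract_emails(export_file, email_column="Email Address"):
    """Return the unique e-mail addresses from a CSV export, dropping all other columns.

    The default column name is an assumption; pass your own if the header differs.
    """
    reader = csv.DictReader(export_file)
    if reader.fieldnames is None or email_column not in reader.fieldnames:
        raise KeyError(f"column {email_column!r} not found in export")
    seen = set()
    emails = []
    for row in reader:
        email = (row[email_column] or "").strip()
        key = email.lower()  # compare case-insensitively to catch duplicates
        if email and key not in seen:
            seen.add(key)
            emails.append(email)
    return emails

# Usage with an invented two-line export (made-up values, not real subscribers):
sample = io.StringIO(
    "Email Address,First Name,LATITUDE\n"
    "reader@example.com,Alex,52.1\n"
    "Reader@example.com,,\n"
)
emails = extract_emails(sample)
```

Names, coordinates and IDs never leave the function, so they cannot end up in the new subscriber list by accident.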

This case clearly shows how easy it is to collect excess data. Lesson learned.

4 August 2020 | datadieet, flow | 2 comments