The stories represented by data are what people relate to

Every now and then I ask myself what the added value of my storytelling skills is. I mean, when it comes to fighting the spread of a virus, or mending those who get sick, telling stories is not exactly an essential skill. Nor is data analysis, my newly acquired skill. A wonderful graph doesn’t cure any disease.

But then I read an article like this, and I’m reminded why I wanted to learn more about handling data in the first place.

“If readers don’t relate to the information, they are less likely to act and use it,” said Slovic, a founder and president of Decision Research, a collection of scientists who study the human psyche.

From datajournalism.com: Humanising data: Connecting numbers and people

In other words, it’s essential to give data a voice or a face. Place the data in a context that people can relate to. Only then are people willing to act on the story the data tells.

By | 5 January 2022 | dataanalyses, datascience, links | 0 Comments

Donated data tells stories

The Robert Koch Institute launched an app in April 2020 that makes it possible to donate health data to the institute’s scientists, all in light of the COVID-19 pandemic.

Since they started collecting, they have published several analyses, which are well worth browsing through. For instance, last week they published a graph showing that the average heart rate of those in their sample who tested positive stays elevated for up to three months. Fascinating.

By | 16 November 2021 | dataanalyses, datascience | 0 Comments

Lessons learned while researching data to find an answer

Yesterday I published a data story on this blog. It was a first for me. Of course I’ve used graphs in posts before, but those were always other people’s work. This time I did the data work myself. Here is an unstructured list of the things I learned while doing it.

  • You start by downloading one data set, but you’ll always need more data. My starting point was to find data on the total number of houses in the country. The institute CBS has plenty of data available on its Statline website. I quickly found a data set with exactly what I needed: ‘Voorraad woningen; standen en mutaties vanaf 1921’ (housing stock; levels and changes since 1921). But of course, when you’re trying to answer the question of why housing is so expensive, you need to compare it to population size, so you have to download other data sets as well, for instance on population growth;
  • Statline doesn’t always show you all the available data. During my exploration I first downloaded a data set with population figures starting in 1950. I used it for most of the graphs, only to find out later that another data set provides population data starting in 1900. My lesson here: always dig for more when using CBS data;
  • Exploring data gets messy rather quickly. I downloaded several data sets, used Power BI to create a dimension table for ‘Year’ and added a year column to every table, so that I could combine data across tables. This phase is needed to discover what’s happening, but with each data set you add it gets harder to keep track of which columns you used from which table;
  • Power BI is a very handy tool for exploring and combining data sets;
  • After the exploration phase, once I had discovered the story the data was telling me, I created a new data set containing only the data I needed. That way I couldn’t pick the wrong column when making the visuals;
  • To create relationships between the tables I used a ‘Year’ dimension table, but stored the year only as a whole-number column. I should have created a proper date dimension table, which makes it even easier to create relationships between tables (as my teacher had already told me to do with every new data model);
  • Power BI Desktop is not the best tool for creating output outside the Microsoft Power BI sphere. Power BI is mainly meant for building ‘live’ dashboards used inside companies via the Power BI service, the online platform that accompanies Power BI Desktop. You can publish a report to the service so that others inside your company can view it. However, I wanted to publish the visuals on my blog, and the only output I could use from Power BI Desktop was a PDF export. Luckily I know how to use Photoshop and was able to turn each PDF page into a PNG fairly quickly, but that means extra steps between producing and publishing. Rather annoying when you have many graphs;
  • It’s easier to create new columns with a simple calculation in a spreadsheet than to use Power BI’s DAX formulas to get the same result. In Power BI I only succeeded in doing calculations on columns within the same table, not across tables;
  • You need time to reflect on what you’re doing with the data. I started exploring the data more than two weeks ago, and only after I showed someone my unpublished post did I discover a flaw in my thinking. In one of my graphs I plotted three lines: two were cumulative totals (population and houses) and the third was a yearly count of the migration surplus. I was comparing apples and oranges to make a point. I corrected this and created a new graph comparing births, deaths and migrants, all cumulative since 1950;
  • I want to learn how to create interactive SVG plots on my website, so readers can see the actual data behind the graphs.
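
The Year-dimension and cumulative-total lessons above can also be sketched outside Power BI. What follows is a minimal Python sketch, not Power BI or DAX code and not the model behind the published graphs: it builds a Year dimension from several year-keyed tables, joins them on year, and turns yearly counts into running totals from a chosen start year, so they sit on the same footing as cumulative lines. The figures and the function names (`year_dimension`, `join_on_year`, `cumulative_since`) are made up for illustration; they are not CBS data.

```python
# Toy yearly figures, made up for illustration only (NOT real CBS data).
# Each dict plays the role of a fact table keyed by year.
births = {1950: 100, 1951: 110, 1952: 105}
deaths = {1950: 40, 1951: 42, 1952: 45}
migration_surplus = {1950: 10, 1951: -5, 1952: 20}


def year_dimension(*tables: dict[int, int]) -> list[int]:
    """Build a sorted 'Year' dimension containing every year in any table."""
    return sorted(set().union(*tables))


def join_on_year(years: list[int], **tables: dict[int, int]) -> list[dict]:
    """Join every table onto the Year dimension, one row per year.
    Missing years become 0 here, a simplification for this sketch."""
    return [
        {"year": y, **{name: table.get(y, 0) for name, table in tables.items()}}
        for y in years
    ]


def cumulative_since(rows: list[dict], columns: list[str], start_year: int) -> list[dict]:
    """Return copies of the rows from start_year onward, each with a running
    total '<column>_cumulative' per column. Assumes rows are sorted by year."""
    running = {c: 0 for c in columns}
    result = []
    for row in rows:
        if row["year"] < start_year:
            continue
        new_row = dict(row)
        for c in columns:
            running[c] += row[c]
            new_row[f"{c}_cumulative"] = running[c]
        result.append(new_row)
    return result


if __name__ == "__main__":
    years = year_dimension(births, deaths, migration_surplus)
    rows = join_on_year(years, births=births, deaths=deaths, migration=migration_surplus)
    for r in cumulative_since(rows, ["births", "deaths", "migration"], 1950):
        print(r["year"], r["births_cumulative"], r["deaths_cumulative"], r["migration_cumulative"])
```

Summing the yearly flows first is what makes them comparable with cumulative totals; plotting a raw yearly count next to running totals is the apples-and-oranges comparison described in the list.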

By | 2 November 2021 | dataanalyses, datascience, flow | 0 Comments

Post-academic blues

I had no idea how exhausted I was. On September 10 I had my official graduation celebration, and I haven’t been able to do much since. With no studying, no daytime care for Daughter and no client work, I had, for the first time since… January (or even earlier?), so many hours to fill each day. It felt really uncomfortable. The first half of 2021 was mainly about being as efficient as I could, every minute of the day. This month, without many obligations, I could feel the toll that period took on my body and mind. I felt tired, so I often napped in the afternoon. I felt restless, unable to enjoy the quiet hours I had longed for in previous months. Still, I clearly needed a few weeks of idle time. C’est la vie.

Then I started reading again. I started making appointments with people. The Man connected me to some people he knows who work in/on/with data. I had several pleasant conversations. Spent some time in the city centre on my own. Had lunch with a friend. I even wrote a new story for Daughter. Slowly my energy levels went up. And now I’m eager to tackle some of the projects I had to postpone. Like finding new professionals to interview for my podcast. Or editing that wedding video. Or landing a new client. I’m back and open for business.

By | 30 September 2021 | flow | 1 Comment

This is what clubbing did in NL

Now that the UK is reopening nightclubs, I’ll show you what to expect over the next few weeks. My government ran an experiment for fifteen days. The experiment was: do whatever you like, but keep your distance, and get tested for activities where social distancing is not possible. Oh, and once you get your Janssen shot, your Corona pass is active immediately (nicknamed the ‘Dansen met Janssen’ pass: ‘Dancing with Janssen’). Oh, and a test is valid for forty hours. That will get you going all weekend! The Delta variant? Yeah, we heard about that, but it’ll only be dominant by September, not earlier.

This is what happened

Number of people who tested positive per 100,000 inhabitants between June 14 and July 19.
First white dot on the blue line beneath the graph: June 26, reopening. Second dot: July 10, nightclubs closed again and all cafés had to close at midnight.
Number of people who tested positive per age group between June 14 and July 19
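
The first graph normalises positive tests to 100,000 inhabitants, which is what makes regions or age groups of very different sizes comparable. A minimal Python sketch of that calculation follows; the function names and the example numbers are hypothetical and are not figures from the graphs.

```python
def per_100k(positive_tests: int, population: int) -> float:
    """Positive tests per 100,000 inhabitants.

    Multiplying before dividing means there is only one rounding step
    (the division), so round inputs give exact results."""
    if population <= 0:
        raise ValueError("population must be positive")
    return positive_tests * 100_000 / population


def per_100k_by_group(positives: dict[str, int], populations: dict[str, int]) -> dict[str, float]:
    """Apply per_100k to each group (e.g. an age group).
    Every group in `positives` must also appear in `populations`."""
    return {group: per_100k(count, populations[group]) for group, count in positives.items()}


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the graphs.
    print(per_100k(350, 70_000))
    print(per_100k_by_group({"18-24": 30, "65+": 10}, {"18-24": 1_000, "65+": 10_000}))
```

In this toy example the smaller group has fewer positive tests in absolute terms but a thirty times higher rate per 100,000, which is why raw counts per age group can hide where the virus is spreading fastest.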

In other words: let the unvaccinated groups dance all night long and give some of them long COVID, along with their parents (many people under 25 still live with their parents), who are still one shot and/or two weeks away from being fully vaccinated.

And what about Delta? Yep. Taking us by storm.

Covid-19 variants measured between June 28 and July 4

But the Brits are better vaccinated than the Dutch, right? Well…not really.

Vaccination rollout in the UK, source: The Guardian July 19
Vaccination rollout NL, source: European Centre for Disease Prevention and Control July 19

And don’t forget, the Dutch are mainly vaccinated with mRNA vaccines, which are better at fighting off the Delta variant.

I can’t wait to see the UK graphs in a few weeks’ time.

By | 19 July 2021 | datascience, flow | 0 Comments

The roles in data

For those unfamiliar with the different roles in the data field, here is what I’ve learned about them so far. There are data analysts, data architects, data engineers and data scientists.

A data analyst works on, no surprise, analysing data. These are often the people preparing reports (for instance on sales performance) for management. When working in Microsoft Azure, you would spend a lot of time in Power BI, performing transformations and calculations on polished data sets to create reports. Your goal is to create visualisations that anyone in your organisation can easily understand.

To create those polished data sets for analysts to work with, you need a data infrastructure. That’s where the data architect comes in. An architect thinks through the data needs at one end, knows the quirks of the raw data coming in at the other end, and then designs the infrastructure in between.

The data engineer then uses the input from the data architect to build the infrastructure.
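
To make this division of labour concrete, here is a toy sketch using Python’s built-in `sqlite3` module rather than Azure tooling. The table and column names and the sample rows are hypothetical: the schema of the polished `sales` table stands in for the architect’s design, `build_pipeline` for the engineer’s load-and-clean step, and `sales_per_region` for the kind of aggregate an analyst would turn into a report.

```python
import sqlite3

# Hypothetical raw sales records as they might arrive from a source system:
# inconsistent region spelling, amounts stored as text, one unusable row.
RAW_ROWS = [
    ("2021-06-01", " north ", "120.50"),
    ("2021-06-01", "North", "80"),
    ("2021-06-02", "SOUTH", "200"),
    ("2021-06-02", "south", "not a number"),
]


def build_pipeline(conn: sqlite3.Connection) -> None:
    """Engineer's step (sketch): load the raw rows, then clean them into a
    polished table whose schema stands in for the architect's design."""
    conn.execute("CREATE TABLE raw_sales (sale_date TEXT, region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        "CREATE TABLE sales (sale_date TEXT NOT NULL, region TEXT NOT NULL, amount REAL NOT NULL)"
    )
    for sale_date, region, amount in conn.execute("SELECT * FROM raw_sales").fetchall():
        try:
            value = float(amount)
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        # Normalise region spelling: trim spaces, capitalise consistently.
        conn.execute(
            "INSERT INTO sales VALUES (?, ?, ?)",
            (sale_date, region.strip().title(), value),
        )
    conn.commit()


def sales_per_region(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Analyst's step (sketch): a simple aggregate on the polished table."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_pipeline(conn)
    print(sales_per_region(conn))
```

In a real setup each step would be far bigger and run on separate infrastructure; the point of the sketch is only that the analyst’s query never has to deal with the messy raw rows.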

Sometimes you want to dig deeper into your data and discover more complex patterns. That’s where the data scientist comes in. These math wizards apply statistical, machine learning and AI models to data and can tweak those models using their mathematical knowledge.

I was rather surprised to learn that working with data, a relatively new field, has already split into so many roles. And I’m not even talking about all the specialisations within these roles for hard-core programmers. My trainer, for instance, spent many years working solely on optimising SQL statements.

My training mainly prepares me for two roles: the analyst and the engineer. I’m very happy with all the Power BI skills I learned; they will help me a lot when I start digging for stories in data sets. The data engineering part is absolutely not my cup of tea. It’s very theoretical: an in-depth crash course in database management and Azure cloud infrastructure. You can compare it to fitting the pipes and wiring in your home. It needs to be done, otherwise you can’t live there, but it’s not as exciting as decorating. At least, I get more excited about decoration and design than about fitting pipes. That said, I’m still very happy to gain a solid understanding of the inner workings of databases and of the cloud tools needed to turn raw data into usable data. It helps me instruct others to click the right buttons, write the proper SQL statements and build the pipelines for me, so that I can dig for the data story gold 😉

By | 23 June 2021 | datascience, flow | 0 Comments