The daunting task of getting your Mac ready for data science

Today I did something that I have been postponing for some months now: creating an environment on my iMac to be able to do my own data analysis projects.

I learned to use Python and data science packages such as pandas and matplotlib, but all in the safe environment of DataCamp. Now that I’m contemplating running my own projects, I had to install Python3 and the packages on my own computer. I started searching online last week, got a bit overwhelmed by all the variations in what other people said they had installed, felt too confused to continue and closed my browser without installing anything.

This afternoon I was ready to try again. I was mentally prepared this time, so I took my time comparing the different ways of installing Python3. I quickly discovered that I should narrow my search to using Python for data science; otherwise I would end up installing tools geared towards developers.

I first installed Python3 manually and then read on several data science websites about Anaconda, ‘your data science toolkit’, ‘developed for solo practitioners’. That sounds like me. I installed it, created a new environment with the latest Python version (3.10.0) and ran straight into trouble when installing some packages. Of course the error messages were very human-readable (not), but they mentioned lots of version numbers and greater-than, equal-to and less-than signs. I had clearly chosen the wrong Python version to work with. I trashed the environment, created a new one with the auto-suggested Python version and, automagically, everything I needed was in there.
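For anyone who hits the same wall: those signs are version constraints. A package declares which Python versions it supports, and the installer gives up when it can’t find a combination that fits. Below is a minimal sketch of the idea, assuming a hypothetical constraint `>=3.7,<3.10` and plain numeric versions only; real tools such as conda and pip use a much richer specifier syntax. One pitfall it shows: versions have to be compared number by number, because as plain strings `"3.10.0" < "3.9"` is true.

```python
# Minimal sketch of how a version constraint rules out a Python version.
# Assumption: plain numeric versions like "3.10.0" and simple clauses like
# ">=3.7" or "<3.10"; the constraint used below is hypothetical.

def compare(a, b):
    """Return -1, 0 or 1, comparing dotted versions number by number."""
    x = [int(part) for part in a.strip().split(".")]
    y = [int(part) for part in b.strip().split(".")]
    # Pad the shorter version with zeros so "3.10" equals "3.10.0".
    n = max(len(x), len(y))
    x += [0] * (n - len(x))
    y += [0] * (n - len(y))
    return (x > y) - (x < y)


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    checks = {
        ">=": lambda c: c >= 0,
        "<=": lambda c: c <= 0,
        "==": lambda c: c == 0,
        ">": lambda c: c > 0,
        "<": lambda c: c < 0,
    }
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators first so ">=" is not read as ">".
        for op in (">=", "<=", "==", ">", "<"):
            if clause.startswith(op):
                if not checks[op](compare(version, clause[len(op):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True
```

With that hypothetical constraint, `satisfies("3.10.0", ">=3.7,<3.10")` returns `False`, while an older version such as `satisfies("3.9.7", ">=3.7,<3.10")` returns `True`, which is roughly why picking a slightly older, suggested Python version made the conflicts disappear.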

As a final step I installed Jupyter Notebook by clicking the install button in Anaconda’s GUI and tested it with a bit of example code from one of the helpful instruction sites. It worked! I now have a fully functioning data science environment waiting for me to do some awesome projects.
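A quick way to check a fresh environment before running any example code is to ask Python which packages it can find. The sketch below uses only the standard library; the package list (`pandas`, `matplotlib` and `notebook`, the import name of Jupyter Notebook) is an assumption, so swap in whatever you installed. Run it inside the environment you want to check, for example in a notebook cell.

```python
import importlib.util
import sys


def check_environment(packages):
    """Map each top-level package name to True if this environment can find it.

    find_spec only locates the package without importing it, so this is a
    cheap first check before running real example code.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_environment(["pandas", "matplotlib", "notebook"]).items():
        print(f"{name:12} {'ok' if found else 'MISSING'}")
```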

7 October 2021 | datascience, flow

This is what clubbing did in NL

Now that the UK is opening its nightclubs, I’ll show you what to expect over the next few weeks. My government ran an experiment for fifteen days. The experiment was: do whatever you like, but keep your distance, and get tested for activities where social distancing is not possible. Oh, and if you get the Janssen shot, your Corona pass is active immediately (also called the ‘Dansen met Janssen’ pass, i.e. ‘Dancing with Janssen’). Oh, and a test is valid for forty hours. That will get you going all weekend! The Delta variant? Yeah, we heard about that, but it’ll be dominant by September, not earlier.

This is what happened

Number of people who tested positive per 100,000 inhabitants between June 14 and July 19.
First white dot on the blue line beneath the graph: June 26, opening up. Second dot: July 10, nightclubs closed and a midnight curfew for all cafés.
Number of people who tested positive between June 14 and July 19 per age group
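The first chart shows positive tests per 100,000 inhabitants rather than raw counts, which is what makes regions or groups of different sizes comparable. A minimal sketch of that normalisation; the group names and numbers in the example are hypothetical and not taken from the charts above.

```python
def per_100k(positive_tests, population):
    """Scale a raw count of positive tests to a rate per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be a positive number")
    # Multiply first, then divide, to avoid small rounding errors with round inputs.
    return positive_tests * 100_000 / population


def per_100k_by_group(positives, populations):
    """Apply the same scaling to each group (e.g. age groups or regions)."""
    return {group: per_100k(positives[group], populations[group]) for group in positives}


if __name__ == "__main__":
    # Hypothetical numbers, purely for illustration (not real Dutch data).
    positives = {"group A": 500, "group B": 500}
    populations = {"group A": 1_000_000, "group B": 250_000}
    print(per_100k_by_group(positives, populations))
```

The same raw count of 500 positive tests works out to 50 per 100,000 in the larger group and 200 per 100,000 in the smaller one, so the smaller group is hit four times as hard.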

In other words: let the unvaccinated groups dance all night long, give some of them long COVID, and pass it on to their parents too (many under-25s still live at home), who are still one shot and/or two weeks away from being fully vaccinated.

And what about Delta? Yep. Taking us by storm.

Covid-19 variants measured between June 28 and July 4

But the Brits are better vaccinated than the Dutch, right? Well…not really.

Vaccination rollout in the UK, source: The Guardian July 19
Vaccination rollout NL, source: European Centre for Disease Prevention and Control July 19

And don’t forget, the Dutch are mainly vaccinated with mRNA vaccines, which are better at fighting off the Delta variant.

I can’t wait to see the UK graphs in a few weeks’ time.

19 July 2021 | datascience, flow

The roles in data

For those who are not familiar with the different roles in the field of data, this is what I’ve learned about them so far. There are data analysts, data architects, data engineers and data scientists.

A data analyst works on, no surprise, analysing data. Often these are people preparing reports (for instance on sales performance) for management. When working in Microsoft Azure, you would spend a lot of time in Power BI creating reports, performing transformations and calculations on polished data sets. Your goal is to create visualizations that anyone else in your organisation can easily understand.

To create those polished data sets for analysts to work with, you need a data infrastructure. That’s where the data architect comes in. An architect thinks through what the data needs are at one end, knows the quirks of the raw data coming in at the other end and then designs the infrastructure in between.

The data engineer then uses the input from the data architect to build the infrastructure.
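To make the architect and engineer part a bit more concrete: the infrastructure in between is often built as ETL (extract, transform, load). The sketch below is not Azure; it is plain Python with an in-memory SQLite database standing in for the data store, and the raw CSV and the `sales` table are made up for illustration. It turns messy raw rows into a clean table an analyst could report on.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent casing, stray spaces,
# and one row with a missing amount.
RAW_CSV = """order_id,region,amount
1, north ,120.50
2,North,80
3,south,
4,SOUTH,45.25
"""


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean the values and drop rows that cannot be used."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        region = row["region"].strip().title()  # " north " and "North" become "North"
        clean.append((int(row["order_id"]), region, float(amount)))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a table analysts can query."""
    conn.execute("CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    # The analyst's view: total sales per region from the polished table.
    for region, total in conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ):
        print(region, total)
```

In a real setup each step would be a pipeline activity reading from and writing to cloud storage, but the division of labour is the same: the engineer owns extract and transform, and the analyst starts from the loaded table.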

Sometimes you want to dig deeper into your data and discover more complex patterns. That’s where the data scientist comes in. These math wizards apply statistical, machine learning and AI models to data and are able to tweak these models using their mathematical knowledge.

I was rather surprised to learn that data, a relatively new field to work in, has already split into so many roles. And that’s without even mentioning all the specialisations within these roles that hard-core programmers can choose. My trainer, for instance, spent many years working solely on optimising SQL statements.

My education prepares me mainly for two roles: the analyst and the engineer. I’m definitely happy with all the skills I learned using Power BI. They will help me a lot when I start digging for stories in data sets. The data engineering part is absolutely not my cup of tea. It’s very theoretical: an in-depth crash course in database management and Azure cloud infrastructure. You could compare it to fitting the pipes and wiring in your home. It needs to be done, otherwise you can’t live there, but it’s not as exciting as decorating. At least, I get more excited about decoration and design than about fitting pipes. That said, I’m still very happy to get a solid understanding of the inner workings of databases and of the cloud tools needed to turn raw data into usable data. It helps me instruct others to click the right buttons, write the proper SQL statements and build the pipelines for me, so that I can dig for the data story gold 😉

23 June 2021 | datascience, flow

Data nerd

I am slowly morphing into a true data nerd. This week I started learning the data engineering side of data: Lambda and Delta Lake architectures, ETL, OLTP, OLAP, Apache Spark, notebooks, Parquet files, SCD Types 1, 2, 3 and 6, pipelines, serverless and dedicated SQL pools, PolyBase, and there’s more to come.

My head is exploding.

7 June 2021 | datascience, flow

Another certificate in the pocket

The past two weeks I spent most of my time studying for the MS DA-100 exam, also known as ‘Analyzing Data with Microsoft Power BI’. This morning I took the exam and passed with a very decent score of 893/1000 (although I have to admit I was a bit annoyed not to break the 900 barrier). With the training done and the exam passed, I am now skilled enough to start my own data analysis projects. I’m looking for ideas on where to apply my new skills.

20 May 2021 | datascience, flow