The daunting task of getting your Mac ready for data science

Today I did something that I have been postponing for some months now: creating an environment on my iMac to be able to do my own data analysis projects.

I learned to use Python and data science packages such as pandas and matplotlib, but only in the safe environment of DataCamp. Now that I’m contemplating running my own projects, I have to install Python 3 and the packages on my own computer. I started searching online last week, got a bit overwhelmed by all the variations in what other people said they had installed, felt too confused to continue, and closed my browser without installing anything.

This afternoon I was ready to try again. I was mentally prepared this time, so I took my time to compare the different ways of installing Python 3. I quickly discovered that I should narrow my search to using Python for data science, as otherwise I would be installing tools geared towards developers.

I first installed Python 3 manually and then read on several data science websites about Anaconda, ‘your data science toolkit’ and ‘developed for solo practitioners’. That sounds like me. I installed it, created a new environment using the latest Python version (3.10.0) and ran straight into trouble when installing some packages. Of course the error messages were very human readable (not), but they mentioned lots of version numbers and greater than, equal to, or smaller than signs. I had clearly chosen a Python version that the packages did not support yet. I trashed the environment I had created, made a new one using the auto-suggested Python version, and automagically everything I needed was in there.

As a final step I installed Jupyter Notebook by clicking the install button in Anaconda’s GUI and tested it with a bit of example code from one of the helpful instruction sites. It worked! I now have a fully functioning data science environment waiting for me to do some awesome projects.
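For anyone who wants a similar sanity check, here is a minimal sketch that uses only the Python standard library. The package list is my own assumption of what a basic data science environment should contain, and `missing_packages` is a hypothetical helper name, not something Anaconda provides.

```python
import importlib.util
import sys

# Assumed list of packages this environment should contain; adjust as needed.
REQUIRED_PACKAGES = ["pandas", "matplotlib", "notebook"]


def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Show which Python version the active environment is running.
print("Python version:", sys.version.split()[0])

missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

Running this in a notebook cell or in a terminal inside the environment shows at a glance whether anything still needs installing.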

Posted on 7 October 2021 | datascience, flow | 0 comments

Python logic

I’ve finished the first two modules on Python in DataCamp, the tool Techionista Academy uses for learning the basics. So far so good. It really helps that computer language is not completely alien to me. As a teenager I typed in tiny BASIC examples from magazines to run silly programs. At university I started building websites from scratch in HTML, and I can read and tweak PHP thanks to years of experience using WordPress.

Getting properly introduced to Python really gives me a better understanding of its logic. For instance, when to use ‘ ‘ and when not to, or the difference between an integer and a float. Sometimes, though, its logic is not that logical to me. One assignment asked me to select rows from a DataFrame (essentially a table). In this case the DataFrame was a list of dogs with some attributes like age, dog breed and dog owner.

Let’s call this DataFrame ‘dogs’. The first task was to select dogs older than two.

greater_than_2 = dogs[dogs.Age > 2]

From the DataFrame ‘dogs’, select every row where the value in column ‘Age’ is larger than 2.

The next task was to select dogs whose status was ‘still missing’.

still_missing = dogs[dogs.Status == 'Still Missing']

Easy enough when you know that ‘==’ means equal to.
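What helped me was seeing that the comparison inside the brackets produces its own result first. Below is a minimal sketch with made-up sample data (the real DataCamp dataset is not shown here, so the names and values are assumptions for illustration only).

```python
import pandas as pd

# Made-up sample data; the real assignment used a different dataset.
dogs = pd.DataFrame({
    "Name": ["Bella", "Max", "Luna", "Rocky"],
    "Age": [1, 3, 5, 2],
    "Status": ["Found", "Still Missing", "Found", "Still Missing"],
})

# The comparison alone yields one True/False value per row (a boolean mask).
mask = dogs.Age > 2
print(mask.tolist())

# Passing the mask back into dogs[...] keeps only the rows marked True.
greater_than_2 = dogs[mask]
still_missing = dogs[dogs.Status == "Still Missing"]

print(greater_than_2["Name"].tolist())
print(still_missing["Name"].tolist())
```

So the part inside the outer brackets is a column of True and False values, and the outer `dogs[...]` uses it as a filter.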

Then the last task: select dogs whose dog breed is not equal to poodle. More of the same, I thought. I had already learned that you can’t use ‘dogs.XXX’ to refer to a column whose name contains a space. This column name consists of two separate words and therefore requires string notation inside square brackets. String notation means that it requires ‘ ‘. So I wrote:

not_poodle = dogs[dogs['Dog Breed' != 'Poodle']]

Running the code gave me an error. I tried many variations, but couldn’t find the mistake I was making. I had to ask DataCamp to reveal the correct answer. It turned out that I had the second-to-last ] in the wrong place. The correct solution is:

not_poodle = dogs[dogs['Dog Breed'] != 'Poodle']

This felt totally unintuitive when I first read it. Now that I’m explaining it here, it starts to feel a bit more logical. I guess that’s what they call the learning curve. All languages have their own logic, which is sometimes at odds with the language you hear in your head.
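Here is my attempt at spelling out why the position of that bracket matters, again as a sketch with made-up sample data. I am assuming the error I saw was a KeyError, which is what this small example produces.

```python
import pandas as pd

# Made-up sample data with a column name that contains a space.
dogs = pd.DataFrame({
    "Name": ["Bella", "Max", "Luna"],
    "Dog Breed": ["Poodle", "Beagle", "Labrador"],
})

# Misplaced bracket: Python first compares two plain strings with each other.
inner = "Dog Breed" != "Poodle"
print(inner)  # True, because the two strings differ

# So dogs[inner] asks for a column called True, which does not exist.
try:
    dogs[inner]
except KeyError as error:
    print("KeyError:", error)

# Correct bracket: fetch the column first, then compare each of its values.
not_poodle = dogs[dogs["Dog Breed"] != "Poodle"]
print(not_poodle["Name"].tolist())
```

With the bracket closed right after the column name, the comparison runs on the column’s values instead of on the column’s name.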

Posted on 30 January 2021 | datascience, flow | 1 comment