Friday, June 19 is the date that I can point to as Day Zero. On that day I decided I would spend not just a few weeks, but months and years getting to know the ins and outs of Data Science. I’ve had few epiphany moments in my life, and one truly hit on that day.
So here’s a quick summary of what I did in June, through June 30. I intend to make this a regular exercise at the end of each month. It’s difficult to feel like I’m making progress from one day to the next – or even from one end of a week to the other. But over a month, and as those months build into a year, I know that I’ll be amazed at how far I’ve come. I’ll go into detail on some of these items in future posts.
- Bought Data Science from Scratch by Joel Grus. Made it through Chapter 6 – Probability.
- Installed the Anaconda Python distribution to support the Data Science from Scratch book.
- Completed through Week 5 of Coursera’s Programming for Everybody (Python)
- Finished up through Unit 9 of Codecademy’s Python Course.
- Signed up for Coursera’s Specialization Track in Data Science as provided by Johns Hopkins University. Completed the third and final week of the first course in that program, The Data Scientist’s Toolbox.
- Completed Chapters 1 & 2 of R in Action, 2nd Edition
- Began hands-on exploration of SAP HANA by creating my own Amazon Web Services instance of SAP HANA One and installing the SAP HANA Developer’s Studio on my laptop.
- Completed through Week 5 of Hasso Plattner Institute’s In-Memory Data Management course.
- Watched the first seven videos of Working with Data using SAP HANA Studio on the SAP HANA Academy YouTube channel.
- Started the second week of openSAP’s Driving Business Results with Big Data course.
- Made it halfway through Khan Academy’s first unit in their Probability and Statistics series: Independent and dependent events
- Spent a good amount of time on Twitter following good accounts and resources. I’ll definitely be detailing this in a future post.
So yeah, it’s been a bit of drinking from a firehose. And it’s tough to know exactly what the right path is, because to start with you just don’t know what you’re doing. But you just have to dive in and make adjustments as needed. I’ve already altered a few things, but more on that later.
So I’m pursuing three main tracks:
- Data science tools & methods (including R & Python)
- Mathematics (Probability & statistics)
- Data storage & retrieval – primarily SAP HANA, which my company has recently acquired.
I look above and see all that I’ve attempted to pursue simultaneously, and I’m amazed at how much of it is freely available or available at very little cost. The tools are free. The amount of free literature and tutorials on the internet on this subject is staggering. Maybe one day I will enroll in a formal graduate program, but there’s enough material out there for me to digest on my own that there’s no excuse not to start right away.
More to come…