Accomplishments – July 2015

It’s the last day of August and I’m just now posting my Data Science education accomplishments for July. That doesn’t mean I haven’t been busy – far from it!

Codecademy’s Python Course

I completed Codecademy’s Python course. I found it to be a great hands-on introduction to the subject. The lessons are short and focused, and you immediately get to apply what you learned through guided exercises. This should be the starting point for anyone who wants to learn Python.

HPI In-Memory Data Management Course

I’m a fan of SAP HANA, and Hasso Plattner’s lectures on the history of in-memory computing were made available in a 6-week self-paced course, In-Memory Data Management (2014) – Implications on Enterprise Systems. I grasped about 50% of what was covered but still found it interesting. You won’t get any useful, practical skills out of this course, but it did give a good historical and technical background on the technology upon which SAP HANA is based. Probably the most interesting part was the set of case studies in the last week, which showed how SAP HANA had a dramatic positive effect on business & research (especially Genome Analysis).

openSAP – Driving Business Results with Big Data

To further my exposure to SAP HANA, I completed the openSAP course, Driving Business Results with Big Data. It was the first time this 6-week course was offered, and students were able to get hands-on with some of the technology that SAP offers (including some recent acquisitions of predictive & learning technology). This course was a bit rough around the edges, and the technical hurdles to get access to the demo systems may have been a bit high. I thought the educational goals were lofty and may have missed the mark somewhat. For example, the final exam question “What was the first item ever sold on eBay?” was egregiously irrelevant to the course’s objectives. The hands-on exercises were basically point & click activities with very little context or explanation. Still, it was the first time through for this course, and if you are looking for a survey of what enterprise platforms such as SAP are up to in this space, you’d be hard-pressed to find a better offering.

Khan Academy – Probability and Statistics

I worked through all of Khan Academy’s Probability and Statistics lecture videos. My math was a bit rusty (although I did major in Mathematics back in the Dark Ages). I was absolutely thrilled with the pace & approach of these lectures. By themselves, they won’t make me a Statistics expert, but they did provide a good launching pad for me to explore the more advanced material found elsewhere. This series takes you right up to the foundations of chi-square analysis and ANOVA testing without going too deep. Still, I feel better now about handling some of the more advanced statistical techniques I’m encountering in Data Science books.

Coursera – The Data Scientist’s Toolbox

I completed the first course in the Coursera Data Science Specialization track, The Data Scientist’s Toolbox. I have written before about how much I love Coursera, and I’ve just recently completed my third course in this track. The Toolbox class was very light on material and wasn’t very challenging. However, it was absolutely necessary to ensure that you have R and RStudio installed and that you are set up with git and GitHub and can interact with them. I’d had no exposure to any of these tools and platforms before, but they are certainly used in every course after this. This particular class allowed you to dip your toe into the water before you dive into the pool.

The Data Science Handbook

I read every word of The Data Science Handbook: Advice and Insights from 25 Amazing Data Scientists. This was an absolutely riveting collection of first-hand accounts of what real data scientists are doing on a daily basis, how they got to where they are, and their advice for those who are entering the field. I followed every one of them on Twitter and have been fortunate to interact with a few of them. This is a mandatory read for anyone heading down the same path as I am.

I’ll be posting my August summary in a few days and promise to keep up on this with a bit more frequency. There’s a lot on my mind about Data Science – especially a tiny little flame war I’ve gotten into on a Coursera Message Board. More on that later.