How I Take Online Courses for Data Science (Part 2 – Self-quizzing)

The Struggle

In my last blog post, I shared how I take notes while working through a MOOC for Data Science. I proceeded happily in this fashion for months, ringing each bell as I completed course after course in various specializations.

Around February of this year, however, a sinking feeling started to settle in: I just wasn’t retaining much of the information I was learning. Sure, I was scoring 100s on all the quizzes and completing the assignments on time without any issues, but I felt uneasy.

A month after I worked on the code for a gradient descent algorithm for a lab assignment, you think I had any clue what gradient descent was?

Two months after I learned how to create a support vector machine model in Python, you think I recalled what library to import to even start?

Three months after I learned to separate data into training & test groups in R, you think I could remember a single command to do so?

NO – I found myself constantly having to go back to older notes for the most basic commands. I was spending all of my time on Stack Overflow looking for solutions to the most basic questions (like how to reverse a Python list or generate 10 random integers in R). If I were going to work seriously in the Data Science realm, I knew I needed a solid, fundamental level of proficiency with the tools and techniques I expected to use.
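For a sense of how small these questions are, here is one way to answer both in Python. The R random-integer question is translated to Python's standard `random` module, and the 1 to 100 range is an arbitrary assumption.

```python
import random

# Reversing a list: slicing returns a new, reversed copy,
# while list.reverse() reverses in place and returns None.
nums = [1, 2, 3, 4, 5]
reversed_copy = nums[::-1]
nums.reverse()

# Ten random integers; the 1-100 range is an arbitrary choice.
# random.randint includes both endpoints.
random_ints = [random.randint(1, 100) for _ in range(10)]

print(reversed_copy)  # [5, 4, 3, 2, 1]
print(nums)           # [5, 4, 3, 2, 1]
print(random_ints)
```

Knowing the difference between the copying and the in-place version is exactly the kind of detail that fades without practice.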

Enter Mnemosyne

Quite a while ago, I had gotten it into my mind to learn Japanese. My sole motivation: in my academic career, the only thing I absolutely sucked at was foreign languages, and it wasn’t for lack of effort. In my 30s, I wanted to wipe that blemish off my record by tackling one of the hardest languages for native English speakers to pursue.

I ran through all 3 levels of Rosetta Stone. I listened to every minute of Pimsleur’s entire Japanese collection. I had more books & videos than I knew what to do with. The most valuable tool I used, however, was the open-source flashcard system, Mnemosyne.

I confess, I didn’t try several different flashcard programs before settling on this one as the best. But what I did want was a tool to help me identify the concepts I was struggling with and beat me over the head with them until they became second nature.

From their website:

Mnemosyne uses a sophisticated algorithm to schedule the best time for a card to come up for review. Difficult cards that you tend to forget quickly will be scheduled more often, while Mnemosyne won’t waste your time on things you remember well.

A quick demo

Here is what a typical card looks like. There’s a question at the top and I work out the solution over in my RStudio session (or IPython session).

[Screenshot: a typical Mnemosyne card]

At the top of the card, in the title bar, is the topic (in this case, R). I’ve also created cards for Python, Spark, Statistics and general data science.

At the bottom, you can see how many cards I have scheduled for today for this topic (11). Mnemosyne takes care of all of the card scheduling – I just follow its lead. Usually I get between 10 and 20 cards a day per topic.

Those 11 scheduled cards are ones that I’ve already seen before. Mnemosyne felt the time was right to bring them up again.

Also at the bottom are the number of cards that I haven’t seen at all (2) and the total number of cards for the topic in the deck (169). The 2 cards I haven’t seen yet are ones I created just yesterday, and I’ll see those once I’ve finished my 11 scheduled cards.

Once I’ve worked out the solution (or have the answer in my head), I click Show answer.

[Screenshot: the same card with its answer shown]

I then have to grade myself. Beyond just right or wrong, I have to indicate the amount of struggle I had.

  • 0 means I had no clue; it was completely forgotten and I need to see it again in a few minutes.
  • 2 means that I got the answer for the most part, but I struggled with it or doubt I’ll remember it, so please show it to me again tomorrow.
  • 5 means that this question has become elementary for me and I had better not see it again for a while.

Over time, Mnemosyne keeps a history of all of your cards & scores. If you give the same card a 5 several times in succession, it gets pushed out by several days, then weeks, then months. But if you struggle with a question that you had previously mastered, no problem – it goes back into more frequent rotation.
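Mnemosyne’s exact scheduling rules aren’t spelled out here, so as a stand-in, here is a minimal sketch of the classic SuperMemo SM-2 algorithm, which works on the same 0–5 grading scale. Treat it as an illustration of the idea (good grades stretch the interval multiplicatively, a grade below 3 resets it to one day), not as Mnemosyne’s actual implementation. The `Card` and `review` names are hypothetical, and the sketch schedules in whole days, so it does not model a same-session repeat after a 0.

```python
from dataclasses import dataclass


@dataclass
class Card:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    easiness: float = 2.5   # SM-2 "E-Factor"; new cards start at 2.5


def review(card: Card, grade: int) -> Card:
    """Return the card's new schedule after a self-graded review (0-5), SM-2 style."""
    if not 0 <= grade <= 5:
        raise ValueError("grade must be between 0 and 5")
    if grade >= 3:
        # Success: 1 day, then 6 days, then grow by the easiness factor.
        if card.repetitions == 0:
            interval = 1
        elif card.repetitions == 1:
            interval = 6
        else:
            interval = round(card.interval_days * card.easiness)
        repetitions = card.repetitions + 1
    else:
        # Failure: start the repetition sequence over with a one-day interval.
        interval = 1
        repetitions = 0
    # Low grades shrink the easiness factor; it never drops below 1.3.
    q = 5 - grade
    easiness = max(1.3, card.easiness + (0.1 - q * (0.08 + q * 0.02)))
    return Card(repetitions, interval, easiness)


card = Card()
intervals = []
for grade in [5, 5, 5, 5]:
    card = review(card, grade)
    intervals.append(card.interval_days)
print(intervals)  # [1, 6, 16, 45]

# A lapse: grading the same card 1 sends it back to a one-day interval.
card = review(card, 1)
print(card.interval_days, card.repetitions)  # 1 0
```

Four straight 5s push the card from 1 day to 6, 16 and then 45 days, which matches the "days, then weeks, then months" pattern described above; a single bad grade puts it back into daily rotation.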

Creating a card

A good portion of the effort (but well worth it) is in the creation of the cards.

[Screenshot: creating a new card]

The questions and answers can be plain text, but Mnemosyne also accepts some rudimentary HTML (like the <code> and <pre> tags). You can also link to images, which get embedded in the card.
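If you want to draft cards in bulk, a small script can handle the HTML wrapping. This is a minimal sketch: it escapes code with Python’s `html.escape`, wraps it in the `<pre>` and `<code>` tags mentioned above, and emits one question<TAB>answer line per card. Whether your Mnemosyne version imports tab-separated text, and in exactly what layout, is an assumption to check against its import dialog; `code_answer` and the sample cards are hypothetical.

```python
import html


def code_answer(snippet: str) -> str:
    """Wrap a one-line code snippet in <pre><code> for a card answer.

    html.escape converts <, > and & to entities, so R code such as
    "x <- 1:10" is shown literally instead of being read as an HTML tag.
    """
    return "<pre><code>" + html.escape(snippet) + "</code></pre>"


# Hypothetical cards: (question, answer) pairs.
cards = [
    ("In R, create a vector of the integers 1 to 10.", code_answer("x <- 1:10")),
    ("In Python, get a reversed copy of the list nums.", code_answer("nums[::-1]")),
]

# One card per line, question and answer separated by a tab.
# Assumes single-line snippets: an embedded newline or tab would break the format.
lines = ["\t".join(card) for card in cards]
for line in lines:
    print(line)
```

Escaping matters more than it looks: R’s `<-` assignment operator is otherwise close enough to a tag opener to confuse an HTML renderer.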

With a plug-in, you can also enter mathematical equations via LaTeX, but it’s not trivial to get that in place. It took a good weekend of being pissed off to get it to finally work.


Types of questions

As I read a textbook (e.g., Learning Python) or review notes from a previous course (like the self-paced R swirl courses), I look for self-contained opportunities to create a Mnemosyne card. They can’t be as intricate as a Kaggle competition; each one needs to address a specific topic or technique and be doable within a few minutes at most.

Here are some sample questions I have set up (in order of increasing complexity):

[Screenshots: three sample cards, in order of increasing complexity]
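The sample cards themselves were screenshots. As a purely hypothetical example of a card at that "few minutes" scale, here is the training/test question from earlier answered in plain Python. It is a hand-rolled sketch: `split_train_test`, the 20% test fraction and the seed are all illustrative choices. (In practice, scikit-learn’s `sklearn.model_selection.train_test_split` does this in one call.)

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded RNG makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(range(100))
print(len(train), len(test))  # 80 20
```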

Effectiveness

I started my first cards in May while waiting between sessions at the ODSC East conference in Boston. As of today, I have nearly 400 cards created (and expect to reach well over a thousand by the end of the year).

I spend about 30 minutes a day with scheduled R cards and another 30 minutes a day with Python. I also spend another 30 – 45 minutes a day contributing new cards (while reading a book or other notes).

The time spent in Mnemosyne has prevented me from taking a course or two that I would normally have signed up for. So is it worth it?

My confidence in my skillset has never been higher. Previously, I would look back on the 8 courses I had taken in the Johns Hopkins – Coursera Data Science Specialization and really wonder if I had learned anything at all. Since I started creating these cards and challenging myself with them daily, I can really feel the difference. I’m now able to whip up ggplots in R, read & process data files in Python with ease, and handle many other tasks.

Use it or lose it

Yes, that’s an extremely clichéd saying. But that doesn’t make it any less true.

I would often see people advise new students to grab datasets they’re interested in and begin playing with them. Others recommend tackling a real-life data problem like those found on Kaggle. And I truly believe that’s good advice. But there’s an in-between stage that I believe is critical as well:

  • Take courses, learn material
  • Become proficient with basic techniques
  • Work on real datasets, attempt Kaggle challenges

There’s no worse feeling than accepting a data challenge, sitting down to a blank R screen, and watching it remain blank as you struggle to recall the basic commands. Is it ReadCSV() or read_csv() or please_read_this_damn_csv_file_already()? How do you use dplyr again? Why am I constantly getting this Python syntax error when all I want to do is separate out the text line into individual words?
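Once a card has drilled them in, the Python answers to two of those questions are short. Here is a minimal sketch with made-up sample data, using only the standard library (`pandas.read_csv` is the usual choice for real analysis work).

```python
import csv
import io

# Splitting a line of text into words: str.split() with no argument
# splits on any run of whitespace and drops empty strings.
line = "  the quick   brown fox "
words = line.split()
print(words)  # ['the', 'quick', 'brown', 'fox']

# Reading CSV data with the standard library; io.StringIO stands in for
# a real file opened with open("data.csv", newline="").
data = io.StringIO("name,score\nana,90\nben,85\n")
rows = list(csv.DictReader(data))
print(rows[0]["name"], rows[0]["score"])  # ana 90
```

Note that `csv.DictReader` returns every value as a string, so `score` still needs an `int()` before any arithmetic.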

I can tell you now with the passion of a revivalist preacher – Mnemosyne works. I am stunned by what I am able to recall now, to the point where the syntax and basic techniques are starting to form muscle memory (in the brain). Sure, there will always be online searching and RTFMs ahead. But the more I can do as a reflexive action, the more time I can spend on the actual analysis. And that’s where the fun is, is it not?


What you need to bring to this, however, is patience, and lots of it. You have to be OK with getting things wrong and requiring many, many attempts at a given card before it feels like it’s sticking. When I get a question wrong, I am grateful. Grateful that Mnemosyne found a hole in my knowledge base. Grateful that I discovered this deficiency now rather than at a job interview. Grateful that Mnemosyne is not judgmental and will always give me additional chances to get it right.

I have had days where I got 9 out of 10 questions wrong. That’s fine. I’ll get them next time. And as the successes keep piling up, my confidence that I will indeed retain any given topic increases as well.

Do you have any studying techniques and tips to share? Any questions or ideas for improvement on the above? Please let me know in the comments below.
