My Journey Has Come to an End – a Postscript

It has been a couple of months since my last update, and this post will bring the “diary” of my data science self-education endeavor to a close.

For a number of reasons (professional and personal), I moved on from the company where I had worked for the past 19 years. It was a successful ride, and I certainly hope I gave as much as I received in my time there. During my final year there as CIO, I had become enamored with the field of Data Science and began documenting my exploration of it. My last day was April 1, and I used my newfound free time both to devote more hours to my studies and to relocate from New York to South Carolina.

My plan was to find employment in SC in the realm of Data Science (or at least in the Data Analytics field, with a path to transition into Data Science). I took a short sabbatical over the summer and began my search in earnest after Labor Day. I created a resume (I hadn’t needed one in nearly 19 years) that I believe highlighted my education (an undergraduate degree in Math & Physics and a graduate degree in Physics), my real-world business experience in the Supply Chain industry, and my Data Science certifications (as thoroughly documented in this blog).

As with most elements in my life, things did not go according to plan (just ask my two ex-wives!). Here are some of the challenges I encountered.

  • South Carolina is not a hotbed of Data Science activity. Job board searches in South Carolina for “Data Scientist” return fewer than a handful of hits. Travel north to North Carolina or west to Atlanta, and you’ll find a land of opportunity. There were a few openings in Charleston and Columbia, but sadly, those cities are hours away from where I live.
  • You need to live in or near a big city. When I attended ODSC East in Boston back in May, I observed at the career fair that recruitment of new Data Scientists was concentrated near large cities. If you have the passion for DS and the flexibility to move, head to Washington DC, NYC, Boston, Chicago, Seattle, San Francisco, etc. Unless you have a well-established track record in the field, telecommuting is not likely to be an option. For me, it was too much to ask my wife to leave her career of 20 years, uproot the family, leave the town of her birth and her friends & relatives, and gamble with me on a new part of the country.
  • My career history may have been too intimidating for employers looking to fill lower-level positions. When I realized that pure data scientist positions were not to be found near me, I began to seek any role in a field related to data analytics. I knew that, realistically, such a role would pay half of what I had been making before, or even less. Fortunately, I was in a position where salary mattered less to me than entering a field where I’d be intellectually challenged every day, and I was confident that my salary would improve commensurately as I gained experience. However, despite my explaining my situation & goals in a cover letter, a potential employer may have balked at seeing a former CIO’s resume submitted for a position far down the organizational chart. While climbing the ladder is acceptable, stepping down & starting over causes some consternation.
  • Lack of experience and education in Data Science. For all the courses I completed, books I read, and certificates I received, it still doesn’t add up to a Master’s degree in Data Science, let alone a PhD. I had hoped that my real-world experience would compensate for this, but that turned out not to be the case. Unfortunately, Master’s degrees are not cheap (I know, I already have a couple), and as I’m already funding two of my children through college, I just didn’t have it in me to fund a third (me). I’m convinced that the education I’ve received is equivalent to any MS program in the subject, but I can certainly see that, from an employer’s point of view, a formal degree is a stronger guarantee of the skill set advertised on the resume.

    If I had to do it over, I would’ve gotten involved in publishing data projects much earlier. I had wanted to learn as many techniques as possible before tackling publicly available datasets. Instead, I should have been tackling them and publishing my work (e.g., on GitHub) back when all I knew was linear regression. As a result, I didn’t have an extensive library of original work, and by the time I felt comfortable publishing, it was too late to make a difference in my job search. I also hadn’t built up a network of collaborators and peers in the subject area, and striking out on my own proved fruitless.

A Happy Postscript

One thing that I’ve learned in my 40+ years of existence is this: Just because you didn’t end up where you planned, it doesn’t mean you’re not where you’re meant to be.

I am actually thrilled to announce that I’ve begun the next phase of my career with a fantastic company. As of last week, I have been hired as a Product Manager with NCR. I’ll be in charge of setting the vision and strategy for, and driving the execution of, their ERP product for the Distribution & Wholesale space. I remain in the Supply Chain industry where I’ve spent the past two decades, and as my new boss tells it, I get to join the “dark side” of being a technology solutions provider instead of a consumer.

It’s a great opportunity that came to me unlooked for and is filled with potential to work with some great technology minds worldwide to create solutions to real-world problems.

Have I completely abandoned Data Science? Heck no. I believe that one of the reasons NCR hired me was my knowledge of the subject. I personally plan to introduce pattern recognition, predictive analytics and decision making into the software platform I’m now responsible for. Although I won’t be able to spend 8 hours a day playing in R, Python and Spark, I will keep tabs on the field by reading other blogs and books, and maybe tinker around in R from time to time. But first, I still need to figure out where the bathrooms are in this massive organization.

I’m glad to have heard from many of you who found my personal journal to be of help. I wish all of you well on your journey – regardless of where it takes you!


– Bill Kimler, 12/31/2016

Accomplishments – Sept 2016

As a reminder, a one-page summary of all the courses, books & videos I’ve reviewed in the past year can be found on my Journey Roadmap page.

South Carolina is hot. We went to a “fall festival” in late September where the autumn temperature peaked at 100 degrees with nary a breeze to be found. For a few moments, I questioned my decision to relocate south, but I’m sure the payoff will come in the months ahead.

I hit a big milestone this month: my sabbatical from employment came to an end, and I began my job search in earnest. Although I’ve submitted my resume for a number of job postings, I’ve not had any bites yet, not even an interview. I’ve observed that Data Science job openings are abundant around big cities (NYC, Boston, Washington DC, Chicago, San Francisco). I’m not seeing a ton of opportunity here in the Greenville area of South Carolina, but I’m looking!

So here’s how I passed the time in September:

edX – UC Berkeley – CS110x Big Data Analysis with Apache Spark

I’ve been pursuing three main areas of study:

  • R
  • Python
  • Apache Spark

I learned early on that Data Science and Big Data are very different fields, although mass-media articles often use the terms interchangeably. Big Data is the technical feat of handling (storing and retrieving) massive amounts of data, more than any single computer or disk can handle. Data Science is the set of techniques for deriving useful and actionable information from data of any size, not just large amounts.

Apache Spark is a very interesting bridge between the two. Like Hadoop, it is a framework for processing data spread across a cluster of many machines (Big Data), and it comes with a rich library of algorithms (Data Science) optimized to run in parallel across those machines and combine the results. You can work in Apache Spark using Java, Scala (a language that runs on the Java Virtual Machine), Python (PySpark) or R (SparkR).
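To make the “run in parallel, then combine the results” idea concrete, here is a minimal single-machine sketch in plain Python. It illustrates the pattern only and is not Spark’s API: the partitions are ordinary lists standing in for machines, they are processed one after another rather than simultaneously, and the function names are hypothetical. In PySpark, a similar computation could be expressed with operations such as `parallelize` and `aggregate`, with Spark handling the distribution across the cluster.

```python
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into num_partitions lists (stand-ins for machines)."""
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def partial_sum_and_count(partition):
    """Work one 'machine' does independently: a local (sum, count) pair."""
    return (sum(partition), len(partition))


def combine(a, b):
    """Merge two partial results. Addition is associative, so merge order doesn't matter."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(records, num_partitions=4):
    """Mean computed as: split, compute per partition, then combine partial results."""
    # Here the partitions are processed sequentially; a cluster would run them at once.
    partials = [partial_sum_and_count(p)
                for p in split_into_partitions(records, num_partitions)]
    total, count = reduce(combine, partials, (0, 0))
    return total / count


if __name__ == "__main__":
    print(distributed_mean(list(range(1, 101))))  # 50.5
```

Carrying (sum, count) pairs rather than per-partition averages is the key design choice: averaging the partition means would give the wrong answer whenever partitions hold different numbers of records, whereas sums and counts combine correctly in any order.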

This second course offered by UC Berkeley on the edX platform dives into Machine Learning algorithms using Apache Spark. As in the first course (CS105x Introduction to Apache Spark), the lecture videos were brief and not very in-depth. However, the meat & potatoes were the interactive labs, where 90% of the learning occurs. These labs were extraordinary. They provided links to the Spark API documentation where appropriate, walked you through simple examples before sending you off to code on your own, and provided many checkpoints to make sure you were doing things correctly.


Accomplishments – Aug 2016


It’s been a summer of incredible transition for me as I’ve made a permanent move from the relatively chilly climate of New York (old house shown to the right) to the equatorial heat misery of South Carolina. I can only hope this investment pays off in the winter, when I’m enjoying a balmy 50-degree day while the Northeast shovels out of a blizzard.

I’ve not posted an “Accomplishments” blog post since May, but that certainly doesn’t mean I stopped pursuing Data Science over the summer. Far from it! Although I didn’t complete any new courses or books in June and July, when I wasn’t busy packing up or tossing out my life’s possessions, I took the time to revisit many of the topics I’d covered in the past year. I created hundreds of Mnemosyne flashcards to sharpen my skill set. I retook the University of Washington Machine Learning: Regression course, going over all code examples in painstaking detail. I also re-read every word of “An Introduction to Statistical Learning with Applications in R”, working through all of the R labs and exercises and incorporating sample code into my Mnemosyne card set. It was an absolutely necessary activity, and I feel much stronger as a result. Consider revisiting some courses you’ve taken: you may be surprised how much more you get out of them on a second pass.

In August, however, with the move complete, a number of endeavors came to a successful close.

Completed Items

Coursera – Machine Learning: Clustering and Retrieval

This is the fourth course in the University of Washington Machine Learning Specialization on Coursera. Grouping and association were the themes here. Diving into large datasets of Wikipedia articles, we found commonalities among groups of articles, implemented various measures of “alikeness”, assigned articles to topics based on word groupings and made predictions on new articles based on models built from large training sets.
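The course’s own assignments aren’t reproduced here, but one common measure of “alikeness” between documents can be sketched in a few lines. This minimal Python example assumes cosine similarity over simple word-count vectors and uses it for nearest-neighbor retrieval; the function names and the two-document corpus are hypothetical, for illustration only. Weighting words by TF-IDF, which down-weights common words such as “the”, would be a natural refinement.

```python
import math
from collections import Counter


def word_counts(text):
    """Bag-of-words vector: lowercase the text and count whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (1.0 = same direction)."""
    # Counter returns 0 for missing words, so only shared words add to the dot product.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor(query, corpus):
    """Return (title, similarity) for the corpus document most similar to the query."""
    q = word_counts(query)
    scored = [(title, cosine_similarity(q, word_counts(text)))
              for title, text in corpus.items()]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    corpus = {
        "Spark": "apache spark runs jobs on a cluster of machines",
        "Baseball": "the pitcher threw a fastball to the batter",
    }
    print(nearest_neighbor("spark cluster jobs", corpus))  # the "Spark" entry wins
```

Cosine similarity ignores document length, so a long article and a short one about the same topic can still score as close neighbors, which is one reason it is a popular choice for comparing text.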
