October | 2015 | dreaming of data

setwd("C:/Users/Bill Kimler/Documents/Orders") files <- list.files("./OrderHistory") customerNumbers <- vector() for (i in 1:length(files)){ openFile <- file(paste0("./OrderHistory/",files[i])) customer <- substring(readLines(openFile, n = 5)[4], 38, 47) close(openFile) customerNumbers <- append(customerNumbers, customer) }

setwd("C:/Users/Bill Kimler/Documents/Orders") files <- list.files("./OrderHistory") extractCustomers <- fuction (filename){ openFile <- file(paste0("./OrderHistory/",files[i])) customer <- substring(readLines(openFile, n = 5)[4], 38, 47) close(openFile) return(customer) }

August was a bit “light” for summer vacation reasons, but I attacked September with renewed vigor! I look back to when I started this journey in June, and I marvel at how much I’ve learned since thing. And I’ve also been humbled at how much more there is to learn (years’ worth!). However, in the spirit of documenting what’s been accomplished, here’s the record for September 2015:

Coursera – Exploratory Data Analysis

This is the fourth course in the Coursera Data Science Specialization track and by far, the most enlightening one so far. The initial exploration of a data set is vital to future insights. In addition to learning about the various R graphics packages (especially ggplot) an introduction to clustering (i.e. k-clusters) provided a taste of things to come.

Data Science requires hard work. It’s not a simple set of skills that can easily be achieved by a set of short tutorials. The last part of this course on clustering gives a taste of that fact. You will need an understanding of some higher level mathematics (statistics and linear algebra) in order to make use of the tools and algorithms that have been developed. No pain, no gain, folks. If you’re not willing to sweat for your eigenvalues, then get out of the gym!

Coursera – Machine Learning

S octave peaking of mental stretches, this 11-week course on Machine Learning was one of the most challenging I’ve encountered since graduate school It was heavy on the mathematics (especially Linear Algebra) and introduced me to a new mathematics software package called Octave (an open source rendition of MatLab).

This has been the most challenging course I’ve taken so far. The professor recorded 110 lecture videos on a wide variety of topics from basic linear algebra to pattern recognition in photographs to determine numeric digits. You will not be an expert in the subject after this course (after all, are you an expert in Physics after a single class?). But the notes I took and the challenging quizzes and lab exercises this course demanded will provide a wealth of material that I will be referring to for years to come!

Make no mistake – this is an advanced course. But this is where modern Data Science is at. You need to know this material if you’re serious about the subject.

The two items above are what were completed in September. But for the record, I pursued a number of other tracks throughout the month. I’ll write more about each in the month I completed them, but briefly, this month also contained daily work in:

Doing Data Science – What a great book written by two women who developed a course on Data Science at Columbia University! This book feels like a friend or coworker who pulls me aside and says, “Here’s what Data Science is really about.”

Data Smart – I’m working through this book for a second time. I’ve read the book already, but now I’m going back to the beginning and working through every Excel example, taking detailed notes on every step. Chapter 2 on K-means clustering makes so much more sense now, especially tied in the with Exploratory Data Analysis course as described above.

SAP HANA Administration and Getting Started with SAP Lumira – I’ve read more than halfway through both of these books in September. HANA as an in-memory, lightening quick database and Lumira as one of the coolest interactive reporting platforms I’ve ever gotten my hands on.

I gave a demonstration this morning to my company’s CEO of a HANA dataset consisting of about a million records that were sliced & diced in no time at all using Lumira’s beautiful & intuitive graph development platform. I’ve been working with data & reporting for 20 years now, and I still am smiling from this leap in technology.

Coursera – Data Analysis and Statistical Inference – Finally, I began a new course that reinforced the fundamentals of probability and statistics. So far, basic statistics (mean, variance, normal distribution, Bernoulli distribution) as well as Baye’s Theorem and hypothesis testing were covered in detail in the first three weeks. This foundation of statistics is essential to further progress in Data Science. And so far, this has been the highest quality of any Coursera course I’ve worked through.

	Brian Devine on My Journey has Come to an End…
	dreamingofdata on May Interlude – Returnin…
	Gayatri Kalele on May Interlude – Returnin…
	My Journey has Come… on May Interlude – Returnin…
	Istvan on Accomplishments – Sept…

dreaming of data

a data science journey

Month: October 2015

Simple file scraper in R

Accomplishments – September 2015

Coursera – Exploratory Data Analysis

Coursera – Machine Learning