I returned recently from ODSC – East, a 3-day Data Science Conference held in Boston. I am no stranger to tech conferences, having attended numerous SAP, SalesForce.com and other major events across the country. But this was the first ODSC I’ve attended (not surprising since I only found my calling for data science last summer). And it may very well be my last.
Everyone’s experience was certainly unique. No doubt, there were many who felt this was the best 3-day experience of their lives. I also know personally of a few who were so frustrated by the event, they bailed out on Sunday.
So what I write below are my observations and experiences. If you are considering going to one in the future, please do seek out other opinions. But for the $800 I shelled out for this weekend (registration, hotel, travel, food, parking), I could have gotten 10 Coursera courses or 15 books instead, and benefitted far more than I did from this conference.
My goals were simple:
- Learn data science (technology, methodology, state of the industry, etc)
- Explore career opportunities
For me, this conference fell short on both counts. More below:
The Friday training workshops
Unfortunately, the first session I picked, “Business and Science Domain Applications of Big Data Analytics Using R,” was a bad pick. It was a very disjointed presentation, bouncing between reading research papers verbatim off a big screen and imagining visualizations on an empty wall. It was a stream-of-consciousness tour of a number of machine learning techniques (like nearest neighbors and principal component analysis), but none of it was clear to someone who came to learn. There was no mention of “big data” (the iris dataset does not count as big data), nor was there much R to be seen, although there was a surprise Python code review randomly thrown in the middle.
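For anyone who, like me, came to actually learn what a technique such as nearest neighbors does, the core idea fits in a dozen lines. Here is a toy sketch of my own in plain Python, with made-up flower measurements (this is my illustration, not the workshop's material):

```python
# Minimal sketch of k-nearest neighbors: classify a point by majority
# vote among its k closest labeled neighbors. Toy data, not the workshop's.
from collections import Counter
from math import dist

# Tiny made-up dataset: (petal length, petal width) -> species label
training = [
    ((1.4, 0.2), "setosa"), ((1.3, 0.2), "setosa"), ((1.5, 0.3), "setosa"),
    ((4.7, 1.4), "versicolor"), ((4.5, 1.5), "versicolor"), ((4.9, 1.5), "versicolor"),
]

def knn_predict(point, training, k=3):
    # Sort training examples by distance to the query point, vote among the top k
    neighbors = sorted(training, key=lambda ex: dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict((1.6, 0.4), training))  # small petals: classified "setosa"
```

That's the whole algorithm; a clear presenter could have walked through it in five minutes.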
The afternoon session “Hadoop Hands On” was a little better. I knew only cursorily what Hadoop was (that its mascot is a yellow puffy elephant was the extent of my knowledge) and hoped this workshop would fill the void. David Whitehouse from Cloudera did a great job explaining the role Hadoop plays in the overall Big Data stack, and how other platforms like Spark and Kafka fit in. He even got under the hood to explain how data was stored, and gave some working demonstrations of scripts to pull various data together.
Disappointingly, there was very little “hands-on” to be seen. He did bring 20 USB drives containing a 5 GB Cloudera virtual instance, which were passed around the 200 or so members of the audience to copy to their local machines. But aside from that, it was pretty much a 4-hour lecture, which was certainly engaging for the first 3 hours. I really do wish this workshop had been rethought to allow for more hands-on activity.
There were many other workshops I could have attended, including:
- Intro to Data Visualization
- Using Open Data with APIs
- Intro to Python for Data Science
- Adaptive Deep Learning for Vision and Language
- Interactive Data Visualizations in R with Shiny and ggplot2
- …quite a few others
From what I saw on the Twitter feed, there were mixed reviews on these as well. Some were praised, others were panned.
The Saturday morning keynotes
Saturday morning started with a packed house of maybe 600+ data scientists to hear the following four “celebrities” speak:
- Ingo Mierswa, RapidMiner
- Lukas Biewald, CrowdFlower
- the famous Kirk Borne, Booz Allen Hamilton
- Stefan Karpinski, Julia Computing
All four were engaging. Of course, Kirk Borne knocked it out of the park. I was most pleasantly surprised by Stefan Karpinski, who spoke about the Julia programming language. Unless he was BS’ing up there, it’s a programming language especially suited for speedy data analytics and definitely worth exploring.
My recommendation for the organizers: In future sessions, please do go out of your way to provide a panel of speakers that’s just a bit more diverse than 4 white guys. We don’t have to look far. In my non-scientific scan of the attendees, they came in all shapes, sizes and colors. We could certainly stand to see something a little more representative in the keynote speakers.
The Saturday & Sunday sessions
The “meat” of the conference was the nearly 100 sessions to choose from over the weekend. A couple of quick observations and then some highlights.
Organizers were not prepared for the size of the crowd. You would have thought every data scientist in the world was in attendance. The section of the convention center set aside for this conference was too small for the volume of people. Starting with the registration process Friday morning (long, slow lines you’d expect from a new Disney World ride), it was clear there was a logistics problem. There were coffee and bagels provided, but as one person observed, “Nothing sadder than seeing data scientists in front of an empty coffee urn.”
These sessions were first come, first served. I think pre-registration would have been very welcome here. Quite a few of the sessions were packed, with every seat taken and the floor filled with sitting participants trying to make do. In one of the sessions, an organizer had to kick about 20 attendees out because the room was over capacity. Once it became clear that sessions could fill up so quickly, the attendees jostled to get into rooms with the fervor of Black Friday shoppers at the mall.
Scheduling was erratic and inconsistent. There were three sources of data for the schedule: The web site, a printed paper handed out upon registration and a mobile app. It was observed that sometimes the three didn’t agree. I can certainly appreciate that last-minute changes do occur – but when something fundamental like lunch changes, you can see there were opportunities to be had in the scheduling department.
The mobile app was awful for trying to find a session of interest. The conference had several major tracks: Disruptive Data, Big Data, Data Visualization, and Data Science for Good. You were free to pick any sessions from among any of the tracks – but the mobile app would only show you one track at a time and one room at a time. Trying to determine what’s playing at 12:30pm? You have to navigate each track separately and click on each room to see what’s playing. It could take you five minutes to figure out where to go. One of the attendees was generous enough to create and share a GoogleDoc with a simple grid-view of all the sessions. Sometimes, simple is better.
Finally, they need to build in buffer time between sessions (5-10 minutes) to allow travel from room to room, and to let rooms clear out before attendees arrive for the next session. It got pretty ugly when 150 people tried to get into a room that was still packed from the previous session that had just let out.
Workshops were often just lectures. Don’t advertise something as a workshop if it’s just a speaker talking for an hour. You’re supposed to work at a workshop. If you don’t intend to make it a hands-on experience, fine. There’s certainly plenty to be learned from lectures as well. Just don’t call it a workshop.
Sessions of note
There were a couple of sessions that I do want to give praise to.
- Dhiana Deva (Spotify) gave a very energetic, down-to-earth tour of various machine learning techniques. Titled “Machine Learning for Everyone”, she certainly delivered on her promise, making the most of the 30 minutes she had, and provided a great service to brand-new data scientists.
- Allen Downey (Olin College) led a great hands-on workshop on “Bayesian Statistics Made Simple”. A full appreciation of Bayesian statistics had eluded me, but with a combination of Google Sheets and iPython notebooks, I now have a much deeper understanding of the subject. Next month, I will be reading his freely available book, “Think Bayes”.
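The core move the workshop teaches, updating beliefs with Bayes' theorem, can be sketched in a few lines of plain Python. A hypothetical dice example of my own, in the spirit of "Think Bayes" but not Downey's actual code:

```python
# Bayesian update sketch: which of two dice produced a roll of 3?
# My own hypothetical example, not material from Downey's workshop.

# Hypotheses: a 6-sided die or a 12-sided die, equally likely a priori
priors = {"d6": 0.5, "d12": 0.5}

# Likelihood of rolling a 3 under each hypothesis
likelihoods = {"d6": 1 / 6, "d12": 1 / 12}

# Bayes' theorem: posterior is proportional to prior * likelihood, then normalize
unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalized.values())
posteriors = {h: p / total for h, p in unnormalized.items()}

print(posteriors)  # d6 posterior = 2/3, d12 = 1/3: the 6-sided die is twice as likely
```

The whole of Bayesian inference is repeated application of that update, which is exactly what the spreadsheet and notebook exercises drove home.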
- Jared Lander (Lander Analytics) led my favorite workshop: “ggplot2: from scratch to compelling graphs”. My fingers are still sore from keeping up. The best quote from the conference came from him as he took ggplot2 requests from the audience: “It’s going to be disgusting, but let’s do it. Oh God, it became a big pile of vomit!”
- Ted Kwartler (Liberty Mutual Insurance) had a very interesting session, “Introduction to Text Mining in R”. With Ted, you felt like you had sidled up to an experienced coworker who told you to toss away the textbook and show you how it’s really done. He was generous enough to share his R code, cultivated over years of battle-tested experience.
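The bag-of-words basics that an intro text-mining session covers (tokenize, drop stopwords, count term frequencies) can be sketched even outside R. A minimal Python illustration of mine; Kwartler's actual code was in R:

```python
# Minimal bag-of-words sketch: tokenize, drop stopwords, count term frequencies.
# My own Python illustration of the basics -- Kwartler's session used R.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "were"}

def term_frequencies(text):
    tokens = re.findall(r"[a-z']+", text.lower())  # crude lowercase tokenizer
    return Counter(t for t in tokens if t not in STOPWORDS)

docs = "The conference rooms were packed. Packed rooms and long lines."
print(term_frequencies(docs).most_common(3))
```

Real-world text mining adds stemming, document-term matrices and much more, but every pipeline starts with a counting step like this one.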
The Career Fair
I have a more personal blog post coming soon in which I will tell my story. But for the purposes of this post: I have recently left my job as CIO for a large distribution company (my place of employment for 19 years) and am in the process of selling my house in New York and relocating to South Carolina. I’m not actively seeking employment – but I will be once my move is permanent (hopefully no later than the end of the summer).
I dropped in at the Career Fair to do a little reconnaissance on what companies are looking for: skills, experience, location of employees, etc. What I discovered there was not all that encouraging.
First, I observed that the booths were divided roughly evenly between educational programs looking to take on students (like Rutgers, MIT and General Assembly) and hiring companies. I’m not planning to plunk down $20k-$40k on a degree program, so I passed on those. The remaining 10 tables were staffed by corporations like Johnson & Johnson, Nielsen, McKinsey & Company and CVS. The stories I heard there were all very similar:
- Go to our web site. You will see exactly what positions we are looking for
- A minimum of Masters or PhD in data analysis
- Need to live in or near a big city (Boston, NY, Washington, Chicago)
I didn’t walk away from the Career Fair with any extraordinary level of confidence. But then again, it was a very, very small sample of companies that are looking to hire – and you know what they say about extrapolating from small samples.
My overall impressions were the following:
- This is a conference that’s experiencing growing pains. Hopefully, they learn and improve with each iteration
- Advance QA needs to be performed on the sessions. Expected standards of delivery should be set (i.e. workshops should be interactive, presentations should be concise, with slides provided in advance)
- I would have benefitted from more organized assistance with networking and socialization. A networking beer party in a crowded dim room that’s too loud to have any conversation doesn’t work for me. It’d work just fine if I were showing up with 5 coworkers. But I knew absolutely no one there – so a better environment for breaking the ice would have been appreciated
- Guidance on level of expertise needed at various sessions. Maybe tracks labeled: Beginner, Intermediate, Expert
- Scheduling needs to be fixed and the app overhauled
ODSC has a lot of potential, and I may attend again next year and hopefully walk away feeling like I got my money’s worth.