Last week, I finished the six-course Data Analyst Nanodegree offered by Sebastian Thrun’s Udacity. Even with an above-average number of academic credentials behind me, I will venture to say that this was the best educational endeavor I’ve been through yet. When I worked through my undergraduate and graduate degrees in economics, I was young. I came nowhere close to taking those learning opportunities as seriously as I could have, but they provided other types of growth experiences and were well worth it nonetheless. Before I found Udacity, I worked through Coursera’s Data Science Specialization, where I learned to program in R. After that, I spent a few months working with a mentor via Thinkful to learn Python and improve my analytical capabilities, then took several online courses to brush up my statistics. While Udacity is my favorite learning experience to date, I was certainly prepared for it by a medley of work done beforehand.
The material is heavily video-based, with in-video quizzes and reading material as support. These are useful and curated to help you create a tangible product at the end of each class. However, the real value of the Nanodegree is access to the forums, where multiple instructors follow up on students’ questions and concerns… rapidly. If I posted a question before I drove to work at 8 A.M., I would have an answer on the forum by the time I sat down in the office at 9 A.M. You never have to wait to figure something out, and you always have help and guidance. Here is a quick overview of the projects I worked on…
- NYC Subway Analysis [ https://github.com/FrankRuns/Udacity/blob/master/NYC-Subway-Analysis/SubwayAnalysis_FrankCorrigan.pdf ] : You code in Python. You start with subway data from the NYC MTA, combine it with weather data from Weather Underground… and analyze away via visualization and statistical rigor. You must contemplate how to handle missing values, non-normal distributions, and statistical hypothesis testing to eventually answer the question: ‘do more people ride the subway when it’s raining?’ For me, this course was about understanding the process of an end-to-end data science project.
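To give a flavor of the hypothesis-testing step (the course itself leans on non-parametric tests like Mann-Whitney U; this is a stand-in, not the project’s actual code), here is a standard-library-only permutation test on made-up ridership numbers:

```python
import random
import statistics

# Toy hourly-entries samples; the real project uses NYC MTA turnstile
# data joined with Weather Underground observations.
rainy = [3100, 2950, 3400, 3250, 2800, 3600, 3050]
dry = [2700, 2500, 3100, 2600, 2900, 2750, 2650]

observed_diff = statistics.mean(rainy) - statistics.mean(dry)

def permutation_p_value(a, b, n_iter=10_000, seed=42):
    """Two-sided permutation test: how often does shuffling the
    rain/no-rain labels give a mean difference at least as extreme?"""
    rng = random.Random(seed)
    pooled = a + b
    observed = abs(statistics.mean(a) - statistics.mean(b))
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        if abs(diff) >= observed:
            extreme += 1
    return extreme / n_iter

p = permutation_p_value(rainy, dry)
print(f"mean diff: {observed_diff:.1f}, p = {p:.3f}")
```

A small p-value would suggest the rainy-day difference isn’t just label noise; with real turnstile data you would also check the shape of the distributions before trusting any mean-based comparison.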
- Boulder, CO Open Street Map Data Analysis with MongoDB [ https://github.com/FrankRuns/Udacity/blob/master/OpenStreetMap-Analysis/OSM_Data_Project/WranglingOpenStreetsMapData.pdf ] : You’re still in Python. Now you learn about schemaless NoSQL databases, which store data without a predefined data model. You also start to get a feel for the notion of big data, since you are required to work with a .osm file larger than 500MB. You take a .osm file (which is XML) from Open Street Map, parse it, and feed the data into MongoDB using a Python driver. Then you run queries on the database to answer questions about that geographic region (how many footpaths and mountain peaks are located there?).
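The parse-and-shape step looks roughly like this (a sketch with a tiny inline OSM snippet; the real file is big enough that you stream it with `iterparse` and then hand each document to a pymongo `insert_one`, which is omitted here):

```python
import xml.etree.ElementTree as ET

# A tiny OSM-style snippet standing in for the >500MB Boulder extract.
osm_xml = """<osm>
  <node id="1" lat="40.0" lon="-105.3">
    <tag k="natural" v="peak"/>
    <tag k="name" v="Bear Peak"/>
  </node>
  <way id="2">
    <tag k="highway" v="footway"/>
  </way>
</osm>"""

def element_to_doc(elem):
    """Flatten one OSM element into a dict ready for MongoDB insertion."""
    doc = {"type": elem.tag, **elem.attrib}
    doc["tags"] = {t.get("k"): t.get("v") for t in elem.findall("tag")}
    return doc

docs = [element_to_doc(e) for e in ET.fromstring(osm_xml)]
peaks = [d for d in docs if d["tags"].get("natural") == "peak"]
footways = [d for d in docs if d["tags"].get("highway") == "footway"]
print(len(peaks), len(footways))  # 1 1
```

Once the documents are in MongoDB, the “how many peaks?” questions become simple `find` or aggregation-pipeline queries over the `tags` field.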
- Political Contributions Data Analysis with R [ https://github.com/FrankRuns/Udacity/blob/master/Data-Analysis-with-R/Presidential_Campaign_Donations_FL2012.md ] : Enter the R programming language. This course is a deep dive into exploratory data analysis. For me, this course was about two things: the dplyr and ggplot2 packages in R. dplyr allows you to swiftly organize, join, and aggregate data. ggplot2 helps you create powerful visualizations QUICKLY. Following this course, I immediately started presenting R visualizations in the office… and the results were not disappointing.
- Identifying Corporate Fraud with Email Data, Corporate Compensation Data, and Machine Learning [ https://github.com/FrankRuns/Udacity/blob/master/Machine-Learning/final_project/submission/IdentifyFraudfromEnronEmail.pdf ] : Back to Python. You dive into all the common machine learning topics: classification, regression, overfitting and regularization, cross-validation, feature engineering, decomposition methods (PCA), outlier detection, feature scaling techniques, tf-idf, etc. Algorithms of focus include Naive Bayes, SVM, decision trees, k-means, and linear regression. You learn how to take data and make a prediction… but more importantly, you learn why your prediction may be right or wrong. To me, this course (and to a lesser extent all of these courses) is not about coding… but about a new way to approach the world.
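To make one of those algorithms concrete, here is a from-scratch sketch of k-means on toy 2-D points (the course uses scikit-learn on the Enron dataset; this standard-library version just shows the assign-then-recompute loop):

```python
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Minimal k-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: (p[0] - centroids[j][0]) ** 2
                                      + (p[1] - centroids[j][1]) ** 2)
            clusters[nearest].append(p)
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return centroids, clusters

# Two obviously separated blobs of toy points.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, clusters = kmeans(points, k=2)
```

The same two-step loop is what scikit-learn’s `KMeans` runs under the hood, plus smarter initialization; feature scaling (covered in the course) matters here because the squared-distance step treats every dimension equally.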
- Visualizing Uber Surge Price Data with d3.js [ https://github.com/FrankRuns/Udacity/tree/master/Visualize-Data/uber_vis ] : Enter JavaScript and HTML/CSS. This course spends a bit of time on the theory of data visualization but also walks through coding examples in d3.js; the most notable one was creating a line chart of World Cup attendance over time. For the project, I opted out of using one of the recommended data sets and instead collected surge rate data from the Uber API, cleaned it in Python and R, and wrote the final script in JavaScript.
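The Python cleaning step amounted to reshaping polled API responses into something chart-ready. A sketch, with a made-up record shape (the actual Uber API response fields differ):

```python
import json
from collections import defaultdict
from statistics import mean

# Hypothetical shape for records polled from the Uber API;
# the field names here are illustrative, not the real schema.
raw = json.loads("""[
  {"timestamp": "2015-06-01T08:15:00", "surge_multiplier": 1.8},
  {"timestamp": "2015-06-01T08:45:00", "surge_multiplier": 2.2},
  {"timestamp": "2015-06-01T17:30:00", "surge_multiplier": 1.0}
]""")

by_hour = defaultdict(list)
for rec in raw:
    hour = int(rec["timestamp"][11:13])  # crude hour-of-day extraction
    by_hour[hour].append(rec["surge_multiplier"])

avg_surge = {h: round(mean(v), 2) for h, v in sorted(by_hour.items())}
print(avg_surge)  # {8: 2.0, 17: 1.0}
```

The aggregated dict then gets dumped to CSV or JSON for d3.js to bind to the chart.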
- A/B Testing [ https://github.com/FrankRuns/Udacity/blob/master/AB-Testing/ABTesting_FinalProjectSubmission.pdf ] : Admittedly, this course is not fun (in my humble opinion, of course). A lot more talking and a lot less coding. The goal is to analyze one of Udacity’s A/B tests: whether a friendly message letting new users know that success in Udacity courses takes 5+ hours per week will deter unprepared students (who are less likely to succeed) from consuming coaching resources, while not diminishing the rate of students who sign up for a course and finish successfully. I was satisfied with the final project, but the course overall was a more challenging mountain for me than the previous five courses.
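The core calculation in that kind of analysis is a confidence interval for the difference in conversion rates between control and experiment. A minimal sketch with toy counts (NOT Udacity’s actual experiment numbers):

```python
from math import sqrt

def two_proportion_ci(x1, n1, x2, n2, z=1.96):
    """Approximate 95% CI for the difference in conversion rates
    (experiment minus control), using the normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p2 - p1
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z * se, diff + z * se

# Toy numbers: enrollments out of clicks, control vs. message group.
lo, hi = two_proportion_ci(x1=200, n1=2000, x2=185, n2=2000)
```

If the interval contains zero, you can’t conclude the message changed that metric; the final project repeats this for each evaluation metric and checks the result against a practical-significance boundary.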
Great. So I now have a portfolio of cool projects, I feel comfortable coding in R, Python, and JavaScript, and I belong to a small but growing community of Udacity graduates for support in future undertakings. Is that Udacity’s secret ingredient? I believe Udacity’s secret ingredient is the positivity, support, and encouragement students receive from the organization’s leaders. The feedback I received during the interview dry-run project (optional, but another opportunity to test your data science communication skills) perfectly demonstrates what I mean. When I submitted this project, I included this comment…
Thanks again for reviewing. I know it’s not perfect, but I wanted to stay within the 1-hour time limit. I look forward to reading your comments on how I can improve. I hope you don’t think I am hopeless as a data analyst; what I don’t have in analytic nimbleness… I more than make up for with tenacity! Thanks again.
Although I had some further work to do on the project, the feedback commentary I received motivated me to work twice as hard…
I think you are being too hard on yourself. You have included an explanation with every single one of your answers (which is a rare occurrence in my experience as a reviewer). This indicates the importance you place not just on the analytics, but also on making sure that everyone can follow the work and the conclusions you come to. This is really good stuff.
In terms of “analytical nimbleness”, I think that sort of thing comes with experience. When you become a subject matter expert in both the analytical techniques and the dataset you are working with, novel approaches come easier. Don’t be too hard on yourself, but make sure that you are challenging yourself with new problems and new datasets.
I’m not sure I’ll ever forget how I felt when I read this on my iPhone. Simple encouragement like this has me excited to jump into Kaggle contests, open-source projects, more data-driven analytics tools at work, and probably the Machine Learning Engineer Nanodegree. As long as you are prepared (you understand a bit of Python and your statistics are better than dismal), I 100% recommend investing in this 12-month venture.