The hype around artificial intelligence (AI) and machine learning (ML) skyrocketed in late 2016. At the time, I felt that my lack of understanding of these topics limited my ability to discuss them with others and to grapple with some of the more existential questions around algorithmic fairness. That was when I decided to immerse myself in the world of data science. Here I’ll share the resources and educational materials I’ve used over more than a year, in the hope that you’ll find them helpful too.
Statistics and linear algebra are two fields necessary for understanding ML. I was familiar with linear algebra, but not so much with statistics. I learned the basics from two resources:
Both resources are visual and not as dull as most textbooks, and most importantly, they provide enough understanding to start with ML. You might need more advanced material for advanced ML research, but here I’m focused on how to get started in the field.
I see it as a significant failure of the educational system that it didn’t teach me statistics in the more than 15 years I spent in it, especially since half of that time was weighted towards math and computer science. Only when I learned the basics did I understand how much easier my career would have been, and how much better my reasoning, had I learned them earlier.
There are many good ML and data science resources out there. I chose Udacity’s Machine Learning Engineer Nanodegree for a few reasons:
- It has good coverage of ML foundations and specific areas like supervised, unsupervised, reinforcement, and deep learning.
- The course consists of video lectures, quizzes, and coding assignments. You learn best when you combine instruction with hands-on work.
- People who work in the field review all project submissions. Feedback from reviewers with industry experience has been immensely helpful and insightful, even when it came in the form of rejection.
- Python is the language of choice throughout the course. I already knew the basics of Python, and it’s part of the tech stack used at Google, so that was a nice benefit.
To graduate from the course, I had to define a problem and solve it using ML. I decided to build a book recommender system because books and reading are my passion. Recommender systems weren’t in the curriculum, which challenged me to find solutions on my own; I couldn’t just repurpose some of the course assignments. Here’s the report and code if you’re interested:
My biggest takeaway from the project is that simple techniques and adjustments are responsible for 80-90% of the final result. You have to critically examine whether the remaining 10-20% is worth the additional complexity needed to get to 100%.
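To give a flavour of what a "simple technique" can look like in a recommender, here is a minimal sketch of item-to-item similarity in plain Python. It is not the code from the project report; the ratings, the book selection, and the `most_similar` helper are all invented for illustration, and a real system would need far more data and careful evaluation.

```python
from math import sqrt

# Made-up toy ratings (user -> {book: rating on a 1-5 scale}); purely illustrative.
RATINGS = {
    "ana":   {"Dune": 5, "Neuromancer": 4, "Emma": 1},
    "boris": {"Dune": 4, "Neuromancer": 5},
    "chen":  {"Emma": 5, "Persuasion": 4, "Dune": 1},
    "dora":  {"Emma": 4, "Persuasion": 5},
}


def item_vectors(ratings):
    """Invert user -> book ratings into book -> {user: rating}."""
    items = {}
    for user, books in ratings.items():
        for book, rating in books.items():
            items.setdefault(book, {})[user] = rating
    return items


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors keyed by user."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def most_similar(book, ratings, top_n=3):
    """Rank the other books by how similarly users rated them (hypothetical helper)."""
    items = item_vectors(ratings)
    target = items[book]
    scores = {
        other: cosine(target, vec)
        for other, vec in items.items()
        if other != book
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(most_similar("Dune", RATINGS))  # ['Neuromancer', 'Emma', 'Persuasion']
```

A baseline like this takes a few dozen lines and no special libraries, which makes it a useful yardstick: anything more elaborate has to beat it by enough to justify the extra complexity.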
If you find all of this at least a bit interesting, I’ve also written another piece in which I share my thoughts and observations about what ML is in essence, why we experienced the recent hype, and some of the potential dangers and opportunities that await us in the future. I think it’s worth reading.