How to Get Started with Machine Learning

In General by fossil6 Comments

Welcome back! Before we start, I’m going to briefly list the math you need, since I wasn’t specific enough in the previous post. If you want a specific guideline for learning the math part really well, message me or leave a comment and I will reply.

Calculus/Linear Algebra

short answer: know how to take gradients (derivatives) in vector calculus. Learn about eigenvectors and eigenvalues.

long answer: Most machine learning problems involve some sort of optimization. Calculus often gives you a clean analytical solution that you can write in a couple of lines of Python code. But you probably won’t have to do even that, since the math is handled for you in most frameworks. The reason to know calculus is to bridge the gap between a general understanding of an algorithm and its mathematical construction. If you ever want to get to the point of reading textbooks and publications or following college-level machine learning lectures, you won’t survive without calculus.
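To make the "analytical solution in a couple of lines" idea concrete, here is a minimal sketch in plain Python. It fits a straight line two ways: by setting the gradient of the mean squared error to zero and solving, and by gradient descent. The data, the function names, the learning rate and the step count are all assumptions made up for this illustration, not something from a particular framework.

```python
# Fit y = w*x + b by minimizing mean squared error, two ways.
# The data below is invented for illustration and lies exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_closed_form(xs, ys):
    """Set the gradient of the loss to zero and solve for w and b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def fit_gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Follow the negative gradient of the loss, starting from w = b = 0."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Partial derivatives of (1/n) * sum((w*x + b - y)^2).
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


print(fit_closed_form(xs, ys))
print(fit_gradient_descent(xs, ys))
```

Both functions should land on roughly the same slope and intercept. The closed form is what calculus buys you when the problem is simple; the gradient loop is closer to what frameworks do when it is not.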

Khan Academy [Multivariable Calculus]

Khan Academy [Linear Algebra]

Probability

short answer: understand Bayes’ theorem, how probability distributions work, and what expectation values are.

long answer: I’m aware most computer science students have taken a discrete mathematics course that covered the basic probability concepts. If so, you should be more or less prepared in this category. If not, probability will be the trickier math to learn. Machine learning is heavily tied to probability. Khan Academy has a great course on introductory probability. It should suffice, but it is up to you to develop the intuition for the more difficult concepts.
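If Bayes’ theorem and expectation values feel abstract, a tiny sketch can help. The spam-filter numbers below are hypothetical, and the function names are my own; the point is only to show the two formulas as code.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) computed from P(H), P(E | H) and P(E | not H)."""
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


def expectation(distribution):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(value * prob for value, prob in distribution.items())


# Hypothetical spam filter: 20% of mail is spam, and the word "free"
# shows up in 60% of spam but only 5% of legitimate mail.
p_spam_given_free = bayes_posterior(
    prior=0.2, likelihood=0.6, false_positive_rate=0.05
)

# A fair six-sided die: every face has probability 1/6.
die = {face: 1 / 6 for face in range(1, 7)}
mean_roll = expectation(die)

print(p_spam_given_free, mean_roll)
```

With these made-up numbers, seeing "free" raises the probability of spam from 0.2 to about 0.75, and the die averages about 3.5 per roll.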

Know the Math? Let’s Start!

Python and R have become standards in the data science community. I recommend Python over R, so I will be focusing on that. I see R as a great statistical tool, but when we delve into machine learning and especially deep learning, nothing beats Python currently. Once you’re comfortable with the math we can start the real stuff. Note, you don’t need to master all of the math, just really know the specifics of what I mentioned above. The further you go into data science, the more naturally some math concepts will come to you.

Hello Dr. Andrew Ng!

Sign up for the famous Introductory Machine Learning course by Andrew Ng on Coursera [Link]. It does not use Python, but the professor explains the concepts extremely well for beginners. The course has 4.9/5 stars from ~50,000 ratings and Dr. Ng is one of the leaders of the field. He also teaches a deep learning course on there, but since that’s a bit more advanced, you can enroll in it later.

Next, watch the first 15 lectures of the Stanford Machine Learning Course on YouTube (the rest cover reinforcement learning). This will fortify the mathematical concepts and expose you to the in-depth theory and classical algorithms in machine learning. This part is very important. Think of it as a thorough expansion of Dr. Ng’s Coursera course.

Read and Code!

Since you’re going to need some hands-on implementation experience, grab a copy of Python Machine Learning from Amazon. The book can be finished in less than two weeks if you commit one hour a day to it. You will learn the tips and tricks for using the main Python data science frameworks, including pandas, scikit-learn, numpy, Keras, etc. I’ve written a detailed post on the technical stack a typical machine learning beginner should adopt; feel free to check it out.


Grab a copy of Deep Learning by Ian Goodfellow. This will serve as your reference text, but I highly recommend that you read the first 7 chapters before you put it on the shelf. The authors place a higher emphasis on implementation concepts, which is great! If your head is still woozy from everything before, the first 4 chapters serve as refreshers on linear algebra, probability and optimization. For avid mathematical readers, the book frequently cites papers in case you want to dabble in proofs. If you do not wish to purchase the book, the authors have a website dedicated to it that you can keep on your Favorites tab [Link].


The authors cover the basic ML algorithms in literally one chapter and then dedicate the rest of the book to the design, implementation and practice of deep learning. The book does require a strong understanding of math, so you’re going to find yourself re-reading chapters several times. But that is something you should always expect. This field ain’t easy.

We’re not quite done yet…

Watch Dr. Yaser Abu-Mostafa’s famous Learning From Data Caltech lectures on YouTube. It’s actually quite introductory but it will solidify the theoretical framework of statistical learning for you. I personally loved this course. The professor is probably my favorite to listen to amongst these lecture videos.

More Coding with Udemy!

LazyProgrammer on Udemy offers AMAZING Python courses in machine learning, including deep learning. Deep learning is a sensational buzz right now in most industries, so it is important that you start playing with frameworks that can perform deep learning. I’ll cover deep learning frameworks in a separate post; you need to learn the conceptual basics first with Goodfellow’s book. Unfortunately, each of LazyProgrammer’s courses will cost ~$10, and that is if you buy them at the right time during sales. But I still HIGHLY recommend them [Link]. If you would like to know specifics on which course is worth taking, message me.

Now learn how evil robots are made…

Reinforcement Learning

Watch DeepMind’s Dr. David Silver’s introductory reinforcement learning course on YouTube. Reinforcement learning is a branch of machine learning that is separate from what you have learned so far. It has wide and open-ended applications and is used more often than you think. It’s the math behind creating intelligent AIs that play board games, or robots that learn how to navigate or balance an object.
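To give a feel for what "learning to navigate" means in code, here is a minimal sketch of tabular Q-learning, one of the classic algorithms in this area. The five-cell corridor, the reward of 1 at the far end, and every name and hyperparameter below are assumptions invented for this toy example. Since Q-learning is off-policy, the sketch explores with uniformly random actions rather than the more common epsilon-greedy scheme.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # step left or step right


def step(state, action):
    """Move along the corridor; reaching the last cell pays 1 and ends the episode."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning driven by a uniformly random exploration policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(len(ACTIONS))
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrap from the best next action; a terminal state is worth 0.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy policy: index of the best action in each non-terminal cell.
policy = [row.index(max(row)) for row in q_table[:-1]]
print(policy)
```

After training, the greedy policy should pick "step right" (action index 1) in every cell, even though the agent was never told where the goal is. It only ever saw states, actions and rewards.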

The classic book on reinforcement learning by Sutton and Barto is available free online and is not a difficult read. [Link] The lectures by David Silver go more or less hand in hand with it, so read and watch concurrently.

And possibly the BEST resource for all things reinforcement learning is this repository on GitHub [Link]. It provides detailed Python examples for virtually every core RL algorithm as well as summarized notes and links to lectures and talks. Keep it on your favorites tab.

Conclusion

Phew! Well, that is it. If you do even 80% of what I listed in this guide, you should be ready for an entry-level data science position. That is assuming you’re also practicing writing code for ML projects. Note I did not cover statistics because I wanted to make a separate post on that. For interviews, you may find yourself answering questions geared towards model performance and evaluation, which require a lot of statistical jargon.
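As a small taste of that evaluation jargon, here is a hedged sketch of three terms that come up constantly for classifiers: precision, recall and F1. The labels and predictions are invented, and the helper names are my own. In practice a library such as scikit-learn provides these metrics, but writing them out once makes the definitions stick.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and their harmonic mean (F1) for binary labels."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example: 4 positives and 6 negatives; the model flags 3 items.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision, recall, f1, accuracy)
```

Here the model is right on 7 of 10 items, yet it finds only half of the actual positives. That gap between accuracy and recall is exactly the kind of thing interviewers like to probe.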

Competition is tough; many new-hire data scientists and machine learning engineers are being recruited from Masters/PhD programs. This means you will need to make yourself known. Since you’re a coder, you should create a GitHub account and post your machine learning projects there. Then fill your LinkedIn and resume with your new projects and data science technical stack.

As my knowledge about the data science/ML/AI industry grows, I will probably continue to make updates to this post. But for now, here you have it.

Good Luck!

Comments

  1. The Python Machine Learning book is 454 pages long and quite dense. How could anyone possibly read it in 14 hours (1 hour a day for 2 weeks)?

    1. Author

      A giant chunk of the book is full of plots and scripts. Hence, you can actually get through the theory pretty quick if you pace yourself 🙂
