Machine Learning

Tutorials, R and Python

  • Intro to Machine Learning (Py): An excellent introduction to applied ML from Udacity. The course focuses on the ML library scikit-learn. Part of Udacity’s Data Analyst Nanodegree, it takes an estimated 10 weeks to complete.

  • Machine Learning (Py): A popular introduction to the theory behind common ML algorithms, from Coursera co-founder and Stanford professor Andrew Ng. It takes an estimated 11 weeks to complete. A certificate is available for Coursera subscribers, but the material is free for everyone. Use of Octave/MATLAB is only required when pursuing a certificate.

  • Chris Albon’s personal website: Lots of short tutorials. Mostly ML, but also web scraping, regular expressions, visualization, etc. Chris has also written a book.

  • Deep Learning: An online version of the popular deep learning textbook.

  • Natural Language Processing with Python: Free online version of the popular NLP book. Uses NLTK. Updated for Python 3.

  • Kaggle Titanic Tutorial (R): A tutorial aimed at Kaggle’s Titanic: Machine Learning from Disaster. Begins with some basics, then moves on to decision trees, feature engineering, and random forests.

  • Kaggle Titanic Tutorial (Py): Machine learning with scikit-learn and TensorFlow.

  • Machine Learning Mastery from Jason Brownlee (R/Python): Includes lots of self-study tutorials covering beginner to advanced topics in machine learning and statistics. Brownlee also offers some ebooks for $37-47, in case you’re looking for more depth and/or structure.

  • fast.ai: A website dedicated to making the power of deep learning accessible to all.

Toolkits

  • Scikit-Learn (Py): Simple and efficient tools for data mining and data analysis. Accessible to everybody, and reusable in various contexts. Built on NumPy, SciPy, and matplotlib. Open source and commercially usable under the BSD license.

  • Keras (Py): A Python deep learning library. Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation.

  • TensorFlow (Py): An open source machine learning framework.

  • PyTorch (Py): A deep learning framework for fast, flexible experimentation.

  • Natural Language Toolkit (Py): NLTK is a leading platform for building Python programs to work with human language data.

  • caret (R): The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process for creating predictive models.

  • class (R): Various functions for classification, including k-nearest neighbour, Learning Vector Quantization and Self-Organizing Maps.

  • stats (R): Offers a number of functions for supervised and unsupervised learning.