Mathematics of Machine Learning

Practical Information

Thursday 15-16 Online


In this course we will study the mathematical foundations of Machine Learning, with an emphasis on the interplay between approximation theory, statistics, and numerical optimization. We will begin with Statistical Learning Theory, including the concepts of Empirical Risk Minimization and VC dimension. We will then move on to numerical optimization methods, which provide the foundation of machine learning algorithms. Finally, we will discuss the foundations of many modern developments in deep learning, including sequence models, variational autoencoders, generative adversarial networks, and reinforcement learning. While the course will be theoretical in nature, you are encouraged to experiment with Python and machine learning packages.
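As a first experiment of this kind, the following minimal sketch illustrates Empirical Risk Minimization on a toy problem. It is not taken from the lecture notes: the hypothesis class (one-dimensional threshold classifiers, a class of VC dimension 1), the function names `empirical_risk` and `erm_threshold`, and the data are all assumptions chosen for illustration. It uses only the Python standard library.

```python
def empirical_risk(threshold, xs, ys):
    """Fraction of samples misclassified by h(x) = 1 if x >= threshold else 0."""
    errors = sum(1 for x, y in zip(xs, ys) if (1 if x >= threshold else 0) != y)
    return errors / len(xs)


def erm_threshold(xs, ys):
    """Return a threshold minimizing the empirical risk, together with that risk.

    It suffices to try each sample point plus one value above the maximum,
    because the empirical risk only changes when the threshold crosses a sample.
    """
    candidates = sorted(set(xs)) + [max(xs) + 1.0]
    best = min(candidates, key=lambda t: empirical_risk(t, xs, ys))
    return best, empirical_risk(best, xs, ys)


# Toy data (made up for illustration): larger x tends to have label 1.
xs = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.7, 2.0]
ys = [0, 0, 1, 0, 1, 1, 1, 1]

threshold, risk = erm_threshold(xs, ys)
print(f"ERM threshold: {threshold}, empirical risk: {risk}")
```

No threshold classifies this sample perfectly, so the minimal empirical risk is positive. How well such a minimizer performs on unseen data is the kind of question that Statistical Learning Theory addresses.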

The course will consist of two recorded lectures per week and one live online lecture. More information will be available on the Moodle page. Lecture notes will be published regularly on Moodle and on these pages.


The course requires a basic mathematical background, such as that provided by the core curriculum at Warwick University. Specifically, the module will require familiarity with concepts from probability theory. An overview of the mathematical background is available here, and this may serve as a refresher (but don’t worry if there are some things in there that you are not familiar with). The part dealing with probability and statistics is available as a separate document here.

Intended Learning Outcomes

Upon completion of this module you should be able to:

  • Describe the problem of supervised learning from the point of view of function approximation, optimization, and statistics.

  • Identify the most suitable optimization and modelling approach for a given machine learning problem.

  • Analyse the performance of various optimization algorithms from the point of view of computational complexity (both space and time) and statistical accuracy.

  • Implement a simple neural network architecture and apply it to a pattern recognition task.

  • Summarize recent developments in deep learning, including sequence models, deep generative models, robustness, and reinforcement learning.

Lecture Notes

Brief lecture notes will be published regularly on this page and on Moodle. They will be available on the dedicated Lectures page.


Weekly problem sheets can be found on the Exercises page. Assessed work will count for 15% of your mark. Further information will be made available soon.

Additional Resources

The following references are not needed for the course, but can provide additional information and perspective for those interested.

  1. Felipe Cucker and Ding-Xuan Zhou. Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, 2007
  2. Vladimir Vapnik. The Nature of Statistical Learning Theory. Springer, 2013
  3. Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning. Springer, 2001
  4. Amir Beck. First-Order Methods in Optimization. SIAM, 2017
  5. Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2018
  6. Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014