Statistical Data Science - MAT00100H

  • Department: Mathematics
  • Credit value: 20 credits
  • Credit level: H
  • Academic year of delivery: 2025-26

Module summary

Provides the theory behind machine learning algorithms as well as practical implementation in R, allowing students to perform statistical analyses of real data, from the formulation of the question to be investigated through to the presentation of the results.

Additional information

This module can be taken with a general background in probability and statistics (e.g. a Stage 1 “Introduction to Probability and Statistics” module). An indicative brief syllabus for such a background is as follows:

  • Axioms of probability

  • Independence

  • Bayes' theorem

  • Random variables and moments

  • Joint distributions (mainly discrete) and covariance

  • The law of large numbers and the central limit theorem

  • Statistical models

  • Estimators (including what it means to be unbiased)

  • Confidence intervals for the mean of a normal distribution (variance known/unknown)

  • Linear regression

Although knowledge of how to code (e.g. in R) is not a prerequisite for this module, it would be an advantage.

Elective Pre-Requisites

These pre-requisites only apply to students taking this module as an elective.

Semester 1
Prerequisites: introductory university-level probability and statistics, equivalent to that found in MAT00004C. Core mathematics content: differentiation; vectors, matrices and eigenvectors.

Module will run

Occurrence  Teaching period
A           Semester 1 2025-26

Module aims

Provides the theory behind machine learning algorithms as well as practical implementation in R, allowing students to perform statistical analyses of real data, from the formulation of the question to be investigated through to the presentation of the results.

Module learning outcomes

By the end of the module, students will be able to:

  1. Describe and discuss the theoretical foundations of the statistical models and tools considered.

  2. Use various statistical tools to analyse real datasets in R.

  3. Select appropriate machine learning and statistical approaches for specific applications.

  4. Perform independent statistical data analysis on a real dataset with a particular research question.

  5. Write up the results of statistical data analysis, employing tables and graphs as appropriate.

Module content

Subject content

  • pattern recognition, measuring objects, features and patterns;

  • data reduction and pre-processing;

  • representation, distance and similarity measures;

  • feature selection, classification and validation;

  • unsupervised learning, clustering algorithms and principal components analysis;

  • Bayesian decision theory;

  • supervised learning, such as linear discriminant analysis;

  • machine learning algorithms, for example decision trees;

  • combining classifiers.

Academic and graduate skills

  • application of pattern recognition and machine learning techniques to a range of problems;

  • use of appropriate scaling, feature weighting and other pre-processing techniques.

Indicative assessment

Task                                         % of module mark
Closed/in-person Exam (Centrally scheduled)  50
Essay/coursework                             50

Special assessment rules

None

Additional assessment information

If a student has a failing module mark, only failed components need to be reassessed.

Indicative reassessment

Task                                         % of module mark
Closed/in-person Exam (Centrally scheduled)  50
Essay/coursework                             50

Module feedback

Current Department policy on feedback is available in the student handbook. Coursework and examinations will be marked and returned in accordance with this policy.

Indicative reading

James G, Witten D, Hastie T and Tibshirani R (2013). An Introduction to Statistical Learning with Applications in R. Springer.

Everitt B and Hothorn T (2011). An Introduction to Applied Multivariate Analysis with R. Springer.