Statistical Data Science - MAT00100H
Module summary
Provides the theory behind machine learning algorithms as well as practical implementation in R, allowing students to perform statistical analyses of real data, from the formulation of the question to be investigated through to the presentation of the results.
Related modules
Additional information
This module can be taken with a general background in probability and statistics (e.g. a Stage 1 “Introduction to Probability and Statistics” module). An indicative brief syllabus for such a background is as follows:
- Axioms of probability
- Independence
- Bayes' Theorem
- Random variables and moments
- Joint distributions (mainly discrete) and covariance
- The Law of Large Numbers and the Central Limit Theorem
- Statistical models
- Estimators (including what it means to be unbiased)
- Confidence intervals for the mean of a normal distribution (variance known/unknown)
- Linear regression
Although knowledge of how to code (e.g. in R) is not a prerequisite for this module, it would be an advantage.
Elective Pre-Requisites
These pre-requisites only apply to students taking this module as an elective.
Semester 1
Prerequisites: introductory university-level probability and statistics, equivalent to that found in MAT00004C. Core mathematics content: differentiation; vectors, matrices and eigenvectors.
Module will run
Occurrence | Teaching period |
---|---|
A | Semester 1 2025-26 |
Module aims
Provides the theory behind machine learning algorithms as well as practical implementation in R, allowing students to perform statistical analyses of real data, from the formulation of the question to be investigated through to the presentation of the results.
Module learning outcomes
By the end of the module, students will be able to:
- Describe and discuss the theoretical foundations of the statistical models and tools considered.
- Use various statistical tools to analyse real datasets in R.
- Select appropriate machine learning and statistical approaches for specific applications.
- Perform independent statistical data analysis on a real dataset with a particular research question.
- Write up the results of statistical data analysis, employing tables and graphs as appropriate.
Module content
Subject content
- pattern recognition, measuring objects, features and patterns;
- data reduction and pre-processing;
- representation, distance and similarity measures;
- feature selection, classification and validation;
- unsupervised learning, clustering algorithms and principal components analysis;
- Bayesian decision theory;
- supervised learning, such as linear discriminant analysis;
- machine learning algorithms, for example decision trees;
- combining classifiers.
Academic and graduate skills
- application of pattern recognition and machine learning techniques to a range of problems;
- use of appropriate scaling, feature weighting and other pre-processing techniques.
Indicative assessment
Task | % of module mark |
---|---|
Closed/in-person Exam (Centrally scheduled) | 50 |
Essay/coursework | 50 |
Special assessment rules
None
Additional assessment information
If a student has a failing module mark, only failed components need to be reassessed.
Indicative reassessment
Task | % of module mark |
---|---|
Closed/in-person Exam (Centrally scheduled) | 50 |
Essay/coursework | 50 |
Module feedback
Current Department policy on feedback is available in the student handbook. Coursework and examinations will be marked and returned in accordance with this policy.
Indicative reading
James G, Witten D, Hastie T and Tibshirani R (2013). An Introduction to Statistical Learning with Applications in R. Springer
Everitt B and Hothorn T (2011). An Introduction to Applied Multivariate Analysis with R. Springer