Statistical Pattern Recognition - MAT00031H

Department: Mathematics
Credit value: 10 credits
Credit level: H
Academic year of delivery: 2022-23

Related modules

Additional information

Students who wish to take this module but have not taken Introduction to Probability and Statistics should talk to the lecturer to obtain permission.

Pre-requisites for Natural Sciences students: must have taken Maths for Sciences 1 MAT00007C.

Module will run

Occurrence	Teaching period
A	Autumn Term 2022-23

Module aims

To introduce pattern recognition and machine learning techniques with an emphasis on applications in chemistry and biology.
To describe algorithms for clustering and classification.
To discuss application of the methods to large multivariate datasets obtained from –omics technologies (genomics/proteomics/metabolomics).
To discuss application of the methods to computer vision problems, in particular the analysis of biological images.
To allow students to apply the methods to a range of problems.

Module learning outcomes

Subject content

pattern recognition, measuring objects, features and patterns;
data reduction and pre-processing;
representation, distance and similarity measures;
feature selection, classification and validation;
unsupervised learning, clustering algorithms and principal components analysis;
Bayesian decision theory;
supervised learning, such as linear discriminant analysis and partial least squares;
machine learning algorithms, for example neural networks, self-organizing maps and decision trees;
combining classifiers.

Academic and graduate skills

application of pattern recognition and machine learning techniques to a range of problems;
use of appropriate scaling, feature weighting and other pre-processing techniques;
design and use of simple pattern recognition systems.

Module content

This course will introduce pattern recognition techniques with particular emphasis on applications in Chemistry and Biology. The aim is to extract useful information from chemical or biochemical data. Techniques for handling multidimensional data will be explained, together with algorithms for clustering and classifying data. Datasets from –omics technologies (genomics/proteomics/metabolomics) are typically extremely large, with very many more variables than observations (samples), and require data reduction and feature selection methods. In contrast, the application of pattern recognition and machine learning to computer vision problems requires feature extraction from images, and the number of images is usually far greater than the number of variables extracted. Image analysis for automated biological object recognition will be discussed.
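
As an illustration of data reduction when variables far outnumber samples, the following is a minimal sketch of principal components analysis computed with a singular value decomposition. It is not taken from the module materials: the function name pca_scores, the use of NumPy and the synthetic 20 x 5000 dataset are all assumptions made for the example.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project the rows of X onto the leading principal components.

    X has one row per sample and one column per variable.
    Returns the sample scores and the fraction of variance
    explained by each retained component.
    """
    # Centre each variable so the components describe variance about the mean.
    Xc = X - X.mean(axis=0)
    # The thin SVD avoids forming the (variables x variables) covariance matrix,
    # which would be very large when variables far outnumber samples.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Synthetic stand-in for an -omics dataset: 20 samples, 5000 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5000))

scores, explained = pca_scores(X, 3)
print(scores.shape)  # (20, 3)
```

Each sample is now described by 3 scores instead of 5000 variables. With n samples, centring leaves at most n - 1 components with non-zero variance, so no more than 19 components carry information in this example.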

Indicative assessment

Task	% of module mark
Closed/in-person Exam (Centrally scheduled)	100

Special assessment rules

None

Indicative reassessment

Task	% of module mark
Closed/in-person Exam (Centrally scheduled)	100

Module feedback

Current Department policy on feedback is available in the undergraduate student handbook. Coursework and examinations will be marked and returned in accordance with this policy.

Indicative reading

Webb & Copsey, Statistical Pattern Recognition, Wiley
Hastie, Tibshirani & Friedman, The Elements of Statistical Learning, 2nd Edition, Springer, 2008.