Students who wish to take this module but have not taken Introduction to Probability and Statistics should talk to the lecturer to obtain permission.
Prerequisite for Natural Sciences students: Maths for Sciences 1 (MAT00007C) must have been taken.
Module will run
Occurrence: A
Teaching period: Autumn Term 2022-23
Module aims
To introduce pattern recognition and machine learning techniques, with an emphasis on applications in chemistry and biology.
To describe algorithms for clustering and classification.
To discuss the application of these methods to large multivariate datasets obtained from -omics technologies (genomics/proteomics/metabolomics).
To discuss the application of these methods to computer vision problems, in particular the analysis of biological images.
To allow students to apply the methods to a range of problems.
Module learning outcomes
Subject content
pattern recognition, measuring objects, features and patterns;
data reduction and pre-processing;
representation, distance and similarity measures;
feature selection, classification and validation;
unsupervised learning, clustering algorithms and principal components analysis;
Bayesian decision theory;
supervised learning, such as linear discriminant analysis and partial least squares;
machine learning algorithms, for example neural networks, self-organizing maps and decision trees;
combining classifiers.
Academic and graduate skills
application of pattern recognition and machine learning techniques to a range of problems;
use of appropriate scaling, feature weighting and other pre-processing techniques;
design and use of simple pattern recognition systems.
Module content
This course will introduce pattern recognition techniques, with particular emphasis on applications in Chemistry and Biology. The aim is to extract useful information from chemical or biochemical data; techniques for handling multidimensional data, together with algorithms for clustering and classifying data, will be explained. Datasets from -omics technologies (genomics/proteomics/metabolomics) are typically extremely large, with many more variables than observations (samples), and so require data reduction and feature selection methods. In contrast, applying pattern recognition and machine learning to computer vision problems requires extracting features from images, and the number of images is usually far greater than the number of variables extracted. Image analysis for automated recognition of biological objects will also be discussed.
Indicative assessment
Task: Closed/in-person Exam (Centrally scheduled)
% of module mark: 100
Special assessment rules
None
Indicative reassessment
Task: Closed/in-person Exam (Centrally scheduled)
% of module mark: 100
Module feedback
The current Department policy on feedback is available in the undergraduate student handbook. Coursework and examinations will be marked and returned in accordance with this policy.