Machine Learning in Chemistry: from atoms to atmospheres - CHE00047M

  • Department: Chemistry
  • Credit value: 20 credits
  • Credit level: M
  • Academic year of delivery: 2024-25
    • See module specification for other years: 2023-24

Module summary

This module will provide an overview of the application of data science and machine learning in chemistry and beyond by looking at four specific problem areas: the analysis of atomic structures, the simulation of molecular dynamics, the handling of atmospheric data, and the analysis of scientific image data. You will learn about different types of chemical data and the software methods used to validate, analyse and extract conclusions from them. You will apply this knowledge to some practical problems using real data and industry-standard tools. The diversity of data types, sources and approaches will give you enough experience to approach new problem domains with confidence.

Module will run

Occurrence    Teaching period
A             Semester 2 2024-25

Module aims

While data analysis methodology is common to all disciplines, particular methods are better suited to certain kinds and volumes of data. This module aims to provide relevant experience in the use of data analysis and machine learning techniques in four distinct areas of chemistry: atomic structure ('Molecular Structure Data'), atomistic simulations ('Machine Learning in Computational Chemistry'), atmospheric chemistry ('Atmospheric Data') and molecular property prediction and design ('Applications of Neural Networks in Chemistry').

Module learning outcomes

Students will be able to:

  • Analyse and evaluate large datasets from different sources.

  • Develop suitable validation criteria for different data types.

  • Create software that extracts chemical knowledge from computational representations of molecules.

  • Appreciate applications of supervised and unsupervised machine learning models in computational chemistry.

  • Implement graph neural networks for molecular property prediction.

Module content

Content separated by sub-module:

  • Macromolecular Structure: accessing and retrieving data from a molecular structure database; performing data validation; gathering statistical information about bond lengths, angles, and torsions.

  • Atmospheric Data: accessing and working with atmospheric data, e.g. air pollution data; counterfactual analysis, used for example in analysing interventions to improve air quality; parameterizations to support atmospheric chemistry modelling.

  • Machine Learning in Computational Chemistry: bypassing expensive computational chemical calculations using machine learning (ML); representing structures of molecules in computers; using these representations for unsupervised classification and clustering of molecular structures; using neural networks for rapid prediction of potential energies; and using kernel regression to predict molecular properties.

  • Applications of Neural Networks in Chemistry: working with molecules in computers [Atomic Simulation Environment (ASE); RDKit]; molecular representations; molecular property prediction using feedforward neural networks (handcrafted features) and graph neural networks (GNNs; learned features); generative molecular design using recurrent neural networks (RNNs).
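
As an illustration of the geometry statistics mentioned under Macromolecular Structure, the following is a minimal sketch of how a bond length, a bond angle and a torsion angle might be computed from atomic coordinates. It is not taken from the module materials: the function names are hypothetical, coordinates are assumed to be plain (x, y, z) tuples in ångströms, and only the Python standard library is used.

```python
import math

# Small vector helpers on (x, y, z) tuples (hypothetical names, for illustration only).
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def bond_length(p1, p2):
    """Distance between two atoms, in the same units as the input coordinates."""
    return _norm(_sub(p2, p1))

def bond_angle(p1, p2, p3):
    """Angle p1-p2-p3 in degrees, with p2 as the central atom."""
    u = _sub(p1, p2)
    v = _sub(p3, p2)
    cos_theta = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

def torsion_angle(p1, p2, p3, p4):
    """Signed torsion p1-p2-p3-p4 in degrees, in the range (-180, 180]."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    # atan2 form: y = |b2| * b1 . (b2 x b3), x = (b1 x b2) . (b2 x b3)
    n2 = _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))
```

Applied to every bonded pair, triple or quadruple of atoms in a set of structures, functions like these would give the distributions of lengths, angles and torsions from which summary statistics can be gathered.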

Indicative assessment

Task                % of module mark
Essay/coursework    50
Essay/coursework    50

Special assessment rules

None

Additional assessment information

Assessment 1

Essay/coursework (project report, data presentation including code): Data analysis and presentation for chemistry problem domains. Students submit their code in a compressed file and use up to four sides of A4 to describe the results of their data analysis of one chemistry problem domain.

Assessment 2

Essay/coursework (project report, data presentation including code): Machine learning and neural networks for chemistry applications. Students submit their code in a compressed file and use up to four sides of A4 to describe the results of applying machine learning and neural networks to one chemistry problem domain.

Indicative reassessment

Task                % of module mark
Essay/coursework    50
Essay/coursework    50

Module feedback

Feedback will be provided through workshops and online exercises. Feedback on summative work will be provided within 25 working days of the assessment.

Indicative reading

  • Introduction to Data Science : A Python Approach to Concepts, Techniques, and Applications
    Laura Igual, Santi Seguí. Springer 2017

  • Python for Data Analysis : Data Wrangling with Pandas, NumPy, and IPython
    Wes McKinney. O'Reilly 2017

  • Pro Git
    Scott Chacon, Ben Straub. Apress 2014

  • Python and Matplotlib Essentials for Scientists and Engineers
    Matt A. Wood. Morgan & Claypool Publishers 2015

  • Visualization for the Physical Sciences
    Lipsa et al. Computer Graphics Forum, 2012, Vol.31 (8), p.2317-2347

  • Introduction to Scientific Visualization
    Helen Wright. Springer 2007

  • Data Modeling Essentials
    Graeme Simsion, Graham Witt. Morgan Kaufmann 2004

  • Database Design
    Adrienne Watt, Nelson Eng. BC Open Textbook Project 2014



The information on this page is indicative of the module that is currently on offer. The University constantly explores ways to enhance and improve its degree programmes and therefore reserves the right to make variations to the content and method of delivery of modules, and to discontinue modules, if such action is reasonably considered to be necessary. In some instances it may be appropriate for the University to notify and consult with affected students about module changes in accordance with the University's policy on the Approval of Modifications to Existing Taught Programmes of Study.