Machine learning for data science - CHE00045M

Department: Chemistry

Credit value: 20 credits

Credit level: M

Academic year of delivery: 2024-25
See module specification for other years: 2023-24

Module summary

Data science is the science and craft of extracting information from, and testing hypotheses against, data. A universe of statistical and machine learning techniques exists to help us do this. This module explores that universe through a series of lectures and ‘hands-on’ Python/Scikit-Learn workshops that are structured to give you a strong foundation stretching from design through development to deployment. It equips you to build robust, reliable, and scalable data pipelines that integrate machine learning models for your own data science projects.

Module will run

Occurrence	Teaching period
A	Semester 1 2024-25

Module aims

How can data help us to answer scientific questions? The aim of this module is to familiarise you with different machine learning problem domains (e.g. supervised, unsupervised, and reinforcement learning) and give you an appreciation of the kinds of machine learning models available. It also gives you ‘hands-on’ experience designing, developing, and deploying robust, reliable, and scalable data pipelines that integrate these models in Python/Scikit-Learn.

You will learn how to preprocess and partition data, select and/or design useful features, and implement, evaluate, and improve machine learning models. You will also learn to work with deep learning models, e.g. convolutional and recurrent neural networks: you will learn how to use these models with structured data, and get ‘hands-on’ experience implementing them in Python/TensorFlow/Keras. On completion of this module, you will come away with a strong foundation of applications-focused knowledge and practical skills stretching across the whole data science pipeline, and you will be able to design, develop, deploy, and evaluate your own machine learning solutions.

Module learning outcomes

Students will be able to:

Distinguish different machine learning problem types: supervised vs. unsupervised vs. reinforcement; classification vs. regression.
Carry out data preprocessing, partitioning, and feature selection.
Implement supervised machine learning algorithms using Scikit-Learn.
Select and appraise alternative machine learning algorithms.
Evaluate the performance of a machine learning algorithm and implement techniques to improve it.
Implement deep learning algorithms (e.g. neural networks) using TensorFlow and Keras.
Design, develop, and deploy components across the data exploration, preprocessing, and prediction pipelines, constructing ‘end-to-end’ solutions.
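
As a rough illustration of the preprocessing and partitioning outcomes above, the sketch below splits a toy dataset into training and test sets and standardises it using statistics computed on the training set only. It uses only the Python standard library so that it runs anywhere; the function names (`train_test_split`, `fit_scaler`, `standardise`) and the toy data are assumptions made for this example, not material from the module. In practice, Scikit-Learn provides equivalents such as `sklearn.model_selection.train_test_split` and `sklearn.preprocessing.StandardScaler`.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Return the mean and population standard deviation of values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # Guard against a constant feature, which would give std == 0.
    return mean, (std if std > 0 else 1.0)


def standardise(values, mean, std):
    """Rescale values to zero mean and unit variance under (mean, std)."""
    return [(v - mean) / std for v in values]


# Hypothetical toy feature: the numbers 1.0 to 20.0.
data = [float(x) for x in range(1, 21)]

train, test = train_test_split(data, test_fraction=0.25, seed=42)

# Fit the scaler on the training set only, then apply it to both sets,
# so that no information from the test set leaks into preprocessing.
mean, std = fit_scaler(train)
train_scaled = standardise(train, mean, std)
test_scaled = standardise(test, mean, std)
```

Fitting the scaler on the training data alone is the design choice worth noting: the test set is then a fair stand-in for unseen data when a model is evaluated.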

Module content

Machine learning problem domains.
Data preprocessing: categorical and continuous data.
Supervised learning: e.g., (non)linear and logistic regression, support vector machines (SVMs), decision trees.
Model (cross-)validation and evaluation.
Model improvement: hyperparameter optimisation.
Neural networks: multilayer perceptrons (MLP); convolutional (CNN) and recurrent (RNN) neural networks.
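
To make the cross-validation and hyperparameter optimisation topics above more concrete, here is a minimal sketch of k-fold cross-validation combined with a small grid search. As a deliberately simple stand-in model it uses a one-feature ridge regression with no intercept, fitted in closed form. The model choice, the helper names, the toy data, and the grid of `alpha` values are all assumptions made for this example. With Scikit-Learn, the same idea is usually expressed with `sklearn.model_selection.KFold` and `sklearn.model_selection.GridSearchCV`.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


def fit_ridge_slope(xs, ys, alpha):
    """Closed-form slope w of a no-intercept ridge regression y = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def cross_val_mse(xs, ys, alpha, k=5):
    """Average validation MSE over k folds for one value of alpha."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        w = fit_ridge_slope(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx], alpha
        )
        preds = [w * xs[i] for i in val_idx]
        scores.append(mean_squared_error([ys[i] for i in val_idx], preds))
    return sum(scores) / len(scores)


# Hypothetical toy data: y = 2x plus a small alternating perturbation.
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]

# Grid search: score every candidate alpha and keep the lowest CV error.
alphas = [0.0, 1.0, 10.0, 100.0]
cv_scores = {alpha: cross_val_mse(xs, ys, alpha, k=5) for alpha in alphas}
best_alpha = min(cv_scores, key=cv_scores.get)
```

On this toy data, heavy regularisation (`alpha = 100.0`) shrinks the slope well below 2 and so scores worse than no regularisation. The point of the sketch is that each hyperparameter value is compared on held-out folds, not on the data used for fitting.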

Indicative assessment

Task	% of module mark
Essay/coursework	50
Essay/coursework	50

Special assessment rules

None

Additional assessment information

Scikit-Learn machine learning task.

1× Python notebook (50%)

TensorFlow/Keras machine learning task.

1× Python notebook (50%)

Indicative reassessment

Task	% of module mark
Essay/coursework	50
Essay/coursework	50

Module feedback

Feedback will be provided through workshops, online exercises and a formative assessment. Feedback on summative work will be provided within 25 working days of the assessment.

Indicative reading

Introduction to data science: a Python approach to concepts, techniques, and applications
Laura Igual, Santi Seguí. Springer 2017
Python for data analysis: data wrangling with Pandas, NumPy, and IPython
Wes McKinney. O'Reilly 2017
The hundred-page machine learning book
Andriy Burkov. Andriy Burkov 2019
Machine learning engineering
Andriy Burkov. True Positive Ltd. 2020
Machine learning with PyTorch and Scikit-Learn
Sebastian Raschka, Yuxi Liu, Vahid Mirjalili. Packt Publishing Ltd. 2022
Deep learning with TensorFlow and Keras
Amita Kapoor, Antonio Gulli, Sujit Pal. Packt Publishing Ltd. 2022
