- Department: Chemistry
- Credit value: 20 credits
- Credit level: M
- Academic year of delivery: 2023-24
- See module specification for other years: 2024-25
Computer programming is a key skill for data science, although it takes a slightly different form to general-purpose programming. Programming allows us to quickly perform complex analysis, to automate more routine analyses, and to manage massive datasets with minimal work. You will learn to use the Python programming language and the Pandas library. You will then apply this knowledge to create computer programs to read, process, interpret and present complex data in different ways. You will learn how to work collaboratively on computational problems, and how to record and report what you have done.
Occurrence | Teaching period |
---|---|
A | Semester 1 2023-24 |
Computer programming is a key skill for data science, although it takes a slightly different form to general-purpose programming. Programming allows us to quickly perform complex analysis, to automate more routine analyses, and to manage massive datasets with minimal work. We will learn to perform data analysis using the Python language and the Pandas library.
Computer programming is often taught in a way that maintains it as an elite activity. A key focus of this course is to teach programming in an inclusive way that makes it accessible to groups who have traditionally been marginalised in the computational sciences. We achieve this by closely linking programming concepts with familiar problems from different fields at every stage, and by delaying the introduction of more complex concepts until they are obviously required to address real-world challenges.
Students will be able to:
- Implement Python code to read, manipulate and analyse datasets
- Apply features of programming languages, including data structures, loops, conditions and functions
- Apply software engineering principles, including documentation, testing and collaboration tools
- Develop Python notebooks for data analysis and data management
- Create and organise Git version control repositories
- Use shell scripting and high performance computing
- Evaluate different programming languages for a given data science application
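
As a rough illustration of the kind of program the first two outcomes describe, the sketch below reads a small dataset, skips a missing value, and summarises the rest using a function, a loop, a condition and a dictionary. It is not taken from the module materials: the sample data, column names and function names are invented for illustration, and it uses only the Python standard library rather than Pandas so that it runs without extra packages.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data: invented measurements, for illustration only.
SAMPLE_CSV = """sample,solvent,yield_percent
A1,water,72.5
A2,ethanol,64.0
A3,water,80.1
A4,ethanol,
A5,water,77.4
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def mean_yield_by_solvent(rows):
    """Group rows by solvent and return the mean yield for each group."""
    groups = {}
    for row in rows:
        value = row["yield_percent"].strip()
        if not value:  # condition: skip rows with a missing measurement
            continue
        groups.setdefault(row["solvent"], []).append(float(value))
    return {solvent: mean(values) for solvent, values in groups.items()}


rows = read_rows(SAMPLE_CSV)
summary = mean_yield_by_solvent(rows)
for solvent, average in sorted(summary.items()):
    print(f"{solvent}: {average:.1f}")
```

With Pandas, the same grouping would typically be a short `groupby` expression; the explicit loop is shown here only to make the underlying language features visible.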
Module Content Detail
- Python programming for data science problems
- Collaborative software engineering
- Managing complex data
- Data visualisation
- Shell scripting and high performance computing
- How to learn other programming languages
Task | % of module mark |
---|---|
Essay/coursework | 50 |
Essay/coursework | 50 |
- Structured programming exercise (50%): computer program
- Freeform programming exercise (50%; 35% for code and 15% for viva): computer program, documentation and oral presentation
Task | % of module mark |
---|---|
Essay/coursework | 50 |
Essay/coursework | 50 |
Feedback will be provided through workshops, online exercises and a formative assessment. Feedback on summative work will be provided within 25 working days of the assessment.
- Introduction to data science: a Python approach to concepts, techniques and applications. Laura Igual, Santi Seguí. Springer 2017
- Python for data analysis: data wrangling with Pandas, NumPy, and IPython. Wes McKinney. O'Reilly 2017
- Pro Git. Scott Chacon, Ben Straub. Apress 2014
- Python and Matplotlib essentials for scientists and engineers. Matt A. Wood. Claypool Publishers 2015
- Visualization for the Physical Sciences. Lipsa et al. Computer Graphics Forum, 2012, Vol. 31 (8), p. 2317-2347
- Introduction to scientific visualization. Helen Wright. Springer 2007
- Data Modeling Essentials. Graeme Simsion, Graham Witt. Morgan Kaufmann 2004
- Database design. Adrienne Watt, Nelson Eng. BC Open Textbook Project 2014