Programming for data science - CHE00044M
Module summary
Computer programming is a key skill for data science, although it takes a slightly different form from general-purpose programming. Programming allows us to perform complex analyses quickly, to automate routine analyses, and to manage massive datasets with minimal manual work. You will learn to use the Python programming language and the pandas library. You will then apply this knowledge to write programs that read, process, interpret and present complex data in different ways. You will also learn how to work collaboratively on computational problems, and how to record and report what you have done.
Module will run
Occurrence | Teaching period |
---|---|
A | Semester 1 2023-24 |
Module aims
Computer programming is a key skill for data science, although it takes a slightly different form from general-purpose programming. Programming allows us to perform complex analyses quickly, to automate routine analyses, and to manage massive datasets with minimal manual work. We will learn to perform data analysis using the Python programming language and the pandas library.
Computer programming is often taught in a way that maintains it as an elite activity. A key focus of this course is to teach programming inclusively, making it accessible to groups who have traditionally been marginalised in the computational sciences. We achieve this by closely linking programming concepts with familiar problems from different fields at every stage, and by delaying the introduction of more complex concepts until they are clearly required to address real-world challenges.
Module learning outcomes
Students will be able to:
- Implement Python code to read, manipulate and analyse datasets
- Apply features of programming languages, including data structures, loops, conditions and functions
- Apply software engineering principles, including documentation, testing and collaboration tools
- Develop Python notebooks for data analysis and data management
- Create and organise Git version control repositories
- Use shell scripting and high performance computing
- Evaluate different programming languages for a given data science application
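As an informal illustration of the first two outcomes, the sketch below reads a small dataset, cleans it and summarises it using a data structure, a loop, a condition and functions. It is not taken from the module materials: the dataset, the column names and the function names `read_rows` and `mean_yield_by_solvent` are all invented for this example, and it uses only the Python standard library.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical dataset, invented for illustration only.
RAW_CSV = """sample,solvent,yield_percent
A1,water,72.5
A2,ethanol,64.0
A3,water,80.5
A4,ethanol,
A5,water,75.0
"""


def read_rows(text):
    """Read CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def mean_yield_by_solvent(rows):
    """Group rows by solvent and average the yield, skipping missing values."""
    grouped = defaultdict(list)
    for row in rows:
        value = row["yield_percent"].strip()
        if value:  # condition: ignore rows with no measurement
            grouped[row["solvent"]].append(float(value))
    return {solvent: mean(values) for solvent, values in grouped.items()}


rows = read_rows(RAW_CSV)
summary = mean_yield_by_solvent(rows)
for solvent, average in sorted(summary.items()):
    print(f"{solvent}: {average:.1f}")
```

A library such as pandas can express the same grouping and averaging more concisely. The explicit loop is used here only to make each step visible.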
Module content
Module Content Detail
- Python programming for data science problems
- Collaborative software engineering
- Managing complex data
- Data visualisation
- Shell scripting and high performance computing
- How to learn other programming languages
Indicative assessment
Task | % of module mark |
---|---|
Essay/coursework | 50 |
Essay/coursework | 50 |
Special assessment rules
None
Additional assessment information
Structured programming exercise (50%): computer program.
Freeform programming exercise (50%, made up of 35% for code and 15% for viva): computer program, documentation and oral presentation.
Indicative reassessment
Task | % of module mark |
---|---|
Essay/coursework | 50 |
Essay/coursework | 50 |
Module feedback
Feedback will be provided through workshops, online exercises and a formative assessment. Feedback on summative work will be provided within 25 working days of the assessment.
Indicative reading
- Introduction to data science: a Python approach to concepts, techniques and applications. Laura Igual, Santi Seguí. Springer, 2017
- Python for data analysis: data wrangling with Pandas, NumPy, and IPython. Wes McKinney. O'Reilly, 2017
- Pro Git. Scott Chacon, Ben Straub. Apress, 2014
- Python and Matplotlib essentials for scientists and engineers. Matt A. Wood. Claypool Publishers, 2015
- Visualization for the Physical Sciences. Lipsa et al. Computer Graphics Forum, 2012, Vol. 31 (8), p. 2317-2347
- Introduction to scientific visualization. Helen Wright. Springer, 2007
- Data Modeling Essentials. Graeme Simsion, Graham Witt. Morgan Kaufmann, 2004
- Database design. Adrienne Watt, Nelson Eng. BC Open Textbook Project, 2014