Responsible Data Science by Design
Project Leads: Professor Dimitris Kolovos (University of York) and Distinguished Professor Michel Dumontier (Maastricht University)
Professors Kolovos and Dumontier had a "compelling vision to seamlessly embed the principles of responsible data science into practical software development. This is a major move away from the current approach to embedding specific principles in software development which usually takes the form of checklists of requirements or legal/ethical expectations that developers may or may not follow."
This joint York-Maastricht project began in November 2019 and grew out of Dimitris and Michel’s mutual interest in data science and knowledge management for software engineering. The aim of the three-year project was to work on personal data vaults: stores in which individuals hold their own data, a bit like a personal bank account for data. However, the aim was not simply to build the basic infrastructure already emerging in other projects, such as Solid (which has been adopted in Belgium) and the many data facilitators that have appeared for health data. The team also set out to ask how we can compute over these distributed personal data vaults, and how we can conduct research with data that is distributed rather than centralised.
Using a complementary two-pronged approach, and working in an innovative problem domain spanning health care, health data, health care provisioning and research, the team carried out case studies such as distributed polling and multi-party computation, in which a set of peers carry out a computation together without revealing any peer's individual information to the others.
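Multi-party computation covers many different protocols, and the project description does not say which ones the team used. As an illustration only, the sketch below shows one of the simplest techniques, additive secret sharing, applied to a distributed yes/no poll. Each peer splits its vote into random shares, so the tally can be computed without any single peer seeing another's vote. It assumes semi-honest peers that follow the protocol and do not collude, and it simulates all peers in a single process; the modulus `PRIME` and the functions `share` and `secure_sum` are hypothetical names chosen for this example.

```python
import secrets

# Hypothetical modulus (an assumption for illustration): any prime larger
# than the largest possible total will do.
PRIME = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n_parties random shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_inputs: list[int]) -> int:
    """Simulate a secure sum among len(private_inputs) semi-honest peers.

    Peer i splits its input into shares and sends share j to peer j.
    Each peer only sees uniformly random-looking shares from the others;
    it adds them up and publishes that partial sum. Adding all partial
    sums yields the total of the inputs.
    """
    n = len(private_inputs)
    # all_shares[i][j] is the share that peer i sends to peer j
    all_shares = [share(x, n) for x in private_inputs]
    # Peer j sums the shares it received from every peer (including itself)
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return sum(partial_sums) % PRIME


# A distributed poll: 1 = yes, 0 = no, one private vote per peer.
votes = [1, 0, 1, 1, 0]
print(secure_sum(votes))  # 3
```

Only the final tally is revealed. In a real deployment the shares would travel over secure channels between separate machines or data vaults, and stronger protocols are needed if peers may collude or deviate from the protocol.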
They examined the work from a privacy-preserving perspective: making sure data is secure, looking at consent and data reuse in the context of the GDPR, and considering the place of this technology in society and what it means in relation to existing laws, norms and practices. They also approached it from an educational perspective.
They also held a Data Science Summer School, which ran in 2021-2022 as an introductory data science course for students and professionals. The course covered the basics of programming in Python, data cleaning and preprocessing techniques, data classification algorithms and validation methods. Teaching took place in an interactive, problem-based learning environment where students could apply the concepts they had been taught to different datasets. The course lasted 2.5 weeks, with daily lectures and practicals and a final exam, and by the end students had also completed a data analysis project. Over 30 students attended the summer school.
A student who attended stated “I thoroughly enjoyed the Summer School Introduction to Data Science. As a mature student with a background in Architecture, I am still new to the subject. Therefore, I personally found the course intense, but I am impressed with myself that I managed to learn so much in such a short time, which is testimony to the quality of the teaching. At the end, students were able to construct a simple, but nevertheless full-blown Machine Learning project, which surpassed my expectations. I would certainly recommend Maastricht as a university, and Data Science as a discipline, so much so that I am considering taking the time to complete a master’s degree at the Institute for Data Science.”
The project team also made many scientific contributions across engineering, the life sciences and biomedical informatics through papers and presentations. Some of their work has appeared in highly competitive journals and conferences, including the following papers:
- Towards model-based bias mitigation in machine learning, accepted at the A-ranked ACM/IEEE MODELS conference;
- ciTIzen-centric DatA pLatform (TIDAL): Sharing Distributed Personal Data in a Privacy-Preserving Manner for Health Research, accepted for publication in the journal Semantic Web, a key venue for Semantic Web research;
- DePhi: Decentralizing Philanthropy via Blockchain for Traceable Micro-transactions, presented at the 4th International Conference on Blockchain, Robotics and AI for Networking Security (BRAINS);
- Web monetisation, published in the Internet Policy Review journal.
Looking ahead, the professors plan a joint master's programme in responsible data science, into which they will embed some of the emerging ideas from the scientific, technical, legal and ethical sides of their joint project.
Of the collaboration itself, Distinguished Professor Michel Dumontier said “it started as a very different form of working together, where we started with an opportunity to work together, found common ground, outlined a vision and strategy, and formed a new team to tackle pressing challenges related to data management and processing. I think we can be quite proud of the success that we achieved in this project.”
Professor Dimitris Kolovos added “We also now understand better what our two teams are really good at and how we can collaborate with Maastricht in other joint projects in the future”.