Bioinformatics

Like many other sciences, biomedical research is increasingly making use of big data to improve our understanding of health and disease. Technically, bioinformatics refers to any computational analysis of biological data, but typically indicates the analysis of sequencing data.

The Jack Birch Unit (JBU) was an early adopter of sequencing analysis through our “Urotheliome” project, which uses RNA sequencing in particular to understand the gene expression profile of urothelial cells in development, health and disease, including cancer. Since the appointment of Dr Andrew Mason in 2017, bioinformatics has become ever more integral to our research into cancer and benign disease. RNA sequencing currently accounts for the majority of our bioinformatic work, forming the basis for transcriptomic signatures, differential expression analysis, and machine learning-driven molecular subtyping.

In addition to the generation and analysis of data within the JBU, we make use of cancer data from international consortia. This predominantly includes data from The Cancer Genome Atlas (TCGA) and UROMOL to understand muscle-invasive and non-muscle-invasive bladder cancers respectively. Dr Mason is one of the bioinformatic leads of the bladder cancer research group in the Genomics England 100,000 Genomes Project – currently performing the largest analysis of bladder cancer mutations to date.

Biologically-driven molecular subtyping

Tumours can be divided based on mutation status, but often there are not enough “driver” or “druggable” mutations to provide sufficiently resolved, clinically relevant subgroups. Gene expression profiling, derived from RNA sequencing, provides a measure of the metabolic activity of particular tumours.

Expression-based subtyping approaches work by using expression values from genes which vary significantly across a cohort of tumours, allowing tumours to be separated into groups based on shared cellular profiles. Whilst these approaches are very good at separating large molecular features, such as “basal” or “luminal” bladder cancers, they largely focus on what a tumour looks like now, rather than how it got there. The JBU has pioneered work to determine specific in vitro signatures of particular processes, and then to use these targeted gene signatures to subtype tumours by unsupervised machine learning. Some of these defined groups match the broad macro-molecular groupings of traditional approaches, but others cut across or inform new subtypes, suggesting new therapeutic targets.
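
The general idea of selecting variable genes and then grouping tumours without predefined labels can be sketched in a few lines of Python. This is a minimal illustration only, not the JBU's pipeline: the tumour names, gene names and expression values are invented, and a simple two-group k-means stands in for whichever unsupervised method is used in practice.

```python
from statistics import pvariance

# Hypothetical toy data: log-scale expression per tumour for each gene.
expression = {
    "tumour_A": {"GENE1": 9.0, "GENE2": 1.0, "GENE3": 5.0},
    "tumour_B": {"GENE1": 8.5, "GENE2": 1.5, "GENE3": 5.1},
    "tumour_C": {"GENE1": 1.0, "GENE2": 9.0, "GENE3": 5.0},
    "tumour_D": {"GENE1": 1.5, "GENE2": 8.0, "GENE3": 4.9},
}


def most_variable_genes(expression, n_genes):
    """Rank genes by their variance across the cohort and keep the top n."""
    genes = list(next(iter(expression.values())))
    variance = {
        gene: pvariance([profile[gene] for profile in expression.values()])
        for gene in genes
    }
    return sorted(variance, key=variance.get, reverse=True)[:n_genes]


def squared_distance(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_group_kmeans(expression, genes, n_iter=10):
    """Split tumours into two groups using only the selected genes."""
    names = sorted(expression)
    vectors = {n: [expression[n][g] for g in genes] for n in names}
    # Deterministic start: the first tumour and the tumour farthest from it.
    first = names[0]
    far = max(names, key=lambda n: squared_distance(vectors[n], vectors[first]))
    centres = [vectors[first], vectors[far]]
    labels = {}
    for _ in range(n_iter):
        # Assign each tumour to its nearest centre.
        labels = {
            n: min((0, 1), key=lambda k: squared_distance(vectors[n], centres[k]))
            for n in names
        }
        # Move each centre to the mean profile of its members.
        for k in (0, 1):
            members = [vectors[n] for n in names if labels[n] == k]
            if members:
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


selected = most_variable_genes(expression, n_genes=2)
groups = two_group_kmeans(expression, selected)
```

In this toy cohort GENE3 barely changes between tumours, so it is dropped before grouping. Swapping the variance-ranked list for a targeted gene signature would change which biological process drives the grouping, without altering the clustering step itself.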

Uncovering the urotheliome

All somatic cells in our body contain the same DNA, with approximately 20,000 different genes providing the instructions (transcripts) for making proteins. Our cells become specialised by controlling which of those genes are “on” or “off”, and how much transcript “on” genes can produce. Each cell type therefore has a specific “transcriptomic profile” which can be observed, measured and manipulated to understand its regulation, plasticity and potential. Crucially for research into cancers (and other diseases), we can compare the profile of normal cells to cancers to understand what has gone wrong, and to identify particular processes which could be targeted therapeutically.
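
As a rough illustration of comparing a normal profile with a cancer profile, the sketch below computes a log2 fold change per gene and lists the genes that change at least two-fold. The gene names, abundance values, pseudocount and threshold are all assumptions chosen for the example; a real differential expression analysis would use replicate samples and a statistical model rather than a single comparison.

```python
import math

# Hypothetical transcript abundances for one normal and one tumour sample.
normal = {"GENE1": 100.0, "GENE2": 50.0, "GENE3": 10.0}
tumour = {"GENE1": 25.0, "GENE2": 50.0, "GENE3": 80.0}


def log2_fold_changes(reference, sample, pseudocount=1.0):
    """Log2 ratio of sample to reference abundance for every gene.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    return {
        gene: math.log2((sample[gene] + pseudocount) / (reference[gene] + pseudocount))
        for gene in reference
    }


def changed_genes(fold_changes, threshold=1.0):
    """Genes whose abundance changed at least 2**threshold-fold, up or down."""
    return sorted(g for g, fc in fold_changes.items() if abs(fc) >= threshold)


fold_changes = log2_fold_changes(normal, tumour)
altered = changed_genes(fold_changes)
```

Negative values mark genes that are lower in the tumour than in normal cells, and positive values mark genes that are higher.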

The Jack Birch Unit began to profile the urothelial transcriptome in the late 2000s using microarrays, a fluorescence-based technique for quantifying transcript abundance. We quickly shifted to RNA sequencing, a more directly quantitative approach which also reveals the transcript sequence.

In 2010 Professor Southgate and Dr Baker were awarded The Astellas European Foundation prize in urology to use RNA sequencing to derive the “Urotheliome” – a high resolution transcriptomic map of the urothelium. Since then we have developed datasets of urothelial differentiation and of stimulation by particular gene regulators and drugs, characterised bladder cancer cell lines, and developed specific transcriptomic signatures of immune activation, viral infection and metabolic dysregulation. These datasets are all interrogated with reference to international bladder cancer cohorts, such as those from The Cancer Genome Atlas (TCGA) and UROMOL consortia. By working with normal data we are also able to understand which genes are transcribed in urothelium, and which transcripts correspond to immune and muscle contamination of tumours.

We have amassed a large database of RNA sequencing data from hundreds of patients and experimental conditions, which we are now developing into a public-facing resource for the academic community.

Novel urothelial transcripts?

While we have approximately 20,000 genes, our cells are capable of producing over 100,000 different proteins. This is partly because genes can produce transcripts which include different parts of the gene sequence. Studies from across the human body find tissue-specific transcripts, many of which have key regulatory or functional roles in the tissue. As urothelium is a relatively under-studied tissue, we have begun using “long-read” RNA sequencing to find exactly which transcripts are produced by urothelium, and whether there are any novel ones.

We are also working to understand how cells “choose” which transcript to use. This feeds into work by Dr Mason on human endogenous retroviruses (HERVs): remnants of historical retrovirus infections which can modify gene expression.

Resolving the urotheliome to single cells

Much of our work has focused on the population-level transcriptome of the urothelium: the overall profile of the tissue. We have recently begun to apply new technological developments to profile single cells, allowing us to identify rare cell subtypes within the urothelium, and to address whether hierarchical stem cells are present in the tissue.

In work funded by the Biotechnology and Biological Sciences Research Council we have also generated multi-omic (transcriptomic, DNA methylation and chromatin accessibility) data from different urothelial populations to understand the plasticity and drivers of phenotype change.