Lucas Étourneau

FDR control and missing value imputation for the analysis of mass spectrometry-based proteomics data

Published on 24 January 2024

Thesis presented January 24, 2024

Abstract:
Proteomics aims to characterize the proteome of a biological sample, that is, the set of proteins it contains, as exhaustively as possible. By identifying and quantifying protein fragments that can be analyzed by mass spectrometry (known as peptides), proteomics provides access to the level of gene expression at a given moment. This information is crucial for improving the understanding of the molecular mechanisms at play within living organisms. Proteomics experiments produce large amounts of data that are often complex to interpret and subject to various biases. They require reliable data processing methods that ensure a certain level of quality control, so as to guarantee the relevance of the resulting biological conclusions.
The work of this thesis focuses on improving this data processing, and specifically on two major points. The first is controlling the false discovery rate (FDR) when identifying either (1) peptides or (2) biomarkers that are quantitatively differential between a tested biological condition and its negative control. Our contributions focus on establishing links between the empirical methods stemming from proteomic practice and other, theoretically supported methods. This notably allows us to provide directions for improving the FDR control methods used for peptide identification.
The second point focuses on managing missing values, which are often numerous and complex in nature, making them impossible to ignore. Specifically, we have developed a new imputation algorithm, Pirat, that leverages the specificities of proteomics data. Pirat has been tested and compared to other methods on multiple datasets and according to various metrics, and it generally achieves the best performance. Moreover, it is the first algorithm that allows imputation following the trending paradigm of "multi-omics": if it is relevant to the experiment, it can impute more reliably by relying on transcriptomic information, which quantifies the level of messenger RNA expression present in the sample. Finally, Pirat is implemented in a freely available software package, making it easy to use for the proteomics community.

Keywords:
Biostatistics, Proteomics, Missing Value Imputation, FDR control, Mass Spectrometry, Transcriptomics

On-line thesis.

Keywords: PhD defense | EDyP | bioinformatics | biostatistics | transcriptomics | proteomics | mass spectrometry | FDR

Alternative and Atomic Energies Agency

CEA is a French government-funded technological research organisation in four main areas: low-carbon energies, defense and security, information technologies and health technologies. A prominent player in the European Research Area, it is involved in setting up collaborative projects with many partners around the world.

Biosciences and bioenGineering for hEalth Laboratory - UA13 INSERM-CEA-UGA
