Computing

Published on 5 August 2022

Mass spectrometry for proteomics has recently undergone major technological changes, producing instruments that can fragment and analyse increasingly complex samples at ever faster speeds. The resulting data are richer but also far more voluminous, up to 200 GB per month, and must be organised and managed in a secure, automated system. This quickly became a key focus of our activity, since samples and their corresponding data have to be tracked reliably. The computing team developed and implemented an infrastructure that meets these requirements; it currently has a storage capacity of 5 TB. Because no existing computing solution was suited both to Research and Development activity and to the management of large volumes of electronic data, we developed our own software, ePims, which ensures sample traceability, automates the transfer of acquisitions and organises data on the storage space. This solution and its successive adaptations have been used in the laboratory since 2005. ePims is distributed under an open source licence and deployed by ASA computing solutions, to whom the knowledge was transferred.
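To make the idea of automated, traceable transfer more concrete, here is a minimal sketch in Python. It is not ePims code: the function names, the `storage_root/project/sample_id` directory layout and the checksum check are all assumptions chosen for illustration. It copies one acquisition file into a per-sample directory and verifies that the copy is identical to the original.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_acquisition(source: Path, storage_root: Path,
                         project: str, sample_id: str) -> Path:
    """Copy an acquisition file to storage_root/project/sample_id/.

    The directory layout is a hypothetical convention, not the one
    used by ePims.
    """
    destination_dir = storage_root / project / sample_id
    destination_dir.mkdir(parents=True, exist_ok=True)
    destination = destination_dir / source.name

    shutil.copy2(source, destination)  # copy2 also preserves timestamps

    # Verify the copy before trusting it; remove it if it is corrupted.
    if sha256_of(source) != sha256_of(destination):
        destination.unlink()
        raise OSError(f"Checksum mismatch while transferring {source}")
    return destination
```

Deriving the destination from the project and sample identifiers means the location of a file on the storage space already records which sample it belongs to, which is one simple way to support traceability.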

Identification and quantification of proteins are the basis of mass spectrometry-based proteomics analysis. The computing team maintains, updates and configures the Mascot search engine, which compares MS data to protein sequence databases. Because Mascot can return results for thousands or tens of thousands of spectra and peptides, and for several hundred proteins, we developed a tool to help with validation. This tool, IRMa, validates the matches proposed by Mascot automatically according to filtering rules, or can be used for manual validation. With automated validation, IRMa delivers a coherent result with a controlled false-positive rate.
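As an illustration of what a filtering rule with a controlled false-positive rate can look like, the sketch below uses the common target-decoy approach: matches to reversed or shuffled "decoy" sequences are used to estimate the false discovery rate (FDR) above a score cutoff. This is an assumption made for the example. The page does not describe IRMa's actual rules, and the `PeptideMatch` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideMatch:
    peptide: str
    score: float
    is_decoy: bool  # True if the match comes from a decoy sequence


def score_threshold_for_fdr(matches, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    FDR is estimated as decoys / targets among matches scoring at or
    above the cutoff. Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    targets = decoys = 0
    cutoff = None
    for i, match in enumerate(ranked):
        if match.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only once all matches sharing this score are counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1].score != match.score
        if last_of_tie and targets > 0 and decoys / targets <= max_fdr:
            cutoff = match.score
    return cutoff


def validate(matches, max_fdr=0.01):
    """Keep target matches scoring at or above the FDR-controlled cutoff."""
    cutoff = score_threshold_for_fdr(matches, max_fdr)
    if cutoff is None:
        return []
    return [m for m in matches if not m.is_decoy and m.score >= cutoff]
```

Applying the same rule to every search is what makes the validated results comparable from one analysis to the next, which manual validation alone cannot guarantee.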

Highly complex samples are generally pre-fractionated. This lets us delve further into their protein content, but produces tens or hundreds of independent identification results which must then be combined. To handle this, we developed a relational database capable of storing a large number of identifications, and a software application, hEIDI, which lets users consult, combine or compare these data, not just one result at a time but across a whole proteome or a sub-proteome.
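The combination step can be pictured with a small Python sketch. It is only an illustration: hEIDI relies on a relational database, whereas this example uses in-memory dictionaries, and the function names and the `(protein accession, peptide sequence)` input format are assumptions.

```python
from collections import defaultdict


def combine_identifications(fractions):
    """Merge per-fraction identifications into one protein -> peptides map.

    `fractions` maps a fraction name to an iterable of
    (protein_accession, peptide_sequence) pairs.
    """
    combined = defaultdict(set)
    for identifications in fractions.values():
        for protein, peptide in identifications:
            combined[protein].add(peptide)
    return dict(combined)


def compare_proteomes(first, second):
    """Split proteins into those shared by, or unique to, two combined results."""
    return {
        "shared": first.keys() & second.keys(),
        "only_first": first.keys() - second.keys(),
        "only_second": second.keys() - first.keys(),
    }
```

Storing peptides in a set per protein means a peptide identified in several fractions is counted once, so the merged result reflects protein coverage rather than the number of fractions analysed.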

The whole chain, from the transfer of acquisition data to ePims through to the validation of identification results and their export to the databases used by hEIDI, has been automated by combining ePims with Mascot Distiller, Mascot Daemon and IRMa.

Dupierris V, Masselon C, Court M, Kieffer-Jaquinod S and Bruley C
A toolbox for validation of mass spectrometry peptides identification and generation of database: IRMa.
Bioinformatics, 2009, 25(15): 1980-1981

Alternative and Atomic Energies Agency

CEA is a French government-funded technological research organisation in four main areas: low-carbon energies, defense and security, information technologies and health technologies. A prominent player in the European Research Area, it is involved in setting up collaborative projects with many partners around the world.
