
Computing

Published on 5 August 2022
Mass spectrometry for proteomics has undergone many recent technological changes, leading to instruments capable of fragmenting and analysing increasingly complex samples at ever greater speed. The data produced are undoubtedly richer, but they are also more voluminous, up to 200 GB/month. These data must be organised and managed in a secure and automated system. This rapidly became a key focus of our activity, as it became necessary to ensure reliable tracking of samples and their corresponding data. The computing team developed and implemented an infrastructure that meets these requirements, which today has a storage capacity of 5 TB.

Because no existing computing solution was suited both to Research and Development activity and to the management of large volumes of electronic data, we developed our own software solution, ePims. It ensures sample traceability, automates the transfer of acquisitions and organises data on the storage space. This solution and its successive adaptations have been used in the laboratory since 2005. ePims is distributed under an open source licence and deployed by ASA computing solutions, to whom the knowledge was transferred.

Identification and quantification of proteins are the basis of mass spectrometry-based proteomics analysis. The computing team maintains, updates and configures the Mascot search engine, which matches MS data against protein sequence databases. Because Mascot can return results for thousands or tens of thousands of spectra and peptides, and for several hundred proteins, we developed a tool to help with validation. This tool, IRMa, validates the matches proposed by Mascot automatically, according to filtering rules, or can be used for manual validation. With automated validation, IRMa delivers a consistent result with a controlled rate of false positives.
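As an illustration of rule-based validation with a controlled false-positive rate, the following is a minimal Python sketch. It is not IRMa's actual implementation: the `PeptideMatch` record, the score and rank rules, and the target-decoy estimate of the false-positive rate are all assumptions chosen for the example, since the page does not describe which rules or estimation method IRMa applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideMatch:
    """Hypothetical record for one peptide-spectrum match (not IRMa's data model)."""
    spectrum_id: str
    peptide: str
    score: float
    rank: int
    is_decoy: bool  # assumption: search run against a target + decoy database


def validate(matches, min_score, max_rank=1):
    """Keep only the matches that pass two simple filtering rules."""
    return [m for m in matches if m.score >= min_score and m.rank <= max_rank]


def estimated_fdr(validated):
    """Estimate the false-positive rate as decoy hits / target hits."""
    decoys = sum(1 for m in validated if m.is_decoy)
    targets = len(validated) - decoys
    return decoys / targets if targets else 0.0


def lowest_threshold_for_fdr(matches, max_fdr, max_rank=1):
    """Return the lowest observed score threshold meeting max_fdr, or None."""
    for threshold in sorted({m.score for m in matches}):
        if estimated_fdr(validate(matches, threshold, max_rank)) <= max_fdr:
            return threshold
    return None
```

In this sketch the score threshold is the only tuned rule; a real validation tool would typically combine several filters and let the user review borderline matches manually.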

Highly complex samples are generally pre-fractionated. This allows us to delve further into the protein content of these samples, but it produces tens or hundreds of independent identification results which must be combined. To handle this, we developed a relational database capable of storing a large number of identifications, and a software application, hEIDI, which allows users to consult these data and to combine or compare them, not just individually, but across a whole proteome or a sub-proteome.
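The idea of combining and comparing per-fraction identifications can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input layout (fraction name mapped to protein accession mapped to peptide sequences) and the function names `combine_fractions` and `compare_proteomes` are hypothetical, and do not reflect hEIDI's database schema or its actual grouping logic.

```python
from collections import defaultdict


def combine_fractions(fraction_results):
    """Merge per-fraction identifications into one protein -> peptides map.

    fraction_results: dict mapping a fraction name to a dict that maps
    a protein accession to an iterable of peptide sequences
    (hypothetical layout for this example).
    """
    combined = defaultdict(set)
    for proteins in fraction_results.values():
        for accession, peptides in proteins.items():
            combined[accession].update(peptides)
    return dict(combined)


def compare_proteomes(a, b):
    """Compare two combined results by protein accession."""
    keys_a, keys_b = set(a), set(b)
    return {
        "only_a": keys_a - keys_b,
        "only_b": keys_b - keys_a,
        "shared": keys_a & keys_b,
    }
```

Using sets means a peptide seen in several fractions is counted once per protein, which is the simplest possible merging rule; a production tool would also need to handle peptides shared between proteins.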

The whole chain, from the transfer of acquisition data to ePims through to the validation of identification results and their export to the databases used by hEIDI, has been automated by combining ePims with Mascot Distiller, Mascot Daemon and IRMa.

Dupierris V, Masselon C, Court M, Kieffer-Jaquinod S and Bruley C
A toolbox for validation of mass spectrometry peptides identification and generation of database: IRMa.
Bioinformatics, 2009, 25(15): 1980-1981