Thesis presented October 22, 2012
Abstract: Up to 40% of all proteins are known to bind metals, with the intrinsic metal atoms playing catalytic, regulatory and/or structural roles critical to protein function. These metalloproteins are ubiquitous and of major importance across the three domains of life. However, current methods for identifying members of this large family within bacterial proteomes are either unsuited to large-scale analysis or perform relatively poorly when no 3D structural template is available.
Within this context, sequence analysis tools relying on different categories of protein descriptors (e.g. patterns, conserved domains, phylogenetic prints) were assessed. To overcome their relative lack of sensitivity, new descriptors specific to the identification of iron-sulfur proteins were built: (i) co-conservation profiles of the metal ligands and (ii) tailored profile-HMMs for the detection of remote homologs. Their respective predictive power for identifying the proteins of a manually curated iron-sulfur protein dataset was assessed, either separately or in combination.
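As an illustration of the simplest descriptor category mentioned above (sequence patterns), the sketch below scans protein sequences for a cysteine-spacing motif with a regular expression. The motif `C-x(2)-C-x(2)-C-x(3)-C`, the function names and the toy sequences are assumptions made for this example only; they are not the descriptors built in the thesis.

```python
import re

# Illustrative cysteine-spacing motif, C-x(2)-C-x(2)-C-x(3)-C.
# This pattern is an assumption for the sketch, not a descriptor from the thesis.
MOTIF = re.compile(r"C.{2}C.{2}C.{3}C")


def find_motif_hits(sequence):
    """Return (start, matched_text) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(sequence.upper())]


def has_motif(sequence):
    """Binary descriptor: True if the motif occurs at least once."""
    return MOTIF.search(sequence.upper()) is not None


# Toy sequences invented for the example (not real proteins).
toy_sequences = {
    "toy_positive": "MKAYCIACGACEPECPVNA",
    "toy_negative": "MKAYSIASGASEPESPVNA",
}

if __name__ == "__main__":
    for name, seq in toy_sequences.items():
        print(name, has_motif(seq), find_motif_hits(seq))
```

A pattern of this kind is cheap to apply at proteome scale but misses divergent sequences, which is the sensitivity limitation the more elaborate descriptors are meant to address.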
All relevant descriptors were then gathered into a generalized linear model using the elastic-net method. The predictive model was evaluated on the whole proteome of Escherichia coli, yielding a precision of 89% and a recall of 83%. Finally, it was applied to 300 proteomes, making it possible to investigate different biological relationships, such as that between the relative abundance of iron-sulfur proteins and the oxygen dependency of bacterial organisms.
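For readers unfamiliar with the quantities named above, the sketch below shows how precision and recall are computed from a set of predicted and a set of known positives, together with the elastic-net penalty in its common glmnet-style parameterization, lambda * (alpha * L1 + (1 - alpha) / 2 * L2 squared). The function names, protein identifiers and coefficient values are hypothetical and chosen only for illustration; the code does not fit a model and does not reproduce the thesis results.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall from two sets of protein identifiers.

    predicted: identifiers the model labels as positive.
    actual:    identifiers known to be positive (the curated reference).
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def elastic_net_penalty(coefficients, lam, alpha):
    """Elastic-net penalty: lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).

    alpha = 1 gives the lasso penalty, alpha = 0 the ridge penalty.
    """
    l1 = sum(abs(b) for b in coefficients)
    l2_squared = sum(b * b for b in coefficients)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


if __name__ == "__main__":
    # Hypothetical identifiers, invented for this example.
    predicted = {"protA", "protB", "protC", "protD"}
    actual = {"protA", "protB", "protC", "protE", "protF"}
    print(precision_recall(predicted, actual))
    print(elastic_net_penalty([1.0, -2.0, 0.0], lam=0.5, alpha=0.5))
```

The L1 term drives uninformative descriptor weights to exactly zero while the L2 term keeps correlated descriptors together, which is one reason the elastic net suits a model that combines overlapping sequence descriptors.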
Keywords: Metalloproteins, iron-sulfur clusters, bacterial proteomes, patterns, HMMs, generalized linear model, elastic-net