Using high-abundance proteins as guides for fast and effective peptide/protein identification from human gut metaproteomic data

Table 3 The breakdown of the running time (in CPU hours) for HAPiID

	HM403	HM415	HM454	HM455	HM466	HM467	HM494	HM503
HAPiID DB size^* (no. of proteins, targeted search)	76,663	67,537	79,813	84,304	89,693	73,905	94,395	73,766
HAPiID time (profiling)	4.02	4.05	3.23	3.843	3.47	3.36	3.95	3.92
HAPiID time (targeted search)	4.46	4.57	4.49	5.63	5.64	4.33	6.35	5.13
HAPiID time (total)	8.48	8.62	7.72	9.473	9.11	7.69	10.3	9.05
MS-GF+ time^** (IGC db search)	367.95	485.79	413.58	510.47	462.47	405.86	503.33	495.22

^*This row shows the sizes (in number of proteins) of the target databases used in HAPiID's second (targeted) search step. Each database contains the putative proteins from the top n most abundant genomes that together cover 80% of the total spectra in the profiling step. These numbers vary slightly across samples. For comparison, the target databases for HAPiID's first search step (i.e., HAPdb) and MetaPro-IQ's first step (i.e., IGC db) contain 1.1×10⁵ and 9.8×10⁶ proteins, respectively.
^**For comparison, we ran an MS-GF+ search against the massive target database (IGC db) used in MetaPro-IQ's first step to estimate a lower bound on the running time of the MetaPro-IQ pipeline.
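
The running times in Table 3 can be turned into per-sample speedup factors with a few lines of arithmetic. The following is a minimal sketch, not part of the HAPiID software: the variable and function names are hypothetical, and the numbers are transcribed from the table. It checks that the profiling and targeted-search times sum to the reported totals, then divides the MS-GF+ (IGC db) time by the HAPiID total for each sample.

```python
# Values transcribed from Table 3 (CPU hours); order follows the table header.
# All names below are illustrative assumptions, not part of HAPiID itself.
SAMPLES = ["HM403", "HM415", "HM454", "HM455", "HM466", "HM467", "HM494", "HM503"]
PROFILING = [4.02, 4.05, 3.23, 3.843, 3.47, 3.36, 3.95, 3.92]
TARGETED = [4.46, 4.57, 4.49, 5.63, 5.64, 4.33, 6.35, 5.13]
TOTAL = [8.48, 8.62, 7.72, 9.473, 9.11, 7.69, 10.3, 9.05]
MSGF_IGC = [367.95, 485.79, 413.58, 510.47, 462.47, 405.86, 503.33, 495.22]


def totals_consistent(tol=1e-6):
    """Return True if profiling + targeted search equals the reported total."""
    return all(
        abs(p + t - tot) <= tol
        for p, t, tot in zip(PROFILING, TARGETED, TOTAL)
    )


def speedups():
    """Map each sample to (MS-GF+ IGC db time) / (HAPiID total time)."""
    return {s: m / tot for s, m, tot in zip(SAMPLES, MSGF_IGC, TOTAL)}


if __name__ == "__main__":
    print("Totals consistent:", totals_consistent())
    for sample, factor in speedups().items():
        print(f"{sample}: {factor:.1f}x")
```

With the values above, the ratio falls between roughly 43-fold (HM403) and 56-fold (HM415). Because the MS-GF+ time is described as a lower bound for the MetaPro-IQ pipeline, these ratios are best read as lower bounds on the speedup as well.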

ISSN: 2049-2618