Identification of outliers among contaminant microbes. Left: for each of n = 97 serum samples, plotted against RNA input mass, sequencing reads are normalized to reads per million (rpm) and shown for the total ERCC set (n = 92 transcripts) in green, for reads aligning to the E. coli genome in blue, and for reads aligning to the S. maltophilia genome in grey. The linear regressions relating sample input mass to ERCC, E. coli, and S. maltophilia rpm are described by the adjusted R² and p value. Right: histogram of the studentized residuals for each observation from the linear regression of log10-transformed sequencing reads (E. coli in blue, S. maltophilia in grey) on log10-transformed sample input mass. Studentized residuals are approximately normally distributed, with most falling between −2 and +2, so observations outside this range can be rapidly identified as outliers (red).
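The residual-based screen described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes ordinary least squares and *internally* studentized residuals, r_i = e_i / (s·√(1 − h_ii)), where h_ii is the leverage. The caption does not say which studentized variant was used. The function names, the synthetic data, and the |r| > 2 cutoff are illustrative assumptions.

```python
import numpy as np

def studentized_residuals(x, y):
    """Internally studentized residuals of an OLS fit y ~ a + b*x (assumed variant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
    resid = y - X @ beta                           # raw residuals e_i
    n, p = X.shape
    s2 = resid @ resid / (n - p)                   # residual variance estimate
    # Leverage h_ii: diagonal of the hat matrix H = X (X'X)^-1 X'
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    return resid / np.sqrt(s2 * (1.0 - h))

def flag_outliers(input_mass, reads_rpm, threshold=2.0):
    """Hypothetical helper: regress log10(rpm) on log10(input mass), flag |r| > threshold."""
    r = studentized_residuals(np.log10(input_mass), np.log10(reads_rpm))
    return r, np.abs(r) > threshold

# Usage on synthetic (hypothetical) data: contaminant rpm falls with input mass,
# with small log-normal noise and one injected outlier at index 7.
rng = np.random.default_rng(0)
mass = np.logspace(-1, 2, 30)
rpm = 1e4 / mass * 10 ** rng.normal(0, 0.05, mass.size)
rpm[7] *= 10
resid, flags = flag_outliers(mass, rpm)
print(np.flatnonzero(flags))
```

Working on the log10 scale makes the fit linear when contaminant rpm scales as a power of input mass. Dividing by √(1 − h_ii) puts high-leverage samples at the extremes of input mass on the same footing as central ones before the ±2 cutoff is applied.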

