Table 1. Performance of GRAViTy as evaluated by threefold cross-validation analysis

From: The genomic underpinnings of eukaryotic virus taxonomy: creating a sequence-based framework for family-level virus classification

| Sub-pipeline | Fold | ‘Known’ viruses¹ (n) | ‘Known’: assigned to the correct group | ‘Known’: assigned to a wrong group | ‘Known’: assigned as ‘unknown’ | ‘Unknown’ viruses² (n) | ‘Unknown’: assigned as ‘unknown’ | ‘Unknown’: assigned to an existing group |
|---|---|---|---|---|---|---|---|---|
| Group I: dsDNA virus | CV1 | 192 | 189 (98.44%) | 0 (0.00%) | 3 (1.56%) | 1117 | 1117 (100.00%) | 0 (0.00%) |
| | CV2 | 194 | 188 (96.91%) | 1 (0.52%) | 5 (2.58%) | 1117 | 1117 (100.00%) | 0 (0.00%) |
| | CV3 | 192 | 190 (98.96%) | 0 (0.00%) | 2 (1.04%) | 1124 | 1124 (100.00%) | 0 (0.00%) |
| | Overall | | 98.10% | 0.17% | 1.73% | | 100.00% | 0.00% |
| Group II: ssDNA virus | CV1 | 369 | 369 (100.00%) | 0 (0.00%) | 0 (0.00%) | 940 | 939 (99.89%) | 1 (0.11%) |
| | CV2 | 371 | 369 (99.46%) | 0 (0.00%) | 2 (0.54%) | 940 | 939 (99.89%) | 1 (0.11%) |
| | CV3 | 370 | 370 (100.00%) | 0 (0.00%) | 0 (0.00%) | 946 | 945 (99.89%) | 1 (0.11%) |
| | Overall | | 99.82% | 0.00% | 0.18% | | 99.89% | 0.11% |
| Group III: dsRNA virus | CV1 | 69 | 68 (98.55%) | 0 (0.00%) | 1 (1.45%) | 1240 | 1233 (99.44%) | 7 (0.56%) |
| | CV2 | 70 | 67 (95.71%) | 0 (0.00%) | 3 (4.29%) | 1241 | 1232 (99.27%) | 9 (0.73%) |
| | CV3 | 69 | 67 (97.10%) | 0 (0.00%) | 2 (2.90%) | 1247 | 1239 (99.36%) | 8 (0.64%) |
| | Overall | | 97.12% | 0.00% | 2.88% | | 99.36% | 0.64% |
| Group IV: (+)ssRNA virus | CV1 | 415 | 415 (100.00%) | 0 (0.00%) | 0 (0.00%) | 894 | 891 (99.66%) | 3 (0.34%) |
| | CV2 | 412 | 411 (99.76%) | 1 (0.24%) | 0 (0.00%) | 899 | 897 (99.78%) | 2 (0.22%) |
| | CV3 | 415 | 412 (99.28%) | 1 (0.24%) | 2 (0.48%) | 901 | 896 (99.45%) | 5 (0.55%) |
| | Overall | | 99.68% | 0.16% | 0.16% | | 99.63% | 0.37% |
| Group V: (−)ssRNA virus | CV1 | 176 | 176 (100.00%) | 0 (0.00%) | 0 (0.00%) | 1133 | 1130 (99.74%) | 3 (0.26%) |
| | CV2 | 177 | 177 (100.00%) | 0 (0.00%) | 0 (0.00%) | 1134 | 1132 (99.82%) | 2 (0.18%) |
| | CV3 | 180 | 179 (99.44%) | 0 (0.00%) | 1 (0.56%) | 1136 | 1135 (99.91%) | 1 (0.09%) |
| | Overall | | 99.81% | 0.00% | 0.19% | | 99.82% | 0.18% |
| Groups VI and VII: RT virus | CV1 | 47 | 47 (100.00%) | 0 (0.00%) | 0 (0.00%) | 1262 | 1262 (100.00%) | 0 (0.00%) |
| | CV2 | 46 | 46 (100.00%) | 0 (0.00%) | 0 (0.00%) | 1265 | 1265 (100.00%) | 0 (0.00%) |
| | CV3 | 49 | 49 (100.00%) | 0 (0.00%) | 0 (0.00%) | 1267 | 1267 (100.00%) | 0 (0.00%) |
| | Overall | | 100.00% | 0.00% | 0.00% | | 100.00% | 0.00% |
| All sub-pipelines | Overall | | 99.09% | 0.06% | 0.86% | | 99.78% | 0.22% |
1. ‘Known’ in the sense that members of the family were in the reference dataset, so viruses of the same family in the test dataset should be classifiable.
2. ‘Unknown’ in the sense that no members of the family were in the reference dataset; viruses of that family in the test dataset should therefore not be assigned to any existing group.
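The table does not state how the ‘Overall’ rows are computed. A quick arithmetic check suggests they are unweighted means: each sub-pipeline's ‘Overall’ matches the mean of its three fold percentages, and the bottom row matches the mean of the six sub-pipeline values rather than a rate pooled over all test viruses. The Python sketch below recomputes the ‘known’-virus columns from the per-fold counts under that assumption; the helper names (`fold_percentages`, `group_overall`, `table_overall`, `pooled_correct`) are hypothetical and not part of GRAViTy.

```python
from statistics import mean

# Per-fold counts for 'known' test viruses, transcribed from Table 1.
# Each tuple: (n, assigned to the correct group, assigned to a wrong group,
#              assigned as 'unknown')
KNOWN_COUNTS = {
    "Group I: dsDNA virus": [(192, 189, 0, 3), (194, 188, 1, 5), (192, 190, 0, 2)],
    "Group II: ssDNA virus": [(369, 369, 0, 0), (371, 369, 0, 2), (370, 370, 0, 0)],
    "Group III: dsRNA virus": [(69, 68, 0, 1), (70, 67, 0, 3), (69, 67, 0, 2)],
    "Group IV: (+)ssRNA virus": [(415, 415, 0, 0), (412, 411, 1, 0), (415, 412, 1, 2)],
    "Group V: (-)ssRNA virus": [(176, 176, 0, 0), (177, 177, 0, 0), (180, 179, 0, 1)],
    "Groups VI and VII: RT virus": [(47, 47, 0, 0), (46, 46, 0, 0), (49, 49, 0, 0)],
}


def fold_percentages(fold):
    """Percentage of one fold's test viruses in each outcome (correct, wrong, unknown)."""
    n, *counts = fold
    return [100 * c / n for c in counts]


def group_overall(folds):
    """ASSUMED convention: unweighted mean of the three fold-level percentages."""
    per_fold = [fold_percentages(f) for f in folds]
    return [mean(column) for column in zip(*per_fold)]


def table_overall(counts):
    """ASSUMED convention: unweighted mean of the sub-pipeline 'Overall' values."""
    per_group = [group_overall(folds) for folds in counts.values()]
    return [mean(column) for column in zip(*per_group)]


def pooled_correct(counts):
    """Alternative: pool every fold of every sub-pipeline, then divide (weights by n)."""
    all_folds = [f for folds in counts.values() for f in folds]
    total_n = sum(f[0] for f in all_folds)
    total_correct = sum(f[1] for f in all_folds)
    return 100 * total_correct / total_n


if __name__ == "__main__":
    for group, folds in KNOWN_COUNTS.items():
        print(group, [round(p, 2) for p in group_overall(folds)])
    print("Mean of sub-pipelines:", [round(p, 2) for p in table_overall(KNOWN_COUNTS)])
    print("Pooled correct rate:", round(pooled_correct(KNOWN_COUNTS), 2))
```

Under the unweighted-mean assumption the sketch reproduces the bottom-row values of 99.09%, 0.06% and 0.86%. Pooling all 3813 ‘known’ test viruses instead gives 3789 correct assignments, or 99.37%. The roughly 0.3 percentage-point gap arises because the unweighted mean gives the small dsRNA and RT sub-pipelines the same weight as the much larger ssDNA and (+)ssRNA ones.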