Table 1. For each gene family studied, we report the KEGG orthology group, the number of reads assigned to that group by DIAMOND, the number of reference gene sequences that exist in the synthetic community, and the number of reference genes “detected” by each method: MEGAN, IDBA-UD, Ray, SOAPdenovo, and Xander

From: Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads

| Gene family | KEGG | Reads | References | MEGAN | IDBA-UD | Ray | SOAP | Xander |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Acetyl-CoA C-acetyltransferase | K00626 | 58,135 | 64 | 31 | 16 | 12 | 12 | 23 |
| Archaeal rpoB1 | K03044 | 17,875 | 16 | 7 | 7 | 6 | 3 | 9 |
| Archaeal rpoB2 | K03045 | 12,025 | 16 | 8 | 8 | 6 | 5 | 5 |
| Cell division protein | K03531 | 45,881 | 48 | 37 | 39 | 12 | 7 | 12 |
| Bacterial rpoB | K03043 | 105,212 | 64 | 43 | 16 | 12 | 13 | 50 |
| Phenylalanyl-tRNA synthetase alpha subunit | K01889 | 44,779 | 64 | 57 | 56 | 47 | 51 | 48 |
| Phenylalanyl-tRNA synthetase beta subunit | K01890 | 73,072 | 64 | 53 | 50 | 42 | 38 | 35 |
| Phosphoribosylformylglycinamidine cyclo ligase | K01933 | 31,919 | 64 | 58 | 59 | 46 | 45 | 54 |
| Ribonuclease HII | K03470 | 18,707 | 64 | 54 | 55 | 53 | 45 | 48 |
| Ribosomal protein L1 | K02863 | 24,190 | 64 | 57 | 49 | 45 | 48 | 53 |
| Ribosomal protein L10 | K02864 | 23,970 | 64 | 58 | 48 | 55 | 57 | 55 |
| Ribosomal protein L11 | K02867 | 17,113 | 64 | 60 | 50 | 51 | 60 | 59 |
| Ribosomal protein L13 | K02871 | 17,642 | 64 | 58 | 54 | 53 | 57 | 45 |
| Ribosomal protein L14 | K02874 | 13,435 | 64 | 56 | 42 | 49 | 58 | 60 |
| Ribosomal protein L15 | K02876 | 13,087 | 64 | 59 | 56 | 50 | 55 | 55 |
| Ribosomal protein L16 | K02878 | 10,058 | 64 | 46 | 34 | 36 | 44 | 44 |
| Ribosomal protein L18 | K02881 | 14,856 | 64 | 57 | 48 | 56 | 57 | 55 |
| Ribosomal protein L2 | K02886 | 29,849 | 64 | 60 | 54 | 46 | 55 | 57 |
| Ribosomal protein L22 | K02890 | 15,875 | 64 | 59 | 54 | 55 | 57 | 51 |
| Ribosomal protein L24 | K02895 | 11,786 | 64 | 60 | 46 | 56 | 58 | 44 |
| Ribosomal protein L25 | K02897 | 12,941 | 64 | 41 | 41 | 39 | 42 | 41 |
| Ribosomal protein L29 | K02904 | 4,913 | 64 | 29 | 8 | 33 | 34 | 8 |
| Ribosomal protein L3 | K02906 | 30,192 | 64 | 59 | 47 | 51 | 57 | 51 |
| Ribosomal protein L4 | K02926 | 14,539 | 64 | 44 | 41 | 39 | 43 | 44 |
| Ribosomal protein L5 | K02931 | 20,533 | 64 | 60 | 58 | 55 | 59 | 58 |
| Ribosomal protein L6 | K02933 | 20,645 | 64 | 58 | 41 | 56 | 59 | 60 |
| Ribosomal protein S10 | K02946 | 11,327 | 64 | 56 | 42 | 48 | 56 | 54 |
| Ribosomal protein S11 | K02948 | 10,793 | 64 | 47 | 43 | 51 | 52 | 56 |
| Ribosomal protein S12 | K02950 | 14,199 | 64 | 61 | 41 | 48 | 60 | 58 |
| Ribosomal protein S13 | K02952 | 13,975 | 64 | 59 | 46 | 56 | 60 | 58 |
| Ribosomal protein S15 | K02956 | 10,795 | 64 | 54 | 16 | 43 | 55 | 50 |
| Ribosomal protein S17 | K02961 | 10,235 | 64 | 58 | 36 | 49 | 44 | 60 |
| Ribosomal protein S19 | K02965 | 12,479 | 64 | 59 | 39 | 51 | 59 | 58 |
| Ribosomal protein S2 | K02967 | 25,926 | 64 | 61 | 46 | 41 | 53 | 48 |
| Ribosomal protein S3 | K02982 | 25,722 | 64 | 59 | 46 | 48 | 57 | 57 |
| Ribosomal protein S5 | K02988 | 21,761 | 64 | 59 | 55 | 53 | 53 | 56 |
| Ribosomal protein S7 | K02992 | 20,520 | 64 | 60 | 42 | 54 | 60 | 61 |
| Ribosomal protein S8 | K02994 | 14,543 | 64 | 62 | 57 | 58 | 60 | 57 |
| Ribosomal protein S9 | K02996 | 12,927 | 64 | 59 | 52 | 52 | 58 | 61 |
| Signal recognition particle protein | K03110 | 27,386 | 64 | 35 | 48 | 36 | 19 | 46 |
| Two-component system | K03407 | 47,904 | 64 | 29 | 17 | 15 | 15 | 27 |
| Mean absolute deviation | | | | 9.34 | 19.73 | 18.24 | 15.41 | 14.17 |

  1. Best results are shown in bold. The mean absolute deviation between the number of reference genes and the number detected by each method is reported as a summary statistic.
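
As a rough illustration of the summary statistic, the sketch below computes a mean absolute deviation between reference counts and detected counts. It assumes the statistic is the plain average of |references − detected| over gene families; the function name `mean_absolute_deviation` is hypothetical, and the example uses only the first three rows of the MEGAN column (references 64, 16, 16; detected 31, 7, 8), so it is not intended to reproduce the values reported in the table.

```python
from statistics import mean


def mean_absolute_deviation(references, detected):
    """Average of |reference count - detected count| across gene families.

    Assumption: an unweighted mean over rows; the source does not spell out
    the exact formula, so treat this as one plausible reading.
    """
    if len(references) != len(detected):
        raise ValueError("references and detected must have the same length")
    return mean(abs(r - d) for r, d in zip(references, detected))


# First three table rows, MEGAN column (illustrative subset only).
references = [64, 16, 16]
megan_detected = [31, 7, 8]

print(mean_absolute_deviation(references, megan_detected))
```

Under this reading, a lower value means the method's detected counts sit closer to the number of reference genes actually present in the synthetic community.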