Table 1 Datasets used in this study

From: Proportion-based normalizations outperform compositional data transformations in machine learning applications

| Name and reference | Accession number | Number of samples | Metadata categories |
| --- | --- | --- | --- |
| Vangay [17] | PRJEB28687 | 634 | Recruitment.Location, Researcher, Sub.Study, Birth.Year, Age, Highest.Education, Ethnicity, Religion, Birth.Location, Type.Birth.Location, Arrival.in.US, Years.in.US, Location.before.US, Type.location.before.US, Years.lived.in.Location.before.US, Tobacco.Use, Alcohol.Use, Height, Weight, Waist, BMI, BMI.Class, Breastfed, Age.at.Arrival, Sample.Group, Waist.Height.Ratio |
| Jones [18] | PRJNA397450 | 233 | Age, BMI, Genotype, sex, Treatment, Visit, type |
| Zeller [13] | PRJNA397450 | 226 | Age, host_subject_id, geographic_location_(country_and/or_sea region), Collection_date, AJCC_Stage, localization, tissue_type |
| Noguera-Julian [19] | PRJNA307231 | 700 | Host_Age, ETHNICITY, geo_loc_name_country, HIV_RiskGroup, HIV_serostatus, host_other_gender, host_sex, HIV_Profile, PCR_human_papilloma_virus, host_allergy, host_deposition_frequency_per_day, host_abdominal_transit_alterations, host_Residency_Area, HCV_coinfection, Anal_cytology, host_sexual_orientation, Syphilis_serology, HBV_coinfection, PCR_Neisseria_gonorrhoeae, PCR_Chlamydia_trachomatis, HIV_viral_load, CD4+_Tcell_counts, leukocytes, stool_consistency, lymphocytes, host_body_mass_index |