Indoor bacterial, fungal and viral species and functional genes in urban and rural schools in Shanxi Province, China–association with asthma, rhinitis and rhinoconjunctivitis in high school students

Background Studies in developed countries have reported that the prevalence of asthma and rhinitis is higher in urban areas than in rural areas, and this phenomenon is associated with urbanization and changing indoor microbiome exposure. Developing countries such as China have experienced rapid urbanization in past years, but no study has investigated microbiome exposure and urban-rural health effects in these countries. Methods Nine high schools from urban and rural areas were randomly selected in Shanxi Province, China, and classroom vacuum dust was collected for shotgun metagenomic sequencing. A self-administered questionnaire was collected from 1332 students for personal information and health data. Three-level logistic regression was performed between microbial richness/abundance/functional pathways and the occurrence of asthma and rhinitis symptoms. Results Consistent with developed countries, the prevalence of wheeze and rhinitis was higher in urban areas than in rural areas (p < 0.05). Metagenomic profiling revealed 8302 bacterial, 395 archaeal, 744 fungal, 524 protist and 1103 viral species in classroom dust. Actinobacteria (mean relative abundance 49.7%), Gammaproteobacteria (18.4%) and Alphaproteobacteria (10.0%) were the most abundant bacterial classes. The overall microbiome composition was significantly different between urban and rural schools (p = 0.001, Adonis). Species from Betaproteobactera, Gammaproteobacteria and Bacilli were enriched in urban schools, and species from Actinobacteria and Cyanobacteria were enriched in rural schools. Potential pathogens were present in higher abundance in urban schools than in rural schools (p < 0.05). Pseudoalteromonas, Neospora caninum and Microbacterium foliorum were positively associated with the occurrence of wheeze, rhinitis and rhinoconjunctivitis, and Brachybacterium was protectively (negatively) associated with rhinitis (p < 0.01). The abundance of human endocrine and metabolic disease pathways was positively associated with rhinitis (p = 0.008), and butyrate and propionate metabolic genes and pathways were significantly enriched in rural schools (p < 0.005), in line with previous findings that these short-chain fatty acids protect against inflammatory diseases in the human gut. Conclusions We conducted the first indoor microbiome survey in urban/rural environments with shotgun metagenomics, and the results revealed high-resolution microbial taxonomic and functional profiling and potential health effects. Video abstract Supplementary Information The online version contains supplementary material available at 10.1186/s40168-021-01091-0.


Introduction
Since World War II, the prevalence of many chronic allergic and inflammatory diseases such as asthma and rhinitis has increased dramatically [1,2]. The number of asthma and rhinitis patients has been estimated to exceed 350 and 700 million worldwide, and an even higher number of people suffer from various asthmatic symptoms such as shortness of breath or chest tightness [1,[3][4][5]. The prevalence of asthma and rhinitis may reach a plateau in developed countries in recent years, but a fast increasing trend is still observed in developing countries. For example, the prevalence of doctor-diagnosed asthma increased from 0.91% in 1999 to 6.8% in 2012 for children 1-8 years old in China [6]. The increasing trend of asthma and rhinitis poses a substantial economic and medical burden for individuals and society.
An interesting phenomenon of these chronic respiratory diseases is that the occurrence of the diseases is higher in urban areas than in traditional farm or rural areas [7][8][9][10][11]. For example, doctor-diagnosed asthma and rhinitis for children in the non-farming area was 88% and 172% higher than children in the farming area [11]. Recent progress in culture-independent high-throughput microbiome studies revealed that indoor microbiome exposure was closely related to the development of asthma and rhinitis [7,12,13]. Children living on farms are exposed to a broader range of environmental microorganisms than children living in urban areas [7]. Exposure to diverse environmental bacteria, including Actinobacteria, Alphaproteobacteria and Cyanobacteria, may facilitate the maturation of the immune system and reduce allergic and inflammatory diseases such as asthma and rhinitis [13][14][15]. However, most of the studies were conducted in developed countries such as Germany, Finland and the USA, and no study reported microbiome comparisons between urban and rural areas in developing countries. Many developing countries have experienced a fast urbanization process in the past twenty years, and it is interesting to see the prevalence of asthma and rhinitis and the exposed microbiome in these regions.
Amplicon sequencing of taxonomic marker genes such as the 16S ribosomal RNA gene of bacteria and the internal transcribed spacer (ITS) region of fungi is the standard protocol to characterize the indoor microbiome composition. The second-generation sequencing technique from Illumina is widely used in many indoor microbiome surveys [16][17][18]. However, due to the short read length limitation, the Illumina sequencing strategy can only produce a partial region of the marker genes, leading to reduced taxonomic resolution [19]. Thirdgeneration sequencing, such as PacBio, can produce a full-length sequence for amplicon genes with a species-level resolution, but functional inference is not reliable [20]. Shotgun metagenomic sequencing is an amplification-free approach to characterize microbial species and a whole catalogue of functional genes. This approach has been widely used in human gut microbiome studies but has been applied in only a few microbiome studies in built environments such as in commercial aircrafts, metros and hospitals [21][22][23][24]. However, no health data were collected in these studies, and thus, the health effects of the microbial species, functional genes and metabolic products were unclear.
Shanxi is a province located in the central northern part of China. The capital of Shanxi Province is Taiyuan, an industrial city with approximately four million residents. The prevalence of doctor-diagnosed asthma and rhinitis among pupils was lower in Taiyuan than in megacities of China such as Beijing and Shanghai [25], but the prevalence of asthmatic symptoms such as wheeze and shortness of breath in Taiyuan was comparable with these megacities [6]. The low prevalence of doctor-diagnosed asthma in this region might be due to the unawareness of the disease in society or lack of access to medical services or other protective factors [25,26]. The prevalence of asthma and rhinitis among adolescents was not reported in Shanxi Province.
In this study, we conducted the first shotgun metagenomic sequencing in urban and rural indoor environments. Nine high schools in Shanxi Province were randomly selected to characterize the bacterial, fungal, archaeal and viral composition. The prevalence of asthma and rhinitis symptoms, including wheeze, shortness of breath and rhinoconjunctivitis, was surveyed in students, and the association between the indoor microbiome and functional genes/pathways and respiratory symptoms was assessed.

Study design and health data collection
In this study, ten high schools, five from urban areas and five from rural areas, were randomly selected in Shanxi Province, China. Urban and rural areas were defined according to the local administrative management system; urban areas have a much higher density of population and traffic than rural areas. Rural schools were located approximately 30-40 km from Taiyuan city, the capital of Shanxi Province, and urban schools were located in Taiyuan city ( Figure S1). Rural schools were defined as school numbers 1-5, and urban schools were defined as school numbers 6-10 (Table S1). In each school, 4 or 5 classes were randomly selected to collect vacuum dust for microbiome profiling. Ten samples failed to produce enough high-quality DNA. In total, 33 dust samples from 9 schools (4 rural and 5 urban) were qualified and sequenced by amplification-free shotgun metagenomics. In each class,~40 students were randomly selected to collect the health data, and in total, 1332 students completed the self-administered questionnaire. Vacuum dust and self-administered questionnaires were collected in March 2008. The study was approved by the School Board of Taiyuan city, principals and teachers in each school and Ethics Committee at Fudan University (IRB#08-03-0119), Shanghai, China. All students gave their formal written consent, and the records were kept at Fudan University.
A self-administered questionnaire in Chinese was distributed to all participants to collect personal information, including gender, age, smoking habits, and parental asthma and allergy, as well as health data, including asthma and rhinitis symptoms in the last 12 months. Questions about doctor-diagnosed asthma and asthma symptoms were obtained from the European Community Respiratory Health Survey (ECRHS), and questions about rhinitis and rhinoconjunctivitis were obtained from the International Study of Asthma and Allergies in Childhood (ISAAC) study. The questions were the following: "Have you had diagnosed asthma by a doctor?" "Have you had wheezing or whistling in the chest in the last 12 months?" "Have you had daytime shortness of breath during rest or after exercise in the last 12 months?" "Have you had a problem with sneezing, or a runny or a blocked nose when you DID NOT have a cold or the flu in the last 12 months?" (rhinitis), and if the answer is yes, "Has this nose problem been accompanied by itchy watery eyes?" (rhinoconjunctivitis).
Students answered the questionnaire at home with the help of their parents, and medical staff from our research group went through the questionnaire. The students had no information regarding the sampling and data collected in the classrooms when answering the questionnaire.
Vacuum dust sampling, DNA extraction and shotgun metagenomics sequencing Dust in a classroom was collected by a vacuum cleaner (400 W) equipped with a dust sampler and a Millipore filter (ALK Abello, Copenhagen, Denmark). The Millipore filter was made of cellulose acetate with a pore size of 6 μm, retaining 74% of particles in the 0.3-0.5-μm size range, 81% in 0.5-1.0 μm, 95% in 1-10 μm and 100% particles in > 10 μm. Each classroom was vacuumed for 4 min, vacuumed for 2 min on the floor and vacuumed for 2 min on the upper surfaces of desks, chairs, bookshelves, teaching platforms and curtain surfaces. The dust samples were sieved through a 0.3-mm mesh screen to obtain fine dust. The fine dust was stored in a − 80°C freezer until DNA extraction.
DNA extraction and shotgun metagenomic sequencing were conducted at Personal Biotechnology Co., Ltd. (Shanghai, China). Total microbial genomic DNA in dust was extracted by a DNeasy PowerSoil Kit (QIAG EN, Hilden, Germany), following the manufacturer's instructions. The quality and quantity of the extracted DNA were assessed by agarose gel electrophoresis and a NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). The qualified DNA was processed to construct the shotgun metagenomics sequencing library by a TruSeq DNA Nano High Throughput Library Preparation Kit (Illumina, San Diego, CA, USA). The sequencing strategy was pairedend 150 bp reads with an insert size of 400 bp. A dual indexed barcode structure was applied for multiplexing, and 1% PhiX Control v3 was added into the library for quality monitoring. The prepared libraries were stored at -20°C before sequencing. The sequencing platform was Illumina HiSeq X-ten (Illumina, San Diego, CA, USA). The cluster density was in the range of 1255-1412 K clusters/mm 2 , and the error rate was < 0.05% for the sequencing run.

Metagenomics data assembly and analyses
Raw sequenced reads were first processed to obtain high-quality clean reads. Adapter sequences were removed by Cutadapt (v1.2.1) [27], and raw reads were processed by a 5-bp sliding window to trim low-quality sequences (< Q20, read accuracy < 99%). Trimmed reads with length > 50 bp and no ambiguous bases were kept for further analyses. Human reads were removed by KneadData (v0.9.0) and BMTagger (v3.101). The processed clean reads were deposited in the Genome Sequence Archive (https://bigd.big.ac.cn/gsa) in the National Genomic Data Center, Beijing Institute of Genomics with the accession number CRA003476 [28,29]. The clean reads were assembled by MEGAHIT (v1.0.5) with a succinct de Bruijn graph approach [30]. The coding sequences (CDS, > 300 bp) were predicted by MetaGeneMark (v3.25) [31]. CDSs were clustered by CD-HIT (v4.8.1) [32] at 90% amino acid sequence identity to obtain a non-redundant gene catalogue. The abundance of genes was calculated as the number of aligned reads by SOAPdenovo2 (v1.0) [33]. The taxonomy was annotated by searching against the NCBI-NT database by BLASTN (e value < 0.001) and annotated by MEGAN with the lowest common ancestor approach [34]. The functional gene was annotated by searching the sequence of the non-redundant genes against the KEGG databases (release 90.0) by DIAMOND protein aligner (v2.0.4) with e value < 0.001 and coverage ratio > 40% [35]. LEfSe (linear discriminant analysis effect size) analyses [36] were analyzed on the Galaxy website (http://huttenhower.sph.harvard.edu/galaxy/, v1.0) for the characteristic microbial taxa and functional genes/pathways in urban and rural schools. Microbial compositional variation (beta diversity) was calculated by Bray-Curtis distance metrics and visualized by non-metric multidimensional scaling (NMDS) hierarchical clustering [37,38]. Permutation analysis (10,000 permutations) was conducted for microbial taxonomic and functional composition between urban and rural samples by the Adonis function in R (v3.6.1). The growth rate of high abundance bacteria was calculated by GRiD withc = 0.2 [39]. Twenty-two species were included in the GRiD analysis with the following criteria: species relative abundance > 0.5%, taxa annotation resolved at the species level with a high-quality reference, species coverage > 0.2 and low species heterogeneity < 0.3. Reference genomes were downloaded from the prokaryote database of the NCBI genome browser (https://www.ncbi.nlm.nih.gov/genome/browse#!/ prokaryotes/). If multiple reference genomes were available, the genome with the smallest number of scaffolds was chosen. The Quantitative Insights Into Microbial Ecology (QIIME, v1.8.0) pipeline and R (v3.6.1) were used throughout the study for data processing, analysis and visualization [40].

Association analysis between microbial diversity/ abundance and symptoms
Three-level (class and school as second and third level) logistic regression was calculated between microbial richness and asthma and rhinitis symptoms by StataSE 15.0 (StataCorp LLC). Current smoking, gender and parental asthma and allergies were adjusted in the regression model. The microbial richness was represented as the number of observed species in major microbial lineages, including the domains Bacteria, Archaea, Eukaryota and Viruses, the kingdom Fungi and Protista and the major taxonomic classes. The associations between the relative abundance of microbial species and KEGG functional genes/pathways and asthma and rhinitis symptoms were also calculated by three-level logistic regression with the same adjustments. To reduce the number of multiple comparisons, we tested only microbial species and KEGG pathways differentially present in urban and rural schools (LDA score > 2) in the regression analyses. Associations with a p value < 0.01 were considered significant results throughout the study, and the false discovery rate (FDR) was also calculated by the p.adjust function with the Benjamini-Hochberg procedure in R (v3.6.1).

Prevalence of asthma and rhinitis symptoms
In this study, self-administered questionnaires were collected from 1332 students in the selected classrooms to assess the prevalence of asthma and rhinitis symptoms. All students were Chinese. The students were aged from 15 to 18 years, with a mean age of 16.1 years. A total of 610 students were female (45.8%), and 722 students were male (54.2%). A total of 924 students were from urban schools (69.4%), and 411 students were from rural schools (30.6%). The prevalence of doctor-diagnosed asthma was low in Shanxi (0.9%), and the prevalence in urban schools was higher than in rural schools, with borderline significance (1.2% vs 0.2%, p = 0.09). The prevalence of wheeze was significantly higher in male students than in female students (7.5% vs 4.6%, p = 0.03), and the prevalence of shortness of breath was significantly higher in female students than in male students (28.0% vs 37.7%, p < 0.001). The prevalence of wheeze (7.1% vs 4.1%) and rhinitis (41.9% vs 32.4%) was significantly higher in urban schools than in rural schools (chi-square test, p < 0.05; Table 1). The prevalence of shortness of breath (33.5% vs 30.4%) and rhinoconjunctivitis (15.9% vs 13.3%) was also higher in urban schools than in rural schools but did not reach statistical significance (p > 0.05).

Shotgun metagenomics sequencing statistics
The sequenced samples were referred to as "S1-03" (school number 1-class number 3) throughout the study (Tables S2 and S3). In total, 837.6 Gb raw sequence data were produced with 5.6 billion reads. All samples had > 90% nucleotides with base accuracy > 99% (Q20) ( Table  S2). Human DNA fragments mainly from the skin accounted for a large proportion of our data. A total of 66.2% of reads were removed as having a human annotation, and 261.8 Gb clean microbial data were kept for further analyses (2.2 to 20.8 Gb per sample; Table S2). The maximum scaffold length ranged from 54 kb (S9-04) to 523 kb (S8-04), and the total length of the assembled scaffolds ranged from 2.2 Gb (S10-03) to 3.3 Gb (S10-02; Table S3).
We further characterized the overall microbiome composition variation by non-metric multidimensional scaling (NMDS) hierarchical clustering. Samples from urban and rural schools were clustered mainly on the left and right sides of NMDS1, respectively (Fig. 1B), indicating that urban and rural classrooms contain different microbial compositions. The variation was also confirmed by the permutation analysis (p = 0.001, Adonis, 10,000 permutations). School 5 showed large compositional variation compared with schools 1-4 along the axes of NMDS2 (Fig. 1B), suggesting that other environmental characteristics were also involved in shaping the variation.
The potential pathogens defined by NIAID (National Institute of Allergy and Infectious Diseases, USA) were also characterized. Overall, the potential pathogens were present in low abundance (Table S4; total abundance 0.057%). Clostridium pefringens (0.021%) and Listeria   monocytogenes (0.011%) were the top pathogens in the school environment, with other pathogens all < 0.01%. Interestingly, the abundance of pathogens was higher in urban schools than in rural schools (p = 0.046, t test; rural average abundance 0.034%, urban 0.066%). Specifically, L. monocytogenes, Campylobacter jejuni, Yersinia pestis and Toxoplasma gondii were present in higher abundance in urban schools than in rural schools (p = 0.001, 0.043, 0.018 and 0.002, t test). Burkholderia pseudomallei was the only species present in higher abundance in rural schools than in urban schools (p < 0.001). However, shotgun metagenomics data were assembled in fragmented scaffolds, and thus, the potential health effects of these pathogens should be interpreted with caution.
We also compared the bacterial assemblage of this study to previously reported habitats ( Table 3). The bacterial composition in Shanxi high schools was more similar to the air and human skin microbiomes than to the soil, freshwater, human gut and saliva microbiomes (calculated from the Earth Microbiome Project [41]). Additionally, some high-abundance bacterial genera in Shanxi high schools were also present in high abundance in other indoor environments such as the dormitories of Shanxi University, China and junior high school classrooms in Malaysia [12,42].

Associations between microbial richness/abundance and asthma and rhinitis symptoms
The associations between microbial richness and asthma and rhinitis symptoms were examined by three-level logistic regression. The prevalence of doctordiagnosed asthma was very low and was not included in the analysis. The microbial richness in the domains of bacteria, archaea and eukaryotes and the kingdoms of fungi and viruses were not significantly associated with the wheeze, breathlessness, rhinitis and rhinoconjunctivitis (p > 0.05; Tables S5-S8). Similarly, no significant associations were found between the richness of major microbial classes and these symptoms (p > 0.05; Table S5-S8).
The associations between microbial species abundance and wheeze, shortness of breath, rhinitis and rhinoconjunctivitis were examined with the same regression model. To reduce the number of tests, we examined only 117 microbial species differentially present in urban and rural schools (LDA > 2) (Table S9). Five bacteria and one protist were associated with these symptoms. An uncharacterized Pseudoalteromonas from the class Gammaproteobacteria was positively associated with wheeze (p = 0.008; Table 4). Brachybacterium sp. P6-10-X1 was protectively/negatively associated with rhinitis (p = 0.009), and uncharacterized Betaproteobacteria and Pseudoalteromonas were positively associated with rhinitis (p = 0.002 and p < 0.001). Uncharacterized Pseudoalteromonas was positively associated with both wheeze and rhinitis. The protist Neospora caninum was positively associated with rhinitis (p = 0.002). An uncharacterized Flavobacteriaceae species was negatively associated with rhinoconjunctivitis (p = 0.009), and Microbacterium foliorum was positively associated with rhinoconjunctivitis (p = 0.006).

Abundance of functional genes in urban and rural schools
In this study, 15 million non-redundant functional genes were extracted from the shotgun metagenomic assemblies. The functional genes were annotated according to the KEGG Orthology database and classified into different KEGG functional pathways. More than half of the functional genes were annotated as "Metabolism" (56.4%), followed by "Genetic Information Processing" (10.7%), "Environmental Information Processing" (7.7%), "Human Diseases" (6.8%), "Cellular Processes" (6.1%), "Organismal Systems" (5.1%) and "Not included in Pathway" (7.4%; Fig. 3A). Samples from urban and rural schools were also separately clustered in the NMDS analysis (Fig. 3B), and the variation was also supported by the permutation analysis (p = 0.005, Adonis, 10,000 permutations).
The differential KEGG functional pathways in urban and rural schools were also characterized by LEfSe analysis. Pathways at the second KEGG hierarchical level are shown in Fig. 4. A higher abundance of genes related to Table 3 Average relative abundance of the top 15 bacterial genera in high schools in Shanxi and their relative abundance in other habitats. The bacterial abundance in air, non-saline soil and freshwater, and the human gut, saliva and skin environment were calculated from the Earth Microbiome Project [41]. The bacterial abundance in the university dormitory and junior high school was calculated from two recent publications [12,42]. Taxa with relative abundance > 0.1% were formatted with bold font  Table 4 Association between the abundance of indoor microbial species and wheeze, rhinitis and rhinoconjunctivitis in high schools in Shanxi, China. The β coefficient and 95% confidence interval (CI) were calculated by 3-level logistic regression models adjusted for gender, smoking and parental asthma and allergy. Regression analyses for wheeze and rhinitis were conducted for species differentially present in urban and rural schools (LDA > 2) and mean relative abundance > 0.05%; and thus, 117 species were analyzed. Regression analyses for rhinoconjunctivitis were only conducted for 21 species that were potentially associated with rhinitis (p < 0.05). Only associations with p < 0.01 are presented in this human disease and the immune system was detected in urban schools, including "Human disease; Infectious disease Bacterial", "Human disease; Infectious disease Viral", "Human disease; Cancers Specific types", "Organismal Systems: Immune system" and "Organismal Systems: Digestive system" (LDA > 3). A higher abundance of metabolic genes was detected in rural schools, including "Lipid metabolism", "Amino acid metabolism", "Carbohydrate metabolism", "Metabolism of cofactors and vitamins" and "Xenobiotics biodegradation and metabolism" (LDA > 3). A recent study reported that the production of short-chain fatty acids (SCFAs) by gut microbiota, including butyrate and propionate, protected against allergic inflammation in the lungs [43]. Thus, we further conducted LEfSe analysis specific for carbohydrate metabolism at the third KEGG hierarchical level. We found a higher abundance of genes from "Butanoate metabolism" (LDA = 2.7, p = 0.004) and "Propanoate metabolism" (LDA = 2.9, p = 0.002) pathways in rural schools than in urban schools, indicating that more butyrate and propionate may be produced by the indoor microbiome in the rural environment.
The associations between the abundance of KEGG functional genes/pathways and health symptoms were also examined by regression. A higher abundance of the "Human Disease; Endocrine and metabolic diseases" pathway was associated with a higher occurrence of rhinitis (p = 0.008, β = 0.55), but no specific gene was significantly associated with rhinitis (p > 0.01). Genes involved in primary metabolism were protectively associated with rhinitis. Two enzymes involved in butyrate metabolism, including 4aminobutyrate aminotransferase and diaminobutyrate-2oxoglutarate transaminase, were protectively associated with rhinitis (p < 0.01), consistent with the LEfSe analysis. Two acyltransferases, including NAD-dependent deacetylase and streptothricin acetyltransferase, were protectively associated with rhinitis (p < 0.01).

Bacterial growth rate in urban and rural schools
Recent progress in bioinformatics tools enables us to estimate the growth rate of bacterial species by calculating the read coverage in replication origin and terminal regions [39]. Twenty-two species with high read coverage were analyzed. The majority of the species (21 out of 22) had stopped growing or had a very low growth rate (GRiD score < 1.3). Arthrobacter agilis was the only species with a GRiD score > 2 (Table S10). Compared with fast-growing species such as Bdellovibrio in aquaculture (GRiD score > 5) [39], the indoor species from high schools had a relatively slow growth rate. Additionally, the growth rates of these species were not significantly different in urban and rural schools (p > 0.05, t test). This result indicates that the high microbiome variation between the urban and rural schools is not due to bacterial growth in the indoor environment; rather, it should be due to the variation in the sources of microorganisms, such as outdoor greenness, traffic or other environmental characteristics.

Discussion
In this study, we conducted the first shotgun metagenomic sequencing in an urban/rural indoor environment and assessed the health effects of microbial exposure. Like Western countries, the prevalence of asthma and rhinitis symptoms was higher in urban areas than in rural areas in China. The overall indoor microbiome taxonomic and functional composition was significantly different between urban and rural schools. Specifically, species enriched in urban schools were mainly from the class Actinobacteria and Cyanobacteria, and species enriched in rural schools were mainly from the class Betaproteobacteria, Gammaproteobacteria, Bacilli and Actinobacteria. Potential NIAID-defined pathogens were present in higher abundance in urban schools. Five bacterial and one protist species were significantly associated with wheeze, rhinitis and rhinoconjunctivitis. Microbial genes in human diseases and immune systems were enriched in urban schools, whereas genes in butanoate and propanoate metabolic pathways were enriched in rural schools. The abundance of genes in the "Human Disease; Endocrine and metabolic diseases" pathway was positively associated with the occurrence of rhinitis.

Strengths and limitations of the study
This is the first study to reveal urban/rural microbiome exposure and asthma and rhinitis in an Asian developing country, complementing previous observations in Western countries. Another strength is that we applied shotgun metagenomic sequencing to characterize microbiome composition in the school environment. Compared with traditional amplicon sequencing, the technique expands the scope of microbiome detection by characterizing bacteria, archaea, fungi, protists and viruses together with high taxonomic resolution. Additionally, shotgun metagenomics is not subject to PCR amplification biases in 16S rRNA and ITS sequencing, which enables accurate microbial abundance estimation. In addition, the technique enables functional inference, which is not feasible by amplicon sequencing.
A limitation of this study is that the cross-sectional study design restricts causal inference. Additionally, the questionnaire-based study relies on self-assessment of illness or symptoms, but medical staff went through the questionnaire and explained the uncertainty to the participants. The prevalence of doctor-diagnosed asthma was low in Taiyuan (0.9%), and thus, we did not conduct microbial association analysis. The low prevalence is likely due to unawareness of the disease in society and not access to medical services [6,25,26]. Unawareness of the disease is probably more common in less developed or rural regions such as Shanxi, than in megacities, such as Shanghai or Beijing. In this study, eleven students in urban schools reported diagnosed asthma, whereas only one student in rural schools reported diagnosed asthma. The numbers were too low to make a solid statistical inference, but it is in line with the idea that unawareness of the disease is more common in rural regions. Unlike diagnosed asthma, the prevalence of wheeze and shortness of breath in Shanxi were comparable or even higher than in developed countries [6,25]. Thus, it is important to survey asthma symptoms to represent respiratory health in Shanxi. In this study, we conducted a false discovery rate correction (Benjamini-Hochberg procedure) to control type I error and potentially false-positive results in regression [44]. The approach was widely used in genome-wide association studies (GWAS) but was suggested to be overly conservative for microbiome association analysis [45,46]. A recent algorithm, dsFDR, was proposed to increase the efficiency and power of detection [45]. However, the algorithm does not support multilevel adjustment, which is important in school microbiome studies with several levels of nested data (individual, class, school levels). An alternative solution could relax the standard threshold in the FDR test (< 0.05) to improve the sensitivity of detection. Thus, we reported six potential species associated with asthma and rhinitis symptoms (p < 0.01), and the FDR values were all < 0.2. Finally, metagenomics can provide the functional potential of the microbial community surveyed but does not confirm that the relevant proteins or gene products are expressed and present in the dust. Thus, the functional implications should be treated with caution.

Microorganisms in urban and rural schools and health implications
We found significantly different microbiome compositions between urban and rural schools in Shanxi. Specifically, urban schools were enriched with species from Betaproteobacteria, Gammaproteobacteria, Bacilli and Actinobacteria, and rural schools were enriched with species from Actinobacteria and Cyanobacteria. The results were consistent with a home study in farm and non-farm areas from Finland and Germany [13]. In that study, the authors reported that Streptococcaceae (a family of Bacilli) was enriched in homes from the nonfarm area, and Alphaproteobacteria, Actinobacteria and Cyanobacteria were enriched in homes from the farm area; the latter taxa might provide protective effects against the development of asthma.
In addition to the abundance analysis, a more stringent regression model with multiple adjustments was also applied to examine the potential health-related microorganisms. Brachybacterium sp. P6-10-X1 and uncharacterized Flavobacteriaceae were negatively associated with rhinitis and rhinoconjunctivitis. These two taxa are found in a wide variety of outdoor environments, including marine, freshwater, soil, sediment, plants and animals [47,48], but no study has reported their potential health effects. Three bacteria (uncharacterized Pseudoalteromonas, uncharacterized Betaproteobacteria and Microbacterium foliorum) and one protist species (Neospora caninum) had potential risk effects on wheeze and rhinitis. Pseudoalteromonas species are Gram-negative bacteria inhabiting soil, lakes and marine environments [49]. The genus is wellknown for its wide application in the pharmaceutical industry. For example, Pseudoalteromonas phenolica can produce phenolic substances that suppress the growth of methicillin-resistant Staphylococcus aureus (MRSA) [50]. Pseudoalteromonas species can also produce cyclodigiosin hydrochloride, suppressing T-cell proliferation as an immunosuppressant therapeutic agent [51]. Homoeostasis between the type 1 and type 2 immune responses suppresses the development of asthma [14]. Severe immunosuppression in humans overactivates the type 2 response, and prolonged type 2 immunity could increase the risk of atopic sensitization and asthma [52]. Thus, Pseudoalteromonas species could provide potential beneficial antibiotic effects and detrimental health effects for asthma. Microbacterium foliorum is a Gram-positive bacterium widely used in the production of food ingredients. The species were assessed and found to be non-mutagenic and non-clastogenic in a murine model [53], but no study reported the potential health effects on asthma and rhinitis. Neospora caninum can cause abortion and neurologic disease in cattle, and the occurrence of Neospora infection is common in dairy herds in the USA [54], but an antibody titer test showed that the species was unlikely to infect humans directly [55]. Additionally, no study has reported the potential health effects of the species on asthma and rhinitis.
In this study, Brachybacterium sp. and Pseudoalteromonas were associated with rhinitis, and Microbacterium foliorum and uncharacterized Flavobacteriaceae were associated with rhinoconjunctivitis. Different species were linked to rhinitis and rhinoconjunctivitis, probably because rhinoconjunctivitis is more specific for allergic rhinitis, while rhinitis also includes nonallergic rhinitis [2,4,56].
Another interesting finding of this study is that the potential pathogens were more enriched in urban schools than in rural schools, including L. monocytogenes, Campylobacter jejuni, Yersinia pestis and Toxoplasma gondii. This result is consistent with the concept that urbanization may increase infectious diseases [57]. A recent shotgun metagenomics survey also reported that antibiotic-resistant pathogens were commonly detected in the sewage of major cities in China [58], raising awareness of pathogens spreading through the septic system in urban areas.

Functional genes and health implication
In this study, we found that the genes and pathways related to metabolism were enriched in rural schools and that the pathways related to human diseases and immune systems were enriched in urban schools. The results were supported by a relaxed abundance analysis (LEfSe) and a stringent logistic regression analysis. A study in European countries reported that exposure to rural environments in early life increased the maturation rate of gut microbiota, leading to increased production of short-chain fatty acids (SCFAs), including butyrate and propionate [43]. These SCFAs impair the viability of eosinophils and are particularly important in mediating the protective effect on asthma and inflammatory bowel disease (IBD) [43,59] [60,61]. A higher abundance of Bacteroidetes, Alistipes and Lactobacillus in the murine gut produced a higher level of butyrate, increasing the expression of MUC2 and intestinal epithelial barrier function and further reducing gastrointestinal inflammation [61,62]. In this study, we found that SCFAs in the indoor environment may also have anti-inflammatory effects, as in the human gut. The genes and pathways related to butanoate and propanoate production were significantly enriched in rural schools compared with urban schools (p < 0.005). Modern humans spend more than 90% of their time in the indoor environment [63]; numerous microorganisms and their metabolic products in the air can be inhaled into the human body. The elevated production of butyrate and propionate by microorganisms in the indoor environment could provide the same protective effects as the microorganisms inhabiting the human gut. Thus, this finding may provide a new perspective for future prevention and remediation strategies for asthma and rhinitis.
In this study, a higher abundance of genes in the "Human Disease; Endocrine and metabolic diseases" pathway was associated with a higher occurrence of rhinitis. The association between metabolic dysfunction, such as obesity and asthma, is supported by many studies [64,65]. The prevalence of asthma increases with the body mass index (BMI) of children [66]. Recent progress has also reported that other metabolic dysfunctions also affect the prevalence of asthma. One study surveyed 4000 people in Korea and reported that the number of metabolic dysfunctions, including insulin resistance and systemic inflammation, was positively associated with asthma [67], consistent with our finding. Thus, exposure to microorganisms related to metabolic diseases in the indoor environment may also increase the occurrence of asthma.

Conclusions
We conducted the first shotgun metagenomics sequencing between urban and rural indoor environments, revealing high-resolution microbial taxonomic and functional profiling and potential health effects. A significantly higher prevalence of asthma and rhinitis symptoms in urban areas compared with rural areas in Shanxi, China. Microbial composition also varied significantly between the two areas, and several potential protective and risk microorganisms were associated with these symptoms. Genes and pathways related to butyrate and propionate metabolism were significantly enriched in rural schools, in line with previous findings that these short-chain fatty acids in the human gut protect against various inflammatory diseases. This study expanded our understanding of the indoor microbiome and respiratory health, providing new insights for indoor microbial exposure from a functional perspective.