### Simulations and empirical data

Data generation is briefly described below. More detailed model descriptions are given in Additional file 2, and model parameters in Additional file 1: Table S1.

#### Neutral (Hubbell) model

We initialized the local community with 10 species of even proportions and omitted the first 1000 time steps, except for the immigration rate of 0.1, where we omitted the first 5000 time steps due to the slower convergence of the dynamics with low immigration rates. The metacommunity species proportions were set to the initial species proportions. The speciation rate in the metacommunity is zero; hence, the metacommunity composition is constant.
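The local-community dynamics described above can be illustrated with a minimal sketch. It is not the simHubbell implementation: the function name `hubbell_step`, the one-death-per-step update, the interpretation of the immigration rate as a per-replacement probability, and the community size of 1000 individuals are all assumptions made for illustration.

```python
import random

def hubbell_step(counts, meta_props, m, rng):
    """One death/replacement event in a local community of fixed size.

    counts     -- individuals per species (modified in place)
    meta_props -- metacommunity proportions (constant, since speciation is zero)
    m          -- immigration probability per replacement (assumed meaning)
    """
    species = range(len(counts))
    # A randomly chosen individual dies.
    dead = rng.choices(species, weights=counts)[0]
    counts[dead] -= 1
    # The replacement is an immigrant with probability m, else a local offspring.
    if rng.random() < m:
        born = rng.choices(species, weights=meta_props)[0]
    else:
        born = rng.choices(species, weights=counts)[0]
    counts[born] += 1
    return counts

rng = random.Random(1)
counts = [100] * 10                        # 10 species, even proportions
meta = [c / sum(counts) for c in counts]   # metacommunity = initial proportions
for _ in range(2000):
    hubbell_step(counts, meta, m=0.1, rng=rng)
```

Discarding an initial stretch of such a run corresponds to omitting the first time steps as burn-in.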

We generated all neutral test time series with the simHubbell function in the seqtime R package. As a control, we also computed noise types for time series generated with the untb [42] and the WrightFisher R packages [30].

#### Interaction matrix generation

The SOI, Ricker, and gLV models take an interaction matrix as a parameter, which specifies which pairs of species interact.

We used the algorithm by Klemm and Eguíluz [43] to generate modular and scale-free interaction matrices that reproduce properties of inferred microbial networks [44, 45]. We set the clique number parameter of the Klemm and Eguíluz algorithm to 10.

We assigned interaction strengths by setting diagonal values to −1 and sampling off-diagonal values from a uniform distribution between 0 and 1. We then adjusted interaction matrix connectance (the ratio of non-zero to all values in the interaction matrix omitting the diagonal) to 0.05 or 0.01, which is close to the range reported for food webs [46] and within the range of inferred microbial networks [47].

Interaction matrices need to contain a large number of negative interactions to avoid unbounded increase of species abundances [33, 48, 49]. We therefore converted randomly selected positive interactions into negative ones. After each conversion, we tested matrix stability with a Ricker simulation and stopped once a stable matrix was obtained. In this way, we generated interaction matrices with a positive edge percentage of 0, 16, 40, and 64%.
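The steps of this subsection can be sketched as follows, under simplifying assumptions. The sketch places links at random rather than with the Klemm and Eguíluz algorithm, fixes the positive edge fraction directly instead of stopping at the first stable matrix, and omits the Ricker-based stability check. The function name `random_interaction_matrix` is hypothetical.

```python
import random

def random_interaction_matrix(n, connectance, pos_fraction, rng):
    """Sparse interaction matrix with -1 on the diagonal.

    Off-diagonal links are placed at random (a stand-in for the modular,
    scale-free topology), weighted uniformly in (0, 1], and then randomly
    chosen links are negated until only `pos_fraction` of them are positive.
    """
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = -1.0
    off_diag = [(i, j) for i in range(n) for j in range(n) if i != j]
    # Connectance = non-zero off-diagonal entries / all off-diagonal entries.
    n_links = round(connectance * len(off_diag))
    links = rng.sample(off_diag, n_links)
    for i, j in links:
        A[i][j] = 1.0 - rng.random()   # in (0, 1], never exactly zero
    # Convert randomly selected positive interactions into negative ones.
    n_negative = n_links - round(pos_fraction * n_links)
    for i, j in rng.sample(links, n_negative):
        A[i][j] = -A[i][j]
    return A

A = random_interaction_matrix(100, 0.05, 0.40, random.Random(0))
```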

#### Generalized Lotka-Volterra (gLV) model

The gLV model describes community dynamics as a function of growth rates and species interactions. We generated the interaction matrix as described above and sampled the growth rates from a uniform distribution with values between 0 and 0.5.

#### Ricker model

The Ricker model is a discrete version of the gLV model. In addition to the interaction matrix, it takes a vector of carrying capacities as input. We generated the carrying capacities from a uniform distribution with values between 0 and 0.5. As suggested by Fisher and Mehta [14], we also included a noise term with strength *σ*.
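A single update of a stochastic Ricker model can be sketched as below. The exact update rule and noise form are assumptions (multiplicative log-normal noise with log-scale standard deviation *σ*), not a statement of the implementation used here. The function name `ricker_step` is hypothetical.

```python
import math
import random

def ricker_step(x, A, K, sigma, rng):
    """Advance abundances x by one time step.

    Assumed form:
        x_i(t+1) = eta_i * x_i(t) * exp(sum_j A[i][j] * (x_j(t) - K_j))
    where eta_i is log-normal with meanlog 0 and sdlog sigma.
    """
    n = len(x)
    new_x = []
    for i in range(n):
        drive = sum(A[i][j] * (x[j] - K[j]) for j in range(n))
        # sigma == 0 gives the deterministic model.
        eta = rng.lognormvariate(0.0, sigma) if sigma > 0 else 1.0
        new_x.append(eta * x[i] * math.exp(drive))
    return new_x

A = [[-1.0, 0.0], [0.0, -1.0]]
K = [0.2, 0.4]
rng = random.Random(0)
at_capacity = ricker_step(K, A, K, 0.0, rng)          # stays at K
below_capacity = ricker_step([0.1, 0.4], A, K, 0.0, rng)
```

With this form and no noise, abundances equal to the carrying capacities are a fixed point, and a self-limited species below its carrying capacity grows.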

#### SOI model

The SOI model (based on model B, [22]) is individual-based and takes into account species-specific immigration and extinction probabilities as well as asymmetric interactions between individuals. We set the immigration rates to the initial species proportions (described below) and generated extinction rates from a uniform distribution between 0 and 1.

#### Dirichlet-multinomial distribution

The DM distribution takes two parameters, namely the species proportion vector (set to the initial species proportions) and the overdispersion parameter *θ*, set to 0.2, 0.02, or 0.002. These overdispersion values have been reported for sequencing data previously [47].
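As an illustration, one Dirichlet-multinomial draw can be sketched as follows. The mapping from *θ* to Dirichlet concentration parameters, α_i = p_i (1 − *θ*) / *θ*, is an assumed parameterization, and the sequencing depth of 1000 and the function name `dm_sample` are hypothetical.

```python
import random

def dm_sample(props, theta, depth, rng):
    """Draw one count vector from a Dirichlet-multinomial distribution.

    Assumes alpha_i = props_i * (1 - theta) / theta, so that a smaller
    theta means less overdispersion around `props`.
    All entries of `props` must be strictly positive.
    """
    scale = (1.0 - theta) / theta
    # A Dirichlet draw is a vector of gamma draws normalized to sum to 1.
    gammas = [rng.gammavariate(p * scale, 1.0) for p in props]
    total = sum(gammas)
    probs = [g / total for g in gammas]
    # Multinomial draw: assign `depth` reads to species.
    counts = [0] * len(props)
    for s in rng.choices(range(len(props)), weights=probs, k=depth):
        counts[s] += 1
    return counts

sample = dm_sample([0.5, 0.3, 0.2], theta=0.02, depth=1000,
                   rng=random.Random(0))
```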

#### Time series simulations

With each model, we simulated the dynamics of 100 species for 3000 time steps. We generated initial species proportions with the broken stick process [50] implemented in vegan’s function bstick. We also generated test time series with even initial species proportions.
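For illustration, the expected broken-stick proportions can be computed in closed form. The sketch below assumes the standard formula b_k = (1/n) Σ_{i=k}^{n} 1/i, which is our understanding of what bstick returns for a total of 1; the function name `broken_stick` is hypothetical.

```python
def broken_stick(n):
    """Expected proportions of n pieces of a randomly broken unit stick,
    in decreasing order: b_k = (1/n) * sum_{i=k}^{n} 1/i."""
    props = []
    tail = 0.0
    for k in range(n, 0, -1):   # accumulate 1/n + 1/(n-1) + ... + 1/k
        tail += 1.0 / k
        props.append(tail / n)
    return props[::-1]          # largest piece first

initial_props = broken_stick(100)
```

The proportions sum to 1 and are strictly decreasing, giving an uneven initial community in contrast to the even alternative.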

We tested three sampling rates: once every time step, once every 5 time steps, and once every 10 time steps.

#### Stool time series data

The stool data consist of two metagenomic time series of fecal samples that were collected almost daily by two individuals [3]. We rarefied the counts to 10,000 reads per sample and omitted the last time point from individual B, since there was a gap of 66 days between it and the previous sample. We then interpolated the data with function stineman in the stinepack R package [51] to ensure equidistant time intervals. A few small negative values introduced by the interpolation were set to zero. After interpolation, the data set from individual A included 365 time points and the data set from individual B 253 time points. Finally, we selected the 100 top-abundant OTUs, ranked by their sum across time points.

#### Poisson noise

We scaled gLV and Ricker time series by a factor of 1000, Hubbell time series by a factor of 2, DM data by a factor of 1, and SOI time series by a factor of 50. This converted gLV and Ricker abundances into counts and gave similar sequencing depths across models. We then generated noisy time series as *y*_{ij} ~ Pois(*x*_{ij}), where *x*_{ij} is the count of the *i*th species in the *j*th sample and *y*_{ij} is the Poisson-distributed value with mean *x*_{ij}. We applied LIMITS to selected noisy time series, including 12 Ricker, 12 Hubbell, and 12 SOI time series.
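The scaling and Poisson step can be sketched as below. The sampler counts arrivals of a rate-λ Poisson process in a unit interval, which is simple but slow for large λ. It is a stand-in for a library routine, and the function names are hypothetical.

```python
import random

def poisson(lam, rng):
    """Poisson draw with mean lam: number of arrivals of a rate-lam
    Poisson process within one unit of time."""
    if lam <= 0:
        return 0
    k = 0
    t = rng.expovariate(lam)       # waiting time to the first arrival
    while t < 1.0:
        k += 1
        t += rng.expovariate(lam)  # waiting time to the next arrival
    return k

def add_poisson_noise(x, scale, rng):
    """x is a species-by-sample matrix; returns y_ij ~ Pois(scale * x_ij)."""
    return [[poisson(scale * v, rng) for v in row] for row in x]

rng = random.Random(0)
noisy = add_poisson_noise([[0.05, 0.0], [0.01, 0.02]], 1000, rng)
```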

#### Multinomial noise

Noise was generated by applying the multinomial distribution to the taxon proportions in each sample. The sequencing depth was varied randomly between 1000 and 1500 with a uniform distribution. Data were converted into relative abundances before noise-type classification.

### Computation of time series properties

All properties were computed for the full-length time series as well as for the first 100 time points. Raw abundances were converted into relative abundances.

#### Noise types

Frequency and spectral density were calculated for each species with the R function spectrum with detrending enabled. Detrending removes linear trends by taking the residuals of a least-squares line fit. On a log-log scale, a slope of −1 indicates pink (1/f) noise, a slope of −2 brown noise, and a still steeper (more negative) slope black noise, whereas white noise is characterized by a slope around 0. We determined the slope by first fitting a spline with the function smooth.spline (with degrees of freedom set to the maximum of 2 and log10 of the time series length) and then taking the minimum of the first derivative of the spline. This accommodates, to an extent, non-linear relationships between frequency and spectral density, where the amount of non-linearity allowed depends on the length of the time series. We then classified a species as black when the slope was below −2.25, as brown when it was in the range [−2.25, −1.75), as pink when it was in the range [−1.75, −0.5), and as white otherwise. These boundaries leave no species unclassified. However, since the boundaries are arbitrary, we also tested a more stringent definition with an allowed deviation of ±0.2 from −1 for pink and from −2 for brown noise, which introduced unclassified species but did not affect our conclusions (data not shown).
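The classification boundaries translate directly into a small function. The sketch below covers only the final step, mapping an already estimated slope to a noise type. The function name `classify_noise` is hypothetical.

```python
def classify_noise(slope):
    """Map a log-log spectral slope to a noise type.

    black: slope < -2.25
    brown: -2.25 <= slope < -1.75
    pink:  -1.75 <= slope < -0.5
    white: slope >= -0.5
    """
    if slope < -2.25:
        return "black"
    if slope < -1.75:
        return "brown"
    if slope < -0.5:
        return "pink"
    return "white"

labels = [classify_noise(s) for s in (-3.0, -2.0, -1.0, 0.0)]
```

Because the intervals are contiguous, every slope receives exactly one label.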

#### Maximal autocorrelation and Hurst exponent

For each species, we computed the maximal autocorrelation for lags larger than 0 with the R function acf and the Hurst exponent with the function HurstK in the R package FGN. We assigned species to four arbitrarily selected maximal autocorrelation bins (< 0.3, [0.3, 0.6), [0.6, 0.95), ≥ 0.95) and Hurst exponent bins (< 0.6, [0.6, 0.8), [0.8, 0.9), ≥ 0.9) and computed the percentage of species in each bin.
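A minimal sketch of the maximal autocorrelation and its binning is given below. It uses the usual biased sample autocorrelation, which is also what acf reports. The set of lags scanned (all lags by default), the assignment of a value of exactly 0.95 to the top bin, and the function names are assumptions.

```python
def max_autocorrelation(x, max_lag=None):
    """Largest sample autocorrelation over lags 1..max_lag.

    Uses the biased estimator normalized by the lag-0 autocovariance.
    Scanning all lags up to n - 1 by default is an assumption.
    """
    n = len(x)
    if max_lag is None:
        max_lag = n - 1
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    if c0 == 0:
        return float("nan")   # constant series: autocorrelation undefined
    return max(
        sum(d[t] * d[t + lag] for t in range(n - lag)) / c0
        for lag in range(1, max_lag + 1)
    )

def autocorrelation_bin(r):
    """Assign a maximal autocorrelation to one of the four bins."""
    if r < 0.3:
        return "< 0.3"
    if r < 0.6:
        return "[0.3, 0.6)"
    if r < 0.95:
        return "[0.6, 0.95)"
    return ">= 0.95"          # exactly 0.95 is placed here by assumption

r = max_autocorrelation([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```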

### Neutrality test and LIMITS

#### Neutrality test

The neutrality test [30] tests the per-capita equivalence of species by determining whether or not the covariances between species are invariant to grouping. The test relies on a constant-volatility transformation that stabilizes the volatility of a two-group (two-species) neutral community irrespective of how species are grouped. Neutrality was tested on relative abundances using 500 randomly drawn constant-volatility transformations through the function NeutralCovTest in the WrightFisher R package (https://github.com/reptalex/WrightFisher) with method logitnorm. *p* values produced from the neutral covariance test were used as a measure of the incompatibility of the data with the neutral model.

#### LIMITS

We translated the LIMITS algorithm [14], originally implemented in Mathematica, into R. We then ran LIMITS on the 60 top-abundant species of relative abundance time series. When the inferred interaction matrix had at least one eigenvalue with a positive real part, we applied a Schur decomposition and modified the diagonal part to avoid explosions when predicting time series. We assessed the accuracy by computing the mean correlation of the known and the inferred interaction matrix rows and the goodness of fit as the mean correlation of the observed and predicted community time series. The predicted time series was computed with the parameterized Ricker model in a step-wise manner, i.e. the values at each time point are computed from the original values at the preceding time point using the predicted interaction matrix. The carrying capacity of a species was estimated as the mean of its abundance across time points.
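The accuracy measure, the mean correlation between corresponding rows of the known and inferred interaction matrices, can be sketched as follows. Using Pearson correlation and skipping rows with zero variance are assumptions, and the function names are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length vectors, or None if undefined."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return None
    return cov / (su * sv)

def mean_row_correlation(known, inferred):
    """Mean of row-wise correlations; rows with undefined correlation
    are skipped (an assumption)."""
    cors = [pearson(k, r) for k, r in zip(known, inferred)]
    cors = [c for c in cors if c is not None]
    return sum(cors) / len(cors) if cors else float("nan")

known = [[-1.0, 0.3, 0.0], [0.0, -1.0, -0.5], [0.2, 0.0, -1.0]]
same = mean_row_correlation(known, known)
flipped = mean_row_correlation(known, [[-v for v in row] for row in known])
```

The same function applies to goodness of fit when the rows are observed and predicted species time series.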

#### Network analysis

Links between phyla were counted as the number of non-zero entries in the interaction matrix, including the diagonal. The significance of the intra- and inter-phylum link numbers was assessed by randomizing the interaction matrix 100 times while preserving the total number of non-zero entries and computing parameter-free *p* values.
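One way to carry out such a randomization test is sketched below. The randomization scheme (shuffling all matrix entries, which keeps the number of non-zero entries fixed), the one-sided upper-tail comparison, and the add-one form of the empirical *p* value are assumptions, as are the function names.

```python
import random

def count_links(A, phyla, p, q):
    """Non-zero entries A[i][j] with species i in phylum p and j in phylum q."""
    return sum(
        1
        for i, row in enumerate(A)
        for j, v in enumerate(row)
        if v != 0 and phyla[i] == p and phyla[j] == q
    )

def link_p_value(A, phyla, p, q, rng, n_iter=100):
    """Empirical upper-tail p value for the number of p -> q links."""
    n = len(A)
    observed = count_links(A, phyla, p, q)
    flat = [v for row in A for v in row]
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(flat)   # same entries, new positions
        R = [flat[k * n:(k + 1) * n] for k in range(n)]
        if count_links(R, phyla, p, q) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)

A = [[-1.0, 0.5, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
phyla = ["a", "a", "b"]
p_intra = link_p_value(A, phyla, "a", "a", random.Random(0))
```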