There are two main sections to this report: robust cross-correlation and robust Pearson cross-correlation. A third section summarises findings across antimicrobial classes. A summary of results, including antimicrobial classes no featured in this report, is described in section 3. Full reports for the remaining six antimicrobial class correlations can be found in the following links:
Amoxicilin (present report) https://rpubs.com/arm_dtu/692570
Tetracycline https://rpubs.com/arm_dtu/694820
Erythromycin https://rpubs.com/arm_dtu/694823
Sulphonamide https://rpubs.com/arm_dtu/694824
Neomycin https://rpubs.com/arm_dtu/694825
Amphenicol https://rpubs.com/arm_dtu/694826
Lincosamide https://rpubs.com/arm_dtu/694827
We measure cross-correlation between two series to identify factors or predictors that may explain the variation of the series over time. To determine wether a relationship exists between time series, we look at large correlations surrounded by correlation values that quickly decay and become insignificant. A correlation is significant when the absolute value is greater than:
\[\begin{align*} 2/ \sqrt{n - |k|}\\ \end{align*}\]
where n is the number of observations and k is the lag. This calculation is based on large-sample normal approximation and is considered a standard method for estimating threshold values. We reject the hypothesis that the population cross correlation of lag k equals zero and for a significance level of approximately 5%. Cross-correlation in bivariate data can also be done with robust methods for significance of correlation, robust meaning methods where multiple forms of heteroskedastic and dependence departures from identically and independently distributed (i.i.d) noise is possible. We look at this method simultaneously with the standard approach as mentioned above (Figure 1). A second correlation metric, pairwise Pearson correlation in multivariate data (a linear measure of dependence), can also evaluate significance of cross-correlation and so we include the robust version of this methodology by Dalla et al. (2019) for a selected range of transformed time-series values (I did not check normality assumption though, but the differenciation removed trend and so identical distribution was met).
The cross correlation function depends on the assumption that there is no autocorrelation and series are stationary, which we need to verify and correct accordingly. We first transform the original data (Figure 2) to a form where local random variations are consistent over time and local variations look mostly constant over time (constant mean and variance). This is habitually achieved by log-normalizing the variables (Figure 3). We performe this transformation for the three age groups, as the metagenomic and phenotype data are already represented as log-ratios (ALR). We can visually inspect for a strong variation in the mean along the time series. There seems to be a slight increase in mean for the 56 aged group and a spike in ALR values for phenotype E.coli (Figure 3).
A stationary time series’ mean, variance, and autocorrelation structure do not change over time. Visually, a stationary time series will be a flat looking series, with no trend and autocorrelation. Logged transformation of the data may fail to produce stationary time-series: we further attempt stationarity by differencing the data. One step differencing is usually sufficient. We can further attempt stationarity with a second order differentiation. From the transformations, we select the series that produces stationary.
At each moment of differenciation, we inspect for trend stationary or difference stationary by performing two tests: the Kwiatkowski-Phillips-Schmidt-Shin test (KPSS) and the augmented Dickey–Fuller test (ADF) are performed on logged, first, and second differenced data, to test for a unit root versus stationarity. A trend stationary process is one whereby, if the trend was removed from your data, then we would be left with a stationary dataset. However, this would not be the case in data that has a unit root present and where Dickey-Fuller test cannot be rejected. The KPSS test is used to determine whether the data is trend stationary or not, with your null hypothesis being a trend stationary series, and the alternative being the presence of a unit root.
If we reject the null hypothesis for the KPSS test (small p-value), but not the ADF test (large p-value), the series has a unit root; to reject the null hypothesis for the ADF test (small p-value), but not the KPSS tests (large p-value) implied that the data is stationary; to reject the null hypothesis for both the KPSS test and the ADF test implies that both hypotheses are competent.
Missing data was interpolated using spline interpolation. Age group 55 had no missing values, age group 56 had 1 missing value, age group 57 had 4 missing values, Amoxicilin ALR had 4 missing values and E.coli had 9 missing values. There were a totl of 38 time-series monthly values from 2015-03-01 to 2018-04-01.
Figure 1: Interpreting AMR and AMU across cross-correlations and lags. Straight lines define thresholds for standard cross-correlation significance, while the squiggly lines define the threshold of significance according to the robust methodology by Dala et al (2019).
Figure 1: Plots for the original data.
Figure 2: Plots for the logged data. Since ALR values are logged, here th transformation is performed for the age group variables.
In Figures 4 to 9 we can inspect the cross-correlation plots and cumulative test plots of significance for the time-series transformations that yielded significant stationarity with the least amount of transformation (i.e. we prefer cross-correlating a time-series made stationary with a logged transformation to one that required further differencing). Cross-correlation plots include lines of robust (color coding error I need to fix; these are the squiggly lines; Dalla et al, 2019) and standard significance levels (straight lines). We preferebly consider correlation at lags where both robust and standard standard significance are achieved. For Figure 4, cross correlation was performed for the time series gp55 twice-diferencedand twice differenced metagenome ALR values (also known as second order differenced). This transformation was also applied to age group 56 and 57, and Amoxicilin cross-correlations. For the age group 55 and amoxicilin cross-correlation in Figure 4, both the standard and robust methods indicate a positive correlation at lag 11 and negative correlation at lag 10. In Figure 5 both methods indicate a strong negative correlation between gp56 and Amoxicilin resistance at lags 4 and 10 and a positive correlation at lag 3. These negative correlations may come from over-differencing. Age group 57 and amoxicilin cross-corrrelate positively at lag 11 and negatively at lag 7 (Figure 6). For the remaining plots of cross-correlation between age groups and E.coli, robust and standard methods fail to identify significant cross-correlation.
Figure 4: Cross correlation of age group 55 and Amoxicilin.
Figure 4: Cross correlation of age group 55 and Amoxicilin.
Figure 5: Cross correlation of age group 56 and Amoxicilin.
Figure 5: Cross correlation of age group 56 and Amoxicilin.
Figure 6: Cross correlation of age group 57 and Amoxicilin.
Figure 6: Cross correlation of age group 57 and Amoxicilin.
Figure 7: Cross correlation of age group 55 and E. coli.
Figure 7: Cross correlation of age group 55 and E. coli.
Figure 8: Cross correlation of age group 56 and E. coli.
Figure 8: Cross correlation of age group 56 and E. coli.
Figure 9: Cross correlation of age group 57 and E. coli.
Figure 9: Cross correlation of age group 57 and E. coli.
We can test the null hypothesis of no correlation between time-series at alpha significance level using the sample Pearson correlation and the p-value of the robust t-type statistic (Dalla et al, 2019). In testing for Pearson correlation at least one of the series needs to have constant mean and to be serially uncorrelated. This allows for more freedome when choosing the variables and their transformation. In keeping with the previous choices of stationary time-series, we perform two-by-two matrix correlations for easy comparability. The cross-correlation results are presented as heatmaps of the sample Pearson correlations among pairs of variables and their p-values (in parenthesis) for testing significance of correlation (Figures 10 to 15). Four shades of red, from dark to light, indicate significance at level alpha = 0.1%, 1%, 5%, 10%, respectively, and white indicates non-significance at level alpha = 10%. No significant correlations were detected between each of the age groups and Amoxicilin or E.coli time-series. When analysing a final heatmap of differenced logged transformed time-series, age categories correlate together and still failed to correlate with metagenomic (Amoxicilion) or phenotype (E.coli) variables (Figure 16).
Figure 10: Cross correlation of age group 55 and Amoxicilin.
Figure 11: Cross correlation of age group 56 and Amoxicilin.
Figure 12: Cross correlation of age group 57 and Amoxicilin.
Figure 13: Cross correlation of age group 55 and E. coli.
Figure 14: Cross correlation of age group 56 and E. coli.
Figure 15: Cross correlation of age group 57 and E. coli.
Figure 16: Cross correlation among all age groups (x1 for 55, x2 for 56 and x3 for 57), Amoxicilin (x4) and E.coli (x5).
This analysis considered the necessary case-by-case transformation of time-series data and the analysis of two correlation approaches (cross-correlation and Pearson bivariate/multivariate correlation), the robust and standard approaches, simulteaneously.
The strongest and significant correlation for age groups 55 and 56 and Amoxicilin resistance is negative and occured at lag 10. This suggests that higher than average Amoxicilin resistance is associated to lower usage among these age groups 10 months later. There was also one moment of positive cross-correlation for the age group of 55 and Amoxicilin ALR at lag 11 meaning higher than average Amoxicilin resistance leads to higher than average usage for this age group 11 months later. A positive autocorrelation at lag 11 was also identified between age group 57 and Amoxicilin ALR, meaning that the two are similarly positively related to each other over time: higher antimicrobial resistance leads to higher antimicrobial usage, 11 months later. For this age group, a second significant but negative lag at 7 suggests higher AMR leads to higher usage 7 months later. There were no significant cross-correlations between age groups and E.coli resistance that satisfied both robust and standard methods. The Pearson cross-correlation method, on twice and single differenced data, also failed to identify significant correlations.
Once differenced variables ACF plots (not shown) produced negative and bellow -0.5 autocorrelation at lag-1, indicating that further differencing would greatly increase these negative autocorrelations at subsequent lags. However, and to achieve stationarity by the ADF and KPSS tests, a second difference was determined and these twice differenced age groups, metagenome and phenotype variables, may have been over-differenced. We see that cross-correlations are frequent at high lags, which is often accidental, apart from where seasonal effects are important. Though we do not expect seasonal variation, we do consider higher lags biologically possible. The cross-correlation function, like the autocorrelation function, will typically show a smooth pattern of significantly positive correlations out to a high number of lags if the variables have not been properly stationarized, which was not the case here where our lags were occurring at high values but were sporadic. It is generally accepted that only the cross-correlations at lags 0, 1, and 2 should be taken seriously. However, for this analysis, it is expected that the large lags may better reflect the effect of antimicrobial usage over antimicrobial resistance. Given these considerations, we will consider cross-correlations for higher lags them in spite of the shortcomings of the cross-correlation methodology. The cross-correlations found in this analysis should therefore be interpreted conservatively.
Table 1: Robust and standard cross-correlation results across all 7 antimicrobial classes.
|
|
Antimicrobial |
|||||
|
|
55 |
56 |
57 |
|||
|
|
+CC |
-CC |
+CC |
-CC |
+CC |
-CC |
|
Amoxicillin |
11 |
10 |
- |
10 |
11 |
7 |
|
Tetracycline |
- |
- |
-10, 10 |
-9, -18 |
|
-18 |
|
Erythromycin |
10,-1 |
11 |
-2, 10 |
-1 |
-5 |
- |
|
Sulphonamide |
0, 13 |
- |
- |
- |
- |
- |
|
Neomycin |
- |
7 |
- |
- |
– |
– |
|
Amphenicol |
-4, -17 |
-5, -7, -18 |
|
4 |
-14 |
-15 |
|
Lincosamide |
- |
- |
10, 11, -2 |
11,9 |
10 |
11, 9, -1 |
|
|
E. coli |
|||||
|
|
55 |
56 |
57 |
|||
|
|
+CC |
-CC |
+CC |
-CC |
+CC |
-CC |
|
Amoxicillin |
- |
- |
- |
- |
- |
- |
|
Tetracycline |
- |
- |
- |
- |
- |
-9 |
|
Erythromycin |
- |
- |
- |
- |
- |
- |
|
Sulphonamide |
- |
-9 |
-1,4 |
- |
- |
- |
|
Neomycin |
- |
- |
- |
- |
|
|
|
Amphenicol |
- |
- |
- |
- |
-17 |
- |
|
Lincosamide |
– |
– |
– |
– |
– |
– |
Table 2: Robust Pearson cross-correlation results across all 7 antimicrobial classes.
|
|
Antimicrobial |
|||||
|
|
55 |
56 |
57 |
|||
|
|
+ CC |
-CC |
+CC |
-CC |
+CC |
-CC |
|
Amoxicillin |
- |
- |
- |
- |
- |
- |
|
Tetracycline |
- |
- |
- |
- |
- |
- |
|
Erythromycin |
- |
- |
- |
- |
Yes |
- |
|
Sulphonamide |
Yes |
- |
Yes |
- |
- |
- |
|
Neomycin |
- |
- |
Yes |
- |
- |
- |
|
Amphenicol |
- |
- |
- |
- |
- |
- |
|
Lincosamide |
- |
- |
- |
- |
- |
- |
|
|
E. coli |
|||||
|
|
55 |
56 |
57 |
|||
|
|
+ CC |
-CC |
+CC |
-CC |
+CC |
-CC |
|
Amoxicillin |
- |
- |
- |
- |
- |
- |
|
Tetracycline |
- |
Yes |
- |
Yes |
- |
- |
|
Erythromycin |
- |
- |
- |
- |
- |
- |
|
Sulphonamide |
- |
- |
- |
Yes |
- |
- |
|
Neomycin |
- |
- |
- |
- |
- |
- |
|
Amphenicol |
- |
- |
- |
Yes |
- |
- |
|
Lincosamide |
- |
- |
- |
- |
- |
- |
Dalla V, Giraitis L, Phillips PCB (2019). “Robust Tests for White Noise and Cross-Correlation.” Cowles Foundation, Discussion Paper No. 2194, URL https://cowles.yale.edu/sites/ default/files/files/pub/d21/d2194.pdf.
Dalla V, Giraitis L, Phillips PCB (2019b). The testcorr Package. https://cran.r-project.org/web/packages/testcorr/vignettes/testcorr.pdf