This is just a document to show an example of the fitting of time-varying coefficient (TVC) linear models using the tvReg package in R. Rather than including it all in the paper (which will then likely be deleted because it’s not a major result), I thought it would be easiest to just show here.
First - select two sites (C1 and MS10) and fit both a linear model and a TVC linear model. Only the code for C1 is shown, but the same code is run for MS10. Deciding the bandwidth (which is a tuning parameter that controls the smoothness of the varying coefficients and influences the model fit) is a bit of an open question. Previous studies use LOOCV to decide the parameter value, but I see the benefit to using the same tuning parameter across all sites. Currently I am just specifying the parameter (a smaller value for bandwidth is a less smooth model - and bandwidth of 1 is the same as a regular linear regression). I’d love input on this decision if possible. See Li et al. 2014 for a discussion of their selection process!
## Pick a site and filter the data
site = "C1"
tvreg.air.water = all.air.water[all.air.water$site == site, ]
## Set the bandwidth to whatever % of the data includes 30 observations.
## Note that if bw is set to 1 (or 100% of the data), then the linear regression and time-varying regression will look nearly identical
bw = 100/nrow(na.omit(tvreg.air.water))
## Run the models. If bw is left empty, it will be estimated using LOOCV (the suggested method in previous papers)
coef.lm <- stats::lm(daily.value.water ~ 1 + daily.value.air, data = tvreg.air.water)
model.tvLM <- tvLM(daily.value.water ~ 1 + daily.value.air, data = tvreg.air.water, bw = bw)
## Show model summaries
summary(coef.lm)
##
## Call:
## stats::lm(formula = daily.value.water ~ 1 + daily.value.air,
## data = tvreg.air.water)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1100 -1.2981 -0.0939 1.1962 10.0473
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.366935 0.083605 52.23 <2e-16 ***
## daily.value.air 0.589971 0.007351 80.26 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.092 on 2543 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.7169, Adjusted R-squared: 0.7168
## F-statistic: 6441 on 1 and 2543 DF, p-value: < 2.2e-16
summary(model.tvLM)
##
## Call:
## tvLM(formula = daily.value.water ~ 1 + daily.value.air, data = tvreg.air.water,
## bw = bw)
##
## Class: tvlm
##
## Summary of time-varying estimated coefficients:
## ================================================
## (Intercept) daily.value.air
## Min. 1.699 -0.04692
## 1st Qu. 3.597 0.36596
## Median 4.518 0.49815
## Mean 5.129 0.47246
## 3rd Qu. 6.002 0.63575
## Max. 13.501 0.79868
##
## Bandwidth: 0.0393
## Pseudo R-squared: 0.9218
Predictions vs. observations for each site/ season combo. Note that the linear regression is just a single regression across all years and seasons (not broken up by season and year like our analysis) and is only used for visualization purposes. Our summary metrics are more equivalent to a piecewise linear regression model.
These graphs below show the estimated intercept and slope for the TCV models through the entire timeseries.
The graphs below just show a comparison between our summary metrics (season/year combination) and the range of intercept-slope values for the TVC linear models. Generally, there’s agreement between the two.