Homework Part 2: Dependent Data

Part I: Ozone time-series

  1. Geography

  1. Decomposition

I used the Puerto Rico time series. It looks like ozone levels peak in the spring and drops in late fall/early winter. After accounting for the seasonal fluctuations in ozone levels, there might be a slight increasing trend in ozone levels over time.

  1. Autocorrelation and ARMA model The decomposition seems to effectively account for the seasonal autocorrelation. We don’t see a strong lag effect after it initially drops off. The ARMA 2,0,2 model is the best fit for the PR data as indicated by having the lowest AIC score.
##                      df      AIC
## arima(Y, c(2, 0, 2))  6 365.8447
## arima(Y, c(3, 0, 2))  7 371.5267
## arima(Y, c(3, 0, 3))  8 372.0320
## arima(Y, c(1, 0, 0))  3 372.5920
## arima(Y, c(0, 0, 1))  3 372.6096
## arima(Y, c(0, 0, 2))  4 374.4973
## arima(Y, c(2, 0, 0))  4 374.5905
## arima(Y, c(1, 0, 1))  4 374.5917
## arima(Y, c(1, 0, 2))  5 376.2106
## arima(Y, c(2, 0, 1))  5 376.3971
  1. Modeling the trend
# remove the seasonal component
PR.noseason <- PR.decomp$x - PR.decomp$seasonal

# perform a simple linear model of Ydeseasoned against time
summary(lm(PR.noseason ~ time(PR.noseason)))$coefficients
##                       Estimate  Std. Error   t value     Pr(>|t|)
## (Intercept)       -3347.395504 856.0400529 -3.910326 2.107120e-04
## time(PR.noseason)     1.815204   0.4284572  4.236604 6.796665e-05
# fit a generalized least squares model using the ARMA model
summary(gls(PR.noseason ~ time(PR.noseason), correlation = corARMA(p=2,q=2)))
## Generalized least squares fit by REML
##   Model: PR.noseason ~ time(PR.noseason) 
##   Data: NULL 
##        AIC      BIC    logLik
##   465.5457 481.2851 -225.7728
## 
## Correlation Structure: ARMA(2,2)
##  Formula: ~1 
##  Parameter estimate(s):
##        Phi1        Phi2      Theta1      Theta2 
## -0.01940772  0.25889314  0.39676560  0.07987273 
## 
## Coefficients:
##                        Value Std.Error   t-value p-value
## (Intercept)       -2931.0499 1461.2618 -2.005835  0.0487
## time(PR.noseason)     1.6068    0.7314  2.196891  0.0313
## 
##  Correlation: 
##                   (Intr)
## time(PR.noseason) -1    
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -2.8507269 -0.4191124  0.1973152  0.5857927  2.5546680 
## 
## Residual standard error: 6.504804 
## Degrees of freedom: 72 total; 70 residual

Part II: Ozone spatial correlations

  1. Map the ozone concentration for any month/year combo
##     lat   long time value
## 1 -21.2 -113.8    1   260
## 2 -18.7 -113.8    1   258
## 3 -16.2 -113.8    1   258
## 4 -13.7 -113.8    1   254
## 5 -11.2 -113.8    1   252
## 6  -8.7 -113.8    1   252
## convUL: For the UTM conversion, automatically detected zone 16.
## convUL: Converting coordinates within the northern hemisphere.
## [1] 12

  1. Perform a straightforward linear regression of ozone concentrations against East and North.
## Analysis of Variance Table
## 
## Response: value
##            Df Sum Sq Mean Sq  F value    Pr(>F)    
## X           1   5302    5302  30.2132 5.837e-08 ***
## Y           1  33338   33338 189.9918 < 2.2e-16 ***
## X:Y         1    122     122   0.6927    0.4056    
## Residuals 572 100370     175                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  1. Estimate the spatial scale of autocorrelation in ozone concentrations.
## Generalized least squares fit by REML
##   Model: value ~ X * Y 
##   Data: myO3 
##        AIC      BIC    logLik
##   3555.253 3581.348 -1771.626
## 
## Correlation Structure: Gaussian spatial correlation
##  Formula: ~X + Y 
##  Parameter estimate(s):
##    range 
## 386.1719 
## 
## Coefficients:
##                 Value Std.Error  t-value p-value
## (Intercept) 242.55123  4.334363 55.96007  0.0000
## X            -0.00162  0.000448 -3.62177  0.0003
## Y             0.00590  0.001860  3.17274  0.0016
## X:Y           0.00000  0.000000  0.91416  0.3610
## 
##  Correlation: 
##     (Intr) X      Y     
## X    0.978              
## Y   -0.409 -0.400       
## X:Y -0.400 -0.409  0.978
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -2.3595819 -1.4968214 -0.5698697  1.0149312  4.6283886 
## 
## Residual standard error: 8.75908 
## Degrees of freedom: 576 total; 572 residual
  1. Does taking that correlation into account change your conclusions about longitudinal or latitudinal trends in ozone concentration?

Homework Part 3

  1. Summarize the conclusion of this analysis.
  • The best-fit model included diet as a fixed effect as well as temporal autocorrelation in weight among individual chicks, as determined by AIC. When we run the best-fit model, we find that weight varies significantly among diet treatments and with time. Diet and time also interactively affect weight, but depends on the diet of interest and the polynomial term for time. Our random effect (chick) and correlatino structure accounted for a large amount of error, indicating that they are appropriate to include in the analysis.
  1. How many parameters are estimated in the “best” model? Report all of their values.
  • There are 15 parameters in the best model:
## $fixed
##          (Intercept)                Diet2                Diet3 
##             99.45233             21.25888             40.61424 
##                Diet4       poly(Time, 2)1       poly(Time, 2)2 
##             33.96666            964.36315             69.64777 
## Diet2:poly(Time, 2)1 Diet3:poly(Time, 2)1 Diet4:poly(Time, 2)1 
##            398.15282            843.53246            536.49581 
## Diet2:poly(Time, 2)2 Diet3:poly(Time, 2)2 Diet4:poly(Time, 2)2 
##             61.88236            200.25514             12.34270 
## 
## $random
## $random$Chick
##          Time
## 18 -2.2764667
## 16 -3.5044138
## 15 -2.9372945
## 13 -3.0885444
## 9  -2.6979995
## 20 -2.0850454
## 10 -1.7602940
## 8  -1.2514795
## 17 -1.0199655
## 19 -0.6577041
## 4  -0.2158893
## 6   0.1041061
## 11  1.0696130
## 3   1.6511694
## 1   1.6211551
## 12  1.6937982
## 2   2.1798238
## 5   2.6781571
## 14  4.6356132
## 7   5.8616609
## 24 -6.0450239
## 30 -2.5292309
## 22 -2.0369298
## 23 -1.6173524
## 27 -1.0138030
## 28  0.7507881
## 26  1.3649915
## 25  2.2182044
## 29  3.5208625
## 21  5.3874935
## 33 -4.8003480
## 37 -4.0469594
## 36 -1.7172797
## 31 -0.9172652
## 39 -0.2554927
## 38  0.6595190
## 32  1.4940847
## 40  1.9390989
## 34  2.9856523
## 35  4.6589901
## 44 -2.6063238
## 45 -1.5503753
## 43 -0.9376122
## 41 -1.1380645
## 47 -1.0247609
## 49  0.2318008
## 46  0.1738955
## 50  1.4094139
## 42  1.9243765
## 48  3.5176501
## [1] 14.51408
## [1] 0.8558511
  1. Expand the model selection to include models without the second order Diet term. Are any of those “better”.
  • No, none of the models without diet are better.
##                          .id       V1
## 1            ChickModel.Diet 4789.380
## 2          ChickModel.NoDiet 4842.816
## 3   ChickModel.Diet.AutoCorr 4206.544
## 4 ChickModel.NoDiet.AutoCorr 4245.153
## 5      ChickModel.NoDietTerm 5047.113
## 
## Model selection based on AICc:
## 
##                             K    AICc Delta_AICc AICcWt Cum.Wt       LL
## ChickModel.Diet.AutoCorr   15 4207.40       0.00      1      1 -2088.27
## ChickModel.NoDiet.AutoCorr  9 4245.47      38.07      0      1 -2113.58
## ChickModel.Diet            14 4790.13     582.73      0      1 -2380.69
## ChickModel.NoDiet           8 4843.07     635.67      0      1 -2413.41
## ChickModel.NoDietTerm       3 5047.15     839.76      0      1 -2520.56
  1. Expand the model selection to include corresponding linear models, with individual chick fixed effects. How do they compare?
  • Is it appropriate to use model selection with different types of models?
    • I tried running lms with individual chick as a fixed effect but then I couldn’t use AICtab to compare with the lmes.
    • Then I tried running lmes with random = ~1 but kept getting the following error: Error in MEEM(object, conLin, control$niterEM) : Singularity in backsolve at level 0, block 1
    • I looked up the error and it seems like the model is overparameterized.