Homework 2 parts b & c

Homework Part 2: Dependent Data

Part I: Ozone time-series

Geography

Decomposition

I used the Puerto Rico time series. It looks like ozone levels peak in the spring and drops in late fall/early winter. After accounting for the seasonal fluctuations in ozone levels, there might be a slight increasing trend in ozone levels over time.

Autocorrelation and ARMA model The decomposition seems to effectively account for the seasonal autocorrelation. We don’t see a strong lag effect after it initially drops off. The ARMA 2,0,2 model is the best fit for the PR data as indicated by having the lowest AIC score.

##                      df      AIC
## arima(Y, c(2, 0, 2))  6 365.8447
## arima(Y, c(3, 0, 2))  7 371.5267
## arima(Y, c(3, 0, 3))  8 372.0320
## arima(Y, c(1, 0, 0))  3 372.5920
## arima(Y, c(0, 0, 1))  3 372.6096
## arima(Y, c(0, 0, 2))  4 374.4973
## arima(Y, c(2, 0, 0))  4 374.5905
## arima(Y, c(1, 0, 1))  4 374.5917
## arima(Y, c(1, 0, 2))  5 376.2106
## arima(Y, c(2, 0, 1))  5 376.3971

Modeling the trend

Time is strongly significant in the lm model, and is positively related to ozone levels.
The glm is still significant, but the p value is larger than the lm.

# remove the seasonal component
PR.noseason <- PR.decomp$x - PR.decomp$seasonal

# perform a simple linear model of Ydeseasoned against time
summary(lm(PR.noseason ~ time(PR.noseason)))$coefficients

##                       Estimate  Std. Error   t value     Pr(>|t|)
## (Intercept)       -3347.395504 856.0400529 -3.910326 2.107120e-04
## time(PR.noseason)     1.815204   0.4284572  4.236604 6.796665e-05

# fit a generalized least squares model using the ARMA model
summary(gls(PR.noseason ~ time(PR.noseason), correlation = corARMA(p=2,q=2)))

## Generalized least squares fit by REML
##   Model: PR.noseason ~ time(PR.noseason) 
##   Data: NULL 
##        AIC      BIC    logLik
##   465.5457 481.2851 -225.7728
## 
## Correlation Structure: ARMA(2,2)
##  Formula: ~1 
##  Parameter estimate(s):
##        Phi1        Phi2      Theta1      Theta2 
## -0.01940772  0.25889314  0.39676560  0.07987273 
## 
## Coefficients:
##                        Value Std.Error   t-value p-value
## (Intercept)       -2931.0499 1461.2618 -2.005835  0.0487
## time(PR.noseason)     1.6068    0.7314  2.196891  0.0313
## 
##  Correlation: 
##                   (Intr)
## time(PR.noseason) -1    
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -2.8507269 -0.4191124  0.1973152  0.5857927  2.5546680 
## 
## Residual standard error: 6.504804 
## Degrees of freedom: 72 total; 70 residual

Part II: Ozone spatial correlations

Map the ozone concentration for any month/year combo

I chose December 1995
There’s very little variation North to South and a strong gradient on the eastern edge of the map. Ozone levels are relatively low in most of the mapped area, but rapidly increase off the east coast of central America.

##     lat   long time value
## 1 -21.2 -113.8    1   260
## 2 -18.7 -113.8    1   258
## 3 -16.2 -113.8    1   258
## 4 -13.7 -113.8    1   254
## 5 -11.2 -113.8    1   252
## 6  -8.7 -113.8    1   252

## convUL: For the UTM conversion, automatically detected zone 16.

## convUL: Converting coordinates within the northern hemisphere.

## [1] 12

Perform a straightforward linear regression of ozone concentrations against East and North.

There are significant East-West and North-South trends.
The X:Y interaction term was not significant.

## Analysis of Variance Table
## 
## Response: value
##            Df Sum Sq Mean Sq  F value    Pr(>F)    
## X           1   5302    5302  30.2132 5.837e-08 ***
## Y           1  33338   33338 189.9918 < 2.2e-16 ***
## X:Y         1    122     122   0.6927    0.4056    
## Residuals 572 100370     175                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimate the spatial scale of autocorrelation in ozone concentrations.

The spatial scale is 386 km

## Generalized least squares fit by REML
##   Model: value ~ X * Y 
##   Data: myO3 
##        AIC      BIC    logLik
##   3555.253 3581.348 -1771.626
## 
## Correlation Structure: Gaussian spatial correlation
##  Formula: ~X + Y 
##  Parameter estimate(s):
##    range 
## 386.1719 
## 
## Coefficients:
##                 Value Std.Error  t-value p-value
## (Intercept) 242.55123  4.334363 55.96007  0.0000
## X            -0.00162  0.000448 -3.62177  0.0003
## Y             0.00590  0.001860  3.17274  0.0016
## X:Y           0.00000  0.000000  0.91416  0.3610
## 
##  Correlation: 
##     (Intr) X      Y     
## X    0.978              
## Y   -0.409 -0.400       
## X:Y -0.400 -0.409  0.978
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -2.3595819 -1.4968214 -0.5698697  1.0149312  4.6283886 
## 
## Residual standard error: 8.75908 
## Degrees of freedom: 576 total; 572 residual

Does taking that correlation into account change your conclusions about longitudinal or latitudinal trends in ozone concentration?

Nope, there are still longitudinal and latitudinal trends in ozone concentration.

Homework Part 3

Summarize the conclusion of this analysis.

The best-fit model included diet as a fixed effect as well as temporal autocorrelation in weight among individual chicks, as determined by AIC. When we run the best-fit model, we find that weight varies significantly among diet treatments and with time. Diet and time also interactively affect weight, but depends on the diet of interest and the polynomial term for time. Our random effect (chick) and correlatino structure accounted for a large amount of error, indicating that they are appropriate to include in the analysis.

How many parameters are estimated in the “best” model? Report all of their values.

There are 15 parameters in the best model:

## $fixed
##          (Intercept)                Diet2                Diet3 
##             99.45233             21.25888             40.61424 
##                Diet4       poly(Time, 2)1       poly(Time, 2)2 
##             33.96666            964.36315             69.64777 
## Diet2:poly(Time, 2)1 Diet3:poly(Time, 2)1 Diet4:poly(Time, 2)1 
##            398.15282            843.53246            536.49581 
## Diet2:poly(Time, 2)2 Diet3:poly(Time, 2)2 Diet4:poly(Time, 2)2 
##             61.88236            200.25514             12.34270 
## 
## $random
## $random$Chick
##          Time
## 18 -2.2764667
## 16 -3.5044138
## 15 -2.9372945
## 13 -3.0885444
## 9  -2.6979995
## 20 -2.0850454
## 10 -1.7602940
## 8  -1.2514795
## 17 -1.0199655
## 19 -0.6577041
## 4  -0.2158893
## 6   0.1041061
## 11  1.0696130
## 3   1.6511694
## 1   1.6211551
## 12  1.6937982
## 2   2.1798238
## 5   2.6781571
## 14  4.6356132
## 7   5.8616609
## 24 -6.0450239
## 30 -2.5292309
## 22 -2.0369298
## 23 -1.6173524
## 27 -1.0138030
## 28  0.7507881
## 26  1.3649915
## 25  2.2182044
## 29  3.5208625
## 21  5.3874935
## 33 -4.8003480
## 37 -4.0469594
## 36 -1.7172797
## 31 -0.9172652
## 39 -0.2554927
## 38  0.6595190
## 32  1.4940847
## 40  1.9390989
## 34  2.9856523
## 35  4.6589901
## 44 -2.6063238
## 45 -1.5503753
## 43 -0.9376122
## 41 -1.1380645
## 47 -1.0247609
## 49  0.2318008
## 46  0.1738955
## 50  1.4094139
## 42  1.9243765
## 48  3.5176501

## [1] 14.51408

## [1] 0.8558511

Expand the model selection to include models without the second order Diet term. Are any of those “better”.

No, none of the models without diet are better.

##                          .id       V1
## 1            ChickModel.Diet 4789.380
## 2          ChickModel.NoDiet 4842.816
## 3   ChickModel.Diet.AutoCorr 4206.544
## 4 ChickModel.NoDiet.AutoCorr 4245.153
## 5      ChickModel.NoDietTerm 5047.113

## 
## Model selection based on AICc:
## 
##                             K    AICc Delta_AICc AICcWt Cum.Wt       LL
## ChickModel.Diet.AutoCorr   15 4207.40       0.00      1      1 -2088.27
## ChickModel.NoDiet.AutoCorr  9 4245.47      38.07      0      1 -2113.58
## ChickModel.Diet            14 4790.13     582.73      0      1 -2380.69
## ChickModel.NoDiet           8 4843.07     635.67      0      1 -2413.41
## ChickModel.NoDietTerm       3 5047.15     839.76      0      1 -2520.56

Expand the model selection to include corresponding linear models, with individual chick fixed effects. How do they compare?

Is it appropriate to use model selection with different types of models?
- I tried running lms with individual chick as a fixed effect but then I couldn’t use AICtab to compare with the lmes.
- Then I tried running lmes with random = ~1 but kept getting the following error: Error in MEEM(object, conLin, control$niterEM) : Singularity in backsolve at level 0, block 1
- I looked up the error and it seems like the model is overparameterized.

Homework 2 parts b & c

Alex Novarro

1/26/2017

Homework Part 2: Dependent Data

Part I: Ozone time-series

Part II: Ozone spatial correlations

Homework Part 3