Prelude

You will need the following data sets for this HW: Ocean Indexes.csv.

Throughout, pay attention to which of the variables you are being asked to work with as not all variables will be used for all problems.

Please turn in a Word doc or PDF with your answers and plots. To copy your plots to the clipboard, use the Export menu in the Plots pane and select “Copy to Clipboard.” This will put a nicer looking, more legible plot in the file you turn in. It is helpful if you paste your code at the end of your document. But please also turn in your script. Use comments and line breaks so your code is legible.

Problem 1

Using the Ocean Indexes data, evaluate the correlation between sea surface temperature (SST) and the Pacific Decadal Oscillation (PDO). (You will have to select these variables from the larger data set.)

1a. Before doing so statistically, plot the data, and make a guess. Show your plot (axes labeled), give your guesstimate, and explain why you choose this number.

1b. Are these two variables correlated? Why or why not? Provide statistical support and explain in plain language what it means. (I am not expecting detailed information about oceanography–these variables could be anything; just consider how/why they are related or not and how we know)

OI<-read.csv("Ocean Indexes.csv")
OI<-read.csv("Ocean Indexes.csv")
head(OI)
##   Year        MEI        PDO       UPW      SST      SAL         NPGO    PNI
## 1 1970 -0.5475833 -0.3975000 -17.16667 8.794167 31.43583  0.061043154  0.123
## 2 1971 -1.4175833 -1.2908333  -7.75000 8.520833 31.03667  0.381984271 -1.388
## 3 1972  0.8678333 -0.9216667 -15.41667 8.524167 30.74417 -0.659303726 -1.382
## 4 1973 -0.4600000 -0.8041667 -18.83333 8.742500 31.55167  0.126585564  0.291
## 5 1974 -1.1784167 -0.3366667 -11.00000 8.890000 31.27417 -0.009310217 -1.008
## 6 1975 -1.3032500 -1.1016667 -14.16667 8.757500 31.74250  0.798500537 -0.908
##         Clam
## 1 0.06317295
## 2 0.05432885
## 3 0.02706698
## 4 0.04190803
## 5 0.06969651
## 6 0.01198178
#1 Plot the data and make a guess
plot(OI$SST, OI$PDO, xlab="SST", ylab="PDO") #0.3

cor(OI$SST, OI$PDO)
## [1] 0.4777927
cor.test(OI$SST, OI$PDO)
## 
##  Pearson's product-moment correlation
## 
## data:  OI$SST and OI$PDO
## t = 3.6078, df = 44, p-value = 0.0007851
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2176885 0.6745316
## sample estimates:
##       cor 
## 0.4777927
# data:  OI$SST and OI$PDO
# t = 3.6078, df = 44, p-value = 0.0007851
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
#  0.2176885 0.6745316
# sample estimates:
#       cor 
# 0.4777927 

#2 The variables are moderately positively correlated, with r=0.478 (t=3.6, p<0.001).

Problem 2

Clams, as largely sessile organisms living within the benthos, have been shown to be good indicators of climate. Like trees, bivalves deposit annual growth rings which can be used to make inference about their living conditions (this is called sclerochronology). So, we have gone to a local clam bed (location classified because I don’t give out my clamming locations to just anyone!) and collected some clams. We brought them back to the lab and measured the growth increments. We only used clams that had a record from at least 1970-2015.

Using the Ocean Indexes data, conduct an analysis (here, a simple linear regression) to see if an indicator of ocean condition, upwelling (UPW), can describe clam growth (cm/yr). The upwelling index used here is from 0 to -50, where negative numbers are indicative of stronger upwelling (more water moving offshore). We know upwelling brings nutrient-rich deep waters to our coast, bringing the raw materials for the phytoplankton clams eat. Given this biological link, we might expect years with strong upwelling to result in higher growth. Again, you will need to subset the data or otherwise call the variables of interest (there are multiple ways to do this).

Your steps should include:
* Plotting your data
* Assessing assumptions (here we will proceed assuming the data are normal enough, but do check)
* Fitting your model and plotting the model fit
* Describing the output, including any statistical tests that will aid in your interpretation
* Stating in plain language what you conclude.

Full credit will be given for a complete and well executed analysis.

OI2<-OI[,c(1,4,9)]
OI2 #Get the variables of interest
##    Year        UPW       Clam
## 1  1970 -17.166667 0.06317295
## 2  1971  -7.750000 0.05432885
## 3  1972 -15.416667 0.02706698
## 4  1973 -18.833333 0.04190803
## 5  1974 -11.000000 0.06969651
## 6  1975 -14.166667 0.01198178
## 7  1976 -14.083333 0.08751025
## 8  1977 -14.750000 0.10086715
## 9  1978 -15.250000 0.10108000
## 10 1979 -27.083333 0.11934122
## 11 1980 -32.166667 0.14319477
## 12 1981 -39.000000 0.15323831
## 13 1982  -6.666667 0.09403933
## 14 1983 -45.916667 0.21586355
## 15 1984 -22.250000 0.14965872
## 16 1985 -10.416667 0.13578962
## 17 1986 -41.666667 0.17452399
## 18 1987 -29.500000 0.24434033
## 19 1988 -15.666667 0.13705313
## 20 1989  -6.333333 0.08407399
## 21 1990  -1.166667 0.06774177
## 22 1991 -13.833333 0.06313096
## 23 1992 -21.333333 0.15568780
## 24 1993 -23.916667 0.17882583
## 25 1994 -17.833333 0.08623534
## 26 1995 -48.833333 0.14696154
## 27 1996 -14.750000 0.14357765
## 28 1997 -30.583333 0.20455024
## 29 1998 -37.833333 0.10723879
## 30 1999 -41.416667 0.02248370
## 31 2000 -25.000000 0.04285771
## 32 2001 -29.333333 0.04901938
## 33 2002 -29.166667 0.09949536
## 34 2003 -41.500000 0.15867651
## 35 2004 -17.583333 0.12314304
## 36 2005 -11.583333 0.12597173
## 37 2006  -8.416667 0.09763152
## 38 2007  -7.083333 0.07317572
## 39 2008  -1.916667 0.06759364
## 40 2009 -11.500000 0.04197175
## 41 2010 -45.916667 0.07257655
## 42 2011 -13.750000 0.01182668
## 43 2012 -22.750000 0.01965654
## 44 2013   5.916667 0.04902043
## 45 2014 -20.166667 0.16796428
## 46 2015  -9.250000 0.20796346
head(OI2) #Check data for expectations
##   Year       UPW       Clam
## 1 1970 -17.16667 0.06317295
## 2 1971  -7.75000 0.05432885
## 3 1972 -15.41667 0.02706698
## 4 1973 -18.83333 0.04190803
## 5 1974 -11.00000 0.06969651
## 6 1975 -14.16667 0.01198178
dim(OI2) #46x3
## [1] 46  3
str(OI2)
## 'data.frame':    46 obs. of  3 variables:
##  $ Year: int  1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 ...
##  $ UPW : num  -17.17 -7.75 -15.42 -18.83 -11 ...
##  $ Clam: num  0.0632 0.0543 0.0271 0.0419 0.0697 ...
# #'data.frame':    46 obs. of  3 variables:
#  $ Year: int  1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 ...
#  $ UPW : num  -17.17 -7.75 -15.42 -18.83 -11 ...
#  $ Clam: num  0.0632 0.0543 0.0271 0.0419 0.0697 ...

#Plotting the data (plots should be neat, with clam on the dependent axis)
plot(OI2$UPW, OI2$Clam) #Maybe a slight negative relationship

hist(OI2$UPW) #Normal

hist(OI2$Clam) #Normal

summary(OI2) #Mean=-20,6 for UPW, 0.10 for Clam
##       Year           UPW               Clam        
##  Min.   :1970   Min.   :-48.833   Min.   :0.01183  
##  1st Qu.:1981   1st Qu.:-29.292   1st Qu.:0.06314  
##  Median :1992   Median :-17.375   Median :0.09856  
##  Mean   :1992   Mean   :-20.556   Mean   :0.10421  
##  3rd Qu.:2004   3rd Qu.:-11.521   3rd Qu.:0.14612  
##  Max.   :2015   Max.   :  5.917   Max.   :0.24434
sd(OI2$Clam) #0.0583
## [1] 0.05830105
sd(OI2$UPW) #13.11
## [1] 13.1151
#Neither SD is greater than the mean.
#Assumptions have been addressed: samples are independent, respon. var. is normal as is predic., no real violation of normality of variance
#Fitting the model
mod1<-lm(Clam~UPW, data=OI2)
summary(mod1)
## 
## Call:
## lm(formula = Clam ~ UPW, data = OI2)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.116198 -0.029658 -0.000305  0.036298  0.125350 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0702430  0.0151222   4.645 3.08e-05 ***
## UPW         -0.0016525  0.0006221  -2.656    0.011 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05473 on 44 degrees of freedom
## Multiple R-squared:  0.1382, Adjusted R-squared:  0.1186 
## F-statistic: 7.055 on 1 and 44 DF,  p-value: 0.01097
anova(mod1)
## Analysis of Variance Table
## 
## Response: Clam
##           Df   Sum Sq   Mean Sq F value  Pr(>F)  
## UPW        1 0.021136 0.0211356  7.0548 0.01097 *
## Residuals 44 0.131820 0.0029959                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#B0=0.7
#B1=-0.0017, sig. dif. from zero, t=-2.656, p=0.011
#F: p=0.011 (because only 1 variable, it is the same as the slope P value)
#R^2: 0.119 (Terrible)
#Plot the model fit
pred<-predict(mod1)
plot(OI2$UPW, OI2$Clam, xlab="UPW", ylab="Clam Growth", col="darkgray", pch=16)
lines(OI2$UPW, pred)

#B1 is stat. different from 0 and F test is sig, suggesting that upwelling could be a predictor.
#R2 is 0.12, which is not very good though
#There is a weak association between clam growth and UPW with a slope of -0.001. Stronger upwelling (more negative values) results in greater clam growth, and as upwelling decreases (index values move toward 0), clam growth responds at -0.002 cm/yr for every change in UPW. The R^2 value for this model is very low indicating that there remains a large amount of unexplained variance and other predictors for clam growth should be considered.

Problem 3

Our simple linear model did not have much explanatory power. But we do have additional data and can possibly improve our model fit by adding a variable. Using the same Ocean Indexes data, conduct an analysis (here, a linear regression, potentially with multiple explanatory variables) to see if the indicators of ocean condition: PDO, North Pacific Gyre Oscillation (NPGO), and Upwelling (UPW) can describe clam growth. For simplicity, model the main effects only. Pay attention to what variables you are including–you will need to select the appropriate ones.

Your steps should include:
* Plotting your data
* Checking for correlations among the predictor variables (you may use any number of approaches for this, use the internet and R help to find a way; the lattice package has some nice scatterplot matrix options as does GGally)
* Assessing assumptions
* Justifying the inclusion/exclusion of any correlated (collinear) variables
* Fitting your model
* Describing the output. For this exercise, rely on adjusted R2 as the primary means of assessing model fit in addition to checking for significant slopes and varaince explained through the ANOVA. We will cover additional methods in future weeks, but here, rely on R2 to chose your best model.
* Stating in plain language what you conclude.

If you are confused, refer to Zuur et al. 2009; this is a great resource.

The best answers will be organized and demonstrate a systematic approach to answering the question and will use statistical evidence to support the results and inference.

library(lattice)
OI3<-OI[,c(1,3:5,7,9)] #Get data into dataframe
OI3
##    Year          PDO        UPW       SST         NPGO       Clam
## 1  1970 -0.397500000 -17.166667  8.794167  0.061043154 0.06317295
## 2  1971 -1.290833333  -7.750000  8.520833  0.381984271 0.05432885
## 3  1972 -0.921666667 -15.416667  8.524167 -0.659303726 0.02706698
## 4  1973 -0.804166667 -18.833333  8.742500  0.126585564 0.04190803
## 5  1974 -0.336666667 -11.000000  8.890000 -0.009310217 0.06969651
## 6  1975 -1.101666667 -14.166667  8.757500  0.798500537 0.01198178
## 7  1976  0.008333333 -14.083333  8.664167  1.863211350 0.08751025
## 8  1977  0.230833333 -14.750000  8.976667  1.249270341 0.10086715
## 9  1978  0.235833333 -15.250000  9.180833  0.504466636 0.10108000
## 10 1979  0.335000000 -27.083333  9.060000 -0.737188575 0.11934122
## 11 1980  0.602500000 -32.166667  9.260833 -0.749107276 0.14319477
## 12 1981  0.918333333 -39.000000  9.420833 -0.221840562 0.15323831
## 13 1982  0.114166667  -6.666667  9.545000 -0.686691381 0.09403933
## 14 1983  1.648333333 -45.916667 10.171667 -0.092064694 0.21586355
## 15 1984  0.837500000 -22.250000  9.330833  0.620750037 0.14965872
## 16 1985  0.449166667 -10.416667  8.865833 -0.393901047 0.13578962
## 17 1986  1.239166667 -41.666667  9.218333 -0.732847288 0.17452399
## 18 1987  1.820833333 -29.500000  9.560833  0.429174729 0.24434033
## 19 1988  0.531666667 -15.666667  9.232500  1.323636763 0.13705313
## 20 1989 -0.179166667  -6.333333  9.090000  0.612240617 0.08407399
## 21 1990 -0.355833333  -1.166667  9.930833 -0.201546121 0.06774177
## 22 1991 -0.419166667 -13.833333  9.287500 -0.498914711 0.06313096
## 23 1992  0.928333333 -21.333333  9.728333 -1.094918870 0.15568780
## 24 1993  1.416666667 -23.916667  9.205833 -1.901992588 0.17882583
## 25 1994 -0.151666667 -17.833333  9.775000 -1.211598783 0.08623534
## 26 1995  0.642500000 -48.833333  9.981667 -1.039208546 0.14696154
## 27 1996  0.640833333 -14.750000  9.520000 -0.972836612 0.14357765
## 28 1997  1.460833333 -30.583333 10.045000 -0.657690616 0.20455024
## 29 1998  0.245833333 -37.833333 10.454167  0.544553283 0.10723879
## 30 1999 -1.063333333 -41.416667  9.606667  1.515780517 0.02248370
## 31 2000 -0.590000000 -25.000000  9.647500  1.885861292 0.04285771
## 32 2001 -0.562500000 -29.333333  9.580833  2.080693375 0.04901938
## 33 2002  0.220833333 -29.166667  9.586667  1.473593960 0.09949536
## 34 2003  0.969166667 -41.500000  9.921667  0.958507189 0.15867651
## 35 2004  0.345000000 -17.583333  9.895000  0.209460249 0.12314304
## 36 2005  0.375000000 -11.583333  9.975000 -1.380983404 0.12597173
## 37 2006  0.190833333  -8.416667  9.902500 -0.553980163 0.09763152
## 38 2007 -0.195833333  -7.083333  9.465000  0.593350999 0.07317572
## 39 2008 -1.292500000  -1.916667  9.197500  1.455314878 0.06759364
## 40 2009 -0.612500000 -11.500000  9.406667  0.736744278 0.04197175
## 41 2010 -0.312500000 -45.916667  9.689167  1.369721176 0.07257655
## 42 2011 -1.230833333 -13.750000  9.279167  0.857513808 0.01182668
## 43 2012 -1.100000000 -22.750000  9.140833  1.476851611 0.01965654
## 44 2013 -0.517500000   5.916667  9.021667  0.344371399 0.04902043
## 45 2014  1.130833333 -20.166667  9.570000 -0.249464974 0.16796428
## 46 2015  1.634166667  -9.250000 10.152500 -1.395119958 0.20796346
head(OI3)
##   Year        PDO       UPW      SST         NPGO       Clam
## 1 1970 -0.3975000 -17.16667 8.794167  0.061043154 0.06317295
## 2 1971 -1.2908333  -7.75000 8.520833  0.381984271 0.05432885
## 3 1972 -0.9216667 -15.41667 8.524167 -0.659303726 0.02706698
## 4 1973 -0.8041667 -18.83333 8.742500  0.126585564 0.04190803
## 5 1974 -0.3366667 -11.00000 8.890000 -0.009310217 0.06969651
## 6 1975 -1.1016667 -14.16667 8.757500  0.798500537 0.01198178
plot(OI3$Year, OI3$Clam) #Time series suggests variable growth by year

hist(OI3$Clam)

hist(OI3$NPGO)

hist(OI3$SST)

hist(OI3$UPW)

hist(OI3$PDO)

#All look normal enough to proceed
summary(OI3)
##       Year           PDO               UPW               SST        
##  Min.   :1970   Min.   :-1.2925   Min.   :-48.833   Min.   : 8.521  
##  1st Qu.:1981   1st Qu.:-0.4929   1st Qu.:-29.292   1st Qu.: 9.103  
##  Median :1992   Median : 0.2058   Median :-17.375   Median : 9.414  
##  Mean   :1992   Mean   : 0.1247   Mean   :-20.556   Mean   : 9.408  
##  3rd Qu.:2004   3rd Qu.: 0.6421   3rd Qu.:-11.521   3rd Qu.: 9.719  
##  Max.   :2015   Max.   : 1.8208   Max.   :  5.917   Max.   :10.454  
##       NPGO              Clam        
##  Min.   :-1.9020   Min.   :0.01183  
##  1st Qu.:-0.6589   1st Qu.:0.06314  
##  Median : 0.1680   Median :0.09856  
##  Mean   : 0.1746   Mean   :0.10421  
##  3rd Qu.: 0.8428   3rd Qu.:0.14612  
##  Max.   : 2.0807   Max.   :0.24434
#Units are different among indexes but not drastically so (mostly index values, except temp, so will retain as is)

splom(OI3) #Scatterplot matrix for clams, only PDO and Clams are strongly correlated, but SST and PDO are weakly correlated

cor(OI3$SST, OI3$PDO) #0.478, not terrible, proceed
## [1] 0.4777927
#Model fitting
#Start with full model

mod2<-lm(Clam~PDO + NPGO + UPW, data=OI3)
summary(mod2)
## 
## Call:
## lm(formula = Clam ~ PDO + NPGO + UPW, data = OI3)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.013459 -0.008554 -0.002433  0.003618  0.057509 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.685e-02  4.083e-03  23.722   <2e-16 ***
## PDO          6.667e-02  3.231e-03  20.633   <2e-16 ***
## NPGO        -3.508e-04  2.533e-03  -0.138    0.891    
## UPW          4.315e-05  1.833e-04   0.235    0.815    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01416 on 42 degrees of freedom
## Multiple R-squared:  0.9449, Adjusted R-squared:  0.941 
## F-statistic: 240.3 on 3 and 42 DF,  p-value: < 2.2e-16
anova(mod2)
## Analysis of Variance Table
## 
## Response: Clam
##           Df   Sum Sq  Mean Sq  F value Pr(>F)    
## PDO        1 0.144513 0.144513 720.7303 <2e-16 ***
## NPGO       1 0.000010 0.000010   0.0476 0.8283    
## UPW        1 0.000011 0.000011   0.0555 0.8150    
## Residuals 42 0.008421 0.000201                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
drop1(mod2)
## Single term deletions
## 
## Model:
## Clam ~ PDO + NPGO + UPW
##        Df Sum of Sq      RSS     AIC
## <none>              0.008421 -387.86
## PDO     1  0.085364 0.093786 -278.99
## NPGO    1  0.000004 0.008425 -389.84
## UPW     1  0.000011 0.008433 -389.80
#Only PDO is sig., R^2=0.94

#Will fit all combos; using R^2 as goodness of fit
mod3<-lm(Clam~PDO + UPW, data=OI3)
summary(mod3) #R2=0.9424, Only PDO sig
## 
## Call:
## lm(formula = Clam ~ PDO + UPW, data = OI3)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.013560 -0.009122 -0.002375  0.003575  0.057269 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 9.691e-02  4.011e-03  24.163   <2e-16 ***
## PDO         6.691e-02  2.666e-03  25.095   <2e-16 ***
## UPW         5.069e-05  1.730e-04   0.293    0.771    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.014 on 43 degrees of freedom
## Multiple R-squared:  0.9449, Adjusted R-squared:  0.9424 
## F-statistic: 368.8 on 2 and 43 DF,  p-value: < 2.2e-16
anova(mod3) 
## Analysis of Variance Table
## 
## Response: Clam
##           Df   Sum Sq  Mean Sq  F value Pr(>F)    
## PDO        1 0.144513 0.144513 737.5538 <2e-16 ***
## UPW        1 0.000017 0.000017   0.0859 0.7709    
## Residuals 43 0.008425 0.000196                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mod4<-lm(Clam~PDO, data=OI3)
summary(mod4) #R2=0.9436, very marginally better, but only 1 predictor
## 
## Call:
## lm(formula = Clam ~ PDO, data = OI3)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.013748 -0.009301 -0.002787  0.003997  0.057779 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.095904   0.002065   46.45   <2e-16 ***
## PDO         0.066607   0.002427   27.45   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01385 on 44 degrees of freedom
## Multiple R-squared:  0.9448, Adjusted R-squared:  0.9436 
## F-statistic: 753.2 on 1 and 44 DF,  p-value: < 2.2e-16
anova(mod4)
## Analysis of Variance Table
## 
## Response: Clam
##           Df   Sum Sq  Mean Sq F value    Pr(>F)    
## PDO        1 0.144513 0.144513   753.2 < 2.2e-16 ***
## Residuals 44 0.008442 0.000192                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mod5<-lm(Clam~UPW+NPGO, data=OI3)
summary(mod5) #R^2=0.358 (much worse)
## 
## Call:
## lm(formula = Clam ~ UPW + NPGO, data = OI3)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.092400 -0.028232 -0.005093  0.030923  0.131824 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0731693  0.0129218   5.662 1.13e-06 ***
## UPW         -0.0017575  0.0005314  -3.307 0.001910 ** 
## NPGO        -0.0291252  0.0069745  -4.176 0.000142 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0467 on 43 degrees of freedom
## Multiple R-squared:  0.3868, Adjusted R-squared:  0.3583 
## F-statistic: 13.56 on 2 and 43 DF,  p-value: 2.709e-05
anova(mod5) #Both predictors are significant but the model fit is not good.
## Analysis of Variance Table
## 
## Response: Clam
##           Df   Sum Sq  Mean Sq F value    Pr(>F)    
## UPW        1 0.021136 0.021136  9.6905 0.0032890 ** 
## NPGO       1 0.038034 0.038034 17.4384 0.0001421 ***
## Residuals 43 0.093786 0.002181                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Best model appears to be Clam~PDO using R^2 and significance of predictors--only PDO is a strong predictor and model is simplest
#Compare with AIC
AIC(mod2, mod3, mod4, mod5)
##      df       AIC
## mod2  5 -255.3161
## mod3  4 -257.2951
## mod4  3 -259.2034
## mod5  4 -146.4453
#Model 4 is the lowest, although model3 has support also (<3 AIC units less)

#PDO is the most significant predictor of clam growth and was in all the models; a model with just PDO had lower AIC than with any other factors, although UPW explains a little variation and a model with this term is supported based on AIC. 
#Best model
mod4<-lm(Clam~PDO, data=OI)
summary(mod4) #R2=0.9436, very marginally better, but only 1 predictor
## 
## Call:
## lm(formula = Clam ~ PDO, data = OI)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.013748 -0.009301 -0.002787  0.003997  0.057779 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.095904   0.002065   46.45   <2e-16 ***
## PDO         0.066607   0.002427   27.45   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01385 on 44 degrees of freedom
## Multiple R-squared:  0.9448, Adjusted R-squared:  0.9436 
## F-statistic: 753.2 on 1 and 44 DF,  p-value: < 2.2e-16
anova(mod4)
## Analysis of Variance Table
## 
## Response: Clam
##           Df   Sum Sq  Mean Sq F value    Pr(>F)    
## PDO        1 0.144513 0.144513   753.2 < 2.2e-16 ***
## Residuals 44 0.008442 0.000192                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#PDO is a strong positive predictor of clam growth (F=753.2, P<0.001), with clams increasing 0.07 cm/yr for every unit increase in PDO index (warmer waters). The model shows a good fit to the data with R^2=0.944


#Clam
pred2<-predict(mod4)
plot(OI$PDO, OI$Clam, xlab="PDO", ylab="Clam Growth", col="dodgerblue", pch=16)
lines(OI$PDO, pred2)