HW Linear Models

Problem 1

Evaluate the correlation between sea surface temperature (SST) and the Pacific Decadal Oscillation (PDO).

1a. Plot the data and make a guess.

From just plotting the data, PDO and SST appear to be positively correlated, but with a low r and high variance.

1b. Are these two variables correlated?

Pearson correlation tests assume bivariate normality and random sampling. We can assume random sampling in the data, so I made histograms to test if SST and PDO are normally distributed.

PDO is approximately normal.

SST is also normally distributed, so I used a Pearson Correlation Coefficient test to determine if they are correlated.

cor.test(ocean$PDO, ocean$SST)

## 
##  Pearson's product-moment correlation
## 
## data:  ocean$PDO and ocean$SST
## t = 3.6078, df = 44, p-value = 0.0007851
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2176885 0.6745316
## sample estimates:
##       cor 
## 0.4777927

Since our r = 0.478 is significantly different from 0 (t = 3.608, p < 0.001), we can see that PDO and SST are positively correlated with each other. This means that as PDO index increases, sea surface temperature increases but with some variance (but PDO does not necessarily cause the change in sea surface temperature).

Problem 2.

“Clams, as largely sessile organisms living within the benthos, have been shown to be good indicators of climate. Like trees, bivalves deposit annual growth rings which can be used to make inference about their living conditions (this is called sclerochronology). So, we have gone to a local clam bed (location classified because I don’t give out my clamming locations to just anyone!) and collected some clams. We brought them back to the lab and measured the growth increments. We only used clams that had a record from at least 1970-2015.

Using the Ocean Indexes data, conduct an analysis (here, a simple linear regression) to see if an indicator of ocean condition, upwelling (UPW), can describe clam growth (cm/yr). The upwelling index used here is from 0 to -50, where negative numbers are indicative of stronger upwelling (more water moving offshore). We know upwelling brings nutrient-rich deep waters to our coast, bringing the raw materials for the phytoplankton clams eat. Given this biological link, we might expect years with strong upwelling to result in higher growth. Again, you will need to subset the data or otherwise call the variables of interest (there are multiple ways to do this).”

There appears to be an approximate negative correlation between UPW and clam growth.

cor.test(ocean$UPW, ocean$Clam)

## 
##  Pearson's product-moment correlation
## 
## data:  ocean$UPW and ocean$Clam
## t = -2.6561, df = 44, p-value = 0.01097
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.59754328 -0.09127916
## sample estimates:
##        cor 
## -0.3717271

From the correlation t-test, our r = -0.372 is significantly different from 0 (t = -2.656, p < 0.011) and therefore our variables are negatively correlated.

Testing assumptions: Normality

Both upwelling and clam growth are sufficiently normal to satisfy the assumptions of a linear model.

Testing assumptions: Equal Variance

Clam growth and upwelling have similar variances, allowing us to move forward with a linear model.

Fitting the model:

clam.lm <- lm(Clam ~ UPW, data = ocean)
summary(clam.lm)

## 
## Call:
## lm(formula = Clam ~ UPW, data = ocean)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.116198 -0.029658 -0.000305  0.036298  0.125350 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0702430  0.0151222   4.645 3.08e-05 ***
## UPW         -0.0016525  0.0006221  -2.656    0.011 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05473 on 44 degrees of freedom
## Multiple R-squared:  0.1382, Adjusted R-squared:  0.1186 
## F-statistic: 7.055 on 1 and 44 DF,  p-value: 0.01097

anova(clam.lm)

## Analysis of Variance Table
## 
## Response: Clam
##           Df   Sum Sq   Mean Sq F value  Pr(>F)  
## UPW        1 0.021136 0.0211356  7.0548 0.01097 *
## Residuals 44 0.131820 0.0029959                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The linear model produces an intercept of 0.0702 and a slope of -0.00165 that is significantly different from 0 (t = -0.2656, p < 0.011). Although the model explains a small proportion of the variance in the data (\(R^2\) = 0.119), the model does fit the data well enough (\(F\)_(1,44) = 7.055, p < 0.011).

Diagnostic plots:

## 
## Call:
## lm(formula = Clam ~ UPW, data = ocean)
## 
## Coefficients:
## (Intercept)          UPW  
##    0.070243    -0.001652

Residuals appear normally distributed (QQ plot) and have equal variance (non-clumped spread in the residuals vs fitted plot). There are a few outliers identified in the QQ plot (data points 18, 30, and 46).

The confidence intervals (shaded area) in the plot show the rough fit of the model to the data (\(R^2\) = 0.1186). When upwelling is stronger, clams tend to grow more, and decrease growth as upwelling slows. However, much of the variation in clam growth is not explained by changes in upwelling, as seen by the number of points not contained in the 95% confidence interval.

Problem 3.

“Our simple linear model did not have much explanatory power. But we do have additional data and can possibly improve our model fit by adding a variable. Using the same Ocean Indexes data, conduct an analysis (here, a linear regression, potentially with multiple explanatory variables) to see if the indicators of ocean condition: PDO, North Pacific Gyre Oscillation (NPGO), and Upwelling (UPW) can describe clam growth. For simplicity, model the main effects only. Pay attention to what variables you are including–you will need to select the appropriate ones.”

PDO appears strongly positively correlated to clam growth, whereas NPGO and UPW appear to have a slight negative correlation with clam growth.

Testing for correlation between our three variables:

PDO is correlated with NPGO (t = -3.701, p < 0.001, r = -0.487) and upwelling (t = -2.8295, p < 0.001, r = -0.392) but NPGO and upwelling are not correlated (t = -0.314, p < 0.755). There was no correlation between independent variables greater than 0.6, so for our purposes, we can continue with a multivariate linear model.

We can also see from the plots above that our variables are each normally distributed.

Using boxplots to compare variances, they are equivalent enough to satisfy the assumptions of a linear model.

clam.lm2 <- lm(Clam ~ PDO + NPGO + UPW, data = ocean)
summary(clam.lm2)

## 
## Call:
## lm(formula = Clam ~ PDO + NPGO + UPW, data = ocean)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.013459 -0.008554 -0.002433  0.003618  0.057509 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.685e-02  4.083e-03  23.722   <2e-16 ***
## PDO          6.667e-02  3.231e-03  20.633   <2e-16 ***
## NPGO        -3.508e-04  2.533e-03  -0.138    0.891    
## UPW          4.315e-05  1.833e-04   0.235    0.815    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01416 on 42 degrees of freedom
## Multiple R-squared:  0.9449, Adjusted R-squared:  0.941 
## F-statistic: 240.3 on 3 and 42 DF,  p-value: < 2.2e-16

anova(clam.lm2)

## Analysis of Variance Table
## 
## Response: Clam
##           Df   Sum Sq  Mean Sq  F value Pr(>F)    
## PDO        1 0.144513 0.144513 720.7303 <2e-16 ***
## NPGO       1 0.000010 0.000010   0.0476 0.8283    
## UPW        1 0.000011 0.000011   0.0555 0.8150    
## Residuals 42 0.008421 0.000201                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Including all three variables with no interaction terms, only PDO has a slope significantly different from 0 at 0.067 (t = 20.633, p < 0.001). NPGO and upwelling had no significant slope (t = -0.138, p < 0.891 and t = 0.235, p < 0.815, respectively). The model is a good fit for the data (\(R^2\) = 0.941, \(F\)_(3,42) = 240.3, p < 0.001), but only PDO contributed significantly to the model (\(F\)_(1,42) = 720.73, p < 0.001). The most parsimonious model would remove the NPGO and upwelling terms.

NPGO and PDO had a significant interaction in the correlation test, so I refit the model with an interaction term, just to check if it was contributing to the model in the background.

clam.lm3 <- lm(Clam ~ PDO * NPGO + UPW, data = ocean)
summary(clam.lm3)

## 
## Call:
## lm(formula = Clam ~ PDO * NPGO + UPW, data = ocean)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.017249 -0.008978 -0.003254  0.003413  0.053792 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.618e-02  4.173e-03  23.049   <2e-16 ***
## PDO          6.723e-02  3.311e-03  20.307   <2e-16 ***
## NPGO         1.239e-04  2.604e-03   0.048    0.962    
## UPW          6.241e-05  1.853e-04   0.337    0.738    
## PDO:NPGO    -2.368e-03  2.820e-03  -0.840    0.406    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01421 on 41 degrees of freedom
## Multiple R-squared:  0.9459, Adjusted R-squared:  0.9406 
## F-statistic: 179.1 on 4 and 41 DF,  p-value: < 2.2e-16

anova(clam.lm3)

## Analysis of Variance Table
## 
## Response: Clam
##           Df   Sum Sq  Mean Sq  F value Pr(>F)    
## PDO        1 0.144513 0.144513 715.6769 <2e-16 ***
## NPGO       1 0.000010 0.000010   0.0473 0.8289    
## UPW        1 0.000011 0.000011   0.0551 0.8156    
## PDO:NPGO   1 0.000142 0.000142   0.7055 0.4058    
## Residuals 41 0.008279 0.000202                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The interaction between PDO and NPGO has neither a significant slope (t = -0.840, p < 0.406) nor contributes to the model (\(F\)_(1,42) = 0.705, p < 0.406). This reaffirms that the best model for clam growth based on these variables would include only PDO.

ocean2 <- ocean[order(ocean$PDO),]

clam.lm4 <- lm(Clam ~ PDO, data = ocean2)
summary(clam.lm4)

## 
## Call:
## lm(formula = Clam ~ PDO, data = ocean2)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.013748 -0.009301 -0.002787  0.003997  0.057779 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.095904   0.002065   46.45   <2e-16 ***
## PDO         0.066607   0.002427   27.45   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01385 on 44 degrees of freedom
## Multiple R-squared:  0.9448, Adjusted R-squared:  0.9436 
## F-statistic: 753.2 on 1 and 44 DF,  p-value: < 2.2e-16

anova(clam.lm4)

## Analysis of Variance Table
## 
## Response: Clam
##           Df   Sum Sq  Mean Sq F value    Pr(>F)    
## PDO        1 0.144513 0.144513   753.2 < 2.2e-16 ***
## Residuals 44 0.008442 0.000192                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This model has the highest \(R^2\) of our options at 0.9436 (\(F\)_(1,44) = 753.2, p < 0.001) and is the most parsimonious (fewest parameters). It follows that PDO explains the majority of the variation in clam growth of our data set.

HW Linear Models

Luke Ghallahorne

2023-11-10

Problem 1

Problem 2.

Problem 3.