Lab #7: Regression Models

Loading Data

library(ggplot2)  # needed for the ggplot() calls below

# Island names and attributes: area (km^2), distance from the mainland, and species counts
Island=c("Santa Barbara","Anacapa","San Miguel","San Nicolas","San Clemente","Santa Catalina","Santa Rosa","Santa Cruz")
Area=c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Dist=c(61, 20, 42, 98, 79, 32, 44, 30)
Native=c(88,190,198,139,272,421,387,480)
Endemic=c(14,22,18,18,47,37,42,45)
Exotic=c(44,75,69,131,110,185,98,170)
Total=c(132,265,267,270,382,604,484,650)

# Combine the vectors into one data frame and treat Island as a categorical variable
ChannelIslands=data.frame(Island, Area, Dist, Native, Endemic, Exotic, Total)
ChannelIslands$Island <- factor(ChannelIslands$Island)

These lines of code input the data, combine the data vectors into a data frame, and convert the Island variable into a factor.

Total Species on Island Area

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

This plot shows total species richness (y-axis) against island area (x-axis), with a loess smoothing line.

Question 1: The plot is above, and the working code is shown below:

ggplot(ChannelIslands, aes(x = Area, y = Total)) +
  geom_point(aes(color = Area, size = Area), alpha = 0.9) +
  geom_smooth() +  # loess smoother, ggplot2's default for fewer than 1,000 points
  xlab("Island Area (km^2)") + ylab("Total Species Richness") +
  ggtitle("Scatterplot of Total Species and Island Area")

Question 2: This plot suggests a positive relationship between the number of plant species found on an island and the size of the island, which makes intuitive sense: bigger islands tend to support more species.
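As a quick numerical check on the visual trend, here is a minimal sketch that computes the Pearson correlation between island area and total species richness with base R's `cor()`. It re-creates only the two columns it needs so it runs on its own; the variable name `r_area_total` is just illustrative.

```r
# Island area (km^2) and total species richness, copied from the data above
Area  <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Total <- c(132, 265, 267, 270, 382, 604, 484, 650)

# Pearson correlation: a value near +1 supports a strong positive relationship
r_area_total <- cor(Area, Total)
print(r_area_total)
```

A correlation only summarizes linear association, so it complements rather than replaces the scatterplot.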

Question 3: The island with the fewest species is Santa Barbara (132 species). It is also the smallest island at 2.6 km^2, only 0.3 km^2 smaller than the next smallest island (Anacapa) and over 290 km^2 smaller than the largest (Santa Cruz).

Question 4: The island with the greatest number of species is Santa Cruz (650 species), which is also the largest island at 294 km^2.
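The answers to Questions 3 and 4 can be read directly from the data. This sketch uses `which.min()` and `which.max()` to find the islands with the fewest and most species, then computes the area gaps quoted above; the object names are illustrative.

```r
# Columns needed for this check, copied from the data above
Island <- c("Santa Barbara", "Anacapa", "San Miguel", "San Nicolas",
            "San Clemente", "Santa Catalina", "Santa Rosa", "Santa Cruz")
Area  <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Total <- c(132, 265, 267, 270, 382, 604, 484, 650)

# Islands with the smallest and largest total species richness
fewest <- Island[which.min(Total)]
most   <- Island[which.max(Total)]

# Area gap between the smallest island and the next smallest, and the largest
sorted_area <- sort(Area)
gap_next    <- sorted_area[2] - sorted_area[1]
gap_largest <- max(Area) - min(Area)

print(c(fewest, most))
print(c(gap_next, gap_largest))
```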

Question 5: The smooth line suggests a roughly linear (arithmetic) relationship between these variables, rather than an exponential or negative one.

Part 2: Linear Regressions

linear_regression <- lm(Native ~ Area, data = ChannelIslands)
summary(linear_regression)
## 
## Call:
## lm(formula = Native ~ Area, data = ChannelIslands)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.612 -34.226  -7.542  34.551  61.581 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 124.8303    25.9310   4.814 0.002958 ** 
## Area          1.2376     0.1653   7.488 0.000293 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 47.9 on 6 degrees of freedom
## Multiple R-squared:  0.9033, Adjusted R-squared:  0.8872 
## F-statistic: 56.07 on 1 and 6 DF,  p-value: 0.0002931

Question 6: The slope coefficient is 1.2376, which means that for each additional 1 km^2 of island area, the expected number of native species increases by about 1.24.
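One way to see what the slope means is to predict native richness for two hypothetical islands that differ in area by exactly 1 km^2. In this sketch the areas 100 and 101 km^2 are arbitrary illustrative values; any pair 1 km^2 apart gives the same difference.

```r
# Area (km^2) and native species richness, copied from the data above
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)

# Same simple regression as in Part 2
fit <- lm(Native ~ Area)

# Predictions for two hypothetical islands that differ by exactly 1 km^2
pred <- predict(fit, newdata = data.frame(Area = c(100, 101)))

# The difference between the two predictions equals the slope coefficient
slope_from_pred <- unname(diff(pred))
slope_coef      <- unname(coef(fit)["Area"])
print(c(slope_from_pred, slope_coef))
```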

Question 7: According to the model, an island of 0 km^2 would have about 125 native species (the intercept, 124.83). This value mainly anchors the fitted line and is an extrapolation well outside the observed areas (2.6 to 294 km^2). In reality, a nonexistent island would have no native plant species.
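The intercept is simply the model's prediction at Area = 0. This sketch confirms that with `predict()` and prints the observed range of areas to show how far 0 km^2 lies outside the data.

```r
# Area (km^2) and native species richness, copied from the data above
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)
fit <- lm(Native ~ Area)

# Prediction at Area = 0 is the intercept (about 124.83)
pred_zero <- unname(predict(fit, newdata = data.frame(Area = 0)))
print(pred_zero)

# The observed areas never approach 0, so this prediction is an extrapolation
print(range(Area))
```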

Question 8: The p-value for the slope (0.000293) is well below 0.05, so I would conclude that there is a statistically significant positive relationship between island area and native plant species richness.

Question 9: The R-squared value is 0.9033, which means that about 90.3% of the variation in native plant species richness is explained by island area; the remaining variation is due to other factors.

Question 10: The SSY value is 142434.875.
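SSY is the total sum of squares: the squared deviations of each island's native richness from the mean richness, summed. A minimal base R sketch of that arithmetic:

```r
# Native species richness, copied from the data above
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)

# Total sum of squares around the mean (mean = 271.875)
SSY <- sum((Native - mean(Native))^2)
print(SSY, digits = 10)
```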

Question 11: The SSE (error sum of squares) value is 13767.4817.
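SSE is the sum of the squared residuals, i.e. the squared vertical distances from each island to the fitted line. This sketch computes it directly and cross-checks it with `deviance()`, which returns the residual sum of squares for an `lm` fit.

```r
# Area (km^2) and native species richness, copied from the data above
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)
fit <- lm(Native ~ Area)

# Error (residual) sum of squares
SSE <- sum(residuals(fit)^2)

# deviance() gives the same quantity for a linear model
print(c(SSE, deviance(fit)), digits = 10)
```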

Question 12: The proportionate reduction in error, (SSY - SSE)/SSY, is about 0.9033, which is the same as the R-squared value.
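The proportionate reduction in error can be computed from SSY and SSE and compared with the R-squared that `summary()` reports. A short sketch, with `PRE` as an illustrative name:

```r
# Area (km^2) and native species richness, copied from the data above
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)
fit <- lm(Native ~ Area)

SSY <- sum((Native - mean(Native))^2)  # total sum of squares
SSE <- sum(residuals(fit)^2)           # error sum of squares

# Proportionate reduction in error when moving from the mean to the fitted line
PRE <- (SSY - SSE) / SSY

# Should match the "Multiple R-squared" line of summary(fit)
print(c(PRE, summary(fit)$r.squared))
```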

Question 13: With k = 1 predictor and n = 8 islands, F = [(SSY - SSE)/k] / [SSE/(n - k - 1)] = 56.0745, which matches the F-statistic in the model summary (56.07 on 1 and 6 DF).
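The F-statistic formula can be worked through in R and checked against `summary(fit)$fstatistic`. The sketch also recovers the p-value with `pf()` using the upper tail of the F distribution with k and n - k - 1 degrees of freedom.

```r
# Area (km^2) and native species richness, copied from the data above
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)
fit <- lm(Native ~ Area)

n <- length(Native)  # 8 islands
k <- 1               # one predictor (Area)

SSY <- sum((Native - mean(Native))^2)
SSE <- sum(residuals(fit)^2)
SSR <- SSY - SSE     # sum of squares explained by the model

# F = mean square for the model / mean square error
F_stat <- (SSR / k) / (SSE / (n - k - 1))
F_summary <- unname(summary(fit)$fstatistic["value"])

# Upper-tail probability gives the model p-value
p_value <- pf(F_stat, k, n - k - 1, lower.tail = FALSE)
print(c(F_stat, F_summary, p_value))
```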

Part 3: Richness on Area

ggplot(ChannelIslands) +
  geom_point(aes(x = Area, y = Native), color="forestgreen", shape=15, size = 2.5) +
  geom_smooth(aes(x = Area, y = Native), color="forestgreen", method="lm") +
  geom_point(aes(x = Area, y = Endemic), color="dodgerblue", shape=16, size = 2.5) +
  geom_smooth(aes(x = Area, y = Endemic), color="dodgerblue", method="lm") +
  geom_point(aes(x = Area, y = Exotic), color="firebrick1", shape=17, size = 2.5) +
  geom_smooth(aes(x = Area, y = Exotic), color="firebrick1", method="lm") +
  xlab("Area (km^2)") + ylab("Species Richness") +
  ggtitle("Native, Exotic, Endemic richness vs Area") +
  theme_gray()
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

Question 14: See code above.

Question 15: The slopes of the relationships differ by species type. Native species have the steepest positive slope, meaning native richness increases the most with area. Endemic and exotic richness also increase with area, but their slopes are shallower, so they increase more slowly as island area grows.
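The visual comparison of slopes can be made numeric. This sketch fits the three simple regressions at once by putting `cbind(Native, Endemic, Exotic)` on the left-hand side of `lm()`, which fits a separate regression on Area for each column; the `"Area"` row of the coefficient matrix then holds the three slopes.

```r
# Area (km^2) and richness by species type, copied from the data above
Area    <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native  <- c(88, 190, 198, 139, 272, 421, 387, 480)
Endemic <- c(14, 22, 18, 18, 47, 37, 42, 45)
Exotic  <- c(44, 75, 69, 131, 110, 185, 98, 170)

# One simple regression per response column, all sharing the predictor Area
multi_fit <- lm(cbind(Native, Endemic, Exotic) ~ Area)

# Slopes (species gained per additional km^2) for each species type
slopes <- coef(multi_fit)["Area", ]
print(round(slopes, 4))
```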

Part 4: Plotting Residuals onto Distance

# Store each island's residual from the Native ~ Area model fitted in Part 2
ChannelIslands$Residuals <- residuals(linear_regression)
residuals_part_four <- data.frame(Island = ChannelIslands$Island,
                                  Residuals = ChannelIslands$Residuals,
                                  Distance = ChannelIslands$Dist)

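The first step is to create a new data frame that pairs each island's residual from the native-richness model with its distance from the mainland.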

ggplot(residuals_part_four, aes(x = Distance, y = Residuals)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray") +
  xlab("Distance") + ylab("Residuals") +
  ggtitle("Residuals of Native/Area vs Distance")
## `geom_smooth()` using formula = 'y ~ x'

Question 16: Next, we plot the residuals against distance from the mainland; the code used is the ggplot() call shown above.

Question 17: The plot suggests that the residuals decrease (become more negative) as distance from the mainland increases. This means that islands farther from the mainland tend to have fewer native species than the area-based model predicts.

Question 18:

residuals_distance <- lm(Residuals ~ Distance, data = residuals_part_four)
summary(residuals_distance)
## 
## Call:
## lm(formula = Residuals ~ Distance, data = residuals_part_four)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.849 -18.320   8.098  15.904  29.724 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  71.3151    20.4815   3.482  0.01311 * 
## Distance     -1.4052     0.3621  -3.880  0.00817 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.57 on 6 degrees of freedom
## Multiple R-squared:  0.7151, Adjusted R-squared:  0.6676 
## F-statistic: 15.06 on 1 and 6 DF,  p-value: 0.008167

With a slope of about -1.4, the residual (observed native richness minus the area-based prediction) decreases by about 1.4 species for each additional unit of distance (Dist) from the mainland. The p-value (0.00817) is below 0.05, so this is a significant relationship. The multiple R-squared (0.7151) shows that about 72% of the variation in the residuals can be explained by distance from the mainland, versus other factors (about 67% by the adjusted R-squared, 0.6676).

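Question 19: Area explained about 90.3% of the total variance in native richness. Distance then explained about 71.5% of the remaining (residual) variance, so in this two-step approach area and distance together account for roughly 97% of the total variance (0.903 + 0.097 × 0.715 ≈ 0.972).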
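The two-step variance partition can be sketched in R: fit Native on Area, regress the residuals on Dist, and combine the two R-squared values. Because the residuals from the first model have mean zero, their total sum of squares equals the first model's SSE, so the combined share also equals 1 - SSE(step 2)/SSY. This is the sequential approach used in the lab. A joint multiple regression, `lm(Native ~ Area + Dist)`, fits both predictors together and would explain at least as much variance; it is not what the questions above use.

```r
# Area (km^2), distance from the mainland (Dist), and native richness, copied from the data above
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Dist   <- c(61, 20, 42, 98, 79, 32, 44, 30)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)

# Step 1: native richness on area
fit_area <- lm(Native ~ Area)
res <- residuals(fit_area)

# Step 2: the leftover (residual) richness on distance
fit_dist <- lm(res ~ Dist)

r2_area <- summary(fit_area)$r.squared
r2_dist <- summary(fit_dist)$r.squared

# Share of total variance explained by area plus distance in this two-step approach
combined <- r2_area + (1 - r2_area) * r2_dist

# Equivalent form: 1 - (step-2 error sum of squares) / SSY
combined_check <- 1 - sum(residuals(fit_dist)^2) / sum((Native - mean(Native))^2)
print(c(r2_area, r2_dist, combined))
```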