Part 1

Question 1

# Constant color and size go outside aes() so they are fixed settings, not mapped variables
ggplot(ChannelIslands, aes(x = Area, y = Total)) +
  geom_point(color = "blue", size = 2) +
  geom_smooth() +
  xlab("Area") +
  ylab("Total") +
  ggtitle("Total Number of Species by Island Area")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Question 2

The plot suggests that larger islands foster greater numbers of species.

Question 3

The island with the fewest species is Santa Barbara, which is also the smallest island by land area of the Channel Islands.

Question 4

The island with the most species is Santa Cruz, which is also the largest island by land area of the Channel Islands.

Question 5

The smooth line indicates a positive, roughly linear relationship between island area and the number of species.

Part 2

Question 6

The slope tells us that the number of native species tends to increase as islands get larger.

Question 7

The model predicts about 125 native plant species on an island with an area of 0 km².

Question 8

I would conclude that area is a statistically significant predictor of the number of native species on an island.

Question 9

Based on the R² value, I would say that the model is a good predictor of native species richness: area explains about 90% of the variance.

Question 10

The SSY value is 142,434.875.

Question 11

The SSE value is 13,767.4817.

Question 12

The proportionate reduction of SSE relative to SSY, (SSY - SSE) / SSY, is 0.9033. The figure 10.34575 is the ratio SSY / SSE, not the proportionate reduction.
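
Both quantities that can be formed from these two sums of squares are easy to check by hand. A minimal sketch, using only the SSY and SSE values reported in Questions 10 and 11 (the variable names are illustrative, not taken from the original analysis):

```r
# Sums of squares reported in Questions 10 and 11
ssy <- 142434.875   # total sum of squares
sse <- 13767.4817   # residual (error) sum of squares

# Proportionate reduction in error: the share of SSY that the model removes
pre <- (ssy - sse) / ssy

# Ratio of total to residual variation, shown for comparison
ratio <- ssy / sse

print(round(pre, 4))
print(round(ratio, 5))
```

The proportionate reduction is bounded between 0 and 1 and equals the model's R², whereas the ratio has no upper bound.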

Question 13

The value in question is 56.07448, which matches the model's F-statistic.
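
One way to see where this value comes from is to rebuild the F-statistic from the sums of squares. The sketch below assumes a simple regression with one predictor and 6 residual degrees of freedom (8 islands, matching the degrees of freedom in the Question 18 output). These degrees of freedom are an assumption, since the summary of NativeAreaModel is not shown here.

```r
# Sums of squares reported in Questions 10 and 11
ssy <- 142434.875
sse <- 13767.4817

df_model <- 1   # one predictor (Area)
df_resid <- 6   # assumed: 8 islands minus 2 estimated coefficients

# Mean squares for the model and for the residuals
msr <- (ssy - sse) / df_model
mse <- sse / df_resid

# F-statistic: explained variation per model df over unexplained variation per residual df
f_stat <- msr / mse
print(f_stat)
```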

Part 3

Question 14

ggplot(data=ChannelIslands) +
  geom_point(mapping=aes(x=Area, y=Native), color="forestgreen", shape=15, size = 2.5) +
  geom_smooth(mapping=aes(x=Area, y=Native), color="forestgreen") +
  geom_point(mapping=aes(x=Area, y=Endemic), color="dodgerblue", shape=16, size = 2.5) +
  geom_smooth(mapping=aes(x=Area, y=Endemic), color="dodgerblue") +
  geom_point(mapping=aes(x=Area, y=Exotic), color="firebrick1", shape=17, size = 2.5) +
  geom_smooth(mapping=aes(x=Area, y=Exotic), color="firebrick1") +
  xlab("Area") +
  ylab("Number of a Species Type") +
  ggtitle("Species Richness by Island Area") +
  theme_gray()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Question 15

The slope for native species is rather large, suggesting a pronounced relationship between the number of native species and island area. The slopes for endemic and exotic species are rather small, however, suggesting a weaker or nonexistent relationship between the number of endemic or exotic species and island area.

Part 4

Question 16

# Pair each residual from the native ~ area model with its island name and distance
NAM_resid <- residuals(NativeAreaModel)
CI_islands <- ChannelIslands$Island
CI_dist <- ChannelIslands$Dist
NAM_ResidFrame <- data.frame(NAM_resid, CI_islands, CI_dist)

# Constant color and size go outside aes() so they are fixed settings, not mapped variables
ggplot(NAM_ResidFrame, aes(x = CI_dist, y = NAM_resid)) +
  geom_point(color = "purple", size = 2) +
  geom_smooth() +
  xlab("Distance") +
  ylab("Residuals") +
  ggtitle("Residuals by Distance")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

The plot shows that islands farther from the mainland tend to have lower residuals.

Question 17

The plot suggests a negative relationship between distance from the mainland and the deviation of native species richness from the value predicted by area.

Question 18

ResidDistModel <- lm(NAM_resid ~ CI_dist)
summary(ResidDistModel)
## 
## Call:
## lm(formula = NAM_resid ~ CI_dist)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.849 -18.320   8.098  15.904  29.724 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  71.3151    20.4815   3.482  0.01311 * 
## CI_dist      -1.4052     0.3621  -3.880  0.00817 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.57 on 6 degrees of freedom
## Multiple R-squared:  0.7151, Adjusted R-squared:  0.6676 
## F-statistic: 15.06 on 1 and 6 DF,  p-value: 0.008167

The slope suggests a negative relationship between distance and the residuals of the native-area model: the residual decreases by about 1.4 species for each additional km from the mainland. The t-statistic and Pr(>|t|) indicate that the relationship is statistically significant; if there were no true relationship, a slope this extreme would be expected less than 1% of the time. The R² tells us that 71.51% of the variance in the residuals, that is, the variation in native richness left unexplained by area, can be explained by distance from the mainland.
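
The test statistics in the summary table are tied together by simple arithmetic, which can be checked from the printed coefficients. A rough sketch, using the rounded estimate and standard error for CI_dist from the output above (so the results agree with the table only up to rounding; the variable names are illustrative):

```r
# Rounded values copied from the CI_dist row of the summary output
estimate  <- -1.4052
std_error <- 0.3621
df_resid  <- 6

# t-statistic: the estimate measured in standard-error units
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with 6 degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# With a single predictor, the overall F-statistic is the square of the slope's t-statistic
f_value <- t_value^2

print(c(t = t_value, p = p_value, F = f_value))
```

This is why the p-value on the F-statistic line matches the p-value for the slope.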

Question 19

Area explains 90.33% of the total variance in native richness. Distance explains 71.51% of the remaining 9.67%, or about 6.91% of the total variance.
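
The partition of variance can be worked through step by step. A minimal sketch, using the two R² values reported in this write-up (0.9033 for the area model, 0.7151 for the residual-on-distance model); the variable names are illustrative:

```r
r2_area       <- 0.9033   # share of total variance explained by area
r2_dist_resid <- 0.7151   # share of the leftover variance explained by distance

# Variance left unexplained after accounting for area
remaining <- 1 - r2_area

# Distance's contribution expressed as a share of the total variance
share_dist <- remaining * r2_dist_resid

# Combined share explained by area and then distance
total_explained <- r2_area + share_dist

print(round(100 * c(remaining, share_dist, total_explained), 2))
```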