Question 1

ggplot(ChannelIslands, aes(x = Area, y = Total)) +
  geom_point(aes(color = Area, size = Area)) +
  geom_smooth() +
  xlab("Island Area (km²)") +
  ylab("Total Plant Species") +
  ggtitle("Total Species on Land Area")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Question 2

The plot suggests a positive correlation between island land area and total plant species. As the size of the island increases, the number of plant species also increases.

Question 3

Santa Barbara has the fewest species. It is also the smallest island.

Question 4

Santa Cruz has the most species. It is also the largest island.

Question 5

The smoothed (loess) line rises with island area, indicating a positive and roughly linear relationship between the two variables.

Question 6

The slope coefficient is 1.2376, meaning each additional km² of land area is associated with about 1.24 more native species. Because the slope is positive, larger islands tend to have more native species.

Question 7

The model predicts around 125 native species on an island with an area of 0.0 km², because the intercept (the predicted value when Area = 0) is 124.8303.
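As a rough illustration of how the intercept and slope combine into a prediction, the sketch below plugs the two coefficients reported in Questions 6 and 7 into the regression equation by hand. The helper name `predict_native` and the 100 km² example island are hypothetical and used only for illustration.

```r
# Coefficients reported in Questions 6 and 7 (Native ~ Area)
intercept <- 124.8303
slope     <- 1.2376

# Hypothetical helper: predicted native species for a given area in km^2
predict_native <- function(area_km2) {
  intercept + slope * area_km2
}

predict_native(0)    # equals the intercept, 124.8303
predict_native(100)  # hypothetical 100 km^2 island: 124.8303 + 1.2376 * 100
```

If the fitted model object is available, `predict()` with a `newdata` data frame should give the same values without copying coefficients by hand.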

Question 8

The t-value is positive, which matches the positive slope: larger land area is associated with more species. The Pr(>|t|) is less than 0.05, so the null hypothesis that the slope is zero (no linear relationship between land area and number of species) can be rejected.

Question 9

The R-squared value is 0.9033, meaning 90.33% of the variation in native species is explained by land area. The remaining 9.67% is left unexplained by the model and may be due to other factors.

Question 10

The total sum of squares is 142434.9.

Question 11

The value of the error sum of squares is 13767.48.

Question 12

The proportionate reduction of SSE relative to SSY is 0.9033419.

Question 13

The value is 56.07448 which corresponds to the F-statistic.
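The values in Questions 10 through 13 are linked by simple arithmetic, which the sketch below reproduces from the two sums of squares reported above. It assumes 8 islands and one predictor, so 6 residual degrees of freedom, as the summary output in Question 18 implies for this data set; the variable names are illustrative.

```r
# Sums of squares reported in Questions 10 and 11
ssy <- 142434.9   # total sum of squares
sse <- 13767.48   # error sum of squares

# Assumed degrees of freedom: 1 predictor, 8 islands - 2 parameters = 6
df_model <- 1
df_error <- 6

# Proportionate reduction in error (equals R-squared)
pre <- (ssy - sse) / ssy

# F-statistic: explained mean square over error mean square
f_stat <- ((ssy - sse) / df_model) / (sse / df_error)

pre     # about 0.9033
f_stat  # about 56.07
```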

Question 14

ggplot(data=ChannelIslands) +
  geom_point(mapping=aes(x=Area, y=Native), color="forestgreen", shape=15, size=2.5) +
  geom_smooth(mapping=aes(x=Area, y=Native), color="forestgreen") +
  geom_point(mapping=aes(x=Area, y=Endemic), color="dodgerblue", shape=16, size=2.5) +
  geom_smooth(mapping=aes(x=Area, y=Endemic), color="dodgerblue") +
  geom_point(mapping=aes(x=Area, y=Exotic), color="firebrick1", shape=17, size=2.5) +
  geom_smooth(mapping=aes(x=Area, y=Exotic), color="firebrick1") +
  theme_gray()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Question 15

Native species have the steepest positive slope, exotic species have a more moderate positive slope, and endemic species have the shallowest slope.

Question 16

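# mymodel is assumed to be the Native ~ Area regression fit earlier
Residuals <- mymodel$residuals
# Distance of each island from the mainland
Distance <- ChannelIslands$Dist
Island <- ChannelIslands$Island
# One row per island: name, residual from the area model, distance
ResidData <- data.frame(Island, Residuals, Distance)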

ggplot(data=ResidData, aes(x=Distance, y=Residuals)) +
  geom_point() +
  geom_smooth(method="lm")
## `geom_smooth()` using formula = 'y ~ x'

The plot shows the relationship between the residuals of the native-species-on-area model and each island's distance from the mainland. The negative slope of the fitted line shows that the farther an island is from the mainland, the fewer native species it has relative to what its area predicts. The t value and p value of the fitted regression indicate a statistically significant relationship, and the R-squared shows that about 71.51% of the variation in the residuals is explained by distance from the mainland.

Question 17

The model suggests a negative relationship between the residuals and island distance from the mainland. As distance from the mainland increases, observed native richness falls further below the richness predicted by the model of native species on area.

Question 18

resid_model <- lm(Residuals ~ Distance, data=ResidData)
summary(resid_model)
## 
## Call:
## lm(formula = Residuals ~ Distance, data = ResidData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.849 -18.320   8.098  15.904  29.724 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  71.3151    20.4815   3.482  0.01311 * 
## Distance     -1.4052     0.3621  -3.880  0.00817 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.57 on 6 degrees of freedom
## Multiple R-squared:  0.7151, Adjusted R-squared:  0.6676 
## F-statistic: 15.06 on 1 and 6 DF,  p-value: 0.008167

The t value is -3.880 and the Pr(>|t|) is 0.00817. Together they indicate a statistically significant relationship, and the R-squared shows that about 71.51% of the variation in the residuals is explained by distance from the mainland. The slope is -1.4052, showing a negative relationship between the two variables.
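For readers who want to see where the t value, p-value, and F-statistic in the summary come from, the sketch below recomputes them from the Distance estimate, its standard error, and the 6 residual degrees of freedom printed above. Small differences from the printed output are expected because the inputs are rounded; the variable names are illustrative.

```r
# Values copied from the Distance row of the summary output
estimate  <- -1.4052
std_error <- 0.3621
df_resid  <- 6

# t value: estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# With a single predictor, the F-statistic is the square of the slope's t value
f_value <- t_value^2

t_value  # about -3.88
p_value  # about 0.008
f_value  # about 15.06
```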

Question 19

90.33% of total variance in Native Richness was explained by Area and 71.51% of the remaining variance was explained by Distance.
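The two percentages can be combined into a single share of the total variance with a short calculation, sketched below using the rounded R-squared values from Questions 9 and 18. This describes the two-step procedure used here (area first, then distance on the residuals); a single multiple regression on both predictors could give a somewhat different R-squared.

```r
# R-squared values reported in Questions 9 and 18
r2_area     <- 0.9033  # Native ~ Area
r2_distance <- 0.7151  # Residuals ~ Distance

# Share of total variance left over after accounting for area
remaining <- 1 - r2_area

# Share of total variance that distance explains out of that remainder
extra_from_distance <- r2_distance * remaining

# Combined share explained by the two steps
total_explained <- r2_area + extra_from_distance

extra_from_distance  # about 0.069
total_explained      # about 0.972
```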