Part 1: Plotting

Problem 1

library(tidyverse)  # ggplot2 for plotting; dplyr and tidyr are used in later problems

ggplot(ChannelIslands, aes(x = Area, y = Total)) +
  geom_point(aes(color = Dist, size = Area)) +
  geom_smooth() +
  xlab("Island Area (km^2)") +
  ylab("Total Species") +
  ggtitle("Scatterplot of Total Species on Island Area")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Problem 2

This plot suggests a positive, roughly linear relationship between island area and plant species richness: as island area increases, the total number of species tends to increase as well.

Problem 3

Santa Barbara Island has the fewest species. It is also the smallest island, at 2.6 km^2.

Problem 4

Santa Cruz Island has the most species. It is also the largest island, at 294 km^2.

Problem 5

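The loess smoother is fairly straight across the range of island areas, which suggests that the relationship between island area and total species is approximately linear. With only eight islands, this visual impression is a rough check rather than a formal test of linearity.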

Part 2: Linear Regression

nativeislandspecies = lm(Native ~ Area, data = ChannelIslands)
summary(nativeislandspecies)
## 
## Call:
## lm(formula = Native ~ Area, data = ChannelIslands)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.612 -34.226  -7.542  34.551  61.581 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 124.8303    25.9310   4.814 0.002958 ** 
## Area          1.2376     0.1653   7.488 0.000293 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 47.9 on 6 degrees of freedom
## Multiple R-squared:  0.9033, Adjusted R-squared:  0.8872 
## F-statistic: 56.07 on 1 and 6 DF,  p-value: 0.0002931

Problem 6

The slope coefficient (1.2376) means that each additional km^2 of island area is associated with about 1.24 more native species, on average.

Problem 7

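The intercept (124.8303) is the model's predicted native species richness for an island of 0 km^2. An island cannot have zero area, and the smallest island in the data is 2.6 km^2, so this value is an extrapolation that mainly anchors the fitted line rather than a meaningful prediction.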
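To make the two coefficients concrete, the fitted line can be evaluated by hand. This is a minimal sketch that copies the estimates from the summary output above; `predict_native` is a hypothetical helper, and the areas 100 and 101 km^2 are arbitrary illustrative inputs, not islands in the data.

```r
# Coefficients copied from summary(nativeislandspecies)
b0 <- 124.8303  # intercept: predicted Native richness at Area = 0 (an extrapolation)
b1 <- 1.2376    # slope: change in predicted Native richness per km^2 of Area

# Hypothetical helper: predicted native richness from the fitted line
predict_native <- function(area) b0 + b1 * area

# Illustrative areas only, not real islands
predict_native(c(0, 100, 101))

# One extra km^2 of area changes the prediction by exactly the slope
diff(predict_native(c(100, 101)))
```

At an area of 0 the prediction is just the intercept, and any 1 km^2 step adds `b1` species regardless of where on the line it happens, which is what "linear" means here.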

Problem 8

The slope's t value (7.488) and p-value (0.000293) show that island area is a statistically significant predictor of native species richness: the slope estimate lies about 7.5 standard errors above zero, which would be very unlikely if area had no linear relationship with native richness.
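The t value and p-value in the coefficient table come from the estimate and its standard error. A short sketch, using the rounded numbers from the summary output and the 6 residual degrees of freedom (8 islands minus 2 coefficients), reproduces them up to rounding:

```r
# Slope estimate and standard error from summary(nativeislandspecies)
est      <- 1.2376
se       <- 0.1653
df_resid <- 6   # 8 islands minus 2 estimated coefficients

t_value <- est / se   # about 7.49
# Two-sided p-value from the t distribution with 6 degrees of freedom
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

c(t = t_value, p = p_value)
```

Small differences from the printed table are expected because `est` and `se` are rounded to four decimals here.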

Problem 9

The model explains about 90.3% of the variation in native species richness across the islands (Multiple R-squared = 0.9033).

Problem 10

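SSY <- sum((ChannelIslands$Native - mean(ChannelIslands$Native))^2)
SSY

SSY must be computed from `Native`, the response variable of `nativeislandspecies`, so that it describes the same variable as the SSE in Problem 11. Computing it from `Total` gives 233469.5, which is the variation in a different variable. From the reported SSE (13767.48) and R-squared (0.9033), SSY for `Native` should be roughly 13767.48 / (1 - 0.9033) ≈ 142,400.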

Problem 11

SSE <- sum(resid(nativeislandspecies)^2)
SSE
## [1] 13767.48

Problem 12

PR <- 1 - (SSE / SSY)
PR
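
With SSY computed from `Native`, the response of `nativeislandspecies`, this proportion reproduces the Multiple R-squared reported by `summary(nativeislandspecies)`, 0.9033. With SSY computed from `Total`, it comes out as 0.941 instead, because SSE and SSY then measure variation in two different variables.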

Problem 13

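F_stat <- ((SSY - SSE) / 1) / (SSE / 6)  # 1 model df; 6 residual df (8 islands - 2 coefficients)
F_stat

This ratio is the F-statistic. With SSY computed from `Native`, it reproduces F = 56.07 on 1 and 6 DF from `summary(nativeislandspecies)`; with SSY computed from `Total`, it gives 95.75 instead, which does not match.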
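The identities behind Problems 10 to 13 can be checked on any simple regression. The sketch below uses simulated data (the values of `area`, `native`, and `total` are invented stand-ins, not the ChannelIslands measurements) and confirms that 1 - SSE/SSY and (SSY - SSE)/(SSE/df) match what `summary()` reports, provided SSY is taken from the same response the model was fit to.

```r
set.seed(1)

# Hypothetical stand-in for ChannelIslands: 8 simulated islands (invented values)
area   <- runif(8, min = 1, max = 300)
native <- 120 + 1.2 * area + rnorm(8, sd = 40)
total  <- native + rpois(8, lambda = 60)   # a second, different response

fit <- lm(native ~ area)

SSY      <- sum((native - mean(native))^2)  # total sum of squares of the fitted response
SSE      <- sum(resid(fit)^2)               # residual sum of squares
df_resid <- df.residual(fit)                # n - 2 = 6

R2     <- 1 - SSE / SSY
F_stat <- (SSY - SSE) / (SSE / df_resid)

s <- summary(fit)
c(R2 = R2, summary_R2 = s$r.squared,
  F = F_stat, summary_F = unname(s$fstatistic["value"]))

# Mixing responses: SSE from the 'native' model over SSY of 'total'
# no longer corresponds to anything summary(fit) reports
1 - SSE / sum((total - mean(total))^2)
```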

Part 3: Richness on Area by Species

Problem 14

CI_long <- ChannelIslands %>%
  pivot_longer(
    cols = c(Native, Endemic, Exotic),
    names_to = "Type",
    values_to = "Richness"
  )

ggplot(CI_long, aes(Area, Richness, color = Type, shape = Type)) +
  geom_point(size = 2.5) +
  geom_smooth(se = FALSE) +
  scale_color_manual(values = c(
    Native = "forestgreen",
    Endemic = "dodgerblue",
    Exotic = "firebrick1"
  )) +
  theme_gray()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Problem 15

The slope of the richness-area relationship differs across species types. Native species have the steepest slope: native richness rises substantially as island area increases. Endemic and exotic species show much flatter curves, so their richness changes comparatively little across islands of different sizes.
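The comparison above is visual. One common way to test whether the slopes really differ by type is an interaction model, `Richness ~ Area * Type`, which fits a separate linear slope for each type. The sketch below runs it on simulated long-format data (`sim_long` and the slopes in `true_slope` are invented for illustration); on the real data the analogous call would presumably be `lm(Richness ~ Area * Type, data = CI_long)`. Note that this uses straight lines rather than the loess curves in the plot.

```r
set.seed(2)

# Hypothetical long-format data mimicking CI_long: one row per island x Type
area <- rep(runif(8, min = 1, max = 300), times = 3)
type <- factor(rep(c("Native", "Endemic", "Exotic"), each = 8))
true_slope <- c(Native = 1.2, Endemic = 0.05, Exotic = 0.2)   # invented values
richness <- 20 + true_slope[as.character(type)] * area + rnorm(24, sd = 10)
sim_long <- data.frame(Area = area, Type = type, Richness = richness)

# Area * Type fits a separate intercept and slope for each Type.
# The Area:Type terms are slope differences relative to the baseline
# level (Endemic, the first level alphabetically).
fit_int <- lm(Richness ~ Area * Type, data = sim_long)
coef(fit_int)

# F-test: does allowing type-specific slopes improve on a common slope?
anova(lm(Richness ~ Area + Type, data = sim_long), fit_int)
```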

Part 4: Residuals on Distance

Problem 16

res_df <- ChannelIslands %>%
  mutate(Residual = resid(nativeislandspecies)) %>%
  select(Island, Residual, Dist)

ggplot(res_df, aes(x = Dist, y = Residual)) +
  geom_point() + 
  geom_smooth(se = TRUE) + 
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    x = "Distance",
    y = "Residuals",
    title = "Residuals vs Distance"
  ) +
  theme_minimal()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Islands farther from the mainland tend to have lower native species richness than the area-only model predicts, i.e. their residuals tend to be negative. The plot by itself cannot show whether this trend is more than noise; the regression of the residuals on distance in Problem 18 tests it formally.

Problem 17

The residual plot shows a downward trend: islands located farther from the mainland tend to support fewer native species than the species-area model predicts. The regression in Problem 18 finds this slope statistically significant (p = 0.00817), so distance does help explain deviations from the modeled richness. With only eight islands, however, both the smoother and the slope rest on few points and should be interpreted with some caution.

Problem 18

resid_model <- lm(Residual ~ Dist, data = res_df)
summary(resid_model)
## 
## Call:
## lm(formula = Residual ~ Dist, data = res_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.849 -18.320   8.098  15.904  29.724 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  71.3151    20.4815   3.482  0.01311 * 
## Dist         -1.4052     0.3621  -3.880  0.00817 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.57 on 6 degrees of freedom
## Multiple R-squared:  0.7151, Adjusted R-squared:  0.6676 
## F-statistic: 15.06 on 1 and 6 DF,  p-value: 0.008167

This regression produced a negative slope (Estimate = -1.4052): after accounting for island area, islands farther from the mainland tend to have fewer native species than the area-only model predicts, about 1.4 fewer per unit increase in `Dist`. The t value (t = -3.88) means the slope estimate lies almost four standard errors below zero.

The p-value for `Dist` is 0.00817, so distance is a statistically significant predictor of the residual variation in native richness. This suggests that isolation explains variation in native richness beyond what island area alone accounts for.

Distance explains about 71.5% of the variation in the residuals (Multiple R-squared = 0.7151). Because this model has a single predictor, the overall F-test (p = 0.008167) is equivalent to the slope's t-test (F = t^2), so it confirms the same conclusion rather than adding independent evidence.
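For a regression with one predictor, the F-statistic is the square of the slope's t value, and R-squared can be recovered as F / (F + residual df). A quick check with the rounded numbers from the `resid_model` summary:

```r
# Slope estimate and standard error from summary(resid_model)
est      <- -1.4052
se       <- 0.3621
df_resid <- 6

t_value <- est / se                       # about -3.88
F_stat  <- t_value^2                      # about 15.06, the reported F-statistic
R2      <- F_stat / (F_stat + df_resid)   # about 0.715, the reported Multiple R-squared

# The two-sided t-test and the F-test give the same p-value
p_slope <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)
p_F     <- pf(F_stat, df1 = 1, df2 = df_resid, lower.tail = FALSE)

c(t = t_value, F = F_stat, R2 = R2, p_slope = p_slope, p_F = p_F)
```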

Problem 19

Island area explains about 90.3% of the total variance in native species richness. Distance from the mainland explains 71.5% of the remaining 9.7%, which is about 0.715 × 0.097 ≈ 6.9% of the original total variance.
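The variance split in Problem 19 is a two-line calculation from the two reported R-squared values. One caveat: this two-step approach (regressing the area-model residuals on `Dist`) is not the same as fitting `lm(Native ~ Area + Dist)`; that joint fit would explain at least as much variance as the combined figure below, since the two-step fitted values are one of the linear combinations it can choose.

```r
r2_area <- 0.9033   # Multiple R-squared, Native ~ Area
r2_dist <- 0.7151   # Multiple R-squared, Residual ~ Dist

unexplained <- 1 - r2_area             # share left after area: 0.0967
share_dist  <- r2_dist * unexplained   # about 0.069 of the original variance

c(area = r2_area, distance = share_dist, combined = r2_area + share_dist)
```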