ggplot(ChannelIslands, aes(x = Area, y = Total)) +
  geom_point(aes(color = Dist, size = Area)) +
  geom_smooth() +
  xlab("Island Area") +
  ylab("Total Species") +
  ggtitle("Scatterplot of Total Species on Island Area")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
This plot suggests a roughly linear, positive relationship between island area and plant species richness: as Island Area increases, so does the number of Total Species.
Santa Barbara has the fewest species. It is also the smallest island, at 2.6 km^2.
Santa Cruz has the most species. It is the largest island, at 294 km^2.
The smooth, relatively straight shape of the loess line suggests that the relationship between Island Area and Total Species is approximately linear.
nativeislandspecies <- lm(Native ~ Area, data = ChannelIslands)
summary(nativeislandspecies)
##
## Call:
## lm(formula = Native ~ Area, data = ChannelIslands)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.612 -34.226 -7.542 34.551 61.581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 124.8303 25.9310 4.814 0.002958 **
## Area 1.2376 0.1653 7.488 0.000293 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 47.9 on 6 degrees of freedom
## Multiple R-squared: 0.9033, Adjusted R-squared: 0.8872
## F-statistic: 56.07 on 1 and 6 DF, p-value: 0.0002931
The slope coefficient tells us that for every 1 km^2 increase in island area, the model predicts about 1.24 more native species.
On an island with an area of 0 km^2, the model predicts 124.8303 native species. This intercept is an extrapolation below the smallest island in the data (2.6 km^2), so it is best read as the baseline of the fitted line rather than as a biological prediction.
Based on the t-value (7.488) and Pr(>|t|) (0.000293) for the slope term, we can conclude that island area is a strong and statistically significant predictor of native species richness.
The model explains 90.3% of the variation in native species richness across the islands (Multiple R-squared = 0.9033).
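The quantities quoted above can be recomputed from the coefficient table. The sketch below uses only numbers printed in the summary output and the two island areas mentioned earlier; the variable names are illustrative, and small differences from the printed values come from rounding of the inputs.

```r
# Values copied from the summary(nativeislandspecies) output
intercept <- 124.8303
slope     <- 1.2376
slope_se  <- 0.1653
resid_df  <- 6

# t value = estimate / standard error
t_slope <- slope / slope_se

# Two-sided p-value from the t distribution with 6 degrees of freedom
p_slope <- 2 * pt(-abs(t_slope), df = resid_df)

# Predicted native richness for the smallest (2.6 km^2) and
# largest (294 km^2) islands: intercept + slope * area
area <- c(2.6, 294)
predicted_native <- intercept + slope * area

t_slope
p_slope
predicted_native
```

With the fitted model object available, `predict(nativeislandspecies, newdata = data.frame(Area = c(2.6, 294)))` would give the same predictions without retyping the coefficients.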
SSY <- sum((ChannelIslands$Total - mean(ChannelIslands$Total))^2 )
SSY
## [1] 233469.5
SSE <- sum(resid(nativeislandspecies)^2)
SSE
## [1] 13767.48
PR <- 1 - (SSE / SSY)
PR
## [1] 0.9410309
n <- (SSY - SSE)/(SSE/6)
n
## [1] 95.74824
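This calculation follows the formula for the F-statistic, (SSY - SSE) / (SSE / 6), and PR follows the formula for R-squared. However, the values obtained here (F = 95.75, PR = 0.941) do not match the summary output (F = 56.07, Multiple R-squared = 0.9033). The reason is that SSY was computed from Total, while the model's response variable is Native. SSY has to be computed from ChannelIslands$Native for these quantities to reproduce the F-statistic and R-squared of the model.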
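As a check on the sums-of-squares arithmetic, here is a minimal sketch on a stand-in dataset. It uses `mtcars`, which ships with R and is not the Channel Islands data, so the numbers themselves are irrelevant. The point is that when SSY and SSE are both computed from the model's own response variable, the manual values agree with what `summary()` reports.

```r
# Stand-in data: mtcars ships with R; it is NOT the Channel Islands data
fit <- lm(mpg ~ wt, data = mtcars)

# Total and residual sums of squares, both based on the model's response (mpg)
ssy <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
sse <- sum(resid(fit)^2)

# Proportion of variance explained; should equal Multiple R-squared
r2_manual <- 1 - sse / ssy

# F = (explained SS / 1) / (residual SS / residual df) for one predictor
f_manual <- (ssy - sse) / (sse / df.residual(fit))

# Both comparisons return TRUE
all.equal(r2_manual, summary(fit)$r.squared)
all.equal(f_manual, unname(summary(fit)$fstatistic["value"]))
```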
CI_long <- ChannelIslands %>%
  pivot_longer(
    cols = c(Native, Endemic, Exotic),
    names_to = "Type",
    values_to = "Richness"
  )
ggplot(CI_long, aes(Area, Richness, color = Type, shape = Type)) +
  geom_point(size = 2.5) +
  geom_smooth(se = FALSE) +
  scale_color_manual(values = c(
    Native = "forestgreen",
    Endemic = "dodgerblue",
    Exotic = "firebrick1"
  )) +
  theme_gray()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
The slope of the relationship between richness and area differs across species types. The slope for native species is the steepest, indicating that the number of native species increases substantially as island area increases. In contrast, both exotic and endemic species show a much weaker relationship with island area. In other words, the number of those species stays roughly the same regardless of the size of the island.
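The comparison of slopes can also be made numerically rather than by eye, by fitting one linear model per group and extracting each slope. The sketch below is an illustration on a stand-in dataset: `iris` ships with R and is not the Channel Islands data. Its `Species` column plays the role of Type, `Petal.Length` the role of Area, and `Sepal.Length` the role of Richness.

```r
# Stand-in data: iris ships with R; it is NOT the Channel Islands data.
# Split the data by group, fit a separate line in each group,
# and keep only the slope of each fit.
slopes <- sapply(
  split(iris, iris$Species),
  function(d) coef(lm(Sepal.Length ~ Petal.Length, data = d))[["Petal.Length"]]
)

# Named numeric vector: one slope per group
slopes
```

Applied to `CI_long`, the analogous call would split on `Type` and fit `Richness ~ Area` within each group.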
res_df <- ChannelIslands %>%
  mutate(Residual = resid(nativeislandspecies)) %>%
  select(Island, Residual, Dist)
ggplot(res_df, aes(x = Dist, y = Residual)) +
  geom_point() +
  geom_smooth(se = TRUE) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    x = "Distance",
    y = "Residuals",
    title = "Residuals vs Distance"
  ) +
  theme_minimal()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
The residual plot shows a downward trend: islands located farther from the mainland tend to support fewer native species than the species-area model predicts, while islands closer to the mainland tend to support more.
With only eight islands, the plot alone cannot establish how reliable this trend is. The regression of the residuals on distance below tests it directly.
resid_model <- lm(Residual ~ Dist, data = res_df)
summary(resid_model)
##
## Call:
## lm(formula = Residual ~ Dist, data = res_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37.849 -18.320 8.098 15.904 29.724
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 71.3151 20.4815 3.482 0.01311 *
## Dist -1.4052 0.3621 -3.880 0.00817 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.57 on 6 degrees of freedom
## Multiple R-squared: 0.7151, Adjusted R-squared: 0.6676
## F-statistic: 15.06 on 1 and 6 DF, p-value: 0.008167
This regression produced a negative slope (Estimate = -1.4052): for each additional unit of distance from the mainland, the residual decreases by about 1.41 species. This means that, after accounting for island area, islands that are farther from the mainland tend to have fewer native species than expected. The t-value for the slope term (t = -3.88) shows that the estimate lies almost four standard errors below zero.
The p-value for Distance is 0.00817, so distance is a statistically significant predictor of the residual variation in native richness. This suggests that isolation improves the model beyond what island area alone explains.
Distance explains about 71.5% of the variation in native species richness that remains after area has been accounted for. Because the model has a single predictor, the overall model p-value (0.008167) is the same test as the slope p-value and likewise indicates statistical significance.
Island area explains about 90.3% of the total variance in native species richness. Distance from the mainland explains 71.5% of the remaining variance, which corresponds to about 6.9% of the original total variance.
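The 6.9% figure follows from multiplying the share of variance left unexplained by area by the share of that remainder explained by distance. A minimal sketch of the arithmetic, using only the two R-squared values printed above (the variable names are illustrative):

```r
r2_area <- 0.9033  # Multiple R-squared of Native ~ Area
r2_dist <- 0.7151  # Multiple R-squared of Residual ~ Dist

# Share of the total variance left unexplained by area
unexplained_by_area <- 1 - r2_area

# Share of the total variance picked up by distance in the second step
share_from_dist <- unexplained_by_area * r2_dist

# Total share explained by the two steps together
combined <- r2_area + share_from_dist

round(c(unexplained = unexplained_by_area,
        distance    = share_from_dist,
        combined    = combined), 3)
```

This is a two-step decomposition. A single model with both predictors, such as `lm(Native ~ Area + Dist, data = ChannelIslands)`, estimates the two effects jointly and could split the variance somewhat differently.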