library(ggplot2)  # provides ggplot(), geom_point(), geom_smooth() used below
Island=c("Santa Barbara","Anacapa","San Miguel","San Nicolas","San Clemente","Santa Catalina","Santa Rosa","Santa Cruz")
Area=c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Dist=c(61, 20, 42, 98, 79, 32, 44, 30)
Native=c(88,190,198,139,272,421,387,480)
Endemic=c(14,22,18,18,47,37,42,45)
Exotic=c(44,75,69,131,110,185,98,170)
Total=c(132,265,267,270,382,604,484,650)
ChannelIslands=data.frame(Island, Area, Dist, Native, Endemic, Exotic, Total)
View(ChannelIslands)  # base R data viewer; works without the tidyverse loaded
rm(Island, Area, Dist, Native, Endemic, Exotic, Total)
ChannelIslands$Island <- factor(ChannelIslands$Island)
ggplot(ChannelIslands, aes(x = Area, y = Total)) +
geom_point(aes(color = Island, size = Area)) +
geom_smooth() +
xlab("Area (square km)") +
ylab("Total Species") +
ggtitle("Scatterplot of Total Species on Islands")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
The plot suggests that as island area increases, the total number of plant species increases too. This is evident in the positive association between the two variables.
The island with the fewest species is Santa Barbara, with only 132. It is also the smallest island in the dataset, with an area of only 2.6 km^2.
The island with the greatest number of species is Santa Cruz, with 650 total species. It has an area of 294 km^2, which makes it the largest island in the dataset.
The smooth line shows a positive relationship between area and total species: larger islands have more total species than smaller islands do.
The slope coefficient is 1.2376. This means the expected number of native species increases by about 1.24 for each additional 1 km^2 of island area.
On a hypothetical island with an area of 0 km^2, the model predicts about 125 native plant species (the intercept, 124.83).
The t-value for the slope is 7.488 and Pr(>|t|) is 0.000293. The p-value is smaller than 0.05, so the null hypothesis that there is no relationship between area and native richness is rejected. The relationship is statistically significant.
The Multiple R-squared value is 0.9033. This indicates a very strong linear relationship between island area and native species richness: 90.33% of the variation in native richness is explained by island area.
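The quantities quoted above can be pulled directly out of the fitted model rather than read off the printed summary. The following is a minimal sketch, assuming the same Area and Native values listed in the data at the top of this report; the object names (`fit`, `coefs`, `t_slope`, and so on) are illustrative and not part of the original analysis.

```r
# Data copied from the table at the top of this report
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)

fit   <- lm(Native ~ Area)
coefs <- coef(summary(fit))  # matrix with Estimate, Std. Error, t value, Pr(>|t|)

slope     <- coefs["Area", "Estimate"]
intercept <- coefs["(Intercept)", "Estimate"]

# The t-value is the estimate divided by its standard error
t_slope <- slope / coefs["Area", "Std. Error"]

# Two-sided p-value from the t distribution with the residual degrees of freedom
p_slope <- 2 * pt(-abs(t_slope), df = df.residual(fit))

r_squared <- summary(fit)$r.squared

round(c(slope = slope, intercept = intercept, t = t_slope,
        p = p_slope, r2 = r_squared), 6)
```

Computing the t-value and p-value by hand this way makes explicit where the numbers in the Coefficients table come from.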
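SSY = 142434.875
SSE = 13767.48
The proportionate reduction of SSE relative to SSY, (SSY - SSE) / SSY, is 0.9033, which matches the Multiple R-squared.
The value obtained is 56.07449. This corresponds to the F-statistic in the model summary (56.07 on 1 and 6 DF).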
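The sums of squares above can be reproduced in a few lines. This is a sketch under the assumption that SSY is the total sum of squares of Native around its mean and SSE is the residual sum of squares of the Native ~ Area model; the variable names are illustrative.

```r
# Data copied from the table at the top of this report
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)

fit <- lm(Native ~ Area)

# Total sum of squares: variation of Native around its mean
ssy <- sum((Native - mean(Native))^2)

# Error sum of squares: variation left over after fitting Area
sse <- sum(residuals(fit)^2)

# Proportionate reduction in error (equals Multiple R-squared)
pre <- (ssy - sse) / ssy

# F = (explained SS / 1 model df) / (SSE / residual df)
f_stat <- ((ssy - sse) / 1) / (sse / df.residual(fit))

c(SSY = ssy, SSE = sse, PRE = pre, F = f_stat)
```

With one predictor, the F-statistic is also the square of the slope's t-value, which is why both tests give the same p-value.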
Richness_plot <- ggplot(data=ChannelIslands) +
geom_point(mapping=aes(x=Area, y=Native), color="forestgreen", shape=15, size = 2.5) +
geom_smooth(mapping=aes(x=Area, y=Native), color="forestgreen") +
geom_point(mapping=aes(x=Area, y=Exotic), color="dodgerblue", shape=15, size = 2.5) +
geom_smooth(mapping=aes(x=Area, y=Exotic), color="dodgerblue") +
geom_point(mapping=aes(x=Area, y=Endemic), color="firebrick1", shape=15, size = 2.5) +
geom_smooth(mapping=aes(x=Area, y=Endemic), color="firebrick1") +
theme_gray()
Richness_plot
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
The slopes of the richness-area relationships for exotic (dodgerblue) and endemic (firebrick1) plant species do not differ much. Both are fairly flat, although the exotic line is slightly steeper, so the two diverge gently as island area increases. The native species (forestgreen) slope is much steeper than the other two.
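The visual comparison of slopes can be backed up with numbers. The plot uses loess smooths, so the sketch below swaps in a simple linear fit per group as a rough summary of steepness; that substitution is an assumption made here for illustration, and the names `islands` and `slopes` are not part of the original analysis.

```r
# Data copied from the table at the top of this report
islands <- data.frame(
  Area    = c(2.6, 2.9, 37, 58, 145, 194, 217, 294),
  Native  = c(88, 190, 198, 139, 272, 421, 387, 480),
  Endemic = c(14, 22, 18, 18, 47, 37, 42, 45),
  Exotic  = c(44, 75, 69, 131, 110, 185, 98, 170)
)

# Fit richness ~ Area separately for each group and keep only the slope
slopes <- sapply(c("Native", "Exotic", "Endemic"), function(group) {
  coef(lm(islands[[group]] ~ islands$Area))[[2]]
})

round(slopes, 4)  # species gained per additional square km, by group
```

A linear slope ignores any curvature the loess lines show, so it should be read only as a coarse ranking of the three groups.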
mymodel = lm(ChannelIslands$Native ~ ChannelIslands$Area)
summary(mymodel)
##
## Call:
## lm(formula = ChannelIslands$Native ~ ChannelIslands$Area)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.612 -34.226 -7.542 34.551 61.581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 124.8303 25.9310 4.814 0.002958 **
## ChannelIslands$Area 1.2376 0.1653 7.488 0.000293 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 47.9 on 6 degrees of freedom
## Multiple R-squared: 0.9033, Adjusted R-squared: 0.8872
## F-statistic: 56.07 on 1 and 6 DF, p-value: 0.0002931
ChannelIslands_P4=data.frame(ChannelIslands$Island, ChannelIslands$Dist, mymodel$residuals)
ggplot(ChannelIslands_P4) +
geom_point(mapping=aes(x=ChannelIslands.Dist, y=mymodel.residuals), color = "forestgreen", shape = 15, size = 2.5) +
geom_smooth(mapping=aes(x=ChannelIslands.Dist, y=mymodel.residuals), color="forestgreen") +
xlab("Channel Islands Distance") +
ylab("Residuals") +
ggtitle("Scatter of Residuals and Channel Islands Distance")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
The plot suggests a negative relationship between island distance and the residuals of the Area model. This means that more distant islands tend to have fewer native plant species than the Area model predicts, while nearer islands tend to have more.
P4_res_mod <- lm(mymodel.residuals ~ ChannelIslands.Dist, data = ChannelIslands_P4)
summary(P4_res_mod)
##
## Call:
## lm(formula = mymodel.residuals ~ ChannelIslands.Dist, data = ChannelIslands_P4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37.849 -18.320 8.098 15.904 29.724
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 71.3151 20.4815 3.482 0.01311 *
## ChannelIslands.Dist -1.4052 0.3621 -3.880 0.00817 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.57 on 6 degrees of freedom
## Multiple R-squared: 0.7151, Adjusted R-squared: 0.6676
## F-statistic: 15.06 on 1 and 6 DF, p-value: 0.008167
The model's summary shows a slope coefficient of -1.4052 and an intercept of 71.3151. The relationship is negative: for each 1-unit increase in distance, the expected residual decreases by 1.4052. At a distance of 0, the expected residual is 71.3151. The t-value for the slope is -3.880, the F-statistic is 15.06, and the Multiple R-squared value is 0.7151.
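To make the interpretation concrete, the fitted line can be evaluated at specific distances. The sketch below hard-codes the two coefficients from the summary output above; the function name `predicted_residual` and the name `break_even` are hypothetical helpers introduced only for this illustration.

```r
# Coefficients copied from the summary(P4_res_mod) output above
b0 <- 71.3151   # intercept: expected residual at distance 0
b1 <- -1.4052   # slope: change in expected residual per unit of distance

# Expected deviation from the Area model at a given distance
predicted_residual <- function(dist) b0 + b1 * dist

# Evaluate at the nearest (20) and farthest (98) distances in the data
predicted_residual(c(20, 98))

# Distance at which the expected residual is zero
break_even <- -b0 / b1
break_even
```

The break-even distance comes out at about 50.75, which is the mean of Dist. That is expected, because a least-squares line passes through the point of means and the residuals of the Area model average to zero. Islands closer than this are predicted to hold more native species than area alone suggests, and islands farther away are predicted to hold fewer.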
90.33% of the total variance in Native richness is explained by Area. That leaves 9.67% unexplained, and 71.51% of this remaining variance is explained by Distance.
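The two percentages can be combined into a share of the total variance. This is a short arithmetic sketch using the two R-squared values reported above; the variable names are illustrative. It describes the two-stage approach used here (Area first, then Distance on the residuals), which is not necessarily identical to what a single model with both predictors would report.

```r
# R-squared values copied from the two model summaries above
r2_area       <- 0.9033  # Native ~ Area
r2_dist_resid <- 0.7151  # residuals of the Area model ~ Distance

# Share of total variance left unexplained by Area
unexplained <- 1 - r2_area

# Share of total variance that Distance accounts for in the second stage
extra_from_dist <- unexplained * r2_dist_resid

# Share of total variance accounted for by the two stages together
combined <- r2_area + extra_from_dist

round(c(unexplained = unexplained,
        extra_from_dist = extra_from_dist,
        combined = combined), 4)
```

In other words, Distance accounts for roughly 7% of the total variance in native richness, which brings the two stages together to roughly 97%.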