library(ggplot2)  # ggplot(), geom_point(), geom_smooth()
library(plotly)   # ggplotly()
Island=c("Santa Barbara","Anacapa","San Miguel","San Nicolas","San Clemente","Santa Catalina","Santa Rosa","Santa Cruz")
Area=c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Dist=c(61, 20, 42, 98, 79, 32, 44, 30)
Native=c(88,190,198,139,272,421,387,480)
Endemic=c(14,22,18,18,47,37,42,45)
Exotic=c(44,75,69,131,110,185,98,170)
Total=c(132,265,267,270,382,604,484,650)
ChannelIslands = data.frame(Island, Area, Dist, Native, Endemic, Exotic, Total)
rm(Island, Area, Dist, Native, Endemic, Exotic, Total)
ChannelIslands$Island <- factor(ChannelIslands$Island)
ggplot(ChannelIslands, aes(x=Area,y=Total)) +
geom_point(aes(color=Area, size=Total))+
geom_smooth()+
xlab("Island Area")+
ylab("Total Species")+
ggtitle("Total Species by Island Area")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
The plot suggests that the total number of plant species increases with island area: the larger the island, the more species it has.
The island with the fewest species is Santa Barbara, which is also the smallest island in the data set.
The island that has the greatest number of species is Santa Cruz, which is also the largest island in the data set.
The smooth line suggests a positive relationship between the two variables.
The slope coefficient suggests that each additional square kilometer of island area is associated with an increase of 1.2376 native species.
According to the model, an island with an area of 0 would be expected to have 124.8303 native species.
If the true relationship between island area and native species were zero, there would be only a 0.0293% chance of observing a t-value at least as extreme as 7.488; therefore, my conclusion is to reject the null hypothesis.
The adjusted R-squared of 0.8872 (multiple R-squared 0.9033), with an F-statistic of 56.07 and a p-value of 0.0002931, suggests that the model predicts native richness significantly better than the naive predictor (the mean alone).
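The figures quoted above can be read directly off a fitted model object. The following is a minimal sketch, assuming the model in question is a simple linear regression of native species on area; the names `islands`, `fit`, and `s` are hypothetical and chosen only for this illustration.

```r
# Rebuild the two columns used by the model (values copied from the data above).
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)
islands <- data.frame(Area, Native)  # hypothetical name for this sketch

# Simple linear regression of native richness on island area.
fit <- lm(Native ~ Area, data = islands)
s   <- summary(fit)

# Coefficient table: estimates, standard errors, t values, p-values.
coef_table <- coef(s)
slope      <- coef_table["Area", "Estimate"]
intercept  <- coef_table["(Intercept)", "Estimate"]
t_slope    <- coef_table["Area", "t value"]
p_slope    <- coef_table["Area", "Pr(>|t|)"]

# Model-level statistics: multiple and adjusted R-squared, F-statistic.
r2     <- s$r.squared
adj_r2 <- s$adj.r.squared
f_stat <- unname(s$fstatistic["value"])

print(round(c(slope = slope, intercept = intercept, t = t_slope,
              r2 = r2, adj_r2 = adj_r2, F = f_stat), 4))
```

Keeping the multiple and adjusted R-squared in separate variables makes it harder to quote one while meaning the other.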
The total sum of squares (SSY) for native species is 142434.9.
The error sum of squares (SSE) from the model is 13767.48.
The proportionate reduction in error, (SSY - SSE) / SSY, is 0.9033419, which equals the model's multiple R-squared.
After following the formula, the result is 56.07448, which matches the F-statistic of the model.
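The formula itself is not written out here. One standard form that reproduces the reported values, assuming n = 8 islands and a single predictor (so 1 and 6 degrees of freedom), is:

```latex
\begin{align*}
\mathrm{PRE} &= \frac{SS_Y - SS_E}{SS_Y}
              = \frac{142434.9 - 13767.48}{142434.9} \approx 0.9033 \\[4pt]
F &= \frac{(SS_Y - SS_E)/1}{SS_E/(n-2)}
   = \frac{128667.42}{13767.48/6} \approx 56.07
\end{align*}
```

Equivalently, F = PRE / (1 - PRE) × (n - 2), which gives the same value from the proportionate reduction alone.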
p = ggplot(data=ChannelIslands) +
geom_point(mapping=aes(x=Area, y=Native), color="magenta", shape=20, size = 2) +
geom_smooth(mapping=aes(x=Area, y=Native), color="magenta") +
geom_point(mapping=aes(x=Area, y=Exotic), color="cyan", shape=11, size = 2) +
geom_smooth(mapping=aes(x=Area, y=Exotic), color="cyan") +
geom_point(mapping=aes(x=Area, y=Endemic), color="yellow", shape=8, size = 2) +
geom_smooth(mapping=aes(x=Area, y=Endemic), color="yellow") +
xlab("Island Area (sq.km.)") +
ylab("Number of Species") +
ggtitle("Number of Species by Island Size") +
theme_gray()
ggplotly(p)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
According to the plot, the number of native species rises most steeply with island area, followed by exotic species, with endemic species rising the least. Note that this ranks the steepness of each trend rather than the strength of each correlation, which the plot does not show directly.
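Steepness and strength of association are separate quantities, so it can help to compute both rather than judge them by eye. The sketch below assumes a straight-line fit per group, not the loess smooth drawn in the plot; the names `groups`, `slopes`, and `correlations` are hypothetical.

```r
# Data copied from the ChannelIslands table above.
Area    <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native  <- c(88, 190, 198, 139, 272, 421, 387, 480)
Endemic <- c(14, 22, 18, 18, 47, 37, 42, 45)
Exotic  <- c(44, 75, 69, 131, 110, 185, 98, 170)

groups <- list(Native = Native, Exotic = Exotic, Endemic = Endemic)

# Slope of a straight-line fit: extra species per additional sq. km.
slopes <- sapply(groups, function(y) unname(coef(lm(y ~ Area))["Area"]))

# Pearson correlation: how tightly the points follow a straight line.
correlations <- sapply(groups, function(y) cor(Area, y))

print(round(rbind(slope = slopes, r = correlations), 3))
```

A ranking by slope and a ranking by correlation need not agree, because the slope also depends on how many species each group contains overall.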
CImod <- lm(ChannelIslands$Native~ChannelIslands$Area)
CImod2 <- lm(CImod$residuals~ChannelIslands$Dist)
CIFrame2 <- data.frame(Island = ChannelIslands$Island, Dist = ChannelIslands$Dist, Resid = CImod$residuals)
ggplot(CIFrame2, aes(x=Dist, y=Resid)) +
geom_point(aes(color=Dist, size=Resid))+
geom_smooth()+
xlab("Island Distance")+
ylab("Residuals")+
ggtitle("Residuals by Island Distance")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
The plot suggests that the deviation from modeled richness is negatively related to distance from the mainland.
summary(CImod2)
##
## Call:
## lm(formula = CImod$residuals ~ ChannelIslands$Dist)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37.849 -18.320 8.098 15.904 29.724
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 71.3151 20.4815 3.482 0.01311 *
## ChannelIslands$Dist -1.4052 0.3621 -3.880 0.00817 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.57 on 6 degrees of freedom
## Multiple R-squared: 0.7151, Adjusted R-squared: 0.6676
## F-statistic: 15.06 on 1 and 6 DF, p-value: 0.008167
According to the model, every unit increase in island distance is associated with a change of -1.4052 in the deviation from modeled richness. With a p-value of 0.00817, there is about a 0.8% chance of observing a t-value at least as extreme as -3.880 if the true effect of island distance on deviation from modeled richness were zero. A multiple R-squared of 0.7151 (adjusted 0.6676), with an F-statistic of 15.06 and a p-value of 0.008167, suggests that the residuals are related to island distance.
Area explained 90.33% of the total variance in native richness (multiple R-squared; 88.72% adjusted), and island distance explained 71.51% of the remaining unexplained variance (66.76% adjusted).
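To express the distance effect as a share of the total variance rather than of the leftover variance, the two R-squared values can be combined. This is a sketch of the two-stage approach used above (regress on area, then regress the residuals on distance), not a joint multiple regression; the names `area_fit`, `dist_fit`, and `share_dist` are hypothetical.

```r
# Data copied from the ChannelIslands table above.
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Dist   <- c(61, 20, 42, 98, 79, 32, 44, 30)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)

# Stage 1: native richness explained by area.
area_fit <- lm(Native ~ Area)
# Stage 2: what area left unexplained, regressed on distance.
dist_fit <- lm(resid(area_fit) ~ Dist)

r2_area <- summary(area_fit)$r.squared  # share of total variance
r2_dist <- summary(dist_fit)$r.squared  # share of the leftover variance

# Distance's share of the TOTAL variance under this two-stage scheme.
share_dist      <- (1 - r2_area) * r2_dist
total_two_stage <- r2_area + share_dist

print(round(c(area = r2_area, dist_of_rest = r2_dist,
              dist_of_total = share_dist, combined = total_two_stage), 4))
```

Fitting `lm(Native ~ Area + Dist)` in one step would generally give a different split, since area and distance may be correlated with each other.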