##Input Data for Each of 7 Columns of Data:
Island=c("Santa Barbara","Anacapa","San Miguel","San Nicolas","San Clemente","Santa Catalina","Santa Rosa","Santa Cruz")
Area=c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Dist=c(61, 20, 42, 98, 79, 32, 44, 30)
Native=c(88,190,198,139,272,421,387,480)
Endemic=c(14,22,18,18,47,37,42,45)
Exotic=c(44,75,69,131,110,185,98,170)
Total=c(132,265,267,270,382,604,484,650)
##Coerce Data Vectors Into a Dataframe:
ChannelIslands=data.frame(Island, Area, Dist, Native, Endemic, Exotic, Total)
##Remove the individual vectors now that they are assembled into a dataframe.
rm(Island, Area, Dist, Native, Endemic, Exotic, Total)
##Convert “Island” (the variable containing island names) into a Factor Variable:
ChannelIslands$Island <- factor(ChannelIslands$Island)
library(ggplot2)
ggplot(ChannelIslands, aes(x = Area, y = Total)) +
  geom_point(aes(color = Area, size = Area)) +
  geom_smooth(method = "loess", se = TRUE) +
  scale_color_continuous(name = "Area km^2") +
  xlab("Island Area km^2") +
  ylab("Total Species") +
  ggtitle("Total Species vs Island Area")
## `geom_smooth()` using formula = 'y ~ x'
My plot shows a strong positive relationship between island size and total species richness: as island area increases, species richness also increases.
The lowest total richness is on the far left (Santa Barbara), which is also where the smallest islands are. So the smallest islands are associated with the lowest richness.
The highest total richness is on the far right (Santa Cruz), which is also where the largest islands are. The largest islands correspond to the highest richness.
Because the smoothed (loess) line curves upward, it shows that total species richness is positively related to island area.
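One way to put a number on the visual impression of a strong positive relationship is a Pearson correlation. The sketch below is only an illustration. It re-enters the Area and Total columns so it can run on its own, and the name `r_area_total` is made up for this example. Pearson's r measures linear association, so it is a rough companion to the loess curve, not a replacement for it.

```r
# Same Area and Total values as in the ChannelIslands data frame
Area  <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Total <- c(132, 265, 267, 270, 382, 604, 484, 650)

# Pearson correlation between island area and total species richness
# (r_area_total is an illustrative name, not part of the original analysis)
r_area_total <- cor(Area, Total)
print(r_area_total)
```

A value close to 1 would agree with the upward trend seen in the plot.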
m_native_area <- lm(Native ~ Area, data = ChannelIslands)
summary(m_native_area)
##
## Call:
## lm(formula = Native ~ Area, data = ChannelIslands)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.612 -34.226 -7.542 34.551 61.581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 124.8303 25.9310 4.814 0.002958 **
## Area 1.2376 0.1653 7.488 0.000293 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 47.9 on 6 degrees of freedom
## Multiple R-squared: 0.9033, Adjusted R-squared: 0.8872
## F-statistic: 56.07 on 1 and 6 DF, p-value: 0.0002931
The slope is 1.2376. This means larger islands tend to support more native species: each additional km^2 of island area is associated with about 1.2376 more native species.
The intercept is 124.8303, which is the model's predicted native richness at an Area of 0 km^2. It is the starting point of the fitted line given the data we have.
The slope t value is 7.488 and the slope p value is 0.000293. The p value is far below 0.01, which tells us the relationship between area and native species richness is highly significant, so we can reject the null hypothesis.
The Multiple R-squared value is 0.9033. This means that ~90% of the variation in native species richness can be explained by island area, which is very high.
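To make the slope and intercept concrete, the fitted line can be written as a small function. This is a minimal sketch that copies the two coefficients from the summary output. The helper `predict_native` and the example area of 100 km^2 are assumptions made only for illustration.

```r
# Coefficients copied from the summary(m_native_area) output
b0 <- 124.8303  # intercept: predicted native richness at Area = 0
b1 <- 1.2376    # slope: native species gained per additional km^2

# Hypothetical helper (not part of the original analysis)
predict_native <- function(area_km2) {
  b0 + b1 * area_km2
}

predict_native(0)    # returns the intercept, 124.8303
predict_native(100)  # 124.8303 + 1.2376 * 100 = 248.5903
```

An area of 0 km^2 lies outside the observed range (the smallest island is 2.6 km^2), so the intercept is an extrapolation of the line and not a measured richness.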
The SSY I calculated is 142,434.875.
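The SSY is the sum of squared deviations of Native richness from its mean, so it can be checked directly in R. The sketch below re-enters the Native column so it runs on its own. The variable name `ssy` is chosen for this example.

```r
# Same Native values as in the ChannelIslands data frame
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)

# SSY: total sum of squares of Native around its mean (271.875)
ssy <- sum((Native - mean(Native))^2)
print(ssy)  # 142434.875
```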
m_native_area$residuals
## 1 2 3 4 5 6 7
## -40.048145 61.580569 27.377745 -57.612265 -32.285160 56.071484 -6.393764
## 8
## -8.690465
After squaring each residual we can add them up: 1603.852 + 3792.223 + 749.553 + 3319.177 + 1042.335 + 3144.020 + 40.868 + 75.503 = 13,767.531. This sum is the SSE = 13,767.531.
SSE = 13,767.531 and SSY = 142,434.875, so R^2 = 1 − (13,767.531 / 142,434.875) = 0.9033, or 90.33%.
SSR = 142,434.875 − 13,767.531 = 128,667.344. The error degrees of freedom are 8 − 1 − 1 = 6, so MSE = 13,767.531 / 6 = 2,294.5885.
Because the regression has 1 degree of freedom, MSR = SSR, so F = 128,667.344 / 2,294.5885 = 56.07. This calculated value matches the F-statistic in the model summary.
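The hand calculation above can be cross-checked against the fitted model. The following is a sketch under the assumption that the model is refit from re-entered Area and Native vectors. The names `fit`, `sse`, `ssr`, `mse`, `r2`, and `f_stat` are illustrative and do not appear in the original code.

```r
# Same Area and Native values as in the ChannelIslands data frame
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)

fit <- lm(Native ~ Area)

sse <- sum(residuals(fit)^2)            # error sum of squares
ssy <- sum((Native - mean(Native))^2)   # total sum of squares
ssr <- ssy - sse                        # regression sum of squares

df_error <- length(Native) - 2          # n - 1 - 1 = 6
mse <- sse / df_error                   # mean squared error

r2     <- 1 - sse / ssy                 # proportion of variance explained
f_stat <- ssr / mse                     # MSR = SSR because regression df = 1

round(c(SSE = sse, SSR = ssr, MSE = mse, R2 = r2, F = f_stat), 4)
```

The R2 and F values should agree with the Multiple R-squared and F-statistic lines printed by `summary()`, up to rounding.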
ggplot(data=ChannelIslands) +
geom_point(aes(x=Area, y=Native), color="forestgreen", shape=15, size=2.5) +
geom_smooth(aes(x=Area, y=Native), color="forestgreen", method="loess", se=TRUE) +
geom_point(aes(x=Area, y=Endemic), color="dodgerblue", shape=16, size=2.5) +
geom_smooth(aes(x=Area, y=Endemic), color="dodgerblue", method="loess", se=TRUE) +
geom_point(aes(x=Area, y=Exotic), color="firebrick1", shape=17, size=2.5) +
geom_smooth(aes(x=Area, y=Exotic), color="firebrick1", method="loess", se=TRUE) +
xlab("Island Area (km^2)") +
ylab("Species Richness") +
ggtitle("Native, Endemic, and Exotic species Richness vs Island Area")
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
Native species are shown in green and have the steepest upward trend, which shows that native richness depends strongly on island area.
Exotic species are shown in red and have a small, subtle upward slope, which tells us that they also depend on island size, but not nearly as much.
Endemic species are shown in blue, and the line has almost no upward trend; it is nearly flat and linear. This suggests that endemic richness does not depend much on island size.
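The visual comparison of the three lines can be backed up by fitting a straight line to each group and comparing the slopes. This is a rough sketch, not the method used for the plot (which uses loess). The data frame `islands` and the vector `slopes` are names invented for this example, and the columns are re-entered so the block runs on its own.

```r
# Same values as the matching columns of the ChannelIslands data frame
islands <- data.frame(
  Area    = c(2.6, 2.9, 37, 58, 145, 194, 217, 294),
  Native  = c(88, 190, 198, 139, 272, 421, 387, 480),
  Endemic = c(14, 22, 18, 18, 47, 37, 42, 45),
  Exotic  = c(44, 75, 69, 131, 110, 185, 98, 170)
)

# Fit <group> ~ Area for each group and keep only the Area slope
slopes <- sapply(c("Native", "Endemic", "Exotic"), function(group) {
  fit <- lm(reformulate("Area", response = group), data = islands)
  unname(coef(fit)["Area"])
})

# Slopes are in species per km^2, sorted from steepest to flattest
print(sort(slopes, decreasing = TRUE))
```

These are raw slopes in species per km^2, so a group with fewer species overall will tend to have a smaller slope even if its proportional change is similar.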
residuals_native <- residuals(m_native_area)
ResidualsDF <- data.frame(Island = ChannelIslands$Island,
                          Residuals = residuals_native,
                          Distance = ChannelIslands$Dist)
plot(Residuals ~ Distance, data = ResidualsDF, pch = 16,
     xlab = "Distance to Mainland (km)", ylab = "Residuals (Observed - Predicted Native)")
abline(h=0, lty=2)
m_resid_dist <- lm(Residuals ~ Distance, data = ResidualsDF)
summary(m_resid_dist)
##
## Call:
## lm(formula = Residuals ~ Distance, data = ResidualsDF)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37.849 -18.320 8.098 15.904 29.724
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 71.3151 20.4815 3.482 0.01311 *
## Distance -1.4052 0.3621 -3.880 0.00817 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.57 on 6 degrees of freedom
## Multiple R-squared: 0.7151, Adjusted R-squared: 0.6676
## F-statistic: 15.06 on 1 and 6 DF, p-value: 0.008167
This model has a negative slope (-1.4052), which means that for every additional km from the mainland the residuals decrease by about 1.4 native species. Islands farther from the mainland have fewer native species than expected based on area.
Islands that are closer to the mainland tend to have positive residuals, which means they have more native species than the area-only model predicted.
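The fitted residual line also implies a distance at which the predicted residual switches from positive to negative. The sketch below works this out from the coefficients in the summary output. The helper `predict_resid` and the variable `zero_dist` are hypothetical names, and the two example distances are the values for Anacapa (20 km) and San Nicolas (98 km) from the Dist column.

```r
# Coefficients copied from the summary(m_resid_dist) output
a0 <- 71.3151   # intercept: predicted residual at 0 km
a1 <- -1.4052   # slope: change in residual per km from the mainland

# Hypothetical helper (not part of the original analysis)
predict_resid <- function(dist_km) {
  a0 + a1 * dist_km
}

# Distance where the predicted residual is zero: 71.3151 / 1.4052, about 50.75 km
zero_dist <- -a0 / a1
print(zero_dist)

predict_resid(20)  # Anacapa's distance: positive predicted residual
predict_resid(98)  # San Nicolas's distance: negative predicted residual
```

Under this fit, islands closer than roughly 51 km are predicted to hold more native species than the area-only model expects, and islands farther away are predicted to hold fewer.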
ResidualsDF <- data.frame(Island = ChannelIslands$Island,Residuals = residuals(m_native_area),Distance = ChannelIslands$Dist)
m_resid_dist <- lm(Residuals ~ Distance, data = ResidualsDF)
summary(m_resid_dist)
##
## Call:
## lm(formula = Residuals ~ Distance, data = ResidualsDF)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37.849 -18.320 8.098 15.904 29.724
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 71.3151 20.4815 3.482 0.01311 *
## Distance -1.4052 0.3621 -3.880 0.00817 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.57 on 6 degrees of freedom
## Multiple R-squared: 0.7151, Adjusted R-squared: 0.6676
## F-statistic: 15.06 on 1 and 6 DF, p-value: 0.008167
Percentage of total variance in Native richness explained by area: 90.33%
Percentage of remaining variance explained by distance: 71.51%
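The second percentage applies only to the variance left over after area, not to the total. A short calculation shows what the two figures imply together. This is a sketch based on the two R-squared values reported in the summaries. The names `share_dist` and `combined` are invented for this example.

```r
# R-squared values copied from the two model summaries
r2_area <- 0.9033  # share of total variance explained by area
r2_dist <- 0.7151  # share of the REMAINING variance explained by distance

# Share of the total variance that distance accounts for
share_dist <- (1 - r2_area) * r2_dist
print(share_dist)  # about 0.069, or 6.9% of the total

# Share of the total variance accounted for by area, then distance
combined <- r2_area + share_dist
print(combined)    # about 0.972, or 97.2% of the total
```

This works because the residuals from the area model have a mean of zero, so their total sum of squares equals the SSE of the area model.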