# Construct a dataframe that contains the data in the table below
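library(readxl)
library(ggplot2)  # ggplot(), geom_point(), geom_smooth()
library(plotly)   # ggplotly(), used in Parts 3 and 4
ChannelIslands <- read_excel("Lab-Practice7.xlsx")
attach(ChannelIslands)
# Native richness against island area; point color = Island, point size = Dist
ggplot(data = ChannelIslands, mapping = aes(x = Area, y = Native)) +
  geom_point(aes(color = Island, size = Dist)) +
  geom_smooth() +               # default loess smoother
  geom_smooth(method = "lm") +  # straight-line fit
  xlab("Island area (sq. km)") +
  ylab("Number of Native Species") +
  ggtitle("Number of Species by Island Size")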
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

# 1. Include Plot in Write-up along w/working code
# 2. What does the plot suggest about the relationship btw the number of plant species and island size?
# 3. Which island has the fewest species? How does its size compare to other islands?
# 4. Which island has the greatest number of species? How does size compare to other islands?
# 5. What does the smooth line tell you about the form (shape) of the relationship btw the two variables?
- Answer: The plot suggests a positive relationship: islands with
larger area tend to have more native plant species. Circle size in this
plot encodes distance (Dist), not species count; the number of species
is read from the y-axis.
- Answer: Santa Barbara has the fewest native species (88). Island size
is read from the x-axis, not the circle size, and Santa Barbara is the
smallest of the islands.
- Answer: Santa Cruz has the greatest number of native species (480).
It is the largest of the islands.
- Answer: The smooth line shows a positive relationship between island
area and native species richness: islands with greater area (km2) tend
to have more native plant species.
# PART 2: Create lin reg of Native Species
Natarea <- lm(ChannelIslands$Native~ChannelIslands$Area)
summary(Natarea)
##
## Call:
## lm(formula = ChannelIslands$Native ~ ChannelIslands$Area)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.612 -34.226 -7.542 34.551 61.581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 124.8303 25.9310 4.814 0.002958 **
## ChannelIslands$Area 1.2376 0.1653 7.488 0.000293 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 47.9 on 6 degrees of freedom
## Multiple R-squared: 0.9033, Adjusted R-squared: 0.8872
## F-statistic: 56.07 on 1 and 6 DF, p-value: 0.0002931
# 6. What does the slope coef tell you about how the expected number of native species changes w/changes in island area?
# 7. According to this model, how many native plant species would you expect to find on an island with a size of 0.0 km2?
# 8. What do you conclude based on an interpretation of the t-value and Pr(>|t|) for the slope term?
# 9. Interpret the r2 value
- Answer: The slope coefficient tells me that each 1 km2 increase in
island area is associated with an increase of about 1.24 native species
(1.2376) on average.
- Answer: On an island with an area of 0.0 km2 the model predicts about
124.83 native plant species (the intercept). An island of zero area is
outside the range of the data, so this value is an extrapolation.
- Answer: Based on the t-value (7.488) and p-value (0.000293), I
conclude that the relationship between island area and number of native
species is statistically significant: t exceeds the two-sided 5%
critical value for 6 degrees of freedom (about 2.45), and p is less
than 0.05.
- Answer: The r2 = 0.9033, indicating that the linear regression model
fits the relationship between island area and native species very well.
It explains about 90% of the variation in the number of native species.
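The quantities interpreted above can be extracted from a fitted model instead of being read off the printed summary. The sketch below uses R's built-in `cars` data as a stand-in (an assumption, since the island data are not reproduced here); `fit`, `coefs`, `t_crit`, and `pred0` are illustrative names.

```r
# Stand-in data: built-in `cars` (speed, dist), NOT the Channel Islands data.
fit <- lm(dist ~ speed, data = cars)
s <- summary(fit)

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- s$coefficients
slope <- coefs["speed", "Estimate"]
t_val <- coefs["speed", "t value"]
p_val <- coefs["speed", "Pr(>|t|)"]

# The two-sided 5% critical value depends on the residual degrees of freedom;
# 1.96 is only the large-sample (normal) approximation.
t_crit <- qt(0.975, df = fit$df.residual)

# The expected response at x = 0 is the intercept.
pred0 <- predict(fit, newdata = data.frame(speed = 0))

c(slope = slope, t = t_val, p = p_val, t_crit = t_crit, r2 = s$r.squared)
```

For a model with 6 residual degrees of freedom, as in Part 2, `qt(0.975, df = 6)` gives about 2.45.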
# 10. In this model, the number of native species is y. Calculate the SSY. What is the value?
sum((ChannelIslands$Native-mean(ChannelIslands$Native))^2)
## [1] 142434.9
- Answer: 142434.9
# 11. Calculate the error sum of squares from the model (SSE). What is the value?
Natarea$residuals
## 1 2 3 4 5 6 7
## -40.048145 61.580569 27.377745 -57.612265 -32.285160 56.071484 -6.393764
## 8
## -8.690465
sum(Natarea$residuals^2)
## [1] 13767.48
- Answer: 13767.48
# 12. Calculate the proportionate reduction of SSE relative to SSY. What is it?
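# Proportionate reduction = (SSY - SSE) / SSY = 1 - SSE/SSY
1 - (13767.48/142434.9)
## [1] 0.9033419
- Answer: 0.9033. Using Area reduces the error sum of squares by about
90% relative to SSY, which matches the Multiple R-squared. The remaining
proportion, SSE/SSY = 0.0967, is the unexplained share.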
# 13. Calculate the following. What is the value obtained, and what element of your model summary does it correspond with?
# ? = ((SSY-SSE)/k)/(SSE/(n-k-1))
((142434.9-13767.48)/1)/(13767.48/(8-1-1))
## [1] 56.0745
- Answer: 56.0745, it corresponds with the F-statistic
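The sums-of-squares steps in questions 10 to 13 can be collected into one short calculation. This is a minimal sketch that copies SSY and SSE from the output above; the variable names (`ssy`, `sse`, `r2`, `f_stat`, `p_f`) are illustrative.

```r
# Values copied from the output above
ssy <- 142434.9   # total sum of squares of Native
sse <- 13767.48   # error (residual) sum of squares from the Area model
n <- 8            # number of islands
k <- 1            # number of predictors

# Proportionate reduction in error, equal to R-squared
r2 <- (ssy - sse) / ssy

# F-statistic: explained mean square over residual mean square
f_stat <- ((ssy - sse) / k) / (sse / (n - k - 1))

# Upper-tail p-value of F on (k, n - k - 1) degrees of freedom
p_f <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

c(r2 = r2, F = f_stat, p = p_f)
```

With a single predictor, the F-statistic is the square of the slope's t-value: 7.488^2 is about 56.07.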
# PART 3: Plot richness on area
# One point layer and one smoother per species type, each with a fixed color
p <- ggplot(data = ChannelIslands) +
  geom_point(mapping = aes(x = Area, y = Native), color = "forestgreen", shape = 15, size = 2.5) +
  geom_smooth(mapping = aes(x = Area, y = Native), color = "forestgreen") +
  geom_point(mapping = aes(x = Area, y = Exotic), color = "dodgerblue", shape = 16, size = 2.5) +
  geom_smooth(mapping = aes(x = Area, y = Exotic), color = "dodgerblue") +
  geom_point(mapping = aes(x = Area, y = Endemic), color = "firebrick1", shape = 17, size = 2.5) +
  geom_smooth(mapping = aes(x = Area, y = Endemic), color = "firebrick1") +
  xlab("Island area (sq. km)") +   # x is Area, not species count
  ylab("Number of Species") +
  ggtitle("Scatterplot of Number of Species by Type per Island Size") +
  theme_gray()
ggplotly(p)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
# 14. Include script and plot in report
# 15. Based on scatterplot, how does the slope of the relationship between Richness and Area differ for the three different species types?
- Answer: Based on the scatter plot, the slope of the relationship
between richness and area is steepest for native species: richness
increases with area until it begins to plateau around 300 species. For
exotic and endemic species the relationship is much flatter: richness
increases slightly with area before plateauing around 50 species.
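Comparing slopes across species types is easier when the plot has a legend. One way to get one, sketched below, is to reshape the data to long format so that species type becomes a variable mapped to color and shape. The data frame `islands` holds made-up numbers for illustration only, and the names `long`, `Type`, `Richness`, and `p_long` are assumptions; `pivot_longer()` comes from the tidyr package.

```r
library(ggplot2)
library(tidyr)

# Made-up numbers for illustration only; NOT the Channel Islands data.
islands <- data.frame(
  Area    = c(3, 40, 90, 200),
  Native  = c(90, 190, 270, 400),
  Exotic  = c(40, 60, 70, 90),
  Endemic = c(15, 20, 30, 40)
)

# One row per island-and-type combination
long <- pivot_longer(islands,
                     cols = c(Native, Exotic, Endemic),
                     names_to = "Type",
                     values_to = "Richness")

# Mapping Type inside aes() produces a legend and one fitted line per type
p_long <- ggplot(long, aes(x = Area, y = Richness, color = Type, shape = Type)) +
  geom_point(size = 2.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Island area (sq. km)", y = "Number of species")
```

Printing `p_long` draws the plot; the straight-line smoother is used here only so that the three slopes can be compared directly.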
# PART 4 Plot residuals from model in PART 2 on Dist. Interpret the model summary, paying attention to magnitude and sign of slope term, t-value, f-value, r2. Create a new data frame with variables Island, Residuals, Distance. Plot and model using new data frame.
list(Natarea$residuals)
## [[1]]
## 1 2 3 4 5 6 7
## -40.048145 61.580569 27.377745 -57.612265 -32.285160 56.071484 -6.393764
## 8
## -8.690465
library(readxl)
Lab7 <- read_excel("Lab7.xlsx")
view(Lab7)
attach(Lab7)
## The following object is masked from ChannelIslands:
##
## Island
distarea <- lm(Native~Distance*Natarea$residuals)
plot(Native, distarea$fitted)
abline(distarea)
## Warning in abline(distarea): only using the first two of 4 regression
## coefficients

p2 = ggplot(data=Lab7)+
geom_point(mapping=aes(x=Distance, y=Residuals), color="orange", size=2.5)+
geom_smooth(mapping=aes(x=Distance, y=Residuals), color="orange")
ggplotly(p2)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
summary(p2)
## data: Island, Residuals, Distance [8x3]
## faceting: <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
## -----------------------------------
## mapping: x = ~Distance, y = ~Residuals
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
##
## mapping: x = ~Distance, y = ~Residuals
## geom_smooth: na.rm = FALSE, orientation = NA, se = TRUE
## stat_smooth: na.rm = FALSE, orientation = NA, se = TRUE
## position_identity
# 16. Include script and plot. Interpret the model.
# 17. What does the plot suggest about how the deviation from modeled richness relates to distance from the mainland?
- Answer: I made the plot, but the summary has nothing to interpret:
summary(p2) describes the ggplot object (its data, mappings, and
layers) rather than a fitted model, so I could not interpret a model
here.
- Answer: The plot suggests that the deviation from modeled richness
(the residuals from the Area model) is not strongly related to distance
from the mainland.
# 18. Create a new model that regresses the residuals from Part 2 on Distance. Include model summary and interpret model, focusing on slope, t-statistic, P>|t|, and r2.
summary(distarea)
##
## Call:
## lm(formula = Native ~ Distance * Natarea$residuals)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## -185.71 -84.01 -125.84 43.99 48.60 120.20 58.12 124.65
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 389.73647 260.48539 1.496 0.209
## Distance -1.40429 5.99180 -0.234 0.826
## Natarea$residuals -2.48731 3.30733 -0.752 0.494
## Distance:Natarea$residuals 0.05321 0.08516 0.625 0.566
##
## Residual standard error: 154.1 on 4 degrees of freedom
## Multiple R-squared: 0.3327, Adjusted R-squared: -0.1677
## F-statistic: 0.6649 on 3 and 4 DF, p-value: 0.616
# 19. What percentage of total variance in Native Richness was explained by Area? What percentage of the remaining variance was explained by Distance?
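- Answer: The slope for Distance is -1.40, meaning that each 1 km
increase in distance is associated with a decrease of about 1.40 native
species on average (with the Area-model residual at zero, since the
model includes an interaction). None of the terms is statistically
significant: p = 0.826 for Distance, 0.494 for the residuals, and 0.566
for their interaction, and the overall F-statistic is 0.6649
(p = 0.616). The r2 = 0.3327, so this model accounts for about 33% of
the variance in native species, but the adjusted r2 is negative
(-0.1677), so the fit is no better than chance.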
- Answer: About 90% of total variance in Native Richness was explained
by Area, and about 71.5% of the remaining variance was explained by
Distance.
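The two-step approach described in Part 4 and questions 18 and 19 (fit richness on area, then regress the residuals on distance) can be sketched as follows. The data are simulated for illustration only and are not the lab data; `sim`, `m_area`, `res_df`, and `m_dist` are hypothetical names.

```r
set.seed(1)  # simulated data for illustration only; NOT the lab data
sim <- data.frame(
  Island   = paste0("Island", 1:8),
  Area     = c(3, 5, 40, 60, 90, 150, 200, 250),
  Distance = c(60, 20, 45, 30, 100, 35, 45, 30)
)
sim$Native <- 120 + 1.2 * sim$Area - 0.5 * sim$Distance + rnorm(8, sd = 20)

# Step 1: regress richness on area
m_area <- lm(Native ~ Area, data = sim)

# Step 2: new data frame with Island, Residuals, Distance
res_df <- data.frame(Island    = sim$Island,
                     Residuals = resid(m_area),
                     Distance  = sim$Distance)

# Step 3: regress the residuals on distance
m_dist <- lm(Residuals ~ Distance, data = res_df)
summary(m_dist)

# Share of total variance explained by area, and share of the
# remaining (residual) variance explained by distance
r2_area <- summary(m_area)$r.squared
r2_dist <- summary(m_dist)$r.squared
c(area = r2_area, distance_of_remaining = r2_dist)
```

In this layout the slope, t-value, and r2 printed by `summary(m_dist)` all refer to Residuals as the response, so the r2 reads directly as the share of the variance left over from the area model that distance accounts for.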