# Construct a dataframe that contains the data in the table below
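library(readxl)
library(ggplot2)  # ggplot(), geom_point(), geom_smooth()
library(plotly)   # ggplotly(), used in Parts 3 and 4
ChannelIslands <- read_excel("Lab-Practice7.xlsx")
attach(ChannelIslands)
ggplot(data = ChannelIslands, mapping = aes(x = Area, y = Native)) +
  geom_point(aes(color = Island, size = Dist)) +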
  geom_smooth()+
  geom_smooth(method='lm')+
  xlab("Island area (sq. km)")+
  ylab("Number of Native Species")+
  ggtitle("Number of Species by Island Size")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

# 1. Include Plot in Write-up along w/working code
# 2. What does the plot suggest about the relationship btw the number of plant species and island size?
# 3. Which island has the fewest species? How does its size compare to other islands?
# 4. Which island has the greatest number of species? How does size compare to other islands?
# 5. What does the smooth line tell you about the form (shape) of the relationship btw the two variables?
  1. Answer: The plot suggests that larger islands tend to have more native species: the points rise from left to right as area increases. Note that circle size encodes distance (Dist), not species count, so larger circles do not indicate more species.
  2. Answer: Santa Barbara has the fewest native species (88). Island size is read from the x-axis (Area), not from circle size, which encodes Dist. Santa Barbara is the smallest of the Channel Islands, so it sits at the far left of the plot.
  3. Answer: Santa Cruz has the greatest number of native species (480). It is also the largest of the Channel Islands, so it sits at the far right of the plot; its smaller circle reflects a shorter distance (Dist), not a smaller area.
  4. Answer: The smooth line indicates a positive relationship between island area and native species richness: islands with greater area (km2) tend to have more native plant species.
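The islands with the fewest and most native species can also be picked out directly instead of being read off the plot. The following is a minimal sketch, assuming a data frame with the same column names as ChannelIslands (Island, Area, Native). The three rows are made-up placeholders, not the lab data.

```r
# Placeholder data: island names and values are hypothetical,
# not taken from Lab-Practice7.xlsx
toy <- data.frame(
  Island = c("A", "B", "C"),
  Area   = c(5, 60, 200),
  Native = c(90, 190, 400)
)

# Rank of each island's area (1 = smallest) to compare its size with the others
toy$AreaRank <- rank(toy$Area)

# Row with the fewest native species
fewest <- toy[which.min(toy$Native), ]
# Row with the most native species
most <- toy[which.max(toy$Native), ]

print(fewest)
print(most)
```

Replacing `toy` with `ChannelIslands` should give the same kind of answer for the real data, provided the column names match.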
# PART 2: Create lin reg of Native Species
Natarea <- lm(ChannelIslands$Native~ChannelIslands$Area)
summary(Natarea)
## 
## Call:
## lm(formula = ChannelIslands$Native ~ ChannelIslands$Area)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.612 -34.226  -7.542  34.551  61.581 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         124.8303    25.9310   4.814 0.002958 ** 
## ChannelIslands$Area   1.2376     0.1653   7.488 0.000293 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 47.9 on 6 degrees of freedom
## Multiple R-squared:  0.9033, Adjusted R-squared:  0.8872 
## F-statistic: 56.07 on 1 and 6 DF,  p-value: 0.0002931
# 6. What does the slope coef tell you about how the expected number of native species changes w/changes in island area?
# 7. According to this model, how many native plant species would you expect to find on an island with a size of 0.0 km?
# 8. What do you conclude based on an interpretation of the t-value and Pr(>|t|) for the slope term?
# 9. Interpret the r2 value
  1. Answer: The slope coefficient tells me that each 1 km2 increase in island area is associated with an increase of about 1.24 (1.2376) in the expected number of native species.
  2. Answer: On an island with an area of 0.0 km2, the model predicts 124.83 native plant species (the intercept).
  3. Answer: Based on the t-value and p-value, I conclude that the relationship between island area and number of native species is statistically significant: t = 7.488 exceeds the two-sided critical value of about 2.45 for 6 degrees of freedom, and p = 0.000293 is less than 0.05.
  4. Answer: The r2 = 0.9033, indicating that the linear regression model fits the relationship between island area and native species very well. It explains about 90% of the variation in the number of native species.
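With only 6 residual degrees of freedom, the t distribution has heavier tails than the normal, so the two-sided 5% cutoff is somewhat larger than 1.96. The sketch below checks the cutoff and the p-value using only numbers copied from the summary output above; the variable names are illustrative.

```r
# Values copied from the summary(Natarea) output above
t_slope  <- 7.488  # t value for the Area slope
df_resid <- 6      # residual degrees of freedom (n - k - 1 = 8 - 1 - 1)

# Two-sided critical t at alpha = 0.05
t_crit <- qt(0.975, df = df_resid)

# Two-sided p-value for the observed t
p_slope <- 2 * pt(-abs(t_slope), df = df_resid)

t_crit   # about 2.447
p_slope  # about 0.00029, in line with Pr(>|t|) in the summary
```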
# 10. In this model, the number of native species is y. Calculate the SSY. What is the value?
sum((ChannelIslands$Native-mean(ChannelIslands$Native))^2)
## [1] 142434.9
  1. Answer: 142434.9
# 11. Calculate the error sum of squares from the model (SSE). What is the value?
Natarea$residuals
##          1          2          3          4          5          6          7 
## -40.048145  61.580569  27.377745 -57.612265 -32.285160  56.071484  -6.393764 
##          8 
##  -8.690465
sum(Natarea$residuals^2)
## [1] 13767.48
  1. Answer: 13767.48
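# 12. Calculate the proportionate reduction of SSE relative to SSY. What is it?
(142434.9-13767.48)/142434.9
## [1] 0.9033419
  1. Answer: 0.9033, which matches the Multiple R-squared in the model summary. The ratio SSE/SSY = 13767.48/142434.9 = 0.0967 is the proportion of variation left unexplained.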
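The two ratios involved here are complements of each other. In the notation of the questions above, with SSY the total sum of squares and SSE the error sum of squares:

```latex
\[
\frac{SSY - SSE}{SSY} \;=\; 1 - \frac{SSE}{SSY}
\;=\; 1 - \frac{13767.48}{142434.9} \;\approx\; 1 - 0.0967 \;=\; 0.9033
\]
```

This agrees with the Multiple R-squared of 0.9033 reported in the model summary.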
# 13. Calculate the following. What is the value obtained, and what element of your model summary does it correspond with?
# ? = ((SSY-SSE)/k)/(SSE/(n-k-1))
((142434.9-13767.48)/1)/(13767.48/(8-1-1))
## [1] 56.0745
  1. Answer: 56.0745, it corresponds with the F-statistic
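The same calculation can be written with named quantities, which also makes it easy to recover the p-value printed next to the F-statistic. This is a sketch that uses only the sums of squares reported above; the variable names are illustrative.

```r
# Sums of squares reported above
SSY <- 142434.9  # total sum of squares
SSE <- 13767.48  # error (residual) sum of squares
n <- 8           # number of islands
k <- 1           # number of predictors (Area)

# Model mean square divided by error mean square
F_stat <- ((SSY - SSE) / k) / (SSE / (n - k - 1))

# Upper-tail probability of F on k and n - k - 1 degrees of freedom
p_F <- pf(F_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

F_stat  # about 56.07
p_F     # about 0.00029
```

With a single predictor, this F-statistic is the square of the slope's t value, so the two tests give the same p-value.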
# PART 3: Plot richness on area
p = ggplot(data=ChannelIslands)+
  geom_point(mapping=aes(x=Area, y=Native), color="forestgreen", shape=15, size=2.5)+
  geom_smooth(mapping=aes(x=Area, y=Native), color="forestgreen")+
  geom_point(mapping=aes(x=Area, y=Exotic), color="dodgerblue", shape=16, size=2.5)+
  geom_smooth(mapping=aes(x=Area, y=Exotic), color="dodgerblue")+
  geom_point(mapping=aes(x=Area, y=Endemic), color="firebrick1", shape=17, size=2.5)+
  geom_smooth(mapping=aes(x=Area, y=Endemic), color="firebrick1")+
  xlab("Island area (sq. km)")+
  ylab("Number of species")+
  ggtitle("Scatterplot of Number of Species by Type per Island Size")+
  theme_gray()
ggplotly(p)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
# 14. Include script and plot in report
# 15. Based on scatterplot, how does the slope of the relationship between Richness and Area differ for the three different species types?
  1. Answer: Based on the scatterplot, the slope of the relationship between richness and area is steepest for native species: richness rises with area and begins to plateau around 300 species. For exotic and endemic species the slope is much shallower: richness increases only slightly with area before plateauing around 50 species.
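The visual comparison of slopes can be backed up by fitting one simple regression per species type and comparing the Area coefficients. The following is a minimal sketch, assuming columns named Area, Native, Exotic, and Endemic as in the plot code. The data are made-up placeholders with exact linear relationships, not the lab data.

```r
# Placeholder data: values are hypothetical, not from Lab-Practice7.xlsx
toy <- data.frame(Area = c(1, 2, 3, 4))
toy$Native  <- 10 + 5.0 * toy$Area
toy$Exotic  <- 2  + 1.0 * toy$Area
toy$Endemic <- 3  + 0.5 * toy$Area

# Fit one simple regression per species type and keep the Area slope
types <- c("Native", "Exotic", "Endemic")
slopes <- sapply(types, function(v) {
  fit <- lm(reformulate("Area", response = v), data = toy)
  unname(coef(fit)["Area"])
})

slopes
```

Swapping `toy` for `ChannelIslands` would give the three fitted slopes for the real data, which can then be compared directly.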
# PART 4 Plot residuals from model in PART 2 on Dist. Interpret the model summary, paying attention to magnitude and sign of slope term, t-value, f-value, r2. Create a new data frame with variables Island, Residuals, Distance. Plot and model using new data frame.
list(Natarea$residuals)
## [[1]]
##          1          2          3          4          5          6          7 
## -40.048145  61.580569  27.377745 -57.612265 -32.285160  56.071484  -6.393764 
##          8 
##  -8.690465
library(readxl)
Lab7 <- read_excel("Lab7.xlsx")
view(Lab7)
attach(Lab7)
## The following object is masked from ChannelIslands:
## 
##     Island
distarea <- lm(Native~Distance*Natarea$residuals)
plot(Native, distarea$fitted.values)
abline(0, 1)  # 1:1 reference line; abline(distarea) would use only the first two of the four coefficients

p2 = ggplot(data=Lab7)+
  geom_point(mapping=aes(x=Distance, y=Residuals), color="orange", size=2.5)+
  geom_smooth(mapping=aes(x=Distance, y=Residuals), color="orange")
ggplotly(p2)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
summary(p2)
## data: Island, Residuals, Distance [8x3]
## faceting: <ggproto object: Class FacetNull, Facet, gg>
##     compute_layout: function
##     draw_back: function
##     draw_front: function
##     draw_labels: function
##     draw_panels: function
##     finish_data: function
##     init_scales: function
##     map_data: function
##     params: list
##     setup_data: function
##     setup_params: function
##     shrink: TRUE
##     train_scales: function
##     vars: function
##     super:  <ggproto object: Class FacetNull, Facet, gg>
## -----------------------------------
## mapping: x = ~Distance, y = ~Residuals 
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity 
## 
## mapping: x = ~Distance, y = ~Residuals 
## geom_smooth: na.rm = FALSE, orientation = NA, se = TRUE
## stat_smooth: na.rm = FALSE, orientation = NA, se = TRUE
## position_identity
# 16. Include script and plot. Interpret the model.
# 17. What does the plot suggest about how the deviation from modeled richness relates to distance from the mainland?
  1. Answer: The summary above has nothing to interpret: summary(p2) describes the ggplot object (its data, mappings, and layers), not a fitted model. I was not able to produce the intended model here; interpreting it would require fitting the regression of Residuals on Distance with lm() and calling summary() on that fit.
  2. Answer: The plot suggests that the deviation from modeled richness (the residuals from the area model) does not have a strong relationship with distance from the mainland.
# 18. Create a new model that regresses the residuals from Part 2 on Distance. Include model summary and interpret model, focusing on slope, t-statistic, P>|t|, and r2.
summary(distarea)
## 
## Call:
## lm(formula = Native ~ Distance * Natarea$residuals)
## 
## Residuals:
##       1       2       3       4       5       6       7       8 
## -185.71  -84.01 -125.84   43.99   48.60  120.20   58.12  124.65 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                389.73647  260.48539   1.496    0.209
## Distance                    -1.40429    5.99180  -0.234    0.826
## Natarea$residuals           -2.48731    3.30733  -0.752    0.494
## Distance:Natarea$residuals   0.05321    0.08516   0.625    0.566
## 
## Residual standard error: 154.1 on 4 degrees of freedom
## Multiple R-squared:  0.3327, Adjusted R-squared:  -0.1677 
## F-statistic: 0.6649 on 3 and 4 DF,  p-value: 0.616
# 19. What percentage of total variance in Native Richness was explained by Area? What percentage of the remaining variance was explained by Distance?
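  1. Answer: The Distance slope is -1.40, meaning that each 1 km increase in distance is associated with an average decrease of about 1.40 native species when the residual term is zero. None of the three terms (Distance, residuals, and their interaction) is statistically significant (all p > 0.05), and the overall F-statistic is 0.6649 (p = 0.616). The r2 = 0.3327, so this model explains about 33% of the variance in native species, but the adjusted r2 is negative (-0.1677). Note that this model regresses Native on Distance, the Part 2 residuals, and their interaction, rather than regressing the residuals on Distance alone as the question asks.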
  2. Answer: About 90% of total variance in Native Richness was explained by Area, and about 71.5% of the remaining variance was explained by Distance.
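Question 18 asks for a regression of the Part 2 residuals on Distance alone. The following is a minimal sketch of that model and of the variance accounting in Question 19. The residuals are copied from the Part 2 output, but the distances are hypothetical placeholders (the values in Lab7.xlsx are not shown here), so the fitted numbers are illustrative only.

```r
# Residuals copied from the Part 2 output above
res <- c(-40.048145, 61.580569, 27.377745, -57.612265,
         -32.285160, 56.071484, -6.393764, -8.690465)

# Placeholder distances: hypothetical values, NOT the distances in Lab7.xlsx
dist_km <- c(60, 30, 45, 40, 100, 35, 80, 20)

resid_df <- data.frame(Residuals = res, Distance = dist_km)

# Regress the area-model residuals on distance alone
resid_fit <- lm(Residuals ~ Distance, data = resid_df)

# Share of the remaining (unexplained by Area) variance explained by Distance
r2_dist <- summary(resid_fit)$r.squared

# Share of the total variance in Native that Distance adds beyond Area
r2_area <- 0.9033  # Multiple R-squared from Part 2
extra_share <- (1 - r2_area) * r2_dist

summary(resid_fit)
```

With the real data, `lm(Residuals ~ Distance, data = Lab7)` would take the place of the placeholder fit, and its r2 is the percentage of the remaining variance explained by Distance.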