##Input Data for Each of 7 Columns of Data:
Island=c("Santa Barbara","Anacapa","San Miguel","San Nicolas","San Clemente","Santa Catalina","Santa Rosa","Santa Cruz")
Area=c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Dist=c(61, 20, 42, 98, 79, 32, 44, 30)
Native=c(88,190,198,139,272,421,387,480)
Endemic=c(14,22,18,18,47,37,42,45)
Exotic=c(44,75,69,131,110,185,98,170)
Total=c(132,265,267,270,382,604,484,650)
##Coerce Data Vectors Into a Dataframe:
ChannelIslands=data.frame(Island, Area, Dist, Native, Endemic, Exotic, Total)
##Remove individual vectors now that they are assembled into a dataframe.
rm(Island, Area, Dist, Native, Endemic, Exotic, Total)
##Convert “Island” (the variable containing island names) into a Factor Variable:
ChannelIslands$Island <- factor(ChannelIslands$Island)

Part 1

Q1. Include the plot in your write-up along with the working code.

library(ggplot2)  # needed for ggplot(), geom_point(), geom_smooth()
ggplot(ChannelIslands, aes(x = Area, y = Total)) +
  geom_point(aes(color = Area, size = Area)) +
  geom_smooth(method = "loess", se = TRUE) +
  scale_color_continuous(name = "Area km^2") +
  xlab("Island Area km^2") +
  ylab("Total Species") +
  ggtitle("Total Species vs Island Area")
## `geom_smooth()` using formula = 'y ~ x'

Q2. What does the plot suggest about the relationship between the number of plant species found on different islands and the size of the islands?

My plot shows a strong positive relationship between island size and total species richness: as island size increases, species richness also increases.

Q3. Which island has the fewest species? How does its size compare to other islands?

The lowest total richness is at the far left of the plot (Santa Barbara, 132 species), which is also the smallest island (2.6 km^2). So the smallest islands are associated with the lowest richness.

Q4. Which island has the greatest number of species? How does its size compare to the other islands?

The highest total richness is at the far right of the plot (Santa Cruz, 650 species), which is also the largest island (294 km^2). The largest islands correspond to the highest richness.
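
The answers to Q3 and Q4 can also be read straight from the data rather than from the plot. This is a minimal sketch; the vectors are re-entered from the top of the report so the block runs on its own, and the names `fewest`, `most`, and `AreaRank` are only illustrative.

```r
# Data re-entered from the top of this report so the block is self-contained.
Island <- c("Santa Barbara", "Anacapa", "San Miguel", "San Nicolas",
            "San Clemente", "Santa Catalina", "Santa Rosa", "Santa Cruz")
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Total  <- c(132, 265, 267, 270, 382, 604, 484, 650)

fewest <- which.min(Total)   # position of the island with the fewest species
most   <- which.max(Total)   # position of the island with the most species

# Name, total richness, and area rank (1 = smallest island) for both islands
data.frame(Island   = Island[c(fewest, most)],
           Total    = Total[c(fewest, most)],
           AreaRank = rank(Area)[c(fewest, most)])
```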

Q5. What does the smooth line tell you about the form (or shape) of the relationship between the two variables?

The smooth line rises as island area increases, so the relationship between island area and species richness is positive. Because the line curves upward rather than following a perfectly straight path, the relationship is not strictly linear.

Part 2

m_native_area <- lm(Native ~ Area, data = ChannelIslands)
summary(m_native_area)
## 
## Call:
## lm(formula = Native ~ Area, data = ChannelIslands)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.612 -34.226  -7.542  34.551  61.581 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 124.8303    25.9310   4.814 0.002958 ** 
## Area          1.2376     0.1653   7.488 0.000293 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 47.9 on 6 degrees of freedom
## Multiple R-squared:  0.9033, Adjusted R-squared:  0.8872 
## F-statistic: 56.07 on 1 and 6 DF,  p-value: 0.0002931

Q6. (2) What does the slope coefficient tell you about how the expected number of native species changes with changes in island area?

The slope is 1.2376. Larger islands tend to support more native species: the expected number of native species increases by about 1.24 for every additional km^2 of island area.

Q7. (2) According to this model, how many native plant species would you expect to find on an island with a size of 0.0 km^2?

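The intercept is 124.8303, which is the model's predicted value at an Area of 0. So the model expects about 125 native species on an island of 0.0 km^2. This is an extrapolation below the smallest island in the data (2.6 km^2), so it is the starting point of the line rather than a realistic count.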
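
The slope and intercept readings in Q6 and Q7 can be checked with `predict()`. This is a sketch that refits the same model under the hypothetical name `fit`; the 100 km^2 comparison point is an arbitrary choice for illustration.

```r
# Data re-entered so the block runs on its own.
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)

fit <- lm(Native ~ Area)   # same model as m_native_area

# Expected native richness at 0 km^2 (the intercept) and at 100 km^2
pred <- unname(predict(fit, newdata = data.frame(Area = c(0, 100))))
pred

# The gap between the two predictions is 100 times the slope
pred[2] - pred[1]
```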

Q8. (2) What do you conclude based on an interpretation of the t-value and Pr(>|t|) for the slope term.

The slope's t-value is 7.488 and its p-value is 0.000293. The p-value is far below 0.01, so we reject the null hypothesis that the slope is zero: the relationship between area and native species richness is highly significant.
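
For reference, the t-value is the slope estimate divided by its standard error, and the p-value is the two-sided tail area of a t distribution with the residual degrees of freedom. A short sketch of that calculation, refitting the model as `fit` (a hypothetical name):

```r
# Data re-entered so the block runs on its own.
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)
fit <- lm(Native ~ Area)

coefs <- coef(summary(fit))            # coefficient table as a matrix
est   <- coefs["Area", "Estimate"]
se    <- coefs["Area", "Std. Error"]

t_value <- est / se                                       # t = estimate / SE
p_value <- 2 * pt(-abs(t_value), df = df.residual(fit))   # two-sided p-value
c(t = t_value, p = p_value)
```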

Q9. (2) Interpret the R^2 (Multiple R-squared) value.

The Multiple R-squared value is 0.9033. This means that about 90% of the variation in native species richness among islands is explained by island area, which is a very high proportion.

Q10. (2) In this model, the number of native species is your Y variable. Calculate the total sum of squares (SSY). What is the value?

The SSY I calculated is 142,434.875.
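
A minimal sketch of the SSY calculation, taking SSY to be the sum of squared deviations of Native from its mean:

```r
# Native species counts, re-entered so the block runs on its own.
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)

# Total sum of squares: squared deviations of Y from its mean, summed
SSY <- sum((Native - mean(Native))^2)
SSY
```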

Q11. (2) Calculate the error sum of squares from the model (SSE). There are several ways to access the Y values. The easiest is: mymodel$residuals. What is the value of SSE?

m_native_area$residuals
##          1          2          3          4          5          6          7 
## -40.048145  61.580569  27.377745 -57.612265 -32.285160  56.071484  -6.393764 
##          8 
##  -8.690465

After squaring each residual, we can add them up: 1603.852 + 3792.223 + 749.553 + 3319.177 + 1042.335 + 3144.020 + 40.868 + 75.503 = 13,767.531. So SSE = 13,767.531.
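
The same sum can be taken in one step from the model residuals, which avoids rounding in the hand-squared values. This sketch refits the model as `fit` (a hypothetical name) and cross-checks against `deviance()`, which for an `lm` fit returns the residual sum of squares.

```r
# Data re-entered so the block runs on its own.
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)
fit <- lm(Native ~ Area)

# Error sum of squares: square every residual, then add them up
SSE <- sum(residuals(fit)^2)
SSE

# Cross-check with the residual sum of squares reported by deviance()
deviance(fit)
```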

Q12. (2) Calculate the proportionate reduction of SSE relative to SSY. What is it?

SSE = 13,767.531

SSY = 142,434.875

1 − (13,767.531 / 142,434.875) = 0.9033, or 90.33%
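
As a check, the proportionate reduction in error, 1 minus SSE over SSY, should reproduce the Multiple R-squared from the model summary. A sketch, with `fit` and `prop_reduction` as illustrative names:

```r
# Data re-entered so the block runs on its own.
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)
fit <- lm(Native ~ Area)

SSY <- sum((Native - mean(Native))^2)   # total sum of squares
SSE <- sum(residuals(fit)^2)            # error sum of squares

# Proportionate reduction of SSE relative to SSY
prop_reduction <- 1 - SSE / SSY
prop_reduction

# Should agree with the Multiple R-squared in summary()
summary(fit)$r.squared
```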

Q13. (2) Calculate the following, where k is the number of explanatory variables in the model (which is 1) and n is the number of islands (8). What is the value obtained, and what element of your model summary does it correspond with.

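SSR = SSY − SSE = 142,434.875 − 13,767.531 = 128,667.344

Error degrees of freedom = n − k − 1 = 8 − 1 − 1 = 6

MSE = SSE / 6 = 13,767.531 / 6 = 2,294.5885

F = (SSR / k) / MSE = 128,667.344 / 2,294.5885 = 56.07

This value corresponds to the F-statistic in the model summary (56.07 on 1 and 6 DF).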
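
The arithmetic above can be reproduced in R. This sketch assumes the quantity asked for in Q13 is the F ratio, ((SSY − SSE) / k) divided by (SSE / (n − k − 1)), which is what the hand calculation computes; `F_value` is an illustrative name.

```r
# Data re-entered so the block runs on its own.
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)
fit <- lm(Native ~ Area)

n <- length(Native)   # number of islands
k <- 1                # number of explanatory variables

SSY <- sum((Native - mean(Native))^2)
SSE <- sum(residuals(fit)^2)

# Mean square for the model divided by mean square error
F_value <- ((SSY - SSE) / k) / (SSE / (n - k - 1))
F_value

# F-statistic stored in the model summary (value, numerator df, denominator df)
summary(fit)$fstatistic
```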

Part 3

Q14. (4) Include your script and the plot in your report.

ggplot(data=ChannelIslands) +
  geom_point(aes(x=Area, y=Native), color="forestgreen", shape=15, size=2.5) +
  geom_smooth(aes(x=Area, y=Native), color="forestgreen", method="loess", se=TRUE) +
  geom_point(aes(x=Area, y=Endemic), color="dodgerblue", shape=16, size=2.5) +
  geom_smooth(aes(x=Area, y=Endemic), color="dodgerblue", method="loess", se=TRUE) +
  geom_point(aes(x=Area, y=Exotic), color="firebrick1", shape=17, size=2.5) +
  geom_smooth(aes(x=Area, y=Exotic), color="firebrick1", method="loess", se=TRUE) +
  xlab("Island Area (km^2)") +
  ylab("Species Richness") +
  ggtitle("Native, Endemic, and Exotic species Richness vs Island Area")
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

Q15. (2) Based on the scatterplot you created, how does the slope of the relationship between Richness and Area differ for the three different species types?

Native species (green) have the steepest upward slope, which shows that native richness is highly dependent on island area.

Exotic species (red) have a smaller, more subtle upward slope, which tells us that exotic richness also depends on island size, but not nearly as much.

Endemic species (blue) show almost no upward trend; the line is nearly flat. This shows that endemic richness is not strongly dependent on island size.
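
To put numbers on the visual comparison, one option is to fit a straight line to each species type and compare the slopes. Note that this swaps the loess smoother used in the plot for simple linear fits, so it is a rough summary rather than a description of the plotted curves; `slope_of` is a hypothetical helper.

```r
# Data re-entered so the block runs on its own.
Area    <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Native  <- c(88, 190, 198, 139, 272, 421, 387, 480)
Endemic <- c(14, 22, 18, 18, 47, 37, 42, 45)
Exotic  <- c(44, 75, 69, 131, 110, 185, 98, 170)

# Hypothetical helper: slope of a straight-line fit of richness on Area
slope_of <- function(y) coef(lm(y ~ Area))[["Area"]]

# Species gained per additional km^2, by species type
slopes <- c(Native  = slope_of(Native),
            Exotic  = slope_of(Exotic),
            Endemic = slope_of(Endemic))
round(slopes, 3)
```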

Part 4

Q16. (4) Include your script and the plot. Interpret the model.

residuals_native <- residuals(m_native_area)
ResidualsDF <- data.frame(Island = ChannelIslands$Island,
                          Residuals = residuals_native,
                          Distance = ChannelIslands$Dist)

plot(Residuals ~ Distance, data = ResidualsDF, pch=16,
     xlab = "Distance to Mainland (km)", ylab = "Residuals (Observed - Predicted Native)")
abline(h=0, lty=2)

m_resid_dist <- lm(Residuals ~ Distance, data = ResidualsDF)
summary(m_resid_dist)
## 
## Call:
## lm(formula = Residuals ~ Distance, data = ResidualsDF)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.849 -18.320   8.098  15.904  29.724 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  71.3151    20.4815   3.482  0.01311 * 
## Distance     -1.4052     0.3621  -3.880  0.00817 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.57 on 6 degrees of freedom
## Multiple R-squared:  0.7151, Adjusted R-squared:  0.6676 
## F-statistic: 15.06 on 1 and 6 DF,  p-value: 0.008167

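This model has a negative slope (-1.4052), which means that for every additional km from the mainland the residual decreases by about 1.4 species. The slope is significant (t = -3.880, p = 0.00817), and distance explains about 71.5% of the variation in the residuals (R^2 = 0.7151). Islands farther from the mainland have fewer native species than expected based on area alone.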

Q17. What does the plot suggest about how the deviation from modeled richness (based on the model of Native Species on Area) relates to distance from the mainland?

Islands that are closer to the mainland tend to have positive residuals, which means they have more native species than the area-only model predicts. Islands farther from the mainland tend to have negative residuals.

Q18. Create a new model that regresses the residuals from your original model (from PART 2) on Distance. Include the model summary and interpret the model, focusing on the slope, t-statistic, P>|t|, and R^2.

ResidualsDF <- data.frame(Island = ChannelIslands$Island,Residuals = residuals(m_native_area),Distance = ChannelIslands$Dist)

m_resid_dist <- lm(Residuals ~ Distance, data = ResidualsDF)

summary(m_resid_dist)
## 
## Call:
## lm(formula = Residuals ~ Distance, data = ResidualsDF)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.849 -18.320   8.098  15.904  29.724 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  71.3151    20.4815   3.482  0.01311 * 
## Distance     -1.4052     0.3621  -3.880  0.00817 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.57 on 6 degrees of freedom
## Multiple R-squared:  0.7151, Adjusted R-squared:  0.6676 
## F-statistic: 15.06 on 1 and 6 DF,  p-value: 0.008167

Q19. What percentage of total variance in Native Richness was explained by Area? What percentage of the remaining variance was explained by Distance?

Percentage of total variance in Native richness explained by Area: 90.33%

Percentage of the remaining variance explained by Distance: 71.51%
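
Both percentages come from the R-squared values of the two models. The sketch below pulls them out and also converts the second one into a share of the total variance, on the assumption that "remaining variance" means the part left unexplained by Area (1 minus the first R-squared). The names `fit_area`, `fit_dist`, and `share_total` are illustrative.

```r
# Data re-entered so the block runs on its own.
Area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Dist   <- c(61, 20, 42, 98, 79, 32, 44, 30)
Native <- c(88, 190, 198, 139, 272, 421, 387, 480)

fit_area <- lm(Native ~ Area)                # Part 2 model
fit_dist <- lm(residuals(fit_area) ~ Dist)   # residuals regressed on distance

r2_area <- summary(fit_area)$r.squared   # share of total variance explained by Area
r2_dist <- summary(fit_dist)$r.squared   # share of remaining variance explained by Distance

# Distance's share expressed as a fraction of the total variance
share_total <- (1 - r2_area) * r2_dist

round(100 * c(Area = r2_area, DistOfRemaining = r2_dist,
              DistOfTotal = share_total), 2)
```

Under that assumption, Distance accounts for roughly 7% of the total variance in native richness on top of the 90% explained by Area.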