
##Input Data for Each of 7 Columns of Data:
Island=c("Santa Barbara","Anacapa","San Miguel","San Nicolas","San Clemente","Santa Catalina","Santa Rosa","Santa Cruz")
Area=c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
Dist=c(61, 20, 42, 98, 79, 32, 44, 30)
Native=c(88,190,198,139,272,421,387,480)
Endemic=c(14,22,18,18,47,37,42,45)
Exotic=c(44,75,69,131,110,185,98,170)
Total=c(132,265,267,270,382,604,484,650)
##Coerce Data Vectors Into a Dataframe:
ChannelIslands=data.frame(Island, Area, Dist, Native, Endemic, Exotic, Total)
##Remove the individual vectors now that they are assembled into a dataframe.
rm(Island, Area, Dist, Native, Endemic, Exotic, Total)
##Convert “Island” (the variable containing island names) into a Factor Variable:
ChannelIslands$Island <- factor(ChannelIslands$Island)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.2
library(plotly)
## Warning: package 'plotly' was built under R version 4.5.2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

Part 1

Question 1:
ggplot(ChannelIslands, aes(x = Area, y = Total)) +
  geom_point(aes(color = Area, size = Area)) +
  geom_smooth(method = "lm", se = TRUE, color = "black") +
  xlab("Island Area (km²)") +
  ylab("Total Plant Species Richness") +
  ggtitle("Total Plant Species Richness vs. Island Area (Channel Islands)")
## `geom_smooth()` using formula = 'y ~ x'

Question 2: The plot shows a positive relationship: larger islands have more species. The shaded confidence band widens toward both extremes of the plot.
Question 3: The island with the fewest species is Santa Barbara, with 132 species; it is also the smallest island, with an area of 2.6 km².
Question 4: The island with the most species is Santa Cruz, with 650 species; it is also the largest island, with an area of 294 km².
Question 5: The smooth line shows a positive, increasing trend: species richness rises steeply over small increases in area when islands are small, then continues upward more gradually for larger islands.
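
The islands named in Questions 3 and 4 can be checked directly from the data rather than read off the plot. The following is a minimal sketch that re-enters the relevant columns so it runs on its own; the lowercase names `islands`, `fewest`, and `most` are illustrative and not part of the original analysis.

```r
# Island names, areas (km²), and total richness copied from the data above
island <- c("Santa Barbara", "Anacapa", "San Miguel", "San Nicolas",
            "San Clemente", "Santa Catalina", "Santa Rosa", "Santa Cruz")
area  <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
total <- c(132, 265, 267, 270, 382, 604, 484, 650)
islands <- data.frame(island, area, total)

# Rows with the lowest and highest total species richness
fewest <- islands[which.min(islands$total), ]
most   <- islands[which.max(islands$total), ]

print(fewest)
print(most)
```

`which.min()` and `which.max()` return the first matching row, which is sufficient here because the minimum and maximum are unique.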

Part 2

m_native <- lm(Native ~ Area, data = ChannelIslands)
summary(m_native)
## 
## Call:
## lm(formula = Native ~ Area, data = ChannelIslands)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.612 -34.226  -7.542  34.551  61.581 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 124.8303    25.9310   4.814 0.002958 ** 
## Area          1.2376     0.1653   7.488 0.000293 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 47.9 on 6 degrees of freedom
## Multiple R-squared:  0.9033, Adjusted R-squared:  0.8872 
## F-statistic: 56.07 on 1 and 6 DF,  p-value: 0.0002931
Question 6: For each additional 1 km² of area, the expected number of native species increases by about 1.24 species.
Question 7: According to the intercept, the model predicts about 125 native species for an island with an area of 0 km².
Question 8: The slope is highly significant, meaning island area is a strong predictor of native species richness, and we can reject the null hypothesis that slope = 0.
Question 9: About 90.3% of the variance in native species richness is explained by island area.
Question 10: 118,404.5
Question 11: 11,491
Question 12: It matches the model’s R^2 of 90.33%.
Question 13: 56.07, it corresponds with the F-statistic in the summary.
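
The R^2 and F-statistic cited in Questions 12 and 13 can be recovered by hand from a sum-of-squares decomposition of the fitted model. The sketch below is one way to do that; it re-enters the data so it is self-contained, and the names `ss_total`, `ss_resid`, `ss_model`, `r_squared`, and `f_stat` are illustrative choices rather than anything used in the original analysis.

```r
# Area (km²) and native richness copied from the data above
area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
native <- c(88, 190, 198, 139, 272, 421, 387, 480)

fit <- lm(native ~ area)

# Partition the variation in native richness
ss_total <- sum((native - mean(native))^2)  # total sum of squares
ss_resid <- sum(residuals(fit)^2)           # residual (error) sum of squares
ss_model <- ss_total - ss_resid             # sum of squares explained by area

# R^2 is the explained share; F compares the model and residual mean squares
r_squared <- ss_model / ss_total
f_stat    <- (ss_model / 1) / (ss_resid / df.residual(fit))

print(c(ss_total = ss_total, ss_model = ss_model, ss_resid = ss_resid))
print(c(r_squared = r_squared, f_stat = f_stat))
```

The same quantities are also available from `anova(fit)`, which reports the model and residual sums of squares along with the F-statistic.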

Part 3

Question 14:
ggplot(ChannelIslands) +
  # Native species: green squares
  geom_point(aes(x = Area, y = Native),
             color = "forestgreen", shape = 15, size = 2.5) +
  geom_smooth(aes(x = Area, y = Native),
              color = "forestgreen", se = FALSE) +
  # Endemic species: blue circles
  geom_point(aes(x = Area, y = Endemic),
             color = "dodgerblue", shape = 16, size = 2.5) +
  geom_smooth(aes(x = Area, y = Endemic),
              color = "dodgerblue", se = FALSE) +
  # Exotic species: red triangles
  geom_point(aes(x = Area, y = Exotic),
             color = "firebrick1", shape = 17, size = 2.5) +
  geom_smooth(aes(x = Area, y = Exotic),
              color = "firebrick1", se = FALSE) +
  xlab("Island Area (km²)") +
  ylab("Species Richness") +
  ggtitle("Species Richness vs. Island Area for Native, Endemic, and Exotic Plants") +
  theme_gray()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Question 15: Native species show the strongest positive relationship between richness and area. Exotic species richness also increases with area, but not nearly as steeply as native richness. Endemic species have the weakest slope, showing only a slight positive relationship with area.
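
The ranking in Question 15 can be quantified by fitting a separate straight line to each group. Note that this swaps in linear fits for the loess smoothers used in the plot, so the slopes are only a rough summary of each curve. The names `richness` and `slopes` are illustrative.

```r
# Area (km²) and richness by group, copied from the data above
area <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
richness <- data.frame(
  native  = c(88, 190, 198, 139, 272, 421, 387, 480),
  endemic = c(14, 22, 18, 18, 47, 37, 42, 45),
  exotic  = c(44, 75, 69, 131, 110, 185, 98, 170)
)

# Fit richness ~ area for each group and keep only the slope (species per km²)
slopes <- sapply(richness, function(y) unname(coef(lm(y ~ area))[2]))

print(round(slopes, 3))
```

Sorting the result with `sort(slopes, decreasing = TRUE)` gives the groups in order of how quickly richness increases with area.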

Part 4

Question 16: The residuals show a negative trend as distance from the mainland increases. This means that islands farther from the mainland tend to have fewer native species than expected from their size alone, while islands close to the mainland tend to have positive residuals.
res_df <- data.frame(
  Island = ChannelIslands$Island,
  Distance = ChannelIslands$Dist,
  Residuals = m_native$residuals)
ggplot(res_df, aes(x = Distance, y = Residuals)) +
  geom_point(size = 3) +
  geom_smooth(se = FALSE) +
  xlab("Distance from Mainland (km)") +
  ylab("Residual Native Richness") +
  ggtitle("Residuals from Native ~ Area vs. Distance")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Question 17: The trend in the plot shows that the farther an island is from the mainland, the more negative its residual tends to be. This suggests that isolation reduces native species richness relative to what area alone would predict.
Question 18: The slope is –1.4052, meaning that for each additional kilometer an island is from the mainland, the residual native species richness decreases by about 1.41 species. This tells us islands farther from the mainland have fewer native species than expected based on their area alone. The t-value for the slope is –3.880, which indicates a negative relationship between distance and residual richness. The p-value for the slope is 0.00817, which is statistically significant at the 0.01 level, confirming that distance has a meaningful effect on native species richness after accounting for island area. The R^2 value is 0.7151, meaning that distance explains about 71.5% of the variation in residual richness. This shows that distance is an important secondary factor after area.
m_resid_dist <- lm(Residuals ~ Distance, data = res_df)
summary(m_resid_dist)
## 
## Call:
## lm(formula = Residuals ~ Distance, data = res_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.849 -18.320   8.098  15.904  29.724 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  71.3151    20.4815   3.482  0.01311 * 
## Distance     -1.4052     0.3621  -3.880  0.00817 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.57 on 6 degrees of freedom
## Multiple R-squared:  0.7151, Adjusted R-squared:  0.6676 
## F-statistic: 15.06 on 1 and 6 DF,  p-value: 0.008167
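
As a cross-check on the residual regression, the correlation between distance and the area-model residuals should be negative, and its square should equal the R^2 reported for the residual model (this identity holds for any simple linear regression). The sketch below re-enters the data so it runs independently; `fit_area`, `fit_dist`, `resid_native`, and `r` are illustrative names.

```r
# Area (km²), distance from mainland (km), and native richness from the data above
area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
dist   <- c(61, 20, 42, 98, 79, 32, 44, 30)
native <- c(88, 190, 198, 139, 272, 421, 387, 480)

# Residual native richness after accounting for area
fit_area     <- lm(native ~ area)
resid_native <- residuals(fit_area)

# Regress those residuals on distance, as in the analysis above
fit_dist <- lm(resid_native ~ dist)

# Pearson correlation between distance and the residuals
r <- cor(dist, resid_native)

print(c(correlation = r,
        r_squared_from_cor = r^2,
        r_squared_from_lm  = summary(fit_dist)$r.squared))
```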
Question 19: Island area explains about 90.3% of the total variance in native species richness. Distance explains 71.5% of the roughly 9.7% residual variance left after accounting for area, which is an additional 6.9% of the total variance.
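
The arithmetic behind Question 19 can be sketched as follows: the share of total variance attributed to distance is the unexplained fraction from the area model multiplied by the R^2 of the residual model. The variable names (`r2_area`, `r2_dist_resid`, `unexplained`, `extra`, `combined`) are illustrative, and the data are re-entered so the block is self-contained.

```r
# Area (km²), distance from mainland (km), and native richness from the data above
area   <- c(2.6, 2.9, 37, 58, 145, 194, 217, 294)
dist   <- c(61, 20, 42, 98, 79, 32, 44, 30)
native <- c(88, 190, 198, 139, 272, 421, 387, 480)

# Step 1: native richness explained by area
fit_area <- lm(native ~ area)
r2_area  <- summary(fit_area)$r.squared

# Step 2: residual richness explained by distance
fit_dist      <- lm(residuals(fit_area) ~ dist)
r2_dist_resid <- summary(fit_dist)$r.squared

# Step 3: convert the residual-model R^2 into a share of the total variance
unexplained <- 1 - r2_area                  # variance left over after area
extra       <- unexplained * r2_dist_resid  # share of total variance tied to distance
combined    <- r2_area + extra              # share accounted for by the two steps together

print(round(c(area = r2_area, distance = extra, combined = combined), 4))
```

This two-step partition is not identical to fitting `lm(native ~ area + dist)` in a single model, because distance is not adjusted for area here, so the combined figure should be read as a description of this sequential analysis only.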