Markdown Author: Jessie Bell
Download this Rmd: Top right corner → Code → Download Rmd
cherry_data <- read.csv("CherryPoint_051624.csv")
usa <- map_data("state")
washington <- subset(usa, region=="washington")
a <- ggplot(data=washington, aes(x=long, y=lat)) +
geom_polygon(data=washington, aes(group=group), color="white", fill="gray") +
guides(fill=FALSE)+
geom_point(data=cherry_data, mapping=aes(x=long, y=lat, color=Latin_binom), show.legend = F)+
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
panel.background=element_blank(), #remove gray grid
text = element_text(family = "Corbel"), #corbel font
plot.title = element_text(hjust = 0.5))+
labs(title='Cherry Point, WA Pocket Estuary', y=(expression(paste("z ", symbol('\256')))), x="")+
geom_rect(aes(xmin = -123, xmax = -122.5, ymin = 48.7, ymax = 49
), color = "red", fill = NA)
photo <- readJPEG("IMG_8343.JPG",native=TRUE)
photo_panel <- ggdraw() + draw_image(photo, scale = 0.8)
a
Long/Lat: Was collected only in limited photographs taken during plant exploration, and there was no lat/long data collected during sampling. This means that this data is limited and incorrect, but I wanted to use it to attempt to spatially interpolate salinity and elevation. It did not work, as you will see in data analysis.
Plant Height (m): This data was collected from Wikipedia for each species. It is the maximum end of the range for plant size located on each species Wikipedia page.
Plant Elevation (cm): Measured using rod reading during survey at sample site. This is supposed to be an indication of flooding, as the higher the elevation, the less likely it is to flood.
Salinity (ppt): Collected at the same time as elevation using a refractometer. Soil was cored until water could be collected, it was then filtered to reduce turbitidy and measured.
cherry_data %>%
mutate(row = row_number(), .by = c(Plant, Latin_binom)) %>%
mutate(across(c(Plant, Latin_binom), ~ifelse(row == 1, ., "")))%>%
gt()|>
cols_label(
Latin_binom=md("Latin binomial"),
Salinity_ppt = md("Salinity (ppt)"),
Elevation_cm = md("Elevation (cm)"),
plant_height_m = md("Plant height (m)"))%>%
tab_options(table.font.size = px(10L))%>%
opt_table_font("Corbel")%>%
opt_stylize(style=4)%>%
fmt_number(columns = where(is.numeric), n_sigfig = 3, drop_trailing_zeros = T)
| Plant | Latin binomial | Salinity (ppt) | Elevation (cm) | long | lat | Plant height (m) | row |
|---|---|---|---|---|---|---|---|
| Baltic Rush | Juncus balticus | 5.00 | 20.0 | −123 | 48.9 | 0.749 | 1.00 |
| 5.00 | 9.00 | −123 | 48.9 | 0.749 | 2.00 | ||
| 9.00 | 9.00 | −123 | 48.9 | 0.749 | 3.00 | ||
| Broad-leaf Cattail | Typha latifolia | 2.00 | 8.00 | −123 | 48.9 | 3.00 | 1.00 |
| 0 | 7.00 | −123 | 48.9 | 3.00 | 2.00 | ||
| Douglas Aster | Aster subspicatus | 15.0 | 2.00 | −123 | 48.9 | 1.20 | 1.00 |
| 2.00 | 9.00 | −123 | 48.9 | 1.20 | 2.00 | ||
| 10.0 | 5.00 | −123 | 48.9 | 1.20 | 3.00 | ||
| 5.00 | 13.0 | −123 | 48.9 | 1.20 | 4.00 | ||
| Pacific Silverweed | Potentilla anserina | 8.00 | 0 | −123 | 48.9 | 0.800 | 1.00 |
| 12.0 | 4.00 | −123 | 48.9 | 0.800 | 2.00 | ||
| Pickleweed | Salicornia virginica | 22.0 | 2.00 | −123 | 48.9 | 1.22 | 1.00 |
| 28.0 | 12.0 | −123 | 48.9 | 1.22 | 2.00 | ||
| 31.0 | 5.00 | −123 | 48.9 | 1.22 | 3.00 | ||
| Salt Grass | Distichlis spicata | 26.0 | 4.00 | −123 | 48.9 | 0.100 | 1.00 |
| 21.0 | 5.00 | −123 | 48.9 | 0.100 | 2.00 | ||
| 20.0 | 2.00 | −123 | 48.9 | 0.100 | 3.00 |
ggplot()+
geom_boxplot(data=cherry_data, mapping=aes(reorder(Latin_binom,Elevation_cm), Elevation_cm, color=Latin_binom), show.legend = F)+
labs(
x = "",
y = "Elevation (cm)")+
theme(axis.text.x = element_text(angle=45, vjust = 1, hjust=1, face="italic"),
legend.text = element_text(face="italic"),
legend.title = element_blank())
Figure 1: Species as a function of elevation (cm) at the Cherry Point pocket estuary. It was found that elevation was not was not a good predictor of plant height (either alone, or as an interaction in a model with salinity).
p1 <- ggplot(cherry_data, aes(reorder(Latin_binom,Elevation_cm), y=Elevation_cm, fill=Latin_binom)) + geom_boxplot(show.legend = F)+
theme(axis.text.x = element_text(angle=45, vjust = 1, hjust=1, face="italic"))+
labs(x="", y="Elevation (cm)")
p2 <- ggplot(cherry_data, aes(reorder(Latin_binom,plant_height_m), y=plant_height_m, color=Latin_binom)) + geom_boxplot(show.legend = F) +
theme(axis.text.x = element_text(angle=45, vjust = 1, hjust=1, face="italic"))+
labs(x="", y="Plant Height (m)")
p3 <- ggplot(cherry_data, aes(reorder(Latin_binom,Salinity_ppt), y=Salinity_ppt, fill=Latin_binom)) + geom_boxplot(show.legend = F)+
theme(axis.text.x = element_text(angle=45, vjust = 1, hjust=1, face="italic"))+
labs(x="", y="Salinity (ppt)")
p3
Figure 2: Species as a function of salinity (ppt). The regression for plant type as a function of salinity was statistically significant (\(R^2_{adj.}\) = 0.17, \(F_{1,15}\)= 4.33, p = 0.05). Even though salinity was statistically significant, the \(R^2\) value is pretty low, which suggests that there might be other factors that have not been measured that could help create a better fit model
ggplot(cherry_data, aes(x = Salinity_ppt,
y = after_stat(density), color=Latin_binom, fill=Latin_binom))+
geom_density(color = "black",
size = 0.5, alpha=0.7)+
labs(
x = "Salinity (ppt)",
y = "Density")+
theme_bw()+
theme(legend.text = element_text(face="italic"),
legend.title = element_blank())
Figure 3: Density vs. salinity colored by species. This was my attempt to create something similar to the productivity vs. salinity plot that was drawn on the board in class – but it does not make sense.
ggplot(cherry_data, aes(x = Salinity_ppt,
y = plant_height_m, color=Latin_binom))+
geom_line()+
labs(
x = "Salinity (ppt)",
y = "Plant height (m)")+
theme_bw()+
theme(legend.text = element_text(face="italic"),
legend.title = element_blank())
Figure 4: Showing the maximum growth for each plant vs. salinity (ppt). Here you can see that the higheset growing plant, Typha latifolia, grows in areas of lowest salinity. Most middle height plants prefer medium salinity (except for Salicornia virginica, which can tolerate higher salinities. Distichilis spicata can only grow to about 0.2 m and it can tolerate pretty high salinities too.
ggplot()+
geom_point(data=cherry_data, mapping = aes(x=Elevation_cm, y=Salinity_ppt, color=Latin_binom))
Figure 5: Trying to determine if elevation can be a predictor of salinity. It does not seem that there is an obvious trend, but there is also an obvious outlier in the data. The data seem to have an s-curve about them.
ggplot()+
geom_point(data=cherry_data, mapping = aes(x=Elevation_cm, y=plant_height_m, color=Latin_binom))
Figure 6: Here you can see if plant height is influenced by elevation. There does not seem to be an obvious trend here.
ggplot()+
geom_point(data=cherry_data, mapping = aes(x=Salinity_ppt, y=plant_height_m, color=Latin_binom))
Figure 7: Plant height vs. salinity. There seems to be a negative correlation here, but Salicornia virginica doesn’t follow the same rules as the rest of the measurements. If you take this plant out, you can see that as salinity increases, plant height decreases. This means that biomass might be correlated with salinity.
suppressWarnings((plot_ly(cherry_data,
x = ~Salinity_ppt,
y = ~plant_height_m,
z = ~Elevation_cm,
color = ~Elevation_cm) |>
add_markers() |>
layout(scene = list(xaxis = list(title = 'Salinity x'),
yaxis = list(title = 'Productivity y (m)'),
zaxis = list(title = 'Elevation z (cm)'))) |>
layout(show.legend = T)))
Figure 8: Showing a 3-d map of all variables. There does not seem to be an obvious trend here, but showing the data like this is fun!
#convert the cherry data to sf
cherry_pointsf <- cherry_data %>% st_as_sf(coords = c("long", "lat")) %>%
st_set_crs(value = 5070)
#now the empty grid to interpolate the datad
lat <- cherry_data$lat
long <- cherry_data$long
ID <- c(1:17)
homemadeGrid <- as.tibble(cbind(lat, long, ID))
gridCP_sf <- st_as_sf(homemadeGrid,
coords=c("long", "lat"),
crs=st_crs(cherry_pointsf))
# plot it
a <- cherry_pointsf %>% ggplot() +
geom_sf(aes(fill=Salinity_ppt,size=Salinity_ppt),color="white",
shape=21,alpha=0.8) +
scale_fill_continuous(type = "viridis",name="ppt") +
labs(title="Salinity (ppt)") +
scale_size(guide="none")
b <- cherry_pointsf %>% ggplot() +
geom_sf(aes(fill=Elevation_cm,size=Elevation_cm),color="white",
shape=21,alpha=0.8) +
scale_fill_continuous(type = "viridis",name="cm") +
labs(title="Elevation (cm)") +
scale_size(guide="none")
ggarrange(a, b)
Figure 9: I attempted to pull in spatial data to do some spatial interpolation. I had taken images of certain plants while we were running through identification, however, I did not get the coordinates for each of my sampling points and so I had to just reuse spatial data for each species. I think, because of this, there is not enough data to interpolate. I will gather this information next time!
#The exponential Model
CPVarAuto <- autofitVariogram(formula = Salinity_ppt~1,input_data = cherry_pointsf)
plot(CPVarAuto)
Kriging isn’t working because too few data (I think).
latin <- cherry_data[2:4]
productivity <- cherry_data[7]
combined <- cbind(latin, productivity)
ggpairs(combined)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Figure 10: Correlelogram to show correlations between variables. There are no statistically signficant relationships according to this. I will further assess salinity and plant height though, because I think there could be something there that is not showing up here.
#models
m1 <- lm(plant_height_m ~ Salinity_ppt * Elevation_cm, data = combined) #assuming there is interaction between salinity and elevation
#models
m4 <- lm(plant_height_m ~ Salinity_ppt + Elevation_cm, data = combined) #assuming there is interaction between salinity and elevation
summ(m4)
## MODEL INFO:
## Observations: 17
## Dependent Variable: plant_height_m
## Type: OLS linear regression
##
## MODEL FIT:
## F(2,14) = 2.03, p = 0.17
## R² = 0.22
## Adj. R² = 0.11
##
## Standard errors: OLS
## -------------------------------------------------
## Est. S.E. t val. p
## ------------------ ------- ------ -------- ------
## (Intercept) 1.65 0.49 3.37 0.00
## Salinity_ppt -0.04 0.02 -1.93 0.07
## Elevation_cm -0.01 0.04 -0.13 0.90
## -------------------------------------------------
#now look to see if interaction:
interact_plot(model=m1, pred=Salinity_ppt, modx = Elevation_cm, plot.points=T, colors=c("#c178f7", "#00ba38"))
Figure 11: Interaction plot between elevation and salinity. Because the lines are not parallel, there is no interaction between the two variables, salinity and elevation. I am surprised! Since there is no interaction between the two, and models with both terms are not statistically significant, I will try models with only 1 variable.
summ(m1)
## MODEL INFO:
## Observations: 17
## Dependent Variable: plant_height_m
## Type: OLS linear regression
##
## MODEL FIT:
## F(3,13) = 1.73, p = 0.21
## R² = 0.29
## Adj. R² = 0.12
##
## Standard errors: OLS
## --------------------------------------------------------------
## Est. S.E. t val. p
## ------------------------------- ------- ------ -------- ------
## (Intercept) 2.17 0.69 3.14 0.01
## Salinity_ppt -0.08 0.04 -1.86 0.09
## Elevation_cm -0.06 0.07 -0.93 0.37
## Salinity_ppt:Elevation_cm 0.01 0.01 1.05 0.31
## --------------------------------------------------------------
m3 <- lm(plant_height_m ~ Salinity_ppt, data = combined)
summ(m3)
## MODEL INFO:
## Observations: 17
## Dependent Variable: plant_height_m
## Type: OLS linear regression
##
## MODEL FIT:
## F(1,15) = 4.33, p = 0.05
## R² = 0.22
## Adj. R² = 0.17
##
## Standard errors: OLS
## -------------------------------------------------
## Est. S.E. t val. p
## ------------------ ------- ------ -------- ------
## (Intercept) 1.61 0.31 5.25 0.00
## Salinity_ppt -0.04 0.02 -2.08 0.05
## -------------------------------------------------
m5 <- lm(plant_height_m ~ Elevation_cm, data = combined)
summ(m5)
## MODEL INFO:
## Observations: 17
## Dependent Variable: plant_height_m
## Type: OLS linear regression
##
## MODEL FIT:
## F(1,15) = 0.27, p = 0.61
## R² = 0.02
## Adj. R² = -0.05
##
## Standard errors: OLS
## ------------------------------------------------
## Est. S.E. t val. p
## ------------------ ------ ------ -------- ------
## (Intercept) 0.94 0.35 2.67 0.02
## Elevation_cm 0.02 0.04 0.52 0.61
## ------------------------------------------------
I think salinity is the only statistically significant predictor of plant height (p = 0.05), and the best model is without elevation. I find this very surprising due to the fact that I originally thought that as elevation increased, salinity would decrease proportionally.
See Table 1.
See Figures 1 & 2.
After running a regression model on the data, it seems that salinity is a better predictor of plant distribution, (see m3 above). The optimal fitted regression model using Akaike information criteria was, lm(plant_height_m ~ Salinity_ppt). The overall regression was statistically significant (\(R^2_{adj.}\) = 0.17, \(F_{1,15}\)= 4.33, p = 0.05). It was found that elevation (flooding) was not was not a good predictor of plant height (either alone, or as an interaction in a model with salinity). Even though salinity was statistically significant, the \(R^2\) value is pretty low, which suggests that there might be other factors that have not been measured that could help create a better fit model.