Introduction: Rockpools offer a glimpse of ecological variation on a
miniature scale. Smaller rocks caught in larger ones create these
unique formations, which, as at the James Rockpools, have been eroded by
water over millions of years. Each pool combines abiotic and biotic
elements, making it a microcosm of larger ecosystems. This investigation
at the James Rockpools explores those interactions and the relationship
between biodiversity and the rockpools’ physical traits, such as
elevation, tidal flow, and nutrient availability. We hypothesize that
each rockpool’s shape influences its species richness.
install.packages("readr")
## Installing package into '/home/catherinetaylor35/R/x86_64-pc-linux-gnu-library/4.0'
## (as 'lib' is unspecified)
library("readr")
data <- read_csv("data.csv")
## Rows: 9 Columns: 77
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Percent Aquatic Vegetation Cover, Percent Canopy Cover
## dbl (75): Pool Number, Average Length of Pool (cm), Average Width of Pool (c...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
install.packages("ggplot2")
## Installing package into '/home/catherinetaylor35/R/x86_64-pc-linux-gnu-library/4.0'
## (as 'lib' is unspecified)
library(ggplot2)
install.packages("psych")
## Installing package into '/home/catherinetaylor35/R/x86_64-pc-linux-gnu-library/4.0'
## (as 'lib' is unspecified)
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
ggplot(data, aes(x = `Average Volume of Pool (cm3)`, y = `Species Richness`)) +
  geom_point() +
  labs(title = "Species Richness vs Pool Volume",
       x = "Pool Volume (cm3)",
       y = "Species Richness") +
  theme_minimal()

This scatterplot shows the connection between pool volume and
species richness. Each black dot represents a pool’s volume and the
corresponding species richness. The X-axis displays the volume of each
pool while the Y-axis represents species richness.
# Strip every character except digits and "." from the two columns that were
# read as text (chr), then convert them to numeric; empty results become NA.
# Volume and Species Richness were already read as numeric (dbl), so they are left as-is.
data$`Percent Canopy Cover` <- as.numeric(gsub("[^0-9.]", "", data$`Percent Canopy Cover`))
data$`Percent Aquatic Vegetation Cover` <- as.numeric(gsub("[^0-9.]", "", data$`Percent Aquatic Vegetation Cover`))
# Check for NA values after conversion
sum(is.na(data$`Average Volume of Pool (cm3)`))
## [1] 0
sum(is.na(data$`Species Richness`))
## [1] 0
sum(is.na(data$`Percent Canopy Cover`))
## [1] 0
sum(is.na(data$`Percent Aquatic Vegetation Cover`))
## [1] 0
# Drop every row that has an NA in any column (na.omit() checks all columns)
cleaned_data <- na.omit(data)
# Fit the linear model
lm_model <- lm(`Species Richness` ~ `Average Volume of Pool (cm3)`, data = cleaned_data)
# Extract R-squared value
r_squared <- summary(lm_model)$r.squared
# Create the ggplot
ggplot(cleaned_data, aes(x = `Average Volume of Pool (cm3)`, y = `Species Richness`)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  annotate("text", x = 10, y = 5, label = sprintf("R² = %.2f", r_squared), hjust = 0, vjust = 0) +
  labs(title = "Species Richness vs Pool Volume",
       x = "Pool Volume (cm3)",
       y = "Species Richness") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

This scatterplot adds a line of best fit to the plot of pool volume
(X-axis) against species richness (Y-axis); each black dot again
represents one pool. The blue line suggests a slight decrease in species
richness as pool volume increases. However, the R² value of 0.01 means
that pool volume explains only about 1% of the variation in species
richness, so the linear relationship is very weak and factors not shown
in the graph likely account for most of the variation.
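To make the R² reading concrete, the sketch below fits the same kind of one-predictor model to a small made-up data frame. The `toy` values and column names are hypothetical and are not the James Rockpools measurements. It illustrates that, with a single predictor, R² equals the squared Pearson correlation and is read as the share of variance in richness that volume explains.

```r
# Hypothetical values for illustration only; not the survey data.
toy <- data.frame(
  volume   = c(120, 340, 560, 800, 950, 1300, 1700, 2100, 2600),
  richness = c(4, 6, 3, 7, 5, 4, 6, 3, 5)
)

fit <- lm(richness ~ volume, data = toy)

# R-squared: proportion of variance in richness explained by volume
r_squared <- summary(fit)$r.squared

# With a single predictor, R-squared equals the squared Pearson correlation
r <- cor(toy$volume, toy$richness)

# Slope estimate with its standard error, t value, and p-value
slope_row <- summary(fit)$coefficients["volume", ]

print(round(c(r_squared = r_squared, r_squared_from_cor = r^2), 4))
print(slope_row)
```

With only a handful of pools, the slope's p-value is a more cautious guide than the direction of the fitted line alone.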
# Clean the Atlantic silverside surface-sweep counts and convert to numeric
data$`Surface Sweep Speceis: Alantic silverside` <- as.numeric(gsub("[^0-9.]", "", data$`Surface Sweep Speceis: Alantic silverside`))
data <- na.omit(data)
# One bar per pool: silverside count minus species richness
ggplot(data, aes(x = factor(`Pool Number`), y = `Surface Sweep Speceis: Alantic silverside` - `Species Richness`)) +
  geom_col(fill = "skyblue") +
  labs(title = "Species Count by Pool",
       x = "Pool Number",
       y = "Species Count Difference") +
  theme_minimal()

This bar chart shows, for each pool, the Atlantic silverside count from
the surface sweep minus that pool’s species richness. Each sky-blue bar
corresponds to one pool: the X-axis lists the pool numbers, and the
Y-axis gives the difference. A bar above zero marks a pool where the
silverside count exceeds species richness, while a bar below zero marks
the reverse. Comparing bar heights shows which pools differ most on this
measure.
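To look at how the pool volumes themselves are distributed, a histogram that bins pools by volume and counts how many fall in each bin could be sketched as follows. The `toy` data frame and the choice of `bins = 5` are assumptions for illustration; with the real data, `cleaned_data` and its `Average Volume of Pool (cm3)` column would take their place.

```r
library(ggplot2)

# Hypothetical volumes (cm3) for illustration only; not the survey data.
toy <- data.frame(volume = c(120, 340, 560, 800, 950, 1300, 1700, 2100, 2600))

# Each bar spans one volume interval; its height is the number of pools in it
p <- ggplot(toy, aes(x = volume)) +
  geom_histogram(bins = 5, fill = "orange", color = "white") +
  labs(title = "Distribution of Pool Volumes",
       x = "Pool Volume (cm3)",
       y = "Number of Pools") +
  theme_minimal()

print(p)
```

With so few pools, the picture depends heavily on the bin count, so it is worth trying more than one value of `bins`.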
# Install and load the corrplot package
install.packages("corrplot")
## Installing package into '/home/catherinetaylor35/R/x86_64-pc-linux-gnu-library/4.0'
## (as 'lib' is unspecified)
library(corrplot)
## corrplot 0.92 loaded
# Calculate correlations
correlations <- cor(cleaned_data[, c("Average Volume of Pool (cm3)","Species Richness" , "Percent Canopy Cover", "Percent Aquatic Vegetation Cover")])
# Plot the correlogram
corrplot(correlations, method = "circle", title = "Correlogram for Multiple Variables")

This correlogram shows the pairwise correlations among Average Pool
Volume (cm³), Species Richness, Percent Canopy Cover, and Percent
Aquatic Vegetation Cover. It is displayed as a grid with the variables
listed along both the rows and the columns, and each cell shows the
correlation between the variables in its row and column. Larger, darker
circles indicate stronger correlations; blue circles represent positive
correlations and red circles negative ones. The correlogram helps
identify notable correlations, but it does not by itself indicate
whether they are statistically significant.
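Because the correlogram conveys only the strength and sign of each correlation, significance has to be tested separately. One option is base R's `cor.test()`, sketched here on made-up values (the `toy` data frame and its column names are hypothetical). It returns the correlation estimate together with a p-value and a confidence interval.

```r
# Hypothetical values for illustration only; not the survey data.
toy <- data.frame(
  canopy   = c(10, 25, 40, 5, 60, 35, 80, 15, 50),
  richness = c(4, 6, 3, 7, 5, 4, 6, 3, 5)
)

# Pearson correlation test; the null hypothesis is a true correlation of zero
res <- cor.test(toy$canopy, toy$richness, method = "pearson")

print(res$estimate)  # sample correlation coefficient
print(res$p.value)   # p-value for the null hypothesis of zero correlation
print(res$conf.int)  # 95% confidence interval for the correlation
```

Running one test per variable pair increases the chance of a false positive, so any p-values obtained this way are best read alongside the small sample size.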