RMDA Bowne Summative

Author

Talane Bowne

Why are we here?

Here we are at the end of term, for me to create the summative assigned to assess my knowledge of the course, and likely if you’re reading this you’re here to do the assessing. If you are not the assessor…you’re probably lost. To keep it brief, as each student gets 3,000 words of your time in the paper, here you will find the code for the tests done to play with the data enough to know what statistical analyses are appropriate for each question as well as the tests and figures deemed appropriate for utilizing in the paper.

Please excuse the fun I’m having with my wording here as the paper has to be professional and formal.

Libraries are Open

#| label: Opening the libraries

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(ggplot2)
library(tidyr)
library(knitr)
library(kableExtra)


Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows

library(broom)
# This is just calling the libraries that will be used in this project

#| label: Naming file 1

sharks <- read.csv("C:/Users/talan/Documents/R/Bowne_files/sharks.csv")
# While my files were imported and running I had to do this to make the files render. I honestly do not know why but it made it work

#| label: Naming file 2

sharksub <- read.csv("C:/Users/talan/Documents/R/Bowne_files/sharksub.csv")
# same as previous chunk just with the second dataset

Question 1

Is there a correlation between the air and water variables?

Checking the Data

The first thing for choosing a test was checking the data. The histogram, qq test and shapiro-wilk test was checking for normality in the distribution of the data. The shapiro-wilk result was small so that was an indication that the data was not normally distributed. The processes were the same for both air and water to check the data. Both variables were continuous and numeric, The data was unpaired and due to the non-normal distribution a non-parametric test was appropriate thus I went with the spearman’s correlation.

Air

#| label: Air Distribution Check


ggplot(data = sharks, aes(x = air)) + # creating my histogram of air temps in the shark data 
  geom_histogram(bins = 30, col = "purple4", fill = "darkorchid") + # setting the bins aka bars and picking colors
    theme_dark() + # lines 67-75 are just setting the theme to dark mode
  theme(
    plot.background = element_rect(fill = "black"), 
    panel.background = element_rect(fill = "black"), 
    panel.grid.major = element_line(color = "gray"), 
    panel.grid.minor = element_line(color = "gray30"),
    axis.text = element_text(color = "white"), 
    axis.title = element_text(color = "white"), 
    plot.title = element_text(color = "white", size = 14, face = "bold") 
  ) + 
  scale_color_brewer() + 
    labs(x = "Air (Celsius)", # 78-80 are creating labels
       y = "Frequency", 
       title = "Data Distribution of Ambient Air Temperatures")

#| label: QQplot Air

par(bg = "black", col.axis = "white", col.lab = "white", col.main = "white") # turing my plot to dark mode

qqnorm(sharks$air, pch = 1, col = "white", frame = FALSE, #makiing a QQ plot of the air data in the shark dataset
       main = "QQ Plot of Air Data", #labels for the next 3 lines
       xlab = "Theoretical Quantiles", 
       ylab = "Sample Quantiles")
qqline(sharks$air, col = "darkorchid", lwd = 2) # adding the line to the plot and making it a pretty color

#| label: Air Shapiro-Wilk

shapiro.test(sharks$air)


    Shapiro-Wilk normality test

data:  sharks$air
W = 0.95885, p-value = 1.338e-10

# shapiro-wilk test to check for normality of air temp distribution in the shark dataset

Water

#| label: Water Distribution Check


ggplot(data = sharks, aes(x = water)) +
  geom_histogram(bins = 30, col = "purple4", fill = "darkorchid") + 
    theme_dark() +
  theme(
    plot.background = element_rect(fill = "black"), 
    panel.background = element_rect(fill = "black"), 
    panel.grid.major = element_line(color = "gray"), 
    panel.grid.minor = element_line(color = "gray30"), 
    axis.text = element_text(color = "white"), 
    axis.title = element_text(color = "white"), 
    plot.title = element_text(color = "white", size = 14, face = "bold") 
  ) + 
  scale_color_brewer() + 
  labs(x = "Water temperature at surface at time of capture (Celsius)",  
       y = "Frequency",  
       title = "Data Distribution of Surface Water Temperature")

# repeated the process for the air histogram with the water temps from the shark dataset

#| label: QQplot Water

par(bg = "black", col.axis = "white", col.lab = "white", col.main = "white")


qqnorm(sharks$water, pch = 1, col = "white", frame = FALSE, 
       main = "QQ Plot of Water Data", 
       xlab = "Theoretical Quantiles", 
       ylab = "Sample Quantiles")
qqline(sharks$water, col = "darkorchid", lwd = 2)

#repeated the process from the air QQ plot with the water temps from the shark dataset

#| label: Water Shapiro-Wilk


shapiro.test(sharks$water)


    Shapiro-Wilk normality test

data:  sharks$water
W = 0.96035, p-value = 2.371e-10

# shapiro-wilk test to check for normality of water temp distribution in the shark dataset

The Stats

After running the spearman’s correlation the p value was above 0.05 and as such above the threshold for significance showing no relationship between the air and water variables.

#| label: Soearmans's Correlation for Air & Water


cor.test(x=sharks$air, y=sharks$water, method = 'spearman')


    Spearman's rank correlation rho

data:  sharks$air and sharks$water
S = 22007692, p-value = 0.2082
alternative hypothesis: true rho is not equal to 0
sample estimates:
        rho 
-0.05637344

# code to run the spearman's correlation to check for a relationship between air and water temps

Question 2

Do multiple captures have an effect on blotching time?

Checking the Data

Checking the data for this question started just the same as question one with creating histograms, QQ plots and running the shapiro-wilk test to check for normality in the distribution of the data. For this test like the previous the data was continuous and numeric. However this question is based upon looking for differences and the data in this question was paired. The 3 distribution checks for this question showed that the blotching times for the first and second capture was normally distributed. In this care the shapiro-wilk test had a high p value which indicated normal distribution. For this question based upon the checks of the data was the paired T-test.

First Capture

#| label: Blotch1 Distribution Check

ggplot(data = sharksub, aes(x = blotch1)) +
  geom_histogram(bins = 30, col = "purple4", fill = "darkorchid") + 
   theme_dark() +
  theme(
    plot.background = element_rect(fill = "black"), 
    panel.background = element_rect(fill = "black"), 
    panel.grid.major = element_line(color = "gray"), 
    panel.grid.minor = element_line(color = "gray30"), 
    axis.text = element_text(color = "white"), 
    axis.title = element_text(color = "white"), 
    plot.title = element_text(color = "white", size = 14, face = "bold") 
  ) + 
  scale_color_brewer() +
  labs(x = "Time for blotching to cover 30% of ventral surface (s)", 
       y = "Frequency", 
       title = "Data Distribution of blotching time for first capture")

# Same as the previous histograms but with blotching time for the first capture in the sharksub dataset

#| label: QQplot Blotch1

par(bg = "black", col.axis = "white", col.lab = "white", col.main = "white")


qqnorm(sharksub$blotch1, pch = 1, col = "white", frame = FALSE, 
       main = "QQ Plot of Blotch1 Data", 
       xlab = "Theoretical Quantiles", 
       ylab = "Sample Quantiles")
qqline(sharksub$blotch1, col = "darkorchid", lwd = 2)

# Same as previous QQ plot code but for first capture blotch time in the sharksub dataset

#| label: Shapiro-Wilk blotch1


shapiro.test(sharksub$blotch1)


    Shapiro-Wilk normality test

data:  sharksub$blotch1
W = 0.97958, p-value = 0.5345

# shapiro-wilk normality check for first capture blotch times of sharksub dataset

####Second Capture

#| label: Blotch2 Distribution Check


ggplot(data = sharksub, aes(x = blotch2)) +
  geom_histogram(bins = 30, col = "purple4", fill = "darkorchid") + 
   theme_dark() +
  theme(
    plot.background = element_rect(fill = "black"), 
    panel.background = element_rect(fill = "black"), 
    panel.grid.major = element_line(color = "gray"), 
    panel.grid.minor = element_line(color = "gray30"), 
    axis.text = element_text(color = "white"), 
    axis.title = element_text(color = "white"), 
    plot.title = element_text(color = "white", size = 14, face = "bold") 
  ) + 
  scale_color_brewer() + 
  labs(x = "Time for blotching to cover 30% of ventral surface (s)", 
       y = "Frequency", 
       title = "Data Distribution of Blotching Time for Second Capture")

# only change in this histogram is checking the second capture blotch time in the sharksub dataset

#| label: QQplot Blotch2

par(bg = "black", col.axis = "white", col.lab = "white", col.main = "white")


qqnorm(sharksub$blotch2, pch = 1, col = "white", frame = FALSE, 
       main = "QQ Plot of Blotch2 Data", 
       xlab = "Theoretical Quantiles", 
       ylab = "Sample Quantiles")
qqline(sharksub$blotch2, col = "darkorchid", lwd = 2)

# Only change is second capture blotch times in the sharksub dataset

#| label: Shapiro-Wilk Blotch2


shapiro.test(sharksub$blotch2)


    Shapiro-Wilk normality test

data:  sharksub$blotch2
W = 0.97936, p-value = 0.5255

# shapiro-wilk normality check for second capture blotch times of sharksub dataset

The Stats

The paired T-test showed that there was a significant difference between the means of the blotch times for the first catch and the second catch. The combined histogram had transparent bars to show where the two different capture blotch times overlap. The combined histogram illustrates how the second capture blotch times trends to longer than the first capture.

#| label: Blotch1&2 Paired T-test

t.test(sharksub$blotch1, sharksub$blotch2 , paired = TRUE)


    Paired t-test

data:  sharksub$blotch1 and sharksub$blotch2
t = -17.39, df = 49, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -1.037176 -0.822301
sample estimates:
mean difference 
     -0.9297384

# code for the paired t-test which test for difference in means of the first capture and second capture blotch times in the sharksub dataset

sharksub_long <- sharksub %>% 
  pivot_longer(cols = c(blotch1, blotch2), 
  names_to = "Capture", values_to = "BlotchTime")  # these three lines of code reshaped the data of blotching time into long format to better visualize it in the combined histogram. utilized the blotching times for the first and second capture in the sharksub dataset 


ggplot(data = sharksub_long, aes(x = BlotchTime, fill = Capture)) + # the same as previous histograms but calling that long format data to used for the combines histogram 
  geom_histogram(position = "identity", alpha = 0.85, bins = 30, color = "black") +
  theme_dark() +
  theme(plot.background = element_rect(fill = "black"), 
    panel.background = element_rect(fill = "black"), 
    panel.grid.major = element_line(color = "gray"), 
    panel.grid.minor = element_line(color = "gray30"), 
    axis.text = element_text(color = "white"), 
    axis.title = element_text(color = "white"), 
    plot.title = element_text(color = "white", size = 14, face = "bold") ) + 
  scale_fill_manual(values = c("blotch1" = "darkorchid1", "blotch2" = "purple4")) +  
  labs(x = "Time for blotching to cover 30% of ventral surface (s)",
       y = "Frequency",
       title = "Distribution of Blotching Time After Capture") +
  theme(legend.title = element_blank())  # taking out the legend title

Question 3

Is it possible to predict blotching time?

Checking the Data

Initially a mutlitple regression was chosen to check the relationship between multiple variables and how they affected blotching time. However as the only variable that affected blotching time was depth, this became a convenient way to check that a lineear regression model was the most appropriate test, something that can be called a happy accident. The data was unpaired, continuous and numeric for all variables. The question that was asked was looking for a relationship where a dependent variable was being altered by an independent variable.

#| label: Multi-Regression Blotching Time

multi_reg <- lm(blotch ~ BPM + weight + length + air + water + meta + depth, data = sharks) # creates the multi regression
summary(multi_reg) # displays the summary of the data for the multi regression


Call:
lm(formula = blotch ~ BPM + weight + length + air + water + meta + 
    depth, data = sharks)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.83745 -0.66117 -0.00702  0.60110  2.74108 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 11.1405851  1.8958668   5.876 7.74e-09 ***
BPM         -0.0019723  0.0031890  -0.618    0.537    
weight       0.0016283  0.0033511   0.486    0.627    
length       0.0012295  0.0009710   1.266    0.206    
air         -0.0281474  0.0318707  -0.883    0.378    
water       -0.0188934  0.0270782  -0.698    0.486    
meta        -0.0009712  0.0025951  -0.374    0.708    
depth        0.5061285  0.0223191  22.677  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.002 on 492 degrees of freedom
Multiple R-squared:  0.514, Adjusted R-squared:  0.507 
F-statistic: 74.32 on 7 and 492 DF,  p-value: < 2.2e-16

regression_summary <- tidy(multi_reg) #creating a version of the multi regression summary which can easily be displayed in the table

regression_summary %>% # calling the tidy version of the data to be put into the table 
  kable(col.names = c("Variable", "Estimate", "Std. Error", "t-value", "p-value"), #labeling the columns 
    caption = "Regression Model Summary for Blotch Prediction",
    format = "html") %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"), # a shadow occurs when the mouse hovers over the table 
    full_width = FALSE,
    font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "purple4", color = "white") %>% # the last bit of this chunk is just making pretty colors
  column_spec(1, background = "purple", bold = TRUE) %>%            
  column_spec(5, 
              background = ifelse(regression_summary$p.value < 0.05, "darkviolet", "violet"),
              color = ifelse(regression_summary$p.value < 0.05, "pink", "black"))

Regression Model Summary for Blotch Prediction
Variable	Estimate	Std. Error	t-value	p-value
(Intercept)	11.1405851	1.8958668	5.8762490	0.0000000
BPM	-0.0019723	0.0031890	-0.6184807	0.5365447
weight	0.0016283	0.0033511	0.4859077	0.6272489
length	0.0012295	0.0009710	1.2662214	0.2060330
air	-0.0281474	0.0318707	-0.8831753	0.3775729
water	-0.0188934	0.0270782	-0.6977369	0.4856714
meta	-0.0009712	0.0025951	-0.3742564	0.7083748
depth	0.5061285	0.0223191	22.6769242	0.0000000

The Stats

The linear regression of the relationship between blotch time and depth revealed a positive correlation between the two. A scatter plot with a regression line was created which displayed this positive relationship where blotching time was longer when the sharks were captured at a deeper depth. To further confirm the linear regression was accurate a qq plot of the residuals was done and showed a normal distribution which is the assumption of the residuals from ta linear regrression.

#| label: linear Regression of Depth and Blotching

lin_reg <- lm(blotch ~ depth, data = sharks) #creating the linear regression
summary(lin_reg) # calling the summary to display the data from the linear regression


Call:
lm(formula = blotch ~ depth, data = sharks)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.81869 -0.65427 -0.01035  0.58825  2.83116 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  9.82178    1.11207   8.832   <2e-16 ***
depth        0.50467    0.02216  22.772   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1 on 498 degrees of freedom
Multiple R-squared:  0.5101,    Adjusted R-squared:  0.5091 
F-statistic: 518.6 on 1 and 498 DF,  p-value: < 2.2e-16

#| label: Linear Model 95% Confidence

confint(lin_reg) # generates a confidence interval for the linear regression

                2.5 %     97.5 %
(Intercept) 7.6368581 12.0067017
depth       0.4611309  0.5482156

lin_reg_summary <- tidy(lin_reg)

lin_reg_summary %>%
  kable(col.names = c("Variable", "Estimate", "Std. Error", "t-value", "p-value"),
    caption = "Linear Regression Model Summary: Depth as Predictor of Blotch",
    format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "purple4", color = "white") %>%  
  column_spec(1, background = "purple", bold = TRUE) %>%                
  column_spec(5, background = ifelse(lin_reg_summary$p.value < 0.05, "darkviolet", "violet"),
  color = ifelse(lin_reg_summary$p.value < 0.05, "pink", "black"))

Linear Regression Model Summary: Depth as Predictor of Blotch
Variable	Estimate	Std. Error	t-value	p-value
(Intercept)	9.8217799	1.1120672	8.832002	0
depth	0.5046733	0.0221619	22.772112	0

# I'm not going to keep repeating things anymore

ggplot(sharks, aes(x = depth, y = blotch)) + # creates a scatter plot with blotch time as a dependent variable of depth
  geom_point(color = "purple3", size = 3) + 
  geom_smooth(method = "lm", color = "darkorchid", se = TRUE) +  # adds in the regression line
  labs(title = "Blotch vs. Depth",
       x = "Depth (m)",
       y = "Blotch (seconds)") +
  theme_dark() + # do I even need to point out dark mode anymore? 
  theme(plot.background = element_rect(fill = "black"), 
    panel.background = element_rect(fill = "black"), 
    panel.grid.major = element_line(color = "gray"), 
    panel.grid.minor = element_line(color = "gray30"), 
    axis.text = element_text(color = "white"), 
    axis.title = element_text(color = "white"), 
    plot.title = element_text(color = "white", size = 14, face = "bold"))

`geom_smooth()` using formula = 'y ~ x'

#| label: QQplot Residuals

par(bg = "black", col.axis = "white", col.lab = "white", col.main = "white")


qqnorm(lin_reg$residuals, pch = 1, col = "white", frame = FALSE, 
       main = "QQ Plot of residuals", 
       xlab = "Theoretical Quantiles", 
       ylab = "Sample Quantiles")
qqline(lin_reg$residuals, col = "darkorchid", lwd = 2)

Extra

Bits of extra code to supplement the paper

I don’t even know if I’ll use any of these, but just in case.

table <- sharks %>%
  group_by(sex) %>%
  summarize(count = n()) %>%
ungroup()

#| label: Summary of Sex of sharks dataset table 

kable(table, col.names = c("Sex", "Count"), caption = "Summary of Sharks by Sex") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, bold = TRUE, color = "white", background = "purple4") %>% 
  row_spec(1:2, background = "darkorchid")

Summary of Sharks by Sex
Sex	Count
Female	236
Male	264

#| label: Summary of Sex of sharksub dataset 

table2 <- sharksub %>%
  group_by(sex) %>%
  summarize(count = n()) %>%
ungroup()

#| label: Summary of Sex of sharksub dataset table 

kable(table2, col.names = c("Sex", "Count"), caption = "Summary of Sharks by Sex") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, bold = TRUE, color = "white", background = "purple4") %>% 
  row_spec(1:2, background = "darkorchid")

Summary of Sharks by Sex
Sex	Count
Female	25
Male	25

summary(sharksub)

      ID                sex               blotch1         blotch2     
 Length:50          Length:50          Min.   :32.49   Min.   :33.47  
 Class :character   Class :character   1st Qu.:34.38   1st Qu.:35.31  
 Mode  :character   Mode  :character   Median :34.94   Median :35.94  
                                       Mean   :35.03   Mean   :35.96  
                                       3rd Qu.:35.90   3rd Qu.:36.78  
                                       Max.   :37.07   Max.   :38.18

summary_table <- sharksub %>% summarize(blotch1_Minimum = min(blotch1, na.rm = TRUE), # summarizing the mean, min, and max of the first and second capture blotch times 
    blotch1_Mean = mean(blotch1, na.rm = TRUE),
    blotch1_Maximum = max(blotch1, na.rm = TRUE),
    blotch2_Minimum = min(blotch2, na.rm = TRUE),
    blotch2_Mean = mean(blotch2, na.rm = TRUE),
    blotch2_Maximum = max(blotch2, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(), names_to = "Metric", values_to = "Value")

summary_table <- summary_table %>% mutate(Variable = case_when(
      grepl("blotch1", Metric) ~ "blotch1",
      grepl("blotch2", Metric) ~ "blotch2"),
    Statistic = case_when(grepl("Minimum", Metric) ~ "Minimum",
      grepl("Mean", Metric) ~ "Mean",
      grepl("Maximum", Metric) ~ "Maximum")) %>%
  select(Variable, Statistic, Value) %>%
  arrange(Variable, Statistic)

#| label: Making a table of the blotch times 

kable(summary_table, caption = "Comparison of Blotch1 and Blotch2")

Comparison of Blotch1 and Blotch2
Variable	Statistic	Value
blotch1	Maximum	37.07165
blotch1	Mean	35.03042
blotch1	Minimum	32.49322
blotch2	Maximum	38.18380
blotch2	Mean	35.96016
blotch2	Minimum	33.46802

#| label: Making a pretty table of blotch times 

kable(summary_table, col.names = c("Variable", "Statistic", "Value"), caption = "Summary of Blotch Times") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, bold = TRUE, color = "white", background = "purple4") %>% 
  row_spec(1:6, background = "darkorchid")

Summary of Blotch Times
Variable	Statistic	Value
blotch1	Maximum	37.07165
blotch1	Mean	35.03042
blotch1	Minimum	32.49322
blotch2	Maximum	38.18380
blotch2	Mean	35.96016
blotch2	Minimum	33.46802

#| label: Summary of shark dataset 

summary(sharks)

      ID                sex                blotch           BPM       
 Length:500         Length:500         Min.   :30.78   Min.   :119.0  
 Class :character   Class :character   1st Qu.:34.16   1st Qu.:129.0  
 Mode  :character   Mode  :character   Median :35.05   Median :142.0  
                                       Mean   :35.13   Mean   :141.8  
                                       3rd Qu.:36.05   3rd Qu.:153.2  
                                       Max.   :40.08   Max.   :166.0  
     weight           length           air            water      
 Min.   : 65.10   Min.   :128.3   Min.   :33.00   Min.   :20.01  
 1st Qu.: 75.68   1st Qu.:172.0   1st Qu.:34.42   1st Qu.:21.55  
 Median : 87.82   Median :211.1   Median :35.43   Median :23.11  
 Mean   : 87.94   Mean   :211.0   Mean   :35.54   Mean   :23.02  
 3rd Qu.:100.40   3rd Qu.:251.8   3rd Qu.:36.71   3rd Qu.:24.37  
 Max.   :110.94   Max.   :291.0   Max.   :38.00   Max.   :25.99  
      meta            depth      
 Min.   : 50.03   Min.   :44.64  
 1st Qu.: 67.39   1st Qu.:48.90  
 Median : 82.45   Median :50.14  
 Mean   : 82.04   Mean   :50.14  
 3rd Qu.: 95.97   3rd Qu.:51.35  
 Max.   :112.45   Max.   :56.83

summary_table2 <- sharks %>% # pulling the data and stats I want from the sharks dataset 
  summarize(air_Minimum = min(air, na.rm = TRUE),
    air_Mean = mean(air, na.rm = TRUE),
    air_Maximum = max(air, na.rm = TRUE),
    water_Minimum = min(water, na.rm = TRUE),
    water_Mean = mean(water, na.rm = TRUE),
    water_Maximum = max(water, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
    names_to = "Metric",
    values_to = "Value") %>%
  mutate(Variable = case_when(grepl("air", Metric) ~ "Air",
      grepl("water", Metric) ~ "Water"),
    Statistic = case_when(grepl("Minimum", Metric) ~ "Minimum",
      grepl("Mean", Metric) ~ "Mean",
      grepl("Maximum", Metric) ~ "Maximum")) %>%
  select(Variable, Statistic, Value) %>%
  arrange(Variable, Statistic)

kable(summary_table2, # creating the table of the data I want from the sharks dataset of the mean, min and max of the air and water temperatures 
  col.names = c("Variable", "Statistic", "Value"),
  caption = "Summary of Air and Water Temperatures")

Summary of Air and Water Temperatures
Variable	Statistic	Value
Air	Maximum	37.99978
Air	Mean	35.53526
Air	Minimum	33.00454
Water	Maximum	25.98523
Water	Mean	23.02052
Water	Minimum	20.00503

#| label: Another pretty table 

kable(summary_table2, col.names = c("Variable", "Statistic", "Value"), caption = "Summary Air and Water Temperatures") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, bold = TRUE, color = "white", background = "purple4") %>% 
  row_spec(1:6, background = "darkorchid")

Summary Air and Water Temperatures
Variable	Statistic	Value
Air	Maximum	37.99978
Air	Mean	35.53526
Air	Minimum	33.00454
Water	Maximum	25.98523
Water	Mean	23.02052
Water	Minimum	20.00503

#| label: Another Summary created...Depth this time 

depth_summary <- sharks %>% # summarizing the mean, min and max depths of the sharks dataset 
  summarize(Depth_Minimum = min(depth, na.rm = TRUE),
    Depth_Mean = mean(depth, na.rm = TRUE),
    Depth_Maximum = max(depth, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
    names_to = "Statistic",
    values_to = "Value") %>%
  mutate(Statistic = case_when(grepl("Minimum", Statistic) ~ "Minimum",
      grepl("Mean", Statistic) ~ "Mean",
      grepl("Maximum", Statistic) ~ "Maximum"))


kable(depth_summary,col.names = c("Statistic", "Value"), # another table made of the summarized data
  caption = "Summary Statistics for Depth in Sharks Data")

Summary Statistics for Depth in Sharks Data
Statistic	Value
Minimum	44.64474
Mean	50.13864
Maximum	56.82916

#| label: Pretty table of depth data 

kable(depth_summary, col.names = c("Statistic", "Value"), caption = "Summary of Catch Depth") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, bold = TRUE, color = "white", background = "purple4") %>% 
  row_spec(1:3, background = "darkorchid")

Summary of Catch Depth
Statistic	Value
Minimum	44.64474
Mean	50.13864
Maximum	56.82916