Is there a correlation between the variables air and water?
Code
# Set CRAN mirror
options(repos = c(CRAN = "https://cran.rstudio.com/"))
# Install and load necessary packages
install.packages("tidyverse")
library(tidyverse)
Installing package into 'C:/Users/sanke/AppData/Local/R/win-library/4.2'
(as 'lib' is unspecified)
package 'tidyverse' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\sanke\AppData\Local\Temp\RtmpiqP3bM\downloaded_packages
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(dplyr)
install.packages("readxl")
Installing package into 'C:/Users/sanke/AppData/Local/R/win-library/4.2'
(as 'lib' is unspecified)
package 'readxl' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\sanke\AppData\Local\Temp\RtmpiqP3bM\downloaded_packages
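Running `install.packages()` on every render reinstalls packages that are already present. Below is a minimal sketch that installs only the missing packages by checking each one with `requireNamespace()`. `install_if_missing` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: install only packages that cannot already be loaded
install_if_missing <- function(pkgs) {
  # TRUE for each package whose namespace loads without error
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  invisible(missing)
}

# Usage for this analysis (commented out so the sketch has no side effects):
# install_if_missing(c("tidyverse", "readxl"))
```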
Code
library(readxl)
sharks <- read_excel("C:\\Users\\sanke\\OneDrive - Nottingham Trent University\\RMDA\\sharks.xlsx")
# Summarise and inspect the dataset
sharks %>%
  summary()
      ID                sex                blotch           BPM
 Length:500         Length:500         Min.   :30.78   Min.   :119.0
 Class :character   Class :character   1st Qu.:34.16   1st Qu.:129.0
 Mode  :character   Mode  :character   Median :35.05   Median :142.0
                                       Mean   :35.13   Mean   :141.8
                                       3rd Qu.:36.05   3rd Qu.:153.2
                                       Max.   :40.08   Max.   :166.0
     weight           length           air            water
 Min.   : 65.10   Min.   :128.3   Min.   :33.00   Min.   :20.01
 1st Qu.: 75.68   1st Qu.:172.0   1st Qu.:34.42   1st Qu.:21.55
 Median : 87.82   Median :211.1   Median :35.43   Median :23.11
 Mean   : 87.94   Mean   :211.0   Mean   :35.54   Mean   :23.02
 3rd Qu.:100.40   3rd Qu.:251.8   3rd Qu.:36.71   3rd Qu.:24.37
 Max.   :110.94   Max.   :291.0   Max.   :38.00   Max.   :25.99
      meta            depth
 Min.   : 50.03   Min.   :44.64
 1st Qu.: 67.39   1st Qu.:48.90
 Median : 82.45   Median :50.14
 Mean   : 82.04   Mean   :50.14
 3rd Qu.: 95.97   3rd Qu.:51.35
 Max.   :112.45   Max.   :56.83
Code
# Check the first few rows of the dataset
head(sharks)
# Load ggplot2 library
library(ggplot2)
# Create a customised scatter plot with a dashed linear trend line
ggplot(sharks, aes(x = air, y = water)) +
  geom_point(color = "lightgreen", size = 3, shape = 16, alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, color = "red", linetype = "dashed") +
  labs(title = "Relationship between air and water temperatures",
       x = "Air temperature (°C)",
       y = "Water temperature (°C)")
`geom_smooth()` using formula = 'y ~ x'
Code
# Calculate correlation between air and water
cor.test(sharks$air, sharks$water)
Pearson's product-moment correlation
data: sharks$air and sharks$water
t = -1.2346, df = 498, p-value = 0.2176
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.14224207 0.03260803
sample estimates:
cor
-0.05524051
These results indicate that there is no statistically significant correlation between ambient air temperature and surface water temperature (r = -0.055, 95% CI -0.142 to 0.033, t = -1.23, df = 498, p = 0.218).
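For a Pearson correlation, `cor.test()` tests H0: rho = 0 with the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2), which has n - 2 degrees of freedom. This short sketch recomputes the reported t and p-value. It uses only r = -0.05524051 and n = 500, both taken from the air vs water output.

```r
# Values copied from the cor.test() output for air vs water
r <- -0.05524051
n <- 500

# Test statistic for H0: rho = 0, with n - 2 degrees of freedom
dof <- n - 2
t_stat <- r * sqrt(dof) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), dof)

round(c(t = t_stat, df = dof, p = p_value), 4)
```

The recomputed values agree with the printed t of about -1.23 and p of about 0.22.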
Does multiple capture have an effect on blotching time?
      ID                sex               blotch1         blotch2
 Length:50          Length:50          Min.   :32.49   Min.   :33.47
 Class :character   Class :character   1st Qu.:34.38   1st Qu.:35.31
 Mode  :character   Mode  :character   Median :34.94   Median :35.94
                                       Mean   :35.03   Mean   :35.96
                                       3rd Qu.:35.90   3rd Qu.:36.78
                                       Max.   :37.07   Max.   :38.18
library(ggplot2)
library(tidyr) # For gathering the data into long format
# Convert data from wide to long format for ggplot2 compatibility
sharksub_long <- sharksub %>%
  gather(key = "blotch_type", value = "time", blotch1, blotch2)
# Boxplot of blotch1 vs blotch2
ggplot(sharksub_long, aes(x = blotch_type, y = time, fill = blotch_type)) +
  geom_boxplot() +
  labs(title = "Comparison of blotching times",
       x = "Blotches",
       y = "Time (seconds)") +
  theme_minimal() +
  scale_fill_manual(values = c("lightblue", "lightgreen"))
Code
# Perform independent t-test between blotch1 and blotch2
t.test(sharksub$blotch1, sharksub$blotch2)
Welch Two Sample t-test
data: sharksub$blotch1 and sharksub$blotch2
t = -4.1143, df = 97.658, p-value = 8.113e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.3782038 -0.4812731
sample estimates:
mean of x mean of y
35.03042 35.96016
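These results indicate that multiple capture has a statistically significant effect on blotching time. Blotching took longer at the second capture than at the first (mean 35.96 s vs 35.03 s; difference 0.93 s, 95% CI 0.48 to 1.38 s; t = -4.11, df = 97.66, p < 0.001). Because blotch1 and blotch2 appear to be repeated measurements on the same 50 sharks (one row per shark), a paired t-test (t.test(sharksub$blotch1, sharksub$blotch2, paired = TRUE)) would match this design better than the independent Welch test used here.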
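A paired t-test is a one-sample t-test on the within-subject differences, so it has n - 1 degrees of freedom. The minimal sketch below demonstrates this on simulated data. The vectors `first` and `second` are made-up values for illustration, not the shark measurements.

```r
set.seed(1)
# Simulated, hypothetical blotching times for 50 sharks at two captures
first  <- rnorm(50, mean = 35.0, sd = 1)
second <- first + rnorm(50, mean = 0.9, sd = 0.5)

# Paired test versus a one-sample test on the differences
paired  <- t.test(first, second, paired = TRUE)
one_sam <- t.test(first - second, mu = 0)

# Same statistic either way
c(paired = unname(paired$statistic), one_sample = unname(one_sam$statistic))
paired$parameter  # n - 1 = 49 degrees of freedom
```

On the shark data the corresponding call would be `t.test(sharksub$blotch1, sharksub$blotch2, paired = TRUE)`.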
Is it possible to predict blotching time?
Blotch vs. Depth
Code
sharks <- read_excel("C:\\Users\\sanke\\OneDrive - Nottingham Trent University\\RMDA\\sharks.xlsx")
# Perform linear regression (depth as the response, blotch as the predictor)
model <- lm(depth ~ blotch, data = sharks)
# Output the regression summary
summary(model)
Call:
lm(formula = depth ~ blotch, data = sharks)
Residuals:
Min 1Q Median 3Q Max
-4.3570 -0.9453 -0.0124 0.9863 4.7997
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.63435 1.56040 9.379 <2e-16 ***
blotch 1.01079 0.04439 22.772 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.415 on 498 degrees of freedom
Multiple R-squared: 0.5101, Adjusted R-squared: 0.5091
F-statistic: 518.6 on 1 and 498 DF, p-value: < 2.2e-16
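`lm(depth ~ blotch)` treats depth as the response. To predict blotching time itself, blotch has to be the response, and `predict()` can then return fitted values with prediction intervals at new depths. The minimal sketch below uses simulated data. The data frame `sim` and its values are hypothetical, not the shark dataset.

```r
set.seed(42)
# Hypothetical data: depth (m) and blotching time (s) for 100 sharks
sim <- data.frame(depth = rnorm(100, mean = 50, sd = 2))
sim$blotch <- 10 + 0.5 * sim$depth + rnorm(100, sd = 1)

# Blotching time is the response, depth the predictor
fit <- lm(blotch ~ depth, data = sim)

# Predicted blotching time with a 95% prediction interval at new depths
new_depths <- data.frame(depth = c(46, 50, 54))
pred <- predict(fit, newdata = new_depths, interval = "prediction", level = 0.95)
pred
```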
Code
# Plotting the data and regression line
ggplot(sharks, aes(x = blotch, y = depth)) +
  geom_point(color = "#1f78b4", size = 3, alpha = 0.7) +  # Scatter plot with color and transparency
  geom_smooth(method = "lm", se = TRUE, color = "red", linewidth = 1) +  # Regression line with confidence interval (linewidth replaces the deprecated size for lines)
  labs(title = "Relationship between blotching and depth",
       x = "Blotch (seconds)",
       y = "Depth (metres)") +
  theme_minimal() +  # Minimal theme for cleaner look
  theme(plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
        axis.title = element_text(size = 14),
        axis.text = element_text(size = 12),
        panel.grid.major = element_line(color = "grey", linetype = "dashed", linewidth = 0.5),
        panel.grid.minor = element_blank()) +
  theme(legend.position = "none")  # Remove legend (not needed for this plot)
`geom_smooth()` using formula = 'y ~ x'
Code
# Compute the Pearson correlation coefficient between 'blotch' and 'depth'
cor.test(sharks$blotch, sharks$depth)
Pearson's product-moment correlation
data: sharks$blotch and sharks$depth
t = 22.772, df = 498, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.6683963 0.7546509
sample estimates:
cor
0.7142247
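In simple linear regression the Multiple R-squared equals the square of Pearson's r. The slopes of the two possible regressions (depth on blotch, blotch on depth) also multiply to r^2. The short check below uses the values reported above. The reverse slope, which is the one relevant to predicting blotching time from depth, is derived arithmetically from those values rather than from a fitted model.

```r
# Values copied from the summary(model) and cor.test() outputs above
r <- 0.7142247                    # Pearson r, blotch vs depth
slope_depth_on_blotch <- 1.01079  # metres of depth per second of blotching

# R-squared of a one-predictor model is r^2 (reported: 0.5101)
r^2

# b(depth ~ blotch) * b(blotch ~ depth) = r^2, so the reverse slope is:
slope_blotch_on_depth <- r^2 / slope_depth_on_blotch
slope_blotch_on_depth  # seconds of blotching per metre of depth
```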
Blotch vs. Cortisol
Call:
lm(formula = meta ~ blotch, data = sharks)
Residuals:
Min 1Q Median 3Q Max
-31.856 -14.556 0.426 13.857 30.687
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 86.1257 19.2463 4.475 9.48e-06 ***
blotch -0.1162 0.5475 -0.212 0.832
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 17.46 on 498 degrees of freedom
Multiple R-squared: 9.051e-05, Adjusted R-squared: -0.001917
F-statistic: 0.04508 on 1 and 498 DF, p-value: 0.8319
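With a single predictor, the overall F-statistic printed by `summary()` is the square of the slope's t value, and the two tests give the same p-value. The quick check below uses the t and F values reported for the depth and cortisol (meta) models.

```r
# Slope t values and F-statistics copied from the two regression summaries
t_depth <- 22.772
F_depth <- 518.6
t_meta  <- -0.21232
F_meta  <- 0.04508

# F on (1, 498) df equals t^2 on 498 df
c(depth = t_depth^2, meta = t_meta^2)

# The p-values coincide: two-sided t-test vs upper-tail F-test
t_p <- 2 * pt(-abs(t_meta), 498)
F_p <- pf(t_meta^2, 1, 498, lower.tail = FALSE)
c(t_p = t_p, F_p = F_p)
```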
Code
ggplot(sharks, aes(x = blotch, y = meta)) +
  geom_point(color = "#1f78b4", size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE, color = "red", linewidth = 1) +  # linewidth replaces the deprecated size for lines
  labs(title = "Relationship between blotching and cortisol levels",
       x = "Blotch (seconds)",
       y = "Cortisol levels (mcg/dL)") +
  theme_minimal() +
  theme(plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
        axis.title = element_text(size = 14),
        axis.text = element_text(size = 12),
        panel.grid.major = element_line(color = "grey", linetype = "dashed", linewidth = 0.5),
        panel.grid.minor = element_blank()) +
  theme(legend.position = "none")
`geom_smooth()` using formula = 'y ~ x'
Code
# Compute the Pearson correlation coefficient between 'blotch' and 'cortisol levels'
cor.test(sharks$blotch, sharks$meta)
Pearson's product-moment correlation
data: sharks$blotch and sharks$meta
t = -0.21232, df = 498, p-value = 0.8319
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.09712341 0.07824201
sample estimates:
cor
-0.009513855
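The 95 percent confidence interval that `cor.test()` reports for r comes from Fisher's z-transformation. The transformed value z = atanh(r) is approximately normal with standard error 1 / sqrt(n - 3), and the interval endpoints are transformed back with tanh. The sketch below reproduces the blotch vs meta interval from the reported r and n = 500.

```r
# Pearson r for blotch vs meta (from the output above) and sample size
r <- -0.009513855
n <- 500

# Fisher z-transform, normal-theory interval on the z scale, then back-transform
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
ci
```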
Blotch vs. Sex
Code
# Boxplot to compare blotching between male and female sharks
boxplot(blotch ~ sex, data = sharks,
        main = "Comparison of blotching in sharks by sex",
        xlab = "Sex",
        ylab = "Blotching (seconds)",
        col = c("lightpink", "lightblue"))
Code
# T-test to compare blotching between sexes
t_test_result <- t.test(sharks$blotch ~ sharks$sex)
print(t_test_result)
Welch Two Sample t-test
data: sharks$blotch by sharks$sex
t = -3.0282, df = 494.67, p-value = 0.002589
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
-0.6322714 -0.1346620
sample estimates:
mean in group Female mean in group Male
34.92294 35.30641
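These results indicate that there is a statistically significant difference in blotching time between male and female sharks. On average, males blotched for longer than females (35.31 s vs 34.92 s; difference 0.38 s, 95% CI 0.13 to 0.63 s; t = -3.03, df = 494.67, p = 0.0026).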
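By default `t.test()` runs Welch's test (`var.equal = FALSE`), which is why the degrees of freedom are fractional (494.67). The sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom by hand and compares them with `t.test()`. `welch_t` is a hypothetical helper, and the two groups are simulated rather than taken from the shark data.

```r
# Hypothetical helper: Welch t statistic and Welch-Satterthwaite df
welch_t <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  dof <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = dof)
}

set.seed(7)
# Simulated blotching times for two groups of unequal size and spread
female <- rnorm(240, mean = 34.9, sd = 1.4)
male   <- rnorm(260, mean = 35.3, sd = 1.5)

by_hand  <- welch_t(female, male)
built_in <- t.test(female, male)
rbind(by_hand, built_in = c(built_in$statistic, built_in$parameter))
```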