install.packages("readxl") # Install the package needed to read Excel files
Installing package into 'C:/Users/renad/AppData/Local/R/win-library/4.4'
(as 'lib' is unspecified)
package 'readxl' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\renad\AppData\Local\Temp\Rtmp2DlQSQ\downloaded_packages
library(readxl) # Load the library
Warning: package 'readxl' was built under R version 4.4.2
# Use the read_excel() function to load the data. Note the Excel file paths
sharks_data <- read_excel("C:/Users/renad/Desktop/NTU/1 Study/1 slides/1 Rsrch methods and data analyis/summative R/data files/sharks.xlsx")
sharksub_data <- read_excel("C:/Users/renad/Desktop/NTU/1 Study/1 slides/1 Rsrch methods and data analyis/summative R/data files/sharksub.xlsx")
names(sharksub_data) # a quick view of the variable names
[1] "ID" "sex" "blotch1" "blotch2"
sharks_data wrangling
library(dplyr) # load the dplyr package to use group_by() and summarize()
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Grouping by sex, calculating means, and counting each group
# Grouping and summarizing
sharks_data_grouped <- sharks_data %>%
  group_by(sex) %>%
  summarize(
    mean_air_temp = mean(air, na.rm = TRUE),
    mean_water_temp = mean(water, na.rm = TRUE),
    count = n() # Count of observations in each group
  )
print(sharks_data_grouped) # View the grouped summary
# A tibble: 2 × 4
  sex    mean_air_temp mean_water_temp count
  <chr>          <dbl>           <dbl> <int>
1 Female          35.5            23.1   236
2 Male            35.6            22.9   264
Results
Number of female sharks = 236
Number of male sharks = 264
For female sharks, the average air temperature was 35.48716°C and the average water temperature was 23.10948°C
For male sharks, the average air temperature was 35.57826°C and the average water temperature was 22.94100°C
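To see what each column of the grouped summary means, the same calculation can be tried on a tiny made-up data frame. This is only a sketch: the data frame `toy` and its values are invented for illustration, and it uses base R's `tapply()` and `table()` instead of the dplyr pipeline so that it runs without extra packages.

```r
# Toy data (hypothetical values, not taken from sharks.xlsx)
toy <- data.frame(
  sex   = c("Female", "Female", "Male", "Male", "Male"),
  air   = c(34, 36, 35, 37, NA),
  water = c(22, 24, 23, 21, 25)
)

# Mean per group; na.rm = TRUE ignores the missing air value
mean_air   <- tapply(toy$air, toy$sex, mean, na.rm = TRUE)
mean_water <- tapply(toy$water, toy$sex, mean, na.rm = TRUE)

# Rows per group; like n() in dplyr, this counts rows even if a value is NA
group_count <- table(toy$sex)

toy_summary <- data.frame(
  sex             = names(mean_air),
  mean_air_temp   = as.vector(mean_air),
  mean_water_temp = as.vector(mean_water),
  count           = as.vector(group_count)
)
print(toy_summary)
```

Note that the count for Male is 3 even though only two air values were available for the mean, so `count` describes rows, not non-missing measurements.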
sharksub_data wrangling
Is there a correlation between the variables air and water?
air_water_data <-select(sharks_data, air, water) # Create a new table with only air and water columns
write.csv(air_water_data, "air_water_data.csv", row.names =FALSE) # save as csv file
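A quick way to check that a saved CSV really contains what was intended is to read it straight back. The sketch below uses a small stand-in table (`demo`, with invented values) and a temporary file, so it does not touch `air_water_data.csv` itself.

```r
# Hypothetical two-column table standing in for air_water_data
demo <- data.frame(air = c(35.1, 36.2, 34.8), water = c(23.0, 22.5, 23.4))

# Write to a temporary file so nothing in the working directory is changed
csv_path <- tempfile(fileext = ".csv")
write.csv(demo, csv_path, row.names = FALSE)

# Read it back and compare column names and values with the original
demo_back <- read.csv(csv_path)
print(names(demo_back))
print(all.equal(demo$air, demo_back$air))
print(all.equal(demo$water, demo_back$water))
```

With `row.names = FALSE` the file holds only the air and water columns; without it, write.csv() adds an extra first column of row numbers.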
Normality of data
shapiro.test(sharks_data$air)
Shapiro-Wilk normality test
data: sharks_data$air
W = 0.95885, p-value = 1.338e-10
shapiro.test(sharks_data$water) # Check for normality to determine whether to use Pearson's correlation (for normally distributed data) or Spearman's correlation (for non-normally distributed data)
Shapiro-Wilk normality test
data: sharks_data$water
W = 0.96035, p-value = 2.371e-10
Tip
If the p-value from the Shapiro-Wilk test is < 0.05, there is significant evidence that the data are not normally distributed.
The p-values for air (1.338e-10) and water (2.371e-10) are both far below 0.05, so we reject the null hypothesis of normality for both variables > we’ll use Spearman’s correlation
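The decision made here (Shapiro-Wilk first, then Pearson or Spearman) can be written as a small function. This is a sketch on simulated data, not on the shark data: `choose_cor_method()` is a hypothetical helper name, and the 0.05 cut-off is the one used in the tip above.

```r
# Hypothetical helper: choose a correlation method from Shapiro-Wilk results
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  # Use Pearson only if neither variable shows evidence against normality
  if (p_x >= alpha && p_y >= alpha) "pearson" else "spearman"
}

set.seed(1)             # make the simulated data reproducible
skewed <- rexp(500)     # strongly right-skewed, clearly non-normal
other  <- rnorm(500)    # drawn from a normal distribution

method <- choose_cor_method(skewed, other)
print(method)
print(cor.test(skewed, other, method = method))
```

Because one of the two simulated variables is heavily skewed, the helper falls back to Spearman, which is the same route taken for air and water.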
cor.test(air_water_data$air, air_water_data$water, method ="spearman") # testing the correlation
Spearman's rank correlation rho
data: air_water_data$air and air_water_data$water
S = 22007692, p-value = 0.2082
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.05637344
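The statistic S and the estimate rho in this output describe the same quantity. As far as I understand R's implementation of the Spearman test, S = (n^3 - n) * (1 - rho) / 6, so rho can be recovered from S. The sketch below assumes n = 500 complete air/water pairs (236 + 264 from the grouped summary, with no missing values); that sample size is an assumption, since cor.test() does not print it.

```r
S <- 22007692   # test statistic printed by cor.test() above
n <- 500        # assumed number of complete pairs (236 + 264)

# Invert S = (n^3 - n) * (1 - rho) / 6 to recover rho
rho_from_S <- 1 - 6 * S / (n^3 - n)
print(rho_from_S)
```

If the assumption about n holds, the result should agree with the reported rho of -0.05637344 up to the rounding of S in the printed output.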
Tip
rho (Spearman’s rank correlation coefficient) measures the strength and direction of the monotonic relationship between two variables
rho = 1 : Perfect positive monotonic relationship (as one variable increases, the other increases).
rho = -1: Perfect negative monotonic relationship (as one variable increases, the other decreases).
rho = 0 : No monotonic relationship between the variables.
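A small example helps show what "monotonic" means here. Spearman's rho is Pearson's correlation computed on the ranks of the data, so any strictly increasing relationship gives rho = 1 even if it is far from a straight line. The vectors below are invented for illustration.

```r
x      <- c(1, 2, 3, 4, 5)
y_up   <- exp(x)     # increasing, but not linear in x
y_down <- -x^3       # decreasing, but not linear in x

# Spearman's rho two ways: built in, and as Pearson's r on the ranks
rho_builtin <- cor(x, y_up, method = "spearman")
rho_ranks   <- cor(rank(x), rank(y_up))
rho_down    <- cor(x, y_down, method = "spearman")
pearson_up  <- cor(x, y_up)

print(rho_builtin)   # 1: perfectly increasing
print(rho_ranks)     # same value, computed from ranks
print(rho_down)      # -1: perfectly decreasing
print(pearson_up)    # Pearson's r is below 1 because the curve is not a line
```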
Tip
Reject H₀: If the p-value is less than or equal to 0.05 > there is significant evidence to reject the null hypothesis in favor of the alternative hypothesis.
Fail to Reject H₀: If the p-value is greater than 0.05 > there is insufficient evidence to reject the null hypothesis.
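The decision rule can also be applied in code instead of reading the p-value off the printed output. In this sketch `decide()` is a hypothetical helper, and the two variables passed to cor.test() are simulated, not the shark data. The `p.value` and `estimate` components are the standard parts of the object that cor.test() returns.

```r
# Hypothetical helper that applies the decision rule to a p-value
decide <- function(p_value, alpha = 0.05) {
  if (p_value <= alpha) "reject H0" else "fail to reject H0"
}

# cor.test() returns an "htest" object; its parts can be extracted by name
set.seed(42)                     # simulated, unrelated variables
res <- cor.test(rnorm(100), rnorm(100), method = "spearman")
print(res$p.value)
print(unname(res$estimate))      # the rho value

print(decide(res$p.value))
print(decide(0.2082))            # p-value reported for air and water
```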
Results
p-value = 0.2082, which is greater than 0.05 > not enough evidence to reject H₀ > no significant correlation
rho = -0.05637344 > a very weak negative relationship between air and water (very weak correlation)
In this sample, water tends to decrease very slightly as air increases, but the effect is so small that it cannot be distinguished from no relationship at all
Conclusion
There is no strong or significant monotonic relationship between air temperature and water temperature.
Visualizing
library(ggplot2) # Load required libraries
Warning: package 'ggplot2' was built under R version 4.4.2
# Scatterplot
ggplot(air_water_data, aes(x = air, y = water)) +
  geom_point(color = "blue") +
  ggtitle("Correlation between air and water") +
  xlab("air") +
  ylab("water") +
  theme_minimal()
I don't think I can do it this easily; I must wrangle the data properly first, but at least I kind of know how to answer the correlation question :)