library(readxl)
library(ggpubr)
## Loading required package: ggplot2
DatasetC <- read_excel("C:/Users/tejas/Downloads/DatasetC-2.xlsx")

Independent variable: inches. Dependent variable: pounds.

mean(DatasetC$inches)
## [1] 69.27122
sd(DatasetC$inches)
## [1] 2.738448

inches: Mean = 69.27, SD = 2.74

mean(DatasetC$pounds)
## [1] 195.7736
sd(DatasetC$pounds)
## [1] 29.0096

pounds: Mean = 195.77, SD = 29.01

hist(DatasetC$inches,
     main = "inches",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

hist(DatasetC$pounds,
     main = "pounds",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “inches” appears normally distributed. The data looks symmetrical (most data is in the middle) and forms a proper bell curve. The variable “pounds” does not appear normally distributed. The data looks positively skewed (most data is on the left), so it does not form a proper bell curve.
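The visual impression of skew can be backed by a number. The following is a minimal sketch, not part of the original analysis: `skewness()` is a hypothetical helper written here in base R (moment-based form), and the two vectors are simulated stand-ins rather than the DatasetC columns. A value near 0 suggests symmetry, and a clearly positive value suggests a long right tail.

```r
# Hypothetical helper: moment-based sample skewness
# (third central moment divided by the 3/2 power of the second central moment)
skewness <- function(x) {
  x <- x[!is.na(x)]          # drop missing values
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^(3 / 2)
}

# Simulated stand-ins (assumption: the real DatasetC file is not available here)
set.seed(42)
symmetric    <- rnorm(1000)  # roughly symmetric, bell-shaped
right_skewed <- rexp(1000)   # long right tail, most data on the left

skewness(symmetric)          # expected to be close to 0
skewness(right_skewed)       # expected to be clearly positive
```

Applied to the report's data, the same helper could be called as `skewness(DatasetC$inches)` and `skewness(DatasetC$pounds)`.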

shapiro.test(DatasetC$inches) 
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetC$inches
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetC$pounds)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetC$pounds
## W = 0.97289, p-value = 0.03691

The Shapiro-Wilk p-value for inches is greater than .05 (p = 0.93), so there is no evidence against normality and the data are treated as normal. The Shapiro-Wilk p-value for pounds is less than .05 (p = 0.04), so the data are not normally distributed.
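The .05 decision rule can be wrapped in a small function so the same check is applied to every variable. This is a sketch under assumptions: `check_normality()` and its `alpha` argument are hypothetical names introduced here, and the inputs are idealized quantile vectors rather than the DatasetC columns. Only `shapiro.test()` comes from base R's stats package.

```r
# Hypothetical helper: run Shapiro-Wilk and apply the alpha cutoff
check_normality <- function(x, alpha = 0.05) {
  result <- shapiro.test(x)
  list(
    W       = unname(result$statistic),
    p_value = result$p.value,
    normal  = result$p.value > alpha  # TRUE means "no evidence against normality"
  )
}

# Idealized inputs (assumption: stand-ins for real data)
normal_like <- qnorm(ppoints(100))  # evenly spaced normal quantiles
skewed_like <- qexp(ppoints(100))   # evenly spaced exponential quantiles

check_normality(normal_like)
check_normality(skewed_like)
```

With the report's data, the calls would be `check_normality(DatasetC$inches)` and `check_normality(DatasetC$pounds)`.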

cor.test(DatasetC$inches, DatasetC$pounds, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetC$inches and DatasetC$pounds
## S = 166000, p-value = 0.9693
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## 0.00390039

The Spearman correlation test was selected because one of the variables (pounds) was not normally distributed according to the histogram and the Shapiro-Wilk test. The p-value (probability value) is 0.97, which is greater than .05. This means the result is not statistically significant, so we fail to reject the null hypothesis of no correlation. The rho value is 0.0039. Although the sign is positive, the value is so close to 0 that it indicates no meaningful relationship between inches and pounds.
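The choice between Pearson and Spearman can be expressed as a rule: use Pearson only when both variables pass the normality check, otherwise fall back to Spearman. The sketch below is one possible way to encode that rule. `choose_cor_method()` is a hypothetical helper, the .05 cutoff is the one used in this report, and the data are simulated rather than taken from DatasetC.

```r
# Hypothetical helper: pick the correlation method from Shapiro-Wilk results
choose_cor_method <- function(x, y, alpha = 0.05) {
  both_normal <- shapiro.test(x)$p.value > alpha &&
    shapiro.test(y)$p.value > alpha
  method <- if (both_normal) "pearson" else "spearman"

  # exact = FALSE requests the asymptotic p-value for Spearman,
  # which avoids the warning R gives when the data contain ties
  test <- cor.test(x, y, method = method, exact = FALSE)

  list(
    method   = method,
    estimate = unname(test$estimate),  # r for Pearson, rho for Spearman
    p_value  = test$p.value
  )
}

# Simulated stand-ins (assumption: not the real DatasetC values)
set.seed(123)
height <- rnorm(100, mean = 69, sd = 2.7)  # symmetric variable
weight <- 150 + rexp(100, rate = 1 / 45)   # right-skewed variable

choose_cor_method(height, weight)
```

With the report's data, the call would be `choose_cor_method(DatasetC$inches, DatasetC$pounds)`.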

ggscatter(
  DatasetC,
  x = "inches",
  y = "pounds",
  add = "reg.line",
  xlab = "inches",
  ylab = "pounds"
)

The regression line is nearly horizontal, which indicates no relationship between the independent variable (inches) and the dependent variable (pounds): as inches increases, pounds does not change in a consistent direction. The dots only loosely hug the line, which is consistent with, at most, a very weak relationship between the variables. The dots appear to form a slightly curved pattern, which would suggest that any relationship is non-linear. If the relationship is non-linear, a Spearman correlation should be used. There is possibly one outlier. However, the dot is towards the center of the line of best fit, so it does not appear to affect the relationship between the independent and dependent variables.