Assignment 4 DataSetC

library(readxl)
library(ggpubr)

## Loading required package: ggplot2

DatasetC <- read_excel("C:/Users/yenug/Downloads/DatasetC.xlsx")

IV = Height (inches); DV = Weight (pounds)

mean(DatasetC$inches)

## [1] 69.27122

sd(DatasetC$inches)

## [1] 2.738448

mean(DatasetC$pounds)

## [1] 195.7736

sd(DatasetC$pounds)

## [1] 29.0096

hist(DatasetC$inches,
     main = "Height(Inches)",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

hist(DatasetC$pounds,
     main = "Weight(Pounds)",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “Inches” appears normally distributed. The histogram looks symmetrical (most of the data is in the middle) and follows a bell-shaped curve.

The variable “Pounds” does NOT appear normally distributed. The histogram looks positively skewed (most of the data is on the left, with a tail extending to the right).
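The visual impression of skew can be backed up with a number. The following is a minimal sketch, assuming the moment-based sample skewness formula. `sample_skewness` is a hypothetical helper (base R has no built-in skewness function), and the two vectors are made-up illustration data, not values from DatasetC.

```r
# Hypothetical helper: moment-based sample skewness.
# Positive values indicate a longer right tail (positive skew).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)   # second central moment
  m3 <- mean((x - m)^3)   # third central moment
  m3 / m2^(3 / 2)
}

# Made-up illustration data, NOT the values in DatasetC
right_skewed <- c(1, 1, 2, 2, 2, 3, 3, 4, 6, 10)
symmetric    <- c(1, 2, 3, 4, 5)

sample_skewness(right_skewed)  # positive: tail extends to the right
sample_skewness(symmetric)     # zero for perfectly symmetric data
```

On the real data the same helper could be called as `sample_skewness(DatasetC$pounds)`; a clearly positive result would agree with the histogram.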

shapiro.test(DatasetC$inches)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetC$inches
## W = 0.99388, p-value = 0.9349

shapiro.test(DatasetC$pounds)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetC$pounds
## W = 0.97289, p-value = 0.03691

The Shapiro-Wilk p-value for the Inches normality test is greater than .05 (.93), so we fail to reject the null hypothesis of normality and treat the data as normal.

The Shapiro-Wilk p-value for the Pounds normality test is less than .05 (.037), so the data is NOT normal.
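The decision rule used here (compare the Shapiro-Wilk p-value with .05) can be written as a small function. This is a sketch only: `check_normality` and its `alpha` argument are hypothetical names, and the simulated vector is a stand-in for illustration, not DatasetC.

```r
# Hypothetical helper: run Shapiro-Wilk and report the decision at a given alpha.
check_normality <- function(x, alpha = 0.05) {
  res <- shapiro.test(x)
  list(
    W = unname(res$statistic),
    p_value = res$p.value,
    # TRUE means we fail to reject normality; it does not prove normality
    looks_normal = res$p.value > alpha
  )
}

# Simulated, strongly right-skewed data (NOT DatasetC)
set.seed(1)
simulated <- rexp(100)
check_normality(simulated)$looks_normal

# The same comparison applied to the p-values reported in this assignment
0.9349 > 0.05    # inches: TRUE, treated as normal
0.03691 > 0.05   # pounds: FALSE, treated as not normal
```

With the real data the calls would be `check_normality(DatasetC$inches)` and `check_normality(DatasetC$pounds)`.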

cor.test(DatasetC$inches, DatasetC$pounds, method = "spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetC$inches and DatasetC$pounds
## S = 166000, p-value = 0.9693
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## 0.00390039

The Spearman correlation test was selected because at least one of the variables (Pounds) was not normally distributed according to the histograms and the Shapiro-Wilk tests.
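The reasoning behind the test selection can be sketched in code. It assumes the rule applied in this assignment: Pearson only when both variables pass the normality check, Spearman otherwise. `choose_cor_method` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: pick the correlation method from two Shapiro-Wilk p-values.
# Assumed rule: Pearson only if BOTH variables look normal, otherwise Spearman.
choose_cor_method <- function(p_x, p_y, alpha = 0.05) {
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# p-values as reported by shapiro.test() for inches and pounds
method <- choose_cor_method(p_x = 0.9349, p_y = 0.03691)
method  # "spearman"
```

The result could then be passed straight to the test, for example `cor.test(DatasetC$inches, DatasetC$pounds, method = method)`.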

The p-value (probability value) is 0.9693, which is above .05. This means the results are NOT statistically significant, so we fail to reject the null hypothesis.

The rho-value is 0.00390039.

The sign of the correlation is positive, which would mean that weight increases as height increases, but the value is so close to zero that in this case there is no relationship between the variables.

The correlation value falls between 0.00 and 0.09, which means there is no relationship.
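The strength check can be expressed as a one-line test. This sketch encodes only the 0.00 to 0.09 band stated in this assignment; `is_no_relationship` is a hypothetical name, and no labels for larger values are assumed.

```r
# Hypothetical helper: TRUE when |rho|, rounded to two decimals, falls in 0.00-0.09.
# Only this band comes from the assignment; other strength bands are not assumed.
is_no_relationship <- function(rho) {
  round(abs(rho), 2) <= 0.09
}

is_no_relationship(0.00390039)  # TRUE: the rho reported by cor.test()
```

The absolute value is used so that the check ignores the sign and looks only at the strength of the correlation.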

ggscatter(
  DatasetC,
  x = "inches",
  y = "pounds",
  add = "reg.line",
  xlab = "inches",
  ylab = "pounds"
)

The line of best fit is horizontal. This means there is no relationship between the variables.

The dots are randomly scattered around the line. This means there is no relationship between the variables.

The dots form a curved pattern. This means the data is non-linear.

There is possibly one outlier.

library(rmarkdown)

Madhuri Yenuganti

2026-02-09