library(readxl)
library(ggpubr)
## Loading required package: ggplot2
Loading DATASET A
DatasetA <- read_excel("C:/Users/datta/Downloads/DatasetA.xlsx")
In Dataset A, the independent variable is StudyHours and the dependent variable is ExamScore.
Descriptive Statistics:
mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
StudyHours: Mean = 6.14, SD = 1.37
ExamScore: Mean = 90.07, SD = 6.80
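The four separate calls above can also be collapsed into one pass over the columns. The following is a minimal sketch, assuming a data frame with the same two column names; the `demo` data frame and the `describe` helper are hypothetical, and the values are simulated stand-ins, not the contents of DatasetA.xlsx.

```r
# Simulated stand-in for DatasetA (hypothetical values, not the real file)
set.seed(1)
demo <- data.frame(
  StudyHours = rnorm(100, mean = 6, sd = 1.4),
  ExamScore  = rnorm(100, mean = 90, sd = 7)
)

# Hypothetical helper: mean and SD for every column,
# rounded to two decimals for reporting
describe <- function(df) {
  round(sapply(df, function(x) c(Mean = mean(x), SD = sd(x))), 2)
}

desc <- describe(demo)
print(desc)
```

The result is a small matrix with one column per variable, which keeps the rounding consistent across all four reported values.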
Part 3: Checking Normality
hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
StudyHours appears normally distributed. The histogram is approximately
symmetrical with most values in the center and a bell-shaped curve.
hist(DatasetA$ExamScore,
     main = "ExamScores",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
ExamScore appears slightly non-normal. The histogram shows some skewness and does not form a perfectly symmetrical bell curve.
Checking Normality Statistically:
shapiro.test(DatasetA$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
The Shapiro-Wilk p-value for StudyHours was greater than .05 (p = .935), so normality was not rejected and the data are treated as normally distributed. The Shapiro-Wilk p-value for ExamScore was less than .05 (p = .006), indicating the data are not normally distributed.
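The decision rule used above (compare each Shapiro-Wilk p-value with .05) can be applied to several columns at once. This is only a sketch on simulated data: the `demo` data frame and its column names are hypothetical, chosen so that one column is drawn from a normal distribution and the other from a right-skewed one.

```r
# Simulated stand-in data (hypothetical, not the real dataset)
set.seed(2)
demo <- data.frame(
  symmetric = rnorm(100),          # drawn from a normal distribution
  skewed    = rexp(100, rate = 1)  # drawn from a right-skewed distribution
)

# Extract the Shapiro-Wilk p-value for each column
p_values <- sapply(demo, function(x) shapiro.test(x)$p.value)

# TRUE means the test rejects normality at alpha = .05
rejects_normality <- p_values < 0.05

print(round(p_values, 4))
print(rejects_normality)
```

A p-value above .05 does not prove normality; it only means the test found no evidence against it, which is why the histograms are checked as well.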
Part 4: Correlation Analysis
A Spearman correlation was selected because at least one variable in each dataset violated normality assumptions according to the histograms and Shapiro-Wilk tests.
cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
The Spearman correlation was statistically significant because the p-value was less than .05 (p < .001), so the alternative hypothesis is supported. The rho value was .90, indicating a strong, positive relationship: as study hours increased, exam scores increased.
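Two details of the output above may be worth illustrating: Spearman's rho is Pearson's r computed on the ranks of the data, and the "Cannot compute exact p-value with ties" warning appears because tied values rule out the exact test. The sketch below uses simulated data (hypothetical values, rounded on purpose to create ties) and passes `exact = FALSE` to `cor.test` to request the approximate p-value directly.

```r
# Simulated stand-in data; rounding creates tied values
set.seed(3)
x <- round(rnorm(100, mean = 6, sd = 1.4), 1)
y <- round(60 + 5 * x + rnorm(100, sd = 3), 0)

# Spearman's rho equals Pearson's r computed on the (average) ranks
rho_direct <- cor(x, y, method = "spearman")
rho_ranks  <- cor(rank(x), rank(y), method = "pearson")

# exact = FALSE asks for the approximate p-value up front, so the
# ties warning is not raised
result <- cor.test(x, y, method = "spearman", exact = FALSE)

print(c(direct = rho_direct, from_ranks = rho_ranks))
print(result$p.value)
```

The warning in the original output is harmless: R falls back to the same approximation automatically, so the reported rho and p-value are unaffected.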
Part 5: Scatterplots
ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)
The line of best fit slopes upward, indicating a positive relationship.
The points closely follow the line, suggesting a strong relationship
between the variables: as study hours increase, exam scores also
increase. The pattern is linear, and no extreme outliers are observed,
so no outlier is distorting the relationship between the variables.
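The visual reading of the scatterplot can be backed up numerically. The sketch below assumes the line drawn by `add = "reg.line"` is an ordinary least-squares fit, and it uses simulated data (hypothetical values) in place of DatasetA. The cutoff of 3 for standardized residuals is a common rule of thumb assumed here, not something taken from the analysis above.

```r
# Simulated stand-in for DatasetA (hypothetical values)
set.seed(4)
demo <- data.frame(StudyHours = rnorm(100, mean = 6, sd = 1.4))
demo$ExamScore <- 60 + 5 * demo$StudyHours + rnorm(100, sd = 3)

# Fit the least-squares line and read off its slope
fit <- lm(ExamScore ~ StudyHours, data = demo)
slope <- unname(coef(fit)["StudyHours"])

# Flag points whose standardized residual is large in absolute value
# (cutoff of 3 is an assumed rule of thumb)
outliers <- which(abs(rstandard(fit)) > 3)

print(slope)     # a positive slope matches an upward line
print(outliers)  # row numbers of flagged points, if any
```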
Part 6: Report the Results
The independent variable, StudyHours (M = 6.14, SD = 1.37), was correlated with the dependent variable, ExamScore (M = 90.07, SD = 6.80), ρ(98) = .90, p < .001. The relationship was positive and strong. As study hours increased, exam scores increased.
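The degrees of freedom in ρ(98) are n − 2. The sample size is not printed in the output above, but it can be checked against the reported statistic, on the understanding that the S reported by `cor.test` satisfies S = (n³ − n)(1 − rho)/6. The sketch below uses only the two numbers from the Dataset A output; `implied_S` is a hypothetical helper name.

```r
rho <- 0.9008825  # rho reported for Dataset A
S   <- 16518      # S reported for Dataset A (rounded in the printout)

# S implied by a candidate sample size n
implied_S <- function(n, rho) (n^3 - n) * (1 - rho) / 6

# Search a plausible range of n for the closest match to the reported S
candidates <- 3:1000
n_hat <- candidates[which.min(abs(implied_S(candidates, rho) - S))]
df <- n_hat - 2

print(c(n = n_hat, df = df))
```

The closest match is n = 100, which gives 98 degrees of freedom and agrees with the reported ρ(98).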
Loading DATASET B
DatasetB <- read_excel("C:/Users/datta/Downloads/DatasetB.xlsx")
For Dataset B, the independent variable is ScreenTime and the dependent variable is SleepingHours.
Descriptive Statistics:
mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
mean(DatasetB$SleepingHours)
## [1] 6.938459
sd(DatasetB$SleepingHours)
## [1] 1.351332
ScreenTime: Mean = 5.06, SD = 2.06
SleepingHours: Mean = 6.94, SD = 1.35
Part 3: Check Normality
hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
ScreenTime appears non-normally distributed. The histogram shows noticeable skewness with values clustering toward one side.
hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
SleepingHours appears normally distributed. The histogram is symmetrical with a clear bell-shaped curve.
Checking Normality Statistically:
shapiro.test(DatasetB$ScreenTime)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004
The Shapiro-Wilk p-value for ScreenTime was less than .05 (p < .001), indicating the data are not normally distributed. The Shapiro-Wilk p-value for SleepingHours was greater than .05 (p = .300), so normality was not rejected and the data are treated as normally distributed.
Part 4: Correlation Analysis
A Spearman correlation was selected because at least one variable in each dataset violated normality assumptions according to the histograms and Shapiro-Wilk tests.
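The selection rule described above (Spearman if either variable fails the normality check, Pearson otherwise) can be written as a small function. This is a sketch only: `choose_method` is a hypothetical helper, the .05 threshold is the one used in this report, and the data are simulated stand-ins rather than DatasetB.

```r
# Hypothetical helper: pick the correlation method from the
# Shapiro-Wilk p-values of the two variables
choose_method <- function(x, y, alpha = 0.05) {
  p <- c(shapiro.test(x)$p.value, shapiro.test(y)$p.value)
  if (all(p > alpha)) "pearson" else "spearman"
}

# Simulated stand-in data (hypothetical values)
set.seed(5)
screen <- rexp(100, rate = 1 / 5)         # right-skewed stand-in
sleep  <- rnorm(100, mean = 7, sd = 1.3)  # roughly normal stand-in

method <- choose_method(screen, sleep)
result <- cor.test(screen, sleep, method = method)

print(method)
print(result$estimate)
```

In practice the histograms should be weighed alongside the test, since Shapiro-Wilk alone can flag trivial departures in large samples and miss real ones in small samples.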
cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5544674
The p-value was less than .05 (p < .001), so the Spearman correlation was statistically significant and the alternative hypothesis is supported. The rho value was -.55, indicating a moderate, negative relationship: as screen time increased, sleeping hours decreased.
Part 5: Scatterplots
ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)
The line of best fit slopes downward from left to right, indicating a
negative relationship: as screen time increases, sleeping hours
decrease. The dots cluster around the line of best fit with some
scatter, consistent with the moderate correlation (rho = -.55). The
dots form a roughly straight-line pattern, so the relationship is
linear. There may be an outlier, but the remaining dots follow the
line, so it does not appear to change the relationship between the
variables.
Part 6: Report the Results
The independent variable, ScreenTime (M = 5.06, SD = 2.06), was correlated with the dependent variable, SleepingHours (M = 6.94, SD = 1.35), ρ(98) = −.55, p < .001. The relationship was negative and moderate. As screen time increased, sleeping hours decreased.
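The reporting pattern used in both results statements (degrees of freedom in parentheses, no leading zero, p shown as "< .001" when very small) can be generated from the test output to avoid rounding slips. The sketch below is one possible way to do it; `format_rho` is a hypothetical helper, and the inputs are the values printed in the Dataset B output.

```r
# Hypothetical helper: format a Spearman result in the style used above
format_rho <- function(rho, df, p) {
  # Two decimals, leading zero dropped
  rho_txt <- sub("0\\.", ".", sprintf("%.2f", rho))
  # Very small p-values are reported as "< .001"
  p_txt <- if (p < 0.001) {
    "p < .001"
  } else {
    paste0("p = ", sub("^0\\.", ".", sprintf("%.3f", p)))
  }
  sprintf("rho(%d) = %s, %s", df, rho_txt, p_txt)
}

report <- format_rho(-0.5544674, df = 98, p = 3.521e-09)
print(report)
```

Calling the same helper with the Dataset A values (rho = 0.9008825, df = 98, p below 2.2e-16) reproduces the format of the first results statement.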