library(readxl)
library(ggpubr)
## Loading required package: ggplot2
Loaded required packages.
PART 1: IMPORT DATASET A
DatasetA <- read_excel("C:/Users/spvar/Downloads/DatasetA.xlsx")
PART 2: DESCRIPTIVE STATISTICS
mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
sd(DatasetA$ExamScore)
## [1] 6.795224
mean(DatasetA$ExamScore)
## [1] 90.06906
The mean of the independent variable StudyHours is 6.135609 and the SD is 1.369224.
The mean of the dependent variable ExamScore is 90.06906 and the SD is 6.795224.
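The four separate calls above can be collapsed into one pass over the columns. The following is a minimal sketch, assuming a data frame of numeric columns; the `demo` data frame and the `describe` helper are hypothetical, and the values are simulated rather than taken from DatasetA.

```r
# Hypothetical stand-in for DatasetA; the values are simulated, not the real data.
set.seed(1)
demo <- data.frame(StudyHours = rnorm(100, mean = 6, sd = 1.4))
demo$ExamScore <- 55 + 5.5 * demo$StudyHours + rnorm(100, sd = 3)

# Mean and SD of every column in one pass, rounded to two decimals for reporting.
describe <- function(df) {
  round(sapply(df, function(x) c(M = mean(x), SD = sd(x))), 2)
}

desc <- describe(demo)
print(desc)
```

The result is a small matrix with one column per variable and rows `M` and `SD`, which maps directly onto the values needed for the write-up.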
PART 3: CHECK NORMALITY
hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “StudyHours” appears approximately normally distributed.
The histogram is fairly symmetrical with most values clustered around
the center, indicating low skewness and appropriate kurtosis.
hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightgreen",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “ExamScore” appears slightly skewed. Although most values
cluster toward higher scores, the distribution does not perfectly form a
bell curve, suggesting some skewness and non-ideal kurtosis.
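The visual judgments of skewness and kurtosis can be backed with numbers. This is a minimal sketch using moment-based formulas in base R; the helper names `skewness` and `excess_kurtosis` are hypothetical (not from any loaded package), and `scores` is simulated data, not ExamScore.

```r
# Moment-based sample skewness and excess kurtosis (base R only).
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}
excess_kurtosis <- function(x) {
  z <- x - mean(x)
  mean(z^4) / mean(z^2)^2 - 3
}

# Hypothetical scores with a long left tail, simulated for illustration only.
set.seed(2)
scores <- 100 - rexp(100, rate = 1 / 8)
skewness(scores)
excess_kurtosis(scores)
```

Values near 0 for both statistics are what a normal distribution would produce; a negative skewness corresponds to values clustering toward the high end with a tail to the left.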
PART 4: CORRELATION ANALYSIS
shapiro.test(DatasetA$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
The Shapiro-Wilk p-value for StudyHours was greater than .05 (W = 0.99, p = .935), so there is no evidence that study hours depart from a normal distribution. The Shapiro-Wilk p-value for ExamScore was less than .05 (W = 0.96, p = .006), indicating that exam scores are not normally distributed.
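The decision rule implied here (use Pearson only when both variables pass the normality check, otherwise Spearman) can be written down explicitly. The sketch below assumes that rule and a .05 cutoff; `choose_method` is a hypothetical helper, and the two vectors are simulated.

```r
# Pick the correlation method from Shapiro-Wilk results at a given alpha.
choose_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Simulated example: one roughly normal variable, one clearly skewed variable.
set.seed(3)
normal_var <- rnorm(100)
skewed_var <- rexp(100)
method <- choose_method(normal_var, skewed_var)
method
```

The returned string can be passed straight to the `method` argument of `cor.test`.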
PART 5: SCATTERPLOTS
cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
A Spearman correlation was selected because at least one variable (ExamScore) violated the assumption of normality. The p-value for the correlation was less than .05, indicating that the result is statistically significant, so the alternative hypothesis is supported. The rho value was .90. The correlation is positive: as study hours increase, exam scores increase, indicating a strong positive relationship between study hours and exam scores.
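Two details of the Spearman output are worth seeing in code: rho is Pearson's r computed on ranks, and the ties warning comes from R attempting an exact p-value. A minimal sketch on simulated data follows (`hours` and `scores` are hypothetical vectors, rounded so that ties occur).

```r
set.seed(4)
hours  <- round(rnorm(100, mean = 6, sd = 1.4), 1)      # rounding creates ties
scores <- round(55 + 5.5 * hours + rnorm(100, sd = 3))

# exact = FALSE requests the asymptotic p-value up front, so no ties warning is raised.
res <- cor.test(hours, scores, method = "spearman", exact = FALSE)

# Spearman's rho is Pearson's r computed on the (mid)ranks of each variable.
rho_by_ranks <- cor(rank(hours), rank(scores))
all.equal(unname(res$estimate), rho_by_ranks)
```

With ties present, R falls back to the asymptotic p-value either way, so passing `exact = FALSE` only silences the warning and does not change the reported result.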
ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)
The line of best fit points toward the top right. This means the
direction of the data is positive: as study hours increase, exam scores
increase. The dots closely hug the line, indicating a strong
relationship between the variables. The dots form a straight-line
pattern, which means the data is linear. There may be an outlier;
however, the point lies toward the center of the line of best fit.
Therefore, it does not appear to affect the relationship between the
independent and dependent variables.
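The claim that a possible outlier does not affect the relationship can be checked numerically by recomputing the correlation without the suspect point. The sketch below is one way to do that, assuming the point with the largest residual from the fitted line is the suspect; the data are simulated with one planted outlier and are not DatasetA.

```r
set.seed(5)
x <- rnorm(100, mean = 6, sd = 1.4)
y <- 55 + 5.5 * x + rnorm(100, sd = 3)
y[1] <- y[1] + 25                      # plant one hypothetical outlier

# Flag the point with the largest absolute residual from the fitted line.
fit <- lm(y ~ x)
suspect <- which.max(abs(resid(fit)))

# Compare Spearman's rho with and without the flagged point.
rho_all  <- cor(x, y, method = "spearman")
rho_drop <- cor(x[-suspect], y[-suspect], method = "spearman")
c(with_point = rho_all, without_point = rho_drop)
```

If the two values are close, the point has little influence on the rank correlation, which is the pattern the scatterplot interpretation describes.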
PART 6: REPORT THE RESULTS
Study Hours and Exam Score: The independent variable, study hours (M = 6.14, SD = 1.37), was correlated with the dependent variable, exam score (M = 90.07, SD = 6.80), ρ(98) = .90, p < .001. The relationship was positive and strong. As study hours increased, exam scores increased.
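The degrees of freedom in ρ(98) are n − 2, and the sample size can be recovered from the printed output, because `cor.test` computes the Spearman statistic as S = (n³ − n)(1 − ρ)/6. The following is a small sketch of that arithmetic; `n_from_S` is a hypothetical helper, and because the printed S and rho are rounded, the result is rounded to the nearest integer.

```r
# Recover n from the reported Spearman output via S = (n^3 - n) * (1 - rho) / 6.
n_from_S <- function(S, rho) {
  target <- 6 * S / (1 - rho)                  # equals n^3 - n
  uniroot(function(n) n^3 - n - target, c(2, 1e4))$root
}

# Values copied from the cor.test output for StudyHours and ExamScore.
n_a <- round(n_from_S(S = 16518, rho = 0.9008825))
c(n = n_a, df = n_a - 2)
```

This gives n = 100 and df = 98, matching the reported ρ(98).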
PART 1: IMPORT DATASET B
DatasetB <- read_excel("C:/Users/spvar/Downloads/DatasetB.xlsx")
PART 2: DESCRIPTIVE STATISTICS
mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
mean(DatasetB$SleepingHours)
## [1] 6.938459
sd(DatasetB$SleepingHours)
## [1] 1.351332
The mean of the independent variable ScreenTime is 5.063296 and the SD is 2.056833.
The mean of the dependent variable SleepingHours is 6.938459 and the SD is 1.351332.
PART 3: CHECK NORMALITY
hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "orange",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
PART 4: CORRELATION ANALYSIS
shapiro.test(DatasetB$ScreenTime)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004
The Shapiro-Wilk p-value for ScreenTime was less than .05 (W = 0.90, p < .001), indicating that screen time is not normally distributed. The Shapiro-Wilk p-value for SleepingHours was greater than .05 (W = 0.98, p = .300), so there is no evidence that sleeping hours depart from a normal distribution.
PART 5: SCATTERPLOTS
cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5544674
A Spearman Correlation was selected because at least one variable (ScreenTime) violated the assumption of normality. The p-value for the correlation was less than .05, indicating that the results are statistically significant. The rho value was -.55, indicating a moderate negative relationship between screen time and sleeping hours.
ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)
The line of best fit points downward from left to right. This indicates that the direction of the data is negative: as screen time increases, sleeping hours decrease. The dots follow the line of best fit but with visible scatter, consistent with the moderate rho of -.55 rather than a strong relationship. The dots form a roughly straight-line pattern, which means the data is linear. There may be an outlier; however, it lies close to the line, so it does not appear to affect the relationship between the variables.
PART 6: REPORT THE RESULTS
Screen Time and Sleeping Hours: The independent variable, screen time (M = 5.06, SD = 2.06), was correlated with the dependent variable, sleeping hours (M = 6.94, SD = 1.35), ρ(98) = −.55, p < .001. The relationship was negative and moderate. As screen time increased, sleeping hours decreased.
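Very small p-values round to 0.000 at three decimals, and APA style reports these as p < .001 while dropping the leading zero from statistics that cannot exceed 1. The sketch below shows one way to format the values for a results sentence; `apa_p` and `apa_r` are hypothetical helpers, not functions from readxl or ggpubr.

```r
# Format a p-value APA-style: no leading zero, and "< .001" for very small values.
apa_p <- function(p) {
  if (p < 0.001) return("p < .001")
  paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
}

# Format a correlation to two decimals without the leading zero, keeping the sign.
apa_r <- function(r) sub("0\\.", ".", sprintf("%.2f", r))

# Values copied from the cor.test output for ScreenTime and SleepingHours.
apa_p(3.521e-09)
apa_r(-0.5544674)
```

The same helpers apply to the Dataset A result, where rho = 0.9008825 formats as `.90`.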