library(readxl)
library(ggpubr)
## Loading required package: ggplot2
library(rmarkdown)
What is the relationship between how much students study (hours) and their exam score (percentage)?
DatasetA <- read_excel("/Users/anupshrestha/Downloads/DatasetA.xlsx")
Mean and SD of Independent Variable StudyHours for DatasetA
mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
Mean and SD of Dependent Variable ExamScore for DatasetA
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
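For reporting, descriptive statistics are usually rounded to two decimal places. A minimal sketch of that step, using the values printed above; the variable names (`m_hours`, `sd_hours`, `m_score`, `sd_score`, `report`) are placeholders introduced only for this illustration:

```r
# Values copied from the mean() and sd() output above
m_hours  <- 6.135609
sd_hours <- 1.369224
m_score  <- 90.06906
sd_score <- 6.795224

# Round to two decimals and build the descriptive part of the write-up
report <- sprintf(
  "Study Hours (M = %.2f, SD = %.2f), Exam Score (M = %.2f, SD = %.2f)",
  m_hours, sd_hours, m_score, sd_score
)
print(report)
```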
#Part 3: Check Normality
Histogram for Independent Variable StudyHours for DatasetA
hist(DatasetA$StudyHours,
main = "StudyHours",
breaks = 20,
col = "lightblue",
border = "white",
cex.main = 1,
cex.axis = 1,
cex.lab = 1)
The variable “StudyHours” appears normally distributed. The data looks symmetrical (most data is in the middle). The data also appears to have a proper bell curve.
Histogram for Dependent Variable ExamScore for DatasetA
hist(DatasetA$ExamScore,
main = "ExamScore",
breaks = 20,
col = "lightcoral",
border = "white",
cex.main = 1,
cex.axis = 1,
cex.lab = 1)
The variable “ExamScore” does not appear normally distributed. The data is negatively skewed and does not look symmetrical. The data does not appear to have a proper bell curve (too tall).
#Part 3: Shapiro-Wilk test
Shapiro-Wilk test for DatasetA
shapiro.test(DatasetA$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
Since the p-value for the dependent variable (ExamScore) is 0.006465, which is less than .05, ExamScore is not normally distributed, so the Spearman correlation should be used.
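The decision rule applied here can be sketched as a small helper: use Spearman when either Shapiro-Wilk p-value falls below .05. Falling back to Pearson otherwise is an assumption of this sketch, and `choose_method` and its `alpha` argument are hypothetical names, not part of any package. The p-values are the ones printed above.

```r
# Hypothetical helper: choose the correlation method from two
# Shapiro-Wilk p-values (assumes Pearson when both look normal)
choose_method <- function(p_x, p_y, alpha = 0.05) {
  if (p_x < alpha || p_y < alpha) "spearman" else "pearson"
}

# StudyHours p = 0.9349, ExamScore p = 0.006465
method_a <- choose_method(0.9349, 0.006465)
print(method_a)
```

The returned string can be passed directly to the `method` argument of `cor.test()`.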
#Part 4: Correlation Analysis
Correlation analysis for DatasetA
cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
The Spearman correlation test was selected because the dependent variable (ExamScore) was not normally distributed according to its histogram and the Shapiro-Wilk test; StudyHours appeared normal (p = 0.9349). The p-value (probability value) is less than 2.2e-16, which is below .05. This means the results are statistically significant. The alternate hypothesis is supported. The rho-value is 0.9008825. The correlation is positive, which means as Study Hours increases, Exam Score increases. The correlation value is greater than 0.50, which means the relationship is strong.
#Part 5: Scatterplots
Creating a Scatterplot for DatasetA
ggscatter(
DatasetA,
x = "StudyHours",
y = "ExamScore",
add = "reg.line",
xlab = "StudyHours",
ylab = "ExamScore"
)
The line of best fit points to the top right. This means the direction of the data is positive. As StudyHours increases, ExamScore increases. The dots closely hug the line. This means there is a strong relationship between the variables. The dots form a straight-line pattern. This means the data is linear. There are no apparent outliers.
Study Hours (M = 6.14, SD = 1.37) was correlated with Exam Score (M = 90.07, SD = 6.80), ρ(98) = 0.90, p < .001. The relationship was positive and strong. As study hours increased, exam scores increased.
#Part 6: Research Question 1: What is the relationship between how much students study (hours) and their exam score (percentage)?
Means and standard deviations
Mean study hours is 6.135609
Standard deviation is 1.369224
Mean exam score is 90.06906
Standard deviation is 6.795224
Correlation coefficient (ρ) is 0.9008825
p-value is less than 2.2e-16
The strength of the relationship is strong and the direction is positive
What is the relationship between how much a person uses their phone (hours) and how much they sleep (hours)?
DatasetB <- read_excel("/Users/anupshrestha/Downloads/DatasetB.xlsx")
Mean and SD of Independent Variable ScreenTime for DatasetB
mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
Mean and SD of Dependent Variable SleepingHours for DatasetB
mean(DatasetB$SleepingHours)
## [1] 6.938459
sd(DatasetB$SleepingHours)
## [1] 1.351332
#Part 3: Check Normality
Histogram for Independent Variable ScreenTime for DatasetB
hist(DatasetB$ScreenTime,
main = "ScreenTime",
breaks = 20,
col = "lightblue",
border = "white",
cex.main = 1,
cex.axis = 1,
cex.lab = 1)
The variable “ScreenTime” does not appear normally distributed. The data is positively skewed and does not look symmetrical. The data does not appear to have a proper bell curve.
Histogram for Dependent Variable SleepingHours for DatasetB
hist(DatasetB$SleepingHours,
main = "SleepingHours",
breaks = 20,
col = "lightcoral",
border = "white",
cex.main = 1,
cex.axis = 1,
cex.lab = 1)
The variable “SleepingHours” does not appear normally distributed. The data is negatively skewed and does not look symmetrical. The data does not appear to have a proper bell curve (too flat).
#Part 3: Shapiro-Wilk test
Shapiro-Wilk test for DatasetB
shapiro.test(DatasetB$ScreenTime)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004
Since the p-value for the independent variable (ScreenTime) is 1.914e-06, which is less than .05, ScreenTime is not normally distributed, so the Spearman correlation should be used.
#Part 4: Correlation Analysis
Correlation analysis for DatasetB
cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5544674
The Spearman correlation test was selected because the independent variable (ScreenTime) was not normally distributed according to its histogram and the Shapiro-Wilk test (p = 1.914e-06).
The p-value (probability value) is 3.521e-09, which is below .05. This means the results are statistically significant. The alternate hypothesis is supported. The rho-value is -0.5544674. The correlation is negative, which means as ScreenTime increases, SleepingHours decreases. The absolute value of rho (0.55) is greater than 0.50, which means the relationship is strong.
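Direction comes from the sign of rho and strength from its absolute value, so a negative coefficient is compared against the cutoff after dropping the sign. A minimal sketch of that reading follows. `describe_rho` is a hypothetical helper, the 0.50 cutoff is the one used in this report, and the label "not strong" for smaller values is an assumption made only for illustration.

```r
# Hypothetical helper: direction from the sign of rho,
# strength from its absolute value (0.50 cutoff as used in this report)
describe_rho <- function(rho, strong_cutoff = 0.50) {
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  strength  <- if (abs(rho) > strong_cutoff) "strong" else "not strong"
  paste(direction, strength, sep = ", ")
}

desc_b <- describe_rho(-0.5544674)  # DatasetB: ScreenTime vs SleepingHours
desc_a <- describe_rho(0.9008825)   # DatasetA: StudyHours vs ExamScore
print(c(desc_a, desc_b))
```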
#Part 5: Scatterplots
Creating a Scatterplot for DatasetB
ggscatter(
DatasetB,
x = "ScreenTime",
y = "SleepingHours",
add = "reg.line",
xlab = "ScreenTime",
ylab = "SleepingHours"
)
The line of best fit points to the bottom right. This means the direction of the data is negative. As ScreenTime increases, SleepingHours decreases. The dots closely hug the line. This means there is a strong relationship between the variables. The dots form a straight-line pattern. This means the data is linear. There are no apparent outliers.
Screen Time (M = 5.06, SD = 2.06) was correlated with Sleeping Hours (M = 6.94, SD = 1.35), ρ(98) = -0.55, p < .001. The relationship was negative and strong. As screen time increased, sleeping hours decreased.
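The degrees of freedom reported in ρ(df) are n - 2. The simplest route is `nrow(DatasetB) - 2`, but n can also be checked from the printed output alone. The sketch below assumes the Spearman statistic relates to rho as S = (n^3 - n)(1 - rho)/6; the variable names are placeholders for this illustration.

```r
# Values copied from the cor.test() output for DatasetB
S   <- 259052
rho <- -0.5544674

# Assumed relation: S = (n^3 - n) * (1 - rho) / 6, solved for n
target <- 6 * S / (1 - rho)
n  <- round(uniroot(function(n) n^3 - n - target, interval = c(2, 1000))$root)
df <- n - 2

print(c(n = n, df = df))
```

Applying the same calculation to the DatasetA output (S = 16518, rho = 0.9008825) gives the same sample size.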
#Part 6: Research Question 2: What is the relationship between how much a person uses their phone (hours) and how much they sleep (hours)?
Means and standard deviations
Mean screen time is 5.063296
Standard deviation is 2.056833
Mean sleeping hours is 6.938459
Standard deviation is 1.351332
Correlation coefficient (ρ) is -0.5544674
p-value is 3.521e-09
The strength of the relationship is strong and the direction is negative