library(readxl)
library(ggpubr)
## Loading required package: ggplot2
DatasetB <- read_excel("/Users/alexiaprudencio/Desktop/Applied Analytics 1/Assingment 4/DatasetB.xlsx")
Research Question: What is the relationship between how much a person uses their phone (hours) and how much they sleep (hours)?
1. Descriptive Statistics
mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
mean(DatasetB$SleepingHours)
## [1] 6.938459
sd(DatasetB$SleepingHours)
## [1] 1.351332
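The four calls above can also be collapsed into a single summary table. The following is a minimal sketch using base R only; `demo` is simulated stand-in data (not the real DatasetB values) and `describe_columns` is a hypothetical helper name introduced for illustration.

```r
# Simulated stand-in data; these are NOT the real DatasetB values.
set.seed(1)
demo <- data.frame(
  ScreenTime    = rexp(100, rate = 1 / 5),
  SleepingHours = rnorm(100, mean = 7, sd = 1.3)
)

# Hypothetical helper: mean and standard deviation for each named column.
describe_columns <- function(data, columns) {
  stats <- sapply(data[columns], function(x) {
    c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
  })
  t(stats)  # one row per variable, with columns "mean" and "sd"
}

summary_table <- describe_columns(demo, c("ScreenTime", "SleepingHours"))
print(round(summary_table, 2))
```

With the real data, the same call would be `describe_columns(DatasetB, c("ScreenTime", "SleepingHours"))`.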
hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "orange",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “ScreenTime” does not appear normally distributed. The data
look positively skewed: most observations sit on the left side,
representing lower hours, while the tail extends to the right. The
histogram does not form a proper bell curve; it looks somewhat irregular,
with a large spike at the very beginning.
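The visual impression of positive skew can be backed up with a number. The sketch below computes a moment-based sample skewness in base R; `sample_skewness` is a hypothetical helper, and the two vectors are simulated for illustration rather than taken from DatasetB. Values well above 0 indicate a longer right tail, and values near 0 indicate rough symmetry.

```r
# Hypothetical helper: moment-based sample skewness.
# Positive = longer right tail, negative = longer left tail, 0 = symmetric.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  deviations <- x - mean(x)
  mean(deviations^3) / mean(deviations^2)^1.5
}

# Simulated examples (not DatasetB): one right-skewed, one symmetric.
set.seed(2)
right_skewed <- rexp(200)
symmetric    <- rnorm(200)

print(sample_skewness(right_skewed))
print(sample_skewness(symmetric))
```

With the real data, the call would be `sample_skewness(DatasetB$ScreenTime)`.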
hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “SleepingHours” appears normally distributed. It is
symmetrical as most of the data is concentrated in the middle around 6-8
hours. The data appears to have a proper bell curve as it is not
excessively flat or tall.
shapiro.test(DatasetB$ScreenTime)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004
Shapiro-Wilk Output Interpretation
The Shapiro-Wilk p-value for the ScreenTime normality test is less than .05 (p = 1.914e-06), so normality is rejected for this variable. The Shapiro-Wilk p-value for the SleepingHours normality test is greater than .05 (p = 0.3004), so there is no evidence against normality for this variable. Because one of the variables (ScreenTime) is not normally distributed, I will use a Spearman correlation.
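The decision rule used here (Pearson only if both variables pass the normality test, otherwise Spearman) can be written out explicitly. This is a small sketch of that rule, not a general recommendation; `choose_method` is a hypothetical helper, and the p-values are the ones reported in the Shapiro-Wilk output above.

```r
# Hypothetical helper encoding the rule used in this report:
# Pearson only if every Shapiro-Wilk p-value exceeds alpha, else Spearman.
choose_method <- function(p_values, alpha = 0.05) {
  if (all(p_values > alpha)) "pearson" else "spearman"
}

# Shapiro-Wilk p-values as reported in the output above.
reported_p <- c(ScreenTime = 1.914e-06, SleepingHours = 0.3004)

method <- choose_method(reported_p)
print(method)
```

The returned string can be passed directly as the `method` argument of `cor.test()`.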
cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman", exact = FALSE)
##
## Spearman's rank correlation rho
##
## data: DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 2.161e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5544674
The Spearman correlation test was selected because the variable “ScreenTime” was not normally distributed according to the histogram and the Shapiro-Wilk test (p < .05). The p-value is 2.161e-09, which is well below .05, so the result is statistically significant and the alternative hypothesis (true rho is not equal to 0) is supported. The rho value is -0.55. The correlation is negative, which means that as screen time increases, sleeping hours tend to decrease. The absolute value of rho is greater than 0.50, which means the relationship is strong.
Note: The argument “exact = FALSE” was added to the code because the data contain ties (duplicate values), which prevent R from computing an exact p-value.
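It may help to see why Spearman's rho is robust to the skew in ScreenTime: it is simply a Pearson correlation computed on the ranks of the data, so only the ordering of values matters. The sketch below checks this on simulated data with ties (not DatasetB); the variable names are illustrative.

```r
# Simulated data with ties (rounding creates duplicate x values); not DatasetB.
set.seed(3)
x <- round(runif(50, min = 0, max = 10))
y <- 8 - 0.3 * x + rnorm(50)

# Spearman's rho computed directly...
rho_spearman <- cor(x, y, method = "spearman")

# ...and as a Pearson correlation on the (average) ranks.
rho_by_ranks <- cor(rank(x), rank(y), method = "pearson")

# exact = FALSE requests the approximate p-value, which is what R
# falls back to when ties are present.
result <- cor.test(x, y, method = "spearman", exact = FALSE)

print(c(spearman = rho_spearman, pearson_on_ranks = rho_by_ranks))
print(result$p.value)
```

The two correlation values agree because `rank()` assigns average ranks to ties by default, which is the same convention `cor()` uses for `method = "spearman"`.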
ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)
The line of best fit slopes down toward the bottom right. This means the
direction of the relationship is negative: as screen time increases,
sleeping hours decrease. The dots follow the line, although they are more
spread out than in the previous graph. This is consistent with the strong
relationship (rho = -0.55) found in the test. The dots form a roughly
straight-line pattern, which suggests the relationship is linear. There
appear to be a few potential outliers, such as the one high up on the
left, but overall the data follow the general downward trend.
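Potential outliers spotted by eye can be checked more systematically. One possible approach, sketched below, fits the same kind of regression line and flags points whose standardized residual is large. The data are simulated (not DatasetB), one extreme point is planted on purpose, and the cutoff of 3 is a common rule of thumb assumed here rather than something taken from the analysis above.

```r
# Simulated stand-in data with a negative trend; not the real DatasetB.
set.seed(4)
demo <- data.frame(ScreenTime = runif(60, min = 1, max = 10))
demo$SleepingHours <- 9 - 0.4 * demo$ScreenTime + rnorm(60, sd = 0.8)

# Plant one extreme point so the example has something to find.
demo$SleepingHours[1] <- demo$SleepingHours[1] + 10

# Fit the line of best fit and standardize the residuals.
fit <- lm(SleepingHours ~ ScreenTime, data = demo)
std_resid <- rstandard(fit)

# Assumed rule of thumb: flag points more than 3 standardized units from the line.
flagged <- which(abs(std_resid) > 3)
print(demo[flagged, ])
```

With the real data, replacing `demo` with `DatasetB` in the `lm()` call would list the rows that sit unusually far from the line.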