library(readxl)
library(ggpubr)
## Loading required package: ggplot2
DatasetB <- read_excel("/Users/alexiaprudencio/Desktop/Applied Analytics 1/Assingment 4/DatasetB.xlsx")

Research Question: What is the relationship between how much a person uses their phone (hours) and how much they sleep (hours)?

  1. Descriptive Statistics

mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
mean(DatasetB$SleepingHours) 
## [1] 6.938459
sd(DatasetB$SleepingHours)
## [1] 1.351332
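
The four calls above can also be collapsed into a single pass over the columns. The following is a minimal sketch, assuming a simulated stand-in data frame `demo` (not the real DatasetB) and a hypothetical helper `describe_cols`. It adds the median because that is a useful companion statistic when a variable is skewed.

```r
# Simulated stand-in for DatasetB; these values are NOT the real data.
set.seed(1)
demo <- data.frame(
  ScreenTime    = round(rexp(100, rate = 1 / 5), 1),
  SleepingHours = round(rnorm(100, mean = 7, sd = 1.3), 1)
)

# Hypothetical helper: n, mean, SD and median for every column of a data frame.
describe_cols <- function(df) {
  t(sapply(df, function(x) {
    c(n      = sum(!is.na(x)),
      mean   = mean(x, na.rm = TRUE),
      sd     = sd(x, na.rm = TRUE),
      median = median(x, na.rm = TRUE))
  }))
}

# One row per variable, one column per statistic.
stats_tbl <- describe_cols(demo)
print(round(stats_tbl, 2))
```

With the real data, the same helper could be called as `describe_cols(DatasetB[, c("ScreenTime", "SleepingHours")])`.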
  1. Histograms & Visually Check Normality
hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "orange",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “ScreenTime” does not appear normally distributed. The data look positively skewed: most observations sit on the left side, representing lower hours, while the tail extends to the right. The histogram does not form a proper bell curve; it looks somewhat irregular, with a large spike at the very beginning.

hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “SleepingHours” appears approximately normally distributed. The histogram is roughly symmetrical, with most of the data concentrated in the middle, around 6-8 hours. It forms a bell-shaped curve that is neither excessively flat nor excessively peaked.
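
A normal Q-Q plot is a common companion to the histogram for the visual check: points that hug the reference line suggest normality, while a curved pattern suggests skew. The sketch below uses base R's `qqnorm()` and `qqline()` on simulated stand-in vectors (`skewed_var` and `symmetric_var` are illustrative names, not columns of DatasetB).

```r
# Simulated stand-in values; these are NOT the real DatasetB columns.
set.seed(2)
skewed_var    <- rexp(100, rate = 1 / 5)         # right-skewed example
symmetric_var <- rnorm(100, mean = 7, sd = 1.3)  # roughly bell-shaped example

# Draw the two Q-Q plots side by side, then restore the plotting layout.
old_par <- par(mfrow = c(1, 2))
qq_skewed <- qqnorm(skewed_var, main = "Q-Q: skewed variable")
qqline(skewed_var, col = "orange")
qq_symmetric <- qqnorm(symmetric_var, main = "Q-Q: symmetric variable")
qqline(symmetric_var, col = "lightblue")
par(old_par)
```

With the real data, `qqnorm(DatasetB$ScreenTime)` followed by `qqline(DatasetB$ScreenTime)` would give the corresponding plot.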

  1. Statistically Test Normality
shapiro.test(DatasetB$ScreenTime) 
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

Shapiro-Wilk Output Interpretation

The Shapiro-Wilk p-value for ScreenTime is less than .05 (p = 1.914e-06), so the hypothesis of normality is rejected: the data are not normally distributed. The Shapiro-Wilk p-value for SleepingHours is greater than .05 (p = 0.3004), so there is no evidence against normality and the data are treated as normal. Because one of the variables (ScreenTime) is not normally distributed, I will use a Spearman correlation.
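
The decision rule described above (Pearson only if both variables pass the normality test, otherwise Spearman) can be sketched as a small function. `choose_cor_method` is a hypothetical helper written for illustration, and the two input vectors are deterministic theoretical quantiles, not the real data.

```r
# Hypothetical helper: choose a correlation method from two Shapiro-Wilk tests.
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  # Pearson only when neither test rejects normality; Spearman otherwise.
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Deterministic illustrative inputs (theoretical quantiles), NOT the real data.
normal_like <- qnorm(ppoints(100), mean = 7, sd = 1.3)
skewed_like <- qexp(ppoints(100), rate = 1 / 5)

choose_cor_method(skewed_like, normal_like)  # one input is strongly skewed
choose_cor_method(normal_like, normal_like)  # both inputs are normal-shaped
```

The returned string could be passed straight to the `method` argument of `cor.test()`.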

  1. Test Hypotheses - Conduct Correlation Test
cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman", exact = FALSE)
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 2.161e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

The Spearman correlation test was selected because the variable “ScreenTime” was not normally distributed according to the histogram and the Shapiro-Wilk test (p < .05). The p-value is 2.161e-09, which is well below .05, so the result is statistically significant and the alternative hypothesis is supported. The rho value is -0.55. The correlation is negative, which means that as screen time increases, sleeping hours tend to decrease. The absolute value of the correlation (0.55) is greater than 0.50, which means the relationship is strong.

Note: The argument “exact = FALSE” was added to the code because the data contained ties (duplicate values), which prevent R from computing an exact p-value.
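
To make the link between the S statistic and rho in the output concrete, the sketch below shows that Spearman's rho is Pearson's r computed on the ranks, and that R's S satisfies S = (n^3 - n)(1 - rho) / 6. The vectors `x` and `y` are simulated stand-ins with ties, not the real data. The last line applies the identity to the reported S, taking n = 100 as an assumption inferred from the reported 98 degrees of freedom.

```r
# Simulated stand-in data with ties (rounded to 1 decimal); NOT the real data.
set.seed(3)
x <- round(rexp(100, rate = 1 / 5), 1)
y <- round(9 - 0.4 * x + rnorm(100, sd = 1), 1)

res <- cor.test(x, y, method = "spearman", exact = FALSE)

# Spearman's rho equals Pearson's r on the ranks (ties receive average ranks).
rho_from_ranks <- cor(rank(x), rank(y))

# R reports S = (n^3 - n) * (1 - rho) / 6, so rho can be recovered from S.
n <- length(x)
rho_from_S <- 1 - 6 * unname(res$statistic) / (n^3 - n)

all.equal(unname(res$estimate), rho_from_ranks)
all.equal(rho_from_ranks, rho_from_S)

# Same identity applied to the reported output: S = 259052, with n = 100
# assumed from the reported df of 98 (n = df + 2).
reported_rho <- 1 - 6 * 259052 / (100^3 - 100)
print(reported_rho)
```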

  1. Scatterplot to Visualize the Relationship
ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)

The line of best fit slopes downward to the right, so the direction of the relationship is negative: as screen time increases, sleeping hours decrease. The dots follow the line, although they are more spread out than in the previous graph. This is consistent with the strong relationship (rho = -0.55) found in the test. The dots form a roughly straight-line pattern, which suggests the relationship is linear. There appear to be a few potential outliers, such as the one high up on the left, but overall the data follow the general downward trend.
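
The linearity and outlier judgments above can be backed with a quick base R check. The sketch below overlays a straight regression line and a lowess smoother (if the two stay close, a straight line is a reasonable description) and circles points with large standardized residuals. It uses simulated stand-in vectors `screen` and `sleep`, not the real data, and the cutoff of 2 is a rule of thumb rather than a formal test.

```r
# Simulated stand-in data; NOT the real DatasetB.
set.seed(4)
screen <- rexp(100, rate = 1 / 3)
sleep  <- 9 - 0.3 * screen + rnorm(100, sd = 1)

fit <- lm(sleep ~ screen)

plot(screen, sleep, pch = 19, col = "grey40",
     xlab = "ScreenTime", ylab = "SleepingHours")
abline(fit, col = "blue", lwd = 2)                           # straight line of best fit
lines(lowess(screen, sleep), col = "red", lwd = 2, lty = 2)  # flexible smoother

# Flag points whose standardized residual exceeds 2 in absolute value.
# This cutoff is a common rule of thumb, not a formal outlier test.
flagged <- which(abs(rstandard(fit)) > 2)
points(screen[flagged], sleep[flagged], col = "red", cex = 1.8)
```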

  1. Report the Results

ScreenTime (M = 5.06, SD = 2.06) was significantly correlated with SleepingHours (M = 6.94, SD = 1.35), ρ(98) = -.55, p < .001. The relationship was negative and strong. As ScreenTime increased, SleepingHours decreased.
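
Two reporting conventions are easy to get wrong by hand: a very small p-value is written as p < .001 rather than p = .000, and statistics bounded by 1 drop the leading zero. The sketch below shows one way to automate both. `format_p` and `format_r` are hypothetical helpers, and the inputs are copied from the `cor.test()` output above.

```r
# Hypothetical helpers for APA-style reporting.
format_p <- function(p) {
  # Very small p-values are reported as an inequality, never as ".000".
  if (p < 0.001) return("p < .001")
  paste0("p = ", sub("^0", "", formatC(p, format = "f", digits = 3)))
}

format_r <- function(r) {
  # Two decimals with the leading zero dropped, keeping any minus sign.
  sub("^(-?)0\\.", "\\1.", formatC(r, format = "f", digits = 2))
}

# Values copied from the cor.test() output; df = 98 as reported.
report <- sprintf("rho(%d) = %s, %s", 98L,
                  format_r(-0.5544674), format_p(2.161e-09))
print(report)
```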