library(readxl)
library(ggpubr)
## Loading required package: ggplot2
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(effectsize)
library(effsize)
Dataset6.2 <- read_excel("/Users/alexiaprudencio/Desktop/Applied Analytics 1/Assingment 6/Dataset6.2.xlsx")
# Descriptive statistics for study hours, by work status
Dataset6.2 %>%
  group_by(Work_Status) %>%
  summarise(
    Mean = mean(Study_Hours, na.rm = TRUE),
    Median = median(Study_Hours, na.rm = TRUE),
    SD = sd(Study_Hours, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
## Work_Status Mean Median SD N
## <chr> <dbl> <dbl> <dbl> <int>
## 1 Does_Not_Work 9.62 8.54 7.45 30
## 2 Works 6.41 5.64 4.41 30
# Histogram of study hours for students who work
hist(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"],
     main = "Histogram of Study Hours: Works Group",
     xlab = "Study Hours",
     ylab = "Frequency",
     col = "lightblue",
     border = "darkblue",
     breaks = 10)
# Histogram of study hours for students who do not work
hist(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"],
     main = "Histogram of Study Hours: Does Not Work Group",
     xlab = "Study Hours",
     ylab = "Frequency",
     col = "lightyellow",
     border = "darkgoldenrod1",
     breaks = 10)
The study hours for the Works group appear positively skewed: most
observations fall on the left side of the histogram, with a tail
extending to the right. The kurtosis is hard to judge visually, but the
shape does not look like a normal curve. The study hours for the Does
Not Work group also appear positively skewed. Its kurtosis is also hard
to judge, but the distribution looks too tall on the left side. We may
need to use a Mann-Whitney U test.
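Because skewness and kurtosis are hard to judge from a histogram alone, they can also be estimated numerically. The following is a minimal sketch using moment-based formulas in base R. The helper names `sample_skewness` and `sample_excess_kurtosis` and the `demo_hours` vector are hypothetical and are not part of the original analysis or of any package.

```r
# Moment-based shape statistics (hypothetical helper names, base R only).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]            # drop missing values
  d <- x - mean(x)             # deviations from the mean
  mean(d^3) / mean(d^2)^1.5    # third standardized moment
}

sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3  # fourth standardized moment minus 3 (normal = 0)
}

# Illustrative values only, not taken from Dataset6.2.
demo_hours <- c(1, 2, 2, 3, 3, 4, 5, 9, 14)
sample_skewness(demo_hours)         # positive for a right-skewed sample
sample_excess_kurtosis(demo_hours)  # above 0 means heavier tails than normal
```

To apply the same idea to each group in the real data, something like `tapply(Dataset6.2$Study_Hours, Dataset6.2$Work_Status, sample_skewness)` should work.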
ggboxplot(Dataset6.2, x = "Work_Status", y = "Study_Hours",
color = "Work_Status",
palette = "jco",
add = "jitter")
The Does Not Work group boxplot does not look normal: two points lie
beyond the whiskers and are plotted as outliers. The Works group boxplot
looks approximately normal, with most of the data points within the
reach of the whiskers. One point falls outside, but it sits close to the
top whisker and does not disrupt the overall shape. We may need to use a
Mann-Whitney U test.
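The points drawn past the whiskers can also be listed numerically. This sketch assumes the plot follows the common convention of flagging values more than 1.5 times the interquartile range beyond the quartiles. The helper name `flag_outliers` and the example vector are hypothetical.

```r
# Return the values that fall outside the 1.5 * IQR fences (hypothetical helper).
flag_outliers <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]                                # drop missing values
  q <- quantile(x, c(0.25, 0.75), names = FALSE)   # first and third quartiles
  fence <- coef * (q[2] - q[1])                    # whisker reach
  x[x < q[1] - fence | x > q[2] + fence]
}

# Illustrative values only, not taken from Dataset6.2.
demo_hours <- c(1, 2, 3, 4, 5, 100)
flag_outliers(demo_hours)
```

For the real data, the same helper could be applied per group, for example with `tapply(Dataset6.2$Study_Hours, Dataset6.2$Work_Status, flag_outliers)`.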
shapiro.test(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"])
##
## Shapiro-Wilk normality test
##
## data: Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"]
## W = 0.94582, p-value = 0.1305
shapiro.test(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"])
##
## Shapiro-Wilk normality test
##
## data: Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"]
## W = 0.83909, p-value = 0.0003695
The data for the Works group are consistent with normality (W = 0.94582, p-value = 0.1305 > .05). The data for the Does Not Work group are not normally distributed (W = 0.83909, p-value = 0.0003695 < .05). Because the Does Not Work group does not pass the Shapiro-Wilk test of normality, we use the Mann-Whitney U test to compare the study hours between the two groups.
wilcox.test(Study_Hours ~ Work_Status, data = Dataset6.2)
##
## Wilcoxon rank sum exact test
##
## data: Study_Hours by Work_Status
## W = 569, p-value = 0.07973
## alternative hypothesis: true location shift is not equal to 0
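The Mann-Whitney U test was not significant (W = 569, p-value = 0.07973 > .05), so we cannot conclude that study hours differ between students who work and students who do not. Because the result is not significant, we do not need to calculate the effect size.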
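Should an effect size be wanted for reporting anyway, one option is the rank-biserial correlation, which can be derived from the reported W and the group sizes. This is a sketch, not part of the original analysis. It relies on the fact that the W from `wilcox.test` is the U statistic for the first group (here Does_Not_Work), and the helper name `rank_biserial_from_w` is hypothetical.

```r
# Rank-biserial correlation from the Mann-Whitney statistic (hypothetical helper).
# w  : W reported by wilcox.test (U for the first group)
# n1 : size of the first group; n2 : size of the second group
rank_biserial_from_w <- function(w, n1, n2) {
  2 * w / (n1 * n2) - 1
}

# Values from the output above: W = 569, with 30 students in each group.
r_rb <- rank_biserial_from_w(w = 569, n1 = 30, n2 = 30)
r_rb  # about 0.264
```

The `effectsize` package loaded at the top also provides `rank_biserial()`, which may be a more convenient route when the raw data are available.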