library(readxl)
library(ggpubr)
## Loading required package: ggplot2
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(effectsize)
library(effsize)
Dataset6.2 <- read_excel("C:/Users/DELL/Documents/Applied Analytics/Assignment6/Dataset6.2-2.xlsx")
Dataset6.2 %>%
  group_by(Work_Status) %>%
  summarise(
    Mean = mean(Study_Hours, na.rm = TRUE),
    Median = median(Study_Hours, na.rm = TRUE),
    SD = sd(Study_Hours, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Work_Status    Mean Median    SD     N
##   <chr>         <dbl>  <dbl> <dbl> <int>
## 1 Does_Not_Work  9.62   8.54  7.45    30
## 2 Works          6.41   5.64  4.41    30
hist(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"],
     main = "Histogram of study hours of students who do not work",
     xlab = "Study hours",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 10)
hist(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"],
     main = "Histogram of study hours of students who work",
     xlab = "Study hours",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 10)
For students who do not work, the distribution of study hours appears positively skewed. The exact kurtosis is difficult to judge from a histogram, but the shape does not look normal. For students who work, the distribution also appears positively skewed, and again the kurtosis is hard to judge, but the shape does not look normal. We may need to use a Mann-Whitney U test.
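Because skewness and kurtosis are hard to read off a histogram, they can also be computed directly. The following is a minimal sketch using moment-based formulas in base R. The helper names `sample_skewness` and `sample_excess_kurtosis` are hypothetical, and the data are simulated for illustration, not taken from the assignment file.

```r
# Moment-based sample skewness and excess kurtosis (base R only).
# Both helper names are hypothetical and exist only for this illustration.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)  # second central moment
  m3 <- mean((x - mean(x))^3)  # third central moment
  m3 / m2^(3 / 2)
}

sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)  # second central moment
  m4 <- mean((x - mean(x))^4)  # fourth central moment
  m4 / m2^2 - 3                # 0 for a normal distribution
}

# Simulated right-skewed values, NOT the assignment data
set.seed(1)
demo_hours <- rexp(30, rate = 1 / 9)
sample_skewness(demo_hours)
sample_excess_kurtosis(demo_hours)
```

A positive skewness value indicates a longer right tail, and an excess kurtosis far from 0 indicates tails heavier or lighter than a normal distribution. In this report the same helpers could be applied to `Study_Hours` within each `Work_Status` group.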
ggboxplot(Dataset6.2, x = "Work_Status", y = "Study_Hours",
          color = "Work_Status",
          palette = "jco",
          add = "jitter")
The boxplot of study hours for students who do not work does not appear normal. A few points lie beyond the whiskers; some are very close to the whiskers, but others are arguably far away. The boxplot of study hours for students who work appears normal. We may need to use a Mann-Whitney U test.
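The points beyond the whiskers can also be listed numerically rather than judged by eye. This sketch uses base R's `boxplot.stats()`, which applies the default 1.5 × IQR rule. The values are made up for illustration, and the hinges `boxplot.stats()` uses may differ slightly from the quartiles a ggplot2-based boxplot draws.

```r
# Made-up study hours, NOT the assignment data
demo_hours <- c(2, 3, 4, 5, 5, 6, 7, 8, 9, 30)

# $out holds the values lying beyond the whiskers (default coef = 1.5)
outliers <- boxplot.stats(demo_hours)$out
outliers
```

For the report, the same call could be made on `Study_Hours` for each `Work_Status` group to see exactly which observations fall outside the whiskers.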
shapiro.test(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"])
##
## Shapiro-Wilk normality test
##
## data: Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"]
## W = 0.83909, p-value = 0.0003695
shapiro.test(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"])
##
## Shapiro-Wilk normality test
##
## data: Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"]
## W = 0.94582, p-value = 0.1305
The study hours of students who do not work were not normally distributed (p < .05). The study hours of students who work did not depart significantly from normality (p > .05). Because one group fails the normality assumption across all three checks (histogram, boxplot, and Shapiro-Wilk test), we must use a Mann-Whitney U test.
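The two Shapiro-Wilk calls can be condensed into a single per-group step. The sketch below uses `tapply()` on a simulated data frame named `demo` (a hypothetical stand-in with the same column names as the dataset) and flags any group whose p-value falls below .05.

```r
# Simulated stand-in for the dataset, NOT the assignment data
set.seed(42)
demo <- data.frame(
  Work_Status = rep(c("Does_Not_Work", "Works"), each = 30),
  Study_Hours = c(rexp(30, rate = 1 / 9), rnorm(30, mean = 6, sd = 2))
)

# Run shapiro.test() once per group and keep only the p-values
p_values <- tapply(demo$Study_Hours, demo$Work_Status,
                   function(x) shapiro.test(x)$p.value)
p_values

# TRUE marks a group where normality is rejected at alpha = .05
p_values < 0.05
```

If any group is flagged, a rank-based test such as the Mann-Whitney U test is the more cautious choice, which is the decision reached here.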
wilcox.test(Study_Hours ~ Work_Status, data = Dataset6.2)
##
## Wilcoxon rank sum exact test
##
## data: Study_Hours by Work_Status
## W = 569, p-value = 0.07973
## alternative hypothesis: true location shift is not equal to 0
Study hours of students who do not work (Mdn = 8.54) were not significantly different from study hours of students who work (Mdn = 5.64), W = 569, p = .080. Because p is greater than .05, the results were NOT significant, so the effect size was not calculated.
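For reference only, had an effect size been wanted alongside the test, one common choice for a Mann-Whitney U test is the rank-biserial correlation. The sketch below computes it by hand from the reported statistic and group sizes. It assumes that `W` from `wilcox.test()` is the U statistic for the first group (`Does_Not_Work`), which is how base R defines it.

```r
# Values taken from the output reported above
W  <- 569  # wilcox.test statistic (U for the first group, Does_Not_Work)
n1 <- 30   # students who do not work
n2 <- 30   # students who work

# Rank-biserial correlation: proportion of pairs favouring group 1
# minus the proportion favouring group 2
r_rb <- 2 * W / (n1 * n2) - 1
r_rb
```

This gives a value of about 0.26, but since the test was not significant, it would not change the conclusion above.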