Load the Installed Packages

library(readxl)
library(ggpubr)
## Loading required package: ggplot2
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(effectsize)
library(effsize)

Import and Name Dataset

Dataset6.2 <- read_excel("C:/Users/DELL/Documents/Applied Analytics/Assignment6/Dataset6.2-2.xlsx")

Calculate Descriptive Statistics for Each Group

Dataset6.2 %>%
  group_by(Work_Status) %>%
  summarise(
    Mean = mean(Study_Hours, na.rm = TRUE),
    Median = median(Study_Hours, na.rm = TRUE),
    SD = sd(Study_Hours, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Work_Status    Mean Median    SD     N
##   <chr>         <dbl>  <dbl> <dbl> <int>
## 1 Does_Not_Work  9.62   8.54  7.45    30
## 2 Works          6.41   5.64  4.41    30

Create Histograms for Each Group

hist(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"],
     main = "Histogram of Study Hours of Students Who Do Not Work",
     xlab = "Study Hours",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 10)

hist(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"],
     main = "Histogram of Study Hours of Students Who Work",
     xlab = "Study Hours",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 10)

For students who do not work, study hours appear positively skewed. The exact kurtosis is hard to judge from a histogram, but the shape does not look normal. For students who work, study hours also appear positively skewed, and again the shape does not look normal. We may need to use a Mann-Whitney U test.
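Because skew and kurtosis are hard to read off a histogram, they can also be computed directly. The sketch below uses base R only and the simple moment-based formulas; the helper names `sample_skewness` and `excess_kurtosis` and the vector `example_hours` are illustrative assumptions, not part of the assignment data or of any package.

```r
# Illustrative values only (NOT the assignment data).
example_hours <- c(1, 2, 2, 3, 3, 3, 4, 5, 9, 15)

# Moment-based skewness: third central moment over variance^(3/2).
# Positive values indicate a longer right tail.
sample_skewness <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

# Excess kurtosis: fourth central moment over variance^2, minus 3.
# A normal distribution has excess kurtosis of about 0.
excess_kurtosis <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^4) / mean((v - m)^2)^2 - 3
}

skew_value <- sample_skewness(example_hours)
kurt_value <- excess_kurtosis(example_hours)
print(c(skewness = skew_value, excess_kurtosis = kurt_value))
```

To apply this to the real data, the same helpers could be called on `Dataset6.2$Study_Hours` subset by `Work_Status`, exactly as in the `hist()` calls above.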

Create Boxplots for Each Group

ggboxplot(Dataset6.2, x = "Work_Status", y = "Study_Hours",
          color = "Work_Status",
          palette = "jco",
          add = "jitter")

The boxplot for students who do not work does not look normal. A few points fall beyond the whiskers; some sit very close to them, but others are arguably far away. The boxplot for students who work looks normal. We may need to use a Mann-Whitney U test.
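Judging which points sit "past the whiskers" by eye can be backed up with the usual 1.5 × IQR rule. The sketch below is a minimal illustration under that assumption; `example_hours` is made-up data and `find_outliers` is a hypothetical helper. Quartiles here come from `quantile()` with its default method, so values right at a fence could be classified slightly differently by a plotting function that computes its hinges another way.

```r
# Illustrative values only (NOT the assignment data).
example_hours <- c(2, 3, 4, 5, 5, 6, 7, 8, 30)

# Flag values more than 1.5 * IQR below Q1 or above Q3.
find_outliers <- function(v) {
  v <- v[!is.na(v)]
  q <- unname(quantile(v, probs = c(0.25, 0.75)))
  iqr <- q[2] - q[1]
  lower_fence <- q[1] - 1.5 * iqr
  upper_fence <- q[2] + 1.5 * iqr
  v[v < lower_fence | v > upper_fence]
}

outliers <- find_outliers(example_hours)
print(outliers)
```

On the real data, the helper could be applied to each `Work_Status` group separately to list the flagged study-hour values.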

Shapiro-Wilk Test of Normality for Study Hours of Students Who Do Not Work

shapiro.test(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"])
## 
##  Shapiro-Wilk normality test
## 
## data:  Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"]
## W = 0.83909, p-value = 0.0003695

Shapiro-Wilk Test of Normality for Study Hours of Students Who Work

shapiro.test(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"])
## 
##  Shapiro-Wilk normality test
## 
## data:  Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"]
## W = 0.94582, p-value = 0.1305

Study hours for students who do not work were not normally distributed (p < .05). Study hours for students who work were normally distributed (p > .05). Because at least one group departs from normality across the three checks (histograms, boxplots, and Shapiro-Wilk), we must use a Mann-Whitney U test.
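The decision rule applied here can be written out explicitly. This is only a sketch: the two p-values are copied from the Shapiro-Wilk output above, while the 0.05 threshold, the variable names, and the choice of an independent-samples t-test as the parametric alternative are assumptions made for illustration.

```r
# Shapiro-Wilk p-values as reported in the output above.
shapiro_p <- c(Does_Not_Work = 0.0003695, Works = 0.1305)

# Assumed significance threshold.
alpha <- 0.05

# If any group departs from normality, fall back to the
# nonparametric test; otherwise a t-test would be an option.
chosen_test <- if (any(shapiro_p < alpha)) {
  "Mann-Whitney U"
} else {
  "independent-samples t-test"
}

print(chosen_test)
```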

Conduct Mann-Whitney U

wilcox.test(Study_Hours ~ Work_Status, data = Dataset6.2)
## 
##  Wilcoxon rank sum exact test
## 
## data:  Study_Hours by Work_Status
## W = 569, p-value = 0.07973
## alternative hypothesis: true location shift is not equal to 0

Report Results

Study hours of students who do not work (Mdn = 8.54) were not significantly different from study hours of students who work (Mdn = 5.64), W = 569, p = 0.07973. Because p is greater than 0.05, the result was NOT significant, so the effect size was not calculated.
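The effect size was not calculated here because the result was not significant. For reference, if one were wanted, a rank-biserial correlation can be derived from the reported statistic alone. The sketch below assumes that the W from `wilcox.test()` is the U statistic for the first group (`Does_Not_Work`) and uses the group sizes from the descriptive table; the variable names are illustrative.

```r
# Values taken from the output above.
W  <- 569   # Wilcoxon rank sum statistic (U for the first group)
n1 <- 30    # Does_Not_Work
n2 <- 30    # Works

# Rank-biserial correlation: proportion of favorable pairs minus
# proportion of unfavorable pairs, ranging from -1 to 1.
r_rb <- 2 * W / (n1 * n2) - 1

print(round(r_rb, 3))
```

A positive value indicates that students who do not work tend to report more study hours, in line with the medians, although the test above does not support a significant difference.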