https://rpubs.com/anup0stha/1398784

library(readxl)
library(ggpubr)
## Loading required package: ggplot2
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(effectsize)
library(effsize)
Dataset6.2 <- read_excel("/Volumes/Anup/SLU/3rd Sem/Applied Analytics/Assignment 6/Dataset6.2.xlsx")

Calculating Descriptive Statistics

Dataset6.2 %>%
  group_by(Work_Status) %>%
  summarise(
    Mean = mean(Study_Hours, na.rm = TRUE),
    Median = median(Study_Hours, na.rm = TRUE),
    SD = sd(Study_Hours, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Work_Status    Mean Median    SD     N
##   <chr>         <dbl>  <dbl> <dbl> <int>
## 1 Does_Not_Work  9.62   8.54  7.45    30
## 2 Works          6.41   5.64  4.41    30

Creating Histograms for Each Work Status

hist(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"],
     main = "Histogram of Study Hours of Working Students",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 10)

hist(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"],
     main = "Histogram of Study Hours of Non Working Students",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 10)

Both histograms are positively skewed, so the data do not appear to be normally distributed. Hence, we may need to use a Mann-Whitney U test.
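The visual impression of skew can be backed up with a number. The following is a minimal sketch of a moment-based skewness coefficient; the helper name `sample_skewness` and the simulated vector `sim_hours` are assumptions made for illustration and are not part of the original analysis.

```r
# Moment-based sample skewness: mean of cubed deviations divided by
# the cubed (population-style) standard deviation.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Simulated, right-skewed stand-in data (NOT the assignment dataset).
set.seed(1)
sim_hours <- rexp(30, rate = 1 / 8)

sample_skewness(sim_hours)         # positive values indicate right skew
sample_skewness(c(1, 2, 3, 4, 5))  # symmetric data gives 0
```

With the real data, the same helper could be applied to `Dataset6.2$Study_Hours` for each level of `Work_Status`.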

Create Boxplots for Each Group

ggboxplot(Dataset6.2, x = "Work_Status", y = "Study_Hours",
          color = "Work_Status",
          palette = "jco",
          add = "jitter")

Interpretation of Box Plot

The Works boxplot appears normal. There are no dots past the whiskers.

The Does_Not_Work boxplot does not appear normal. Several dots fall past the whiskers; some are very close to the whiskers, but others are arguably far away.

We may need to use a Mann-Whitney U test.
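Points beyond the whiskers can also be listed rather than read off the plot. The sketch below uses base R's `boxplot.stats()`, which applies the 1.5 × IQR rule beyond the hinges; this is close to, but not guaranteed to be identical to, what `ggboxplot()` draws. The vector `x` is made-up illustration data.

```r
# Simulated stand-in values (NOT the assignment dataset); 40 is a planted outlier.
x <- c(2, 3, 4, 5, 5, 6, 7, 8, 9, 40)

# boxplot.stats() returns the values lying beyond the whiskers in $out.
flagged <- boxplot.stats(x)$out
flagged
```

For the report's data, `x` would be replaced by the `Study_Hours` values of one work-status group.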

Shapiro-Wilk Test of Normality

shapiro.test(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"])
## 
##  Shapiro-Wilk normality test
## 
## data:  Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"]
## W = 0.94582, p-value = 0.1305
shapiro.test(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"])
## 
##  Shapiro-Wilk normality test
## 
## data:  Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"]
## W = 0.83909, p-value = 0.0003695

Interpretation of Shapiro-Wilk Test

The data for Works did not depart significantly from normality (W = 0.946, p = .131, so p > .05).

The data for Does_Not_Work were not normally distributed (W = 0.839, p < .001, so p < .05).

After all three normality checks (histograms, boxplots, and the Shapiro-Wilk test), at least one group clearly departs from normality, so we must use a Mann-Whitney U test.
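When there are more than a couple of groups, the per-group Shapiro-Wilk calls can be run in one pass. This is a minimal sketch on simulated data; the data frame `sim` and the result table `sw` are hypothetical names, and only the column names mirror the report.

```r
# Simulated stand-in data (NOT the assignment dataset).
set.seed(42)
sim <- data.frame(
  Work_Status = rep(c("Works", "Does_Not_Work"), each = 30),
  Study_Hours = c(rnorm(30, mean = 6, sd = 2), rexp(30, rate = 1 / 9))
)

# Run shapiro.test() once per group and collect W and p in a small table.
sw <- do.call(rbind, lapply(split(sim$Study_Hours, sim$Work_Status), function(x) {
  res <- shapiro.test(x)
  data.frame(W = unname(res$statistic), p_value = res$p.value)
}))
sw
```

Each row of `sw` is named after its group, so the table can be read the same way as the two separate outputs above.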

Mann-Whitney U test

wilcox.test(Study_Hours ~ Work_Status, data = Dataset6.2)
## 
##  Wilcoxon rank sum exact test
## 
## data:  Study_Hours by Work_Status
## W = 569, p-value = 0.07973
## alternative hypothesis: true location shift is not equal to 0

Mann-Whitney U Output Interpretation

The p-value (p = .080) is greater than .05, which means the results were NOT significant.
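The report loads `effectsize` and `effsize` but does not show an effect size. As a hedged illustration, the rank-biserial correlation can be worked out by hand from the printed statistic. This assumes that `W` is the U statistic for the first group in alphabetical order (Does_Not_Work), which is how `wilcox.test()` treats a character grouping variable, and that each group has 30 students as in the descriptive table.

```r
# Values copied from the output above: W = 569, with 30 students per group.
W  <- 569
n1 <- 30  # Does_Not_Work (assumed to be the first group, so W counts its "wins")
n2 <- 30  # Works

# Rank-biserial correlation: share of pairs favouring group 1
# minus the share favouring group 2.
r_rb <- 2 * W / (n1 * n2) - 1
round(r_rb, 3)

# The complementary U statistic for the other group.
U_other <- n1 * n2 - W
U_other
```

Under these assumptions the arithmetic gives roughly 0.26 for the rank-biserial correlation and 331 for the complementary U. A packaged function such as `effectsize::rank_biserial()` should be checked against this before either value is reported.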

Report the Results

The study hours of students who work (Mdn = 5.64) were not significantly different from the study hours of students who do not work (Mdn = 8.54), U = 569, p = .080.