library(readxl)
library(ggpubr)
## Loading required package: ggplot2
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(effectsize)
library(effsize)
  1. Data Source
Dataset6.2 <- read_excel("/Users/alexiaprudencio/Desktop/Applied Analytics 1/Assingment 6/Dataset6.2.xlsx")
  2. Descriptive Statistics for Each Group
Dataset6.2 %>%
  group_by(Work_Status) %>%
  summarise(
    Mean = mean(Study_Hours, na.rm = TRUE),
    Median = median(Study_Hours, na.rm = TRUE),
    SD = sd(Study_Hours, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Work_Status    Mean Median    SD     N
##   <chr>         <dbl>  <dbl> <dbl> <int>
## 1 Does_Not_Work  9.62   8.54  7.45    30
## 2 Works          6.41   5.64  4.41    30
  3. Histograms for Each Group
hist(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"],
     main = "Histogram of Study Hours: Works Group",
     xlab = "Study Hours",
     ylab = "Frequency",
     col = "lightblue",
     border = "darkblue",
     breaks = 10)

hist(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"],
     main = "Histogram of Study Hours: Does Not Work Group",
     xlab = "Study Hours",
     ylab = "Frequency",
     col = "lightyellow",
     border = "darkgoldenrod1",
     breaks = 10)

The study hours for the Works group appear positively skewed: most values sit on the left, with a tail to the right. Kurtosis is hard to judge from the histogram, but the shape does not look normal. The study hours for the Does Not Work group also appear positively skewed, with a peak on the left that looks too tall for a normal distribution. We may need to use a Mann-Whitney U test.
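
Because skew and kurtosis are hard to read off a histogram, a numeric check can back up the visual impression. The following is a minimal sketch using base R only; `moment_shape` is a hypothetical helper that is not part of the original analysis, and the example vector is illustrative rather than taken from the dataset.

```r
# Moment-based sample skewness and excess kurtosis (base R only).
# `moment_shape` is a hypothetical helper, not part of the original analysis.
moment_shape <- function(x) {
  x <- x[!is.na(x)]          # drop missing values
  d <- x - mean(x)           # deviations from the mean
  n <- length(x)
  m2 <- sum(d^2) / n         # second central moment
  m3 <- sum(d^3) / n         # third central moment
  m4 <- sum(d^4) / n         # fourth central moment
  c(skewness = m3 / m2^1.5,
    excess_kurtosis = m4 / m2^2 - 3)
}

# Illustrative vector with a long right tail; NOT the assignment's data.
demo_hours <- c(1, 2, 3, 4, 10)
shape <- moment_shape(demo_hours)
print(shape)

# With the real data, the helper could be applied per group, for example:
# tapply(Dataset6.2$Study_Hours, Dataset6.2$Work_Status, moment_shape)
```

A positive skewness value is consistent with a right tail, and an excess kurtosis near 0 is what a normal distribution would give.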

  4. Boxplots for Each Group
ggboxplot(Dataset6.2, x = "Work_Status", y = "Study_Hours",
          color = "Work_Status",
          palette = "jco",
          add = "jitter")

The Does Not Work group boxplot appears non-normal: two points fall beyond the whiskers, indicating outliers. The Works group boxplot appears approximately normal, with most data points within the whiskers. One point lies outside, but it is close to the top whisker and does not noticeably disturb the distribution. We may need to use a Mann-Whitney U test.

  5. Shapiro-Wilk Test of Normality
shapiro.test(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"])
## 
##  Shapiro-Wilk normality test
## 
## data:  Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"]
## W = 0.94582, p-value = 0.1305
shapiro.test(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"])
## 
##  Shapiro-Wilk normality test
## 
## data:  Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"]
## W = 0.83909, p-value = 0.0003695

The Works group data are consistent with normality (W = 0.94582, p = 0.1305 > .05). The Does Not Work group data deviate from normality (W = 0.83909, p = 0.0003695 < .05). Because the Does Not Work group does not pass the Shapiro-Wilk test, we use the Mann-Whitney U test to compare study hours between the two groups.
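
The decision rule applied above can be written out explicitly. This is a minimal sketch, assuming the common convention that a parametric test is used only when every group passes the Shapiro-Wilk test at alpha = .05; `pick_test` is a hypothetical helper, and the p-values are the ones reported in the output above.

```r
# Hypothetical helper: choose a two-group test from per-group
# Shapiro-Wilk p-values. Assumes the convention that all groups
# must pass (p > alpha) before a t test is used.
pick_test <- function(p_values, alpha = 0.05) {
  if (all(p_values > alpha)) "t.test" else "wilcox.test"
}

# Shapiro-Wilk p-values as reported in this document.
reported_p <- c(Works = 0.1305, Does_Not_Work = 0.0003695)

chosen <- pick_test(reported_p)
print(chosen)  # "wilcox.test"

# With the data loaded, the p-values could be obtained per group with:
# tapply(Dataset6.2$Study_Hours, Dataset6.2$Work_Status,
#        function(v) shapiro.test(v)$p.value)
```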

  6. Conduct Inferential Test - Mann-Whitney U test
wilcox.test(Study_Hours ~ Work_Status, data = Dataset6.2)
## 
##  Wilcoxon rank sum exact test
## 
## data:  Study_Hours by Work_Status
## W = 569, p-value = 0.07973
## alternative hypothesis: true location shift is not equal to 0

Because p = 0.07973 > .05, the difference is not statistically significant, so we do not calculate an effect size.
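
For reference, the W statistic printed by `wilcox.test()` can be related by hand to the two Mann-Whitney U values and to a rank-biserial correlation. The sketch below uses only the reported W = 569 and the group sizes of 30. It assumes that W refers to the first factor level in alphabetical order (Does_Not_Work), which is how R orders a character grouping variable by default. It is an illustration of the arithmetic, not a step of the original analysis.

```r
# Values reported in this document.
w  <- 569  # W from wilcox.test(); assumed to refer to Does_Not_Work
n1 <- 30   # Does_Not_Work group size
n2 <- 30   # Works group size

# The two U statistics always sum to n1 * n2.
u_other <- n1 * n2 - w       # U for the other group
u_min   <- min(w, u_other)   # the smaller U, which some texts report

# Rank-biserial correlation: (U1 - U2) / (n1 * n2).
r_rb <- (w - u_other) / (n1 * n2)

print(c(u_other = u_other, u_min = u_min, r_rb = round(r_rb, 3)))
```

Under these assumptions the smaller U is 331 and the rank-biserial correlation is about 0.264, with the positive sign pointing toward higher study hours in the Does Not Work group.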

  7. Report the Results
The Works group (Mdn = 5.64) was not significantly different from the Does Not Work group (Mdn = 8.54) in study hours, U = 569, p = 0.07973.