#Open the Installed Packages

library(readxl)
library(ggpubr)
## Loading required package: ggplot2
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(effectsize)
library(effsize)

#Import and Name Dataset

Dataset6.2 <- read_excel('/Users/atharvapitke/Documents/Analytics/Assignment6/Dataset6.2-2.xlsx')

#Calculate Descriptive Statistics for Each Group

Dataset6.2 %>%
  group_by(Work_Status) %>%
  summarise(
    Mean = mean(Study_Hours, na.rm = TRUE),
    Median = median(Study_Hours, na.rm = TRUE),
    SD = sd(Study_Hours, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Work_Status    Mean Median    SD     N
##   <chr>         <dbl>  <dbl> <dbl> <int>
## 1 Does_Not_Work  9.62   8.54  7.45    30
## 2 Works          6.41   5.64  4.41    30

#Create Histograms for Each Group

hist(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"],
     main = "Histogram of Study Hours of Students Who Work",
     xlab = "Study Hours",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 10)

hist(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"],
     main = "Histogram of Study Hours of Students Who Do Not Work",
     xlab = "Study Hours",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 10)

For the histogram of Study Hours of students who work, the data appear positively skewed. The exact kurtosis is difficult to judge from the plot, but the shape does not look normal. For the histogram of Study Hours of students who do not work, the data also appear positively skewed, and the kurtosis again does not look normal. We may need to use a Mann-Whitney U test.
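Since kurtosis is hard to read off a histogram, a numeric check can help. The following is a minimal sketch using moment-based formulas in base R; the helper names `sample_skewness` and `excess_kurtosis` and the vector `toy_hours` are illustrative assumptions, not part of the original analysis.

```r
# Moment-based sample skewness: positive values indicate a right (positive) skew.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)
  m3 <- mean((x - m)^3)
  m3 / m2^1.5
}

# Moment-based excess kurtosis: about 0 for a normal distribution.
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)
  m4 <- mean((x - m)^4)
  m4 / m2^2 - 3
}

# Hypothetical right-skewed values, used only to illustrate the helpers.
toy_hours <- c(1, 2, 2, 3, 3, 3, 4, 5, 9, 15)
sample_skewness(toy_hours)
excess_kurtosis(toy_hours)
```

With the real data, the same helpers could be applied to `Dataset6.2$Study_Hours` for each `Work_Status` group.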

#Create Boxplots for Each Group

ggboxplot(Dataset6.2, x = "Work_Status", y = "Study_Hours",
          color = "Work_Status",
          palette = "jco",
          add = "jitter")

The Works boxplot appears normal, with no dots past the whiskers. The Does_Not_Work boxplot appears non-normal, with several dots past the whiskers. Some of these are very close to the whiskers, but others are arguably far away. We may need to use a Mann-Whitney U test.
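The dots past the whiskers can also be listed numerically. This is a rough sketch of the usual 1.5 × IQR rule in base R; the function name `iqr_outliers` and the vector `toy_hours` are hypothetical, and the quartile method used by the plotting package may differ slightly, so borderline points might not match the plot exactly.

```r
# Return the values lying more than 1.5 * IQR beyond the quartiles.
iqr_outliers <- function(x) {
  x <- x[!is.na(x)]
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  lower_fence <- q[1] - 1.5 * iqr
  upper_fence <- q[2] + 1.5 * iqr
  x[x < lower_fence | x > upper_fence]
}

# Hypothetical values, used only to illustrate the rule.
toy_hours <- c(1, 2, 2, 3, 3, 3, 4, 5, 9, 15)
iqr_outliers(toy_hours)  # 9 and 15 fall above the upper fence of 8.5
```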

#Shapiro-Wilk Test of Normality

shapiro.test(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"])
## 
##  Shapiro-Wilk normality test
## 
## data:  Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"]
## W = 0.94582, p-value = 0.1305
shapiro.test(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"])
## 
##  Shapiro-Wilk normality test
## 
## data:  Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"]
## W = 0.83909, p-value = 0.0003695

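The data for students with Work_Status = Works were normally distributed (W = 0.946, p = 0.1305, so p > .05). The data for students with Work_Status = Does_Not_Work were not normally distributed (W = 0.839, p = 0.0003695, so p < .05). Because at least one group is not normally distributed, we switch to the Mann-Whitney U test.

After reviewing all three normality checks (histograms, boxplots, and Shapiro-Wilk tests), it is clear we must use a Mann-Whitney U test.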
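The Shapiro-Wilk decision rule can be sketched in a few lines. The data frame `toy` below is simulated purely for illustration (one roughly normal group, one skewed group) and only borrows the column names from the dataset; it is not the assignment data.

```r
set.seed(1)

# Simulated stand-in data: NOT the real Dataset6.2.
toy <- data.frame(
  Work_Status = rep(c("Works", "Does_Not_Work"), each = 30),
  Study_Hours = c(rnorm(30, mean = 6, sd = 2), rexp(30, rate = 1 / 9))
)

# One Shapiro-Wilk p-value per group.
p_values <- tapply(
  toy$Study_Hours, toy$Work_Status,
  function(x) shapiro.test(x)$p.value
)

# If any group departs from normality, prefer the Mann-Whitney U test.
use_mann_whitney <- any(p_values < 0.05)
p_values
use_mann_whitney
```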

#Mann-Whitney U

wilcox.test(Study_Hours ~ Work_Status, data = Dataset6.2)
## 
##  Wilcoxon rank sum exact test
## 
## data:  Study_Hours by Work_Status
## W = 569, p-value = 0.07973
## alternative hypothesis: true location shift is not equal to 0

The p-value is 0.07973. Because p > .05, the results were NOT significant.
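To make the reported W easier to interpret, here is a small sketch of how R's statistic relates to ranks: W is the rank sum of the first group minus n1(n1 + 1)/2, which equals the number of pairs in which a first-group value exceeds a second-group value. The vectors `x` and `y` are hypothetical values without ties, chosen only for illustration.

```r
# Hypothetical study hours for two small groups (no ties).
x <- c(9.5, 12.1, 3.2, 8.8)  # first group
y <- c(4.0, 6.3, 2.1, 7.7)   # second group

# Rank all observations together, then sum the ranks of the first group.
r <- rank(c(x, y))
n1 <- length(x)
W_manual <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# The same statistic as reported by wilcox.test().
W_test <- unname(wilcox.test(x, y)$statistic)

W_manual  # 13
W_test    # 13
```

With the formula interface, the first group is the first level of `Work_Status`, which is ordered alphabetically by default for character data.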

#Results

Students who work (Mdn = 5.64) did not differ significantly from students who do not work (Mdn = 8.54) in Study Hours, W = 569, p = 0.07973. Because the results were NOT significant, effect size was not calculated.