#Open the Installed Packages
library(readxl)
library(ggpubr)
## Loading required package: ggplot2
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(effectsize)
library(effsize)
#Import and Name Dataset
Dataset6.2 <- read_excel('/Users/atharvapitke/Documents/Analytics/Assignment6/Dataset6.2-2.xlsx')
#Calculate Descriptive Statistics for Each Group
Dataset6.2 %>%
group_by(Work_Status) %>%
summarise(
Mean = mean(Study_Hours, na.rm = TRUE),
Median = median(Study_Hours, na.rm = TRUE),
SD = sd(Study_Hours, na.rm = TRUE),
N = n()
)
## # A tibble: 2 × 5
## Work_Status Mean Median SD N
## <chr> <dbl> <dbl> <dbl> <int>
## 1 Does_Not_Work 9.62 8.54 7.45 30
## 2 Works 6.41 5.64 4.41 30
#Create Histograms for Each Group
hist(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"],
main = "Histogram of Study Hours of Students Who Work",
xlab = "Value",
ylab = "Frequency",
col = "lightblue",
border = "black",
breaks = 10)
hist(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"],
main = "Histogram of Study Hours of Students Who Do Not Work",
xlab = "Value",
ylab = "Frequency",
col = "lightgreen",
border = "black",
breaks = 10)
In the histogram for students who work, the study hours appear positively skewed. The exact kurtosis is hard to judge by eye, but the shape does not look normal. The histogram for students who do not work is also positively skewed, and its kurtosis likewise does not look normal. We may need to use a Mann-Whitney U test.
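Because skew and kurtosis are hard to read off a histogram, a numeric check can back up the visual impression. The following is a minimal sketch using moment-based formulas in base R. The helper names `skewness` and `excess_kurtosis` and the vector `demo_hours` are illustrative assumptions, and the data are simulated, not taken from Dataset6.2.

```r
# Moment-based sample skewness: positive values indicate a longer right tail.
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis: about 0 for a normal distribution.
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Simulated placeholder data, NOT the assignment's dataset.
set.seed(1)
demo_hours <- rexp(30, rate = 1 / 8)

print(c(skewness = skewness(demo_hours),
        excess_kurtosis = excess_kurtosis(demo_hours)))
```

With the real data, the same helpers could be applied to each group, for example `skewness(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"])`.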
#Create Boxplots for Each Group
ggboxplot(Dataset6.2, x = "Work_Status", y = "Study_Hours",
color = "Work_Status",
palette = "jco",
add = "jitter")
The Works boxplot appears normal: no points fall beyond the whiskers. The Does_Not_Work boxplot appears non-normal: several points fall beyond the whiskers, and although some sit very close to the whiskers, others are arguably far from them. We may need to use a Mann-Whitney U test.
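The points beyond the whiskers can also be listed rather than judged by eye. The sketch below uses base R's `boxplot.stats()`, which returns in `$out` the values lying more than 1.5 times the interquartile range past the hinges. The data frame `demo` is a hand-made placeholder, not Dataset6.2, and ggplot2 computes its hinges slightly differently, so borderline points may not match the plot exactly.

```r
# Hand-made placeholder data, NOT the assignment's dataset.
demo <- data.frame(
  Work_Status = rep(c("Works", "Does_Not_Work"), each = 6),
  Study_Hours = c(4, 5, 6, 6, 7, 8,    # no extreme values
                  5, 6, 7, 8, 9, 40)   # one extreme value
)

# For each group, keep only the values that fall beyond the whiskers.
outliers_by_group <- lapply(
  split(demo$Study_Hours, demo$Work_Status),
  function(x) boxplot.stats(x)$out
)

print(outliers_by_group)
```

Replacing `demo` with `Dataset6.2` would list the outlying study hours for each work status.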
#Shapiro-Wilk Test of Normality
shapiro.test(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"])
##
## Shapiro-Wilk normality test
##
## data: Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Works"]
## W = 0.94582, p-value = 0.1305
shapiro.test(Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"])
##
## Shapiro-Wilk normality test
##
## data: Dataset6.2$Study_Hours[Dataset6.2$Work_Status == "Does_Not_Work"]
## W = 0.83909, p-value = 0.0003695
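The data for students with Work_Status "Works" were normally distributed (p = 0.1305, so p > .05). The data for students with Work_Status "Does_Not_Work" were not normally distributed (p = 0.0003695, so p < .05). Because at least one group is not normally distributed, we switch to a Mann-Whitney U test.
Taken together, the three normality checks (histograms, boxplots, and Shapiro-Wilk) make it clear that we must use a Mann-Whitney U test.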
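The decision rule above can be sketched in a few lines: run Shapiro-Wilk once per group, then fall back to the Mann-Whitney U test if any group has p < .05. The data frame `demo` is simulated, not Dataset6.2, and naming the independent-samples t-test as the alternative is an assumption, since the report does not state which parametric test was being considered.

```r
# Simulated placeholder data, NOT the assignment's dataset.
set.seed(42)
demo <- data.frame(
  Work_Status = rep(c("Works", "Does_Not_Work"), each = 30),
  Study_Hours = c(rnorm(30, mean = 6, sd = 2), rexp(30, rate = 1 / 9))
)

# Run Shapiro-Wilk once per group and collect the p-values.
shapiro_p <- tapply(
  demo$Study_Hours, demo$Work_Status,
  function(x) shapiro.test(x)$p.value
)

# If any group departs from normality, use the non-parametric test.
chosen_test <- if (any(shapiro_p < 0.05)) {
  "Mann-Whitney U"
} else {
  "independent-samples t-test"  # assumed parametric alternative
}

print(shapiro_p)
print(chosen_test)
```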
#Mann-Whitney U
wilcox.test(Study_Hours ~ Work_Status, data = Dataset6.2)
##
## Wilcoxon rank sum exact test
##
## data: Study_Hours by Work_Status
## W = 569, p-value = 0.07973
## alternative hypothesis: true location shift is not equal to 0
The test gave p = 0.07973. Because p > .05, the results were NOT significant.
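To make the W statistic less opaque: the W that `wilcox.test()` reports equals the number of cross-group pairs in which the value from the first group is larger than the value from the second, with ties counted as one half. Assuming R orders the groups alphabetically (as it does when a character column is converted to a factor), W = 569 would mean that in 569 of the 30 × 30 = 900 pairs the student who does not work studied longer. The sketch below checks the counting interpretation on simulated data; the vectors `not_working` and `working` are placeholders, not Dataset6.2.

```r
# Simulated placeholder data, NOT the assignment's dataset.
set.seed(7)
not_working <- rexp(30, rate = 1 / 9)
working <- rexp(30, rate = 1 / 6)

# W as reported by wilcox.test() for the first group versus the second.
w_reported <- unname(wilcox.test(not_working, working)$statistic)

# The same value obtained by counting pairs where the first group is larger
# (ties would contribute 0.5 each).
diffs <- outer(not_working, working, "-")
w_by_counting <- sum(diffs > 0) + 0.5 * sum(diffs == 0)

print(c(reported = w_reported, counted = w_by_counting))
```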
#Results
Students who work (Mdn = 5.64) were not significantly different from students who do not work (Mdn = 8.54) in study hours, W = 569, p = 0.07973. Because the results were NOT significant, effect size was not calculated.