The data below represent the yearly earnings (in $1000s) of high school and college (BS) graduates at a small firm. Determine whether there is any difference in pay between the two groups. Be sure to include a complete analysis (i.e., all assumptions checked and full understanding of effects garnered) and a clear summary with all needed statistical details included (see lab keys for examples).
Loading the required packages: moments (D'Agostino skewness test), reshape2 (melt), and car (leveneTest)
library(moments)
library(reshape2)
library(car)
## Loading required package: carData
Loading the dataset
library(readxl)
Exam1Q2 <- read_excel("/Users/anacapa/sunny/mera/harrisburg/Term 3/ANLY 510 Analytics II/Exams/Exam 1/Exam1Q1.xlsx")
Exam1Q2
## # A tibble: 39 x 2
## Highschool BS
## <dbl> <dbl>
## 1 42.2 35.4
## 2 34.5 45.8
## 3 44 39.4
## 4 34.1 40
## 5 41.8 39.2
## 6 40.7 40.2
## 7 36.4 44.7
## 8 43.3 37.3
## 9 39.5 40.8
## 10 35.4 39.3
## # … with 29 more rows
From the description, we are comparing two independent groups on a continuous dependent variable (income), so this is a prime case for an independent-samples (two-sample) t-test. That test assumes the data in each group are approximately normal and that the two groups have approximately equal variances. Let's check both, starting with normality:
plot(density(Exam1Q2$Highschool))
plot(density(Exam1Q2$BS))
agostino.test(Exam1Q2$Highschool)
##
## D'Agostino skewness test
##
## data: Exam1Q2$Highschool
## skew = -0.24041, z = -0.69269, p-value = 0.4885
## alternative hypothesis: data have a skewness
agostino.test(Exam1Q2$BS)
##
## D'Agostino skewness test
##
## data: Exam1Q2$BS
## skew = 0.53467, z = 1.49330, p-value = 0.1354
## alternative hypothesis: data have a skewness
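The D'Agostino test only looks at skewness, so a second look at normality can be useful. The following is a minimal sketch, not part of the original analysis, that runs base R's `shapiro.test()` and draws Q-Q plots. It assumes only the ten rows visible in the tibble printout above (the full data set has 39 rows), so its results will not match a run on the complete data.

```r
# First ten rows of each column, copied from the tibble printout above.
# ASSUMPTION: illustration only; the full data set has 39 rows per group.
highschool <- c(42.2, 34.5, 44, 34.1, 41.8, 40.7, 36.4, 43.3, 39.5, 35.4)
bs <- c(35.4, 45.8, 39.4, 40, 39.2, 40.2, 44.7, 37.3, 40.8, 39.3)

# Shapiro-Wilk tests the null hypothesis that a sample comes from a
# normal distribution; a small p-value suggests a departure from normality.
sw_hs <- shapiro.test(highschool)
sw_bs <- shapiro.test(bs)
print(sw_hs)
print(sw_bs)

# Q-Q plots: points lying close to the reference line suggest
# approximate normality.
qqnorm(highschool, main = "Highschool")
qqline(highschool)
qqnorm(bs, main = "BS")
qqline(bs)
```

With the real data, the same calls would be applied to Exam1Q2$Highschool and Exam1Q2$BS.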
Reshaping the data into long format for the Levene test:
dataset <- melt(Exam1Q2)
## No id variables; using all as measure variables
leveneTest(value ~ variable, dataset)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.4323 0.5128
## 76
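To make clear what the test above is doing: with center = median, Levene's test amounts to a one-way ANOVA on the absolute deviations of each value from its own group's median. The sketch below reproduces that logic in base R. It is an illustration only and assumes just the ten rows shown in the tibble printout, so the F value will differ from the one reported above; the names `long`, `abs_dev`, and `levene_by_hand` are our own.

```r
# First ten rows of each column, copied from the tibble printout above.
# ASSUMPTION: illustration only; the full data set has 39 rows per group.
highschool <- c(42.2, 34.5, 44, 34.1, 41.8, 40.7, 36.4, 43.3, 39.5, 35.4)
bs <- c(35.4, 45.8, 39.4, 40, 39.2, 40.2, 44.7, 37.3, 40.8, 39.3)

# Base R equivalent of melt(): one column of values, one factor of group labels.
long <- stack(list(Highschool = highschool, BS = bs))

# Absolute deviation of each observation from its own group's median.
long$abs_dev <- abs(long$values - ave(long$values, long$ind, FUN = median))

# One-way ANOVA on those deviations; its F test is the Levene statistic
# (median-centred version).
levene_by_hand <- anova(lm(abs_dev ~ ind, data = long))
print(levene_by_hand)
```

A non-significant F here means the typical spread around the median is similar in the two groups, which is what the equal-variance t-test needs.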
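Neither skewness test is significant (Highschool: p = .49; BS: p = .14), so although the data are not perfectly normal, there is no serious departure from normality. Levene's test gives no evidence of unequal variances, F(1, 76) = 0.43, p = .51. We are OK running an independent t-test with equal variances: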
t.test(Exam1Q2$Highschool, Exam1Q2$BS, var.equal = TRUE)
##
## Two Sample t-test
##
## data: Exam1Q2$Highschool and Exam1Q2$BS
## t = 0.16929, df = 76, p-value = 0.866
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.131669 1.341926
## sample estimates:
## mean of x mean of y
## 39.51282 39.40769
tapply(dataset$value, dataset$variable, mean )
## Highschool BS
## 39.51282 39.40769
tapply(dataset$value, dataset$variable, sd)
## Highschool BS
## 2.489470 2.973513
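As a sanity check, the pooled-variance t statistic can be rebuilt from the summary statistics alone. This is a small sketch using the means and standard deviations printed above and n = 39 per group (taken from the 39-row tibble); the variable names are our own.

```r
# Summary statistics copied from the output above.
n1 <- 39; n2 <- 39                 # rows per group (39 x 2 tibble)
m1 <- 39.51282; m2 <- 39.40769     # group means (Highschool, BS)
s1 <- 2.489470; s2 <- 2.973513     # group standard deviations

# Pooled variance and standard error of the difference in means.
df <- n1 + n2 - 2
sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

# t statistic, two-sided p-value, and 95% confidence interval.
t_stat <- (m1 - m2) / se
p_value <- 2 * pt(-abs(t_stat), df)
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, df) * se

print(c(t = t_stat, df = df, p = p_value))
print(ci)
```

The values should agree, up to rounding of the inputs, with the t.test() output above.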
We find no significant difference. Let's summarize.
We performed an independent-samples t-test comparing the earnings (in $1000s) of employees with a high school diploma (M = 39.51, SD = 2.49) to those of employees with a Bachelor of Science (M = 39.41, SD = 2.97), with n = 39 per group. We found no significant difference between the earnings of the two groups, t(76) = 0.17, p = .87, 95% CI for the mean difference [-1.13, 1.34]. In sum, it appears that the level of schooling (high school vs. a BS) has little or no impact on earnings at the firm.
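The summary describes the difference in words only. One way to put a number on it is a standardized effect size such as Cohen's d, which is not part of the original analysis. The sketch below computes it from the reported summary statistics, assuming the pooled standard deviation as the standardizer; the variable names are our own.

```r
# Summary statistics copied from the output above.
n1 <- 39; n2 <- 39
m1 <- 39.51282; m2 <- 39.40769
s1 <- 2.489470; s2 <- 2.973513

# Pooled standard deviation (same pooling as the equal-variance t-test).
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Cohen's d: mean difference expressed in pooled-SD units.
cohens_d <- (m1 - m2) / sp
print(cohens_d)
```

The result comes out at roughly 0.04, well below the 0.2 that Cohen's rule of thumb labels a small effect, which is consistent with the conclusion above.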