Objectives

The data below represent the yearly earnings (in thousands of dollars) of high school and college (BS) graduates at a small firm. Determine whether there is any difference in pay between the two groups. Be sure to include a complete analysis (i.e., all assumptions checked and a full understanding of effects) and a clear summary with all needed statistical details (see lab keys for examples).

Loading the required packages (“moments”, “reshape2”, and “car”)

library(moments)
library(reshape2)
library(car)
## Loading required package: carData

Loading the dataset

library(readxl)
Exam1Q2 <- read_excel("/Users/anacapa/sunny/mera/harrisburg/Term 3/ANLY 510 Analytics II/Exams/Exam 1/Exam1Q1.xlsx")
Exam1Q2
## # A tibble: 39 x 2
##    Highschool    BS
##         <dbl> <dbl>
##  1       42.2  35.4
##  2       34.5  45.8
##  3       44    39.4
##  4       34.1  40  
##  5       41.8  39.2
##  6       40.7  40.2
##  7       36.4  44.7
##  8       43.3  37.3
##  9       39.5  40.8
## 10       35.4  39.3
## # … with 29 more rows

From the description, we know we are comparing two groups on a continuous dependent variable (income), so this is a prime case for an independent-samples t-test (two-sample t). A two-sample t-test assumes that the data in each group are approximately normal and that the two groups have approximately equal variances. Let's check both, starting with density plots and D'Agostino skewness tests:

plot(density(Exam1Q2$Highschool))

plot(density(Exam1Q2$BS))

agostino.test(Exam1Q2$Highschool)
## 
##  D'Agostino skewness test
## 
## data:  Exam1Q2$Highschool
## skew = -0.24041, z = -0.69269, p-value = 0.4885
## alternative hypothesis: data have a skewness
agostino.test(Exam1Q2$BS)
## 
##  D'Agostino skewness test
## 
## data:  Exam1Q2$BS
## skew = 0.53467, z = 1.49330, p-value = 0.1354
## alternative hypothesis: data have a skewness
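
The D'Agostino test above looks only at skewness. As a complementary check, a minimal sketch using base R's shapiro.test() follows. The helper name normality_pvalues and the simulated vectors hs_sim and bs_sim are hypothetical stand-ins, not the exam data, so the printed p-values say nothing about the real groups.

```r
# Hypothetical helper: Shapiro-Wilk p-value for each of two groups.
normality_pvalues <- function(x, y) {
  c(x = shapiro.test(x)$p.value,
    y = shapiro.test(y)$p.value)
}

# Simulated stand-in data (NOT the exam data), 39 values per group.
set.seed(510)
hs_sim <- rnorm(39, mean = 39.5, sd = 2.5)
bs_sim <- rnorm(39, mean = 39.4, sd = 3.0)

pvals <- normality_pvalues(hs_sim, bs_sim)
print(round(pvals, 3))
```

With the real data, the call would presumably be normality_pvalues(Exam1Q2$Highschool, Exam1Q2$BS); p-values above .05 would be consistent with approximate normality.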

Reshaping the data for Levene's test:

dataset <- melt(Exam1Q2)
## No id variables; using all as measure variables
leveneTest(value ~ variable, dataset)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  0.4323 0.5128
##       76
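
The reshape above turns the two wide columns into one value column plus a grouping factor, which is the layout leveneTest() expects. A rough base-R equivalent using stack() is sketched below on a tiny data frame. The name toy_wide is hypothetical, and its six values are copied from the first three printed rows purely for illustration. Note that stack() names its columns values and ind rather than value and variable.

```r
# Toy wide data: one column per group (first three printed rows only).
toy_wide <- data.frame(
  Highschool = c(42.2, 34.5, 44.0),
  BS         = c(35.4, 45.8, 39.4)
)

# stack() returns a long data frame with columns `values` and `ind`.
toy_long <- stack(toy_wide)
print(toy_long)

# Group means computed from the long layout.
group_means <- tapply(toy_long$values, toy_long$ind, mean)
print(group_means)
```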

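The data aren’t perfectly normal, but neither group shows significant skew (D'Agostino p = .49 for high school and p = .14 for BS), and Levene's test gives no evidence of unequal variances, F(1, 76) = 0.43, p = .51. We are OK running an independent t-test with equal variances: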

t.test(Exam1Q2$Highschool, Exam1Q2$BS, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  Exam1Q2$Highschool and Exam1Q2$BS
## t = 0.16929, df = 76, p-value = 0.866
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.131669  1.341926
## sample estimates:
## mean of x mean of y 
##  39.51282  39.40769
tapply(dataset$value, dataset$variable, mean)
## Highschool         BS 
##   39.51282   39.40769
tapply(dataset$value, dataset$variable, sd)
## Highschool         BS 
##   2.489470   2.973513
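
The means and standard deviations above are enough to cross-check the test by hand and to attach a standardized effect size, which the t-test output does not report. The sketch below assumes n = 39 per group (from the tibble header) and uses the pooled-SD form of Cohen's d. The variable names are illustrative, and small discrepancies from the printed output come from rounding the inputs.

```r
# Summary statistics copied from the output above.
n  <- 39                         # rows per group, per the tibble header
m1 <- 39.51282; s1 <- 2.489470   # Highschool mean and SD
m2 <- 39.40769; s2 <- 2.973513   # BS mean and SD

# Pooled SD (equal group sizes, so a simple average of the variances).
sp <- sqrt((s1^2 + s2^2) / 2)

# Equal-variance t statistic and its degrees of freedom.
se     <- sp * sqrt(2 / n)
t_stat <- (m1 - m2) / se
df     <- 2 * n - 2

# 95% confidence interval for the mean difference.
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, df) * se

# Cohen's d: mean difference expressed in pooled-SD units.
cohens_d <- (m1 - m2) / sp

print(round(c(t = t_stat, df = df, d = cohens_d), 3))
print(round(ci, 2))
```

The t statistic and interval should agree with the t.test() output up to rounding, and d comes out at roughly 0.04, far below the conventional 0.2 benchmark for a small effect.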

We find no significant difference. Let's summarize.

We performed an independent-samples t-test comparing the earnings (in thousands of dollars) of those with a high school diploma (M = 39.51, SD = 2.49) to those with a Bachelor of Science degree (M = 39.41, SD = 2.97). We found no significant difference between the earnings of the two groups, t(76) = 0.17, p = .87, 95% CI [-1.13, 1.34]. In sum, we find no evidence that level of schooling (high school vs. a BS) affects earnings at the firm.