#Research Question: Is there a significant difference between the yearly earnings of high school graduates and college (BS) graduates at a small firm?
#H0: There is no difference in mean yearly earnings between high school graduates and college (BS) graduates at a small firm.
#H1: There is a difference in mean yearly earnings between high school graduates and college (BS) graduates at a small firm.
library(moments)
library(readxl)
data <- read_excel("C:/Users/apoor/Downloads/Exam1Q1 (1).xlsx")
#From the description and the data, this design calls for an independent (two-sample) t-test. That test assumes the data in each group are approximately normal and that the two groups have equal variances. Let’s check both assumptions, starting with the shape of each distribution:
plot(density(data$Highschool))

plot(density(data$BS))

#We perform the D’Agostino skewness test to check whether each distribution is significantly skewed compared with a normal distribution. Note that agostino.test() assesses skewness only, not kurtosis.
agostino.test(data$Highschool) 
## 
##  D'Agostino skewness test
## 
## data:  data$Highschool
## skew = -0.24041, z = -0.69269, p-value = 0.4885
## alternative hypothesis: data have a skewness

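#The skewness of Highschool is -0.24041, a slight left (negative) skew. With p = 0.4885 > 0.05, the skew is not statistically significant, so there is no evidence against symmetry.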
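
#A minimal sketch of how the skew statistic above is defined, assuming the moment-based estimator m3 / m2^(3/2) (the same form that moments::skewness() uses).
#The helper `sample_skewness` and the toy vector are hypothetical, for illustration only.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  m2 <- mean(d^2)  # second central moment (divides by n, not n - 1)
  m3 <- mean(d^3)  # third central moment
  m3 / m2^(3/2)
}
toy <- c(1, 2, 3, 4, 10)   # long right tail, so the skewness is positive
sample_skewness(toy)
sample_skewness(-toy)      # mirroring the data flips the sign (left skew)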

agostino.test(data$BS)
## 
##  D'Agostino skewness test
## 
## data:  data$BS
## skew = -0.37545, z = -1.06905, p-value = 0.285
## alternative hypothesis: data have a skewness

#The skewness of BS is -0.37545, again a slight left (negative) skew. With p = 0.285 > 0.05, the skew is not statistically significant.
#Next, we want to check whether the variances are equal between the two groups.

library(car)
## Loading required package: carData
leveneTest(data$Highschool, data$BS)
## Warning in leveneTest.default(data$Highschool, data$BS): data$BS coerced to
## factor.
## Warning in anova.lm(lm(resp ~ group)): ANOVA F-tests on an essentially perfect
## fit are unreliable
## Levene's Test for Homogeneity of Variance (center = median)
##       Df    F value    Pr(>F)    
## group 34 2.6689e+28 < 2.2e-16 ***
##        4                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

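#The reported p-value is below the significance level of 0.05, but this output is not a valid test of equal variances between the two groups. As the warnings show, data$BS was coerced to a factor and used as the grouping variable, so the Highschool values were grouped by BS values (35 groups) instead of the two columns being compared. leveneTest() expects one response vector and one grouping factor. The sample standard deviations computed below (2.49 vs 6.00) nonetheless suggest that the equal-variance assumption may not be met.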
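
#A minimal sketch of Levene's test in the layout leveneTest() expects: one stacked response vector plus a two-level grouping factor, rather than two data columns.
#The helper `levene_two_groups` and the simulated vectors are hypothetical; the simulated numbers are not the study data.
library(car)
levene_two_groups <- function(x, y, labels = c("Highschool", "BS")) {
  values <- c(x, y)  # stack both groups into one response vector
  group <- factor(rep(labels, times = c(length(x), length(y))), levels = labels)
  leveneTest(values, group)  # center = median by default
}
set.seed(1)
demo <- levene_two_groups(rnorm(39, 40, 2.5), rnorm(39, 51, 6))
demo  # Df should be 1 (group) and 76 (residual) for two groups of 39
#With the real data: levene_two_groups(data$Highschool, data$BS)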

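#var.equal = TRUE requests the pooled-variance (Student) t-test, which assumes equal variances in the two groups.
t.test(data$Highschool, data$BS, var.equal = TRUE)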
## 
##  Two Sample t-test
## 
## data:  data$Highschool and data$BS
## t = -10.812, df = 76, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.326942  -9.180751
## sample estimates:
## mean of x mean of y 
##  39.51282  50.76667
sd(data$Highschool)
## [1] 2.48947
sd(data$BS)
## [1] 6.004706
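#Because the two standard deviations differ, Welch's t-test (which does not pool the variances) is a common alternative.
#A minimal sketch from the summary statistics reported above, assuming n = 39 per group (inferred from df = 76); `welch_from_summary` is a hypothetical helper.
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1  # squared standard error of mean 1
  v2 <- s2^2 / n2  # squared standard error of mean 2
  t_stat <- (m1 - m2) / sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))  # Welch-Satterthwaite df
  p <- 2 * pt(-abs(t_stat), df)  # two-sided p-value
  c(t = t_stat, df = df, p.value = p)
}
welch <- welch_from_summary(39.51282, 2.48947, 39, 50.76667, 6.004706, 39)
welch
#With equal group sizes the t statistic matches the pooled test; only the degrees of freedom change.
#With the raw data, the equivalent call is t.test(data$Highschool, data$BS, var.equal = FALSE).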
#Summary:
#We performed an independent (two-sample) t-test, with 39 observations per group (N = 78), comparing the earnings of those with a high school diploma (M = 39.51, SD = 2.49) to those with a BS (M = 50.77, SD = 6.00). The difference between the two groups is statistically significant, t(76) = -10.81, p < 2.2e-16 (below alpha = 0.05), so we reject H0. Within this firm, BS graduates earn more on average than high school graduates. Note that this test pools the variances even though the two standard deviations differ noticeably.