## n
## 1 0.4215
About 42.2% of the original sample (a proportion of 0.4215) is employed.
## # A tibble: 2 x 4
##   gender   xbar      s     n
##   <fct>   <dbl>  <dbl> <int>
## 1 male   55887. 68768.   470
## 2 female 29244. 32026.   373
At first glance, males appear to have a higher average income than females. The boxplot shows that males’ median income is higher than females’ and that males’ incomes are more spread out, with more possible outliers at the upper end. The summary statistics show that females have a much lower mean income than males (29,244 vs. 55,887 dollars) and a smaller standard deviation (32,026 vs. 68,768 dollars). This means there is less variability in females’ incomes than in males’.
inference(y = income, x = gender, data = acs_emp, statistic = "mean", type = "ci",
          method = "theoretical", order = c("male","female"))
## Warning: package 'gridExtra' was built under R version 3.6.3
## Warning: package 'broom' was built under R version 3.6.3
## Response variable: numerical, Explanatory variable: categorical (2 levels)
## n_male = 470, y_bar_male = 55887.234, s_male = 68767.8814
## n_female = 373, y_bar_female = 29243.6997, s_female = 32025.9848
## 95% CI (male - female): (19605.3014 , 33681.7672)
We are 95% confident that the difference in mean income between males and females falls between 19,605.30 and 33,681.77 dollars.
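The interval above can be reproduced by hand from the reported summary statistics. The sketch below is only a check, not the internals of `inference()`: it assumes the conservative degrees of freedom min(n1, n2) - 1, which is consistent with the df = 372 printed by the hypothesis test in this report. All variable names are illustrative.

```r
# Summary statistics copied from the inference() output above
n_m <- 470; xbar_m <- 55887.234;  s_m <- 68767.8814
n_f <- 373; xbar_f <- 29243.6997; s_f <- 32025.9848

# Point estimate and standard error of the difference in means
diff_means <- xbar_m - xbar_f
se <- sqrt(s_m^2 / n_m + s_f^2 / n_f)

# Assumption: conservative degrees of freedom, min(n1, n2) - 1
df <- min(n_m, n_f) - 1
t_star <- qt(0.975, df)  # critical value for a 95% interval

# Confidence interval: estimate +/- critical value * standard error
ci <- diff_means + c(-1, 1) * t_star * se
print(round(ci, 2))
```

The printed bounds should land very close to the interval reported by `inference()`; small differences can arise from rounding in the copied summary statistics.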
There is a statistically significant difference between the average incomes of men and women. The null hypothesis in this case is that there is no difference between the mean incomes of men and women, that is, a difference of 0. Because our 95% confidence interval does not contain 0, the p-value of the equivalent two-sided test must be less than 0.05. We therefore reject the null hypothesis; the difference is statistically significant.
The confidence level equals 1 minus the significance level (alpha): 0.95 = 1 - alpha, so alpha = 0.05. The equivalent hypothesis test therefore uses a significance level of alpha = 0.05.
inference(y = income, x = gender, data = acs_emp, statistic = "mean", type = "ht", null = 0,
          method = "theoretical",
          alternative = "twosided", order = c("male","female"))
## Response variable: numerical
## Explanatory variable: categorical (2 levels)
## n_male = 470, y_bar_male = 55887.234, s_male = 68767.8814
## n_female = 373, y_bar_female = 29243.6997, s_female = 32025.9848
## H0: mu_male = mu_female
## HA: mu_male != mu_female
## t = 7.4437, df = 372
## p_value = < 0.0001
H0: There is no difference between the mean incomes of males and females.
HA: There is a difference between the mean incomes of males and females.
Our p-value is less than 0.0001, which is smaller than our alpha value of 0.05. We reject the null hypothesis in favor of the alternative hypothesis and conclude that a significant difference exists between the average incomes of males and females.
These results are in agreement with the results of our confidence interval.
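The agreement between the test and the interval can be checked numerically. The following sketch recomputes the t statistic and two-sided p-value from the summary statistics and confirms that "p-value below alpha" and "interval excludes 0" give the same decision. It assumes the same conservative df = min(n1, n2) - 1 shown in the output (df = 372); the variable names are illustrative.

```r
# Summary statistics copied from the inference() output above
n_m <- 470; xbar_m <- 55887.234;  s_m <- 68767.8814
n_f <- 373; xbar_f <- 29243.6997; s_f <- 32025.9848
alpha <- 0.05

se <- sqrt(s_m^2 / n_m + s_f^2 / n_f)
df <- min(n_m, n_f) - 1             # conservative df, as in the output

# Test statistic under H0: mu_male - mu_female = 0
t_stat <- (xbar_m - xbar_f - 0) / se
p_value <- 2 * pt(-abs(t_stat), df) # two-sided p-value

# Matching (1 - alpha) confidence interval
ci <- (xbar_m - xbar_f) + c(-1, 1) * qt(1 - alpha / 2, df) * se

rejects_h0 <- p_value < alpha
ci_excludes_zero <- ci[1] > 0 || ci[2] < 0
print(c(t = t_stat, p = p_value))
print(c(rejects_h0 = rejects_h0, ci_excludes_zero = ci_excludes_zero))
```

Because both results use the same standard error and the same critical value, the two logical flags always agree for a two-sided test at a matching confidence level.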
acs_emp <- acs_emp %>%
  mutate(emp_type = case_when(hrs_work >= 40 ~ "full time",
                              TRUE ~ "part time"))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00   35.00   40.00   38.93   42.00   99.00
The summary statistics for the overall sample show a median of 40 hours worked and a mean of 38.93. The barplot shows that most males work full time rather than part time (81% vs. 19%), while females are more evenly split between full time and part time (57% vs. 43%). Overall, 71% of employees are full time and 29% are part time.
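The percentages quoted above can be recovered from the cell counts that appear in the grouped table of this report (383 and 87 males, 213 and 160 females). This is a minimal base-R sketch; the object names are illustrative and not taken from the original analysis.

```r
# Cell counts taken from the report's emp_type-by-gender table
counts <- matrix(
  c(383,  87,
    213, 160),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender   = c("male", "female"),
                  emp_type = c("full time", "part time"))
)

# Share of full time / part time within each gender (rows sum to 1)
within_gender <- prop.table(counts, margin = 1)

# Share of full time / part time among all employees
overall <- colSums(counts) / sum(counts)

print(round(within_gender, 2))
print(round(overall, 2))
```

Using `margin = 2` instead would give the share of each gender within full time and within part time.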
## # A tibble: 4 x 4
## # Groups:   emp_type [2]
##   emp_type  gender     n   pct
##   <chr>     <fct>  <int> <dbl>
## 1 full time male     383 0.643
## 2 full time female   213 0.357
## 3 part time male      87 0.352
## 4 part time female   160 0.648
Females are more heavily represented among part-time employees, making up about 65% of that group, but only about 36% of full-time employees.
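The tests that follow use two subsets, `acs_fulltime` and `acs_parttime`, whose construction is not shown in this report. One plausible way to build them is to split on `emp_type`. The sketch below demonstrates the idea on a small made-up data frame (`toy_emp` and its values are hypothetical), using base R rather than the dplyr pipeline used elsewhere in the report.

```r
# Toy stand-in for acs_emp (hypothetical values, for illustration only)
toy_emp <- data.frame(
  gender   = factor(c("male", "female", "male", "female")),
  hrs_work = c(45, 20, 30, 40),
  income   = c(60000, 15000, 22000, 41000)
)

# Same 40-hour rule as the case_when() call in the report.
# Note: ifelse() leaves a missing hrs_work as NA, whereas the
# TRUE ~ "part time" fallback in case_when() labels it "part time".
toy_emp$emp_type <- ifelse(toy_emp$hrs_work >= 40, "full time", "part time")

# Split into the two groups analysed separately
toy_fulltime <- subset(toy_emp, emp_type == "full time")
toy_parttime <- subset(toy_emp, emp_type == "part time")

print(toy_fulltime)
print(toy_parttime)
```

With dplyr loaded, `filter(emp_type == "full time")` would be the equivalent of the `subset()` call.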
inference(y = income, x = gender, data = acs_fulltime, statistic = "mean", type = "ht", null = 0,
          alternative = "twosided", method = "theoretical",
          order = c("male","female"))
## Response variable: numerical
## Explanatory variable: categorical (2 levels)
## n_male = 383, y_bar_male = 63183.5509, s_male = 68974.109
## n_female = 213, y_bar_female = 39752.1127, s_female = 35288.6675
## H0: mu_male = mu_female
## HA: mu_male != mu_female
## t = 5.4822, df = 212
## p_value = < 0.0001
H0: There is no difference between the mean incomes of full time male and female employees.
HA: There is a difference between the mean incomes of full time male and female employees.
Our p-value is less than 0.0001, which is smaller than our alpha value of 0.05. We reject the null hypothesis in favor of the alternative hypothesis and conclude that a significant difference exists between the average incomes of full time males and full time females.
Since the difference is significant, we will construct a 95% confidence interval. This level matches our hypothesis test, which used the default alpha of 0.05, because the confidence level equals 1 minus alpha.
1 - 0.05 = 0.95, so the confidence level is 0.95 and we construct a 95% confidence interval.
inference(y = income, x = gender, data = acs_fulltime, statistic = "mean", type = "ci", conf_level = 0.95,
          method = "theoretical", order = c("male","female"))
## Response variable: numerical, Explanatory variable: categorical (2 levels)
## n_male = 383, y_bar_male = 63183.5509, s_male = 68974.109
## n_female = 213, y_bar_female = 39752.1127, s_female = 35288.6675
## 95% CI (male - female): (15006.2634 , 31856.6131)
We are 95% confident that the difference in mean income between full time males and full time females falls between 15,006.26 and 31,856.61 dollars.
inference(y = income, x = gender, data = acs_parttime, statistic = "mean", type = "ht", null = 0,
          alternative = "twosided", method = "theoretical",
          order = c("male","female"))
## Response variable: numerical
## Explanatory variable: categorical (2 levels)
## n_male = 87, y_bar_male = 23766.6667, s_male = 58112.1248
## n_female = 160, y_bar_female = 15254.375, s_female = 19859.9366
## H0: mu_male = mu_female
## HA: mu_male != mu_female
## t = 1.3249, df = 86
## p_value = 0.1887
H0: There is no difference between the mean incomes of part time male and female employees.
HA: There is a difference between the mean incomes of part time male and female employees.
Our p-value is 0.1887, which is larger than our alpha value of 0.05, so we fail to reject the null hypothesis. The data do not provide sufficient evidence of a difference between the average incomes of part time males and part time females.
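The reported df = 86 equals min(n_male, n_female) - 1, a conservative choice. As a sensitivity check, the sketch below recomputes the test from the summary statistics and compares that p-value with one based on the Welch-Satterthwaite degrees of freedom. The Welch calculation is an addition for illustration and is not part of the original output; the variable names are illustrative.

```r
# Summary statistics copied from the part time inference() output above
n_m <- 87;  xbar_m <- 23766.6667; s_m <- 58112.1248
n_f <- 160; xbar_f <- 15254.375;  s_f <- 19859.9366

# Per-group variance of the sample mean
v_m <- s_m^2 / n_m
v_f <- s_f^2 / n_f

t_stat <- (xbar_m - xbar_f) / sqrt(v_m + v_f)

# Conservative df (matches the df printed in the output)
df_conservative <- min(n_m, n_f) - 1

# Welch-Satterthwaite approximation to the df
df_welch <- (v_m + v_f)^2 / (v_m^2 / (n_m - 1) + v_f^2 / (n_f - 1))

p_conservative <- 2 * pt(-abs(t_stat), df_conservative)
p_welch        <- 2 * pt(-abs(t_stat), df_welch)

print(c(t = t_stat, df_conservative = df_conservative, df_welch = df_welch))
print(c(p_conservative = p_conservative, p_welch = p_welch))
```

Both p-values stay well above 0.05, so the decision not to reject the null hypothesis does not hinge on which degrees of freedom are used.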
The findings suggest that employment type (full time or part time) may be a confounding variable in the relationship between gender and income. The difference between male and female mean incomes is statistically significant among full time employees, but not among part time employees.
We would use an ANOVA (analysis of variance) test, which tests whether the mean differs across several groups. Because we do not know in advance which, if any, of the race/ethnicity groups in the dataset differ from one another, we test all of them at once.
inference(y = income, x = race, data = acs_emp, statistic = "mean", type = "ht",
          alternative = "greater", method = "theoretical")
## Response variable: numerical
## Explanatory variable: categorical (4 levels)
## n_white = 670, y_bar_white = 44491.0448, s_white = 56564.7207
## n_black = 76, y_bar_black = 29953.2895, s_black = 23313.5402
## n_asian = 39, y_bar_asian = 86406.4103, s_asian = 104998.3911
## n_other = 58, y_bar_other = 29648.2759, s_other = 29511.4562
##
## ANOVA:
##            df            Sum_Sq          Mean_Sq       F  p_value
## race        3  97229184003.6589 32409728001.2196 10.2616 < 0.0001
## Residuals 839  2649854777471.31  3158348960.0373
## Total     842  2747083961474.97
##
## Pairwise tests - t tests with pooled SD:
## # A tibble: 6 x 3
## group1 group2 p.value
## <chr> <chr> <dbl>
## 1 black white 0.0329
## 2 asian white 0.00000682
## 3 other white 0.0540
## 4 asian black 0.000000421
## 5 other black 0.975
## 6 other asian 0.00000129
H0: There is no difference between the mean incomes of employees in different racial/ethnic groups.
HA: At least one racial/ethnic group has a mean income that differs from the others.
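The F statistic in the ANOVA table above can be rebuilt from the per-group sizes, means, and standard deviations printed in the output. The sketch below shows the arithmetic: the between-group sum of squares comes from the group means, and the residual sum of squares comes from the group standard deviations. Variable names are illustrative.

```r
# Group summaries copied from the inference() output above
n    <- c(white = 670, black = 76, asian = 39, other = 58)
xbar <- c(44491.0448, 29953.2895, 86406.4103, 29648.2759)
s    <- c(56564.7207, 23313.5402, 104998.3911, 29511.4562)

k <- length(n)   # number of groups
N <- sum(n)      # total sample size

# Weighted grand mean across all groups
grand_mean <- sum(n * xbar) / N

# Between-group and within-group (residual) sums of squares
ss_group <- sum(n * (xbar - grand_mean)^2)
ss_resid <- sum((n - 1) * s^2)

# F = mean square between groups / mean square within groups
f_stat  <- (ss_group / (k - 1)) / (ss_resid / (N - k))
p_value <- pf(f_stat, k - 1, N - k, lower.tail = FALSE)

print(c(F = f_stat, df1 = k - 1, df2 = N - k, p = p_value))
```

The result should agree with the table (df of 3 and 839) up to rounding of the copied summaries.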
The pairwise comparisons with a p-value smaller than 0.05 (our significance level) were: (1) Black-white, p-value = 0.0329; (2) Asian-white, p-value = 0.00000682; (3) Asian-black, p-value = 0.000000421; (4) other-Asian, p-value = 0.00000129.
The ANOVA p-value is less than 0.0001, so we reject the null hypothesis in favor of the alternative hypothesis. We can conclude that the mean income of at least one racial/ethnic group differs significantly from the others.
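The pairwise p-values reported above are unadjusted, and this report compares each of them to 0.05. With six comparisons, one common option is a Bonferroni adjustment. The sketch below is an illustration of how that adjustment would be applied to the reported p-values and is not part of the original analysis. The vector names are illustrative.

```r
# Unadjusted pairwise p-values copied from the output above
p_raw <- c(
  "black-white" = 0.0329,
  "asian-white" = 0.00000682,
  "other-white" = 0.0540,
  "asian-black" = 0.000000421,
  "other-black" = 0.975,
  "other-asian" = 0.00000129
)

# Bonferroni: multiply each p-value by the number of comparisons (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Comparisons that remain below 0.05 after adjustment
significant <- names(p_bonf)[p_bonf < 0.05]

print(signif(p_bonf, 3))
print(significant)
```

Under this stricter criterion the three comparisons involving the Asian group remain significant, while the Black-white comparison (adjusted p-value of about 0.197) does not.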
The research question here is whether the average time it takes to get to work differs by marital status.
H0: There is no difference in the mean time it takes to get to work between married and non-married employees.
HA: There is a difference in the mean time it takes to get to work between married and non-married employees.
inference(y = time_to_work, x = married, data = acs_emp, statistic = "mean", type = "ht", null = 0,
          alternative = "twosided", method = "theoretical",
          order = c("yes","no"))
## Response variable: numerical
## Explanatory variable: categorical (2 levels)
## n_yes = 454, y_bar_yes = 26.859, s_yes = 23.6314
## n_no = 329, y_bar_no = 24.8085, s_no = 20.2717
## H0: mu_yes = mu_no
## HA: mu_yes != mu_no
## t = 1.3023, df = 328
## p_value = 0.1937
Our p-value is 0.1937, which is larger than our alpha value of 0.05, so we fail to reject the null hypothesis. The data do not provide sufficient evidence of a difference in the average time it takes to get to work between married and non-married employees.