The data used in this analysis were extracted by Barry Becker from the 1994 Census database. The data set has been filtered down to six variables, including Age, Education, Employer, Marital Status, and Income bracket, and contains 31,536 observations, each of which represents an adult. The analysis focuses on two variables: the individual's Marital Status, which has four levels (Never-Married, Married, Divorced, Widowed), and Income bracket, which has two levels (<=50K, >50K). The Income bracket shows whether the individual makes at most $50,000 a year or more than $50,000 a year. The purpose of the analysis is to answer the question: “Is an individual's Income bracket independent of their Marital status, or are the two related?”
A Chi-Squared test of independence will be used to evaluate the question. The hypotheses are laid out below, and the level of significance, \(\alpha\), will be set to .05.
\[ H_{0}:\ \text{Income level and marital status are not associated.} \] \[ H_{a}:\ \text{Income level and marital status are associated.} \]
datatable(Census2, options=list(lengthMenu = c(10,50)), style = "default")
The observed counts are displayed in the table below. The largest count is for individuals who have never been married and make 50,000 dollars or less a year. This makes sense, as the adults who have never been married are probably younger adults who are at the beginning of their careers, which tend to be lower paying. It is also important to remember that these data come from 1994 and that wages have increased since then, with wage growth of between 2% and 4% per year.\(^1\) The smallest count is that of Widowed individuals who earn more than 50,000 dollars a year, which is reasonable because widowed individuals are the smallest group in the Marital status variable. The counts for married individuals are more evenly split between the income groups, but the majority still fall in the 50,000 dollars or less a year category.
pander(table(Census2$Income, Census2$Marital.Status))
| | Divorced | Married | Never-married | Widowed |
|---|---|---|---|---|
| <=50K | 3980 | 8681 | 10192 | 908 |
| >50K | 463 | 6736 | 491 | 85 |
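Because the marital-status groups differ greatly in size, the raw counts are easier to compare as within-group shares. The following is a minimal sketch that re-enters the counts from the table above by hand; the object names `counts` and `col_props` are illustrative and are not part of the original analysis, which works from `Census2` directly.

```r
# Observed counts re-entered by hand from the table above
# (illustrative only; the original analysis tabulates Census2 directly).
counts <- matrix(
  c(3980, 8681, 10192, 908,
     463, 6736,   491,  85),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("<=50K", ">50K"),
    c("Divorced", "Married", "Never-married", "Widowed")
  )
)

# Share of each marital-status group (column) that falls in each income bracket
col_props <- prop.table(counts, margin = 2)
print(round(col_props, 3))
```

With these counts, roughly 44% of married individuals fall in the >50K bracket, compared with about 5% of never-married individuals.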
The bar plot below displays the counts from the table above.
barplot(table(Census2$Income, Census2$Marital.Status), beside = TRUE, legend.text = TRUE, args.legend = list(x = "topright", bty = "o", title = "Income Bracket"), col = c("firebrick", "green"))
The test results are given below.
pander(chisq.test(table(Census2$Income, Census2$Marital.Status)))
Test statistic | df | P value |
---|---|---|
5945 | 3 | 0 * * * |
To make sure the requirements of the Chi-Squared test are met, the expected counts must all be greater than 5. As can be seen below, the expected counts are all above 5, which means that the Chi-Squared test is appropriate.
pander(chisq.test(table.income)$expected)
| | Divorced | Married | Never-married | Widowed |
|---|---|---|---|---|
| <=50K | 3348 | 11616 | 8049 | 748.2 |
| >50K | 1095 | 3801 | 2634 | 244.8 |
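The expected counts can also be reproduced by hand, which makes clear where they come from: under independence, each cell's expected count is its row total times its column total, divided by the grand total. This is a sketch only, assuming the observed counts reported earlier are re-entered manually; `counts` and `expected` are illustrative names.

```r
# Observed counts re-entered by hand (illustrative; not read from Census2).
counts <- matrix(
  c(3980, 8681, 10192, 908,
     463, 6736,   491,  85),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("<=50K", ">50K"),
    c("Divorced", "Married", "Never-married", "Widowed")
  )
)

# Expected count under independence: (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
print(round(expected, 1))

# Rule of thumb used in the text: every expected count should exceed 5
print(all(expected > 5))
```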
The Chi-Squared statistic of 5945 with 3 degrees of freedom gives a P-value well below the \(\alpha\) of .05. As a result, the null hypothesis is rejected: there is sufficient evidence to conclude that Income bracket and Marital status are associated.
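For readers who want to see how the statistic, degrees of freedom, and decision fit together, the calculation can be sketched without `chisq.test()`. This assumes the observed counts are re-entered by hand; the names `counts`, `expected`, `chi_sq`, `df`, `p_value`, and `critical` are illustrative.

```r
# Observed counts re-entered by hand (illustrative; not read from Census2).
counts <- matrix(
  c(3980, 8681, 10192, 908,
     463, 6736,   491,  85),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("<=50K", ">50K"),
    c("Divorced", "Married", "Never-married", "Widowed")
  )
)

# Expected counts under independence
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson Chi-Squared statistic: sum over cells of (observed - expected)^2 / expected
chi_sq <- sum((counts - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Upper-tail P-value and the critical value at alpha = .05
p_value  <- pchisq(chi_sq, df = df, lower.tail = FALSE)
critical <- qchisq(0.95, df = df)

print(c(statistic = chi_sq, df = df, critical = critical))
print(p_value < 0.05)
```

The statistic far exceeds the critical value for 3 degrees of freedom, which is another way of seeing why the P-value is reported as effectively zero.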
The Pearson residuals show by how much the observed counts differ from the expected counts, given that Marital status and Income bracket are independent. In other words, they show how far the observed counts are from what would be expected if the null hypothesis were true. When evaluating the Pearson residuals, it can be seen that those who are married and fall in the greater than 50,000 dollar income bracket contributed most to the significant Chi-Squared test.
pander(chisq.test(table.income)$residuals)
| | Divorced | Married | Never-married | Widowed |
|---|---|---|---|---|
| <=50K | 10.93 | -27.23 | 23.88 | 5.843 |
| >50K | -19.11 | 47.61 | -41.75 | -10.21 |
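Each Pearson residual is the observed count minus the expected count, divided by the square root of the expected count. The sketch below recomputes them by hand and picks out the cell with the largest absolute residual. It assumes the observed counts are re-entered manually; `counts`, `expected`, `pearson_res`, and `idx` are illustrative names.

```r
# Observed counts re-entered by hand (illustrative; not read from Census2).
counts <- matrix(
  c(3980, 8681, 10192, 908,
     463, 6736,   491,  85),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("<=50K", ">50K"),
    c("Divorced", "Married", "Never-married", "Widowed")
  )
)

# Expected counts under independence
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson residual for each cell: (observed - expected) / sqrt(expected)
pearson_res <- (counts - expected) / sqrt(expected)
print(round(pearson_res, 2))

# Locate the cell with the largest absolute residual
idx <- which.max(abs(pearson_res))
largest_row <- rownames(counts)[row(pearson_res)[idx]]
largest_col <- colnames(counts)[col(pearson_res)[idx]]
print(c(largest_row, largest_col))
```

The squared residuals sum to the Chi-Squared statistic, so the cell with the largest absolute residual is the one contributing most to the test result.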
The question for this analysis was: “Is an individual's Income bracket independent of their Marital status, or are the two related?” The Chi-Squared test performed in the analysis led to sufficient evidence to conclude the alternative hypothesis, which is that Marital status and Income bracket are associated. Therefore, a person's Marital status is related to their Income bracket, although the test alone does not show that one causes the other. One possible reason for the association is that a married individual is more likely to already be established in a career where they earn a yearly salary, whereas an adult who has never been married is more likely to be in their early adult years and therefore early in their career, earning a lower wage.