Background

The data used in this analysis were extracted by Barry Becker from the 1994 Census database. The data set has been filtered down to six variables, including Age, Education, Employer, Marital Status, and Income bracket. It contains 31,536 observations, each of which represents one adult. The analysis focuses on two variables: the individual's Marital Status, which has four levels (Never-Married, Married, Divorced, Widowed), and Income bracket, which has two levels (<=50K, >50K). The Income bracket shows whether the individual makes more than $50,000 a year or $50,000 or less. The purpose of the analysis is to answer the question: “is an individual's Income bracket independent of their Marital Status, or related?”

Hypothesis

A Chi-Squared test of independence will be used to evaluate the question. The hypotheses are laid out below, and the level of significance will be set to \(\alpha = .05\).

\[ H_{0}:\ \text{Income level and marital status are not associated.} \] \[ H_{a}:\ \text{Income level and marital status are associated.} \]
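
For reference, a sketch of the standard Pearson statistic behind this test, where \(O_{ij}\) is the observed count in row \(i\) and column \(j\) of the Income by Marital Status table, \(E_{ij}\) is the count expected under \(H_{0}\), and \(n\) is the total number of observations (these symbols are introduced here for illustration and do not appear in the original analysis):

\[ \chi^{2} = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^{2}}{E_{ij}}, \qquad E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n} \]

With two income levels and four marital status levels, the statistic has \((2-1)(4-1) = 3\) degrees of freedom.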

Data

datatable(Census2, options=list(lengthMenu = c(10,50)),  style = "default")

Analysis

Table

The observed counts are displayed in the table below. The largest count is for individuals who have never been married and make 50,000 dollars or less a year. This makes sense, as adults who have never been married are probably younger and so more likely to be at the beginning of their careers, which tends to be lower paying. It is also important to remember that these data come from 1994 and that wages have increased since then, with wage growth of between 2% and 4% per year.\(^1\) The smallest count is that of Widowed individuals who earn more than 50,000 dollars a year, which is reasonable because widowed individuals form the smallest Marital Status group. The counts for married individuals are more evenly split between the income groups, but the majority still fall in the 50,000 dollars or less category.

pander(table(Census2$Income, Census2$Marital.Status))
         Divorced   Married   Never-married   Widowed
<=50K        3980      8681           10192       908
>50K          463      6736             491        85
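
To make the comparison between groups easier to read, the counts can be converted to within-group proportions. The following is a minimal sketch that re-enters the counts printed above by hand; the object names `counts` and `col_props` are illustrative assumptions and are not part of the original analysis, which works from the `Census2` data frame.

```r
# Counts copied from the table above
# (rows: income bracket, columns: marital status).
counts <- matrix(
  c(3980, 8681, 10192, 908,
     463, 6736,   491,  85),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Income = c("<=50K", ">50K"),
    Marital.Status = c("Divorced", "Married", "Never-married", "Widowed")
  )
)

# Share of each marital status group in each income bracket.
# margin = 2 normalizes within columns, so each column sums to 1.
col_props <- prop.table(counts, margin = 2)
print(round(col_props, 3))
```

Under these counts, roughly 44% of married individuals fall in the >50K bracket, compared with about 5% of never-married individuals.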

Barplot

The barplot displays the counts from the table above, with bars grouped by Marital Status and colored by Income bracket.

barplot(table(Census2$Income, Census2$Marital.Status), beside = TRUE, legend.text = TRUE, args.legend = list(x = "topright", bty = "o", title = "Income Bracket"), col = c("firebrick", "green"))

Chi-Squared test

The test results are given below.

pander(chisq.test(table(Census2$Income, Census2$Marital.Status)))
Pearson’s Chi-squared test: table(Census2$Income, Census2$Marital.Status)
Test statistic   df   P value
5945             3    0 * * *

To make sure the requirements of the Chi-Squared test are met, the expected counts must all be greater than 5. As can be seen below, every expected count is well above 5, which means that the Chi-Squared test is appropriate.

Expected Results

pander(chisq.test(table(Census2$Income, Census2$Marital.Status))$expected)
         Divorced   Married   Never-married   Widowed
<=50K        3348     11616            8049     748.2
>50K         1095      3801            2634     244.8
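
The expected counts can also be reproduced by hand, which shows where they come from: each one is the row total times the column total, divided by the grand total. This is a minimal sketch using the observed counts re-entered manually; `counts` and `expected` are illustrative names, not objects from the original analysis.

```r
# Observed counts copied from the table in the Analysis section
# (rows: income bracket, columns: marital status).
counts <- matrix(
  c(3980, 8681, 10192, 908,
     463, 6736,   491,  85),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Income = c("<=50K", ">50K"),
    Marital.Status = c("Divorced", "Married", "Never-married", "Widowed")
  )
)

# Expected count under independence:
# (row total * column total) / grand total.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
print(round(expected, 1))

# Check the requirement discussed above: every expected count exceeds 5.
print(all(expected > 5))
```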

The Chi-Squared statistic of 5945 with 3 degrees of freedom yields a P-value well below the \(\alpha\) of .05; it is so small that it is displayed as 0. As a result, the null hypothesis is rejected: there is sufficient evidence to conclude that Income bracket and Marital Status are associated.
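
As a cross-check on the reported statistic and P-value, the following minimal sketch recomputes both from the observed counts, re-entered by hand. The object names are illustrative assumptions. Because a P-value this small can underflow to 0 in floating point, the sketch also requests it on the log scale through the `log.p` argument of `pchisq`.

```r
# Observed counts copied from the table in the Analysis section
# (rows: income bracket, columns: marital status).
counts <- matrix(
  c(3980, 8681, 10192, 908,
     463, 6736,   491,  85),
  nrow = 2, byrow = TRUE
)

# Expected counts under independence.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson chi-squared statistic and its degrees of freedom.
statistic <- sum((counts - expected)^2 / expected)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Upper-tail probability, on the ordinary scale and on the log scale.
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)
log_p_value <- pchisq(statistic, df = df, lower.tail = FALSE, log.p = TRUE)

print(c(statistic = statistic, df = df))
print(p_value)
print(log_p_value)
```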

Pearson Residuals

The Pearson residuals show how far the observed counts are from the expected counts, scaled by the square root of the expected count, under the assumption that Marital Status and Income bracket are independent (that is, that the null hypothesis is true). The largest residual in absolute value, 47.61, belongs to those who are married and in the greater than 50,000 dollar income bracket, so this cell contributed most to the significant Chi-Squared test.

pander(chisq.test(table(Census2$Income, Census2$Marital.Status))$residuals)
         Divorced   Married   Never-married   Widowed
<=50K       10.93    -27.23           23.88     5.843
>50K       -19.11     47.61          -41.75    -10.21
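
The residuals above can be reproduced directly from their definition, (observed - expected) / sqrt(expected). This is a minimal sketch with the observed counts re-entered by hand; the names `counts`, `expected`, `pearson`, and `largest` are illustrative and not taken from the original analysis.

```r
# Observed counts copied from the table in the Analysis section
# (rows: income bracket, columns: marital status).
counts <- matrix(
  c(3980, 8681, 10192, 908,
     463, 6736,   491,  85),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Income = c("<=50K", ">50K"),
    Marital.Status = c("Divorced", "Married", "Never-married", "Widowed")
  )
)

# Expected counts under independence.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson residual for each cell.
pearson <- (counts - expected) / sqrt(expected)
print(round(pearson, 2))

# Row and column position of the cell with the largest absolute residual.
largest <- arrayInd(which.max(abs(pearson)), dim(pearson))
print(c(rownames(counts)[largest[1]], colnames(counts)[largest[2]]))
```

The squared residuals sum to the Chi-Squared statistic, which is why the cell with the largest absolute residual is the one that contributes most to the test.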

Interpretation

The question for this analysis was “is an individual's Income bracket independent of their Marital Status, or related?” The Chi-Squared test performed in the analysis led to sufficient evidence to conclude the alternative hypothesis, which is that Marital Status and Income bracket are associated. A person's Marital Status is therefore related to their Income bracket, although the test by itself does not show that one causes the other. One possible reason for the association is that a married individual is more likely to already be working in a career field where they earn a yearly salary. Compare this with an adult who has never been married, who is more likely to be in their early adult years and therefore early in their career, earning a lower wage.