Research Questions
1. Do states with higher education levels have a higher median household income?
2. Do states with a larger population have a higher median household income?
3. Do states with a higher median age have a higher median household income?
The histogram shows that median household income is not normally distributed. Because the normality assumption in regression applies to the residuals rather than to the response itself, the distribution of the residuals should be examined with a histogram or Q-Q plot once a model is fitted. The correlation results provide initial insight into the strength of the linear relationships between the quantitative variables and income. Bachelor’s Degree % (r = 0.82) and Median Home Value (r = 0.79) both show strong positive correlations with median household income, indicating that states with higher education levels and higher home values tend to have higher incomes. In contrast, Population for Poverty Status shows a weak positive correlation (r = 0.17), suggesting only a minimal linear relationship with income. Median Age and Unemployment Rate exhibit near-zero correlations (|r| of roughly 0.01 or less), indicating little to no linear relationship.
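The correlation coefficients above are presumably Pearson's r; the report does not name the method, so this is an assumption. As a minimal sketch in Python, the calculation could look like the following. The function name `pearson_r` and the toy values are hypothetical and are not the state data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy values for illustration only (NOT the state data)
bachelors_pct = [25.0, 30.0, 35.0, 40.0, 45.0]
income = [60000, 68000, 71000, 83000, 90000]
print(round(pearson_r(bachelors_pct, income), 3))
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship, which is how the coefficients in this section are read.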
These findings are consistent with the scatterplots, where Population for Poverty Status, Bachelor’s Degree %, and Median Home Value display positive linear trends with income (only a weak one in the case of Population for Poverty Status). There is no evidence of curvilinear relationships in these plots, so higher-order (quadratic) terms are not necessary. For Median Age and Unemployment Rate, the scatterplots show no clear trend, which is consistent with their near-zero correlations with income.
For the categorical variables, the summary statistics and boxplots reveal meaningful differences in income across several groups. Here, mean income refers to the mean of the state-level median household incomes within a group. Region shows clear variation, with the Northeast having the highest mean income (approximately $87,109), followed by the West (approximately $82,514), while the South has the lowest (approximately $69,530). Coastal status also shows a difference, with coastal states having a higher mean income (approximately $82,092) than inland states (approximately $72,618). Political affiliation shows one of the largest gaps, with Democrat-leaning states having a higher mean income (approximately $85,751) than Republican-leaning states (approximately $71,127).
In contrast, Population Category shows relatively small differences in mean income across levels (Small: approximately $75,524; Medium: approximately $75,193; Large: approximately $80,107), suggesting a weaker relationship with the response variable. Additionally, the boxplots show outliers, particularly within the Republican category, so further diagnostic analysis using Cook’s Distance and residual plots may be necessary to assess their influence on the model.
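To illustrate the Cook’s Distance diagnostic mentioned above, here is a minimal Python sketch for the simplest case: a regression with one predictor. The report's final model is not specified here, so the single-predictor setup, the function name `cooks_distance`, and the toy values are all assumptions made for illustration.

```python
def cooks_distance(x, y):
    """Cook's distance for each point in a simple linear regression y ~ x."""
    n = len(x)
    p = 2  # number of fitted coefficients: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)  # residual variance estimate
    distances = []
    for xi, e in zip(x, residuals):
        h = 1.0 / n + (xi - mean_x) ** 2 / sxx  # leverage of this point
        distances.append((e ** 2 / (p * mse)) * h / (1.0 - h) ** 2)
    return distances

# Hypothetical toy values for illustration only (NOT the state data);
# the last point is deliberately far from the trend of the others.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 20.0]
d = cooks_distance(x, y)
print(d.index(max(d)))  # position of the most influential point
```

Points whose distance is much larger than the rest deserve a closer look. One commonly cited rule of thumb flags values above 4/n, although such cutoffs are only heuristics.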
Finally, interaction plots suggest potential interactions between Region and Population, as well as Political Affiliation and Region. This indicates that the effect of one explanatory variable on income may depend on the level of another variable, and therefore these interactions should be formally tested in the regression modeling stage.
The added technique we selected is k-fold cross-validation, which splits the data into k subsets (folds). We will use 5-fold cross-validation, in which the dataset is split into 5 groups. This number of folds was chosen with regard to the size of the dataset, which is relatively small at 50 observations: with 5 folds, each test set contains 10 states, whereas a larger number of folds would leave very small test groups with little variation, making the error estimate from each fold less stable. In each iteration, 4 groups are used to train the model and the remaining group is used to test it. This process is repeated 5 times so that each group serves as the test set exactly once. The purpose of this technique is to obtain a more reliable estimate of how the model will perform on new or unseen data.
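The 5-fold procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis itself: the data are synthetic, the model is a one-predictor least-squares line, the error metric (RMSE) is an assumed choice, and the names `kfold_indices`, `fit_line`, and `cross_val_rmse` are hypothetical.

```python
import math
import random

def kfold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_line(x, y):
    """Least-squares intercept and slope for y ~ x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope

def cross_val_rmse(x, y, k=5, seed=0):
    """Average test-set RMSE over k folds; each fold is the test set once."""
    rmses = []
    for test_idx in kfold_indices(len(x), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        intercept, slope = fit_line([x[i] for i in train_idx],
                                    [y[i] for i in train_idx])
        sq_err = [(y[i] - (intercept + slope * x[i])) ** 2 for i in test_idx]
        rmses.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(rmses) / k

# Hypothetical synthetic data with 50 observations (NOT the state data)
rng = random.Random(42)
x = [rng.uniform(20, 45) for _ in range(50)]
y = [30000 + 1500 * xi + rng.gauss(0, 5000) for xi in x]
print([len(f) for f in kfold_indices(50, k=5)])  # each fold holds 10 of the 50 indices
print(round(cross_val_rmse(x, y), 1))
```

Shuffling before splitting keeps any ordering in the data (for example, alphabetical by state) from determining which observations share a fold, and fixing the seed makes the split reproducible.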
Summary statistics of median household income (in dollars) by group

Region
               Min.  1st Qu.  Median    Mean  3rd Qu.    Max.
Midwest       67769    69404   71622   73307    75104   85086
Northeast     73733    81211   84972   87109    96838   99858
South         54203    60514   67718   69530    74919   98678
West          62268    74942   80160   82514    93421   95521

Coastal Status
               Min.  1st Qu.  Median    Mean  3rd Qu.    Max.
Coastal       54203    73522   82095   82092    94964   99858
Inland        55948    68157   71810   72618    76444   93421

Political Affiliation
               Min.  1st Qu.  Median    Mean  3rd Qu.    Max.
Democrat      62268    80270   85029   85751    94784   99858
Republican    54203    67666   71118   71127    74632   96838

Population Category
               Min.  1st Qu.  Median    Mean  3rd Qu.    Max.
Large         67631    70804   75780   80107    89931   99858
Medium        58229    62194   73032   75193    86731   98678
Small         54203    70804   74590   75524    81361   96838
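Correlation (r) with median household income

Population for Poverty Status     0.1656063
Bachelor’s Degree %               0.8161651
Median Home Value                 0.7937665
Median Age, Unemployment Rate     -0.002710338, 0.01043475 (not labeled individually in the output)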
The Council of State Governments. (2023). State tables: 2023-3-3. Book of the States. https://bookofthestates.org/tables/2023-3-3/
Federal Election Commission. (n.d.). Election results and voting information. https://www.fec.gov/introduction-campaign-finance/election-results-and-voting-information/
U.S. Bureau of Labor Statistics. (2024). Local area unemployment statistics: 2023 annual averages. https://www.bls.gov/lau/lastrk23.htm
U.S. Census Bureau. (2023). Educational attainment (Table S1501). American Community Survey 1-year estimates. https://data.census.gov/table/ACSST1Y2023.S1501
U.S. Census Bureau. (2023). Age and sex (Table S0101). American Community Survey 1-year estimates. https://data.census.gov/table/ACSST1Y2023.S0101
U.S. Census Bureau. (n.d.). Census regions and divisions of the United States [Map]. https://www2.census.gov/geo/pdfs/maps-data/maps/reference/us_regdiv.pdf
U.S. Census Bureau. (2023). Median household income in the past 12 months (Table B19013). American Community Survey 1-year estimates. https://data.census.gov/table/ACSDT1Y2023.B19013
U.S. Census Bureau. (2023). ACS demographic and housing estimates (Table DP05). American Community Survey 1-year estimates. https://data.census.gov/table/ACSDP1Y2023.DP05
U.S. Census Bureau. (2023). Selected housing characteristics (Table DP04). American Community Survey 1-year estimates. https://data.census.gov/table/ACSDP1Y2023.DP04
U.S. Census Bureau. (2023). Poverty status in the past 12 months (Table S1701). American Community Survey 1-year estimates. https://data.census.gov/table/ACSST1Y2023.S1701