Project 3

## Are a higher unemployment rate and a larger number of persons per household correlated with a higher poverty rate for children under 5 years of age?

The data set is “United States Counties” (county_complete), which contains socioeconomic, educational, housing, and employment data gathered by the US Census Bureau, the Bureau of Labor Statistics, and the USDA Economic Research Service. It has 3,142 observations (one per US county) of 188 variables. For this project I will use only three columns, all from 2017: poverty_age_under_5_2017, unemployment_rate_2017, and persons_per_household_2017.

Link to data set: https://www.openintro.org/data/index.php?data=county_complete

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.1     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.3     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(car)

## Loading required package: carData
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## The following object is masked from 'package:purrr':
## 
##     some

CNTY_Data <- read.csv("county_complete.csv")

In the next three chunks I will look at the structure of the data, count the total observations, look at the summary statistics of the columns I’m using, count and remove missing values, and create a data frame containing only the three columns needed to answer the project’s question. This new data frame will then be used to plot histograms showing the distribution of each variable.

Structure and count.

str(CNTY_Data)

## 'data.frame':    3142 obs. of  188 variables:
##  $ fips                                          : int  1001 1003 1005 1007 1009 1011 1013 1015 1017 1019 ...
##  $ state                                         : chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
##  $ name                                          : chr  "Autauga County" "Baldwin County" "Barbour County" "Bibb County" ...
##  $ pop2000                                       : int  43671 140415 29038 20826 51024 11714 21399 112249 36583 23988 ...
##  $ pop2010                                       : int  54571 182265 27457 22915 57322 10914 20947 118572 34215 25989 ...
##  $ pop2011                                       : int  55199 186534 27351 22745 57562 10675 20880 117785 34031 25993 ...
##  $ pop2012                                       : int  54927 190048 27175 22658 57595 10612 20688 117219 34092 25958 ...
##  $ pop2013                                       : int  54695 194736 26947 22503 57623 10549 20372 116482 34122 26014 ...
##  $ pop2014                                       : int  54864 199064 26749 22533 57546 10673 20327 115941 33948 25897 ...
##  $ pop2015                                       : int  54838 202863 26264 22561 57590 10419 20141 115505 33968 25741 ...
##  $ pop2016                                       : int  55278 207509 25774 22633 57562 10441 19965 114980 33717 25766 ...
##  $ pop2017                                       : int  55504 212628 25270 22668 58013 10309 19825 114728 33713 25857 ...
##  $ age_under_5_2010                              : num  6.6 6.1 6.2 6 6.3 6.8 6.5 6.1 5.7 5.3 ...
##  $ age_under_5_2017                              : num  5.7 5.7 5.5 5.7 6.1 5.8 5.9 5.7 6.1 4.5 ...
##  $ age_under_18_2010                             : num  26.8 23 21.9 22.7 24.6 22.3 24.1 22.9 22.5 21.4 ...
##  $ age_over_65_2010                              : num  12 16.8 14.2 12.7 14.7 13.5 16.7 14.3 16.7 17.9 ...
##  $ age_over_65_2017                              : num  14.3 19 17.4 15.1 17.4 15.2 18.5 16.5 18.6 21 ...
##  $ median_age_2017                               : num  37.8 42.6 39.7 39.8 40.9 40.8 40.7 39.1 43 46.1 ...
##  $ female_2010                                   : num  51.3 51.1 46.9 46.3 50.5 45.8 53 51.8 52.2 50.4 ...
##  $ white_2010                                    : num  78.5 85.7 48 75.8 92.6 23 54.4 74.9 58.8 92.7 ...
##  $ black_2010                                    : num  17.7 9.4 46.9 22 1.3 70.2 43.4 20.6 38.7 4.6 ...
##  $ black_2017                                    : num  9.55 4.77 24.02 11.03 0.79 ...
##  $ native_2010                                   : num  0.4 0.7 0.4 0.3 0.5 0.2 0.3 0.5 0.2 0.5 ...
##  $ native_2017                                   : num  0.15 0.41 0.1 0.18 0.18 0.52 0.03 0.18 0.14 0.24 ...
##  $ asian_2010                                    : num  0.9 0.7 0.4 0.1 0.2 0.2 0.8 0.7 0.5 0.2 ...
##  $ asian_2017                                    : num  0.47 0.35 0.31 0 0.07 0.35 0.56 0.5 0.5 0.1 ...
##  $ pac_isl_2010                                  : num  NA NA NA NA NA NA 0 0.1 0 0 ...
##  $ pac_isl_2017                                  : num  0.04 0 0 0 0 0 0 0 0 0 ...
##  $ other_single_race_2017                        : num  0.65 0.39 1.87 0.02 0.37 0.01 0.03 0.63 0.35 0.1 ...
##  $ two_plus_races_2010                           : num  1.6 1.5 0.9 0.9 1.2 0.8 0.8 1.7 1.1 1.5 ...
##  $ two_plus_races_2017                           : num  0.84 0.82 0.41 0.42 0.85 0.33 0.74 1.14 0.49 0.52 ...
##  $ hispanic_2010                                 : num  2.4 4.4 5.1 1.8 8.1 7.1 0.9 3.3 1.6 1.2 ...
##  $ hispanic_2017                                 : num  2.67 4.44 4.21 2.35 9.01 0.33 0.32 3.57 2.15 1.58 ...
##  $ white_not_hispanic_2010                       : num  77.2 83.5 46.8 75 88.9 21.9 54.1 73.6 58.1 92.1 ...
##  $ white_not_hispanic_2017                       : num  75.4 83.1 45.7 74.6 87.4 ...
##  $ speak_english_only_2017                       : num  96.2 94.5 94.3 97.8 92.3 97.2 98.5 95.9 98.6 99 ...
##  $ no_move_in_one_plus_year_2010                 : num  86.3 83 83 90.5 87.2 88.5 92.8 82.9 86.2 88.1 ...
##  $ foreign_born_2010                             : num  2 3.6 2.8 0.7 4.7 1.1 1.1 2.5 0.9 0.5 ...
##  $ foreign_spoken_at_home_2010                   : num  3.7 5.5 4.7 1.5 7.2 3.8 1.6 4.5 1.6 1.4 ...
##  $ women_16_to_50_birth_rate_2017                : num  7.4 5.1 7.2 7.6 5.6 3.5 4.8 5.2 4.3 3.8 ...
##  $ hs_grad_2010                                  : num  85.3 87.6 71.9 74.5 74.7 74.7 74.8 78.5 71.8 73.4 ...
##  $ hs_grad_2016                                  : num  87.6 90 73.8 80.7 80 66.6 81.1 82.4 80.3 81.4 ...
##  $ hs_grad_2017                                  : num  87.7 90.2 73.1 82.1 79.8 71.4 81.1 83.2 80.9 79.5 ...
##  $ some_college_2016                             : num  28.7 31.8 26 26.9 34 22.2 25.1 32.6 28.4 31.4 ...
##  $ some_college_2017                             : num  29.1 31.6 25.5 25 34.4 21.3 24.5 33.2 29.1 28.9 ...
##  $ bachelors_2010                                : num  21.7 26.8 13.5 10 12.5 12 11 16.1 10.8 10.5 ...
##  $ bachelors_2016                                : num  24.6 29.5 12.9 12 13.1 10.3 16.1 17.7 12.5 14 ...
##  $ bachelors_2017                                : num  25 30.7 12 13.2 13.1 13.4 16.1 17.9 13.3 12.5 ...
##  $ veterans_2010                                 : int  5817 20396 2327 1883 4072 943 1675 11757 2893 2172 ...
##  $ veterans_2017                                 : num  12.6 11.9 8 7.4 9.6 4.5 8.4 10.9 9.2 11.3 ...
##  $ mean_work_travel_2010                         : num  25.1 25.8 23.8 28.3 33.2 28.1 25.1 22.1 23.6 26.2 ...
##  $ mean_work_travel_2017                         : num  25.8 27 23.4 30 35 29.8 23.2 24.8 23.6 26.5 ...
##  $ broadband_2017                                : num  76.6 74.5 57.2 62 65.8 49.4 58.2 71 62.8 67.5 ...
##  $ computer_2017                                 : num  86.2 86.9 73.4 74.8 78.2 64.2 68.3 82.9 72.7 79.4 ...
##  $ housing_units_2010                            : int  22135 104061 11829 8981 23887 4493 9964 53289 17004 16267 ...
##  $ homeownership_2010                            : num  77.5 76.7 68 82.9 82 76.9 69 70.7 71.4 77.5 ...
##  $ housing_multi_unit_2010                       : num  7.2 22.6 11.1 6.6 3.7 9.9 13.7 14.3 8.7 4.3 ...
##  $ median_val_owner_occupied_2010                : int  133900 177200 88200 81200 113700 66300 70200 98200 82200 97100 ...
##  $ households_2010                               : int  19718 69476 9795 7441 20605 3732 8019 46421 13681 11352 ...
##  $ households_2017                               : int  21054 76133 9191 6916 20690 3670 7050 45099 13694 10795 ...
##  $ persons_per_household_2010                    : num  2.7 2.5 2.52 3.02 2.73 2.85 2.58 2.46 2.51 2.22 ...
##  $ persons_per_household_2017                    : num  2.59 2.63 2.54 2.97 2.76 2.74 2.81 2.49 2.44 2.37 ...
##  $ per_capita_income_2010                        : int  24568 26469 15875 19918 21070 20289 16916 20574 16626 21322 ...
##  $ per_capita_income_2017                        : num  27842 27780 17892 20572 21367 ...
##  $ metro_2013                                    : int  1 1 0 1 1 0 0 1 0 0 ...
##  $ median_household_income_2010                  : int  53255 50147 33219 41770 45549 31602 30659 38407 31467 40690 ...
##  $ median_household_income_2016                  : int  54487 56460 32884 43079 47213 34278 35409 41778 39530 41456 ...
##  $ median_household_income_2017                  : int  55317 52562 33368 43404 47412 29655 36326 43686 37342 40041 ...
##  $ private_nonfarm_establishments_2009           : int  877 4812 522 318 749 120 446 2444 568 350 ...
##  $ private_nonfarm_employment_2009               : int  10628 52233 7990 2927 6968 1919 5400 38324 6241 3600 ...
##  $ percent_change_private_nonfarm_employment_2009: num  16.6 17.4 -27 -14 -11.4 -18.5 2.1 -5.6 -45.8 5.4 ...
##  $ nonemployment_establishments_2009             : int  2971 14175 1527 1192 3501 390 1180 6329 2074 1627 ...
##  $ firms_2007                                    : int  4067 19035 1667 1385 4458 417 1769 8713 1981 2180 ...
##  $ black_owned_firms_2007                        : num  15.2 2.7 NA 14.9 NA NA NA 7.2 NA NA ...
##  $ native_owned_firms_2007                       : num  NA 0.4 NA NA NA NA NA NA NA NA ...
##  $ asian_owned_firms_2007                        : num  1.3 1 NA NA NA NA 3.3 1.6 NA NA ...
##  $ pac_isl_owned_firms_2007                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ hispanic_owned_firms_2007                     : num  0.7 1.3 NA NA NA NA NA 0.5 NA NA ...
##  $ women_owned_firms_2007                        : num  31.7 27.3 27 NA 23.2 38.8 NA 24.7 29.3 14.5 ...
##  $ manufacturer_shipments_2007                   : int  NA 1410273 NA 0 341544 NA 399132 2679991 667283 307439 ...
##  $ mercent_whole_sales_2007                      : int  NA NA NA NA NA NA 56712 NA NA 62293 ...
##  $ sales_2007                                    : int  598175 2966489 188337 124707 319700 43810 229277 1542981 264650 186321 ...
##  $ sales_per_capita_2007                         : int  12003 17166 6334 5804 5622 3995 11326 13678 7620 7613 ...
##  $ accommodation_food_service_2007               : int  88157 436955 NA 10757 20941 3670 28427 186533 23237 13948 ...
##  $ building_permits_2010                         : int  191 696 10 8 18 1 3 107 10 6 ...
##  $ fed_spending_2009                             : int  331142 1119082 240308 163201 294114 108846 195055 1830659 294718 184642 ...
##  $ area_2010                                     : num  594 1590 885 623 645 ...
##  $ density_2010                                  : num  91.8 114.6 31 36.8 88.9 ...
##  $ smoking_ban_2010                              : chr  "none" "none" "partial" "none" ...
##  $ poverty_2010                                  : num  10.6 12.2 25 12.6 13.4 25.3 25 19.5 20.3 17.6 ...
##  $ poverty_2016                                  : num  13.5 11.7 29.9 20.1 14.1 32.6 24.8 17.1 19.9 16.8 ...
##  $ poverty_2017                                  : num  13.7 11.8 27.2 15.2 15.6 28.5 24.4 18.6 18.8 16.1 ...
##  $ poverty_age_under_5_2017                      : num  17.2 19.4 56.8 21.6 29.5 59.7 30.1 31.1 31.9 12.8 ...
##  $ poverty_age_under_18_2017                     : num  20 15.9 44.9 25.9 25.3 50.2 34.8 26.3 28.9 20.1 ...
##  $ civilian_labor_force_2007                     : int  24383 82659 10334 8791 26629 3653 9099 54861 15474 11984 ...
##  $ employed_2007                                 : int  23577 80099 9684 8432 25780 3308 8539 52709 14469 11484 ...
##  $ unemployed_2007                               : int  806 2560 650 359 849 345 560 2152 1005 500 ...
##  $ unemployment_rate_2007                        : num  3.31 3.1 6.29 4.08 3.19 9.44 6.15 3.92 6.49 4.17 ...
##  $ civilian_labor_force_2008                     : int  24687 83223 10161 8749 26698 3634 9051 54564 15012 11996 ...
##   [list output truncated]

count(CNTY_Data)

##      n
## 1 3142

Summary statistics and missing values.

summary(CNTY_Data$persons_per_household_2017)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.400   2.360   2.480   2.517   2.630   4.130       2

summary(CNTY_Data$unemployment_rate_2017)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.620   3.520   4.360   4.611   5.355  19.070       3

summary(CNTY_Data$poverty_age_under_5_2017)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   16.50   23.80   25.17   32.30   90.50       4

New data frame and histograms.

CNTY_Analysis <- CNTY_Data |>
  select(persons_per_household_2017, unemployment_rate_2017, poverty_age_under_5_2017) |>
  filter(!is.na(persons_per_household_2017), 
         !is.na(unemployment_rate_2017), 
         !is.na(poverty_age_under_5_2017)) 

summary(CNTY_Analysis)

##  persons_per_household_2017 unemployment_rate_2017 poverty_age_under_5_2017
##  Min.   :1.830              Min.   : 1.620         Min.   : 0.00           
##  1st Qu.:2.360              1st Qu.: 3.520         1st Qu.:16.50           
##  Median :2.480              Median : 4.360         Median :23.80           
##  Mean   :2.518              Mean   : 4.611         Mean   :25.17           
##  3rd Qu.:2.630              3rd Qu.: 5.357         3rd Qu.:32.30           
##  Max.   :4.130              Max.   :19.070         Max.   :90.50
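As a hedged illustration of the missing-value step, the sketch below uses a small made-up data frame (the values are hypothetical, not taken from county_complete.csv) to count missing values per column and to show that tidyr's drop_na() gives the same rows as chaining filter(!is.na(...)) over every column.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical values) using the same column names as the report
toy <- data.frame(
  persons_per_household_2017 = c(2.5, NA, 2.7, 3.0),
  unemployment_rate_2017     = c(4.1, 5.0, NA, 6.2),
  poverty_age_under_5_2017   = c(20.3, 31.0, 18.4, 40.2)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
na_counts

# Keep only rows with no missing value in any column
toy_complete <- toy |> drop_na()

# Same rows as the explicit filter used in the report
toy_filtered <- toy |>
  filter(!is.na(persons_per_household_2017),
         !is.na(unemployment_rate_2017),
         !is.na(poverty_age_under_5_2017))

# Rows removed because at least one value was missing
rows_dropped <- nrow(toy) - nrow(toy_complete)
rows_dropped
```

Running the same nrow() comparison on CNTY_Data and CNTY_Analysis would show how many counties were removed before plotting.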

ggplot(CNTY_Analysis, aes(x = persons_per_household_2017)) +
  geom_histogram(binwidth = 0.25, fill = "#1f77b4", color = "black") +
  labs(title = "Persons per household", x = "Persons per household", y = "Number of counties") +
  theme_minimal()

ggplot(CNTY_Analysis, aes(x = unemployment_rate_2017)) +
  geom_histogram(binwidth = 1, fill = "#1f77b4", color = "black") +
  labs(title = "Unemployment rate", x = "Unemployment rate", y = "Number of counties") +
  theme_minimal()

ggplot(CNTY_Analysis, aes(x = poverty_age_under_5_2017)) +
  geom_histogram(binwidth = 1, fill = "#1f77b4", color = "black") +
  labs(title = "Poverty rate for children under 5", x = "Percentage", y = "Number of counties") +
  theme_minimal()

Multiple Linear Regression

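I will use a multiple linear regression with poverty_age_under_5_2017 as the response and persons_per_household_2017 and unemployment_rate_2017 as the predictors. The model is fit on CNTY_Data, but lm() drops the rows with missing values, so it uses the same complete rows as CNTY_Analysis.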

CNTY_MLG2 <- lm(poverty_age_under_5_2017 ~ persons_per_household_2017 + unemployment_rate_2017, data = CNTY_Data)

summary(CNTY_MLG2)

## 
## Call:
## lm(formula = poverty_age_under_5_2017 ~ persons_per_household_2017 + 
##     unemployment_rate_2017, data = CNTY_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -46.595  -7.367  -1.060   5.936  61.283 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  7.7759     1.9439   4.000 6.48e-05 ***
## persons_per_household_2017   0.3627     0.7727   0.469    0.639    
## unemployment_rate_2017       3.5738     0.1231  29.028  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.21 on 3135 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.2175, Adjusted R-squared:  0.217 
## F-statistic: 435.8 on 2 and 3135 DF,  p-value: < 2.2e-16

Coefficients: Intercept (7.7759), Persons per household (0.3627), Unemployment rate (3.5738). Residual standard error: 11.21. P-values: Persons per household (0.639), Unemployment rate (<2e-16). R-squared: 0.2175.
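To give a sense of the uncertainty around the unemployment coefficient, here is a minimal sketch that builds an approximate 95% confidence interval by hand from the estimate, standard error, and residual degrees of freedom printed in the regression output. The variable names are illustrative only; calling confint(CNTY_MLG2) on the fitted model should give the same interval up to rounding.

```r
# Values copied from the summary(CNTY_MLG2) output
est <- 3.5738   # unemployment_rate_2017 estimate
se  <- 0.1231   # its standard error
df  <- 3135     # residual degrees of freedom

# Two-sided 95% critical value from the t distribution
t_crit <- qt(0.975, df)

# Interval: estimate plus or minus critical value times standard error
ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)
round(ci, 2)
```

The interval comes out at roughly 3.33 to 3.82, well away from zero, which is consistent with the very small p-value.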

These results indicate that, holding the other predictor constant, one extra person per household is associated with a 0.36 percentage point higher poverty rate for children under five, and a one percentage point increase in a county’s unemployment rate is associated with a 3.57 percentage point higher poverty rate. However, only the unemployment rate has a statistically significant p-value, so according to this model it is the only one of the two predictors that is significant in predicting poverty for children under 5. The R-squared value (0.2175) indicates that the model explains 21.75% of the variability in county poverty rates for children under 5.
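The fitted equation can be made concrete with a small sketch. The function below is a hypothetical helper (not part of the original analysis) that plugs the reported coefficients into the regression equation; the input values of 2.5 persons per household and 5% unemployment are chosen only for illustration.

```r
# Coefficients copied from the summary(CNTY_MLG2) output
b0      <- 7.7759   # intercept
b_pph   <- 0.3627   # persons_per_household_2017
b_unemp <- 3.5738   # unemployment_rate_2017

# Hypothetical helper: predicted under-5 poverty rate (in percent)
predict_poverty <- function(pph, unemp) {
  b0 + b_pph * pph + b_unemp * unemp
}

# Illustrative county: 2.5 persons per household, 5% unemployment
p5 <- predict_poverty(2.5, 5)
p5

# Raising unemployment by one point, household size held constant,
# changes the prediction by exactly the unemployment coefficient
p6 <- predict_poverty(2.5, 6)
p6 - p5
```

For this illustrative county the prediction is about 26.55%, and the one-point difference equals 3.5738, which is what the coefficient interpretation describes. On the fitted model itself, predict(CNTY_MLG2, newdata = ...) would do the same calculation with unrounded coefficients.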

## Assumptions and Diagnostics

Linearity

Linearity is not satisfied: the smoothed lines in the component-plus-residual plots are not straight, and the residuals are unevenly distributed around them, with most clustering on the left side of each plot.

crPlots(CNTY_MLG2)

Independence

Independence is satisfied: the residuals are spread evenly around the zero line, and there are no noticeable patterns or clusters.

plot(resid(CNTY_MLG2), type="b",
     main="Residuals vs Order", ylab="Residuals"); abline(h=0, lty=2)
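A numeric check can complement the visual one. The sketch below is an assumption about how this could be done, not part of the original analysis: it runs the Durbin-Watson test from the car package on simulated stand-in data with independent errors. In the report the same call would be applied to CNTY_MLG2. The rows of the county data appear to be ordered by FIPS code, so the test would only detect dependence between counties that sit next to each other in the file.

```r
library(car)

set.seed(123)

# Simulated stand-in data (not the county data) with independent errors
x <- runif(200, 0, 10)
y <- 1 + 2 * x + rnorm(200)
fit <- lm(y ~ x)

# Durbin-Watson test for first-order autocorrelation in the residuals;
# a statistic near 2 suggests no autocorrelation
dw <- durbinWatsonTest(fit)
dw
```

A statistic well below 2 would point to positive autocorrelation between neighboring rows, and a statistic well above 2 to negative autocorrelation.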

Homoscedasticity and normality of residuals

Homoscedasticity is not satisfied: the Residuals vs Fitted and Scale-Location plots both show an uneven spread of residuals, with clusters on the left side. The smoothed trend line in the Scale-Location plot is also not horizontal.

Normality is approximately satisfied: the Q-Q plot shows only slight tails at both ends. The Residuals vs Leverage plot shows a few outliers, but none appear to be highly influential.

par(mfrow=c(2,2)); plot(CNTY_MLG2); par(mfrow=c(1,1))
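The visual reading of the Scale-Location plot could be backed up with a formal test. As a hedged sketch, the example below uses ncvTest() from the car package (a score test for non-constant error variance) on simulated data whose noise grows with the predictor. The data are made up for illustration; in the report the call would be ncvTest(CNTY_MLG2).

```r
library(car)

set.seed(42)

# Simulated stand-in data (not the county data):
# the error standard deviation grows with x, so variance is not constant
x <- runif(500, 1, 10)
y <- 2 + 3 * x + rnorm(500, sd = x)
fit <- lm(y ~ x)

# Score test for non-constant variance;
# a small p-value is evidence against homoscedasticity
nc <- ncvTest(fit)
nc
```

If the test on the county model also returns a small p-value, one common follow-up is to report heteroscedasticity-robust standard errors or to transform the response.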

Multicollinearity

There is no problematic multicollinearity between the predictors: the correlation matrix shows a correlation of 0.17 between them, which is very low and indicates only a weak relationship.

cor(CNTY_Analysis[, c("persons_per_household_2017", "unemployment_rate_2017")], use = "complete.obs")

##                            persons_per_household_2017 unemployment_rate_2017
## persons_per_household_2017                  1.0000000              0.1668269
## unemployment_rate_2017                      0.1668269              1.0000000
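With only two predictors, the variance inflation factor (VIF) follows directly from their correlation as 1 / (1 - r^2). The sketch below works this out from the correlation printed above; the variable names are illustrative, and car::vif(CNTY_MLG2) should report the same value for both predictors.

```r
# Correlation between the two predictors, copied from the matrix above
r <- 0.1668269

# With exactly two predictors, both share the same VIF: 1 / (1 - r^2)
vif_two_predictors <- 1 / (1 - r^2)
round(vif_two_predictors, 2)
```

The result is about 1.03, far below the thresholds of 5 or 10 that are often used as rules of thumb for concern.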

Conclusions

The main takeaway from this analysis is that, of the two predictors (number of persons per household and unemployment rate), only the unemployment rate has a statistically significant association with the poverty rate for children under five years of age: the poverty rate is 3.57 percentage points higher for every one percentage point increase in a county’s unemployment rate. The main limitation of this model is its low predictive power; the two predictors together account for only 21.75% of the variability in the poverty rate. The linearity and homoscedasticity assumptions were also not satisfied, so the estimates should be interpreted with caution. The next step in researching the factors that affect the poverty rate for children under 5 would be to remove persons per household and add different predictors to try to raise the predictive power of the model.

Mark Beavers

2026-05-01