This data brief presents a poverty analysis of Texans based on the data available as of March 2025. The data cleaned here was published in this ArcGIS StoryMap. Data was mainly sourced from the 2023 ACS 5-Year Estimates, the latest release as of March 2025.
Some of the data was saved locally; the rest was downloaded via tidycensus to build maps by county, region, and district later on.
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(janitor)
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
library(stringr)
library(ipumsr)
library(survey)
Loading required package: grid
Loading required package: Matrix
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
Loading required package: survival
Attaching package: 'survey'
The following object is masked from 'package:graphics':
dotchart
library(tidycensus)
# pulling from ACS data with tidycensus
acs23d <- load_variables(year = 2023, dataset = "acs5")
Loading Data
# downloaded local ACS data
income_quintiles_texas <- read.csv("raw_data/ACSDT1Y2023.B19081-2024-11-20T151736.csv") |>
  mutate(Label..Grouping. = sub(" ", "", Label..Grouping.))
poverty_sex_age <- read.csv("raw_data/ACSDT5Y2022.B17001-2024-11-20T174431.csv")
poverty_past12 <- read.csv("raw_data/ACSST5Y2022.S1701-2024-11-15T034529.csv")

# population estimates by county (from Texas Demographic Center) as well as
# county region classification from Texas Comptroller Office
regions <- read.csv("County_12_Regions.csv")
population_est_overall <- read.csv("2023_txpopest_county/2023_txpopest_county.csv")

# sourced from Texas Demographic Center 2023 population estimates (all ages and demographic groups)
population_asg <- read.csv("2023_txpopest_county/alldata.csv")

# location data for mapping
location_fips <- read.csv("county_location_state.csv")
Because county names are spelled inconsistently across the datasets, this function was written to normalize them to the correct spelling of each Texas county.
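As a rough illustration of this kind of helper, the sketch below normalizes a few name variants in base R. `normalize_county_name()` and its lookup table are hypothetical stand-ins rather than the `fix_names` function used in this brief, and the variants listed are assumed examples of how spellings can differ between files.

```r
# Hypothetical lookup: variant spelling -> canonical Texas county name.
county_name_fixes <- c(
  "Dewitt"    = "DeWitt",
  "Mcculloch" = "McCulloch",
  "Mclennan"  = "McLennan",
  "Mcmullen"  = "McMullen",
  "La_salle"  = "La Salle"
)

# Trim whitespace, drop a trailing " County", then apply the lookup if it matches.
normalize_county_name <- function(name) {
  cleaned <- sub(" County$", "", trimws(name))
  if (cleaned %in% names(county_name_fixes)) {
    return(unname(county_name_fixes[cleaned]))
  }
  cleaned
}

vapply(
  c("Dewitt", " Mclennan County", "Travis"),
  normalize_county_name,
  character(1),
  USE.NAMES = FALSE
)
```

Names that are not in the lookup pass through unchanged, so the helper can be applied to a whole column without touching counties that are already spelled correctly.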
Cleaning the data on children below the poverty line: checking that every county is accounted for, then joining to the regions dataset for the region-level aggregation maps.
missing_regions <- anti_join(regions, children_below_poverty_line_cleaned, by = "County")
poverty_with_regions <- left_join(children_below_poverty_line_cleaned, regions, by = "County")
head(missing_regions)
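The same completeness check can be sketched in base R on a tiny example. The data frames below are illustrative only: the region labels are placeholders, and Andrews is left out of the poverty table on purpose so that the check has something to report.

```r
# Placeholder region assignments (not the Comptroller's actual regions).
regions_demo <- data.frame(
  County = c("Anderson", "Andrews", "Angelina"),
  Region = c("Region A", "Region A", "Region B")
)

# Poverty table with one county deliberately missing.
poverty_demo <- data.frame(
  County   = c("Anderson", "Angelina"),
  estimate = c(23.5, 22.5)
)

# Counties present in the regions file but absent from the poverty table.
missing_counties <- setdiff(regions_demo$County, poverty_demo$County)

# Left join: keep every poverty row and attach its region.
joined_demo <- merge(poverty_demo, regions_demo, by = "County", all.x = TRUE)

missing_counties
joined_demo
```

An empty `missing_counties` would mean every county in the regions file has a matching poverty row, which is the condition the anti-join is meant to confirm.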
Warning: There were 255 warnings in `mutate()`.
The first warning was:
ℹ In argument: `across(-Label..Grouping., ~as.numeric(gsub("%", "", .)))`.
Caused by warning:
! NAs introduced by coercion
ℹ Run `dplyr::last_dplyr_warnings()` to see the 254 remaining warnings.
final_pivot_children_past12 <- left_join(location_fips, pivot_children_past12, by = "County") |>
  mutate(County = paste0(County, " County"))
head(final_pivot_children_past12)
Label County estimate
1 48001 Anderson County 23.5
2 48003 Andrews County 13.2
3 48005 Angelina County 22.5
4 48007 Aransas County 29.9
5 48009 Archer County 20.6
6 48011 Armstrong County 5.0
Cleaning and aggregating poverty by age groups for all of Texas
clean_poverty_age <- clean_past12 |>
  filter(Label..Grouping. %in% c("Under 18 years", "18 to 34 years", "35 to 64 years", "65 years and over")) |>
  mutate(across(-Label..Grouping., ~ as.numeric(gsub("%", "", .)))) |>
  select(where(~ !any(is.na(.)))) |>
  clean_names() |>
  select(label_grouping, texas_percent_below_poverty_level_estimate)
Warning: There were 255 warnings in `mutate()`.
The first warning was:
ℹ In argument: `across(-Label..Grouping., ~as.numeric(gsub("%", "", .)))`.
Caused by warning:
! NAs introduced by coercion
ℹ Run `dplyr::last_dplyr_warnings()` to see the 254 remaining warnings.
head(clean_poverty_age)
label_grouping texas_percent_below_poverty_level_estimate
1 Under 18 years 19.3
2 18 to 34 years 15.4
3 35 to 64 years 10.2
4 65 years and over 11.4
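The "NAs introduced by coercion" warnings appear whenever `as.numeric()` meets a cell that is still not a number after the "%" is stripped. A minimal sketch of a converter that makes this explicit is shown below; `parse_percent()` is a hypothetical helper, and "(X)" is used only as an example of a non-numeric cell.

```r
# Convert strings such as "19.3%" or "1,234" to numeric.
# Anything that is not a plain number becomes NA without raising a warning.
parse_percent <- function(x) {
  stripped <- gsub("[%,]", "", trimws(x))
  is_number <- grepl("^-?[0-9]+(\\.[0-9]+)?$", stripped)
  out <- rep(NA_real_, length(x))
  out[is_number] <- as.numeric(stripped[is_number])
  out
}

parse_percent(c("19.3%", "15.4%", "(X)", "1,234"))
```

Because the non-numeric cells are set to NA deliberately, the later step that drops columns containing NA behaves the same way, only without the long list of warnings.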
Cleaning and aggregating by Race and Ethnicity groups for Texas.
clean_past12_re <- poverty_past12 |>
  select(Label..Grouping., contains("Estimate")) |>
  mutate(Label..Grouping. = str_trim(Label..Grouping., side = "left"))

race_ethnicity_past12 <- clean_past12_re |>
  filter(Label..Grouping. %in% c(
    "Black or African American alone",
    "American Indian and Alaska Native alone",
    "Asian alone",
    "Native Hawaiian and Other Pacific Islander alone",
    "Some other race alone",
    "Two or more races",
    "Hispanic or Latino origin (of any race)",
    "White alone, not Hispanic or Latino"
  ))

pivot_race_ethnicity_past12 <- race_ethnicity_past12 |>
  mutate(across(-Label..Grouping., ~ as.numeric(gsub(",", "", gsub("%", "", .))))) |>
  clean_names() |>
  select(label_grouping, matches("texas_below_poverty_level_estimate|texas_total_estimate")) |>
  pivot_longer(cols = -label_grouping, names_to = "County", values_to = "estimate") |>
  mutate(
    Estimate_Type = case_when(
      grepl("_total_estimate$", County) ~ "total_estimate",
      grepl("_below_poverty_level_estimate$", County) ~ "below_poverty_level_estimate"
    ),
    County = sapply(
      sub("_", "", str_to_title(sub("_county_texas_.*", "", County))),
      fix_names
    )
  ) |>
  pivot_wider(names_from = Estimate_Type, values_from = estimate) |>
  rename(race_ethnicity = label_grouping) |>
  mutate(
    race_ethnicity = case_when(
      race_ethnicity %in% c("Asian alone", "Native Hawaiian and Other Pacific Islander alone") ~ "Asian and Pacific Islander",
      TRUE ~ race_ethnicity
    )
  ) |>
  filter(County == "Texastotal_estimate" | County == "Texasbelow_poverty_level_estimate")
Warning: There were 151 warnings in `mutate()`.
The first warning was:
ℹ In argument: `across(...)`.
Caused by warning:
! NAs introduced by coercion
ℹ Run `dplyr::last_dplyr_warnings()` to see the 150 remaining warnings.
# dataset with tiled data
pivot_race_ethnicity_past12
# A tibble: 16 × 4
race_ethnicity County total_estimate below_poverty_level_…¹
<chr> <chr> <dbl> <dbl>
1 Black or African American alone Texas… 3412351 NA
2 Black or African American alone Texas… NA 642879
3 American Indian and Alaska Nati… Texas… 164958 NA
4 American Indian and Alaska Nati… Texas… NA 24249
5 Asian and Pacific Islander Texas… 1490702 NA
6 Asian and Pacific Islander Texas… NA 137648
7 Asian and Pacific Islander Texas… 26391 NA
8 Asian and Pacific Islander Texas… NA 4817
9 Some other race alone Texas… 2248629 NA
10 Some other race alone Texas… NA 447841
11 Two or more races Texas… 4324720 NA
12 Two or more races Texas… NA 726066
13 Hispanic or Latino origin (of a… Texas… 11472875 NA
14 Hispanic or Latino origin (of a… Texas… NA 2172978
15 White alone, not Hispanic or La… Texas… 11480912 NA
16 White alone, not Hispanic or La… Texas… NA 953016
# ℹ abbreviated name: ¹below_poverty_level_estimate
# went in and fixed in Excel
final_re_poverty <- read.csv("cleaned_data/asian_aggregated_re_poverty.csv") |>
  mutate(percent = below_poverty_level_estimate / total_estimate)
head(final_re_poverty)
race_ethnicity County total_estimate
1 Black or African American alone Texastotal_estimate 3412351
2 American Indian and Alaska Native alone Texastotal_estimate 164958
3 Asian and Pacific Islander Texastotal_estimate 1517093
4 Some other race alone Texastotal_estimate 2248629
5 Two or more races Texastotal_estimate 4324720
6 Hispanic or Latino origin (of any race) Texastotal_estimate 11472875
below_poverty_level_estimate percent
1 642879 0.18839768
2 24249 0.14700105
3 142465 0.09390657
4 447841 0.19916180
5 726066 0.16788740
6 2172978 0.18940135
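The manual spreadsheet step could in principle be reproduced in R by summing the split rows within each group. The sketch below is one possible way to do that in base R, not the procedure actually used; it takes the Black or African American and the Asian and Pacific Islander rows from the printed tibble as input.

```r
# Rows copied from the printed tibble: each estimate sits on its own row, with NA in the other column.
tiled <- data.frame(
  race_ethnicity = c(
    rep("Black or African American alone", 2),
    rep("Asian and Pacific Islander", 4)
  ),
  total_estimate               = c(3412351, NA, 1490702, NA, 26391, NA),
  below_poverty_level_estimate = c(NA, 642879, NA, 137648, NA, 4817)
)

# Sum within each group, ignoring the NA placeholders.
collapsed <- aggregate(
  tiled[, c("total_estimate", "below_poverty_level_estimate")],
  by = list(race_ethnicity = tiled$race_ethnicity),
  FUN = sum,
  na.rm = TRUE
)

collapsed$percent <- collapsed$below_poverty_level_estimate / collapsed$total_estimate
collapsed
```

For the Asian and Pacific Islander group this gives a total of 1517093 and 142465 below the poverty level, matching the values in the cleaned file.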
IPUMS microdata was used to examine the intersection of English-speaking ability and poverty level within the state of Texas. IPUMS describes its data this way: “the Integrated Public Use Microdata Series consists of a series of compatible-format individual-level representative samples of the United States.”
These microdata samples allow researchers to analyze demographic, social, and economic characteristics of populations over time and across different geographic areas. IPUMS provides access to microdata from both U.S. censuses and surveys, as well as international sources.
Use of data from IPUMS USA is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
Because this is survey data, it needs to be wrapped in a survey design so that the person weights are applied and the estimates scale to the population the sample represents.
# below poverty line
survey_design_below_pl <- svydesign(id = ~CLUSTER, weights = ~PERWT, data = below_poverty_threshold)
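To see what the weights do, the sketch below computes weighted totals by hand in base R on three made-up records. Each PERWT value says how many people in the population a sampled person represents, so the point estimate of a weighted total over a column of ones is simply the sum of the weights in each group. The toy values are invented for illustration, and the survey package additionally uses the cluster information for variance estimates, which this sketch does not attempt.

```r
# Made-up microdata: three sampled people with person weights.
toy_microdata <- data.frame(
  SPEAKENG_LABEL = c(
    "Yes, speaks only English",
    "Yes, speaks only English",
    "Does not speak English"
  ),
  PERWT       = c(120, 80, 50),
  count_dummy = 1
)

# Weighted count per group: sum of (weight * 1) within each label.
weighted_totals <- tapply(
  toy_microdata$PERWT * toy_microdata$count_dummy,
  toy_microdata$SPEAKENG_LABEL,
  sum
)

weighted_totals
```

Here two sampled English-only speakers stand in for 200 people, while an unweighted count would report only 2.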
Since English-speaking ability is coded as a number, a dummy count column (count_dummy) is totaled within each speaking-ability group, and those weighted totals are used to calculate the percentage below and above the poverty level.
# poverty
survey_design_below_pl <- svydesign(id = ~CLUSTER, weights = ~PERWT, data = below_poverty_threshold)
# all total counts
survey_design_all_speakeng_groups <- svydesign(id = ~CLUSTER, weights = ~PERWT, data = cleaned_ipums)
# below poverty line
total_speakeng_bfpl <- svyby(~count_dummy, ~SPEAKENG_LABEL, survey_design_below_pl, svytotal, vartype = "ci") |>
  rename(total_below_fpl = count_dummy) |>
  select(SPEAKENG_LABEL, total_below_fpl)
total_speakeng_bfpl
SPEAKENG_LABEL total_below_fpl
Does not speak English Does not speak English 389586
Yes, but not well Yes, but not well 384473
Yes, speaks only English Yes, speaks only English 2733474
Yes, speaks very well Yes, speaks very well 1072719
Yes, speaks well Yes, speaks well 410834
SPEAKENG_LABEL total_speakeng_est
Does not speak English Does not speak English 1372715
Yes, but not well Yes, but not well 2075095
Yes, speaks only English Yes, speaks only English 28890673
Yes, speaks very well Yes, speaks very well 9972294
Yes, speaks well Yes, speaks well 2953985
SPEAKENG_LABEL total_below_fpl total_speakeng_est percent_poverty
1 Does not speak English 389586 1372715 0.28380691
2 Yes, but not well 384473 2075095 0.18527971
3 Yes, speaks only English 2733474 28890673 0.09461441
4 Yes, speaks very well 1072719 9972294 0.10756993
5 Yes, speaks well 410834 2953985 0.13907789
percent_not_in_poverty
1 0.7161931
2 0.8147203
3 0.9053856
4 0.8924301
5 0.8609221
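The combined table follows from the two sets of totals by a join and a division. A minimal base R sketch, using the totals printed above as input:

```r
labels <- c(
  "Does not speak English",
  "Yes, but not well",
  "Yes, speaks only English",
  "Yes, speaks very well",
  "Yes, speaks well"
)

# Weighted totals copied from the two tables printed above.
below_fpl <- data.frame(
  SPEAKENG_LABEL  = labels,
  total_below_fpl = c(389586, 384473, 2733474, 1072719, 410834)
)
all_groups <- data.frame(
  SPEAKENG_LABEL     = labels,
  total_speakeng_est = c(1372715, 2075095, 28890673, 9972294, 2953985)
)

# Join on the label, then divide to get the share in each group below the poverty level.
speakeng_poverty <- merge(below_fpl, all_groups, by = "SPEAKENG_LABEL")
speakeng_poverty$percent_poverty <-
  speakeng_poverty$total_below_fpl / speakeng_poverty$total_speakeng_est
speakeng_poverty$percent_not_in_poverty <- 1 - speakeng_poverty$percent_poverty

speakeng_poverty
```

For example, 389586 of the 1372715 people who do not speak English fall below the poverty level, about 28.4%, compared with about 9.5% of those who speak only English.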
Warning: • You have not set a Census API key. Users without a key are limited to 500
queries per day and may experience performance limitations.
ℹ For best results, get a Census API key at
http://api.census.gov/data/key_signup.html and then supply the key to the
`census_api_key()` function to use it throughout your tidycensus session.
This warning is displayed once per session.
This function was written to clean the poverty data, since the same steps repeat for each poverty-level group.
clean_poverty_data <- function(raw_dataset) {
  cleaned_poverty_dataset <- raw_dataset |>
    rename(Race_Group = Recoded.detailed.race.code) |>
    # collapse detailed race codes into broader groups
    mutate(
      Race_Group = case_when(
        Race_Group %in% c(
          "American Indian alone",
          "Alaska Native alone",
          "American Indian and Alaska Native tribes specified; or American Indian or Alaska Native, not specified and no other races"
        ) ~ "American Indian and Alaska Native",
        Race_Group %in% c("Asian alone", "Native Hawaiian and Other Pacific Islander alone") ~ "Asian and Pacific Islander",
        TRUE ~ Race_Group
      )
    ) |>
    group_by(Race_Group) |>
    # strip thousands separators before converting counts to numeric
    mutate(
      Total.Recode.for.Recoded.detailed.Hispanic.origin..HISP_RC4. = as.numeric(gsub(",", "", Total.Recode.for.Recoded.detailed.Hispanic.origin..HISP_RC4.)),
      Hispanic = as.numeric(gsub(",", "", Hispanic)),
      Non.Hispanic = as.numeric(gsub(",", "", Non.Hispanic)),
      Total = as.numeric(gsub(",", "", Total))
    ) |>
    summarise(
      Hispanic = sum(Hispanic, na.rm = TRUE),
      Non_Hispanic = sum(Non.Hispanic, na.rm = TRUE),
      Total.Recode.for.Recoded.detailed.Hispanic.origin..HISP_RC4. = sum(Total.Recode.for.Recoded.detailed.Hispanic.origin..HISP_RC4., na.rm = TRUE),
      Total = sum(Total, na.rm = TRUE)
    )

  # Hispanic count: second column (Hispanic) of the first summary row
  hispanic <- as.numeric(cleaned_poverty_dataset[1, 2])

  final_wo_hisp <- cleaned_poverty_dataset |>
    select(Race_Group, Non_Hispanic, Total) |>
    rename(Race_PL_Total = Non_Hispanic) |>
    select(Race_Group, Race_PL_Total, Total)
  print(final_wo_hisp)

  hispanic_set <- data.frame(
    Race_Group = "Hispanic",
    Race_PL_Total = hispanic,
    Total = 9255820
  )

  # drop the blank and "Total" rows so only race groups remain
  final_poverty_dataset <- rbind(final_wo_hisp, hispanic_set) |>
    filter(Race_Group != "" & Race_Group != "Total")

  within_povertylevel_total <- sum(final_poverty_dataset$Race_PL_Total)

  final_poverty_dataset <- final_poverty_dataset |>
    mutate(
      Percent_of_Race = Race_PL_Total / Total,
      Percent_of_PL_Group = Race_PL_Total / within_povertylevel_total
    )
  return(final_poverty_dataset)
}
# below federal poverty level
poverty_disaggregated_bfpl <- read.csv("raw_data/all_poverty_disaggregated_2023.csv")
final_bfpl <- clean_poverty_data(poverty_disaggregated_bfpl) |>
  mutate(Poverty_Level = "Below Federal Poverty Line")
# A tibble: 8 × 3
Race_Group Race_PL_Total Total
<chr> <dbl> <dbl>
1 "" 1600513 19794033
2 "American Indian and Alaska Native" 5178 195368
3 "Asian and Pacific Islander" 135253 912191
4 "Black or African American alone" 539087 2633751
5 "Some other race alone" 9833 2340874
6 "Total" 1600513 19794033
7 "Two or More Races" 93662 5242552
8 "White alone" 817500 8469297
write.csv(file = "cleaned_data/hispanic_disaggregated_bfpl_2023.csv", final_bfpl, row.names = FALSE)
# 100 to 199
poverty_100_to_199 <- read.csv("raw_data/100_to_199.csv")
final_100_to_199 <- clean_poverty_data(poverty_100_to_199) |>
  mutate(Poverty_Level = "100% to 199% above Federal Poverty Line")
# A tibble: 8 × 3
Race_Group Race_PL_Total Total
<chr> <dbl> <dbl>
1 "" 2148766 19794033
2 "American Indian and Alaska Native" 6844 195368
3 "Asian and Pacific Islander" 167913 912191
4 "Black or African American alone" 623038 2633751
5 "Some other race alone" 20785 2340874
6 "Total" 2148766 19794033
7 "Two or More Races" 125923 5242552
8 "White alone" 1204263 8469297
# 200 to 299
poverty_200_to_299 <- read.csv("raw_data/200_to_299.csv")
final_200_to_299 <- clean_poverty_data(poverty_200_to_299) |>
  mutate(Poverty_Level = "200% to 299% above Federal Poverty Line")
# A tibble: 8 × 3
Race_Group Race_PL_Total Total
<chr> <dbl> <dbl>
1 "" 2465194 19794033
2 "American Indian and Alaska Native" 5099 195368
3 "Asian and Pacific Islander" 203170 912191
4 "Black or African American alone" 571456 2633751
5 "Some other race alone" 20456 2340874
6 "Total" 2465194 19794033
7 "Two or More Races" 143777 5242552
8 "White alone" 1521236 8469297
# 300 to 399
poverty_300_to_399 <- read.csv("raw_data/300_to_399.csv")
final_300_to_399 <- clean_poverty_data(poverty_300_to_399) |>
  mutate(Poverty_Level = "300% to 399% above Federal Poverty Line")
# A tibble: 8 × 3
Race_Group Race_PL_Total Total
<chr> <dbl> <dbl>
1 "" 2361133 19794033
2 "American Indian and Alaska Native" 5320 195368
3 "Asian and Pacific Islander" 214664 912191
4 "Black or African American alone" 468303 2633751
5 "Some other race alone" 21061 2340874
6 "Total" 2361133 19794033
7 "Two or More Races" 121393 5242552
8 "White alone" 1530392 8469297
# 400 to 499
poverty_400_to_499 <- read.csv("raw_data/400_to_499.csv")
final_400_to_499 <- clean_poverty_data(poverty_400_to_499) |>
  mutate(Poverty_Level = "400% to 499% above Federal Poverty Line")
# A tibble: 8 × 3
Race_Group Race_PL_Total Total
<chr> <dbl> <dbl>
1 "" 1940373 19794033
2 "American Indian and Alaska Native" 4102 195368
3 "Asian and Pacific Islander" 161565 912191
4 "Black or African American alone" 354136 2633751
5 "Some other race alone" 13607 2340874
6 "Total" 1940373 19794033
7 "Two or More Races" 90327 5242552
8 "White alone" 1316636 8469297
# 500+
poverty_500 <- read.csv("raw_data/500+.csv")
final_500 <- clean_poverty_data(poverty_500) |>
  mutate(Poverty_Level = "500% or more above Federal Poverty Line")
# A tibble: 8 × 3
Race_Group Race_PL_Total Total
<chr> <dbl> <dbl>
1 "" 22234 19794033
2 "American Indian and Alaska Native" 60 195368
3 "Asian and Pacific Islander" 3578 912191
4 "Black or African American alone" 2686 2633751
5 "Some other race alone" 0 2340874
6 "Total" 22234 19794033
7 "Two or More Races" 568 5242552
8 "White alone" 15342 8469297
The next few chunks of code clean different socioeconomic indicators and break them down by region, by county, and/or as statewide aggregates.
This section uses sf to output a file compatible with ArcGIS for a regional map, dissolving counties into regions to visualize children and poverty across Texas.
library(sf)
Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
child_poverty_county <- get_acs(
  survey = "acs5",
  state = "TX",
  year = 2023,
  geography = "county",
  variables = c(totpop = "B05010_002"), # total under 19
  geometry = TRUE,
  output = "wide"
) |>
  mutate(NAME = gsub(" County, Texas", "", NAME)) |>
  rename(County = NAME)
Getting data from the 2019-2023 5-year ACS
Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
Deleting layer `arcgic_region_child_pov' using driver `GPKG'
Writing layer `arcgic_region_child_pov' to data source
`cleaned_data/arcgic_region_child_pov.gpkg' using driver `GPKG'
Writing 12 features with 2 fields and geometry type Unknown (any).
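The dissolve step can be sketched with sf and dplyr on toy geometry. This is only an illustration of the technique under stated assumptions: the three square "counties", their region labels, and the counts are all made up, and the column names are placeholders rather than those in the actual GeoPackage.

```r
library(sf)
library(dplyr)

# Helper that builds a unit square starting at x = xmin (toy geometry only).
unit_square <- function(xmin) {
  st_polygon(list(rbind(
    c(xmin, 0), c(xmin + 1, 0), c(xmin + 1, 1), c(xmin, 1), c(xmin, 0)
  )))
}

# Three made-up counties; A and B share a border and belong to the same region.
counties_demo <- st_sf(
  County = c("A", "B", "C"),
  Region = c("Region 1", "Region 1", "Region 2"),
  children_in_poverty = c(100, 250, 75),
  geometry = st_sfc(unit_square(0), unit_square(1), unit_square(3))
)

# summarise() on a grouped sf object unions the geometries within each group,
# which dissolves county boundaries into one shape per region.
regions_demo <- counties_demo |>
  group_by(Region) |>
  summarise(children_in_poverty = sum(children_in_poverty))

# Write a GeoPackage, a format ArcGIS can read.
out_path <- file.path(tempdir(), "regions_demo.gpkg")
st_write(regions_demo, out_path, delete_layer = TRUE, quiet = TRUE)

regions_demo
```

The result has one feature per region with two attribute fields, which is the same shape as the 12-feature, 2-field layer reported in the output above.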
Nativity and citizenship status and poverty level distributions.
Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on
'cleaned_data/nativity_dataset.csv'
Income DialUp Broadband NoInternet Totals InternetAcesss
1 Less than 10 808 400614 134172 535594 401422
2 10 to 20 802 484331 208995 694128 485133
3 20 to 35 2445 907991 219208 1129644 910436
4 35 to 50 1400 1015436 151349 1168185 1016836
5 50 to 75 2308 1612782 148058 1763148 1615090
6 75 or more 3232 5246201 207108 5456541 5249433
percent_internet percent_nointernet
1 0.7494894 0.25051065
2 0.6989100 0.30109000
3 0.8059495 0.19405052
4 0.8704409 0.12955910
5 0.9160263 0.08397367
6 0.9620441 0.03795591
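The derived columns in the internet-access table can be reproduced from the three raw counts. A short base R sketch using the printed values as input (the access column is spelled `InternetAccess` here):

```r
internet <- data.frame(
  Income     = c("Less than 10", "10 to 20", "20 to 35", "35 to 50", "50 to 75", "75 or more"),
  DialUp     = c(808, 802, 2445, 1400, 2308, 3232),
  Broadband  = c(400614, 484331, 907991, 1015436, 1612782, 5246201),
  NoInternet = c(134172, 208995, 219208, 151349, 148058, 207108)
)

# Totals are the sum of the three access categories.
internet$Totals <- internet$DialUp + internet$Broadband + internet$NoInternet

# Any internet access = dial-up plus broadband.
internet$InternetAccess <- internet$DialUp + internet$Broadband

# Shares with and without internet access in each income bracket.
internet$percent_internet   <- internet$InternetAccess / internet$Totals
internet$percent_nointernet <- internet$NoInternet / internet$Totals

internet
```

In the lowest bracket, 401422 of 535594 have some form of access, about 74.9%, while the highest bracket reaches about 96.2%.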
Healthcare with poverty level distributions.
# Health Insurance Coverage Status by Ratio of Income to Poverty Level in the Past 12 Months by Age
healthcare_access_2023 <- get_acs(
  survey = "acs5",
  state = "TX",
  year = 2023,
  geography = "state",
  variables = c(
    totpop = "C27016_001E",
    tot_below_pov_line = "C27016_002E",
    children_pov_tot = "C27016_003E",
    children_with_health = "C27016_004E",
    children_wo_health = "C27016_005E",
    nineteen_sixtyfour_tot = "C27016_006E",
    nineteen_sixtyfour_with_health = "C27016_007E",
    nineteen_sixtyfour_wo_health = "C27016_008E",
    sixty_five_plus_with_health_tot = "C27016_009E",
    sixty_five_plus_with_health = "C27016_010E",
    sixty_five_plus_wo_health = "C27016_011E"
  ),
  geometry = FALSE,
  output = "wide"
) |>
  select(
    totpop, tot_below_pov_line,
    children_pov_tot, children_with_health, children_wo_health,
    nineteen_sixtyfour_tot, nineteen_sixtyfour_with_health, nineteen_sixtyfour_wo_health,
    sixty_five_plus_with_health_tot, sixty_five_plus_with_health, sixty_five_plus_wo_health
  )
Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on
'cleaned_data/healthcare_poverty.csv'
# Median Earnings in the Past 12 Months (in 2023 Inflation-Adjusted Dollars) by Disability Status by Sex
# for the Civilian Noninstitutionalized Population 16 Years and Over With Earnings
disability_median_income_earnings_2023 <- get_acs(
  survey = "acs5",
  state = "TX",
  year = 2023,
  geography = "state",
  variables = c(tot_pop = "B18140_001E", with_dis = "B18140_002E", without_dis = "B18140_005E"),
  geometry = FALSE,
  output = "tidy"
) |>
  mutate(variable = case_when(
    variable == "B18140_001" ~ "tot_pop",
    variable == "B18140_002" ~ "with_dis",
    variable == "B18140_005" ~ "without_dis",
    TRUE ~ variable
  ))
Location TimeFrame Data
1 United States 2013 0.16
2 United States 2018 0.13
3 United States 2015 0.15
4 United States 2014 0.16
5 United States 2016 0.14
6 United States 2017 0.13
An alternative approach that I would have liked to explore after learning more about tidycensus and how to pull data with it.
Data was sourced via tidycensus for ease of access, since doing it locally would require multiple files. Going through tidycensus sends requests to the Census API, which makes the process easier and provides better formatting and up-to-date data. It is also tidier, since the tool lets the columns be named in the data pull.