The Data

The data set covers poverty levels in the US by selected characteristics. It seemed like an incredibly daunting data set, but I thought I would challenge myself.

It is laid out as effectively 9 data frames, one per characteristic. Most of them contain subgroups, and all of them sit below a multilevel header.

Plan of Attack

  1. import data without unneeded top lines

  2. trim footer and category rows

  3. trim total columns and name columns

  4. clean cells (excess periods, white space, a-hats, footnote numbers)

  5. add columns to indicate footnote and remove asterisks from change columns

  6. remove duplicate age rows and move 25+ row

  7. assign row names

  8. plot data

  9. analysis and conclusions


Load data

The first five rows are the title and sourcing notes, or blank lines. The next 6 are very messy headers. As I moved forward, I realized I could discard them once I had their names.
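Here is a minimal Python sketch of this step. The original R chunk isn't shown, so this illustrates the idea rather than reproducing the author's code. It skips the first 11 lines (5 title/source/blank lines plus 6 header lines) and pads short rows to the 13 columns seen below. The function name `load_table` and the in-memory CSV text are assumptions.

```python
import csv
from typing import List

N_SKIP = 11  # 5 title/source/blank lines + 6 header lines, per the text above
N_COLS = 13  # V1..V13 in the raw table


def load_table(text: str, skip: int = N_SKIP, n_cols: int = N_COLS) -> List[List[str]]:
    """Parse CSV text, dropping the first `skip` lines and padding short rows.

    Assumes no quoted field in the skipped title/header block spans lines.
    """
    lines = text.splitlines()[skip:]
    rows = []
    for row in csv.reader(lines):
        # Section-header and blank rows can have fewer fields than data rows;
        # pad them (and trim any extras) so every row has exactly n_cols cells.
        rows.append((row + [""] * n_cols)[:n_cols])
    return rows
```

To read from disk, something like `with open(path, encoding="utf-8") as f: rows = load_table(f.read())` would work, where `path` is hypothetical.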

## [1] 61 13

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13

Race3 and Hispanic Origin
White………………………………………… 245,985 27,113 547 11.0 0.2 247,272 26,436 714 10.7 0.3 -677 -0.3
White, not Hispanic……………………… 195,221 17,263 493 8.8 0.3 195,256 16,993 571 8.7 0.3 -270 -0.1
Black………………………………………… 41,962 9,234 388 22.0 0.9 42,474 8,993 373 21.2 0.9 -241 -0.8
Asian………………………………………… 18,879 1,908 175 10.1 0.9 19,475 1,953 190 10.0 1.0 45 -0.1
Hispanic (any race)……………………..…. 57,556 11,137 399 19.4 0.7 59,053 10,790 423 18.3 0.7 -348 -1.1

Sex
Male…………………………………………. 156,677 17,685 395 11.3 0.3 158,116 17,365 483 11.0 0.3 -321 -0.3
Female……………………………………… 163,234 22,931 460 14.0 0.3 164,433 22,333 525 13.6 0.3 -598 *-0.5

Age
Under age 18………..…………………… 73,586 13,253 370 18.0 0.5 73,356 12,808 425 17.5 0.6 -445 -0.6
Aged 18 to 64………..……………………. 197,051 22,795 473 11.6 0.2 198,113 22,209 564 11.2 0.3 -586 *-0.4
Aged 65 and older……..…………………. 49,274 4,568 198 9.3 0.4 51,080 4,681 190 9.2 0.4 114 -0.1

Nativity
Native born……………….……………………….. 276,089 33,999 670 12.3 0.2 277,158 33,095 850 11.9 0.3 -904 *-0.4
Foreign born………….……………………… 43,822 6,617 268 15.1 0.6 45,391 6,603 295 14.5 0.6 -14 -0.6
Naturalized citizen….…………………….. 20,409 2,045 143 10.0 0.7 21,851 2,213 146 10.1 0.6 168 0.1
Not a citizen……….……………………….. 23,413 4,572 222 19.5 0.9 23,540 4,390 238 18.6 0.9 -182 -0.9

Region
Northeast…………….……………………… 55,470 5,969 350 10.8 0.6 55,972 6,373 339 11.4 0.6 404 0.6
Midwest……………………………………… 66,897 7,809 355 11.7 0.5 67,345 7,647 397 11.4 0.6 -162 -0.3
South………………..……………………….. 121,166 17,028 524 14.1 0.4 122,250 16,609 587 13.6 0.5 -420 -0.5
West………………………………………….. 76,377 9,810 373 12.8 0.5 76,982 9,069 400 11.8 0.5 -740 -1.1

Residence4
Inside metropolitan statistical areas…….. 276,296 33,718 835 12.2 0.3 279,537 33,322 857 11.9 0.3 -396 -0.3
Inside principal cities…………………….. 103,252 16,495 643 16.0 0.5 103,860 16,218 634 15.6 0.5 -277 -0.4
Outside principal cities………………….. 173,044 17,223 577 10.0 0.3 175,677 17,105 577 9.7 0.3 -119 -0.2
Outside metropolitan statistical areas…… 43,614 6,898 600 15.8 0.8 43,012 6,376 523 14.8 0.7 -522 -1.0

Work Experience
Total, aged 18 to 64……..….…………… 197,051 22,795 473 11.6 0.2 198,113 22,209 564 11.2 0.3 -586 -0.4
All workers…………………………………… 150,904 8,743 254 5.8 0.2 152,199 8,135 259 5.3 0.2 -608 -0.4
Worked full-time, year-round………………. 107,781 2,416 131 2.2 0.1 109,700 2,422 128 2.2 0.1 6 Z
Less than full-time, year-round…………………….. 43,123 6,327 223 14.7 0.5 42,499 5,714 224 13.4 0.5 -613 *-1.2
Did not work at least 1 week………………. 46,148 14,052 381 30.5 0.7 45,914 14,073 440 30.7 0.7 21 0.2

Disability Status5
Total, aged 18 to 64……..….…………… 197,051 22,795 473 11.6 0.2 198,113 22,209 564 11.2 0.3 -586 -0.4
With a disability……………………………. 15,405 4,123 191 26.8 1.1 15,116 3,764 170 24.9 1.0 -360 *-1.9
With no disability…………………………… 180,783 18,629 409 10.3 0.2 182,042 18,412 504 10.1 0.3 -217 -0.2

Educational Attainment
Total, aged 25 and older……………. 216,921 22,636 425 10.4 0.2 219,830 22,163 516 10.1 0.2 -473 -0.4
No high school diploma……………………… 22,541 5,599 214 24.8 0.8 22,411 5,485 217 24.5 0.9 -113 -0.4
High school, no college……………………. 62,512 8,309 250 13.3 0.4 62,685 7,942 285 12.7 0.4 -367 -0.6
Some college, no degree……………………. 57,765 5,430 202 9.4 0.3 57,810 5,075 206 8.8 0.4 -356 -0.6
Bachelor’s degree or higher……………………… 74,103 3,299 167 4.5 0.2 76,924 3,661 181 4.8 0.2 363 0.3
An asterisk preceding an estimate indicates change is statistically different from zero at the 90 percent confidence level.
Z Represents or rounds to zero.
1A margin of error is a measure of an estimate’s variability.  The larger the margin of error in relation to the size of the estimate, the less reliable the estimate. This number, when added to and subtracted from the estimate, forms the 90 percent confidence interval.  Margins of error shown in this table are based on standard errors calculated using replicate weights.  For more information, see “Standard Errors and Their Use” at <www2.census.gov/library/publications/2018/demo/p60-263sa.pdf>.
2Details may not sum to totals because of rounding.
3Federal surveys give respondents the option of reporting more than one race. Therefore, two basic ways of defining a race group are possible. A group such as Asian may be defined as those who reported Asian and no other race (the race-alone or single-race concept) or as those who reported Asian regardless of whether they also reported another race (the race-alone-or-in-combination concept). This table shows data using the first approach (race alone). The use of the single-race population does not imply that it is the preferred method of presenting or analyzing data. The Census Bureau uses a variety of approaches. Information on people who reported more than one race, such as White and American Indian and Alaska Native or Asian and Black or African American, is available from the 2010 Census through American FactFinder. About 2.9 percent of people reported more than one race in the 2010 Census. Data for American Indians and Alaska Natives, Native Hawaiians and Other Pacific Islanders, and those reporting two or more races are not shown separately.
4The 2016 estimates presented for residence may not match the previously published estimates due to a correction in the assignment of principal city status for a small number of households. For the definition of metropolitan statistical areas and principal cities, see <www.census.gov/programs-surveys/metro-micro/about/glossary.html>.
5The sum of those with and without a disability does not equal the total because disability status is not defined for individuals in the armed forces.
Source: U.S. Census Bureau, Current Population Survey, 2017 and 2018 Annual Social and Economic Supplements.

Trim Rows

Strip out footers

Below the data were several rows of footnotes.

I’ll be creating the working data frame in this code.
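Continuing the hypothetical Python sketch, one way to trim the footer is to cut the table at the first footnote row. The marker text is an assumption. It holds for the raw table above but would need checking against other releases of the table.

```python
from typing import List

# Assumption: the first footnote row begins with this text (as in the raw
# table above); that row and everything after it is treated as footer.
FOOTER_START = "An asterisk preceding"


def strip_footer(rows: List[List[str]], marker: str = FOOTER_START) -> List[List[str]]:
    """Return the rows that come before the first footnote row."""
    for i, row in enumerate(rows):
        if row and row[0].startswith(marker):
            return rows[:i]
    return rows  # no footer found: leave the table unchanged
```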

V1 V2 V3 V4 V5

Race3 and Hispanic Origin
White………………………………………… 245,985 27,113 547 11.0
White, not Hispanic……………………… 195,221 17,263 493 8.8
Black………………………………………… 41,962 9,234 388 22.0
Asian………………………………………… 18,879 1,908 175 10.1
Hispanic (any race)……………………..…. 57,556 11,137 399 19.4

Sex
Male…………………………………………. 156,677 17,685 395 11.3


Strip out category rows

Each section had a section header (such as “Sex” or “Age”). These can be removed as the names of the rows below are understandable without them, though documentation will be helpful for clarification.
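This step can be sketched in Python as a filter. It assumes that section headers (and blank separator rows) are exactly the rows whose cells after the label are all empty, as in the preview above.

```python
from typing import List


def drop_category_rows(rows: List[List[str]]) -> List[List[str]]:
    """Keep only rows that carry at least one value after the label column."""
    return [row for row in rows if any(cell.strip() for cell in row[1:])]
```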

V1 V2 V3
White………………………………………… 245,985 27,113
White, not Hispanic……………………… 195,221 17,263
Black………………………………………… 41,962 9,234
Asian………………………………………… 18,879 1,908
Hispanic (any race)……………………..…. 57,556 11,137
Male…………………………………………. 156,677 17,685
Female……………………………………… 163,234 22,931
Under age 18………..…………………… 73,586 13,253
Aged 18 to 64………..……………………. 197,051 22,795
Aged 65 and older……..…………………. 49,274 4,568

Trim Columns

Strip out “total” columns

Two of the columns hold population totals for each year (245,985 and 247,272 for White, for example), which aren’t needed here.
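Judging by the values, the two totals are V2 and V7. Dropping them takes the table from 13 to 11 columns, which matches the dimensions below. Here is a Python sketch with those positions hard-coded as an assumption:

```python
from typing import Iterable, List

# Assumed 0-based positions of the total columns V2 and V7.
TOTAL_COLS = (1, 6)


def drop_columns(rows: List[List[str]], drop: Iterable[int] = TOTAL_COLS) -> List[List[str]]:
    """Return a copy of `rows` without the columns at the given positions."""
    drop_set = set(drop)
    return [[cell for j, cell in enumerate(row) if j not in drop_set] for row in rows]
```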

## [1] 35 11


Name columns

Apply appropriate column names to columns
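In the Python sketch, the columns can be named by zipping each row with the eleven names shown in the output below. The length check is an added safeguard: a misaligned row fails loudly instead of being silently truncated by `zip`.

```python
from typing import Dict, List

# Column names as they appear in the output below.
COLUMNS = [
    "Characteristic",
    "2016 Num", "2016 Num Margin of error", "2016 Pct", "2016 Pct Margin of Error",
    "2017 Num", "2017 Num Margin of error", "2017 Pct", "2017 Pct Margin of Error",
    "Num Change Y2Y", "Pct Change Y2Y",
]


def name_columns(rows: List[List[str]], names: List[str] = COLUMNS) -> List[Dict[str, str]]:
    """Turn each positional row into a dict keyed by column name."""
    records = []
    for i, row in enumerate(rows):
        if len(row) != len(names):
            raise ValueError(f"row {i} has {len(row)} cells, expected {len(names)}")
        records.append(dict(zip(names, row)))
    return records
```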

##  [1] "Characteristic"           "2016 Num"                
##  [3] "2016 Num Margin of error" "2016 Pct"                
##  [5] "2016 Pct Margin of Error" "2017 Num"                
##  [7] "2017 Num Margin of error" "2017 Pct"                
##  [9] "2017 Pct Margin of Error" "Num Change Y2Y"          
## [11] "Pct Change Y2Y"


Clean cells

Strip out excess periods

Throughout the Characteristic column there are strings of periods used for spacing. They need to be removed.
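This Python sketch removes only the run of periods and "…" characters at the end of each label. Anchoring the pattern to the end of the string keeps hyphens, commas and apostrophes inside the label. In the outputs further down, labels such as "Worked full", "Less than full" and "Bachelor" appear to have been cut short of the raw "Worked full-time, year-round", "Less than full-time, year-round" and "Bachelor’s degree or higher". This approach would preserve the full labels. The pattern is an assumption based on the raw preview above.

```python
import re

# A run of ASCII periods and/or "…" (U+2026) at the end of a label, plus any
# trailing whitespace. Characters inside the label are never touched.
LEADER_RE = re.compile(r"[.\u2026]+\s*$")


def strip_leaders(label: str) -> str:
    return LEADER_RE.sub("", label)
```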

Characteristic 2016 Num 2016 Num Margin of error
White 27,113 547
White, not Hispanic 17,263 493
Black 9,234 388
Asian 1,908 175
Hispanic 11,137 399
Male 17,685 395
Female 22,931 460
Under age 18 13,253 370
Aged 18 to 64 22,795 473
Aged 65 and older 4,568 198


Strip out white space

Remove the leading spaces from the names of subgroup rows.
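In the Python sketch this is a one-liner. Splitting on whitespace and re-joining also collapses any doubled internal spaces, which goes slightly beyond the original step.

```python
def tidy_whitespace(label: str) -> str:
    # Drop leading/trailing whitespace (the indentation that marked subgroups)
    # and collapse internal runs of whitespace to a single space.
    return " ".join(label.split())
```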

Characteristic 2016 Num
White 27,113
White, not Hispanic 17,263
Black 9,234
Asian 1,908
Hispanic 11,137
Male 17,685
Female 22,931
Under age 18 13,253
Aged 18 to 64 22,795
Aged 65 and older 4,568


Remove a-hats that are attached to several strings

Until today there had been an a-hat ( â ) at the end of many lines. I wrote code to remove them, below, but commented it out so the chunk would still run now that those symbols are no longer present (and so I would have the code in place in case they magically reappeared).

I believe this is an issue of having worked on the code first on a PC and then on a Mac.

If running on a PC, please uncomment the code in this chunk.
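One possible explanation, offered as an assumption: the "…" leaders are stored as the UTF-8 bytes E2 80 A6. Decoding those bytes as Windows-1252 (a common default on Windows) produces "â€¦", which would explain an a-hat that shows up on only one machine. If so, declaring the file's encoding when reading it (for example `fileEncoding = "UTF-8"` in R's `read.csv`) may avoid the problem at the source. The Python sketch below instead repairs strings that are already garbled by reversing that mis-decoding. Strings that don't round-trip are left unchanged.

```python
def fix_mojibake(text: str) -> str:
    """Reverse UTF-8 text that was mis-decoded as Windows-1252.

    Strings that were never mis-decoded fail the round trip and are
    returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text
```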


Strip out Footnote indicators

A few of the row names had footnotes. The footnotes themselves were deleted with the footer, but their reference numbers are still attached to the names.

I had to be careful to delete only numbers attached directly to words, since labels such as “Aged 18 to 64” contain legitimate numbers.
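This Python sketch uses a lookbehind so that only digits glued to a letter are removed. The pattern is an assumption based on labels like "Race3" and "Residence4" in the raw table.

```python
import re

# One or more digits immediately after a letter and ending at a word
# boundary, e.g. the "4" in "Residence4". Free-standing numbers such as
# "18" in "Aged 18 to 64" are preceded by a space and are left alone.
FOOTNOTE_RE = re.compile(r"(?<=[A-Za-z])\d+\b")


def strip_footnote_numbers(label: str) -> str:
    return FOOTNOTE_RE.sub("", label)
```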

Characteristic 2016 Num
White 27,113
White, not Hispanic 17,263
Black 9,234
Asian 1,908
Hispanic 11,137
Male 17,685
Female 22,931
Under age 18 13,253
Aged 18 to 64 22,795
Aged 65 and older 4,568


Add footnote indicator

In the Y2Y change columns, the data have asterisks in cells where the change is statistically different from zero at the 90 percent confidence level.

I’m going to create a separate flag column for each change measure to record whether this note applies, so that the original columns can be treated as numeric.
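Here is a Python sketch of the flag columns, run on the named records before the asterisks are stripped. Per the footnote in the raw table, an asterisk means the change is statistically different from zero at the 90 percent confidence level, so the flag here is a plain boolean. The " signif" column-name suffix is a hypothetical choice.

```python
from typing import Dict, List

CHANGE_COLS = ("Num Change Y2Y", "Pct Change Y2Y")


def add_significance_flags(records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Add '<column> signif' = True where the raw change cell carries an asterisk."""
    for rec in records:
        for col in CHANGE_COLS:
            rec[col + " signif"] = "*" in str(rec[col])
    return records
```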

##  [1]  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
## [23]  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
## [34]  TRUE  TRUE


Remove asterisks from change columns

Now that I have columns indicating whether the changes were footnoted I can remove the asterisks, allowing the data in these columns to be treated as numeric.
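This Python sketch strips the asterisks and thousands separators and converts the cells to numbers. It makes one suggested change from the output below. The raw table marks "Worked full-time, year-round" with "Z", which its footnote defines as "represents or rounds to zero", so mapping it to 0 may be more faithful than the NA shown.

```python
from typing import Optional


def parse_number(cell: str) -> Optional[float]:
    """Convert a raw cell such as '*-0.5', '-586' or '27,113' to a float.

    'Z' ("represents or rounds to zero" in the table's footnotes) becomes 0.0;
    an empty cell becomes None.
    """
    cleaned = cell.replace("*", "").replace(",", "").strip()
    if cleaned == "":
        return None
    if cleaned == "Z":
        return 0.0
    return float(cleaned)
```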

Num Change Y2Y Pct Change Y2Y
-677 -0.3
-270 -0.1
-241 -0.8
45 -0.1
-348 -1.1
-321 -0.3
-598 -0.5
-445 -0.6
-586 -0.4
114 -0.1
-904 -0.4
-14 -0.6
168 0.1
-182 -0.9
404 0.6
-162 -0.3
-420 -0.5
-740 -1.1
-396 -0.3
-277 -0.4
-119 -0.2
-522 -1.0
-586 -0.4
-608 -0.4
6 NA
-613 -1.2
21 0.2
-586 -0.4
-360 -1.9
-217 -0.2
-473 -0.4
-113 -0.4
-367 -0.6
-356 -0.6
363 0.3


Create columns flagging the changes that carried the asterisk footnote

##    Y2Ynum_N Y2Ypct_N
## 1           low conf
## 2                   
## 3                   
## 4                   
## 5           low conf
## 6                   
## 7           low conf
## 8                   
## 9           low conf
## 10                  
## 11          low conf
## 12                  
## 13                  
## 14                  
## 15                  
## 16                  
## 17                  
## 18 low conf low conf
## 19                  
## 20                  
## 21                  
## 22 low conf low conf
## 23          low conf
## 24 low conf low conf
## 25                  
## 26 low conf low conf
## 27                  
## 28          low conf
## 29 low conf low conf
## 30                  
## 31          low conf
## 32                  
## 33 low conf low conf
## 34 low conf low conf
## 35 low conf low conf

Strip out Duplicate age rows

The aged 18 to 64 total appears in three rows. I removed all but the first.
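This Python sketch drops rows by label rather than by position. The listing below still contains two "Total, aged 18 to 64" entries (23 and 28), and so does the finished data frame. Matching on the label makes the step independent of row order. The label set is an assumption taken from the raw table.

```python
from typing import Dict, Iterable, List

# Rows that repeat the "Aged 18 to 64" figures under other section headings.
DUPLICATE_LABELS = {"Total, aged 18 to 64"}


def drop_rows_by_label(
    records: List[Dict[str, str]], labels: Iterable[str] = DUPLICATE_LABELS
) -> List[Dict[str, str]]:
    """Return the records whose Characteristic is not in `labels`."""
    unwanted = set(labels)
    return [r for r in records if r["Characteristic"] not in unwanted]
```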

##  [1] "White"                                 
##  [2] "White, not Hispanic"                   
##  [3] "Black"                                 
##  [4] "Asian"                                 
##  [5] "Hispanic "                             
##  [6] "Male"                                  
##  [7] "Female"                                
##  [8] "Under age 18"                          
##  [9] "Aged 18 to 64"                         
## [10] "Aged 65 and older"                     
## [11] "Native born"                           
## [12] "Foreign born"                          
## [13] "Naturalized citizen"                   
## [14] "Not a citizen"                         
## [15] "Northeast"                             
## [16] "Midwest"                               
## [17] "South"                                 
## [18] "West"                                  
## [19] "Inside metropolitan statistical areas" 
## [20] "Inside principal cities"               
## [21] "Outside principal cities"              
## [22] "Outside metropolitan statistical areas"
## [23] "Total, aged 18 to 64"                  
## [24] "All workers"                           
## [25] "Worked full"                           
## [26] "Less than full"                        
## [27] "Did not work at least 1 week"          
## [28] "Total, aged 18 to 64"                  
## [29] "With a disability"                     
## [30] "With no disability"                    
## [31] "Total, aged 25 and older"              
## [32] "No high school diploma"                
## [33] "High school, no college"               
## [34] "Some college, no degree"               
## [35] "Bachelor"


Rename and move A25+ row up to other ages

The education section includes a row for the number of people aged 25 or older. I renamed it and moved it up with the other age rows.
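This Python sketch looks rows up by label. In the finished data frame further down, row 10 ("Aged 25 and older") carries 4,123 / 26.8 / -1.9, which are the raw figures for "With a disability". Meanwhile, "Total, aged 25 and older" (22,636) is still at row 31. That pattern suggests a positional index pointed at the wrong row. The helper name and its arguments are hypothetical.

```python
from typing import Dict, List


def move_row(
    records: List[Dict[str, str]], label: str, new_label: str, after_label: str
) -> List[Dict[str, str]]:
    """Rename the row labelled `label` and reinsert it just after `after_label`.

    Works on a copy; raises ValueError if either label is missing.
    """
    out = list(records)
    labels = [r["Characteristic"] for r in out]
    row = dict(out.pop(labels.index(label)), Characteristic=new_label)
    labels = [r["Characteristic"] for r in out]  # positions shift after the pop
    out.insert(labels.index(after_label) + 1, row)
    return out
```

For example, `move_row(records, "Total, aged 25 and older", "Aged 25 and older", "Aged 18 to 64")`.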

##  [1] "White"                                 
##  [2] "White, not Hispanic"                   
##  [3] "Black"                                 
##  [4] "Asian"                                 
##  [5] "Hispanic "                             
##  [6] "Male"                                  
##  [7] "Female"                                
##  [8] "Under age 18"                          
##  [9] "Aged 18 to 64"                         
## [10] "Aged 25 and older"                     
## [11] "Aged 65 and older"                     
## [12] "Native born"                           
## [13] "Foreign born"                          
## [14] "Naturalized citizen"                   
## [15] "Not a citizen"                         
## [16] "Northeast"                             
## [17] "Midwest"                               
## [18] "South"                                 
## [19] "West"                                  
## [20] "Inside metropolitan statistical areas" 
## [21] "Inside principal cities"               
## [22] "Outside principal cities"              
## [23] "Outside metropolitan statistical areas"
## [24] "Total, aged 18 to 64"                  
## [25] "All workers"                           
## [26] "Worked full"                           
## [27] "Less than full"                        
## [28] "Did not work at least 1 week"          
## [29] "Total, aged 18 to 64"                  
## [30] "With no disability"                    
## [31] "Total, aged 25 and older"              
## [32] "No high school diploma"                
## [33] "High school, no college"


Turn the Characteristic labels (de jure row names) into actual (de facto) row names
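Row names have to be unique (R data frames refuse duplicate row names), so the duplicate totals need to be gone before this step. A Python analogue is a dict keyed by the label. The sketch below raises an error on a repeated label rather than silently overwriting it.

```python
from typing import Dict, List


def index_by_label(records: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Key records by Characteristic, refusing duplicate labels."""
    indexed: Dict[str, Dict[str, str]] = {}
    for rec in records:
        key = rec["Characteristic"]
        if key in indexed:
            raise ValueError(f"duplicate row label: {key!r}")
        indexed[key] = rec
    return indexed
```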

##  [1] "White"                                 
##  [2] "White, not Hispanic"                   
##  [3] "Black"                                 
##  [4] "Asian"                                 
##  [5] "Hispanic "                             
##  [6] "Male"                                  
##  [7] "Female"                                
##  [8] "Under age 18"                          
##  [9] "Aged 18 to 64"                         
## [10] "Aged 25 and older"                     
## [11] "Aged 65 and older"                     
## [12] "Native born"                           
## [13] "Foreign born"                          
## [14] "Naturalized citizen"                   
## [15] "Not a citizen"                         
## [16] "Northeast"                             
## [17] "Midwest"                               
## [18] "South"                                 
## [19] "West"                                  
## [20] "Inside metropolitan statistical areas" 
## [21] "Inside principal cities"               
## [22] "Outside principal cities"              
## [23] "Outside metropolitan statistical areas"
## [24] "Total, aged 18 to 64"                  
## [25] "All workers"                           
## [26] "Worked full"                           
## [27] "Less than full"                        
## [28] "Did not work at least 1 week"          
## [29] "Total, aged 18 to 64"                  
## [30] "With no disability"                    
## [31] "Total, aged 25 and older"              
## [32] "No high school diploma"                
## [33] "High school, no college"

Show and save final dataframe

Finished product

##                            Characteristic 2016 Num
## 1                                   White   27,113
## 2                     White, not Hispanic   17,263
## 3                                   Black    9,234
## 4                                   Asian    1,908
## 5                               Hispanic    11,137
## 6                                    Male   17,685
## 7                                  Female   22,931
## 8                            Under age 18   13,253
## 9                           Aged 18 to 64   22,795
## 10                      Aged 25 and older    4,123
## 11                      Aged 65 and older    4,568
## 12                            Native born   33,999
## 13                           Foreign born    6,617
## 14                    Naturalized citizen    2,045
## 15                          Not a citizen    4,572
## 16                              Northeast    5,969
## 17                                Midwest    7,809
## 18                                  South   17,028
## 19                                   West    9,810
## 20  Inside metropolitan statistical areas   33,718
## 21                Inside principal cities   16,495
## 22               Outside principal cities   17,223
## 23 Outside metropolitan statistical areas    6,898
## 24                   Total, aged 18 to 64   22,795
## 25                            All workers    8,743
## 26                            Worked full    2,416
## 27                         Less than full    6,327
## 28           Did not work at least 1 week   14,052
## 29                   Total, aged 18 to 64   22,795
## 30                     With no disability   18,629
## 31               Total, aged 25 and older   22,636
## 32                 No high school diploma    5,599
## 33                High school, no college    8,309
##    2016 Num Margin of error 2016 Pct 2016 Pct Margin of Error 2017 Num
## 1                       547     11.0                      0.2   26,436
## 2                       493      8.8                      0.3   16,993
## 3                       388     22.0                      0.9    8,993
## 4                       175     10.1                      0.9    1,953
## 5                       399     19.4                      0.7   10,790
## 6                       395     11.3                      0.3   17,365
## 7                       460     14.0                      0.3   22,333
## 8                       370     18.0                      0.5   12,808
## 9                       473     11.6                      0.2   22,209
## 10                      191     26.8                      1.1    3,764
## 11                      198      9.3                      0.4    4,681
## 12                      670     12.3                      0.2   33,095
## 13                      268     15.1                      0.6    6,603
## 14                      143     10.0                      0.7    2,213
## 15                      222     19.5                      0.9    4,390
## 16                      350     10.8                      0.6    6,373
## 17                      355     11.7                      0.5    7,647
## 18                      524     14.1                      0.4   16,609
## 19                      373     12.8                      0.5    9,069
## 20                      835     12.2                      0.3   33,322
## 21                      643     16.0                      0.5   16,218
## 22                      577     10.0                      0.3   17,105
## 23                      600     15.8                      0.8    6,376
## 24                      473     11.6                      0.2   22,209
## 25                      254      5.8                      0.2    8,135
## 26                      131      2.2                      0.1    2,422
## 27                      223     14.7                      0.5    5,714
## 28                      381     30.5                      0.7   14,073
## 29                      473     11.6                      0.2   22,209
## 30                      409     10.3                      0.2   18,412
## 31                      425     10.4                      0.2   22,163
## 32                      214     24.8                      0.8    5,485
## 33                      250     13.3                      0.4    7,942
##    2017 Num Margin of error 2017 Pct 2017 Pct Margin of Error
## 1                       714     10.7                      0.3
## 2                       571      8.7                      0.3
## 3                       373     21.2                      0.9
## 4                       190     10.0                      1.0
## 5                       423     18.3                      0.7
## 6                       483     11.0                      0.3
## 7                       525     13.6                      0.3
## 8                       425     17.5                      0.6
## 9                       564     11.2                      0.3
## 10                      170     24.9                      1.0
## 11                      190      9.2                      0.4
## 12                      850     11.9                      0.3
## 13                      295     14.5                      0.6
## 14                      146     10.1                      0.6
## 15                      238     18.6                      0.9
## 16                      339     11.4                      0.6
## 17                      397     11.4                      0.6
## 18                      587     13.6                      0.5
## 19                      400     11.8                      0.5
## 20                      857     11.9                      0.3
## 21                      634     15.6                      0.5
## 22                      577      9.7                      0.3
## 23                      523     14.8                      0.7
## 24                      564     11.2                      0.3
## 25                      259      5.3                      0.2
## 26                      128      2.2                      0.1
## 27                      224     13.4                      0.5
## 28                      440     30.7                      0.7
## 29                      564     11.2                      0.3
## 30                      504     10.1                      0.3
## 31                      516     10.1                      0.2
## 32                      217     24.5                      0.9
## 33                      285     12.7                      0.4
##    Num Change Y2Y Pct Change Y2Y Y2Ynum_N Y2Ypct_N
## 1            -677           -0.3          low conf
## 2            -270           -0.1                  
## 3            -241           -0.8                  
## 4              45           -0.1                  
## 5            -348           -1.1          low conf
## 6            -321           -0.3                  
## 7            -598           -0.5          low conf
## 8            -445           -0.6                  
## 9            -586           -0.4          low conf
## 10           -360           -1.9 low conf low conf
## 11            114           -0.1                  
## 12           -904           -0.4          low conf
## 13            -14           -0.6                  
## 14            168            0.1                  
## 15           -182           -0.9                  
## 16            404            0.6                  
## 17           -162           -0.3                  
## 18           -420           -0.5                  
## 19           -740           -1.1 low conf low conf
## 20           -396           -0.3                  
## 21           -277           -0.4                  
## 22           -119           -0.2                  
## 23           -522           -1.0 low conf low conf
## 24           -586           -0.4          low conf
## 25           -608           -0.4 low conf low conf
## 26              6             NA                  
## 27           -613           -1.2 low conf low conf
## 28             21            0.2                  
## 29           -586           -0.4          low conf
## 30           -217           -0.2                  
## 31           -473           -0.4          low conf
## 32           -113           -0.4                  
## 33           -367           -0.6 low conf low conf

Export to CSV
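Here is a Python sketch of the export using the standard `csv` module. `csv.DictWriter` automatically quotes fields that contain commas, such as "White, not Hispanic". The function returns the CSV text so it can be checked before it is written out.

```python
import csv
import io
from typing import Dict, List, Sequence


def to_csv(records: List[Dict[str, object]], fieldnames: Sequence[str]) -> str:
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

To save it, with a hypothetical file name: `with open("poverty_clean.csv", "w", newline="", encoding="utf-8") as f: f.write(to_csv(records, COLUMNS))`.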

I have spent about 3 hours trying to get the chart below to display in descending order. Please cover this! Nothing online seems to have an answer, and this is the second time I’ve sunk this much effort into solving something just like this. It makes no sense at all to me.
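The plotting code isn't shown, so this is a guess. If the chart is built with ggplot2, a discrete axis follows the factor levels of the variable (alphabetical for a character column), not the row order of the data frame. Sorting the data frame alone therefore won't reorder the bars. The usual fix is to reorder the levels, for example `aes(x = reorder(Characteristic, -value))`, where `value` is a hypothetical column name. The Python sketch below computes that descending level order explicitly.

```python
from typing import Dict, List


def descending_order(values: Dict[str, float]) -> List[str]:
    """Category labels sorted from largest to smallest value (ties keep input order)."""
    return [label for label, _ in sorted(values.items(), key=lambda kv: kv[1], reverse=True)]
```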

Analysis and Conclusion

Poverty is declining in general, though in the Northeast it rose (+0.6 percentage points). Poverty rates also rose slightly among people with a bachelor’s degree or higher (+0.3), people who did not work at least one week during the year (+0.2), and naturalized citizens (+0.1).

By far the greatest decline in the poverty rate is among people with a disability (-1.9 percentage points).

Other strong declines are among those working less than full-time, year-round (-1.2), Hispanic people (-1.1), those living in the West (-1.1) or in rural areas (outside MSAs, -1.0), and those who are not citizens (-0.9).
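The figures above are percentage-point changes in the poverty rate, taken from the "Pct Change Y2Y" column. This short Python sketch ranks them, using a subset of values copied from the table above. Recomputing a change from the rounded rates can differ from the published figure by 0.1. For example, West gives 11.8 - 12.8 = -1.0, while the published change is -1.1, presumably because the published changes are computed from unrounded estimates.

```python
from typing import Dict, List, Tuple

# Published "Pct Change Y2Y" values (percentage points), copied from the table.
PCT_CHANGE: Dict[str, float] = {
    "With a disability": -1.9,
    "Less than full-time, year-round": -1.2,
    "Hispanic (any race)": -1.1,
    "West": -1.1,
    "Outside metropolitan statistical areas": -1.0,
    "Not a citizen": -0.9,
    "Northeast": 0.6,
    "Bachelor’s degree or higher": 0.3,
    "Did not work at least 1 week": 0.2,
    "Naturalized citizen": 0.1,
}


def largest_declines(changes: Dict[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """The n most negative changes, most negative first (ties keep input order)."""
    return sorted(changes.items(), key=lambda kv: kv[1])[:n]


def increases(changes: Dict[str, float]) -> List[Tuple[str, float]]:
    """Positive changes, largest first."""
    return sorted(((k, v) for k, v in changes.items() if v > 0), key=lambda kv: kv[1], reverse=True)
```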

If the factors showing declines in poverty overlap, they paint a picture of a young to middle-aged, non-citizen Latino doing part-time work in the rural West, possibly as a day laborer. Unfortunately, this supports the “they are taking our jobs” narrative and the general xenophobic mentality that is increasingly pervasive in the country.