What is the Black Progress Index?

Published in September 2002, the Black Progress Index was developed in partnership by the NAACP and the Brookings Institution. The index is intended to be a tool that provides a means to understand the factors that negatively or positively impact the health and well being of Black people in the United States.

Overview of Data Structure

I chose to use this data set because it is neatly structured and accessible. The csv files can be downloaded from the interactive data-viz. In the time available, I only began to investigate the county level data. The metro level data is structured identically to the county level data summarized here. There is also a data dictionary. I found this data to be surprisingly difficult to work with and I would like to spend a lot more time working with it.

County Level BPI Variable Importance

vtable(CtyLevelBPIVariableImportance)
CtyLevelBPIVariableImportance
Name Class Values
ctyfip character
trait character
state character
county_name character
cbsa_name character
effect_on_life_expectancy numeric Num: -13.287 to 19.1
local_value_standardized numeric Num: -21.111 to 25.689
label character
#vtable(MetroLevelBPI)
#vtable(MetroLevelBPIVariableImportance)

County Level BPI

I spent a ridiculous amount of time trying to make this smaller and scrollable. Alas, I gave up.

vtable(CtyLevelBPI)
CtyLevelBPI
Name Class Values
county_name character
ctyfip character
state character
cbsa numeric Num: 10140 to 49780
cbsa_name character
black_progress_index numeric Num: 64.365 to 83.492
black_progress_index_centile numeric Num: 1 to 100
index_life_exp_unexplained numeric Num: -12.565 to 19.1
life_exp_bl numeric Num: 62.897 to 96.169
lnb_mort numeric Num: 5.772 to 11.51
low_birth_bl numeric Num: 0.049 to 0.3
foreign_adults_black numeric Num: 0 to 0.727
math_black numeric Num: 6.876 to 83.833
pop25_ba_higher_rate_black numeric Num: 0 to 72.4
black_homeowner_rate numeric Num: 0 to 1
business_owner_rate numeric Num: 0 to 0.045
ln_b_medhh_inc numeric Num: 0.693 to 11.851
density numeric Num: 19.8 to 88862.266
air_pollution numeric Num: 3.4 to 20.9
gun_deaths_pc numeric Num: 1.736 to 65.08
totrate numeric Num: 90.799 to 1924.609
child_no_father numeric Num: 0 to 1
distance_from_friend numeric Num: 34.282 to 5293.303
bl_bike_walk numeric Num: 0 to 0.483
pblack numeric Num: 0.001 to 0.878
pop_total_one_black numeric Num: 40 to 1202260
bpop_under2500 numeric Num: 0 to 1
bpop_2500_50k numeric Num: 0 to 1
bpop_50000 numeric Num: 0 to 1
index_life_exp_unexplained_cent numeric Num: 1 to 100
predict numeric Num: 19.081 to 25.694
index numeric Num: 1 to 100

Population data

I added this table at the last minute because I was hoping to investigate whether there is a correlation between percentage of the population that is black, or total black population, and BPI and/or life expectancy. I ran out of time and did not reach any meaningful conclusions, but I was happy to find this table available in The vast gulfs in Black life expectancy across the U.S. section of the Brookings report.

vtable(BlackPopulationLEData)
BlackPopulationLEData
Name Class Values
cbsa_name character
state character
county_name character
Black life expectancy numeric Num: 62.9 to 96.2
Black progress index numeric Num: 64.4 to 83.5
Percent of population that is Black numeric Num: 0 to 0.88
Black population numeric Num: 40 to 1202260
Black population is under 2500 numeric Num: 0 to 1
Black populaiton is between 2500 and 50000 numeric Num: 0 to 1
Black population is 50000 or over numeric Num: 0 to 1

Summary Tables

Here is a summary of only the data points I had time to explore. I also spent a lot of time trying to make this section look fancy. Again, I failed. This horrible presentation is my desperate last-minute hack job. BUT, true story: I really did use these summaries a lot during my analysis to guide the axis limits on charts and to check my work against the BPI reporting. Very helpful.

  • County Level Black Progress Index:
summary(CtyLevelBPI$black_progress_index)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   64.36   71.94   73.54   74.02   75.84   83.49     234
  • County Level Black Progress Index Centile:
summary(CtyLevelBPI$black_progress_index_centile)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00   25.50   50.00   50.47   75.00  100.00     234
  • State:
summary(CtyLevelBPI$state)
##    Length     Class      Mode 
##      1677 character character
  • County Name:
summary(CtyLevelBPI$county_name)
##    Length     Class      Mode 
##      1677 character character
  • County FIP:
summary(CtyLevelBPI$ctyfip)
##    Length     Class      Mode 
##      1677 character character
  • Percent of population that is black:
summary(BlackPopulationLEData$`Percent of population that is Black`)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0300  0.0900  0.1594  0.2400  0.8800
  • Total Black population by county:
summary(BlackPopulationLEData$`Black population`)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      40    1748    4790   24412   14363 1202260

Recreating the BPI Charts

I learn best by having a way of checking my work. So, first I set out to recreate the images from the Brookings report. This took up all my time and was a great learning exercise. Here I learned how to add and display PNG files in RMD. The interactive versions are in the Chart tabs below.

Recreating the Average Life Expectancy by BPI centile

BPI Version

BPI Line from website

My Version

For this chart I had to create a calculated field from “CtyLevelBPI$life_exp_bl”

gg.lebycentile <-   ggplot(data=CtyLevelBPIwleavg , aes(x=black_progress_index_centile, y=name))+
  geom_line(size=1.5,color="#21918c")+
  scale_x_continuous(limits=c(1,100))+
  scale_y_continuous(limits=c(65,85))+
  labs(title="Average Life Expectancy by Centile of Black Progress Index",
         x="BPI Centile",
         y="Years")

gg.lebycentile <- gg.lebycentile +
    theme_bw()

gg.lebycentile 

Mapping BPI by county

Here I spent a lot of time trying to figure out why I had whole states missing from my data map. I finally learned that the county fips are missing leading zeros in the csv file. After spending a small but way long enough amount of time trying to fix that in R, I gave up and reformatted the csv column and re-imported it instead. So, if you download the csv files from the website you may run into this problem as well.

BPI Version

BPI Map from website

My Version

I spent way too much time trying to adjust the map inside the object and change the legend, to no avail. I liked looking at the map better with the gradient than with the ranges. Also, it was easier to apply the gradient than to create ranges and display with them.

usmapBPIbycounty <- plot_ly()
#summary(CtyLevelBPI$black_progress_index)
usmapBPIbycounty <- usmapBPIbycounty %>% 
  add_trace(
    type="choropleth",
    geojson=counties,
    locations=CtyLevelBPIwleavg$ctyfip,
    z=CtyLevelBPIwleavg$black_progress_index,
    # colorscale="Viridis",
    zmin=64,
    zmax=84,
    marker=list(line=list(
      width=0),color= CtyLevelBPI$black_progress_index),
    hoverinfo = 'text',
                   text = ~paste('</br> State:', CtyLevelBPIwleavg$state,
                                  '</br> County:', CtyLevelBPIwleavg$county_name,
                                 '</br> BPI:', round(CtyLevelBPIwleavg$black_progress_index,2),
                                 '</br> BPI Centile:', CtyLevelBPIwleavg$black_progress_index_centile,
                                 '</br> Life Expectancy (actual):', round(CtyLevelBPIwleavg$life_exp_bl,2),
                                 '</br> Index Life Exp. Unexplained:', round(CtyLevelBPIwleavg$index_life_exp_unexplained,2)))
    

usmapBPIbycounty <- usmapBPIbycounty %>% colorbar(title = "Black Progress Index")

usmapBPIbycounty <- usmapBPIbycounty %>% layout(
  title = "Black Progress Index by County")

usmapBPIbycounty <- usmapBPIbycounty %>% layout(
  geo = geolist)

usmapBPIbycounty

Top and Bottom Counties by BPI Score

Table 1: Top 20

The 20 U.S. counties with highest Black Progress Index score, actual life expectancy, and life years unexplained by model

#options(knitr.kable.NA = '')
top20 <- slice_max(CtyLevelBPI,black_progress_index, n=20)
TableofTop20 <- kable(top20[1:20,c(1,3,5,6,9,8,26)],
                      digits=2,
      col.names = c("County",
                    "State",
                    "Metro Area",
                    "Black Progress Index",
                    "Black Life Expectancy at birth",
                    "Unexplained year of Black life",
                    "Black population, 2020"))
TableofTop20%>%
  kable_styling()%>%
   column_spec(1, width = "25cm" )%>%
   column_spec(3, width = "45cm" )
County State Metro Area Black Progress Index Black Life Expectancy at birth Unexplained year of Black life Black population, 2020
Putnam County New York New York-Newark-Jersey City, NY-NJ-PA 83.49 85.84 2.35 3349
Warrick County Indiana Evansville, IN-KY 83.18 NA NA 1147
Scott County Minnesota Minneapolis-St. Paul-Bloomington, MN-WI 82.86 89.68 6.82 6286
Cumberland County Maine Portland-South Portland, ME 82.84 79.73 -3.11 9192
Loudoun County Virginia Washington-Arlington-Alexandria, DC-VA-MD-WV 82.68 82.37 -0.30 31241
Collier County Florida Naples-Marco Island, FL 82.56 86.00 3.45 25613
Rockingham County New Hampshire Boston-Cambridge-Newton, MA-NH 82.21 82.21 0.00 2704
Fairfax County Virginia Washington-Arlington-Alexandria, DC-VA-MD-WV 82.16 83.08 0.92 112024
Snohomish County Washington Seattle-Tacoma-Bellevue, WA 82.16 83.14 0.98 26644
Delaware County Ohio Columbus, OH 82.01 79.32 -2.70 7153
Prince William County Virginia Washington-Arlington-Alexandria, DC-VA-MD-WV 81.84 79.29 -2.55 97848
Washington County Oregon Portland-Vancouver-Hillsboro, OR-WA 81.57 82.54 0.97 12931
Barnstable County Massachusetts Barnstable Town, MA 81.48 76.53 -4.95 6098
Wright County Minnesota Minneapolis-St. Paul-Bloomington, MN-WI 81.42 82.82 1.40 2366
Forsyth County Georgia Atlanta-Sandy Springs-Alpharetta, GA 81.40 81.36 -0.05 8574
Saratoga County New York Albany-Schenectady-Troy, NY 81.39 81.76 0.37 3906
Montgomery County Maryland Washington-Arlington-Alexandria, DC-VA-MD-WV 81.34 81.15 -0.20 193450
Nassau County New York New York-Newark-Jersey City, NY-NJ-PA 81.29 78.98 -2.32 157724
Dallas County Iowa Des Moines-West Des Moines, IA 81.16 85.84 4.67 1720
Plymouth County Massachusetts Boston-Cambridge-Newton, MA-NH 81.15 81.43 0.27 48800

Table 2: Bottom 20

The 20 U.S. counties with lowest Black Progress Index score, actual life expectancy, and life years unexplained by model

Rural areas seems to offer less progress, particularly in Mississippi.

bottom20 <- slice_min(CtyLevelBPI,black_progress_index, n=20)
Tableofbottom20 <- kable(bottom20[1:20,c(1,3,5,6,9,8,26)],
                         digits=2,
                      col.names = c("County",
                                    "State",
                                    "Metro Area",
                                    "Black Progress Index",
                                    "Black Life Expectancy at birth",
                                    "Unexplained year of Black life",
                                    "Black population, 2020"))
Tableofbottom20%>%
  kable_styling()%>%
   column_spec(1, width = "25cm" )%>%
   column_spec(3, width = "35cm" )
County State Metro Area Black Progress Index Black Life Expectancy at birth Unexplained year of Black life Black population, 2020
Polk County Oregon Salem, OR 64.36 NA NA 567
Phillips County Arkansas Helena-West Helena, AR 65.36 68.12 2.76 11524
Choctaw County Oklahoma NA 66.94 68.70 1.76 1584
Lowndes County Alabama Montgomery, AL 67.41 68.48 1.07 7440
Pemiscot County Missouri NA 67.46 64.82 -2.63 4416
Leflore County Mississippi Greenwood, MS 67.78 67.29 -0.49 21473
Holmes County Mississippi Jackson, MS 68.01 67.47 -0.54 14602
Mississippi County Arkansas Blytheville, AR 68.29 66.88 -1.41 14486
Petersburg city Virginia Richmond, VA 68.56 67.28 -1.29 23659
Montgomery County Mississippi NA 68.60 67.36 -1.25 4523
Monroe County Arkansas NA 68.66 68.00 -0.66 2907
Washington County Mississippi Greenville, MS 68.79 67.26 -1.53 32445
Dallas County Alabama Selma, AL 68.84 69.94 1.09 27290
Mississippi County Missouri NA 68.90 75.49 6.59 3380
Dillon County South Carolina NA 68.92 69.31 0.39 14556
Wilkinson County Mississippi NA 69.13 68.48 -0.65 6099
St. Louis city Missouri St. Louis, MO-IL 69.15 67.80 -1.36 139140
Adams County Mississippi Natchez, MS-LA 69.16 69.85 0.69 16462
Alcorn County Mississippi Corinth, MS 69.17 71.65 2.48 4535
Wilcox County Alabama NA 69.22 68.11 -1.11 7418

A look at Population

I hoped to test for correlations between population dynamics and BPI or life expectancy, but I both ran out of time and wasn’t quite sure what to do next since the researchers already assessed these correlations using Lasso.

Black Population

blPophist <- ggplot(BlackPopulationLEData,aes(x=`Percent of population that is Black`),na.rm=TRUE)+
  geom_histogram(binwidth=0.05,fill="#277f8e", color="#e9ecef")+
  labs(title="Proportion (Percent) of Population that is Black",
       x="Proportion")+
  scale_x_continuous(breaks =seq(0,1,.1))

blPophist <- blPophist +
  theme_bw()

blPophist

Foreign Black Population

BPopForeignHist <- ggplot(CtyLevelBPI,aes(x=foreign_adults_black),na.rm=TRUE)+
  geom_histogram(binwidth=0.05,fill="#277f8e", color="#e9ecef")+
  labs(title="Proportion (Percent) of Black Population that is foreign born",
       x="Proportion")+
    scale_x_continuous(breaks =seq(0,1,.1))

BPopForeignHist <- BPopForeignHist +
  theme_bw()

BPopForeignHist

All the plots I created in the process (Mostly)

Condensed into tabs for your viewing pleasure.

Key Outcome: Life Expectancy, by state

One of my key takeaways from this plot is that Virginia has outliers at both ends of high and low life expectancy. I began an investigation into Virginia’s data points in an effort to determine which factors had the greatest impact. I did not reach a point of presenting anything from this, and I am still working to understand how the team determined the calculated “variable Importance”.

lifeexpboxplotbystate <- ggplot(
  data = CtyLevelBPI,
  mapping = aes(x=life_exp_bl, y=state))+
  geom_boxplot()+
  theme_bw()
#lifeexpboxplotbystate


VAboxplot <- lifeexpboxplotbystate+
    geom_boxplot(                               # add the highlight points
  data=subset(CtyLevelBPI, state == "Virginia"), 
  aes(x=life_exp_bl, y=state), position=position_dodge(0.8),
  color="#4ac16d", size=1, show.legend = FALSE) +
      labs(title="Life Expectancy by State",
         x="Life Expectancy (years) by County",
         y="State")
VAboxplot

Scatter of Life Expectancy, by state (with linear model)

In trying to recreate the line chart of life expectancy by BPI centile, I was more interested in viewing this as a scatterplot with a linear regression. So here it is.

AvglifeexpbyBPIscatter <- ggplot(
  data = CtyLevelBPI,
  mapping = aes(x=black_progress_index_centile,y=life_exp_bl ))+
  geom_point(color="#365c8d")+
    labs(title="Average Life Expectancy by Centile of Black Progress Index",
         x="BPI Centile",
         y="Years")

gglm <- AvglifeexpbyBPIscatter + geom_smooth(formula = y ~ x,method = "lm", se = TRUE,level=0.9, data = CtyLevelBPI,
                       color="#4ac16d",size=2)
gglm <- gglm +
  theme_bw()

gglm

Adding labels to investigate outliers - Plotly

One important thing I learned here is that I hate plotly. It is so hard to change the basic formatting. Isn’t there a better/easier way to add the hover info?? I didn’t get there.

labeledscatter <- plot_ly(CtyLevelBPIwleavg, x= ~black_progress_index_centile,y= ~life_exp_bl, 
                   color = ~life_exp_bl, type = "scatter",
                   mode = 'markers',
                  # marker=list(color= c(~life_exp_bl)),
                   hoverinfo = 'text',
                   text = ~paste('</br> State:', CtyLevelBPIwleavg$state,
                                  '</br> County:', CtyLevelBPIwleavg$county_name,
                                 '</br> BPI:', round(CtyLevelBPIwleavg$black_progress_index,2),
                                 '</br> BPI Centile:', CtyLevelBPIwleavg$black_progress_index_centile,
                                 '</br> Life Expectancy (actual):', round(CtyLevelBPIwleavg$life_exp_bl,2),
                                 '</br> Index Life Exp. Unexplained:', round(CtyLevelBPIwleavg$index_life_exp_unexplained,2)))

labeledscatter