Published in September 2002, the Black Progress Index was developed in partnership by the NAACP and the Brookings Institution. The index is intended to be a tool that provides a means to understand the factors that negatively or positively impact the health and well being of Black people in the United States.
I chose to use this data set because it is neatly structured and accessible. The csv files can be downloaded from the interactive data-viz. In the time available, I only began to investigate the county level data. The metro level data is structured identically to the county level data summarized here. There is also a data dictionary. I found this data to be surprisingly difficult to work with and I would like to spend a lot more time working with it.
vtable(CtyLevelBPIVariableImportance)| Name | Class | Values |
|---|---|---|
| ctyfip | character | |
| trait | character | |
| state | character | |
| county_name | character | |
| cbsa_name | character | |
| effect_on_life_expectancy | numeric | Num: -13.287 to 19.1 |
| local_value_standardized | numeric | Num: -21.111 to 25.689 |
| label | character |
#vtable(MetroLevelBPI)
#vtable(MetroLevelBPIVariableImportance)I spent a ridiculous amount of time trying to make this smaller and scrollable. Alas, I gave up.
vtable(CtyLevelBPI)| Name | Class | Values |
|---|---|---|
| county_name | character | |
| ctyfip | character | |
| state | character | |
| cbsa | numeric | Num: 10140 to 49780 |
| cbsa_name | character | |
| black_progress_index | numeric | Num: 64.365 to 83.492 |
| black_progress_index_centile | numeric | Num: 1 to 100 |
| index_life_exp_unexplained | numeric | Num: -12.565 to 19.1 |
| life_exp_bl | numeric | Num: 62.897 to 96.169 |
| lnb_mort | numeric | Num: 5.772 to 11.51 |
| low_birth_bl | numeric | Num: 0.049 to 0.3 |
| foreign_adults_black | numeric | Num: 0 to 0.727 |
| math_black | numeric | Num: 6.876 to 83.833 |
| pop25_ba_higher_rate_black | numeric | Num: 0 to 72.4 |
| black_homeowner_rate | numeric | Num: 0 to 1 |
| business_owner_rate | numeric | Num: 0 to 0.045 |
| ln_b_medhh_inc | numeric | Num: 0.693 to 11.851 |
| density | numeric | Num: 19.8 to 88862.266 |
| air_pollution | numeric | Num: 3.4 to 20.9 |
| gun_deaths_pc | numeric | Num: 1.736 to 65.08 |
| totrate | numeric | Num: 90.799 to 1924.609 |
| child_no_father | numeric | Num: 0 to 1 |
| distance_from_friend | numeric | Num: 34.282 to 5293.303 |
| bl_bike_walk | numeric | Num: 0 to 0.483 |
| pblack | numeric | Num: 0.001 to 0.878 |
| pop_total_one_black | numeric | Num: 40 to 1202260 |
| bpop_under2500 | numeric | Num: 0 to 1 |
| bpop_2500_50k | numeric | Num: 0 to 1 |
| bpop_50000 | numeric | Num: 0 to 1 |
| index_life_exp_unexplained_cent | numeric | Num: 1 to 100 |
| predict | numeric | Num: 19.081 to 25.694 |
| index | numeric | Num: 1 to 100 |
I added this table at the last minute because I was hoping to investigate whether there is a correlation between percentage of the population that is black, or total black population, and BPI and/or life expectancy. I ran out of time and did not reach any meaningful conclusions, but I was happy to find this table available in The vast gulfs in Black life expectancy across the U.S. section of the Brookings report.
vtable(BlackPopulationLEData)| Name | Class | Values |
|---|---|---|
| cbsa_name | character | |
| state | character | |
| county_name | character | |
| Black life expectancy | numeric | Num: 62.9 to 96.2 |
| Black progress index | numeric | Num: 64.4 to 83.5 |
| Percent of population that is Black | numeric | Num: 0 to 0.88 |
| Black population | numeric | Num: 40 to 1202260 |
| Black population is under 2500 | numeric | Num: 0 to 1 |
| Black populaiton is between 2500 and 50000 | numeric | Num: 0 to 1 |
| Black population is 50000 or over | numeric | Num: 0 to 1 |
Here is a summary of only the data points I had time to explore. I also spent a lot of time trying to make this section look fancy. Again, I failed. This horrible presentation is my desperate last-minute hack job. BUT, true story: I really did use these summaries a lot during my analysis to guide the axis limits on charts and to check my work against the BPI reporting. Very helpful.
summary(CtyLevelBPI$black_progress_index)## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 64.36 71.94 73.54 74.02 75.84 83.49 234
summary(CtyLevelBPI$black_progress_index_centile)## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.00 25.50 50.00 50.47 75.00 100.00 234
summary(CtyLevelBPI$state)## Length Class Mode
## 1677 character character
summary(CtyLevelBPI$county_name)## Length Class Mode
## 1677 character character
summary(CtyLevelBPI$ctyfip)## Length Class Mode
## 1677 character character
summary(BlackPopulationLEData$`Percent of population that is Black`)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0300 0.0900 0.1594 0.2400 0.8800
summary(BlackPopulationLEData$`Black population`)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 40 1748 4790 24412 14363 1202260
I learn best by having a way of checking my work. So, first I set out to recreate the images from the Brookings report. This took up all my time and was a great learning exercise. Here I learned how to add and display PNG files in RMD. The interactive versions are in the Chart tabs below.
BPI Line from website
For this chart I had to create a calculated field from “CtyLevelBPI$life_exp_bl”
gg.lebycentile <- ggplot(data=CtyLevelBPIwleavg , aes(x=black_progress_index_centile, y=name))+
geom_line(size=1.5,color="#21918c")+
scale_x_continuous(limits=c(1,100))+
scale_y_continuous(limits=c(65,85))+
labs(title="Average Life Expectancy by Centile of Black Progress Index",
x="BPI Centile",
y="Years")
gg.lebycentile <- gg.lebycentile +
theme_bw()
gg.lebycentile Here I spent a lot of time trying to figure out why I had whole states missing from my data map. I finally learned that the county fips are missing leading zeros in the csv file. After spending a small but way long enough amount of time trying to fix that in R, I gave up and reformatted the csv column and re-imported it instead. So, if you download the csv files from the website you may run into this problem as well.
BPI Map from website
I spent way too much time trying to adjust the map inside the object and change the legend, to no avail. I liked looking at the map better with the gradient than with the ranges. Also, it was easier to apply the gradient than to create ranges and display with them.
usmapBPIbycounty <- plot_ly()
#summary(CtyLevelBPI$black_progress_index)
usmapBPIbycounty <- usmapBPIbycounty %>%
add_trace(
type="choropleth",
geojson=counties,
locations=CtyLevelBPIwleavg$ctyfip,
z=CtyLevelBPIwleavg$black_progress_index,
# colorscale="Viridis",
zmin=64,
zmax=84,
marker=list(line=list(
width=0),color= CtyLevelBPI$black_progress_index),
hoverinfo = 'text',
text = ~paste('</br> State:', CtyLevelBPIwleavg$state,
'</br> County:', CtyLevelBPIwleavg$county_name,
'</br> BPI:', round(CtyLevelBPIwleavg$black_progress_index,2),
'</br> BPI Centile:', CtyLevelBPIwleavg$black_progress_index_centile,
'</br> Life Expectancy (actual):', round(CtyLevelBPIwleavg$life_exp_bl,2),
'</br> Index Life Exp. Unexplained:', round(CtyLevelBPIwleavg$index_life_exp_unexplained,2)))
usmapBPIbycounty <- usmapBPIbycounty %>% colorbar(title = "Black Progress Index")
usmapBPIbycounty <- usmapBPIbycounty %>% layout(
title = "Black Progress Index by County")
usmapBPIbycounty <- usmapBPIbycounty %>% layout(
geo = geolist)
usmapBPIbycounty#options(knitr.kable.NA = '')
top20 <- slice_max(CtyLevelBPI,black_progress_index, n=20)
TableofTop20 <- kable(top20[1:20,c(1,3,5,6,9,8,26)],
digits=2,
col.names = c("County",
"State",
"Metro Area",
"Black Progress Index",
"Black Life Expectancy at birth",
"Unexplained year of Black life",
"Black population, 2020"))
TableofTop20%>%
kable_styling()%>%
column_spec(1, width = "25cm" )%>%
column_spec(3, width = "45cm" )| County | State | Metro Area | Black Progress Index | Black Life Expectancy at birth | Unexplained year of Black life | Black population, 2020 |
|---|---|---|---|---|---|---|
| Putnam County | New York | New York-Newark-Jersey City, NY-NJ-PA | 83.49 | 85.84 | 2.35 | 3349 |
| Warrick County | Indiana | Evansville, IN-KY | 83.18 | NA | NA | 1147 |
| Scott County | Minnesota | Minneapolis-St. Paul-Bloomington, MN-WI | 82.86 | 89.68 | 6.82 | 6286 |
| Cumberland County | Maine | Portland-South Portland, ME | 82.84 | 79.73 | -3.11 | 9192 |
| Loudoun County | Virginia | Washington-Arlington-Alexandria, DC-VA-MD-WV | 82.68 | 82.37 | -0.30 | 31241 |
| Collier County | Florida | Naples-Marco Island, FL | 82.56 | 86.00 | 3.45 | 25613 |
| Rockingham County | New Hampshire | Boston-Cambridge-Newton, MA-NH | 82.21 | 82.21 | 0.00 | 2704 |
| Fairfax County | Virginia | Washington-Arlington-Alexandria, DC-VA-MD-WV | 82.16 | 83.08 | 0.92 | 112024 |
| Snohomish County | Washington | Seattle-Tacoma-Bellevue, WA | 82.16 | 83.14 | 0.98 | 26644 |
| Delaware County | Ohio | Columbus, OH | 82.01 | 79.32 | -2.70 | 7153 |
| Prince William County | Virginia | Washington-Arlington-Alexandria, DC-VA-MD-WV | 81.84 | 79.29 | -2.55 | 97848 |
| Washington County | Oregon | Portland-Vancouver-Hillsboro, OR-WA | 81.57 | 82.54 | 0.97 | 12931 |
| Barnstable County | Massachusetts | Barnstable Town, MA | 81.48 | 76.53 | -4.95 | 6098 |
| Wright County | Minnesota | Minneapolis-St. Paul-Bloomington, MN-WI | 81.42 | 82.82 | 1.40 | 2366 |
| Forsyth County | Georgia | Atlanta-Sandy Springs-Alpharetta, GA | 81.40 | 81.36 | -0.05 | 8574 |
| Saratoga County | New York | Albany-Schenectady-Troy, NY | 81.39 | 81.76 | 0.37 | 3906 |
| Montgomery County | Maryland | Washington-Arlington-Alexandria, DC-VA-MD-WV | 81.34 | 81.15 | -0.20 | 193450 |
| Nassau County | New York | New York-Newark-Jersey City, NY-NJ-PA | 81.29 | 78.98 | -2.32 | 157724 |
| Dallas County | Iowa | Des Moines-West Des Moines, IA | 81.16 | 85.84 | 4.67 | 1720 |
| Plymouth County | Massachusetts | Boston-Cambridge-Newton, MA-NH | 81.15 | 81.43 | 0.27 | 48800 |
bottom20 <- slice_min(CtyLevelBPI,black_progress_index, n=20)
Tableofbottom20 <- kable(bottom20[1:20,c(1,3,5,6,9,8,26)],
digits=2,
col.names = c("County",
"State",
"Metro Area",
"Black Progress Index",
"Black Life Expectancy at birth",
"Unexplained year of Black life",
"Black population, 2020"))
Tableofbottom20%>%
kable_styling()%>%
column_spec(1, width = "25cm" )%>%
column_spec(3, width = "35cm" )| County | State | Metro Area | Black Progress Index | Black Life Expectancy at birth | Unexplained year of Black life | Black population, 2020 |
|---|---|---|---|---|---|---|
| Polk County | Oregon | Salem, OR | 64.36 | NA | NA | 567 |
| Phillips County | Arkansas | Helena-West Helena, AR | 65.36 | 68.12 | 2.76 | 11524 |
| Choctaw County | Oklahoma | NA | 66.94 | 68.70 | 1.76 | 1584 |
| Lowndes County | Alabama | Montgomery, AL | 67.41 | 68.48 | 1.07 | 7440 |
| Pemiscot County | Missouri | NA | 67.46 | 64.82 | -2.63 | 4416 |
| Leflore County | Mississippi | Greenwood, MS | 67.78 | 67.29 | -0.49 | 21473 |
| Holmes County | Mississippi | Jackson, MS | 68.01 | 67.47 | -0.54 | 14602 |
| Mississippi County | Arkansas | Blytheville, AR | 68.29 | 66.88 | -1.41 | 14486 |
| Petersburg city | Virginia | Richmond, VA | 68.56 | 67.28 | -1.29 | 23659 |
| Montgomery County | Mississippi | NA | 68.60 | 67.36 | -1.25 | 4523 |
| Monroe County | Arkansas | NA | 68.66 | 68.00 | -0.66 | 2907 |
| Washington County | Mississippi | Greenville, MS | 68.79 | 67.26 | -1.53 | 32445 |
| Dallas County | Alabama | Selma, AL | 68.84 | 69.94 | 1.09 | 27290 |
| Mississippi County | Missouri | NA | 68.90 | 75.49 | 6.59 | 3380 |
| Dillon County | South Carolina | NA | 68.92 | 69.31 | 0.39 | 14556 |
| Wilkinson County | Mississippi | NA | 69.13 | 68.48 | -0.65 | 6099 |
| St. Louis city | Missouri | St. Louis, MO-IL | 69.15 | 67.80 | -1.36 | 139140 |
| Adams County | Mississippi | Natchez, MS-LA | 69.16 | 69.85 | 0.69 | 16462 |
| Alcorn County | Mississippi | Corinth, MS | 69.17 | 71.65 | 2.48 | 4535 |
| Wilcox County | Alabama | NA | 69.22 | 68.11 | -1.11 | 7418 |
I hoped to test for correlations between population dynamics and BPI or life expectancy, but I both ran out of time and wasn’t quite sure what to do next since the researchers already assessed these correlations using Lasso.
blPophist <- ggplot(BlackPopulationLEData,aes(x=`Percent of population that is Black`),na.rm=TRUE)+
geom_histogram(binwidth=0.05,fill="#277f8e", color="#e9ecef")+
labs(title="Proportion (Percent) of Population that is Black",
x="Proportion")+
scale_x_continuous(breaks =seq(0,1,.1))
blPophist <- blPophist +
theme_bw()
blPophistBPopForeignHist <- ggplot(CtyLevelBPI,aes(x=foreign_adults_black),na.rm=TRUE)+
geom_histogram(binwidth=0.05,fill="#277f8e", color="#e9ecef")+
labs(title="Proportion (Percent) of Black Population that is foreign born",
x="Proportion")+
scale_x_continuous(breaks =seq(0,1,.1))
BPopForeignHist <- BPopForeignHist +
theme_bw()
BPopForeignHistCondensed into tabs for your viewing pleasure.
One of my key takeaways from this plot is that Virginia has outliers at both ends of high and low life expectancy. I began an investigation into Virginia’s data points in an effort to determine which factors had the greatest impact. I did not reach a point of presenting anything from this, and I am still working to understand how the team determined the calculated “variable Importance”.
lifeexpboxplotbystate <- ggplot(
data = CtyLevelBPI,
mapping = aes(x=life_exp_bl, y=state))+
geom_boxplot()+
theme_bw()
#lifeexpboxplotbystate
VAboxplot <- lifeexpboxplotbystate+
geom_boxplot( # add the highlight points
data=subset(CtyLevelBPI, state == "Virginia"),
aes(x=life_exp_bl, y=state), position=position_dodge(0.8),
color="#4ac16d", size=1, show.legend = FALSE) +
labs(title="Life Expectancy by State",
x="Life Expectancy (years) by County",
y="State")
VAboxplotIn trying to recreate the line chart of life expectancy by BPI centile, I was more interested in viewing this as a scatterplot with a linear regression. So here it is.
AvglifeexpbyBPIscatter <- ggplot(
data = CtyLevelBPI,
mapping = aes(x=black_progress_index_centile,y=life_exp_bl ))+
geom_point(color="#365c8d")+
labs(title="Average Life Expectancy by Centile of Black Progress Index",
x="BPI Centile",
y="Years")
gglm <- AvglifeexpbyBPIscatter + geom_smooth(formula = y ~ x,method = "lm", se = TRUE,level=0.9, data = CtyLevelBPI,
color="#4ac16d",size=2)
gglm <- gglm +
theme_bw()
gglmOne important thing I learned here is that I hate plotly. It is so hard to change the basic formatting. Isn’t there a better/easier way to add the hover info?? I didn’t get there.
labeledscatter <- plot_ly(CtyLevelBPIwleavg, x= ~black_progress_index_centile,y= ~life_exp_bl,
color = ~life_exp_bl, type = "scatter",
mode = 'markers',
# marker=list(color= c(~life_exp_bl)),
hoverinfo = 'text',
text = ~paste('</br> State:', CtyLevelBPIwleavg$state,
'</br> County:', CtyLevelBPIwleavg$county_name,
'</br> BPI:', round(CtyLevelBPIwleavg$black_progress_index,2),
'</br> BPI Centile:', CtyLevelBPIwleavg$black_progress_index_centile,
'</br> Life Expectancy (actual):', round(CtyLevelBPIwleavg$life_exp_bl,2),
'</br> Index Life Exp. Unexplained:', round(CtyLevelBPIwleavg$index_life_exp_unexplained,2)))
labeledscatter