The ‘Atlas of Redistricting’ is a project published online by Nate Silver’s “FiveThirtyEight” website in early 2018. The project describes and maps various congressional redistricting scenarios in the U.S. to explore how changes in district boundaries affect the racial and partisan makeup of Congress.
A description of the project and the data it uses can be found at the following websites.
Project description:
https://fivethirtyeight.com/features/we-drew-2568-congressional-districts-by-hand-heres-how/
Redistricting atlas:
https://projects.fivethirtyeight.com/redistricting-maps/
Redistricting atlas data:
https://github.com/fivethirtyeight/redistricting-atlas-data
The code below supports an initial review of the data sets (‘districts’, ‘county_assignments’, ‘states’).
# load R packages
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.5 v dplyr 1.0.3
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(magrittr)
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
library(readr)
library(ggplot2)
library(cowplot)
# Read in FiveThirtyEight (538) project data
districts <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/redistricting-atlas-data/master/districts.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## statefp = col_character(),
## state = col_character(),
## maptype = col_character(),
## district = col_character(),
## population = col_double(),
## population_18_over = col_double(),
## PVI = col_double(),
## dem_chance = col_double(),
## `Non-Hispanic White` = col_double(),
## `African-American` = col_double(),
## `Hispanic/Latino` = col_double(),
## Asian = col_double(),
## `Native American` = col_double(),
## `Pacific Islander` = col_double(),
## Other = col_character(),
## race_category = col_character(),
## minority_chance = col_double(),
## current_map = col_logical(),
## impossible = col_logical()
## )
county_files <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/redistricting-atlas-data/master/county_assignments.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## statefp = col_character(),
## state = col_character(),
## maptype = col_character(),
## countyfp = col_character(),
## county = col_character(),
## cd = col_character()
## )
states <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/redistricting-atlas-data/master/states.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## statefp = col_character(),
## state = col_character(),
## maptype = col_character(),
## districts = col_double(),
## county_splits = col_double(),
## efficiency_gap = col_double(),
## efficiency_gap_extra_seats = col_character(),
## district_perimeters = col_double(),
## state_perimeter = col_double(),
## interior_perimeter_measure = col_double(),
## compactness_rank = col_double()
## )
# review dataframe dimensions and components
districts%>%glimpse()
## Rows: 3,480
## Columns: 19
## $ statefp <chr> "02", "02", "02", "02", "02", "02", "02", "02"...
## $ state <chr> "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK"...
## $ maptype <chr> "Compact", "Competitive", "Dem", "GOP", "MajMi...
## $ district <chr> "00", "00", "00", "00", "00", "00", "00", "00"...
## $ population <dbl> 710231, 710231, 710231, 710231, 710231, 710231...
## $ population_18_over <dbl> 522853, 522853, 522853, 522853, 522853, 522853...
## $ PVI <dbl> -9.39, -9.39, -9.39, -9.39, -9.39, -9.39, -9.3...
## $ dem_chance <dbl> 5.40942673, 5.40942673, 5.40942673, 5.40942673...
## $ `Non-Hispanic White` <dbl> 68.27674, 68.27674, 68.27674, 68.27674, 68.276...
## $ `African-American` <dbl> 3.084806, 3.084806, 3.084806, 3.084806, 3.0848...
## $ `Hispanic/Latino` <dbl> 4.673780, 4.673780, 4.673780, 4.673780, 4.6737...
## $ Asian <dbl> 5.3328565, 5.3328565, 5.3328565, 5.3328565, 5....
## $ `Native American` <dbl> 13.2700778, 13.2700778, 13.2700778, 13.2700778...
## $ `Pacific Islander` <dbl> 0.85989752, 0.85989752, 0.85989752, 0.85989752...
## $ Other <chr> "4.501838948997137%", "4.501838948997137%", "4...
## $ race_category <chr> "Non-Hispanic White Majority", "Non-Hispanic W...
## $ minority_chance <dbl> 9.8894051, 9.8894051, 9.8894051, 9.8894051, 9....
## $ current_map <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ impossible <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
county_files%>%glimpse()
## Rows: 27,959
## Columns: 6
## $ statefp <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01", "01"...
## $ state <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"...
## $ maptype <chr> "Compact", "Compact", "Compact", "Compact", "Compact", "Co...
## $ countyfp <chr> "01001", "01003", "01005", "01007", "01009", "01011", "010...
## $ county <chr> "Autauga County", "Baldwin County", "Barbour County", "Bib...
## $ cd <chr> "07", "01", "02", "07", "04", "02", "07", "03", "02", "03"...
states%>%glimpse()
## Rows: 400
## Columns: 11
## $ statefp <chr> "02", "02", "02", "02", "02", "02", "02"...
## $ state <chr> "AK", "AK", "AK", "AK", "AK", "AK", "AK"...
## $ maptype <chr> "Compact", "Competitive", "Dem", "GOP", ...
## $ districts <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 7, 7, 7, 7, 7, 7...
## $ county_splits <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 5, 12, 8, 8, 8, ...
## $ efficiency_gap <dbl> NA, NA, NA, NA, NA, NA, NA, NA, -0.05672...
## $ efficiency_gap_extra_seats <chr> NA, NA, NA, NA, NA, NA, NA, NA, "D+0", "...
## $ district_perimeters <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 55.39691...
## $ state_perimeter <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 18.98561...
## $ interior_perimeter_measure <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 18.20565...
## $ compactness_rank <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 1, 6, 3,...
Are political outcomes (Republican vs. Democratic congressional seats) in the redistricting scenarios related to changing percentages of minority voters at the state level?
I have organized/cleaned the ‘districts’ dataset in order to assess the question above.
# create a view of the districts dataframe
districts%>%view()
# return total number of missing values
sprintf("The total number of NA and NAN is %d", sum(is.na(districts)))
## [1] "The total number of NA and NAN is 6714"
# identify/count missing values in district by column
map(districts, ~sum(is.na(.))) #-- > using purrr, note (.) refers to cols
## $statefp
## [1] 0
##
## $state
## [1] 0
##
## $maptype
## [1] 0
##
## $district
## [1] 0
##
## $population
## [1] 0
##
## $population_18_over
## [1] 0
##
## $PVI
## [1] 0
##
## $dem_chance
## [1] 0
##
## $`Non-Hispanic White`
## [1] 0
##
## $`African-American`
## [1] 0
##
## $`Hispanic/Latino`
## [1] 0
##
## $Asian
## [1] 0
##
## $`Native American`
## [1] 0
##
## $`Pacific Islander`
## [1] 0
##
## $Other
## [1] 0
##
## $race_category
## [1] 0
##
## $minority_chance
## [1] 0
##
## $current_map
## [1] 3321
##
## $impossible
## [1] 3393
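The same per-column counts can be collapsed into a single named vector with base R's `colSums()` applied to the logical matrix that `is.na()` returns. The sketch below uses a small made-up data frame (`toy`, not part of the atlas data) to illustrate the idea.

```r
# Toy data frame standing in for `districts` (values are made up)
toy <- data.frame(
  state       = c("AK", "AL", "TX"),
  population  = c(710231, NA, 698488),
  current_map = c(NA, NA, TRUE)
)

# is.na() returns a logical matrix; colSums() counts the TRUEs in each column
na_counts <- colSums(is.na(toy))

# Show only the columns that actually contain missing values
na_counts[na_counts > 0]
```

Applied to `districts`, this should report only `current_map` and `impossible`, matching the list output above.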
# count the number of duplicate rows
sprintf("The number of duplicate rows is %d", sum(duplicated(districts)))
## [1] "The number of duplicate rows is 0"
# select subset of columns for new dataframe
d <- districts%>%select(-c(current_map, impossible))
#set column names to lower case
names(d)%<>%tolower
#update column names for districts dataframe
d%<>%dplyr::rename(state_fips_code=statefp, district_number=district,cook_partisan_index=pvi, non_hispanic_white=`non-hispanic white`, african_american =`african-american`, hispanic_latino =`hispanic/latino`, native_american =`native american`, pacific_islander =`pacific islander`)
# remove trailing '%' from values in Other column
d<-separate(data = d, col = other, into = c("other"), sep = "%")
## Warning: Expected 1 pieces. Additional pieces discarded in 3480 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
# change 'other' col to dbl format
d%<>%mutate(other = as.numeric(other))
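A possible alternative that avoids the `separate()` warning is to strip the trailing `%` and convert to numeric in one step. This is a minimal sketch in base R; `other_raw` is a hypothetical vector that mimics the format of the `Other` column.

```r
# Hypothetical values in the same "number%" format as the Other column
other_raw <- c("4.501838948997137%", "1.05%", "0%")

# Remove a trailing "%" (if present) and convert the remainder to numeric
other_num <- as.numeric(sub("%$", "", other_raw))

other_num
```

`readr::parse_number()` offers similar behavior for strings that mix digits with symbols.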
# pivot select cols to long form
d<-pivot_longer(d, cols=9:15, names_to = 'ethnicity', values_to ='percent_of_voters')
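Selecting columns by position (`9:15`) picks the wrong columns without warning if the column order ever changes. A hedged alternative is to select them by name. The sketch below uses a made-up two-district table (`wide`) with only two of the ethnicity columns; for the real data, `non_hispanic_white:other` should cover the same seven columns, assuming the order shown in the column specification.

```r
library(tidyr)

# Toy wide table with two of the ethnicity columns (values are made up)
wide <- data.frame(
  state              = c("TX", "TX"),
  district_number    = c("01", "02"),
  non_hispanic_white = c(60.5, 40.2),
  hispanic_latino    = c(25.0, 45.3)
)

# Select the columns to pivot by name instead of by position
long <- pivot_longer(
  wide,
  cols      = c(non_hispanic_white, hispanic_latino),
  names_to  = "ethnicity",
  values_to = "percent_of_voters"
)

long
```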
#reduce number of categorical variables in race_category column
d <- mutate(d, race_category = ifelse(race_category == "Non-Hispanic White Majority", "white_majority", "non_white_majority"))
# subset dataframe on maptypes, rows: current, democrat, republican, competitive ("|" --> 'or')
d <- filter(d, maptype == "current" | maptype =="Dem" | maptype =="GOP" | maptype == "Competitive")
# rename category values in maptype column
d<-d%>%mutate(maptype=recode(maptype, 'Competitive'='competitive', 'Dem'='democrat', 'GOP'='republican'))
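The chained `|` comparisons and the `recode()` call can also be written with `%in%` and a named lookup vector, which keeps the list of scenarios in one place. This is only a sketch on made-up data; `toy`, `keep`, and `labels` are illustrative names, not objects from the analysis.

```r
library(dplyr)

# Toy table with a few of the maptype values seen in the data
toy <- data.frame(
  state   = "TX",
  maptype = c("current", "Compact", "Dem", "GOP", "Competitive")
)

# Scenarios to keep
keep <- c("current", "Dem", "GOP", "Competitive")

# Lookup vector: names are the old labels, values are the new labels
labels <- c(current = "current", Dem = "democrat",
            GOP = "republican", Competitive = "competitive")

toy_subset <- toy %>%
  filter(maptype %in% keep) %>%                # same effect as chained "|" tests
  mutate(maptype = unname(labels[maptype]))    # relabel via lookup

toy_subset
```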
# review updates to dataframe
head(d, 5)
## # A tibble: 5 x 12
## state_fips_code state maptype district_number population population_18_o~
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 02 AK compet~ 00 710231 522853
## 2 02 AK compet~ 00 710231 522853
## 3 02 AK compet~ 00 710231 522853
## 4 02 AK compet~ 00 710231 522853
## 5 02 AK compet~ 00 710231 522853
## # ... with 6 more variables: cook_partisan_index <dbl>, dem_chance <dbl>,
## # race_category <chr>, minority_chance <dbl>, ethnicity <chr>,
## # percent_of_voters <dbl>
Here we compute basic statistical measures for all numerical variables in the dataset.
In addition, we compare changes in the percent of minorities voting in Texas (a potential swing state) under two scenarios: ‘current’ vs. ‘competitive’.
The former data are drawn from 2010 census results, while the latter reflect estimates based on redistricting to enhance two-party competitiveness at the district level.
# return statistical measures for numerical variables
summary(d)
## state_fips_code state maptype district_number
## Length:12180 Length:12180 Length:12180 Length:12180
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## population population_18_over cook_partisan_index dem_chance
## Min. :525777 Min. :410765 Min. :-34.1800 Min. : 0.003
## 1st Qu.:698180 1st Qu.:519620 1st Qu.:-11.1025 1st Qu.: 3.280
## Median :705974 Median :542062 Median : -1.4500 Median : 37.832
## Mean :708375 Mean :538075 Mean : 0.4453 Mean : 47.488
## 3rd Qu.:720932 3rd Qu.:557644 3rd Qu.: 9.7950 3rd Qu.: 94.897
## Max. :989415 Max. :765852 Max. : 44.4800 Max. :100.000
## race_category minority_chance ethnicity percent_of_voters
## Length:12180 Min. : 0.2844 Length:12180 Min. : 0.00674
## Class :character 1st Qu.: 1.5536 Class :character 1st Qu.: 0.50337
## Mode :character Median : 4.5669 Mode :character Median : 2.12516
## Mean :19.3643 Mean :14.28587
## 3rd Qu.:19.5905 3rd Qu.:12.47718
## Max. :99.3074 Max. :96.97341
# a boxplot graph of percent of votes by ethnicity in 2010
(ethnic1 <- d%>%group_by(state)%>%filter(state =='TX')%>%filter(maptype == 'current')%>%group_by(ethnicity) %>%ggplot(aes(x=ethnicity,y=percent_of_voters))+geom_boxplot()+coord_flip()+ggtitle('Ethnic Breakdown of Voters in Texas,\n 2010')+theme(plot.title = element_text(hjust=0.5)))
# a boxplot graph of percent of votes by ethnicity in the competitive scenario
(ethnic2 <- d%>%group_by(state)%>%filter(state =='TX')%>%filter(maptype == 'competitive')%>%group_by(ethnicity)%>%ggplot(aes(x=ethnicity,y=percent_of_voters))+geom_boxplot()+coord_flip()+ggtitle('Ethnic Breakdown of Voters in Texas,\n Competitive Scenario')+theme(plot.title = element_text(hjust=0.5)))
# calculate the mean/median percent of voting by ethnic groups compared between current and competitive scenarios
(scenario <- d%>%group_by(maptype, ethnicity)%>%filter(state =='TX')%>%filter(maptype == 'current'| maptype=='competitive')%>%summarize(mean_pct = mean(percent_of_voters), median_pct=median(percent_of_voters)))
## `summarise()` has grouped output by 'maptype'. You can override using the `.groups` argument.
## # A tibble: 14 x 4
## # Groups: maptype [2]
## maptype ethnicity mean_pct median_pct
## <chr> <chr> <dbl> <dbl>
## 1 competitive african_american 11.4 10.2
## 2 competitive asian 3.91 2.44
## 3 competitive hispanic_latino 34.2 24.0
## 4 competitive native_american 0.336 0.334
## 5 competitive non_hispanic_white 49.1 49.9
## 6 competitive other 1.05 1.08
## 7 competitive pacific_islander 0.0702 0.0469
## 8 current african_american 11.4 9.15
## 9 current asian 3.91 2.44
## 10 current hispanic_latino 34.2 23.3
## 11 current native_american 0.336 0.332
## 12 current non_hispanic_white 49.1 57.7
## 13 current other 1.05 1.05
## 14 current pacific_islander 0.0703 0.0486
# compare scenarios using stacked barplot
ggplot(scenario, aes(fill=ethnicity, y=mean_pct, x=maptype))+geom_bar(position='stack', stat='identity')+ggtitle('Mean Percentage of Voters\n in Two Districting Scenarios\n in Texas')+theme(plot.title = element_text(hjust=0.5))+theme(axis.title.x = element_blank())+theme(axis.title.y = element_blank())
ggplot(scenario, aes(fill=ethnicity, y=median_pct, x=maptype))+geom_bar(position='stack', stat='identity')+ggtitle('Median Percentage of Voters\n in Two Districting Scenarios\n in Texas')+theme(plot.title = element_text(hjust=0.5))+theme(axis.title.x = element_blank())+theme(axis.title.y = element_blank())
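The summary table above reports nearly identical means but different medians for the two scenarios. One plausible explanation, assuming districts of roughly equal population, is that redistricting only moves residents between districts: the statewide total, and hence the mean across districts, stays fixed while the distribution across districts shifts. The numbers below are invented purely to illustrate this effect.

```r
# Percent non-Hispanic white in five hypothetical, equal-population districts
plan_a <- c(20, 30, 60, 65, 70)  # group concentrated in three districts
plan_b <- c(45, 48, 50, 50, 52)  # same residents spread more evenly

# Means agree because the statewide total is unchanged
c(mean_a = mean(plan_a), mean_b = mean(plan_b))

# Medians differ because the distribution across districts changed
c(median_a = median(plan_a), median_b = median(plan_b))
```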
## Findings and Recommendations
Initial review of the data indicates the following:
Non-Hispanic whites comprise the largest voting bloc in Texas, followed by Latinos and African Americans. The between-district variance in each group's share correlates with the overall percentage of votes attributable to that group.
There are no state-level differences in the mean percent of votes by ethnic group compared between the ‘current’ vs. ‘competitive’ scenarios.
The state-level median percentage of votes by non-Hispanic whites decreased from 57.7% to 49.9% between the ‘current’ and ‘competitive’ scenarios, indicating a decrease in the relative percentage of white voters in select districts as a result of redistricting. In contrast, the state-level median values for other ethnic groups remained relatively unchanged.
Additional analyses should focus at the district level in order to explicate factors that shape election outcomes at the state level. These factors may include voter population densities, district geometries, etc.
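The three sets of data provided by the authors do not share a single unique key: the column specifications above show only `statefp`, `state`, and `maptype` in common, and the district identifier is named `district` in one set and `cd` in another. Linking these sets for more extensive analyses will require composite keys or additional information.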
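Since `statefp` and `maptype` appear in the column specifications for all three files, one option worth testing is a join on that pair of columns. The sketch below uses made-up stand-ins (`districts_toy`, `states_toy`) rather than the real data, and it assumes the two columns are coded consistently across files, which has not been verified here.

```r
library(dplyr)

# Toy stand-ins for `districts` and `states` (values are made up)
districts_toy <- data.frame(
  statefp    = c("48", "48", "48", "48"),
  maptype    = c("current", "current", "Competitive", "Competitive"),
  district   = c("01", "02", "01", "02"),
  dem_chance = c(5, 95, 45, 55)
)
states_toy <- data.frame(
  statefp       = c("48", "48"),
  maptype       = c("current", "Competitive"),
  county_splits = c(30, 25)
)

# Attach state-level scenario attributes to each district row
joined <- left_join(districts_toy, states_toy, by = c("statefp", "maptype"))

joined
```

Rows left with `NA` in the state-level columns after such a join would flag scenarios that fail to match between files.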