SC County Choropleth Notes

Detailed notes on how to obtain a dataset and display county-level data on a choropleth map using choroplethr

The process

Start with downloading data from the US Census Site
Manipulate the data to get it into the format required by the visualization packages
Display the results

Examples

Example 1: State Population
Example 2: Majority Race

Download US Census Data - Population Estimates

US Census population estimates are available by county (Downloadable Data Sets)

These examples use South Carolina data

File layout - A description of the fields in the csv data file, including how to convert coded values (such as year) to real year values
South Carolina csv file

Field Descriptions - Year

Year is encoded as described below

Year	Corresponds to
1	4/1/2010 Census
2	4/1/2010 estimates base
3	7/1/2010 estimate
4	7/1/2011 estimate
5	7/1/2012 estimate
6	7/1/2013 estimate
7	7/1/2014 estimate
8	7/1/2015 estimate
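
For the July 1 estimates (codes 3 through 8 in the table above), the calendar year is the code plus 2007. A minimal sketch of that conversion; the helper name estimate_year is hypothetical and is not part of the Census file or of any package used here:

```r
# Hypothetical helper: map a YEAR code to the calendar year of its July 1 estimate.
# Codes 1 and 2 refer to April 1, 2010 figures, so they return NA here.
estimate_year <- function(code) {
  ifelse(code >= 3 & code <= 8, 2007L + as.integer(code), NA_integer_)
}

# Year codes 3, 6 and 8 correspond to the 2010, 2013 and 2015 estimates
estimate_year(c(3, 6, 8))
```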

Field Descriptions - COUNTY

The mapping software requires the FIPS county code, which can be computed from this dataset as STATE * 1000 + COUNTY
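
As a worked example of the formula above: South Carolina's state code is 45, so county code 1 becomes 45001 and county code 19 becomes 45019. A small sketch, with the county codes chosen only for illustration:

```r
# STATE and COUNTY as they would appear in the csv file (illustrative values)
state  <- 45L          # South Carolina
county <- c(1L, 19L)   # two example county codes

# County FIPS code = STATE * 1000 + COUNTY
fips <- state * 1000L + county
fips
```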

Install Required Packages

These examples use the following packages (use install.packages to install them before running the examples)

dplyr
choroplethr
choroplethrMaps
ggplot2
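
One way to install only what is missing is sketched below. The helper missing_packages is hypothetical (it is not part of any of these packages), and the install call is left commented out so that nothing is downloaded unless you choose to run it:

```r
# Packages used by the examples in these notes
required <- c("dplyr", "choroplethr", "choroplethrMaps", "ggplot2")

# Hypothetical helper: return the subset of pkgs that is not yet installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Uncomment to install whatever is missing:
# install.packages(missing_packages(required))
```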

Example 1: Display population

Explanation of data pipeline

dplyr is used to extract and map the data to get it into the format needed.

Filter data to include only year of interest

The file contains eight year codes, so first filter the data to keep only the year of interest: in this case year code 6, which is the 7/1/2013 population estimate

  filter(YEAR == 6)

Example 1: Display population (continued)

Create the region value as the county FIPS code, which is computed by combining the state and county codes in the dataset
The field TOT_POP contains the population by year, county, and age group.
transmute keeps only the columns it is given and drops all others.
The mapping software requires the two columns to be called region and value, so we use these names when we create them

transmute(region=as.numeric(COUNTY) + as.numeric(STATE)*1000, value = TOT_POP)

Example 1: Display population (continued)

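The data contains population estimates by age group, which we don’t want, so we sum the values within each county (grouping on the region field, which contains the county FIPS code that the mapping software requires). Per the Census file layout, AGEGRP 0 is itself the all-ages total, so those rows need to be excluded before summing; otherwise every county is counted twice.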

group_by(region) %>%
  summarise(value = sum(value)) %>%

Example 1: Display population (continued)

Recapping all the data processing steps

library(choroplethr)
library(ggplot2)
library(choroplethrMaps)
library(dplyr)

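data <- 
  read.csv("https://www.census.gov/popest/data/counties/asrh/2015/files/CC-EST2015-ALLDATA-45.csv",skipNul=TRUE) %>%
  # Keep year code 6 (7/1/2013). AGEGRP 0 is the all-ages total in the Census
  # file layout, so drop it to avoid double counting in the sum below.
  filter(YEAR == 6, AGEGRP != 0) %>%
  # region = county FIPS code, value = population
  transmute(region=as.numeric(COUNTY) + as.numeric(STATE)*1000, value = TOT_POP) %>%
  group_by(region) %>%
  summarise(value = sum(value)) %>%
  select(region,value)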
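
To see what the pipeline produces without downloading anything, the same steps can be run on a tiny made-up data frame. The numbers below are invented for illustration and only the column names follow the Census file; the sketch also assumes, per the Census file layout, that AGEGRP 0 is the all-ages total row:

```r
library(dplyr)

# Made-up rows: two counties, two year codes, age groups 0 (total), 1 and 2
toy <- data.frame(
  STATE   = 45,
  COUNTY  = c(1, 1, 1, 1, 3, 3, 3, 3),
  YEAR    = c(5, 6, 6, 6, 5, 6, 6, 6),
  AGEGRP  = c(0, 0, 1, 2, 0, 0, 1, 2),
  TOT_POP = c(90, 100, 60, 40, 45, 50, 20, 30)
)

result <- toy %>%
  filter(YEAR == 6, AGEGRP != 0) %>%   # one year, no total rows
  transmute(region = STATE * 1000 + COUNTY, value = TOT_POP) %>%
  group_by(region) %>%
  summarise(value = sum(value))        # add up the age groups per county

result
```

The result has one row per county: region 45001 with value 100 and region 45003 with value 50, matching the AGEGRP 0 totals in the toy data.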

Example 1: Generate and display choropleth (Code)

Use the choroplethr package, which takes a data frame with two columns, region and value.
Specify the zoom level, the title, and the type of color palette (seq means a sequentially generated color palette, for continuous values)

ch = CountyChoropleth$new(data)
ch$title = "State Population - 2013"
ch$set_zoom("south carolina")
ch$ggplot_scale = scale_fill_brewer(name="Population (2013)", type="seq", palette=2)
ch$render()
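
Before handing the data to the map, it can help to confirm that it has the shape these notes describe: two columns named region and value, with one row per county. The check below is a sketch only; check_choropleth_input is a hypothetical helper, not a choroplethr function:

```r
# Hypothetical helper: stop with an error if df does not look like
# the region/value table described above, otherwise return it unchanged
check_choropleth_input <- function(df) {
  stopifnot(
    is.data.frame(df),
    all(c("region", "value") %in% names(df)),
    is.numeric(df$region),          # county FIPS codes computed as numbers
    !anyDuplicated(df$region)       # one row per county
  )
  invisible(df)
}

# Example with made-up values
check_choropleth_input(data.frame(region = c(45001, 45003), value = c(100, 50)))
```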

Example 1: Generate and display choropleth (Output)

Example 2: Display Majority Race

Another example, using the same dataset but displaying the majority race for each county
Read the data and filter by year, then combine the male and female totals for each race, group by county, and find the majority race. Only the steps that differ from the first example are shown below

  transmute(region= as.numeric(COUNTY) + as.numeric(STATE)*1000, 
                  num.black = BA_MALE + BA_FEMALE,
                  num.white = WA_MALE + WA_FEMALE,
                  num.indian = IA_FEMALE + IA_MALE,
                  num.asian = AA_FEMALE + AA_MALE,
                  num.hawaiian = NA_FEMALE + NA_MALE) %>%
  group_by(region) %>%
  summarise(num.black = sum(num.black),
            num.white = sum(num.white),
            num.indian = sum(num.indian),
            num.asian = sum(num.asian),
            num.hawaiian = sum(num.hawaiian)) 
data$mrace <- apply(data,1,majority.race)
data <- select(data,region,value=mrace)

Example 2: Display Majority Race - Data processing - part 1

The complete data processing code for this example
A function that determines the majority race from the totals for each race (probably not the best way to do this…)


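# a: one row of per-race totals (a named vector, as passed by apply).
# Returns the label of the strictly largest group, or "UNKNOWN" on a tie.
majority.race <- function(a) {
  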
  num.black =  a["num.black"]
  num.white =  a["num.white"]
  num.indian = a["num.indian"]
  num.asian =  a["num.asian"]
  num.hawaiian = a["num.hawaiian"]

  if (num.black > num.white && num.black > num.indian && num.black > num.asian && num.black > num.hawaiian) {
    code = "BLACK"
  } else if (num.white > num.black && num.white > num.indian && num.white > num.asian && num.white > num.hawaiian) {
    code = "WHITE"
  } else if (num.asian > num.black && num.asian > num.indian && num.asian > num.white && num.asian > num.hawaiian) {
    code = "ASIAN"
  } else if (num.indian > num.black && num.indian > num.white && num.indian > num.asian && num.indian > num.hawaiian) {
    code = "INDIAN" 
  } else if (num.hawaiian > num.black && num.hawaiian > num.indian && num.hawaiian > num.white && num.hawaiian > num.asian) {
    code = "HAWAIIAN"
  } else {
    code = "UNKNOWN"
  }
  code
}
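
As the note above says, the if/else chain is probably not the best way to do this. One possible alternative, sketched here in base R, handles all counties at once. The function name largest_race is hypothetical; like the original it returns the largest group (a plurality rather than a strict majority) and "UNKNOWN" when the top count is tied:

```r
# Hypothetical alternative to majority.race.
# counts: data frame or matrix with one num.* column per race, one row per county
largest_race <- function(counts) {
  m <- as.matrix(counts)
  # Turn column names such as "num.black" into labels such as "BLACK"
  labels <- toupper(sub("^num\\.", "", colnames(m)))
  top <- apply(m, 1, max)                   # largest count in each row
  n_top <- rowSums(m == top)                # how many columns reach that count
  idx <- max.col(m, ties.method = "first")  # column index of the largest count
  ifelse(n_top == 1, labels[idx], "UNKNOWN")
}

# Made-up totals for three counties; the third has a tie
toy <- data.frame(num.black = c(10, 5, 7), num.white = c(5, 20, 7),
                  num.indian = c(1, 1, 1), num.asian = c(2, 2, 2),
                  num.hawaiian = c(0, 0, 0))
largest_race(toy)
```

With this version the call would be data$mrace <- largest_race(data[, -1]) instead of the apply call, assuming region is the first column.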

Example 2: Display Majority Race - Data processing - part 2

Prepare the data to be displayed

data <- 
  read.csv("https://www.census.gov/popest/data/counties/asrh/2015/files/CC-EST2015-ALLDATA-45.csv",skipNul=TRUE)

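# yearcode was not defined in the original code; 7 (the 7/1/2014 estimate)
# is assumed here because the map title below says 2014
yearcode <- 7

data <- 
  filter(data, YEAR == yearcode) %>%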
  transmute(region= as.numeric(COUNTY) + as.numeric(STATE)*1000, 
                  num.black = BA_MALE + BA_FEMALE,
                  num.white = WA_MALE + WA_FEMALE,
                  num.indian = IA_FEMALE + IA_MALE,
                  num.asian = AA_FEMALE + AA_MALE,
                  num.hawaiian = NA_FEMALE + NA_MALE) %>%
  group_by(region) %>%
  summarise(num.black = sum(num.black),
            num.white = sum(num.white),
            num.indian = sum(num.indian),
            num.asian = sum(num.asian),
            num.hawaiian = sum(num.hawaiian)) 

data$mrace <- apply(data,1,majority.race)
data <- select(data,region,value=mrace)

Example 2: Display Majority Race (Continued)

Now the data has two columns: region, with the county FIPS code, and value, with the majority race for that county
Race, unlike population, is a discrete value (not continuous), so we change the palette type to “qual” so that a distinct color is used for each race

ch = CountyChoropleth$new(data)
ch$title = paste("State Majority Race - ", 2014)
ch$set_zoom("south carolina")
ch$ggplot_scale = scale_fill_brewer(name="Race (2014)", type="qual", palette=2)
ch$render()

Example 2: Display Majority Race - Generated Map