R is a cool tool for visualizations. It might not get you into the New York Times, but it will get you a long way. When I first found out about choropleth maps, the hardest part about them was pronouncing the word…

Anyway, this is just a brief post about how to create one and adapt it to your own needs and data. The data we'll be using was originally published on data.gov and consists of health risk factors and measures of access to care from 2014. Our goal is to visualize this data by state on a map of the US. The first step is to get the data into your R environment. You can copy, paste, and run the code below to get the data you need for this demonstration. If you don't have RCurl, install it with install.packages('RCurl'). It's basically a convenient way for R to fetch URLs.

library(RCurl)
x <- getURL("https://raw.githubusercontent.com/paolomarco/Data_455_Group_Project/master/RISKFACTORSANDACCESSTOCARE.csv")

# The file marks missing values as -1111.1; na.strings expects character values
my_data <- read.csv(text = x, header = TRUE, sep = ',', na.strings = "-1111.1")

str(my_data)

str(my_data) will give you a quick snapshot of the data you are dealing with. Our data set has about 3,140 rows (observations) of 31 variables. We don't need the whole data set for this demonstration, so we take a subset.

my_data<-subset(my_data,select = c('State_FIPS_Code','County_FIPS_Code','CHSI_County_Name',
                                   'CHSI_State_Name','CHSI_State_Abbr','No_Exercise',
                                   'Few_Fruit_Veg','Obesity','Prim_Care_Phys_Rate',
                                   'Smoker','Diabetes'))
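read.csv() only turns a value into NA if that value appears in na.strings, which expects character values. Here is a small self-contained check of that behavior. The inline CSV text, state values, and column names are made up for illustration, with -1111.1 playing the role of the file's missing-value code. If your copy of the data uses other placeholder codes, they would need to be added to na.strings too.

```r
# Made-up inline CSV; -1111.1 stands in for the file's "missing" code
csv_text <- paste(
  "State,Obesity,Smoker",
  "Alabama,30.1,-1111.1",
  "Alaska,-1111.1,22.4",
  sep = "\n"
)

# Without na.strings, the placeholder is read as an ordinary number
raw <- read.csv(text = csv_text)

# With na.strings given as a string, the placeholder becomes NA
clean <- read.csv(text = csv_text, na.strings = "-1111.1")

print(raw$Obesity)
print(clean$Obesity)
```

Left in as a number, -1111.1 would quietly drag every state average far below zero, so it is worth checking with str() or summary() that the placeholders really became NA.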

Notice also from the output of str() that several variables contain NA (missing) values. This is not ideal, but get used to it, because it happens often. There are more sophisticated ways to fill in missing values, such as predicting them with decision trees or regression. For the sake of this demo, we will simply replace each NA with the mean of its variable. The loop below visits each column in turn and, if the column is numeric, replaces its missing entries with the mean of its non-missing values.

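# Replace missing values in each numeric column with that column's mean;
# text columns such as state names are skipped
for (i in seq_along(names(my_data))) {
        if (is.numeric(my_data[, i])) {
                my_data[, i][is.na(my_data[, i])] <- mean(my_data[, i], na.rm = TRUE)
        }
}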
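The same mean imputation can also be written without index arithmetic. This is only a sketch: impute_mean() is a name I made up, and the toy data frame is invented for illustration. It fills numeric columns only, so text columns such as the state names are left alone.

```r
# Hypothetical helper: fill NA in every numeric column with that column's mean
impute_mean <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], function(col) {
    col[is.na(col)] <- mean(col, na.rm = TRUE)
    col
  })
  df
}

# Made-up example data with a few missing values
toy <- data.frame(
  CHSI_State_Name = c("Alabama", "Alabama", "Alaska"),
  Obesity = c(30, NA, 20),
  Smoker  = c(NA, 18, 24)
)

imputed <- impute_mean(toy)
print(imputed)
```

One caveat: a column that is entirely NA has no mean (mean() returns NaN), so its missing values would be filled with NaN rather than a number.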

As mentioned in the intro, we want to view our data by state. The code below groups our data by state and computes the mean of several variables that look interesting. If you don't have dplyr, you can quickly install it with install.packages('dplyr').

require(dplyr)
my_data <- group_by(my_data,CHSI_State_Name)

my_data<-summarize(my_data,
                   avg_obesity             = mean(Obesity),
                   avg_no_exercise         = mean(No_Exercise),
                   avg_Few_Fruit_Veg       = mean(Few_Fruit_Veg),
                   avg_Prim_Care_Phys_Rate = mean(Prim_Care_Phys_Rate),
                   avg_smoker              = mean(Smoker),
                   avg_diabetes            = mean(Diabetes))
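To make it clearer what group_by() plus summarize() computes, here is the same per-state averaging done in base R with aggregate(). This is a swapped-in base-R technique, not the dplyr code used in this post, and the county rows and numbers are made up.

```r
# Made-up county-level rows for two states
counties <- data.frame(
  CHSI_State_Name = c("Alabama", "Alabama", "Alaska", "Alaska"),
  Obesity = c(30, 34, 20, 26),
  Smoker  = c(20, 22, 18, 24)
)

# One row per state, holding the mean of each variable on the left of ~
state_means <- aggregate(cbind(Obesity, Smoker) ~ CHSI_State_Name,
                         data = counties, FUN = mean)
print(state_means)
```

One difference worth knowing: the formula interface of aggregate() drops any row containing an NA by default (na.action = na.omit), while mean() inside summarize() returns NA for a group with missing values. Filling in the missing values first avoids both surprises.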

OK, now we finally get to the graphing part. First we need the map itself: the map_data() function from ggplot2 pulls the outlines of the US states from the maps package.

require(ggplot2)
require(maps)

states_map <-map_data("state")
head(states_map)

If you look at the top of this data frame with head(), you can see that it includes a region column holding the state name (in lowercase, which matters because matching strings in R is case sensitive) and a series of longitude and latitude points outlining each state. In a nutshell, we need to match the two datasets on a common value. To do this, we create a lowercase state variable from our my_data data frame to match on.

data <-data.frame(state = tolower(my_data$CHSI_State_Name), my_data)
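Because the two data frames are matched on exact strings, it is worth checking which names actually line up. The sketch below uses a short made-up vector, map_regions, as a stand-in for unique(states_map$region). With the real objects, setdiff(data$state, unique(states_map$region)) would list the states that have no outline to fill, such as Alaska and Hawaii, which the "state" map does not include.

```r
# Made-up state names as they appear in the health data
health_states <- c("Alabama", "New York", "Alaska")

# Stand-in for the map's region names: lowercase, contiguous states only
map_regions <- c("alabama", "new york", "texas")

# Matching is case sensitive, so the raw names find nothing
raw_hits <- health_states %in% map_regions

# Lowercasing first lets the names line up
lower_states <- tolower(health_states)
lower_hits <- lower_states %in% map_regions

# Names that still have no polygon to fill
unmatched <- setdiff(lower_states, map_regions)

print(raw_hits)
print(lower_hits)
print(unmatched)
```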

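Now we are ready for the fun part. Below is the code to plot your choropleth. With ggplot2, we create a base plot with ggplot() and then add layers and settings to it with +. The fill aesthetic inside geom_map() names the variable that sets each state's color. In the second section, the theme() call moves the legend to the bottom and removes the legend title, grid lines, panel border and background, and axis ticks, titles, and labels, none of which add information to a map. scale_fill_gradient() then shades states from white (low values) to red (high values).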

# Base map: one polygon per state, filled by average obesity
plot1 <- ggplot(data, aes(map_id = state)) +
        geom_map(aes(fill = avg_obesity), map = states_map) +
        expand_limits(x = states_map$long, y = states_map$lat)

# Styling: strip elements that mean nothing on a map, then set the color scale
plot1 <- plot1 + theme(legend.position = 'bottom',
        legend.title = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank()) +
        scale_fill_gradient(low = "white", high = 'red') +
        guides(fill = guide_colorbar(barwidth = 10, barheight = .5)) +
        ggtitle('Rate of People with Obesity')

# Display the plot
plot1

With the code above, you've just created a nice choropleth map of the average obesity rate across US states. Now, let's get really crazy: create several choropleth maps and view them side by side. The key part of the code below is the grid.arrange() call from the gridExtra package at the bottom, which arranges plot1, plot2, plot3, and plot4 in a grid with two columns. One other note: <- is R's assignment operator (it works much like =), so we are assigning each plot to a variable (plot1, plot2, and so on). This matters because assigning a plot does not display it. R only draws it when you print it, for example by typing plot1 on its own and running it.

plot2<-ggplot(data, aes(map_id = state)) +
        geom_map(aes(fill = avg_Few_Fruit_Veg), map = states_map) +
        expand_limits(x = states_map$long, y = states_map$lat)

plot2<-plot2 + theme(legend.position ='bottom',
        legend.title=element_blank(),
        panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank(),
        axis.ticks = element_blank(),
        axis.title= element_blank(),
        axis.text = element_blank()) + 
        scale_fill_gradient(low="white",high = 'red') +
        guides(fill = guide_colorbar(barwidth = 10, barheight = .5)) +
        ggtitle('Rate of Few Fruits and Veggies in diet')

plot3<-ggplot(data, aes(map_id = state)) +
        geom_map(aes(fill = avg_smoker), map = states_map) +
        expand_limits(x = states_map$long, y = states_map$lat)

plot3<-plot3 + theme(legend.position ='bottom',
        legend.title=element_blank(),
        panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank(),
        axis.ticks = element_blank(),
        axis.title= element_blank(),
        axis.text = element_blank()) + 
        scale_fill_gradient(low="white",high = 'red') +
        guides(fill = guide_colorbar(barwidth = 10, barheight = .5)) +
        ggtitle('Rate of People Smoking')

plot4<-ggplot(data, aes(map_id = state)) +
        geom_map(aes(fill = avg_diabetes), map = states_map) +
        expand_limits(x = states_map$long, y = states_map$lat)

plot4<-plot4 + theme(legend.position ='bottom',
        legend.title=element_blank(),
        panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank(),
        axis.ticks = element_blank(),
        axis.title= element_blank(),
        axis.text = element_blank()) + 
        scale_fill_gradient(low="white",high = 'red') +
        guides(fill = guide_colorbar(barwidth = 10, barheight = .5)) +
        ggtitle('Rate of Diabetes')


library(gridExtra)
grid.arrange(plot1, plot2, plot3, plot4, ncol= 2)
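The four plotting blocks differ only in the fill column and the title, so one way to cut the repetition is a small helper function. This is a sketch, not the code used for the maps above: make_state_map() is a hypothetical name, and the fill column is passed as a string through ggplot2's .data pronoun (available in ggplot2 3.0.0 and later). The two-square toy_map is made up so the example runs without the maps package; with the post's objects you would pass states_map instead.

```r
library(ggplot2)

# Hypothetical helper: one styled choropleth for any column of `df`.
# `df` needs a lowercase `state` column matching the `region` column of `map`.
make_state_map <- function(df, fill_var, title, map) {
  ggplot(df, aes(map_id = state)) +
    geom_map(aes(fill = .data[[fill_var]]), map = map) +
    expand_limits(x = map$long, y = map$lat) +
    theme(legend.position = "bottom",
          legend.title = element_blank(),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          panel.border = element_blank(),
          panel.background = element_blank(),
          axis.ticks = element_blank(),
          axis.title = element_blank(),
          axis.text = element_blank()) +
    scale_fill_gradient(low = "white", high = "red") +
    guides(fill = guide_colorbar(barwidth = 10, barheight = .5)) +
    ggtitle(title)
}

# Made-up map: two unit squares standing in for states_map
toy_map <- data.frame(
  long   = c(0, 1, 1, 0, 1, 2, 2, 1),
  lat    = c(0, 0, 1, 1, 0, 0, 1, 1),
  region = rep(c("alpha", "beta"), each = 4)
)
toy_data <- data.frame(state = c("alpha", "beta"), avg_obesity = c(25, 32))

p <- make_state_map(toy_data, "avg_obesity", "Rate of People with Obesity", toy_map)

# With this post's objects, the four maps could then be built and arranged with:
# plots <- Map(function(v, t) make_state_map(data, v, t, states_map),
#              c("avg_obesity", "avg_Few_Fruit_Veg", "avg_smoker", "avg_diabetes"),
#              c("Rate of People with Obesity", "Rate of Few Fruits and Veggies in diet",
#                "Rate of People Smoking", "Rate of Diabetes"))
# gridExtra::grid.arrange(grobs = plots, ncol = 2)
```

Keeping the styling in one place also means a change such as a different color gradient only has to be made once.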

OK, that's pretty wild. Well, there you go: a really short overview of choropleth maps and how to do some interesting things with them. This is just the tip of the iceberg; there are tons of other ways to prettify your plots. Sorry for boring you with the data preparation steps at the beginning, but I included them because you rarely encounter a dataset that is ready to go. I also want to give credit to the posts I used to learn and compile this information:

http://rforpublichealth.blogspot.com/2015/10/mapping-with-ggplot-create-nice.html
https://trinkerrstuff.wordpress.com/2013/07/05/ggplot2-chloropleth-of-supreme-court-decisions-an-tutorial/