Choropleth maps allow the visualisation of of the geographical distribution of data. This paper outlines the creation of three choropleth maps showing the incidence of teen chlamydia in Washington State. Two strategies were explored for obtaining a map of Washington state and its counties to which to add data: 1. Use one that is included in the “maps” package 2. Download a shapefile (.shp) from an independent source
Ultimately, manipulation of the shapefile was successful in producing the desired choropleth maps. However, there are ongoing limitations that require further exploration to solve.
This strategy was utilised first becuause the “maps” package has a lot of functionality built in. The other packages that were required in this strategy were “maptools”, “ggplot2”, and “raster”. Map files are built into the “maps” package and were very easy to visualize:
Wash <- map("county", "washington")
The next step in this strategy was to convert the map into a data frame in order to correlate county level data with the geographical coordinates of the counties in the map. The “maptools” package has a function “readShapePoly” which reads a polygon shapefile into data frame object. However, the map created with the “maps” package is not a shapefile. “as.data.frame” was not successful in coersing the map into a data frame. The function “writePolyShape” writes a shapefile from a data frame but this was not successful in outputing a shapefile, neither was the “sink” function.
Finally, after these routes were found to be unsuccesful, the function “map_data” in the “ggplot2” package which produced the needed data frame.
state_test <- map_data("state", "Washington")
str(state_test)
## 'data.frame': 545 obs. of 6 variables:
## $ long : num -123 -123 -123 -123 -123 ...
## $ lat : num 48.6 48.6 48.6 48.6 48.6 ...
## $ group : num 1 1 1 1 1 1 1 1 1 1 ...
## $ order : int 1 2 3 4 5 6 7 8 9 10 ...
## $ region : chr "washington" "washington" "washington" "washington" ...
## $ subregion: chr "san juan island" "san juan island" "san juan island" "san juan island" ...
Merging together the map data frame with the chlamydia incidence data was simple due to the “merge” function in the “raster” package:
WashChlam <- merge(state_df,ChlamRate, by.x="subregion", by.y="NoCaps")
The actual plotting of the map also required the “ggplot2” package. This package includes a function “ggplot” that allows for the creation of a map with added layers on top using the “geom_polygon” function within the “ggplot” function. Several versions of this were tried and the most efficient version included two total layers.
ggplot(data=na.omit(WashChlam), mapping=aes(x=long, y=lat), group=NewGroup, ggtitle("Chlamydia Rate per 100,000 15-19 Year Olds")) +
geom_polygon (data = state_df, aes (fill=WashChlam$CutRate),colour = alpha("black", 1/2), size = 0.3, show.legend = TRUE)
The limitations of the “maps” package strategy were severe enough that a second strategy was implemented for developing the choropleth maps. Shapefiles are readily available for download accross the internet. This strategy also required the packages “maptools”, “ggplot2”, and “raster”, and also relied on “classInt” and “RColorBrewer”.
A shapefile of Washington state and its counties was downloaded and read into R using the “readShapePoly” function from “maptools”.
fshape <- readShapePoly("/Users/Michelle/Desktop/Berkeley MPH/Fall 2016/R for Epi's/county10/county10.shp")
plot(fshape)
“Raster”’s “merge” function was again used to merge the map data frame to the chlamydia data. Visualising the map required more code, but it was found that there were more resources for the “plot” function from the “raster” package than for the “ggplot” function.
brks<-classIntervals(ChlamRate$Chlam, n=7, style="pretty")$brks
colors1 <- rev(brewer.pal(7, "RdBu"))
plot(map.dat, col=colors1[findInterval(ChlamRate$Chlam, brks,all.inside=TRUE)], axes=F)
legend("bottomleft", legend=leglabs(brks), fill=colors1, bty="n", x.intersp = 1, y.intersp = 1,cex=.8)
title("Teen Chlamydia Rates per 100,000 by Washington County, 2014")
As can be seen, errant lines were present in the map. Trouble shooting on this problem revealed that it is a common problem with “ggplot”. Two recommended solutions were tried. A column “NewGroup” was created that is a unique number for each county and was used to define the “group” variable built into the “ggplot” function. Also, all
A second problem with the “maps” package strategy was that the legened displayed values in scientific format. The “scipen” option was added to various locations of the code but was unable to solve the problem.
The “plot” function cannot handle missing values. In the chlamydia dataset, four counties had missing values becuase there were too few cases to report. These
Additionally, there is less detail in this shapefile than was present in the “maps” map, paricularly in the island counties. This is evident in the pltting of the shapefile before the chlamydia data was added, indicating that this may be a feature of the particular file downloaded and not inherent to the process.
Ultimately, the Shapefile strategy produced a better choroplest map than the “maps” strategy, despite existing limitations. Future work to remedy these limitations is planned.
R coding help:
https://gist.github.com/hadley/233134
https://www.nceas.ucsb.edu/scicomp/usecases/ReadWriteESRIShapeFiles
http://bl.ocks.org/prabhasp/raw/5030005/
https://drive.google.com/file/d/0B8x5D4r45ofzd3BMYkhfeEl3am8/view
https://bookdown.org/medepi/phds/
R packages guides:
https://cran.r-project.org/web/packages/classInt/classInt.pdf
https://cran.r-project.org/web/packages/RColorBrewer/RColorBrewer.pdf
http://www.ofm.wa.gov/pop/geographic/tiger10/metadata/county10.html
Shapefile downloaded from:
http://geography.wa.gov/node/111