My thesis project studies the effect of exposure to high population density on the risk of multiple sclerosis (MS). In a study from Kaiser Permanente Northern California, we obtained the residential history of all subjects. Here I focus on residence at age 10, a highly susceptible age because the child is developing and growing rapidly as he or she enters puberty.
Most mapping techniques convey only one dimension of information: the location on the map. I wanted to create a map that conveys a second dimension of interest without over-complicating the visual. The mapping functions in R let me present the data in a way that is simple but conveys more information than a simple plot.
I used the maps package in R to first map my location of interest, in this case the continental United States.
library(maps)
mapusa <- map("state")
There are many other options in the maps library, and I do have some observations that are outside the continental United States. For this demonstration I will stick to this map.
The maps package also offers a cartogram, in which the area of each state is proportional to its population:
mapusa.pop <- map("state.carto")
While this is an interesting map, it is not very useful because it is not very recognizable as the United States at first glance. The point of using visuals to display information is that one can portray the information in an easily understandable manner.
# Read the age-10 residential coordinates and population densities
latlong10 <- read.csv("C:/Users/Allie/Documents/MPH Fall 2013/PH 251D/RProj/Age10_coord_popdens1.csv",
header = TRUE)
# Option 1: plot every residence as an identical red point
library(maps)
mapusa3 <- map("state", xlim = c(-130, -60), ylim = c(20, 50))
points(x = latlong10$longitude, y = latlong10$latitude, pch = 19, col = "red")
# Option 2: scale point size (cex) by the log of population density
library(maps)
mapusa3 <- map("state", xlim = c(-130, -60), ylim = c(20, 50))
points(x = latlong10$longitude, y = latlong10$latitude, pch = 20, cex = 0.5 *
log(latlong10$popdens_age10), col = "red")
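Because cex is a multiplier of the default symbol size, the scaling 0.5 * log(x) gives a size of zero at a density of 1 and a negative size below it. The following is a minimal sketch of one alternative, which rescales the log density to a fixed range of sizes. The density values and the rescale_cex helper are hypothetical; they are not my data and not part of the maps package, and the helper assumes all densities are positive.

```r
# Hypothetical population densities; not the study data.
popdens <- c(0.5, 12, 150, 2400, 18000)

# Map log density linearly onto a chosen range of point sizes.
# rescale_cex is an illustrative helper, not a function from any package.
# It assumes x contains only positive values.
rescale_cex <- function(x, min_cex = 0.5, max_cex = 3) {
  lx <- log(x)
  rng <- range(lx, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # All values identical: use the midpoint size for every point.
    return(rep((min_cex + max_cex) / 2, length(x)))
  }
  min_cex + (lx - rng[1]) / (rng[2] - rng[1]) * (max_cex - min_cex)
}

cex_raw <- 0.5 * log(popdens)       # scaling used in the chunk above
cex_scaled <- rescale_cex(popdens)  # bounded between min_cex and max_cex

print(round(cbind(popdens, cex_raw, cex_scaled), 2))
```

The rescaled vector can be passed to points() through its cex argument in place of 0.5 * log(...).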
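# Option 3: color points by log population density, on a cyan-to-red gradient
library(maps)
mapusa2 <- map("state", xlim = c(-130, -60), ylim = c(20, 50))
rcPal <- colorRampPalette(c("cyan", "red"))
# cut() assigns each log density to one of 10 equal-width bins; the bin number picks the color
latlong10$col <- rcPal(10)[as.numeric(cut(log(latlong10$popdens_age10), breaks = 10))]
points(x = latlong10$longitude, y = latlong10$latitude, col = latlong10$col,
pch = 20)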
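The color assignment above works in two steps: cut() places each log density in one of ten equal-width bins, and the bin number then indexes a ten-color palette. The following is a minimal sketch of those two steps on hypothetical densities (not the study data). It also keeps the bin labels, so that a legend could be added to the map.

```r
# Hypothetical population densities; not the study data.
popdens <- c(3, 40, 250, 900, 5200, 21000)

rcPal <- colorRampPalette(c("cyan", "red"))
pal <- rcPal(10)                        # 10 colors running from cyan to red

bins <- cut(log(popdens), breaks = 10)  # factor with 10 equal-width bins
point_col <- pal[as.integer(bins)]      # one color per observation

print(data.frame(popdens, bin = as.integer(bins), col = point_col))

# With a map already drawn, a legend could be added along these lines:
# legend("bottomleft", legend = levels(bins), col = pal, pch = 20,
#        title = "log(population density)", cex = 0.6)
```

The lowest density falls in bin 1 and takes the cyan end of the palette, while the highest falls in bin 10 and takes the red end.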
# Option 4: the same color coding, zoomed in on the counties of California
library(maps)
mapcal <- map("county", "california", xlim = c(-125, -114), ylim = c(32, 42.1))
rcPal <- colorRampPalette(c("cyan", "red"))
latlong10$col <- rcPal(10)[as.numeric(cut(log(latlong10$popdens_age10), breaks = 10))]
points(x = latlong10$longitude, y = latlong10$latitude, col = latlong10$col,
pch = 19)
There are many options for making useful visuals in R. I have shown several ways to portray more than one level of information in a single visual, and how I used them to arrive at the best visual for my thesis research. Because the population density in my data is not normally distributed, I wanted to show the actual population density at each location on the map. I did this by varying the size and the color of the map points according to this third variable. I used the log transform of the population density because the majority of the points were at the low end of the distribution, so on the raw scale they would be nearly indistinguishable.
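To illustrate why the log transform matters for binning, here is a small sketch on simulated right-skewed values. The log-normal sample is only an assumption standing in for the real densities. It compares the share of observations that land in the single most crowded of ten equal-width bins on the raw scale and on the log scale; a large share means that most points would receive the same color.

```r
set.seed(1)
# Simulated right-skewed densities; an assumption, not the study data.
popdens <- rlnorm(500, meanlog = 6, sdlog = 1.5)

raw_bins <- cut(popdens, breaks = 10)       # equal-width bins, raw scale
log_bins <- cut(log(popdens), breaks = 10)  # equal-width bins, log scale

# Share of observations in the most crowded bin on each scale.
raw_share <- max(table(raw_bins)) / length(popdens)
log_share <- max(table(log_bins)) / length(popdens)

print(c(raw = raw_share, log = log_share))
```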
I believe that changing the point size with the “cex” parameter may be useful in some visuals, but on a map the large points cover much of the information, namely the other residential locations that I would like the reader to see.
I believe the color palette option is the best: it displays both the residential location and the population density without blocking the other information on the map. I chose a cyan-to-red gradient to take advantage of the intuitive association of cyan with cool and red with hot. This agrees with my research hypothesis that high population density increases the risk of multiple sclerosis.
I'm sure this is not a new concept in R, but it is new to me and will be useful for displaying information in my thesis project. The ggplot2 package offers similar functionality. A strength of this approach is that it is simple and easily customizable, because it relies on the detailed parameters already built into the “points” function. The basic R functions offer a great many options, so I wanted to learn how to use them to my advantage without downloading a separate package for each task. This approach requires no package other than maps, and it could be used in much the same way without the maps package to color points in an ordinary x-y plot.
The next step in my use of this approach is to create a custom map that includes all of the parts of the world in which I have residential locations. The maps package is limited to single pre-loaded maps, but I will look for a way to join maps. This would allow me to use all of the information I have to create an easy-to-read visual portraying multiple dimensions of information.