In this project, I hope to show users how to create nice-looking maps in R, specifically choropleth (thematic) maps. While R is not designed for GIS applications, with the prudent installation of a few packages it is easy to adapt R for impromptu GIS use. For this project, I will use the "sp" package's "spplot" function to create maps. While it is also possible to create excellent maps with the "ggplot2" package, I have found that certain GIS applications are better suited to the control arguments available to users of spplot. Although I used SpatialPolygonsDataFrame objects that were already in RData format, the "rgdal" package can read ESRI .shp files into line, point, or polygon objects, and can even work with Google Earth .kml/.kmz files.
A choropleth map is simply a map in which regions are colored according to their value on some variable X. In common applications X is a categorical variable, which can divide areas by climate, poverty level, dominant language, or any number of other classifiers. X can also be a continuous variable, in which case the result looks much more like a traditional heat map because of the increased number of discrete levels of the variable. I will demonstrate the various methods used to classify a continuous variable and their different effects on understandability and data presentation. After an overview of data classification and its effect on cartography, I will show users how to make a multivariate choropleth map, with one continuous and one categorical variable drawn on a single map. Map quality and legibility are important for conveying the meaning of data; a bad map can conceal true associations, and likewise, a well-made map with evil intent can create bogus associations. To create my examples, I will use an administrative map of Japan obtained from http://www.gadm.org/, a free database of national maps at various levels of precision (i.e., townships, provinces/states, national borders, etc.). The maps available on GADM can be obtained in ESRI .shp format, RData format (specific to the "sp" package), and in Google Earth file formats.
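As an aside for readers who prefer to start from the ESRI shapefile rather than the ready-made RData file, here is a minimal sketch (my own addition, not part of the original workflow; it assumes the GADM zip has been downloaded and unzipped into the working directory, and that the layer is named "JPN_adm1", matching GADM's naming convention):
library(rgdal)
#read the level-1 (prefecture) shapefile into a SpatialPolygonsDataFrame
jap.from.shp<-readOGR(dsn=".",layer="JPN_adm1")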
#First, I am going to load all of the packages that will come into play at some point in this demonstration.
library(maps)
library(maptools)
## Loading required package: sp
## Checking rgeos availability: TRUE
library(sp)
library(lattice)
library(latticeExtra)
## Warning: package 'latticeExtra' was built under R version 3.1.2
## Loading required package: RColorBrewer
library(colorspace)
## Warning: package 'colorspace' was built under R version 3.1.2
library(rgdal)
## Warning: package 'rgdal' was built under R version 3.1.2
## rgdal: version: 0.9-1, (SVN revision 518)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.11.0, released 2014/04/16
## Path to GDAL shared files: C:/Program Files/R/R-3.1.1/library/rgdal/gdal
## GDAL does not use iconv for recoding strings.
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
## Path to PROJ.4 shared files: C:/Program Files/R/R-3.1.1/library/rgdal/proj
library(RColorBrewer)
library(rgeos)
## rgeos version: 0.3-8, (SVN revision 460)
## GEOS runtime version: 3.4.2-CAPI-1.8.2 r3921
## Polygon checking: TRUE
library(classInt)
## Warning: package 'classInt' was built under R version 3.1.2
library(colorspace)
library(latticeExtra)
library(grid)
library(plyr)
## Warning: package 'plyr' was built under R version 3.1.2
library(raster)
## Warning: package 'raster' was built under R version 3.1.2
##
## Attaching package: 'raster'
##
## The following object is masked from 'package:colorspace':
##
## RGB
jap.dat.town<-url("http://biogeo.ucdavis.edu/data/gadm2/R/JPN_adm2.RData") #open a connection to the level-2 (township) RData file on GADM's server
print(load(jap.dat.town)) #load() returns the name of the object it creates, in this case "gadm"
## [1] "gadm"
close(jap.dat.town)
jap.dat.town<-gadm #rename it to something more descriptive
#Let's start exploring the data a bit. First I want to know how many prefectures and towns are in Japan.
#It turns out there are 1674 unique town names and 47 unique prefectures; the tables below show the first few of each.
head(as.data.frame(table(jap.dat.town$NAME_1)))
## Var1 Freq
## 1 Aichi 64
## 2 Akita 25
## 3 Aomori 40
## 4 Chiba 56
## 5 Ehime 20
## 6 Fukui 17
head(as.data.frame(table(jap.dat.town$NAME_2)))
## Var1 Freq
## 1 Abashiri 1
## 2 Abiko 1
## 3 Abira 1
## 4 Abu 1
## 5 Achi 1
## 6 Adachi 1
jap.dat.town$NAME_1<-as.factor(jap.dat.town$NAME_1)
col.b<-colorRampPalette(brewer.pal(9,"Blues"))(47)
jap.dat.town
## class : SpatialPolygonsDataFrame
## features : 1811
## extent : 122.9, 154, 24.05, 45.52 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0
## variables : 12
## names : PID, ID_0, ISO, NAME_0, ID_1, NAME_1, ID_2, NAME_2, NL_NAME_2, VARNAME_2, TYPE_2, ENGTYPE_2
## min values : 14793, 114, JPN, Japan, 1, Aichi, 1, Abashiri, , , ,
## max values : 20547, 114, JPN, Japan, 47, Yamanashi, 1811, Zushi, <U+9F8D><U+90F7><U+753A>, Tanabe, Water body, Water body
spplot(jap.dat.town,"NAME_1", col.regions=col.b, main="Japan by Prefecture", colorkey = TRUE, lwd=1, col="black",par.settings=list(panel.background=list(col="azure2")))
That was cool! So the file divides Japan into its township boundaries: 1811 polygon features covering the 1674 uniquely named towns. The map classifies each township into one of the 47 prefectures of Japan (just a categorical variable with 47 levels). Looking a bit deeper into the object we created, "jap.dat.town", it is of class SpatialPolygonsDataFrame. This is a rather special type of data frame: it stores variables that we can interact with as if they were in a normal data frame, but it also has polygon shapes attached. The "extent" output lists the borders of the map by latitude and longitude. In most cases we can treat this object as a normal data frame when we need to interact with the stored variables, and the "View" function will show us all the stored variables that are open for modification. It is also possible to interact with the polygons within the object (useful to merge maps; an example would be adding islands to blank map space, or something similar), but an overview of rearranging polygons is beyond the scope of this demonstration.
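To see this data-frame-like behavior in action, here are a couple of quick optional calls (my own additions for illustration):
str(jap.dat.town@data,list.len=3) #the attribute table lives in the @data slot as an ordinary data frame
head(jap.dat.town$NAME_2) #individual variables can be pulled out just like data frame columns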
jap.dat<-url("http://biogeo.ucdavis.edu/data/gadm2/R/JPN_adm1.RData") #same procedure, now for the level-1 (prefecture) map
print(load(jap.dat))
## [1] "gadm"
close(jap.dat)
jap.dat<-gadm
jap.dat$NAME_1<-as.factor(jap.dat$NAME_1)
col.b<-colorRampPalette(brewer.pal(9,"Purples"))(47)
spplot(jap.dat,"NAME_1", col.regions=col.b, main="Japan by Prefecture", colorkey = TRUE, lwd=1, col="black",par.settings=list(panel.background=list(col="azure2")))
Because it is very processor-heavy to draw 1811 unique polygons, for the rest of this demonstration I'll use a more "zoomed out" administrative level of the same map. Again, however, we classify each of the 47 prefectures into its own level of a categorical variable (the prefecture names).
Now that we have a basic overview of what is possible with spplot, the natural extension is to look at applications. First I will demonstrate how to use R to create a univariate choropleth map, and show the effects of different interval-classification techniques on the visual appearance of the mapped data. I will create a random continuous variable with a vector of 47 values, one for each prefecture, and then demonstrate various mapping techniques.
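One practical aside before generating the data (my suggestion, not in the original workflow): the values below come from rnorm, so every run will produce different maps; fixing the random seed first makes the results reproducible.
set.seed(42) #any fixed seed works; this just makes the simulated data below reproducible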
test<-c(rnorm(5,25,25),rnorm(5,60,15),rnorm(10,70,15),rnorm(10,80,15),rnorm(17,90,15))
jap.dat$test<-test
summary(test) #looks like a reasonably variable set of data, i.e., it won't be boring! We are going to look at 4 methods to classify the data into groups
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.15 61.50 77.00 71.60 85.30 123.00
brks.qt=classIntervals(jap.dat$test,n=8,style="quantile")
brks.jk=classIntervals(jap.dat$test,n=8,style="jenks")
brks.eq=classIntervals(jap.dat$test,n=8,style="equal")
brks.sd=classIntervals(jap.dat$test,n=8,style="sd")
col.g<-colorRampPalette(brewer.pal(9,"Greens"))(47) #This is a single hue (sequential color scheme)
##SINGLE HUE COLORSCHEME
p1<-spplot(jap.dat,"test",at=brks.eq$brks,col.regions=col.g,col="black",lwd=1,main=list(label="Equal Breaks"),par.settings=list(panel.background=list(col="azure2")))
p2<-spplot(jap.dat,"test",at=brks.jk$brks,col.regions=col.g,col="black",lwd=1,main=list(label="Jenks"),par.settings=list(panel.background=list(col="azure2")))
p3<-spplot(jap.dat,"test",at=brks.qt$brks,col.regions=col.g,col="black",lwd=1,main=list(label="Quantile"),par.settings=list(panel.background=list(col="azure2")))
p4<-spplot(jap.dat,"test",at=brks.sd$brks,col.regions=col.g,col="black",lwd=1,main=list(label="Standard Deviations"),par.settings=list(panel.background=list(col="azure2")))
print(p1,position=c(0,.5,.5,1),more=T)
print(p2,position=c(.5,.5,1,1),more = T)
print(p3,position=c(0,0,.5,0.5),more=T)
print(p4,position=c(.5,0,1,0.5))
par(mfrow=c(2,2))
plot(brks.eq,pal=col.g,main="Equal Breaks")
plot(brks.jk,pal=col.g,main="Jenks")
plot(brks.qt,pal=col.g,main="Quantile")
plot(brks.sd,pal=col.g,main="Standard Deviations")
par(mfrow=c(1,1))
What I have done above is create a random set of 47 data points sampled from a series of normal distributions with different means and standard deviations. The data is intentionally highly left-skewed, purely for didactic purposes that I hope will become clear shortly.
So first, after creating the data, we attach it as a new column called "test" to our spatial polygons data object. Then I used the "classIntervals" function from the classInt package to break the vector of 47 values into different univariate class intervals, to show how simple manipulations of data can cause dramatic effects when applied graphically. As stated in the CRAN description, classIntervals is "a uniform interface to finding class intervals for continuous numerical variables, for example for choosing colours or symbols for plotting". For consistency, with all class-interval methods I set the desired number of intervals equal to 8.
The first class-splitting technique I demonstrate is the equal-breaks method; as the name implies, equal breaks simply divides the data range into equally sized intervals. For example, if one had a dataset with a range of 0 to 100 and wanted equal breaks with 4 intervals, it would simply create breaks at 25, 50, and 75, regardless of the actual distribution of the data (i.e., one interval might contain only one or two data points). The second technique I demonstrate is the Jenks method. Jenks natural-breaks optimization seeks to minimize the deviation of data within a class from that class's mean while maximizing the difference between the means of different classes; it is essentially the univariate form of k-means clustering (using Euclidean distance to maximize intergroup variance and minimize intragroup variance). Next I used a quantile splitting technique. The quantile technique is fairly simple: if the data is to be split into n intervals, the first interval contains 1/n of the data points, and so on until all the data is sorted into n quantiles. Perhaps the most familiar expression of quantiles is quartiles, the special case where n=4. Finally, I also show the use of standard deviations to split the data; the standard deviation is simply a measure of the overall spread of a dataset (the square root of the average squared difference from the mean).
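To make the contrast concrete, here is a tiny standalone illustration (the toy vector is my own, not part of the project data) of how two of these styles cut the same values differently:
toy<-c(0,2,3,5,48,50,52,55,90,100) #clustered at the low end and middle, sparse at the top
classIntervals(toy,n=4,style="equal")$brks #breaks every 25 units, regardless of clustering
classIntervals(toy,n=4,style="quantile")$brks #each interval holds roughly a quarter of the points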
Upon visual inspection of the 4 maps, one can appreciate differences in the presentation of the relative value of the variable due to the 4 classification methods being used. In addition, a quick glance at the legend swatches to the right of each map reveals the different cut points (note that only the equal-breaks and standard-deviations legends split the classes into equally sized portions, as expected). If we next look at the cumulative distribution plots of the data under each classification scheme, the differences become even more distinct. Simply applying equal breaks without regard for variance in the data can lead to intervals with only 1 data point (the 2nd and 3rd intervals) and other intervals with huge numbers of data points (the 4th and 6th intervals). The Jenks method, which minimizes intraclass variance and maximizes interclass variance, seems to perform somewhat better at finding natural groups of data (as it should, seeing as it is basically a data-analysis tool for grouping). Jenks is most useful when we can assume the data naturally falls into underlying groups that can be recovered based on variance; in this data, we can see that it largely avoids the problem of overcrowded and overly sparse intervals. Next we can look at the quantile method. While the quantile method is nice because it forces each interval to have the same number of data points, it can cause confusion by creating classes that are very wide where data is sparse (the 1st interval is huge due to sparse data at the low end of the spectrum) and artificially narrow where data is dense (the 2nd and 3rd intervals are tiny). This can obscure important differences in the data: the 1st interval spans a huge range of values, and it is unlikely that a data point at the start of that range is very similar to one at the end. Likewise, the 2nd interval spans only a very small range of values but includes a huge number of data points; it is unlikely that a data point in interval 2 is very different from one at the end of the 1st or the start of the 3rd, which could create artificial distinctions where there are none. Standard deviations seem to do a relatively decent job of splitting the data, but we still end up with 3 intervals containing only a single data point each. For now, it seems that Jenks does the best job of making logical intervals in this dataset (at least in the univariate continuous case).
In this first example, I used a single-hue color scheme with varying levels. This is known as a sequential color scheme and is commonly used to map the range of a single continuous variable (e.g., rainfall, income, etc.). More precisely, in the language of color theory, it is a single hue with varying levels of chroma, that is, the degree of difference between a specific level of that color and that color's null value. The takeaway point is that they are all shades of green. From the RColorBrewer documentation, "Sequential palettes are suited to ordered data that progress from low to high". Next we'll use a multi-hue color scheme (a divergent colorscheme) to show how, even without any data modification or difference in classification, a different colorscheme can drastically change a map's appearance and visual impact.
##DIVERGENT COLORSCHEME
col.d<-c("#2d004b","#542788","#8073ac","#b2abd2","#d8daeb","#f7f7f7","#fee0b6","#fdb863","#e08214","#b35806","#7f3b08") #this is a divergent colorscheme
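#Aside (my observation, worth double-checking): these hex values appear to be the reversed
#11-class "PuOr" palette from RColorBrewer, so rev(brewer.pal(11,"PuOr")) should be an
#equivalent way to build the same vector.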
p1<-spplot(jap.dat,"test",at=brks.eq$brks,col.regions=col.d,col="black",lwd=1,main=list(label="Equal Breaks"))
p2<-spplot(jap.dat,"test",at=brks.jk$brks,col.regions=col.d,col="black",lwd=1,main=list(label="Jenks"))
p3<-spplot(jap.dat,"test",at=brks.qt$brks,col.regions=col.d,col="black",lwd=1,main=list(label="Quantile"))
p4<-spplot(jap.dat,"test",at=brks.sd$brks,col.regions=col.d,col="black",lwd=1,main=list(label="Standard Deviations"))
print(p1,position=c(0,.5,.5,1),more=T)
print(p2,position=c(.5,.5,1,1),more = T)
print(p3,position=c(0,0,.5,0.5),more=T)
print(p4,position=c(.5,0,1,0.5))
par(mfrow=c(2,2))
plot(brks.eq,pal=col.d,main="Equal Breaks")
plot(brks.jk,pal=col.d,main="Jenks")
plot(brks.qt,pal=col.d,main="Quantile")
plot(brks.sd,pal=col.d,main="Standard Deviations")
par(mfrow=c(1,1))
In lieu of our sequential single-hue colorscheme with varying levels of chroma, we now use a divergent colorscheme, wherein the maximum chroma levels of the 2 hues sit at the extreme values of the dataset and lighten until they almost reach the null value at the center. From the RColorBrewer documentation: "The critical class or break in the middle of the legend is emphasized with light colors and low and high extremes are emphasized with dark colors that have contrasting hues." The goal of a divergent colorscheme is generally to accentuate areas that deviate significantly from the center of the scale (the two extremes). In the maps I created, we can see that when the data is divided by equal breaks or standard deviations, many more regions are colored close to the null (middle) value; when quantiles, and especially Jenks, are used to classify the data, significantly more regions are coded near the extremes. The pedagogic lesson here is that, even without any data manipulation, the mere use of a different colorscheme can significantly alter the expression of data on a map. Therefore, when mapping data, careful consideration must be given both to how the intervals are calculated and to the colorscheme best suited to presenting the data.
The cumulative distribution plots also tell the same story as in the previous example with the single hue colorscheme (the intervals are the same, only the color has been altered between the two). By looking at the color swatches at the bottom of each distribution, the viewer can get a better sense of which extreme each classification method will put more emphasis on when plotted.
Finally, I will demonstrate how to make a multivariate choropleth map. In this multivariate (technically bivariate) choropleth map, I will first code a categorical variable that assigns each of the 47 prefectures to one of 8 levels: the traditional 8 "regions" of Japan. I will then create a continuous variable by drawing a random sample from a normal distribution with mean 50 and standard deviation 15 for each prefecture within each of the 8 regions. The final result will be a map that uses color to display two variables on Japan at once: each prefecture is first assigned a hue corresponding to its level of the categorical variable, and then that hue is given a chromatic level (chroma) representing the intensity of the continuous variable. The code to do this is rather lengthy, but most of it goes toward producing a nice legend with color swatches for each level of the categorical variable.
#Let's now make some interesting new data for Japan. Assume the 47 prefectures can be sorted into some fictional categorical variable, and that each prefecture also has a value on some fictional continuous variable, and we want to map the two together. This is quite a practical (and reasonable!) setup: it could be dominant political party and % of vote, climate classification and average temperature, or many other pairs of variables. The result is known as a multivariate choropleth map; one variable (the categorical one) will determine the hue of a polygon and the continuous variable will determine its intensity.
jap.dat1<-jap.dat
jap.dat1$cat<-rep(0,47) #Because I'm not creative, my categorical variable will simply break up the prefectures by the traditional regions of the country.
for(i in 1:length(jap.dat1$NAME_1)){
if(jap.dat1$NAME_1[i] %in% c("Hokkaido"))
jap.dat1$cat[i]<-"Hokkaido"
}
for(i in 1:length(jap.dat1$NAME_1)){
if(jap.dat1$NAME_1[i] %in% c("Aomori","Akita","Iwate","Yamagata","Miyagi","Fukushima"))
jap.dat1$cat[i]<-"Tohoku"
}
for(i in 1:length(jap.dat1$NAME_1)){
if(jap.dat1$NAME_1[i] %in% c("Tochigi","Ibaraki","Gunma","Chiba","Saitama","Tokyo","Kanagawa"))
jap.dat1$cat[i]<-"Kanto"
}
for(i in 1:length(jap.dat1$NAME_1)){
if(jap.dat1$NAME_1[i] %in% c("Niigata","Nagano","Yamanashi","Shizuoka","Aichi","Gifu","Toyama","Fukui","Ishikawa"))
jap.dat1$cat[i]<-"Chubu"
}
for(i in 1:length(jap.dat1$NAME_1)){
if(jap.dat1$NAME_1[i] %in% c("Mie","Shiga","Nara","Kyoto","Wakayama","Osaka","Hyogo"))
jap.dat1$cat[i]<-"Kansai"
}
for(i in 1:length(jap.dat1$NAME_1)){
if(jap.dat1$NAME_1[i] %in% c("Tottori","Okayama","Hiroshima","Shimane","Yamaguchi"))
jap.dat1$cat[i]<-"Chugoku"
}
for(i in 1:length(jap.dat1$NAME_1)){
if(jap.dat1$NAME_1[i] %in% c("Tokushima","Kagawa","Ehime","Kochi"))
jap.dat1$cat[i]<-"Shikoku"
}
for(i in 1:length(jap.dat1$NAME_1)){
if(jap.dat1$NAME_1[i] %in% c("Fukuoka","Oita","Saga","Naoasaki","Kumamoto","Miyazaki","Kagoshima","Okinawa")) #"Naoasaki" (sic) appears to match the spelling stored in this GADM file; the leftover check below catches any misses
jap.dat1$cat[i]<-"Kyushu"
}
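As an aside (this condensed version is my own sketch, not the original workflow), the eight loops above can be collapsed into a single named-vector lookup; shown here for two regions only, with the final assignment commented out so it does not disturb the loop results above:
region.lookup<-c(setNames(rep("Tohoku",6),c("Aomori","Akita","Iwate","Yamagata","Miyagi","Fukushima")),
                 setNames(rep("Kanto",7),c("Tochigi","Ibaraki","Gunma","Chiba","Saitama","Tokyo","Kanagawa")))
#...extend with the remaining six regions, then one line replaces all eight loops:
#jap.dat1$cat<-unname(region.lookup[as.character(jap.dat1$NAME_1)])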
#Let's check and see if there are still any 0s left (i.e., places where my for loops didn't match anything)
print(jap.dat1$NAME_1[jap.dat1$cat=="0"])
## [1] Hyogo
## 47 Levels: Aichi Akita Aomori Chiba Ehime Fukui Fukuoka Fukushima ... Yamanashi
#For some reason, Hyogo does not want to cooperate, so I fix it manually. (I could not pin down why; a hidden character or encoding difference in the stored name is one plausible culprit.)
jap.dat1$cat[13]<-"Kansai"
jap.dat1$cat<-as.factor(jap.dat1$cat)
jap.dat1$cont<-rep(0,47) #Now to make the continuous variable that will be paired with the "region" categorical variable.
#Each prefecture receives an independent draw from N(50,15). Note the use of rnorm(1,...), one
#draw per element: using rnorm(n,...) inside the loop would try to stuff n values into a single
#slot, throwing "number of items to replace is not a multiple of replacement length" warnings
#and silently keeping only the first value.
for(i in 1:length(jap.dat1$cat)){
if(jap.dat1$cat[i]=="Hokkaido")
jap.dat1$cont[i]<-rnorm(1,50,15)
}
sum(jap.dat1$cat=="Chubu")
## [1] 9
for(i in 1:length(jap.dat1$cat)){
if(jap.dat1$cat[i]=="Chubu")
jap.dat1$cont[i]<-rnorm(1,50,15)
}
sum(jap.dat1$cat=="Tohoku")
## [1] 6
for(i in 1:length(jap.dat1$cat)){
if(jap.dat1$cat[i]=="Tohoku")
jap.dat1$cont[i]<-rnorm(1,50,15)
}
sum(jap.dat1$cat=="Kanto")
## [1] 7
for(i in 1:length(jap.dat1$cat)){
if(jap.dat1$cat[i]=="Kanto")
jap.dat1$cont[i]<-rnorm(1,50,15)
}
sum(jap.dat1$cat=="Kansai")
## [1] 7
for(i in 1:length(jap.dat1$cat)){
if(jap.dat1$cat[i]=="Kansai")
jap.dat1$cont[i]<-rnorm(1,50,15)
}
sum(jap.dat1$cat=="Chugoku")
## [1] 5
for(i in 1:length(jap.dat1$cat)){
if(jap.dat1$cat[i]=="Chugoku")
jap.dat1$cont[i]<-rnorm(1,50,15)
}
sum(jap.dat1$cat=="Shikoku")
## [1] 4
for(i in 1:length(jap.dat1$cat)){
if(jap.dat1$cat[i]=="Shikoku")
jap.dat1$cont[i]<-rnorm(1,50,15)
}
sum(jap.dat1$cat=="Kyushu")
## [1] 8
for(i in 1:length(jap.dat1$cat)){
if(jap.dat1$cat[i]=="Kyushu")
jap.dat1$cont[i]<-rnorm(1,50,15)
}
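A final aside on this step (my observation, not the author's code): since every region draws from the same N(50,15) distribution, all eight loops collapse to one vectorized call; the loop-per-region structure only earns its keep if each region gets its own mean or standard deviation. The one-liner is shown commented out so it does not overwrite the values just generated.
#jap.dat1$cont<-rnorm(length(jap.dat1$cat),mean=50,sd=15)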
##PLOTTING THE CATEGORICAL VARIABLE
#This map will show the classification of each prefecture into one of 8 levels corresponding to the 8 regions of Japan (8 regions plotted into 8 different hues).
classes<-levels(factor(jap.dat1$cat))
nClasses<-length(classes)
qualPal <- rainbow_hcl(nClasses, start=30, end=300)
step <- 360/nClasses
hue = (30 + step*(seq_len(nClasses)-1))%%360
#note: this manual hcl() call rebuilds essentially the same evenly spaced palette and overwrites the rainbow_hcl version above
qualPal <- hcl(hue, c=50, l=70)
spplot(jap.dat1["cat"], col="transparent", col.regions=qualPal,par.settings=list(panel.background=list(col="azure2")))
##PLOTTING THE CONTINUOUS VARIABLE
#This map will show the relative intensity of the continuous variable in each prefecture (level of variable=intensity (chromatic value) of the same color hue).
quantPal <- rev(heat_hcl(16))
spplot(jap.dat1["cont"], col="transparent", col.regions=quantPal,par.settings=list(panel.background=list(col="azure2")))
#Now we want to put these two together somehow.
#build one sequential palette per region: the same hue as that region's qualitative color, running from light to dark to encode the continuous variable
multiPal<-lapply(1:nClasses, function(i){rev(sequential_hcl(16, h = (30 + step*(i-1))%%360))})
extent(jap.dat1)
## class : Extent
## xmin : 122.9
## xmax : 154
## ymin : 24.05
## ymax : 45.52
#draw each region as its own spplot layer, restricted to that region's polygons, with shared x/y limits (the full Japan extent from above) and a shared at= scale so colors are comparable; only the last layer gets a labeled colorkey
pList <- lapply(1:nClasses, function(i){
mapClass <- jap.dat1[jap.dat1$cat==classes[i],]
pClass <- spplot(mapClass["cont"],xlim=c(122.9332,153.9869),ylim=c(24.04542,45.52279),par.settings=list(panel.background=list(col="azure2")),col.regions=multiPal[[i]],col="transparent",colorkey=(if(i==nClasses) TRUE else list(labels=rep("",6))),at = seq(0, 100, by=20))})
p<-Reduce('+',pList) #latticeExtra's "+" overlays trellis objects, so this stacks all 8 regional layers into one map
#In order to make a sensible legend, we are going to want to make a legend for each element of pList (i.e., a different legend for each of the regions of Japan).
addTitle <- function(legend, title){
titleGrob <- textGrob(title, gp=gpar(fontsize=8), hjust=1, vjust=1) #the region name as a small text grob
legendGrob <- eval(as.call(c(as.symbol(legend$fun), legend$args))) #evaluate the stored legend description into an actual grob
ly <- grid.layout(ncol=1, nrow=2,
widths=unit(0.9,"grobwidth", data=legendGrob))
fg <- frameGrob(ly, name=paste("legendTitle", title, sep="_"))
pg <- packGrob(fg, titleGrob, row=2) #pack the title into the bottom row...
pg <- packGrob(pg, legendGrob, row=1) #...and the colorkey above it
}
for (i in seq_along(classes)){
lg <- pList[[i]]$legend$right
pList[[i]]$legend$right <- list(fun="addTitle",args=list(legend=lg, title=classes[i]))
}
#Now we evaluate the titled legend stored in each component of pList (the individual regional plots) into an actual grob so they can be merged.
legendList <- lapply(pList, function(x){
lg <- x$legend$right
clKey <- eval(as.call(c(as.symbol(lg$fun), lg$args)))
clKey
})
#Now that we have all the legend grobs, we pack them side by side into a single object ready for plotting.
packLegend <- function(legendList){
N <- length(legendList)
ly <- grid.layout(nrow = 1, ncol = N)
g <- frameGrob(layout = ly, name = "mergedLegend")
for (i in 1:N) g <- packGrob(g, legendList[[i]], col = i)
g
}
#Final step to create the legend: attach the merged legend to the trellis object p.
p$legend$right<-list(fun ="packLegend",args=list(legendList=legendList))
#Now we can finally plot the multivariate choropleth map.
J.Lines<-jap.dat1
p+
layer(sp.polygons(J.Lines,lwd=0.5)) #overlay the prefecture borders on top of the merged regional layers
Wow! That was a lot of work, but I think the result was worth it. Unfortunately the text under each legend swatch is rather small and hard to read; given more time I could play with the arguments of textGrob, but for all pedagogic intents and purposes the map is a success. Obviously, if one were presenting real instead of fictional data, a better legend would be required, but I think I have outlined my steps in enough detail that anyone with sufficient experience in R could modify my code to generate reasonably legible multivariate choropleth maps. Despite the use of fictional data, it is not hard to imagine ending up with a map like the one above in practice: it is entirely plausible that we would want to group the 47 prefectures by region and then look at, say, the unemployment rate of each prefecture. Multivariate choropleth maps are, in my opinion, woefully underused; there are many natural pairings of categorical and continuous variables that would benefit neatly from this technique, among them dominant political party and % of the vote, climate type and annual rainfall, and dominant ethnic group and % of native-language speakers.
References
classInt package documentation: http://cran.r-project.org/web/packages/classInt/classInt.pdf
RColorBrewer package documentation: http://cran.r-project.org/web/packages/RColorBrewer/RColorBrewer.pdf
Lamigueiro, O. (2014). Chapter 8: Thematic Maps. In Displaying Time Series, Spatial, and Space-Time Data with R (pp. 91-136). London: Chapman and Hall/CRC.