Geospatial representation of economic data

by Rafailov Ruvin, second year student of ICEF

As my coursepaper I decided to learn R and to use its means to overlay economic data over map, finally with the help of friendly and helpful users of http://stackoverflow.com/ and http://gis.stackexchange.com/ plus other internet resources I managed to come up with some results I'll start with launching all libraries used in this kind of instruction and also, turn off all warnings. Also, one can see published and quite regularly updated file here

require(animation)
require(sp)
library(maptools)
require(RColorBrewer)
require(classInt)
require(rgdal)
library(ggmap)
library(ggplot2)
library(raster)
library(rgeos)
require(gpclib)
library(plyr)
gpclibPermit()

To start with something I decided to color Russian map according to region.

rus <- load("C://RUS_adm1.RData")  #downloaded data from gadm.org, then I got struggled with idea how to plot map with different regions colored, after I had read summary of gadm file and saw that region named are contained in gadm$NAME_1, it became much easier

regions <- c("Kamchatka", "Karachay-Cherkess", "Karelia", "Kemerovo", "Khabarovsk", 
    "Khakass", "Khanty-Mansiy", "Kirov", "Komi-Permyak", "Komi", "Koryak", "Kostroma", 
    "Krasnodar", "Krasnoyarsk", "Adygey", "Aga Buryat", "Altay", "Amur", "Arkhangel'sk", 
    "Astrakhan'", "Bashkortostan", "Belgorod", "Bryansk", "Buryat", "Chechnya", 
    "Chelyabinsk", "Chita", "Chukot", "Chuvash", "City of St. Petersburg", "Dagestan", 
    "Evenk", "Gorno-Altay", "Ingush", "Irkutsk", "Ivanovo", "Kabardin-Balkar", 
    "Kaliningrad", "Kalmyk", "Kaluga", "Kurgan", "Kursk", "Leningrad", "Lipetsk", 
    "Maga Buryatdan", "Mariy-El", "Mordovia", "Moscow City", "Moskva", "Murmansk", 
    "Nenets", "Nizhegorod", "North Ossetia", "Novgorod", "Novosibirsk", "Omsk", 
    "Orel", "Orenburg", "Penza", "Perm'", "Primor'ye", "Pskov", "Rostov", "Ryazan'", 
    "Sakha", "Sakhalin", "Samara", "Saratov", "Smolensk", "Stavropol'", "Sverdlovsk", 
    "Tambov", "Tatarstan", "Taymyr", "Tomsk", "Tula", "Tuva", "Tver'", "Tyumen'", 
    "Udmurt", "Ul'yanovsk", "Ust-Orda Buryat", "Vladimir", "Volgograd", "Vologda", 
    "Voronezh", "Yamal-Nenets", "Yaroslavl'", "Yevrey")
# create a data vector with cooresponding region names in gadm
gadm$regions = as.factor(regions)
# create a color pallette with rainbow layers
color = rainbow(length(levels(gadm$regions)))
# plot gadm file according to regios data with established color pattern
spplot(gadm, "regions", col.regions = color, main = "Russian regions")

plot of chunk unnamed-chunk-2

Having done this I struggled with two problems: 1) because of 180 meredian Chukotka region is divided in two parts thus making map pretty useless. The second problem is non-readable legend.

Both problems were decided by projecting gadm map with different coordinates, thus centering it around other meredian and the problem region is at upper right corner and boundary can be noticed and legend, especially after the zooming become more readable and appropriate

proj4.str <- CRS("+init=epsg:3413 +lon_0=105")
gadm.prj <- spTransform(gadm, proj4.str)
spplot(gadm.prj, "regions", col.regions = color, main = "Russian regions")

plot of chunk unnamed-chunk-3


Then I became interested in plotting just one single region ; it was easily done by using subset function in spplot


spplot(subset(gadm.prj, subset = (NAME_1 == "Moscow City")), "regions", colorkey = F)

plot of chunk unnamed-chunk-4


Also, there was possibility to use another technique and functions to plot Russian map. However, further I put on hold this option as I was told that merging on sp class objects is not always a good idea, due to the fact that the merge function resorts the dataframe and can break the slot relationship in sp objects.This issue is pointed out by the developer of the package.

# again loading gadm data and reprojecting it
rus <- load("C://RUS_adm1.RData")
proj4.str <- CRS("+init=epsg:3413 +lon_0=105")
gadm.prj <- spTransform(gadm, proj4.str)

# merging all polygons to create data file
df <- fortify(gadm.prj, region = "ID_1")
# as some of regions may not be included, again merging process take place
df2 <- merge(df, gadm.prj, by.x = "id", by.y = "ID_1")
# plotting using ggplot2 package, further this function will be explained
# in more details
ggplot(df2, aes(x = long, y = lat, group = group), height = 2500, width = 2500) + 
    geom_polygon(aes(fill = as.factor(NAME_1))) + ggtitle("Russian regions") + 
    xlab("Longitude") + ylab("Latitude")

plot of chunk unnamed-chunk-5



# save in png format so that legend was readable
png("Russian_regions.png", 2500, 2500)  #giving name and width
ggplot(df2, aes(x = long, y = lat, group = group)) + geom_polygon(aes(fill = as.factor(NAME_1))) + 
    ggtitle("Russia") + xlab("Longitude") + ylab("Latitude")
dev.off()  #shuts down the process
## pdf 
##   2

Afterthat I decided that it's time to move to something more serious and decided to plot Russian unemployment in different regions in one year

# Firstly I have prepared data in txt format with two columns, first was
# region, where region name should be the same as it is contained in gadm,
# second is unemployment rate, columns are devided by ';'
unempl <- read.delim2(file = "C:\\unempl11.txt", header = TRUE, sep = ";", 
    quote = "", dec = ",", stringsAsFactors = F)

# Then to be sure that data is displayed correctly we have to match data
# from txt file and gadm so that it was done in correct order
gadm_names <- gadm.prj$NAME_1
total <- length(gadm_names)
pb <- txtProgressBar(min = 0, max = total, style = 3)
order <- vector()
for (i in 1:total) {

    order[i] <- agrep(gadm_names[i], unempl$region, max.distance = 0.2)[1]
    setTxtProgressBar(pb, i)  # update progress bar
}
# I have cut all unemployment rate data to create several intervals
col_no <- as.factor(as.numeric(cut(unempl$data[order], c(0, 2.5, 5, 7.5, 10, 
    15, 100))))


levels(col_no) <- c("<2,5%", "2,5-5%", "5-7,5%", "7,5-10%", "10-15%", ">15%")


gadm.prj$col_no <- col_no
myPalette <- brewer.pal(6, "Purples")

spplot(gadm.prj, "col_no", col = grey(0.9), col.regions = myPalette, main = "Unemployment in Russia by region in 2012")

plot of chunk unnamed-chunk-8


After creating such map for one year I figured out that it will be useful and spectacular to create animation with unemployment for 12 years.


# Reproject data
gadm <- spTransform(gadm, CRS("+init=epsg:3413 +lon_0=105"))

# uploaded all 12 years data

year_2000 <- read.csv2("C:\\unempl00.txt")
year_2001 <- read.csv2("C:\\unempl01.txt")
year_2002 <- read.csv2("C:\\unempl02.txt")
year_2003 <- read.csv2("C:\\unempl03.txt")
year_2004 <- read.csv2("C:\\unempl04.txt")
year_2005 <- read.csv2("C:\\unempl05.txt")
year_2006 <- read.csv2("C:\\unempl06.txt")
year_2007 <- read.csv2("C:\\unempl07.txt")
year_2008 <- read.csv2("C:\\unempl08.txt")
year_2009 <- read.csv2("C:\\unempl09.txt")
year_2010 <- read.csv2("C:\\unempl10.txt")
year_2011 <- read.csv2("C:\\unempl11.txt")
year_2012 <- read.csv2("C:\\unempl12.txt")

gadm_names <- gadm$NAME_1
total <- length(gadm_names)

Here we match data,result in number of vectors, which contain the same order as regions are ordered in gadm file

pb <- txtProgressBar(min = 0, max = total, style = 3)
# again match as we did it before, but for 12 years
order00 <- vector()
for (i in 1:total) {

    order00[i] <- agrep(gadm_names[i], year_2000$region, max.distance = 0.2)[1]
    setTxtProgressBar(pb, i)  # update progress bar
}
order01 <- vector()
for (i in 1:total) {

    order01[i] <- agrep(gadm_names[i], year_2001$region, max.distance = 0.2)[1]
    setTxtProgressBar(pb, i)  # update progress bar
}

order02 <- vector()
for (i in 1:total) {

    order02[i] <- agrep(gadm_names[i], year_2002$region, max.distance = 0.2)[1]
    setTxtProgressBar(pb, i)  # update progress bar
}

order03 <- vector()
for (i in 1:total) {

    order03[i] <- agrep(gadm_names[i], year_2003$region, max.distance = 0.2)[1]
    setTxtProgressBar(pb, i)  # update progress bar
}

order04 <- vector()
for (i in 1:total) {

    order04[i] <- agrep(gadm_names[i], year_2004$region, max.distance = 0.2)[1]
    setTxtProgressBar(pb, i)  # update progress bar
}

order05 <- vector()
for (i in 1:total) {

    order05[i] <- agrep(gadm_names[i], year_2005$region, max.distance = 0.2)[1]
    setTxtProgressBar(pb, i)  # update progress bar
}


order06 <- vector()
for (i in 1:total) {

    order06[i] <- agrep(gadm_names[i], year_2006$region, max.distance = 0.2)[1]
    setTxtProgressBar(pb, i)  # update progress bar
}

order07 <- vector()
for (i in 1:total) {

    order07[i] <- agrep(gadm_names[i], year_2007$region, max.distance = 0.2)[1]
    setTxtProgressBar(pb, i)  # update progress bar
}

order08 <- vector()
for (i in 1:total) {

    order08[i] <- agrep(gadm_names[i], year_2008$region, max.distance = 0.2)[1]
    setTxtProgressBar(pb, i)  # update progress bar
}

order09 <- vector()
for (i in 1:total) {

    order09[i] <- agrep(gadm_names[i], year_2009$region, max.distance = 0.2)[1]
    setTxtProgressBar(pb, i)  # update progress bar
}



order10 <- vector()
for (i in 1:total) {

    order10[i] <- agrep(gadm_names[i], year_2010$region, max.distance = 0.2)[1]
    setTxtProgressBar(pb, i)  # update progress bar
}
order11 <- vector()
for (i in 1:total) {

    order11[i] <- agrep(gadm_names[i], year_2011$region, max.distance = 0.2)[1]
    setTxtProgressBar(pb, i)  # update progress bar
}
order12 <- vector()
for (i in 1:total) {

    order12[i] <- agrep(gadm_names[i], year_2012$region, max.distance = 0.2)[1]
    setTxtProgressBar(pb, i)  # update progress bar
}

Create additional factor in gadm data and fill it with other unemployment data ordered in regions order


gadm@data$year_2000 <- year_2000$data[order00]
gadm@data$year_2001 <- year_2001$data[order01]
gadm@data$year_2002 <- year_2002$data[order02]
gadm@data$year_2003 <- year_2003$data[order03]
gadm@data$year_2004 <- year_2004$data[order04]
gadm@data$year_2005 <- year_2005$data[order05]
gadm@data$year_2006 <- year_2006$data[order06]
gadm@data$year_2007 <- year_2007$data[order07]
gadm@data$year_2008 <- year_2008$data[order08]
gadm@data$year_2009 <- year_2009$data[order09]
gadm@data$year_2010 <- year_2010$data[order10]
gadm@data$year_2011 <- year_2011$data[order11]
gadm@data$year_2012 <- year_2012$data[order12]


# Coerce into factors with defined levels
for (i in c("year_2000", "year_2001", "year_2002", "year_2003", "year_2004", 
    "year_2005", "year_2006", "year_2007", "year_2008", "year_2009", "year_2010", 
    "year_2011", "year_2012")) {
    gadm@data[, i] <- as.factor(as.numeric(cut(gadm@data[, i], c(0, 2.5, 5, 
        7.5, 10, 15, 100))))
    levels(gadm@data[, i]) <- c("<2,5%", "2,5-5%", "5-7,5%", "7,5-10%", "10-15%", 
        ">15%")
}
# use animation package to create animation in different formats and use
# color layout from previous example
myPalette <- brewer.pal(6, "Purples")

# for instance, gif/html/mp4/flash
saveGIF(for (i in c("year_2000", "year_2001", "year_2002", "year_2003", "year_2004", 
    "year_2005", "year_2006", "year_2007", "year_2008", "year_2009", "year_2010", 
    "year_2011", "year_2012")) {
    sp.plot <- spplot(gadm, i, col = grey(0.9), col.regions = myPalette, main = paste("Unemployment in Russia", 
        i, sep = " - "))
    print(sp.plot)
}, img.name = "map", swf.name = "animtionrus.gif")
## I cannot find ImageMagick with convert = "convert"
## but I can find it from the Registry Hive: C:\Program
## Files\ImageMagick-6.8.6-Q16
## Executing: "C:\Program Files\ImageMagick-6.8.6-Q16\convert.exe" -loop 0
## -delay 100 C:/Users/626B~1/AppData/Local/Temp/RtmpopQFZm/map1.png
## C:/Users/626B~1/AppData/Local/Temp/RtmpopQFZm/map2.png
## C:/Users/626B~1/AppData/Local/Temp/RtmpopQFZm/map3.png
## C:/Users/626B~1/AppData/Local/Temp/RtmpopQFZm/map4.png
## C:/Users/626B~1/AppData/Local/Temp/RtmpopQFZm/map5.png
## C:/Users/626B~1/AppData/Local/Temp/RtmpopQFZm/map6.png
## C:/Users/626B~1/AppData/Local/Temp/RtmpopQFZm/map7.png
## C:/Users/626B~1/AppData/Local/Temp/RtmpopQFZm/map8.png
## C:/Users/626B~1/AppData/Local/Temp/RtmpopQFZm/map9.png
## C:/Users/626B~1/AppData/Local/Temp/RtmpopQFZm/map10.png
## C:/Users/626B~1/AppData/Local/Temp/RtmpopQFZm/map11.png
## C:/Users/626B~1/AppData/Local/Temp/RtmpopQFZm/map12.png
## C:/Users/626B~1/AppData/Local/Temp/RtmpopQFZm/map13.png
## "C:\Users\626B~1\AppData\Local\Temp\RtmpopQFZm/animation.gif"
## Output at: C:\Users\626B~1\AppData\Local\Temp\RtmpopQFZm/animation.gif

One can see resulting animation here

However, as you have noticed it is pretty convoluted, volume and not optimized, thus I decided to optimize it

setwd("C://Unempldata")  #create working place, where I put all txt data files and name them with year (year_2000) for convenience

file.list <- dir()  #get all filenames in the working directory
all.data <- NULL

for (f in file.list) {

    temp <- read.csv2(f)  # read each file and put to temp table
    temp$year <- f  #create new column with name of each read file

    all.data <- rbind(all.data, temp)  #put new data in existing file
    temp <- NULL
}
total1 <- length(file.list)
# match data with two cycles
order0 <- vector()
for (f in file.list) {
    for (i in 1:total) {



        order0[i] <- agrep(gadm_names[i], subset(all.data, year == f)$region, 
            max.distance = 0.2)[1]
        setTxtProgressBar(pb, i)
    }
    gadm@data[, f] <- subset(all.data, year == f)$data[order0]

}
# Coerce into factors with defined levels
for (f in file.list) {
    gadm@data[, f] <- as.factor(as.numeric(cut(gadm@data[, f], c(0, 2.5, 5, 
        7.5, 10, 15, 100))))
    levels(gadm@data[, f]) <- c("<2,5%", "2,5-5%", "5-7,5%", "7,5-10%", "10-15%", 
        ">15%")
}

saveHTML(for (f in file.list) {
    sp.plot <- spplot(gadm, f, col = grey(0.9), col.regions = myPalette, main = paste("Unemployment in Russia", 
        f, sep = " - "))
    print(sp.plot)
}, img.name = "map1", htmlfile = "animatir1ussia.html")

Ended with animations I wished to create some map with data represenated on it in the points layer. Firstly, I began with showing population of each region

# again we start with loading and reprojecting data
load("C://RUS_adm1.RData")
proj4.str <- CRS("+init=epsg:3413 +lon_0=105")
gadm.prj <- spTransform(gadm, proj4.str)

## IDs needed to match polygons and data
nms <- gadm.prj$NAME_1

ll <- coordinates(gadm.prj)  #getting coordinates from gadm data
popul <- read.csv2("C:\\popul.txt")  #reading population data

# matching data, names of regions and coordinates
nms <- gadm.prj$NAME_1
ord1 <- match(nms, popul$region)
popul <- popul[ord1, ]
row.names(popul) <- nms
row.names(ll) <- nms

# dividing population for intervals
col_no <- as.factor(as.numeric(cut(popul$data, c(0, 1e+05, 3e+05, 5e+05, 7e+05, 
    9e+05, 1100000, 1300000, 1500000, 1700000, 2e+06, 2500000, 2700000, 3e+06, 
    3500000, 4e+06, 5e+06, 1e+07, 2e+09))))


levels(col_no) <- c("<100000", "100000-300000", "300000-500000", "500000-700000", 
    "700000-900000", "900000-1100000", "1100000-1300000", "1300000-1500000", 
    "1500000-1700000", "1700000-2000000", "2000000-2500000", "2500000-2700000", 
    "2700000-3000000", "3000000-3500000", "3500000-4000000", "4000000-5000000", 
    "5000000-10000000", ">10000000")
# creating spatial data with points
popSP <- SpatialPointsDataFrame(ll, popul["data"], proj4string = proj4.str)


color = rainbow(18)
popSP$col_no <- col_no

# Plot everything using sp.layout subfunction in spplot
spplot(popSP, "col_no", sp.layout = list("sp.polygons", gadm.prj), col.regions = color, 
    main = "Population in Russia", key.space = "right")

plot of chunk unnamed-chunk-16

Having finished with single points for each region, I tried to create multiple points for each region and I decided to plot points according to meteostations place in Russia and color of point dependent on meteostation elevation

# again loading data
rus <- load("C://RUS_adm1.RData")
# as in my raw data I had coordinates of meteostations*1000, firstly I had
# to correct it
popSP <- met <- read.csv2("C://meteo.txt")
for (i in c(3, 4)) popSP[, i] <- popSP[, i]/1000
# then create spatial data
coordinates(popSP) <- ~Long + Lat
projection(popSP) <- projection(gadm)

# create reprojection to EPSG:3413 (see
# http://www.spatialreference.org/ref/epsg/3413/)
proj4.str <- CRS("+proj=stere +lat_0=90 +lat_ts=70 +lon_0=105 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")

gadm.prj <- spTransform(gadm, proj4.str)
popSP.prj <- spTransform(popSP, proj4.str)





# again create data intervals
col_no <- as.factor(as.numeric(cut(popSP.prj$Elevation, c(0, 1000, 2500, 5000, 
    10000, 15000, 20000, 1e+05))))

# divide in 7 levels for proper colorkey
levels(col_no) <- c("<1000", "1000-2500", "2500-5000", "5000-10000", "10000-15000", 
    "15000-20000", ">20000")
# set range of colors
color = rainbow(7)

popSP.prj$col_no <- col_no


# Plot everything using sp.layout function in spplot
spplot(popSP.prj, "col_no", sp.layout = list("sp.polygons", cols = "ID_0", gadm.prj), 
    col.regions = color, main = "Meteostations in Russia", auto.key = list(title = "Elevation"), 
    key.space = "right")

plot of chunk unnamed-chunk-17

Finally, I tried to plot points on google map data, using ggmap and ggplot

# get map data from google
msc <- get_map("Russia, Moscow region", zoom = 8)


# create table data with coordinates of Moscow airports and lenghthes of
# their runs
air <- data.frame(lat = c(55.41, 55.97, 55.59), lon = c(37.91, 37.42, 37.26), 
    run = c(3793, 3550, 2999))

p <- ggmap(msc)  #create plot of Moscow region
# add points, which size and color are dependent on airports' run length
p <- p + geom_point(data = air, aes(x = lon, y = lat, size = run, colour = run))
# add title to the plot
p <- p + labs(title = "Airports in Moscow")
# add legend title, it is the same for both parameteres
p <- p + labs(colour = "Length of flight run ", size = "Length of flight run ")
# set range of size for points
p <- p + scale_size_continuous(range = c(8, 17))
# set range for points' color
p <- p + scale_colour_gradient(low = "red", high = "white")
# merge both size and color in one legend
p <- p + guides(colour = guide_legend())

p

plot of chunk unnamed-chunk-18

Or one can manage two different parameters, to show different data, for instance in this example, color will show length of airports flight run and size its position (height) above sea level

# again get map data from google
msc <- get_map("Russia, Moscow region", zoom = 8)


# create table data with coordinates of Moscow airports plus lenghthes and
# their height
air <- data.frame(lat = c(55.41, 55.97, 55.59), lon = c(37.91, 37.42, 37.26), 
    run = c(3793, 3550, 2999), hei = c(178, 192, 209))

p <- ggmap(msc)  #create plot of Moscow region
# add points, color are dependent on airports' run length and sixe
# dependent on height
p <- p + geom_point(data = air, aes(x = lon, y = lat, size = hei, colour = run))
# add title to the plot
p <- p + labs(title = "Airports in Moscow")
# add legend title, one for size, one for color
p <- p + labs(colour = "Length of flight run ", size = "Height")
# set range of size for points
p <- p + scale_size_continuous(range = c(8, 17))
# set range for points' color
p <- p + scale_colour_gradient(low = "red", high = "white")


p

plot of chunk unnamed-chunk-19


And I also has repeated my meteostation plot, shown before, using google map data.

rus <- get_map("Russia", zoom = 3)

popSP <- met <- read.csv2("C://meteo.txt")
for (i in c(3, 4)) popSP[, i] <- popSP[, i]/1000


air <- data.frame(lat = popSP$Lat, lon = popSP$Long, Elevation = popSP$Elevation)

p <- ggmap(rus)

p <- p + labs(title = "Meteostations in Russia")
p <- p + labs(colour = "Elevation", size = "Elevation")

p <- p + geom_point(data = air, aes(x = lon, y = lat, size = Elevation, colour = Elevation))
p <- p + scale_size_continuous(range = c(1, 6))
p <- p + scale_colour_gradient(low = "red", high = "white")
p <- p + guides(colour = guide_legend())
p