I saw this post and I decided to replicated that good example but with data closer to me. So, I've got the shape data of the capital of my country (You can download the data from here). The data comes from the 2002 CENSO… some old but very useful for this propose.
The packages what we'll need are:
library(maptools)
library(ggplot2)
library(plyr)
We need the data! Note that some shape file have data.
download.file("http://jkunst.com/media/data/SCL_Shape.zip", "SCL_Shape.zip")
unzip("SCL_Shape.zip")
map <- readShapeLines("RM_Comunas")
str(map@data)
## 'data.frame': 52 obs. of 18 variables:
## $ AREA : num 73005419 16845462 10975674 10948429 14287294 ...
## $ PERIMETRO: num 44705 21112 16263 13701 15479 ...
## $ REGION : Factor w/ 1 level "METROPOLITANA DE SANTIAGO": 1 1 1 1 1 1 1 1 1 1 ...
## $ PROVINCIA: Factor w/ 6 levels "CHACABUCO","CORDILLERA",..: 3 5 5 5 5 5 5 5 5 5 ...
## $ CODINE : int 13403 13102 13103 13104 13105 13106 13108 13109 13111 13112 ...
## $ COMUNA : Factor w/ 52 levels "ALHUE","BUIN",..: 3 4 5 7 9 11 13 15 17 18 ...
## $ SIN_INFO : int 7 38 29 41 46 170 90 52 25 39 ...
## $ ABC1 : int 609 855 217 956 1226 2042 1200 2089 575 220 ...
## $ C2 : int 581 3491 2766 5580 6141 7191 4536 6010 4095 2652 ...
## $ C3 : int 784 5285 9154 10148 12108 10189 5988 7030 9579 10080 ...
## $ D : int 1704 7821 19810 15187 20166 12877 6125 7015 15589 25002 ...
## $ E : int 828 2069 6772 3792 5514 3209 1223 1626 4584 9340 ...
## $ HOGARES : int 4513 19559 38748 35704 45201 35678 19162 23822 34447 47333 ...
## $ HOMBRES : int 9243 34961 72921 64973 86435 63939 30633 40651 64750 94963 ...
## $ MUJERES : int 8992 36945 75391 68283 89159 66455 34846 44467 67770 95122 ...
## $ POBLACION: int 18235 71906 148312 133256 175594 130394 65479 85118 132520 190085 ...
## $ DENSIDAD : num 6.18e-05 1.16e-03 3.53e-03 3.26e-03 3.16e-03 ...
## $ COUNT : int 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "data_types")= chr "N" "N" "C" "C" ...
mapdata <- map@data
Now we prepare the data obtaing the longitudes and latitudes from the data. We use the fortify (ggplot's) function. The result after apply this funtcion we dont now what group corresponds to a specific area (in this example a 'comuna'). Then we obtain the centers of the areas to put the names in a plot (for example).
cordsdata <- fortify(map)
cordsdata <- join(cordsdata, data.frame(group = unique(cordsdata$group), COMUNA = unique(mapdata$COMUNA)),
type = "left", by = "group")
mapdata <- join(mapdata, ddply(cordsdata, .(COMUNA), summarize, CENTER.LONG = mean(long),
CENTER.LAT = mean(lat)))
I this post we study only the comunas in the Santiago province. So, we select this cases. And plot the comunas for a first view to the data.
data_cords <- subset(join(cordsdata, mapdata, type = "left"), PROVINCIA == "SANTIAGO")
data_map <- subset(mapdata, PROVINCIA == "SANTIAGO")
g <- ggplot(data_cords) + theme(legend.position = "none", axis.ticks = element_blank(),
axis.text = element_blank(), axis.title = element_blank())
# Showing the comunas of Santiago
g + geom_polygon(aes(x = long, y = lat, group = group), color = "gray", alpha = 0.7) +
geom_text(aes(x = CENTER.LONG, y = CENTER.LAT, label = COMUNA), size = 2,
color = "white") + ggtitle("The Comunas in Santiago")
Now we plot the previous map but showing the comunas with more density (population/area). The firts of this comunas are Lo Prado o San Ramón, this comunas are places were exists a lot of suburbs.
head(as.character(data_map[order(data_map$DENSIDAD, decreasing = T), ]$COMUNA),
5)
## [1] "LO PRADO" "SAN RAMON" "CERRO NAVIA"
## [4] "LO ESPEJO" "PEDRO AGUIRRE CERDA"
g + geom_polygon(aes(x = long, y = lat, group = group, alpha = POBLACION/AREA),
fill = "darkred", color = "white") + ggtitle("Density of population in each Comuna")
If we plot now the comunas according the numbers of persons. In this case the comunas with more people are Maipú and La Florida, in my country this comunas have the nickname Bedroom Comunas (Comunas dormitorio) becuase this comunas have a large area and them have a lot of housing. So, the most of people in this areas don't work there, instead they have their housgins.
head(as.character(data_map[order(data_map$POBLACION, decreasing = T), ]$COMUNA,
5))
## [1] "MAIPU" "LA FLORIDA" "LAS CONDES" "PENALOLEN" "SANTIAGO"
## [6] "PUDAHUEL"
g + geom_polygon(aes(x = long, y = lat, group = group, alpha = POBLACION), fill = "blue",
color = "white") + ggtitle("Population in each Comuna")
Well, this is a first aproach to analysis this type of data. I hope make other interesting things soon. More posts in my site ;)