A cool thing I learned this summer was how to use R to do GIS analysis. This is a bit of code from a class assignment. We’ll look at data from countries around the world. The data is found in countries.shp, which is a global shapefile with various socio-economic indicators for different countries such as:
library(sf)
library(ggplot2)
library(RColorBrewer)
library(viridis)
library(ggsn)
library(lubridate)
library(dplyr)
countries <- st_read("countries/countries/countries.shp")
## Reading layer `countries' from data source
## `C:\Users\sherl\Documents\Geog 6680 - R\Mod12\countries\countries\countries.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 177 features and 64 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## CRS: NA
countries = countries[ , c("name", "gdp_md_est", "pop_est", "income_grp")]
ggplot() + geom_sf(data=countries, aes(fill = gdp_md_est)) +
scale_fill_viridis(option = "viridis", name = "Estimated Median GDP") + theme_bw() +
ggtitle("Estimated Median GDP", "by Country") +
north(countries, location = "topright", symbol = 2, scale = 0.1)
It is very difficult to see the differences between the GDPs for the countries because there is such a large range between the poorest and richest countries. Taking the log() of the GDP data will make for a better scale to visualize the differences in the GDPs. Before taking log(), we need to be sure that the GDP data is greater than zero (gdp_me_est > 0). We’ll first look to see if the country dataframe has any info that is 0 or negative.
# make columns for the log of the population and gdp variables to better display differences between the countries
# first check for values <= 0
countries[countries$gdp_md_est < 1, ]
## Simple feature collection with 1 feature and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -17.06342 ymin: 20.99975 xmax: -8.665124 ymax: 27.65643
## CRS: NA
## name gdp_md_est pop_est income_grp geometry
## 138 W. Sahara -99 -99 5. Low income MULTIPOLYGON (((-8.794884 2...
countries[countries$pop_est < 1, ]
## Simple feature collection with 1 feature and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -17.06342 ymin: 20.99975 xmax: -8.665124 ymax: 27.65643
## CRS: NA
## name gdp_md_est pop_est income_grp geometry
## 138 W. Sahara -99 -99 5. Low income MULTIPOLYGON (((-8.794884 2...
Looks like W. Sahara is going to cause an issue. I will remove W. Sahara from the dataset, then take the log() of the population and GDP data for the remaining countries.
# remove W. Sahara
countries2 <- countries[countries$pop_est > 0, ]
# take logs of population and gdp columns
countries2$log_gdp_md_est = log(countries2$gdp_md_est)
countries2$log_pop_est = log(countries2$pop_est)
# create GDP and population maps
ggplot() + geom_sf(data=countries2, aes(fill = log_gdp_md_est)) +
scale_fill_viridis(option = "viridis", name = "log(Estimated Median GDP)") + theme_bw() +
ggtitle("World Estimated Median GDP", "by Country") +
north(countries2, location = "topright", symbol = 2, scale = 0.1)
This map looks much better. It is easier to see the spread of GDP beyond 1st world countries. I can create a similar map using population data.
# create map for population
ggplot() + geom_sf(data=countries2, aes(fill = log_pop_est)) +
scale_fill_viridis(option = "inferno", name = "log(Estimated Population)") + theme_bw() +
ggtitle("Estimated Population", "by Country") +
north(countries2, location = "topright", symbol = 2, scale = 0.1)
The scales for these maps look much better. It is easier to see the differences between the countries for the two data sets.
I will also create a map for displaying the countries by their income groups.
ggplot() + geom_sf(data=countries, aes(fill = income_grp)) +
scale_fill_brewer(palette = "Dark2", name = "Income Group") + theme_bw() +
north(countries, location = "topright", symbol = 2, scale = 0.1)