chapter 3: Basic operations with raster data in R

Part I

#load our previous workspace from chapter 2 to avoid running into errors of 'cannot find file'
load('E:/Documents/R Studio is here/R Studio files projects/gis_coding/.RData')

Now that we saw which were the most populated counties by table and map, we would like to do just a little exercise for the board executives who prefer to be moved by statistics rather than aesthetics. So we will wreck our nerves one more time and create a bar graph of the 5 most populated counties and display the percentage of household ownership for the following assets: Internet, refrigerator, desktop/computer/laptop, functional television and a car.

Let’s remind ourselves what the top five counties were.

#create table of 5 most populous counties
top_five

##       name    Male  Female Intersex   Total
## 1  Nairobi 2192452 2204376      245 4397073
## 2   Kiambu 1187146 1230454      135 2417735
## 3   Nakuru 1077272 1084835       95 2162202
## 4 Kakamega  897133  970406       40 1867579
## 5  Bungoma  812146  858389       35 1670570

Now that the counties are entrenched in our memory, let’s look at the merged table we created that contains the population of these top five counties appended with the assets.

#remember the merged data ended up being a spatial object, that's why we are calling its attributes
#with a @data
county_hs_assets@data

##    ADM1_EN Shape_Leng Shape_Area ADM1_PCODE ADM1_REF ADM1ALT1EN ADM1ALT2EN
## 1  Bungoma   3.062486  0.2450581      KE039     <NA>       <NA>       <NA>
## 2 Kakamega   4.125304  0.2443814      KE037     <NA>       <NA>       <NA>
## 3   Kiambu   3.193175  0.2066285      KE022     <NA>       <NA>       <NA>
## 4   Kiambu   3.193175  0.2066285      KE022     <NA>       <NA>       <NA>
## 5  Nairobi   1.658061  0.0574956      KE047     <NA>       <NA>       <NA>
## 6   Nakuru  10.598716  1.2836187      KE032     <NA>       <NA>       <NA>
##   ADM0_EN ADM0_PCODE       date    validOn validTo County...Sub.County
## 1   Kenya         KE 2017/11/03 2019/10/31    <NA>             BUNGOMA
## 2   Kenya         KE 2017/11/03 2019/10/31    <NA>            KAKAMEGA
## 3   Kenya         KE 2017/11/03 2019/10/31    <NA>              KIAMBU
## 4   Kenya         KE 2017/11/03 2019/10/31    <NA>              KIAMBU
## 5   Kenya         KE 2017/11/03 2019/10/31    <NA>        NAIROBI CITY
## 6   Kenya         KE 2017/11/03 2019/10/31    <NA>              NAKURU
##   Conventional.Households Stand...alone.Radio Desk.Top.Computer..Laptop..Tablet
## 1                 357,714                62.1                               4.1
## 2                 432,284                62.0                               4.3
## 3                  46,816                61.2                              28.4
## 4                 792,333                61.4                              19.6
## 5               1,494,676                53.4                              22.7
## 6                 598,237                61.1                               9.6
##   Functional.Television Analogue.Television Internet Bicycle Motor.Cycle
## 1                  25.4                 3.6      7.2    26.2        11.9
## 2                  29.2                 4.3      8.6    24.4        11.5
## 3                  73.8                 7.0     49.7    19.2         5.0
## 4                  70.0                 6.3     39.1    16.3         6.9
## 5                  68.7                 7.5     42.1    12.5         4.3
## 6                  52.5                 5.7     20.9    19.1        10.9
##   Refrigerator  Car Truck..Lorry..Bus..Three.Wheeler.truck Tuk.Tuk        names
## 1          2.8  3.1                                    0.6     0.4      bungoma
## 2          3.3  3.3                                    0.6     0.4     kakamega
## 3         32.2 21.1                                    2.3     0.5       kiambu
## 4         19.6 12.4                                    1.5     0.5       kiambu
## 5         23.5 12.9                                    1.2     0.4 nairobi city
## 6          8.9  7.1                                    1.0     0.4       nakuru

I just happen to realize that irregardless of deleting the unnecessary row 3 of duplicate Kiambu county, I did not save it to a new variable. Another lesson learnt the hard way, and coding will give you lots of them. Let’s permanently remove the duplicate layer for once and for all and save the county_hs_assets as a new object.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

#for the upteenth time do not forget the comma after the curved brackets!
#i have not appended the @data after the spatial object. Doing so, upon my discovery
#leads to the object to the left - 'county_house_assets' being output as a dataframe
county_house_assets <- county_hs_assets@data[c(1, 2, 4, 6), ]

#lets look at the attributes of our county_house_assets shapefile
head(county_house_assets)

##    ADM1_EN Shape_Leng Shape_Area ADM1_PCODE ADM1_REF ADM1ALT1EN ADM1ALT2EN
## 1  Bungoma   3.062486  0.2450581      KE039     <NA>       <NA>       <NA>
## 2 Kakamega   4.125304  0.2443814      KE037     <NA>       <NA>       <NA>
## 4   Kiambu   3.193175  0.2066285      KE022     <NA>       <NA>       <NA>
## 6   Nakuru  10.598716  1.2836187      KE032     <NA>       <NA>       <NA>
##   ADM0_EN ADM0_PCODE       date    validOn validTo County...Sub.County
## 1   Kenya         KE 2017/11/03 2019/10/31    <NA>             BUNGOMA
## 2   Kenya         KE 2017/11/03 2019/10/31    <NA>            KAKAMEGA
## 4   Kenya         KE 2017/11/03 2019/10/31    <NA>              KIAMBU
## 6   Kenya         KE 2017/11/03 2019/10/31    <NA>              NAKURU
##   Conventional.Households Stand...alone.Radio Desk.Top.Computer..Laptop..Tablet
## 1                 357,714                62.1                               4.1
## 2                 432,284                62.0                               4.3
## 4                 792,333                61.4                              19.6
## 6                 598,237                61.1                               9.6
##   Functional.Television Analogue.Television Internet Bicycle Motor.Cycle
## 1                  25.4                 3.6      7.2    26.2        11.9
## 2                  29.2                 4.3      8.6    24.4        11.5
## 4                  70.0                 6.3     39.1    16.3         6.9
## 6                  52.5                 5.7     20.9    19.1        10.9
##   Refrigerator  Car Truck..Lorry..Bus..Three.Wheeler.truck Tuk.Tuk    names
## 1          2.8  3.1                                    0.6     0.4  bungoma
## 2          3.3  3.3                                    0.6     0.4 kakamega
## 4         19.6 12.4                                    1.5     0.5   kiambu
## 6          8.9  7.1                                    1.0     0.4   nakuru

However, because we (or rather I) are unsure of whether ggplot can plot graphs from a shapefile, common sense says we convert the spatial dataframe to a normal dataframe. We go by what our guts tell us to do, and test to see if it works.

#convert the spatial data frame to a normal dataframe
county_assets_table <- as.data.frame(county_house_assets)
county_assets_table

##    ADM1_EN Shape_Leng Shape_Area ADM1_PCODE ADM1_REF ADM1ALT1EN ADM1ALT2EN
## 1  Bungoma   3.062486  0.2450581      KE039     <NA>       <NA>       <NA>
## 2 Kakamega   4.125304  0.2443814      KE037     <NA>       <NA>       <NA>
## 4   Kiambu   3.193175  0.2066285      KE022     <NA>       <NA>       <NA>
## 6   Nakuru  10.598716  1.2836187      KE032     <NA>       <NA>       <NA>
##   ADM0_EN ADM0_PCODE       date    validOn validTo County...Sub.County
## 1   Kenya         KE 2017/11/03 2019/10/31    <NA>             BUNGOMA
## 2   Kenya         KE 2017/11/03 2019/10/31    <NA>            KAKAMEGA
## 4   Kenya         KE 2017/11/03 2019/10/31    <NA>              KIAMBU
## 6   Kenya         KE 2017/11/03 2019/10/31    <NA>              NAKURU
##   Conventional.Households Stand...alone.Radio Desk.Top.Computer..Laptop..Tablet
## 1                 357,714                62.1                               4.1
## 2                 432,284                62.0                               4.3
## 4                 792,333                61.4                              19.6
## 6                 598,237                61.1                               9.6
##   Functional.Television Analogue.Television Internet Bicycle Motor.Cycle
## 1                  25.4                 3.6      7.2    26.2        11.9
## 2                  29.2                 4.3      8.6    24.4        11.5
## 4                  70.0                 6.3     39.1    16.3         6.9
## 6                  52.5                 5.7     20.9    19.1        10.9
##   Refrigerator  Car Truck..Lorry..Bus..Three.Wheeler.truck Tuk.Tuk    names
## 1          2.8  3.1                                    0.6     0.4  bungoma
## 2          3.3  3.3                                    0.6     0.4 kakamega
## 4         19.6 12.4                                    1.5     0.5   kiambu
## 6          8.9  7.1                                    1.0     0.4   nakuru

Now that our novel thoughts have succeeded, we proceed to draw a bar chart of the top five most populous counties and also show the percentage of households having particular assets in each county. But then our data need to be converted from wide to long format. Darn! However, the ‘d’ in the word ‘coder’ stands for Daring.

library(reshape2)

attach(county_assets_table)

colnames(county_assets_table)

##  [1] "ADM1_EN"                               
##  [2] "Shape_Leng"                            
##  [3] "Shape_Area"                            
##  [4] "ADM1_PCODE"                            
##  [5] "ADM1_REF"                              
##  [6] "ADM1ALT1EN"                            
##  [7] "ADM1ALT2EN"                            
##  [8] "ADM0_EN"                               
##  [9] "ADM0_PCODE"                            
## [10] "date"                                  
## [11] "validOn"                               
## [12] "validTo"                               
## [13] "County...Sub.County"                   
## [14] "Conventional.Households"               
## [15] "Stand...alone.Radio"                   
## [16] "Desk.Top.Computer..Laptop..Tablet"     
## [17] "Functional.Television"                 
## [18] "Analogue.Television"                   
## [19] "Internet"                              
## [20] "Bicycle"                               
## [21] "Motor.Cycle"                           
## [22] "Refrigerator"                          
## [23] "Car"                                   
## [24] "Truck..Lorry..Bus..Three.Wheeler.truck"
## [25] "Tuk.Tuk"                               
## [26] "names"

#transform the dataframe of county assets from wide to long table
assets_data <- melt(county_assets_table, id.vars = 'ADM1_EN', measure.vars = c(16:17, 19, 22:23), 
     variable.name = 'Assets', value.name = '% of Households', na.rm = T)
assets_data

##     ADM1_EN                            Assets % of Households
## 1   Bungoma Desk.Top.Computer..Laptop..Tablet             4.1
## 2  Kakamega Desk.Top.Computer..Laptop..Tablet             4.3
## 3    Kiambu Desk.Top.Computer..Laptop..Tablet            19.6
## 4    Nakuru Desk.Top.Computer..Laptop..Tablet             9.6
## 5   Bungoma             Functional.Television            25.4
## 6  Kakamega             Functional.Television            29.2
## 7    Kiambu             Functional.Television            70.0
## 8    Nakuru             Functional.Television            52.5
## 9   Bungoma                          Internet             7.2
## 10 Kakamega                          Internet             8.6
## 11   Kiambu                          Internet            39.1
## 12   Nakuru                          Internet            20.9
## 13  Bungoma                      Refrigerator             2.8
## 14 Kakamega                      Refrigerator             3.3
## 15   Kiambu                      Refrigerator            19.6
## 16   Nakuru                      Refrigerator             8.9
## 17  Bungoma                               Car             3.1
## 18 Kakamega                               Car             3.3
## 19   Kiambu                               Car            12.4
## 20   Nakuru                               Car             7.1

Super! Let’s create a good looking bar chart that summarizes all the statistics for the stakeholders.

#load ggplot2
library(ggplot2)

#always helpful to print out the headers of a dataframe to avoid syntax errors
colnames(assets_data)

## [1] "ADM1_EN"         "Assets"          "% of Households"

#draw the plot with the counties as the x axis and the percentage of assets on the y_axis
ggplot(assets_data, aes(ADM1_EN, `% of Households`)) + geom_bar(aes(fill = Assets), 
                                                              stat = 'identity', 
                                                              position = position_dodge2()) + #put axis labels
  xlab('County') + ylab('Percentage of Households')

Notice the backtrack () enclosing the y_axis variable % of Households? Apparently the aes() value does not tolerate special characters, including spaces in a variable such as ‘Percentage of Households’. Just learnt this by experience and several error messages. If your header has some special characters, and/or spacing, it is recommended you enclose the entire header in backtracks if calling it using the aes() function. The easier way is to print the header names using the colnames or head function and copying the result of the target header into aes(). This career is not that kind but at the end of a day you have a good looking graphic to show your stakeholders.

As an exercise, change the colours of the bar columns. Remember you can use the scale_fill_manual argument.

In our bar plot above, a surprising result is that the percentage of households owning a television set in Kiambu county is more than the capital, Nairobi. The capital, on the other hand, has a higher percentage of households with access to the internet. Some investigation may be needed to understand why more households own TV sets in Kiambu, although with a marginal difference despite Nairobi contributing the largest GDP to the national economy. A possible reason is that a high percentage of Nairobi’s population resides in slums, but more research is needed. You can use this graphic to pique the interest of the stakeholders, and very soon you could be having another project in your hands!

Part II

Now in our previous exercises we saw maps depicting the population distribution among five counties. Next, and as an underlying geographer, one would like to investigate if there are any physical conditions that may have contributed to the higher population density of these counties compared to other counties. In a nutshell, could these counties be having unique conditions that made them favourable for human settlement from earliest times vis a vis other counties? Let’s take a shot at this environmentalism theory.

The raster that we will use for the exercise is an elevation dataset available from here: http://geoportal.rcmrd.org/layers/servir%3Akenya_srtm30meters

To load the raster file, we will have to install the raster package and in addition, also load our previously installed rgdal and sp package. Remember the last two deal with spatial objects and that’s why we are calling them back to work.

#load the raster package
library(raster)

## Warning: package 'raster' was built under R version 4.1.3

## Loading required package: sp

## 
## Attaching package: 'raster'

## The following object is masked from 'package:dplyr':
## 
##     select

#load the rgdal package
library(rgdal)

## Please note that rgdal will be retired by the end of 2023,
## plan transition to sf/stars/terra functions using GDAL and PROJ
## at your earliest convenience.
## 
## rgdal: version: 1.5-28, (SVN revision 1158)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.2.1, released 2020/12/29
## Path to GDAL shared files: C:/Users/User/Documents/R/win-library/4.1/rgdal/gdal
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 7.2.1, January 1st, 2021, [PJ_VERSION: 721]
## Path to PROJ shared files: C:/Users/User/Documents/R/win-library/4.1/rgdal/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:1.4-6
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading sp or rgdal.
## Overwritten PROJ_LIB was C:/Users/User/Documents/R/win-library/4.1/rgdal/proj

library(sp)

Time to load the extracted elevation raster you loaded from the link above.

#load the raster into R from the directory you saved it in
elevation <- raster(x = 'D:/DUKTO/Kenya_SRTM30meters/Kenya_SRTM30meters.tif')

#take a look at the raster attributes
elevation

## class      : RasterLayer 
## dimensions : 33677, 28721, 967237117  (nrow, ncol, ncell)
## resolution : 0.0002777778, 0.0002777778  (x, y)
## extent     : 33.90958, 41.88764, -4.720694, 4.634028  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : Kenya_SRTM30meters.tif 
## names      : Kenya_SRTM30meters 
## values     : -34, 5054  (min, max)

This is the first time we are loading a raster and as usual, we would like to believe when we confirm it with the eyes.

#view the elevation raster
library(tmap)

viewing the file in its original extent of the entire country will consume so much R resources (and it did) that it will result in an error that will make you regret why you chewed more than you could swallow in the first place. Masking it right away brought in similar disappointing results.

Her is the way forward. We will crop the elevation raster, that is, we will clip the raster to the smaller spatial extents of our adm_subset shapefile. The adm_subset shapefile only covers five counties, and we can bet to our last coin that this will be within our poor laptop’s computing power. Thereafter, we will be a bit more ruthless and clip the reduced elevation raster to the polygons of our adm_subset file to reflect both the boundary characteristics and area.

#clip the elevation dataset to the spatial extent of the adm_subset shapefile which only
#covers five counties
elevation3 <- crop(elevation, adm_subset)

#the mask function will result in the elevation dataset matching the boundary form of our
#five counties
elevation_counties <- mask(elevation3, adm_subset)

tm_shape(elevation_counties) + tm_raster(palette = topo.colors(7, 0.7, rev = T), legend.show = T,
                                         legend.is.portrait = T, colorNA = NULL) + 
  tm_layout(title = 'Elevation of 5 most populous counties', legend.position = c('left', 'bottom'))

## stars object downsampled to 1080 by 927 cells. See tm_shape manual (argument raster.downsample)

Wallah! It finally worked!

#Save the raster file to your directory
elevation_counties_map <- writeRaster(elevation_counties, 'elevation.tiff', overwrite = T)

Now we have seen the general topography of our five most populous counties. Most of the counties lie above an altitude of 1500m asl. except for the Western region counties where the altitude sinks below this threshold. Considering we are in the Equator, having areas of such elevation is a welcome relief to an otherwise sunny climate all year round.

Now that we have seen the elevation of the five counties, our cartographic exercise cannot be complete until at least we know areas of common height within the counties. For this, we will create a contour map.

#create contour of same elevation heights in R
plot(elevation_counties_map)
#add contours
contour(elevation_counties_map, nlevels = 10, labcex = 0.6, drawlabels = T, method = 'edge', col = 'red', add = T)

Let’s do the same in tmap for a better looking map.

#convert the raster to a contour layer whereby the countour layer can be easily added into tmap
contours <- rasterToContour(elevation_counties_map, nlevels = 20)

#view the attributes of the contour data
head(contours@data)

##     level
## C_1  1400
## C_2  1600
## C_3  1800
## C_4  2000
## C_5  2200
## C_6  2400

#draw the map of the five counties with contours overlain over the map of the elevation raster
tm_shape(elevation_counties) + tm_raster(palette = topo.colors(7, 0.7, rev = T), legend.show = T,
                                         legend.is.portrait = T, colorNA = NULL) + 
  tm_shape(contours) + tm_lines(col = 'violet', lwd = 0.3, breaks = 10) + 
  tm_text(text = 'level', size = 0.4, col = 'black', along.lines = T, overwrite.lines = T) + 
  tm_layout(title = 'Elevation of 5 most populous counties', legend.position = c('left', 'bottom'))

## stars object downsampled to 1080 by 927 cells. See tm_shape manual (argument raster.downsample)

Enough is enough.

chapter 3: Basic operations with raster data in R

sam

4/2/2022