Part I

In our last chapter we had learnt how to work with tables in R. In this chapter, we will go a notch higher and for the budding spatial scientist, see how we can work with geospatial data in R.

We had last worked with county_sex data. This time round we will load the Kenya shapefile and view it in R.

The Kenya administrative shapefile can be found on this link - https://data.humdata.org/dataset/2c0b7571-4bef-4347-9b81-b2174c13f9ef/resource/03df9cbb-0b4f-4f22-9eb7-3cbd0157fd3d/download/ken_adm_iebc_20191031_shp.zip

Load it and extract it within your directory.

We would like to view the Kenya shapefile in our wonderful R canvas. Before we do so, we would like to call the following packages. Remember from chapter one the install.packages() functions is used to call packages into R. As a very simple exercise, call the following packages - rgdal, sp and sf.

Here is a brief overview of what these packages do from the geniuses who unpack the greek from their descriptions.

rgdal- provides bindings to the Geospatial Data Abstraction Library

sp - Classes and methods for spatial data; the classes document where the spatial location information resides, for 2D or 3D data. Utility functions are provided, e.g. for plotting data as maps, spatial selection, as well as methods for retrieving coordinates, for subsetting, print, summary, etc

sf - simple feature for R, Support for simple features, a standardized way to encode spatial vector data. Binds to ‘GDAL’ for reading and writing data, to ‘GEOS’ for geometrical operations, and to ‘PROJ’ for projection conversions and datum transformations. Uses by default the ‘s2’ package for spherical geometry operations on ellipsoidal (long/lat) coordinates.

Now use library() to load the functions into our R workspace. Remember that these are external packages, and not inbuilt and that is why we need to keep on calling them back to work!

#load the rgdal package
library(rgdal)
## Loading required package: sp
## Please note that rgdal will be retired by the end of 2023,
## plan transition to sf/stars/terra functions using GDAL and PROJ
## at your earliest convenience.
## 
## rgdal: version: 1.5-28, (SVN revision 1158)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.2.1, released 2020/12/29
## Path to GDAL shared files: C:/Users/User/Documents/R/win-library/4.1/rgdal/gdal
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 7.2.1, January 1st, 2021, [PJ_VERSION: 721]
## Path to PROJ shared files: C:/Users/User/Documents/R/win-library/4.1/rgdal/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:1.4-6
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading sp or rgdal.
## Overwritten PROJ_LIB was C:/Users/User/Documents/R/win-library/4.1/rgdal/proj
#load sp package
library(sp)
#load sf package
library(sf)
## Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE

Let’s load the Kenya administrative shapefile from our directory. To load shapefiles into R, we use the readOGR function.

#save the shapefile of admin1 (counties) into ke_adm object
ke_adm <- readOGR(dsn = './ken_adm', layer = 'ken_admbnda_adm1_iebc_20191031')
## OGR data source with driver: ESRI Shapefile 
## Source: "E:\Documents\R Studio is here\R Studio files projects\gis_coding\ken_adm", layer: "ken_admbnda_adm1_iebc_20191031"
## with 47 features
## It has 12 fields

As you can see, the ke_adm has been added into our R environment. However, taking advantage of the help details, here is a shortcut I discovered.

ke_adm2 <- readOGR(dsn = './ken_adm/ken_admbnda_adm1_iebc_20191031.shp')
## OGR data source with driver: ESRI Shapefile 
## Source: "E:\Documents\R Studio is here\R Studio files projects\gis_coding\ken_adm\ken_admbnda_adm1_iebc_20191031.shp", layer: "ken_admbnda_adm1_iebc_20191031"
## with 47 features
## It has 12 fields

Using the dsn = .shp' (note the extension) calls the shapefile without much fuss. You may have also seen that we used the./when calling the file instead of a long method of starting from the very beginning, likeE:/know/better/…. If your file is in the same project folder, you can use./` to get to the sub-directory within your project folder.

#remove ke_adm2 as we already have ke_adm
remove(ke_adm2)

Time to display our shapefile for seeing is believing. For this, we will load the tmap package.

#load tm-map package
library(tmap)

Next, we will use the tm_map tools to draw a map using the ke_adm shapefile. tm_map tools are quite a lot and lengthy to even scratch the surface here, so I would recommend you go through the below code slowly as you make good use of the help menu.

But first, lets read the attribute data of our shapefile to know which attributes to call into the tm_map tools. You will see why.

#read the first 6 attribute values for our ke_adm shapefile which will be used in some tm_map
#functions. 
head(ke_adm@data)
##   Shape_Leng Shape_Area         ADM1_EN ADM1_PCODE ADM1_REF ADM1ALT1EN
## 0   5.932315  0.8847323         Baringo      KE030     <NA>       <NA>
## 1   2.922220  0.1980989           Bomet      KE036     <NA>       <NA>
## 2   3.062486  0.2450581         Bungoma      KE039     <NA>       <NA>
## 3   2.670396  0.1471779           Busia      KE040     <NA>       <NA>
## 4   3.888933  0.2444318 Elgeyo-Marakwet      KE028     <NA>       <NA>
## 5   3.386653  0.2294135            Embu      KE014     <NA>       <NA>
##   ADM1ALT2EN ADM0_EN ADM0_PCODE       date    validOn validTo
## 0       <NA>   Kenya         KE 2017/11/03 2019/10/31    <NA>
## 1       <NA>   Kenya         KE 2017/11/03 2019/10/31    <NA>
## 2       <NA>   Kenya         KE 2017/11/03 2019/10/31    <NA>
## 3       <NA>   Kenya         KE 2017/11/03 2019/10/31    <NA>
## 4       <NA>   Kenya         KE 2017/11/03 2019/10/31    <NA>
## 5       <NA>   Kenya         KE 2017/11/03 2019/10/31    <NA>

You may ask where on earth @data came from. Unlike reading data frames where one will use the dataframe name, when reading attributes of shapefiles in R, @data is appended after the file name. If you would like to read the attributes of a column only, you add the $columnname after the appended @data. For example, after the head(ke_adm@data) has magically shown me the first 6 rows of my shapefile attribues, if I would like to view the attribute names of the ADM1_EN column, all I have to do is type ke_adm@data$ADM1_EN.

tm_map gets into action.

# plot the ke_adm shapefile on R. See help for details on the functions and the arguments used
tm_shape(ke_adm, 'Kenya', projection = '+init=epsg:4326', bbox = ke_adm) +
  tm_fill(col = 'ADM1_EN', alpha = 0.5, palette('Okabe-Ito'), legend.show = F) + tm_text('ADM1_EN', size = 'Shape_Area', col = 'steelblue', legend.size.show = F, legend.col.show = F)
## Warning in CPL_crs_from_input(x): GDAL Message 1: +init=epsg:XXXX syntax is
## deprecated. It might return a CRS with a non-EPSG compliant axis order.
## Warning: Number of levels of the variable "ADM1_EN" is 47, which is
## larger than max.categories (which is 30), so levels are combined. Set
## tmap_options(max.categories = 47) in the layer function to show all levels.

Before going any further, is it possible to save a shapefile from the R environment? Yes.

#save the ke_adm shapefile from the R environment but with another name
writeOGR(ke_adm, dsn = './ken_adm', layer = 'ke_admins', driver = 'ESRI Shapefile', overwrite_layer = T)

Lets read the metadata of our Kenya shapefile and see if its projection reads as correctly as what we had specified for it +init=epsg:4326.

library(sp)
library(sf)
#check projection for our shapefile
st_crs(ke_adm)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4326]]

Not sure whether a pre-projected file can be changed to a different projection in tm_shape but you can try yourself.

Lets recall the kenya administrative shapefile layer we wrote into our directory - the ke_admins shapefile layer we had created above using this code: writeOGR(ke_adm, dsn = './ken_adm', layer = 'ke_admins', driver = 'ESRI Shapefile'). We would like to see if files saved from R inherit (or maintain) the same projection of the file in the R environment. Hypothetically, they should.

library(rgdal)
ke_admins <- readOGR(dsn = './ken_adm/ke_admins.shp')
## OGR data source with driver: ESRI Shapefile 
## Source: "E:\Documents\R Studio is here\R Studio files projects\gis_coding\ken_adm\ke_admins.shp", layer: "ke_admins"
## with 47 features
## It has 12 fields
st_crs(ke_admins)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4326]]

Gracia! The same projection we had assigned to our ke_adm file was also carried over to our created shapefile of ke_admins.

#we don wanna get confused with too many names of the same ke_ shapefile
rm(ke_admins)

Part II

Let’s do something a bit more exciting. In our last chapter, we had analysed a population by sex and county dataset. To analyse the socioeconomic situation of our country, we would like to analyze the distribution of house assets per county. Unlike other times where we sourced files into R from a file directory in our laptop, this time we will be bold and source them directly from the web. Viruses don’t seem to stand a chance with R, maybe.

#simply put in the url and just like when calling it from the directory, put in quotes(' ')
house_assets <- read.csv('https://open.africa/dataset/9b94fe50-9d75-4b92-be00-6354c6e6cc88/resource/95eb7a07-71dd-48be-8c82-dd321ab329f1/download/percentage-distribution-of-conventional-households-by-ownership-of-selected-household-assets-201.csv', header = T)

You can also source the file from the web by using the following convention - read.csv(url(), header = T).

#get an overview of the house_assets dataframe
head(house_assets)
##   County...Sub.County Conventional.Households Stand...alone.Radio
## 1               Kenya              12,043,016                56.9
## 2               Rural               7,379,282                58.5
## 3               Urban               4,663,734                54.4
## 4             MOMBASA                 376,295                47.4
## 5           CHANGAMWE                  46,439                51.6
## 6               JOMVU                  53,214                49.4
##   Desk.Top.Computer..Laptop..Tablet Functional.Television Analogue.Television
## 1                               8.8                  40.7                 4.7
## 2                               3.0                  26.9                 3.4
## 3                              18.0                  62.5                 6.7
## 4                              13.0                  57.2                 7.1
## 5                               9.6                  59.2                 6.8
## 6                               9.0                  53.5                 5.8
##   Internet Bicycle Motor.Cycle Refrigerator  Car
## 1     17.9    15.0         9.2          8.8  6.3
## 2      6.9    15.6        10.8          2.4  3.5
## 3     35.4    13.9         6.7         19.1 10.8
## 4     28.2    12.1         5.5         22.0  6.8
## 5     24.8    10.1         4.9         14.4  3.9
## 6     22.3     8.6         5.3         14.0  4.2
##   Truck..Lorry..Bus..Three.Wheeler.truck Tuk.Tuk
## 1                                    0.9     0.5
## 2                                    0.7     0.4
## 3                                    1.2     0.7
## 4                                    0.9     1.5
## 5                                    0.8     1.1
## 6                                    0.7     0.8

If not a citizen of this country, the names under ’County…Sub.County` may seem like they are all belong to one group - Counties or sub-counties depending on which you choose. But they are not. This field is simply a mixture of Counties and the sub counties that make them. For example, Changamwe and Jomvu are sub-counties of Mombasa. Hard to tell if you are not a citizen.

However, not all values under the County_and_Sub_county variable are counties. Furthermore we are more interested in the highest five counties only. To find out the top 5 most populous counties, we will go back to our county_sex_new dataset. To arrange the values in descending order, that is from most populous to least populous counties, we will have to first install the dplyr package.

county_sex2 <- read.csv('county_sex2.csv', header = T)
county_sex_new <- county_sex2[-c(1), ]
rownames(county_sex_new) <- seq(length = nrow(county_sex_new))
county_sex_new
##     X            name    Male  Female Intersex   Total
## 1   2         Mombasa  610257  598046       30 1208333
## 2   3           Kwale  425121  441681       18  866820
## 3   4          Kilifi  704089  749673       25 1453787
## 4   5      Tana River  158550  157391        2  315943
## 5   6            Lamu   76103   67813        4  143920
## 6   7    Taita-Taveta  173337  167327        7  340671
## 7   8         Garissa  458975  382344       34  841353
## 8   9           Wajir  415374  365840       49  781263
## 9  10         Mandera  434976  432444       37  867457
## 10 11        Marsabit  243548  216219       18  459785
## 11 12          Isiolo  139510  128483        9  268002
## 12 13            Meru  767698  777975       41 1545714
## 13 14   Tharaka-Nithi  193764  199406        7  393177
## 14 15            Embu  304208  304367       24  608599
## 15 16           Kitui  549003  587151       33 1136187
## 16 17        Machakos  710707  711191       34 1421932
## 17 18         Makueni  489691  497942       20  987653
## 18 19       Nyandarua  315022  323247       20  638289
## 19 20           Nyeri  374288  384845       31  759164
## 20 21       Kirinyaga  302011  308369       31  610411
## 21 22        Murang'a  523940  532669       31 1056640
## 22 23          Kiambu 1187146 1230454      135 2417735
## 23 24         Turkana  478087  448868       21  926976
## 24 25      West Pokot  307013  314213       15  621241
## 25 26         Samburu  156774  153546        7  310327
## 26 27     Trans Nzoia  489107  501206       28  990341
## 27 28     Uasin Gishu  580269  582889       28 1163186
## 28 29 Elgeyo-Marakwet  227317  227151       12  454480
## 29 30           Nandi  441259  444430       22  885711
## 30 31         Baringo  336322  330428       13  666763
## 31 32        Laikipia  259440  259102       18  518560
## 32 33          Nakuru 1077272 1084835       95 2162202
## 33 34           Narok  579042  578805       26 1157873
## 34 35         Kajiado  557098  560704       38 1117840
## 35 36         Kericho  450741  451008       28  901777
## 36 37           Bomet  434287  441379       23  875689
## 37 38        Kakamega  897133  970406       40 1867579
## 38 39          Vihiga  283678  306323       12  590013
## 39 40         Bungoma  812146  858389       35 1670570
## 40 41           Busia  426252  467401       28  893681
## 41 42           Siaya  471669  521496       18  993183
## 42 43          Kisumu  560942  594609       23 1155574
## 43 44        Homa Bay  539560  592367       23 1131950
## 44 45          Migori  536187  580214       35 1116436
## 45 46           Kisii  605784  661038       38 1266860
## 46 47         Nyamira  290907  314656       13  605576
## 47 48         Nairobi 2192452 2204376      245 4397073
#load dplyr after installing the package
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
top_five <- arrange(county_sex_new, desc(Total))%>%
  slice_max(Total, n = 5)%>%
  group_by(name)
  
top_five
## # A tibble: 5 x 6
## # Groups:   name [5]
##       X name        Male  Female Intersex   Total
##   <int> <chr>      <int>   <int>    <int>   <int>
## 1    48 Nairobi  2192452 2204376      245 4397073
## 2    23 Kiambu   1187146 1230454      135 2417735
## 3    33 Nakuru   1077272 1084835       95 2162202
## 4    38 Kakamega  897133  970406       40 1867579
## 5    40 Bungoma   812146  858389       35 1670570

The sign %>% may seem scary to the uninitiated R coder. It sounds like human beings have resorted to using symbospeech like in a novel I once read back with the above writing seeming to be the precursor to this scenario. You can simply interpret the pipe (%>%) sign as ‘then’. That is, after giving a command, like in the above case to arrange the Total variable of county_sex_new in descending order, then (%>%) proceed to….That’s the function of the pipe operator, to proceed the ongoing operation to something else. Think of it like a pipeline transporting crude oil from the storage tank to the refinery. In this case, the county_sex_new with an arranged Total variable will be ‘piped’ to be ‘sliced’. Read about slice in the help menu.

Run it please.

top_five <- arrange(county_sex_new, desc(Total))%>%
  slice_max(Total, n = 5)%>%
  group_by(name)
  
top_five
## # A tibble: 5 x 6
## # Groups:   name [5]
##       X name        Male  Female Intersex   Total
##   <int> <chr>      <int>   <int>    <int>   <int>
## 1    48 Nairobi  2192452 2204376      245 4397073
## 2    23 Kiambu   1187146 1230454      135 2417735
## 3    33 Nakuru   1077272 1084835       95 2162202
## 4    38 Kakamega  897133  970406       40 1867579
## 5    40 Bungoma   812146  858389       35 1670570

There goes a neat table of our top five most populous counties. In chapter 1, such a method was not provided, and in fact Excel was used to order the counties in terms of their population density, from highest to lowest. See here - https://rpubs.com/sammigachuhi/decodegisplease.

However, in coding, you are required to be a bit more sophisticated.

See more of dplyr here: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html

Since we are only interested in the above five counties to display the percentage of total households owning particular assets per county, we will subset our shapefile to the above five counties, and thereafter join these attributes to them.

#to find out the names of all counties under the ADM1_EN variable and see if the names of the
#5 most densely populated counties area also of the same length as in ke_adm
list(ke_adm@data$ADM1_EN)
## [[1]]
##  [1] "Baringo"         "Bomet"           "Bungoma"         "Busia"          
##  [5] "Elgeyo-Marakwet" "Embu"            "Garissa"         "Homa Bay"       
##  [9] "Isiolo"          "Kajiado"         "Kakamega"        "Kericho"        
## [13] "Kiambu"          "Kilifi"          "Kirinyaga"       "Kisii"          
## [17] "Kisumu"          "Kitui"           "Kwale"           "Laikipia"       
## [21] "Lamu"            "Machakos"        "Makueni"         "Mandera"        
## [25] "Marsabit"        "Meru"            "Migori"          "Mombasa"        
## [29] "Murang'a"        "Nairobi"         "Nandi"           "Narok"          
## [33] "Nyamira"         "Nyandarua"       "Nyeri"           "Samburu"        
## [37] "Siaya"           "Taita Taveta"    "Tana River"      "Tharaka-Nithi"  
## [41] "Trans Nzoia"     "Turkana"         "Uasin Gishu"     "Vihiga"         
## [45] "Wajir"           "West Pokot"      "Nakuru"
#using the list above use the correct names to subset ke_adm shapefile to the 5 most densely 
#populated counties
adm_subset <- subset(ke_adm, ADM1_EN%in% c('Nairobi', 'Kiambu', 'Nakuru', 'Kakamega', 'Bungoma')) 

Remember R is case sensitive and even a mispelling, or (-) appearing where it shouldn’t promote social distancing can bring errors.

But there is a problem. Our house_assets dataframe contains the county and sub county names all in uppercase.

list(house_assets$County...Sub.County[1:7])
## [[1]]
## [1] "Kenya"     "Rural"     "Urban"     "MOMBASA"   "CHANGAMWE" "JOMVU"    
## [7] "KISAUNI"

Is there a way to capitalize only the first letter and the rest remain lowercase at the touch of a key? Maybe.

*Unlike below, it is preferable you create a new object, like ‘house_assets2’ and then append a new column like `house_assets2\(names <- tolower(house_assets\)County…Sub.County).

#convert all the column values of county and sub counties to lowercase

house_assets$names <- tolower(house_assets$County...Sub.County)

Next, to capitalize the first letter of the lowercase values in the new column of names. To do this, you will have to install the package stringr and load it to R.

library(stringr)
## Warning: package 'stringr' was built under R version 4.1.3
house_assets$admin_name <- str_to_title(house_assets$names)

The names that will work out for our join operation are at the faaaaaaar end under the admin_name column header.

However, we have a pesky problem, house_assets names the capital as ‘Nairobi City’ instead of ‘Nairobi’ as named in the adm_subset layer. The creators sincerely wanted to deny us sleep.

#replace 'Nairobi City' with 'Nairobi'
house_assets[house_assets == 'Nairobi City'] <- 'Nairobi'
#remember adm_subset is the subsetted shapefile
#merge is used to merge the common fields between spatial data adm_subset and dataframe house_assets
county_hs_assets <- merge(adm_subset, house_assets, by.x = 'ADM1_EN', by.y = 'admin_name', duplicateGeoms = T)
#view if county_hs_assets contains the data from house_assets appended to adm_subset
head(county_hs_assets@data)
##    ADM1_EN Shape_Leng Shape_Area ADM1_PCODE ADM1_REF ADM1ALT1EN ADM1ALT2EN
## 1  Bungoma   3.062486  0.2450581      KE039     <NA>       <NA>       <NA>
## 2 Kakamega   4.125304  0.2443814      KE037     <NA>       <NA>       <NA>
## 3   Kiambu   3.193175  0.2066285      KE022     <NA>       <NA>       <NA>
## 4   Kiambu   3.193175  0.2066285      KE022     <NA>       <NA>       <NA>
## 5  Nairobi   1.658061  0.0574956      KE047     <NA>       <NA>       <NA>
## 6   Nakuru  10.598716  1.2836187      KE032     <NA>       <NA>       <NA>
##   ADM0_EN ADM0_PCODE       date    validOn validTo County...Sub.County
## 1   Kenya         KE 2017/11/03 2019/10/31    <NA>             BUNGOMA
## 2   Kenya         KE 2017/11/03 2019/10/31    <NA>            KAKAMEGA
## 3   Kenya         KE 2017/11/03 2019/10/31    <NA>              KIAMBU
## 4   Kenya         KE 2017/11/03 2019/10/31    <NA>              KIAMBU
## 5   Kenya         KE 2017/11/03 2019/10/31    <NA>        NAIROBI CITY
## 6   Kenya         KE 2017/11/03 2019/10/31    <NA>              NAKURU
##   Conventional.Households Stand...alone.Radio Desk.Top.Computer..Laptop..Tablet
## 1                 357,714                62.1                               4.1
## 2                 432,284                62.0                               4.3
## 3                  46,816                61.2                              28.4
## 4                 792,333                61.4                              19.6
## 5               1,494,676                53.4                              22.7
## 6                 598,237                61.1                               9.6
##   Functional.Television Analogue.Television Internet Bicycle Motor.Cycle
## 1                  25.4                 3.6      7.2    26.2        11.9
## 2                  29.2                 4.3      8.6    24.4        11.5
## 3                  73.8                 7.0     49.7    19.2         5.0
## 4                  70.0                 6.3     39.1    16.3         6.9
## 5                  68.7                 7.5     42.1    12.5         4.3
## 6                  52.5                 5.7     20.9    19.1        10.9
##   Refrigerator  Car Truck..Lorry..Bus..Three.Wheeler.truck Tuk.Tuk        names
## 1          2.8  3.1                                    0.6     0.4      bungoma
## 2          3.3  3.3                                    0.6     0.4     kakamega
## 3         32.2 21.1                                    2.3     0.5       kiambu
## 4         19.6 12.4                                    1.5     0.5       kiambu
## 5         23.5 12.9                                    1.2     0.4 nairobi city
## 6          8.9  7.1                                    1.0     0.4       nakuru

Even R wants to deny us sleep, there are two Kiambu values under the County...Sub.county variable and we are only dealing with counties. Since we don’t have two counties with the name ‘Kiambu’ but only one, definitely one of the two ‘Kiambu’ values is a namesake county. This is something you’ll most likely always run into and which may often eat into your sleeping time, like mine today.

Since we know that we are dealing with statistics at a county level, therefore the row with 792, 333 households is the correct one(the 4th row). Let’s delete the row with the other ‘Kiambu’ value (3rd row) with 46, 816 households within its borders.

#don't forget the comma after the row index!
county_hs_assets@data [-3, ]
##    ADM1_EN Shape_Leng Shape_Area ADM1_PCODE ADM1_REF ADM1ALT1EN ADM1ALT2EN
## 1  Bungoma   3.062486  0.2450581      KE039     <NA>       <NA>       <NA>
## 2 Kakamega   4.125304  0.2443814      KE037     <NA>       <NA>       <NA>
## 4   Kiambu   3.193175  0.2066285      KE022     <NA>       <NA>       <NA>
## 5  Nairobi   1.658061  0.0574956      KE047     <NA>       <NA>       <NA>
## 6   Nakuru  10.598716  1.2836187      KE032     <NA>       <NA>       <NA>
##   ADM0_EN ADM0_PCODE       date    validOn validTo County...Sub.County
## 1   Kenya         KE 2017/11/03 2019/10/31    <NA>             BUNGOMA
## 2   Kenya         KE 2017/11/03 2019/10/31    <NA>            KAKAMEGA
## 4   Kenya         KE 2017/11/03 2019/10/31    <NA>              KIAMBU
## 5   Kenya         KE 2017/11/03 2019/10/31    <NA>        NAIROBI CITY
## 6   Kenya         KE 2017/11/03 2019/10/31    <NA>              NAKURU
##   Conventional.Households Stand...alone.Radio Desk.Top.Computer..Laptop..Tablet
## 1                 357,714                62.1                               4.1
## 2                 432,284                62.0                               4.3
## 4                 792,333                61.4                              19.6
## 5               1,494,676                53.4                              22.7
## 6                 598,237                61.1                               9.6
##   Functional.Television Analogue.Television Internet Bicycle Motor.Cycle
## 1                  25.4                 3.6      7.2    26.2        11.9
## 2                  29.2                 4.3      8.6    24.4        11.5
## 4                  70.0                 6.3     39.1    16.3         6.9
## 5                  68.7                 7.5     42.1    12.5         4.3
## 6                  52.5                 5.7     20.9    19.1        10.9
##   Refrigerator  Car Truck..Lorry..Bus..Three.Wheeler.truck Tuk.Tuk        names
## 1          2.8  3.1                                    0.6     0.4      bungoma
## 2          3.3  3.3                                    0.6     0.4     kakamega
## 4         19.6 12.4                                    1.5     0.5       kiambu
## 5         23.5 12.9                                    1.2     0.4 nairobi city
## 6          8.9  7.1                                    1.0     0.4       nakuru

Finally, lets draw a map and call this a day.

library(tmap)
tm_shape(county_hs_assets, name = 'counties', projection = '+init=epsg:4326', bbox = county_hs_assets) + tm_fill(col = 'Internet', alpha = 0.6, palette = 'YlOrRd', title = 'Internet') + tm_text('ADM1_EN', size = 'Internet', col = 'black') + tm_borders(col = 'grey', lwd = 0.3, lty = 'solid', alpha = 0.8) + tm_layout(main.title = 'Household internet access for 5 highest populated counties', title.size = 3, bg.color = 'grey85')

We have handled a fair number of errors in this practical. Time to take a break or break.