CLASS 05: Geography Commands in R

In this class, we will learn about attribute data associated with spatial information. The objective of this session is to learn how to manipulate data based on its attributes, by subsetting or aggregating it. First, we clean our environment with the command rm(), then we load the libraries we plan to use in this session.

rm(list=ls())

# list of libraries we are going to need for this class
library(sf)       # for spatial features.
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(raster)   # for raster data.
## Loading required package: sp
library(dplyr)    # for data manipulation in general, e.g. mutate().
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:raster':
## 
##     intersect, select, union
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(stringr)  # for working with strings (pattern matching)
library(tidyr)    # for unite() and separate()
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:raster':
## 
##     extract
library(spData)   # provides the datasets we use in this class

Remember that the library spData provides the world dataset used below.

Attribute Data

Attribute data is the non-spatial information associated with the geographic information we have, for example the name of a place or its elevation (the height of a point on the map, which is the kind of value stored in raster data). The position itself, given by latitude (how far north or south a point lies) and longitude (how far east or west it lies), is geometry rather than an attribute. For more information about latitude and longitude, see this link: latitude_longitude.

sf (simple features) extends the manipulation of data frames to sf objects. Remember that sf objects have a column where all the geographic information is stored.

methods(class = "sf")   # list the methods available for sf objects
##   [1] $<-                   [                     [[<-                 
##   [4] aggregate             anti_join             arrange              
##   [7] as.data.frame         cbind                 coerce               
##  [10] dbDataType            dbWriteTable          distinct             
##  [13] dplyr_reconstruct     extent                extract              
##  [16] filter                full_join             gather               
##  [19] group_by              group_split           identify             
##  [22] initialize            inner_join            left_join            
##  [25] mask                  merge                 mutate               
##  [28] nest                  plot                  print                
##  [31] raster                rasterize             rbind                
##  [34] rename                right_join            sample_frac          
##  [37] sample_n              select                semi_join            
##  [40] separate              separate_rows         show                 
##  [43] slice                 slotsFromS3           spread               
##  [46] st_agr                st_agr<-              st_area              
##  [49] st_as_s2              st_as_sf              st_bbox              
##  [52] st_boundary           st_buffer             st_cast              
##  [55] st_centroid           st_collection_extract st_convex_hull       
##  [58] st_coordinates        st_crop               st_crs               
##  [61] st_crs<-              st_difference         st_filter            
##  [64] st_geometry           st_geometry<-         st_interpolate_aw    
##  [67] st_intersection       st_intersects         st_is                
##  [70] st_is_valid           st_join               st_line_merge        
##  [73] st_m_range            st_make_valid         st_nearest_points    
##  [76] st_node               st_normalize          st_point_on_surface  
##  [79] st_polygonize         st_precision          st_reverse           
##  [82] st_sample             st_segmentize         st_set_precision     
##  [85] st_shift_longitude    st_simplify           st_snap              
##  [88] st_sym_difference     st_transform          st_triangulate       
##  [91] st_union              st_voronoi            st_wrap_dateline     
##  [94] st_write              st_z_range            st_zm                
##  [97] summarise             transform             transmute            
## [100] ungroup               unite                 unnest               
## see '?methods' for accessing help and source code

The function methods() lists the methods that can be used with objects of a given class; the output above lists those available for sf objects. Next, let's explore the data we have:

data(world)
world
## Simple feature collection with 177 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 11
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 FJ     Fiji      Oceania   Oceania   Melanesia Sove~   1.93e4  8.86e5    70.0
##  2 TZ     Tanzania  Africa    Africa    Eastern ~ Sove~   9.33e5  5.22e7    64.2
##  3 EH     Western ~ Africa    Africa    Northern~ Inde~   9.63e4 NA         NA  
##  4 CA     Canada    North Am~ Americas  Northern~ Sove~   1.00e7  3.55e7    82.0
##  5 US     United S~ North Am~ Americas  Northern~ Coun~   9.51e6  3.19e8    78.8
##  6 KZ     Kazakhst~ Asia      Asia      Central ~ Sove~   2.73e6  1.73e7    71.6
##  7 UZ     Uzbekist~ Asia      Asia      Central ~ Sove~   4.61e5  3.08e7    71.0
##  8 PG     Papua Ne~ Oceania   Oceania   Melanesia Sove~   4.65e5  7.76e6    65.2
##  9 ID     Indonesia Asia      Asia      South-Ea~ Sove~   1.82e6  2.55e8    68.9
## 10 AR     Argentina South Am~ Americas  South Am~ Sove~   2.78e6  4.30e7    76.3
## # ... with 167 more rows, and 2 more variables: gdpPercap <dbl>,
## #   geom <MULTIPOLYGON [°]>
dim(world) # it is a 2 dimensional object, with rows and columns
## [1] 177  11
nrow(world) # how many rows?
## [1] 177
ncol(world) # how many columns?
## [1] 11

Of the 11 columns we have here, one holds the geographic information. We can extract the non-spatial data with the following command:

world_df = st_drop_geometry(world)      
class(world_df)                       
## [1] "tbl_df"     "tbl"        "data.frame"
world_df
## # A tibble: 177 x 10
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##  * <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 FJ     Fiji      Oceania   Oceania   Melanesia Sove~   1.93e4  8.86e5    70.0
##  2 TZ     Tanzania  Africa    Africa    Eastern ~ Sove~   9.33e5  5.22e7    64.2
##  3 EH     Western ~ Africa    Africa    Northern~ Inde~   9.63e4 NA         NA  
##  4 CA     Canada    North Am~ Americas  Northern~ Sove~   1.00e7  3.55e7    82.0
##  5 US     United S~ North Am~ Americas  Northern~ Coun~   9.51e6  3.19e8    78.8
##  6 KZ     Kazakhst~ Asia      Asia      Central ~ Sove~   2.73e6  1.73e7    71.6
##  7 UZ     Uzbekist~ Asia      Asia      Central ~ Sove~   4.61e5  3.08e7    71.0
##  8 PG     Papua Ne~ Oceania   Oceania   Melanesia Sove~   4.65e5  7.76e6    65.2
##  9 ID     Indonesia Asia      Asia      South-Ea~ Sove~   1.82e6  2.55e8    68.9
## 10 AR     Argentina South Am~ Americas  South Am~ Sove~   2.78e6  4.30e7    76.3
## # ... with 167 more rows, and 1 more variable: gdpPercap <dbl>

tbl stands for tabular data: the object is a tibble, a kind of data frame that is easy to analyze and print.
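
As a minimal sketch of how attributes and geometry can be separated, the following uses a small hypothetical point layer. The object pts and its columns are invented for illustration and are not part of spData.

```r
library(sf)

# Hypothetical toy layer: three points with two attribute columns
pts <- st_as_sf(
  data.frame(name  = c("a", "b", "c"),
             value = c(10, 20, 30),
             lon   = c(0, 1, 2),
             lat   = c(50, 51, 52)),
  coords = c("lon", "lat"), crs = 4326
)

attrs <- st_drop_geometry(pts)  # attributes only, no longer an sf object
geoms <- st_geometry(pts)       # geometry only, an sfc object

class(attrs)
class(geoms)
ncol(pts) - ncol(attrs)         # the geometry column is the only one dropped
```

st_geometry() is the counterpart of st_drop_geometry(): one returns only the geometry column, the other everything except it.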

Subsetting geographic data works the same way as what we learned previously:

world[1:15, ]                               # subset rows by position
## Simple feature collection with 15 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -55.61183 xmax: 180 ymax: 83.23324
## geographic CRS: WGS 84
## # A tibble: 15 x 11
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 FJ     Fiji      Oceania   Oceania   Melanesia Sove~   1.93e4  8.86e5    70.0
##  2 TZ     Tanzania  Africa    Africa    Eastern ~ Sove~   9.33e5  5.22e7    64.2
##  3 EH     Western ~ Africa    Africa    Northern~ Inde~   9.63e4 NA         NA  
##  4 CA     Canada    North Am~ Americas  Northern~ Sove~   1.00e7  3.55e7    82.0
##  5 US     United S~ North Am~ Americas  Northern~ Coun~   9.51e6  3.19e8    78.8
##  6 KZ     Kazakhst~ Asia      Asia      Central ~ Sove~   2.73e6  1.73e7    71.6
##  7 UZ     Uzbekist~ Asia      Asia      Central ~ Sove~   4.61e5  3.08e7    71.0
##  8 PG     Papua Ne~ Oceania   Oceania   Melanesia Sove~   4.65e5  7.76e6    65.2
##  9 ID     Indonesia Asia      Asia      South-Ea~ Sove~   1.82e6  2.55e8    68.9
## 10 AR     Argentina South Am~ Americas  South Am~ Sove~   2.78e6  4.30e7    76.3
## 11 CL     Chile     South Am~ Americas  South Am~ Sove~   8.15e5  1.76e7    79.1
## 12 CD     Democrat~ Africa    Africa    Middle A~ Sove~   2.32e6  7.37e7    58.8
## 13 SO     Somalia   Africa    Africa    Eastern ~ Sove~   4.84e5  1.35e7    55.5
## 14 KE     Kenya     Africa    Africa    Eastern ~ Sove~   5.91e5  4.60e7    66.2
## 15 SD     Sudan     Africa    Africa    Northern~ Sove~   1.85e6  3.77e7    64.0
## # ... with 2 more variables: gdpPercap <dbl>, geom <MULTIPOLYGON [°]>
world[, 1:5]                                # subset columns by position
## Simple feature collection with 177 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 6
##    iso_a2 name_long  continent  region_un subregion                         geom
##    <chr>  <chr>      <chr>      <chr>     <chr>               <MULTIPOLYGON [°]>
##  1 FJ     Fiji       Oceania    Oceania   Melanesia (((180 -16.06713, 180 -16.5~
##  2 TZ     Tanzania   Africa     Africa    Eastern ~ (((33.90371 -0.95, 34.07262~
##  3 EH     Western S~ Africa     Africa    Northern~ (((-8.66559 27.65643, -8.66~
##  4 CA     Canada     North Ame~ Americas  Northern~ (((-122.84 49, -122.9742 49~
##  5 US     United St~ North Ame~ Americas  Northern~ (((-122.84 49, -120 49, -11~
##  6 KZ     Kazakhstan Asia       Asia      Central ~ (((87.35997 49.21498, 86.59~
##  7 UZ     Uzbekistan Asia       Asia      Central ~ (((55.96819 41.30864, 55.92~
##  8 PG     Papua New~ Oceania    Oceania   Melanesia (((141.0002 -2.600151, 142.~
##  9 ID     Indonesia  Asia       Asia      South-Ea~ (((141.0002 -2.600151, 141.~
## 10 AR     Argentina  South Ame~ Americas  South Am~ (((-68.63401 -52.63637, -68~
## # ... with 167 more rows
world[, c("name_long", "lifeExp", "pop")]   # subset columns by name
## Simple feature collection with 177 features and 3 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 4
##    name_long    lifeExp      pop                                            geom
##    <chr>          <dbl>    <dbl>                              <MULTIPOLYGON [°]>
##  1 Fiji            70.0   8.86e5 (((180 -16.06713, 180 -16.55522, 179.3641 -16.~
##  2 Tanzania        64.2   5.22e7 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869~
##  3 Western Sah~    NA    NA      (((-8.66559 27.65643, -8.665124 27.58948, -8.6~
##  4 Canada          82.0   3.55e7 (((-122.84 49, -122.9742 49.00254, -124.9102 4~
##  5 United Stat~    78.8   3.19e8 (((-122.84 49, -120 49, -117.0312 49, -116.048~
##  6 Kazakhstan      71.6   1.73e7 (((87.35997 49.21498, 86.59878 48.54918, 85.76~
##  7 Uzbekistan      71.0   3.08e7 (((55.96819 41.30864, 55.92892 44.99586, 58.50~
##  8 Papua New G~    65.2   7.76e6 (((141.0002 -2.600151, 142.7352 -3.289153, 144~
##  9 Indonesia       68.9   2.55e8 (((141.0002 -2.600151, 141.0171 -5.859022, 141~
## 10 Argentina       76.3   4.30e7 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -~
## # ... with 167 more rows
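
Note that in the outputs above the geom column is kept even when it was not requested. A small sketch of this behaviour, assuming a hypothetical toy layer (pts and its columns are invented for illustration):

```r
library(sf)

# Hypothetical toy layer; names and values are invented for illustration
pts <- st_as_sf(
  data.frame(name  = c("a", "b", "c"),
             value = c(10, 20, 30),
             lon   = c(0, 1, 2),
             lat   = c(50, 51, 52)),
  coords = c("lon", "lat"), crs = 4326
)

only_name <- pts[, "name"]  # one attribute column requested...
names(only_name)            # ...but the geometry column comes along
class(only_name)            # the result is still an sf object

name_vec <- pts$name        # $ extracts a plain vector without geometry
length(name_vec)
```

If we really want the attributes without the geometry, st_drop_geometry() is the explicit way to ask for it.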

Alternatively (2), we can subset with a logical vector, here by the size of the country:

(sel_area <- world$area_km2 < 10000)
##   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
##  [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
##  [85] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [157] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
summary(sel_area) 
##    Mode   FALSE    TRUE 
## logical     170       7
(small_countries <- world[sel_area, ])
## Simple feature collection with 7 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -67.24243 ymin: -16.59785 xmax: 167.8449 ymax: 50.12805
## geographic CRS: WGS 84
## # A tibble: 7 x 11
##   iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##   <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
## 1 PR     Puerto R~ North Am~ Americas  Caribbean Depe~    9225. 3534874    79.4
## 2 PS     Palestine Asia      Asia      Western ~ Disp~    5037. 4294682    73.1
## 3 VU     Vanuatu   Oceania   Oceania   Melanesia Sove~    7490.  258850    71.7
## 4 LU     Luxembou~ Europe    Europe    Western ~ Sove~    2417.  556319    82.2
## 5 <NA>   Northern~ Asia      Asia      Western ~ Sove~    3786.      NA    NA  
## 6 CY     Cyprus    Asia      Asia      Western ~ Sove~    6207. 1152309    80.2
## 7 TT     Trinidad~ North Am~ Americas  Caribbean Sove~    7738. 1354493    70.4
## # ... with 2 more variables: gdpPercap <dbl>, geom <MULTIPOLYGON [°]>
sel_area2 <- world$area_km2>=10000 & world$area_km2<100000
summary(sel_area2)
##    Mode   FALSE    TRUE 
## logical     115      62
(medium_countries <- world[sel_area2,])
## Simple feature collection with 62 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -52.3 xmax: 180 ymax: 59.61109
## geographic CRS: WGS 84
## # A tibble: 62 x 11
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 FJ     Fiji      Oceania   Oceania   Melanesia Sove~   19290.  8.86e5    70.0
##  2 EH     Western ~ Africa    Africa    Northern~ Inde~   96271. NA         NA  
##  3 HT     Haiti     North Am~ Americas  Caribbean Sove~   28541.  1.06e7    62.8
##  4 DO     Dominica~ North Am~ Americas  Caribbean Sove~   48158.  1.04e7    73.5
##  5 BS     Bahamas   North Am~ Americas  Caribbean Sove~   15585.  3.82e5    75.4
##  6 FK     Falkland~ South Am~ Americas  South Am~ Depe~   16364. NA         NA  
##  7 TF     French S~ Seven se~ Seven se~ Seven se~ Depe~   11603. NA         NA  
##  8 TL     Timor-Le~ Asia      Asia      South-Ea~ Sove~   14715.  1.21e6    68.3
##  9 LS     Lesotho   Africa    Africa    Southern~ Sove~   27506.  2.15e6    53.3
## 10 PA     Panama    North Am~ Americas  Central ~ Sove~   75265.  3.90e6    77.6
## # ... with 52 more rows, and 2 more variables: gdpPercap <dbl>,
## #   geom <MULTIPOLYGON [°]>

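Subsetting by the size of the population. Note that pop has missing values, so the logical vector contains NAs, and indexing with it returns one all-NA row (with an empty geometry) for each of them: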

sel_pop <- world$pop > 100000000
summary(sel_pop)
##    Mode   FALSE    TRUE    NA's 
## logical     155      12      10
(populated_countries = world[sel_pop,])
## Simple feature collection with 22 features and 10 fields (with 10 geometries empty)
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -33.76838 xmax: 180 ymax: 81.2504
## geographic CRS: WGS 84
## # A tibble: 22 x 11
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 <NA>   <NA>      <NA>      <NA>      <NA>      <NA>   NA      NA         NA  
##  2 US     United S~ North Am~ Americas  Northern~ Coun~   9.51e6  3.19e8    78.8
##  3 ID     Indonesia Asia      Asia      South-Ea~ Sove~   1.82e6  2.55e8    68.9
##  4 RU     Russian ~ Europe    Europe    Eastern ~ Sove~   1.70e7  1.44e8    70.7
##  5 <NA>   <NA>      <NA>      <NA>      <NA>      <NA>   NA      NA         NA  
##  6 <NA>   <NA>      <NA>      <NA>      <NA>      <NA>   NA      NA         NA  
##  7 <NA>   <NA>      <NA>      <NA>      <NA>      <NA>   NA      NA         NA  
##  8 MX     Mexico    North Am~ Americas  Central ~ Sove~   1.97e6  1.24e8    76.8
##  9 BR     Brazil    South Am~ Americas  South Am~ Sove~   8.51e6  2.04e8    75.0
## 10 <NA>   <NA>      <NA>      <NA>      <NA>      <NA>   NA      NA         NA  
## # ... with 12 more rows, and 2 more variables: gdpPercap <dbl>,
## #   geom <MULTIPOLYGON [°]>

Another option (3) is to write the condition directly inside the brackets:

small_countries <- world[world$area_km2 < 10000, ]
medium_countries <- world[world$area_km2>=10000 & world$area_km2<100000,]
(populated_countries <- world[world$pop > 100000000,])
## Simple feature collection with 22 features and 10 fields (with 10 geometries empty)
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -33.76838 xmax: 180 ymax: 81.2504
## geographic CRS: WGS 84
## # A tibble: 22 x 11
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 <NA>   <NA>      <NA>      <NA>      <NA>      <NA>   NA      NA         NA  
##  2 US     United S~ North Am~ Americas  Northern~ Coun~   9.51e6  3.19e8    78.8
##  3 ID     Indonesia Asia      Asia      South-Ea~ Sove~   1.82e6  2.55e8    68.9
##  4 RU     Russian ~ Europe    Europe    Eastern ~ Sove~   1.70e7  1.44e8    70.7
##  5 <NA>   <NA>      <NA>      <NA>      <NA>      <NA>   NA      NA         NA  
##  6 <NA>   <NA>      <NA>      <NA>      <NA>      <NA>   NA      NA         NA  
##  7 <NA>   <NA>      <NA>      <NA>      <NA>      <NA>   NA      NA         NA  
##  8 MX     Mexico    North Am~ Americas  Central ~ Sove~   1.97e6  1.24e8    76.8
##  9 BR     Brazil    South Am~ Americas  South Am~ Sove~   8.51e6  2.04e8    75.0
## 10 <NA>   <NA>      <NA>      <NA>      <NA>      <NA>   NA      NA         NA  
## # ... with 12 more rows, and 2 more variables: gdpPercap <dbl>,
## #   geom <MULTIPOLYGON [°]>
# Or you can use the command subset(), which drops the rows where the condition is NA
small_countries <- subset(world, area_km2 < 10000)
medium_countries <- subset(world, area_km2>=10000 & area_km2<100000)
(populated_countries <- subset(world, pop > 100000000))
## Simple feature collection with 12 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -33.76838 xmax: 180 ymax: 81.2504
## geographic CRS: WGS 84
## # A tibble: 12 x 11
##    iso_a2 name_long continent region_un subregion type  area_km2    pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>  <dbl>   <dbl>
##  1 US     United S~ North Am~ Americas  Northern~ Coun~   9.51e6 3.19e8    78.8
##  2 ID     Indonesia Asia      Asia      South-Ea~ Sove~   1.82e6 2.55e8    68.9
##  3 RU     Russian ~ Europe    Europe    Eastern ~ Sove~   1.70e7 1.44e8    70.7
##  4 MX     Mexico    North Am~ Americas  Central ~ Sove~   1.97e6 1.24e8    76.8
##  5 BR     Brazil    South Am~ Americas  South Am~ Sove~   8.51e6 2.04e8    75.0
##  6 NG     Nigeria   Africa    Africa    Western ~ Sove~   9.05e5 1.76e8    52.5
##  7 IN     India     Asia      Asia      Southern~ Sove~   3.14e6 1.29e9    68.0
##  8 BD     Banglade~ Asia      Asia      Southern~ Sove~   1.34e5 1.59e8    71.8
##  9 PK     Pakistan  Asia      Asia      Southern~ Sove~   8.74e5 1.86e8    66.1
## 10 CN     China     Asia      Asia      Eastern ~ Coun~   9.41e6 1.36e9    75.9
## 11 PH     Philippi~ Asia      Asia      South-Ea~ Sove~   2.92e5 1.00e8    68.8
## 12 JP     Japan     Asia      Asia      Eastern ~ Sove~   4.05e5 1.27e8    83.6
## # ... with 2 more variables: gdpPercap <dbl>, geom <MULTIPOLYGON [°]>
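
The bracket version above returned 22 rows (10 of them empty) while subset() returned 12, because the condition is NA wherever pop is missing. A minimal sketch of two ways to get the subset() result with brackets, assuming a hypothetical toy layer (toy and its values are invented for illustration):

```r
library(sf)

# Hypothetical toy layer with one missing population value
toy <- st_as_sf(
  data.frame(name = c("a", "b", "c", "d"),
             pop  = c(50, 200, NA, 300),
             lon  = c(0, 1, 2, 3),
             lat  = c(0, 1, 2, 3)),
  coords = c("lon", "lat"), crs = 4326
)

sel <- toy$pop > 100           # FALSE TRUE NA TRUE
sum(is.na(sel))                # number of undecided rows

big1 <- toy[which(sel), ]            # which() keeps only the TRUE positions
big2 <- toy[!is.na(sel) & sel, ]     # or exclude the NAs explicitly
big3 <- subset(toy, pop > 100)       # subset() treats NA as FALSE

nrow(big1)
```

All three objects contain the same two features, so the choice is mostly a matter of style.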

Subsetting with dplyr to obtain new data frames:

### subsetting columns
world1 <- dplyr::select(world, name_long, pop)
names(world1)
## [1] "name_long" "pop"       "geom"
world1
## Simple feature collection with 177 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 3
##    name_long           pop                                                  geom
##    <chr>             <dbl>                                    <MULTIPOLYGON [°]>
##  1 Fiji             885806 (((180 -16.06713, 180 -16.55522, 179.3641 -16.80135,~
##  2 Tanzania       52234869 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3.09~
##  3 Western Saha~        NA (((-8.66559 27.65643, -8.665124 27.58948, -8.6844 27~
##  4 Canada         35535348 (((-122.84 49, -122.9742 49.00254, -124.9102 49.9845~
##  5 United States 318622525 (((-122.84 49, -120 49, -117.0312 49, -116.0482 49, ~
##  6 Kazakhstan     17288285 (((87.35997 49.21498, 86.59878 48.54918, 85.76823 48~
##  7 Uzbekistan     30757700 (((55.96819 41.30864, 55.92892 44.99586, 58.50313 45~
##  8 Papua New Gu~   7755785 (((141.0002 -2.600151, 142.7352 -3.289153, 144.584 -~
##  9 Indonesia     255131116 (((141.0002 -2.600151, 141.0171 -5.859022, 141.0339 ~
## 10 Argentina      42981515 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.85,~
## # ... with 167 more rows
# all columns between name_long and pop (inclusive)
world2 <- dplyr::select(world, name_long:pop)
world2
## Simple feature collection with 177 features and 7 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 8
##    name_long continent region_un subregion type  area_km2     pop
##    <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>
##  1 Fiji      Oceania   Oceania   Melanesia Sove~   1.93e4  8.86e5
##  2 Tanzania  Africa    Africa    Eastern ~ Sove~   9.33e5  5.22e7
##  3 Western ~ Africa    Africa    Northern~ Inde~   9.63e4 NA     
##  4 Canada    North Am~ Americas  Northern~ Sove~   1.00e7  3.55e7
##  5 United S~ North Am~ Americas  Northern~ Coun~   9.51e6  3.19e8
##  6 Kazakhst~ Asia      Asia      Central ~ Sove~   2.73e6  1.73e7
##  7 Uzbekist~ Asia      Asia      Central ~ Sove~   4.61e5  3.08e7
##  8 Papua Ne~ Oceania   Oceania   Melanesia Sove~   4.65e5  7.76e6
##  9 Indonesia Asia      Asia      South-Ea~ Sove~   1.82e6  2.55e8
## 10 Argentina South Am~ Americas  South Am~ Sove~   2.78e6  4.30e7
## # ... with 167 more rows, and 1 more variable: geom <MULTIPOLYGON [°]>
# all columns except subregion and area_km2
world3 <- dplyr::select(world, -subregion, -area_km2)
world3
## Simple feature collection with 177 features and 8 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 9
##    iso_a2 name_long continent region_un type      pop lifeExp gdpPercap
##    <chr>  <chr>     <chr>     <chr>     <chr>   <dbl>   <dbl>     <dbl>
##  1 FJ     Fiji      Oceania   Oceania   Sove~  8.86e5    70.0     8222.
##  2 TZ     Tanzania  Africa    Africa    Sove~  5.22e7    64.2     2402.
##  3 EH     Western ~ Africa    Africa    Inde~ NA         NA         NA 
##  4 CA     Canada    North Am~ Americas  Sove~  3.55e7    82.0    43079.
##  5 US     United S~ North Am~ Americas  Coun~  3.19e8    78.8    51922.
##  6 KZ     Kazakhst~ Asia      Asia      Sove~  1.73e7    71.6    23587.
##  7 UZ     Uzbekist~ Asia      Asia      Sove~  3.08e7    71.0     5371.
##  8 PG     Papua Ne~ Oceania   Oceania   Sove~  7.76e6    65.2     3709.
##  9 ID     Indonesia Asia      Asia      Sove~  2.55e8    68.9    10003.
## 10 AR     Argentina South Am~ Americas  Sove~  4.30e7    76.3    18798.
## # ... with 167 more rows, and 1 more variable: geom <MULTIPOLYGON [°]>
# select columns and rename pop to population in one step
world4 <- dplyr::select(world, name_long, population = pop)
names(world4)
## [1] "name_long"  "population" "geom"
world4
## Simple feature collection with 177 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 3
##    name_long     population                                                 geom
##    <chr>              <dbl>                                   <MULTIPOLYGON [°]>
##  1 Fiji              885806 (((180 -16.06713, 180 -16.55522, 179.3641 -16.80135~
##  2 Tanzania        52234869 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3.0~
##  3 Western Saha~         NA (((-8.66559 27.65643, -8.665124 27.58948, -8.6844 2~
##  4 Canada          35535348 (((-122.84 49, -122.9742 49.00254, -124.9102 49.984~
##  5 United States  318622525 (((-122.84 49, -120 49, -117.0312 49, -116.0482 49,~
##  6 Kazakhstan      17288285 (((87.35997 49.21498, 86.59878 48.54918, 85.76823 4~
##  7 Uzbekistan      30757700 (((55.96819 41.30864, 55.92892 44.99586, 58.50313 4~
##  8 Papua New Gu~    7755785 (((141.0002 -2.600151, 142.7352 -3.289153, 144.584 ~
##  9 Indonesia      255131116 (((141.0002 -2.600151, 141.0171 -5.859022, 141.0339~
## 10 Argentina       42981515 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.85~
## # ... with 167 more rows
world5 <- world[, c("name_long", "pop")] # subset columns by name
names(world5)[names(world5) == "pop"] = "population" # rename column manually
world5
## Simple feature collection with 177 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 3
##    name_long     population                                                 geom
##    <chr>              <dbl>                                   <MULTIPOLYGON [°]>
##  1 Fiji              885806 (((180 -16.06713, 180 -16.55522, 179.3641 -16.80135~
##  2 Tanzania        52234869 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3.0~
##  3 Western Saha~         NA (((-8.66559 27.65643, -8.665124 27.58948, -8.6844 2~
##  4 Canada          35535348 (((-122.84 49, -122.9742 49.00254, -124.9102 49.984~
##  5 United States  318622525 (((-122.84 49, -120 49, -117.0312 49, -116.0482 49,~
##  6 Kazakhstan      17288285 (((87.35997 49.21498, 86.59878 48.54918, 85.76823 4~
##  7 Uzbekistan      30757700 (((55.96819 41.30864, 55.92892 44.99586, 58.50313 4~
##  8 Papua New Gu~    7755785 (((141.0002 -2.600151, 142.7352 -3.289153, 144.584 ~
##  9 Indonesia      255131116 (((141.0002 -2.600151, 141.0171 -5.859022, 141.0339~
## 10 Argentina       42981515 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.85~
## # ... with 167 more rows
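
Both approaches above rename pop while also dropping the other columns. If we want to keep every column, dplyr::rename() is one option. A minimal sketch, assuming a hypothetical toy layer (pts and its columns are invented for illustration):

```r
library(sf)
library(dplyr)

# Hypothetical toy layer; names and values are invented for illustration
pts <- st_as_sf(
  data.frame(name = c("a", "b", "c"),
             pop  = c(10, 20, 30),
             lon  = c(0, 1, 2),
             lat  = c(50, 51, 52)),
  coords = c("lon", "lat"), crs = 4326
)

renamed <- dplyr::rename(pts, population = pop)  # new_name = old_name
names(renamed)
```

The syntax is the same as inside select(): the new name goes on the left and the existing name on the right.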
# Countries with a life expectancy longer than 80 years
world6 <- filter(world, lifeExp > 80)
world6
## Simple feature collection with 24 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -140.9978 ymin: -46.64124 xmax: 178.5171 ymax: 83.23324
## geographic CRS: WGS 84
## # A tibble: 24 x 11
##    iso_a2 name_long continent region_un subregion type  area_km2    pop lifeExp
##  * <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>  <dbl>   <dbl>
##  1 CA     Canada    North Am~ Americas  Northern~ Sove~   1.00e7 3.55e7    82.0
##  2 IL     Israel    Asia      Asia      Western ~ Coun~   2.30e4 8.22e6    82.2
##  3 KR     Republic~ Asia      Asia      Eastern ~ Sove~   9.90e4 5.07e7    81.7
##  4 SE     Sweden    Europe    Europe    Northern~ Sove~   4.51e5 9.70e6    82.3
##  5 AT     Austria   Europe    Europe    Western ~ Sove~   8.51e4 8.55e6    81.5
##  6 DE     Germany   Europe    Europe    Western ~ Sove~   3.57e5 8.10e7    81.1
##  7 GR     Greece    Europe    Europe    Southern~ Sove~   1.32e5 1.09e7    81.4
##  8 CH     Switzerl~ Europe    Europe    Western ~ Sove~   4.62e4 8.19e6    83.2
##  9 LU     Luxembou~ Europe    Europe    Western ~ Sove~   2.42e3 5.56e5    82.2
## 10 BE     Belgium   Europe    Europe    Western ~ Sove~   3.01e4 1.12e7    81.3
## # ... with 14 more rows, and 2 more variables: gdpPercap <dbl>,
## #   geom <MULTIPOLYGON [°]>
# first five Asian countries, keeping only name and continent
world7 <- world %>%
  filter(continent == "Asia") %>%
  dplyr::select(name_long, continent) %>%
  slice(1:5)
world7
## Simple feature collection with 5 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 34.26543 ymin: -10.35999 xmax: 141.0339 ymax: 55.38525
## geographic CRS: WGS 84
## # A tibble: 5 x 3
##   name_long   continent                                                     geom
##   <chr>       <chr>                                           <MULTIPOLYGON [°]>
## 1 Kazakhstan  Asia      (((87.35997 49.21498, 86.59878 48.54918, 85.76823 48.45~
## 2 Uzbekistan  Asia      (((55.96819 41.30864, 55.92892 44.99586, 58.50313 45.58~
## 3 Indonesia   Asia      (((141.0002 -2.600151, 141.0171 -5.859022, 141.0339 -9.~
## 4 Timor-Leste Asia      (((124.9687 -8.89279, 125.0862 -8.656887, 125.9471 -8.4~
## 5 Israel      Asia      (((35.71992 32.70919, 35.54567 32.39399, 35.18393 32.53~
# first five South American countries, keeping only name and continent
world8 <- world %>%
  filter(continent == "South America") %>%
  dplyr::select(name_long, continent) %>%
  slice(1:5)
world8
## Simple feature collection with 5 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -75.6444 ymin: -55.61183 xmax: -34.72999 ymax: 5.244486
## geographic CRS: WGS 84
## # A tibble: 5 x 3
##   name_long     continent                                                   geom
##   <chr>         <chr>                                         <MULTIPOLYGON [°]>
## 1 Argentina     South Amer~ (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.85~
## 2 Chile         South Amer~ (((-68.63401 -52.63637, -68.63335 -54.8695, -67.562~
## 3 Falkland Isl~ South Amer~ (((-61.2 -51.85, -60 -51.25, -59.15 -51.5, -58.55 -~
## 4 Uruguay       South Amer~ (((-57.62513 -30.21629, -56.97603 -30.10969, -55.97~
## 5 Brazil        South Amer~ (((-53.37366 -33.76838, -53.65054 -33.202, -53.2095~
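
The pipe %>% passes the result of each step as the first argument of the next one, so a piped chain can also be written as nested calls. A minimal sketch of the equivalence, assuming a hypothetical toy layer (pts and its columns are invented for illustration):

```r
library(sf)
library(dplyr)

# Hypothetical toy layer; names and values are invented for illustration
pts <- st_as_sf(
  data.frame(name  = c("a", "b", "c", "d"),
             group = c("x", "y", "x", "x"),
             lon   = c(0, 1, 2, 3),
             lat   = c(50, 51, 52, 53)),
  coords = c("lon", "lat"), crs = 4326
)

# Piped version: read from top to bottom
piped <- pts %>%
  dplyr::filter(group == "x") %>%
  dplyr::select(name) %>%
  dplyr::slice(1:2)

# Nested version: the same steps, read from the inside out
nested <- dplyr::slice(
  dplyr::select(dplyr::filter(pts, group == "x"), name),
  1:2
)

identical(piped$name, nested$name)
```

The piped form is usually easier to read because the steps appear in the order in which they are applied.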

Aggregation

We can aggregate data by groups, for example summing the population of each continent:

# formula interface: returns a plain data.frame without geometry
world_agg1 = aggregate(pop ~ continent, FUN = sum, data = world, na.rm = TRUE)
class(world_agg1)
## [1] "data.frame"
world_agg1
##       continent        pop
## 1        Africa 1154946633
## 2          Asia 4311408059
## 3        Europe  669036256
## 4 North America  565028684
## 5       Oceania   37757833
## 6 South America  412060811
names(world)
##  [1] "iso_a2"    "name_long" "continent" "region_un" "subregion" "type"     
##  [7] "area_km2"  "pop"       "lifeExp"   "gdpPercap" "geom"
# aggregating the sf object directly: the result keeps a geometry column
world_agg2 = aggregate(world[c("pop", "area_km2")], by = list(world$continent),
                       FUN = sum, na.rm = TRUE)
class(world_agg2)
## [1] "sf"         "data.frame"
world_agg2
## Simple feature collection with 8 features and 3 fields
## Attribute-geometry relationship: 0 constant, 2 aggregate, 1 identity
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
##                   Group.1        pop    area_km2                       geometry
## 1                  Africa 1154946633 29946197.81 MULTIPOLYGON (((32.83012 -2...
## 2              Antarctica          0 12335956.08 MULTIPOLYGON (((-163.7129 -...
## 3                    Asia 4311408059 31252459.39 MULTIPOLYGON (((120.295 -10...
## 4                  Europe  669036256 23065218.79 MULTIPOLYGON (((-51.6578 4....
## 5           North America  565028684 24484309.37 MULTIPOLYGON (((-61.68 10.7...
## 6                 Oceania   37757833  8504488.66 MULTIPOLYGON (((169.6678 -4...
## 7 Seven seas (open ocean)          0    11602.57 POLYGON ((68.935 -48.625, 6...
## 8           South America  412060811 17762592.17 MULTIPOLYGON (((-66.95992 -...
world_agg3 = world %>%
  group_by(continent) %>%
  summarize(pop = sum(pop, na.rm = TRUE),
            area = sum(area_km2, na.rm = TRUE))
## `summarise()` ungrouping output (override with `.groups` argument)
world_agg3
## Simple feature collection with 8 features and 3 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 8 x 4
##   continent            pop    area                                          geom
##   <chr>              <dbl>   <dbl>                            <MULTIPOLYGON [°]>
## 1 Africa            1.15e9  2.99e7 (((32.83012 -26.74219, 32.58026 -27.47016, 3~
## 2 Antarctica        0.      1.23e7 (((-48.66062 -78.04702, -48.1514 -78.04707, ~
## 3 Asia              4.31e9  3.13e7 (((120.295 -10.25865, 118.9678 -9.557969, 11~
## 4 Europe            6.69e8  2.31e7 (((-51.6578 4.156232, -52.24934 3.241094, -5~
## 5 North America     5.65e8  2.45e7 (((-61.68 10.76, -61.105 10.89, -60.895 10.8~
## 6 Oceania           3.78e7  8.50e6 (((169.6678 -43.55533, 170.5249 -43.03169, 1~
## 7 Seven seas (op~   0.      1.16e4 (((68.935 -48.625, 69.58 -48.94, 70.525 -49.~
## 8 South America     4.12e8  1.78e7 (((-66.95992 -54.89681, -67.29103 -55.30124,~
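
The formula call to aggregate() above returned six continents, while the sf and dplyr versions returned eight. A likely explanation is that the formula interface drops rows with a missing pop before grouping (its default na.action is na.omit), so a group whose only rows are NA disappears. The sketch below reproduces this on a small made-up table; toy and its values are hypothetical and are not taken from world.

```r
library(dplyr)

# Hypothetical toy table: group "C" has only a missing population value
toy <- data.frame(
  continent = c("A", "A", "B", "B", "C"),
  pop       = c(10, NA, 5, 7, NA),
  stringsAsFactors = FALSE
)

# Base R formula interface: rows with NA in pop are removed first,
# so group "C" never reaches sum()
agg_base <- aggregate(pop ~ continent, data = toy, FUN = sum, na.rm = TRUE)

# dplyr: every group is kept; sum(..., na.rm = TRUE) of an all-NA group is 0
agg_dplyr <- toy %>%
  group_by(continent) %>%
  summarize(pop = sum(pop, na.rm = TRUE), n = n())

nrow(agg_base)   # 2 groups
nrow(agg_dplyr)  # 3 groups
```

This is why a population of 0 is shown for Antarctica and the open-ocean group in the eight-row results: the zero comes from summing no non-missing values, not from a recorded population.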
world %>% 
  group_by(continent) %>%
  summarize(pop = sum(pop, na.rm = TRUE), 
            n = n())
## `summarise()` ungrouping output (override with `.groups` argument)
## Simple feature collection with 8 features and 3 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 8 x 4
##   continent            pop     n                                            geom
##   <chr>              <dbl> <int>                              <MULTIPOLYGON [°]>
## 1 Africa            1.15e9    51 (((32.83012 -26.74219, 32.58026 -27.47016, 32.~
## 2 Antarctica        0.         1 (((-48.66062 -78.04702, -48.1514 -78.04707, -4~
## 3 Asia              4.31e9    47 (((120.295 -10.25865, 118.9678 -9.557969, 119.~
## 4 Europe            6.69e8    39 (((-51.6578 4.156232, -52.24934 3.241094, -52.~
## 5 North America     5.65e8    18 (((-61.68 10.76, -61.105 10.89, -60.895 10.855~
## 6 Oceania           3.78e7     7 (((169.6678 -43.55533, 170.5249 -43.03169, 171~
## 7 Seven seas (op~   0.         1 (((68.935 -48.625, 69.58 -48.94, 70.525 -49.06~
## 8 South America     4.12e8    13 (((-66.95992 -54.89681, -67.29103 -55.30124, -~
world %>% 
  dplyr::select(pop, area_km2, gdpPercap, continent) %>% 
  group_by(continent) %>% 
  summarize(pop = sum(pop, na.rm = TRUE),
            area = sum(area_km2, na.rm = TRUE),
            gdppc = mean(gdpPercap, na.rm = TRUE),
            n_countries = n()) %>% 
  top_n(n = 5, wt = pop) %>%
  arrange(desc(pop)) %>%
  st_drop_geometry()
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 5 x 5
##   continent            pop      area  gdppc n_countries
## * <chr>              <dbl>     <dbl>  <dbl>       <int>
## 1 Asia          4311408059 31252459. 20026.          47
## 2 Africa        1154946633 29946198.  5042.          51
## 3 Europe         669036256 23065219. 29451.          39
## 4 North America  565028684 24484309. 18384.          18
## 5 South America  412060811 17762592. 13762.          13
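
The pipeline above picks the five most populous continents with top_n() and then sorts them. top_n() still works, but the dplyr documentation marks it as superseded by slice_max(). The sketch below compares the two on a made-up table, assuming dplyr 1.0.0 or later; toy and its values are hypothetical.

```r
library(dplyr)

# Hypothetical toy table with made-up populations
toy <- data.frame(
  continent = c("A", "B", "C", "D"),
  pop       = c(5, 40, 20, 10),
  stringsAsFactors = FALSE
)

# Older style: keep the top 2 rows by pop, then sort explicitly
top_old <- toy %>%
  top_n(n = 2, wt = pop) %>%
  arrange(desc(pop))

# Newer style: slice_max() selects and orders in one step
top_new <- toy %>%
  slice_max(pop, n = 2)

top_old$continent
top_new$continent
```

Both approaches should return the same two rows here; slice_max() simply removes the need for the separate arrange() call.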

Matching datasets

A join attaches the columns of one table to another using a shared key column. Here both tables have a name_long column, so left_join() keeps all 177 rows of world and fills the coffee columns with NA where a country has no match.

data(coffee_data)
data(world)
world_coffee <- left_join(world, coffee_data)
## Joining, by = "name_long"
world_coffee <- left_join(world, coffee_data, by = "name_long")
class(world_coffee)
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"
world_coffee
## Simple feature collection with 177 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 13
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 FJ     Fiji      Oceania   Oceania   Melanesia Sove~   1.93e4  8.86e5    70.0
##  2 TZ     Tanzania  Africa    Africa    Eastern ~ Sove~   9.33e5  5.22e7    64.2
##  3 EH     Western ~ Africa    Africa    Northern~ Inde~   9.63e4 NA         NA  
##  4 CA     Canada    North Am~ Americas  Northern~ Sove~   1.00e7  3.55e7    82.0
##  5 US     United S~ North Am~ Americas  Northern~ Coun~   9.51e6  3.19e8    78.8
##  6 KZ     Kazakhst~ Asia      Asia      Central ~ Sove~   2.73e6  1.73e7    71.6
##  7 UZ     Uzbekist~ Asia      Asia      Central ~ Sove~   4.61e5  3.08e7    71.0
##  8 PG     Papua Ne~ Oceania   Oceania   Melanesia Sove~   4.65e5  7.76e6    65.2
##  9 ID     Indonesia Asia      Asia      South-Ea~ Sove~   1.82e6  2.55e8    68.9
## 10 AR     Argentina South Am~ Americas  South Am~ Sove~   2.78e6  4.30e7    76.3
## # ... with 167 more rows, and 4 more variables: gdpPercap <dbl>,
## #   geom <MULTIPOLYGON [°]>, coffee_production_2016 <int>,
## #   coffee_production_2017 <int>
names(world_coffee)
##  [1] "iso_a2"                 "name_long"              "continent"             
##  [4] "region_un"              "subregion"              "type"                  
##  [7] "area_km2"               "pop"                    "lifeExp"               
## [10] "gdpPercap"              "geom"                   "coffee_production_2016"
## [13] "coffee_production_2017"
plot(world_coffee["coffee_production_2017"])

(coffee_renamed <- rename(coffee_data, nm = name_long))
## # A tibble: 47 x 3
##    nm                       coffee_production_2016 coffee_production_2017
##    <chr>                                     <int>                  <int>
##  1 Angola                                       NA                     NA
##  2 Bolivia                                       3                      4
##  3 Brazil                                     3277                   2786
##  4 Burundi                                      37                     38
##  5 Cameroon                                      8                      6
##  6 Central African Republic                     NA                     NA
##  7 Congo, Dem. Rep. of                           4                     12
##  8 Colombia                                   1330                   1169
##  9 Costa Rica                                   28                     32
## 10 Côte d'Ivoire                               114                    130
## # ... with 37 more rows
(world_coffee2 <- left_join(world, coffee_renamed, by = c("name_long" = "nm")))
## Simple feature collection with 177 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 13
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 FJ     Fiji      Oceania   Oceania   Melanesia Sove~   1.93e4  8.86e5    70.0
##  2 TZ     Tanzania  Africa    Africa    Eastern ~ Sove~   9.33e5  5.22e7    64.2
##  3 EH     Western ~ Africa    Africa    Northern~ Inde~   9.63e4 NA         NA  
##  4 CA     Canada    North Am~ Americas  Northern~ Sove~   1.00e7  3.55e7    82.0
##  5 US     United S~ North Am~ Americas  Northern~ Coun~   9.51e6  3.19e8    78.8
##  6 KZ     Kazakhst~ Asia      Asia      Central ~ Sove~   2.73e6  1.73e7    71.6
##  7 UZ     Uzbekist~ Asia      Asia      Central ~ Sove~   4.61e5  3.08e7    71.0
##  8 PG     Papua Ne~ Oceania   Oceania   Melanesia Sove~   4.65e5  7.76e6    65.2
##  9 ID     Indonesia Asia      Asia      South-Ea~ Sove~   1.82e6  2.55e8    68.9
## 10 AR     Argentina South Am~ Americas  South Am~ Sove~   2.78e6  4.30e7    76.3
## # ... with 167 more rows, and 4 more variables: gdpPercap <dbl>,
## #   geom <MULTIPOLYGON [°]>, coffee_production_2016 <int>,
## #   coffee_production_2017 <int>
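
When the key columns have different names, as with name_long and nm above, the by argument takes a named vector of the form c("left_name" = "right_name"). A minimal sketch on made-up tables follows; left_tbl, right_tbl and their values are hypothetical.

```r
library(dplyr)

# Hypothetical tables whose key columns are named differently
left_tbl <- data.frame(
  name_long = c("Brazil", "Kenya", "Fiji"),
  pop       = c(3, 2, 1),
  stringsAsFactors = FALSE
)
right_tbl <- data.frame(
  nm     = c("Brazil", "Kenya"),
  coffee = c(10, 20),
  stringsAsFactors = FALSE
)

# The name on the left of "=" belongs to the first table,
# the name on the right to the second table
joined <- left_join(left_tbl, right_tbl, by = c("name_long" = "nm"))

names(joined)   # the key keeps the name used in the first table
joined$coffee   # the unmatched row gets NA
```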
(world_coffee_inner <- inner_join(world, coffee_data))
## Joining, by = "name_long"
## Simple feature collection with 45 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -117.1278 ymin: -33.76838 xmax: 156.02 ymax: 35.49401
## geographic CRS: WGS 84
## # A tibble: 45 x 13
##    iso_a2 name_long continent region_un subregion type  area_km2    pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>  <dbl>   <dbl>
##  1 TZ     Tanzania  Africa    Africa    Eastern ~ Sove~  932746. 5.22e7    64.2
##  2 PG     Papua Ne~ Oceania   Oceania   Melanesia Sove~  464520. 7.76e6    65.2
##  3 ID     Indonesia Asia      Asia      South-Ea~ Sove~ 1819251. 2.55e8    68.9
##  4 KE     Kenya     Africa    Africa    Eastern ~ Sove~  590837. 4.60e7    66.2
##  5 DO     Dominica~ North Am~ Americas  Caribbean Sove~   48158. 1.04e7    73.5
##  6 TL     Timor-Le~ Asia      Asia      South-Ea~ Sove~   14715. 1.21e6    68.3
##  7 MX     Mexico    North Am~ Americas  Central ~ Sove~ 1969480. 1.24e8    76.8
##  8 BR     Brazil    South Am~ Americas  South Am~ Sove~ 8508557. 2.04e8    75.0
##  9 BO     Bolivia   South Am~ Americas  South Am~ Sove~ 1085270. 1.06e7    68.4
## 10 PE     Peru      South Am~ Americas  South Am~ Sove~ 1309700. 3.10e7    74.5
## # ... with 35 more rows, and 4 more variables: gdpPercap <dbl>,
## #   geom <MULTIPOLYGON [°]>, coffee_production_2016 <int>,
## #   coffee_production_2017 <int>
nrow(world_coffee_inner)
## [1] 45
(world_coffee_inner <- inner_join(world, coffee_data, by = "name_long"))
## Simple feature collection with 45 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -117.1278 ymin: -33.76838 xmax: 156.02 ymax: 35.49401
## geographic CRS: WGS 84
## # A tibble: 45 x 13
##    iso_a2 name_long continent region_un subregion type  area_km2    pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>  <dbl>   <dbl>
##  1 TZ     Tanzania  Africa    Africa    Eastern ~ Sove~  932746. 5.22e7    64.2
##  2 PG     Papua Ne~ Oceania   Oceania   Melanesia Sove~  464520. 7.76e6    65.2
##  3 ID     Indonesia Asia      Asia      South-Ea~ Sove~ 1819251. 2.55e8    68.9
##  4 KE     Kenya     Africa    Africa    Eastern ~ Sove~  590837. 4.60e7    66.2
##  5 DO     Dominica~ North Am~ Americas  Caribbean Sove~   48158. 1.04e7    73.5
##  6 TL     Timor-Le~ Asia      Asia      South-Ea~ Sove~   14715. 1.21e6    68.3
##  7 MX     Mexico    North Am~ Americas  Central ~ Sove~ 1969480. 1.24e8    76.8
##  8 BR     Brazil    South Am~ Americas  South Am~ Sove~ 8508557. 2.04e8    75.0
##  9 BO     Bolivia   South Am~ Americas  South Am~ Sove~ 1085270. 1.06e7    68.4
## 10 PE     Peru      South Am~ Americas  South Am~ Sove~ 1309700. 3.10e7    74.5
## # ... with 35 more rows, and 4 more variables: gdpPercap <dbl>,
## #   geom <MULTIPOLYGON [°]>, coffee_production_2016 <int>,
## #   coffee_production_2017 <int>
nrow(world_coffee_inner)
## [1] 45

Note that the result of inner_join() has only 45 rows, compared with 47 in coffee_data. What happened to the remaining rows? setdiff() lists the names in coffee_data that have no match in world:

setdiff(coffee_data$name_long, world$name_long)
## [1] "Congo, Dem. Rep. of" "Others"
# find the spelling used in world: "Dem", then any characters, then "Congo"
str_subset(world$name_long, "Dem.+Congo")
## [1] "Democratic Republic of the Congo"
coffee_data$name_long[grepl("Congo,", coffee_data$name_long)] <- str_subset(world$name_long, "Dem.+Congo")
world_coffee_match <- inner_join(world, coffee_data)
## Joining, by = "name_long"
nrow(world_coffee_match)
## [1] 46

After the correction, 46 rows match. The one entry still unmatched is "Others", which is not a country and therefore has no counterpart in world.

# with coffee_data as the first argument, the result is a plain tibble, not an sf object
coffee_world <- left_join(coffee_data, world)
## Joining, by = "name_long"
class(coffee_world)
## [1] "tbl_df"     "tbl"        "data.frame"
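
The setdiff() check above can also be written with dplyr's anti_join(), which returns the full rows of the first table that have no match in the second. The sketch below is a small illustration on made-up tables; countries, coffee and the production values are hypothetical.

```r
library(dplyr)

# Hypothetical tables: one name is spelled differently and one is not a country
countries <- data.frame(
  name_long = c("Brazil", "Kenya", "Democratic Republic of the Congo"),
  stringsAsFactors = FALSE
)
coffee <- data.frame(
  name_long  = c("Brazil", "Congo, Dem. Rep. of", "Others"),
  production = c(10, 20, 30),   # made-up values
  stringsAsFactors = FALSE
)

# inner_join() keeps only keys present in both tables
matched <- inner_join(countries, coffee, by = "name_long")

# anti_join() keeps rows of coffee with no matching key in countries
unmatched <- anti_join(coffee, countries, by = "name_long")

nrow(matched)
unmatched$name_long
```

Compared with setdiff(), anti_join() keeps the other columns of the unmatched rows, which can help when deciding how to fix each key.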

Creating and Removing information

New attribute columns can be created with base R's $ operator or with dplyr's mutate(). transmute() keeps only the new column (plus the geometry of an sf object).

world_new = world # do not overwrite our original data
world_new$pop_dens = world_new$pop / world_new$area_km2
world %>% 
  mutate(pop_dens = pop / area_km2)
## Simple feature collection with 177 features and 11 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 12
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##  * <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 FJ     Fiji      Oceania   Oceania   Melanesia Sove~   1.93e4  8.86e5    70.0
##  2 TZ     Tanzania  Africa    Africa    Eastern ~ Sove~   9.33e5  5.22e7    64.2
##  3 EH     Western ~ Africa    Africa    Northern~ Inde~   9.63e4 NA         NA  
##  4 CA     Canada    North Am~ Americas  Northern~ Sove~   1.00e7  3.55e7    82.0
##  5 US     United S~ North Am~ Americas  Northern~ Coun~   9.51e6  3.19e8    78.8
##  6 KZ     Kazakhst~ Asia      Asia      Central ~ Sove~   2.73e6  1.73e7    71.6
##  7 UZ     Uzbekist~ Asia      Asia      Central ~ Sove~   4.61e5  3.08e7    71.0
##  8 PG     Papua Ne~ Oceania   Oceania   Melanesia Sove~   4.65e5  7.76e6    65.2
##  9 ID     Indonesia Asia      Asia      South-Ea~ Sove~   1.82e6  2.55e8    68.9
## 10 AR     Argentina South Am~ Americas  South Am~ Sove~   2.78e6  4.30e7    76.3
## # ... with 167 more rows, and 3 more variables: gdpPercap <dbl>,
## #   geom <MULTIPOLYGON [°]>, pop_dens <dbl>
world_new = world %>% 
  mutate(pop_dens = pop / area_km2)

world %>% 
  transmute(pop_dens = pop / area_km2)
## Simple feature collection with 177 features and 1 field
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 2
##    pop_dens                                                                 geom
##  *    <dbl>                                                   <MULTIPOLYGON [°]>
##  1    45.9  (((180 -16.06713, 180 -16.55522, 179.3641 -16.80135, 178.7251 -17.0~
##  2    56.0  (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3.09699, 37.7669 -3~
##  3    NA    (((-8.66559 27.65643, -8.665124 27.58948, -8.6844 27.39574, -8.6872~
##  4     3.54 (((-122.84 49, -122.9742 49.00254, -124.9102 49.98456, -125.6246 50~
##  5    33.5  (((-122.84 49, -120 49, -117.0312 49, -116.0482 49, -113 49, -110.0~
##  6     6.33 (((87.35997 49.21498, 86.59878 48.54918, 85.76823 48.45575, 85.7204~
##  7    66.7  (((55.96819 41.30864, 55.92892 44.99586, 58.50313 45.5868, 58.68999~
##  8    16.7  (((141.0002 -2.600151, 142.7352 -3.289153, 144.584 -3.861418, 145.2~
##  9   140.   (((141.0002 -2.600151, 141.0171 -5.859022, 141.0339 -9.117893, 140.~
## 10    15.4  (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.85, -66.45 -54.45,~
## # ... with 167 more rows
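
The difference between mutate() and transmute() is easiest to see in the column names of the result. A minimal sketch on a made-up, non-spatial table follows; toy and its values are hypothetical. On an sf object such as world the geometry column is kept in both cases, as the output above shows.

```r
library(dplyr)

# Hypothetical toy table; values are made up for illustration
toy <- data.frame(
  name     = c("X", "Y"),
  pop      = c(100, 50),
  area_km2 = c(10, 25),
  stringsAsFactors = FALSE
)

# mutate() appends the new column to the existing ones
dens_all <- mutate(toy, pop_dens = pop / area_km2)

# transmute() returns only the columns it creates
dens_only <- transmute(toy, pop_dens = pop / area_km2)

names(dens_all)
names(dens_only)
```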
# we want to combine the continent and region_un columns into a new column
world_unite <- world %>%
  unite("con_reg", continent:region_un, sep = ":", remove = TRUE)
world_unite
## Simple feature collection with 177 features and 9 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 10
##    iso_a2 name_long con_reg subregion type  area_km2     pop lifeExp gdpPercap
##    <chr>  <chr>     <chr>   <chr>     <chr>    <dbl>   <dbl>   <dbl>     <dbl>
##  1 FJ     Fiji      Oceani~ Melanesia Sove~   1.93e4  8.86e5    70.0     8222.
##  2 TZ     Tanzania  Africa~ Eastern ~ Sove~   9.33e5  5.22e7    64.2     2402.
##  3 EH     Western ~ Africa~ Northern~ Inde~   9.63e4 NA         NA         NA 
##  4 CA     Canada    North ~ Northern~ Sove~   1.00e7  3.55e7    82.0    43079.
##  5 US     United S~ North ~ Northern~ Coun~   9.51e6  3.19e8    78.8    51922.
##  6 KZ     Kazakhst~ Asia:A~ Central ~ Sove~   2.73e6  1.73e7    71.6    23587.
##  7 UZ     Uzbekist~ Asia:A~ Central ~ Sove~   4.61e5  3.08e7    71.0     5371.
##  8 PG     Papua Ne~ Oceani~ Melanesia Sove~   4.65e5  7.76e6    65.2     3709.
##  9 ID     Indonesia Asia:A~ South-Ea~ Sove~   1.82e6  2.55e8    68.9    10003.
## 10 AR     Argentina South ~ South Am~ Sove~   2.78e6  4.30e7    76.3    18798.
## # ... with 167 more rows, and 1 more variable: geom <MULTIPOLYGON [°]>
world_separate <- world_unite %>% 
  separate(con_reg, c("continent", "region_un"), sep = ":")
world_separate
## Simple feature collection with 177 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 11
##    iso_a2 name_long continent region_un subregion type  area_km2     pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 FJ     Fiji      Oceania   Oceania   Melanesia Sove~   1.93e4  8.86e5    70.0
##  2 TZ     Tanzania  Africa    Africa    Eastern ~ Sove~   9.33e5  5.22e7    64.2
##  3 EH     Western ~ Africa    Africa    Northern~ Inde~   9.63e4 NA         NA  
##  4 CA     Canada    North Am~ Americas  Northern~ Sove~   1.00e7  3.55e7    82.0
##  5 US     United S~ North Am~ Americas  Northern~ Coun~   9.51e6  3.19e8    78.8
##  6 KZ     Kazakhst~ Asia      Asia      Central ~ Sove~   2.73e6  1.73e7    71.6
##  7 UZ     Uzbekist~ Asia      Asia      Central ~ Sove~   4.61e5  3.08e7    71.0
##  8 PG     Papua Ne~ Oceania   Oceania   Melanesia Sove~   4.65e5  7.76e6    65.2
##  9 ID     Indonesia Asia      Asia      South-Ea~ Sove~   1.82e6  2.55e8    68.9
## 10 AR     Argentina South Am~ Americas  South Am~ Sove~   2.78e6  4.30e7    76.3
## # ... with 167 more rows, and 2 more variables: gdpPercap <dbl>,
## #   geom <MULTIPOLYGON [°]>
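
unite() and separate() are inverse operations as long as the separator does not occur inside the values. The sketch below checks the round trip on a made-up table; toy and its values are hypothetical.

```r
library(tidyr)

# Hypothetical toy table; values are made up for illustration
toy <- data.frame(
  name      = c("X", "Y"),
  continent = c("Asia", "North America"),
  region_un = c("Asia", "Americas"),
  stringsAsFactors = FALSE
)

# Paste the two columns into one; remove = TRUE drops the originals
united <- unite(toy, "con_reg", continent:region_un, sep = ":", remove = TRUE)

# Split the combined column back into its two parts
restored <- separate(united, con_reg, into = c("continent", "region_un"), sep = ":")

united$con_reg
names(restored)
```

If a value already contained ":", separate() would split it in more places than intended, so it is worth choosing a separator that cannot appear in the data.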
world %>% 
  rename(name = name_long)
## Simple feature collection with 177 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 11
##    iso_a2 name  continent region_un subregion type  area_km2     pop lifeExp
##    <chr>  <chr> <chr>     <chr>     <chr>     <chr>    <dbl>   <dbl>   <dbl>
##  1 FJ     Fiji  Oceania   Oceania   Melanesia Sove~   1.93e4  8.86e5    70.0
##  2 TZ     Tanz~ Africa    Africa    Eastern ~ Sove~   9.33e5  5.22e7    64.2
##  3 EH     West~ Africa    Africa    Northern~ Inde~   9.63e4 NA         NA  
##  4 CA     Cana~ North Am~ Americas  Northern~ Sove~   1.00e7  3.55e7    82.0
##  5 US     Unit~ North Am~ Americas  Northern~ Coun~   9.51e6  3.19e8    78.8
##  6 KZ     Kaza~ Asia      Asia      Central ~ Sove~   2.73e6  1.73e7    71.6
##  7 UZ     Uzbe~ Asia      Asia      Central ~ Sove~   4.61e5  3.08e7    71.0
##  8 PG     Papu~ Oceania   Oceania   Melanesia Sove~   4.65e5  7.76e6    65.2
##  9 ID     Indo~ Asia      Asia      South-Ea~ Sove~   1.82e6  2.55e8    68.9
## 10 AR     Arge~ South Am~ Americas  South Am~ Sove~   2.78e6  4.30e7    76.3
## # ... with 167 more rows, and 2 more variables: gdpPercap <dbl>,
## #   geom <MULTIPOLYGON [°]>
new_names <- c("i", "n", "c", "r", "s", "t", "a", "p", "l", "gP", "geom")

world %>% 
  setNames(new_names)
## Simple feature collection with 177 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 11
##    i     n     c     r     s     t          a       p     l     gP
##    <chr> <chr> <chr> <chr> <chr> <chr>  <dbl>   <dbl> <dbl>  <dbl>
##  1 FJ    Fiji  Ocea~ Ocea~ Mela~ Sove~ 1.93e4  8.86e5  70.0  8222.
##  2 TZ    Tanz~ Afri~ Afri~ East~ Sove~ 9.33e5  5.22e7  64.2  2402.
##  3 EH    West~ Afri~ Afri~ Nort~ Inde~ 9.63e4 NA       NA      NA 
##  4 CA    Cana~ Nort~ Amer~ Nort~ Sove~ 1.00e7  3.55e7  82.0 43079.
##  5 US    Unit~ Nort~ Amer~ Nort~ Coun~ 9.51e6  3.19e8  78.8 51922.
##  6 KZ    Kaza~ Asia  Asia  Cent~ Sove~ 2.73e6  1.73e7  71.6 23587.
##  7 UZ    Uzbe~ Asia  Asia  Cent~ Sove~ 4.61e5  3.08e7  71.0  5371.
##  8 PG    Papu~ Ocea~ Ocea~ Mela~ Sove~ 4.65e5  7.76e6  65.2  3709.
##  9 ID    Indo~ Asia  Asia  Sout~ Sove~ 1.82e6  2.55e8  68.9 10003.
## 10 AR    Arge~ Sout~ Amer~ Sout~ Sove~ 2.78e6  4.30e7  76.3 18798.
## # ... with 167 more rows, and 1 more variable: geom <MULTIPOLYGON [°]>

Manipulating Raster Data

rm(list = ls())

elev = raster(nrows = 6, ncols = 6, res = 0.5, xmn = -1.5, xmx = 1.5, ymn = -1.5, ymx = 1.5, vals = 1:36)

grain_order = c("clay", "silt", "sand")
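# sample() draws at random, so the values printed below will differ between runs
# unless set.seed() is called first
grain_char = sample(grain_order, 36, replace = TRUE)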
grain_fact = factor(grain_char, levels = grain_order)
grain_fact
##  [1] sand silt silt clay sand sand clay silt silt silt clay clay clay clay clay
## [16] sand clay silt sand sand sand sand silt sand sand silt silt sand silt sand
## [31] clay clay silt sand silt sand
## Levels: clay silt sand
grain = raster(nrows = 6, ncols = 6, res = 0.5, xmn = -1.5, xmx = 1.5, ymn = -1.5, ymx = 1.5, vals = grain_fact)

# function levels() for retrieving and adding new factor levels to the attribute table
levels(grain)[[1]] = cbind(levels(grain)[[1]], wetness = c("wet", "moist", "dry"))
levels(grain)
## [[1]]
##   ID VALUE wetness
## 1  1  clay     wet
## 2  2  silt   moist
## 3  3  sand     dry
factorValues(grain, grain[c(1, 12, 30)])
##   VALUE wetness
## 1  sand     dry
## 2  clay     wet
## 3  sand     dry
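
The call to factorValues() above looks up cells by their cell number (1, 12 and 30). In the raster package, cells are numbered row by row starting from the top-left corner. The sketch below shows how cell numbers relate to row and column positions on a 6 x 6 raster built like elev; the object name r is only illustrative.

```r
library(raster)

# A 6 x 6 raster whose values equal the cell numbers, built like elev
r <- raster(nrows = 6, ncols = 6,
            xmn = -1.5, xmx = 1.5, ymn = -1.5, ymx = 1.5,
            vals = 1:36)

ncell(r)                  # total number of cells
r[1, 1]                   # value at row 1, column 1 (top-left cell)
r[c(1, 12, 30)]           # values looked up by cell number

# Convert between cell numbers and row/column positions
cellFromRowCol(r, 2, 6)   # last cell of the second row
rowColFromCell(r, 30)     # row and column of cell 30
```

Because the values were filled in as 1:36, each lookup returns the cell number itself, which makes the numbering order easy to check.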
plot(elev, col=c('#ffffe5','#fff7bc','#fee391','#fec44f','#fe9929','#ec7014','#cc4c02','#8c2d04'))

plot(grain, col=c('#ffffe5','#fff7bc','#fee391','#fec44f','#fe9929','#ec7014','#cc4c02','#8c2d04'))