In this class, we will learn about attribute data associated with spatial information. The objective of this session is to learn how to manipulate data based on its attributes, by subsetting or aggregating it. First of all, we clean our environment with rm(); then we load the libraries we are going to use in this session.
rm(list=ls())
# list of libraries we are going to need for this class
library(sf) # for spatial features.
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(raster) # for raster data.
## Loading required package: sp
library(dplyr) # for data manipulation in general, e.g. mutate(), filter(), select()
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:raster':
##
## intersect, select, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stringr) # for working with strings (pattern matching)
library(tidyr) # for unite() and separate()
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:raster':
##
## extract
library(spData) # provides the datasets used in this session (world, coffee_data)
Remember that the spData package provides the world dataset.
Attribute data is the non-spatial information associated with geographic (geometry) data. For example, the position of a point is given by its latitude (whether it lies north or south of the equator) and longitude (whether it lies east or west of the prime meridian); these coordinates are its geometry. Its name, or any other value recorded for that point, is attribute data; in raster data, the value of each cell (for instance elevation, the height of the terrain) plays the same role. For more information about latitude and longitude, see this link: latitude_longitude.
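sf (simple features) extends the data frame manipulation functions to sf objects. Remember that sf objects have one column where all the geographic information is stored. We can list the functions that work on sf objects with:
methods(class = "sf") # functions with a method for objects of class sf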
## [1] $<- [ [[<-
## [4] aggregate anti_join arrange
## [7] as.data.frame cbind coerce
## [10] dbDataType dbWriteTable distinct
## [13] dplyr_reconstruct extent extract
## [16] filter full_join gather
## [19] group_by group_split identify
## [22] initialize inner_join left_join
## [25] mask merge mutate
## [28] nest plot print
## [31] raster rasterize rbind
## [34] rename right_join sample_frac
## [37] sample_n select semi_join
## [40] separate separate_rows show
## [43] slice slotsFromS3 spread
## [46] st_agr st_agr<- st_area
## [49] st_as_s2 st_as_sf st_bbox
## [52] st_boundary st_buffer st_cast
## [55] st_centroid st_collection_extract st_convex_hull
## [58] st_coordinates st_crop st_crs
## [61] st_crs<- st_difference st_filter
## [64] st_geometry st_geometry<- st_interpolate_aw
## [67] st_intersection st_intersects st_is
## [70] st_is_valid st_join st_line_merge
## [73] st_m_range st_make_valid st_nearest_points
## [76] st_node st_normalize st_point_on_surface
## [79] st_polygonize st_precision st_reverse
## [82] st_sample st_segmentize st_set_precision
## [85] st_shift_longitude st_simplify st_snap
## [88] st_sym_difference st_transform st_triangulate
## [91] st_union st_voronoi st_wrap_dateline
## [94] st_write st_z_range st_zm
## [97] summarise transform transmute
## [100] ungroup unite unnest
## see '?methods' for accessing help and source code
The function methods() lists the functions that have a method for objects of class sf. Next, let's explore the data we have:
data(world)
world
## Simple feature collection with 177 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 11
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove~ 1.93e4 8.86e5 70.0
## 2 TZ Tanzania Africa Africa Eastern ~ Sove~ 9.33e5 5.22e7 64.2
## 3 EH Western ~ Africa Africa Northern~ Inde~ 9.63e4 NA NA
## 4 CA Canada North Am~ Americas Northern~ Sove~ 1.00e7 3.55e7 82.0
## 5 US United S~ North Am~ Americas Northern~ Coun~ 9.51e6 3.19e8 78.8
## 6 KZ Kazakhst~ Asia Asia Central ~ Sove~ 2.73e6 1.73e7 71.6
## 7 UZ Uzbekist~ Asia Asia Central ~ Sove~ 4.61e5 3.08e7 71.0
## 8 PG Papua Ne~ Oceania Oceania Melanesia Sove~ 4.65e5 7.76e6 65.2
## 9 ID Indonesia Asia Asia South-Ea~ Sove~ 1.82e6 2.55e8 68.9
## 10 AR Argentina South Am~ Americas South Am~ Sove~ 2.78e6 4.30e7 76.3
## # ... with 167 more rows, and 2 more variables: gdpPercap <dbl>,
## # geom <MULTIPOLYGON [°]>
dim(world) # it is a 2 dimensional object, with rows and columns
## [1] 177 11
nrow(world) # how many rows?
## [1] 177
ncol(world) # how many columns?
## [1] 11
Of the 11 columns, one (geom) holds the geographic information. We can extract the non-spatial (attribute) data with st_drop_geometry():
world_df = st_drop_geometry(world)
class(world_df)
## [1] "tbl_df" "tbl" "data.frame"
world_df
## # A tibble: 177 x 10
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## * <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove~ 1.93e4 8.86e5 70.0
## 2 TZ Tanzania Africa Africa Eastern ~ Sove~ 9.33e5 5.22e7 64.2
## 3 EH Western ~ Africa Africa Northern~ Inde~ 9.63e4 NA NA
## 4 CA Canada North Am~ Americas Northern~ Sove~ 1.00e7 3.55e7 82.0
## 5 US United S~ North Am~ Americas Northern~ Coun~ 9.51e6 3.19e8 78.8
## 6 KZ Kazakhst~ Asia Asia Central ~ Sove~ 2.73e6 1.73e7 71.6
## 7 UZ Uzbekist~ Asia Asia Central ~ Sove~ 4.61e5 3.08e7 71.0
## 8 PG Papua Ne~ Oceania Oceania Melanesia Sove~ 4.65e5 7.76e6 65.2
## 9 ID Indonesia Asia Asia South-Ea~ Sove~ 1.82e6 2.55e8 68.9
## 10 AR Argentina South Am~ Americas South Am~ Sove~ 2.78e6 4.30e7 76.3
## # ... with 167 more rows, and 1 more variable: gdpPercap <dbl>
tbl_df indicates a tibble, the tidyverse version of a data frame, which is easy to analyze and prints compactly.
Subsetting sf objects works the same way as subsetting the data frames we learned about previously:
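Conceptually, an sf object is a data frame whose geometry lives in a single list-column (geom in world), and st_drop_geometry() returns the remaining attribute columns. The sketch below mimics that idea with base R only; the toy table, its values and the column name geom are hypothetical, and real sf objects also record which column holds the geometry, so on actual data use st_drop_geometry().

```r
# Toy table: two hypothetical "countries" with attributes and a geometry list-column
toy <- data.frame(name_long = c("A", "B"), pop = c(100, 250))
# Each cell of the list-column holds a matrix of (x, y) vertex coordinates
toy$geom <- list(
  matrix(c(0, 0, 1, 0, 1, 1, 0, 0), ncol = 2, byrow = TRUE),
  matrix(c(2, 2, 3, 2, 3, 3, 2, 2), ncol = 2, byrow = TRUE)
)

# Keep only the non-spatial (attribute) columns, similar in spirit to st_drop_geometry()
toy_df <- toy[, setdiff(names(toy), "geom"), drop = FALSE]
names(toy_df)
```

The attribute table keeps one row per feature; only the geometry column is gone.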
world[1:15, ] # subset rows by position
## Simple feature collection with 15 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -55.61183 xmax: 180 ymax: 83.23324
## geographic CRS: WGS 84
## # A tibble: 15 x 11
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove~ 1.93e4 8.86e5 70.0
## 2 TZ Tanzania Africa Africa Eastern ~ Sove~ 9.33e5 5.22e7 64.2
## 3 EH Western ~ Africa Africa Northern~ Inde~ 9.63e4 NA NA
## 4 CA Canada North Am~ Americas Northern~ Sove~ 1.00e7 3.55e7 82.0
## 5 US United S~ North Am~ Americas Northern~ Coun~ 9.51e6 3.19e8 78.8
## 6 KZ Kazakhst~ Asia Asia Central ~ Sove~ 2.73e6 1.73e7 71.6
## 7 UZ Uzbekist~ Asia Asia Central ~ Sove~ 4.61e5 3.08e7 71.0
## 8 PG Papua Ne~ Oceania Oceania Melanesia Sove~ 4.65e5 7.76e6 65.2
## 9 ID Indonesia Asia Asia South-Ea~ Sove~ 1.82e6 2.55e8 68.9
## 10 AR Argentina South Am~ Americas South Am~ Sove~ 2.78e6 4.30e7 76.3
## 11 CL Chile South Am~ Americas South Am~ Sove~ 8.15e5 1.76e7 79.1
## 12 CD Democrat~ Africa Africa Middle A~ Sove~ 2.32e6 7.37e7 58.8
## 13 SO Somalia Africa Africa Eastern ~ Sove~ 4.84e5 1.35e7 55.5
## 14 KE Kenya Africa Africa Eastern ~ Sove~ 5.91e5 4.60e7 66.2
## 15 SD Sudan Africa Africa Northern~ Sove~ 1.85e6 3.77e7 64.0
## # ... with 2 more variables: gdpPercap <dbl>, geom <MULTIPOLYGON [°]>
world[, 1:5] # subset columns by position
## Simple feature collection with 177 features and 5 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 6
## iso_a2 name_long continent region_un subregion geom
## <chr> <chr> <chr> <chr> <chr> <MULTIPOLYGON [°]>
## 1 FJ Fiji Oceania Oceania Melanesia (((180 -16.06713, 180 -16.5~
## 2 TZ Tanzania Africa Africa Eastern ~ (((33.90371 -0.95, 34.07262~
## 3 EH Western S~ Africa Africa Northern~ (((-8.66559 27.65643, -8.66~
## 4 CA Canada North Ame~ Americas Northern~ (((-122.84 49, -122.9742 49~
## 5 US United St~ North Ame~ Americas Northern~ (((-122.84 49, -120 49, -11~
## 6 KZ Kazakhstan Asia Asia Central ~ (((87.35997 49.21498, 86.59~
## 7 UZ Uzbekistan Asia Asia Central ~ (((55.96819 41.30864, 55.92~
## 8 PG Papua New~ Oceania Oceania Melanesia (((141.0002 -2.600151, 142.~
## 9 ID Indonesia Asia Asia South-Ea~ (((141.0002 -2.600151, 141.~
## 10 AR Argentina South Ame~ Americas South Am~ (((-68.63401 -52.63637, -68~
## # ... with 167 more rows
world[, c("name_long", "lifeExp", "pop")] # subset columns by name
## Simple feature collection with 177 features and 3 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 4
## name_long lifeExp pop geom
## <chr> <dbl> <dbl> <MULTIPOLYGON [°]>
## 1 Fiji 70.0 8.86e5 (((180 -16.06713, 180 -16.55522, 179.3641 -16.~
## 2 Tanzania 64.2 5.22e7 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869~
## 3 Western Sah~ NA NA (((-8.66559 27.65643, -8.665124 27.58948, -8.6~
## 4 Canada 82.0 3.55e7 (((-122.84 49, -122.9742 49.00254, -124.9102 4~
## 5 United Stat~ 78.8 3.19e8 (((-122.84 49, -120 49, -117.0312 49, -116.048~
## 6 Kazakhstan 71.6 1.73e7 (((87.35997 49.21498, 86.59878 48.54918, 85.76~
## 7 Uzbekistan 71.0 3.08e7 (((55.96819 41.30864, 55.92892 44.99586, 58.50~
## 8 Papua New G~ 65.2 7.76e6 (((141.0002 -2.600151, 142.7352 -3.289153, 144~
## 9 Indonesia 68.9 2.55e8 (((141.0002 -2.600151, 141.0171 -5.859022, 141~
## 10 Argentina 76.3 4.30e7 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -~
## # ... with 167 more rows
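In the outputs above, geom is still printed even though only columns 1 to 5 (or three named columns) were requested: sf keeps the geometry column "sticky" when columns are subset. A plain data frame has no such rule, as this base-R sketch shows; the toy table copies a few (rounded) values from the world printout and has no geometry column. The bracket syntax is the same: rows by position, columns by position or by name.

```r
# Toy data frame with values copied from the world printout (no geometry column)
toy <- data.frame(
  iso_a2 = c("FJ", "TZ", "CA"),
  name_long = c("Fiji", "Tanzania", "Canada"),
  pop = c(885806, 52234869, 35535348),
  lifeExp = c(70.0, 64.2, 82.0)
)
first_two <- toy[1:2, ]                      # rows by position
by_pos <- toy[, 1:2]                         # columns by position
by_name <- toy[, c("name_long", "lifeExp")]  # columns by name
names(by_name)  # only the requested columns are returned
```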
(2) We can also subset with a logical vector, for example based on the size of the country:
(sel_area <- world$area_km2 < 10000)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [85] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [157] FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
summary(sel_area)
## Mode FALSE TRUE
## logical 170 7
(small_countries <- world[sel_area, ])
## Simple feature collection with 7 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -67.24243 ymin: -16.59785 xmax: 167.8449 ymax: 50.12805
## geographic CRS: WGS 84
## # A tibble: 7 x 11
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 PR Puerto R~ North Am~ Americas Caribbean Depe~ 9225. 3534874 79.4
## 2 PS Palestine Asia Asia Western ~ Disp~ 5037. 4294682 73.1
## 3 VU Vanuatu Oceania Oceania Melanesia Sove~ 7490. 258850 71.7
## 4 LU Luxembou~ Europe Europe Western ~ Sove~ 2417. 556319 82.2
## 5 <NA> Northern~ Asia Asia Western ~ Sove~ 3786. NA NA
## 6 CY Cyprus Asia Asia Western ~ Sove~ 6207. 1152309 80.2
## 7 TT Trinidad~ North Am~ Americas Caribbean Sove~ 7738. 1354493 70.4
## # ... with 2 more variables: gdpPercap <dbl>, geom <MULTIPOLYGON [°]>
sel_area2 <- world$area_km2>=10000 & world$area_km2<100000
summary(sel_area2)
## Mode FALSE TRUE
## logical 115 62
(medium_countries <- world[sel_area2,])
## Simple feature collection with 62 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -52.3 xmax: 180 ymax: 59.61109
## geographic CRS: WGS 84
## # A tibble: 62 x 11
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove~ 19290. 8.86e5 70.0
## 2 EH Western ~ Africa Africa Northern~ Inde~ 96271. NA NA
## 3 HT Haiti North Am~ Americas Caribbean Sove~ 28541. 1.06e7 62.8
## 4 DO Dominica~ North Am~ Americas Caribbean Sove~ 48158. 1.04e7 73.5
## 5 BS Bahamas North Am~ Americas Caribbean Sove~ 15585. 3.82e5 75.4
## 6 FK Falkland~ South Am~ Americas South Am~ Depe~ 16364. NA NA
## 7 TF French S~ Seven se~ Seven se~ Seven se~ Depe~ 11603. NA NA
## 8 TL Timor-Le~ Asia Asia South-Ea~ Sove~ 14715. 1.21e6 68.3
## 9 LS Lesotho Africa Africa Southern~ Sove~ 27506. 2.15e6 53.3
## 10 PA Panama North Am~ Americas Central ~ Sove~ 75265. 3.90e6 77.6
## # ... with 52 more rows, and 2 more variables: gdpPercap <dbl>,
## # geom <MULTIPOLYGON [°]>
Subsetting by population size:
sel_pop <- world$pop > 100000000
summary(sel_pop)
## Mode FALSE TRUE NA's
## logical 155 12 10
(populated_countries = world[sel_pop,])
## Simple feature collection with 22 features and 10 fields (with 10 geometries empty)
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -33.76838 xmax: 180 ymax: 81.2504
## geographic CRS: WGS 84
## # A tibble: 22 x 11
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 <NA> <NA> <NA> <NA> <NA> <NA> NA NA NA
## 2 US United S~ North Am~ Americas Northern~ Coun~ 9.51e6 3.19e8 78.8
## 3 ID Indonesia Asia Asia South-Ea~ Sove~ 1.82e6 2.55e8 68.9
## 4 RU Russian ~ Europe Europe Eastern ~ Sove~ 1.70e7 1.44e8 70.7
## 5 <NA> <NA> <NA> <NA> <NA> <NA> NA NA NA
## 6 <NA> <NA> <NA> <NA> <NA> <NA> NA NA NA
## 7 <NA> <NA> <NA> <NA> <NA> <NA> NA NA NA
## 8 MX Mexico North Am~ Americas Central ~ Sove~ 1.97e6 1.24e8 76.8
## 9 BR Brazil South Am~ Americas South Am~ Sove~ 8.51e6 2.04e8 75.0
## 10 <NA> <NA> <NA> <NA> <NA> <NA> NA NA NA
## # ... with 12 more rows, and 2 more variables: gdpPercap <dbl>,
## # geom <MULTIPOLYGON [°]>
Another option (3) is to write the condition directly inside the brackets:
small_countries <- world[world$area_km2 < 10000, ]
medium_countries <- world[world$area_km2>=10000 & world$area_km2<100000,]
(populated_countries <- world[world$pop > 100000000,])
## Simple feature collection with 22 features and 10 fields (with 10 geometries empty)
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -33.76838 xmax: 180 ymax: 81.2504
## geographic CRS: WGS 84
## # A tibble: 22 x 11
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 <NA> <NA> <NA> <NA> <NA> <NA> NA NA NA
## 2 US United S~ North Am~ Americas Northern~ Coun~ 9.51e6 3.19e8 78.8
## 3 ID Indonesia Asia Asia South-Ea~ Sove~ 1.82e6 2.55e8 68.9
## 4 RU Russian ~ Europe Europe Eastern ~ Sove~ 1.70e7 1.44e8 70.7
## 5 <NA> <NA> <NA> <NA> <NA> <NA> NA NA NA
## 6 <NA> <NA> <NA> <NA> <NA> <NA> NA NA NA
## 7 <NA> <NA> <NA> <NA> <NA> <NA> NA NA NA
## 8 MX Mexico North Am~ Americas Central ~ Sove~ 1.97e6 1.24e8 76.8
## 9 BR Brazil South Am~ Americas South Am~ Sove~ 8.51e6 2.04e8 75.0
## 10 <NA> <NA> <NA> <NA> <NA> <NA> NA NA NA
## # ... with 12 more rows, and 2 more variables: gdpPercap <dbl>,
## # geom <MULTIPOLYGON [°]>
# Or you can use subset(), which drops rows where the condition is NA (12 countries instead of 22)
small_countries <- subset(world, area_km2 < 10000)
medium_countries <- subset(world, area_km2>=10000 & area_km2<100000)
(populated_countries <- subset(world, pop > 100000000))
## Simple feature collection with 12 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -33.76838 xmax: 180 ymax: 81.2504
## geographic CRS: WGS 84
## # A tibble: 12 x 11
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 US United S~ North Am~ Americas Northern~ Coun~ 9.51e6 3.19e8 78.8
## 2 ID Indonesia Asia Asia South-Ea~ Sove~ 1.82e6 2.55e8 68.9
## 3 RU Russian ~ Europe Europe Eastern ~ Sove~ 1.70e7 1.44e8 70.7
## 4 MX Mexico North Am~ Americas Central ~ Sove~ 1.97e6 1.24e8 76.8
## 5 BR Brazil South Am~ Americas South Am~ Sove~ 8.51e6 2.04e8 75.0
## 6 NG Nigeria Africa Africa Western ~ Sove~ 9.05e5 1.76e8 52.5
## 7 IN India Asia Asia Southern~ Sove~ 3.14e6 1.29e9 68.0
## 8 BD Banglade~ Asia Asia Southern~ Sove~ 1.34e5 1.59e8 71.8
## 9 PK Pakistan Asia Asia Southern~ Sove~ 8.74e5 1.86e8 66.1
## 10 CN China Asia Asia Eastern ~ Coun~ 9.41e6 1.36e9 75.9
## 11 PH Philippi~ Asia Asia South-Ea~ Sove~ 2.92e5 1.00e8 68.8
## 12 JP Japan Asia Asia Eastern ~ Sove~ 4.05e5 1.27e8 83.6
## # ... with 2 more variables: gdpPercap <dbl>, geom <MULTIPOLYGON [°]>
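Why does world[world$pop > 100000000, ] return 22 features (10 of them empty) while subset() returns 12? summary(sel_pop) showed 10 NA values, for countries whose pop is missing. Indexing with `[` turns each NA into a row filled with NA, whereas subset() (and dplyr's filter()) treat NA as FALSE. This sketch reproduces the behaviour on a toy data frame with made-up values; wrapping the condition in which() is one way to get the subset() result with brackets.

```r
toy <- data.frame(name = c("a", "b", "c", "d"),
                  pop = c(2e8, NA, 5e6, 1.5e8))
cond <- toy$pop > 1e8                   # TRUE NA FALSE TRUE
with_na <- toy[cond, ]                  # 3 rows: the NA becomes a row of NAs
no_na_subset <- subset(toy, pop > 1e8)  # 2 rows: NA treated as FALSE
no_na_which <- toy[which(cond), ]       # 2 rows: which() keeps only TRUE positions
```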
Subsetting with dplyr functions, which also return sf objects:
### subsetting columns
world1 <- dplyr::select(world, name_long, pop)
names(world1)
## [1] "name_long" "pop" "geom"
world1
## Simple feature collection with 177 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 3
## name_long pop geom
## <chr> <dbl> <MULTIPOLYGON [°]>
## 1 Fiji 885806 (((180 -16.06713, 180 -16.55522, 179.3641 -16.80135,~
## 2 Tanzania 52234869 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3.09~
## 3 Western Saha~ NA (((-8.66559 27.65643, -8.665124 27.58948, -8.6844 27~
## 4 Canada 35535348 (((-122.84 49, -122.9742 49.00254, -124.9102 49.9845~
## 5 United States 318622525 (((-122.84 49, -120 49, -117.0312 49, -116.0482 49, ~
## 6 Kazakhstan 17288285 (((87.35997 49.21498, 86.59878 48.54918, 85.76823 48~
## 7 Uzbekistan 30757700 (((55.96819 41.30864, 55.92892 44.99586, 58.50313 45~
## 8 Papua New Gu~ 7755785 (((141.0002 -2.600151, 142.7352 -3.289153, 144.584 -~
## 9 Indonesia 255131116 (((141.0002 -2.600151, 141.0171 -5.859022, 141.0339 ~
## 10 Argentina 42981515 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.85,~
## # ... with 167 more rows
# all columns between name_long and pop (inclusive)
world2 <- dplyr::select(world, name_long:pop)
world2
## Simple feature collection with 177 features and 7 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 8
## name_long continent region_un subregion type area_km2 pop
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Fiji Oceania Oceania Melanesia Sove~ 1.93e4 8.86e5
## 2 Tanzania Africa Africa Eastern ~ Sove~ 9.33e5 5.22e7
## 3 Western ~ Africa Africa Northern~ Inde~ 9.63e4 NA
## 4 Canada North Am~ Americas Northern~ Sove~ 1.00e7 3.55e7
## 5 United S~ North Am~ Americas Northern~ Coun~ 9.51e6 3.19e8
## 6 Kazakhst~ Asia Asia Central ~ Sove~ 2.73e6 1.73e7
## 7 Uzbekist~ Asia Asia Central ~ Sove~ 4.61e5 3.08e7
## 8 Papua Ne~ Oceania Oceania Melanesia Sove~ 4.65e5 7.76e6
## 9 Indonesia Asia Asia South-Ea~ Sove~ 1.82e6 2.55e8
## 10 Argentina South Am~ Americas South Am~ Sove~ 2.78e6 4.30e7
## # ... with 167 more rows, and 1 more variable: geom <MULTIPOLYGON [°]>
# all columns except subregion and area_km2
world3 <- dplyr::select(world, -subregion, -area_km2)
world3
## Simple feature collection with 177 features and 8 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 9
## iso_a2 name_long continent region_un type pop lifeExp gdpPercap
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Sove~ 8.86e5 70.0 8222.
## 2 TZ Tanzania Africa Africa Sove~ 5.22e7 64.2 2402.
## 3 EH Western ~ Africa Africa Inde~ NA NA NA
## 4 CA Canada North Am~ Americas Sove~ 3.55e7 82.0 43079.
## 5 US United S~ North Am~ Americas Coun~ 3.19e8 78.8 51922.
## 6 KZ Kazakhst~ Asia Asia Sove~ 1.73e7 71.6 23587.
## 7 UZ Uzbekist~ Asia Asia Sove~ 3.08e7 71.0 5371.
## 8 PG Papua Ne~ Oceania Oceania Sove~ 7.76e6 65.2 3709.
## 9 ID Indonesia Asia Asia Sove~ 2.55e8 68.9 10003.
## 10 AR Argentina South Am~ Americas Sove~ 4.30e7 76.3 18798.
## # ... with 167 more rows, and 1 more variable: geom <MULTIPOLYGON [°]>
world4 <- dplyr::select(world, name_long, population = pop) # select and rename pop in one step
names(world4)
## [1] "name_long" "population" "geom"
world4
## Simple feature collection with 177 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 3
## name_long population geom
## <chr> <dbl> <MULTIPOLYGON [°]>
## 1 Fiji 885806 (((180 -16.06713, 180 -16.55522, 179.3641 -16.80135~
## 2 Tanzania 52234869 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3.0~
## 3 Western Saha~ NA (((-8.66559 27.65643, -8.665124 27.58948, -8.6844 2~
## 4 Canada 35535348 (((-122.84 49, -122.9742 49.00254, -124.9102 49.984~
## 5 United States 318622525 (((-122.84 49, -120 49, -117.0312 49, -116.0482 49,~
## 6 Kazakhstan 17288285 (((87.35997 49.21498, 86.59878 48.54918, 85.76823 4~
## 7 Uzbekistan 30757700 (((55.96819 41.30864, 55.92892 44.99586, 58.50313 4~
## 8 Papua New Gu~ 7755785 (((141.0002 -2.600151, 142.7352 -3.289153, 144.584 ~
## 9 Indonesia 255131116 (((141.0002 -2.600151, 141.0171 -5.859022, 141.0339~
## 10 Argentina 42981515 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.85~
## # ... with 167 more rows
world5 <- world[, c("name_long", "pop")] # subset columns by name
names(world5)[names(world5) == "pop"] = "population" # rename column manually
world5
## Simple feature collection with 177 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 3
## name_long population geom
## <chr> <dbl> <MULTIPOLYGON [°]>
## 1 Fiji 885806 (((180 -16.06713, 180 -16.55522, 179.3641 -16.80135~
## 2 Tanzania 52234869 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3.0~
## 3 Western Saha~ NA (((-8.66559 27.65643, -8.665124 27.58948, -8.6844 2~
## 4 Canada 35535348 (((-122.84 49, -122.9742 49.00254, -124.9102 49.984~
## 5 United States 318622525 (((-122.84 49, -120 49, -117.0312 49, -116.0482 49,~
## 6 Kazakhstan 17288285 (((87.35997 49.21498, 86.59878 48.54918, 85.76823 4~
## 7 Uzbekistan 30757700 (((55.96819 41.30864, 55.92892 44.99586, 58.50313 4~
## 8 Papua New Gu~ 7755785 (((141.0002 -2.600151, 142.7352 -3.289153, 144.584 ~
## 9 Indonesia 255131116 (((141.0002 -2.600151, 141.0171 -5.859022, 141.0339~
## 10 Argentina 42981515 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.85~
## # ... with 167 more rows
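For comparison, the dplyr::select() calls above have base-R counterparts. This sketch uses a one-row toy table whose columns mimic world (values for Fiji copied from the printouts): a range of columns is located with match(), columns are dropped with setdiff(), and a column is renamed by assigning to names(), as in world5.

```r
toy <- data.frame(
  iso_a2 = "FJ", name_long = "Fiji", continent = "Oceania",
  subregion = "Melanesia", area_km2 = 19290, pop = 885806
)
# Like select(world, name_long:pop): every column from name_long to pop, inclusive
rng <- match("name_long", names(toy)):match("pop", names(toy))
range_cols <- toy[, rng]
# Like select(world, -subregion, -area_km2): drop columns by name
dropped <- toy[, setdiff(names(toy), c("subregion", "area_km2"))]
# Like select(world, name_long, population = pop): keep two columns and rename one
renamed <- toy[, c("name_long", "pop")]
names(renamed)[names(renamed) == "pop"] <- "population"
```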
# Countries with a life expectancy longer than 80 years
world6 <- filter(world, lifeExp > 80)
world6
## Simple feature collection with 24 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -140.9978 ymin: -46.64124 xmax: 178.5171 ymax: 83.23324
## geographic CRS: WGS 84
## # A tibble: 24 x 11
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## * <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 CA Canada North Am~ Americas Northern~ Sove~ 1.00e7 3.55e7 82.0
## 2 IL Israel Asia Asia Western ~ Coun~ 2.30e4 8.22e6 82.2
## 3 KR Republic~ Asia Asia Eastern ~ Sove~ 9.90e4 5.07e7 81.7
## 4 SE Sweden Europe Europe Northern~ Sove~ 4.51e5 9.70e6 82.3
## 5 AT Austria Europe Europe Western ~ Sove~ 8.51e4 8.55e6 81.5
## 6 DE Germany Europe Europe Western ~ Sove~ 3.57e5 8.10e7 81.1
## 7 GR Greece Europe Europe Southern~ Sove~ 1.32e5 1.09e7 81.4
## 8 CH Switzerl~ Europe Europe Western ~ Sove~ 4.62e4 8.19e6 83.2
## 9 LU Luxembou~ Europe Europe Western ~ Sove~ 2.42e3 5.56e5 82.2
## 10 BE Belgium Europe Europe Western ~ Sove~ 3.01e4 1.12e7 81.3
## # ... with 14 more rows, and 2 more variables: gdpPercap <dbl>,
## # geom <MULTIPOLYGON [°]>
world7 <- world %>%
  filter(continent == "Asia") %>%         # keep Asian countries
  dplyr::select(name_long, continent) %>% # keep two columns (geom stays)
  slice(1:5)                              # first five rows
world7
## Simple feature collection with 5 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 34.26543 ymin: -10.35999 xmax: 141.0339 ymax: 55.38525
## geographic CRS: WGS 84
## # A tibble: 5 x 3
## name_long continent geom
## <chr> <chr> <MULTIPOLYGON [°]>
## 1 Kazakhstan Asia (((87.35997 49.21498, 86.59878 48.54918, 85.76823 48.45~
## 2 Uzbekistan Asia (((55.96819 41.30864, 55.92892 44.99586, 58.50313 45.58~
## 3 Indonesia Asia (((141.0002 -2.600151, 141.0171 -5.859022, 141.0339 -9.~
## 4 Timor-Leste Asia (((124.9687 -8.89279, 125.0862 -8.656887, 125.9471 -8.4~
## 5 Israel Asia (((35.71992 32.70919, 35.54567 32.39399, 35.18393 32.53~
world8 <- world %>%
  filter(continent == "South America") %>%
  dplyr::select(name_long, continent) %>%
  slice(1:5)
world8
## Simple feature collection with 5 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -75.6444 ymin: -55.61183 xmax: -34.72999 ymax: 5.244486
## geographic CRS: WGS 84
## # A tibble: 5 x 3
## name_long continent geom
## <chr> <chr> <MULTIPOLYGON [°]>
## 1 Argentina South Amer~ (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.85~
## 2 Chile South Amer~ (((-68.63401 -52.63637, -68.63335 -54.8695, -67.562~
## 3 Falkland Isl~ South Amer~ (((-61.2 -51.85, -60 -51.25, -59.15 -51.5, -58.55 -~
## 4 Uruguay South Amer~ (((-57.62513 -30.21629, -56.97603 -30.10969, -55.97~
## 5 Brazil South Amer~ (((-53.37366 -33.76838, -53.65054 -33.202, -53.2095~
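The pipe (%>%) passes the result of each step to the next, so world8 reads as "keep South American countries, keep two columns, take the first five rows". The same steps can be written without the pipe in base R. This sketch uses a hypothetical toy table (its row order is not that of world), with head() standing in for slice(1:5).

```r
toy <- data.frame(
  iso_a2 = c("AR", "CL", "CA", "UY", "BR", "PE", "CO"),
  name_long = c("Argentina", "Chile", "Canada", "Uruguay", "Brazil", "Peru", "Colombia"),
  continent = c("South America", "South America", "North America", "South America",
                "South America", "South America", "South America")
)
south <- toy[toy$continent == "South America", ]  # like filter()
south <- south[, c("name_long", "continent")]     # like select()
first5 <- head(south, 5)                          # like slice(1:5)
```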
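We can also aggregate attribute data, i.e. summarise a variable by group. With the formula interface, aggregate() returns a plain data.frame: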
world_agg1 = aggregate(pop ~ continent, FUN = sum, data = world, na.rm = TRUE)
class(world_agg1)
## [1] "data.frame"
world_agg1
## continent pop
## 1 Africa 1154946633
## 2 Asia 4311408059
## 3 Europe 669036256
## 4 North America 565028684
## 5 Oceania 37757833
## 6 South America 412060811
names(world)
## [1] "iso_a2" "name_long" "continent" "region_un" "subregion" "type"
## [7] "area_km2" "pop" "lifeExp" "gdpPercap" "geom"
# aggregate() applied to an sf object returns an sf object, with the geometries combined per group
world_agg2 = aggregate(world[c("pop", "area_km2")], by = list(world$continent),
                       FUN = sum, na.rm = TRUE)
class(world_agg2)
## [1] "sf" "data.frame"
world_agg2
## Simple feature collection with 8 features and 3 fields
## Attribute-geometry relationship: 0 constant, 2 aggregate, 1 identity
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## Group.1 pop area_km2 geometry
## 1 Africa 1154946633 29946197.81 MULTIPOLYGON (((32.83012 -2...
## 2 Antarctica 0 12335956.08 MULTIPOLYGON (((-163.7129 -...
## 3 Asia 4311408059 31252459.39 MULTIPOLYGON (((120.295 -10...
## 4 Europe 669036256 23065218.79 MULTIPOLYGON (((-51.6578 4....
## 5 North America 565028684 24484309.37 MULTIPOLYGON (((-61.68 10.7...
## 6 Oceania 37757833 8504488.66 MULTIPOLYGON (((169.6678 -4...
## 7 Seven seas (open ocean) 0 11602.57 POLYGON ((68.935 -48.625, 6...
## 8 South America 412060811 17762592.17 MULTIPOLYGON (((-66.95992 -...
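Notice that world_agg1 has 6 rows but world_agg2 has 8: Antarctica and "Seven seas (open ocean)" appear with a population of 0. The formula interface of aggregate() uses na.action = na.omit by default, so rows with a missing pop are removed before grouping, and a group made only of NA values disappears. The data-frame interface keeps those rows, and sum(NA, na.rm = TRUE) is 0. A toy sketch with made-up numbers:

```r
toy <- data.frame(
  continent = c("A", "A", "B", "C"),
  pop = c(10, 5, 7, NA)  # continent "C" has only a missing value
)
# Formula interface: na.omit drops the NA row first, so group "C" vanishes
agg_formula <- aggregate(pop ~ continent, data = toy, FUN = sum, na.rm = TRUE)
# Data-frame interface: all rows kept; sum(NA, na.rm = TRUE) gives 0 for "C"
agg_df <- aggregate(toy["pop"], by = list(toy$continent), FUN = sum, na.rm = TRUE)
```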
world_agg3 = world %>%
  group_by(continent) %>%
  summarize(pop = sum(pop, na.rm = TRUE),
            area = sum(area_km2, na.rm = TRUE))
## `summarise()` ungrouping output (override with `.groups` argument)
world_agg3
## Simple feature collection with 8 features and 3 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 8 x 4
## continent pop area geom
## <chr> <dbl> <dbl> <MULTIPOLYGON [°]>
## 1 Africa 1.15e9 2.99e7 (((32.83012 -26.74219, 32.58026 -27.47016, 3~
## 2 Antarctica 0. 1.23e7 (((-48.66062 -78.04702, -48.1514 -78.04707, ~
## 3 Asia 4.31e9 3.13e7 (((120.295 -10.25865, 118.9678 -9.557969, 11~
## 4 Europe 6.69e8 2.31e7 (((-51.6578 4.156232, -52.24934 3.241094, -5~
## 5 North America 5.65e8 2.45e7 (((-61.68 10.76, -61.105 10.89, -60.895 10.8~
## 6 Oceania 3.78e7 8.50e6 (((169.6678 -43.55533, 170.5249 -43.03169, 1~
## 7 Seven seas (op~ 0. 1.16e4 (((68.935 -48.625, 69.58 -48.94, 70.525 -49.~
## 8 South America 4.12e8 1.78e7 (((-66.95992 -54.89681, -67.29103 -55.30124,~
world %>%
  group_by(continent) %>%
  summarize(pop = sum(pop, na.rm = TRUE),
            n = n())  # n() counts the countries in each continent
## `summarise()` ungrouping output (override with `.groups` argument)
## Simple feature collection with 8 features and 3 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 8 x 4
## continent pop n geom
## <chr> <dbl> <int> <MULTIPOLYGON [°]>
## 1 Africa 1.15e9 51 (((32.83012 -26.74219, 32.58026 -27.47016, 32.~
## 2 Antarctica 0. 1 (((-48.66062 -78.04702, -48.1514 -78.04707, -4~
## 3 Asia 4.31e9 47 (((120.295 -10.25865, 118.9678 -9.557969, 119.~
## 4 Europe 6.69e8 39 (((-51.6578 4.156232, -52.24934 3.241094, -52.~
## 5 North America 5.65e8 18 (((-61.68 10.76, -61.105 10.89, -60.895 10.855~
## 6 Oceania 3.78e7 7 (((169.6678 -43.55533, 170.5249 -43.03169, 171~
## 7 Seven seas (op~ 0. 1 (((68.935 -48.625, 69.58 -48.94, 70.525 -49.06~
## 8 South America 4.12e8 13 (((-66.95992 -54.89681, -67.29103 -55.30124, -~
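summarize() collapses each group to one row, and n() counts the rows (countries) in each group, including those with a missing pop. Outside dplyr, a per-group sum and count can be sketched with tapply() and table(); the toy values below are hypothetical.

```r
toy <- data.frame(continent = c("Africa", "Africa", "Asia", "Europe", "Asia"),
                  pop = c(5, NA, 20, 8, 12))
pop_sum <- tapply(toy$pop, toy$continent, sum, na.rm = TRUE)  # like sum(pop, na.rm = TRUE)
n_countries <- table(toy$continent)                           # like n(): counts rows, NA included
summary_df <- data.frame(continent = names(pop_sum),
                         pop = as.vector(pop_sum),
                         n = as.vector(n_countries[names(pop_sum)]))
summary_df
```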
world %>%
  dplyr::select(pop, area_km2, gdpPercap, continent) %>%
  group_by(continent) %>%
  summarize(pop = sum(pop, na.rm = TRUE),
            area = sum(area_km2, na.rm = TRUE),
            gdppc = mean(gdpPercap, na.rm = TRUE),
            n_countries = n()) %>%
  top_n(n = 5, wt = pop) %>%  # keep the five most populated continents
  arrange(desc(pop)) %>%      # sort from largest to smallest population
  st_drop_geometry()          # return only the attribute table
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 5 x 5
## continent pop area gdppc n_countries
## * <chr> <dbl> <dbl> <dbl> <int>
## 1 Asia 4311408059 31252459. 20026. 47
## 2 Africa 1154946633 29946198. 5042. 51
## 3 Europe 669036256 23065219. 29451. 39
## 4 North America 565028684 24484309. 18384. 18
## 5 South America 412060811 17762592. 13762. 13
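top_n(n = 5, wt = pop) keeps the five groups with the largest pop, and arrange(desc(pop)) sorts them in decreasing order (in dplyr 1.0.0 and later, top_n() is superseded by slice_max()). In base R the same idea is order() with decreasing = TRUE followed by head(). This sketch reuses the continent totals printed in world_agg2.

```r
totals <- data.frame(
  continent = c("Africa", "Antarctica", "Asia", "Europe", "North America",
                "Oceania", "Seven seas (open ocean)", "South America"),
  pop = c(1154946633, 0, 4311408059, 669036256, 565028684,
          37757833, 0, 412060811)
)
# Sort by population, largest first, then keep the first five rows
top5 <- head(totals[order(totals$pop, decreasing = TRUE), ], 5)
top5$continent
```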
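Finally, we can join attribute data from another table. The coffee_data dataset from spData contains coffee production by country for 2016 and 2017; left_join() keeps all the rows of world and adds the matching columns:
data(coffee_data)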
data(world)
world_coffee <- left_join(world, coffee_data)
## Joining, by = "name_long"
world_coffee <- left_join(world, coffee_data, by = "name_long") # same join, with the key stated explicitly
class(world_coffee)
## [1] "sf" "tbl_df" "tbl" "data.frame"
world_coffee
## Simple feature collection with 177 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 13
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove~ 1.93e4 8.86e5 70.0
## 2 TZ Tanzania Africa Africa Eastern ~ Sove~ 9.33e5 5.22e7 64.2
## 3 EH Western ~ Africa Africa Northern~ Inde~ 9.63e4 NA NA
## 4 CA Canada North Am~ Americas Northern~ Sove~ 1.00e7 3.55e7 82.0
## 5 US United S~ North Am~ Americas Northern~ Coun~ 9.51e6 3.19e8 78.8
## 6 KZ Kazakhst~ Asia Asia Central ~ Sove~ 2.73e6 1.73e7 71.6
## 7 UZ Uzbekist~ Asia Asia Central ~ Sove~ 4.61e5 3.08e7 71.0
## 8 PG Papua Ne~ Oceania Oceania Melanesia Sove~ 4.65e5 7.76e6 65.2
## 9 ID Indonesia Asia Asia South-Ea~ Sove~ 1.82e6 2.55e8 68.9
## 10 AR Argentina South Am~ Americas South Am~ Sove~ 2.78e6 4.30e7 76.3
## # ... with 167 more rows, and 4 more variables: gdpPercap <dbl>,
## # geom <MULTIPOLYGON [°]>, coffee_production_2016 <int>,
## # coffee_production_2017 <int>
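left_join() keeps every row of world (177 features) and fills the coffee columns with NA where a country has no match in coffee_data; an inner join, by contrast, keeps only the names present in both tables. A base-R sketch of both with merge(), on toy tables (production figures copied from the coffee_data printout; Canada is added to show a non-match). Unlike left_join(), merge() sorts the result by the key by default instead of keeping the row order of the first table.

```r
countries <- data.frame(name_long = c("Brazil", "Canada", "Colombia"))
coffee <- data.frame(name_long = c("Brazil", "Colombia", "Bolivia"),
                     coffee_production_2017 = c(2786L, 1169L, 4L))
# Left join: keep every country, add coffee columns where names match
left <- merge(countries, coffee, by = "name_long", all.x = TRUE)
# Inner join: keep only names present in both tables
inner <- merge(countries, coffee, by = "name_long")
```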
names(world_coffee)
## [1] "iso_a2" "name_long" "continent"
## [4] "region_un" "subregion" "type"
## [7] "area_km2" "pop" "lifeExp"
## [10] "gdpPercap" "geom" "coffee_production_2016"
## [13] "coffee_production_2017"
plot(world_coffee["coffee_production_2017"])
# rename the key column to show a join where the key names differ
(coffee_renamed <- rename(coffee_data, nm = name_long))
## # A tibble: 47 x 3
## nm coffee_production_2016 coffee_production_2017
## <chr> <int> <int>
## 1 Angola NA NA
## 2 Bolivia 3 4
## 3 Brazil 3277 2786
## 4 Burundi 37 38
## 5 Cameroon 8 6
## 6 Central African Republic NA NA
## 7 Congo, Dem. Rep. of 4 12
## 8 Colombia 1330 1169
## 9 Costa Rica 28 32
## 10 Côte d'Ivoire 114 130
## # ... with 37 more rows
(world_coffee2 <- left_join(world, coffee_renamed, by = c("name_long" = "nm")))
## Simple feature collection with 177 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 13
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove~ 1.93e4 8.86e5 70.0
## 2 TZ Tanzania Africa Africa Eastern ~ Sove~ 9.33e5 5.22e7 64.2
## 3 EH Western ~ Africa Africa Northern~ Inde~ 9.63e4 NA NA
## 4 CA Canada North Am~ Americas Northern~ Sove~ 1.00e7 3.55e7 82.0
## 5 US United S~ North Am~ Americas Northern~ Coun~ 9.51e6 3.19e8 78.8
## 6 KZ Kazakhst~ Asia Asia Central ~ Sove~ 2.73e6 1.73e7 71.6
## 7 UZ Uzbekist~ Asia Asia Central ~ Sove~ 4.61e5 3.08e7 71.0
## 8 PG Papua Ne~ Oceania Oceania Melanesia Sove~ 4.65e5 7.76e6 65.2
## 9 ID Indonesia Asia Asia South-Ea~ Sove~ 1.82e6 2.55e8 68.9
## 10 AR Argentina South Am~ Americas South Am~ Sove~ 2.78e6 4.30e7 76.3
## # ... with 167 more rows, and 4 more variables: gdpPercap <dbl>,
## # geom <MULTIPOLYGON [°]>, coffee_production_2016 <int>,
## # coffee_production_2017 <int>
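The `by = c("name_long" = "nm")` argument matches keys that have different column names in the two tables. Below is a minimal base R sketch of the same left join using `merge()`, assuming two hypothetical toy tables. The production figures for Brazil and Colombia are taken from the 2017 column printed above. `countries` and `coffee` are illustrative names, not objects from the class data.

```r
# Hypothetical stand-ins for `world` and `coffee_renamed`
countries <- data.frame(
  name_long = c("Brazil", "Canada", "Colombia"),
  continent = c("South America", "North America", "South America")
)
coffee <- data.frame(
  nm = c("Brazil", "Colombia", "Others"),
  coffee_production_2017 = c(2786L, 1169L, NA)
)

# Left join: every row of `countries` is kept (all.x = TRUE), and keys are
# matched even though they are called name_long in one table and nm in the other
joined <- merge(countries, coffee,
                by.x = "name_long", by.y = "nm", all.x = TRUE)
joined
```

Canada has no partner in `coffee`, so it gets `NA`. "Others" has no partner in `countries`, so a left join drops it. This is why `world_coffee` still has 177 rows.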
(world_coffee_inner <- inner_join(world, coffee_data))
## Joining, by = "name_long"
## Simple feature collection with 45 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -117.1278 ymin: -33.76838 xmax: 156.02 ymax: 35.49401
## geographic CRS: WGS 84
## # A tibble: 45 x 13
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 TZ Tanzania Africa Africa Eastern ~ Sove~ 932746. 5.22e7 64.2
## 2 PG Papua Ne~ Oceania Oceania Melanesia Sove~ 464520. 7.76e6 65.2
## 3 ID Indonesia Asia Asia South-Ea~ Sove~ 1819251. 2.55e8 68.9
## 4 KE Kenya Africa Africa Eastern ~ Sove~ 590837. 4.60e7 66.2
## 5 DO Dominica~ North Am~ Americas Caribbean Sove~ 48158. 1.04e7 73.5
## 6 TL Timor-Le~ Asia Asia South-Ea~ Sove~ 14715. 1.21e6 68.3
## 7 MX Mexico North Am~ Americas Central ~ Sove~ 1969480. 1.24e8 76.8
## 8 BR Brazil South Am~ Americas South Am~ Sove~ 8508557. 2.04e8 75.0
## 9 BO Bolivia South Am~ Americas South Am~ Sove~ 1085270. 1.06e7 68.4
## 10 PE Peru South Am~ Americas South Am~ Sove~ 1309700. 3.10e7 74.5
## # ... with 35 more rows, and 4 more variables: gdpPercap <dbl>,
## # geom <MULTIPOLYGON [°]>, coffee_production_2016 <int>,
## # coffee_production_2017 <int>
nrow(world_coffee_inner)
## [1] 45
(world_coffee_inner <- inner_join(world, coffee_data, by = c("name_long" = "name_long")))
## Simple feature collection with 45 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -117.1278 ymin: -33.76838 xmax: 156.02 ymax: 35.49401
## geographic CRS: WGS 84
## # A tibble: 45 x 13
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 TZ Tanzania Africa Africa Eastern ~ Sove~ 932746. 5.22e7 64.2
## 2 PG Papua Ne~ Oceania Oceania Melanesia Sove~ 464520. 7.76e6 65.2
## 3 ID Indonesia Asia Asia South-Ea~ Sove~ 1819251. 2.55e8 68.9
## 4 KE Kenya Africa Africa Eastern ~ Sove~ 590837. 4.60e7 66.2
## 5 DO Dominica~ North Am~ Americas Caribbean Sove~ 48158. 1.04e7 73.5
## 6 TL Timor-Le~ Asia Asia South-Ea~ Sove~ 14715. 1.21e6 68.3
## 7 MX Mexico North Am~ Americas Central ~ Sove~ 1969480. 1.24e8 76.8
## 8 BR Brazil South Am~ Americas South Am~ Sove~ 8508557. 2.04e8 75.0
## 9 BO Bolivia South Am~ Americas South Am~ Sove~ 1085270. 1.06e7 68.4
## 10 PE Peru South Am~ Americas South Am~ Sove~ 1309700. 3.10e7 74.5
## # ... with 35 more rows, and 4 more variables: gdpPercap <dbl>,
## # geom <MULTIPOLYGON [°]>, coffee_production_2016 <int>,
## # coffee_production_2017 <int>
nrow(world_coffee_inner)
## [1] 45
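An inner join keeps only the rows whose key appears in both tables. Here is a base R sketch with hypothetical toy tables. The Congo row uses the spelling and the 2017 value from `coffee_data`, while `countries` uses the spelling from `world`.

```r
countries <- data.frame(
  name_long = c("Brazil", "Canada", "Colombia", "Democratic Republic of the Congo")
)
coffee <- data.frame(
  name_long = c("Brazil", "Colombia", "Congo, Dem. Rep. of"),
  coffee_production_2017 = c(2786L, 1169L, 12L)
)

# merge() with its defaults (all = FALSE) behaves like an inner join
inner <- merge(countries, coffee, by = "name_long")
nrow(inner)
```

Only Brazil and Colombia survive. Canada has no coffee row, and the two spellings of Congo do not match.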
Note that the result of inner_join() has only 45 rows compared with 47 in coffee_data. What happened to the remaining rows?
setdiff(coffee_data$name_long, world$name_long)
## [1] "Congo, Dem. Rep. of" "Others"
str_subset(world$name_long, "Dem*.+Congo")
## [1] "Democratic Republic of the Congo"
coffee_data$name_long[grepl("Congo,", coffee_data$name_long)] <- str_subset(world$name_long, "Dem*.+Congo")
world_coffee_match <- inner_join(world, coffee_data)
## Joining, by = "name_long"
nrow(world_coffee_match)
## [1] 46
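The fix above follows three steps: find the unmatched keys, look up the correct spelling, then overwrite it. A base R sketch of that workflow on hypothetical key vectors is shown below. It uses `grep()` instead of `str_subset()` and the simpler pattern `"Dem.*Congo"` (an assumption; the original pattern also works).

```r
# Hypothetical keys mirroring the mismatch shown above
world_names  <- c("Brazil", "Colombia", "Democratic Republic of the Congo")
coffee_names <- c("Brazil", "Colombia", "Congo, Dem. Rep. of", "Others")

# 1. Which keys in the coffee table have no partner in the world table?
unmatched <- setdiff(coffee_names, world_names)

# 2. Find the spelling used in the world table with a regular expression
congo <- grep("Dem.*Congo", world_names, value = TRUE)

# 3. Overwrite the mismatched key, then re-check what is still unmatched
coffee_names[grepl("Congo,", coffee_names)] <- congo
still_unmatched <- setdiff(coffee_names, world_names)
```

After the fix, only "Others" remains unmatched. This mirrors the jump from 45 to 46 rows in `world_coffee_match`.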
coffee_world <- left_join(coffee_data, world)
## Joining, by = "name_long"
class(coffee_world)
## [1] "tbl_df" "tbl" "data.frame"
world_new = world # do not overwrite our original data
world_new$pop_dens = world_new$pop / world_new$area_km2
world %>%
mutate(pop_dens = pop / area_km2)
## Simple feature collection with 177 features and 11 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 12
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## * <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove~ 1.93e4 8.86e5 70.0
## 2 TZ Tanzania Africa Africa Eastern ~ Sove~ 9.33e5 5.22e7 64.2
## 3 EH Western ~ Africa Africa Northern~ Inde~ 9.63e4 NA NA
## 4 CA Canada North Am~ Americas Northern~ Sove~ 1.00e7 3.55e7 82.0
## 5 US United S~ North Am~ Americas Northern~ Coun~ 9.51e6 3.19e8 78.8
## 6 KZ Kazakhst~ Asia Asia Central ~ Sove~ 2.73e6 1.73e7 71.6
## 7 UZ Uzbekist~ Asia Asia Central ~ Sove~ 4.61e5 3.08e7 71.0
## 8 PG Papua Ne~ Oceania Oceania Melanesia Sove~ 4.65e5 7.76e6 65.2
## 9 ID Indonesia Asia Asia South-Ea~ Sove~ 1.82e6 2.55e8 68.9
## 10 AR Argentina South Am~ Americas South Am~ Sove~ 2.78e6 4.30e7 76.3
## # ... with 167 more rows, and 3 more variables: gdpPercap <dbl>,
## # geom <MULTIPOLYGON [°]>, pop_dens <dbl>
world_new = world %>%
mutate(pop_dens = pop / area_km2)
world %>%
transmute(pop_dens = pop / area_km2)
## Simple feature collection with 177 features and 1 field
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 2
## pop_dens geom
## * <dbl> <MULTIPOLYGON [°]>
## 1 45.9 (((180 -16.06713, 180 -16.55522, 179.3641 -16.80135, 178.7251 -17.0~
## 2 56.0 (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3.09699, 37.7669 -3~
## 3 NA (((-8.66559 27.65643, -8.665124 27.58948, -8.6844 27.39574, -8.6872~
## 4 3.54 (((-122.84 49, -122.9742 49.00254, -124.9102 49.98456, -125.6246 50~
## 5 33.5 (((-122.84 49, -120 49, -117.0312 49, -116.0482 49, -113 49, -110.0~
## 6 6.33 (((87.35997 49.21498, 86.59878 48.54918, 85.76823 48.45575, 85.7204~
## 7 66.7 (((55.96819 41.30864, 55.92892 44.99586, 58.50313 45.5868, 58.68999~
## 8 16.7 (((141.0002 -2.600151, 142.7352 -3.289153, 144.584 -3.861418, 145.2~
## 9 140. (((141.0002 -2.600151, 141.0171 -5.859022, 141.0339 -9.117893, 140.~
## 10 15.4 (((-68.63401 -52.63637, -68.25 -53.1, -67.75 -53.85, -66.45 -54.45,~
## # ... with 167 more rows
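As the output above shows, `transmute()` on an sf object still returns the `geom` column alongside `pop_dens`, because the geometry column is sticky. A plain data frame has no such column. The sketch below contrasts the two verbs in base R on a hypothetical toy table, where `transform()` plays the role of `mutate()`.

```r
# Hypothetical toy table with the two columns used above
df <- data.frame(
  name_long = c("A", "B"),
  pop = c(100, 300),
  area_km2 = c(10, 20)
)

# mutate() equivalent: keep every column and append pop_dens
with_dens <- transform(df, pop_dens = pop / area_km2)

# transmute() equivalent: keep only the newly computed column
only_dens <- data.frame(pop_dens = df$pop / df$area_km2)
```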
# we want to combine the continent and region_un columns into a new column
world_unite <- world %>%
unite("con_reg", continent:region_un, sep = ":", remove = TRUE)
world_unite
## Simple feature collection with 177 features and 9 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 10
## iso_a2 name_long con_reg subregion type area_km2 pop lifeExp gdpPercap
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceani~ Melanesia Sove~ 1.93e4 8.86e5 70.0 8222.
## 2 TZ Tanzania Africa~ Eastern ~ Sove~ 9.33e5 5.22e7 64.2 2402.
## 3 EH Western ~ Africa~ Northern~ Inde~ 9.63e4 NA NA NA
## 4 CA Canada North ~ Northern~ Sove~ 1.00e7 3.55e7 82.0 43079.
## 5 US United S~ North ~ Northern~ Coun~ 9.51e6 3.19e8 78.8 51922.
## 6 KZ Kazakhst~ Asia:A~ Central ~ Sove~ 2.73e6 1.73e7 71.6 23587.
## 7 UZ Uzbekist~ Asia:A~ Central ~ Sove~ 4.61e5 3.08e7 71.0 5371.
## 8 PG Papua Ne~ Oceani~ Melanesia Sove~ 4.65e5 7.76e6 65.2 3709.
## 9 ID Indonesia Asia:A~ South-Ea~ Sove~ 1.82e6 2.55e8 68.9 10003.
## 10 AR Argentina South ~ South Am~ Sove~ 2.78e6 4.30e7 76.3 18798.
## # ... with 167 more rows, and 1 more variable: geom <MULTIPOLYGON [°]>
world_separate <- world_unite %>%
separate(con_reg, c("continent", "region_un"), sep = ":")
world_separate
## Simple feature collection with 177 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 11
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove~ 1.93e4 8.86e5 70.0
## 2 TZ Tanzania Africa Africa Eastern ~ Sove~ 9.33e5 5.22e7 64.2
## 3 EH Western ~ Africa Africa Northern~ Inde~ 9.63e4 NA NA
## 4 CA Canada North Am~ Americas Northern~ Sove~ 1.00e7 3.55e7 82.0
## 5 US United S~ North Am~ Americas Northern~ Coun~ 9.51e6 3.19e8 78.8
## 6 KZ Kazakhst~ Asia Asia Central ~ Sove~ 2.73e6 1.73e7 71.6
## 7 UZ Uzbekist~ Asia Asia Central ~ Sove~ 4.61e5 3.08e7 71.0
## 8 PG Papua Ne~ Oceania Oceania Melanesia Sove~ 4.65e5 7.76e6 65.2
## 9 ID Indonesia Asia Asia South-Ea~ Sove~ 1.82e6 2.55e8 68.9
## 10 AR Argentina South Am~ Americas South Am~ Sove~ 2.78e6 4.30e7 76.3
## # ... with 167 more rows, and 2 more variables: gdpPercap <dbl>,
## # geom <MULTIPOLYGON [°]>
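`unite()` and `separate()` are inverses when the separator does not occur inside the values. Below is a base R sketch of the same round trip with `paste()` and `strsplit()`, using a hypothetical two-row table that copies values from the output above.

```r
df <- data.frame(
  continent = c("Oceania", "North America"),
  region_un = c("Oceania", "Americas")
)

# unite(): paste the two columns into one, separated by ":"
united <- data.frame(con_reg = paste(df$continent, df$region_un, sep = ":"))

# separate(): split on ":" and rebuild the two original columns
parts <- strsplit(united$con_reg, ":", fixed = TRUE)
separated <- data.frame(
  continent = vapply(parts, `[`, character(1), 1),
  region_un = vapply(parts, `[`, character(1), 2)
)
```

The round trip works here because no continent or region name contains ":". If one did, the split would produce extra pieces, so the separator should be chosen with the data in mind.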
world %>%
rename(name = name_long)
## Simple feature collection with 177 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 11
## iso_a2 name continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove~ 1.93e4 8.86e5 70.0
## 2 TZ Tanz~ Africa Africa Eastern ~ Sove~ 9.33e5 5.22e7 64.2
## 3 EH West~ Africa Africa Northern~ Inde~ 9.63e4 NA NA
## 4 CA Cana~ North Am~ Americas Northern~ Sove~ 1.00e7 3.55e7 82.0
## 5 US Unit~ North Am~ Americas Northern~ Coun~ 9.51e6 3.19e8 78.8
## 6 KZ Kaza~ Asia Asia Central ~ Sove~ 2.73e6 1.73e7 71.6
## 7 UZ Uzbe~ Asia Asia Central ~ Sove~ 4.61e5 3.08e7 71.0
## 8 PG Papu~ Oceania Oceania Melanesia Sove~ 4.65e5 7.76e6 65.2
## 9 ID Indo~ Asia Asia South-Ea~ Sove~ 1.82e6 2.55e8 68.9
## 10 AR Arge~ South Am~ Americas South Am~ Sove~ 2.78e6 4.30e7 76.3
## # ... with 167 more rows, and 2 more variables: gdpPercap <dbl>,
## # geom <MULTIPOLYGON [°]>
new_names <- c("i", "n", "c", "r", "s", "t", "a", "p", "l", "gP", "geom")
world %>%
setNames(new_names)
## Simple feature collection with 177 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## # A tibble: 177 x 11
## i n c r s t a p l gP
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 FJ Fiji Ocea~ Ocea~ Mela~ Sove~ 1.93e4 8.86e5 70.0 8222.
## 2 TZ Tanz~ Afri~ Afri~ East~ Sove~ 9.33e5 5.22e7 64.2 2402.
## 3 EH West~ Afri~ Afri~ Nort~ Inde~ 9.63e4 NA NA NA
## 4 CA Cana~ Nort~ Amer~ Nort~ Sove~ 1.00e7 3.55e7 82.0 43079.
## 5 US Unit~ Nort~ Amer~ Nort~ Coun~ 9.51e6 3.19e8 78.8 51922.
## 6 KZ Kaza~ Asia Asia Cent~ Sove~ 2.73e6 1.73e7 71.6 23587.
## 7 UZ Uzbe~ Asia Asia Cent~ Sove~ 4.61e5 3.08e7 71.0 5371.
## 8 PG Papu~ Ocea~ Ocea~ Mela~ Sove~ 4.65e5 7.76e6 65.2 3709.
## 9 ID Indo~ Asia Asia Sout~ Sove~ 1.82e6 2.55e8 68.9 10003.
## 10 AR Arge~ Sout~ Amer~ Sout~ Sove~ 2.78e6 4.30e7 76.3 18798.
## # ... with 167 more rows, and 1 more variable: geom <MULTIPOLYGON [°]>
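`rename()` changes one name and leaves the others alone. `setNames()` replaces all names at once, so its vector needs one entry per column. This is why `new_names` above ends with `"geom"` for the geometry column. A base R sketch on a hypothetical two-column table:

```r
df <- data.frame(iso_a2 = c("FJ", "TZ"), name_long = c("Fiji", "Tanzania"))

# rename(name = name_long): change a single column name on a copy
renamed <- df
names(renamed)[names(renamed) == "name_long"] <- "name"

# setNames(): supply a name for every column
short <- setNames(df, c("i", "n"))
```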
rm(list = ls())
elev = raster(nrows = 6, ncols = 6, res = 0.5, xmn = -1.5, xmx = 1.5, ymn = -1.5, ymx = 1.5, vals = 1:36)
grain_order = c("clay", "silt", "sand")
grain_char = sample(grain_order, 36, replace = TRUE) # random draw: the values printed below change on every run unless set.seed() is called first
grain_fact = factor(grain_char, levels = grain_order)
grain_fact
## [1] sand silt silt clay sand sand clay silt silt silt clay clay clay clay clay
## [16] sand clay silt sand sand sand sand silt sand sand silt silt sand silt sand
## [31] clay clay silt sand silt sand
## Levels: clay silt sand
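The order of `levels` fixes the integer code stored for each category: clay = 1, silt = 2, sand = 3. These are the IDs that appear in the raster attribute table below. Here is a short sketch showing that mapping, with `set.seed()` added so the draw is reproducible. The seed value is an arbitrary assumption.

```r
grain_order <- c("clay", "silt", "sand")
set.seed(1)  # arbitrary seed, only to make sample() reproducible
grain_char <- sample(grain_order, 36, replace = TRUE)
grain_fact <- factor(grain_char, levels = grain_order)

# The integer codes are positions in `levels`, which is what a raster stores
codes <- as.integer(grain_fact)
```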
grain = raster(nrows = 6, ncols = 6, res = 0.5, xmn = -1.5, xmx = 1.5, ymn = -1.5, ymx = 1.5, vals = grain_fact)
# levels() returns the raster attribute table; assigning to it adds a new column (wetness)
levels(grain)[[1]] = cbind(levels(grain)[[1]], wetness = c("wet", "moist", "dry"))
levels(grain)
## [[1]]
## ID VALUE wetness
## 1 1 clay wet
## 2 2 silt moist
## 3 3 sand dry
factorValues(grain, grain[c(1, 12, 30)])
## VALUE wetness
## 1 sand dry
## 2 clay wet
## 3 sand dry
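The idea behind `factorValues()` is that raster cells store integer IDs, and the attribute table maps each ID to its `VALUE` and `wetness`. The base R sketch below is not the raster package's implementation. It reproduces the printed table and uses hypothetical cell codes 3, 1, 3, chosen to be consistent with the result above.

```r
# Attribute table as printed by levels(grain) above
rat <- data.frame(
  ID = 1:3,
  VALUE = c("clay", "silt", "sand"),
  wetness = c("wet", "moist", "dry")
)

# Hypothetical integer codes stored in three raster cells
cell_codes <- c(3L, 1L, 3L)

# Look each code up in the ID column and return the matching attributes
lookup <- rat[match(cell_codes, rat$ID), c("VALUE", "wetness")]
rownames(lookup) <- NULL
lookup
```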
plot(elev, col=c('#ffffe5','#fff7bc','#fee391','#fec44f','#fe9929','#ec7014','#cc4c02','#8c2d04'))
plot(grain, col=c('#ffffe5','#fff7bc','#fee391','#fec44f','#fe9929','#ec7014','#cc4c02','#8c2d04'))