Assignment 1: Geospatial Data Science

Anastasiia Chernavskaia, Blanca Jimenez, Pablo Fernández, Nour Mohamed

01/28/2025

Paper 1

Paper: “Irrigation dams, water and infant mortality: Evidence from South Africa”.

1. Relevant information from the paper for replicating the map

  1. Region: South Africa.
  2. Dam data were collected from the “Dam Safety Office within the Department of Water Affairs”.
    • Only dams of at least 5 m in height and 50,000 cubic meters in capacity are included.
    • Sample restricted to dams with “irrigation” uses.
    • Dams appear to be located by their coordinates, as points.
    • Data source: http://www.dwaf.gov.za/DSO/Default.aspx
    • About 15 percent of dams were dropped because of missing critical information that could not be verified (690/4,830).
      • For most missing dams (658/690), the missing information was completion date.
      • Of those dropped, 499 reported a purpose of “irrigation.”
    • The resulting dataset included 4,140 dams, which were restricted to the subset of irrigation dams (3,176). These dams were matched to magisterial districts using their GPS coordinates.
  3. “To construct river gradient, I match river pixels to land gradient data and calculate the fraction of river pixels that are steep, defined as greater than six percent slope, within each district.” The river network data come from the Department of Water Affairs.
    • This measure defines the color scale of the map.
    • The river and land gradient measures are constructed using elevation data from ArcGIS.
    • The elevation data is provided in a raster (pixel) format. “I use Slope (3D analyst), an ArcGIS tool, to construct the slope at each pixel using the raster elevation data.”
    • “I then use this to construct the average district slope and river gradient slope (the latter is restricted to pixels along the river network provided by the Department of Water Affairs)”.
    • We will probably not be able to replicate this part, since it relies on raster data, which we have not yet covered.
    • For more information on this, check appendix A.3 of the paper.
  4. Territorial divisions: “magisterial district boundaries that were in place before the end of Apartheid”. There were 354 magisterial districts. Data obtained from Global Administrative Boundaries.
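The river-gradient measure described above comes down to a per-district share. You compute the slope at each pixel of an elevation raster, keep only the river pixels, and count how many exceed 6 percent. The sketch below shows that arithmetic in plain Python on a hypothetical toy grid. It is not the author's method: ArcGIS's Slope tool uses its own neighborhood formula. Here we assume a simple central-difference approximation, square cells measured in meters, and a boolean river mask for a single district. All function names are our own.

```python
import math


def slope_percent(elev, i, j, cell_size):
    """Approximate percent slope at interior pixel (i, j) with central differences.

    Simplified stand-in for ArcGIS's Slope tool (assumption, not its exact formula).
    """
    dz_dx = (elev[i][j + 1] - elev[i][j - 1]) / (2 * cell_size)
    dz_dy = (elev[i + 1][j] - elev[i - 1][j]) / (2 * cell_size)
    return 100 * math.hypot(dz_dx, dz_dy)


def steep_river_fraction(elev, river_mask, cell_size, threshold=6.0):
    """Share of interior river pixels whose slope is greater than `threshold` percent."""
    steep = total = 0
    for i in range(1, len(elev) - 1):          # skip edge pixels (no full neighborhood)
        for j in range(1, len(elev[0]) - 1):
            if river_mask[i][j]:
                total += 1
                if slope_percent(elev, i, j, cell_size) > threshold:
                    steep += 1
    return steep / total if total else float("nan")


# Hypothetical district: flat in the west, rising steeply in the east (elevations in m).
elev = [[0, 0, 0, 0, 2, 4] for _ in range(3)]
river = [[False] * 6, [True] * 6, [False] * 6]   # the river runs along the middle row
print(steep_river_fraction(elev, river, cell_size=10))  # 0.5
```

In this toy grid, two of the four interior river pixels have slopes of 10% and 20%. The other two are flat, so half the river pixels count as steep.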

Conclusion on the data we need (limited to what we can replicate with the tools we have learned so far):

  1. Territorial divisions: “magisterial district boundaries that were in place before the end of Apartheid”. Data obtained from Global Administrative Boundaries.
  2. Dam data covering dams constructed up to March 2013, restricted to those with “irrigation” uses.

2. Importing and cleaning the data

We need to read the data from two different sources: an Excel file from the FAO AQUASTAT dams database (for the dams) and a shapefile (for the magisterial district boundaries).

2.1. Dams data

## # A tibble: 6 × 28
##   Country `Name of dam` `Alternate dam name` `ISO alpha- 3`
##   <chr>   <chr>         <chr>                <chr>         
## 1 Algeria Meurad        <NA>                 DZA           
## 2 Algeria Hamiz         <NA>                 DZA           
## 3 Algeria Oued Fodda    <NA>                 DZA           
## 4 Algeria Boughzoul     <NA>                 DZA           
## 5 Algeria Cheurfas      <NA>                 DZA           
## 6 Algeria Bakhadda      <NA>                 DZA           
## # ℹ 24 more variables: `Administrative\r\nUnit` <chr>, `Nearest city` <chr>,
## #   River <chr>, `Major basin` <chr>, `Sub-basin` <chr>,
## #   `Completed /operational since` <chr>, `Dam height (m)` <dbl>,
## #   `Reservoir capacity (million m3)` <dbl>, `Reservoir area (km2)` <dbl>,
## #   `Sedimen-tation \r\n(latest known) \r\n(%)` <dbl>, Irrigation <chr>,
## #   `Water supply` <chr>, `Flood control` <chr>, `Hydroelectricity (MW)` <chr>,
## #   Navigation <chr>, Recreation <chr>, `Pollution control` <chr>, …

The cleaning steps performed in the cells below are:

  1. Selection of the variables we’re interested in and exclusion of the rest.
  2. Removal of rows with missing coordinates.
  3. Filtering of dams which:
    • Serve irrigation purposes (marked with an “x” in the dataset).
    • Are at least 5 m high (dams with a missing height are kept, as this column has too many missing values).
    • Have at least 50,000 cubic meters of capacity (0.05 million cubic meters).
    • Were completed in 2013 or earlier.

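The filtering rules above can be written as a single predicate. The sketch below uses plain Python rather than the R used in this notebook, and it runs on hypothetical records. The field names are our own. Two behaviors are assumptions that the list above does not state: dams with a missing capacity or a missing completion year are dropped. In the data, the completion year is stored as text (`<chr>`), so it would need to be parsed into a number first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dam:
    name: str
    completed: Optional[int]       # year completed / operational since (parsed from text)
    height_m: Optional[float]      # dam height in meters; None if missing
    capacity_mcm: Optional[float]  # reservoir capacity in million cubic meters
    irrigation: Optional[str]      # "x" when the dam serves irrigation
    lat: Optional[float]
    lon: Optional[float]


def keep_dam(d: Dam) -> bool:
    """Apply the cleaning rules listed above to a single dam."""
    if d.lat is None or d.lon is None:                   # coordinates required
        return False
    if (d.irrigation or "").strip().lower() != "x":      # irrigation purpose
        return False
    if d.height_m is not None and d.height_m < 5:        # >= 5 m, missing heights kept
        return False
    if d.capacity_mcm is None or d.capacity_mcm < 0.05:  # >= 50,000 m3 (assumption: missing dropped)
        return False
    if d.completed is None or d.completed > 2013:        # completed by 2013 (assumption: missing dropped)
        return False
    return True


# Hypothetical dams illustrating each rule.
dams = [
    Dam("A", 1950, 30.0, 1.2, "x", -33.5, 21.7),    # kept
    Dam("B", 1960, None, 0.5, "x", -30.0, 25.0),    # kept: missing height is allowed
    Dam("C", 1970, 3.0, 1.0, "x", -30.0, 25.0),     # too low
    Dam("D", 2015, 20.0, 1.0, "x", -30.0, 25.0),    # too recent
    Dam("E", 1980, 20.0, 0.01, "x", -30.0, 25.0),   # too small
    Dam("F", 1990, 20.0, 1.0, None, -30.0, 25.0),   # no irrigation use
    Dam("G", 2000, 20.0, 1.0, "x", None, None),     # no coordinates
]
print([d.name for d in dams if keep_dam(d)])  # ['A', 'B']
```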
## # A tibble: 6 × 8
##   Country      `Name of dam` `Completed /operational since` `Dam height (m)`
##   <chr>        <chr>         <chr>                                     <dbl>
## 1 South Africa Hill Crest    1691                                         NA
## 2 South Africa Gibson        1880                                         NA
## 3 South Africa Newberry      1896                                         NA
## 4 South Africa Woodhead      1897                                         38
## 5 South Africa Milner        1900                                         NA
## 6 South Africa Jameson       1900                                         NA
## # ℹ 4 more variables: `Reservoir capacity (million m3)` <dbl>,
## #   Irrigation <chr>, `Decimal degree latitude` <dbl>,
## #   `Decimal degree longitude` <dbl>

This dataset is clearly much smaller than the one used in the paper. The paper has data on 4,140 dams, of which 3,176 serve irrigation purposes, whereas our dataset contains only 531 dams with geospatial data, even before any further filtering.

## # A tibble: 6 × 8
##   Country      `Name of dam`  `Completed /operational since` `Dam height (m)`
##   <chr>        <chr>          <chr>                                     <dbl>
## 1 South Africa Calitzdorp     1918                                       30.5
## 2 South Africa Kammanassie    1923                                       35  
## 3 South Africa Mogoto         1924                                       36.5
## 4 South Africa Nwqeba         1925                                       46  
## 5 South Africa Olifantsnek    1929                                       30  
## 6 South Africa Rust De Winter 1934                                       31  
## # ℹ 4 more variables: `Reservoir capacity (million m3)` <dbl>,
## #   Irrigation <chr>, `Decimal degree latitude` <dbl>,
## #   `Decimal degree longitude` <dbl>
## Simple feature collection with 6 features and 6 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 21.70417 ymin: -33.64306 xmax: 29.23333 ymax: -24.26667
## Geodetic CRS:  WGS 84
## # A tibble: 6 × 7
##   Country      `Name of dam`  `Completed /operational since` `Dam height (m)`
##   <chr>        <chr>          <chr>                                     <dbl>
## 1 South Africa Calitzdorp     1918                                       30.5
## 2 South Africa Kammanassie    1923                                       35  
## 3 South Africa Mogoto         1924                                       36.5
## 4 South Africa Nwqeba         1925                                       46  
## 5 South Africa Olifantsnek    1929                                       30  
## 6 South Africa Rust De Winter 1934                                       31  
## # ℹ 3 more variables: `Reservoir capacity (million m3)` <dbl>,
## #   Irrigation <chr>, geometry <POINT [°]>

2.2. Magisterial district boundaries

## Reading layer `South_Africa_DIVA_GIS_State_L2_Admin_Boundaries' from data source `/Users/nourkhalid/Desktop/Geospatial DS/Assignment 1/South_Africa_DIVA_GIS_State_L2_Admin_Boundaries.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 354 features and 20 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 16.45802 ymin: -34.83514 xmax: 32.89125 ymax: -22.12661
## Geodetic CRS:  WGS 84

Note that the shapefile has 354 features, matching the 354 magisterial districts used in the paper.
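With the district polygons loaded, matching dams to districts by their coordinates (as the paper does) is a point-in-polygon test. In sf, this is typically done with a spatial join such as `st_join`. The plain-Python sketch below shows the underlying ray-casting idea on two hypothetical square districts. Real district geometries are multipolygons, possibly with holes, and points lying exactly on a shared border are ambiguous. Treat this as an illustration only. The function and district names are our own.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the polygon ring [(lon, lat), ...]?"""
    inside = False
    n = len(ring)
    for k in range(n):
        x1, y1 = ring[k]
        x2, y2 = ring[(k + 1) % n]              # wrap around to close the ring
        if (y1 > lat) != (y2 > lat):            # edge straddles the point's latitude
            # Longitude at which this edge crosses that latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:                   # crossing lies east of the point
                inside = not inside
    return inside


def match_to_district(lon, lat, districts):
    """Return the name of the first district containing the point, or None."""
    for name, ring in districts.items():
        if point_in_polygon(lon, lat, ring):
            return name
    return None


# Two hypothetical, adjacent square districts given as (lon, lat) rings.
districts = {
    "West": [(20, -30), (25, -30), (25, -25), (20, -25)],
    "East": [(25, -30), (30, -30), (30, -25), (25, -25)],
}
print(match_to_district(22, -27, districts))  # West
print(match_to_district(28, -27, districts))  # East
print(match_to_district(35, -27, districts))  # None
```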

3. Map replication

In the replicated map, the number of observations drops significantly once we filter for dam height, reservoir capacity and irrigation purpose. There are two reasons for this: 1. The data used here contain far fewer observations than the data used in the paper. 2. The dam height column has many missing values, which probably shrinks the dataset more than it would if the data were complete.

Paper 2

Paper: “Rural electrification, migration and structural transformation: Evidence from Ethiopia” (Stephie Fried, David Lagakos, 2020).

1. Relevant information for replicating the map

  1. Region: Ethiopia.
  2. The main data source is the Ethiopian Rural Socioeconomic Survey (ERSS), conducted in 2011/2012 and again in 2013/2014 on a set of enumeration areas representative of parts of Ethiopia. The map shows the location of the sample enumeration areas in the ERSS, along with population density (darker regions are denser), major roads (thicker lines) and the high-voltage electricity grid (thinner, straighter lines).

2. Our data sources

2.1. Layer 1: Administrative boundaries

## Reading layer `eth_admbnda_adm3_csa_bofedb_2021' from data source 
##   `/Users/nourkhalid/Desktop/Geospatial DS/Assignment 1/eth_admbnda_adm3_csa_bofedb_2021.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1082 features and 16 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 32.9918 ymin: 3.40667 xmax: 47.98824 ymax: 14.84548
## Geodetic CRS:  WGS 84

2.3. Layer 3: Main Roads

## Reading layer `hotosm_eth_roads_lines_shp' from data source 
##   `/Users/nourkhalid/Desktop/Geospatial DS/Assignment 1/hotosm_eth_roads_lines_shp.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 420912 features and 18 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: 32.9902 ymin: 3.410445 xmax: 47.98158 ymax: 14.8554
## Geodetic CRS:  WGS 84

2.4. Layer 4: Power grid lines

## Reading layer `Ethiopia_grid' from data source 
##   `/Users/nourkhalid/Desktop/Geospatial DS/Assignment 1/Ethiopia_grid.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 132 features and 11 fields
## Geometry type: MULTILINESTRING
## Dimension:     XY
## Bounding box:  xmin: 34.14213 ymin: 5.389761 xmax: 43.19406 ymax: 14.44666
## Geodetic CRS:  WGS 84

2.6. Layer 6: Power Plants

## Reading layer `ETH_PowerPlants' from data source 
##   `/Users/nourkhalid/Desktop/Geospatial DS/Assignment 1/ETH_PowerPlants.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 22 features and 8 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 36.9127 ymin: 6.856073 xmax: 40.17 ymax: 13.3
## CRS:           NA

3. Plot replication

We tried to add a relief map as a base layer, but for some reason the plot always extends to the (0, 0) coordinate, which leaves the map off-center. This works fine when we only plot linestrings and polygons. It breaks down as soon as we include Layers 5 and 6, which are points. We tried four fixes suggested online:

  1. Make sure all spatial data layers and the basemap use the same CRS.
  2. Filter out points located at (x = 0, y = 0).
  3. Limit the map extent to the region of interest using coord_sf().
  4. Make sure the base layer is correctly fetched and aligned, for example by adjusting the zoom level.

None of these fixes worked.

We have included an optional ggplot without these geom_point layers, purely to experiment with base layers.

The bounding box of the plot is needed to compute the correct coordinates and to restrict the displayed area of the base layer when it is turned on.
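A bounding box is just the minimum and maximum longitude and latitude over all coordinates. The sketch below computes one in plain Python. In line with the suggestion to filter out points at (0, 0), it can ignore such points, which usually signal missing coordinates rather than real locations. The sample points are hypothetical, chosen to reproduce the Ethiopian extent printed below. One observation worth checking, though not a confirmed diagnosis: the power plants layer (Layer 6) is the only layer whose summary above reports `CRS: NA`.

```python
def bounding_box(points, drop_origin=True):
    """Return (min_lon, max_lon, min_lat, max_lat) for a list of (lon, lat) pairs.

    With drop_origin=True, points at exactly (0, 0) are ignored, since they
    usually indicate missing rather than real coordinates.
    """
    pts = [(x, y) for x, y in points if not (drop_origin and x == 0 and y == 0)]
    if not pts:
        raise ValueError("no valid points to bound")
    lons = [x for x, _ in pts]
    lats = [y for _, y in pts]
    return min(lons), max(lons), min(lats), max(lats)


# Hypothetical (lon, lat) points spanning Ethiopia, plus one spurious (0, 0).
points = [(32.9918, 8.0), (47.98824, 9.0), (40.0, 3.40667), (38.0, 14.84548), (0.0, 0.0)]
print(bounding_box(points))                     # (32.9918, 47.98824, 3.40667, 14.84548)
print(bounding_box(points, drop_origin=False))  # (0.0, 47.98824, 0.0, 14.84548)
```

The second call shows the symptom described above: a single point at the origin stretches the extent all the way to (0, 0).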

## Bounding Box:
## Min Longitude: 32.9918
## Max Longitude: 47.98824
## Min Latitude: 3.40667
## Max Latitude: 14.84548
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Zoom: 4

Paper 3

Paper: “Migration, Specialization, and Trade: Evidence from Brazil’s March to the West”

Relevant Information

  • Map to be replicated: The Spatial Distribution of the Brazilian Population between 1950 and 2010. Figure 2 from the working paper Migration, Specialization, and Trade: Evidence from Brazil’s March to the West by Heitor S. Pellegrina and Sebastian Sotelo, 2021.

  • The original map is divided into mesoregions, of which there are around 136. However, we could not find population data at that level of aggregation in the country’s official data sources. Similarly, no population data were available for years before 1989.

  • For this reason, we replicated the map with state-level data (27 states) for the years 2000, 2010 and 2020. Note that these years are much closer together than those chosen by the authors (1950, 1980 and 2010). They probably do not reflect the phenomenon observed in the paper, at least not to the same extent. As a result, our map shows practically no observable differences in the population distribution. The population data were obtained from the Instituto Brasileiro de Geografia e Estatística: https://www.ibge.gov.br/en/statistics/social/population/18448-estimates-of-resident-population-for-municipalities-and-federation-units.html?=&t=downloads .

  • The geographic data (state boundaries) were obtained from https://gadm.org/maps/BRA_1.html .

## Reading layer `BRA_adm1' from data source 
##   `/Users/nourkhalid/Desktop/Geospatial DS/Assignment 1/BRA_adm1.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 27 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -73.98971 ymin: -33.74708 xmax: -28.84694 ymax: 5.264878
## Geodetic CRS:  WGS 84
## Simple feature collection with 6 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -73.98971 ymin: -18.34986 xmax: -35.15182 ymax: 4.44236
## Geodetic CRS:  WGS 84
##   ID_0 ISO NAME_0 ID_1   NAME_1 TYPE_1 ENGTYPE_1 NL_NAME_1 VARNAME_1
## 1   33 BRA Brazil    1     Acre Estado     State      <NA>      <NA>
## 2   33 BRA Brazil    2  Alagoas Estado     State      <NA>      <NA>
## 3   33 BRA Brazil    3    Amapá Estado     State      <NA>      <NA>
## 4   33 BRA Brazil    4 Amazonas Estado     State      <NA>   Amazone
## 5   33 BRA Brazil    5    Bahia Estado     State      <NA>      Ba¡a
## 6   33 BRA Brazil    6    Ceará Estado     State      <NA>      <NA>
##                         geometry
## 1 MULTIPOLYGON (((-73.33251 -...
## 2 MULTIPOLYGON (((-35.90153 -...
## 3 MULTIPOLYGON (((-50.02403 0...
## 4 MULTIPOLYGON (((-67.32623 2...
## 5 MULTIPOLYGON (((-38.69708 -...
## 6 MULTIPOLYGON (((-38.47542 -...

Importing and Cleaning the Data

We imported three Excel files containing the population data for the three years. The 2000 file reported population by municipality but included the state code, so we aggregated it to the state level. The 2010 file required minor cleaning (changing some data types and column names), while the 2020 file required no cleaning at all. Next, we merged these three data frames with each other and with the data frame containing the state geometries. We also highlighted the West states, which are the focus of the study, and merged them into a single geographic unit to draw the red boundary on the map.
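Aggregating the 2000 municipal file to the state level is a group-and-sum on the state code. In R, this would typically be `group_by()` followed by `summarise()`. The sketch below shows the same operation in plain Python on hypothetical rows. The codes and populations are made up.

```python
from collections import defaultdict


def aggregate_by_state(rows):
    """Sum municipal populations by state code.

    rows: iterable of (state_code, population) pairs, one per municipality.
    """
    totals = defaultdict(int)
    for state_code, population in rows:
        totals[state_code] += population
    return dict(totals)


# Hypothetical municipalities in two states.
rows = [("12", 1000), ("12", 500), ("27", 2000)]
print(aggregate_by_state(rows))  # {'12': 1500, '27': 2000}
```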

Merging all the data into a dataset with the relevant columns

## Simple feature collection with 6 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -73.98971 ymin: -18.34986 xmax: -35.15182 ymax: 4.44236
## Geodetic CRS:  WGS 84
##      State  Pop2000  Pop2010  Pop2020                       geometry   share00
## 1     Acre   541873   733559   894470 MULTIPOLYGON (((-73.33251 -... 0.3262084
## 2  Alagoas  2738378  3120494  3351543 MULTIPOLYGON (((-35.90153 -... 1.6485079
## 3    Amapá   458796   669526   861773 MULTIPOLYGON (((-50.02403 0... 0.2761959
## 4 Amazonas  2641251  3483985  4207714 MULTIPOLYGON (((-67.32623 2... 1.5900373
## 5    Bahia 13135262 14016906 14930634 MULTIPOLYGON (((-38.69708 -... 7.9074486
## 6    Ceará  7200167  8452381  9187103 MULTIPOLYGON (((-38.47542 -... 4.3345120
##     share10   share20
## 1 0.3845540 0.4224066
## 2 1.6358580 1.5827405
## 3 0.3509859 0.4069657
## 4 1.8264111 1.9870606
## 5 7.3480891 7.0508773
## 6 4.4309956 4.3385389
## Simple feature collection with 6 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -73.98971 ymin: -11.14516 xmax: -35.15182 ymax: -7.12132
## Geodetic CRS:  WGS 84
## # A tibble: 6 × 4
##   State                                                     geometry Year  Share
##   <chr>                                           <MULTIPOLYGON [°]> <chr> <dbl>
## 1 Acre    (((-73.33251 -7.324879, -73.27482 -7.350334, -72.87016 -7… 2000  0.326
## 2 Acre    (((-73.33251 -7.324879, -73.27482 -7.350334, -72.87016 -7… 2010  0.385
## 3 Acre    (((-73.33251 -7.324879, -73.27482 -7.350334, -72.87016 -7… 2020  0.422
## 4 Alagoas (((-35.90153 -9.861805, -35.90153 -9.862083, -35.90125 -9… 2000  1.65 
## 5 Alagoas (((-35.90153 -9.861805, -35.90153 -9.862083, -35.90125 -9… 2010  1.64 
## 6 Alagoas (((-35.90153 -9.861805, -35.90153 -9.862083, -35.90125 -9… 2020  1.58
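The share columns above appear to be each state's percentage of Brazil's total population in the given year. For example, Acre's 541,873 inhabitants in 2000 correspond to 0.326%. The last table reshapes them into one row per state and year, presumably so that the three years can be plotted side by side. A plain-Python sketch of both steps on a hypothetical two-state country (function names are our own):

```python
def shares(pop_by_state):
    """Percentage of the national total accounted for by each state."""
    total = sum(pop_by_state.values())
    return {state: 100 * pop / total for state, pop in pop_by_state.items()}


def to_long(pop_by_year):
    """Turn {year: {state: population}} into (state, year, share) rows."""
    rows = []
    for year, pops in sorted(pop_by_year.items()):
        for state, share in shares(pops).items():
            rows.append((state, year, share))
    return sorted(rows)  # order rows by state, then year


# Hypothetical two-state country.
pop_by_year = {"2000": {"A": 25, "B": 75}, "2010": {"A": 40, "B": 60}}
for row in to_long(pop_by_year):
    print(row)
# ('A', '2000', 25.0)
# ('A', '2010', 40.0)
# ('B', '2000', 75.0)
# ('B', '2010', 60.0)
```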

Paper 4

Paper: “In harm’s way? infrastructure investments and the persistence of coastal cities”.

Notes on the possible sources of data for figure 4 (“Road maps of Vietnam, 2000 and 2010”):

  • Replication package for paper 4: https://www.openicpsr.org/openicpsr/project/207641/version/V1/view. Useful information in:
    • Raw data -> Boundaries (for constructing Vietnam’s coastline as shown in the paper).
      • The author says that the boundaries were constructed from Natural Earth data, cropping the shapefile to include only Vietnam’s coastline.
      • However, these files do not include the boundaries of the whole of Vietnam, so the replication package is not very useful for our purposes.
  • Vietnam’s country boundaries obtained from: https://gadm.org/download_country.html (separate shapefiles for each administrative level).
  • Vietnam’s road network information (for 2015) obtained from: https://data.humdata.org/dataset/viet-nam-roads

1. Relevant information of the paper for replicating the map

  1. Maps to be replicated: road network of Vietnam in 2000 and 2010.
  2. A map of Vietnam (country boundaries).
  3. Road networks by road type (if available). On how the author constructed Vietnam’s road network:
    • “I obtain road network data from the 2000 and 2010 editions of ITMB Publishing’s detailed International Travel Maps of Vietnam, which show the location of freeways, dual carriageways, major, minor and other roads.
    • I geo-referenced each map and manually traced the location of each road category to obtain a GIS shapefile of the entire road network in each road category in 2000 and 2010, shown in Figure 3.”

2. Importing and cleaning the data

We need to read the data from two different sources, both shapefiles: GADM (for Vietnam’s boundaries) and the Humanitarian Data Exchange (for the road network).

2.1. Vietnam’s boundaries

## Reading layer `gadm41_VNM_0' from data source 
##   `/Users/nourkhalid/Desktop/Geospatial DS/Assignment 1/gadm41_VNM_0.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 102.1446 ymin: 8.381355 xmax: 109.4692 ymax: 23.39269
## Geodetic CRS:  WGS 84

2.2. Road network

## Reading layer `vnm_rdsl_2015_0SM' from data source 
##   `/Users/nourkhalid/Desktop/Geospatial DS/Assignment 1/vnm_rdsl_2015_0SM.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 118138 features and 5 fields
## Geometry type: MULTILINESTRING
## Dimension:     XY
## Bounding box:  xmin: 102.1562 ymin: 8.596338 xmax: 109.4536 ymax: 23.37756
## Geodetic CRS:  WGS 84
## Classes 'sf' and 'data.frame':   118138 obs. of  6 variables:
##  $ osm_id    : chr  "9566542" "9656653" "9656730" "9963509" ...
##  $ name      : chr  "C?u Thê Húc" "Mã Mây" "C?u Chuong Duong" "Hàng Cót" ...
##  $ ref       : chr  NA NA NA NA ...
##  $ type      : chr  "footway" "residential" "primary" "secondary" ...
##  $ Shape_Leng: num  0.000413 0.002599 0.011551 0.001603 0.001613 ...
##  $ geometry  :sfc_MULTILINESTRING of length 118138; first list element: List of 1
##   ..$ : num [1:2, 1:2] 106 106 21 21
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTILINESTRING" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA
##   ..- attr(*, "names")= chr [1:5] "osm_id" "name" "ref" "type" ...
##  [1] "footway"        "residential"    "primary"        "secondary"     
##  [5] "pedestrian"     "tertiary"       "trunk"          "unclassified"  
##  [9] "service"        "motorway_link"  "motorway"       "primary_link"  
## [13] "track"          "tertiary_link"  "road"           "trunk_link"    
## [17] "secondary_link" "living_street"  "steps"          "path"          
## [21] "construction"   "cycleway"       "proposed"       "crossing"      
## [25] "services"       "rest_area"      "yes"

The type of road is stored in the type column. There are 27 unique road types, because links and crossings, among others, are counted as separate types. We therefore simplify the classification by grouping the types into:

  1. “Dual carriageway/Freeway” roads (assimilated to “motorways” and “trunks”).
  2. “Major” roads (assimilated to “primary” roads).
  3. “Minor” roads (assimilated to “secondary” and “tertiary” roads).
  4. “Other” roads: all types not included in the previous categories.

We can map this renaming into the simple features dataset:

## [1] "Other"                    "Major road"              
## [3] "Minor road"               "Dual carriageway/Freeway"
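The grouping above amounts to a lookup table from OpenStreetMap type values to four categories. Everything not listed falls into “Other”. This sketch assumes that link types such as motorway_link are not merged into their parent class. The plain-Python mapping is below, and the dictionary and function names are our own.

```python
ROAD_CLASS = {
    "motorway": "Dual carriageway/Freeway",
    "trunk": "Dual carriageway/Freeway",
    "primary": "Major road",
    "secondary": "Minor road",
    "tertiary": "Minor road",
}


def classify_road(osm_type):
    """Map an OSM road type to one of the four categories; anything else is 'Other'."""
    return ROAD_CLASS.get(osm_type, "Other")


sample = ["footway", "residential", "primary", "secondary", "motorway", "motorway_link"]
print([classify_road(t) for t in sample])
# ['Other', 'Other', 'Major road', 'Minor road', 'Dual carriageway/Freeway', 'Other']
```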

Paper 5

Paper: “The Effects of Roads on Trade and Migration: Evidence from a Planned Capital City”.

The following analysis identifies radial and non-radial highways in Brazil, using Brasília as the central point. Roads are classified based on their proximity to Brasília and visualized on a map.

We use data from the Instituto Brasileiro de Geografia e Estatística (Brazilian Institute of Geography and Statistics), which is responsible for the official collection of statistical, geographic, cartographic, geodetic, and environmental information in Brazil. The road dataset represents the main structuring road axes of the Brazilian territory. Additionally, we use the world map data and filter it to include only Brazil.

Load Datasets

## Using year/date 2000

We use Brasília as the center point because, as the paper states, “The roads radiating out from Brasília are known as radial highways” (Morten & Oliveira, 2018). We also make sure that the point representing Brasília is in the same CRS as the road dataset. As seen in class, spatial operations such as distance calculations only work correctly when both the point and the roads use the same CRS.
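One way to make “proximity to Brasília” concrete is to measure the great-circle (haversine) distance from Brasília to the closest point of each road. A road is then flagged as radial when that distance falls below a cutoff. The sketch below is hypothetical in several respects. Brasília's coordinates are approximate, and the 50 km cutoff is our own choice. Distances are also measured only to road vertices, ignoring the segments between them. It illustrates the idea and is not the paper's classification rule.

```python
import math

# Approximate (lat, lon) of Brasília in decimal degrees (assumption).
BRASILIA = (-15.79, -47.88)


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def is_radial(road_vertices, center=BRASILIA, threshold_km=50.0):
    """Hypothetical rule: radial if any (lat, lon) vertex lies within threshold_km of center."""
    nearest = min(haversine_km(lat, lon, *center) for lat, lon in road_vertices)
    return nearest <= threshold_km


# Hypothetical roads given as (lat, lon) vertices.
road_from_brasilia = [(-15.80, -47.90), (-16.68, -49.25)]  # starts next to Brasília
coastal_road = [(-22.90, -43.20), (-23.00, -44.00)]          # near Rio de Janeiro
print(is_radial(road_from_brasilia), is_radial(coastal_road))  # True False
```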

References

Mettetal, E., 2019. Irrigation dams, water and infant mortality: Evidence from South Africa (fig. 2: hydro dams in South Africa)

Fried, S. and Lagakos, D., 2021. Rural electrification, migration and structural transformation: Evidence from Ethiopia (fig. 4: districts and electricity grid in Ethiopia)

Pellegrina, H.S. and Sotelo, S., 2021. Migration, Specialization, and Trade: Evidence from Brazil’s March to the West (fig. 2: population in Brazil’s meso-regions (or districts) in different periods)

Balboni, C.A., 2019. In harm’s way? Infrastructure investments and the persistence of coastal cities (fig. 3: Vietnam’s road infrastructure by road type, if available)

Morten, M. and Oliveira, J., 2018. The Effects of Roads on Trade and Migration: Evidence from a Planned Capital City (fig. 1: Brazil’s capital and main road infrastructure)

Data Sources

Paper 1:

Department of Water Affairs (Dam Safety Office): Department of Water Affairs. (2013). Dam Safety Office dataset. Retrieved from http://www.dwaf.gov.za/DSO/Default.aspx

Global Administrative Boundaries (Magisterial Districts): Global Administrative Boundaries. Magisterial district boundaries (pre-1996). Retrieved from https://hub.arcgis.com/datasets/nga::land-ownership/about?layer=34

AQUASTAT (FAO): Food and Agriculture Organization of the United Nations (FAO). (2025). AQUASTAT dams database. Retrieved from https://www.fao.org/aquastat/en/databases/dams/

Paper 2:

United Nations Development Programme (UNDP) & OCHA Ethiopia. (n.d.). Subnational Administrative Boundaries dataset. Retrieved from https://data.humdata.org/dataset/cod-ab-eth

OCHA Ethiopia. (2022). Ethiopia administrative level 3 disaggregated population statistics (1,084 woredas). Retrieved from https://data.humdata.org/dataset/cod-ps-eth

OpenStreetMap. (2018). Ethiopia Roads (OpenStreetMap Export). Retrieved from https://data.humdata.org/dataset/hotosm_eth_roads

World Bank. (2014). Electricity Transmission Network. Retrieved from https://datacatalog.worldbank.org/search/dataset/0039865/Ethiopia—Electricity-Transmission-Network

World Bank. (2011–2014). Locations of sample enumeration areas from the ERSS. Retrieved from https://microdata.worldbank.org/index.php/catalog/2053/data-dictionary

Platts. (2006). World Electric Power Plants Database (WEPP): Data for power plants in Ethiopia with total installed generating capacity 10 MW. Retrieved from https://datacatalog.worldbank.org/search/dataset/0041714/Ethiopia-Power-Plants

Paper 3:

Instituto Brasileiro de Geografia e Estatística (IBGE). Estimates of resident population for municipalities and federation units. Retrieved from https://www.ibge.gov.br/en/statistics/social/population/18448-estimates-of-resident-population-for-municipalities-and-federation-units.html?=&t=downloads

GADM (Brazil, state level). Retrieved from https://gadm.org/maps/BRA_1.html

Paper 4:

Replication Package for Paper 4: Balboni, C.A. Retrieved from https://www.openicpsr.org/openicpsr/project/207641/version/V1/view

Natural Earth Data (for Vietnam’s coastline): Natural Earth. Retrieved from https://www.naturalearthdata.com

Vietnam’s Country Boundaries: GADM. Retrieved from https://gadm.org/download_country.html

Vietnam’s Road Network (2015): United Nations Office for the Coordination of Humanitarian Affairs. (2015). Retrieved from https://data.humdata.org/dataset/viet-nam-roads

Paper 5:

Instituto Brasileiro de Geografia e Estatística. (2019). Transportation logistics. Retrieved from https://www.ibge.gov.br/en/geosciences/maps/brazil-geographic-networks-mapasdobrasil/18884-transportation-logistics.html