Please get the course materials (data and scripts) from GitHub

1 Introduction

1.1 Applications


  • Anything related to regional, spatial, or urban economics

  • But a geographic perspective is often key in all fields of (applied) economics. For example:

    • Border-discontinuity design
    • Exposure (e.g. distance) to treatment
    • Geographic controls
    • Spatial spillovers (e.g. exclude neighbors)
    • Heterogeneity analysis


1.2 What are we talking about


GIS: A geographic information system (GIS) is a system that creates, manages, analyzes, and maps all types of data. GIS connects data to a map, integrating location data (where things are) with all types of descriptive information (what things are like there).

1854: John Snow identified the source of a cholera outbreak in London by mapping casualties and water sources. His study is still regarded as one of the most interesting natural experiments and one of the first difference-in-differences designs. 1

1963: The Canadian government commissioned the geographer Roger Tomlinson to create a manageable inventory of its natural resources. The objective was to determine the land capability of rural Canada by mapping information about soils, agriculture, recreation, wildlife, waterfowl, and forestry. The result was the first digital GIS, the Canada Geographic Information System. 2


GCS: A geographic coordinate system (GCS) uses a three-dimensional spherical surface to define locations on the earth. A point is referenced by its longitude and latitude values. Longitude and latitude are angles measured from the earth’s center to a point on the earth’s surface. The angles often are measured in degrees 3. Longitude is location in the East-West direction in angular distance from the Prime Meridian plane. Latitude is angular distance North or South of the equatorial plane.
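Because longitude and latitude are angles, one degree does not correspond to a fixed distance on the ground. The sketch below illustrates this with a great-circle (haversine) distance in base R. It assumes a perfectly spherical Earth with a radius of 6,371 km, which is an approximation, and the function name `haversine_m` is made up for this example.

```r
# Great-circle (haversine) distance between two lon/lat points, in metres.
# Assumption: spherical Earth with radius 6,371 km.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(a))
}

# Length of one degree of longitude at the equator ...
deg_equator <- haversine_m(0, 0, 1, 0)
# ... and at roughly Barcelona's latitude (41.4 degrees North)
deg_barcelona <- haversine_m(2, 41.4, 3, 41.4)

round(c(equator = deg_equator, barcelona = deg_barcelona))
```

One degree of longitude is about 111 km at the equator but noticeably shorter at Barcelona's latitude, which is why distances and areas should not be computed directly on raw degrees.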


Source: www.earthdatascience.org


PCS: A projected coordinate system (PCS) is a geographic coordinate system (GCS) projected onto a two-dimensional (flat) surface. It has an origin and x and y coordinates. The unit of measurement is typically meters.


Source: www.esri.com


This projection cannot be done without introducing some deformation 4. As a result, some properties of the Earth’s surface, such as area, direction, distance, and shape, are distorted in the process. A projected coordinate system can preserve only one or two of those properties. Different projections can be explored at: https://www.geo-projections.com/ and at https://epsg.io/.


Source: www.earthdatascience.org


Main takeaways:


  • Coordinate systems are a key component of geographic objects
  • Knowing which CRS your data is in, and whether it is in geographic (lon/lat) or projected (typically meters), is key
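As a minimal sketch of these takeaways, the snippet below checks whether an object is in lon/lat and reprojects it with sf. The point coordinates are illustrative, and the target CRS (EPSG:25831, ETRS89 / UTM zone 31N) is an assumption chosen because it covers Catalonia; the course scripts may use a different one.

```r
library(sf)

# An illustrative point near the centre of Barcelona, in lon/lat (EPSG:4326)
pt_geo <- st_sfc(st_point(c(2.17, 41.39)), crs = 4326)

# Is the object in a geographic (lon/lat) CRS?
st_is_longlat(pt_geo)   # TRUE: coordinates are angles in degrees

# Reproject to a projected CRS whose unit is the metre
# (assumption: EPSG:25831, ETRS89 / UTM zone 31N)
pt_proj <- st_transform(pt_geo, 25831)

st_is_longlat(pt_proj)  # FALSE: coordinates are now x/y in metres
xy <- st_coordinates(pt_proj)
xy
```

`st_crs()` on either object prints the full CRS definition, which is the quickest way to find out what a newly loaded dataset is in.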


1.3 Formats


  • .shp: Esri shapefile (Esri is the maker of ArcGIS). The most common format, but it comes with limitations:
    • the companion files (.shx, .dbf, …) that hold the index and attribute data must sit in the same directory as the .shp file
    • file size is limited to 2 GB
    • attribute names are limited to 10 characters
    • each file can contain only one geometry type
  • .gpkg (geopackage)
    • open
    • everything is stored in one single file
    • can contain multiple layers with different geometry types (points, lines, …)
  • .gpx (GPS exchange format) used to describe waypoints, tracks and routes
  • .osm (.pbf): for data downloaded from OpenStreetMap (OSM), the largest crowdsourced GIS data project in the world
  • .tif: for rasters (pixels)
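To make the GeoPackage points concrete, here is a small sketch that writes two layers with different geometry types into one .gpkg file and reads one back with sf. The data, layer names, and temporary file path are all invented for illustration.

```r
library(sf)

# Toy point layer (made-up coordinates)
pts <- st_sf(
  name = c("a", "b"),
  geometry = st_sfc(st_point(c(2.15, 41.38)),
                    st_point(c(2.18, 41.40)), crs = 4326)
)

# Toy line layer connecting the two points
line <- st_sf(
  id = 1,
  geometry = st_sfc(st_linestring(rbind(c(2.15, 41.38), c(2.18, 41.40))),
                    crs = 4326)
)

# Write both layers into a single GeoPackage file
path <- tempfile(fileext = ".gpkg")
st_write(pts, path, layer = "points", quiet = TRUE)
st_write(line, path, layer = "lines", quiet = TRUE)

# List the layers stored in the file, then read one of them back
layer_names <- st_layers(path)$name
pts_back <- st_read(path, layer = "points", quiet = TRUE)
```

The same `st_read()` call also opens a .shp file; sf picks the driver from the file extension.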


1.4 Geometry types

1.4.1 Vectors


  • points
  • lines
  • polygons
  • and their respective ‘multi’ versions (which group together features of the same type into a single feature)
  • geometry collections contain multiple geometry types in a single object
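The vector types listed above can be built by hand with sf, which is a handy way to see how they differ. The coordinates below are arbitrary and carry no CRS.

```r
library(sf)

p  <- st_point(c(0, 0))
l  <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 1)))
# A polygon ring must be closed: first and last vertex are identical
pg <- st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))
# 'Multi' version: several points grouped into a single feature
mp <- st_multipoint(rbind(c(0, 0), c(1, 1)))
# Geometry collection: different geometry types in a single object
coll <- st_geometrycollection(list(p, l, pg))

geoms <- st_sfc(p, l, pg, mp, coll)
geom_types <- as.character(st_geometry_type(geoms))
geom_types

# Area of the 2 x 2 square (plain number, since no CRS is set)
square_area <- as.numeric(st_area(st_sfc(pg)))
```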



1.4.2 Rasters


  • matrix: not a vector! A raster is a matrix of equally spaced cells (also called pixels), each of which contains a value
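A toy example in base R shows the idea: the values below are made up and stand for, say, elevation in each cell. A real raster additionally stores an extent, a cell size, and a CRS so that each cell can be located on the ground.

```r
# A toy 3 x 4 raster: each matrix entry is the value of one cell
elev <- matrix(c(10, 12, 15, 20,
                 11, 13, 18, 24,
                 12, 15, 21, 30),
               nrow = 3, byrow = TRUE)

dim(elev)    # 3 rows, 4 columns
elev[2, 3]   # value of the cell in row 2, column 3: 18
mean(elev)   # average over all cells: 16.75
```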



1.5 Software



2 Hands-on

2.1 RStudio things


  • The integration of sf with the tidyverse makes R attractive for spatial work

  • Shortcuts5:

    • Comment #: Ctrl+Shift+C // Cmd+Shift+C

    • Pipe %>%: Ctrl+Shift+M // Cmd+Shift+M

    • Sections: Ctrl+Shift+R // Cmd+Shift+R.

      • Alternatively, any comment line that ends with at least 4 trailing -, = or # (e.g. # Section 1 ----)
    • Subsections: ##### subsection’s title ---- (starting with 5 #) (e.g. ##### Subsection 1.1 ----)

    • Fold selected section: Alt+L // Cmd+Option+L

    • Unfold selected section: Shift+Alt+L // Cmd+Shift+Option+L

    • Fold all sections: Alt+O // Cmd+Option+O

    • Unfold all sections: Shift+Alt+O // Cmd+Shift+Option+O

    • Open/close outline: Ctrl+Shift+O // Cmd+Shift+O


2.2 Maps

[scripts/01.maps.R]


2.2.1 Data

[data/raw_data]


  • Schools of Barcelona (Barcelona City Hall)
  • Parks of Barcelona (OSM)
  • Neighborhoods of Barcelona (Barcelona City Hall)
  • City of Barcelona (Barcelona City Hall)
  • Population by age and neighborhood (Barcelona City Hall)


Map 1: where are the schools and the parks of Barcelona?

Map 2: neighborhoods, schools and parks of Barcelona


2.3 Geometrical operations

[scripts/02.geometry_operations.R]


Q1: how many schools are there in each neighborhood (by type)?

Q1.1: how many schools per kid by neighborhood?

Q2: how many m² of parks per neighborhood?

Q3: how many schools have a park within 150m (by neighborhood)?
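One possible way to approach these questions with sf is sketched below on toy data. Everything here is hypothetical: two square 1 km neighborhoods, three schools, and one park, placed in EPSG:25831 (an assumed projected CRS in metres). The course script may well proceed differently.

```r
library(sf)

# Helper to build a 1 km x 1 km square neighborhood (hypothetical geometry)
sq <- function(x0, y0 = 4580000) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 1000, y0), c(x0 + 1000, y0 + 1000),
                        c(x0, y0 + 1000), c(x0, y0))))
}
nbhd <- st_sf(name = c("A", "B"),
              geometry = st_sfc(sq(430000), sq(431000), crs = 25831))

schools <- st_sf(id = 1:3,
                 geometry = st_sfc(st_point(c(430200, 4580500)),
                                   st_point(c(430700, 4580200)),
                                   st_point(c(431500, 4580500)), crs = 25831))

# A 100 m x 100 m park lying entirely inside neighborhood A
park <- st_sf(id = 1, geometry = st_sfc(st_polygon(list(rbind(
  c(430250, 4580450), c(430350, 4580450), c(430350, 4580550),
  c(430250, 4580550), c(430250, 4580450)))), crs = 25831))

# Q1: number of schools in each neighborhood (point-in-polygon count)
nbhd$n_schools <- lengths(st_intersects(nbhd, schools))

# Q2: park area per neighborhood: cut parks by neighborhood, then measure
park_parts <- st_intersection(park, nbhd)
park_parts$area_m2 <- st_area(park_parts)

# Q3: does each school have a park within 150 m?
schools$near_park <-
  lengths(st_is_within_distance(schools, park, dist = 150)) > 0
```

Areas and distances are only meaningful in metres here because the data are in a projected CRS; with lon/lat data, reprojecting first is the safer route.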


2.4 A more realistic workflow

[scripts/03.data_preparation.R]


Spatial data do not always come as ready-made spatial files (.shp, .gpkg, etc.). They often come as .csv or .xlsx with one or more columns holding geometry information. We’ll need to transform the data before any mapping or geometry operation.
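A minimal sketch of that transformation with sf is shown below. The tables and their column names (`lon`, `lat`, `wkt`) are invented for illustration and do not reflect the structure of the actual raw file.

```r
library(sf)

# Hypothetical table with one column per coordinate
raw <- data.frame(
  name = c("School 1", "School 2"),
  lon  = c(2.15, 2.18),
  lat  = c(41.38, 41.40)
)

# coords is given as c(x, y), i.e. longitude first;
# crs = 4326 declares that the numbers are lon/lat degrees
schools <- st_as_sf(raw, coords = c("lon", "lat"), crs = 4326)

# Alternative: geometry stored as text (WKT) in a single column
raw_wkt <- data.frame(
  name = "Park 1",
  wkt  = "POLYGON ((2.15 41.38, 2.16 41.38, 2.16 41.39, 2.15 41.38))"
)
parks <- st_as_sf(raw_wkt, wkt = "wkt", crs = 4326)
```

The `crs` argument only declares which CRS the coordinates are already in; it does not reproject anything.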


2.4.1 Data

[data/raw_data/data_preparation]


  • schools raw data: opendatabcn_llista-equipaments_educacio-csv.xlsx (Barcelona City Hall)


2.5 Raster and vector data interaction

[scripts/04.rasters.R]


You might need to extract information contained in a raster to units of a spatial vector (e.g. polygons, points, …).
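As an illustration, the sketch below averages raster cells within polygons using the terra package. Using terra is an assumption (the course script may rely on another package), and the 4 x 4 raster and the two rectangular zones are toy data.

```r
library(terra)

# Toy raster: 4 x 4 cells of size 1, values filled row by row from the top left
r <- rast(nrows = 4, ncols = 4, xmin = 0, xmax = 4, ymin = 0, ymax = 4)
values(r) <- 1:16
names(r) <- "value"

# Two toy zones: the left half and the right half of the raster
zones <- vect(c("POLYGON ((0 0, 2 0, 2 4, 0 4, 0 0))",
                "POLYGON ((2 0, 4 0, 4 4, 2 4, 2 0))"))

# Mean of the cells whose centre falls inside each polygon
zonal <- extract(r, zones, fun = mean)
zonal
```

The result has one row per polygon, so it can be bound back to the polygon attributes, for example to map the average by neighborhood.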


2.5.1 Data

[data/raw_data]


  • NDVI (Barcelona City Hall)
  • Elevation raster (Cartographic and Geological Institute of Catalonia)
  • Street network (Barcelona City Hall)

Note: NDVI stands for ‘Normalized Difference Vegetation Index’. It quantifies vegetation by measuring the difference between near-infrared (which vegetation strongly reflects) and red light (which vegetation absorbs). NDVI ranges from -1 to +1 6.
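The usual formula is NDVI = (NIR - Red) / (NIR + Red). A short base R sketch, with made-up reflectance values, shows how the index behaves:

```r
# NDVI from near-infrared (nir) and red reflectance
ndvi <- function(nir, red) (nir - red) / (nir + red)

# Hypothetical reflectances, for illustration only
dense_vegetation <- ndvi(nir = 0.5, red = 0.1)   # strong NIR reflection
bare_surface     <- ndvi(nir = 0.3, red = 0.3)   # no difference between bands

round(c(dense_vegetation, bare_surface), 2)
```

Because the function is vectorised, it also works cell by cell on two matrices or raster layers of the same dimensions.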


Q1: what’s the average vegetation index (NDVI) by neighborhood?

Q2: what’s the gradient of streets in Barcelona?


2.6 OSM data extraction

[scripts/05.osm_data_extraction.R]


OpenStreetMap (OSM) is a community-contributed geographic database. Anyone can create an account and start adding information, and anyone can access it.

  • It is a source of data that may otherwise not be freely available.
  • Historical data can be obtained through Geofabrik.
  • However, be careful when comparing OSM with official data: coverage varies across the world and over time.
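A query for cycle lanes could look roughly like the sketch below, which uses the osmdata package. The package choice, the approximate bounding box for Barcelona, and the tag `highway=cycleway` are all assumptions; cycling infrastructure is tagged in several ways in OSM, so a real extraction would likely need more tags. The download itself is left commented out because it needs an internet connection.

```r
library(osmdata)

# Approximate bounding box for Barcelona: c(xmin, ymin, xmax, ymax) in lon/lat
bcn_bbox <- c(2.05, 41.32, 2.23, 41.47)

# Build an Overpass query for ways tagged highway=cycleway
q <- opq(bbox = bcn_bbox)
q <- add_osm_feature(q, key = "highway", value = "cycleway")

# Inspect the query text that would be sent to the Overpass API
query_text <- opq_string(q)

# Run the query (requires internet) and keep the line geometries:
# cycle <- osmdata_sf(q)
# cycle_lanes <- cycle$osm_lines
```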


Q1: what does the current cycle lane network look like on OSM?

Q2: how does this compare to official data?


2.7 Getting more efficient: apply functions

[scripts/06.functions.R]


  • Stata has loops, R has functions (and loops).
  • Less code means fewer opportunities for mistakes.


Q1: create one map of number of schools per neighborhood for each type of school.
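The general pattern is to write a function that handles one school type and then apply it to every type. The base R sketch below uses a made-up table and returns counts rather than maps; in the actual exercise the function body would draw a map instead. The names `count_by_nbhd` and `schools` are hypothetical.

```r
# Toy data: one row per school (invented for illustration)
schools <- data.frame(
  neighborhood = c("A", "A", "B", "B", "B"),
  type = c("primary", "secondary", "primary", "primary", "secondary")
)

# Function for ONE type: number of schools of that type per neighborhood
count_by_nbhd <- function(school_type, data) {
  subset_type <- data[data$type == school_type, ]
  table(subset_type$neighborhood)
}

# Apply the function to every type; the result is a list with one element per type
types <- unique(schools$type)
counts <- lapply(types, count_by_nbhd, data = schools)
names(counts) <- types

counts$primary
```

Adding a new school type to the data requires no change to the code, which is the main advantage over copying and pasting one block per type.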