2023-02-22

How to Access the Course Material: 2 Options

  1. Transfer from flash drive

  2. View presentation on web:

Speaker Bios

Aaron Fisch:

Water quality modeler with the Dept. of Natural Resources. Interested in computer programming for automation in modeling/statistics.

Dave Evans:

Soil scientist with a background in GIS, data science and data engineering. He does some consulting work with soils and web development.

Matt Haffner:

Assistant professor at the University of Wisconsin–Eau Claire. He is interested in spatial data science, geocomputation, and urban analytics.

Agenda

Time Length Content Speaker
9:30-9:50 20 mins Intro and Setup (We are here) Aaron
9:30-10:10 20 mins What is SQLite? Dave
10:10-10:30 20 mins Databases 101 Matt
10:30-10:50 20 mins Interactive: Basic SQL Everyone
10:50-11:00 10 mins Break NA
11:00-11:20 20 mins Database Views, Why and How Aaron
11:20-11:40 20 mins Interactive: Spatial Views Everyone
11:40-12:00 20 mins SQL Geoprocessing Dave
12:00-12:20 20 mins Interactive: SQL Geoprocessing Everyone
12:20-12:30 10 mins Wrap-up Aaron

Course Material Contents

./wlia-sqlite-workshop/

  • ./sqlite_presentation.html
    • This presentation
  • ./run_spatialite_gui
    • A shortcut to run Spatialite software on Windows.
  • ./dbs/ (contains example SQLite databases)
    • esri_mobile.geodatabase (ESRI mobile geodatabase)
    • SpatiaLite.sqlite (Spatialite database)
    • GeoPackage.gpkg (Geopackage)

Workshop Setup

  1. Open this presentation in a browser window
  2. Open another window for executing SQL code

Executing SQL code

Windows users:

  • double-click run_spatialite_gui
  • Click Menu/Connecting [sic] an Existing SQLite DB
  • Navigate to ./dbs/SpatiaLite.sqlite



QGIS users:

  • click Database/DB Manager
  • right click SpatiaLite/New Connection
  • Navigate to ./dbs/SpatiaLite.sqlite
  • In DB Manager, click Database/SQL Window

What is SQLite? (~20 mins)

  • Introducing software and concepts

What is a Database?

  • Think of a database as a collection of tables
  • A database allows rows to be added, changed, or deleted
  • Facilitates combining data from multiple tables.
  • Allows for data manipulations

What is SQLite?

  1. A relational database engine
  2. Allows for storing, transforming, updating, and deleting tables of data
  3. Widely used, so lots of tutorials and help available
  4. Low overhead: a single file

Working with SQLite

  • Little to no install
  • Easy to load and export data
  • Can handle lots of data
  • Access with
    • command line
    • programming languages
    • GUI

Spatial SQLite

  • Geopackage
  • SpatiaLite
  • ESRI Mobile Geodatabase

Geographic Information Systems

Spatial SQLite

Storage only - Geopackage

  • Allows for storage of rasters and vectors
  • Can build views, store styles (QGIS styles), spatial indices
  • Allows for much faster viewing from a GIS

Geopackage Demo

Format Size Viewing speed Geoproc time
geoJSON 180 MB Very poor 28 sec
Shapefile 60 MB Good 23 sec
Geopackage 138 MB Good 20 sec


Link to supplemental materials

Storage and analysis

ESRI Mobile Geodatabase and SpatiaLite

  • A format for storing rasters and vectors
  • Allows for GIS operations (intersect, dissolve, etc) on the geometry contained within

Recap

Any questions?

Databases 101 (~20 mins)

Databases contain tables

Table A

State Governor
Wisconsin Tony Evers
California Gavin Newsom
Michigan Gretchen Whitmer
Illinois J. B. Pritzker

Relational Databases contain related tables

Why is joining important?

  1. Tables can be extended
  2. Can drastically reduce duplication in a database
  3. Can improve data integrity

Spatial databases are just like regular databases except…

A table can function like a feature class with the addition of a geometry column.

Feature Class

Geometry State County
POLYGON ((-89… Wisconsin Dane
POLYGON ((-89… Wisconsin Sauk
POLYGON ((-87… Wisconsin Milwaukee
POLYGON ((-11… California Orange

Example of join using ArcPro and SQLite database

Let’s look at this same database using SpatiaLite

Windows users:

  • double-click run_spatialite_gui
  • Click Menu/Connecting [sic] an Existing SQLite DB
  • Navigate to ./dbs/SpatiaLite.sqlite



QGIS users:

  • click Database/DB Manager
  • right click SpatiaLite/New Connection
  • Navigate to ./dbs/SpatiaLite.sqlite
  • In DB Manager, click Database/SQL Window

SQL select statement

Syntax:

select * from countries


Anatomy of a select statement:

Command Which columns Which table Table name
select * from countries

SQL select specific columns

We can choose to limit our selection to only certain columns

select name, pop_est
from countries
Displaying records 1 - 10
NAME POP_EST
Zimbabwe 14645468
Zambia 17861030
Yemen 29161922
Vietnam 96462106
Venezuela 28515829
Vatican 825
Vanuatu 299882
Uzbekistan 33580650
Uruguay 3461734
Micronesia 113815

SQL select where clause (i.e., specific rows)

We can filter rows by using a where clause

select name
from countries
where continent = 'South America'
Displaying records 1 - 10
NAME
Venezuela
Uruguay
Falkland Is.
Suriname
Peru
Paraguay
Guyana
Ecuador
Colombia
Chile

SQL select where clause (i.e., specific rows)

SQL select basic math

We can also do math by column:

select
  name,
  pop_est * 1e-6 as pop_est_mil
from countries
Displaying records 1 - 10
NAME pop_est_mil
Zimbabwe 14.645468
Zambia 17.861030
Yemen 29.161922
Vietnam 96.462106
Venezuela 28.515829
Vatican 0.000825
Vanuatu 0.299882
Uzbekistan 33.580650
Uruguay 3.461734
Micronesia 0.113815

SQL select math within groups

Grouping (AKA aggregation) allows us to do math across categories, much like how the Dissolve tools works in Arc.

select
  continent,
  sum(pop_est) as total_pop_continent
from countries
group by continent
8 records
CONTINENT total_pop_continent
Africa 1307986092
Antarctica 4490
Asia 4565778126
Europe 746437653
North America 585351846
Oceania 42430881
Seven seas (open ocean) 1901993
South America 427066661

SQL select sorting

And we can sort columns numerically or alphabetically:

select name
from countries
order by pop_est desc
Displaying records 1 - 10
NAME
China
India
United States of America
Indonesia
Pakistan
Brazil
Nigeria
Bangladesh
Russia
Mexico

SQL select with a join

We can join tables using the SQL join command:

select name, head_of_state, pop_est
from countries
join south_american_heads_of_state using (name)
Displaying records 1 - 10
NAME head_of_state POP_EST
Venezuela Nicolás Maduro 28515829
Uruguay Luis Lacalle Pou 3461734
Suriname Chan Santokhi 581363
Peru Dina Boluarte 32510453
Paraguay Mario Abdo Benítez 7044636
Panama Laurentino Cortizo 4246439
Guyana Irfaan Ali 782766
Ecuador Guillermo Lasso 17373662
Colombia Gustavo Petro 50339443
Chile Gabriel Boric 18952038

A note about joins

Join by an ID when possible, not by a column containing long character strings. Values must match exactly in order for the join to work successfully.

select name, head_of_state, pop_est
from countries
join south_american_heads_of_state using (name)

SQL select, table references and aliases

When we join tables, sometimes SQL gets confused if there are multiple tables with the same column name. We can fix this with either table references or aliases and the . (dot) operator.

select a.name, b.head_of_state, a.pop_est
from countries as a
join south_american_heads_of_state as b using (name)
Displaying records 1 - 10
NAME head_of_state POP_EST
Venezuela Nicolás Maduro 28515829
Uruguay Luis Lacalle Pou 3461734
Suriname Chan Santokhi 581363
Peru Dina Boluarte 32510453
Paraguay Mario Abdo Benítez 7044636
Panama Laurentino Cortizo 4246439
Guyana Irfaan Ali 782766
Ecuador Guillermo Lasso 17373662
Colombia Gustavo Petro 50339443
Chile Gabriel Boric 18952038

Interactive: Basic SQL (~20 mins)

Using SpatiaLite or QGIS

Level 1:

How many people live in Australia?

Level 2:

How many millions of citizens live in Gabriel Boric’s country?

Database Views (and more SQL)

What is a database view?

Simple: a view is a saved query.

Why would I use a database view?

  1. Data exploration and analysis
  2. Building blocks for repeated or complex SQL queries
  3. Reducing data storage
  4. Can make tables more readable
  5. Security and data integrity

Create a view using a spatial function

Working with the wi_counties feature class we can extract the area of each Wisconsin county using the spatial st_area function. We can save this view using the create view view_name as syntax.

create view wi_county_area as
select
  fips,
  st_area(shape) * 3.86102e-7 as area_sq_miles
from wi_counties

Create a view from a view

Let’s convert total population to density by using the view we just created.

create view wi_pop_dens as
select
  fips,
  round(pop_est / area_sq_miles) as pop_per_sq_mile
from wi_counties_pop
join wi_county_area using (fips)

Create a spatial view

What if we wanted to make a view that we can use in a map? We can create a spatial view that is analogous to using the join tool in ArcPro.

create view wi_pop_dens_sv as
select
  objectid,
  shape,
  fips,
  name,
  pop_per_sq_mile
from wi_counties
join wi_pop_dens using (fips)

Options for creating a spatial view

Views update dynamically with new table data

When tables used in views are updated, the views are dynamically updated at the same time

Interactive: Database Views (~20 mins)

Using SpatiaLite or QGIS

Level 1:

Create a new view of county populations expressed as thousands of people.

Level 2:

Make a point spatial view of county centroids including the population field (you may use multiple views), and make a map that labels those centroids with county population.

Geoprocessing and GIS Operations

  1. Topological: looks at relationships between geometries; return a 1 or 0, if they relate or not Examples: ST_Intersects(a.geometry, b.geometry) and ST_Within(a.geometry, b.geometry)
  2. Geometry functions: change the input geometry; return a geometry Examples: ST_Buffer(geometry, distance) and ST_Centroid(geometry)
  3. Overlay functions: compute the results of different overlay relationships: return a geometry Examples: ST_Union(a.geometry, b.geometry) and ST_Intersection(a.geometry, b.geometry)
  4. Measurement functions: allow for measuring geometry; return a number of the measurement Example: ST_Area(geometry) and ST_Length(geometry)

Topological functions - Spatial Joins

select name, contaminated, st_within(fake_well_data.geom, wi_counties.shape)
from wi_counties
join fake_well_data
on st_within(fake_well_data.geom, wi_counties.shape)

Geometry functions - Buffer

Buffer each well point 5 km (5000 m)

select fid, st_buffer(geom, 5000) as geom
from fake_well_data
where contaminated=1

Overlay functions - Union

select fid, st_union(st_buffer(geom, 5000)) as geom
from fake_well_data
where contaminated=1    

Synthesis

Scenario: We want to resample all uncontaminated wells which are with 5 km of a contaminated well.

Find the wells that need resampling.

  1. Find areas which are within 5km - buffer
  2. Make sure we don’t double count wells - union the buffer
  3. Find uncontaminated wells that touch the buffer - spatial join

Synthesis

select uncont.*
from (

    select fid, st_union(st_buffer(geom, 5000)) as geom
    from fake_well_data
    where contaminated=1    

) as cont_ubuff
join (

    select *
    from fake_well_data
    where contaminated = 0
    
) uncont
on st_within(uncont.geom, cont_ubuff.geom)

Interactive geoprocessing

  • Find all uncontaminated wells within 10km of a contaminated well
    • Challenge: Rewrite using views?
  • Create a view of rivers in South America (Should be 79 rivers)
  • Create a view of world land boundaries
    • Challenge: Just continent boundaries (You’ll need a group by here)

Wrap-up

How to continue learning

  • Just start using SQLite as your chosen file format!
    • Start assuming it’s your file format of choice
    • Challenge yourself to stop creating shapefiles and file geodatabases entirely
    • See how far you can take your analysis with just SQL views
    • Use it as a local staging area for enterprise database development

Expectations for the future of SQLite

  • SQLite will likely be around for a long-time
  • The future of SpatiaLite, GeoPackage, and mobile geodatabases is not as certain.

Thank you!

Spoiler Alert! The following slides contain solutions for the interactive exercises

Interactive: Databases 101 (spoilers)

Level 1:

How many people live in Australia?

select pop_est
from countries
where name = 'Australia'

Interactive: Databases 101 (spoilers)

Level 2:

How many millions of citizens live in Gabriel Boric’s country?

select pop_est * 1e-6 as pop_est_mil
from countries
join south_american_heads_of_state using (name)
where head_of_state = 'Gabriel Boric'

Interactive: Database Views (spoilers)

Level 1:

Create a new view of county populations expressed as thousands of people.

create view wi_counties_pop_thousands as
select
  name,
  pop_est * 1e-3 as pop_est_thousands
from wi_counties_pop

Interactive: Database Views (spoilers)

Level 2:

Make a point spatial view of county centroids including the population field (you may use multiple views), and make a map that labels those centroids with county population. It may require the use of the SpatiaLite GUI, QGIS DB Manager, or the ArcPro Create Database View tool to register the view properly, in which case line 1 can be omitted from the SQL code below:

create view wi_county_centroids_pop as
select
  a.objectid,
  st_centroid(shape) as shape,
  a.name,
  pop_est
from wi_counties as a
join wi_counties_pop as b using (fips)

Interactive: Geoprocessing

select
  rivers.name,
  countries.name,
  st_intersects(rivers.shape, countries.shape)
from rivers
join countries
on st_intersects(rivers.shape, countries.shape)
where countries.continent = "South America"
select 
  max(objectid) as objectid, 
  CONTINENT, 
  st_union(shape) as shape 
from countries 
Group by CONTINENT