- The task
- Available instruments
- A note on longitude
- The mathematics: unfortunately, Earth is not flat
- Implementation: SQL, Looker API, R, JS
2021-06-29, Tallinn
Alexander Matrunich, data scientist
Looker makes use of a simple modeling language called LookML that lets data teams define the relationships in their database so business users can explore, save, and download data without needing to know SQL.
In terms of R tools, Looker is like a browser-based mix of Shiny and RMarkdown on the top of an SQL database.
The most of data wrangling and calculations are to be run in SQL. Then user’s web-browser receives the data (up to 5K rows) and builds a table, or a visualization, or a set of them (a dashboard).
Some calculations can be applied to the received dataset on the browser level.
Why the 5K limit? To save your browser.
dimension_group: partner_life_cycle {
type: duration
intervals: [day, week, month, year]
description: "Duration between first and last ride"
sql_start: ${partner_first_ride_raw} ;;
sql_end: ${partner_last_ride_raw} ;;
}
dimension: partner_rides_per_week {
description: "Rough estimation of partner's supply intensity.
It is zero for life cycles shorter than 7 days."
type: number
sql:
CASE WHEN ${days_partner_life_cycle} >= 7
THEN 7.0 * ${partner_total_rides} / ${days_partner_life_cycle}
ELSE 0 END ;;
value_format_name: decimal_1
}
Looker heatmap example
is_within
matching;Looker provides a heatmaps visualization, but it is limited to datasets up to 5000 rows.
Looker heatmap example
Let’s round off! E.g., how many orders did happen near the point 51.123;32.101?
Locations in Tallinn
Locations in Kampala
is_within
A collection of named polygons. When plotting, we match names from the dataset to names of the map layer. Looker built-in layers:
ST_Within(geom1, geom2)
Not tested, but likely it should kill the database.
ST_Within(geom1, geom2)
In Christopher Columbus time Latitude was the only number they had. The strategy was to traverse a parallel before turning left or right for destination; watching for clouds or birds. <…> Read “Longitude” by Dava Sobel.
— Graham, Preferred order of writing latitude & longitude tuples in GIS services
John Harrison
A_Y_X
, e.g. 2000_-12_234
.hello
The point of the square problem fixed. The issue was that I calculated the distance to the prime meridian from the point itself. But the distance from the point Is shorter than the distance from the point with similar longitude lying on the closest to the equator side of the square.
Artifacts: same square ID with different coordinates. A problem with close-to-zero longitude.
The issue with square IDs duplicates looks solved. The problem was with unnecessary additional transitions from meters to degrees and back.
User-defined SQL functions
CREATE OR REPLACE FUNCTION f_sql_meters_to_prime_meridian (NUMERIC(10,6), NUMERIC(10,6)) -- Parameters: latitude, longitude RETURNS INTEGER STABLE AS $$ SELECT CAST( $2 * PI() / 180 * 6.3781 * 10^6 * COS($1 * PI() / 180 ) AS INTEGER) $$ LANGUAGE SQL CREATE OR REPLACE FUNCTION f_sql_meters_to_equator (NUMERIC(10,6)) -- Parameters: latitude RETURNS INTEGER STABLE AS $$ SELECT CAST( $1 * PI() / 180 * 6.3781 * 10^6 AS INTEGER) $$ LANGUAGE SQL
boltr
library(boltr) lkr_login() squares <- lkr_dash_lookml_query(" - name: add_a_unique_name_1621950537 title: Untitled Visualization model: kpi explore: driver_state_log type: looker_map fields: [ driver_state_log.square_id, driver_state_log.square_side_latitude_closest_to_equator, driver_state_log.square_side_latitude_farest_from_equator, driver_state_log.square_side_longitude_closest_to_meridian0, driver_state_log.square_side_longitude_farest_from_meridian0] filters: # city.name: 'Tallinn' driver_state_log.location_square_side_meters: '2000' driver_state_log.created_hour: 13 months ago limit: 1000000")
$ id <chr> "2000_1935_1481", "2000_1992_2… $ `side latitude closest to equator` <dbl> 34.76500, 35.78909, 36.59758, … $ `side latitude farest from equator` <dbl> 34.78297, 35.80705, 36.61554, … $ `side longitude closest to meridian0` <dbl> 32.389932, 51.429092, 31.77746… $ `side longitude farest from meridian0` <dbl> 32.411802, 51.451240, 31.79983…
mutate(across(-id, as.numeric)) %>% pivot_longer(-id, names_to = c("axis", "relation"), names_prefix = "side ", names_pattern = "^(.+?) (.+?) ", values_to = "degrees")
id axis relation degrees <chr> <chr> <chr> <dbl> 1 2000_1935_1481 latitude closest 34.8 2 2000_1935_1481 latitude farest 34.8 3 2000_1935_1481 longitude closest 32.4 4 2000_1935_1481 longitude farest 32.4 5 2000_1992_2322 latitude closest 35.8 6 2000_1992_2322 latitude farest 35.8 7 2000_1992_2322 longitude closest 51.4 8 2000_1992_2322 longitude farest 51.5
id latitude longitude eqclose mer0close order <chr> <dbl> <dbl> <lgl> <lgl> <dbl> 1 2000_1935_1481 34.8 32.4 TRUE TRUE 1 2 2000_1935_1481 34.8 32.4 FALSE TRUE 2 3 2000_1935_1481 34.8 32.4 FALSE FALSE 3 4 2000_1935_1481 34.8 32.4 TRUE FALSE 4 5 2000_1992_2322 35.8 51.4 TRUE TRUE 1 6 2000_1992_2322 35.8 51.4 FALSE TRUE 2 7 2000_1992_2322 35.8 51.5 FALSE FALSE 3 8 2000_1992_2322 35.8 51.5 TRUE FALSE 4
sp::SpatialPolygonsDataFrame
wants row names!geojsonio::geojson_write()
is very slow.geojsonio::topojson_write()
“is temporarily defunct; check back later”.sf::st_write("squares.geojson")
!We have to use a JavaScript package to convert geojson
to topojson
. Python’s topojson.Topology(data).to_json()
is also an option.
system2("npx", c( "geo2topo", "squares.geojson", "-o squares.topojson"))
These slides as HTML: https://rpubs.com/malexan/heatmaps
git-repo with source markdown: https://gitlab.com/Matrunich/looker-heatmaps