A SQL demo: subqueries, concurrency and transactions

The document is created in Quarto accessed from RStudio.

1. Loading libraries

library(tidyverse)   
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(DBI)   
library(RPostgres)   
library(dbplyr)

Attaching package: 'dbplyr'

The following objects are masked from 'package:dplyr':

    ident, sql

2. Establishing a database connection (PostgreSQL)

I use RPostgres package to connent to a locally hosted PostgreSQL database.

con <- DBI::dbConnect(         
  RPostgres::Postgres(),         
  dbname = 'postgres',         
  host = 'localhost',         
  port = 5432,         
  user = 'postgres',         
  password = 'abcd'  
  )

3. Subqueries

The following two queries can be combined in one:

 


SELECT id FROM account
WHERE email = 'ed@umich.edu';
1 records
id
1

SELECT content FROM comment
WHERE account_id = 1;
2 records
content
I agree
Someone should make “EasySoup” or something like that

 

Here is a subquery combining the two previous queries


SELECT content FROM comment
WHERE account_id = (SELECT id FROM account WHERE email = 'ed@umich.edu');
2 records
content
I agree
Someone should make “EasySoup” or something like that

 

It is worth remembering that subqueries may affect database performance, slowing it down, as the database stores the result of the inner query as a temporary table. Also, they are more difficult to optimize. The use of subqueries should be therefore limited in high-performance situations, like online systems. Subqueries are often used in data mining projects where performance is not critical.

Image copyright: CH. R. Severance

3. Concurrency and transactions

Databases are designed to accept SQL commands from a variety of sources simultaneously and process them atomically (e.g. with simultaneous commands regarding one row one needs to be completed before completing another one).

Single SQL statements modifying a database such as INSERT INTO, UPDATE, DELETE FROM are atomic.

To implement atomicity a database “locks” specific area (e.g. a row or column) before it starts a SQL command that might change that area.

All other access to that area must wait until the area is unlocked.

The efficiency of a locking mechanism (lock granularity/implementation of transactions) in a database is an important aspect of its overall performance.

Image copyright: Ch. R. Severance

ON CONFLICT clause

Sometimes due to UNIQUE / primary key constraints in a table, we cannot update a row with a value that was already inserted.

An UPDATE clause can be used to limit the update to the “rating” column



UPDATE fav SET rating = rating +1
  WHERE post_id = 1 AND account_id = 1
RETURNING *;

In such cases we can add ON CONFLICT statement with INSERT INTO clause, specifying an alternative course of action with an UPDATE clause



INSERT INTO fav (post_id, account_id, rating)
  VALUES (1,1,1)
  ON CONFLICT (post_id, account_id) 
  DO UPDATE SET rating = fav.rating + 1
  RETURNING *;

RETURNING clause returns the new data.

Multi - Statement Transactions

Sometimes we want to put more than one statement in one transaction. For exampe when a client selects a product (for example a flight ticket) they usually have some time to provide the necessary details and eventually confirm the payment.

In such cases, the database temporarily locks a specific area - in this case a flight ticket - making it unavailable for other queries/ clients. If the client does not complete the transaction within a given time - usually max few minutes - the database removes the lock and the area can be accessed by other clients. The duration of the lock is dependent upon the type of the transaction - in some cases, it may be just a few seconds.

SQL clauses used with multi-statement transactions include:

  • BEGIN: indicates the beginning of a multi statement transaction,

  • ROLLBACK: cancels the transaction, returning to the initial situation.

  • COMMIT: confirms the transaction / updates the database.

  • FOR UPDATE OF: creates a temporary lock.

See the following examples: