A SQL demo: stored procedures

Author

Beata Sirowy

The document is created in Quarto accessed from RStudio connected to a locally hosted PostgreSQL database.

1. Loading libraries

library(tidyverse)    
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(DBI)    
library(RPostgres)    
library(dbplyr)

Attaching package: 'dbplyr'

The following objects are masked from 'package:dplyr':

    ident, sql

2. Establishing a database connection (PostgreSQL)

I use RPostgres package to connent to a locally hosted PostgreSQL database.

con <- DBI::dbConnect(            
  RPostgres::Postgres(),            
  dbname = 'postgres',            
  host = 'localhost',            
  port = 5432,            
  user = 'postgres',            
  password = 'abcd'     )

3. Stored procedures

  • pieces of reusable code that runs inside a database server,
  • can be written in multiple languages,

  • the goal: to have fewer SQL statements which can increase efficiency if we deal with many repeatable processes,

  • usually not transferable to other database systems,

  • harder to test and modify,

  • should be used with caution, e.g. when we deal with a major performance problem or have some rule that must be enforced and is difficult to express with regular SQL statements (e.g. only one of two given fields has to be NOT NULL, it does not matter which one)

An example: updating the timestamp with a trigger function

We need a function that automatically updates a timestamp in a given row - PostgreSQL does not update the timestamp upon an update in a row unless the user requests such an update.

To exemplify, we use a table created in a previous demo document:


{sql connection=con}

CREATE TABLE comment ( 
id SERIAL, 
content TEXT NOT NULL, 
account_id INTEGER REFERENCES account(id) ON DELETE CASCADE,
post_id INTEGER REFERENCES post(id) ON DELETE CASCADE,
rating INTEGER, 
created_at TIMESTAMP NOT NULL DEFAULT NOW(), 
updated_at TIMESTAMP NOT NULL DEFAULT NOW (), 
PRIMARY KEY(id)
);

Updating the “rating” column doesn’t change the timestamp in updated_at column

UPDATE comment SET rating = rating +1
  WHERE post_id = 1 AND account_id = 1;

To update the timestamp we need to request it

UPDATE fav SET rating = rating +1, updated_at = NOW()
  WHERE post_id = 1 AND account_id = 1;

We can deal with a stored procedure - in this case a trigger function (when X happens, do Y.)

The function is written in plpgsql language (in built PostgreSQL language - works most efficient for stored procedures).

The stored procedure will automatically update the timestamp with any update in the table..


CREATE OR REPLACE FUNCTION trigger_set_timestamp()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;

$$ LANGUAGE plpgsql;


CREATE TRIGGER set_timestamp
BEFORE UPDATE ON comment
FOR EACH ROW
EXECUTE PROCEDURE trigger_set_timestamp();

The function in my database was executed from the command line / psql shell.

Here is the “comment” table before an update:

After an update: in the “rating” column, the timestamp in the “updated_at” column was changed automatically: