format: html editor: visual —

SQL demo with RStudio

1. Loading libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(DBI)
library(RPostgres)
library(dbplyr)

Attaching package: 'dbplyr'

The following objects are masked from 'package:dplyr':

    ident, sql

2. Establishing a database connection

con <- DBI::dbConnect(
  RPostgres::Postgres(),
  dbname = 'postgres',
  host = 'localhost',
  port = 5432,
  user = 'postgres',
  password = 'abcd' 
)

3. Creating a table


CREATE TABLE tracks01 ( 
id SERIAL,
title VARCHAR(256),
artist VARCHAR(128),
album TEXT,
genre VARCHAR(128),
rating INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY(id)
);

The structure of the table (accessed from psql shell)

4. Copying data from a csv file into the table

This is done from a command line in psql shelll

postgres=# COPY tracks(title, artist, album, genre, rating, ) 
FROM 'D:\Datasets\library.csv' 
WITH DELIMITER ',' CSV; 

COPY 296
postgres=#

This table has a lot of vertical replication which is not optimal performance-wise. A more efficient solution would be a relational database including several smaller tables, as demonstrated in the following sections.

5. Creating a new database


CREATE DATABASE music_ WITH OWNER 'postgres' ENCODING 'UTF8';

Modifying the database connection details in RStudio/ Quatro so it connects to a new database

{r}
con_ <- DBI::dbConnect(
  RPostgres::Postgres(),
  dbname = 'music_',
  host = 'localhost',
  port = 5432,
  user = 'postgres',
  password = 'xxxxxx')

a) Conceptual map of a database

I am creating a diagram made with the drawSQL tool with a specification of primary, logical, and foreign keys.

b) Creating tables

The clause UNIQUE is used with columns that function as logical keys (strings to be indexed).

The expression ON DELETE CASCADE indicates that deleting a row in a table including the referenced primary key (a parent row) will remove corresponding rows in the table that is connected to it with a foreign key (child rows). For example, deleting the genre “rock” from the table “genre” will remove records that are classified as “rock” in the “track” table.

Other ON DELETE choices include

  • RESTRICT (default) - don’t allow changes that break the constraint;

  • SET NULL - set the foreign key columns in the child rows to null.


CREATE TABLE artist (
id SERIAL,
name VARCHAR(128) UNIQUE,
PRIMARY KEY(id)
);

CREATE TABLE album (
id SERIAL,
title VARCHAR(128) UNIQUE,
artist_id INTEGER REFERENCES artist(id) ON DELETE CASCADE,
PRIMARY KEY(id)
);

CREATE TABLE genre (
id SERIAL,
genre VARCHAR (128) UNIQUE,
PRIMARY KEY(id)
);

CREATE TABLE track (
id SERIAL,
title VARCHAR(128),
rating INTEGER,
album_id INTEGER REFERENCES album(id) ON DELETE CASCADE,
genre_id INTEGER REFERENCES genre(id) ON DELETE CASCADE,
UNIQUE(title, album_id),
PRIMARY KEY(id)
);

As the title alone is not unique. in the last table, I use the combination of title and album_id with the UNIQUE clause.

The preview of track table accessed from psql shell/ command line:

c) Inserting data into tables


INSERT INTO artist (name) VALUES ('Led Zeppelin');

INSERT INTO artist (name) VALUES ('AC/DC');

SELECT * FROM artist;
2 records
id name
1 Led Zeppelin
2 AC/DC

INSERT INTO album (title, artist_id) VALUES ('Who Made Who', 2);

INSERT INTO album (title, artist_id) VALUES ('IV', 1);

SELECT * FROM album;
2 records
id title artist_id
1 Who Made Who 2
2 IV 1


INSERT INTO genre (genre) VALUES ('Rock');

INSERT INTO genre (genre) VALUES ('Metal');

SELECT * FROM genre;
2 records
id genre
1 Rock
2 Metal

INSERT INTO track (title, rating, album_id, genre_id)
VALUES ('Black Dog', 5, 2, 1);

INSERT INTO track (title, rating, album_id, genre_id)
VALUES ('Stairweay', 5, 2, 1);

INSERT INTO track (title, rating, album_id, genre_id)
VALUES ('About to Rock', 4, 1, 2);

INSERT INTO track (title, rating, album_id, genre_id)
VALUES ('Who Made Who', 5, 1, 2);

select * FROM track;
4 records
id title rating album_id genre_id
1 Black Dog 5 2 1
2 Stairweay 5 2 1
3 About to Rock 4 1 2
4 Who Made Who 5 1 2

d) Using joins across tables

Inner join

Selecting album title and artist name requires joining two tables (album, artist) on artist_id (which is the primary key in “artist” table and a foreign key in “album” table).


SELECT album.title, artist.name
FROM album JOIN artist ON album.artist_id = artist.id;
2 records
title name
Who Made Who AC/DC
IV Led Zeppelin

Here is a version of the table showing the inner join.


SELECT album.title, album.artist_id, artist.id, artist.name
FROM album INNER JOIN artist ON album.artist_id = artist.id;
2 records
title artist_id id name
Who Made Who 2 2 AC/DC
IV 1 1 Led Zeppelin

Cross join

Joining the table track with the table genre on CROSS JOIN gives all the combinations of records in these tables:


SELECT track.title, track.genre_id, genre.id, genre.genre
FROM track CROSS JOIN genre;
8 records
title genre_id id genre
Black Dog 1 1 Rock
Stairweay 1 1 Rock
About to Rock 2 1 Rock
Who Made Who 2 1 Rock
Black Dog 1 2 Metal
Stairweay 1 2 Metal
About to Rock 2 2 Metal
Who Made Who 2 2 Metal

An inner join in this case would include only the rows with matching id - in a sense an inner join can be seen as a type of a cross join with a WHERE clause.


SELECT track.title, track.genre_id, genre.id, genre.genre
FROM track INNER JOIN genre ON track.genre_id = genre.id;
4 records
title genre_id id genre
Black Dog 1 1 Rock
Stairweay 1 1 Rock
About to Rock 2 2 Metal
Who Made Who 2 2 Metal

The join reconstructs the vertical replication in the table, which is often necessary for UI.

An inner join among multiple tables


SELECT track.title, artist.name, album.title, genre.genre
FROM track
JOIN genre ON track.genre_id = genre.id
JOIN album ON track.album_id = album.id
JOIN artist ON album.artist_id = artist.id;
4 records
title name title genre
Black Dog Led Zeppelin IV Rock
Stairweay Led Zeppelin IV Rock
About to Rock AC/DC Who Made Who Metal
Who Made Who AC/DC Who Made Who Metal