Introducción a PostgresSQL en R

Visualización Científica

Autor/a
Afiliación

Gabriela Díaz and Katherine Taján

Fecha de publicación

7 de abril de 2024

En este documento encontrará diferentes consultas a una base datos utilizando el paquete PostgreSQL, en conexión con una intancia creada en Docker previamente.

Para esto, se requieren las siguientes librerías:

library(RPostgreSQL)
library(DBI)

Conexión con Docker:

Se conecta con la base de datos creada en Docker.

connection_db <- function() {
  return(dbConnect(RPostgres::Postgres(), 
                dbname = "gabriela_db", 
                host = "localhost", 
                port = 5432, 
                user = "gabriela_user", 
                password = "password"))
}

connect <- connection_db()

Ahora, se crean las tablas con las que se va a trabajar a lo largo de los ejercicios.

Creación de la tabla \(\texttt{members}\)

dbBegin(connect)

dbGetQuery(connect, "DROP TABLE IF EXISTS members CASCADE")
dbGetQuery(connect, "
CREATE TABLE members
    (
       memid integer NOT NULL, 
       surname character varying(200) NOT NULL, 
       firstname character varying(200) NOT NULL, 
       address character varying(300) NOT NULL, 
       zipcode integer NOT NULL, 
       telephone character varying(20) NOT NULL, 
       recommendedby integer,
       joindate timestamp NOT NULL,
       CONSTRAINT members_pk PRIMARY KEY (memid),
       CONSTRAINT fk_members_recommendedby FOREIGN KEY (recommendedby)
         REFERENCES members(memid) ON DELETE SET NULL
    );
")
dbCommit(connect)
dbDisconnect(connect)

Creación de la tabla \(\texttt{facilities}\)

connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "DROP TABLE IF EXISTS facilities CASCADE")
dbGetQuery(connect, "
CREATE TABLE facilities
    (
       facid integer NOT NULL, 
       name character varying(100) NOT NULL, 
       membercost numeric NOT NULL, 
       guestcost numeric NOT NULL, 
       initialoutlay numeric NOT NULL, 
       monthlymaintenance numeric NOT NULL, 
       CONSTRAINT facilities_pk PRIMARY KEY (facid)
    );
")
dbCommit(connect)
dbDisconnect(connect)

Creación de la tabla \(\texttt{bookings}\)

connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "DROP TABLE IF EXISTS bookings")
dbGetQuery(connect, "
CREATE TABLE bookings
    (
       bookid integer NOT NULL, 
       facid integer NOT NULL, 
       memid integer NOT NULL, 
       starttime timestamp NOT NULL,
       slots integer NOT NULL,
       CONSTRAINT bookings_pk PRIMARY KEY (bookid),
       CONSTRAINT fk_bookings_facid FOREIGN KEY (facid) REFERENCES facilities(facid),
       CONSTRAINT fk_bookings_memid FOREIGN KEY (memid) REFERENCES members(memid)
    );
")
dbCommit(connect)
dbDisconnect(connect)

Se insertan los datos en cada una de las tablas. Para empezar, se realizará con la tabla de \(\texttt{facilities}\). Para el resto, el código es análogo.

connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
INSERT INTO facilities (facid, name, membercost, guestcost, initialoutlay, monthlymaintenance) VALUES
(0, 'Tennis Court 1', 5, 25, 10000, 200),
(1, 'Tennis Court 2', 5, 25, 8000, 200),
(2, 'Badminton Court', 0, 15.5, 4000, 50),
(3, 'Table Tennis', 0, 5, 320, 10),
(4, 'Massage Room 1', 35, 80, 4000, 3000),
(5, 'Massage Room 2', 35, 80, 4000, 3000),
(6, 'Squash Court', 3.5, 17.5, 5000, 80),
(7, 'Snooker Table', 0, 5, 450, 15),
(8, 'Pool Table', 0, 5, 400, 15);
")
dbCommit(connect)
dbDisconnect(connect)

Sección 1: Simple SQL Queries

  1. How can you retrieve all the information from the cd.facilities table?
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT * FROM facilities;")
dbCommit(connect)
dbDisconnect(connect)
  1. You want to print out a list of all of the facilities and their cost to members. How would you retrieve a list of only facility names and costs?
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT name, membercost FROM facilities;") 
dbCommit(connect)
dbDisconnect(connect)
  1. How can you produce a list of facilities that charge a fee to members?
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT * FROM facilities WHERE membercost > 0;") 
dbCommit(connect)
dbDisconnect(connect)
  1. How can you produce a list of facilities that charge a fee to members, and that fee is less than 1/50th of the monthly maintenance cost? Return the facid, facility name, member cost, and monthly maintenance of the facilities in question.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT facid, name, membercost, monthlymaintenance 
FROM facilities 
WHERE membercost > 0 and (membercost < monthlymaintenance/50.0);  
") 
dbCommit(connect)
dbDisconnect(connect)
  1. How can you produce a list of all facilities with the word ‘Tennis’ in their name?
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT * FROM facilities 
WHERE name LIKE '%Tennis%';
")
dbCommit(connect)
dbDisconnect(connect)
  1. How can you retrieve the details of facilities with ID 1 and 5? Try to do it without using the OR operator.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT * FROM facilities 
WHERE facid in (1,5);      
")
dbCommit(connect)
dbDisconnect(connect)
  1. How can you produce a list of facilities, with each labelled as ‘cheap’ or ‘expensive’ depending on if their monthly maintenance cost is more than $100? Return the name and monthly maintenance of the facilities in question.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT name, 
    CASE WHEN (monthlymaintenance > 100) THEN 'expensive'
    ELSE 'cheap'
    END AS cost
FROM facilities;        
")
dbCommit(connect)
dbDisconnect(connect)
  1. How can you produce a list of members who joined after the start of September 2012? Return the memid, surname, firstname, and joindate of the members in question.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT memid, surname, firstname, joindate 
    FROM members
    WHERE joindate >= '2012-09-01';       
")
dbCommit(connect)
dbDisconnect(connect)
  1. How can you produce an ordered list of the first 10 surnames in the members table? The list must not contain duplicates.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT DISTINCT surname 
FROM members
ORDER BY surname
limit 10;        
")
dbCommit(connect)
dbDisconnect(connect)
  1. You, for some reason, want a combined list of all surnames and all facility names. Yes, this is a contrived example :-). Produce that list!
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT surname FROM members
UNION
SELECT name FROM facilities;      
")
dbCommit(connect)
dbDisconnect(connect)
  1. You’d like to get the signup date of your last member. How can you retrieve this information?
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT MAX(joindate) AS latest FROM members;     
")
dbCommit(connect)
dbDisconnect(connect)
  1. You’d like to get the first and last name of the last member(s) who signed up - not just the date. How can you do that?
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT firstname, surname, joindate
FROM members
WHERE joindate = (select max(joindate) FROM members);
")
dbCommit(connect)
dbDisconnect(connect)

Sección 2: Joins and Subqueries

  1. How can you produce a list of the start times for bookings by members named ‘David Farrell’?
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT bks.starttime FROM   bookings bks
INNER JOIN members mems ON mems.memid = bks.memid
WHERE   mems.firstname='David' AND mems.surname='Farrell';
")
dbCommit(connect)
dbDisconnect(connect)

2.How can you produce a list of the start times for bookings for tennis courts, for the date ‘2012-09-21’? Return a list of start time and facility name pairings, ordered by the time.

connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT bks.starttime AS start, facs.name AS name
    FROM facilities facs
    INNER JOIN bookings bks
    ON facs.facid = bks.facid
    WHERE 
        facs.name IN ('Tennis Court 2','Tennis Court 1') AND
        bks.starttime >= '2012-09-21' AND
        bks.starttime < '2012-09-22'
ORDER BY bks.starttime;
")
dbCommit(connect)
dbDisconnect(connect)

3.How can you output a list of all members who have recommended another member? Ensure that there are no duplicates in the list, and that results are ordered by (surname, firstname).

connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT DISTINCT recs.firstname AS firstname, recs.surname AS surname
    FROM 
        members mems
        INNER JOIN members recs
        ON recs.memid = mems.recommendedby
ORDER BY surname, firstname;
")
dbCommit(connect)
dbDisconnect(connect)

4.How can you output a list of all members, including the individual who recommended them (if any)? Ensure that results are ordered by (surname, firstname).

connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT mems.firstname AS memfname, 
       mems.surname AS memsname, 
       recs.firstname AS recfname, 
       recs.surname AS recsname
    FROM 
        members mems
        LEFT OUTER JOIN members recs
        ON recs.memid = mems.recommendedby
ORDER BY memsname, memfname;
")
dbCommit(connect)
dbDisconnect(connect)
  1. How can you produce a list of all members who have used a tennis court? Include in your output the name of the court, and the name of the member formatted as a single column. Ensure no duplicate data, and order by the member name followed by the facility name.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT DISTINCT mems.firstname || ' ' || mems.surname AS member, facs.name AS facility
    FROM members mems
        INNER JOIN bookings bks
        ON mems.memid = bks.memid
        INNER JOIN facilities facs
        ON bks.facid = facs.facid
WHERE facs.name IN ('Tennis Court 2','Tennis Court 1')
ORDER BY member, facility;
")
dbCommit(connect)
dbDisconnect(connect)

6.How can you produce a list of bookings on the day of 2012-09-14 which will cost the member (or guest) more than $30? Remember that guests have different costs to members (the listed costs are per half-hour ‘slot’), and the guest user is always ID 0. Include in your output the name of the facility, the name of the member formatted as a single column, and the cost. Order by descending cost, and do not use any subqueries.

connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT mems.firstname || ' ' || mems.surname AS member, facs.name AS facility, 
    CASE WHEN mems.memid = 0 THEN   bks.slots*facs.guestcost
    ELSE bks.slots*facs.membercost
    END AS cost
  FROM members mems                
  INNER JOIN bookings bks
    ON mems.memid = bks.memid
  INNER JOIN facilities facs
    ON bks.facid = facs.facid
  WHERE
        bks.starttime >= '2012-09-14' AND 
        bks.starttime < '2012-09-15' AND (
            (mems.memid = 0 AND bks.slots*facs.guestcost > 30) OR
            (mems.memid != 0 AND bks.slots*facs.membercost > 30)
        )
ORDER BY cost desc; 
")
dbCommit(connect)
dbDisconnect(connect)

7.How can you output a list of all members, including the individual who recommended them (if any), without using any joins? Ensure that there are no duplicates in the list, and that each firstname + surname pairing is formatted as a column and ordered.

connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT DISTINCT mems.firstname || ' ' ||  mems.surname AS member,
                  (select recs.firstname || ' ' || recs.surname as recommender 
                 FROM members recs 
                 WHERE recs.memid = mems.recommendedby
                   )
FROM members mems
ORDER BY member;
")
dbCommit(connect)
dbDisconnect(connect)
  1. The Produce a list of costly bookings exercise contained some messy logic: we had to calculate the booking cost in both the WHERE clause and the CASE statement. Try to simplify this calculation using subqueries. For reference, the question was:

How can you produce a list of bookings on the day of 2012-09-14 which will cost the member (or guest) more than $30? Remember that guests have different costs to members (the listed costs are per half-hour ‘slot’), and the guest user is always ID 0. Include in your output the name of the facility, the name of the member formatted as a single column, and the cost. Order by descending cost.

connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT member, facility, cost from (
    SELECT 
        mems.firstname || ' ' || mems.surname AS member,
        facs.name AS facility,
        CASE
            WHEN mems.memid = 0 THEN bks.slots*facs.guestcost
            ELSE bks.slots*facs.membercost
        END AS cost
        FROM members mems
            INNER JOIN bookings bks
                ON mems.memid = bks.memid
            INNER JOIN facilities facs
                ON bks.facid = facs.facid
    WHERE
            bks.starttime >= '2012-09-14' AND
            bks.starttime < '2012-09-15'
    ) AS bookings
    WHERE cost > 30
ORDER BY cost desc;  
")
dbCommit(connect)
dbDisconnect(connect)

Sección 3: Modifying data

1.The club is adding a new facility - a spa. We need to add it into the facilities table. Use the following values:

facid: 9, Name: ‘Spa’, membercost: 20, guestcost: 30, initialoutlay: 100000, monthlymaintenance: 800

connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
INSERT INTO facilities VALUES (9, 'Spa', 20, 30, 100000, 800);
")
dbCommit(connect)
dbDisconnect(connect)
  1. In the previous exercise, you learned how to add a facility. Now you’re going to add multiple facilities in one command. Use the following values:

facid: 9, Name: ‘Spa’, membercost: 20, guestcost: 30, initialoutlay: 100000, monthlymaintenance: 800. facid: 10, Name: ‘Squash Court 2’, membercost: 3.5, guestcost: 17.5, initialoutlay: 5000, monthlymaintenance: 80

Nota: Dada la condición de unicidad de los ficilities, no es posible volver a añadir una fila con el mismo facid. (Se agrega el siguiente facility)

connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
INSERT INTO facilities VALUES (10, 'Squash Court 2', 3.5, 17.5, 5000, 80);
")
dbCommit(connect)
dbDisconnect(connect)
  1. Let’s try adding the spa to the facilities table again. This time, though, we want to automatically generate the value for the next facid, rather than specifying it as a constant. Use the following values for everything else:

Name: ‘Spa’, membercost: 20, guestcost: 30, initialoutlay: 100000, monthlymaintenance: 800.

connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
INSERT INTO facilities
    (facid, name, membercost, guestcost, initialoutlay, monthlymaintenance)
    SELECT (SELECT max(facid) FROM facilities)+1, 'Spa', 20, 30, 100000, 800;
")
dbCommit(connect)
dbDisconnect(connect)
  1. We made a mistake when entering the data for the second tennis court. The initial outlay was 10000 rather than 8000: you need to alter the data to fix the error.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
UPDATE facilities
    SET initialoutlay = 10000
    WHERE facid = 1;  
")
dbCommit(connect)
dbDisconnect(connect)
  1. We want to increase the price of the tennis courts for both members and guests. Update the costs to be 6 for members, and 30 for guests.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
UPDATE facilities
SET membercost = 6, guestcost = 30
WHERE facid in (0,1); 
")
dbCommit(connect)
dbDisconnect(connect)
  1. We want to alter the price of the second tennis court so that it costs 10% more than the first one. Try to do this without using constant values for the prices, so that we can reuse the statement if we want to.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
UPDATE facilities facs
    SET membercost = (SELECT membercost * 1.1 FROM facilities WHERE facid = 0),
        guestcost = (SELECT guestcost * 1.1 FROM facilities WHERE facid = 0)
    WHERE facs.facid = 1;
")
dbCommit(connect)
dbDisconnect(connect)
  1. As part of a clearout of our database, we want to delete all bookings from the cd.bookings table. How can we accomplish this?

Nota: Se deja el código expresado para propósitos del ejercicio, sin embargo no se ejecutará para poder contar con la información de la tabla \(\texttt{bookings}\) en futuros ejercicios.

connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "DELETE FROM bookings;")
# No se tiene en cuenta el comando comit para no ejecutar el código, pero se deja expresado como parte de
# lo que sería la ejecución completa.

#dbCommit(connect)

dbDisconnect(connect)
  1. We want to remove member 37, who has never made a booking, from our database. How can we achieve that?
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "DELETE FROM members WHERE memid = 37;")
dbCommit(connect)
dbDisconnect(connect)
  1. In our previous exercises, we deleted a specific member who had never made a booking. How can we make that more general, to delete all members who have never made a booking?
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "DELETE FROM members WHERE memid NOT IN (SELECT memid from bookings);")
dbCommit(connect)
dbDisconnect(connect)

Sección 4: Aggregation

  1. For our first foray into aggregates, we’re going to stick to something simple. We want to know how many facilities exist - simply produce a total count.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT COUNT(*) FROM facilities;")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a count of the number of facilities that have a cost to guests of 10 or more.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT COUNT(*) FROM facilities WHERE guestcost >= 10;")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a count of the number of recommendations each member has made. Order by member ID.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT recommendedby, count(*) 
    FROM members
    WHERE recommendedby is not null
    GROUP BY recommendedby
ORDER BY recommendedby;
")
  1. Produce a list of the total number of slots booked per facility. For now, just produce an output table consisting of facility id and slots, sorted by facility id.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT facid, sum(slots) AS total_slots
    FROM bookings
    GROUP BY facid
ORDER BY facid;
")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a list of the total number of slots booked per facility in the month of September 2012. Produce an output table consisting of facility id and slots, sorted by the number of slots.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT facid, sum(slots) as total_slots
    FROM bookings
    WHERE   starttime >= '2012-09-01' AND starttime < '2012-10-01'
    GROUP BY facid
ORDER BY sum(slots);
")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a list of the total number of slots booked per facility per month in the year of 2012. Produce an output table consisting of facility id and slots, sorted by the id and month.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT facid, EXTRACT(MONTH FROM starttime) AS month, sum(slots) AS total_slots
    FROM bookings
    WHERE EXTRACT(YEAR FROM starttime) = 2012
    GROUP BY facid, month
ORDER BY facid, month;
")
dbCommit(connect)
dbDisconnect(connect)
  1. Find the total number of members (including guests) who have made at least one booking.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT COUNT(DISTINCT memid) FROM bookings;
")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a list of facilities with more than 1000 slots booked. Produce an output table consisting of facility id and slots, sorted by facility id.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "
SELECT facid, sum(slots) AS total_slots
        FROM bookings
        GROUP BY facid
        HAVING sum(slots) > 1000
        ORDER BY facid;
")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a list of facilities along with their total revenue. The output table should consist of facility name and revenue, sorted by revenue. Remember that there’s a different cost for guests and members!
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT faci.name, SUM(slots * CASE
              WHEN memid = 0 THEN faci.guestcost
              ELSE faci.membercost END) AS revenue
      FROM bookings book
      INNER JOIN facilities faci ON book.facid = faci.facid
      GROUP BY faci.name
ORDER BY revenue;")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a list of facilities with a total revenue less than 1000. Produce an output table consisting of facility name and revenue, sorted by revenue. Remember that there’s a different cost for guests and members!
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT NAME, revenue FROM(SELECT faci.name, SUM(CASE WHEN memid = 0 THEN slots * faci.guestcost
                ELSE slots * membercost END) AS revenue
        FROM bookings book
        INNER JOIN facilities faci ON book.facid = faci.facid
        GROUP BY faci.name) AS agg WHERE revenue < 1000
ORDER BY revenue; ")
dbCommit(connect)
dbDisconnect(connect)
  1. Output the facility id that has the highest number of slots booked. For bonus points, try a version without a LIMIT clause. This version will probably look messy!
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT facid, SUM(slots) AS TotalSlots
FROM bookings
    GROUP BY facid
ORDER BY SUM(slots) DESC LIMIT 1;")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a list of the total number of slots booked per facility per month in the year of 2012. In this version, include output rows containing totals for all months per facility, and a total for all months for all facilities. The output table should consist of facility id, month and slots, sorted by the id and month. When calculating the aggregated values for all months and all facids, return null values in the month and facid columns.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT facid, EXTRACT(MONTH FROM starttime) AS MONTH, SUM(slots) AS slots
    FROM bookings
    WHERE starttime >= '2012-01-01' AND starttime < '2013-01-01'
    GROUP BY ROLLUP(facid, MONTH)
ORDER BY facid, MONTH;")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a list of the total number of hours booked per facility, remembering that a slot lasts half an hour. The output table should consist of the facility id, name, and hours booked, sorted by facility id. Try formatting the hours to two decimal places.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT faci.facid, faci.name,
    TRIM(to_char(SUM(book.slots)/2.0, '9999999999999999D99')) AS TotalHours

    FROM bookings book
    INNER JOIN facilities faci ON faci.facid = book.facid
    GROUP BY faci.facid, faci.name
ORDER BY faci.facid;")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a list of each member name, id, and their first booking after September 1st 2012. Order by member ID.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT miemb.surname, miemb.firstname, miemb.memid, MIN(book.starttime) AS starttime
    FROM bookings book
    INNER JOIN members miemb ON miemb.memid = book.memid WHERE starttime >= '2012-09-01'
    GROUP BY miemb.surname, miemb.firstname, miemb.memid
ORDER BY miemb.memid; ")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a list of member names, with each row containing the total member count. Order by join date, and include guest members.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT COUNT(*) OVER(PARTITION BY date_trunc('month',joindate) ORDER BY joindate),
    firstname, surname
    FROM members
ORDER BY joindate;")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a monotonically increasing numbered list of members (including guests), ordered by their date of joining. Remember that member IDs are not guaranteed to be sequential.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT ROW_NUMBER() OVER(ORDER BY joindate), firstname, surname
    FROM members
ORDER BY  joindate;")
dbCommit(connect)
dbDisconnect(connect)
  1. Output the facility id that has the highest number of slots booked. Ensure that in the event of a tie, all tieing results get output.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT facid, SUM(slots) AS totalslots
    FROM bookings
    GROUP BY facid
    HAVING SUM(slots) = (SELECT MAX(sum2.totalslots) FROM 
    (SELECT SUM(slots) AS totalslots
        FROM bookings GROUP BY facid) AS sum2);")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a list of members (including guests), along with the number of hours they’ve booked in facilities, rounded to the nearest ten hours. Rank them by this rounded figure, producing output of first name, surname, rounded hours, rank. Sort by rank, surname, and first name.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT firstname, surname,
    ((SUM(book.slots)+10)/20)*10 AS hours,
    RANK() OVER (ORDER BY ((SUM(book.slots)+10)/20)*10 DESC) AS RANK
    FROM bookings book
    INNER JOIN members miemb ON book.memid = miemb.memid
    GROUP BY miemb.memid
ORDER BY RANK, surname, firstname;
")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a list of the top three revenue generating facilities (including ties). Output facility name and rank, sorted by rank and facility name.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT NAME, RANK FROM (
    SELECT faci.name as name, RANK() over (ORDER BY SUM(CASE
                WHEN memid = 0 THEN slots * faci.guestcost
                ELSE slots * membercost END) DESC) AS RANK
        FROM bookings book
        INNER JOIN facilities faci ON book.facid = faci.facid
        GROUP BY faci.name) AS subq
    where RANK <= 3
ORDER BY RANK; ")
dbCommit(connect)
dbDisconnect(connect)
  1. Classify facilities into equally sized groups of high, average, and low based on their revenue. Order by classification and facility name.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT NAME, CASE WHEN CLASS=1 THEN 'high'
        WHEN CLASS=2 THEN 'average'
        ELSE 'low' 
        END revenue
    FROM(SELECT faci.name AS NAME, NTILE(3) OVER (ORDER BY SUM(CASE WHEN memid = 0 
    THEN slots * faci.guestcost ELSE slots * membercost END) DESC) AS CLASS
        FROM bookings book
        INNER JOIN facilities faci ON book.facid = faci.facid
        GROUP BY faci.name) AS subq
ORDER BY CLASS, NAME; ")
dbCommit(connect)
dbDisconnect(connect)
  1. Based on the 3 complete months of data so far, calculate the amount of time each facility will take to repay its cost of ownership. Remember to take into account ongoing monthly maintenance. Output facility name and payback time in months, order by facility name. Don’t worry about differences in month lengths, we’re only looking for a rough value here!
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT     faci.name AS NAME,
    faci.initialoutlay/((SUM(CASE WHEN memid = 0 then slots * faci.guestcost
            ELSE slots * membercost
            END)/3) - faci.monthlymaintenance) AS months
    FROM bookings book
    INNER JOIN facilities faci ON book.facid = faci.facid
    GROUP BY faci.facid
ORDER BY NAME; ")
dbCommit(connect)
dbDisconnect(connect)
  1. For each day in August 2012, calculate a rolling average of total revenue over the previous 15 days. Output should contain date and revenue columns, sorted by the date. Remember to account for the possibility of a day having zero revenue. This one’s a bit tough, so don’t be afraid to check out the hint!
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT     dategen.date,
    (SELECT SUM(CASE WHEN memid = 0 THEN slots * faci.guestcost
            ELSE slots * membercost END) AS rev
        FROM bookings book
        INNER JOIN facilities faci ON book.facid = faci.facid WHERE book.starttime > dategen.date - interval '14 days' AND book.starttime < dategen.date + interval '1 day')/15 AS revenue
    FROM(SELECT CAST(generate_series(timestamp '2012-08-01', '2012-08-31','1 day') AS date) AS date)  AS dategen
ORDER BY dategen.date;  ")
dbCommit(connect)
dbDisconnect(connect)

Sección 5. Working with Timestamps .

  1. Produce a timestamp for 1 a.m. on the 31st of August 2012.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT CAST('2012-08-31 01:00:00' AS timestamp);")
dbCommit(connect)
dbDisconnect(connect)
  1. Find the result of subtracting the timestamp ‘2012-07-30 01:00:00’ from the timestamp ‘2012-08-31 01:00:00’
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT timestamp '2012-08-31 01:00:00' - timestamp '2012-07-30 01:00:00' AS interval;")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a list of all the dates in October 2012. They can be output as a timestamp (with time set to midnight) or a date.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT generate_series(timestamp '2012-10-01', timestamp '2012-10-31', interval '1 day') AS ts;")
dbCommit(connect)
dbDisconnect(connect)
  1. Get the day of the month from the timestamp ‘2012-08-31’ as an integer.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect,"SELECT extract(day FROM timestamp '2012-08-31');  ")
dbCommit(connect)
dbDisconnect(connect)
  1. Work out the number of seconds between the timestamps ‘2012-08-31 01:00:00’ and ‘2012-09-02 00:00:00’
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect,"SELECT EXTRACT(epoch FROM (timestamp '2012-09-02 00:00:00' - '2012-08-31 01:00:00'));")
dbCommit(connect)
dbDisconnect(connect)
  1. For each month of the year in 2012, output the number of days in that month. Format the output as an integer column containing the month of the year, and a second column containing an interval data type.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect,"SELECT EXTRACT(MONTH FROM mes.month) AS MONTH,  (mes.month + interval '1 month') - mes.month AS LENGTH 
FROM
    (SELECT generate_series(timestamp '2012-01-01', timestamp '2012-12-01', interval '1 month') AS MONTH) mes
ORDER BY MONTH;  ")
dbCommit(connect)
dbDisconnect(connect)
  1. For any given timestamp, work out the number of days remaining in the month. The current day should count as a whole day, regardless of the time. Use ‘2012-02-11 01:00:00’ as an example timestamp for the purposes of making the answer. Format the output as a single interval value.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect,"SELECT (date_trunc('month',ts.testts) + interval '1 month')- date_trunc('day', ts.testts) AS remaining
    FROM (SELECT timestamp '2012-02-11 01:00:00' AS testts) ts;")
dbCommit(connect)
dbDisconnect(connect)
  1. Return a list of the start and end time of the last 10 bookings (ordered by the time at which they end, followed by the time at which they start) in the system.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect,"SELECT starttime, starttime + slots*(interval '30 minutes') endtime
    FROM bookings
    ORDER BY endtime DESC, starttime DESC
    LIMIT 10;")
dbCommit(connect)
dbDisconnect(connect)
  1. Return a count of bookings for each month, sorted by month
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect,"SELECT  date_trunc('month', starttime) AS MONTH, COUNT(*)
    FROM  bookings
    GROUP BY MONTH
    ORDER BY MONTH;")
dbCommit(connect)
dbDisconnect(connect)
  1. Work out the utilisation percentage for each facility by month, sorted by name and month, rounded to 1 decimal place. Opening time is 8am, closing time is 8.30pm. You can treat every month as a full month, regardless of if there were some dates the club was not open.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect,"SELECT NAME, MONTH, 
    round((100*slots)/CAST(25*(CAST((MONTH + interval '1 month') AS date)- CAST (MONTH AS date)) AS numeric),1) AS utilisation
    FROM(SELECT faci.name AS NAME, date_trunc('month', starttime) AS MONTH, SUM(slots) AS slots
        FROM bookings book
            INNER JOIN facilities faci  ON book.facid = faci.facid  GROUP BY faci.facid, MONTH) AS inn
ORDER BY NAME, MONTH; ")
dbCommit(connect)
dbDisconnect(connect)

Sección 6: String Operations.

  1. Output the names of all members, formatted as ‘Surname, Firstname’
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT surname || ', ' || firstname AS NAME FROM members; ")
dbCommit(connect)
dbDisconnect(connect)
  1. Find all facilities whose name begins with ‘Tennis’. Retrieve all columns.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT * FROM facilities WHERE NAME LIKE 'Tennis%';")
dbCommit(connect)
dbDisconnect(connect)
  1. You’ve noticed that the club’s member table has telephone numbers with very inconsistent formatting. You’d like to find all the telephone numbers that contain parentheses, returning the member ID and telephone number sorted by member ID.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "SELECT memid, telephone FROM members 
           WHERE telephone SIMILAR TO '%[()]%';")
dbCommit(connect)
dbDisconnect(connect)
  1. The zip codes in our example dataset have had leading zeroes removed from them by virtue of being stored as a numeric type. Retrieve all zip codes from the members table, padding any zip codes less than 5 characters long with leading zeroes. Order by the new zip code.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect,"SELECT LPAD(CAST(zipcode AS char(5)),5,'0') zip 
    FROM members 
ORDER BY zip;")
dbCommit(connect)
dbDisconnect(connect)
  1. You’d like to produce a count of how many members you have whose surname starts with each letter of the alphabet. Sort by the letter, and don’t worry about printing out a letter if the count is 0.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect,"SELECT substr(surname,1,1) AS letter, COUNT(*) AS COUNT
    FROM members 
  GROUP BY letter
  ORDER BY letter")
dbCommit(connect)
dbDisconnect(connect)
  1. The telephone numbers in the database are very inconsistently formatted. You’d like to print a list of member ids and numbers that have had ‘-’,‘(’,‘)’, and ’ ’ characters removed. Order by member id.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect,"SELECT memid, regexp_replace(telephone, '[^0-9]', '', 'g') AS telephone
    FROM members
  ORDER BY memid;")
dbCommit(connect)
dbDisconnect(connect)

Sección 7: Recursive Queries.

  1. Find the upward recommendation chain for member ID 27: that is, the member who recommended them, and the member who recommended that member, and so on. Return member ID, first name, and surname. Order by descending member id.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "WITH RECURSIVE  recommenders(recommender) AS (
    SELECT miemb.recommendedby FROM members miemb WHERE miemb.memid = 27
    UNION ALL 
    SELECT miemb.recommendedby
    FROM recommenders reco  INNER JOIN  members miemb ON miemb.memid = reco.recommender
)
SELECT reco.recommender, miemb.firstname, miemb.surname
FROM recommenders reco  INNER JOIN members miemb ON reco.recommender = miemb.memid
ORDER BY memid DESC;")
dbCommit(connect)
dbDisconnect(connect)
  1. Find the downward recommendation chain for member ID 1: that is, the members they recommended, the members those members recommended, and so on. Return member ID and name, and order by ascending member id.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "WITH RECURSIVE recommendeds(memid) AS (
    SELECT miemb.memid FROM members miemb WHERE miemb.recommendedby = 1
    UNION ALL
    SELECT miemb.memid
        FROM recommendeds reco INNER JOIN members miemb ON miemb.recommendedby = reco.memid
)
SELECT reco.memid, miemb.firstname, miemb.surname
    FROM recommendeds reco INNER JOIN members miemb ON reco.memid = miemb.memid
ORDER BY memid;")
dbCommit(connect)
dbDisconnect(connect)
  1. Produce a CTE that can return the upward recommendation chain for any member. You should be able to select recommender from recommenders where member=x. Demonstrate it by getting the chains for members 12 and 22. Results table should have member and recommender, ordered by member ascending, recommender descending.
connect <- connection_db()
dbBegin(connect)

dbGetQuery(connect, "WITH RECURSIVE recommenders(recommender, member) AS (
SELECT miemb.recommendedby, miemb.memid
    FROM members miemb
UNION ALL
SELECT miemb.recommendedby, reco.member 
    FROM recommenders reco INNER JOIN members miemb ON miemb.memid = reco.recommender
)
SELECT reco.member, reco.recommender, miemb.firstname, miemb.surname
    FROM recommenders reco INNER JOIN members miemb ON reco.recommender = miemb.memid
    WHERE reco.member = 22 OR reco.member = 12
ORDER BY reco.member ASC, reco.recommender DESC;")
dbCommit(connect)
dbDisconnect(connect)