Names: Aimee Rich, Olivia Fountain, Colleen Lee
In this activity (which I’ve jokingly titled Jep-query! as a call out to the game show Jeopardy!), you’ll do a little bit of exploratory work with a database, then be presented with several ‘answers’, or queries which produce a specific set of output. Your task on most of these will be to describe what an question might look like that yields the given output. To pass specifications on this assignment, you need to get at least nine questions correct. As a note, SQL ouptut in the HTML only shows ten rows, but you can see considerably more in Rstudio by running the chunk.
We’ll connect to and query a database called “sakila” which is a commonly-used database for beginners and others wanting to practice SQL skills. One advantage of the way we’re accessing this is that you don’t get to or need to download any software to store the database locally (although that is possible).
The Sakila database refers to a mystical and nearly deprecated retail location: the movie rental store. We’ll deal mostly with information about movies and actors (names are generally fictional), but we’ll lightly touch on queries that check on store inventories. You can get a general view of the database with this picture, but you can also see this using code.

Before you run the first code chunk, make sure that you have installed both the ‘DBI’ and ‘RMariaDB’ packages.
The following code connects to the database. As a note, we’re inputting the password directly into this file because this is a publicly open database. I DO NOT RECOMMEND hard-coding passwords into your database access generally; you should use code that prompts you to enter credentials for the database.
#Method 1
#con <- dbConnect(
# RMariaDB::MariaDB(),
# host = "relational.fit.cvut.cz",
# port = 3306,
# username = "guest",
# password = "relational",
# dbname = "sakila"
#)
#dbListTables(con)
#Method 2
library(DBI)
library(dbplyr)
con <- dbConnect(drv = RSQLite::SQLite(),
dbname = ":memory:")
customer <- read.csv("customer.csv")
rental <- read.csv("rental.csv")
film <- read.csv("film.csv")
inventory <- read.csv("inventory.csv")
category <- read.csv("category.csv")
film_category <- read.csv("film_category.csv")
actor <- read.csv("actor.csv")
film_actor <- read.csv("film_actor.csv")
payment <- read.csv("payment.csv")
dbWriteTable(conn = con, name = "customer", value = customer)
dbWriteTable(conn = con, name = "rental", value = rental)
dbWriteTable(conn = con, name = "film", value = film)
dbWriteTable(conn = con, name = "inventory", value = inventory)
dbWriteTable(conn = con, name = "category", value = category)
dbWriteTable(conn = con, name = "film_category", value = film_category)
dbWriteTable(conn = con, name = "actor", value = actor)
dbWriteTable(conn = con, name = "film_actor", value = film_actor)
dbWriteTable(conn = con, name = "payment", value = payment)
knitr::opts_chunk$set(connection = "con")
dbListTables(con)
## [1] "actor" "category" "customer" "film"
## [5] "film_actor" "film_category" "inventory" "payment"
## [9] "rental"
How many tables are in the database? What is the name of the second to last table?
#all tables
dbListTables(con)
## [1] "actor" "category" "customer" "film"
## [5] "film_actor" "film_category" "inventory" "payment"
## [9] "rental"
#list fields in film actor
dbListFields(con, "film_actor")
## [1] "actor_id" "film_id" "last_update"
#list fields in actor
dbListFields(con, "actor")
## [1] "actor_id" "first_name" "last_name" "last_update"
#list fields in film
dbListFields(con, "film")
## [1] "film_id" "title" "description"
## [4] "release_year" "language_id" "original_language_id"
## [7] "rental_duration" "rental_rate" "length"
## [10] "replacement_cost" "rating" "special_features"
## [13] "last_update"
Revise and re-run the chunk above as needed to see the contents of various tables in the database.
There are 9 tables. The name of the second to last is ‘payment’.
Report how many fields are in the ‘actor’ table. In addition, Describe how one could connect the ‘film’ table to the ‘actor’ table, noting that you may need to use one or more other tables along the way. There are 4 fields in the actor table. We could connect the film table and actor table via ‘actor_id, ’film_id’ fields. Using actor, film_actor, and film tables.
Starting with the following prompt and going through the end of the document, please describe in common english a question someone might ask about this database which the SQL query answers. For those familiar with Jeopardy!, you don’t have to state your response as a question.
SELECT
*
FROM
actor
WHERE
first_name = 'SCARLETT'
| actor_id | first_name | last_name | last_update |
|---|---|---|---|
| 81 | SCARLETT | DAMON | 2006-02-15 04:34:33 |
| 124 | SCARLETT | BENING | 2006-02-15 04:34:33 |
How many actors in the database have the first name Scarlett? There are 2.
SELECT
*
FROM
actor
WHERE
last_name
LIKE
'Johansson'
| actor_id | first_name | last_name | last_update |
|---|---|---|---|
| 8 | MATTHEW | JOHANSSON | 2006-02-15 04:34:33 |
| 64 | RAY | JOHANSSON | 2006-02-15 04:34:33 |
| 146 | ALBERT | JOHANSSON | 2006-02-15 04:34:33 |
Are there any actors with the last name Johansson?
part 2:
SELECT
count(DISTINCT last_name)
FROM actor
| count(DISTINCT last_name) |
|---|
| 121 |
How many distinct last names of actors are in the database?
SELECT
last_name
FROM
actor
GROUP BY
last_name
HAVING
count(*) = 1
| last_name |
|---|
| ASTAIRE |
| BACALL |
| BALE |
| BALL |
| BARRYMORE |
| BASINGER |
| BERGEN |
| BERGMAN |
| BIRCH |
| BLOOM |
What are the last names that appear once in actor?
SELECT
last_name
FROM
actor
GROUP BY
last_name
HAVING
count(*) > 1;
| last_name |
|---|
| AKROYD |
| ALLEN |
| BAILEY |
| BENING |
| BERRY |
| BOLGER |
| BRODY |
| CAGE |
| CHASE |
| CRAWFORD |
What last names show up more than once in the database?
SELECT avg(length) FROM film;
| avg(length) |
|---|
| 115.272 |
What is the average length of films in the database?
Part 2:
SELECT category.name, AVG(length)
FROM film JOIN film_category USING (film_id) JOIN category USING (category_id)
GROUP BY category.name
ORDER BY AVG(length) DESC;
| name | AVG(length) |
|---|---|
| Sports | 128.2027 |
| Games | 127.8361 |
| Foreign | 121.6986 |
| Drama | 120.8387 |
| Comedy | 115.8276 |
| Family | 114.7826 |
| Music | 113.6471 |
| Travel | 113.3158 |
| Horror | 112.4821 |
| Classics | 111.6667 |
What genre has the longest average length of film in the database?
SELECT category.name, AVG(length)
FROM film
JOIN film_category using (film_id)
JOIN category using (category_id)
GROUP BY category.name
HAVING avg(length) > (SELECT AVG(length) FROM film)
ORDER BY avg(length) DESC;
| name | AVG(length) |
|---|---|
| Sports | 128.2027 |
| Games | 127.8361 |
| Foreign | 121.6986 |
| Drama | 120.8387 |
| Comedy | 115.8276 |
What genres have a longer than overall average average run time (Top 50%)?