Names: (you should edit this line to include your name and the name of your partners)

In this activity (which I’ve jokingly titled Jep-query! as a call out to the game show Jeopardy!), you’ll do a little bit of exploratory work with a database, then be presented with several ‘answers’, or queries which produce a specific set of output. Your task on most of these will be to describe what an question might look like that yields the given output. To pass specifications on this assignment, you need to get at least nine questions correct. As a note, SQL ouptut in the HTML only shows ten rows, but you can see considerably more in Rstudio by running the chunk.

Background

We’ll connect to and query a database called “sakila” which is a commonly-used database for beginners and others wanting to practice SQL skills. One advantage of the way we’re accessing this is that you don’t get to or need to download any software to store the database locally (although that is possible).

The Sakila database refers to a mystical and nearly deprecated retail location: the movie rental store. We’ll deal mostly with information about movies and actors (names are generally fictional), but we’ll lightly touch on queries that check on store inventories. You can get a general view of the database with this picture, but you can also see this using code.

Before you run the first code chunk, make sure that you have installed both the ‘DBI’ and ‘RMariaDB’ packages.

The following code connects to the database. As a note, we’re inputting the password directly into this file because this is a publicly open database. I DO NOT RECOMMEND hard-coding passwords into your database access generally; you should use code that prompts you to enter credentials for the database.

setwd("/Users/abbyjansen/Desktop/STAT 3280")
#Method 1

#con <- dbConnect(
#  RMariaDB::MariaDB(),
#  host = "relational.fit.cvut.cz",
#  port = 3306,
#  username = "guest",
#  password = "relational",
#  dbname = "sakila"
#)

#dbListTables(con)

#Method 2

library(DBI)
library(dbplyr)
con <- dbConnect(drv = RSQLite::SQLite(),
                 dbname = ":memory:")

customer <- read.csv("customer.csv")
rental <- read.csv("rental.csv")
film <- read.csv("film.csv")
inventory <- read.csv("inventory.csv")
category <- read.csv("category.csv")
film_category <- read.csv("film_category.csv")
actor <- read.csv("actor.csv")
film_actor <- read.csv("film_actor.csv")
payment <- read.csv("payment.csv")

dbWriteTable(conn = con, name = "customer", value = customer)
dbWriteTable(conn = con, name = "rental", value = rental)
dbWriteTable(conn = con, name = "film", value = film)
dbWriteTable(conn = con, name = "inventory", value = inventory)
dbWriteTable(conn = con, name = "category", value = category)
dbWriteTable(conn = con, name = "film_category", value = film_category)
dbWriteTable(conn = con, name = "actor", value = actor)
dbWriteTable(conn = con, name = "film_actor", value = film_actor)
dbWriteTable(conn = con, name = "payment", value = payment)
knitr::opts_chunk$set(connection = "con")

dbListTables(con)
## [1] "actor"         "category"      "customer"      "film"         
## [5] "film_actor"    "film_category" "inventory"     "payment"      
## [9] "rental"

Q1

How many tables are in the database? What is the name of the second to last table?

There are 9 tables in the data base and the name of the second to last table is payment.

length(dbListTables(con))
## [1] 9
dbListTables(con)[8]
## [1] "payment"

Revise and re-run the chunk above as needed to see the contents of various tables in the database.

Q2

Report how many fields are in the ‘actor’ table. In addition, Describe how one could connect the ‘film’ table to the ‘actor’ table, noting that you may need to use one or more other tables along the way.

length(dbListFields(con, "actor"))
## [1] 4

There are 4 fields in the actor table.

I can connect the film table to the actor table by joining the film_id field from film with the film_id field from film_actor. I then can connect the actor_id field from film_actor with actor_id field from actor.

Q3

Starting with the following prompt and going through the end of the document, please describe in common english a question someone might ask about this database which the SQL query answers. For those familiar with Jeopardy!, you don’t have to state your respose as a question.

What actors have the first name “Scarlett”? Show all the fields from the actor table (actor_id, first_name, last_name, last_update).

Answer: 0

SELECT 
*
FROM 
actor 
WHERE 
first_name = 'Scarlett'
0 records
actor_id first_name last_name last_update

What actors have the last name “Johanson”? Show all the fields from the actor table (actor_id, first_name, last_name, last_update).

Q4 (two parts)

SELECT 
* 
FROM 
actor 
WHERE 
last_name 
LIKE 
'Johansson'
3 records
actor_id first_name last_name last_update
8 MATTHEW JOHANSSON 2006-02-15 04:34:33
64 RAY JOHANSSON 2006-02-15 04:34:33
146 ALBERT JOHANSSON 2006-02-15 04:34:33

How many actor last names are in this data base? Do not count the same last name twice.

part 2:

SELECT
count(DISTINCT last_name) 
FROM actor
1 records
count(DISTINCT last_name)
121

What actor last names only appear one time in the actor table? List these last names.

Q5

SELECT 
last_name 
FROM
actor 
GROUP BY 
last_name 
HAVING 
count(*) = 1
Displaying records 1 - 10
last_name
ASTAIRE
BACALL
BALE
BALL
BARRYMORE
BASINGER
BERGEN
BERGMAN
BIRCH
BLOOM

What actor last names appear more than once in the actor table? List these last names.

Q6

SELECT 
last_name 
FROM 
actor 
GROUP BY 
last_name 
HAVING 
count(*) > 1;
Displaying records 1 - 10
last_name
AKROYD
ALLEN
BAILEY
BENING
BERRY
BOLGER
BRODY
CAGE
CHASE
CRAWFORD

Q7 (two parts)

What is the average length of each film from the film table?

SELECT avg(length) FROM film;
1 records
avg(length)
115.272

Part 2:

What is the average length of each film category from the film table? Please list the categories by average length of the film in descending order.

SELECT category.name, AVG(length)
FROM film JOIN film_category USING (film_id) JOIN category USING (category_id)
GROUP BY category.name
ORDER BY AVG(length) DESC;
Displaying records 1 - 10
name AVG(length)
Sports 128.2027
Games 127.8361
Foreign 121.6986
Drama 120.8387
Comedy 115.8276
Family 114.7826
Music 113.6471
Travel 113.3158
Horror 112.4821
Classics 111.6667

Q8

What film categories have an average film length that is greater than the average film length of all films from the film table?

SELECT category.name, AVG(length)
FROM film 
JOIN film_category using (film_id) 
JOIN category using (category_id)
GROUP BY category.name
HAVING avg(length) > (SELECT AVG(length) FROM film)
ORDER BY avg(length) DESC;
5 records
name AVG(length)
Sports 128.2027
Games 127.8361
Foreign 121.6986
Drama 120.8387
Comedy 115.8276