Q1

List the top 10 customers who have rented the most films, including their first and last names, email addresses, and the total number of films they have rented.

SELECT c.first_name,
       c.last_name,
       c.email,
       COUNT(r.customer_id) AS total_films_rented
FROM customer c

JOIN rental r ON c.customer_id = r.customer_id
GROUP BY c.customer_id
ORDER BY total_films_rented DESC
LIMIT 10;
Displaying records 1 - 10
first_name last_name email total_films_rented
ELEANOR HUNT 46
KARL SEAL 45
MARCIA DEAN 42
CLARA SHAW 42
TAMMY SANDERS 41
WESLEY BULL 40
SUE PETERS 40
TIM CARY 39
MARION SNYDER 39
RHONDA KENNEDY 39

Q2

Find the titles and descriptions of films that have never been rented, sorted alphabetically by title.

SELECT f.title, f.description
FROM film f

LEFT JOIN inventory i ON f.film_id = i.film_id

LEFT JOIN rental r ON i.inventory_id = r.inventory_id

WHERE r.rental_id IS NULL
ORDER BY f.title;
Displaying records 1 - 10
title description
ACADEMY DINOSAUR A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies
ALICE FANTASIA A Emotional Drama of a A Shark And a Database Administrator who must Vanquish a Pioneer in Soviet Georgia
APOLLO TEEN A Action-Packed Reflection of a Crocodile And a Explorer who must Find a Sumo Wrestler in An Abandoned Mine Shaft
ARGONAUTS TOWN A Emotional Epistle of a Forensic Psychologist And a Butler who must Challenge a Waitress in An Abandoned Mine Shaft
ARK RIDGEMONT A Beautiful Yarn of a Pioneer And a Monkey who must Pursue a Explorer in The Sahara Desert
ARSENIC INDEPENDENCE A Fanciful Documentary of a Mad Cow And a Womanizer who must Find a Dentist in Berlin
BOONDOCK BALLROOM A Fateful Panorama of a Crocodile And a Boy who must Defeat a Monkey in The Gulf of Mexico
BUTCH PANTHER A Lacklusture Yarn of a Feminist And a Database Administrator who must Face a Hunter in New Orleans
CATCH AMISTAD A Boring Reflection of a Lumberjack And a Feminist who must Discover a Woman in Nigeria
CHINATOWN GLADIATOR A Brilliant Panorama of a Technical Writer And a Lumberjack who must Escape a Butler in Ancient India

Q3

Retrieve a list of film categories and the average length of films in each category, ordered by the average length in descending order. Include the category name and the average length.

SELECT c.name AS category_name, 
AVG(f.length) AS average_length_of_film
FROM film_category fc

JOIN film f ON fc.film_id = f.film_id
JOIN category c ON fc.category_id = c.category_id
GROUP BY c.name
ORDER BY average_length_of_film DESC;
Displaying records 1 - 10
category_name average_length_of_film
Sports 128.2027
Games 127.8361
Foreign 121.6986
Drama 120.8387
Comedy 115.8276
Family 114.7826
Music 113.6471
Travel 113.3158
Horror 112.4821
Classics 111.6667

Q4

Identify the 5 actors who have appeared in the most films, along with the number of films they’ve acted in. Include their first and last names.

SELECT a.first_name, a.last_name, COUNT(*) AS film_count
FROM actor a

JOIN film_actor fa ON a.actor_id = fa.actor_id
GROUP BY a.actor_id
ORDER BY film_count DESC
LIMIT 5;
5 records
first_name last_name film_count
GINA DEGENERES 42
WALTER TORN 41
MARY KEITEL 40
MATTHEW CARREY 39
SANDRA KILMER 37

Q5

There are no rentals of a film with Johnny Depp,

Q6

Find the top 10 earning film titles along with their total revenue (rental fees), sorted by revenue in descending order. Include the film title and total revenue.

SELECT f.title AS film_title, SUM(p.amount) AS total_revenue
FROM film f
JOIN inventory i ON f.film_id = i.film_id
JOIN rental r ON i.inventory_id = r.inventory_id
JOIN payment p ON r.rental_id = p.rental_id
GROUP BY f.film_id
ORDER BY total_revenue DESC
LIMIT 10;
Displaying records 1 - 10
film_title total_revenue
TELEGRAPH VOYAGE 231.73
WIFE TURN 223.69
ZORRO ARK 214.69
GOODFELLAS SALUTE 209.69
SATURDAY LAMBS 204.72
TITANS JERK 201.71
TORQUE BOUND 198.72
HARRY IDAHO 195.70
INNOCENT USUAL 191.74
HUSTLER PARTY 190.78

Plotly Section

Q1

Use the diamonds dataset to create a 2D plotly scatterplot (you can choose the variables). The plot should have a title and subtitle, axis labels, and hover information that is useful.

library(plotly)
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
plot1 <- plot_ly(data = diamonds, x = ~carat, y = ~price, color = ~color,
                type = 'scatter', mode = 'markers') %>%
        layout(title = "Diamonds Scatterplot <br><sup> Comparing Carats and Price </sup>",
               xaxis = list(title = "Carat"),
               yaxis = list(title = "Price"),
               hovermode = "closest",
               showlegend = TRUE,
               legend = list(title = "Color"))


plot1

Q2

Compare and contrast ggplot2 and plotly. What is an example of a time when you would definitely want to use each one over the other?

library(ggplot2)

plot2 <- ggplot(diamonds, aes(x = carat, y = price, color = color)) +
  geom_point() +
  labs(title = "Diamonds Scatterplot: Carat vs. Price",
       x = "Carat",
       y = "Price",
       color = "Color") +
  theme_minimal()

plot2

COMPARE: ggplot and plotly are similar in that they are popular tools to create a wide range of visualizations, like the scatter plots above. I like using both ggplot and plotly because I can customize each of my plots fairly easily.

CONTRAST: Despite their similarities, ggplot and plotly differ significantly in how one writes each’s respective code. I mostly would use ggplot for making pretty, static visualizations that are simple to understand and code. For me, the aesthetic aspect and “layering” ideology is very intuitive and I can create a graphic fairly quickly because of such. I would probably choose to use ggplot when needing to present data to a layman, or if I am on a time crunch, because of its aesthetics and ease of use.

In contrast, plotly is better for creating interactive visualizations. I would use it if I wanted to have features in my graphic such as hovering, zooming, and filtering. Also, we used plotly when creating an animation; I don’t believe ggplot can do that. The dynamic properties of plotly make it ideal for showing change over time, so I would choose to use it over ggplot when making an interactive webapp for users to explore.

Q3

Using the gapminder dataset, make a Plotly plot which compares two variables over time. Explain the meaning of this comparison in context and describe what narrative is advanced by this comparison.

library(gapminder)

plot3 <- plot_ly(data = gapminder, x = ~gdpPercap, y = ~lifeExp, 
                text = ~country, frame = ~year, mode = 'markers') %>%
  
        layout(title = "Life Expectancy and GDP per Capita from 1952 to 2007",
               xaxis = list(title = "GDP per Capita"),
               yaxis = list(title = "Life Expectancy"),
               showlegend = FALSE) %>%
  
        animation_slider(currentvalue = list(prefix = "Year: "))
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plotly.com/r/reference/#scatter
plot3

The context of this comparison is that GDP per capita and Life Expectancy are, in theory, highly correlated because higher income means one can afford more access to healthcare and other goods that improve their health and thus life expectancy. The narrative of the animation of that life expectancy and GDP per capita have both increased over time in tandem, and their logistic relationship becomes more pronounced towards the 2000’s; this means that there is some limit (as of now) to amount that higher income can improve a country’s life expectancy.