library(dplyr) # provides %>%, the join verbs, group_by(), summarise(), etc.
rental <- read.csv("/Users/niki/Documents/INTROTOR/rental.csv")
inventory <- read.csv("/Users/niki/Documents/INTROTOR/inventory.csv")
customer <- read.csv("/Users/niki/Documents/INTROTOR/customer.csv")
actor <- read.csv("/Users/niki/Documents/INTROTOR/actor.csv")
category <- read.csv("/Users/niki/Documents/INTROTOR/category.csv")
payment <- read.csv("/Users/niki/Documents/INTROTOR/payment.csv")
film <- read.csv("/Users/niki/Documents/INTROTOR/film.csv")
film_actor <- read.csv("/Users/niki/Documents/INTROTOR/film_actor.csv")
film_category <- read.csv("/Users/niki/Documents/INTROTOR/film_category.csv")
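The nine `read.csv()` calls above repeat the same absolute path, which only resolves on one machine. As a minimal sketch, assuming the CSV files sit together in a single folder, the loading step could be wrapped in a small helper. The function name `load_tables` and the `"data"` directory are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: read <name>.csv for each table name from one directory
# and return the data frames as a named list.
load_tables <- function(data_dir, table_names) {
  paths <- file.path(data_dir, paste0(table_names, ".csv"))
  tables <- lapply(paths, read.csv)
  setNames(tables, table_names)
}

# "data" is a placeholder directory; the load only runs if it exists.
data_dir <- "data"
if (dir.exists(data_dir)) {
  tables <- load_tables(data_dir, c("rental", "inventory", "customer"))
  rental <- tables$rental
}
```

Keeping the directory in one variable means a change of location needs one edit rather than nine.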
#QUESTION 1.1
top_customersx <- rental %>%
  inner_join(customer, by = "customer_id") %>%
  inner_join(inventory, by = "inventory_id") %>%
  group_by(customer_id, first_name, last_name, email) %>%
  summarise(total_rentals = n(), .groups = "drop") %>%
  arrange(desc(total_rentals)) %>%
  slice_head(n = 10)
print(top_customersx)
## # A tibble: 10 × 5
## customer_id first_name last_name email total_rentals
## <int> <chr> <chr> <chr> <int>
## 1 148 ELEANOR HUNT ELEANOR.HUNT@sakilacustomer.o… 46
## 2 526 KARL SEAL KARL.SEAL@sakilacustomer.org 45
## 3 144 CLARA SHAW CLARA.SHAW@sakilacustomer.org 42
## 4 236 MARCIA DEAN MARCIA.DEAN@sakilacustomer.org 42
## 5 75 TAMMY SANDERS TAMMY.SANDERS@sakilacustomer.… 41
## 6 197 SUE PETERS SUE.PETERS@sakilacustomer.org 40
## 7 469 WESLEY BULL WESLEY.BULL@sakilacustomer.org 40
## 8 137 RHONDA KENNEDY RHONDA.KENNEDY@sakilacustomer… 39
## 9 178 MARION SNYDER MARION.SNYDER@sakilacustomer.… 39
## 10 468 TIM CARY TIM.CARY@sakilacustomer.org 39
#QUESTION 1.2
films_never_rented <- film %>%
  left_join(inventory, by = "film_id") %>%
  left_join(rental, by = "inventory_id") %>%
  filter(is.na(rental_id)) %>%
  select(title, description) %>%
  arrange(title)
print(films_never_rented)
## title
## 1 ACADEMY DINOSAUR
## 2 ALICE FANTASIA
## 3 APOLLO TEEN
## 4 ARGONAUTS TOWN
## 5 ARK RIDGEMONT
## 6 ARSENIC INDEPENDENCE
## 7 BOONDOCK BALLROOM
## 8 BUTCH PANTHER
## 9 CATCH AMISTAD
## 10 CHINATOWN GLADIATOR
## 11 CHOCOLATE DUCK
## 12 COMMANDMENTS EXPRESS
## 13 CROSSING DIVORCE
## 14 CROWDS TELEMARK
## 15 CRYSTAL BREAKING
## 16 DAZED PUNK
## 17 DELIVERANCE MULHOLLAND
## 18 FIREHOUSE VIETNAM
## 19 FLOATS GARDEN
## 20 FRANKENSTEIN STRANGER
## 21 GLADIATOR WESTWARD
## 22 GUMP DATE
## 23 HATE HANDICAP
## 24 HOCUS FRIDA
## 25 KENTUCKIAN GIANT
## 26 KILL BROTHERHOOD
## 27 MUPPET MILE
## 28 ORDER BETRAYED
## 29 PEARL DESTINY
## 30 PERDITION FARGO
## 31 PSYCHO SHRUNK
## 32 RAIDERS ANTITRUST
## 33 RAINBOW SHOCK
## 34 ROOF CHAMPION
## 35 SISTER FREDDY
## 36 SKY MIRACLE
## 37 SUICIDES SILENCE
## 38 TADPOLE PARK
## 39 TREASURE COMMAND
## 40 VILLAIN DESPERATE
## 41 VOLUME HOUSE
## 42 WAKE JAWS
## 43 WALLS ARTIST
## description
## 1 A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies
## 2 A Emotional Drama of a A Shark And a Database Administrator who must Vanquish a Pioneer in Soviet Georgia
## 3 A Action-Packed Reflection of a Crocodile And a Explorer who must Find a Sumo Wrestler in An Abandoned Mine Shaft
## 4 A Emotional Epistle of a Forensic Psychologist And a Butler who must Challenge a Waitress in An Abandoned Mine Shaft
## 5 A Beautiful Yarn of a Pioneer And a Monkey who must Pursue a Explorer in The Sahara Desert
## 6 A Fanciful Documentary of a Mad Cow And a Womanizer who must Find a Dentist in Berlin
## 7 A Fateful Panorama of a Crocodile And a Boy who must Defeat a Monkey in The Gulf of Mexico
## 8 A Lacklusture Yarn of a Feminist And a Database Administrator who must Face a Hunter in New Orleans
## 9 A Boring Reflection of a Lumberjack And a Feminist who must Discover a Woman in Nigeria
## 10 A Brilliant Panorama of a Technical Writer And a Lumberjack who must Escape a Butler in Ancient India
## 11 A Unbelieveable Story of a Mad Scientist And a Technical Writer who must Discover a Composer in Ancient China
## 12 A Fanciful Saga of a Student And a Mad Scientist who must Battle a Hunter in An Abandoned Mine Shaft
## 13 A Beautiful Documentary of a Dog And a Robot who must Redeem a Womanizer in Berlin
## 14 A Intrepid Documentary of a Astronaut And a Forensic Psychologist who must Find a Frisbee in An Abandoned Fun House
## 15 A Fast-Paced Character Study of a Feminist And a Explorer who must Face a Pastry Chef in Ancient Japan
## 16 A Action-Packed Story of a Pioneer And a Technical Writer who must Discover a Forensic Psychologist in An Abandoned Amusement Park
## 17 A Astounding Saga of a Monkey And a Moose who must Conquer a Butler in A Shark Tank
## 18 A Awe-Inspiring Character Study of a Boat And a Boy who must Kill a Pastry Chef in The Sahara Desert
## 19 A Action-Packed Epistle of a Robot And a Car who must Chase a Boat in Ancient Japan
## 20 A Insightful Character Study of a Feminist And a Pioneer who must Pursue a Pastry Chef in Nigeria
## 21 A Astounding Reflection of a Squirrel And a Sumo Wrestler who must Sink a Dentist in Ancient Japan
## 22 A Intrepid Yarn of a Explorer And a Student who must Kill a Husband in An Abandoned Mine Shaft
## 23 A Intrepid Reflection of a Mad Scientist And a Pioneer who must Overcome a Hunter in The First Manned Space Station
## 24 A Awe-Inspiring Tale of a Girl And a Madman who must Outgun a Student in A Shark Tank
## 25 A Stunning Yarn of a Woman And a Frisbee who must Escape a Waitress in A U-Boat
## 26 A Touching Display of a Hunter And a Secret Agent who must Redeem a Husband in The Outback
## 27 A Lacklusture Story of a Madman And a Teacher who must Kill a Frisbee in The Gulf of Mexico
## 28 A Amazing Saga of a Dog And a A Shark who must Challenge a Cat in The Sahara Desert
## 29 A Lacklusture Yarn of a Astronaut And a Pastry Chef who must Sink a Dog in A U-Boat
## 30 A Fast-Paced Story of a Car And a Cat who must Outgun a Hunter in Berlin
## 31 A Amazing Panorama of a Crocodile And a Explorer who must Fight a Husband in Nigeria
## 32 A Amazing Drama of a Teacher And a Feminist who must Meet a Woman in The First Manned Space Station
## 33 A Action-Packed Story of a Hunter And a Boy who must Discover a Lumberjack in Ancient India
## 34 A Lacklusture Reflection of a Car And a Explorer who must Find a Monkey in A Baloon
## 35 A Stunning Saga of a Butler And a Woman who must Pursue a Explorer in Australia
## 36 A Epic Drama of a Mad Scientist And a Explorer who must Succumb a Waitress in An Abandoned Fun House
## 37 A Emotional Character Study of a Car And a Girl who must Face a Composer in A U-Boat
## 38 A Beautiful Tale of a Frisbee And a Moose who must Vanquish a Dog in An Abandoned Amusement Park
## 39 A Emotional Saga of a Car And a Madman who must Discover a Pioneer in California
## 40 A Boring Yarn of a Pioneer And a Feminist who must Redeem a Cat in An Abandoned Amusement Park
## 41 A Boring Tale of a Dog And a Woman who must Meet a Dentist in California
## 42 A Beautiful Saga of a Feminist And a Composer who must Challenge a Moose in Berlin
## 43 A Insightful Panorama of a Teacher And a Teacher who must Overcome a Mad Cow in An Abandoned Fun House
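One caveat about the Question 1.2 query: the `is.na(rental_id)` filter works on inventory copies, so a film with several copies would be listed as soon as one copy has no rental, even if its other copies were rented. Whether that affects any title in the list above is worth checking. The sketch below uses made-up toy tables, not the real data, to show a film-level alternative built on `anti_join()`.

```r
library(dplyr)

# Toy stand-ins for the film, inventory, and rental tables (made-up rows).
film <- data.frame(film_id = 1:3, title = c("FILM A", "FILM B", "FILM C"))
inventory <- data.frame(inventory_id = 1:3, film_id = c(1L, 1L, 2L))
rental <- data.frame(rental_id = 1:2, inventory_id = c(1L, 3L))

# Films that appear in at least one rental, through any of their copies.
rented_film_ids <- rental %>%
  inner_join(inventory, by = "inventory_id") %>%
  distinct(film_id)

# Keep only the films with no rented copy at all.
never_rented <- film %>%
  anti_join(rented_film_ids, by = "film_id") %>%
  arrange(title)

print(never_rented)
```

In the toy data FILM A has one rented copy and one idle copy, so the copy-level filter would list it, while the `anti_join()` version keeps only FILM C, which has no inventory at all.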
#QUESTION 1.3
avg_length_by_category <- film %>%
  inner_join(film_category, by = "film_id") %>%
  inner_join(category, by = "category_id") %>%
  group_by(name) %>%
  summarize(average_length = mean(length, na.rm = TRUE)) %>%
  arrange(desc(average_length))
print(avg_length_by_category)
## # A tibble: 16 × 2
## name average_length
## <chr> <dbl>
## 1 Sports 128.
## 2 Games 128.
## 3 Foreign 122.
## 4 Drama 121.
## 5 Comedy 116.
## 6 Family 115.
## 7 Music 114.
## 8 Travel 113.
## 9 Horror 112.
## 10 Classics 112.
## 11 Action 112.
## 12 New 111.
## 13 Animation 111.
## 14 Children 110.
## 15 Documentary 109.
## 16 Sci-Fi 108.
#QUESTION 1.4
top_actors1 <- film_actor %>%
  inner_join(actor, by = "actor_id") %>%
  group_by(first_name, last_name) %>%
  summarize(film_count = n(), .groups = "drop") %>%
  arrange(desc(film_count)) %>%
  slice_head(n = 5)
print(top_actors1)
## # A tibble: 5 × 3
## first_name last_name film_count
## <chr> <chr> <int>
## 1 SUSAN DAVIS 54
## 2 GINA DEGENERES 42
## 3 WALTER TORN 41
## 4 MARY KEITEL 40
## 5 MATTHEW CARREY 39
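Grouping by `first_name` and `last_name` alone treats two different actors who share a name as one person and adds their film counts together. Whether that happens in this dataset is worth checking against `actor_id`. The sketch below uses invented toy rows, not the real actor table, and includes the key in the grouping so that each actor is counted separately.

```r
library(dplyr)

# Toy tables (made-up rows): actors 1 and 2 are different people with the same name.
actor <- data.frame(
  actor_id = 1:3,
  first_name = c("ANN", "ANN", "BOB"),
  last_name = c("LEE", "LEE", "RAY")
)
film_actor <- data.frame(
  actor_id = c(1L, 1L, 2L, 2L, 3L, 3L, 3L),
  film_id = c(10L, 11L, 12L, 13L, 10L, 11L, 12L)
)

# Including actor_id in the grouping keeps same-named actors apart.
top_actors_by_id <- film_actor %>%
  inner_join(actor, by = "actor_id") %>%
  group_by(actor_id, first_name, last_name) %>%
  summarize(film_count = n(), .groups = "drop") %>%
  arrange(desc(film_count)) %>%
  slice_head(n = 5)

print(top_actors_by_id)
```

Grouped by name only, the two ANN LEE rows would merge into a single entry with four films and outrank BOB RAY. Grouped by `actor_id`, BOB RAY leads with three.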
#QUESTION 1.5
#I am assuming the answer to this question is 0, i.e. that no customers rented
#a Johnny Depp film. (I could not verify this: my code below kept failing with
#an error saying the name columns could not be found.)
#customers_with_johnnydepp <- rental %>%
# inner_join(customer, by = "customer_id") %>%
# inner_join(inventory, by = "inventory_id") %>%
#inner_join(film_actor, by = "film_id") %>%
#inner_join(actor, by = "actor_id") %>%
#filter(actor.first_name == "Johnny" & actor.last_name == "Depp") %>%
#distinct(customer.first_name, customer.last_name)
#print(customers_with_johnnydepp)
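The error in the commented-out query most likely comes from the column names. Columns such as `actor.first_name` do not exist after a dplyr join. When both tables carry a `first_name` column, dplyr disambiguates them with `.x` and `.y` suffixes instead. The names in the outputs above are also stored in upper case, so a comparison against `"Johnny"` would match nothing. The sketch below runs on made-up toy tables, and the helper name `actor_named` is hypothetical. It renames the actor columns before joining and compares in upper case.

```r
library(dplyr)

# Toy tables with invented rows; the real CSVs may hold extra columns.
customer <- data.frame(
  customer_id = 1:2,
  first_name = c("MARY", "LINDA"),
  last_name = c("SMITH", "JONES")
)
actor <- data.frame(
  actor_id = 1:2,
  first_name = c("JOHNNY", "PENELOPE"),
  last_name = c("DEPP", "GUINESS")
)
film_actor <- data.frame(actor_id = c(1L, 2L), film_id = c(100L, 200L))
inventory <- data.frame(inventory_id = 1:2, film_id = c(100L, 200L))
rental <- data.frame(
  rental_id = 1:3,
  inventory_id = c(1L, 2L, 1L),
  customer_id = c(1L, 2L, 1L)
)

# Rename the actor name columns so they cannot clash with the customer ones.
actor_named <- actor %>%
  rename(actor_first_name = first_name, actor_last_name = last_name)

customers_with_johnnydepp <- rental %>%
  inner_join(customer, by = "customer_id") %>%
  inner_join(inventory, by = "inventory_id") %>%
  inner_join(film_actor, by = "film_id") %>%
  inner_join(actor_named, by = "actor_id") %>%
  filter(toupper(actor_first_name) == "JOHNNY",
         toupper(actor_last_name) == "DEPP") %>%
  distinct(customer_id, first_name, last_name)

print(customers_with_johnnydepp)
```

On the real tables the join to `film_actor` matches many rows on both sides, so recent dplyr versions may warn about a many-to-many relationship. The `distinct()` call at the end collapses the duplicates either way.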
#QUESTION 1.6
top_earning_films <- rental %>%
  inner_join(inventory, by = "inventory_id") %>%
  inner_join(film, by = "film_id") %>%
  group_by(title) %>%
  summarize(total_revenue = sum(rental_rate), .groups = "drop") %>%
  arrange(desc(total_revenue)) %>%
  slice_head(n = 10)
print(top_earning_films)
## # A tibble: 10 × 2
## title total_revenue
## <chr> <dbl>
## 1 BUCKET BROTHERHOOD 170.
## 2 SCALAWAG DUCK 160.
## 3 APACHE DIVINE 155.
## 4 GOODFELLAS SALUTE 155.
## 5 WIFE TURN 155.
## 6 ZORRO ARK 155.
## 7 CAT CONEHEADS 150.
## 8 DOGMA FAMILY 150.
## 9 HARRY IDAHO 150.
## 10 MASSACRE USUAL 150.
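The Question 1.6 query estimates revenue as the film's `rental_rate` summed over its rentals, which leaves the loaded `payment` table unused. If the amounts actually paid are wanted, one option is to sum the payments instead. The sketch below uses made-up toy rows and assumes `payment` has `rental_id` and `amount` columns, which has not been checked against the real CSV.

```r
library(dplyr)

# Toy tables with invented rows; column names in payment are an assumption.
film <- data.frame(film_id = 1:2, title = c("FILM A", "FILM B"))
inventory <- data.frame(inventory_id = 1:2, film_id = 1:2)
rental <- data.frame(rental_id = 1:3, inventory_id = c(1L, 1L, 2L))
payment <- data.frame(
  payment_id = 1:3,
  rental_id = 1:3,
  amount = c(2.99, 3.99, 0.99)
)

# Sum the amounts actually paid, traced back from payment to film.
top_earning_by_payment <- payment %>%
  inner_join(rental, by = "rental_id") %>%
  inner_join(inventory, by = "inventory_id") %>%
  inner_join(film, by = "film_id") %>%
  group_by(film_id, title) %>%
  summarize(total_revenue = sum(amount), .groups = "drop") %>%
  arrange(desc(total_revenue)) %>%
  slice_head(n = 10)

print(top_earning_by_payment)
```

The two approaches can disagree when a payment differs from the listed rate, for example because of late fees, so the ranking may shift.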
#QUESTION 2.1
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(ggplot2)
data(diamonds)
plot1 <- plot_ly(
  data = diamonds,
  x = ~carat,
  y = ~price,
  type = 'scatter',
  mode = 'markers',
  marker = list(
    color = ~clarity,
    colorscale = 'Picnic',
    size = 5,
    opacity = 0.6
  ),
  text = ~paste("Cut: ", cut, "<br>Clarity: ", clarity, "<br>Color: ", color, "<br>Price: $", price)
)
plot1 <- plot1 %>% layout(
  # layout() rejects a 'subtitle' attribute here, so the subtitle is folded
  # into the title text as a smaller second line.
  title = list(text = "Price vs Carat of Diamonds<br><sup>Every point represents a diamond, colored by clarity</sup>"),
  xaxis = list(title = "Carat"),
  yaxis = list(title = "Price"),
  hovermode = "closest"
)
plot1
#QUESTION 2.2
#Both Plotly and ggplot2 have their strengths and weaknesses. To begin with, I would
#say it is better to use Plotly when you want to build interactive features into
#the visualization. Plotly is typically better for web-based applications.
#ggplot2, I think, is better with larger data sets: I see it as better equipped
#to handle large, complex datasets in a static context. I will also say that
#ggplot2 has much more depth in terms of aesthetics. It is very customizable
#through layers, scales, and themes. Lastly, it works best with the tidyverse for
#both data manipulation and visualization, whereas Plotly struggles more with
#data transformations.
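The two libraries can also be combined. As a small sketch, assuming both packages are installed, a static ggplot2 chart can be built with layers and then passed to plotly's `ggplotly()` to get an interactive version. The 500-row sample and the object names `small`, `p`, and `interactive_p` are illustrative choices only.

```r
library(ggplot2)
library(plotly)

# Work on a small random sample so the interactive plot stays light.
set.seed(1)
small <- diamonds[sample(nrow(diamonds), 500), ]

# Static ggplot2 chart, styled with the usual layers.
p <- ggplot(small, aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.6) +
  labs(title = "Price vs Carat of Diamonds", x = "Carat", y = "Price")

# Convert the same chart into an interactive plotly object.
interactive_p <- ggplotly(p)
```

This keeps the ggplot2 styling while adding hover and zoom, although not every ggplot2 feature carries over unchanged.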
#QUESTION 2.3
#The variables being compared are life expectancy and time (year). In this graph
#we can see that countries whose GDP per capita increases (shown in the hover
#text) also show improvements in life expectancy. The narrative this plot shows
#is that economic development can be correlated with human well-being, seen
#primarily as an increase in life expectancy.
library(plotly)
library(gapminder)
filtered_data <- gapminder %>%
  filter(country %in% c("United States", "China", "India", "Germany", "Brazil"))
plot2 <- plot_ly(
  data = filtered_data,
  x = ~year,
  y = ~lifeExp,
  color = ~country,
  type = 'scatter',
  mode = 'lines+markers',
  hoverinfo = 'text',
  text = ~paste(
    "Country: ", country, "<br>",
    "Year: ", year, "<br>",
    "Life Expectancy: ", round(lifeExp, 1), "<br>",
    "GDP per Capita: $", round(gdpPercap, 0)
  )
) %>%
  layout(
    title = list(
      text = "Life Expectancy Over Time",
      x = 0.5
    ),
    xaxis = list(title = "Year"),
    yaxis = list(title = "Life Expectancy (Years)"),
    legend = list(title = list(text = "Country"))
  )
plot2
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
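The palette warning above probably comes from unused factor levels. `country` in gapminder is a factor, and `filter()` keeps every original level even after the rows are reduced to five countries, so the color mapping appears to request one color per level rather than per country shown. The base R sketch below uses a made-up factor to show the effect and how `droplevels()` removes the unused levels.

```r
# Made-up factor standing in for gapminder's country column.
country <- factor(c("Brazil", "China", "India", "Kenya", "Peru"))

# Subsetting keeps all five original levels, even though two values remain.
kept <- country[country %in% c("Brazil", "China")]
nlevels(kept)        # 5

# droplevels() discards the levels that no longer occur.
kept_clean <- droplevels(kept)
nlevels(kept_clean)  # 2
```

Applied to the plot code, this would mean appending `%>% droplevels()` to the `filtered_data` pipeline. That this silences the warning is an expectation and has not been run against this document.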