The packages used in these notebooks are the RMariaDB and DBI. Additionally, since I typically use one mysql database mydb on my local machine (a Windows one) I included a setup code in my my.ini file, see code below.
[rs-dbi]
database=“mydb”
user=“root”
password=“your password”
You can also check this link here for your setup if needed.
library(DBI)
## Warning: package 'DBI' was built under R version 4.1.3
library(RMariaDB)
## Warning: package 'RMariaDB' was built under R version 4.1.3
con <- dbConnect(RMariaDB::MariaDB(), default.file=normalizePath(path="C:/ProgramData/MySQL/MySQL Server 8.0/my.ini"), group = "rs-dbi")
dbListTables(con)
## [1] "airbnb_nyc" "imdbtop250"
I also checked the Fields in the table imdbtop250
dbListFields(con, "imdbtop250")
## [1] "Ranking" "IMDByear" "IMDBlink" "Title" "Date" "RunTime"
## [7] "Genre" "Rating" "Score" "Votes" "Gross" "Director"
## [13] "Cast1" "Cast2" "Cast3" "Cast4"
sql1 <- "SELECT * FROM imdbtop250 limit 10"
res1 <- dbSendQuery(con,sql1)
dbFetch(res1)
## Ranking IMDByear IMDBlink Title Date
## 1 1 1996 /title/tt0076759/ Star Wars: Episode IV - A New Hope 1977
## 2 2 1996 /title/tt0111161/ The Shawshank Redemption 1994
## 3 3 1996 /title/tt0117951/ Trainspotting 1996
## 4 4 1996 /title/tt0114814/ The Usual Suspects 1995
## 5 5 1996 /title/tt0108598/ The Wrong Trousers 1993
## 6 6 1996 /title/tt0112691/ A Close Shave 1995
## 7 7 1996 /title/tt0117731/ Star Trek: First Contact 1996
## 8 8 1996 /title/tt0034583/ Casablanca 1942
## 9 9 1996 /title/tt0083658/ Blade Runner 1982
## 10 10 1996 /title/tt0108052/ Schindler's List 1993
## RunTime Genre Rating Score Votes Gross
## 1 121 Action, Adventure, Fantasy 8.6 90 1299781 322.74
## 2 142 Drama 9.3 80 2529673 28.34
## 3 93 Drama 8.1 83 665213 16.50
## 4 106 Crime, Drama, Mystery 8.5 77 1045626 23.34
## 5 30 Animation, Short, Comedy 8.3 NA 53316 NA
## 6 30 Animation, Short, Comedy 8.1 NA 40586 NA
## 7 111 Action, Adventure, Drama 7.6 71 122819 92.00
## 8 102 Drama, Romance, War 8.5 100 551575 1.02
## 9 117 Action, Drama, Sci-Fi 8.1 84 736925 32.87
## 10 195 Biography, Drama, History 8.9 94 1292510 96.90
## Director Cast1 Cast2 Cast3
## 1 George Lucas Mark Hamill Harrison Ford Carrie Fisher
## 2 Frank Darabont Tim Robbins Morgan Freeman Bob Gunton
## 3 Danny Boyle Ewan McGregor Ewen Bremner Jonny Lee Miller
## 4 Bryan Singer Kevin Spacey Gabriel Byrne Chazz Palminteri
## 5 Nick Park Peter Sallis Peter Hawkins <NA>
## 6 Nick Park Peter Sallis Anne Reid Justin Fletcher
## 7 Jonathan Frakes Patrick Stewart Jonathan Frakes Brent Spiner
## 8 Michael Curtiz Humphrey Bogart Ingrid Bergman Paul Henreid
## 9 Ridley Scott Harrison Ford Rutger Hauer Sean Young
## 10 Steven Spielberg Liam Neeson Ralph Fiennes Ben Kingsley
## Cast4
## 1 Alec Guinness
## 2 William Sadler
## 3 Kevin McKidd
## 4 Stephen Baldwin
## 5 <NA>
## 6 Peter Hawkins
## 7 LeVar Burton
## 8 Claude Rains
## 9 Edward James Olmos
## 10 Caroline Goodall
dbClearResult(res1)
The query below identifies the distinct movie titles that containing the word star and the rating in descending order of rating.
Note: Since a Movie title could still be in the top 250 for a different IMDByear, the keyword distinct is important.
sql2 <- "SELECT DISTINCT Title, Rating FROM imdbtop250 WHERE Title LIKE '%star%' ORDER BY Rating DESC"
res2 <- dbSendQuery(con,sql2)
dbFetch(res2)
## Title Rating
## 1 Star Wars: Episode V - The Empire Strikes Back 8.7
## 2 Star Wars: Episode IV - A New Hope 8.6
## 3 Like Stars on Earth 8.4
## 4 Star Wars: Episode VI - Return of the Jedi 8.3
## 5 Star Trek 7.9
## 6 Rogue One: A Star Wars Story 7.8
## 7 Star Wars: Episode VII - The Force Awakens 7.8
## 8 Star Trek: First Contact 7.6
## 9 Star Wars: Episode III - Revenge of the Sith 7.5
## 10 Lone Star 7.4
dbClearResult(res2)
The next query gets the first 10 distinct Action movie titles with a rating greater than 8
sql3 <- "SELECT DISTINCT Title, Genre, Rating FROM imdbtop250 WHERE Genre LIKE '%action%' AND Rating > 8 limit 10"
res3 <- dbSendQuery(con,sql3)
dbFetch(res3)
## Title Genre
## 1 Star Wars: Episode IV - A New Hope Action, Adventure, Fantasy
## 2 Blade Runner Action, Drama, Sci-Fi
## 3 Seven Samurai Action, Drama
## 4 Star Wars: Episode V - The Empire Strikes Back Action, Adventure, Fantasy
## 5 Léon: The Professional Action, Crime, Drama
## 6 Indiana Jones and the Raiders of the Lost Ark Action, Adventure
## 7 Aliens Action, Adventure, Sci-Fi
## 8 Star Wars: Episode VI - Return of the Jedi Action, Adventure, Fantasy
## 9 Heat Action, Crime, Drama
## 10 Ran Action, Drama, War
## Rating
## 1 8.6
## 2 8.1
## 3 8.6
## 4 8.7
## 5 8.5
## 6 8.4
## 7 8.3
## 8 8.3
## 9 8.2
## 10 8.2
dbClearResult(res3)
Select all distinct Movies titles in which Harrison Ford stars as a cast
sql4 <- "SELECT DISTINCT Title, Cast1, Cast2, Cast3, Cast4 FROM imdbtop250 WHERE Cast1 LIKE '%Harrison Ford%' OR Cast2 LIKE '%Harrison Ford%' OR Cast3 LIKE '%Harrison Ford%' OR Cast4 LIKE '%Harrison Ford%' ORDER BY Title"
res4 <- dbSendQuery(con,sql4)
dbFetch(res4)
## Title Cast1
## 1 Blade Runner Harrison Ford
## 2 Blade Runner 2049 Harrison Ford
## 3 Indiana Jones and the Last Crusade Harrison Ford
## 4 Indiana Jones and the Raiders of the Lost Ark Harrison Ford
## 5 Star Wars: Episode IV - A New Hope Mark Hamill
## 6 Star Wars: Episode V - The Empire Strikes Back Mark Hamill
## 7 Star Wars: Episode VI - Return of the Jedi Mark Hamill
## 8 The Fugitive Harrison Ford
## 9 Witness Harrison Ford
## Cast2 Cast3 Cast4
## 1 Rutger Hauer Sean Young Edward James Olmos
## 2 Ryan Gosling Ana de Armas Dave Bautista
## 3 Sean Connery Alison Doody Denholm Elliott
## 4 Karen Allen Paul Freeman John Rhys-Davies
## 5 Harrison Ford Carrie Fisher Alec Guinness
## 6 Harrison Ford Carrie Fisher Billy Dee Williams
## 7 Harrison Ford Carrie Fisher Billy Dee Williams
## 8 Tommy Lee Jones Sela Ward Julianne Moore
## 9 Kelly McGillis Lukas Haas Josef Sommer
dbClearResult(res4)
The Movie Title(s) with the least Gross
fun1 <- "SELECT DISTINCT Title, Gross FROM imdbtop250 WHERE Gross = (SELECT Min(Gross) FROM imdbtop250)"
agg1 <- dbSendQuery(con,fun1)
dbFetch(agg1)
## Title Gross
## 1 All About Eve 0.01
## 2 Dial M for Murder 0.01
## 3 Memories of Murder 0.01
## 4 Udaan 0.01
dbClearResult(agg1)
The Movie Title(s) with the highest Rating
fun2 <- "SELECT DISTINCT Title, Rating FROM imdbtop250 WHERE Rating = (SELECT MAX(Rating) FROM imdbtop250)"
agg2 <- dbSendQuery(con,fun2)
dbFetch(agg2)
## Title Rating
## 1 The Shawshank Redemption 9.3
## 2 Jai Bhim 9.3
dbClearResult(agg2)
Number of times a Movie Title appears in the chart
fun3 <- "SELECT Title, COUNT(*) AS `Number of Times Movie Title appears in Chart` FROM imdbtop250 GROUP BY Title limit 20"
agg3 <- dbSendQuery(con,fun3)
dbFetch(agg3)
## Title
## 1 Star Wars: Episode IV - A New Hope
## 2 The Shawshank Redemption
## 3 Trainspotting
## 4 The Usual Suspects
## 5 The Wrong Trousers
## 6 A Close Shave
## 7 Star Trek: First Contact
## 8 Casablanca
## 9 Blade Runner
## 10 Schindler's List
## 11 Citizen Kane
## 12 Pulp Fiction
## 13 Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb
## 14 The Godfather
## 15 The English Patient
## 16 Braveheart
## 17 Beautiful Thing
## 18 Lone Star
## 19 Toy Story
## 20 One Flew Over the Cuckoo's Nest
## Number of Times Movie Title appears in Chart
## 1 26
## 2 26
## 3 26
## 4 26
## 5 5
## 6 5
## 7 1
## 8 26
## 9 26
## 10 26
## 11 26
## 12 26
## 13 26
## 14 26
## 15 2
## 16 26
## 17 1
## 18 3
## 19 26
## 20 26
dbClearResult(agg3)
Average Rating and Total Gross in USD millions per IMDByear
fun4 <- "SELECT IMDByear, ROUND(AVG(Rating),2) AS `Average Rating`, ROUND(SUM(Gross),2) AS `Total Gross in USD millions` FROM imdbtop250 GROUP BY IMDByear"
agg4 <- dbSendQuery(con,fun4)
dbFetch(agg4)
## IMDByear Average Rating Total Gross in USD millions
## 1 1996 7.83 10371.94
## 2 1997 7.94 12837.39
## 3 1998 7.98 12817.70
## 4 1999 8.07 11181.34
## 5 2000 8.09 11435.71
## 6 2001 8.09 12692.43
## 7 2002 8.11 12859.91
## 8 2003 8.13 14004.74
## 9 2004 8.15 12399.39
## 10 2005 8.16 13195.06
## 11 2006 8.18 12967.02
## 12 2007 8.19 13016.98
## 13 2008 8.19 13595.25
## 14 2009 8.20 16400.35
## 15 2010 8.22 16552.35
## 16 2011 8.23 17497.85
## 17 2012 8.23 18737.71
## 18 2013 8.25 18159.90
## 19 2014 8.26 18158.57
## 20 2015 8.25 19947.59
## 21 2016 8.27 19177.97
## 22 2017 8.27 18346.65
## 23 2018 8.28 18789.37
## 24 2019 8.29 19520.70
## 25 2020 8.30 18958.73
## 26 2021 8.31 18785.83
dbClearResult(agg4)
In the code chunk below, I read the Mysql as a R data frame and I also disconnected from the Mysql connection.
df <- data.frame(dbReadTable(con, "imdbtop250"))
dbDisconnect(con)
checking the structure of the data frame.
str(df)
## 'data.frame': 6500 obs. of 16 variables:
## $ Ranking : int 1 2 3 4 5 6 7 8 9 10 ...
## $ IMDByear: chr "1996" "1996" "1996" "1996" ...
## $ IMDBlink: chr "/title/tt0076759/" "/title/tt0111161/" "/title/tt0117951/" "/title/tt0114814/" ...
## $ Title : chr "Star Wars: Episode IV - A New Hope" "The Shawshank Redemption" "Trainspotting" "The Usual Suspects" ...
## $ Date : chr "1977" "1994" "1996" "1995" ...
## $ RunTime : int 121 142 93 106 30 30 111 102 117 195 ...
## $ Genre : chr "Action, Adventure, Fantasy" "Drama" "Drama" "Crime, Drama, Mystery" ...
## $ Rating : num 8.6 9.3 8.1 8.5 8.3 8.1 7.6 8.5 8.1 8.9 ...
## $ Score : int 90 80 83 77 NA NA 71 100 84 94 ...
## $ Votes : int 1299781 2529673 665213 1045626 53316 40586 122819 551575 736925 1292510 ...
## $ Gross : num 322.7 28.3 16.5 23.3 NA ...
## $ Director: chr "George Lucas" "Frank Darabont" "Danny Boyle" "Bryan Singer" ...
## $ Cast1 : chr " Mark Hamill" " Tim Robbins" " Ewan McGregor" " Kevin Spacey" ...
## $ Cast2 : chr " Harrison Ford" " Morgan Freeman" " Ewen Bremner" " Gabriel Byrne" ...
## $ Cast3 : chr " Carrie Fisher" " Bob Gunton" " Jonny Lee Miller" " Chazz Palminteri" ...
## $ Cast4 : chr " Alec Guinness" " William Sadler" " Kevin McKidd" " Stephen Baldwin" ...
checking the first 5 rows of the R data frame
head(df,5)
## Ranking IMDByear IMDBlink Title Date
## 1 1 1996 /title/tt0076759/ Star Wars: Episode IV - A New Hope 1977
## 2 2 1996 /title/tt0111161/ The Shawshank Redemption 1994
## 3 3 1996 /title/tt0117951/ Trainspotting 1996
## 4 4 1996 /title/tt0114814/ The Usual Suspects 1995
## 5 5 1996 /title/tt0108598/ The Wrong Trousers 1993
## RunTime Genre Rating Score Votes Gross Director
## 1 121 Action, Adventure, Fantasy 8.6 90 1299781 322.74 George Lucas
## 2 142 Drama 9.3 80 2529673 28.34 Frank Darabont
## 3 93 Drama 8.1 83 665213 16.50 Danny Boyle
## 4 106 Crime, Drama, Mystery 8.5 77 1045626 23.34 Bryan Singer
## 5 30 Animation, Short, Comedy 8.3 NA 53316 NA Nick Park
## Cast1 Cast2 Cast3 Cast4
## 1 Mark Hamill Harrison Ford Carrie Fisher Alec Guinness
## 2 Tim Robbins Morgan Freeman Bob Gunton William Sadler
## 3 Ewan McGregor Ewen Bremner Jonny Lee Miller Kevin McKidd
## 4 Kevin Spacey Gabriel Byrne Chazz Palminteri Stephen Baldwin
## 5 Peter Sallis Peter Hawkins <NA> <NA>