The packages used in these notebooks are the RMariaDB and DBI. Additionally, since I typically use one mysql database mydb on my local machine (a Windows one) I included a setup code in my my.ini file, see code below.

RMariaDB setup

[rs-dbi]
database=“mydb”
user=“root”
password=“your password”

You can also check this link here for your setup if needed.

Loading Required Packages and making our Mysql connection

library(DBI)
## Warning: package 'DBI' was built under R version 4.1.3
library(RMariaDB)
## Warning: package 'RMariaDB' was built under R version 4.1.3
con <- dbConnect(RMariaDB::MariaDB(), default.file=normalizePath(path="C:/ProgramData/MySQL/MySQL Server 8.0/my.ini"), group = "rs-dbi")

Checking list of tables in the database

dbListTables(con)
## [1] "airbnb_nyc" "imdbtop250"

I also checked the Fields in the table imdbtop250

dbListFields(con, "imdbtop250")
##  [1] "Ranking"  "IMDByear" "IMDBlink" "Title"    "Date"     "RunTime" 
##  [7] "Genre"    "Rating"   "Score"    "Votes"    "Gross"    "Director"
## [13] "Cast1"    "Cast2"    "Cast3"    "Cast4"

Queries on the imdbtop250 table

Query 1
sql1 <- "SELECT * FROM imdbtop250 limit 10"
res1 <- dbSendQuery(con,sql1)
dbFetch(res1)
##    Ranking IMDByear          IMDBlink                              Title Date
## 1        1     1996 /title/tt0076759/ Star Wars: Episode IV - A New Hope 1977
## 2        2     1996 /title/tt0111161/           The Shawshank Redemption 1994
## 3        3     1996 /title/tt0117951/                      Trainspotting 1996
## 4        4     1996 /title/tt0114814/                 The Usual Suspects 1995
## 5        5     1996 /title/tt0108598/                 The Wrong Trousers 1993
## 6        6     1996 /title/tt0112691/                      A Close Shave 1995
## 7        7     1996 /title/tt0117731/           Star Trek: First Contact 1996
## 8        8     1996 /title/tt0034583/                         Casablanca 1942
## 9        9     1996 /title/tt0083658/                       Blade Runner 1982
## 10      10     1996 /title/tt0108052/                   Schindler's List 1993
##    RunTime                      Genre Rating Score   Votes  Gross
## 1      121 Action, Adventure, Fantasy    8.6    90 1299781 322.74
## 2      142                      Drama    9.3    80 2529673  28.34
## 3       93                      Drama    8.1    83  665213  16.50
## 4      106      Crime, Drama, Mystery    8.5    77 1045626  23.34
## 5       30   Animation, Short, Comedy    8.3    NA   53316     NA
## 6       30   Animation, Short, Comedy    8.1    NA   40586     NA
## 7      111   Action, Adventure, Drama    7.6    71  122819  92.00
## 8      102        Drama, Romance, War    8.5   100  551575   1.02
## 9      117      Action, Drama, Sci-Fi    8.1    84  736925  32.87
## 10     195  Biography, Drama, History    8.9    94 1292510  96.90
##            Director            Cast1             Cast2              Cast3
## 1      George Lucas      Mark Hamill     Harrison Ford      Carrie Fisher
## 2    Frank Darabont      Tim Robbins    Morgan Freeman         Bob Gunton
## 3       Danny Boyle    Ewan McGregor      Ewen Bremner   Jonny Lee Miller
## 4      Bryan Singer     Kevin Spacey     Gabriel Byrne   Chazz Palminteri
## 5         Nick Park     Peter Sallis     Peter Hawkins               <NA>
## 6         Nick Park     Peter Sallis         Anne Reid    Justin Fletcher
## 7   Jonathan Frakes  Patrick Stewart   Jonathan Frakes       Brent Spiner
## 8    Michael Curtiz  Humphrey Bogart    Ingrid Bergman       Paul Henreid
## 9      Ridley Scott    Harrison Ford      Rutger Hauer         Sean Young
## 10 Steven Spielberg      Liam Neeson     Ralph Fiennes       Ben Kingsley
##                   Cast4
## 1         Alec Guinness
## 2        William Sadler
## 3          Kevin McKidd
## 4       Stephen Baldwin
## 5                  <NA>
## 6         Peter Hawkins
## 7          LeVar Burton
## 8          Claude Rains
## 9    Edward James Olmos
## 10     Caroline Goodall
dbClearResult(res1)
Query 2

The query below identifies the distinct movie titles that containing the word star and the rating in descending order of rating.

Note: Since a Movie title could still be in the top 250 for a different IMDByear, the keyword distinct is important.

sql2 <- "SELECT DISTINCT Title, Rating FROM imdbtop250 WHERE Title LIKE '%star%' ORDER BY Rating DESC"
res2 <- dbSendQuery(con,sql2)
dbFetch(res2)
##                                             Title Rating
## 1  Star Wars: Episode V - The Empire Strikes Back    8.7
## 2              Star Wars: Episode IV - A New Hope    8.6
## 3                             Like Stars on Earth    8.4
## 4      Star Wars: Episode VI - Return of the Jedi    8.3
## 5                                       Star Trek    7.9
## 6                    Rogue One: A Star Wars Story    7.8
## 7      Star Wars: Episode VII - The Force Awakens    7.8
## 8                        Star Trek: First Contact    7.6
## 9    Star Wars: Episode III - Revenge of the Sith    7.5
## 10                                      Lone Star    7.4
dbClearResult(res2)
Query 3

The next query gets the first 10 distinct Action movie titles with a rating greater than 8

sql3 <- "SELECT DISTINCT Title, Genre, Rating FROM imdbtop250 WHERE Genre LIKE '%action%' AND Rating > 8 limit 10"
res3 <- dbSendQuery(con,sql3)
dbFetch(res3)
##                                             Title                      Genre
## 1              Star Wars: Episode IV - A New Hope Action, Adventure, Fantasy
## 2                                    Blade Runner      Action, Drama, Sci-Fi
## 3                                   Seven Samurai              Action, Drama
## 4  Star Wars: Episode V - The Empire Strikes Back Action, Adventure, Fantasy
## 5                          Léon: The Professional       Action, Crime, Drama
## 6   Indiana Jones and the Raiders of the Lost Ark          Action, Adventure
## 7                                          Aliens  Action, Adventure, Sci-Fi
## 8      Star Wars: Episode VI - Return of the Jedi Action, Adventure, Fantasy
## 9                                            Heat       Action, Crime, Drama
## 10                                            Ran         Action, Drama, War
##    Rating
## 1     8.6
## 2     8.1
## 3     8.6
## 4     8.7
## 5     8.5
## 6     8.4
## 7     8.3
## 8     8.3
## 9     8.2
## 10    8.2
dbClearResult(res3)
Query 4

Select all distinct Movies titles in which Harrison Ford stars as a cast

sql4 <- "SELECT DISTINCT Title, Cast1, Cast2, Cast3, Cast4 FROM imdbtop250 WHERE Cast1 LIKE '%Harrison Ford%' OR Cast2 LIKE '%Harrison Ford%' OR Cast3 LIKE '%Harrison Ford%' OR Cast4 LIKE '%Harrison Ford%' ORDER BY Title"
res4 <- dbSendQuery(con,sql4)
dbFetch(res4)
##                                            Title          Cast1
## 1                                   Blade Runner  Harrison Ford
## 2                              Blade Runner 2049  Harrison Ford
## 3             Indiana Jones and the Last Crusade  Harrison Ford
## 4  Indiana Jones and the Raiders of the Lost Ark  Harrison Ford
## 5             Star Wars: Episode IV - A New Hope    Mark Hamill
## 6 Star Wars: Episode V - The Empire Strikes Back    Mark Hamill
## 7     Star Wars: Episode VI - Return of the Jedi    Mark Hamill
## 8                                   The Fugitive  Harrison Ford
## 9                                        Witness  Harrison Ford
##               Cast2           Cast3                Cast4
## 1      Rutger Hauer      Sean Young   Edward James Olmos
## 2      Ryan Gosling    Ana de Armas        Dave Bautista
## 3      Sean Connery    Alison Doody      Denholm Elliott
## 4       Karen Allen    Paul Freeman     John Rhys-Davies
## 5     Harrison Ford   Carrie Fisher        Alec Guinness
## 6     Harrison Ford   Carrie Fisher   Billy Dee Williams
## 7     Harrison Ford   Carrie Fisher   Billy Dee Williams
## 8   Tommy Lee Jones       Sela Ward       Julianne Moore
## 9    Kelly McGillis      Lukas Haas         Josef Sommer
dbClearResult(res4)

Aggregation Functions “MIN(), MAX(), COUNT(), AVG(), SUM()”

Function 1

The Movie Title(s) with the least Gross

fun1 <- "SELECT DISTINCT Title, Gross FROM imdbtop250 WHERE Gross = (SELECT Min(Gross) FROM imdbtop250)"
agg1 <- dbSendQuery(con,fun1)
dbFetch(agg1)
##                Title Gross
## 1      All About Eve  0.01
## 2  Dial M for Murder  0.01
## 3 Memories of Murder  0.01
## 4              Udaan  0.01
dbClearResult(agg1)
Function 2

The Movie Title(s) with the highest Rating

fun2 <- "SELECT DISTINCT Title, Rating FROM imdbtop250 WHERE Rating = (SELECT MAX(Rating) FROM imdbtop250)"
agg2 <- dbSendQuery(con,fun2)
dbFetch(agg2)
##                      Title Rating
## 1 The Shawshank Redemption    9.3
## 2                 Jai Bhim    9.3
dbClearResult(agg2)
Function 3

Number of times a Movie Title appears in the chart

fun3 <- "SELECT Title, COUNT(*) AS `Number of Times Movie Title appears in Chart` FROM imdbtop250 GROUP BY Title limit 20"
agg3 <- dbSendQuery(con,fun3)
dbFetch(agg3)
##                                                                   Title
## 1                                    Star Wars: Episode IV - A New Hope
## 2                                              The Shawshank Redemption
## 3                                                         Trainspotting
## 4                                                    The Usual Suspects
## 5                                                    The Wrong Trousers
## 6                                                         A Close Shave
## 7                                              Star Trek: First Contact
## 8                                                            Casablanca
## 9                                                          Blade Runner
## 10                                                     Schindler's List
## 11                                                         Citizen Kane
## 12                                                         Pulp Fiction
## 13 Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb
## 14                                                        The Godfather
## 15                                                  The English Patient
## 16                                                           Braveheart
## 17                                                      Beautiful Thing
## 18                                                            Lone Star
## 19                                                            Toy Story
## 20                                      One Flew Over the Cuckoo's Nest
##    Number of Times Movie Title appears in Chart
## 1                                            26
## 2                                            26
## 3                                            26
## 4                                            26
## 5                                             5
## 6                                             5
## 7                                             1
## 8                                            26
## 9                                            26
## 10                                           26
## 11                                           26
## 12                                           26
## 13                                           26
## 14                                           26
## 15                                            2
## 16                                           26
## 17                                            1
## 18                                            3
## 19                                           26
## 20                                           26
dbClearResult(agg3)
Function 4

Average Rating and Total Gross in USD millions per IMDByear

fun4 <- "SELECT IMDByear, ROUND(AVG(Rating),2) AS `Average Rating`, ROUND(SUM(Gross),2) AS `Total Gross in USD millions` FROM imdbtop250 GROUP BY IMDByear"
agg4 <- dbSendQuery(con,fun4)
dbFetch(agg4)
##    IMDByear Average Rating Total Gross in USD millions
## 1      1996           7.83                    10371.94
## 2      1997           7.94                    12837.39
## 3      1998           7.98                    12817.70
## 4      1999           8.07                    11181.34
## 5      2000           8.09                    11435.71
## 6      2001           8.09                    12692.43
## 7      2002           8.11                    12859.91
## 8      2003           8.13                    14004.74
## 9      2004           8.15                    12399.39
## 10     2005           8.16                    13195.06
## 11     2006           8.18                    12967.02
## 12     2007           8.19                    13016.98
## 13     2008           8.19                    13595.25
## 14     2009           8.20                    16400.35
## 15     2010           8.22                    16552.35
## 16     2011           8.23                    17497.85
## 17     2012           8.23                    18737.71
## 18     2013           8.25                    18159.90
## 19     2014           8.26                    18158.57
## 20     2015           8.25                    19947.59
## 21     2016           8.27                    19177.97
## 22     2017           8.27                    18346.65
## 23     2018           8.28                    18789.37
## 24     2019           8.29                    19520.70
## 25     2020           8.30                    18958.73
## 26     2021           8.31                    18785.83
dbClearResult(agg4)

Reading Mysql table as a data frame in R

In the code chunk below, I read the Mysql as a R data frame and I also disconnected from the Mysql connection.

df <- data.frame(dbReadTable(con, "imdbtop250"))
dbDisconnect(con)

checking the structure of the data frame.

str(df)
## 'data.frame':    6500 obs. of  16 variables:
##  $ Ranking : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ IMDByear: chr  "1996" "1996" "1996" "1996" ...
##  $ IMDBlink: chr  "/title/tt0076759/" "/title/tt0111161/" "/title/tt0117951/" "/title/tt0114814/" ...
##  $ Title   : chr  "Star Wars: Episode IV - A New Hope" "The Shawshank Redemption" "Trainspotting" "The Usual Suspects" ...
##  $ Date    : chr  "1977" "1994" "1996" "1995" ...
##  $ RunTime : int  121 142 93 106 30 30 111 102 117 195 ...
##  $ Genre   : chr  "Action, Adventure, Fantasy" "Drama" "Drama" "Crime, Drama, Mystery" ...
##  $ Rating  : num  8.6 9.3 8.1 8.5 8.3 8.1 7.6 8.5 8.1 8.9 ...
##  $ Score   : int  90 80 83 77 NA NA 71 100 84 94 ...
##  $ Votes   : int  1299781 2529673 665213 1045626 53316 40586 122819 551575 736925 1292510 ...
##  $ Gross   : num  322.7 28.3 16.5 23.3 NA ...
##  $ Director: chr  "George Lucas" "Frank Darabont" "Danny Boyle" "Bryan Singer" ...
##  $ Cast1   : chr  " Mark Hamill" " Tim Robbins" " Ewan McGregor" " Kevin Spacey" ...
##  $ Cast2   : chr  "  Harrison Ford" "  Morgan Freeman" "  Ewen Bremner" "  Gabriel Byrne" ...
##  $ Cast3   : chr  "  Carrie Fisher" "  Bob Gunton" "  Jonny Lee Miller" "  Chazz Palminteri" ...
##  $ Cast4   : chr  "  Alec Guinness" "  William Sadler" "  Kevin McKidd" "  Stephen Baldwin" ...

checking the first 5 rows of the R data frame

head(df,5)
##   Ranking IMDByear          IMDBlink                              Title Date
## 1       1     1996 /title/tt0076759/ Star Wars: Episode IV - A New Hope 1977
## 2       2     1996 /title/tt0111161/           The Shawshank Redemption 1994
## 3       3     1996 /title/tt0117951/                      Trainspotting 1996
## 4       4     1996 /title/tt0114814/                 The Usual Suspects 1995
## 5       5     1996 /title/tt0108598/                 The Wrong Trousers 1993
##   RunTime                      Genre Rating Score   Votes  Gross       Director
## 1     121 Action, Adventure, Fantasy    8.6    90 1299781 322.74   George Lucas
## 2     142                      Drama    9.3    80 2529673  28.34 Frank Darabont
## 3      93                      Drama    8.1    83  665213  16.50    Danny Boyle
## 4     106      Crime, Drama, Mystery    8.5    77 1045626  23.34   Bryan Singer
## 5      30   Animation, Short, Comedy    8.3    NA   53316     NA      Nick Park
##            Cast1            Cast2              Cast3             Cast4
## 1    Mark Hamill    Harrison Ford      Carrie Fisher     Alec Guinness
## 2    Tim Robbins   Morgan Freeman         Bob Gunton    William Sadler
## 3  Ewan McGregor     Ewen Bremner   Jonny Lee Miller      Kevin McKidd
## 4   Kevin Spacey    Gabriel Byrne   Chazz Palminteri   Stephen Baldwin
## 5   Peter Sallis    Peter Hawkins               <NA>              <NA>

References

  1. https://www.rdocumentation.org/packages/RMariaDB/versions/1.2.2/topics/MariaDB
  2. https://rmariadb.r-dbi.org/