Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends if necessary) to rate each of these movies that they have seen on a scale of 1 to 5. Take the results (observations) and store them in a SQL database of your choosing. Load the information from the SQL database into an R dataframe.
The database to store the table with names, movies and ratings is to be created using SQL.
CREATE DATABASE rating;
USE rating;
CREATE TABLE rating (
name_response varchar(100),
movie_name varchar(100),
rating int
);
INSERT INTO rating
VALUES
-- ('Anna', 'MINIONS. THE RISE OF GRU', 3),
('Alex', 'MINIONS. THE RISE OF GRU', 2),
('Inna', 'MINIONS. THE RISE OF GRU', 4),
('Dan', 'MINIONS. THE RISE OF GRU', 3),
('Lucy', 'MINIONS. THE RISE OF GRU', NULL),
('Anna', 'JURASSIC WORLD DOMINION', 4),
('Alex', 'JURASSIC WORLD DOMINION', 3),
('Inna', 'JURASSIC WORLD DOMINION', NULL),
('Dan', 'JURASSIC WORLD DOMINION', 3),
('Lucy', 'JURASSIC WORLD DOMINION', 4),
('Anna', 'ROGUE ONE: A STAR WARS STORY', NULL),
('Alex', 'ROGUE ONE: A STAR WARS STORY', 5),
('Inna', 'ROGUE ONE: A STAR WARS STORY', 1),
('Dan', 'ROGUE ONE: A STAR WARS STORY', 3),
('Lucy', 'ROGUE ONE: A STAR WARS STORY', NULL),
('Anna', 'SPIDER-MAN: NO WAY HOME', 5),
('Alex', 'SPIDER-MAN: NO WAY HOME', 5),
('Inna', 'SPIDER-MAN: NO WAY HOME', 4),
('Dan', 'SPIDER-MAN: NO WAY HOME', 5),
('Lucy', 'SPIDER-MAN: NO WAY HOME', 4),
('Anna', 'ORPHAN: FIRST KILL', NULL),
('Alex', 'ORPHAN: FIRST KILL', 1),
('Inna', 'ORPHAN: FIRST KILL', NULL),
('Dan', 'ORPHAN: FIRST KILL', 2),
('Lucy', 'ORPHAN: FIRST KILL', NULL),
('Anna', 'ELVIS', 4),
('Alex', 'ELVIS', 3),
('Inna', 'ELVIS', 3),
('Dan', 'ELVIS', 4),
('Lucy', 'ELVIS', NULL);
To load SQL database using R, RMySQL library is used.
db = dbConnect(MySQL(), user='root', password = '336261', dbname='rating', host='localhost')
Once R is connected to the database, we can view and select necessary table from the database.
ratings <- dbGetQuery(db, 'SELECT * FROM rating')
Let’s see some information about the table and what’s inside to check if data was imported correctly.
summary(ratings)
## name_response movie_name rating
## Length:60 Length:60 Min. :1.000
## Class :character Class :character 1st Qu.:3.000
## Mode :character Mode :character Median :3.500
## Mean :3.409
## 3rd Qu.:4.000
## Max. :5.000
## NA's :16
head(ratings,n=5)
## name_response movie_name rating
## 1 Anna MINIONS. THE RISE OF GRU 3
## 2 Anna MINIONS. THE RISE OF GRU 3
## 3 Alex MINIONS. THE RISE OF GRU 2
## 4 Inna MINIONS. THE RISE OF GRU 4
## 5 Dan MINIONS. THE RISE OF GRU 3
From the data, we can gather some information about or movies and respondents.
Let’s check who didn’t watch watch which movie and didn’t provide rating as well as which movie has the least number of views. As we see, Lucy was the one who didn’t see almost all the movies. ORPHAN: FIRST KILL was the most unseen movie, Alex watched all six movies. SQL:
SELECT *
FROM rating
WHERE rating IS NULL;
R:
na_rating <- subset(ratings, is.na(ratings$rating))
na_rating
## name_response movie_name rating
## 6 Lucy MINIONS. THE RISE OF GRU NA
## 9 Inna JURASSIC WORLD DOMINION NA
## 12 Anna ROGUE ONE: A STAR WARS STORY NA
## 16 Lucy ROGUE ONE: A STAR WARS STORY NA
## 22 Anna ORPHAN: FIRST KILL NA
## 24 Inna ORPHAN: FIRST KILL NA
## 26 Lucy ORPHAN: FIRST KILL NA
## 31 Lucy ELVIS NA
## 35 Lucy MINIONS. THE RISE OF GRU NA
## 38 Inna JURASSIC WORLD DOMINION NA
## 41 Anna ROGUE ONE: A STAR WARS STORY NA
## 45 Lucy ROGUE ONE: A STAR WARS STORY NA
## 51 Anna ORPHAN: FIRST KILL NA
## 53 Inna ORPHAN: FIRST KILL NA
## 55 Lucy ORPHAN: FIRST KILL NA
## 60 Lucy ELVIS NA
Find average rating for each movie, put in descending order to see the best rated movie. We also need to ignore NULL values to make R calculate the average. It looks like SPIDER-MAN: NO WAY HOME is the best movie to watch at the current moment. SQL:
SELECT movie_name, AVG(rating) AS rating_avg
FROM rating
GROUP BY movie_name
ORDER BY rating DESC;
R:
ratings %>%
group_by(movie_name) %>%
summarise_at(vars(rating), list(avg_rating = mean), na.rm=TRUE) %>%
arrange(desc(avg_rating))
## # A tibble: 6 x 2
## movie_name avg_rating
## <chr> <dbl>
## 1 SPIDER-MAN: NO WAY HOME 4.6
## 2 ELVIS 3.5
## 3 JURASSIC WORLD DOMINION 3.5
## 4 MINIONS. THE RISE OF GRU 3
## 5 ROGUE ONE: A STAR WARS STORY 3
## 6 ORPHAN: FIRST KILL 1.5
Disconnect from database:
on.exit(dbDisconnect(db))
Using above steps, we have connected to the SQL database and imported it to R data frame. Using both, SQL and R, we did some research on the data: the best rated movie is Spider-Man: No Way Home, don’t watch Orphan: First Kill, Lucy doesn’t like to watch movies, Alex is in trend and watched all latest films.