Make sure ‘movies’ and ‘movie_ratings’ tables present in the ‘movies’ database
mydb = dbConnect(MySQL(), user='root', password='mysql@123', dbname='movies', host='localhost')
movie_ratings <- dbGetQuery(mydb, "select * from movie_ratings")
movies <- dbGetQuery(mydb, "select * from movies")
head(movie_ratings)
## movie_id viewer_name rating
## 1 1001 John Antony 3
## 2 1002 John Antony NA
## 3 1003 John Antony 4
## 4 1004 John Antony 4
## 5 1005 John Antony 3
## 6 1006 John Antony 2
head(movies)
## movie_id movie_name movie_type year
## 1 1001 Womb Ghosts Horror 2010
## 2 1002 Paranormal Activity: The Ghost Dimension Horror 2015
## 3 1003 Ex Machine-1 Sci-Fi 2016
## 4 1004 Matrix Sci-Fi 1999
## 5 1005 Gravity Sci-Fi 2013
## 6 1006 Incredibles Animation 2017
group_by function seprates data frame into different group based on the specified columns. This object doesn’t seem useful by itself, the real power comes when we use this with other function like summarise which does aggregation
Note: summarise() takes the data frame and apply one or more aggregation function. For eg: avg_rating is the mean aggregation.
group_by() takes the data frame and create group based on specified columns with() uses to order a dataframe by a specified column
movie_ratings_join <- inner_join(movie_ratings, movies, by = c("movie_id" = "movie_id"))
#group by movie_name and movie_type
grp_movie_name_type <- group_by(movie_ratings_join, movie_name, movie_type)
#top rated movie df
top_rated_movie_name <- summarise(grp_movie_name_type, avg_rating=mean(rating, na.rm=TRUE))
#top rated 3 movies( slice operator using with() and order() function )
top_rated_movies <- head(top_rated_movie_name[with(top_rated_movie_name, order(-avg_rating)),],3)
htmlTable(top_rated_movies, caption = 'Top rated Movies')
| Top rated Movies | |||
| movie_name | movie_type | avg_rating | |
|---|---|---|---|
| 1 | Coco | Animation | 3.42857142857143 |
| 2 | Matrix | Sci-Fi | 3 |
| 3 | Gravity | Sci-Fi | 2.5 |