Reading the movie rating data from Survey into Mysql db and then into R


Overview of the solution

Mysql database has two tables Movies and Survey

Movies

-movie id (primary key)

-movie name

Survey

-name of the participant

-age group

-gender

-movie id (Reference key)

-rating

Survey table has entry for each rating the participant voted for.

In order to handle the missing data, we left join movies table with survey so that we have all entries. We apply omit so that we donโ€™t include null values.

We plot the result on a graph with movie and mean of rating. Highest mean movie is the favorite among the participants and lowest being less favourite.

library(RMariaDB)
library(ggplot2)
survey <- dbConnect(RMariaDB::MariaDB(), user='root', password='My$ql@123', dbname='survey', host='localhost')
## Left join movies table in order to get the missing data
survey <- dbGetQuery(survey, "select s.participant,s.age_grp,s.age_grp,s.rating, m.name Movie from movies m
left join  survey s
on m.id=s.movie_id
order by 5")
## Handling of missing data in the movie rating by using omit
survey <- na.omit(survey)
## View the data having the complete rating data
View(survey)

## Plot the data on a graph with movies and mean(rating)
ggplot(survey) + geom_bar(aes(Movie, rating, fill=Movie), stat="summary", fun="mean") + labs(x="Movies", y="Avg.Ratings", title="") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


Conclusion

print("Based on the rating from my friends for the top 5 movies released in 2019, PARASITE movie has the highest rating and Marriage story has the lowest rating")
## [1] "Based on the rating from my friends for the top 5 movies released in 2019, PARASITE movie has the highest rating and Marriage story has the lowest rating"