I selected some of the most popular movies of this year and asked my friends and family to rate them on a scale of 1 to 5. I then created a MySQL database schema named ‘movie rating’ containing a table named ‘mr’. To connect to MySQL, I chose the RMySQL package.
library(RMySQL)
mydb <- dbConnect(MySQL(), user='root', password='rwl25574', dbname='movie rating', host='localhost')
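The call above keeps the password in the script itself. As a minimal sketch of one alternative, the credentials could be read from environment variables instead. The variable names MYSQL_USER and MYSQL_PASSWORD and the helper connect_movie_db are hypothetical and not part of the original setup.

```r
# Sketch only: MYSQL_USER and MYSQL_PASSWORD are hypothetical environment
# variable names; set them (for example in ~/.Renviron) before calling.
connect_movie_db <- function() {
  DBI::dbConnect(
    RMySQL::MySQL(),
    user     = Sys.getenv("MYSQL_USER"),
    password = Sys.getenv("MYSQL_PASSWORD"),
    dbname   = "movie rating",
    host     = "localhost"
  )
}

# Usage (needs a running MySQL server and the RMySQL package installed):
# mydb <- connect_movie_db()
# mr   <- DBI::dbGetQuery(mydb, "SELECT * FROM mr")
# DBI::dbDisconnect(mydb)
```

Wrapping the connection in a function means nothing touches the database until the function is called, and the closing dbDisconnect releases the connection once the query is done.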
After connecting to the database, I queried the table ‘mr’ using dbGetQuery and saved the result to a variable named ‘mr’. Let's look at a summary of the data frame as well. The summary is useful because it shows that the data frame contains many missing values.
mr <- dbGetQuery(mydb, "SELECT * FROM mr")
mr <- as.data.frame(mr)
summary(mr)
##      name           Incredibles_2 Avengers_Infinity_War   DeadPool_2
##  Length:5           Min.   :4.0   Min.   :4.0           Min.   :4.00
##  Class :character   1st Qu.:4.0   1st Qu.:4.0           1st Qu.:4.25
##  Mode  :character   Median :4.5   Median :4.0           Median :4.50
##                     Mean   :4.5   Mean   :4.2           Mean   :4.50
##                     3rd Qu.:5.0   3rd Qu.:4.0           3rd Qu.:4.75
##                     Max.   :5.0   Max.   :5.0           Max.   :5.00
##                     NA's   :1                           NA's   :3
##  Black_Panther Ant_Man_and_the_Wasp    Oceans_8
##  Min.   :4.0   Min.   :4            Min.   :4
##  1st Qu.:4.0   1st Qu.:4            1st Qu.:4
##  Median :4.0   Median :4            Median :4
##  Mean   :4.4   Mean   :4            Mean   :4
##  3rd Qu.:5.0   3rd Qu.:4            3rd Qu.:4
##  Max.   :5.0   Max.   :4            Max.   :4
##                NA's   :4            NA's   :3
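The missing values reported in the NA's rows of the summary can also be counted directly. The sketch below uses a small made-up data frame called toy (not the real ratings) that has the same shape as ‘mr’: a name column followed by numeric rating columns.

```r
# Made-up toy data (NOT the real ratings), shaped like the 'mr' table:
# one name column followed by numeric rating columns.
toy <- data.frame(
  name    = c("a", "b", "c"),
  movie_x = c(4, NA, 5),
  movie_y = c(NA, NA, 3),
  stringsAsFactors = FALSE
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column.
# toy[-1] drops the name column so only the ratings are checked.
na.counts <- colSums(is.na(toy[-1]))
print(na.counts)
```

On the real data the same idea would be colSums(is.na(mr[-1])), assuming the first column is the reviewer name, as in the summary above.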
Below, I compute the average rating for each movie, ignoring missing values, and plot the averages as a bar graph.
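# Enlarge the bottom and left margins so the rotated movie names fit
par(mar = c(12, 20, 4, 2))
# Mean of each of the six rating columns (columns 2 to 7), skipping NAs
mean.reviews <- apply(mr[2:7], 2, function(x) mean(x, na.rm = TRUE))
# las = 2 draws the axis labels perpendicular to the axes
barplot(mean.reviews, ylab = "Avg Rating", ylim = c(0, 5), las = 2)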
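Because several movies have only one or two ratings, it can help to look at how many ratings sit behind each average. The sketch below again uses a made-up data frame called toy (not the real ratings). It computes the column means with colMeans, which should give the same result as the apply call above, and counts the non-missing ratings per column.

```r
# Made-up toy data (NOT the real ratings), shaped like the 'mr' table.
toy <- data.frame(
  name    = c("a", "b", "c"),
  movie_x = c(4, NA, 5),
  movie_y = c(NA, NA, 3),
  stringsAsFactors = FALSE
)

# Column means of the rating columns, ignoring missing values.
mean.toy <- colMeans(toy[-1], na.rm = TRUE)

# Number of non-missing ratings behind each mean.
n.toy <- colSums(!is.na(toy[-1]))

# Side-by-side view: one row per movie.
print(data.frame(avg_rating = mean.toy, n_ratings = n.toy))
```

On the real data, toy[-1] would be replaced with mr[2:7]. An average based on a single rating says much less than one based on five.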