Introduction
The focus of this assignment is to create an R dataframe that shows movie ratings by the gender of the person who gave each rating. The movies_db data will be sourced from a tb database in MySQL and combined with a CSV file of population data, located on GitHub. The final R dataframe will have the following columns:
title, gender_person, rating
library(RMySQL)
library(tidyverse)
library(dplyr)
library(DBI)
Getting and Preparing the Data
Step 1. Connect to MySQL and retrieve the tb dataset stored in a database table.
mydb = dbConnect(MySQL(), user='root', password='Albania777', dbname='movies_db', host='localhost')
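Hardcoding the password in the script exposes it to anyone who can read the file. One possible alternative, shown here only as a minimal sketch, is to read the credentials from environment variables. The helper get_db_credentials() and the variable names MOVIES_DB_USER and MOVIES_DB_PASSWORD are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: read MySQL credentials from environment variables.
# The names MOVIES_DB_USER and MOVIES_DB_PASSWORD are assumptions for illustration.
get_db_credentials <- function() {
  user <- Sys.getenv("MOVIES_DB_USER", unset = "root")
  password <- Sys.getenv("MOVIES_DB_PASSWORD", unset = NA)
  if (is.na(password)) {
    stop("Set MOVIES_DB_PASSWORD before connecting.")
  }
  list(user = user, password = password)
}

# Usage (needs a running MySQL server, so it is left commented out):
# creds <- get_db_credentials()
# mydb <- dbConnect(MySQL(), user = creds$user, password = creds$password,
#                   dbname = 'movies_db', host = 'localhost')
```

The helper stops with an error when no password is set, so a missing variable is caught before any connection attempt.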
Run the movies query below and store the results in a dataframe called movies.df
movies.df <- dbGetQuery(mydb, "select title, gender_person,rating from movies_observ")
names(movies.df)
## [1] "title" "gender_person" "rating"
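The query step can be tried without a MySQL server. The sketch below assumes the RSQLite package is installed and uses an in-memory SQLite database as a stand-in for MySQL; the names con, sample_rows, and movies_sample are illustrative only. The four rows are copied from the movies.df output printed in this document.

```r
library(DBI)

# Assumption: RSQLite is installed; it stands in for MySQL in this sketch.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# A few rows copied from the movies.df output printed in this document.
sample_rows <- data.frame(
  title = c("The Lion King", "The Lion King", "Aladdin", "Aladdin"),
  gender_person = c("F", "M", "F", "M"),
  rating = c(5, 3, 4, 4),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "movies_observ", sample_rows)

# Same columns and table name as the MySQL query.
movies_sample <- dbGetQuery(con, "select title, gender_person, rating from movies_observ")
names(movies_sample)

# Close the connection once the data is in memory.
dbDisconnect(con)
```

Because DBI provides the same dbGetQuery() interface for both backends, only the dbConnect() call differs between this sketch and the MySQL version.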
summary(movies.df)
## title gender_person rating
## Length:12 Length:12 Min. :2.000
## Class :character Class :character 1st Qu.:2.750
## Mode :character Mode :character Median :3.500
## Mean :3.417
## 3rd Qu.:4.000
## Max. :5.000
print(movies.df)
## title gender_person rating
## 1 The Lion King F 5
## 2 The Lion King M 3
## 3 A star is Born F 5
## 4 A star is Born M 3
## 5 Mission Impossible M 2
## 6 Mission Impossible F 4
## 7 Captain Marvel M 3
## 8 Captain Marvel F 4
## 9 Aladdin F 4
## 10 Aladdin M 4
## 11 Frozen 2 F 2
## 12 Frozen 2 M 2
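Since the goal is to compare ratings by gender, a grouped summary may be a useful complement to the row-level printout. The sketch below re-enters the twelve rows shown above so that it runs without the database; the names movies_local and rating_by_gender are illustrative only.

```r
library(dplyr)

# Data re-entered from the printed movies.df output above.
movies_local <- data.frame(
  title = rep(c("The Lion King", "A star is Born", "Mission Impossible",
                "Captain Marvel", "Aladdin", "Frozen 2"), each = 2),
  gender_person = c("F", "M", "F", "M", "M", "F", "M", "F", "F", "M", "F", "M"),
  rating = c(5, 3, 5, 3, 2, 4, 3, 4, 4, 4, 2, 2),
  stringsAsFactors = FALSE
)

# Mean rating and number of ratings for each gender.
rating_by_gender <- movies_local %>%
  group_by(gender_person) %>%
  summarise(mean_rating = mean(rating), n = n())

print(rating_by_gender)
```

With these twelve rows, the mean rating is 4.0 for F and about 2.83 for M, which is consistent with the overall mean of 3.417 reported by summary().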
# x is the movie title and y is the rating, so the axis labels follow that order
qplot(title, rating, data=movies.df, xlab = "Movie", ylab = "Rating", main = "Individual Movie Rating by Gender") + facet_wrap(~gender_person) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
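# Stacked bars: each bar adds up the F and M ratings for one movie.
# The grey outline and axis labels are set in geom_bar() and labs(), where they take effect.
ggplot(movies.df, aes(x = reorder(title, rating), y = rating, fill = title)) + geom_bar(stat = "identity", colour = "grey") +
ggtitle("Movie Cumulative Ratings") + labs(x = "Movie", y = "Rating") + coord_flip()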
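The bar chart above stacks the two ratings for each movie, so the bar length is the sum of the F and M ratings. The sketch below computes those totals directly with base R, again re-entering the printed rows so that it runs on its own; movies_local and totals are illustrative names.

```r
# Data re-entered from the printed movies.df output.
movies_local <- data.frame(
  title = rep(c("The Lion King", "A star is Born", "Mission Impossible",
                "Captain Marvel", "Aladdin", "Frozen 2"), each = 2),
  rating = c(5, 3, 5, 3, 2, 4, 3, 4, 4, 4, 2, 2),
  stringsAsFactors = FALSE
)

# Sum of ratings per movie: the quantity each stacked bar represents.
totals <- aggregate(rating ~ title, data = movies_local, FUN = sum)

# Highest total first.
totals[order(-totals$rating), ]
```

Three movies tie at a total of 8 (The Lion King, A star is Born, Aladdin), while Frozen 2 has the lowest total at 4.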
dbDisconnect(mydb)
## [1] TRUE