Note: MySQL database software need to be installed in the local computer in order to successfully execute the R markdown code.
Run the DDL script (MovieRatings_DDL Script.sql) in order to create a database called ‘movieratings’ along with the necessary tables as mentioned above - GitHub location:DDL Script
Run the Data Load script (Data Load DML Script.sql) to load necessary data into the database - GitHub Location: Load Script
Note: All the necessary CSV data files are also available in the below mention location - GitHub Location: Source Data Files
Load necessary libraries -
library(RODBC)
library(magrittr)
library(kableExtra)
library(data.table)
library(plyr)
library(ggplot2)
library(reshape2)
##
## Attaching package: 'reshape2'
## The following objects are masked from 'package:data.table':
##
## dcast, melt
I have used an ODBC data source called ‘MySQL_MovieRatings’ in order to connect to the database.
con <- odbcConnect("MySQL_MovieRatings")
I have taken a list of six Hollywood movies released in 2018 for this analysis. Movies belonging to multiple genres have been captured as pipe (‘|’) separated values.
movies_df <- as.data.frame(sqlFetch(con,"movies"),stringsAsFactors = FALSE)
movies_df %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% scroll_box(width="100%",height="300px")
Movie_ID | Movie_Title | Movie_Genre | Movie_Year | Director_Name |
---|---|---|---|---|
1 | Black Panther | Action|Adventure|Sci-Fi | 2018 | Ryan Coogler |
2 | Incredibles 2 | Animation|Action | 2018 | Brand Bird |
3 | Avengers: Infinity War | Action|Adventure|Fantasy | 2018 | Anthony Russo |
4 | Annihilation | Adventure|Drama|Horror | 2018 | Alex Garland |
5 | Ant-Man and the Wasp | Action|Adventure|Sci-Fi | 2018 | Peyton Reed |
6 | Mission Impossible: Fallout | Action|Adventure|Thriller | 2018 | Christopher McQuarrie |
These users contributed in the movie rating survey process.
users_df <- as.data.frame(sqlFetch(con,"users"),stringsAsFactors = FALSE)
users_df %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% scroll_box(width="100%",height="300px")
User_ID | First_Name | Last_Name | Gender | Age |
---|---|---|---|---|
1 | Soumya | Ghosh | M | 36 |
2 | Oliver | Asmus | M | 36 |
3 | Ricky | Sharma | M | 34 |
4 | Idhant | Ghosh | M | 6 |
5 | Bidisha | Kar | F | 35 |
6 | Krishanu | Ghosh | M | 34 |
7 | Jim | Duffy | M | 42 |
8 | Arron | Higgins | M | 41 |
9 | Alex | Smith | M | 43 |
10 | Reese | McGarvey | F | 7 |
11 | Jacob | Oram | M | 45 |
12 | Pritam | Mukherjee | M | 33 |
Each user rated all the 6 movies which has been loaded through SQL scripts into the database. A rank has been created in SQL based on the ratings provided by the users using SQL window function.
Query = "SELECT
User_ID
,Movie_ID
,Rating
,RANK() OVER (PARTITION BY User_ID ORDER BY Rating desc) AS Movie_Rank
FROM
ratings
ORDER BY
User_ID
,Movie_Rank;"
ratings_df <- sqlQuery(con,Query)
ratings_df %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% scroll_box(width="100%",height="300px")
User_ID | Movie_ID | Rating | Movie_Rank |
---|---|---|---|
1 | 3 | 4.2 | 1 |
1 | 6 | 4.1 | 2 |
1 | 1 | 4.0 | 3 |
1 | 2 | 3.5 | 4 |
1 | 5 | 3.0 | 5 |
1 | 4 | 2.5 | 6 |
2 | 3 | 4.3 | 1 |
2 | 1 | 3.2 | 2 |
2 | 4 | 3.0 | 3 |
2 | 5 | 2.8 | 4 |
2 | 2 | 2.4 | 5 |
2 | 6 | 2.2 | 6 |
3 | 6 | 4.5 | 1 |
3 | 4 | 4.2 | 2 |
3 | 3 | 3.9 | 3 |
3 | 5 | 3.7 | 4 |
3 | 1 | 3.5 | 5 |
3 | 2 | 2.8 | 6 |
4 | 1 | 4.6 | 1 |
4 | 5 | 4.1 | 2 |
4 | 3 | 3.7 | 3 |
4 | 6 | 3.6 | 4 |
4 | 2 | 3.2 | 5 |
4 | 4 | 2.9 | 6 |
5 | 5 | 4.3 | 1 |
5 | 2 | 4.1 | 2 |
5 | 1 | 3.7 | 3 |
5 | 3 | 3.2 | 4 |
5 | 6 | 3.0 | 5 |
5 | 4 | 2.4 | 6 |
6 | 1 | 4.6 | 1 |
6 | 3 | 4.2 | 2 |
6 | 4 | 3.7 | 3 |
6 | 2 | 3.6 | 4 |
6 | 6 | 3.0 | 5 |
6 | 5 | 2.9 | 6 |
10 | 1 | 4.6 | 1 |
10 | 4 | 3.7 | 2 |
10 | 2 | 3.6 | 3 |
10 | 6 | 3.0 | 4 |
10 | 5 | 2.0 | 5 |
10 | 3 | 1.3 | 6 |
Here I have joined all the 3 data sets and loaded into a data frame to prepare for further analysis along with dynamically created movie rank informtaion.
Query = "SELECT
m.Movie_ID
,m.Movie_Title
,m.Director_Name
,m.Movie_Year
,u.User_ID
,u.First_Name
,u.Last_Name
,u.Gender
,u.Age
,r.Rating
,RANK() OVER (PARTITION BY r.User_ID ORDER BY r.Rating desc) AS Movie_Rank
FROM
movieratings.ratings as r
INNER JOIN
movieratings.movies as m
ON r.Movie_ID = m.Movie_ID
INNER JOIN
movieratings.users as u
ON r.User_ID = u.User_ID;"
detail_ratings_df <- sqlQuery(con,Query)
detail_ratings_df %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% scroll_box(width="100%",height="300px")
Movie_ID | Movie_Title | Director_Name | Movie_Year | User_ID | First_Name | Last_Name | Gender | Age | Rating | Movie_Rank |
---|---|---|---|---|---|---|---|---|---|---|
3 | Avengers: Infinity War | Anthony Russo | 2018 | 1 | Soumya | Ghosh | M | 36 | 4.2 | 1 |
6 | Mission Impossible: Fallout | Christopher McQuarrie | 2018 | 1 | Soumya | Ghosh | M | 36 | 4.1 | 2 |
1 | Black Panther | Ryan Coogler | 2018 | 1 | Soumya | Ghosh | M | 36 | 4.0 | 3 |
2 | Incredibles 2 | Brand Bird | 2018 | 1 | Soumya | Ghosh | M | 36 | 3.5 | 4 |
5 | Ant-Man and the Wasp | Peyton Reed | 2018 | 1 | Soumya | Ghosh | M | 36 | 3.0 | 5 |
4 | Annihilation | Alex Garland | 2018 | 1 | Soumya | Ghosh | M | 36 | 2.5 | 6 |
3 | Avengers: Infinity War | Anthony Russo | 2018 | 2 | Oliver | Asmus | M | 36 | 4.3 | 1 |
1 | Black Panther | Ryan Coogler | 2018 | 2 | Oliver | Asmus | M | 36 | 3.2 | 2 |
4 | Annihilation | Alex Garland | 2018 | 2 | Oliver | Asmus | M | 36 | 3.0 | 3 |
5 | Ant-Man and the Wasp | Peyton Reed | 2018 | 2 | Oliver | Asmus | M | 36 | 2.8 | 4 |
2 | Incredibles 2 | Brand Bird | 2018 | 2 | Oliver | Asmus | M | 36 | 2.4 | 5 |
6 | Mission Impossible: Fallout | Christopher McQuarrie | 2018 | 2 | Oliver | Asmus | M | 36 | 2.2 | 6 |
6 | Mission Impossible: Fallout | Christopher McQuarrie | 2018 | 3 | Ricky | Sharma | M | 34 | 4.5 | 1 |
4 | Annihilation | Alex Garland | 2018 | 3 | Ricky | Sharma | M | 34 | 4.2 | 2 |
3 | Avengers: Infinity War | Anthony Russo | 2018 | 3 | Ricky | Sharma | M | 34 | 3.9 | 3 |
5 | Ant-Man and the Wasp | Peyton Reed | 2018 | 3 | Ricky | Sharma | M | 34 | 3.7 | 4 |
1 | Black Panther | Ryan Coogler | 2018 | 3 | Ricky | Sharma | M | 34 | 3.5 | 5 |
2 | Incredibles 2 | Brand Bird | 2018 | 3 | Ricky | Sharma | M | 34 | 2.8 | 6 |
1 | Black Panther | Ryan Coogler | 2018 | 4 | Idhant | Ghosh | M | 6 | 4.6 | 1 |
5 | Ant-Man and the Wasp | Peyton Reed | 2018 | 4 | Idhant | Ghosh | M | 6 | 4.1 | 2 |
3 | Avengers: Infinity War | Anthony Russo | 2018 | 4 | Idhant | Ghosh | M | 6 | 3.7 | 3 |
6 | Mission Impossible: Fallout | Christopher McQuarrie | 2018 | 4 | Idhant | Ghosh | M | 6 | 3.6 | 4 |
2 | Incredibles 2 | Brand Bird | 2018 | 4 | Idhant | Ghosh | M | 6 | 3.2 | 5 |
4 | Annihilation | Alex Garland | 2018 | 4 | Idhant | Ghosh | M | 6 | 2.9 | 6 |
5 | Ant-Man and the Wasp | Peyton Reed | 2018 | 5 | Bidisha | Kar | F | 35 | 4.3 | 1 |
2 | Incredibles 2 | Brand Bird | 2018 | 5 | Bidisha | Kar | F | 35 | 4.1 | 2 |
1 | Black Panther | Ryan Coogler | 2018 | 5 | Bidisha | Kar | F | 35 | 3.7 | 3 |
3 | Avengers: Infinity War | Anthony Russo | 2018 | 5 | Bidisha | Kar | F | 35 | 3.2 | 4 |
6 | Mission Impossible: Fallout | Christopher McQuarrie | 2018 | 5 | Bidisha | Kar | F | 35 | 3.0 | 5 |
4 | Annihilation | Alex Garland | 2018 | 5 | Bidisha | Kar | F | 35 | 2.4 | 6 |
1 | Black Panther | Ryan Coogler | 2018 | 6 | Krishanu | Ghosh | M | 34 | 4.6 | 1 |
3 | Avengers: Infinity War | Anthony Russo | 2018 | 6 | Krishanu | Ghosh | M | 34 | 4.2 | 2 |
4 | Annihilation | Alex Garland | 2018 | 6 | Krishanu | Ghosh | M | 34 | 3.7 | 3 |
2 | Incredibles 2 | Brand Bird | 2018 | 6 | Krishanu | Ghosh | M | 34 | 3.6 | 4 |
6 | Mission Impossible: Fallout | Christopher McQuarrie | 2018 | 6 | Krishanu | Ghosh | M | 34 | 3.0 | 5 |
5 | Ant-Man and the Wasp | Peyton Reed | 2018 | 6 | Krishanu | Ghosh | M | 34 | 2.9 | 6 |
1 | Black Panther | Ryan Coogler | 2018 | 10 | Reese | McGarvey | F | 7 | 4.6 | 1 |
4 | Annihilation | Alex Garland | 2018 | 10 | Reese | McGarvey | F | 7 | 3.7 | 2 |
2 | Incredibles 2 | Brand Bird | 2018 | 10 | Reese | McGarvey | F | 7 | 3.6 | 3 |
6 | Mission Impossible: Fallout | Christopher McQuarrie | 2018 | 10 | Reese | McGarvey | F | 7 | 3.0 | 4 |
5 | Ant-Man and the Wasp | Peyton Reed | 2018 | 10 | Reese | McGarvey | F | 7 | 2.0 | 5 |
3 | Avengers: Infinity War | Anthony Russo | 2018 | 10 | Reese | McGarvey | F | 7 | 1.3 | 6 |
I have parsed the pipe(‘|’) delimited genre column in the movies data frame to identify what all genres a particular movie belongs to as separate columns.
genres_df <- data.frame(sapply(tstrsplit(movies_df$Movie_Genre,'[|]',type.convert = TRUE),c),stringsAsFactors = FALSE)
genres_df %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% scroll_box(width="100%",height="300px")
X1 | X2 | X3 |
---|---|---|
Action | Adventure | Sci-Fi |
Animation | Action | NA |
Action | Adventure | Fantasy |
Adventure | Drama | Horror |
Action | Adventure | Sci-Fi |
Action | Adventure | Thriller |
I have Prepared a matrix with all the movie genres and assigned an indicator ‘1’ if the genre applicable for a given movie.
genre_list <- c("Action","Adventure","Animation","Drama","Fantasy","Horror","Sci-Fi","Thriller")
genre_matrix <- matrix(0,nrow(genres_df)+1,length(genre_list))
genre_matrix[1,] <- genre_list #set first row to genre list
colnames(genre_matrix) <- genre_list
#iterate through matrix
for (i in 1:nrow(genres_df)) {
for (c in 1:ncol(genres_df)) {
genmat_col = which(genre_matrix[1,] == genres_df[i,c])
genre_matrix[i+1,genmat_col] <- 1
}
}
genre_matrix %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% scroll_box(width="100%",height="300px")
Action | Adventure | Animation | Drama | Fantasy | Horror | Sci-Fi | Thriller |
---|---|---|---|---|---|---|---|
Action | Adventure | Animation | Drama | Fantasy | Horror | Sci-Fi | Thriller |
1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
Converted the matrix into a data frame with integer indcator values.
genre_matrix2 <- as.data.frame(genre_matrix[-1,], stringsAsFactors=FALSE) #remove first row, which was the genre list
for (c in 1:ncol(genre_matrix2)) {
genre_matrix2[,c] <- as.integer(genre_matrix2[,c])
} #convert from characters to integers
genre_matrix2 %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% scroll_box(width="100%",height="300px")
Action | Adventure | Animation | Drama | Fantasy | Horror | Sci-Fi | Thriller |
---|---|---|---|---|---|---|---|
1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
I have combined the genre indicators to the movies_df data frame.
movies_df <- cbind(movies_df,genre_matrix2)
movies_df %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% scroll_box(width="100%",height="300px")
Movie_ID | Movie_Title | Movie_Genre | Movie_Year | Director_Name | Action | Adventure | Animation | Drama | Fantasy | Horror | Sci-Fi | Thriller |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Black Panther | Action|Adventure|Sci-Fi | 2018 | Ryan Coogler | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
2 | Incredibles 2 | Animation|Action | 2018 | Brand Bird | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
3 | Avengers: Infinity War | Action|Adventure|Fantasy | 2018 | Anthony Russo | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
4 | Annihilation | Adventure|Drama|Horror | 2018 | Alex Garland | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
5 | Ant-Man and the Wasp | Action|Adventure|Sci-Fi | 2018 | Peyton Reed | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
6 | Mission Impossible: Fallout | Action|Adventure|Thriller | 2018 | Christopher McQuarrie | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
Here I have joined the Data Frames ‘detail_ratings_df’ and ‘movies_df’ based on ‘Movie_ID’ to add genres related indicators to analytical data sets.
detail_ratings_df <- merge(x = detail_ratings_df, y = subset(movies_df,select = c("Movie_ID",genre_list)), by = "Movie_ID")
detail_ratings_df %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% scroll_box(width="100%",height="300px")
Movie_ID | Movie_Title | Director_Name | Movie_Year | User_ID | First_Name | Last_Name | Gender | Age | Rating | Movie_Rank | Action | Adventure | Animation | Drama | Fantasy | Horror | Sci-Fi | Thriller |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Black Panther | Ryan Coogler | 2018 | 6 | Krishanu | Ghosh | M | 34 | 4.6 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
1 | Black Panther | Ryan Coogler | 2018 | 10 | Reese | McGarvey | F | 7 | 4.6 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
1 | Black Panther | Ryan Coogler | 2018 | 2 | Oliver | Asmus | M | 36 | 3.2 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
1 | Black Panther | Ryan Coogler | 2018 | 4 | Idhant | Ghosh | M | 6 | 4.6 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
1 | Black Panther | Ryan Coogler | 2018 | 3 | Ricky | Sharma | M | 34 | 3.5 | 5 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
1 | Black Panther | Ryan Coogler | 2018 | 1 | Soumya | Ghosh | M | 36 | 4.0 | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
1 | Black Panther | Ryan Coogler | 2018 | 5 | Bidisha | Kar | F | 35 | 3.7 | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
2 | Incredibles 2 | Brand Bird | 2018 | 4 | Idhant | Ghosh | M | 6 | 3.2 | 5 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
2 | Incredibles 2 | Brand Bird | 2018 | 3 | Ricky | Sharma | M | 34 | 2.8 | 6 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
2 | Incredibles 2 | Brand Bird | 2018 | 2 | Oliver | Asmus | M | 36 | 2.4 | 5 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
2 | Incredibles 2 | Brand Bird | 2018 | 1 | Soumya | Ghosh | M | 36 | 3.5 | 4 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
2 | Incredibles 2 | Brand Bird | 2018 | 5 | Bidisha | Kar | F | 35 | 4.1 | 2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
2 | Incredibles 2 | Brand Bird | 2018 | 10 | Reese | McGarvey | F | 7 | 3.6 | 3 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
2 | Incredibles 2 | Brand Bird | 2018 | 6 | Krishanu | Ghosh | M | 34 | 3.6 | 4 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
3 | Avengers: Infinity War | Anthony Russo | 2018 | 1 | Soumya | Ghosh | M | 36 | 4.2 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
3 | Avengers: Infinity War | Anthony Russo | 2018 | 3 | Ricky | Sharma | M | 34 | 3.9 | 3 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
3 | Avengers: Infinity War | Anthony Russo | 2018 | 10 | Reese | McGarvey | F | 7 | 1.3 | 6 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
3 | Avengers: Infinity War | Anthony Russo | 2018 | 4 | Idhant | Ghosh | M | 6 | 3.7 | 3 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
3 | Avengers: Infinity War | Anthony Russo | 2018 | 6 | Krishanu | Ghosh | M | 34 | 4.2 | 2 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
3 | Avengers: Infinity War | Anthony Russo | 2018 | 2 | Oliver | Asmus | M | 36 | 4.3 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
3 | Avengers: Infinity War | Anthony Russo | 2018 | 5 | Bidisha | Kar | F | 35 | 3.2 | 4 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
4 | Annihilation | Alex Garland | 2018 | 1 | Soumya | Ghosh | M | 36 | 2.5 | 6 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
4 | Annihilation | Alex Garland | 2018 | 6 | Krishanu | Ghosh | M | 34 | 3.7 | 3 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
4 | Annihilation | Alex Garland | 2018 | 5 | Bidisha | Kar | F | 35 | 2.4 | 6 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
4 | Annihilation | Alex Garland | 2018 | 2 | Oliver | Asmus | M | 36 | 3.0 | 3 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
4 | Annihilation | Alex Garland | 2018 | 3 | Ricky | Sharma | M | 34 | 4.2 | 2 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
4 | Annihilation | Alex Garland | 2018 | 4 | Idhant | Ghosh | M | 6 | 2.9 | 6 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
4 | Annihilation | Alex Garland | 2018 | 10 | Reese | McGarvey | F | 7 | 3.7 | 2 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
5 | Ant-Man and the Wasp | Peyton Reed | 2018 | 4 | Idhant | Ghosh | M | 6 | 4.1 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
5 | Ant-Man and the Wasp | Peyton Reed | 2018 | 5 | Bidisha | Kar | F | 35 | 4.3 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
5 | Ant-Man and the Wasp | Peyton Reed | 2018 | 2 | Oliver | Asmus | M | 36 | 2.8 | 4 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
5 | Ant-Man and the Wasp | Peyton Reed | 2018 | 6 | Krishanu | Ghosh | M | 34 | 2.9 | 6 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
5 | Ant-Man and the Wasp | Peyton Reed | 2018 | 10 | Reese | McGarvey | F | 7 | 2.0 | 5 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
5 | Ant-Man and the Wasp | Peyton Reed | 2018 | 3 | Ricky | Sharma | M | 34 | 3.7 | 4 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
5 | Ant-Man and the Wasp | Peyton Reed | 2018 | 1 | Soumya | Ghosh | M | 36 | 3.0 | 5 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
6 | Mission Impossible: Fallout | Christopher McQuarrie | 2018 | 4 | Idhant | Ghosh | M | 6 | 3.6 | 4 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
6 | Mission Impossible: Fallout | Christopher McQuarrie | 2018 | 2 | Oliver | Asmus | M | 36 | 2.2 | 6 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
6 | Mission Impossible: Fallout | Christopher McQuarrie | 2018 | 3 | Ricky | Sharma | M | 34 | 4.5 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
6 | Mission Impossible: Fallout | Christopher McQuarrie | 2018 | 10 | Reese | McGarvey | F | 7 | 3.0 | 4 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
6 | Mission Impossible: Fallout | Christopher McQuarrie | 2018 | 5 | Bidisha | Kar | F | 35 | 3.0 | 5 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
6 | Mission Impossible: Fallout | Christopher McQuarrie | 2018 | 6 | Krishanu | Ghosh | M | 34 | 3.0 | 5 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
6 | Mission Impossible: Fallout | Christopher McQuarrie | 2018 | 1 | Soumya | Ghosh | M | 36 | 4.1 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
Structure of the analytical data set.
str(detail_ratings_df)
## 'data.frame': 42 obs. of 19 variables:
## $ Movie_ID : int 1 1 1 1 1 1 1 2 2 2 ...
## $ Movie_Title : Factor w/ 6 levels "Annihilation",..: 4 4 4 4 4 4 4 5 5 5 ...
## $ Director_Name: Factor w/ 6 levels "Alex Garland",..: 6 6 6 6 6 6 6 3 3 3 ...
## $ Movie_Year : int 2018 2018 2018 2018 2018 2018 2018 2018 2018 2018 ...
## $ User_ID : int 6 10 2 4 3 1 5 4 3 2 ...
## $ First_Name : Factor w/ 7 levels "Bidisha","Idhant",..: 3 5 4 2 6 7 1 2 6 4 ...
## $ Last_Name : Factor w/ 5 levels "Asmus","Ghosh",..: 2 4 1 2 5 2 3 2 5 1 ...
## $ Gender : Factor w/ 2 levels "F","M": 2 1 2 2 2 2 1 2 2 2 ...
## $ Age : int 34 7 36 6 34 36 35 6 34 36 ...
## $ Rating : num 4.6 4.6 3.2 4.6 3.5 4 3.7 3.2 2.8 2.4 ...
## $ Movie_Rank : int 1 1 2 1 5 3 3 5 6 5 ...
## $ Action : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Adventure : int 1 1 1 1 1 1 1 0 0 0 ...
## $ Animation : int 0 0 0 0 0 0 0 1 1 1 ...
## $ Drama : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Fantasy : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Horror : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Sci-Fi : int 1 1 1 1 1 1 1 0 0 0 ...
## $ Thriller : int 0 0 0 0 0 0 0 0 0 0 ...
Here I have created a side by side Boxplot for Ratings by Movie titles. This plot provides a comparative between the 6 selected titles.
p<-ggplot(detail_ratings_df, aes(x = Movie_Title, y = Rating, color = Movie_Title)) +
geom_boxplot()
p + theme(legend.position="bottom") + labs(title = "Movie Rating Boxplot By Title", x = "Movie Title") + theme(
plot.title = element_text(color="black", size=14, face="bold.italic"),
axis.title.x = element_text(color="blue", size=14, face="bold"),
axis.title.y = element_text(color="#993333", size=14, face="bold")
)
Here, I have calculate Mean Percentage of Movie Genre by Gender:
summary_df <- subset(detail_ratings_df,select = c(8,12:19))
summary_df <- aggregate(summary_df,by = list(summary_df$Gender),FUN=mean, na.action=na.omit)
summary_df <- summary_df[-2]
summary_df <- cbind(summary_df,gender=c(1:2))
summary_df <-summary_df[,2:length(summary_df)]
summary_df.long <- melt(summary_df,id.vars="gender")
ggplot(summary_df.long,aes(x=variable,y=value,fill=factor(gender)))+
geom_bar(stat="identity",position="dodge")+
scale_fill_discrete(name="Gender",
breaks=c(1, 2),
labels=c("Female", "Male")) + labs(title = "Movie Genre Popularity By Gender", x = "Genre", y = "Mean Percentage")