Sylvia Gong
July 26, 17
My final project will focus on movies on IMDB. Every year, thousands of movies are produced worldwide. Before going to the cinema, people tend to check IMDB to see whether their favourite actors and directors are involved, how highly the movie is rated, and what the critics have said about it. Then they decide whether or not to buy a ticket.
What interests me is how investors make their decisions and predict the performance of upcoming movies. Do they have certain evaluation methods or criteria to judge whether a movie has the potential to attract an audience and make a profit? How much can they rely on IMDB scores to maximize box office revenue? And to what extent should they advertise on Facebook? I hope to provide a reference for investors by analyzing the correlation between Facebook likes and IMDB score, comparing net profit across movie types, and ranking the top 50 actors and directors based on the IMDB scores of the movies they were involved in.
I will use data from Kaggle. My dataset (IMDB 5000 Movie Dataset) contains 28 variables for over 5000 movies, spanning 100 years and 66 countries. It was scraped from IMDB by Chuan Sun. Some key variables I plan to use are:
To carry out this analysis in RStudio, the following packages are needed:
library(readr): Reading CSV files, e.g. with the read_csv function
library(ggplot2): Data visualization
library(dplyr): Manipulating data
library(tidyverse): Tidying data
Analysis goals and approaches are as follows:
The first thing for an investor to consider is how competitive the cast is. So I will generate an overall rating for actors and for directors, based on the mean IMDB score of the movies they were previously involved in. Then I will rank them and list the top 50 actors and the top 50 directors, to suggest that new movies involving these people are more likely to succeed at the box office.
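The ranking step could be sketched roughly as follows. This is only a minimal illustration: the small data frame stands in for the real dataset, and the column names director_name and imdb_score are assumptions about how the Kaggle file is laid out, so they may need adjusting.

```r
library(dplyr)

# Toy data standing in for the real dataset.
# Column names (director_name, imdb_score) are assumed, not confirmed.
movies <- data.frame(
  director_name = c("A", "A", "B", "C", "C", "C"),
  imdb_score    = c(8.0, 7.0, 6.5, 9.0, 8.0, 7.0)
)

top_directors <- movies %>%
  filter(!is.na(director_name), !is.na(imdb_score)) %>%  # drop incomplete rows
  group_by(director_name) %>%
  summarise(
    n_movies   = n(),              # how many movies the mean is based on
    mean_score = mean(imdb_score),
    .groups    = "drop"
  ) %>%
  arrange(desc(mean_score)) %>%    # highest mean score first
  head(50)                         # keep at most the top 50

print(top_directors)
```

The same pattern would apply to actors by grouping on the actor name column instead. One design choice to consider: a person with a single highly rated movie can top the list, so it may be worth also filtering on n_movies before ranking.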
Another visualization concerns the correlation of movie type and duration with profit. So I will create a new variable called profit (profit = gross - budget). Then I will use a histogram and a bar chart to figure out what type of movie makes the most profit and how duration affects profit.
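A rough sketch of the profit variable and the two charts is given below. The data frame is made up for illustration, and the column names genre, duration, gross and budget are assumptions. If the real genre column lists several genres per movie, it would need to be split before grouping.

```r
library(dplyr)
library(ggplot2)

# Toy data; column names and values are assumptions for illustration only.
movies <- data.frame(
  genre    = c("Action", "Action", "Comedy", "Drama"),
  duration = c(120, 130, 95, 110),
  gross    = c(300, 150, 80, 60),
  budget   = c(200, 100, 30, 70)
)

# New variable: profit = gross - budget
movies <- movies %>% mutate(profit = gross - budget)

# Mean profit per movie type
profit_by_genre <- movies %>%
  group_by(genre) %>%
  summarise(mean_profit = mean(profit, na.rm = TRUE), .groups = "drop")

# Bar chart: which type of movie makes the most profit on average
p_genre <- ggplot(profit_by_genre,
                  aes(x = reorder(genre, mean_profit), y = mean_profit)) +
  geom_col() +
  coord_flip() +
  labs(x = "Movie type", y = "Mean profit")

# Histogram: how profit is distributed across all movies
p_profit <- ggplot(movies, aes(x = profit)) +
  geom_histogram(bins = 10) +
  labs(x = "Profit", y = "Number of movies")
```

To look at duration specifically, one option would be to plot profit against duration, or to bin duration into ranges and reuse the bar chart above.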
In order to make marketing and advertising decisions, investors would like to know how effective their ads on Facebook are. So I will use a scatter plot to examine the correlation between a movie's Facebook likes and its IMDB score.
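As a minimal sketch, the scatter plot and a correlation coefficient might be produced like this. The values are invented, and the column names movie_facebook_likes and imdb_score are assumptions about the dataset.

```r
library(ggplot2)

# Toy data; column names and values are assumptions for illustration only.
movies <- data.frame(
  movie_facebook_likes = c(100, 1000, 5000, 20000, 80000),
  imdb_score           = c(5.5, 6.0, 6.8, 7.4, 8.1)
)

# Pearson correlation, ignoring rows with missing values
r <- cor(movies$movie_facebook_likes, movies$imdb_score,
         use = "complete.obs")
print(r)

# Scatter plot; a log scale on the x axis is one way to handle
# like counts that range over several orders of magnitude
p <- ggplot(movies, aes(x = movie_facebook_likes, y = imdb_score)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  labs(x = "Facebook likes (log scale)", y = "IMDB score")
```

The log scale is a presentation choice rather than a requirement. Movies with zero likes cannot be shown on it, so they would have to be filtered out or the axis left linear.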
Movie investors would like to predict how well a movie will perform before its release. The criteria they use would probably include duration, type, actors and directors, and the means of promotion. My analysis will examine to what extent these variables are related to IMDB scores and box office results, and so provide a reference for their decisions.