The data are extracted from Kaggle's The Movies Dataset: https://www.kaggle.com/rounakbanik/the-movies-dataset

For movies_metadata, several steps were performed to prepare the data for model selection:

The analysis focuses primarily on two files, movies_metadata.csv and ratings.csv. We use the average rating as the criterion for evaluating movie quality. To build a model that assesses movies, we chose which variables to keep based on their types and practical significance; the retained variables include budget, revenue, runtime, timing factors (day of the week and year of release), production countries, cast, and crew. We used Python to clean the raw data, keeping only relevant observations, and created indicator variables based on the characteristics of each factor.
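
The exact cleaning script is not reproduced here; the following is a minimal sketch of the kind of preparation described above, assuming the raw Kaggle file layout. The dataset's links.csv is used to connect the TMDB ids in movies_metadata.csv to the MovieLens ids in ratings.csv, and column names such as en_language, made_in_us, not_winter, and country_count are illustrative rather than the names used in the original code.

```python
import ast
import pandas as pd

# Load the raw files from the Kaggle "The Movies Dataset" release.
meta = pd.read_csv("movies_metadata.csv", low_memory=False)
ratings = pd.read_csv("ratings.csv")
links = pd.read_csv("links.csv")  # maps MovieLens movieId to TMDB id

# Coerce numeric fields that arrive as strings; drop rows that cannot be parsed.
for col in ["budget", "revenue", "runtime", "popularity", "id"]:
    meta[col] = pd.to_numeric(meta[col], errors="coerce")
meta = meta.dropna(subset=["id", "budget", "revenue", "runtime", "popularity"])

# Timing factors: release year and day of the week.
meta["release_date"] = pd.to_datetime(meta["release_date"], errors="coerce")
meta = meta.dropna(subset=["release_date"])
meta["year"] = meta["release_date"].dt.year
meta["day"] = meta["release_date"].dt.day_name()

# Indicator variables built from the characteristics of each factor.
countries = meta["production_countries"].fillna("[]").apply(ast.literal_eval)
meta["en_language"] = (meta["original_language"] == "en").astype(int)
meta["made_in_us"] = countries.apply(
    lambda cs: int(any(c.get("iso_3166_1") == "US" for c in cs)))
meta["country_count"] = countries.apply(len)
meta["not_winter"] = (~meta["release_date"].dt.month.isin([12, 1, 2])).astype(int)

# Average rating per movie is the quality measure (the dependent variable).
avg = ratings.groupby("movieId")["rating"].mean().rename("avg_rating").reset_index()
avg = avg.merge(links[["movieId", "tmdbId"]], on="movieId")
df = meta.merge(avg, left_on="id", right_on="tmdbId")
```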

Regression coefficients (dependent variable: average rating):

| Variable | Coefficient |
| --- | --- |
| Budget | -0.027 |
| Popularity | 0.040 |
| Revenue | 0.003 |
| Runtime | 0.024 |
| Day: Monday | 0.361 |
| Day: Tuesday | 0.499 |
| Day: Wednesday | 0.345 |
| Day: Thursday | 0.211 |
| Day: Saturday | 0.341 |
| Day: Sunday | 0.463 |
| Year | -0.023 |
| Number of production countries | 0.138 |
| Cast size | 0.008 |
| Crew size | 0.006 |
| English language | 0.338 |
| Made in the US | -0.874 |
| Not a winter release | 0.202 |
| Good actor | 0.469 |
| Good director | 0.478 |
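
For context, coefficients of this form can be obtained with an ordinary least squares fit. Below is a minimal sketch using statsmodels, assuming a cleaned dataframe df like the one sketched earlier, with cast_size, crew_size, good_actor, and good_director assumed to have been added in earlier feature-engineering steps (all names illustrative). The table has no Friday row, which suggests Friday was the omitted baseline for the day-of-week dummies.

```python
import pandas as pd
import statsmodels.api as sm

# Day-of-week dummies, dropping Friday as the assumed reference category.
day_dummies = pd.get_dummies(df["day"], prefix="day").drop(
    columns=["day_Friday"], errors="ignore")

features = df[["budget", "popularity", "revenue", "runtime", "year",
               "country_count", "cast_size", "crew_size",
               "en_language", "made_in_us", "not_winter",
               "good_actor", "good_director"]]
X = sm.add_constant(pd.concat([features, day_dummies], axis=1).astype(float))
y = df["avg_rating"].astype(float)

ols = sm.OLS(y, X).fit()
print(ols.summary())  # coefficient table comparable to the one above
```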

## What Makes a Good Movie

Year:

People tend to beautify memories, and this also applies to movies. As time goes on, it generally becomes harder for a movie to be considered good: the coefficient on year is negative. On one hand, people's expectations of movies keep rising, so production companies need to detect trends and cater to audience preferences. On the other hand, survivorship bias may also be at work. Most movies from the 1980s and 1990s that people still remember are well-known classics, such as The Shawshank Redemption, Forrest Gump, and Titanic, and they never fade with time. That does not mean there were no bad movies in the past: the movie database was built in recent years, so people may simply have ignored the bad old movies and given high scores only to the classics.

Seasons:

Seasons do affect people's views of movies, especially winter. One possible reason is that winter contains many holidays, so people tend to be in a good mood during those days and are more willing to give high scores. Nevertheless, self-selection bias might be an issue, since the industry giants, who have more influence in the market, have a better chance of releasing their movies in the peak seasons.

Money:

Intuitively, revenue and budget should have a strong positive impact on the quality of a movie. In the model we built, however, this is not the case once all other variables are held constant: investing a huge amount of money does not guarantee a good score.

Actors/Directors:

Star actors and famous directors guarantee, to some extent, the quality of a movie. First, they are more experienced and tend to perform better overall than less established colleagues. Second, they typically do not accept bad scripts. Popular actors also have strong fanbases who will give them high scores regardless of the quality of the movie.

Size:

Crew size, cast size, and the number of production countries have positive effects on scores, presumably because a large, comprehensive team makes a movie more diverse. However, a larger team alone does not make a movie good.
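
For illustration, cast and crew sizes can be computed from the dataset's credits.csv, whose cast and crew columns hold stringified JSON lists. The sketch below assumes the illustrative dataframe df from earlier:

```python
import ast
import pandas as pd

credits = pd.read_csv("credits.csv")
credits["id"] = credits["id"].astype(float)  # match the float ids in the cleaned metadata

def list_len(cell):
    """Length of a stringified JSON list; 0 if the cell is missing or malformed."""
    try:
        return len(ast.literal_eval(cell))
    except (ValueError, SyntaxError, TypeError):
        return 0

credits["cast_size"] = credits["cast"].apply(list_len)
credits["crew_size"] = credits["crew"].apply(list_len)

# Join back to the cleaned metadata on the TMDB id.
df = df.merge(credits[["id", "cast_size", "crew_size"]], on="id")
```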

Length:

Runtime is also positively related to scores. With more runtime, a movie can better develop its characters, enrich the plot, and deepen the audience's emotional engagement.

## Appendix:

Correlation between year and score

Correlation between cast size and score

Correlation between runtime and score

Movie title and tagline word cloud
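
The appendix figures are not reproduced here; the sketch below shows one way to generate comparable plots with matplotlib and the third-party wordcloud package, again assuming the illustrative dataframe df from the earlier sketches:

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Scatter plots for the three correlations listed above.
for col in ["year", "cast_size", "runtime"]:
    plt.figure()
    plt.scatter(df[col], df["avg_rating"], s=5, alpha=0.3)
    plt.xlabel(col)
    plt.ylabel("average rating")
    plt.title(f"Correlation between {col} and score: "
              f"{df[col].corr(df['avg_rating']):.3f}")
    plt.savefig(f"corr_{col}.png", dpi=150)

# Word cloud of movie titles and taglines.
text = " ".join(df["title"].dropna()) + " " + " ".join(df["tagline"].dropna())
cloud = WordCloud(width=1200, height=600, background_color="white").generate(text)
plt.figure(figsize=(12, 6))
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.savefig("title_tagline_wordcloud.png", dpi=150)
```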