Purpose: Through a regression analysis, we want to answer three questions: 1) Among the 27 candidate variables, which are critical for predicting the IMDB rating of a movie? 2) Is there any correlation between IMDB rating and genre, the number of faces in the poster, director name, or duration? 3) Can we predict the IMDB score using our model?
m<- read.csv('movie_metadata.csv')
This data set comes from Kaggle. The author scraped 5000+ movies from the IMDB website using a Python library called “scrapy” and obtained all 28 variables for 5043 movies and 4906 posters (998MB), spanning 100 years and 66 countries. There are 2399 unique director names and thousands of actors/actresses. Below are the 28 variables: “movie_title” “color” “num_critic_for_reviews” “movie_facebook_likes” “duration” “director_name” “director_facebook_likes” “actor_3_name” “actor_3_facebook_likes” “actor_2_name” “actor_2_facebook_likes” “actor_1_name” “actor_1_facebook_likes” “gross” “genres” “num_voted_users” “cast_total_facebook_likes” “facenumber_in_poster” “plot_keywords” “movie_imdb_link” “num_user_for_reviews” “language” “country” “content_rating” “budget” “title_year” “imdb_score” “aspect_ratio”
This dataset is a proof of concept. It can be used for experimental and learning purposes. For comprehensive movie analysis and accurate rating prediction, 28 attributes from 5000 movies might not be enough. A decent dataset could contain hundreds of attributes from 50K or more movies and would require a great deal of feature engineering.
Assign the first word of genres as the genre of each movie (the genres column was split into words in Excel beforehand):
# remove columns X-X.8
which(colnames(m)=='genres')
[1] 10
which(colnames(m)=='X.8')
[1] 19
m<-m[,-c(11:19)]
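The Excel step could also be done in R. The following is a minimal sketch, assuming the raw genres column is pipe-delimited (for example "Action|Adventure|Fantasy"), which is an assumption about the original file; the vector genres_raw is made-up illustration data, not taken from the data set.

```r
# Hypothetical raw genre strings, assumed to be pipe-delimited
genres_raw <- c("Action|Adventure|Fantasy", "Comedy", "Drama|Romance")

# Keep everything before the first "|" (or the whole string if there is none)
first_genre <- sub("\\|.*$", "", genres_raw)

print(first_genre)
```

With the real data, the same sub() call could be applied to as.character(m$genres) before converting the result back to a factor.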
Only keep movie data for the USA, because the “budget” variable was not always converted to US dollars, which might cause a problem in later analysis. If we wanted to convert all budgets into US dollars, we would also have to take inflation into consideration, which would make the problem more complicated. Therefore, for practice purposes, we decided to study only movies from the USA.
movie.usa<-m[which(m[,'country']=='USA'),]
Double check:
movie.usa$country
[1] USA USA USA USA USA USA USA USA USA USA USA USA USA USA USA USA USA USA USA USA USA USA
...
[991] USA USA USA USA USA USA USA USA USA USA
[ reached getOption("max.print") -- omitted 2807 entries ]
66 Levels: Afghanistan Argentina Aruba Australia Bahamas Belgium Brazil Bulgaria ... West Germany
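Printing the whole column is a rather verbose check. A more compact alternative might look like the sketch below, which uses a small made-up data frame (m_toy is hypothetical) instead of the real file. It also shows droplevels(), which removes the unused country levels that the output above still lists.

```r
# Toy stand-in for the movie data; values are invented for illustration
m_toy <- data.frame(
  country = factor(c("USA", "UK", "USA", "France")),
  score   = c(7, 6, 8, 5)
)

# Same style of filter as in the analysis
usa <- m_toy[which(m_toy$country == "USA"), ]

# Single TRUE/FALSE answer instead of thousands of printed values
all_usa <- all(usa$country == "USA")
print(all_usa)

# Drop factor levels that no longer occur after filtering
usa$country <- droplevels(usa$country)
print(levels(usa$country))
```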
Remove ‘language’: after keeping only USA movies, 3779 of the 3807 remaining movies are in English, so this variable is not meaningful for our prediction.
summary(movie.usa$language)
           Aboriginal     Arabic    Aramaic    Bosnian  Cantonese    Chinese      Czech
        10          0          0          1          1          1          0          0
    Danish       Dari      Dutch   Dzongkha    English   Filipino     French     German
         0          1          0          0       3779          1          0          0
     Greek     Hebrew      Hindi  Hungarian  Icelandic Indonesian    Italian   Japanese
         0          1          1          0          0          0          0          1
   Kannada     Kazakh     Korean   Mandarin       Maya  Mongolian       None  Norwegian
         0          0          0          0          1          0          1          0
   Panjabi    Persian     Polish Portuguese   Romanian    Russian  Slovenian    Spanish
         0          0          0          0          0          0          0          7
   Swahili    Swedish      Tamil     Telugu       Thai       Urdu Vietnamese       Zulu
         0          0          0          0          0          0          1          0
movie.usa<-movie.usa[, -which(names(movie.usa)=='language')]
Remove the ‘movie_imdb_link’ column since it is not useful for our analysis, and store the rest of the data as ‘mm’.
movie.df= data.frame(movie.usa)
mm<-movie.df[, -which(names(movie.df)=='movie_imdb_link')]
str(mm)
'data.frame': 3807 obs. of 26 variables:
$ color : Factor w/ 3 levels ""," Black and White",..: 3 3 3 3 3 3 3 3 3 3 ...
$ director_name : Factor w/ 2399 levels "","\xcc\xe4mile Gaudreault",..: 926 799 379 106 2030 1652 1225 2394 284 799 ...
$ num_critic_for_reviews : int 723 302 813 462 392 324 635 673 434 313 ...
$ duration : int 178 169 164 132 156 100 141 183 169 151 ...
$ director_facebook_likes : int 0 563 22000 475 0 15 0 0 0 563 ...
$ actor_3_facebook_likes : int 855 1000 23000 530 4000 284 19000 2000 903 1000 ...
$ actor_2_name : Factor w/ 3033 levels "","50 Cent","A. Michael Baldwin",..: 1408 2218 534 2549 1228 801 2440 1704 1911 2218 ...
$ actor_1_facebook_likes : int 1000 40000 27000 640 24000 799 26000 15000 18000 40000 ...
$ gross : int 760505847 309404152 448130642 73058679 336530303 200807262 458991599 330249062 200069408 423032628 ...
$ genres : Factor w/ 21 levels "Action","Adventure",..: 1 1 1 1 1 2 1 1 1 1 ...
$ actor_1_name : Factor w/ 2098 levels "","\xcc\xd2lafur Darri \xcc\xd2lafsson",..: 303 982 1968 441 786 221 337 740 1104 982 ...
$ movie_title : Factor w/ 4917 levels "[Rec] 2\xe5\xca",..: 397 2731 3707 1960 3289 3459 398 460 3416 2732 ...
$ num_voted_users : int 886204 471220 1144337 212204 383056 294810 462669 371639 240396 522040 ...
$ cast_total_facebook_likes: int 4834 48350 106759 1873 46055 2036 92000 24450 29991 48486 ...
$ actor_3_name : Factor w/ 3522 levels "","\xcc\xd2scar Jaenada",..: 3442 1393 1769 2714 1969 2162 3018 57 1134 1393 ...
$ facenumber_in_poster : int 0 0 0 1 0 1 4 0 0 2 ...
$ plot_keywords : Factor w/ 4761 levels "","10 year old|dog|florida|girl|supermarket",..: 1320 4283 3484 651 4745 29 1142 1564 3312 2188 ...
$ num_user_for_reviews : int 3054 1238 2701 738 1902 387 1117 3018 2367 1832 ...
$ country : Factor w/ 66 levels "","Afghanistan",..: 65 65 65 65 65 65 65 65 65 65 ...
$ content_rating : Factor w/ 19 levels "","Approved",..: 10 10 10 10 10 9 10 10 10 10 ...
$ budget : num 2.37e+08 3.00e+08 2.50e+08 2.64e+08 2.58e+08 ...
$ title_year : int 2009 2007 2012 2012 2007 2010 2015 2016 2006 2006 ...
$ actor_2_facebook_likes : int 936 5000 23000 632 11000 553 21000 4000 10000 5000 ...
$ imdb_score : num 7.9 7.1 8.5 6.6 6.2 7.8 7.5 6.9 6.1 7.3 ...
$ aspect_ratio : num 1.78 2.35 2.35 2.35 2.35 1.85 2.35 2.35 2.35 2.35 ...
$ movie_facebook_likes : int 33000 0 164000 24000 0 29000 118000 197000 0 5000 ...
Check for missing values:
library(Amelia)
missmap(mm, main = "Missing values vs observed")
sapply(mm,function(x) sum(is.na(x))) # number of missing values for each variable
                    color             director_name    num_critic_for_reviews
                        0                         0                        39
                 duration   director_facebook_likes    actor_3_facebook_likes
                        6                        74                        13
             actor_2_name    actor_1_facebook_likes                     gross
                        0                         4                       572
                   genres              actor_1_name               movie_title
                        0                         0                         0
          num_voted_users cast_total_facebook_likes              actor_3_name
                        0                         0                         0
     facenumber_in_poster             plot_keywords      num_user_for_reviews
                       12                         0                        13
                  country            content_rating                    budget
                        0                         0                       298
               title_year    actor_2_facebook_likes                imdb_score
                       74                         7                         0
             aspect_ratio      movie_facebook_likes
                      222                         0
We noticed that gross (572), budget (298), and aspect ratio (222) have the most missing values.
Omit missing values:
movie<-na.omit(mm)
sapply(movie,function(x) sum(is.na(x))) # double check for missing values
                    color             director_name    num_critic_for_reviews
                        0                         0                         0
                 duration   director_facebook_likes    actor_3_facebook_likes
                        0                         0                         0
             actor_2_name    actor_1_facebook_likes                     gross
                        0                         0                         0
                   genres              actor_1_name               movie_title
                        0                         0                         0
          num_voted_users cast_total_facebook_likes              actor_3_name
                        0                         0                         0
     facenumber_in_poster             plot_keywords      num_user_for_reviews
                        0                         0                         0
                  country            content_rating                    budget
                        0                         0                         0
               title_year    actor_2_facebook_likes                imdb_score
                        0                         0                         0
             aspect_ratio      movie_facebook_likes
                        0                         0
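Because na.omit() drops every row that has at least one missing value, it is worth checking how many rows are lost. In this report the data go from 3807 rows (the str() output) to 3005 (the sample size later reported by corr.test), so roughly 21% of the movies are removed. The sketch below shows one way to make that check explicit; the data frame toy is invented for illustration.

```r
# Toy data with missing values in two columns (illustrative only)
toy <- data.frame(
  gross  = c(100, NA, 300, 400),
  budget = c(50, 60, NA, 80),
  score  = c(7, 6, 8, 5)
)

n_before <- nrow(toy)
clean    <- na.omit(toy)          # listwise deletion, as in the analysis
n_after  <- nrow(clean)
dropped  <- n_before - n_after

# Share of rows that survive listwise deletion
share_complete <- mean(complete.cases(toy))

cat("rows dropped:", dropped, "\n")
cat("share complete:", share_complete, "\n")
```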
library(psych)
library(car)
library(RColorBrewer)
library(corrplot)
library(ggplot2)
Explore title_year predictor:
range(movie$title_year) # check movie title year
[1] 1920 2016
sum(with(movie,title_year=='2009')) # 145
[1] 145
sum(with(movie,title_year=='2014')) # 121
[1] 121
Visualization of title Year vs. Score:
scatterplot(x=movie$title_year,y=movie$imdb_score)
There are many outliers for title year. The majority of data points fall around the year 2000 and later, which makes sense because there are fewer movies from the early years. An interesting observation is that movies from the early years tend to have higher scores.
Visualization of IMDB Score:
max(movie$imdb_score) # 9.3
[1] 9.3
ggplot(movie, aes(x = imdb_score)) +
geom_histogram(aes(fill = ..count..), binwidth =0.5) +
scale_x_continuous(name = "IMDB Score",
breaks = seq(0,10),
limits=c(1, 10)) +
ggtitle("Histogram of Movie IMDB Score") +
scale_fill_gradient("Count", low = "blue", high = "red")
sum(with(movie,imdb_score>=8))
[1] 148
# 148 movies with IMDB score greater than or equal to 8.
The IMDB score looks approximately normally distributed. The highest score is 9.3 on a scale of 10. We can consider a movie with a score greater than or equal to 8 a great movie from many perspectives.
Exploring correlation:
pairs.panels(movie[c('director_name','duration','facenumber_in_poster','imdb_score','genres')])
From the plot, only duration and IMDB score show an appreciable correlation. The number of faces in the poster has a negative correlation with IMDB score. Genre has little correlation with score. Interestingly, director name has no correlation with IMDB score.
pairs.panels(movie[c('color','actor_1_name','title_year','imdb_score','aspect_ratio','gross')])
Color and title year have a highly positive correlation. Color has smaller positive correlations with aspect ratio and gross. Actor 1 name has a very small positive correlation with gross, meaning that who plays in the movie has little impact on the gross. Title year, aspect ratio, and color are highly positively correlated. IMDB score has a very small positive correlation with actor 1 name, which means the identity of actor 1 does not give the movie a higher score. Interestingly, IMDB score has a negative correlation with title year, which means older movies seem to have higher scores. This result agrees with our observation from the scatter plot. IMDB score and aspect ratio have a small positive correlation. IMDB score has a moderate positive correlation with gross.
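One caveat when reading these panels: a correlation computed on a factor such as director name, actor name, or genre is based on the integer codes of the factor levels, and those codes depend only on the (alphabetical) level order. The sketch below, on invented data, illustrates this and shows a different technique, the R-squared of a one-way linear model (often called eta-squared), which does not depend on level order. This is a swapped-in alternative for illustration, not the method used in this report.

```r
# Invented scores for three genres (illustration only)
score <- c(8.1, 7.9, 8.3, 6.0, 6.2, 5.8, 7.0, 7.1, 6.9)
genre <- factor(rep(c("Documentary", "Horror", "Comedy"), each = 3))

# Correlation with the integer codes under the default (alphabetical) order
cor_a <- cor(score, as.numeric(genre))

# Same data, different level order: the correlation changes sign
genre_b <- factor(genre, levels = c("Horror", "Documentary", "Comedy"))
cor_b <- cor(score, as.numeric(genre_b))

# Share of score variance explained by genre; unaffected by level order
eta_a <- summary(lm(score ~ genre))$r.squared
eta_b <- summary(lm(score ~ genre_b))$r.squared

cat("cor with codes:", cor_a, "vs", cor_b, "\n")
cat("eta-squared:   ", eta_a, "vs", eta_b, "\n")
```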
Correlation plot for all numerical variables:
nums<- sapply(movie,is.numeric) # select numeric columns
movie.num<- movie[,nums]
corrplot(cor(movie.num),method='ellipse')
Note: corrplot() expects a correlation matrix rather than a data frame, so cor() is applied first.
From the correlation plot, we can tell that the number of faces in the poster is only weakly correlated with the other variables and slightly negatively correlated with IMDB score. Cast total Facebook likes and actor 1 Facebook likes have a strong positive correlation. Budget and gross are strongly correlated, which is not surprising. Interestingly, IMDB score is positively correlated with the number of critic reviews, which means the more critic reviews a movie has, the higher its score tends to be. Duration and number of voted users are also positively correlated with IMDB score.
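Reading strong pairs off a plot can be error-prone, so they can also be listed directly from the correlation matrix. The following is a sketch on a made-up data frame (toy and the 0.8 threshold are arbitrary choices for illustration); with the real data, cor(movie.num) would take the place of cor(toy).

```r
# Deterministic toy data: a and b are nearly identical, c is unrelated
a   <- 1:10
alt <- rep(c(1, -1), 5)
toy <- data.frame(a = a, b = a + 0.1 * alt, c = alt)

cm <- cor(toy)

# Look only above the diagonal so each pair is reported once
idx <- which(upper.tri(cm) & abs(cm) > 0.8, arr.ind = TRUE)

strong <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
print(strong)
```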
Find the pairs of correlations
which(colnames(movie.num)=='title_year')
[1] 12
movie.num<- movie.num[,-12] # taking out title_year
corr.test(movie.num,y=NULL,use='pairwise',method='pearson',adjust='holm',alpha=0.05) # x must be numeric
Call:corr.test(x = movie.num, y = NULL, use = "pairwise", method = "pearson",
adjust = "holm", alpha = 0.05)
Correlation matrix
num_critic_for_reviews duration director_facebook_likes
num_critic_for_reviews 1.00 0.26 0.19
duration 0.26 1.00 0.21
director_facebook_likes 0.19 0.21 1.00
actor_3_facebook_likes 0.28 0.14 0.12
actor_1_facebook_likes 0.17 0.09 0.09
gross 0.48 0.28 0.14
num_voted_users 0.60 0.37 0.32
cast_total_facebook_likes 0.25 0.13 0.12
facenumber_in_poster -0.03 0.01 -0.05
num_user_for_reviews 0.57 0.36 0.24
budget 0.49 0.30 0.09
actor_2_facebook_likes 0.28 0.15 0.12
imdb_score 0.36 0.38 0.22
aspect_ratio 0.18 0.16 0.05
movie_facebook_likes 0.71 0.25 0.17
actor_3_facebook_likes actor_1_facebook_likes gross num_voted_users
num_critic_for_reviews 0.28 0.17 0.48 0.60
duration 0.14 0.09 0.28 0.37
director_facebook_likes 0.12 0.09 0.14 0.32
actor_3_facebook_likes 1.00 0.25 0.30 0.28
actor_1_facebook_likes 0.25 1.00 0.13 0.17
gross 0.30 0.13 1.00 0.64
num_voted_users 0.28 0.17 0.64 1.00
cast_total_facebook_likes 0.48 0.95 0.22 0.25
facenumber_in_poster 0.10 0.05 -0.04 -0.04
num_user_for_reviews 0.22 0.12 0.55 0.78
budget 0.27 0.15 0.64 0.40
actor_2_facebook_likes 0.55 0.38 0.25 0.25
imdb_score 0.09 0.12 0.27 0.51
aspect_ratio 0.05 0.05 0.07 0.09
movie_facebook_likes 0.31 0.12 0.38 0.52
cast_total_facebook_likes facenumber_in_poster num_user_for_reviews
num_critic_for_reviews 0.25 -0.03 0.57
duration 0.13 0.01 0.36
director_facebook_likes 0.12 -0.05 0.24
actor_3_facebook_likes 0.48 0.10 0.22
actor_1_facebook_likes 0.95 0.05 0.12
gross 0.22 -0.04 0.55
num_voted_users 0.25 -0.04 0.78
cast_total_facebook_likes 1.00 0.07 0.18
facenumber_in_poster 0.07 1.00 -0.09
num_user_for_reviews 0.18 -0.09 1.00
budget 0.23 -0.03 0.40
actor_2_facebook_likes 0.63 0.07 0.20
imdb_score 0.14 -0.07 0.35
aspect_ratio 0.07 0.01 0.10
movie_facebook_likes 0.21 0.01 0.39
budget actor_2_facebook_likes imdb_score aspect_ratio
num_critic_for_reviews 0.49 0.28 0.36 0.18
duration 0.30 0.15 0.38 0.16
director_facebook_likes 0.09 0.12 0.22 0.05
actor_3_facebook_likes 0.27 0.55 0.09 0.05
actor_1_facebook_likes 0.15 0.38 0.12 0.05
gross 0.64 0.25 0.27 0.07
num_voted_users 0.40 0.25 0.51 0.09
cast_total_facebook_likes 0.23 0.63 0.14 0.07
facenumber_in_poster -0.03 0.07 -0.07 0.01
num_user_for_reviews 0.40 0.20 0.35 0.10
budget 1.00 0.25 0.07 0.18
actor_2_facebook_likes 0.25 1.00 0.13 0.07
imdb_score 0.07 0.13 1.00 0.04
aspect_ratio 0.18 0.07 0.04 1.00
movie_facebook_likes 0.33 0.25 0.29 0.11
movie_facebook_likes
num_critic_for_reviews 0.71
duration 0.25
director_facebook_likes 0.17
actor_3_facebook_likes 0.31
actor_1_facebook_likes 0.12
gross 0.38
num_voted_users 0.52
cast_total_facebook_likes 0.21
facenumber_in_poster 0.01
num_user_for_reviews 0.39
budget 0.33
actor_2_facebook_likes 0.25
imdb_score 0.29
aspect_ratio 0.11
movie_facebook_likes 1.00
Sample Size
[1] 3005
Probability values (Entries above the diagonal are adjusted for multiple tests.)
num_critic_for_reviews duration director_facebook_likes
num_critic_for_reviews 0.00 0.00 0.00
duration 0.00 0.00 0.00
director_facebook_likes 0.00 0.00 0.00
actor_3_facebook_likes 0.00 0.00 0.00
actor_1_facebook_likes 0.00 0.00 0.00
gross 0.00 0.00 0.00
num_voted_users 0.00 0.00 0.00
cast_total_facebook_likes 0.00 0.00 0.00
facenumber_in_poster 0.09 0.66 0.00
num_user_for_reviews 0.00 0.00 0.00
budget 0.00 0.00 0.00
actor_2_facebook_likes 0.00 0.00 0.00
imdb_score 0.00 0.00 0.00
aspect_ratio 0.00 0.00 0.01
movie_facebook_likes 0.00 0.00 0.00
actor_3_facebook_likes actor_1_facebook_likes gross num_voted_users
num_critic_for_reviews 0.00 0.00 0.00 0.00
duration 0.00 0.00 0.00 0.00
director_facebook_likes 0.00 0.00 0.00 0.00
actor_3_facebook_likes 0.00 0.00 0.00 0.00
actor_1_facebook_likes 0.00 0.00 0.00 0.00
gross 0.00 0.00 0.00 0.00
num_voted_users 0.00 0.00 0.00 0.00
cast_total_facebook_likes 0.00 0.00 0.00 0.00
facenumber_in_poster 0.00 0.01 0.05 0.02
num_user_for_reviews 0.00 0.00 0.00 0.00
budget 0.00 0.00 0.00 0.00
actor_2_facebook_likes 0.00 0.00 0.00 0.00
imdb_score 0.00 0.00 0.00 0.00
aspect_ratio 0.01 0.00 0.00 0.00
movie_facebook_likes 0.00 0.00 0.00 0.00
cast_total_facebook_likes facenumber_in_poster num_user_for_reviews
num_critic_for_reviews 0 0.46 0
duration 0 1.00 0
director_facebook_likes 0 0.05 0
actor_3_facebook_likes 0 0.00 0
actor_1_facebook_likes 0 0.06 0
gross 0 0.28 0
num_voted_users 0 0.13 0
cast_total_facebook_likes 0 0.00 0
facenumber_in_poster 0 0.00 0
num_user_for_reviews 0 0.00 0
budget 0 0.14 0
actor_2_facebook_likes 0 0.00 0
imdb_score 0 0.00 0
aspect_ratio 0 0.55 0
movie_facebook_likes 0 0.50 0
budget actor_2_facebook_likes imdb_score aspect_ratio
num_critic_for_reviews 0.00 0 0.00 0.00
duration 0.00 0 0.00 0.00
director_facebook_likes 0.00 0 0.00 0.08
actor_3_facebook_likes 0.00 0 0.00 0.06
actor_1_facebook_likes 0.00 0 0.00 0.04
gross 0.00 0 0.00 0.00
num_voted_users 0.00 0 0.00 0.00
cast_total_facebook_likes 0.00 0 0.00 0.00
facenumber_in_poster 0.56 0 0.00 1.00
num_user_for_reviews 0.00 0 0.00 0.00
budget 0.00 0 0.00 0.00
actor_2_facebook_likes 0.00 0 0.00 0.00
imdb_score 0.00 0 0.00 0.26
aspect_ratio 0.00 0 0.04 0.00
movie_facebook_likes 0.00 0 0.00 0.00
movie_facebook_likes
num_critic_for_reviews 0
duration 0
director_facebook_likes 0
actor_3_facebook_likes 0
actor_1_facebook_likes 0
gross 0
num_voted_users 0
cast_total_facebook_likes 0
facenumber_in_poster 1
num_user_for_reviews 0
budget 0
actor_2_facebook_likes 0
imdb_score 0
aspect_ratio 0
movie_facebook_likes 0
To see confidence intervals of the correlations, print with the short=FALSE option
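The p-values above the diagonal are Holm-adjusted. As a rough illustration of what that adjustment does, the sketch below applies base R's p.adjust() to four made-up p-values and reproduces the result by hand: the sorted p-values are multiplied by m, m-1, ..., 1, then forced to be non-decreasing and capped at 1.

```r
# Made-up raw p-values (illustration only)
p_raw <- c(0.010, 0.040, 0.030, 0.005)

# Built-in Holm adjustment
p_holm <- p.adjust(p_raw, method = "holm")

# Manual version of the same procedure
m <- length(p_raw)
o <- order(p_raw)                               # indices from smallest to largest
adj_sorted <- pmin(1, cummax((m - seq_len(m) + 1) * p_raw[o]))
p_manual <- numeric(m)
p_manual[o] <- adj_sorted                       # back to the original order

print(p_holm)
print(p_manual)
```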
# Boxplots for significant categorical predictors
Boxplot(movie$imdb_score,movie$color)
[1] "2110" "1763" "2467" "2216" "2391" "2541" "270" "1708" "2477" "423" "1530" "2444"
Black-and-white movies seem to have a higher median rating and slightly higher scores overall. Color movies have many outliers.
Boxplot for genre:
fill <- "Blue"
line <- "Red"
ggplot(movie, aes(x = genres, y =imdb_score)) +
geom_boxplot(fill = fill, colour = line) +
scale_y_continuous(name = "IMDB Score",
breaks = seq(0, 11, 0.5),
limits=c(0, 11)) +
scale_x_discrete(name = "Genres") +
ggtitle("Boxplot of IMDB Score and Genres")
From the boxplot of genres, “Documentary” has the highest median score and “Thriller” has the lowest. However, the Thriller result rests on a single observation in our data set.
summary(movie$genres)
     Action   Adventure   Animation   Biography      Comedy       Crime Documentary
        751         291          36         137         853         204          25
      Drama      Family     Fantasy   Film-Noir   Game-Show     History      Horror
        506           3          31           0           0           0         138
      Music     Musical     Mystery     Romance      Sci-Fi    Thriller     Western
          0           2          16           2           7           1           2
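Several genres have only a handful of movies, so their medians are unreliable. One possible way to handle this, shown here only as a sketch and not something done in this report, is to lump rare levels into an “Other” category before plotting or modelling. The data and the cutoff of 3 are invented for illustration.

```r
# Invented genre labels: two common genres and two rare ones
genres <- factor(c(rep("Action", 5), rep("Comedy", 4), "Thriller", "Western"))

counts <- table(genres)
rare   <- names(counts)[counts < 3]   # hypothetical cutoff

# Recode rare levels as "Other"
lumped <- as.character(genres)
lumped[lumped %in% rare] <- "Other"
lumped <- factor(lumped)

print(table(lumped))
```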
library(ggplot2)
fill <- "Blue"
line <- "Red"
ggplot(movie, aes(x = as.factor(title_year), y =imdb_score)) +
geom_boxplot(fill = fill, colour = line) +
scale_y_continuous(name = "IMDB Score",
breaks = seq(1.5, 10, 0.5),
limits=c(1.5, 10)) +
scale_x_discrete(name = "title_year") +
ggtitle("Boxplot of IMDB Score and Title Year")
The median IMDB score seems to differ from year to year, so let’s try treating title_year as categorical.
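Treating the year as categorical has a cost: instead of one slope, the model estimates one coefficient for every distinct year except the reference year. This is why factor(title_year) enters the stepwise output with 68 degrees of freedom. The sketch below shows the difference on a few made-up years.

```r
# Made-up years with four distinct values
years <- c(2009, 2009, 2010, 2014, 2014, 2016)

# Numeric coding: intercept + one slope
X_num <- model.matrix(~ years)

# Categorical coding: intercept + one dummy per year except the first
X_fac <- model.matrix(~ factor(years))

cat("columns, numeric:", ncol(X_num), "\n")
cat("columns, factor: ", ncol(X_fac), "\n")
```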
# Scatter plot matrix for correlation significant numerical variables
scatterplotMatrix(~ imdb_score + num_voted_users + num_critic_for_reviews + num_user_for_reviews + duration + facenumber_in_poster + gross + movie_facebook_likes + director_facebook_likes + cast_total_facebook_likes + budget, data = movie)
movie.sig<-movie[,c('imdb_score','num_voted_users','num_critic_for_reviews','num_user_for_reviews','duration','facenumber_in_poster','gross','movie_facebook_likes','director_facebook_likes','cast_total_facebook_likes','budget','title_year','genres')]
Use the step function to select variables by the AIC criterion:
null=lm(movie.sig$imdb_score~1) # set null model
summary(null)
Call:
lm(formula = movie.sig$imdb_score ~ 1)
Residuals:
Min 1Q Median 3Q Max
-4.7873 -0.5873 0.1127 0.7127 2.9127
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.3873 0.0192 332.6 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.053 on 3004 degrees of freedom
full1=lm(movie.sig$imdb_score~movie.sig$num_voted_users+movie.sig$num_critic_for_reviews+movie.sig$num_user_for_reviews+movie.sig$duration+movie.sig$facenumber_in_poster+movie.sig$gross+movie.sig$movie_facebook_likes+movie.sig$director_facebook_likes+movie.sig$cast_total_facebook_likes+movie.sig$budget+factor(movie.sig$title_year)+factor(movie.sig$genres))
summary(full1)
Call:
lm(formula = movie.sig$imdb_score ~ movie.sig$num_voted_users +
movie.sig$num_critic_for_reviews + movie.sig$num_user_for_reviews +
movie.sig$duration + movie.sig$facenumber_in_poster + movie.sig$gross +
movie.sig$movie_facebook_likes + movie.sig$director_facebook_likes +
movie.sig$cast_total_facebook_likes + movie.sig$budget +
factor(movie.sig$title_year) + factor(movie.sig$genres))
Residuals:
Min 1Q Median 3Q Max
-4.5897 -0.3615 0.0729 0.4856 2.1550
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.442e+00 7.706e-01 4.467 8.24e-06 ***
movie.sig$num_voted_users 3.302e-06 1.810e-07 18.243 < 2e-16 ***
movie.sig$num_critic_for_reviews 4.325e-03 2.402e-04 18.007 < 2e-16 ***
movie.sig$num_user_for_reviews -6.180e-04 6.494e-05 -9.517 < 2e-16 ***
movie.sig$duration 8.234e-03 8.069e-04 10.204 < 2e-16 ***
movie.sig$facenumber_in_poster -1.327e-02 6.923e-03 -1.916 0.05544 .
movie.sig$gross -8.069e-12 3.121e-10 -0.026 0.97938
movie.sig$movie_facebook_likes -5.585e-06 1.064e-06 -5.247 1.65e-07 ***
movie.sig$director_facebook_likes -2.722e-06 4.558e-06 -0.597 0.55045
movie.sig$cast_total_facebook_likes 9.422e-07 7.227e-07 1.304 0.19244
movie.sig$budget -4.721e-09 5.166e-10 -9.137 < 2e-16 ***
factor(movie.sig$title_year)1929 1.942e+00 1.357e+00 1.431 0.15259
factor(movie.sig$title_year)1933 3.127e+00 1.082e+00 2.889 0.00389 **
factor(movie.sig$title_year)1935 3.275e+00 1.082e+00 3.027 0.00250 **
factor(movie.sig$title_year)1936 3.419e+00 1.083e+00 3.158 0.00160 **
factor(movie.sig$title_year)1937 1.916e+00 1.091e+00 1.757 0.07903 .
factor(movie.sig$title_year)1939 1.748e+00 9.393e-01 1.861 0.06291 .
factor(movie.sig$title_year)1940 1.943e+00 1.090e+00 1.782 0.07478 .
factor(movie.sig$title_year)1946 1.989e+00 9.384e-01 2.119 0.03414 *
factor(movie.sig$title_year)1947 2.717e+00 1.080e+00 2.515 0.01197 *
factor(movie.sig$title_year)1948 2.290e+00 1.083e+00 2.115 0.03451 *
factor(movie.sig$title_year)1950 1.974e+00 1.084e+00 1.822 0.06862 .
factor(movie.sig$title_year)1952 1.305e+00 1.083e+00 1.206 0.22805
factor(movie.sig$title_year)1953 1.756e+00 9.376e-01 1.873 0.06121 .
factor(movie.sig$title_year)1954 2.689e+00 1.080e+00 2.489 0.01286 *
factor(movie.sig$title_year)1959 2.639e+00 1.083e+00 2.437 0.01485 *
factor(movie.sig$title_year)1960 2.738e+00 1.086e+00 2.520 0.01179 *
factor(movie.sig$title_year)1961 1.910e+00 1.081e+00 1.767 0.07728 .
factor(movie.sig$title_year)1963 2.429e+00 1.085e+00 2.240 0.02519 *
factor(movie.sig$title_year)1964 2.215e+00 9.384e-01 2.361 0.01830 *
factor(movie.sig$title_year)1965 1.548e+00 8.586e-01 1.802 0.07159 .
factor(movie.sig$title_year)1969 2.242e+00 1.084e+00 2.068 0.03869 *
factor(movie.sig$title_year)1970 1.474e+00 8.867e-01 1.663 0.09648 .
factor(movie.sig$title_year)1971 1.460e+00 9.362e-01 1.560 0.11891
factor(movie.sig$title_year)1972 1.067e+00 9.387e-01 1.136 0.25597
factor(movie.sig$title_year)1973 2.403e+00 8.562e-01 2.807 0.00503 **
factor(movie.sig$title_year)1974 2.218e+00 8.273e-01 2.681 0.00739 **
factor(movie.sig$title_year)1975 1.115e+00 9.403e-01 1.186 0.23583
factor(movie.sig$title_year)1976 1.669e+00 9.381e-01 1.779 0.07540 .
factor(movie.sig$title_year)1977 1.866e+00 8.400e-01 2.221 0.02643 *
factor(movie.sig$title_year)1978 2.020e+00 8.198e-01 2.464 0.01378 *
factor(movie.sig$title_year)1979 1.326e+00 8.574e-01 1.546 0.12219
factor(movie.sig$title_year)1980 1.803e+00 7.955e-01 2.267 0.02347 *
factor(movie.sig$title_year)1981 1.498e+00 8.075e-01 1.855 0.06366 .
factor(movie.sig$title_year)1982 1.676e+00 7.908e-01 2.119 0.03419 *
factor(movie.sig$title_year)1983 1.888e+00 8.075e-01 2.338 0.01944 *
factor(movie.sig$title_year)1984 1.777e+00 7.870e-01 2.258 0.02404 *
factor(movie.sig$title_year)1985 1.820e+00 8.014e-01 2.271 0.02319 *
factor(movie.sig$title_year)1986 1.628e+00 7.847e-01 2.074 0.03814 *
factor(movie.sig$title_year)1987 1.379e+00 7.814e-01 1.765 0.07764 .
factor(movie.sig$title_year)1988 1.784e+00 7.797e-01 2.289 0.02218 *
factor(movie.sig$title_year)1989 1.793e+00 7.786e-01 2.303 0.02137 *
factor(movie.sig$title_year)1990 1.707e+00 7.812e-01 2.185 0.02897 *
factor(movie.sig$title_year)1991 1.556e+00 7.785e-01 1.998 0.04577 *
factor(movie.sig$title_year)1992 1.943e+00 7.786e-01 2.495 0.01264 *
factor(movie.sig$title_year)1993 1.618e+00 7.767e-01 2.083 0.03735 *
factor(movie.sig$title_year)1994 1.596e+00 7.749e-01 2.059 0.03957 *
factor(movie.sig$title_year)1995 1.590e+00 7.723e-01 2.058 0.03965 *
factor(movie.sig$title_year)1996 1.576e+00 7.701e-01 2.047 0.04078 *
factor(movie.sig$title_year)1997 1.470e+00 7.701e-01 1.909 0.05639 .
factor(movie.sig$title_year)1998 1.549e+00 7.701e-01 2.011 0.04443 *
factor(movie.sig$title_year)1999 1.399e+00 7.690e-01 1.819 0.06895 .
factor(movie.sig$title_year)2000 1.220e+00 7.689e-01 1.586 0.11278
factor(movie.sig$title_year)2001 1.314e+00 7.685e-01 1.709 0.08754 .
factor(movie.sig$title_year)2002 1.236e+00 7.685e-01 1.608 0.10793
factor(movie.sig$title_year)2003 1.126e+00 7.692e-01 1.464 0.14323
factor(movie.sig$title_year)2004 1.239e+00 7.692e-01 1.611 0.10723
factor(movie.sig$title_year)2005 1.212e+00 7.694e-01 1.575 0.11540
factor(movie.sig$title_year)2006 1.095e+00 7.690e-01 1.424 0.15460
factor(movie.sig$title_year)2007 1.091e+00 7.695e-01 1.418 0.15636
factor(movie.sig$title_year)2008 8.872e-01 7.693e-01 1.153 0.24889
factor(movie.sig$title_year)2009 8.571e-01 7.695e-01 1.114 0.26545
factor(movie.sig$title_year)2010 8.090e-01 7.698e-01 1.051 0.29337
factor(movie.sig$title_year)2011 6.344e-01 7.703e-01 0.824 0.41021
factor(movie.sig$title_year)2012 7.178e-01 7.702e-01 0.932 0.35139
factor(movie.sig$title_year)2013 7.544e-01 7.703e-01 0.979 0.32750
factor(movie.sig$title_year)2014 9.061e-01 7.702e-01 1.176 0.23953
factor(movie.sig$title_year)2015 9.843e-01 7.706e-01 1.277 0.20158
factor(movie.sig$title_year)2016 1.428e+00 7.754e-01 1.841 0.06570 .
factor(movie.sig$genres)Adventure 3.932e-01 5.440e-02 7.227 6.28e-13 ***
factor(movie.sig$genres)Animation 7.409e-01 1.368e-01 5.417 6.55e-08 ***
factor(movie.sig$genres)Biography 6.773e-01 7.675e-02 8.825 < 2e-16 ***
factor(movie.sig$genres)Comedy 1.815e-01 4.394e-02 4.132 3.70e-05 ***
factor(movie.sig$genres)Crime 4.623e-01 6.495e-02 7.119 1.37e-12 ***
factor(movie.sig$genres)Documentary 1.123e+00 1.610e-01 6.976 3.74e-12 ***
factor(movie.sig$genres)Drama 5.687e-01 4.918e-02 11.563 < 2e-16 ***
factor(movie.sig$genres)Family 2.556e-01 4.515e-01 0.566 0.57140
factor(movie.sig$genres)Fantasy -2.319e-01 1.452e-01 -1.597 0.11044
factor(movie.sig$genres)Horror -4.106e-01 7.748e-02 -5.299 1.25e-07 ***
factor(movie.sig$genres)Musical 7.483e-02 8.189e-01 0.091 0.92719
factor(movie.sig$genres)Mystery 2.051e-01 1.958e-01 1.048 0.29486
factor(movie.sig$genres)Romance 7.581e-01 5.431e-01 1.396 0.16283
factor(movie.sig$genres)Sci-Fi 2.155e-01 2.954e-01 0.729 0.46579
factor(movie.sig$genres)Thriller -3.305e-01 7.688e-01 -0.430 0.66731
factor(movie.sig$genres)Western -7.065e-02 5.566e-01 -0.127 0.89901
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7637 on 2910 degrees of freedom
Multiple R-squared: 0.4902, Adjusted R-squared: 0.4737
F-statistic: 29.77 on 94 and 2910 DF, p-value: < 2.2e-16
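Two of the reported numbers can be checked by hand. The adjusted R-squared follows from R-squared, the sample size (3005), and the number of estimated coefficients excluding the intercept (94, from the F-statistic). The AIC that step() prints is, for a linear model, n*log(RSS/n) + 2*edf, where edf is the number of estimated coefficients. The sketch below redoes both calculations with the rounded values printed in this report, so small rounding differences are expected, and cross-checks the AIC formula against extractAIC() on R's built-in cars data.

```r
n <- 3005                      # 3004 residual df in the null model + 1

# Adjusted R-squared of the full model from its printed R-squared
r2 <- 0.4902
p  <- 94                       # numerator df of the F-statistic
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Starting AIC of step(): intercept-only model, RSS as printed by step()
rss_null <- 3329.1
aic_null <- n * log(rss_null / n) + 2 * 1

cat("adjusted R-squared:", round(adj_r2, 4), "\n")
cat("null-model AIC:    ", round(aic_null, 2), "\n")

# Cross-check of the formula on a built-in data set
fit    <- lm(dist ~ speed, data = cars)
n_cars <- nrow(cars)
manual <- n_cars * log(sum(resid(fit)^2) / n_cars) + 2 * 2
auto   <- extractAIC(fit)[2]
```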
step(null,scope = list(lower=null,upper=full1),direction = 'forward')
Start: AIC=309.81
movie.sig$imdb_score ~ 1
Df Sum of Sq RSS AIC
+ movie.sig$num_voted_users 1 871.90 2457.2 -600.74
+ movie.sig$duration 1 491.13 2838.0 -167.82
+ movie.sig$num_critic_for_reviews 1 428.38 2900.8 -102.10
+ movie.sig$num_user_for_reviews 1 407.62 2921.5 -80.68
+ factor(movie.sig$genres) 16 331.02 2998.1 27.10
+ movie.sig$movie_facebook_likes 1 282.82 3046.3 45.02
+ movie.sig$gross 1 242.62 3086.5 84.42
+ movie.sig$director_facebook_likes 1 166.17 3163.0 157.95
+ movie.sig$cast_total_facebook_likes 1 64.28 3264.8 253.22
+ factor(movie.sig$title_year) 68 201.59 3127.5 258.10
+ movie.sig$budget 1 16.26 3312.9 297.09
+ movie.sig$facenumber_in_poster 1 15.14 3314.0 298.11
<none> 3329.1 309.81
Step: AIC=-600.74
movie.sig$imdb_score ~ movie.sig$num_voted_users
Df Sum of Sq RSS AIC
+ factor(movie.sig$genres) 16 311.531 2145.7 -976.12
+ movie.sig$duration 1 147.786 2309.4 -785.13
+ movie.sig$budget 1 73.211 2384.0 -689.63
+ factor(movie.sig$title_year) 68 164.699 2292.5 -673.22
+ movie.sig$num_user_for_reviews 1 21.297 2435.9 -624.90
+ movie.sig$gross 1 16.929 2440.3 -619.51
+ movie.sig$num_critic_for_reviews 1 14.632 2442.6 -616.69
+ movie.sig$director_facebook_likes 1 13.657 2443.6 -615.49
+ movie.sig$facenumber_in_poster 1 6.789 2450.4 -607.05
+ movie.sig$movie_facebook_likes 1 2.627 2454.6 -601.95
<none> 2457.2 -600.74
+ movie.sig$cast_total_facebook_likes 1 0.524 2456.7 -599.38
Step: AIC=-976.12
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres)
Df Sum of Sq RSS AIC
+ factor(movie.sig$title_year) 68 169.011 1976.7 -1086.66
+ movie.sig$duration 1 74.584 2071.1 -1080.44
+ movie.sig$budget 1 28.689 2117.0 -1014.57
+ movie.sig$num_critic_for_reviews 1 23.116 2122.6 -1006.67
+ movie.sig$num_user_for_reviews 1 12.251 2133.4 -991.33
+ movie.sig$director_facebook_likes 1 3.707 2142.0 -979.32
+ movie.sig$facenumber_in_poster 1 3.274 2142.4 -978.71
+ movie.sig$movie_facebook_likes 1 1.686 2144.0 -976.49
<none> 2145.7 -976.12
+ movie.sig$gross 1 1.391 2144.3 -976.07
+ movie.sig$cast_total_facebook_likes 1 0.362 2145.3 -974.63
Step: AIC=-1086.66
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year)
Df Sum of Sq RSS AIC
+ movie.sig$num_critic_for_reviews 1 124.119 1852.6 -1279.5
+ movie.sig$duration 1 42.067 1934.6 -1149.3
+ movie.sig$budget 1 9.722 1967.0 -1099.5
+ movie.sig$movie_facebook_likes 1 6.179 1970.5 -1094.1
+ movie.sig$num_user_for_reviews 1 5.685 1971.0 -1093.3
+ movie.sig$gross 1 2.494 1974.2 -1088.5
+ movie.sig$facenumber_in_poster 1 2.421 1974.3 -1088.3
+ movie.sig$cast_total_facebook_likes 1 2.206 1974.5 -1088.0
<none> 1976.7 -1086.7
+ movie.sig$director_facebook_likes 1 1.135 1975.5 -1086.4
Step: AIC=-1279.54
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year) + movie.sig$num_critic_for_reviews
Df Sum of Sq RSS AIC
+ movie.sig$num_user_for_reviews 1 43.496 1809.1 -1348.9
+ movie.sig$budget 1 42.322 1810.2 -1347.0
+ movie.sig$duration 1 24.346 1828.2 -1317.3
+ movie.sig$gross 1 12.691 1839.9 -1298.2
+ movie.sig$movie_facebook_likes 1 6.919 1845.7 -1288.8
<none> 1852.6 -1279.5
+ movie.sig$facenumber_in_poster 1 0.614 1852.0 -1278.5
+ movie.sig$cast_total_facebook_likes 1 0.309 1852.3 -1278.0
+ movie.sig$director_facebook_likes 1 0.087 1852.5 -1277.7
Step: AIC=-1348.93
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year) + movie.sig$num_critic_for_reviews +
movie.sig$num_user_for_reviews
Df Sum of Sq RSS AIC
+ movie.sig$budget 1 35.821 1773.2 -1407.0
+ movie.sig$duration 1 34.245 1774.8 -1404.4
+ movie.sig$movie_facebook_likes 1 11.143 1797.9 -1365.5
+ movie.sig$gross 1 9.280 1799.8 -1362.4
<none> 1809.1 -1348.9
+ movie.sig$facenumber_in_poster 1 0.796 1808.3 -1348.2
+ movie.sig$cast_total_facebook_likes 1 0.098 1809.0 -1347.1
+ movie.sig$director_facebook_likes 1 0.063 1809.0 -1347.0
Step: AIC=-1407.03
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year) + movie.sig$num_critic_for_reviews +
movie.sig$num_user_for_reviews + movie.sig$budget
Df Sum of Sq RSS AIC
+ movie.sig$duration 1 57.072 1716.2 -1503.3
+ movie.sig$movie_facebook_likes 1 13.161 1760.1 -1427.4
<none> 1773.2 -1407.0
+ movie.sig$cast_total_facebook_likes 1 0.931 1772.3 -1406.6
+ movie.sig$facenumber_in_poster 1 0.782 1772.5 -1406.3
+ movie.sig$director_facebook_likes 1 0.040 1773.2 -1405.1
+ movie.sig$gross 1 0.019 1773.2 -1405.1
Step: AIC=-1503.34
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year) + movie.sig$num_critic_for_reviews +
movie.sig$num_user_for_reviews + movie.sig$budget + movie.sig$duration
Df Sum of Sq RSS AIC
+ movie.sig$movie_facebook_likes 1 15.9234 1700.2 -1529.3
+ movie.sig$facenumber_in_poster 1 1.8762 1714.3 -1504.6
<none> 1716.2 -1503.3
+ movie.sig$cast_total_facebook_likes 1 0.6024 1715.6 -1502.4
+ movie.sig$director_facebook_likes 1 0.1491 1716.0 -1501.6
+ movie.sig$gross 1 0.0523 1716.1 -1501.4
Step: AIC=-1529.35
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year) + movie.sig$num_critic_for_reviews +
movie.sig$num_user_for_reviews + movie.sig$budget + movie.sig$duration +
movie.sig$movie_facebook_likes
Df Sum of Sq RSS AIC
+ movie.sig$facenumber_in_poster 1 1.89050 1698.4 -1530.7
<none> 1700.2 -1529.3
+ movie.sig$cast_total_facebook_likes 1 0.74712 1699.5 -1528.7
+ movie.sig$director_facebook_likes 1 0.13168 1700.1 -1527.6
+ movie.sig$gross 1 0.00233 1700.2 -1527.3
Step: AIC=-1530.69
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year) + movie.sig$num_critic_for_reviews +
movie.sig$num_user_for_reviews + movie.sig$budget + movie.sig$duration +
movie.sig$movie_facebook_likes + movie.sig$facenumber_in_poster
Df Sum of Sq RSS AIC
<none> 1698.4 -1530.7
+ movie.sig$cast_total_facebook_likes 1 0.95072 1697.4 -1530.4
+ movie.sig$director_facebook_likes 1 0.16803 1698.2 -1529.0
+ movie.sig$gross 1 0.00204 1698.4 -1528.7
Call:
lm(formula = movie.sig$imdb_score ~ movie.sig$num_voted_users +
factor(movie.sig$genres) + factor(movie.sig$title_year) +
movie.sig$num_critic_for_reviews + movie.sig$num_user_for_reviews +
movie.sig$budget + movie.sig$duration + movie.sig$movie_facebook_likes +
movie.sig$facenumber_in_poster)
Coefficients:
(Intercept) movie.sig$num_voted_users
3.443e+00 3.308e-06
factor(movie.sig$genres)Adventure factor(movie.sig$genres)Animation
3.929e-01 7.387e-01
factor(movie.sig$genres)Biography factor(movie.sig$genres)Comedy
6.766e-01 1.814e-01
factor(movie.sig$genres)Crime factor(movie.sig$genres)Documentary
4.629e-01 1.117e+00
factor(movie.sig$genres)Drama factor(movie.sig$genres)Family
5.693e-01 2.400e-01
factor(movie.sig$genres)Fantasy factor(movie.sig$genres)Horror
-2.307e-01 -4.092e-01
factor(movie.sig$genres)Musical factor(movie.sig$genres)Mystery
7.334e-02 1.963e-01
factor(movie.sig$genres)Romance factor(movie.sig$genres)Sci-Fi
7.543e-01 2.153e-01
factor(movie.sig$genres)Thriller factor(movie.sig$genres)Western
-3.335e-01 -8.652e-02
factor(movie.sig$title_year)1929 factor(movie.sig$title_year)1933
1.938e+00 3.127e+00
factor(movie.sig$title_year)1935 factor(movie.sig$title_year)1936
3.275e+00 3.419e+00
factor(movie.sig$title_year)1937 factor(movie.sig$title_year)1939
1.916e+00 1.747e+00
factor(movie.sig$title_year)1940 factor(movie.sig$title_year)1946
1.945e+00 1.989e+00
factor(movie.sig$title_year)1947 factor(movie.sig$title_year)1948
2.717e+00 2.291e+00
factor(movie.sig$title_year)1950 factor(movie.sig$title_year)1952
1.976e+00 1.306e+00
factor(movie.sig$title_year)1953 factor(movie.sig$title_year)1954
1.757e+00 2.697e+00
factor(movie.sig$title_year)1959 factor(movie.sig$title_year)1960
2.638e+00 2.700e+00
factor(movie.sig$title_year)1961 factor(movie.sig$title_year)1963
1.912e+00 2.435e+00
factor(movie.sig$title_year)1964 factor(movie.sig$title_year)1965
2.215e+00 1.547e+00
factor(movie.sig$title_year)1969 factor(movie.sig$title_year)1970
2.243e+00 1.475e+00
factor(movie.sig$title_year)1971 factor(movie.sig$title_year)1972
1.461e+00 1.079e+00
factor(movie.sig$title_year)1973 factor(movie.sig$title_year)1974
2.401e+00 2.225e+00
factor(movie.sig$title_year)1975 factor(movie.sig$title_year)1976
1.093e+00 1.677e+00
factor(movie.sig$title_year)1977 factor(movie.sig$title_year)1978
1.853e+00 2.023e+00
factor(movie.sig$title_year)1979 factor(movie.sig$title_year)1980
1.339e+00 1.803e+00
factor(movie.sig$title_year)1981 factor(movie.sig$title_year)1982
1.498e+00 1.674e+00
factor(movie.sig$title_year)1983 factor(movie.sig$title_year)1984
1.890e+00 1.780e+00
factor(movie.sig$title_year)1985 factor(movie.sig$title_year)1986
1.819e+00 1.626e+00
factor(movie.sig$title_year)1987 factor(movie.sig$title_year)1988
1.380e+00 1.788e+00
factor(movie.sig$title_year)1989 factor(movie.sig$title_year)1990
1.797e+00 1.707e+00
factor(movie.sig$title_year)1991 factor(movie.sig$title_year)1992
1.561e+00 1.945e+00
factor(movie.sig$title_year)1993 factor(movie.sig$title_year)1994
1.621e+00 1.599e+00
factor(movie.sig$title_year)1995 factor(movie.sig$title_year)1996
1.593e+00 1.580e+00
factor(movie.sig$title_year)1997 factor(movie.sig$title_year)1998
1.473e+00 1.555e+00
factor(movie.sig$title_year)1999 factor(movie.sig$title_year)2000
1.405e+00 1.224e+00
factor(movie.sig$title_year)2001 factor(movie.sig$title_year)2002
1.319e+00 1.240e+00
factor(movie.sig$title_year)2003 factor(movie.sig$title_year)2004
1.130e+00 1.249e+00
factor(movie.sig$title_year)2005 factor(movie.sig$title_year)2006
1.218e+00 1.102e+00
factor(movie.sig$title_year)2007 factor(movie.sig$title_year)2008
1.098e+00 8.938e-01
factor(movie.sig$title_year)2009 factor(movie.sig$title_year)2010
8.661e-01 8.140e-01
factor(movie.sig$title_year)2011 factor(movie.sig$title_year)2012
6.394e-01 7.259e-01
factor(movie.sig$title_year)2013 factor(movie.sig$title_year)2014
7.608e-01 9.095e-01
factor(movie.sig$title_year)2015 factor(movie.sig$title_year)2016
9.924e-01 1.435e+00
movie.sig$num_critic_for_reviews movie.sig$num_user_for_reviews
4.333e-03 -6.212e-04
movie.sig$budget movie.sig$duration
-4.660e-09 8.216e-03
movie.sig$movie_facebook_likes movie.sig$facenumber_in_poster
-5.556e-06 -1.242e-02
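The AIC column in the trace above can be reproduced by hand. For a linear model, step() relies on extractAIC(), which computes n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The sketch below plugs in figures read off the output (n = 3005 follows from 2910 residual degrees of freedom plus 95 coefficients in full1) and then checks the formula against extractAIC() on a small simulated data set. The data frame `d` and its columns are made up for illustration only.

```r
# Figures read off the first step() trace
n   <- 3005     # 2910 residual df + 95 estimated coefficients in full1
rss <- 3329.1   # RSS of the intercept-only model (the "<none>" row at the start)
edf <- 1        # the null model estimates only the intercept

aic_null <- n * log(rss / n) + 2 * edf
print(round(aic_null, 1))  # close to the "Start: AIC=309.81" line; the printed RSS is rounded

# Check the same formula against extractAIC() on simulated data (hypothetical)
set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- 1 + d$x + rnorm(50)
fit <- lm(y ~ x, data = d)

manual <- nrow(d) * log(sum(resid(fit)^2) / nrow(d)) + 2 * length(coef(fit))
print(all.equal(manual, extractAIC(fit)[2]))
```

Because this value omits constants that AIC() includes, the numbers in the trace are only comparable with each other, not with the output of AIC() on the same models.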
full2 = lm(movie.sig$imdb_score ~
             poly(movie.sig$num_voted_users, 2) +
             poly(movie.sig$num_critic_for_reviews, 2) +
             poly(movie.sig$num_user_for_reviews, 2) +
             poly(movie.sig$duration, 2) +
             movie.sig$facenumber_in_poster +
             poly(movie.sig$gross, 2) +
             poly(movie.sig$movie_facebook_likes, 2) +
             movie.sig$director_facebook_likes +
             movie.sig$cast_total_facebook_likes +
             movie.sig$budget +
             factor(movie.sig$title_year) +
             movie.sig$genres +
             movie.sig$facenumber_in_poster * movie.sig$num_critic_for_reviews +
             movie.sig$num_user_for_reviews * movie.sig$num_voted_users +
             movie.sig$num_voted_users * movie.sig$gross +
             movie.sig$gross * movie.sig$budget)
summary(full2)
Call:
lm(formula = movie.sig$imdb_score ~ poly(movie.sig$num_voted_users,
2) + poly(movie.sig$num_critic_for_reviews, 2) + poly(movie.sig$num_user_for_reviews,
2) + poly(movie.sig$duration, 2) + movie.sig$facenumber_in_poster +
poly(movie.sig$gross, 2) + poly(movie.sig$movie_facebook_likes,
2) + movie.sig$director_facebook_likes + movie.sig$cast_total_facebook_likes +
movie.sig$budget + factor(movie.sig$title_year) + movie.sig$genres +
movie.sig$facenumber_in_poster * movie.sig$num_critic_for_reviews +
movie.sig$num_user_for_reviews * movie.sig$num_voted_users +
movie.sig$num_voted_users * movie.sig$gross + movie.sig$gross *
movie.sig$budget)
Residuals:
Min 1Q Median 3Q Max
-5.0063 -0.3576 0.0462 0.4432 2.1605
Coefficients: (4 not defined because of singularities)
Estimate Std. Error t value
(Intercept) 5.204e+00 7.263e-01 7.165
poly(movie.sig$num_voted_users, 2)1 2.341e+01 3.441e+00 6.803
poly(movie.sig$num_voted_users, 2)2 -1.994e+01 2.189e+00 -9.108
poly(movie.sig$num_critic_for_reviews, 2)1 2.463e+01 1.877e+00 13.124
poly(movie.sig$num_critic_for_reviews, 2)2 -1.188e+01 1.031e+00 -11.514
poly(movie.sig$num_user_for_reviews, 2)1 -2.408e+01 2.373e+00 -10.149
poly(movie.sig$num_user_for_reviews, 2)2 7.292e+00 1.634e+00 4.462
poly(movie.sig$duration, 2)1 1.084e+01 9.428e-01 11.500
poly(movie.sig$duration, 2)2 -3.255e+00 7.836e-01 -4.155
movie.sig$facenumber_in_poster -5.368e-03 1.102e-02 -0.487
poly(movie.sig$gross, 2)1 -1.789e+01 2.404e+00 -7.442
poly(movie.sig$gross, 2)2 -6.179e+00 1.463e+00 -4.224
poly(movie.sig$movie_facebook_likes, 2)1 2.325e+00 1.462e+00 1.590
poly(movie.sig$movie_facebook_likes, 2)2 6.159e-02 8.416e-01 0.073
movie.sig$director_facebook_likes 2.046e-06 4.366e-06 0.469
movie.sig$cast_total_facebook_likes 2.325e-08 6.857e-07 0.034
movie.sig$budget -8.543e-09 7.172e-10 -11.912
factor(movie.sig$title_year)1929 1.949e+00 1.284e+00 1.518
factor(movie.sig$title_year)1933 3.060e+00 1.023e+00 2.991
factor(movie.sig$title_year)1935 3.224e+00 1.023e+00 3.151
factor(movie.sig$title_year)1936 2.975e+00 1.024e+00 2.906
factor(movie.sig$title_year)1937 1.918e+00 1.032e+00 1.859
factor(movie.sig$title_year)1939 1.532e+00 8.894e-01 1.722
factor(movie.sig$title_year)1940 1.690e+00 1.030e+00 1.641
factor(movie.sig$title_year)1946 1.891e+00 8.871e-01 2.132
factor(movie.sig$title_year)1947 2.625e+00 1.021e+00 2.570
factor(movie.sig$title_year)1948 2.207e+00 1.023e+00 2.157
factor(movie.sig$title_year)1950 2.048e+00 1.024e+00 2.000
factor(movie.sig$title_year)1952 1.270e+00 1.023e+00 1.241
factor(movie.sig$title_year)1953 1.708e+00 8.862e-01 1.928
factor(movie.sig$title_year)1954 2.322e+00 1.022e+00 2.273
factor(movie.sig$title_year)1959 2.027e+00 1.024e+00 1.979
factor(movie.sig$title_year)1960 2.077e+00 1.028e+00 2.020
factor(movie.sig$title_year)1961 1.702e+00 1.022e+00 1.665
factor(movie.sig$title_year)1963 2.631e+00 1.026e+00 2.565
factor(movie.sig$title_year)1964 2.060e+00 8.873e-01 2.322
factor(movie.sig$title_year)1965 1.576e+00 8.123e-01 1.941
factor(movie.sig$title_year)1969 2.015e+00 1.025e+00 1.966
factor(movie.sig$title_year)1970 1.390e+00 8.385e-01 1.657
factor(movie.sig$title_year)1971 1.434e+00 8.850e-01 1.621
factor(movie.sig$title_year)1972 1.485e+00 8.893e-01 1.669
factor(movie.sig$title_year)1973 2.265e+00 8.098e-01 2.798
factor(movie.sig$title_year)1974 2.270e+00 7.829e-01 2.900
factor(movie.sig$title_year)1975 1.041e+00 8.897e-01 1.170
factor(movie.sig$title_year)1976 1.370e+00 8.869e-01 1.544
factor(movie.sig$title_year)1977 2.044e+00 7.958e-01 2.568
factor(movie.sig$title_year)1978 2.025e+00 7.754e-01 2.611
factor(movie.sig$title_year)1979 1.346e+00 8.122e-01 1.657
factor(movie.sig$title_year)1980 1.818e+00 7.529e-01 2.415
factor(movie.sig$title_year)1981 1.476e+00 7.633e-01 1.934
factor(movie.sig$title_year)1982 1.543e+00 7.479e-01 2.063
factor(movie.sig$title_year)1983 1.823e+00 7.634e-01 2.388
factor(movie.sig$title_year)1984 1.628e+00 7.441e-01 2.188
factor(movie.sig$title_year)1985 1.745e+00 7.577e-01 2.303
factor(movie.sig$title_year)1986 1.506e+00 7.419e-01 2.030
factor(movie.sig$title_year)1987 1.292e+00 7.388e-01 1.749
factor(movie.sig$title_year)1988 1.651e+00 7.371e-01 2.240
factor(movie.sig$title_year)1989 1.677e+00 7.360e-01 2.279
factor(movie.sig$title_year)1990 1.551e+00 7.387e-01 2.099
factor(movie.sig$title_year)1991 1.527e+00 7.359e-01 2.075
factor(movie.sig$title_year)1992 1.851e+00 7.361e-01 2.515
factor(movie.sig$title_year)1993 1.593e+00 7.344e-01 2.169
factor(movie.sig$title_year)1994 1.702e+00 7.328e-01 2.323
factor(movie.sig$title_year)1995 1.526e+00 7.301e-01 2.090
factor(movie.sig$title_year)1996 1.513e+00 7.280e-01 2.079
factor(movie.sig$title_year)1997 1.362e+00 7.281e-01 1.871
factor(movie.sig$title_year)1998 1.499e+00 7.282e-01 2.058
factor(movie.sig$title_year)1999 1.367e+00 7.272e-01 1.881
factor(movie.sig$title_year)2000 1.176e+00 7.273e-01 1.617
factor(movie.sig$title_year)2001 1.274e+00 7.269e-01 1.752
factor(movie.sig$title_year)2002 1.215e+00 7.269e-01 1.671
factor(movie.sig$title_year)2003 1.068e+00 7.276e-01 1.468
factor(movie.sig$title_year)2004 1.130e+00 7.276e-01 1.553
factor(movie.sig$title_year)2005 1.118e+00 7.277e-01 1.537
factor(movie.sig$title_year)2006 9.849e-01 7.274e-01 1.354
factor(movie.sig$title_year)2007 8.735e-01 7.280e-01 1.200
factor(movie.sig$title_year)2008 6.967e-01 7.278e-01 0.957
factor(movie.sig$title_year)2009 6.603e-01 7.282e-01 0.907
factor(movie.sig$title_year)2010 5.734e-01 7.285e-01 0.787
factor(movie.sig$title_year)2011 3.745e-01 7.290e-01 0.514
factor(movie.sig$title_year)2012 5.302e-01 7.288e-01 0.728
factor(movie.sig$title_year)2013 4.507e-01 7.293e-01 0.618
factor(movie.sig$title_year)2014 6.078e-01 7.292e-01 0.834
factor(movie.sig$title_year)2015 7.152e-01 7.298e-01 0.980
factor(movie.sig$title_year)2016 1.221e+00 7.345e-01 1.663
movie.sig$genresAdventure 4.147e-01 5.209e-02 7.963
movie.sig$genresAnimation 8.255e-01 1.307e-01 6.315
movie.sig$genresBiography 6.369e-01 7.311e-02 8.712
movie.sig$genresComedy 1.729e-01 4.198e-02 4.119
movie.sig$genresCrime 4.532e-01 6.208e-02 7.301
movie.sig$genresDocumentary 1.218e+00 1.531e-01 7.956
movie.sig$genresDrama 5.564e-01 4.696e-02 11.848
movie.sig$genresFamily 7.131e-01 4.314e-01 1.653
movie.sig$genresFantasy -2.514e-01 1.379e-01 -1.823
movie.sig$genresHorror -3.781e-01 7.504e-02 -5.039
movie.sig$genresMusical -1.898e-02 7.744e-01 -0.025
movie.sig$genresMystery 2.197e-01 1.854e-01 1.185
movie.sig$genresRomance 8.488e-01 5.134e-01 1.653
movie.sig$genresSci-Fi 1.748e-01 2.795e-01 0.626
movie.sig$genresThriller -4.007e-02 7.275e-01 -0.055
movie.sig$genresWestern 8.554e-02 5.268e-01 0.162
movie.sig$num_critic_for_reviews NA NA NA
movie.sig$num_user_for_reviews NA NA NA
movie.sig$num_voted_users NA NA NA
movie.sig$gross NA NA NA
movie.sig$facenumber_in_poster:movie.sig$num_critic_for_reviews -4.689e-05 4.201e-05 -1.116
movie.sig$num_user_for_reviews:movie.sig$num_voted_users 8.369e-10 2.778e-10 3.013
movie.sig$num_voted_users:movie.sig$gross 2.173e-15 1.090e-15 1.993
movie.sig$budget:movie.sig$gross 3.385e-17 4.074e-18 8.309
Pr(>|t|)
(Intercept) 9.85e-13 ***
poly(movie.sig$num_voted_users, 2)1 1.24e-11 ***
poly(movie.sig$num_voted_users, 2)2 < 2e-16 ***
poly(movie.sig$num_critic_for_reviews, 2)1 < 2e-16 ***
poly(movie.sig$num_critic_for_reviews, 2)2 < 2e-16 ***
poly(movie.sig$num_user_for_reviews, 2)1 < 2e-16 ***
poly(movie.sig$num_user_for_reviews, 2)2 8.44e-06 ***
poly(movie.sig$duration, 2)1 < 2e-16 ***
poly(movie.sig$duration, 2)2 3.35e-05 ***
movie.sig$facenumber_in_poster 0.62618
poly(movie.sig$gross, 2)1 1.30e-13 ***
poly(movie.sig$gross, 2)2 2.47e-05 ***
poly(movie.sig$movie_facebook_likes, 2)1 0.11189
poly(movie.sig$movie_facebook_likes, 2)2 0.94167
movie.sig$director_facebook_likes 0.63937
movie.sig$cast_total_facebook_likes 0.97295
movie.sig$budget < 2e-16 ***
factor(movie.sig$title_year)1929 0.12918
factor(movie.sig$title_year)1933 0.00280 **
factor(movie.sig$title_year)1935 0.00164 **
factor(movie.sig$title_year)1936 0.00369 **
factor(movie.sig$title_year)1937 0.06314 .
factor(movie.sig$title_year)1939 0.08513 .
factor(movie.sig$title_year)1940 0.10099
factor(movie.sig$title_year)1946 0.03308 *
factor(movie.sig$title_year)1947 0.01022 *
factor(movie.sig$title_year)1948 0.03111 *
factor(movie.sig$title_year)1950 0.04562 *
factor(movie.sig$title_year)1952 0.21453
factor(movie.sig$title_year)1953 0.05398 .
factor(movie.sig$title_year)1954 0.02310 *
factor(movie.sig$title_year)1959 0.04787 *
factor(movie.sig$title_year)1960 0.04348 *
factor(movie.sig$title_year)1961 0.09593 .
factor(movie.sig$title_year)1963 0.01038 *
factor(movie.sig$title_year)1964 0.02029 *
factor(movie.sig$title_year)1965 0.05240 .
factor(movie.sig$title_year)1969 0.04939 *
factor(movie.sig$title_year)1970 0.09753 .
factor(movie.sig$title_year)1971 0.10517
factor(movie.sig$title_year)1972 0.09515 .
factor(movie.sig$title_year)1973 0.00518 **
factor(movie.sig$title_year)1974 0.00376 **
factor(movie.sig$title_year)1975 0.24207
factor(movie.sig$title_year)1976 0.12261
factor(movie.sig$title_year)1977 0.01028 *
factor(movie.sig$title_year)1978 0.00907 **
factor(movie.sig$title_year)1979 0.09756 .
factor(movie.sig$title_year)1980 0.01580 *
factor(movie.sig$title_year)1981 0.05320 .
factor(movie.sig$title_year)1982 0.03917 *
factor(movie.sig$title_year)1983 0.01699 *
factor(movie.sig$title_year)1984 0.02875 *
factor(movie.sig$title_year)1985 0.02133 *
factor(movie.sig$title_year)1986 0.04245 *
factor(movie.sig$title_year)1987 0.08034 .
factor(movie.sig$title_year)1988 0.02516 *
factor(movie.sig$title_year)1989 0.02274 *
factor(movie.sig$title_year)1990 0.03587 *
factor(movie.sig$title_year)1991 0.03807 *
factor(movie.sig$title_year)1992 0.01197 *
factor(movie.sig$title_year)1993 0.03014 *
factor(movie.sig$title_year)1994 0.02024 *
factor(movie.sig$title_year)1995 0.03674 *
factor(movie.sig$title_year)1996 0.03775 *
factor(movie.sig$title_year)1997 0.06144 .
factor(movie.sig$title_year)1998 0.03965 *
factor(movie.sig$title_year)1999 0.06013 .
factor(movie.sig$title_year)2000 0.10601
factor(movie.sig$title_year)2001 0.07980 .
factor(movie.sig$title_year)2002 0.09475 .
factor(movie.sig$title_year)2003 0.14217
factor(movie.sig$title_year)2004 0.12046
factor(movie.sig$title_year)2005 0.12445
factor(movie.sig$title_year)2006 0.17586
factor(movie.sig$title_year)2007 0.23027
factor(movie.sig$title_year)2008 0.33851
factor(movie.sig$title_year)2009 0.36457
factor(movie.sig$title_year)2010 0.43128
factor(movie.sig$title_year)2011 0.60746
factor(movie.sig$title_year)2012 0.46697
factor(movie.sig$title_year)2013 0.53662
factor(movie.sig$title_year)2014 0.40459
factor(movie.sig$title_year)2015 0.32716
factor(movie.sig$title_year)2016 0.09647 .
movie.sig$genresAdventure 2.39e-15 ***
movie.sig$genresAnimation 3.11e-10 ***
movie.sig$genresBiography < 2e-16 ***
movie.sig$genresComedy 3.92e-05 ***
movie.sig$genresCrime 3.66e-13 ***
movie.sig$genresDocumentary 2.52e-15 ***
movie.sig$genresDrama < 2e-16 ***
movie.sig$genresFamily 0.09839 .
movie.sig$genresFantasy 0.06835 .
movie.sig$genresHorror 4.97e-07 ***
movie.sig$genresMusical 0.98044
movie.sig$genresMystery 0.23630
movie.sig$genresRomance 0.09843 .
movie.sig$genresSci-Fi 0.53168
movie.sig$genresThriller 0.95608
movie.sig$genresWestern 0.87102
movie.sig$num_critic_for_reviews NA
movie.sig$num_user_for_reviews NA
movie.sig$num_voted_users NA
movie.sig$gross NA
movie.sig$facenumber_in_poster:movie.sig$num_critic_for_reviews 0.26440
movie.sig$num_user_for_reviews:movie.sig$num_voted_users 0.00261 **
movie.sig$num_voted_users:movie.sig$gross 0.04636 *
movie.sig$budget:movie.sig$gross < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7217 on 2900 degrees of freedom
Multiple R-squared: 0.5463, Adjusted R-squared: 0.5301
F-statistic: 33.58 on 104 and 2900 DF, p-value: < 2.2e-16
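The note "4 not defined because of singularities" and the four NA rows in the table above are what one would expect from this formula: `a * b` expands to `a + b + a:b`, and a raw main effect such as movie.sig$gross is already spanned by the intercept and the first column of poly(movie.sig$gross, 2). The sketch below reproduces the effect on simulated data and shows that writing the interaction with `:` gives the same fit without the aliased term. The variables `x`, `z` and `y` are hypothetical and not part of the movie data.

```r
set.seed(1)
n <- 200
d <- data.frame(x = rnorm(n), z = rnorm(n))
d$y <- 1 + 2 * d$x - d$x^2 + 0.5 * d$x * d$z + rnorm(n)

# x * z adds a raw x main effect that duplicates information in poly(x, 2)
fit_star  <- lm(y ~ poly(x, 2) + z + x * z, data = d)

# x:z adds only the product term
fit_colon <- lm(y ~ poly(x, 2) + z + x:z, data = d)

n_na_star  <- sum(is.na(coef(fit_star)))    # aliased coefficients reported as NA
n_na_colon <- sum(is.na(coef(fit_colon)))
same_fit   <- all.equal(unname(fitted(fit_star)), unname(fitted(fit_colon)))

print(c(star = n_na_star, colon = n_na_colon))
print(same_fit)
```

The NA coefficients do not change the fitted values or the R-squared; they only indicate that the term carries no information beyond what the poly() columns already provide.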
step(null,scope=list(lower=null,upper=full2),direction='forward')
Start: AIC=309.81
movie.sig$imdb_score ~ 1
Df Sum of Sq RSS AIC
+ poly(movie.sig$num_voted_users, 2) 2 976.96 2352.2 -730.05
+ movie.sig$num_voted_users 1 871.90 2457.2 -600.74
+ poly(movie.sig$duration, 2) 2 536.11 2793.0 -213.83
+ poly(movie.sig$num_user_for_reviews, 2) 2 483.99 2845.1 -158.27
+ poly(movie.sig$num_critic_for_reviews, 2) 2 436.49 2892.6 -108.52
+ movie.sig$num_critic_for_reviews 1 428.38 2900.8 -102.10
+ movie.sig$num_user_for_reviews 1 407.62 2921.5 -80.68
+ poly(movie.sig$movie_facebook_likes, 2) 2 317.80 3011.3 12.32
+ movie.sig$genres 16 331.02 2998.1 27.10
+ poly(movie.sig$gross, 2) 2 251.27 3077.9 77.99
+ movie.sig$gross 1 242.62 3086.5 84.42
+ movie.sig$director_facebook_likes 1 166.17 3163.0 157.95
+ movie.sig$cast_total_facebook_likes 1 64.28 3264.8 253.22
+ factor(movie.sig$title_year) 68 201.59 3127.5 258.10
+ movie.sig$budget 1 16.26 3312.9 297.09
+ movie.sig$facenumber_in_poster 1 15.14 3314.0 298.11
<none> 3329.1 309.81
Step: AIC=-730.05
movie.sig$imdb_score ~ poly(movie.sig$num_voted_users, 2)
Df Sum of Sq RSS AIC
+ movie.sig$genres 16 337.58 2014.6 -1163.60
+ poly(movie.sig$duration, 2) 2 137.87 2214.3 -907.55
+ movie.sig$budget 1 133.09 2219.1 -903.07
+ factor(movie.sig$title_year) 68 169.70 2182.5 -819.06
+ poly(movie.sig$gross, 2) 2 58.78 2293.4 -802.09
+ movie.sig$gross 1 54.53 2297.6 -798.53
+ poly(movie.sig$num_user_for_reviews, 2) 2 29.12 2323.1 -763.48
+ movie.sig$num_user_for_reviews 1 25.39 2326.8 -760.66
+ movie.sig$director_facebook_likes 1 17.94 2334.2 -751.05
+ movie.sig$facenumber_in_poster 1 6.62 2345.5 -736.52
+ poly(movie.sig$num_critic_for_reviews, 2) 2 5.36 2346.8 -732.90
<none> 2352.2 -730.05
+ movie.sig$num_critic_for_reviews 1 0.18 2352.0 -728.28
+ movie.sig$cast_total_facebook_likes 1 0.15 2352.0 -728.23
+ poly(movie.sig$movie_facebook_likes, 2) 2 1.29 2350.9 -727.70
Step: AIC=-1163.6
movie.sig$imdb_score ~ poly(movie.sig$num_voted_users, 2) + movie.sig$genres
Df Sum of Sq RSS AIC
+ factor(movie.sig$title_year) 68 177.578 1837.0 -1304.9
+ movie.sig$budget 1 65.238 1949.3 -1260.5
+ poly(movie.sig$duration, 2) 2 65.750 1948.8 -1259.3
+ movie.sig$gross 1 19.722 1994.9 -1191.2
+ poly(movie.sig$gross, 2) 2 20.698 1993.9 -1190.6
+ poly(movie.sig$num_user_for_reviews, 2) 2 20.024 1994.6 -1189.6
+ movie.sig$num_user_for_reviews 1 14.834 1999.8 -1183.8
+ poly(movie.sig$num_critic_for_reviews, 2) 2 9.375 2005.2 -1173.6
+ movie.sig$director_facebook_likes 1 6.114 2008.5 -1170.7
+ movie.sig$facenumber_in_poster 1 3.792 2010.8 -1167.3
<none> 2014.6 -1163.6
+ movie.sig$cast_total_facebook_likes 1 0.355 2014.2 -1162.1
+ movie.sig$num_critic_for_reviews 1 0.042 2014.5 -1161.7
+ poly(movie.sig$movie_facebook_likes, 2) 2 0.813 2013.8 -1160.8
Step: AIC=-1304.89
movie.sig$imdb_score ~ poly(movie.sig$num_voted_users, 2) + movie.sig$genres +
factor(movie.sig$title_year)
Df Sum of Sq RSS AIC
+ poly(movie.sig$num_critic_for_reviews, 2) 2 87.358 1749.7 -1447.3
+ movie.sig$num_critic_for_reviews 1 43.850 1793.2 -1375.5
+ poly(movie.sig$duration, 2) 2 36.718 1800.3 -1361.6
+ movie.sig$budget 1 33.594 1803.4 -1358.3
+ poly(movie.sig$gross, 2) 2 28.115 1808.9 -1347.2
+ movie.sig$gross 1 24.954 1812.0 -1344.0
+ poly(movie.sig$num_user_for_reviews, 2) 2 14.944 1822.1 -1325.4
+ movie.sig$num_user_for_reviews 1 8.992 1828.0 -1317.6
+ poly(movie.sig$movie_facebook_likes, 2) 2 5.724 1831.3 -1310.3
+ movie.sig$director_facebook_likes 1 2.736 1834.3 -1307.4
+ movie.sig$facenumber_in_poster 1 2.244 1834.8 -1306.6
<none> 1837.0 -1304.9
+ movie.sig$cast_total_facebook_likes 1 0.137 1836.9 -1303.1
Step: AIC=-1447.3
movie.sig$imdb_score ~ poly(movie.sig$num_voted_users, 2) + movie.sig$genres +
factor(movie.sig$title_year) + poly(movie.sig$num_critic_for_reviews,
2)
Df Sum of Sq RSS AIC
+ poly(movie.sig$num_user_for_reviews, 2) 2 73.226 1676.4 -1571.8
+ movie.sig$budget 1 50.370 1699.3 -1533.1
+ movie.sig$num_user_for_reviews 1 39.168 1710.5 -1513.3
+ poly(movie.sig$gross, 2) 2 30.392 1719.2 -1496.0
+ poly(movie.sig$duration, 2) 2 25.330 1724.3 -1487.1
+ movie.sig$gross 1 22.505 1727.1 -1484.2
<none> 1749.7 -1447.3
+ movie.sig$director_facebook_likes 1 1.061 1748.6 -1447.1
+ poly(movie.sig$movie_facebook_likes, 2) 2 1.904 1747.7 -1446.6
+ movie.sig$facenumber_in_poster 1 0.644 1749.0 -1446.4
+ movie.sig$cast_total_facebook_likes 1 0.024 1749.6 -1445.3
Step: AIC=-1571.77
movie.sig$imdb_score ~ poly(movie.sig$num_voted_users, 2) + movie.sig$genres +
factor(movie.sig$title_year) + poly(movie.sig$num_critic_for_reviews,
2) + poly(movie.sig$num_user_for_reviews, 2)
Df Sum of Sq RSS AIC
+ movie.sig$budget 1 41.840 1634.6 -1645.7
+ poly(movie.sig$duration, 2) 2 41.646 1634.8 -1643.4
+ poly(movie.sig$gross, 2) 2 31.255 1645.2 -1624.3
+ movie.sig$gross 1 23.404 1653.0 -1612.0
+ movie.sig$director_facebook_likes 1 1.296 1675.1 -1572.1
<none> 1676.4 -1571.8
+ movie.sig$facenumber_in_poster 1 0.815 1675.6 -1571.2
+ poly(movie.sig$movie_facebook_likes, 2) 2 1.805 1674.6 -1571.0
+ movie.sig$cast_total_facebook_likes 1 0.008 1676.4 -1569.8
Step: AIC=-1645.72
movie.sig$imdb_score ~ poly(movie.sig$num_voted_users, 2) + movie.sig$genres +
factor(movie.sig$title_year) + poly(movie.sig$num_critic_for_reviews,
2) + poly(movie.sig$num_user_for_reviews, 2) + movie.sig$budget
Df Sum of Sq RSS AIC
+ poly(movie.sig$duration, 2) 2 69.659 1564.9 -1772.6
+ poly(movie.sig$gross, 2) 2 8.040 1626.5 -1656.5
+ movie.sig$gross 1 3.883 1630.7 -1650.9
+ movie.sig$director_facebook_likes 1 1.322 1633.3 -1646.2
<none> 1634.6 -1645.7
+ movie.sig$facenumber_in_poster 1 0.854 1633.7 -1645.3
+ movie.sig$cast_total_facebook_likes 1 0.324 1634.3 -1644.3
+ poly(movie.sig$movie_facebook_likes, 2) 2 1.043 1633.5 -1643.6
Step: AIC=-1772.59
movie.sig$imdb_score ~ poly(movie.sig$num_voted_users, 2) + movie.sig$genres +
factor(movie.sig$title_year) + poly(movie.sig$num_critic_for_reviews,
2) + poly(movie.sig$num_user_for_reviews, 2) + movie.sig$budget +
poly(movie.sig$duration, 2)
Df Sum of Sq RSS AIC
+ poly(movie.sig$gross, 2) 2 7.2463 1557.7 -1782.5
+ movie.sig$gross 1 2.8856 1562.0 -1776.1
+ movie.sig$facenumber_in_poster 1 2.5144 1562.4 -1775.4
<none> 1564.9 -1772.6
+ movie.sig$director_facebook_likes 1 0.1493 1564.8 -1770.9
+ movie.sig$cast_total_facebook_likes 1 0.0899 1564.8 -1770.8
+ poly(movie.sig$movie_facebook_likes, 2) 2 0.3493 1564.6 -1769.3
Step: AIC=-1782.54
movie.sig$imdb_score ~ poly(movie.sig$num_voted_users, 2) + movie.sig$genres +
factor(movie.sig$title_year) + poly(movie.sig$num_critic_for_reviews,
2) + poly(movie.sig$num_user_for_reviews, 2) + movie.sig$budget +
poly(movie.sig$duration, 2) + poly(movie.sig$gross, 2)
Df Sum of Sq RSS AIC
+ movie.sig$facenumber_in_poster 1 2.57765 1555.1 -1785.5
<none> 1557.7 -1782.5
+ movie.sig$director_facebook_likes 1 0.14917 1557.5 -1780.8
+ movie.sig$cast_total_facebook_likes 1 0.07798 1557.6 -1780.7
+ poly(movie.sig$movie_facebook_likes, 2) 2 0.49944 1557.2 -1779.5
Step: AIC=-1785.51
movie.sig$imdb_score ~ poly(movie.sig$num_voted_users, 2) + movie.sig$genres +
factor(movie.sig$title_year) + poly(movie.sig$num_critic_for_reviews,
2) + poly(movie.sig$num_user_for_reviews, 2) + movie.sig$budget +
poly(movie.sig$duration, 2) + poly(movie.sig$gross, 2) +
movie.sig$facenumber_in_poster
Df Sum of Sq RSS AIC
<none> 1555.1 -1785.5
+ movie.sig$cast_total_facebook_likes 1 0.16144 1554.9 -1783.8
+ movie.sig$director_facebook_likes 1 0.10766 1555.0 -1783.7
+ poly(movie.sig$movie_facebook_likes, 2) 2 0.46225 1554.6 -1782.4
Call:
lm(formula = movie.sig$imdb_score ~ poly(movie.sig$num_voted_users,
2) + movie.sig$genres + factor(movie.sig$title_year) + poly(movie.sig$num_critic_for_reviews,
2) + poly(movie.sig$num_user_for_reviews, 2) + movie.sig$budget +
poly(movie.sig$duration, 2) + poly(movie.sig$gross, 2) +
movie.sig$facenumber_in_poster)
Coefficients:
(Intercept) poly(movie.sig$num_voted_users, 2)1
5.276e+00 3.372e+01
poly(movie.sig$num_voted_users, 2)2 movie.sig$genresAdventure
-1.391e+01 4.127e-01
movie.sig$genresAnimation movie.sig$genresBiography
7.875e-01 6.634e-01
movie.sig$genresComedy movie.sig$genresCrime
2.065e-01 4.822e-01
movie.sig$genresDocumentary movie.sig$genresDrama
1.272e+00 5.879e-01
movie.sig$genresFamily movie.sig$genresFantasy
2.312e-01 -1.752e-01
movie.sig$genresHorror movie.sig$genresMusical
-3.127e-01 -1.316e-01
movie.sig$genresMystery movie.sig$genresRomance
2.552e-01 8.610e-01
movie.sig$genresSci-Fi movie.sig$genresThriller
2.024e-01 -2.463e-02
movie.sig$genresWestern factor(movie.sig$title_year)1929
1.171e-01 2.169e+00
factor(movie.sig$title_year)1933 factor(movie.sig$title_year)1935
3.105e+00 3.268e+00
factor(movie.sig$title_year)1936 factor(movie.sig$title_year)1937
3.030e+00 1.763e+00
factor(movie.sig$title_year)1939 factor(movie.sig$title_year)1940
1.515e+00 1.766e+00
factor(movie.sig$title_year)1946 factor(movie.sig$title_year)1947
1.937e+00 2.674e+00
factor(movie.sig$title_year)1948 factor(movie.sig$title_year)1950
2.248e+00 2.045e+00
factor(movie.sig$title_year)1952 factor(movie.sig$title_year)1953
1.289e+00 1.750e+00
factor(movie.sig$title_year)1954 factor(movie.sig$title_year)1959
2.412e+00 2.134e+00
factor(movie.sig$title_year)1960 factor(movie.sig$title_year)1961
2.177e+00 1.767e+00
factor(movie.sig$title_year)1963 factor(movie.sig$title_year)1964
2.676e+00 2.072e+00
factor(movie.sig$title_year)1965 factor(movie.sig$title_year)1969
1.576e+00 2.028e+00
factor(movie.sig$title_year)1970 factor(movie.sig$title_year)1971
1.408e+00 1.458e+00
factor(movie.sig$title_year)1972 factor(movie.sig$title_year)1973
1.454e+00 2.199e+00
factor(movie.sig$title_year)1974 factor(movie.sig$title_year)1975
2.222e+00 8.110e-01
factor(movie.sig$title_year)1976 factor(movie.sig$title_year)1977
1.400e+00 1.781e+00
factor(movie.sig$title_year)1978 factor(movie.sig$title_year)1979
2.010e+00 1.371e+00
factor(movie.sig$title_year)1980 factor(movie.sig$title_year)1981
1.789e+00 1.434e+00
factor(movie.sig$title_year)1982 factor(movie.sig$title_year)1983
1.492e+00 1.757e+00
factor(movie.sig$title_year)1984 factor(movie.sig$title_year)1985
1.611e+00 1.727e+00
factor(movie.sig$title_year)1986 factor(movie.sig$title_year)1987
1.514e+00 1.296e+00
factor(movie.sig$title_year)1988 factor(movie.sig$title_year)1989
1.650e+00 1.670e+00
factor(movie.sig$title_year)1990 factor(movie.sig$title_year)1991
1.487e+00 1.501e+00
factor(movie.sig$title_year)1992 factor(movie.sig$title_year)1993
1.811e+00 1.571e+00
factor(movie.sig$title_year)1994 factor(movie.sig$title_year)1995
1.601e+00 1.478e+00
factor(movie.sig$title_year)1996 factor(movie.sig$title_year)1997
1.472e+00 1.326e+00
factor(movie.sig$title_year)1998 factor(movie.sig$title_year)1999
1.471e+00 1.320e+00
factor(movie.sig$title_year)2000 factor(movie.sig$title_year)2001
1.154e+00 1.247e+00
factor(movie.sig$title_year)2002 factor(movie.sig$title_year)2003
1.185e+00 1.051e+00
factor(movie.sig$title_year)2004 factor(movie.sig$title_year)2005
1.106e+00 1.098e+00
factor(movie.sig$title_year)2006 factor(movie.sig$title_year)2007
9.854e-01 9.008e-01
factor(movie.sig$title_year)2008 factor(movie.sig$title_year)2009
6.958e-01 6.579e-01
factor(movie.sig$title_year)2010 factor(movie.sig$title_year)2011
5.867e-01 3.743e-01
factor(movie.sig$title_year)2012 factor(movie.sig$title_year)2013
5.354e-01 4.718e-01
factor(movie.sig$title_year)2014 factor(movie.sig$title_year)2015
6.415e-01 7.175e-01
factor(movie.sig$title_year)2016 poly(movie.sig$num_critic_for_reviews, 2)1
1.228e+00 2.562e+01
poly(movie.sig$num_critic_for_reviews, 2)2 poly(movie.sig$num_user_for_reviews, 2)1
-8.947e+00 -1.776e+01
poly(movie.sig$num_user_for_reviews, 2)2 movie.sig$budget
1.078e+01 -4.207e-09
poly(movie.sig$duration, 2)1 poly(movie.sig$duration, 2)2
1.054e+01 -3.179e+00
poly(movie.sig$gross, 2)1 poly(movie.sig$gross, 2)2
-3.561e+00 2.408e+00
movie.sig$facenumber_in_poster
-1.454e-02
full3 <- lm(movie.sig$imdb_score ~ movie.sig$num_voted_users + movie.sig$num_critic_for_reviews +
    movie.sig$num_user_for_reviews + movie.sig$duration + movie.sig$facenumber_in_poster +
    movie.sig$gross + movie.sig$movie_facebook_likes + movie.sig$director_facebook_likes +
    movie.sig$cast_total_facebook_likes + movie.sig$budget +
    factor(movie.sig$title_year) + factor(movie.sig$genres) +
    movie.sig$duration * movie.sig$num_voted_users +
    movie.sig$num_voted_users * movie.sig$num_user_for_reviews +
    movie.sig$gross * movie.sig$budget,
  data = movie.sig)
summary(full3)
Call:
lm(formula = movie.sig$imdb_score ~ movie.sig$num_voted_users +
movie.sig$num_critic_for_reviews + movie.sig$num_user_for_reviews +
movie.sig$duration + movie.sig$facenumber_in_poster + movie.sig$gross +
movie.sig$movie_facebook_likes + movie.sig$director_facebook_likes +
movie.sig$cast_total_facebook_likes + movie.sig$budget +
factor(movie.sig$title_year) + factor(movie.sig$genres) +
movie.sig$duration * movie.sig$num_voted_users + movie.sig$num_voted_users *
movie.sig$num_user_for_reviews + movie.sig$gross * movie.sig$budget,
data = movie.sig)
Residuals:
Min 1Q Median 3Q Max
-4.7129 -0.3562 0.0782 0.4729 2.0864
Coefficients:
Estimate Std. Error t value
(Intercept) 2.981e+00 7.587e-01 3.928
movie.sig$num_voted_users 7.878e-06 4.894e-07 16.096
movie.sig$num_critic_for_reviews 3.381e-03 2.630e-04 12.855
movie.sig$num_user_for_reviews -4.598e-04 7.757e-05 -5.927
movie.sig$duration 1.261e-02 9.547e-04 13.210
movie.sig$facenumber_in_poster -1.527e-02 6.800e-03 -2.245
movie.sig$gross -1.794e-09 4.268e-10 -4.204
movie.sig$movie_facebook_likes -4.371e-06 1.059e-06 -4.126
movie.sig$director_facebook_likes 1.614e-06 4.496e-06 0.359
movie.sig$cast_total_facebook_likes 6.334e-07 7.104e-07 0.892
movie.sig$budget -6.198e-09 5.927e-10 -10.457
factor(movie.sig$title_year)1929 1.902e+00 1.333e+00 1.427
factor(movie.sig$title_year)1933 3.244e+00 1.062e+00 3.053
factor(movie.sig$title_year)1935 3.416e+00 1.063e+00 3.215
factor(movie.sig$title_year)1936 3.320e+00 1.063e+00 3.123
factor(movie.sig$title_year)1937 2.114e+00 1.072e+00 1.973
factor(movie.sig$title_year)1939 1.846e+00 9.226e-01 2.001
factor(movie.sig$title_year)1940 2.010e+00 1.070e+00 1.878
factor(movie.sig$title_year)1946 1.897e+00 9.215e-01 2.059
factor(movie.sig$title_year)1947 2.817e+00 1.061e+00 2.655
factor(movie.sig$title_year)1948 2.359e+00 1.063e+00 2.219
factor(movie.sig$title_year)1950 1.990e+00 1.064e+00 1.870
factor(movie.sig$title_year)1952 1.233e+00 1.063e+00 1.160
factor(movie.sig$title_year)1953 1.849e+00 9.205e-01 2.008
factor(movie.sig$title_year)1954 2.671e+00 1.061e+00 2.518
factor(movie.sig$title_year)1959 2.587e+00 1.063e+00 2.433
factor(movie.sig$title_year)1960 2.364e+00 1.067e+00 2.215
factor(movie.sig$title_year)1961 1.872e+00 1.061e+00 1.764
factor(movie.sig$title_year)1963 2.165e+00 1.065e+00 2.032
factor(movie.sig$title_year)1964 2.256e+00 9.215e-01 2.448
factor(movie.sig$title_year)1965 1.423e+00 8.434e-01 1.687
factor(movie.sig$title_year)1969 2.306e+00 1.065e+00 2.166
factor(movie.sig$title_year)1970 1.356e+00 8.708e-01 1.557
factor(movie.sig$title_year)1971 1.446e+00 9.192e-01 1.573
factor(movie.sig$title_year)1972 1.717e+00 9.239e-01 1.859
factor(movie.sig$title_year)1973 2.518e+00 8.410e-01 2.993
factor(movie.sig$title_year)1974 2.548e+00 8.136e-01 3.131
factor(movie.sig$title_year)1975 1.208e+00 9.242e-01 1.307
factor(movie.sig$title_year)1976 1.739e+00 9.212e-01 1.887
factor(movie.sig$title_year)1977 1.930e+00 8.252e-01 2.339
factor(movie.sig$title_year)1978 2.070e+00 8.051e-01 2.572
factor(movie.sig$title_year)1979 1.710e+00 8.430e-01 2.029
factor(movie.sig$title_year)1980 1.781e+00 7.811e-01 2.280
factor(movie.sig$title_year)1981 1.557e+00 7.929e-01 1.964
factor(movie.sig$title_year)1982 1.747e+00 7.765e-01 2.249
factor(movie.sig$title_year)1983 1.930e+00 7.930e-01 2.434
factor(movie.sig$title_year)1984 1.843e+00 7.728e-01 2.385
factor(movie.sig$title_year)1985 1.840e+00 7.869e-01 2.338
factor(movie.sig$title_year)1986 1.698e+00 7.705e-01 2.204
factor(movie.sig$title_year)1987 1.452e+00 7.672e-01 1.893
factor(movie.sig$title_year)1988 1.829e+00 7.656e-01 2.390
factor(movie.sig$title_year)1989 1.824e+00 7.645e-01 2.386
factor(movie.sig$title_year)1990 1.762e+00 7.672e-01 2.296
factor(movie.sig$title_year)1991 1.628e+00 7.644e-01 2.129
factor(movie.sig$title_year)1992 1.952e+00 7.646e-01 2.553
factor(movie.sig$title_year)1993 1.624e+00 7.627e-01 2.129
factor(movie.sig$title_year)1994 1.666e+00 7.610e-01 2.190
factor(movie.sig$title_year)1995 1.602e+00 7.583e-01 2.113
factor(movie.sig$title_year)1996 1.635e+00 7.561e-01 2.162
factor(movie.sig$title_year)1997 1.528e+00 7.562e-01 2.020
factor(movie.sig$title_year)1998 1.601e+00 7.562e-01 2.118
factor(movie.sig$title_year)1999 1.479e+00 7.551e-01 1.959
factor(movie.sig$title_year)2000 1.309e+00 7.551e-01 1.733
factor(movie.sig$title_year)2001 1.387e+00 7.547e-01 1.838
factor(movie.sig$title_year)2002 1.349e+00 7.547e-01 1.787
factor(movie.sig$title_year)2003 1.227e+00 7.554e-01 1.624
factor(movie.sig$title_year)2004 1.309e+00 7.553e-01 1.733
factor(movie.sig$title_year)2005 1.317e+00 7.555e-01 1.744
factor(movie.sig$title_year)2006 1.195e+00 7.551e-01 1.582
factor(movie.sig$title_year)2007 1.189e+00 7.556e-01 1.573
factor(movie.sig$title_year)2008 1.015e+00 7.554e-01 1.343
factor(movie.sig$title_year)2009 1.009e+00 7.557e-01 1.335
factor(movie.sig$title_year)2010 9.395e-01 7.559e-01 1.243
factor(movie.sig$title_year)2011 7.924e-01 7.564e-01 1.048
factor(movie.sig$title_year)2012 9.045e-01 7.564e-01 1.196
factor(movie.sig$title_year)2013 9.371e-01 7.565e-01 1.239
factor(movie.sig$title_year)2014 1.061e+00 7.564e-01 1.403
factor(movie.sig$title_year)2015 1.160e+00 7.568e-01 1.533
factor(movie.sig$title_year)2016 1.609e+00 7.616e-01 2.113
factor(movie.sig$genres)Adventure 3.740e-01 5.348e-02 6.993
factor(movie.sig$genres)Animation 8.077e-01 1.345e-01 6.006
factor(movie.sig$genres)Biography 6.746e-01 7.544e-02 8.942
factor(movie.sig$genres)Comedy 1.760e-01 4.324e-02 4.070
factor(movie.sig$genres)Crime 4.505e-01 6.397e-02 7.043
factor(movie.sig$genres)Documentary 1.073e+00 1.584e-01 6.773
factor(movie.sig$genres)Drama 5.281e-01 4.860e-02 10.867
factor(movie.sig$genres)Family 3.975e-01 4.439e-01 0.896
factor(movie.sig$genres)Fantasy -2.096e-01 1.427e-01 -1.470
factor(movie.sig$genres)Horror -3.874e-01 7.628e-02 -5.078
factor(movie.sig$genres)Musical 1.753e-01 8.043e-01 0.218
factor(movie.sig$genres)Mystery 1.427e-01 1.924e-01 0.742
factor(movie.sig$genres)Romance 7.384e-01 5.332e-01 1.385
factor(movie.sig$genres)Sci-Fi 1.362e-01 2.902e-01 0.470
factor(movie.sig$genres)Thriller -4.247e-01 7.550e-01 -0.563
factor(movie.sig$genres)Western -9.548e-02 5.466e-01 -0.175
movie.sig$num_voted_users:movie.sig$duration -3.022e-08 3.517e-09 -8.591
movie.sig$num_voted_users:movie.sig$num_user_for_reviews -2.767e-10 1.008e-10 -2.745
movie.sig$gross:movie.sig$budget 1.541e-17 2.893e-18 5.326
Pr(>|t|)
(Intercept) 8.75e-05 ***
movie.sig$num_voted_users < 2e-16 ***
movie.sig$num_critic_for_reviews < 2e-16 ***
movie.sig$num_user_for_reviews 3.46e-09 ***
movie.sig$duration < 2e-16 ***
movie.sig$facenumber_in_poster 0.02483 *
movie.sig$gross 2.70e-05 ***
movie.sig$movie_facebook_likes 3.79e-05 ***
movie.sig$director_facebook_likes 0.71967
movie.sig$cast_total_facebook_likes 0.37267
movie.sig$budget < 2e-16 ***
factor(movie.sig$title_year)1929 0.15361
factor(movie.sig$title_year)1933 0.00229 **
factor(movie.sig$title_year)1935 0.00132 **
factor(movie.sig$title_year)1936 0.00181 **
factor(movie.sig$title_year)1937 0.04864 *
factor(movie.sig$title_year)1939 0.04547 *
factor(movie.sig$title_year)1940 0.06047 .
factor(movie.sig$title_year)1946 0.03956 *
factor(movie.sig$title_year)1947 0.00797 **
factor(movie.sig$title_year)1948 0.02655 *
factor(movie.sig$title_year)1950 0.06152 .
factor(movie.sig$title_year)1952 0.24620
factor(movie.sig$title_year)1953 0.04468 *
factor(movie.sig$title_year)1954 0.01187 *
factor(movie.sig$title_year)1959 0.01503 *
factor(movie.sig$title_year)1960 0.02685 *
factor(movie.sig$title_year)1961 0.07782 .
factor(movie.sig$title_year)1963 0.04223 *
factor(movie.sig$title_year)1964 0.01441 *
factor(movie.sig$title_year)1965 0.09165 .
factor(movie.sig$title_year)1969 0.03041 *
factor(movie.sig$title_year)1970 0.11952
factor(movie.sig$title_year)1971 0.11585
factor(movie.sig$title_year)1972 0.06318 .
factor(movie.sig$title_year)1973 0.00278 **
factor(movie.sig$title_year)1974 0.00176 **
factor(movie.sig$title_year)1975 0.19140
factor(movie.sig$title_year)1976 0.05921 .
factor(movie.sig$title_year)1977 0.01940 *
factor(movie.sig$title_year)1978 0.01017 *
factor(movie.sig$title_year)1979 0.04256 *
factor(movie.sig$title_year)1980 0.02267 *
factor(movie.sig$title_year)1981 0.04968 *
factor(movie.sig$title_year)1982 0.02457 *
factor(movie.sig$title_year)1983 0.01499 *
factor(movie.sig$title_year)1984 0.01714 *
factor(movie.sig$title_year)1985 0.01946 *
factor(movie.sig$title_year)1986 0.02764 *
factor(movie.sig$title_year)1987 0.05846 .
factor(movie.sig$title_year)1988 0.01693 *
factor(movie.sig$title_year)1989 0.01710 *
factor(movie.sig$title_year)1990 0.02175 *
factor(movie.sig$title_year)1991 0.03331 *
factor(movie.sig$title_year)1992 0.01074 *
factor(movie.sig$title_year)1993 0.03331 *
factor(movie.sig$title_year)1994 0.02861 *
factor(movie.sig$title_year)1995 0.03472 *
factor(movie.sig$title_year)1996 0.03069 *
factor(movie.sig$title_year)1997 0.04345 *
factor(movie.sig$title_year)1998 0.03428 *
factor(movie.sig$title_year)1999 0.05024 .
factor(movie.sig$title_year)2000 0.08315 .
factor(movie.sig$title_year)2001 0.06612 .
factor(movie.sig$title_year)2002 0.07401 .
factor(movie.sig$title_year)2003 0.10453
factor(movie.sig$title_year)2004 0.08329 .
factor(movie.sig$title_year)2005 0.08128 .
factor(movie.sig$title_year)2006 0.11367
factor(movie.sig$title_year)2007 0.11580
factor(movie.sig$title_year)2008 0.17928
factor(movie.sig$title_year)2009 0.18193
factor(movie.sig$title_year)2010 0.21401
factor(movie.sig$title_year)2011 0.29492
factor(movie.sig$title_year)2012 0.23189
factor(movie.sig$title_year)2013 0.21556
factor(movie.sig$title_year)2014 0.16087
factor(movie.sig$title_year)2015 0.12548
factor(movie.sig$title_year)2016 0.03470 *
factor(movie.sig$genres)Adventure 3.31e-12 ***
factor(movie.sig$genres)Animation 2.14e-09 ***
factor(movie.sig$genres)Biography < 2e-16 ***
factor(movie.sig$genres)Comedy 4.83e-05 ***
factor(movie.sig$genres)Crime 2.35e-12 ***
factor(movie.sig$genres)Documentary 1.52e-11 ***
factor(movie.sig$genres)Drama < 2e-16 ***
factor(movie.sig$genres)Family 0.37052
factor(movie.sig$genres)Fantasy 0.14180
factor(movie.sig$genres)Horror 4.05e-07 ***
factor(movie.sig$genres)Musical 0.82749
factor(movie.sig$genres)Mystery 0.45833
factor(movie.sig$genres)Romance 0.16621
factor(movie.sig$genres)Sci-Fi 0.63874
factor(movie.sig$genres)Thriller 0.57381
factor(movie.sig$genres)Western 0.86134
movie.sig$num_voted_users:movie.sig$duration < 2e-16 ***
movie.sig$num_voted_users:movie.sig$num_user_for_reviews 0.00609 **
movie.sig$gross:movie.sig$budget 1.08e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7498 on 2907 degrees of freedom
Multiple R-squared: 0.5091, Adjusted R-squared: 0.4927
F-statistic: 31.08 on 97 and 2907 DF, p-value: < 2.2e-16
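A side note on the formula style used for full3: writing every term as `movie.sig$column` fits the same model as bare column names with `data = movie.sig`, but the fitted object then stays tied to the original vectors, which matters once we predict on other rows. The sketch below illustrates this on R's built-in `mtcars` data rather than the movie data; the object names (`fit_prefixed`, `fit_bare`, `new_rows`) are made up for the illustration.

```r
# Illustration on the built-in mtcars data (not the movie data).
fit_prefixed <- lm(mtcars$mpg ~ mtcars$wt + mtcars$hp, data = mtcars)
fit_bare     <- lm(mpg ~ wt + hp, data = mtcars)

# Both calls estimate the same coefficients; only the labels differ.
coef_diff <- max(abs(unname(coef(fit_prefixed)) - unname(coef(fit_bare))))

new_rows <- mtcars[1:5, ]

# Bare names: predict() looks the columns up in newdata,
# so we get one prediction per new row.
n_bare <- length(predict(fit_bare, newdata = new_rows))

# Prefixed names: the terms still point at the full mtcars vectors,
# so newdata is effectively ignored (R warns about the row mismatch).
n_prefixed <- length(suppressWarnings(predict(fit_prefixed, newdata = new_rows)))

print(c(coef_diff = coef_diff, n_bare = n_bare, n_prefixed = n_prefixed))
```

For this reason, a model that will later be evaluated on a held-out test set is easier to work with when it is written with bare column names.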
step(null,scope=list(lower=null,upper=full3),direction='forward')
Start: AIC=309.81
movie.sig$imdb_score ~ 1
Df Sum of Sq RSS AIC
+ movie.sig$num_voted_users 1 871.90 2457.2 -600.74
+ movie.sig$duration 1 491.13 2838.0 -167.82
+ movie.sig$num_critic_for_reviews 1 428.38 2900.8 -102.10
+ movie.sig$num_user_for_reviews 1 407.62 2921.5 -80.68
+ factor(movie.sig$genres) 16 331.02 2998.1 27.10
+ movie.sig$movie_facebook_likes 1 282.82 3046.3 45.02
+ movie.sig$gross 1 242.62 3086.5 84.42
+ movie.sig$director_facebook_likes 1 166.17 3163.0 157.95
+ movie.sig$cast_total_facebook_likes 1 64.28 3264.8 253.22
+ factor(movie.sig$title_year) 68 201.59 3127.5 258.10
+ movie.sig$budget 1 16.26 3312.9 297.09
+ movie.sig$facenumber_in_poster 1 15.14 3314.0 298.11
<none> 3329.1 309.81
Step: AIC=-600.74
movie.sig$imdb_score ~ movie.sig$num_voted_users
Df Sum of Sq RSS AIC
+ factor(movie.sig$genres) 16 311.531 2145.7 -976.12
+ movie.sig$duration 1 147.786 2309.4 -785.13
+ movie.sig$budget 1 73.211 2384.0 -689.63
+ factor(movie.sig$title_year) 68 164.699 2292.5 -673.22
+ movie.sig$num_user_for_reviews 1 21.297 2435.9 -624.90
+ movie.sig$gross 1 16.929 2440.3 -619.51
+ movie.sig$num_critic_for_reviews 1 14.632 2442.6 -616.69
+ movie.sig$director_facebook_likes 1 13.657 2443.6 -615.49
+ movie.sig$facenumber_in_poster 1 6.789 2450.4 -607.05
+ movie.sig$movie_facebook_likes 1 2.627 2454.6 -601.95
<none> 2457.2 -600.74
+ movie.sig$cast_total_facebook_likes 1 0.524 2456.7 -599.38
Step: AIC=-976.12
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres)
Df Sum of Sq RSS AIC
+ factor(movie.sig$title_year) 68 169.011 1976.7 -1086.66
+ movie.sig$duration 1 74.584 2071.1 -1080.44
+ movie.sig$budget 1 28.689 2117.0 -1014.57
+ movie.sig$num_critic_for_reviews 1 23.116 2122.6 -1006.67
+ movie.sig$num_user_for_reviews 1 12.251 2133.4 -991.33
+ movie.sig$director_facebook_likes 1 3.707 2142.0 -979.32
+ movie.sig$facenumber_in_poster 1 3.274 2142.4 -978.71
+ movie.sig$movie_facebook_likes 1 1.686 2144.0 -976.49
<none> 2145.7 -976.12
+ movie.sig$gross 1 1.391 2144.3 -976.07
+ movie.sig$cast_total_facebook_likes 1 0.362 2145.3 -974.63
Step: AIC=-1086.66
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year)
Df Sum of Sq RSS AIC
+ movie.sig$num_critic_for_reviews 1 124.119 1852.6 -1279.5
+ movie.sig$duration 1 42.067 1934.6 -1149.3
+ movie.sig$budget 1 9.722 1967.0 -1099.5
+ movie.sig$movie_facebook_likes 1 6.179 1970.5 -1094.1
+ movie.sig$num_user_for_reviews 1 5.685 1971.0 -1093.3
+ movie.sig$gross 1 2.494 1974.2 -1088.5
+ movie.sig$facenumber_in_poster 1 2.421 1974.3 -1088.3
+ movie.sig$cast_total_facebook_likes 1 2.206 1974.5 -1088.0
<none> 1976.7 -1086.7
+ movie.sig$director_facebook_likes 1 1.135 1975.5 -1086.4
Step: AIC=-1279.54
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year) + movie.sig$num_critic_for_reviews
Df Sum of Sq RSS AIC
+ movie.sig$num_user_for_reviews 1 43.496 1809.1 -1348.9
+ movie.sig$budget 1 42.322 1810.2 -1347.0
+ movie.sig$duration 1 24.346 1828.2 -1317.3
+ movie.sig$gross 1 12.691 1839.9 -1298.2
+ movie.sig$movie_facebook_likes 1 6.919 1845.7 -1288.8
<none> 1852.6 -1279.5
+ movie.sig$facenumber_in_poster 1 0.614 1852.0 -1278.5
+ movie.sig$cast_total_facebook_likes 1 0.309 1852.3 -1278.0
+ movie.sig$director_facebook_likes 1 0.087 1852.5 -1277.7
Step: AIC=-1348.93
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year) + movie.sig$num_critic_for_reviews +
movie.sig$num_user_for_reviews
Df Sum of Sq RSS AIC
+ movie.sig$budget 1 35.821 1773.2 -1407.0
+ movie.sig$duration 1 34.245 1774.8 -1404.4
+ movie.sig$num_voted_users:movie.sig$num_user_for_reviews 1 16.640 1792.4 -1374.7
+ movie.sig$movie_facebook_likes 1 11.143 1797.9 -1365.5
+ movie.sig$gross 1 9.280 1799.8 -1362.4
<none> 1809.1 -1348.9
+ movie.sig$facenumber_in_poster 1 0.796 1808.3 -1348.2
+ movie.sig$cast_total_facebook_likes 1 0.098 1809.0 -1347.1
+ movie.sig$director_facebook_likes 1 0.063 1809.0 -1347.0
Step: AIC=-1407.03
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year) + movie.sig$num_critic_for_reviews +
movie.sig$num_user_for_reviews + movie.sig$budget
Df Sum of Sq RSS AIC
+ movie.sig$duration 1 57.072 1716.2 -1503.3
+ movie.sig$num_voted_users:movie.sig$num_user_for_reviews 1 21.344 1751.9 -1441.4
+ movie.sig$movie_facebook_likes 1 13.161 1760.1 -1427.4
<none> 1773.2 -1407.0
+ movie.sig$cast_total_facebook_likes 1 0.931 1772.3 -1406.6
+ movie.sig$facenumber_in_poster 1 0.782 1772.5 -1406.3
+ movie.sig$director_facebook_likes 1 0.040 1773.2 -1405.1
+ movie.sig$gross 1 0.019 1773.2 -1405.1
Step: AIC=-1503.34
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year) + movie.sig$num_critic_for_reviews +
movie.sig$num_user_for_reviews + movie.sig$budget + movie.sig$duration
Df Sum of Sq RSS AIC
+ movie.sig$num_voted_users:movie.sig$duration 1 52.699 1663.5 -1595.1
+ movie.sig$num_voted_users:movie.sig$num_user_for_reviews 1 17.848 1698.3 -1532.8
+ movie.sig$movie_facebook_likes 1 15.923 1700.2 -1529.3
+ movie.sig$facenumber_in_poster 1 1.876 1714.3 -1504.6
<none> 1716.2 -1503.3
+ movie.sig$cast_total_facebook_likes 1 0.602 1715.6 -1502.4
+ movie.sig$director_facebook_likes 1 0.149 1716.0 -1501.6
+ movie.sig$gross 1 0.052 1716.1 -1501.4
Step: AIC=-1595.06
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year) + movie.sig$num_critic_for_reviews +
movie.sig$num_user_for_reviews + movie.sig$budget + movie.sig$duration +
movie.sig$num_voted_users:movie.sig$duration
Df Sum of Sq RSS AIC
+ movie.sig$movie_facebook_likes 1 8.4618 1655.0 -1608.4
+ movie.sig$facenumber_in_poster 1 2.3669 1661.1 -1597.3
+ movie.sig$num_voted_users:movie.sig$num_user_for_reviews 1 1.7546 1661.7 -1596.2
<none> 1663.5 -1595.1
+ movie.sig$cast_total_facebook_likes 1 0.4088 1663.1 -1593.8
+ movie.sig$gross 1 0.0808 1663.4 -1593.2
+ movie.sig$director_facebook_likes 1 0.0178 1663.5 -1593.1
Step: AIC=-1608.38
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year) + movie.sig$num_critic_for_reviews +
movie.sig$num_user_for_reviews + movie.sig$budget + movie.sig$duration +
movie.sig$movie_facebook_likes + movie.sig$num_voted_users:movie.sig$duration
Df Sum of Sq RSS AIC
+ movie.sig$facenumber_in_poster 1 2.34650 1652.7 -1610.6
+ movie.sig$num_voted_users:movie.sig$num_user_for_reviews 1 1.47268 1653.5 -1609.1
<none> 1655.0 -1608.4
+ movie.sig$cast_total_facebook_likes 1 0.50846 1654.5 -1607.3
+ movie.sig$gross 1 0.14883 1654.9 -1606.7
+ movie.sig$director_facebook_likes 1 0.01395 1655.0 -1606.4
Step: AIC=-1610.64
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year) + movie.sig$num_critic_for_reviews +
movie.sig$num_user_for_reviews + movie.sig$budget + movie.sig$duration +
movie.sig$movie_facebook_likes + movie.sig$facenumber_in_poster +
movie.sig$num_voted_users:movie.sig$duration
Df Sum of Sq RSS AIC
+ movie.sig$num_voted_users:movie.sig$num_user_for_reviews 1 1.50433 1651.2 -1611.4
<none> 1652.7 -1610.6
+ movie.sig$cast_total_facebook_likes 1 0.69695 1652.0 -1609.9
+ movie.sig$gross 1 0.15338 1652.5 -1608.9
+ movie.sig$director_facebook_likes 1 0.00468 1652.7 -1608.7
Step: AIC=-1611.38
movie.sig$imdb_score ~ movie.sig$num_voted_users + factor(movie.sig$genres) +
factor(movie.sig$title_year) + movie.sig$num_critic_for_reviews +
movie.sig$num_user_for_reviews + movie.sig$budget + movie.sig$duration +
movie.sig$movie_facebook_likes + movie.sig$facenumber_in_poster +
movie.sig$num_voted_users:movie.sig$duration + movie.sig$num_voted_users:movie.sig$num_user_for_reviews
Df Sum of Sq RSS AIC
<none> 1651.2 -1611.4
+ movie.sig$cast_total_facebook_likes 1 0.65198 1650.5 -1610.6
+ movie.sig$gross 1 0.29295 1650.9 -1609.9
+ movie.sig$director_facebook_likes 1 0.01835 1651.2 -1609.4
Call:
lm(formula = movie.sig$imdb_score ~ movie.sig$num_voted_users +
factor(movie.sig$genres) + factor(movie.sig$title_year) +
movie.sig$num_critic_for_reviews + movie.sig$num_user_for_reviews +
movie.sig$budget + movie.sig$duration + movie.sig$movie_facebook_likes +
movie.sig$facenumber_in_poster + movie.sig$num_voted_users:movie.sig$duration +
movie.sig$num_voted_users:movie.sig$num_user_for_reviews)
Coefficients:
(Intercept)
3.029e+00
movie.sig$num_voted_users
6.989e-06
factor(movie.sig$genres)Adventure
3.696e-01
factor(movie.sig$genres)Animation
7.657e-01
factor(movie.sig$genres)Biography
6.937e-01
factor(movie.sig$genres)Comedy
1.882e-01
factor(movie.sig$genres)Crime
4.801e-01
factor(movie.sig$genres)Documentary
1.118e+00
factor(movie.sig$genres)Drama
5.559e-01
factor(movie.sig$genres)Family
2.468e-01
factor(movie.sig$genres)Fantasy
-1.968e-01
factor(movie.sig$genres)Horror
-3.729e-01
factor(movie.sig$genres)Musical
2.400e-02
factor(movie.sig$genres)Mystery
1.638e-01
factor(movie.sig$genres)Romance
7.586e-01
factor(movie.sig$genres)Sci-Fi
1.779e-01
factor(movie.sig$genres)Thriller
-3.260e-01
factor(movie.sig$genres)Western
-3.489e-02
factor(movie.sig$title_year)1929
2.064e+00
factor(movie.sig$title_year)1933
3.241e+00
factor(movie.sig$title_year)1935
3.408e+00
factor(movie.sig$title_year)1936
3.379e+00
factor(movie.sig$title_year)1937
1.891e+00
factor(movie.sig$title_year)1939
1.733e+00
factor(movie.sig$title_year)1940
1.952e+00
factor(movie.sig$title_year)1946
1.893e+00
factor(movie.sig$title_year)1947
2.805e+00
factor(movie.sig$title_year)1948
2.380e+00
factor(movie.sig$title_year)1950
1.986e+00
factor(movie.sig$title_year)1952
1.205e+00
factor(movie.sig$title_year)1953
1.828e+00
factor(movie.sig$title_year)1954
2.688e+00
factor(movie.sig$title_year)1959
2.603e+00
factor(movie.sig$title_year)1960
2.445e+00
factor(movie.sig$title_year)1961
1.832e+00
factor(movie.sig$title_year)1963
2.180e+00
factor(movie.sig$title_year)1964
2.153e+00
factor(movie.sig$title_year)1965
1.364e+00
factor(movie.sig$title_year)1969
2.185e+00
factor(movie.sig$title_year)1970
1.367e+00
factor(movie.sig$title_year)1971
1.411e+00
factor(movie.sig$title_year)1972
1.569e+00
factor(movie.sig$title_year)1973
2.373e+00
factor(movie.sig$title_year)1974
2.462e+00
factor(movie.sig$title_year)1975
1.028e+00
factor(movie.sig$title_year)1976
1.685e+00
factor(movie.sig$title_year)1977
1.772e+00
factor(movie.sig$title_year)1978
1.989e+00
factor(movie.sig$title_year)1979
1.619e+00
factor(movie.sig$title_year)1980
1.738e+00
factor(movie.sig$title_year)1981
1.521e+00
factor(movie.sig$title_year)1982
1.667e+00
factor(movie.sig$title_year)1983
1.866e+00
factor(movie.sig$title_year)1984
1.763e+00
factor(movie.sig$title_year)1985
1.773e+00
factor(movie.sig$title_year)1986
1.657e+00
factor(movie.sig$title_year)1987
1.407e+00
factor(movie.sig$title_year)1988
1.794e+00
factor(movie.sig$title_year)1989
1.781e+00
factor(movie.sig$title_year)1990
1.680e+00
factor(movie.sig$title_year)1991
1.581e+00
factor(movie.sig$title_year)1992
1.891e+00
factor(movie.sig$title_year)1993
1.592e+00
factor(movie.sig$title_year)1994
1.592e+00
factor(movie.sig$title_year)1995
1.566e+00
factor(movie.sig$title_year)1996
1.582e+00
factor(movie.sig$title_year)1997
1.484e+00
factor(movie.sig$title_year)1998
1.552e+00
factor(movie.sig$title_year)1999
1.425e+00
factor(movie.sig$title_year)2000
1.252e+00
factor(movie.sig$title_year)2001
1.334e+00
factor(movie.sig$title_year)2002
1.285e+00
factor(movie.sig$title_year)2003
1.165e+00
factor(movie.sig$title_year)2004
1.262e+00
factor(movie.sig$title_year)2005
1.267e+00
factor(movie.sig$title_year)2006
1.151e+00
factor(movie.sig$title_year)2007
1.154e+00
factor(movie.sig$title_year)2008
9.643e-01
factor(movie.sig$title_year)2009
9.625e-01
factor(movie.sig$title_year)2010
8.890e-01
factor(movie.sig$title_year)2011
7.266e-01
factor(movie.sig$title_year)2012
8.457e-01
factor(movie.sig$title_year)2013
8.659e-01
factor(movie.sig$title_year)2014
9.827e-01
factor(movie.sig$title_year)2015
1.080e+00
factor(movie.sig$title_year)2016
1.519e+00
movie.sig$num_critic_for_reviews
3.528e-03
movie.sig$num_user_for_reviews
-4.818e-04
movie.sig$budget
-4.699e-09
movie.sig$duration
1.184e-02
movie.sig$movie_facebook_likes
-4.026e-06
movie.sig$facenumber_in_poster
-1.393e-02
movie.sig$num_voted_users:movie.sig$duration
-2.586e-08
movie.sig$num_voted_users:movie.sig$num_user_for_reviews
-1.600e-10
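The AIC values in the step trace above can be checked by hand. For linear models, `step()` relies on `extractAIC()`, which computes n*log(RSS/n) + 2*edf, where edf is the number of estimated coefficients. With n = 3005 observations (2907 residual degrees of freedom plus 98 coefficients in full3) and RSS = 3329.1 for the intercept-only model, this gives roughly 309.8, in line with the reported `Start: AIC=309.81`. The sketch below confirms the formula on the built-in `mtcars` data and then repeats the arithmetic for the null model; the variable names are illustrative only.

```r
# Check the AIC definition used by step() on a small built-in dataset.
fit <- lm(mpg ~ wt + hp, data = mtcars)

n   <- nrow(mtcars)
rss <- sum(residuals(fit)^2)
edf <- n - df.residual(fit)            # number of estimated coefficients

aic_manual <- n * log(rss / n) + 2 * edf
aic_step   <- extractAIC(fit)[2]       # second element is the AIC value

# Same arithmetic for the intercept-only movie model reported in the trace:
# n = 3005, RSS = 3329.1, one coefficient.
aic_null_movies <- 3005 * log(3329.1 / 3005) + 2 * 1

print(c(manual = aic_manual, extractAIC = aic_step, null_movies = aic_null_movies))
```

Note that this scale differs from `AIC()` by an additive constant, so only differences between models fitted to the same data are meaningful.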
To make the results easier to interpret, I will start with full3 (the additive model with interaction terms). After checking the residuals, we will decide whether to add higher-order terms.
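One possible way to carry out that residual check and the higher-order comparison is sketched below. It uses the built-in `mtcars` data as a stand-in, so the variables and the choice of a quadratic term in `hp` are assumptions for illustration, not results for the movie data.

```r
# Stand-in example on mtcars: additive model vs. the same model with a
# second-order polynomial in one predictor.
fit_add  <- lm(mpg ~ wt + hp, data = mtcars)
fit_poly <- lm(mpg ~ wt + poly(hp, 2), data = mtcars)

# Residuals vs. fitted values: a curved pattern hints at a missing
# higher-order term, a funnel shape at non-constant variance.
plot(fitted(fit_add), residuals(fit_add),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Nested-model F test: does the quadratic term improve the fit?
cmp     <- anova(fit_add, fit_poly)
p_value <- cmp[["Pr(>F)"]][2]
print(cmp)
```

If the plot shows no clear pattern and the F test is not significant, the additive model is usually kept because it is simpler to interpret.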
Split data into Test and Train:
indx = sample(1:nrow(movie.sig), as.integer(0.8*nrow(movie.sig)))
indx # randomized row indices; 80% of the rows go into the training set
[1] 659 2170 1667 405 2965 1291 376 1226 2663 1670 1580 804 2405 1090 1313 1117 2367
[18] 1988 2078 567 2487 1191 2758 2510 2102 1761 853 2825 2873 1512 10 2138 2209 2539
[35] 1060 2433 699 229 1949 272 1227 1701 1960 979 2442 1697 452 2313 95 1774 1641
[52] 862 2956 1751 28 364 2511 2888 2259 398 436 2568 487 1297 2536 1982 560 620
[69] 2206 760 89 887 1602 1420 1835 349 2518 2762 393 785 845 1303 2672 1176 566
[86] 1834 798 604 1662 2340 617 1172 2352 744 2055 2813 2488 1377 2296 666 1901 1423
[103] 2696 739 2263 2386 2755 1025 309 783 1301 2139 1740 2877 1260 1807 1055 1288 1832
[120] 286 2838 1729 438 1163 543 1410 445 1542 1624 2337 1756 1374 761 1171 989 1334
[137] 880 1700 368 2531 1978 365 2038 1008 1247 1804 1142 60 1676 2981 21 188 2430
[154] 1907 306 2048 1336 2570 1847 1052 2071 1416 2708 2147 596 2217 2863 2052 2707 416
[171] 2953 1716 2330 2885 1565 990 2191 2621 2087 2748 702 1068 1307 677 149 1582 2424
[188] 284 2128 111 708 410 1802 2328 1264 2368 486 1192 2791 2704 2690 1603 162 1147
[205] 1003 2347 924 2280 2042 969 2601 2478 2237 848 2711 765 151 1391 623 2089 2972
[222] 3002 2363 1850 1646 2062 2401 2381 2577 555 1530 80 1051 2512 1396 524 2481 2607
[239] 44 253 2233 1271 2731 970 1121 275 47 1577 766 653 865 534 1567 1845 2162
[256] 2754 2409 110 1876 578 1219 705 2169 358 1704 1440 2121 948 2594 1169 1578 2969
[273] 1920 556 256 1820 1780 2499 1977 2604 2261 1446 594 895 2535 1839 1101 2784 2197
[290] 719 784 2149 285 1811 1959 1449 2874 2823 1837 2820 1218 600 627 773 918 2827
[307] 266 703 411 1000 354 2219 1775 1771 2173 1046 1893 2426 2653 2561 2955 1447 1588
[324] 389 342 2190 907 1619 827 1462 1139 938 2890 2793 22 1415 1830 2032 314 1590
[341] 2520 1548 2495 535 2897 2861 1070 57 2546 2817 1224 291 2023 2100 2037 1529 1367
[358] 1429 844 2802 2216 2798 2119 2172 1722 2642 1611 304 1198 2344 488 1280 1075 1628
[375] 922 1753 2202 2364 2112 1113 161 750 1206 2447 2402 1930 138 2733 2960 1433 2676
[392] 1004 1781 2751 2919 2182 2248 1728 326 545 549 2741 1465 1803 163 222 908 1326
[409] 953 2683 2174 952 926 1997 480 35 146 241 2513 2602 1044 1995 755 770 2322
[426] 344 2544 2469 2936 2317 1968 1550 3004 166 399 1471 735 105 1712 1287 1164 874
[443] 1498 2989 350 1263 1235 447 1357 2129 1325 328 2801 489 2562 943 1386 114 1749
[460] 2297 1422 2515 2992 2085 248 1317 942 465 2323 939 2797 1524 575 1559 98 790
[477] 2448 1511 754 2537 1095 944 1958 2003 586 1849 656 763 428 2869 837 1772 1355
[494] 1258 312 2942 1639 1001 1598 2400 2470 1392 1993 757 2629 1880 2334 2590 574 2605
[511] 959 1460 2534 1251 2243 2521 2267 858 1319 2650 71 1231 1922 2483 1261 170 689
[528] 97 609 1189 891 2691 2998 292 1711 713 2720 1966 1124 799 2336 961 2258 2555
[545] 412 1050 164 723 2412 2207 223 1399 830 748 1692 599 1689 1009 6 476 1647
[562] 571 821 115 2810 2779 1089 192 1678 819 1228 1917 2171 1174 1131 1562 1305 1842
[579] 1360 2819 243 2506 99 688 1007 415 2356 1985 48 1207 2832 1972 1468 551 1925
[596] 1856 1296 2151 1165 1413 152 1127 2293 1061 11 2660 700 1407 346 2840 2818 1800
[613] 2656 2983 353 54 2109 507 135 520 2178 1340 2614 541 1944 2370 2086 1311 323
[630] 2586 1989 839 741 2451 1300 2988 2450 167 307 1594 234 2427 294 1715 996 780
[647] 2790 2771 2845 363 2576 2723 2058 1637 130 2241 1505 559 724 94 437 1765 36
[664] 128 2208 1435 1030 1768 1916 131 852 1157 2468 2647 517 1324 2500 383 1119 2311
[681] 747 2908 84 396 897 2348 2285 402 132 2316 254 2064 228 775 693 1242 2338
[698] 2011 1881 2201 1607 995 1600 1424 794 2196 2808 481 1290 2185 993 2327 55 651
[715] 2581 1861 2895 446 2856 442 1969 1350 449 2753 2193 1444 1136 2004 288 2889 988
[732] 441 823 2725 2362 841 2096 250 1408 2893 1031 2145 718 1604 2714 1153 875 140
[749] 665 2072 1308 2759 56 1076 2385 951 889 1743 2471 2016 991 1595 1318 925 134
[766] 1801 1011 262 2148 1947 1195 189 1540 355 1259 2114 2126 863 255 1889 2945 1372
[783] 2600 1194 2855 2110 14 1541 1853 2995 973 2766 2523 2554 1962 1887 986 2836 1909
[800] 1638 2306 333 2242 443 1851 1793 1976 1418 904 178 2422 1679 227 431 671 1888
[817] 1980 329 2019 1719 2270 176 2371 2770 1869 1721 381 1973 2986 1528 2049 800 339
[834] 1122 1797 1953 334 602 901 1210 2099 1762 1770 2291 1145 2615 786 2592 2870 1034
[851] 1021 1266 429 635 606 2574 817 2962 2319 2204 1927 2898 1175 2094 1116 2346 1245
[868] 403 191 2077 2508 928 2843 133 2211 1937 1898 2853 927 2366 2940 1200 2375 1279
[885] 1390 1152 2772 454 2459 1924 1309 1019 1799 676 1010 2302 427 1501 19 1532 2070
[902] 881 2350 1516 2358 2274 1576 565 2954 2862 2372 978 1015 1634 2924 2599 287 90
[919] 117 1621 1244 2060 752 2039 2103 2979 2106 2009 2703 210 303 193 1064 498 263
[936] 2452 2332 998 2018 2833 205 1792 467 2828 1337 1610 2687 2001 20 435 1680 2796
[953] 1763 1987 179 1272 268 1815 9 1255 2737 1527 818 2341 2699 2514 826 206 1979
[970] 1473 709 648 1986 1964 1082 41 2957 612 518 2894 1857 1597 2497 2941 2967 849
[987] 749 2875 316 824 1649 1376 1065 680 424 797 2034 1661 2864 1488
[ reached getOption("max.print") -- omitted 1404 entries ]
movie_train = movie.sig[indx,]
movie_test = movie.sig[-indx,]
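Two practical points about this split. First, `sample()` draws a different set of rows on every run unless a seed is fixed, so the numbers below are not reproducible as written. Second, with `factor(title_year)` in the model, a year that has only one or two movies can end up entirely in the test set; the training fit then has no coefficient for it (1939 and 1947, for instance, appear in the full-data fit but not in the training fit below), and `predict()` stops with a "new levels" error for those rows. The sketch below shows one way to handle both on a small synthetic data frame; the data, the seed value, and the names `toy`, `unseen`, and `test_ok` are made up for illustration.

```r
set.seed(1)  # any fixed seed makes the split reproducible; the value is arbitrary

# Synthetic stand-in: four common years plus two years with a single row each.
toy <- data.frame(
  score = rnorm(50),
  year  = c(rep(2000:2003, each = 12), 1929, 1939)
)

idx   <- sample(seq_len(nrow(toy)), floor(0.8 * nrow(toy)))
train <- toy[idx, ]
test  <- toy[-idx, ]

# Years that occur in the test rows but not in the training rows.
unseen  <- setdiff(unique(test$year), unique(train$year))

# Simplest option: drop those test rows before predicting.
test_ok <- test[!(test$year %in% unseen), ]

fit  <- lm(score ~ factor(year), data = train)
pred <- predict(fit, newdata = test_ok)

print(unseen)
print(length(pred))
```

Dropping rows is only one choice; grouping rare years into decades before fitting would keep all test rows, at the cost of a coarser year effect.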
# lm.fit1: linear model with an interaction term, based on the step() result chosen for full3
# insignificant terms: director Facebook likes, cast total Facebook likes, face number in poster
# terms chosen by step(): voted, genre, year, critic, users, budget, duration, voted*duration
lm.fit1 <- lm(imdb_score ~ num_voted_users + factor(genres) + factor(title_year) +
                num_critic_for_reviews + num_user_for_reviews + budget + duration +
                num_voted_users * duration,
              data = movie_train)
summary(lm.fit1)
Call:
lm(formula = imdb_score ~ num_voted_users + factor(genres) +
factor(title_year) + num_critic_for_reviews + num_user_for_reviews +
budget + duration + num_voted_users * duration, data = movie_train)
Residuals:
Min 1Q Median 3Q Max
-5.0465 -0.3589 0.0762 0.4920 2.1491
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.037e+00 7.704e-01 3.942 8.31e-05 ***
num_voted_users 7.102e-06 5.101e-07 13.922 < 2e-16 ***
factor(genres)Adventure 3.584e-01 6.110e-02 5.866 5.10e-09 ***
factor(genres)Animation 8.063e-01 1.430e-01 5.640 1.91e-08 ***
factor(genres)Biography 7.175e-01 8.436e-02 8.505 < 2e-16 ***
factor(genres)Comedy 1.782e-01 4.823e-02 3.695 0.000225 ***
factor(genres)Crime 4.623e-01 7.240e-02 6.385 2.06e-10 ***
factor(genres)Documentary 1.192e+00 1.711e-01 6.969 4.15e-12 ***
factor(genres)Drama 5.826e-01 5.474e-02 10.641 < 2e-16 ***
factor(genres)Family 4.569e-01 7.905e-01 0.578 0.563358
factor(genres)Fantasy -1.682e-01 1.566e-01 -1.075 0.282707
factor(genres)Horror -3.553e-01 8.848e-02 -4.016 6.11e-05 ***
factor(genres)Musical 1.984e+00 1.078e+00 1.841 0.065706 .
factor(genres)Mystery 2.148e-01 2.084e-01 1.031 0.302732
factor(genres)Romance 7.297e-01 5.411e-01 1.348 0.177668
factor(genres)Sci-Fi 2.432e-01 3.168e-01 0.768 0.442712
factor(genres)Thriller -2.587e-01 7.666e-01 -0.337 0.735829
factor(genres)Western 1.050e+00 8.023e-01 1.309 0.190808
factor(title_year)1929 NA NA NA NA
factor(title_year)1933 3.236e+00 1.078e+00 3.003 0.002700 **
factor(title_year)1935 3.401e+00 1.078e+00 3.156 0.001619 **
factor(title_year)1936 3.428e+00 1.078e+00 3.180 0.001492 **
factor(title_year)1937 1.889e+00 1.086e+00 1.739 0.082150 .
factor(title_year)1940 1.950e+00 1.086e+00 1.795 0.072765 .
factor(title_year)1946 1.539e+00 1.078e+00 1.428 0.153427
factor(title_year)1950 1.967e+00 1.079e+00 1.822 0.068566 .
factor(title_year)1952 1.191e+00 1.078e+00 1.105 0.269428
factor(title_year)1953 1.824e+00 9.338e-01 1.954 0.050874 .
factor(title_year)1954 2.739e+00 1.075e+00 2.547 0.010917 *
factor(title_year)1959 2.640e+00 1.078e+00 2.449 0.014412 *
factor(title_year)1960 2.493e+00 1.082e+00 2.305 0.021247 *
factor(title_year)1963 2.229e+00 1.081e+00 2.062 0.039350 *
factor(title_year)1964 2.190e+00 9.346e-01 2.343 0.019226 *
factor(title_year)1965 1.329e+00 8.560e-01 1.553 0.120667
factor(title_year)1969 2.220e+00 1.080e+00 2.056 0.039888 *
factor(title_year)1970 1.316e+00 8.836e-01 1.490 0.136445
factor(title_year)1971 1.418e+00 9.322e-01 1.521 0.128306
factor(title_year)1972 1.673e+00 9.364e-01 1.787 0.074117 .
factor(title_year)1973 2.446e+00 8.522e-01 2.870 0.004146 **
factor(title_year)1974 2.756e+00 8.366e-01 3.294 0.001002 **
factor(title_year)1975 1.136e+00 9.367e-01 1.212 0.225516
factor(title_year)1976 2.013e+00 1.078e+00 1.868 0.061905 .
factor(title_year)1977 2.050e+00 8.811e-01 2.326 0.020096 *
factor(title_year)1978 2.314e+00 8.256e-01 2.802 0.005115 **
factor(title_year)1979 1.698e+00 8.841e-01 1.920 0.054935 .
factor(title_year)1980 1.472e+00 8.053e-01 1.828 0.067680 .
factor(title_year)1981 1.423e+00 8.158e-01 1.744 0.081207 .
factor(title_year)1982 1.592e+00 7.923e-01 2.009 0.044656 *
factor(title_year)1983 1.910e+00 8.043e-01 2.375 0.017612 *
factor(title_year)1984 1.797e+00 7.850e-01 2.289 0.022175 *
factor(title_year)1985 1.689e+00 8.039e-01 2.101 0.035748 *
factor(title_year)1986 1.648e+00 7.876e-01 2.093 0.036465 *
factor(title_year)1987 1.305e+00 7.823e-01 1.668 0.095423 .
factor(title_year)1988 1.849e+00 7.790e-01 2.373 0.017717 *
factor(title_year)1989 1.616e+00 7.839e-01 2.061 0.039392 *
factor(title_year)1990 1.556e+00 7.837e-01 1.985 0.047251 *
factor(title_year)1991 1.526e+00 7.785e-01 1.960 0.050114 .
factor(title_year)1992 1.917e+00 7.768e-01 2.468 0.013662 *
factor(title_year)1993 1.614e+00 7.771e-01 2.077 0.037949 *
factor(title_year)1994 1.557e+00 7.728e-01 2.014 0.044079 *
factor(title_year)1995 1.535e+00 7.707e-01 1.991 0.046547 *
factor(title_year)1996 1.549e+00 7.683e-01 2.016 0.043884 *
factor(title_year)1997 1.488e+00 7.679e-01 1.937 0.052836 .
factor(title_year)1998 1.630e+00 7.681e-01 2.122 0.033944 *
factor(title_year)1999 1.418e+00 7.670e-01 1.849 0.064597 .
factor(title_year)2000 1.270e+00 7.664e-01 1.657 0.097600 .
factor(title_year)2001 1.350e+00 7.661e-01 1.762 0.078185 .
factor(title_year)2002 1.285e+00 7.660e-01 1.678 0.093508 .
factor(title_year)2003 1.171e+00 7.672e-01 1.527 0.126960
factor(title_year)2004 1.264e+00 7.667e-01 1.648 0.099412 .
factor(title_year)2005 1.247e+00 7.667e-01 1.626 0.104074
factor(title_year)2006 1.240e+00 7.669e-01 1.618 0.105887
factor(title_year)2007 1.169e+00 7.671e-01 1.524 0.127765
factor(title_year)2008 9.658e-01 7.667e-01 1.260 0.207882
factor(title_year)2009 9.737e-01 7.670e-01 1.269 0.204396
factor(title_year)2010 8.310e-01 7.675e-01 1.083 0.279040
factor(title_year)2011 6.883e-01 7.680e-01 0.896 0.370233
factor(title_year)2012 8.258e-01 7.681e-01 1.075 0.282450
factor(title_year)2013 7.970e-01 7.685e-01 1.037 0.299830
factor(title_year)2014 8.698e-01 7.684e-01 1.132 0.257774
factor(title_year)2015 9.401e-01 7.689e-01 1.223 0.221592
factor(title_year)2016 1.498e+00 7.749e-01 1.933 0.053389 .
num_critic_for_reviews 3.375e-03 2.484e-04 13.588 < 2e-16 ***
num_user_for_reviews -5.647e-04 7.332e-05 -7.702 1.97e-14 ***
budget -4.442e-09 4.954e-10 -8.968 < 2e-16 ***
duration 1.180e-02 1.001e-03 11.796 < 2e-16 ***
num_voted_users:duration -2.931e-08 3.429e-09 -8.548 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7602 on 2318 degrees of freedom
Multiple R-squared: 0.4977, Adjusted R-squared: 0.4793
F-statistic: 27.02 on 85 and 2318 DF, p-value: < 2.2e-16
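The p-value of the overall F-test is very small (< 2.2e-16), so the model as a whole is significant. All numeric predictors and the num_voted_users:duration interaction are significant, although many individual genre and year levels are not. The adjusted R^2 is 0.4793 (0.4727 when year is treated as numeric), so after adjusting for the number of predictors the model explains about 48% of the variability in imdb_score.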
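As a quick check, the adjusted R^2 can be recomputed from the other numbers printed in the summary(lm.fit1) output. This is only an arithmetic sketch: the inputs are copied by hand from the output above, and the variable names are illustrative.

```r
# Values copied from the summary(lm.fit1) output
r2       <- 0.4977   # Multiple R-squared
df_resid <- 2318     # residual degrees of freedom
df_model <- 85       # numerator df of the F-statistic (estimated slopes)

# Number of observations used in the fit: slopes + intercept + residual df
n <- df_resid + df_model + 1

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (residual df)
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid
round(adj_r2, 4)   # 0.4793
```

The result agrees with the "Adjusted R-squared" value that R reports for this model.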
Run a partial F-test (referred to here as a lack-of-fit test) to see whether the predictors can be removed without hurting the fit:
# full4 =full3, but instead of on movie.sig, it's on training data
full4<-lm(imdb_score ~num_voted_users+num_critic_for_reviews+num_user_for_reviews+duration+facenumber_in_poster+gross+movie_facebook_likes+director_facebook_likes+cast_total_facebook_likes+budget+factor(title_year)+factor(genres)+duration*num_voted_users+num_voted_users*num_user_for_reviews+gross*budget,data=movie_train)
anova(full4,lm.fit1) # H0: the reduced model fits (coefficients of the dropped terms are all 0)
Analysis of Variance Table
Model 1: imdb_score ~ num_voted_users + num_critic_for_reviews + num_user_for_reviews +
duration + facenumber_in_poster + gross + movie_facebook_likes +
director_facebook_likes + cast_total_facebook_likes + budget +
factor(title_year) + factor(genres) + duration * num_voted_users +
num_voted_users * num_user_for_reviews + gross * budget
Model 2: imdb_score ~ num_voted_users + factor(genres) + factor(title_year) +
num_critic_for_reviews + num_user_for_reviews + budget +
duration + num_voted_users * duration
Res.Df RSS Df Sum of Sq F Pr(>F)
1 2311 1314.1
2 2318 1339.6 -7 -25.461 6.3964 1.792e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
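The p-value is very small (1.792e-07), so we reject the null hypothesis: the reduced model lm.fit1 fits significantly worse than full4, which means at least one of the dropped terms matters.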
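The F statistic in the table can be reproduced by hand, which makes clear what anova() is comparing. The sketch below copies the residual sums of squares and degrees of freedom from the table above; the variable names are illustrative, and small differences from the printed value come from rounding in the output.

```r
# Values copied from the anova(full4, lm.fit1) table
rss_full <- 1314.1   # RSS of the full model (full4)
df_full  <- 2311     # residual df of the full model
df_red   <- 2318     # residual df of the reduced model (lm.fit1)
sum_sq   <- 25.461   # increase in RSS caused by dropping the terms

# Partial F statistic: extra RSS per dropped parameter,
# divided by the residual variance estimate of the full model
df_diff <- df_red - df_full
f_stat  <- (sum_sq / df_diff) / (rss_full / df_full)

# Upper-tail probability under F(df_diff, df_full)
p_value <- pf(f_stat, df_diff, df_full, lower.tail = FALSE)

round(f_stat, 2)
p_value
```

A large F means the dropped terms explain more variation than would be expected by chance, which is why the reduced model is rejected here.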
Diagnostics:
plot(lm.fit1)
not plotting observations with leverage one:
57, 171, 223, 281, 470, 496, 614, 677, 1026, 1179, 1484, 1497, 1802, 2092, 2220
NaNs produced
# The residuals-vs-fitted plot suggests that a higher-order term may be needed. The normal Q-Q plot does not look good either.
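The "leverage one" warning typically appears when an observation is fitted exactly, for example when it is the only row in a factor level (such as a year with a single movie). The toy example below illustrates this under that assumption; the data and names (toy, g, fit) are made up for illustration and are not from the movie data.

```r
# Toy data: factor level "c" occurs exactly once
toy <- data.frame(
  g = factor(c("a", "a", "a", "b", "b", "b", "c")),
  x = c(1, 2, 3, 4, 5, 6, 7),
  y = c(2.1, 2.9, 4.2, 5.1, 5.8, 7.2, 9.0)
)

fit <- lm(y ~ x + g, data = toy)

# Leverage (diagonal of the hat matrix) for each observation
h <- hatvalues(fit)

# Rows that belong to a factor level with a single observation
singleton_levels <- names(which(table(toy$g) == 1))
singleton_rows   <- which(toy$g %in% singleton_levels)

singleton_rows              # row 7
round(h[singleton_rows], 6) # leverage of 1: the row is fitted exactly
```

For the movie data, table(movie_train$title_year) would show which years contain only one movie; those rows are likely the ones listed in the warning.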
library(car)
residualPlots(lm.fit1)
Test stat Pr(>|t|)
num_voted_users -5.241 0.000
factor(genres) NA NA
factor(title_year) NA NA
num_critic_for_reviews -9.033 0.000
num_user_for_reviews 1.940 0.052
budget 6.476 0.000
duration -3.411 0.001
Tukey test -14.089 0.000
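Most of the residual-vs-predictor plots show a curved trend (the curvature tests are significant for every numeric predictor except num_user_for_reviews, p = 0.052, and Tukey's test is significant as well), which indicates that the current model does not fit well. Higher-order terms should be included.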
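To illustrate why curvature in the residual plots points to a missing higher-order term, here is a minimal simulated sketch (not the movie data; all names are hypothetical). The response is generated with a quadratic term, a straight-line fit is compared with a quadratic fit, and the partial F-test shows that the quadratic term is needed.

```r
# Simulated data with a true quadratic relationship
set.seed(42)
x <- runif(200, 0, 10)
y <- 1 + 2 * x - 0.3 * x^2 + rnorm(200)

# Straight-line fit: its residuals would show a curved pattern against x
fit_lin <- lm(y ~ x)

# Quadratic fit using orthogonal polynomials
fit_quad <- lm(y ~ poly(x, 2))

# Partial F-test: H0 says the quadratic term adds nothing
cmp <- anova(fit_lin, fit_quad)
cmp

# Adjusted R^2 of both fits for comparison
c(linear    = summary(fit_lin)$adj.r.squared,
  quadratic = summary(fit_quad)$adj.r.squared)
```

Plotting residuals(fit_lin) against x would show the same kind of curved band that residualPlots() reveals for the movie model.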
Let’s add the interaction between num_voted_users and num_user_for_reviews to see whether the model improves:
lm.fit2<-lm(imdb_score~num_voted_users+factor(genres)+factor(title_year)+num_critic_for_reviews+num_user_for_reviews+budget+duration+num_voted_users*duration+num_voted_users*num_user_for_reviews,movie_train)
summary(lm.fit2)
Call:
lm(formula = imdb_score ~ num_voted_users + factor(genres) +
factor(title_year) + num_critic_for_reviews + num_user_for_reviews +
budget + duration + num_voted_users * duration + num_voted_users *
num_user_for_reviews, data = movie_train)
Residuals:
Min 1Q Median 3Q Max
-5.0598 -0.3547 0.0763 0.4926 2.1355
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.072e+00 7.707e-01 3.987 6.91e-05 ***
num_voted_users 7.046e-06 5.116e-07 13.772 < 2e-16 ***
factor(genres)Adventure 3.594e-01 6.110e-02 5.883 4.61e-09 ***
factor(genres)Animation 8.076e-01 1.429e-01 5.650 1.80e-08 ***
factor(genres)Biography 7.186e-01 8.435e-02 8.520 < 2e-16 ***
factor(genres)Comedy 1.786e-01 4.822e-02 3.704 0.000217 ***
factor(genres)Crime 4.643e-01 7.240e-02 6.412 1.73e-10 ***
factor(genres)Documentary 1.189e+00 1.711e-01 6.949 4.77e-12 ***
factor(genres)Drama 5.849e-01 5.476e-02 10.681 < 2e-16 ***
factor(genres)Family 4.335e-01 7.906e-01 0.548 0.583500
factor(genres)Fantasy -1.687e-01 1.565e-01 -1.078 0.281180
factor(genres)Horror -3.577e-01 8.848e-02 -4.043 5.44e-05 ***
factor(genres)Musical 1.982e+00 1.077e+00 1.840 0.065920 .
factor(genres)Mystery 2.083e-01 2.084e-01 1.000 0.317571
factor(genres)Romance 7.292e-01 5.410e-01 1.348 0.177893
factor(genres)Sci-Fi 2.359e-01 3.168e-01 0.745 0.456472
factor(genres)Thriller -2.697e-01 7.665e-01 -0.352 0.724964
factor(genres)Western 1.052e+00 8.021e-01 1.311 0.189953
factor(title_year)1929 NA NA NA NA
factor(title_year)1933 3.232e+00 1.077e+00 3.000 0.002730 **
factor(title_year)1935 3.394e+00 1.078e+00 3.150 0.001653 **
factor(title_year)1936 3.413e+00 1.078e+00 3.166 0.001563 **
factor(title_year)1937 1.877e+00 1.086e+00 1.728 0.084054 .
factor(title_year)1940 1.939e+00 1.086e+00 1.786 0.074252 .
factor(title_year)1946 1.548e+00 1.078e+00 1.437 0.150984
factor(title_year)1950 1.963e+00 1.079e+00 1.819 0.069014 .
factor(title_year)1952 1.202e+00 1.078e+00 1.115 0.264976
factor(title_year)1953 1.825e+00 9.336e-01 1.955 0.050734 .
factor(title_year)1954 2.728e+00 1.075e+00 2.537 0.011252 *
factor(title_year)1959 2.626e+00 1.078e+00 2.436 0.014934 *
factor(title_year)1960 2.471e+00 1.082e+00 2.285 0.022411 *
factor(title_year)1963 2.239e+00 1.081e+00 2.072 0.038412 *
factor(title_year)1964 2.188e+00 9.345e-01 2.341 0.019301 *
factor(title_year)1965 1.340e+00 8.558e-01 1.566 0.117514
factor(title_year)1969 2.202e+00 1.080e+00 2.040 0.041462 *
factor(title_year)1970 1.336e+00 8.836e-01 1.512 0.130782
factor(title_year)1971 1.424e+00 9.320e-01 1.528 0.126586
factor(title_year)1972 1.658e+00 9.363e-01 1.771 0.076656 .
factor(title_year)1973 2.431e+00 8.521e-01 2.853 0.004373 **
factor(title_year)1974 2.704e+00 8.373e-01 3.230 0.001255 **
factor(title_year)1975 1.087e+00 9.372e-01 1.159 0.246424
factor(title_year)1976 2.013e+00 1.078e+00 1.868 0.061943 .
factor(title_year)1977 2.040e+00 8.809e-01 2.316 0.020633 *
factor(title_year)1978 2.296e+00 8.255e-01 2.781 0.005458 **
factor(title_year)1979 1.653e+00 8.846e-01 1.869 0.061760 .
factor(title_year)1980 1.469e+00 8.051e-01 1.825 0.068121 .
factor(title_year)1981 1.418e+00 8.156e-01 1.738 0.082269 .
factor(title_year)1982 1.590e+00 7.921e-01 2.007 0.044839 *
factor(title_year)1983 1.893e+00 8.042e-01 2.353 0.018683 *
factor(title_year)1984 1.785e+00 7.849e-01 2.274 0.023034 *
factor(title_year)1985 1.682e+00 8.038e-01 2.092 0.036506 *
factor(title_year)1986 1.638e+00 7.875e-01 2.080 0.037670 *
factor(title_year)1987 1.297e+00 7.822e-01 1.658 0.097388 .
factor(title_year)1988 1.839e+00 7.789e-01 2.361 0.018305 *
factor(title_year)1989 1.607e+00 7.838e-01 2.050 0.040486 *
factor(title_year)1990 1.540e+00 7.836e-01 1.965 0.049548 *
factor(title_year)1991 1.515e+00 7.784e-01 1.946 0.051738 .
factor(title_year)1992 1.904e+00 7.767e-01 2.452 0.014298 *
factor(title_year)1993 1.605e+00 7.770e-01 2.066 0.038956 *
factor(title_year)1994 1.558e+00 7.727e-01 2.017 0.043823 *
factor(title_year)1995 1.524e+00 7.706e-01 1.978 0.048035 *
factor(title_year)1996 1.544e+00 7.682e-01 2.010 0.044589 *
factor(title_year)1997 1.478e+00 7.678e-01 1.925 0.054295 .
factor(title_year)1998 1.620e+00 7.680e-01 2.110 0.034970 *
factor(title_year)1999 1.410e+00 7.668e-01 1.839 0.066020 .
factor(title_year)2000 1.264e+00 7.663e-01 1.649 0.099210 .
factor(title_year)2001 1.338e+00 7.660e-01 1.746 0.080930 .
factor(title_year)2002 1.274e+00 7.659e-01 1.664 0.096265 .
factor(title_year)2003 1.165e+00 7.671e-01 1.518 0.129030
factor(title_year)2004 1.253e+00 7.666e-01 1.634 0.102303
factor(title_year)2005 1.242e+00 7.666e-01 1.620 0.105377
factor(title_year)2006 1.234e+00 7.667e-01 1.609 0.107788
factor(title_year)2007 1.166e+00 7.670e-01 1.521 0.128440
factor(title_year)2008 9.714e-01 7.666e-01 1.267 0.205177
factor(title_year)2009 9.780e-01 7.669e-01 1.275 0.202331
factor(title_year)2010 8.389e-01 7.673e-01 1.093 0.274407
factor(title_year)2011 6.963e-01 7.679e-01 0.907 0.364652
factor(title_year)2012 8.362e-01 7.680e-01 1.089 0.276398
factor(title_year)2013 8.064e-01 7.684e-01 1.049 0.294073
factor(title_year)2014 8.786e-01 7.683e-01 1.144 0.252931
factor(title_year)2015 9.514e-01 7.688e-01 1.238 0.216023
factor(title_year)2016 1.511e+00 7.748e-01 1.950 0.051284 .
num_critic_for_reviews 3.246e-03 2.657e-04 12.215 < 2e-16 ***
num_user_for_reviews -5.010e-04 8.698e-05 -5.760 9.51e-09 ***
budget -4.505e-09 4.974e-10 -9.057 < 2e-16 ***
duration 1.146e-02 1.031e-03 11.117 < 2e-16 ***
num_voted_users:duration -2.746e-08 3.690e-09 -7.442 1.39e-13 ***
num_voted_users:num_user_for_reviews -1.463e-10 1.075e-10 -1.361 0.173692
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7601 on 2317 degrees of freedom
Multiple R-squared: 0.4981, Adjusted R-squared: 0.4795
F-statistic: 26.74 on 86 and 2317 DF, p-value: < 2.2e-16
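The interaction between num_voted_users and num_user_for_reviews is not significant (p = 0.174) and the adjusted R^2 barely changes (0.4795 vs. 0.4793), so adding it does not help.
Next, fit a model based on full4 with the insignificant terms dropped, then run the partial F-test against full4.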
full4<-lm(imdb_score ~num_voted_users+num_critic_for_reviews+num_user_for_reviews+duration+facenumber_in_poster+gross+movie_facebook_likes+director_facebook_likes+cast_total_facebook_likes+budget+factor(title_year)+factor(genres)+duration*num_voted_users+num_voted_users*num_user_for_reviews+gross*budget,data=movie_train)
summary(full4)
Call:
lm(formula = imdb_score ~ num_voted_users + num_critic_for_reviews +
num_user_for_reviews + duration + facenumber_in_poster +
gross + movie_facebook_likes + director_facebook_likes +
cast_total_facebook_likes + budget + factor(title_year) +
factor(genres) + duration * num_voted_users + num_voted_users *
num_user_for_reviews + gross * budget, data = movie_train)
Residuals:
Min 1Q Median 3Q Max
-4.6884 -0.3642 0.0753 0.4807 2.1373
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.051e+00 7.648e-01 3.990 6.82e-05 ***
num_voted_users 7.679e-06 5.428e-07 14.147 < 2e-16 ***
num_critic_for_reviews 3.689e-03 2.985e-04 12.360 < 2e-16 ***
num_user_for_reviews -5.478e-04 8.766e-05 -6.249 4.89e-10 ***
duration 1.216e-02 1.038e-03 11.715 < 2e-16 ***
facenumber_in_poster -1.642e-02 8.740e-03 -1.878 0.060471 .
gross -1.266e-09 4.926e-10 -2.570 0.010220 *
movie_facebook_likes -5.951e-06 1.286e-06 -4.626 3.93e-06 ***
director_facebook_likes 4.424e-07 5.127e-06 0.086 0.931237
cast_total_facebook_likes 7.459e-07 7.508e-07 0.994 0.320546
budget -6.418e-09 6.701e-10 -9.577 < 2e-16 ***
factor(title_year)1929 2.056e+00 1.071e+00 1.920 0.054952 .
factor(title_year)1933 3.199e+00 1.069e+00 2.992 0.002799 **
factor(title_year)1935 3.368e+00 1.069e+00 3.150 0.001652 **
factor(title_year)1936 3.280e+00 1.070e+00 3.066 0.002195 **
factor(title_year)1937 1.965e+00 1.079e+00 1.820 0.068816 .
factor(title_year)1940 1.918e+00 1.078e+00 1.780 0.075218 .
factor(title_year)1946 1.517e+00 1.069e+00 1.419 0.156094
factor(title_year)1950 1.949e+00 1.071e+00 1.820 0.068865 .
factor(title_year)1952 1.175e+00 1.070e+00 1.098 0.272318
factor(title_year)1953 1.794e+00 9.264e-01 1.937 0.052928 .
factor(title_year)1954 2.658e+00 1.067e+00 2.491 0.012795 *
factor(title_year)1959 2.556e+00 1.070e+00 2.389 0.016976 *
factor(title_year)1960 2.401e+00 1.076e+00 2.232 0.025684 *
factor(title_year)1963 2.173e+00 1.073e+00 2.026 0.042897 *
factor(title_year)1964 2.184e+00 9.277e-01 2.355 0.018622 *
factor(title_year)1965 1.396e+00 8.495e-01 1.644 0.100371
factor(title_year)1969 2.214e+00 1.072e+00 2.066 0.038978 *
factor(title_year)1970 1.314e+00 8.770e-01 1.499 0.134080
factor(title_year)1971 1.409e+00 9.248e-01 1.524 0.127699
factor(title_year)1972 1.725e+00 9.303e-01 1.855 0.063759 .
factor(title_year)1973 2.453e+00 8.467e-01 2.897 0.003802 **
factor(title_year)1974 2.673e+00 8.314e-01 3.216 0.001320 **
factor(title_year)1975 1.142e+00 9.313e-01 1.227 0.220071
factor(title_year)1976 1.957e+00 1.069e+00 1.830 0.067366 .
factor(title_year)1977 2.065e+00 8.759e-01 2.358 0.018454 *
factor(title_year)1978 2.308e+00 8.196e-01 2.816 0.004908 **
factor(title_year)1979 1.635e+00 8.786e-01 1.861 0.062853 .
factor(title_year)1980 1.466e+00 7.991e-01 1.834 0.066721 .
factor(title_year)1981 1.385e+00 8.095e-01 1.711 0.087142 .
factor(title_year)1982 1.583e+00 7.862e-01 2.013 0.044187 *
factor(title_year)1983 1.887e+00 7.982e-01 2.364 0.018138 *
factor(title_year)1984 1.790e+00 7.791e-01 2.297 0.021681 *
factor(title_year)1985 1.694e+00 7.977e-01 2.123 0.033852 *
factor(title_year)1986 1.655e+00 7.816e-01 2.118 0.034314 *
factor(title_year)1987 1.301e+00 7.763e-01 1.676 0.093897 .
factor(title_year)1988 1.837e+00 7.729e-01 2.376 0.017579 *
factor(title_year)1989 1.596e+00 7.778e-01 2.052 0.040238 *
factor(title_year)1990 1.577e+00 7.779e-01 2.027 0.042768 *
factor(title_year)1991 1.527e+00 7.725e-01 1.977 0.048195 *
factor(title_year)1992 1.912e+00 7.710e-01 2.480 0.013223 *
factor(title_year)1993 1.612e+00 7.710e-01 2.091 0.036609 *
factor(title_year)1994 1.599e+00 7.669e-01 2.085 0.037201 *
factor(title_year)1995 1.529e+00 7.648e-01 1.999 0.045701 *
factor(title_year)1996 1.569e+00 7.623e-01 2.058 0.039672 *
factor(title_year)1997 1.480e+00 7.620e-01 1.942 0.052255 .
factor(title_year)1998 1.633e+00 7.623e-01 2.142 0.032319 *
factor(title_year)1999 1.418e+00 7.612e-01 1.863 0.062568 .
factor(title_year)2000 1.258e+00 7.606e-01 1.654 0.098309 .
factor(title_year)2001 1.341e+00 7.604e-01 1.764 0.077826 .
factor(title_year)2002 1.281e+00 7.603e-01 1.684 0.092220 .
factor(title_year)2003 1.163e+00 7.614e-01 1.527 0.126857
factor(title_year)2004 1.240e+00 7.611e-01 1.630 0.103304
factor(title_year)2005 1.218e+00 7.611e-01 1.600 0.109704
factor(title_year)2006 1.200e+00 7.612e-01 1.576 0.115178
factor(title_year)2007 1.104e+00 7.615e-01 1.450 0.147057
factor(title_year)2008 9.250e-01 7.612e-01 1.215 0.224401
factor(title_year)2009 9.374e-01 7.615e-01 1.231 0.218445
factor(title_year)2010 8.396e-01 7.617e-01 1.102 0.270488
factor(title_year)2011 7.271e-01 7.623e-01 0.954 0.340290
factor(title_year)2012 8.862e-01 7.623e-01 1.163 0.245126
factor(title_year)2013 9.064e-01 7.628e-01 1.188 0.234846
factor(title_year)2014 9.769e-01 7.626e-01 1.281 0.200323
factor(title_year)2015 1.078e+00 7.632e-01 1.413 0.157913
factor(title_year)2016 1.663e+00 7.692e-01 2.162 0.030700 *
factor(genres)Adventure 3.665e-01 6.103e-02 6.006 2.20e-09 ***
factor(genres)Animation 8.150e-01 1.428e-01 5.707 1.30e-08 ***
factor(genres)Biography 6.928e-01 8.420e-02 8.228 3.15e-16 ***
factor(genres)Comedy 1.819e-01 4.877e-02 3.729 0.000196 ***
factor(genres)Crime 4.294e-01 7.224e-02 5.945 3.19e-09 ***
factor(genres)Documentary 1.159e+00 1.711e-01 6.775 1.58e-11 ***
factor(genres)Drama 5.647e-01 5.481e-02 10.304 < 2e-16 ***
factor(genres)Family 8.971e-01 8.030e-01 1.117 0.264025
factor(genres)Fantasy -1.940e-01 1.555e-01 -1.247 0.212397
factor(genres)Horror -4.003e-01 8.823e-02 -4.537 6.01e-06 ***
factor(genres)Musical NA NA NA NA
factor(genres)Mystery 1.626e-01 2.072e-01 0.785 0.432642
factor(genres)Romance 7.590e-01 5.369e-01 1.414 0.157606
factor(genres)Sci-Fi 1.901e-01 3.146e-01 0.604 0.545649
factor(genres)Thriller -3.669e-01 7.608e-01 -0.482 0.629664
factor(genres)Western 9.963e-01 7.996e-01 1.246 0.212913
num_voted_users:duration -2.923e-08 3.773e-09 -7.748 1.39e-14 ***
num_voted_users:num_user_for_reviews -2.215e-10 1.098e-10 -2.016 0.043867 *
gross:budget 1.480e-17 3.385e-18 4.372 1.29e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7541 on 2311 degrees of freedom
Multiple R-squared: 0.5073, Adjusted R-squared: 0.4877
F-statistic: 25.86 on 92 and 2311 DF, p-value: < 2.2e-16
lm.fit3<-lm(imdb_score ~num_voted_users+num_critic_for_reviews+num_user_for_reviews+duration+gross+movie_facebook_likes+budget+factor(title_year)+factor(genres)+duration*num_voted_users+num_voted_users*num_user_for_reviews+gross*budget,data=movie_train)
summary(lm.fit3)
Call:
lm(formula = imdb_score ~ num_voted_users + num_critic_for_reviews +
num_user_for_reviews + duration + gross + movie_facebook_likes +
budget + factor(title_year) + factor(genres) + duration *
num_voted_users + num_voted_users * num_user_for_reviews +
gross * budget, data = movie_train)
Residuals:
Min 1Q Median 3Q Max
-4.6972 -0.3618 0.0785 0.4786 2.1764
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.041e+00 7.649e-01 3.975 7.24e-05 ***
num_voted_users 7.695e-06 5.421e-07 14.195 < 2e-16 ***
num_critic_for_reviews 3.754e-03 2.959e-04 12.683 < 2e-16 ***
num_user_for_reviews -5.512e-04 8.759e-05 -6.293 3.71e-10 ***
duration 1.208e-02 1.035e-03 11.672 < 2e-16 ***
gross -1.264e-09 4.927e-10 -2.565 0.010379 *
movie_facebook_likes -5.951e-06 1.286e-06 -4.627 3.91e-06 ***
budget -6.382e-09 6.694e-10 -9.533 < 2e-16 ***
factor(title_year)1929 1.940e+00 1.069e+00 1.815 0.069674 .
factor(title_year)1933 3.191e+00 1.069e+00 2.984 0.002876 **
factor(title_year)1935 3.359e+00 1.069e+00 3.141 0.001704 **
factor(title_year)1936 3.283e+00 1.070e+00 3.068 0.002182 **
factor(title_year)1937 1.939e+00 1.080e+00 1.796 0.072630 .
factor(title_year)1940 1.913e+00 1.078e+00 1.774 0.076121 .
factor(title_year)1946 1.533e+00 1.070e+00 1.434 0.151848
factor(title_year)1950 1.956e+00 1.071e+00 1.827 0.067872 .
factor(title_year)1952 1.190e+00 1.070e+00 1.112 0.266242
factor(title_year)1953 1.799e+00 9.267e-01 1.941 0.052320 .
factor(title_year)1954 2.642e+00 1.067e+00 2.475 0.013392 *
factor(title_year)1959 2.541e+00 1.070e+00 2.375 0.017646 *
factor(title_year)1960 2.362e+00 1.074e+00 2.199 0.028002 *
factor(title_year)1963 2.198e+00 1.073e+00 2.049 0.040619 *
factor(title_year)1964 2.178e+00 9.279e-01 2.348 0.018980 *
factor(title_year)1965 1.358e+00 8.495e-01 1.599 0.110063
factor(title_year)1969 2.198e+00 1.072e+00 2.050 0.040440 *
factor(title_year)1970 1.279e+00 8.770e-01 1.458 0.145004
factor(title_year)1971 1.416e+00 9.250e-01 1.531 0.125935
factor(title_year)1972 1.723e+00 9.301e-01 1.853 0.064057 .
factor(title_year)1973 2.470e+00 8.467e-01 2.917 0.003571 **
factor(title_year)1974 2.684e+00 8.316e-01 3.227 0.001267 **
factor(title_year)1975 1.143e+00 9.313e-01 1.228 0.219633
factor(title_year)1976 1.955e+00 1.070e+00 1.828 0.067749 .
factor(title_year)1977 2.008e+00 8.747e-01 2.296 0.021778 *
factor(title_year)1978 2.297e+00 8.197e-01 2.802 0.005114 **
factor(title_year)1979 1.642e+00 8.786e-01 1.869 0.061747 .
factor(title_year)1980 1.465e+00 7.993e-01 1.833 0.066988 .
factor(title_year)1981 1.395e+00 8.096e-01 1.723 0.085055 .
factor(title_year)1982 1.579e+00 7.864e-01 2.008 0.044738 *
factor(title_year)1983 1.900e+00 7.984e-01 2.380 0.017384 *
factor(title_year)1984 1.794e+00 7.793e-01 2.302 0.021404 *
factor(title_year)1985 1.693e+00 7.979e-01 2.122 0.033960 *
factor(title_year)1986 1.652e+00 7.817e-01 2.114 0.034654 *
factor(title_year)1987 1.305e+00 7.765e-01 1.681 0.092848 .
factor(title_year)1988 1.836e+00 7.731e-01 2.375 0.017635 *
factor(title_year)1989 1.603e+00 7.780e-01 2.060 0.039498 *
factor(title_year)1990 1.569e+00 7.781e-01 2.017 0.043815 *
factor(title_year)1991 1.532e+00 7.726e-01 1.983 0.047469 *
factor(title_year)1992 1.918e+00 7.711e-01 2.488 0.012925 *
factor(title_year)1993 1.621e+00 7.712e-01 2.101 0.035729 *
factor(title_year)1994 1.599e+00 7.671e-01 2.084 0.037270 *
factor(title_year)1995 1.531e+00 7.650e-01 2.001 0.045472 *
factor(title_year)1996 1.568e+00 7.625e-01 2.056 0.039894 *
factor(title_year)1997 1.480e+00 7.622e-01 1.941 0.052347 .
factor(title_year)1998 1.635e+00 7.625e-01 2.144 0.032160 *
factor(title_year)1999 1.416e+00 7.613e-01 1.859 0.063087 .
factor(title_year)2000 1.259e+00 7.608e-01 1.655 0.098135 .
factor(title_year)2001 1.340e+00 7.605e-01 1.762 0.078172 .
factor(title_year)2002 1.281e+00 7.604e-01 1.685 0.092111 .
factor(title_year)2003 1.159e+00 7.616e-01 1.522 0.128257
factor(title_year)2004 1.235e+00 7.612e-01 1.623 0.104763
factor(title_year)2005 1.213e+00 7.613e-01 1.593 0.111255
factor(title_year)2006 1.200e+00 7.613e-01 1.576 0.115266
factor(title_year)2007 1.099e+00 7.616e-01 1.444 0.148994
factor(title_year)2008 9.189e-01 7.613e-01 1.207 0.227577
factor(title_year)2009 9.281e-01 7.616e-01 1.219 0.223105
factor(title_year)2010 8.254e-01 7.619e-01 1.083 0.278726
factor(title_year)2011 7.114e-01 7.624e-01 0.933 0.350860
factor(title_year)2012 8.701e-01 7.624e-01 1.141 0.253842
factor(title_year)2013 8.786e-01 7.628e-01 1.152 0.249493
factor(title_year)2014 9.562e-01 7.627e-01 1.254 0.210062
factor(title_year)2015 1.067e+00 7.633e-01 1.398 0.162263
factor(title_year)2016 1.643e+00 7.692e-01 2.135 0.032830 *
factor(genres)Adventure 3.669e-01 6.104e-02 6.011 2.13e-09 ***
factor(genres)Animation 8.306e-01 1.426e-01 5.825 6.51e-09 ***
factor(genres)Biography 7.029e-01 8.378e-02 8.389 < 2e-16 ***
factor(genres)Comedy 1.709e-01 4.831e-02 3.537 0.000413 ***
factor(genres)Crime 4.315e-01 7.222e-02 5.975 2.65e-09 ***
factor(genres)Documentary 1.181e+00 1.705e-01 6.925 5.62e-12 ***
factor(genres)Drama 5.691e-01 5.475e-02 10.395 < 2e-16 ***
factor(genres)Family 9.116e-01 8.009e-01 1.138 0.255134
factor(genres)Fantasy -1.831e-01 1.554e-01 -1.179 0.238707
factor(genres)Horror -3.893e-01 8.803e-02 -4.422 1.02e-05 ***
factor(genres)Musical NA NA NA NA
factor(genres)Mystery 1.635e-01 2.070e-01 0.790 0.429655
factor(genres)Romance 7.468e-01 5.371e-01 1.391 0.164508
factor(genres)Sci-Fi 1.976e-01 3.146e-01 0.628 0.530012
factor(genres)Thriller -3.772e-01 7.610e-01 -0.496 0.620209
factor(genres)Western 1.033e+00 7.961e-01 1.298 0.194557
num_voted_users:duration -2.928e-08 3.768e-09 -7.769 1.18e-14 ***
num_voted_users:num_user_for_reviews -2.181e-10 1.096e-10 -1.990 0.046701 *
gross:budget 1.480e-17 3.378e-18 4.380 1.24e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7543 on 2314 degrees of freedom
Multiple R-squared: 0.5064, Adjusted R-squared: 0.4874
F-statistic: 26.67 on 89 and 2314 DF, p-value: < 2.2e-16
Partial F-test (lack of fit) for full4 and lm.fit3:
anova(full4,lm.fit3)
Analysis of Variance Table
Model 1: imdb_score ~ num_voted_users + num_critic_for_reviews + num_user_for_reviews +
duration + facenumber_in_poster + gross + movie_facebook_likes +
director_facebook_likes + cast_total_facebook_likes + budget +
factor(title_year) + factor(genres) + duration * num_voted_users +
num_voted_users * num_user_for_reviews + gross * budget
Model 2: imdb_score ~ num_voted_users + num_critic_for_reviews + num_user_for_reviews +
duration + gross + movie_facebook_likes + budget + factor(title_year) +
factor(genres) + duration * num_voted_users + num_voted_users *
num_user_for_reviews + gross * budget
Res.Df RSS Df Sum of Sq F Pr(>F)
1 2311 1314.1
2 2314 1316.6 -3 -2.4398 1.4302 0.232
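The p-value (0.232) is large, so we fail to reject the null hypothesis: dropping the insignificant terms (facenumber_in_poster, director_facebook_likes, cast_total_facebook_likes) does not significantly worsen the fit, so the simpler model lm.fit3 is preferred.
Note: The step function is not much help in deciding which predictors to keep. The partial F-test of full3 against the model with the predictors chosen by step indicates that the reduced model does not fit, so dropping the terms suggested by step is not a good choice.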
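Fit a model with higher-order terms:
# lm.fit4: model based on lm.fit3, adding second-order terms for all numeric variables
# Note: the `*` interactions also add raw main effects (e.g. duration), which are collinear with the
# first-degree poly() terms; this likely accounts for most of the "not defined because of singularities" coefficients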
lm.fit4<-lm(imdb_score ~poly(num_voted_users,2)+poly(num_critic_for_reviews,2)+poly(num_user_for_reviews,2)+poly(duration,2)+poly(gross,2)+poly(movie_facebook_likes,2)+poly(budget,2)+factor(title_year)+factor(genres)+duration*num_voted_users+num_voted_users*num_user_for_reviews+gross*budget,data=movie_train)
summary(lm.fit4)
Call:
lm(formula = imdb_score ~ poly(num_voted_users, 2) + poly(num_critic_for_reviews,
2) + poly(num_user_for_reviews, 2) + poly(duration, 2) +
poly(gross, 2) + poly(movie_facebook_likes, 2) + poly(budget,
2) + factor(title_year) + factor(genres) + duration * num_voted_users +
num_voted_users * num_user_for_reviews + gross * budget,
data = movie_train)
Residuals:
Min 1Q Median 3Q Max
-5.0218 -0.3524 0.0497 0.4363 2.1993
Coefficients: (6 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.186e+00 7.240e-01 7.163 1.06e-12 ***
poly(num_voted_users, 2)1 4.096e+01 5.068e+00 8.082 1.01e-15 ***
poly(num_voted_users, 2)2 -1.564e+01 2.208e+00 -7.083 1.86e-12 ***
poly(num_critic_for_reviews, 2)1 2.336e+01 1.767e+00 13.214 < 2e-16 ***
poly(num_critic_for_reviews, 2)2 -1.084e+01 1.001e+00 -10.833 < 2e-16 ***
poly(num_user_for_reviews, 2)1 -2.383e+01 2.296e+00 -10.380 < 2e-16 ***
poly(num_user_for_reviews, 2)2 7.092e+00 1.595e+00 4.446 9.15e-06 ***
poly(duration, 2)1 1.293e+01 1.107e+00 11.684 < 2e-16 ***
poly(duration, 2)2 -2.798e+00 7.884e-01 -3.548 0.000396 ***
poly(gross, 2)1 -5.231e+00 2.323e+00 -2.252 0.024443 *
poly(gross, 2)2 -1.503e+00 1.235e+00 -1.217 0.223750
poly(movie_facebook_likes, 2)1 1.077e+00 1.438e+00 0.749 0.453833
poly(movie_facebook_likes, 2)2 1.906e+00 8.775e-01 2.172 0.029935 *
poly(budget, 2)1 -1.418e+01 1.960e+00 -7.236 6.26e-13 ***
poly(budget, 2)2 6.282e+00 1.119e+00 5.612 2.24e-08 ***
factor(title_year)1929 1.847e+00 1.017e+00 1.816 0.069576 .
factor(title_year)1933 3.076e+00 1.018e+00 3.022 0.002541 **
factor(title_year)1935 3.258e+00 1.018e+00 3.200 0.001393 **
factor(title_year)1936 2.916e+00 1.019e+00 2.861 0.004266 **
factor(title_year)1937 1.449e+00 1.029e+00 1.408 0.159297
factor(title_year)1940 1.491e+00 1.026e+00 1.453 0.146365
factor(title_year)1946 1.500e+00 1.018e+00 1.474 0.140547
factor(title_year)1950 2.025e+00 1.019e+00 1.987 0.047065 *
factor(title_year)1952 1.096e+00 1.018e+00 1.077 0.281756
factor(title_year)1953 1.659e+00 8.819e-01 1.881 0.060113 .
factor(title_year)1954 2.277e+00 1.016e+00 2.241 0.025152 *
factor(title_year)1959 1.967e+00 1.019e+00 1.930 0.053781 .
factor(title_year)1960 1.951e+00 1.023e+00 1.907 0.056606 .
factor(title_year)1963 2.380e+00 1.022e+00 2.330 0.019914 *
factor(title_year)1964 1.882e+00 8.834e-01 2.130 0.033242 *
factor(title_year)1965 1.301e+00 8.091e-01 1.607 0.108087
factor(title_year)1969 1.822e+00 1.021e+00 1.785 0.074433 .
factor(title_year)1970 1.210e+00 8.348e-01 1.450 0.147246
factor(title_year)1971 1.324e+00 8.803e-01 1.504 0.132702
factor(title_year)1972 1.606e+00 8.866e-01 1.811 0.070217 .
factor(title_year)1973 2.099e+00 8.065e-01 2.603 0.009297 **
factor(title_year)1974 2.489e+00 7.922e-01 3.142 0.001698 **
factor(title_year)1975 8.069e-01 8.877e-01 0.909 0.363427
factor(title_year)1976 1.724e+00 1.018e+00 1.693 0.090536 .
factor(title_year)1977 1.794e+00 8.328e-01 2.154 0.031348 *
factor(title_year)1978 2.225e+00 7.806e-01 2.851 0.004401 **
factor(title_year)1979 1.437e+00 8.395e-01 1.711 0.087191 .
factor(title_year)1980 1.544e+00 7.625e-01 2.025 0.042968 *
factor(title_year)1981 1.325e+00 7.707e-01 1.720 0.085638 .
factor(title_year)1982 1.381e+00 7.487e-01 1.845 0.065236 .
factor(title_year)1983 1.759e+00 7.598e-01 2.316 0.020672 *
factor(title_year)1984 1.569e+00 7.418e-01 2.115 0.034513 *
factor(title_year)1985 1.585e+00 7.595e-01 2.087 0.036965 *
factor(title_year)1986 1.530e+00 7.440e-01 2.056 0.039906 *
factor(title_year)1987 1.194e+00 7.392e-01 1.616 0.106276
factor(title_year)1988 1.668e+00 7.359e-01 2.267 0.023469 *
factor(title_year)1989 1.477e+00 7.405e-01 1.995 0.046175 *
factor(title_year)1990 1.433e+00 7.406e-01 1.934 0.053188 .
factor(title_year)1991 1.493e+00 7.353e-01 2.031 0.042419 *
factor(title_year)1992 1.854e+00 7.339e-01 2.526 0.011610 *
factor(title_year)1993 1.643e+00 7.343e-01 2.237 0.025377 *
factor(title_year)1994 1.647e+00 7.302e-01 2.255 0.024224 *
factor(title_year)1995 1.485e+00 7.281e-01 2.040 0.041482 *
factor(title_year)1996 1.542e+00 7.258e-01 2.124 0.033742 *
factor(title_year)1997 1.397e+00 7.257e-01 1.926 0.054272 .
factor(title_year)1998 1.618e+00 7.261e-01 2.229 0.025932 *
factor(title_year)1999 1.377e+00 7.250e-01 1.899 0.057652 .
factor(title_year)2000 1.222e+00 7.247e-01 1.687 0.091824 .
factor(title_year)2001 1.326e+00 7.244e-01 1.830 0.067362 .
factor(title_year)2002 1.261e+00 7.244e-01 1.740 0.081964 .
factor(title_year)2003 1.097e+00 7.255e-01 1.512 0.130687
factor(title_year)2004 1.147e+00 7.252e-01 1.581 0.113909
factor(title_year)2005 1.124e+00 7.252e-01 1.550 0.121356
factor(title_year)2006 1.063e+00 7.253e-01 1.465 0.143020
factor(title_year)2007 8.368e-01 7.256e-01 1.153 0.248953
factor(title_year)2008 6.883e-01 7.254e-01 0.949 0.342817
factor(title_year)2009 6.621e-01 7.257e-01 0.912 0.361676
factor(title_year)2010 5.454e-01 7.260e-01 0.751 0.452586
factor(title_year)2011 4.013e-01 7.267e-01 0.552 0.580825
factor(title_year)2012 5.815e-01 7.264e-01 0.801 0.423456
factor(title_year)2013 4.949e-01 7.272e-01 0.681 0.496246
factor(title_year)2014 5.977e-01 7.272e-01 0.822 0.411238
factor(title_year)2015 7.098e-01 7.279e-01 0.975 0.329579
factor(title_year)2016 1.271e+00 7.337e-01 1.732 0.083429 .
factor(genres)Adventure 3.877e-01 5.879e-02 6.594 5.28e-11 ***
factor(genres)Animation 8.672e-01 1.372e-01 6.322 3.09e-10 ***
factor(genres)Biography 6.381e-01 8.031e-02 7.946 2.99e-15 ***
factor(genres)Comedy 1.341e-01 4.658e-02 2.878 0.004039 **
factor(genres)Crime 3.975e-01 6.896e-02 5.765 9.28e-09 ***
factor(genres)Documentary 1.243e+00 1.632e-01 7.614 3.84e-14 ***
factor(genres)Drama 5.458e-01 5.253e-02 10.389 < 2e-16 ***
factor(genres)Family 6.427e-01 8.056e-01 0.798 0.425022
factor(genres)Fantasy -2.324e-01 1.486e-01 -1.563 0.118075
factor(genres)Horror -4.070e-01 8.610e-02 -4.727 2.41e-06 ***
factor(genres)Musical NA NA NA NA
factor(genres)Mystery 1.811e-01 1.973e-01 0.918 0.358642
factor(genres)Romance 8.649e-01 5.113e-01 1.692 0.090853 .
factor(genres)Sci-Fi 2.507e-01 2.994e-01 0.837 0.402458
factor(genres)Thriller -5.580e-02 7.252e-01 -0.077 0.938675
factor(genres)Western 1.037e+00 7.576e-01 1.368 0.171361
duration NA NA NA NA
num_voted_users NA NA NA NA
num_user_for_reviews NA NA NA NA
gross NA NA NA NA
budget NA NA NA NA
duration:num_voted_users -1.747e-08 3.746e-09 -4.665 3.26e-06 ***
num_voted_users:num_user_for_reviews 8.766e-10 3.007e-10 2.915 0.003594 **
gross:budget 1.208e-17 6.330e-18 1.908 0.056550 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7176 on 2307 degrees of freedom
Multiple R-squared: 0.5546, Adjusted R-squared: 0.536
F-statistic: 29.92 on 96 and 2307 DF, p-value: < 2.2e-16
The second-order term for ‘gross’ is not significant, so it can be dropped. The first-order term for movie_facebook_likes is not significant either, so we try dropping that variable as well.
# lm.fit5: based on lm.fit4, dropping the second-order term for 'gross', the movie_facebook_likes terms, and the gross:budget interaction.
lm.fit5<-lm(imdb_score~poly(num_voted_users,2)+poly(num_critic_for_reviews,2)+poly(num_user_for_reviews,2)+poly(duration,2)+gross+poly(budget,2)+factor(title_year)+factor(genres)+duration*num_voted_users+num_voted_users*num_user_for_reviews,data=movie_train)
summary(lm.fit5)
Call:
lm(formula = imdb_score ~ poly(num_voted_users, 2) + poly(num_critic_for_reviews,
2) + poly(num_user_for_reviews, 2) + poly(duration, 2) +
gross + poly(budget, 2) + factor(title_year) + factor(genres) +
duration * num_voted_users + num_voted_users * num_user_for_reviews,
data = movie_train)
Residuals:
Min 1Q Median 3Q Max
-5.0230 -0.3461 0.0533 0.4357 2.1903
Coefficients: (4 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.214e+00 7.250e-01 7.191 8.62e-13 ***
poly(num_voted_users, 2)1 3.910e+01 4.948e+00 7.902 4.21e-15 ***
poly(num_voted_users, 2)2 -1.524e+01 2.177e+00 -7.000 3.35e-12 ***
poly(num_critic_for_reviews, 2)1 2.387e+01 1.636e+00 14.588 < 2e-16 ***
poly(num_critic_for_reviews, 2)2 -9.559e+00 8.464e-01 -11.294 < 2e-16 ***
poly(num_user_for_reviews, 2)1 -2.334e+01 2.271e+00 -10.279 < 2e-16 ***
poly(num_user_for_reviews, 2)2 7.076e+00 1.593e+00 4.443 9.29e-06 ***
poly(duration, 2)1 1.284e+01 1.107e+00 11.597 < 2e-16 ***
poly(duration, 2)2 -2.762e+00 7.875e-01 -3.508 0.000461 ***
gross -3.373e-10 3.449e-10 -0.978 0.328194
poly(budget, 2)1 -1.136e+01 1.187e+00 -9.572 < 2e-16 ***
poly(budget, 2)2 7.568e+00 8.000e-01 9.460 < 2e-16 ***
factor(title_year)1929 1.868e+00 1.019e+00 1.834 0.066851 .
factor(title_year)1933 3.102e+00 1.019e+00 3.044 0.002362 **
factor(title_year)1935 3.284e+00 1.019e+00 3.221 0.001296 **
factor(title_year)1936 2.980e+00 1.020e+00 2.921 0.003522 **
factor(title_year)1937 1.376e+00 1.028e+00 1.337 0.181201
factor(title_year)1940 1.504e+00 1.027e+00 1.465 0.143191
factor(title_year)1946 1.516e+00 1.019e+00 1.488 0.136985
factor(title_year)1950 2.038e+00 1.020e+00 1.997 0.045943 *
factor(title_year)1952 1.108e+00 1.019e+00 1.087 0.277290
factor(title_year)1953 1.676e+00 8.830e-01 1.898 0.057798 .
factor(title_year)1954 2.326e+00 1.017e+00 2.286 0.022327 *
factor(title_year)1959 2.007e+00 1.020e+00 1.966 0.049379 *
factor(title_year)1960 1.986e+00 1.024e+00 1.939 0.052568 .
factor(title_year)1963 2.394e+00 1.023e+00 2.341 0.019332 *
factor(title_year)1964 1.891e+00 8.842e-01 2.138 0.032612 *
factor(title_year)1965 1.284e+00 8.100e-01 1.585 0.113123
factor(title_year)1969 1.832e+00 1.022e+00 1.793 0.073073 .
factor(title_year)1970 1.238e+00 8.359e-01 1.481 0.138614
factor(title_year)1971 1.337e+00 8.814e-01 1.516 0.129565
factor(title_year)1972 1.556e+00 8.874e-01 1.754 0.079621 .
factor(title_year)1973 2.057e+00 8.067e-01 2.550 0.010836 *
factor(title_year)1974 2.472e+00 7.931e-01 3.117 0.001852 **
factor(title_year)1975 6.799e-01 8.870e-01 0.767 0.443450
factor(title_year)1976 1.765e+00 1.019e+00 1.731 0.083542 .
factor(title_year)1977 1.821e+00 8.336e-01 2.184 0.029055 *
factor(title_year)1978 2.218e+00 7.815e-01 2.838 0.004573 **
factor(title_year)1979 1.425e+00 8.404e-01 1.696 0.090093 .
factor(title_year)1980 1.537e+00 7.634e-01 2.014 0.044161 *
factor(title_year)1981 1.330e+00 7.716e-01 1.723 0.084990 .
factor(title_year)1982 1.399e+00 7.496e-01 1.867 0.062079 .
factor(title_year)1983 1.751e+00 7.607e-01 2.301 0.021471 *
factor(title_year)1984 1.573e+00 7.426e-01 2.119 0.034234 *
factor(title_year)1985 1.594e+00 7.605e-01 2.096 0.036229 *
factor(title_year)1986 1.539e+00 7.449e-01 2.066 0.038902 *
factor(title_year)1987 1.205e+00 7.401e-01 1.628 0.103587
factor(title_year)1988 1.686e+00 7.368e-01 2.288 0.022226 *
factor(title_year)1989 1.493e+00 7.414e-01 2.013 0.044216 *
factor(title_year)1990 1.437e+00 7.415e-01 1.938 0.052771 .
factor(title_year)1991 1.506e+00 7.363e-01 2.045 0.040942 *
factor(title_year)1992 1.869e+00 7.348e-01 2.543 0.011048 *
factor(title_year)1993 1.648e+00 7.352e-01 2.242 0.025056 *
factor(title_year)1994 1.653e+00 7.312e-01 2.261 0.023866 *
factor(title_year)1995 1.506e+00 7.291e-01 2.065 0.038988 *
factor(title_year)1996 1.563e+00 7.268e-01 2.151 0.031583 *
factor(title_year)1997 1.420e+00 7.266e-01 1.955 0.050726 .
factor(title_year)1998 1.645e+00 7.270e-01 2.262 0.023766 *
factor(title_year)1999 1.400e+00 7.259e-01 1.929 0.053861 .
factor(title_year)2000 1.255e+00 7.255e-01 1.730 0.083720 .
factor(title_year)2001 1.355e+00 7.253e-01 1.868 0.061839 .
factor(title_year)2002 1.286e+00 7.253e-01 1.773 0.076379 .
factor(title_year)2003 1.125e+00 7.263e-01 1.549 0.121519
factor(title_year)2004 1.180e+00 7.260e-01 1.625 0.104329
factor(title_year)2005 1.157e+00 7.260e-01 1.593 0.111187
factor(title_year)2006 1.098e+00 7.261e-01 1.513 0.130463
factor(title_year)2007 8.760e-01 7.264e-01 1.206 0.227924
factor(title_year)2008 7.260e-01 7.262e-01 1.000 0.317541
factor(title_year)2009 6.953e-01 7.266e-01 0.957 0.338715
factor(title_year)2010 5.733e-01 7.270e-01 0.789 0.430409
factor(title_year)2011 4.119e-01 7.275e-01 0.566 0.571388
factor(title_year)2012 5.962e-01 7.272e-01 0.820 0.412413
factor(title_year)2013 5.090e-01 7.278e-01 0.699 0.484450
factor(title_year)2014 6.209e-01 7.278e-01 0.853 0.393647
factor(title_year)2015 7.377e-01 7.283e-01 1.013 0.311204
factor(title_year)2016 1.297e+00 7.339e-01 1.767 0.077278 .
factor(genres)Adventure 3.931e-01 5.868e-02 6.698 2.64e-11 ***
factor(genres)Animation 8.640e-01 1.372e-01 6.300 3.56e-10 ***
factor(genres)Biography 6.375e-01 8.035e-02 7.934 3.29e-15 ***
factor(genres)Comedy 1.373e-01 4.648e-02 2.954 0.003170 **
factor(genres)Crime 4.053e-01 6.891e-02 5.882 4.64e-09 ***
factor(genres)Documentary 1.241e+00 1.633e-01 7.597 4.38e-14 ***
factor(genres)Drama 5.473e-01 5.255e-02 10.416 < 2e-16 ***
factor(genres)Family 1.067e-01 7.564e-01 0.141 0.887857
factor(genres)Fantasy -2.287e-01 1.487e-01 -1.538 0.124122
factor(genres)Horror -4.007e-01 8.604e-02 -4.657 3.40e-06 ***
factor(genres)Musical NA NA NA NA
factor(genres)Mystery 1.784e-01 1.973e-01 0.904 0.366068
factor(genres)Romance 8.436e-01 5.119e-01 1.648 0.099478 .
factor(genres)Sci-Fi 2.558e-01 2.997e-01 0.853 0.393559
factor(genres)Thriller -6.825e-02 7.258e-01 -0.094 0.925085
factor(genres)Western 1.040e+00 7.586e-01 1.371 0.170472
duration NA NA NA NA
num_voted_users NA NA NA NA
num_user_for_reviews NA NA NA NA
duration:num_voted_users -1.580e-08 3.694e-09 -4.279 1.96e-05 ***
num_voted_users:num_user_for_reviews 8.430e-10 2.978e-10 2.830 0.004689 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7186 on 2311 degrees of freedom
Multiple R-squared: 0.5526, Adjusted R-squared: 0.5348
F-statistic: 31.03 on 92 and 2311 DF, p-value: < 2.2e-16
anova(lm.fit4,lm.fit5)
Analysis of Variance Table
Model 1: imdb_score ~ poly(num_voted_users, 2) + poly(num_critic_for_reviews,
2) + poly(num_user_for_reviews, 2) + poly(duration, 2) +
poly(gross, 2) + poly(movie_facebook_likes, 2) + poly(budget,
2) + factor(title_year) + factor(genres) + duration * num_voted_users +
num_voted_users * num_user_for_reviews + gross * budget
Model 2: imdb_score ~ poly(num_voted_users, 2) + poly(num_critic_for_reviews,
2) + poly(num_user_for_reviews, 2) + poly(duration, 2) +
gross + poly(budget, 2) + factor(title_year) + factor(genres) +
duration * num_voted_users + num_voted_users * num_user_for_reviews
Res.Df RSS Df Sum of Sq F Pr(>F)
1 2307 1188.1
2 2311 1193.3 -4 -5.2296 2.5387 0.03818 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
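The partial F-test is significant at the 5% level (p = 0.038), so the terms dropped in lm.fit5 jointly contribute to the fit and lm.fit5 is not better than lm.fit4. lm.fit4 also has the higher R^2 (0.5546 vs. 0.5526; adjusted 0.536 vs. 0.5348). Therefore, we keep lm.fit4.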
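As a sanity check, the F statistic in the anova table can be recomputed by hand. This is only a sketch that plugs in the rounded numbers printed above; the variable names are ours and are not part of the original analysis.

```r
# Values copied from the anova(lm.fit4, lm.fit5) table (rounded as printed)
rss_full   <- 1188.1   # residual sum of squares of lm.fit4 (larger model)
df_full    <- 2307     # residual degrees of freedom of lm.fit4
df_reduced <- 2311     # residual degrees of freedom of lm.fit5
ss_dropped <- 5.2296   # "Sum of Sq": RSS(lm.fit5) - RSS(lm.fit4)

# Partial F-test: extra sum of squares per dropped term,
# divided by the residual mean square of the larger model
q       <- df_reduced - df_full
f_stat  <- (ss_dropped / q) / (rss_full / df_full)
p_value <- pf(f_stat, q, df_full, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```

A p-value below 0.05 here means the four dropped coefficients are jointly different from zero at the 5% level, which is why the larger model is kept.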
Diagnostics for lm.fit4:
plot(lm.fit4)
not plotting observations with leverage one:
57, 171, 281, 470, 496, 677, 1026, 1179, 1484, 1497, 1567, 1625, 1802, 2092, 2220
NaNs produced
library(car)
package ‘car’ was built under R version 3.3.2
residualPlots(lm.fit4)
Test stat Pr(>|t|)
poly(num_voted_users, 2) NA NA
poly(num_critic_for_reviews, 2) NA NA
poly(num_user_for_reviews, 2) NA NA
poly(duration, 2) NA NA
poly(gross, 2) NA NA
poly(movie_facebook_likes, 2) NA NA
poly(budget, 2) NA NA
factor(title_year) NA NA
factor(genres) NA NA
duration -0.508 0.612
num_voted_users 0.363 0.717
num_user_for_reviews -0.858 0.391
gross -0.018 0.985
budget -0.384 0.701
Tukey test -11.819 0.000
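The residual plots against the individual predictors look fine, since the fitted lines are roughly straight. However, the residuals vs. fitted plot is curved, and the Tukey test is highly significant.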
Marginal Model plot:
library(car)
marginalModelPlots(lm.fit4)
Splines and/or polynomials replaced by a fitted linear combination
Good fit. Model doing well.
Check for residual outliers. Note: the observation labels shown for the outliers are row names from the whole dataset, not positions within the training set.
library(car)
qqPlot(lm.fit4$residuals,id.n = 20)
2835 3341 3924 2269 3526 900 2193 2853 2984 3665 2067 320 4541 2582 1191 4286 496 1999 4084
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
3601
20
library(car)
package ‘car’ was built under R version 3.3.2
outlierTest(lm.fit4) # H0: residual is not an outlier
rstudent unadjusted p-value Bonferonni p
2835 -7.331055 3.1440e-13 7.5110e-10
3924 -5.885833 4.5384e-09 1.0842e-05
2269 -4.855973 1.2783e-06 3.0538e-03
900 -4.655161 3.4210e-06 8.1727e-03
3506 -4.585580 4.7694e-06 1.1394e-02
2193 -4.551960 5.5910e-06 1.3357e-02
2984 -4.520537 6.4803e-06 1.5481e-02
2853 -4.490062 7.4710e-06 1.7848e-02
All of the residuals listed above (8 observations) have Bonferroni-adjusted p-values below 0.05, so they are candidates for removal.
Before we drop anything, let’s run some more diagnostics to double-check which observations to drop.
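The "Bonferonni p" column is the unadjusted p-value multiplied by the number of residuals tested, capped at 1. The sketch below reproduces the first three rows. The count of 2389 is our inference from the ratio of the two printed columns, not a number reported in the output; it would be consistent with the 2404 training rows minus the 15 leverage-one observations.

```r
# Unadjusted p-values copied from the outlierTest(lm.fit4) output
p_unadj <- c("2835" = 3.1440e-13, "3924" = 4.5384e-09, "2269" = 1.2783e-06)

# Assumption: number of studentized residuals tested, inferred from
# 7.5110e-10 / 3.1440e-13 in the table above
n_tested <- 2389

# Bonferroni adjustment: multiply by the number of tests, cap at 1
p_bonf <- pmin(1, p_unadj * n_tested)
print(signif(p_bonf, 5))
```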
library(car)
influencePlot(lm.fit4, id.n=20)
From the influence plot, we decided to drop observations: 3268,3281,98,837,4708,1602,2835,3467,4929,1938
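# lm.fit6: same formula as lm.fit4, refit after removing the 10 observations chosen above.
# Caveat: negative indices drop rows by position, while the labels from car are row names of the full data set.
movie_train<-movie_train[-c(3268,3281,98,837,4708,1602,2835,3467,4929,1938),]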
lm.fit6<-lm(imdb_score~poly(num_voted_users,2)+poly(num_critic_for_reviews,2)+poly(num_user_for_reviews,2)+poly(duration,2)+poly(gross,2)+poly(movie_facebook_likes,2)+poly(budget,2)+factor(title_year)+factor(genres)+duration*num_voted_users+num_voted_users*num_user_for_reviews+gross*budget,data=movie_train)
summary(lm.fit6)
Call:
lm(formula = imdb_score ~ poly(num_voted_users, 2) + poly(num_critic_for_reviews,
2) + poly(num_user_for_reviews, 2) + poly(duration, 2) +
poly(gross, 2) + poly(movie_facebook_likes, 2) + poly(budget,
2) + factor(title_year) + factor(genres) + duration * num_voted_users +
num_voted_users * num_user_for_reviews + gross * budget,
data = movie_train)
Residuals:
Min 1Q Median 3Q Max
-5.0214 -0.3520 0.0511 0.4360 2.1976
Coefficients: (6 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.185e+00 7.242e-01 7.159 1.09e-12 ***
poly(num_voted_users, 2)1 4.076e+01 5.069e+00 8.041 1.41e-15 ***
poly(num_voted_users, 2)2 -1.572e+01 2.208e+00 -7.118 1.46e-12 ***
poly(num_critic_for_reviews, 2)1 2.336e+01 1.763e+00 13.251 < 2e-16 ***
poly(num_critic_for_reviews, 2)2 -1.073e+01 9.958e-01 -10.776 < 2e-16 ***
poly(num_user_for_reviews, 2)1 -2.385e+01 2.296e+00 -10.387 < 2e-16 ***
poly(num_user_for_reviews, 2)2 7.081e+00 1.595e+00 4.439 9.44e-06 ***
poly(duration, 2)1 1.285e+01 1.108e+00 11.603 < 2e-16 ***
poly(duration, 2)2 -2.766e+00 7.887e-01 -3.508 0.000461 ***
poly(gross, 2)1 -5.185e+00 2.324e+00 -2.231 0.025753 *
poly(gross, 2)2 -1.494e+00 1.236e+00 -1.209 0.226860
poly(movie_facebook_likes, 2)1 1.057e+00 1.430e+00 0.739 0.459978
poly(movie_facebook_likes, 2)2 1.905e+00 8.764e-01 2.174 0.029794 *
poly(budget, 2)1 -1.414e+01 1.955e+00 -7.233 6.38e-13 ***
poly(budget, 2)2 6.319e+00 1.119e+00 5.649 1.81e-08 ***
factor(title_year)1929 1.845e+00 1.018e+00 1.813 0.069910 .
factor(title_year)1933 3.074e+00 1.018e+00 3.019 0.002560 **
factor(title_year)1935 3.256e+00 1.018e+00 3.197 0.001409 **
factor(title_year)1936 2.914e+00 1.019e+00 2.858 0.004304 **
factor(title_year)1937 1.446e+00 1.030e+00 1.405 0.160246
factor(title_year)1940 1.490e+00 1.026e+00 1.451 0.146829
factor(title_year)1946 1.503e+00 1.018e+00 1.476 0.139955
factor(title_year)1950 2.023e+00 1.019e+00 1.985 0.047275 *
factor(title_year)1952 1.099e+00 1.018e+00 1.079 0.280608
factor(title_year)1953 1.659e+00 8.822e-01 1.880 0.060176 .
factor(title_year)1954 2.277e+00 1.016e+00 2.240 0.025169 *
factor(title_year)1959 1.967e+00 1.020e+00 1.930 0.053762 .
factor(title_year)1960 1.950e+00 1.023e+00 1.906 0.056789 .
factor(title_year)1963 2.384e+00 1.022e+00 2.333 0.019740 *
factor(title_year)1964 1.885e+00 8.837e-01 2.133 0.033040 *
factor(title_year)1965 1.304e+00 8.093e-01 1.611 0.107290
factor(title_year)1969 1.820e+00 1.021e+00 1.782 0.074833 .
factor(title_year)1970 1.213e+00 8.351e-01 1.453 0.146320
factor(title_year)1971 1.326e+00 8.806e-01 1.506 0.132318
factor(title_year)1972 1.607e+00 8.869e-01 1.812 0.070169 .
factor(title_year)1973 2.099e+00 8.067e-01 2.602 0.009320 **
factor(title_year)1974 2.489e+00 7.925e-01 3.141 0.001704 **
factor(title_year)1975 8.067e-01 8.879e-01 0.909 0.363665
factor(title_year)1976 1.725e+00 1.018e+00 1.693 0.090507 .
factor(title_year)1977 1.795e+00 8.330e-01 2.155 0.031254 *
factor(title_year)1978 2.226e+00 7.808e-01 2.851 0.004394 **
factor(title_year)1979 1.437e+00 8.398e-01 1.711 0.087276 .
factor(title_year)1980 1.545e+00 7.627e-01 2.026 0.042851 *
factor(title_year)1981 1.325e+00 7.709e-01 1.719 0.085783 .
factor(title_year)1982 1.383e+00 7.489e-01 1.846 0.064992 .
factor(title_year)1983 1.849e+00 7.650e-01 2.417 0.015738 *
factor(title_year)1984 1.569e+00 7.420e-01 2.115 0.034577 *
factor(title_year)1985 1.587e+00 7.597e-01 2.089 0.036832 *
factor(title_year)1986 1.529e+00 7.442e-01 2.055 0.040002 *
factor(title_year)1987 1.193e+00 7.394e-01 1.614 0.106751
factor(title_year)1988 1.669e+00 7.361e-01 2.267 0.023476 *
factor(title_year)1989 1.478e+00 7.407e-01 1.995 0.046165 *
factor(title_year)1990 1.433e+00 7.408e-01 1.934 0.053187 .
factor(title_year)1991 1.493e+00 7.355e-01 2.030 0.042477 *
factor(title_year)1992 1.854e+00 7.341e-01 2.526 0.011601 *
factor(title_year)1993 1.643e+00 7.345e-01 2.237 0.025363 *
factor(title_year)1994 1.647e+00 7.304e-01 2.255 0.024201 *
factor(title_year)1995 1.486e+00 7.283e-01 2.040 0.041426 *
factor(title_year)1996 1.543e+00 7.260e-01 2.125 0.033708 *
factor(title_year)1997 1.398e+00 7.259e-01 1.926 0.054203 .
factor(title_year)1998 1.619e+00 7.263e-01 2.229 0.025901 *
factor(title_year)1999 1.378e+00 7.252e-01 1.900 0.057586 .
factor(title_year)2000 1.223e+00 7.249e-01 1.687 0.091709 .
factor(title_year)2001 1.326e+00 7.246e-01 1.830 0.067311 .
factor(title_year)2002 1.261e+00 7.246e-01 1.740 0.081943 .
factor(title_year)2003 1.087e+00 7.257e-01 1.498 0.134348
factor(title_year)2004 1.148e+00 7.254e-01 1.582 0.113765
factor(title_year)2005 1.124e+00 7.254e-01 1.550 0.121255
factor(title_year)2006 1.063e+00 7.255e-01 1.466 0.142892
factor(title_year)2007 8.374e-01 7.258e-01 1.154 0.248728
factor(title_year)2008 6.891e-01 7.256e-01 0.950 0.342352
factor(title_year)2009 6.628e-01 7.259e-01 0.913 0.361358
factor(title_year)2010 5.461e-01 7.263e-01 0.752 0.452193
factor(title_year)2011 4.019e-01 7.269e-01 0.553 0.580404
factor(title_year)2012 5.815e-01 7.266e-01 0.800 0.423649
factor(title_year)2013 4.972e-01 7.275e-01 0.684 0.494356
factor(title_year)2014 5.984e-01 7.274e-01 0.823 0.410753
factor(title_year)2015 7.110e-01 7.281e-01 0.977 0.328890
factor(title_year)2016 1.271e+00 7.339e-01 1.732 0.083351 .
factor(genres)Adventure 3.861e-01 5.886e-02 6.559 6.66e-11 ***
factor(genres)Animation 8.650e-01 1.372e-01 6.304 3.47e-10 ***
factor(genres)Biography 6.384e-01 8.037e-02 7.944 3.04e-15 ***
factor(genres)Comedy 1.331e-01 4.662e-02 2.854 0.004355 **
factor(genres)Crime 3.963e-01 6.902e-02 5.742 1.06e-08 ***
factor(genres)Documentary 1.241e+00 1.633e-01 7.603 4.18e-14 ***
factor(genres)Drama 5.449e-01 5.269e-02 10.342 < 2e-16 ***
factor(genres)Family 6.358e-01 8.059e-01 0.789 0.430177
factor(genres)Fantasy -2.374e-01 1.487e-01 -1.596 0.110638
factor(genres)Horror -4.083e-01 8.614e-02 -4.740 2.27e-06 ***
factor(genres)Musical NA NA NA NA
factor(genres)Mystery 1.807e-01 1.973e-01 0.916 0.359880
factor(genres)Romance 8.626e-01 5.115e-01 1.687 0.091832 .
factor(genres)Sci-Fi 2.341e-01 2.998e-01 0.781 0.434949
factor(genres)Thriller -5.910e-02 7.254e-01 -0.081 0.935080
factor(genres)Western 1.034e+00 7.578e-01 1.365 0.172397
duration NA NA NA NA
num_voted_users NA NA NA NA
num_user_for_reviews NA NA NA NA
gross NA NA NA NA
budget NA NA NA NA
duration:num_voted_users -1.740e-08 3.749e-09 -4.642 3.65e-06 ***
num_voted_users:num_user_for_reviews 8.855e-10 3.010e-10 2.942 0.003291 **
gross:budget 1.191e-17 6.333e-18 1.880 0.060231 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7178 on 2303 degrees of freedom
Multiple R-squared: 0.5549, Adjusted R-squared: 0.5363
F-statistic: 29.9 on 96 and 2303 DF, p-value: < 2.2e-16
compareCoefs(lm.fit4, lm.fit6)
Call:
1: lm(formula = imdb_score ~ poly(num_voted_users, 2) + poly(num_critic_for_reviews, 2) +
poly(num_user_for_reviews, 2) + poly(duration, 2) + poly(gross, 2) +
poly(movie_facebook_likes, 2) + poly(budget, 2) + factor(title_year) + factor(genres) +
duration * num_voted_users + num_voted_users * num_user_for_reviews + gross * budget, data
= movie_train)
2: lm(formula = imdb_score ~ poly(num_voted_users, 2) + poly(num_critic_for_reviews, 2) +
poly(num_user_for_reviews, 2) + poly(duration, 2) + poly(gross, 2) +
poly(movie_facebook_likes, 2) + poly(budget, 2) + factor(title_year) + factor(genres) +
duration * num_voted_users + num_voted_users * num_user_for_reviews + gross * budget, data
= movie_train)
Est. 1 SE 1 Est. 2 SE 2
(Intercept) 5.19e+00 7.24e-01 5.18e+00 7.24e-01
poly(num_voted_users, 2)1 4.10e+01 5.07e+00 4.08e+01 5.07e+00
poly(num_voted_users, 2)2 -1.56e+01 2.21e+00 -1.57e+01 2.21e+00
poly(num_critic_for_reviews, 2)1 2.34e+01 1.77e+00 2.34e+01 1.76e+00
poly(num_critic_for_reviews, 2)2 -1.08e+01 1.00e+00 -1.07e+01 9.96e-01
poly(num_user_for_reviews, 2)1 -2.38e+01 2.30e+00 -2.39e+01 2.30e+00
poly(num_user_for_reviews, 2)2 7.09e+00 1.59e+00 7.08e+00 1.60e+00
poly(duration, 2)1 1.29e+01 1.11e+00 1.29e+01 1.11e+00
poly(duration, 2)2 -2.80e+00 7.88e-01 -2.77e+00 7.89e-01
poly(gross, 2)1 -5.23e+00 2.32e+00 -5.19e+00 2.32e+00
poly(gross, 2)2 -1.50e+00 1.23e+00 -1.49e+00 1.24e+00
poly(movie_facebook_likes, 2)1 1.08e+00 1.44e+00 1.06e+00 1.43e+00
poly(movie_facebook_likes, 2)2 1.91e+00 8.77e-01 1.91e+00 8.76e-01
poly(budget, 2)1 -1.42e+01 1.96e+00 -1.41e+01 1.95e+00
poly(budget, 2)2 6.28e+00 1.12e+00 6.32e+00 1.12e+00
factor(title_year)1929 1.85e+00 1.02e+00 1.85e+00 1.02e+00
factor(title_year)1933 3.08e+00 1.02e+00 3.07e+00 1.02e+00
factor(title_year)1935 3.26e+00 1.02e+00 3.26e+00 1.02e+00
factor(title_year)1936 2.92e+00 1.02e+00 2.91e+00 1.02e+00
factor(title_year)1937 1.45e+00 1.03e+00 1.45e+00 1.03e+00
factor(title_year)1940 1.49e+00 1.03e+00 1.49e+00 1.03e+00
factor(title_year)1946 1.50e+00 1.02e+00 1.50e+00 1.02e+00
factor(title_year)1950 2.02e+00 1.02e+00 2.02e+00 1.02e+00
factor(title_year)1952 1.10e+00 1.02e+00 1.10e+00 1.02e+00
factor(title_year)1953 1.66e+00 8.82e-01 1.66e+00 8.82e-01
factor(title_year)1954 2.28e+00 1.02e+00 2.28e+00 1.02e+00
factor(title_year)1959 1.97e+00 1.02e+00 1.97e+00 1.02e+00
factor(title_year)1960 1.95e+00 1.02e+00 1.95e+00 1.02e+00
factor(title_year)1963 2.38e+00 1.02e+00 2.38e+00 1.02e+00
factor(title_year)1964 1.88e+00 8.83e-01 1.88e+00 8.84e-01
factor(title_year)1965 1.30e+00 8.09e-01 1.30e+00 8.09e-01
factor(title_year)1969 1.82e+00 1.02e+00 1.82e+00 1.02e+00
factor(title_year)1970 1.21e+00 8.35e-01 1.21e+00 8.35e-01
factor(title_year)1971 1.32e+00 8.80e-01 1.33e+00 8.81e-01
factor(title_year)1972 1.61e+00 8.87e-01 1.61e+00 8.87e-01
factor(title_year)1973 2.10e+00 8.06e-01 2.10e+00 8.07e-01
factor(title_year)1974 2.49e+00 7.92e-01 2.49e+00 7.92e-01
factor(title_year)1975 8.07e-01 8.88e-01 8.07e-01 8.88e-01
factor(title_year)1976 1.72e+00 1.02e+00 1.72e+00 1.02e+00
factor(title_year)1977 1.79e+00 8.33e-01 1.80e+00 8.33e-01
factor(title_year)1978 2.23e+00 7.81e-01 2.23e+00 7.81e-01
factor(title_year)1979 1.44e+00 8.40e-01 1.44e+00 8.40e-01
factor(title_year)1980 1.54e+00 7.62e-01 1.55e+00 7.63e-01
factor(title_year)1981 1.33e+00 7.71e-01 1.32e+00 7.71e-01
factor(title_year)1982 1.38e+00 7.49e-01 1.38e+00 7.49e-01
factor(title_year)1983 1.76e+00 7.60e-01 1.85e+00 7.65e-01
factor(title_year)1984 1.57e+00 7.42e-01 1.57e+00 7.42e-01
factor(title_year)1985 1.59e+00 7.60e-01 1.59e+00 7.60e-01
factor(title_year)1986 1.53e+00 7.44e-01 1.53e+00 7.44e-01
factor(title_year)1987 1.19e+00 7.39e-01 1.19e+00 7.39e-01
factor(title_year)1988 1.67e+00 7.36e-01 1.67e+00 7.36e-01
factor(title_year)1989 1.48e+00 7.40e-01 1.48e+00 7.41e-01
factor(title_year)1990 1.43e+00 7.41e-01 1.43e+00 7.41e-01
factor(title_year)1991 1.49e+00 7.35e-01 1.49e+00 7.36e-01
factor(title_year)1992 1.85e+00 7.34e-01 1.85e+00 7.34e-01
factor(title_year)1993 1.64e+00 7.34e-01 1.64e+00 7.34e-01
factor(title_year)1994 1.65e+00 7.30e-01 1.65e+00 7.30e-01
factor(title_year)1995 1.49e+00 7.28e-01 1.49e+00 7.28e-01
factor(title_year)1996 1.54e+00 7.26e-01 1.54e+00 7.26e-01
factor(title_year)1997 1.40e+00 7.26e-01 1.40e+00 7.26e-01
factor(title_year)1998 1.62e+00 7.26e-01 1.62e+00 7.26e-01
factor(title_year)1999 1.38e+00 7.25e-01 1.38e+00 7.25e-01
factor(title_year)2000 1.22e+00 7.25e-01 1.22e+00 7.25e-01
factor(title_year)2001 1.33e+00 7.24e-01 1.33e+00 7.25e-01
factor(title_year)2002 1.26e+00 7.24e-01 1.26e+00 7.25e-01
factor(title_year)2003 1.10e+00 7.25e-01 1.09e+00 7.26e-01
factor(title_year)2004 1.15e+00 7.25e-01 1.15e+00 7.25e-01
factor(title_year)2005 1.12e+00 7.25e-01 1.12e+00 7.25e-01
factor(title_year)2006 1.06e+00 7.25e-01 1.06e+00 7.25e-01
factor(title_year)2007 8.37e-01 7.26e-01 8.37e-01 7.26e-01
factor(title_year)2008 6.88e-01 7.25e-01 6.89e-01 7.26e-01
factor(title_year)2009 6.62e-01 7.26e-01 6.63e-01 7.26e-01
factor(title_year)2010 5.45e-01 7.26e-01 5.46e-01 7.26e-01
factor(title_year)2011 4.01e-01 7.27e-01 4.02e-01 7.27e-01
factor(title_year)2012 5.82e-01 7.26e-01 5.81e-01 7.27e-01
factor(title_year)2013 4.95e-01 7.27e-01 4.97e-01 7.27e-01
factor(title_year)2014 5.98e-01 7.27e-01 5.98e-01 7.27e-01
factor(title_year)2015 7.10e-01 7.28e-01 7.11e-01 7.28e-01
factor(title_year)2016 1.27e+00 7.34e-01 1.27e+00 7.34e-01
factor(genres)Adventure 3.88e-01 5.88e-02 3.86e-01 5.89e-02
factor(genres)Animation 8.67e-01 1.37e-01 8.65e-01 1.37e-01
factor(genres)Biography 6.38e-01 8.03e-02 6.38e-01 8.04e-02
factor(genres)Comedy 1.34e-01 4.66e-02 1.33e-01 4.66e-02
factor(genres)Crime 3.98e-01 6.90e-02 3.96e-01 6.90e-02
factor(genres)Documentary 1.24e+00 1.63e-01 1.24e+00 1.63e-01
factor(genres)Drama 5.46e-01 5.25e-02 5.45e-01 5.27e-02
factor(genres)Family 6.43e-01 8.06e-01 6.36e-01 8.06e-01
factor(genres)Fantasy -2.32e-01 1.49e-01 -2.37e-01 1.49e-01
factor(genres)Horror -4.07e-01 8.61e-02 -4.08e-01 8.61e-02
factor(genres)Musical
factor(genres)Mystery 1.81e-01 1.97e-01 1.81e-01 1.97e-01
factor(genres)Romance 8.65e-01 5.11e-01 8.63e-01 5.11e-01
factor(genres)Sci-Fi 2.51e-01 2.99e-01 2.34e-01 3.00e-01
factor(genres)Thriller -5.58e-02 7.25e-01 -5.91e-02 7.25e-01
factor(genres)Western 1.04e+00 7.58e-01 1.03e+00 7.58e-01
duration
num_voted_users
num_user_for_reviews
gross
budget
duration:num_voted_users -1.75e-08 3.75e-09 -1.74e-08 3.75e-09
num_voted_users:num_user_for_reviews 8.77e-10 3.01e-10 8.85e-10 3.01e-10
gross:budget 1.21e-17 6.33e-18 1.19e-17 6.33e-18
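Removing the outliers did not change the coefficients much. Note that the residual degrees of freedom only went from 2307 (lm.fit4) to 2303 (lm.fit6), so only 4 rows were actually removed rather than 10.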
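One thing worth checking here is how the rows were dropped. The subsetting step `movie_train[-c(...), ]` removes rows by position, while the labels printed by the car plots are row names carried over from the full data set. The toy example below (made-up data, hypothetical names `full`, `train`, `flagged`) shows how the two can disagree; it is a sketch of the general behaviour, not a rerun of the analysis.

```r
# Toy stand-in for the full data set and a training subset that keeps
# its original row names ("2", "4", "5", "7", "9", "10")
full  <- data.frame(score = c(7.1, 6.5, 8.0, 5.2, 6.9, 7.4, 3.1, 6.0, 7.7, 5.5))
train <- full[c(2, 4, 5, 7, 9, 10), , drop = FALSE]

# Labels as a diagnostic plot would print them (row names, not positions)
flagged <- c(7, 9)

# Dropping by position: train has only 6 rows, so positions 7 and 9
# do not exist and nothing is removed
by_position <- train[-flagged, , drop = FALSE]

# Dropping by row name: removes exactly the flagged observations
by_name <- train[!(rownames(train) %in% flagged), , drop = FALSE]

print(nrow(by_position))
print(rownames(by_name))
```

Matching on `rownames()` keeps the removal tied to the labels shown in the diagnostics, whatever the current row order is.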
Diagnostics for lm.fit6:
library(car)
residualPlots(lm.fit6)
Looks good, except that the residuals vs. fitted values plot shows some curvature. Also, in the box plot for genre, the spread of the boxes is not always the same, which might be a problem.
plot(lm.fit6)
not plotting observations with leverage one:
123, 442, 574, 649, 756, 838, 927, 962, 1305, 1684, 1693, 1774, 2544, 2545
NaNs produced
Now, let’s check the model assumptions for both lm.fit4 and lm.fit6:
# normality
shapiro.test(lm.fit4$residuals)
Shapiro-Wilk normality test
data: lm.fit4$residuals
W = 0.9487, p-value < 2.2e-16
shapiro.test(lm.fit6$residuals)
Shapiro-Wilk normality test
data: lm.fit6$residuals
W = 0.94862, p-value < 2.2e-16
Both models failed the normality assumption. I think this is due to the many outliers in the data set.
# equal variance: H0: the error variance is constant
library(car)
package ‘car’ was built under R version 3.3.2
ncvTest(lm.fit4)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 214.9373 Df = 1 p = 1.150151e-48
ncvTest(lm.fit6)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 214.9373 Df = 1 p = 1.150151e-48
Both models fail the equal variance assumption: the p-values are far below 0.05, so the null hypothesis of constant variance is rejected.
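To make the direction of this test concrete, here is a small base-R sketch of a score test for non-constant variance (the Breusch-Pagan / Cook-Weisberg idea, which, to our understanding, is what ncvTest is based on). The data are simulated and all names are ours; it only illustrates that a small p-value means the constant-variance null is rejected.

```r
set.seed(1)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)   # error sd grows with x
fit <- lm(y ~ x)

# Scaled squared residuals: mean 1 under any model
r <- residuals(fit)
u <- r^2 / (sum(r^2) / n)

# Auxiliary regression of scaled squared residuals on the fitted values
aux    <- lm(u ~ fitted(fit))
reg_ss <- sum((fitted(aux) - mean(u))^2)    # regression sum of squares

# Score statistic: RegSS / 2, chi-square with 1 df under H0 (constant variance)
chisq   <- reg_ss / 2
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)

print(c(Chisquare = chisq, p = p_value))
```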
This is just to explore some more interesting facts. Plots of the data with fitted regression lines:
library(ggplot2)
package ‘ggplot2’ was built under R version 3.3.2
ggplot(data=movie_train,aes(x=duration,y=imdb_score,colour=factor(genres)))+stat_smooth(method=lm,fullrange = FALSE)+geom_point()
ggplot(data=movie_train,aes(x=num_voted_users,y=imdb_score,colour=factor(genres)))+stat_smooth(method=lm,fullrange = FALSE)+geom_point()
ggplot(data=movie_train,aes(x=facenumber_in_poster,y=imdb_score,colour=factor(genres)))+stat_smooth(method=lm,fullrange = FALSE)+geom_point()
ggplot(data=movie_train,aes(x=gross,y=imdb_score,colour=factor(genres)))+stat_smooth(method=lm,fullrange = FALSE)+geom_point()
ggplot(data=movie_train,aes(x=budget,y=imdb_score,colour=factor(genres)))+stat_smooth(method=lm,fullrange = FALSE)+geom_point()
Rewriting model lm.fit6 in another notation. Note: if the model is written as lm(train$score ~ train$x1 + train$x2 + ...), predict() will return as many values as there are rows in the training set, even when new data is supplied.
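The note above can be reproduced on a toy example. The data frames `train` and `test` below are simulated and their column names are ours; the sketch only shows the difference between the `$` notation and the `data=` argument when calling predict().

```r
set.seed(42)
train <- data.frame(x = rnorm(20))
train$y <- 1 + 2 * train$x + rnorm(20, sd = 0.1)
test <- data.frame(x = rnorm(5))

# Model terms are literally named "train$x", so predict() cannot find them
# in newdata and falls back to the training data (with a warning)
fit_dollar <- lm(train$y ~ train$x)
n_dollar <- length(suppressWarnings(predict(fit_dollar, newdata = test)))

# Model terms are column names, so predict() looks them up in newdata
fit_data <- lm(y ~ x, data = train)
n_data <- length(predict(fit_data, newdata = test))

print(c(dollar_notation = n_dollar, data_argument = n_data))
```

With the `data=` form the number of predictions matches the number of rows in the new data, which is what we need for scoring a test set.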
# lm.fit7 = lm.fit6, written with the data= argument
lm.fit7<-lm(imdb_score~poly(num_voted_users,2)+poly(num_critic_for_reviews,2)+poly(num_user_for_reviews,2)+poly(duration,2)+poly(gross,2)+poly(movie_facebook_likes,2)+poly(budget,2)+factor(title_year)+factor(genres)+duration*num_voted_users+num_voted_users*num_user_for_reviews+gross*budget,data=movie_train)
summary(lm.fit7)
Call:
lm(formula = imdb_score ~ poly(num_voted_users, 2) + poly(num_critic_for_reviews,
2) + poly(num_user_for_reviews, 2) + poly(duration, 2) +
poly(gross, 2) + poly(movie_facebook_likes, 2) + poly(budget,
2) + factor(title_year) + factor(genres) + duration * num_voted_users +
num_voted_users * num_user_for_reviews + gross * budget,
data = movie_train)
Residuals:
Min 1Q Median 3Q Max
-5.0214 -0.3520 0.0511 0.4360 2.1976
Coefficients: (6 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.185e+00 7.242e-01 7.159 1.09e-12 ***
poly(num_voted_users, 2)1 4.076e+01 5.069e+00 8.041 1.41e-15 ***
poly(num_voted_users, 2)2 -1.572e+01 2.208e+00 -7.118 1.46e-12 ***
poly(num_critic_for_reviews, 2)1 2.336e+01 1.763e+00 13.251 < 2e-16 ***
poly(num_critic_for_reviews, 2)2 -1.073e+01 9.958e-01 -10.776 < 2e-16 ***
poly(num_user_for_reviews, 2)1 -2.385e+01 2.296e+00 -10.387 < 2e-16 ***
poly(num_user_for_reviews, 2)2 7.081e+00 1.595e+00 4.439 9.44e-06 ***
poly(duration, 2)1 1.285e+01 1.108e+00 11.603 < 2e-16 ***
poly(duration, 2)2 -2.766e+00 7.887e-01 -3.508 0.000461 ***
poly(gross, 2)1 -5.185e+00 2.324e+00 -2.231 0.025753 *
poly(gross, 2)2 -1.494e+00 1.236e+00 -1.209 0.226860
poly(movie_facebook_likes, 2)1 1.057e+00 1.430e+00 0.739 0.459978
poly(movie_facebook_likes, 2)2 1.905e+00 8.764e-01 2.174 0.029794 *
poly(budget, 2)1 -1.414e+01 1.955e+00 -7.233 6.38e-13 ***
poly(budget, 2)2 6.319e+00 1.119e+00 5.649 1.81e-08 ***
factor(title_year)1929 1.845e+00 1.018e+00 1.813 0.069910 .
factor(title_year)1933 3.074e+00 1.018e+00 3.019 0.002560 **
factor(title_year)1935 3.256e+00 1.018e+00 3.197 0.001409 **
factor(title_year)1936 2.914e+00 1.019e+00 2.858 0.004304 **
factor(title_year)1937 1.446e+00 1.030e+00 1.405 0.160246
factor(title_year)1940 1.490e+00 1.026e+00 1.451 0.146829
factor(title_year)1946 1.503e+00 1.018e+00 1.476 0.139955
factor(title_year)1950 2.023e+00 1.019e+00 1.985 0.047275 *
factor(title_year)1952 1.099e+00 1.018e+00 1.079 0.280608
factor(title_year)1953 1.659e+00 8.822e-01 1.880 0.060176 .
factor(title_year)1954 2.277e+00 1.016e+00 2.240 0.025169 *
factor(title_year)1959 1.967e+00 1.020e+00 1.930 0.053762 .
factor(title_year)1960 1.950e+00 1.023e+00 1.906 0.056789 .
factor(title_year)1963 2.384e+00 1.022e+00 2.333 0.019740 *
factor(title_year)1964 1.885e+00 8.837e-01 2.133 0.033040 *
factor(title_year)1965 1.304e+00 8.093e-01 1.611 0.107290
factor(title_year)1969 1.820e+00 1.021e+00 1.782 0.074833 .
factor(title_year)1970 1.213e+00 8.351e-01 1.453 0.146320
factor(title_year)1971 1.326e+00 8.806e-01 1.506 0.132318
factor(title_year)1972 1.607e+00 8.869e-01 1.812 0.070169 .
factor(title_year)1973 2.099e+00 8.067e-01 2.602 0.009320 **
factor(title_year)1974 2.489e+00 7.925e-01 3.141 0.001704 **
factor(title_year)1975 8.067e-01 8.879e-01 0.909 0.363665
factor(title_year)1976 1.725e+00 1.018e+00 1.693 0.090507 .
factor(title_year)1977 1.795e+00 8.330e-01 2.155 0.031254 *
factor(title_year)1978 2.226e+00 7.808e-01 2.851 0.004394 **
factor(title_year)1979 1.437e+00 8.398e-01 1.711 0.087276 .
factor(title_year)1980 1.545e+00 7.627e-01 2.026 0.042851 *
factor(title_year)1981 1.325e+00 7.709e-01 1.719 0.085783 .
factor(title_year)1982 1.383e+00 7.489e-01 1.846 0.064992 .
factor(title_year)1983 1.849e+00 7.650e-01 2.417 0.015738 *
factor(title_year)1984 1.569e+00 7.420e-01 2.115 0.034577 *
factor(title_year)1985 1.587e+00 7.597e-01 2.089 0.036832 *
factor(title_year)1986 1.529e+00 7.442e-01 2.055 0.040002 *
factor(title_year)1987 1.193e+00 7.394e-01 1.614 0.106751
factor(title_year)1988 1.669e+00 7.361e-01 2.267 0.023476 *
factor(title_year)1989 1.478e+00 7.407e-01 1.995 0.046165 *
factor(title_year)1990 1.433e+00 7.408e-01 1.934 0.053187 .
factor(title_year)1991 1.493e+00 7.355e-01 2.030 0.042477 *
factor(title_year)1992 1.854e+00 7.341e-01 2.526 0.011601 *
factor(title_year)1993 1.643e+00 7.345e-01 2.237 0.025363 *
factor(title_year)1994 1.647e+00 7.304e-01 2.255 0.024201 *
factor(title_year)1995 1.486e+00 7.283e-01 2.040 0.041426 *
factor(title_year)1996 1.543e+00 7.260e-01 2.125 0.033708 *
factor(title_year)1997 1.398e+00 7.259e-01 1.926 0.054203 .
factor(title_year)1998 1.619e+00 7.263e-01 2.229 0.025901 *
factor(title_year)1999 1.378e+00 7.252e-01 1.900 0.057586 .
factor(title_year)2000 1.223e+00 7.249e-01 1.687 0.091709 .
factor(title_year)2001 1.326e+00 7.246e-01 1.830 0.067311 .
factor(title_year)2002 1.261e+00 7.246e-01 1.740 0.081943 .
factor(title_year)2003 1.087e+00 7.257e-01 1.498 0.134348
factor(title_year)2004 1.148e+00 7.254e-01 1.582 0.113765
factor(title_year)2005 1.124e+00 7.254e-01 1.550 0.121255
factor(title_year)2006 1.063e+00 7.255e-01 1.466 0.142892
factor(title_year)2007 8.374e-01 7.258e-01 1.154 0.248728
factor(title_year)2008 6.891e-01 7.256e-01 0.950 0.342352
factor(title_year)2009 6.628e-01 7.259e-01 0.913 0.361358
factor(title_year)2010 5.461e-01 7.263e-01 0.752 0.452193
factor(title_year)2011 4.019e-01 7.269e-01 0.553 0.580404
factor(title_year)2012 5.815e-01 7.266e-01 0.800 0.423649
factor(title_year)2013 4.972e-01 7.275e-01 0.684 0.494356
factor(title_year)2014 5.984e-01 7.274e-01 0.823 0.410753
factor(title_year)2015 7.110e-01 7.281e-01 0.977 0.328890
factor(title_year)2016 1.271e+00 7.339e-01 1.732 0.083351 .
factor(genres)Adventure 3.861e-01 5.886e-02 6.559 6.66e-11 ***
factor(genres)Animation 8.650e-01 1.372e-01 6.304 3.47e-10 ***
factor(genres)Biography 6.384e-01 8.037e-02 7.944 3.04e-15 ***
factor(genres)Comedy 1.331e-01 4.662e-02 2.854 0.004355 **
factor(genres)Crime 3.963e-01 6.902e-02 5.742 1.06e-08 ***
factor(genres)Documentary 1.241e+00 1.633e-01 7.603 4.18e-14 ***
factor(genres)Drama 5.449e-01 5.269e-02 10.342 < 2e-16 ***
factor(genres)Family 6.358e-01 8.059e-01 0.789 0.430177
factor(genres)Fantasy -2.374e-01 1.487e-01 -1.596 0.110638
factor(genres)Horror -4.083e-01 8.614e-02 -4.740 2.27e-06 ***
factor(genres)Musical NA NA NA NA
factor(genres)Mystery 1.807e-01 1.973e-01 0.916 0.359880
factor(genres)Romance 8.626e-01 5.115e-01 1.687 0.091832 .
factor(genres)Sci-Fi 2.341e-01 2.998e-01 0.781 0.434949
factor(genres)Thriller -5.910e-02 7.254e-01 -0.081 0.935080
factor(genres)Western 1.034e+00 7.578e-01 1.365 0.172397
duration NA NA NA NA
num_voted_users NA NA NA NA
num_user_for_reviews NA NA NA NA
gross NA NA NA NA
budget NA NA NA NA
duration:num_voted_users -1.740e-08 3.749e-09 -4.642 3.65e-06 ***
num_voted_users:num_user_for_reviews 8.855e-10 3.010e-10 2.942 0.003291 **
gross:budget 1.191e-17 6.333e-18 1.880 0.060231 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7178 on 2303 degrees of freedom
Multiple R-squared: 0.5549, Adjusted R-squared: 0.5363
F-statistic: 29.9 on 96 and 2303 DF, p-value: < 2.2e-16
pr<-predict.lm(lm.fit7,newdata = data.frame(movie_test),interval = 'confidence')
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
factor factor(title_year) has new levels 1939, 1947, 1948, 1961
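We can't make predictions with this model: the test data contains years (1939, 1947, 1948, 1961) that never appear in the training data, so factor(title_year) has no coefficient for them.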
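One way around the new-levels error is sketched below. It is only a sketch on synthetic data: the data frames `train` and `test` and their values are made up for illustration and stand in for movie_train and movie_test. It shows two options: predict only for test rows whose year was seen in training, or treat the year as a numeric predictor so that any year can be scored.

```r
# Hypothetical stand-in data (not the Kaggle file)
set.seed(1)
train <- data.frame(
  title_year = rep(c(1990, 2000, 2010), each = 10),
  duration   = rnorm(30, mean = 110, sd = 15)
)
train$imdb_score <- 6 + 0.01 * train$duration + rnorm(30, sd = 0.5)

test <- data.frame(
  title_year = c(1990, 2000, 1939),  # 1939 never appears in train
  duration   = c(100, 120, 95)
)

fit <- lm(imdb_score ~ duration + factor(title_year), data = train)

# Option 1: keep only the test rows whose year was seen during training
seen <- test$title_year %in% train$title_year
pred_seen <- predict(fit, newdata = test[seen, ], interval = "confidence")

# Option 2: use year as a numeric predictor, so unseen years can be scored
fit_num  <- lm(imdb_score ~ duration + title_year, data = train)
pred_all <- predict(fit_num, newdata = test, interval = "confidence")
```

Option 1 keeps the model unchanged but drops some test rows; option 2 scores every row but replaces the per-year effects with a single linear trend, which is a different model and would need to be refit and re-evaluated.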
Conclusion: lm.fit7 would be our final model.
Getting our hands dirty exploring other models:
# predictors: votes, genre, year, critic reviews, user reviews, budget, duration, movie Facebook likes, votes*duration
lm.fit8<-lm(imdb_score~num_voted_users+num_critic_for_reviews+num_user_for_reviews+budget+duration+movie_facebook_likes+factor(genres)+factor(title_year)+num_voted_users*duration,data=movie_train)
summary(lm.fit8)
Call:
lm(formula = imdb_score ~ num_voted_users + num_critic_for_reviews +
num_user_for_reviews + budget + duration + movie_facebook_likes +
factor(genres) + factor(title_year) + num_voted_users * duration,
data = movie_train)
Residuals:
Min 1Q Median 3Q Max
-4.7473 -0.3563 0.0669 0.4865 2.2245
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.065e+00 7.677e-01 3.992 6.76e-05 ***
num_voted_users 7.090e-06 5.084e-07 13.946 < 2e-16 ***
num_critic_for_reviews 3.950e-03 2.806e-04 14.075 < 2e-16 ***
num_user_for_reviews -6.103e-04 7.385e-05 -8.265 2.33e-16 ***
budget -4.568e-09 4.955e-10 -9.219 < 2e-16 ***
duration 1.166e-02 9.985e-04 11.676 < 2e-16 ***
movie_facebook_likes -5.532e-06 1.295e-06 -4.272 2.01e-05 ***
factor(genres)Adventure 3.723e-01 6.105e-02 6.098 1.25e-09 ***
factor(genres)Animation 8.020e-01 1.425e-01 5.629 2.04e-08 ***
factor(genres)Biography 7.171e-01 8.410e-02 8.526 < 2e-16 ***
factor(genres)Comedy 1.883e-01 4.815e-02 3.911 9.46e-05 ***
factor(genres)Crime 4.501e-01 7.222e-02 6.232 5.45e-10 ***
factor(genres)Documentary 1.235e+00 1.708e-01 7.230 6.55e-13 ***
factor(genres)Drama 5.941e-01 5.477e-02 10.848 < 2e-16 ***
factor(genres)Family 5.526e-01 7.881e-01 0.701 0.48328
factor(genres)Fantasy -1.752e-01 1.561e-01 -1.123 0.26163
factor(genres)Horror -3.722e-01 8.827e-02 -4.216 2.58e-05 ***
factor(genres)Musical 1.954e+00 1.074e+00 1.819 0.06898 .
factor(genres)Mystery 1.887e-01 2.077e-01 0.909 0.36371
factor(genres)Romance 7.712e-01 5.393e-01 1.430 0.15290
factor(genres)Sci-Fi 2.062e-01 3.161e-01 0.652 0.51418
factor(genres)Thriller -2.759e-01 7.640e-01 -0.361 0.71807
factor(genres)Western 1.064e+00 7.995e-01 1.331 0.18333
factor(title_year)1929 NA NA NA NA
factor(title_year)1933 3.180e+00 1.074e+00 2.961 0.00310 **
factor(title_year)1935 3.346e+00 1.074e+00 3.115 0.00186 **
factor(title_year)1936 3.323e+00 1.074e+00 3.093 0.00201 **
factor(title_year)1937 1.786e+00 1.083e+00 1.649 0.09921 .
factor(title_year)1940 1.873e+00 1.082e+00 1.730 0.08376 .
factor(title_year)1946 1.508e+00 1.074e+00 1.404 0.16042
factor(title_year)1950 1.950e+00 1.076e+00 1.812 0.07005 .
factor(title_year)1952 1.155e+00 1.074e+00 1.075 0.28250
factor(title_year)1953 1.772e+00 9.306e-01 1.904 0.05698 .
factor(title_year)1954 2.657e+00 1.072e+00 2.479 0.01324 *
factor(title_year)1959 2.552e+00 1.075e+00 2.375 0.01764 *
factor(title_year)1960 2.402e+00 1.078e+00 2.228 0.02596 *
factor(title_year)1963 2.202e+00 1.077e+00 2.044 0.04110 *
factor(title_year)1964 2.099e+00 9.316e-01 2.253 0.02434 *
factor(title_year)1965 1.307e+00 8.530e-01 1.532 0.12555
factor(title_year)1969 2.121e+00 1.076e+00 1.971 0.04889 *
factor(title_year)1970 1.260e+00 8.806e-01 1.430 0.15275
factor(title_year)1971 1.383e+00 9.289e-01 1.489 0.13669
factor(title_year)1972 1.586e+00 9.333e-01 1.699 0.08940 .
factor(title_year)1973 2.360e+00 8.495e-01 2.779 0.00550 **
factor(title_year)1974 2.640e+00 8.341e-01 3.165 0.00157 **
factor(title_year)1975 1.025e+00 9.337e-01 1.098 0.27223
factor(title_year)1976 1.943e+00 1.074e+00 1.808 0.07066 .
factor(title_year)1977 1.951e+00 8.783e-01 2.221 0.02645 *
factor(title_year)1978 2.229e+00 8.229e-01 2.709 0.00680 **
factor(title_year)1979 1.578e+00 8.815e-01 1.790 0.07361 .
factor(title_year)1980 1.409e+00 8.026e-01 1.755 0.07939 .
factor(title_year)1981 1.356e+00 8.131e-01 1.668 0.09541 .
factor(title_year)1982 1.518e+00 7.897e-01 1.922 0.05469 .
factor(title_year)1983 1.955e+00 8.068e-01 2.423 0.01546 *
factor(title_year)1984 1.734e+00 7.824e-01 2.216 0.02676 *
factor(title_year)1985 1.642e+00 8.012e-01 2.049 0.04058 *
factor(title_year)1986 1.605e+00 7.849e-01 2.045 0.04101 *
factor(title_year)1987 1.254e+00 7.797e-01 1.608 0.10787
factor(title_year)1988 1.801e+00 7.763e-01 2.320 0.02041 *
factor(title_year)1989 1.565e+00 7.812e-01 2.004 0.04521 *
factor(title_year)1990 1.492e+00 7.811e-01 1.910 0.05629 .
factor(title_year)1991 1.484e+00 7.758e-01 1.913 0.05592 .
factor(title_year)1992 1.861e+00 7.742e-01 2.404 0.01628 *
factor(title_year)1993 1.582e+00 7.745e-01 2.043 0.04120 *
factor(title_year)1994 1.517e+00 7.702e-01 1.969 0.04902 *
factor(title_year)1995 1.483e+00 7.681e-01 1.931 0.05366 .
factor(title_year)1996 1.507e+00 7.656e-01 1.969 0.04908 *
factor(title_year)1997 1.431e+00 7.653e-01 1.870 0.06158 .
factor(title_year)1998 1.575e+00 7.655e-01 2.057 0.03977 *
factor(title_year)1999 1.355e+00 7.644e-01 1.772 0.07646 .
factor(title_year)2000 1.199e+00 7.639e-01 1.569 0.11678
factor(title_year)2001 1.285e+00 7.636e-01 1.682 0.09264 .
factor(title_year)2002 1.212e+00 7.635e-01 1.588 0.11249
factor(title_year)2003 1.085e+00 7.647e-01 1.418 0.15627
factor(title_year)2004 1.178e+00 7.642e-01 1.541 0.12344
factor(title_year)2005 1.151e+00 7.644e-01 1.505 0.13236
factor(title_year)2006 1.150e+00 7.645e-01 1.504 0.13270
factor(title_year)2007 1.061e+00 7.648e-01 1.388 0.16530
factor(title_year)2008 8.517e-01 7.644e-01 1.114 0.26532
factor(title_year)2009 8.638e-01 7.647e-01 1.129 0.25881
factor(title_year)2010 7.557e-01 7.650e-01 0.988 0.32332
factor(title_year)2011 6.284e-01 7.655e-01 0.821 0.41178
factor(title_year)2012 7.959e-01 7.655e-01 1.040 0.29858
factor(title_year)2013 7.983e-01 7.658e-01 1.042 0.29735
factor(title_year)2014 8.690e-01 7.657e-01 1.135 0.25654
factor(title_year)2015 9.646e-01 7.663e-01 1.259 0.20823
factor(title_year)2016 1.541e+00 7.723e-01 1.995 0.04614 *
num_voted_users:duration -2.754e-08 3.441e-09 -8.002 1.91e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7575 on 2313 degrees of freedom
Multiple R-squared: 0.5021, Adjusted R-squared: 0.4836
F-statistic: 27.12 on 86 and 2313 DF, p-value: < 2.2e-16
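Not good: without the polynomial terms, the adjusted R-squared falls to 0.4836 (from 0.5363 for the previous model) and the residual standard error rises to 0.7575 (from 0.7178).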
lm.fit9<-lm(imdb_score~poly(num_voted_users,2)+poly(num_critic_for_reviews,2)+poly(num_user_for_reviews,2)+poly(budget,2)+poly(duration,2)+poly(movie_facebook_likes,2)+factor(genres)+factor(title_year)+num_voted_users*duration,data=movie_train)
summary(lm.fit9)
Call:
lm(formula = imdb_score ~ poly(num_voted_users, 2) + poly(num_critic_for_reviews,
2) + poly(num_user_for_reviews, 2) + poly(budget, 2) + poly(duration,
2) + poly(movie_facebook_likes, 2) + factor(genres) + factor(title_year) +
num_voted_users * duration, data = movie_train)
Residuals:
Min 1Q Median 3Q Max
-5.0248 -0.3466 0.0582 0.4337 2.2263
Coefficients: (3 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.304e+00 7.251e-01 7.314 3.55e-13 ***
poly(num_voted_users, 2)1 4.725e+01 4.144e+00 11.404 < 2e-16 ***
poly(num_voted_users, 2)2 -1.007e+01 1.225e+00 -8.216 3.47e-16 ***
poly(num_critic_for_reviews, 2)1 2.409e+01 1.755e+00 13.726 < 2e-16 ***
poly(num_critic_for_reviews, 2)2 -1.056e+01 9.860e-01 -10.710 < 2e-16 ***
poly(num_user_for_reviews, 2)1 -1.877e+01 1.534e+00 -12.235 < 2e-16 ***
poly(num_user_for_reviews, 2)2 1.011e+01 1.170e+00 8.640 < 2e-16 ***
poly(budget, 2)1 -1.182e+01 1.043e+00 -11.332 < 2e-16 ***
poly(budget, 2)2 7.774e+00 8.035e-01 9.675 < 2e-16 ***
poly(duration, 2)1 1.300e+01 1.108e+00 11.730 < 2e-16 ***
poly(duration, 2)2 -2.744e+00 7.878e-01 -3.483 0.000505 ***
poly(movie_facebook_likes, 2)1 8.405e-01 1.427e+00 0.589 0.555834
poly(movie_facebook_likes, 2)2 1.616e+00 8.664e-01 1.866 0.062222 .
factor(genres)Adventure 3.760e-01 5.858e-02 6.419 1.66e-10 ***
factor(genres)Animation 8.385e-01 1.368e-01 6.129 1.04e-09 ***
factor(genres)Biography 6.289e-01 8.042e-02 7.821 7.93e-15 ***
factor(genres)Comedy 1.254e-01 4.641e-02 2.701 0.006954 **
factor(genres)Crime 4.011e-01 6.895e-02 5.817 6.82e-09 ***
factor(genres)Documentary 1.224e+00 1.634e-01 7.488 9.86e-14 ***
factor(genres)Drama 5.409e-01 5.276e-02 10.253 < 2e-16 ***
factor(genres)Family 2.272e-02 7.500e-01 0.030 0.975832
factor(genres)Fantasy -2.303e-01 1.489e-01 -1.547 0.122045
factor(genres)Horror -4.287e-01 8.581e-02 -4.996 6.29e-07 ***
factor(genres)Musical 1.852e+00 1.020e+00 1.816 0.069516 .
factor(genres)Mystery 1.876e-01 1.976e-01 0.950 0.342407
factor(genres)Romance 8.657e-01 5.126e-01 1.689 0.091396 .
factor(genres)Sci-Fi 2.518e-01 3.004e-01 0.838 0.402072
factor(genres)Thriller -6.000e-02 7.271e-01 -0.083 0.934236
factor(genres)Western 1.066e+00 7.595e-01 1.404 0.160523
factor(title_year)1929 NA NA NA NA
factor(title_year)1933 3.090e+00 1.020e+00 3.028 0.002488 **
factor(title_year)1935 3.274e+00 1.021e+00 3.208 0.001357 **
factor(title_year)1936 2.974e+00 1.021e+00 2.912 0.003626 **
factor(title_year)1937 1.314e+00 1.029e+00 1.276 0.201927
factor(title_year)1940 1.481e+00 1.029e+00 1.440 0.150015
factor(title_year)1946 1.502e+00 1.020e+00 1.472 0.141245
factor(title_year)1950 2.037e+00 1.022e+00 1.994 0.046278 *
factor(title_year)1952 1.082e+00 1.021e+00 1.060 0.289238
factor(title_year)1953 1.659e+00 8.842e-01 1.876 0.060737 .
factor(title_year)1954 2.308e+00 1.019e+00 2.266 0.023568 *
factor(title_year)1959 2.014e+00 1.022e+00 1.971 0.048804 *
factor(title_year)1960 2.087e+00 1.025e+00 2.037 0.041760 *
factor(title_year)1963 2.356e+00 1.024e+00 2.301 0.021499 *
factor(title_year)1964 1.856e+00 8.854e-01 2.097 0.036136 *
factor(title_year)1965 1.264e+00 8.108e-01 1.559 0.119035
factor(title_year)1969 1.799e+00 1.023e+00 1.759 0.078747 .
factor(title_year)1970 1.222e+00 8.370e-01 1.460 0.144392
factor(title_year)1971 1.311e+00 8.825e-01 1.486 0.137411
factor(title_year)1972 1.578e+00 8.886e-01 1.776 0.075915 .
factor(title_year)1973 2.037e+00 8.075e-01 2.523 0.011700 *
factor(title_year)1974 2.408e+00 7.939e-01 3.033 0.002451 **
factor(title_year)1975 6.285e-01 8.879e-01 0.708 0.479055
factor(title_year)1976 1.729e+00 1.021e+00 1.694 0.090368 .
factor(title_year)1977 1.801e+00 8.349e-01 2.157 0.031091 *
factor(title_year)1978 2.199e+00 7.824e-01 2.811 0.004981 **
factor(title_year)1979 1.460e+00 8.415e-01 1.735 0.082909 .
factor(title_year)1980 1.480e+00 7.643e-01 1.936 0.052931 .
factor(title_year)1981 1.296e+00 7.726e-01 1.678 0.093528 .
factor(title_year)1982 1.374e+00 7.506e-01 1.831 0.067218 .
factor(title_year)1983 1.801e+00 7.667e-01 2.349 0.018914 *
factor(title_year)1984 1.558e+00 7.436e-01 2.095 0.036310 *
factor(title_year)1985 1.550e+00 7.614e-01 2.036 0.041834 *
factor(title_year)1986 1.539e+00 7.459e-01 2.064 0.039174 *
factor(title_year)1987 1.190e+00 7.411e-01 1.606 0.108399
factor(title_year)1988 1.671e+00 7.378e-01 2.265 0.023635 *
factor(title_year)1989 1.484e+00 7.424e-01 1.998 0.045787 *
factor(title_year)1990 1.418e+00 7.425e-01 1.910 0.056281 .
factor(title_year)1991 1.485e+00 7.372e-01 2.015 0.044038 *
factor(title_year)1992 1.861e+00 7.358e-01 2.529 0.011507 *
factor(title_year)1993 1.643e+00 7.362e-01 2.231 0.025755 *
factor(title_year)1994 1.627e+00 7.321e-01 2.222 0.026382 *
factor(title_year)1995 1.492e+00 7.300e-01 2.044 0.041113 *
factor(title_year)1996 1.561e+00 7.277e-01 2.145 0.032078 *
factor(title_year)1997 1.420e+00 7.275e-01 1.952 0.051061 .
factor(title_year)1998 1.642e+00 7.279e-01 2.256 0.024188 *
factor(title_year)1999 1.390e+00 7.269e-01 1.912 0.056041 .
factor(title_year)2000 1.244e+00 7.265e-01 1.713 0.086935 .
factor(title_year)2001 1.344e+00 7.263e-01 1.851 0.064313 .
factor(title_year)2002 1.269e+00 7.262e-01 1.747 0.080807 .
factor(title_year)2003 1.106e+00 7.274e-01 1.521 0.128438
factor(title_year)2004 1.170e+00 7.270e-01 1.609 0.107820
factor(title_year)2005 1.147e+00 7.271e-01 1.578 0.114809
factor(title_year)2006 1.084e+00 7.271e-01 1.491 0.136074
factor(title_year)2007 8.652e-01 7.274e-01 1.189 0.234421
factor(title_year)2008 7.107e-01 7.272e-01 0.977 0.328505
factor(title_year)2009 6.775e-01 7.276e-01 0.931 0.351854
factor(title_year)2010 5.554e-01 7.279e-01 0.763 0.445534
factor(title_year)2011 4.116e-01 7.285e-01 0.565 0.572118
factor(title_year)2012 5.819e-01 7.283e-01 0.799 0.424363
factor(title_year)2013 5.037e-01 7.292e-01 0.691 0.489730
factor(title_year)2014 6.077e-01 7.291e-01 0.834 0.404634
factor(title_year)2015 7.088e-01 7.298e-01 0.971 0.331499
factor(title_year)2016 1.241e+00 7.355e-01 1.687 0.091749 .
num_voted_users NA NA NA NA
duration NA NA NA NA
num_voted_users:duration -1.705e-08 3.721e-09 -4.582 4.86e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7195 on 2307 degrees of freedom
Multiple R-squared: 0.552, Adjusted R-squared: 0.5341
F-statistic: 30.9 on 92 and 2307 DF, p-value: < 2.2e-16
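The NA rows for num_voted_users and duration in the summary above are likely a side effect of the formula rather than a data problem: `a*b` expands to `a + b + a:b`, and the raw `a` and `b` columns are linear combinations of the intercept and the first poly() column. A minimal sketch on synthetic data (the data frame `d` and its columns are hypothetical) suggests that writing the interaction as `a:b` removes those NA rows without changing the fit:

```r
# Hypothetical data, loosely shaped like votes and duration
set.seed(2)
n <- 200
d <- data.frame(votes = rexp(n, 1 / 1e4), duration = rnorm(n, 110, 15))
d$score <- 6 + 1e-5 * d$votes + 0.01 * d$duration + rnorm(n, sd = 0.5)

# `*` adds the raw main effects again; lm() drops them as aliased (NA)
fit_star <- lm(score ~ poly(votes, 2) + poly(duration, 2) + votes * duration,
               data = d)

# `:` adds only the product term
fit_colon <- lm(score ~ poly(votes, 2) + poly(duration, 2) + votes:duration,
                data = d)

n_na_star  <- sum(is.na(coef(fit_star)))   # aliased coefficients with `*`
n_na_colon <- sum(is.na(coef(fit_colon)))  # aliased coefficients with `:`
same_fit   <- isTRUE(all.equal(fitted(fit_star), fitted(fit_colon)))
```

Both formulas span the same columns, so the fitted values agree; the `:` form just gives a cleaner coefficient table.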
Try adding some interaction terms:
# adding interactions: budget*num_critic_for_reviews and movie_facebook_likes*budget
lm.fit10<-lm(imdb_score~poly(num_voted_users,2)+poly(num_critic_for_reviews,2)+poly(num_user_for_reviews,2)+poly(budget,2)+poly(duration,2)+poly(movie_facebook_likes,2)+factor(genres)+factor(title_year)+num_voted_users*duration+budget*num_critic_for_reviews+movie_facebook_likes*budget,data=movie_train)
summary(lm.fit10)
Call:
lm(formula = imdb_score ~ poly(num_voted_users, 2) + poly(num_critic_for_reviews,
2) + poly(num_user_for_reviews, 2) + poly(budget, 2) + poly(duration,
2) + poly(movie_facebook_likes, 2) + factor(genres) + factor(title_year) +
num_voted_users * duration + budget * num_critic_for_reviews +
movie_facebook_likes * budget, data = movie_train)
Residuals:
Min 1Q Median 3Q Max
-4.8916 -0.3497 0.0642 0.4269 2.2368
Coefficients: (6 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.329e+00 7.254e-01 7.346 2.81e-13 ***
poly(num_voted_users, 2)1 4.833e+01 4.156e+00 11.630 < 2e-16 ***
poly(num_voted_users, 2)2 -1.016e+01 1.224e+00 -8.296 < 2e-16 ***
poly(num_critic_for_reviews, 2)1 2.511e+01 2.088e+00 12.028 < 2e-16 ***
poly(num_critic_for_reviews, 2)2 -1.042e+01 1.099e+00 -9.475 < 2e-16 ***
poly(num_user_for_reviews, 2)1 -1.854e+01 1.535e+00 -12.082 < 2e-16 ***
poly(num_user_for_reviews, 2)2 1.013e+01 1.169e+00 8.664 < 2e-16 ***
poly(budget, 2)1 -1.135e+01 2.466e+00 -4.602 4.42e-06 ***
poly(budget, 2)2 7.596e+00 1.017e+00 7.468 1.15e-13 ***
poly(duration, 2)1 1.312e+01 1.108e+00 11.845 < 2e-16 ***
poly(duration, 2)2 -2.799e+00 7.881e-01 -3.552 0.00039 ***
poly(movie_facebook_likes, 2)1 -2.974e+00 2.017e+00 -1.475 0.14048
poly(movie_facebook_likes, 2)2 4.745e-01 9.601e-01 0.494 0.62119
factor(genres)Adventure 3.752e-01 5.866e-02 6.396 1.93e-10 ***
factor(genres)Animation 8.457e-01 1.367e-01 6.187 7.23e-10 ***
factor(genres)Biography 6.326e-01 8.041e-02 7.867 5.54e-15 ***
factor(genres)Comedy 1.252e-01 4.635e-02 2.700 0.00698 **
factor(genres)Crime 4.035e-01 6.902e-02 5.846 5.75e-09 ***
factor(genres)Documentary 1.231e+00 1.632e-01 7.540 6.70e-14 ***
factor(genres)Drama 5.422e-01 5.278e-02 10.273 < 2e-16 ***
factor(genres)Family 6.073e-02 7.491e-01 0.081 0.93540
factor(genres)Fantasy -2.256e-01 1.488e-01 -1.516 0.12976
factor(genres)Horror -4.333e-01 8.595e-02 -5.040 5.00e-07 ***
factor(genres)Musical 1.852e+00 1.019e+00 1.818 0.06925 .
factor(genres)Mystery 1.935e-01 1.974e-01 0.980 0.32703
factor(genres)Romance 8.678e-01 5.119e-01 1.695 0.09019 .
factor(genres)Sci-Fi 2.499e-01 3.000e-01 0.833 0.40510
factor(genres)Thriller -5.373e-02 7.261e-01 -0.074 0.94102
factor(genres)Western 1.071e+00 7.585e-01 1.412 0.15820
factor(title_year)1929 NA NA NA NA
factor(title_year)1933 3.088e+00 1.019e+00 3.030 0.00247 **
factor(title_year)1935 3.274e+00 1.019e+00 3.212 0.00134 **
factor(title_year)1936 2.951e+00 1.020e+00 2.893 0.00385 **
factor(title_year)1937 1.281e+00 1.028e+00 1.246 0.21276
factor(title_year)1940 1.458e+00 1.027e+00 1.419 0.15600
factor(title_year)1946 1.492e+00 1.019e+00 1.464 0.14322
factor(title_year)1950 2.033e+00 1.020e+00 1.992 0.04646 *
factor(title_year)1952 1.072e+00 1.019e+00 1.051 0.29319
factor(title_year)1953 1.655e+00 8.830e-01 1.874 0.06105 .
factor(title_year)1954 2.281e+00 1.017e+00 2.242 0.02505 *
factor(title_year)1959 1.996e+00 1.020e+00 1.956 0.05062 .
factor(title_year)1960 2.050e+00 1.024e+00 2.002 0.04535 *
factor(title_year)1963 2.338e+00 1.023e+00 2.286 0.02233 *
factor(title_year)1964 1.833e+00 8.843e-01 2.072 0.03833 *
factor(title_year)1965 1.254e+00 8.098e-01 1.548 0.12174
factor(title_year)1969 1.768e+00 1.022e+00 1.731 0.08365 .
factor(title_year)1970 1.208e+00 8.359e-01 1.445 0.14856
factor(title_year)1971 1.302e+00 8.814e-01 1.477 0.13974
factor(title_year)1972 1.591e+00 8.875e-01 1.793 0.07309 .
factor(title_year)1973 2.014e+00 8.065e-01 2.497 0.01260 *
factor(title_year)1974 2.390e+00 7.930e-01 3.014 0.00260 **
factor(title_year)1975 6.114e-01 8.871e-01 0.689 0.49072
factor(title_year)1976 1.713e+00 1.019e+00 1.680 0.09312 .
factor(title_year)1977 1.773e+00 8.338e-01 2.127 0.03355 *
factor(title_year)1978 2.172e+00 7.816e-01 2.779 0.00551 **
factor(title_year)1979 1.452e+00 8.404e-01 1.728 0.08415 .
factor(title_year)1980 1.465e+00 7.633e-01 1.919 0.05509 .
factor(title_year)1981 1.281e+00 7.716e-01 1.659 0.09715 .
factor(title_year)1982 1.357e+00 7.497e-01 1.810 0.07041 .
factor(title_year)1983 1.784e+00 7.657e-01 2.330 0.01990 *
factor(title_year)1984 1.540e+00 7.426e-01 2.074 0.03823 *
factor(title_year)1985 1.537e+00 7.604e-01 2.022 0.04331 *
factor(title_year)1986 1.530e+00 7.449e-01 2.054 0.04006 *
factor(title_year)1987 1.181e+00 7.401e-01 1.596 0.11071
factor(title_year)1988 1.661e+00 7.368e-01 2.254 0.02429 *
factor(title_year)1989 1.471e+00 7.414e-01 1.985 0.04732 *
factor(title_year)1990 1.404e+00 7.416e-01 1.893 0.05843 .
factor(title_year)1991 1.475e+00 7.363e-01 2.003 0.04526 *
factor(title_year)1992 1.843e+00 7.349e-01 2.508 0.01222 *
factor(title_year)1993 1.634e+00 7.352e-01 2.222 0.02635 *
factor(title_year)1994 1.617e+00 7.313e-01 2.212 0.02709 *
factor(title_year)1995 1.478e+00 7.292e-01 2.027 0.04276 *
factor(title_year)1996 1.548e+00 7.268e-01 2.129 0.03333 *
factor(title_year)1997 1.407e+00 7.267e-01 1.936 0.05294 .
factor(title_year)1998 1.629e+00 7.271e-01 2.241 0.02512 *
factor(title_year)1999 1.374e+00 7.259e-01 1.893 0.05847 .
factor(title_year)2000 1.236e+00 7.256e-01 1.704 0.08860 .
factor(title_year)2001 1.334e+00 7.253e-01 1.839 0.06610 .
factor(title_year)2002 1.259e+00 7.253e-01 1.735 0.08280 .
factor(title_year)2003 1.098e+00 7.264e-01 1.511 0.13088
factor(title_year)2004 1.160e+00 7.261e-01 1.597 0.11038
factor(title_year)2005 1.139e+00 7.261e-01 1.569 0.11688
factor(title_year)2006 1.079e+00 7.262e-01 1.486 0.13752
factor(title_year)2007 8.700e-01 7.265e-01 1.198 0.23122
factor(title_year)2008 7.112e-01 7.262e-01 0.979 0.32756
factor(title_year)2009 6.776e-01 7.266e-01 0.933 0.35111
factor(title_year)2010 5.532e-01 7.269e-01 0.761 0.44670
factor(title_year)2011 4.054e-01 7.275e-01 0.557 0.57741
factor(title_year)2012 5.807e-01 7.273e-01 0.798 0.42473
factor(title_year)2013 4.912e-01 7.282e-01 0.674 0.50007
factor(title_year)2014 6.057e-01 7.281e-01 0.832 0.40556
factor(title_year)2015 6.934e-01 7.288e-01 0.951 0.34152
factor(title_year)2016 1.191e+00 7.348e-01 1.620 0.10528
num_voted_users NA NA NA NA
duration NA NA NA NA
budget NA NA NA NA
num_critic_for_reviews NA NA NA NA
movie_facebook_likes NA NA NA NA
num_voted_users:duration -1.775e-08 3.725e-09 -4.766 1.99e-06 ***
budget:num_critic_for_reviews -4.448e-12 4.546e-12 -0.979 0.32792
budget:movie_facebook_likes 4.913e-14 1.878e-14 2.616 0.00896 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7185 on 2305 degrees of freedom
Multiple R-squared: 0.5536, Adjusted R-squared: 0.5354
F-statistic: 30.41 on 94 and 2305 DF, p-value: < 2.2e-16
AIC(lm.fit6)
[1] 5316.5
AIC(lm.fit7)
[1] 5316.5
AIC(lm.fit8)
[1] 5565.419
AIC(lm.fit9)
[1] 5323.813
AIC(lm.fit10)
[1] 5319.263
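Instead of calling AIC() once per model, the candidates can be put in a named list and summarised in one table. The sketch below uses two toy models on synthetic data (the names `models`, `comparison`, `d` are illustrative only); with the real fits, the list would hold lm.fit6 through lm.fit10.

```r
# Hypothetical data with a genuine quadratic effect in x2
set.seed(3)
n <- 150
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 + d$x2^2 + rnorm(n)

models <- list(
  linear    = lm(y ~ x1 + x2, data = d),
  quadratic = lm(y ~ x1 + poly(x2, 2), data = d)
)

# One row per model: AIC (lower is better) and adjusted R-squared (higher is better)
comparison <- data.frame(
  model  = names(models),
  aic    = sapply(models, AIC),
  adj_r2 = sapply(models, function(m) summary(m)$adj.r.squared),
  row.names = NULL
)
comparison <- comparison[order(comparison$aic), ]
print(comparison)
```

Sorting by AIC puts the preferred model first, and having adjusted R-squared in the same table makes it easy to check that the two criteria agree.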
# full4: based on lm.fit7, plus the movie_facebook_likes*budget interaction
full4<-lm(movie_train$imdb_score~poly(movie_train$num_voted_users,2)+poly(movie_train$num_critic_for_reviews,2)+poly(movie_train$num_user_for_reviews,2)+poly(movie_train$duration,2)+poly(movie_train$gross,2)+poly(movie_train$movie_facebook_likes,2)+poly(movie_train$budget,2)+factor(movie_train$title_year)+factor(movie_train$genres)+movie_train$duration*movie_train$num_voted_users+movie_train$num_voted_users*movie_train$num_user_for_reviews+movie_train$gross*movie_train$budget+movie_train$movie_facebook_likes*movie_train$budget,data=movie_train)
summary(full4)
Call:
lm(formula = movie_train$imdb_score ~ poly(movie_train$num_voted_users,
2) + poly(movie_train$num_critic_for_reviews, 2) + poly(movie_train$num_user_for_reviews,
2) + poly(movie_train$duration, 2) + poly(movie_train$gross,
2) + poly(movie_train$movie_facebook_likes, 2) + poly(movie_train$budget,
2) + factor(movie_train$title_year) + factor(movie_train$genres) +
movie_train$duration * movie_train$num_voted_users + movie_train$num_voted_users *
movie_train$num_user_for_reviews + movie_train$gross * movie_train$budget +
movie_train$movie_facebook_likes * movie_train$budget, data = movie_train)
Residuals:
Min 1Q Median 3Q Max
-4.9090 -0.3562 0.0528 0.4319 2.2068
Coefficients: (7 not defined because of singularities)
Estimate Std. Error t value
(Intercept) 5.164e+00 7.231e-01 7.141
poly(movie_train$num_voted_users, 2)1 4.073e+01 5.060e+00 8.048
poly(movie_train$num_voted_users, 2)2 -1.639e+01 2.217e+00 -7.393
poly(movie_train$num_critic_for_reviews, 2)1 2.328e+01 1.760e+00 13.226
poly(movie_train$num_critic_for_reviews, 2)2 -1.098e+01 9.978e-01 -11.004
poly(movie_train$num_user_for_reviews, 2)1 -2.415e+01 2.295e+00 -10.525
poly(movie_train$num_user_for_reviews, 2)2 6.883e+00 1.594e+00 4.319
poly(movie_train$duration, 2)1 1.289e+01 1.106e+00 11.657
poly(movie_train$duration, 2)2 -2.758e+00 7.874e-01 -3.503
poly(movie_train$gross, 2)1 -4.760e+00 2.325e+00 -2.047
poly(movie_train$gross, 2)2 -1.611e+00 1.234e+00 -1.306
poly(movie_train$movie_facebook_likes, 2)1 -2.238e+00 1.819e+00 -1.230
poly(movie_train$movie_facebook_likes, 2)2 9.781e-01 9.307e-01 1.051
poly(movie_train$budget, 2)1 -1.530e+01 1.991e+00 -7.682
poly(movie_train$budget, 2)2 5.919e+00 1.125e+00 5.260
factor(movie_train$title_year)1929 1.849e+00 1.016e+00 1.820
factor(movie_train$title_year)1933 3.075e+00 1.016e+00 3.026
factor(movie_train$title_year)1935 3.257e+00 1.017e+00 3.203
factor(movie_train$title_year)1936 2.897e+00 1.018e+00 2.847
factor(movie_train$title_year)1937 1.393e+00 1.028e+00 1.355
factor(movie_train$title_year)1940 1.455e+00 1.025e+00 1.420
factor(movie_train$title_year)1946 1.499e+00 1.016e+00 1.475
factor(movie_train$title_year)1950 2.021e+00 1.018e+00 1.986
factor(movie_train$title_year)1952 1.090e+00 1.017e+00 1.072
factor(movie_train$title_year)1953 1.653e+00 8.807e-01 1.877
factor(movie_train$title_year)1954 2.256e+00 1.015e+00 2.223
factor(movie_train$title_year)1959 1.959e+00 1.018e+00 1.925
factor(movie_train$title_year)1960 1.930e+00 1.022e+00 1.890
factor(movie_train$title_year)1963 2.375e+00 1.020e+00 2.328
factor(movie_train$title_year)1964 1.861e+00 8.823e-01 2.109
factor(movie_train$title_year)1965 1.292e+00 8.080e-01 1.599
factor(movie_train$title_year)1969 1.781e+00 1.019e+00 1.747
factor(movie_train$title_year)1970 1.211e+00 8.337e-01 1.452
factor(movie_train$title_year)1971 1.316e+00 8.791e-01 1.497
factor(movie_train$title_year)1972 1.619e+00 8.854e-01 1.828
factor(movie_train$title_year)1973 2.071e+00 8.054e-01 2.571
factor(movie_train$title_year)1974 2.482e+00 7.912e-01 3.137
factor(movie_train$title_year)1975 7.986e-01 8.865e-01 0.901
factor(movie_train$title_year)1976 1.717e+00 1.017e+00 1.688
factor(movie_train$title_year)1977 1.771e+00 8.317e-01 2.129
factor(movie_train$title_year)1978 2.212e+00 7.796e-01 2.837
factor(movie_train$title_year)1979 1.420e+00 8.384e-01 1.694
factor(movie_train$title_year)1980 1.537e+00 7.615e-01 2.018
factor(movie_train$title_year)1981 1.320e+00 7.696e-01 1.715
factor(movie_train$title_year)1982 1.370e+00 7.477e-01 1.833
factor(movie_train$title_year)1983 1.837e+00 7.638e-01 2.405
factor(movie_train$title_year)1984 1.554e+00 7.408e-01 2.097
factor(movie_train$title_year)1985 1.578e+00 7.585e-01 2.081
factor(movie_train$title_year)1986 1.524e+00 7.430e-01 2.051
factor(movie_train$title_year)1987 1.186e+00 7.382e-01 1.606
factor(movie_train$title_year)1988 1.661e+00 7.349e-01 2.260
factor(movie_train$title_year)1989 1.469e+00 7.395e-01 1.986
factor(movie_train$title_year)1990 1.426e+00 7.396e-01 1.928
factor(movie_train$title_year)1991 1.491e+00 7.343e-01 2.030
factor(movie_train$title_year)1992 1.845e+00 7.329e-01 2.517
factor(movie_train$title_year)1993 1.641e+00 7.333e-01 2.237
factor(movie_train$title_year)1994 1.652e+00 7.292e-01 2.266
factor(movie_train$title_year)1995 1.487e+00 7.272e-01 2.045
factor(movie_train$title_year)1996 1.543e+00 7.248e-01 2.128
factor(movie_train$title_year)1997 1.400e+00 7.247e-01 1.932
factor(movie_train$title_year)1998 1.620e+00 7.251e-01 2.234
factor(movie_train$title_year)1999 1.373e+00 7.240e-01 1.896
factor(movie_train$title_year)2000 1.223e+00 7.237e-01 1.690
factor(movie_train$title_year)2001 1.323e+00 7.234e-01 1.829
factor(movie_train$title_year)2002 1.259e+00 7.234e-01 1.740
factor(movie_train$title_year)2003 1.087e+00 7.245e-01 1.500
factor(movie_train$title_year)2004 1.147e+00 7.242e-01 1.584
factor(movie_train$title_year)2005 1.123e+00 7.242e-01 1.551
factor(movie_train$title_year)2006 1.063e+00 7.243e-01 1.468
factor(movie_train$title_year)2007 8.490e-01 7.246e-01 1.172
factor(movie_train$title_year)2008 6.917e-01 7.244e-01 0.955
factor(movie_train$title_year)2009 6.678e-01 7.247e-01 0.921
factor(movie_train$title_year)2010 5.503e-01 7.251e-01 0.759
factor(movie_train$title_year)2011 4.011e-01 7.257e-01 0.553
factor(movie_train$title_year)2012 5.863e-01 7.254e-01 0.808
factor(movie_train$title_year)2013 4.917e-01 7.263e-01 0.677
factor(movie_train$title_year)2014 6.018e-01 7.262e-01 0.829
factor(movie_train$title_year)2015 7.030e-01 7.269e-01 0.967
factor(movie_train$title_year)2016 1.235e+00 7.328e-01 1.685
factor(movie_train$genres)Adventure 3.919e-01 5.880e-02 6.666
factor(movie_train$genres)Animation 8.769e-01 1.371e-01 6.398
factor(movie_train$genres)Biography 6.461e-01 8.028e-02 8.049
factor(movie_train$genres)Comedy 1.341e-01 4.655e-02 2.882
factor(movie_train$genres)Crime 4.034e-01 6.895e-02 5.851
factor(movie_train$genres)Documentary 1.248e+00 1.630e-01 7.655
factor(movie_train$genres)Drama 5.505e-01 5.264e-02 10.458
factor(movie_train$genres)Family 6.494e-01 8.045e-01 0.807
factor(movie_train$genres)Fantasy -2.260e-01 1.485e-01 -1.521
factor(movie_train$genres)Horror -4.047e-01 8.601e-02 -4.705
factor(movie_train$genres)Musical NA NA NA
factor(movie_train$genres)Mystery 1.896e-01 1.970e-01 0.963
factor(movie_train$genres)Romance 8.640e-01 5.106e-01 1.692
factor(movie_train$genres)Sci-Fi 2.279e-01 2.993e-01 0.761
factor(movie_train$genres)Thriller -5.477e-02 7.242e-01 -0.076
factor(movie_train$genres)Western 1.037e+00 7.566e-01 1.371
movie_train$duration NA NA NA
movie_train$num_voted_users NA NA NA
movie_train$num_user_for_reviews NA NA NA
movie_train$gross NA NA NA
movie_train$budget NA NA NA
movie_train$movie_facebook_likes NA NA NA
movie_train$duration:movie_train$num_voted_users -1.780e-08 3.746e-09 -4.752
movie_train$num_voted_users:movie_train$num_user_for_reviews 9.723e-10 3.019e-10 3.220
movie_train$gross:movie_train$budget 9.456e-18 6.378e-18 1.483
movie_train$budget:movie_train$movie_facebook_likes 4.071e-14 1.393e-14 2.923
Pr(>|t|)
(Intercept) 1.23e-12 ***
poly(movie_train$num_voted_users, 2)1 1.33e-15 ***
poly(movie_train$num_voted_users, 2)2 2.00e-13 ***
poly(movie_train$num_critic_for_reviews, 2)1 < 2e-16 ***
poly(movie_train$num_critic_for_reviews, 2)2 < 2e-16 ***
poly(movie_train$num_user_for_reviews, 2)1 < 2e-16 ***
poly(movie_train$num_user_for_reviews, 2)2 1.64e-05 ***
poly(movie_train$duration, 2)1 < 2e-16 ***
poly(movie_train$duration, 2)2 0.000469 ***
poly(movie_train$gross, 2)1 0.040726 *
poly(movie_train$gross, 2)2 0.191836
poly(movie_train$movie_facebook_likes, 2)1 0.218743
poly(movie_train$movie_facebook_likes, 2)2 0.293399
poly(movie_train$budget, 2)1 2.31e-14 ***
poly(movie_train$budget, 2)2 1.57e-07 ***
factor(movie_train$title_year)1929 0.068852 .
factor(movie_train$title_year)1933 0.002510 **
factor(movie_train$title_year)1935 0.001377 **
factor(movie_train$title_year)1936 0.004457 **
factor(movie_train$title_year)1937 0.175551
factor(movie_train$title_year)1940 0.155717
factor(movie_train$title_year)1946 0.140466
factor(movie_train$title_year)1950 0.047201 *
factor(movie_train$title_year)1952 0.283661
factor(movie_train$title_year)1953 0.060692 .
factor(movie_train$title_year)1954 0.026309 *
factor(movie_train$title_year)1959 0.054345 .
factor(movie_train$title_year)1960 0.058933 .
factor(movie_train$title_year)1963 0.020018 *
factor(movie_train$title_year)1964 0.035033 *
factor(movie_train$title_year)1965 0.109887
factor(movie_train$title_year)1969 0.080769 .
factor(movie_train$title_year)1970 0.146518
factor(movie_train$title_year)1971 0.134566
factor(movie_train$title_year)1972 0.067655 .
factor(movie_train$title_year)1973 0.010192 *
factor(movie_train$title_year)1974 0.001728 **
factor(movie_train$title_year)1975 0.367732
factor(movie_train$title_year)1976 0.091508 .
factor(movie_train$title_year)1977 0.033373 *
factor(movie_train$title_year)1978 0.004593 **
factor(movie_train$title_year)1979 0.090465 .
factor(movie_train$title_year)1980 0.043658 *
factor(movie_train$title_year)1981 0.086529 .
factor(movie_train$title_year)1982 0.066971 .
factor(movie_train$title_year)1983 0.016234 *
factor(movie_train$title_year)1984 0.036058 *
factor(movie_train$title_year)1985 0.037566 *
factor(movie_train$title_year)1986 0.040377 *
factor(movie_train$title_year)1987 0.108376
factor(movie_train$title_year)1988 0.023885 *
factor(movie_train$title_year)1989 0.047156 *
factor(movie_train$title_year)1990 0.053951 .
factor(movie_train$title_year)1991 0.042469 *
factor(movie_train$title_year)1992 0.011899 *
factor(movie_train$title_year)1993 0.025357 *
factor(movie_train$title_year)1994 0.023550 *
factor(movie_train$title_year)1995 0.040989 *
factor(movie_train$title_year)1996 0.033421 *
factor(movie_train$title_year)1997 0.053429 .
factor(movie_train$title_year)1998 0.025605 *
factor(movie_train$title_year)1999 0.058117 .
factor(movie_train$title_year)2000 0.091214 .
factor(movie_train$title_year)2001 0.067463 .
factor(movie_train$title_year)2002 0.082005 .
factor(movie_train$title_year)2003 0.133769
factor(movie_train$title_year)2004 0.113317
factor(movie_train$title_year)2005 0.121051
factor(movie_train$title_year)2006 0.142278
factor(movie_train$title_year)2007 0.241441
factor(movie_train$title_year)2008 0.339788
factor(movie_train$title_year)2009 0.356924
factor(movie_train$title_year)2010 0.447953
factor(movie_train$title_year)2011 0.580541
factor(movie_train$title_year)2012 0.419083
factor(movie_train$title_year)2013 0.498466
factor(movie_train$title_year)2014 0.407378
factor(movie_train$title_year)2015 0.333599
factor(movie_train$title_year)2016 0.092091 .
factor(movie_train$genres)Adventure 3.28e-11 ***
factor(movie_train$genres)Animation 1.90e-10 ***
factor(movie_train$genres)Biography 1.33e-15 ***
factor(movie_train$genres)Comedy 0.003992 **
factor(movie_train$genres)Crime 5.57e-09 ***
factor(movie_train$genres)Documentary 2.82e-14 ***
factor(movie_train$genres)Drama < 2e-16 ***
factor(movie_train$genres)Family 0.419662
factor(movie_train$genres)Fantasy 0.128341
factor(movie_train$genres)Horror 2.69e-06 ***
factor(movie_train$genres)Musical NA
factor(movie_train$genres)Mystery 0.335873
factor(movie_train$genres)Romance 0.090777 .
factor(movie_train$genres)Sci-Fi 0.446584
factor(movie_train$genres)Thriller 0.939719
factor(movie_train$genres)Western 0.170522
movie_train$duration NA
movie_train$num_voted_users NA
movie_train$num_user_for_reviews NA
movie_train$gross NA
movie_train$budget NA
movie_train$movie_facebook_likes NA
movie_train$duration:movie_train$num_voted_users 2.14e-06 ***
movie_train$num_voted_users:movie_train$num_user_for_reviews 0.001298 **
movie_train$gross:movie_train$budget 0.138326
movie_train$budget:movie_train$movie_facebook_likes 0.003497 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7166 on 2302 degrees of freedom
Multiple R-squared: 0.5565, Adjusted R-squared: 0.5378
F-statistic: 29.78 on 97 and 2302 DF, p-value: < 2.2e-16
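The NA rows for the plain main effects (movie_train$duration, movie_train$num_voted_users, and so on) in the summary above are what lm() reports when a column is an exact linear combination of columns already in the model. poly(x, 2) already spans the linear trend in x, so adding x again contributes nothing new and its coefficient is dropped. The NA for the Musical level may arise the same way (a dummy column that duplicates a combination of other dummies), though the output alone does not confirm which columns are involved. A minimal sketch on the built-in mtcars data, with variable choices that are purely illustrative and not taken from the movie data:

```r
# Illustration on mtcars (an assumption for demonstration; not the movie data).
# poly(hp, 2) already contains the linear trend in hp, so the extra raw hp
# main effect is aliased and lm() reports NA for it.
fit_alias <- lm(mpg ~ poly(hp, 2) + hp + hp:wt, data = mtcars)

coef(fit_alias)["hp"]   # NA: hp is a combination of the intercept and poly(hp, 2)1
alias(fit_alias)        # lists which terms are linearly dependent

# Dropping the redundant main effect leaves the fitted values unchanged.
fit_clean <- lm(mpg ~ poly(hp, 2) + hp:wt, data = mtcars)
all.equal(fitted(fit_alias), fitted(fit_clean))
```

Running alias() on the fitted movie model would show which columns are responsible for each NA.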
null1 <- lm(movie_train$imdb_score ~ 1)
step(null1, scope = list(lower = null, upper = full4), direction = 'forward')
Start: AIC=254.07
movie_train$imdb_score ~ 1
Df Sum of Sq RSS AIC
+ poly(movie_train$num_voted_users, 2) 2 754.44 1911.3 -540.39
+ movie_train$num_voted_users 1 674.61 1991.2 -444.18
+ poly(movie_train$duration, 2) 2 394.29 2271.5 -126.07
+ poly(movie_train$num_user_for_reviews, 2) 2 364.26 2301.5 -94.55
+ movie_train$duration 1 356.78 2309.0 -88.76
+ poly(movie_train$num_critic_for_reviews, 2) 2 337.12 2328.7 -66.42
+ movie_train$num_user_for_reviews 1 309.64 2356.2 -40.26
+ poly(movie_train$movie_facebook_likes, 2) 2 252.99 2412.8 18.76
+ factor(movie_train$genres) 16 277.71 2388.1 22.04
+ movie_train$movie_facebook_likes 1 222.23 2443.6 47.17
+ poly(movie_train$gross, 2) 2 193.93 2471.9 76.80
+ movie_train$gross 1 188.07 2477.7 80.49
+ factor(movie_train$title_year) 64 170.66 2495.1 223.29
+ poly(movie_train$budget, 2) 2 27.64 2638.2 233.06
+ movie_train$budget 1 9.96 2655.8 247.09
<none> 2665.8 254.07
Step: AIC=-540.39
movie_train$imdb_score ~ poly(movie_train$num_voted_users, 2)
Df Sum of Sq RSS AIC
+ factor(movie_train$genres) 16 277.351 1634.0 -884.66
+ poly(movie_train$budget, 2) 2 132.011 1779.3 -708.15
+ movie_train$budget 1 108.860 1802.5 -679.13
+ poly(movie_train$duration, 2) 2 100.178 1811.2 -665.60
+ movie_train$duration 1 92.678 1818.7 -657.68
+ factor(movie_train$title_year) 64 142.842 1768.5 -598.81
+ poly(movie_train$gross, 2) 2 37.844 1873.5 -584.39
+ movie_train$gross 1 34.683 1876.7 -582.34
+ poly(movie_train$num_user_for_reviews, 2) 2 26.051 1885.3 -569.33
+ movie_train$num_user_for_reviews 1 21.865 1889.5 -566.00
+ poly(movie_train$num_critic_for_reviews, 2) 2 5.313 1906.0 -543.07
<none> 1911.3 -540.39
+ poly(movie_train$movie_facebook_likes, 2) 2 1.763 1909.6 -538.60
+ movie_train$movie_facebook_likes 1 0.003 1911.3 -538.39
Step: AIC=-884.66
movie_train$imdb_score ~ poly(movie_train$num_voted_users, 2) +
factor(movie_train$genres)
Df Sum of Sq RSS AIC
+ poly(movie_train$budget, 2) 2 72.472 1561.5 -989.54
+ factor(movie_train$title_year) 63 148.420 1485.6 -987.21
+ movie_train$budget 1 49.564 1584.4 -956.59
+ poly(movie_train$duration, 2) 2 49.331 1584.7 -954.24
+ movie_train$duration 1 42.894 1591.1 -946.51
+ poly(movie_train$num_user_for_reviews, 2) 2 17.373 1616.6 -906.32
+ movie_train$num_user_for_reviews 1 12.459 1621.5 -901.03
+ movie_train$gross 1 10.176 1623.8 -897.66
+ poly(movie_train$gross, 2) 2 10.517 1623.5 -896.16
+ poly(movie_train$num_critic_for_reviews, 2) 2 10.516 1623.5 -896.16
<none> 1634.0 -884.66
+ movie_train$movie_facebook_likes 1 0.001 1634.0 -882.66
+ poly(movie_train$movie_facebook_likes, 2) 2 1.349 1632.6 -882.65
Step: AIC=-989.54
movie_train$imdb_score ~ poly(movie_train$num_voted_users, 2) +
factor(movie_train$genres) + poly(movie_train$budget, 2)
Df Sum of Sq RSS AIC
+ poly(movie_train$duration, 2) 2 87.910 1473.6 -1124.61
+ movie_train$duration 1 73.309 1488.2 -1102.95
+ factor(movie_train$title_year) 63 116.908 1444.6 -1050.31
+ poly(movie_train$num_critic_for_reviews, 2) 2 25.446 1536.1 -1024.97
+ poly(movie_train$num_user_for_reviews, 2) 2 10.208 1551.3 -1001.28
+ movie_train$num_user_for_reviews 1 6.765 1554.8 -997.96
+ poly(movie_train$gross, 2) 2 3.085 1558.4 -990.29
<none> 1561.5 -989.54
+ movie_train$movie_facebook_likes 1 0.194 1561.3 -987.84
+ movie_train$gross 1 0.004 1561.5 -987.55
+ poly(movie_train$movie_facebook_likes, 2) 2 1.301 1560.2 -987.54
Step: AIC=-1124.61
movie_train$imdb_score ~ poly(movie_train$num_voted_users, 2) +
factor(movie_train$genres) + poly(movie_train$budget, 2) +
poly(movie_train$duration, 2)
Df Sum of Sq RSS AIC
+ poly(movie_train$num_critic_for_reviews, 2) 2 30.237 1443.4 -1170.4
+ poly(movie_train$num_user_for_reviews, 2) 2 19.801 1453.8 -1153.1
+ movie_train$num_user_for_reviews 1 12.821 1460.8 -1143.6
+ factor(movie_train$title_year) 63 82.648 1391.0 -1137.1
+ poly(movie_train$gross, 2) 2 2.549 1471.1 -1124.8
<none> 1473.6 -1124.6
+ poly(movie_train$movie_facebook_likes, 2) 2 2.110 1471.5 -1124.0
+ movie_train$gross 1 0.027 1473.6 -1122.7
+ movie_train$movie_facebook_likes 1 0.026 1473.6 -1122.7
Step: AIC=-1170.37
movie_train$imdb_score ~ poly(movie_train$num_voted_users, 2) +
factor(movie_train$genres) + poly(movie_train$budget, 2) +
poly(movie_train$duration, 2) + poly(movie_train$num_critic_for_reviews,
2)
Df Sum of Sq RSS AIC
+ factor(movie_train$title_year) 63 147.048 1296.3 -1302.2
+ poly(movie_train$num_user_for_reviews, 2) 2 31.782 1411.6 -1219.8
+ movie_train$num_user_for_reviews 1 20.024 1423.3 -1201.9
<none> 1443.4 -1170.4
+ poly(movie_train$gross, 2) 2 1.855 1441.5 -1169.5
+ movie_train$gross 1 0.308 1443.1 -1168.9
+ movie_train$movie_facebook_likes 1 0.035 1443.3 -1168.4
+ poly(movie_train$movie_facebook_likes, 2) 2 0.357 1443.0 -1167.0
Step: AIC=-1302.25
movie_train$imdb_score ~ poly(movie_train$num_voted_users, 2) +
factor(movie_train$genres) + poly(movie_train$budget, 2) +
poly(movie_train$duration, 2) + poly(movie_train$num_critic_for_reviews,
2) + factor(movie_train$title_year)
Df Sum of Sq RSS AIC
+ poly(movie_train$num_user_for_reviews, 2) 2 89.846 1206.5 -1470.6
+ movie_train$num_user_for_reviews 1 48.849 1247.5 -1392.4
<none> 1296.3 -1302.2
+ movie_train$movie_facebook_likes 1 0.404 1295.9 -1301.0
+ movie_train$gross 1 0.099 1296.2 -1300.4
+ poly(movie_train$movie_facebook_likes, 2) 2 0.415 1295.9 -1299.0
+ poly(movie_train$gross, 2) 2 0.109 1296.2 -1298.5
Step: AIC=-1470.63
movie_train$imdb_score ~ poly(movie_train$num_voted_users, 2) +
factor(movie_train$genres) + poly(movie_train$budget, 2) +
poly(movie_train$duration, 2) + poly(movie_train$num_critic_for_reviews,
2) + factor(movie_train$title_year) + poly(movie_train$num_user_for_reviews,
2)
Df Sum of Sq RSS AIC
<none> 1206.5 -1470.6
+ movie_train$gross 1 0.61424 1205.9 -1469.8
+ movie_train$movie_facebook_likes 1 0.41132 1206.1 -1469.5
+ poly(movie_train$movie_facebook_likes, 2) 2 1.36811 1205.1 -1469.3
+ poly(movie_train$gross, 2) 2 0.62911 1205.8 -1467.9
Call:
lm(formula = movie_train$imdb_score ~ poly(movie_train$num_voted_users,
2) + factor(movie_train$genres) + poly(movie_train$budget,
2) + poly(movie_train$duration, 2) + poly(movie_train$num_critic_for_reviews,
2) + factor(movie_train$title_year) + poly(movie_train$num_user_for_reviews,
2))
Coefficients:
(Intercept) poly(movie_train$num_voted_users, 2)1
5.098420 29.498808
poly(movie_train$num_voted_users, 2)2 factor(movie_train$genres)Adventure
-11.855671 0.393684
factor(movie_train$genres)Animation factor(movie_train$genres)Biography
0.829702 0.622074
factor(movie_train$genres)Comedy factor(movie_train$genres)Crime
0.131195 0.404458
factor(movie_train$genres)Documentary factor(movie_train$genres)Drama
1.240420 0.556859
factor(movie_train$genres)Family factor(movie_train$genres)Fantasy
0.002683 -0.249253
factor(movie_train$genres)Horror factor(movie_train$genres)Musical
-0.427330 1.835826
factor(movie_train$genres)Mystery factor(movie_train$genres)Romance
0.200923 0.851537
factor(movie_train$genres)Sci-Fi factor(movie_train$genres)Thriller
0.254186 -0.053845
factor(movie_train$genres)Western poly(movie_train$budget, 2)1
1.051192 -11.926587
poly(movie_train$budget, 2)2 poly(movie_train$duration, 2)1
7.264125 10.343521
poly(movie_train$duration, 2)2 poly(movie_train$num_critic_for_reviews, 2)1
-2.988399 25.211496
poly(movie_train$num_critic_for_reviews, 2)2 factor(movie_train$title_year)1929
-10.121103 NA
factor(movie_train$title_year)1933 factor(movie_train$title_year)1935
3.045777 3.216109
factor(movie_train$title_year)1936 factor(movie_train$title_year)1937
2.990859 1.338009
factor(movie_train$title_year)1940 factor(movie_train$title_year)1946
1.484591 1.562392
factor(movie_train$title_year)1950 factor(movie_train$title_year)1952
2.040794 1.161149
factor(movie_train$title_year)1953 factor(movie_train$title_year)1954
1.642145 2.314379
factor(movie_train$title_year)1959 factor(movie_train$title_year)1960
2.000891 2.154797
factor(movie_train$title_year)1963 factor(movie_train$title_year)1964
2.537574 1.896053
factor(movie_train$title_year)1965 factor(movie_train$title_year)1969
1.383860 1.822409
factor(movie_train$title_year)1970 factor(movie_train$title_year)1971
1.319768 1.359876
factor(movie_train$title_year)1972 factor(movie_train$title_year)1973
1.334121 2.027163
factor(movie_train$title_year)1974 factor(movie_train$title_year)1975
2.213420 0.575191
factor(movie_train$title_year)1976 factor(movie_train$title_year)1977
1.746322 1.841745
factor(movie_train$title_year)1978 factor(movie_train$title_year)1979
2.198729 1.195766
factor(movie_train$title_year)1980 factor(movie_train$title_year)1981
1.554244 1.294995
factor(movie_train$title_year)1982 factor(movie_train$title_year)1983
1.371788 1.798019
factor(movie_train$title_year)1984 factor(movie_train$title_year)1985
1.551152 1.578051
factor(movie_train$title_year)1986 factor(movie_train$title_year)1987
1.514418 1.170771
factor(movie_train$title_year)1988 factor(movie_train$title_year)1989
1.658916 1.478776
factor(movie_train$title_year)1990 factor(movie_train$title_year)1991
1.397874 1.465037
factor(movie_train$title_year)1992 factor(movie_train$title_year)1993
1.867715 1.643761
factor(movie_train$title_year)1994 factor(movie_train$title_year)1995
1.646360 1.487183
factor(movie_train$title_year)1996 factor(movie_train$title_year)1997
1.549979 1.396281
factor(movie_train$title_year)1998 factor(movie_train$title_year)1999
1.631049 1.373484
factor(movie_train$title_year)2000 factor(movie_train$title_year)2001
1.234802 1.327942
factor(movie_train$title_year)2002 factor(movie_train$title_year)2003
1.238369 1.090093
factor(movie_train$title_year)2004 factor(movie_train$title_year)2005
1.152712 1.122846
factor(movie_train$title_year)2006 factor(movie_train$title_year)2007
1.056800 0.846197
factor(movie_train$title_year)2008 factor(movie_train$title_year)2009
0.689675 0.641725
factor(movie_train$title_year)2010 factor(movie_train$title_year)2011
0.529803 0.357597
factor(movie_train$title_year)2012 factor(movie_train$title_year)2013
0.531869 0.440646
factor(movie_train$title_year)2014 factor(movie_train$title_year)2015
0.571869 0.676143
factor(movie_train$title_year)2016 poly(movie_train$num_user_for_reviews, 2)1
1.225376 -18.936838
poly(movie_train$num_user_for_reviews, 2)2
10.327257
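The forward search traced above starts from the intercept-only model and, at each step, adds the candidate term that lowers AIC the most, stopping when <none> is at the top of the table. The same procedure can be sketched on the built-in mtcars data (an illustrative stand-in, not the movie data). This sketch writes the formulas with bare column names and a data= argument rather than the movie_train$ prefix used in this report; that is a suggested variation, chosen because predict() can then be given new data.

```r
# Forward selection by AIC on mtcars (illustrative; not the movie data).
null_fit <- lm(mpg ~ 1, data = mtcars)
full_fit <- lm(mpg ~ poly(hp, 2) + wt + qsec + factor(cyl), data = mtcars)

fwd <- step(null_fit,
            scope = list(lower = formula(null_fit), upper = formula(full_fit)),
            direction = "forward",
            trace = 0)   # trace = 0 suppresses the per-step tables

formula(fwd)             # the selected model
extractAIC(fwd)          # c(edf, AIC) on the scale that step() prints

# Because the formula uses bare column names, predict() honours newdata.
new_pred <- predict(fwd, newdata = mtcars[c(1, 3, 5), ])
new_pred
```

With formulas written as movie_train$x, the variables are looked up directly in movie_train, so a newdata argument passed to predict() would not be used for those terms.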
Last try: refit the model chosen by the forward selection above as lm.fit11:
lm.fit11 <- lm(imdb_score ~ poly(movie_train$num_voted_users, 2) +
  factor(movie_train$genres) + poly(movie_train$budget, 2) +
  poly(movie_train$duration, 2) + poly(movie_train$num_critic_for_reviews, 2) +
  factor(movie_train$title_year) + poly(movie_train$num_user_for_reviews, 2),
  data = movie_train)
summary(lm.fit11)
Call:
lm(formula = imdb_score ~ poly(movie_train$num_voted_users, 2) +
factor(movie_train$genres) + poly(movie_train$budget, 2) +
poly(movie_train$duration, 2) + poly(movie_train$num_critic_for_reviews,
2) + factor(movie_train$title_year) + poly(movie_train$num_user_for_reviews,
2), data = movie_train)
Residuals:
Min 1Q Median 3Q Max
-4.9895 -0.3466 0.0622 0.4302 2.2011
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.098420 0.727088 7.012 3.07e-12 ***
poly(movie_train$num_voted_users, 2)1 29.498808 1.397911 21.102 < 2e-16 ***
poly(movie_train$num_voted_users, 2)2 -11.855671 1.146126 -10.344 < 2e-16 ***
factor(movie_train$genres)Adventure 0.393684 0.058662 6.711 2.42e-11 ***
factor(movie_train$genres)Animation 0.829702 0.137386 6.039 1.80e-09 ***
factor(movie_train$genres)Biography 0.622074 0.080768 7.702 1.97e-14 ***
factor(movie_train$genres)Comedy 0.131195 0.046566 2.817 0.004883 **
factor(movie_train$genres)Crime 0.404458 0.069213 5.844 5.83e-09 ***
factor(movie_train$genres)Documentary 1.240420 0.164052 7.561 5.73e-14 ***
factor(movie_train$genres)Drama 0.556859 0.052837 10.539 < 2e-16 ***
factor(movie_train$genres)Family 0.002683 0.752225 0.004 0.997155
factor(movie_train$genres)Fantasy -0.249253 0.149466 -1.668 0.095527 .
factor(movie_train$genres)Horror -0.427330 0.086115 -4.962 7.47e-07 ***
factor(movie_train$genres)Musical 1.835826 1.024536 1.792 0.073286 .
factor(movie_train$genres)Mystery 0.200923 0.198298 1.013 0.311051
factor(movie_train$genres)Romance 0.851537 0.514787 1.654 0.098232 .
factor(movie_train$genres)Sci-Fi 0.254186 0.301731 0.842 0.399637
factor(movie_train$genres)Thriller -0.053845 0.729925 -0.074 0.941201
factor(movie_train$genres)Western 1.051192 0.762881 1.378 0.168360
poly(movie_train$budget, 2)1 -11.926587 1.043912 -11.425 < 2e-16 ***
poly(movie_train$budget, 2)2 7.264125 0.799174 9.090 < 2e-16 ***
poly(movie_train$duration, 2)1 10.343521 0.937637 11.031 < 2e-16 ***
poly(movie_train$duration, 2)2 -2.988399 0.789058 -3.787 0.000156 ***
poly(movie_train$num_critic_for_reviews, 2)1 25.211496 1.621841 15.545 < 2e-16 ***
poly(movie_train$num_critic_for_reviews, 2)2 -10.121103 0.841316 -12.030 < 2e-16 ***
factor(movie_train$title_year)1929 NA NA NA NA
factor(movie_train$title_year)1933 3.045777 1.024897 2.972 0.002991 **
factor(movie_train$title_year)1935 3.216109 1.025183 3.137 0.001728 **
factor(movie_train$title_year)1936 2.990859 1.025722 2.916 0.003581 **
factor(movie_train$title_year)1937 1.338009 1.033748 1.294 0.195681
factor(movie_train$title_year)1940 1.484591 1.033118 1.437 0.150853
factor(movie_train$title_year)1946 1.562392 1.024862 1.524 0.127523
factor(movie_train$title_year)1950 2.040794 1.026307 1.988 0.046876 *
factor(movie_train$title_year)1952 1.161149 1.025144 1.133 0.257471
factor(movie_train$title_year)1953 1.642145 0.888086 1.849 0.064573 .
factor(movie_train$title_year)1954 2.314379 1.023041 2.262 0.023774 *
factor(movie_train$title_year)1959 2.000891 1.026180 1.950 0.051316 .
factor(movie_train$title_year)1960 2.154797 1.028967 2.094 0.036357 *
factor(movie_train$title_year)1963 2.537574 1.027951 2.469 0.013637 *
factor(movie_train$title_year)1964 1.896053 0.889150 2.132 0.033076 *
factor(movie_train$title_year)1965 1.383860 0.814027 1.700 0.089262 .
factor(movie_train$title_year)1969 1.822409 1.027257 1.774 0.076186 .
factor(movie_train$title_year)1970 1.319768 0.840411 1.570 0.116463
factor(movie_train$title_year)1971 1.359876 0.886404 1.534 0.125130
factor(movie_train$title_year)1972 1.334121 0.890973 1.497 0.134432
factor(movie_train$title_year)1973 2.027163 0.810981 2.500 0.012501 *
factor(movie_train$title_year)1974 2.213420 0.796141 2.780 0.005477 **
factor(movie_train$title_year)1975 0.575191 0.891734 0.645 0.518975
factor(movie_train$title_year)1976 1.746322 1.025256 1.703 0.088646 .
factor(movie_train$title_year)1977 1.841745 0.838362 2.197 0.028131 *
factor(movie_train$title_year)1978 2.198729 0.785889 2.798 0.005189 **
factor(movie_train$title_year)1979 1.195766 0.843116 1.418 0.156247
factor(movie_train$title_year)1980 1.554244 0.767491 2.025 0.042972 *
factor(movie_train$title_year)1981 1.294995 0.775979 1.669 0.095282 .
factor(movie_train$title_year)1982 1.371788 0.753860 1.820 0.068936 .
factor(movie_train$title_year)1983 1.798019 0.770053 2.335 0.019632 *
factor(movie_train$title_year)1984 1.551152 0.746804 2.077 0.037907 *
factor(movie_train$title_year)1985 1.578051 0.764752 2.063 0.039179 *
factor(movie_train$title_year)1986 1.514418 0.749166 2.021 0.043346 *
factor(movie_train$title_year)1987 1.170771 0.744344 1.573 0.115881
factor(movie_train$title_year)1988 1.658916 0.741002 2.239 0.025267 *
factor(movie_train$title_year)1989 1.478776 0.745673 1.983 0.047470 *
factor(movie_train$title_year)1990 1.397874 0.745699 1.875 0.060976 .
factor(movie_train$title_year)1991 1.465037 0.740459 1.979 0.047985 *
factor(movie_train$title_year)1992 1.867715 0.739056 2.527 0.011565 *
factor(movie_train$title_year)1993 1.643761 0.739444 2.223 0.026314 *
factor(movie_train$title_year)1994 1.646360 0.735328 2.239 0.025254 *
factor(movie_train$title_year)1995 1.487183 0.733203 2.028 0.042640 *
factor(movie_train$title_year)1996 1.549979 0.730900 2.121 0.034058 *
factor(movie_train$title_year)1997 1.396281 0.730690 1.911 0.056140 .
factor(movie_train$title_year)1998 1.631049 0.731122 2.231 0.025784 *
factor(movie_train$title_year)1999 1.373484 0.730018 1.881 0.060038 .
factor(movie_train$title_year)2000 1.234802 0.729679 1.692 0.090732 .
factor(movie_train$title_year)2001 1.327942 0.729396 1.821 0.068796 .
factor(movie_train$title_year)2002 1.238369 0.729336 1.698 0.089654 .
factor(movie_train$title_year)2003 1.090093 0.730478 1.492 0.135757
factor(movie_train$title_year)2004 1.152712 0.730106 1.579 0.114512
factor(movie_train$title_year)2005 1.122846 0.730114 1.538 0.124209
factor(movie_train$title_year)2006 1.056800 0.730186 1.447 0.147948
factor(movie_train$title_year)2007 0.846197 0.730456 1.158 0.246800
factor(movie_train$title_year)2008 0.689675 0.730245 0.944 0.345042
factor(movie_train$title_year)2009 0.641725 0.730619 0.878 0.379856
factor(movie_train$title_year)2010 0.529803 0.731045 0.725 0.468697
factor(movie_train$title_year)2011 0.357597 0.731579 0.489 0.625028
factor(movie_train$title_year)2012 0.531869 0.731273 0.727 0.467104
factor(movie_train$title_year)2013 0.440646 0.731874 0.602 0.547180
factor(movie_train$title_year)2014 0.571869 0.731858 0.781 0.434651
factor(movie_train$title_year)2015 0.676143 0.732374 0.923 0.355989
factor(movie_train$title_year)2016 1.225376 0.738008 1.660 0.096973 .
poly(movie_train$num_user_for_reviews, 2)1 -18.936838 1.529298 -12.383 < 2e-16 ***
poly(movie_train$num_user_for_reviews, 2)2 10.327257 1.165642 8.860 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7227 on 2310 degrees of freedom
Multiple R-squared: 0.5474, Adjusted R-squared: 0.53
F-statistic: 31.39 on 89 and 2310 DF, p-value: < 2.2e-16
AIC(lm.fit11)
[1] 5342.274
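Note that AIC(lm.fit11) = 5342.274 and the final step() value of -1470.63 describe the same model on two different scales. step() uses extractAIC(), which for lm drops the constant part of the log-likelihood, while AIC() keeps it and also counts the error variance as a parameter. For n = 2400 observations (2310 residual DF plus 90 estimated coefficients) the gap is n*log(2*pi) + n + 2, about 6812.9, which matches the two printed values. The relationship can be checked on any lm fit; the sketch below uses mtcars as an illustrative stand-in:

```r
# Illustrative check on mtcars (not the movie data).
fit <- lm(mpg ~ wt + hp, data = mtcars)
n   <- nobs(fit)
rss <- sum(residuals(fit)^2)
p   <- length(coef(fit))          # estimated coefficients, including the intercept

aic_step <- n * log(rss / n) + 2 * p                              # scale used by step()
aic_full <- n * log(2 * pi) + n + n * log(rss / n) + 2 * (p + 1)  # scale used by AIC()

all.equal(aic_step, extractAIC(fit)[2])
all.equal(aic_full, AIC(fit))
```

When comparing candidate models, the AIC values should therefore all come from the same function.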
Conclusion: lm.fit7 is the best.
###################################### Random forest ###################################################