This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).
#Reading the data into R.
moviedata = read.csv("PROJECT/PROJECT/data/imbd_rating.csv")
head(moviedata)
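The `class(cleantitle)` output later in this notebook reports `"factor"`, which suggests the text columns were read in as factors. A minimal sketch of reading a CSV with text kept as character follows. The temporary file and its two rows are hypothetical stand-ins, not the project's data.

```r
# Hypothetical stand-in for the project's CSV, written to a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("title,genres,score",
             "Avatar,Action|Adventure,7.5",
             "Spectre,Action|Thriller,6.5"), tmp)

# stringsAsFactors = FALSE keeps text columns as character vectors
# (this is already the default from R 4.0.0 onward).
demo <- read.csv(tmp, stringsAsFactors = FALSE)
str(demo)
```

Keeping titles as character avoids the implicit factor-to-character conversion that `sub()` performs in the cleaning step.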
#Cleaning the 'title' column header.
colnames(moviedata)[1] <- "title"
head(moviedata)
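Renaming by position works only while the title stays in column 1. As a sketch, the same rename can match on the old name instead. The data frame and the old header `movie_title` below are hypothetical.

```r
# Toy data frame whose first column has an unwanted header (hypothetical name).
demo <- data.frame(movie_title = c("Avatar", "Spectre"),
                   score = c(7.5, 6.5))

# Rename by matching the old name rather than relying on position 1.
names(demo)[names(demo) == "movie_title"] <- "title"
names(demo)
```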
#Attempting to clean the alt-code (encoding) debris from the titles.
titledata = moviedata$title
cleantitle = moviedata$title
class(cleantitle)
[1] "factor"
cleantitle = sub("Ã<U+0082>", "", cleantitle, fixed=TRUE)
cleantitle = sub("Ã<U+0083>", "", cleantitle, fixed=TRUE)
cleantitle = sub(",", "", cleantitle, fixed=TRUE)
class(cleantitle)
[1] "character"
head(cleantitle)
[1] "AvatarÃÂ " "Pirates of the Caribbean: At World's EndÃÂ "
[3] "SpectreÃÂ " "The Dark Knight RisesÃÂ "
[5] "Star Wars: Episode VII - The Force AwakensÃÂ " "John CarterÃÂ "
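The `head(cleantitle)` output still shows the trailing debris, so the `sub()` patterns evidently did not match what is stored in the data. One possible alternative, sketched here under assumptions, is to drop every non-ASCII character and then trim whitespace. The toy titles and the exact escape sequence used for the debris are assumptions, not bytes taken from the actual file.

```r
# Toy titles; the escapes reproduce one plausible form of the debris
# (an assumption about the bytes, not taken from the actual file).
raw_titles <- factor(c("Avatar\u00c3\u0082\u00c2\u00a0",
                       "John Carter\u00c3\u0082\u00c2\u00a0"))

# Convert the factor to character, drop everything outside ASCII,
# then trim leftover whitespace at either end.
clean_titles <- as.character(raw_titles)
clean_titles <- iconv(clean_titles, from = "UTF-8", to = "ASCII", sub = "")
clean_titles <- trimws(clean_titles)
clean_titles
```

This approach also strips legitimate accented letters, so it suits only titles that are expected to be plain ASCII.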
genredata = moviedata$genres
cleangenre = genredata
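#Without fixed=TRUE, "|" is parsed as regex alternation and matches the empty string, so nothing is removed.
#sub() changes only the first "|" in each entry; use gsub() to change all of them.
cleangenre = sub("|", "", cleangenre, fixed=TRUE)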
class(cleangenre)
[1] "character"
head(genredata)
[1] Action|Adventure|Fantasy|Sci-Fi Action|Adventure|Fantasy Action|Adventure|Thriller Action|Thriller
[5] Documentary Action|Adventure|Sci-Fi
914 Levels: Action Action|Adventure ... Western
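Because each entry packs several genres into one pipe-delimited string, splitting may be more useful than deleting the delimiter. A minimal sketch, using toy strings in the same form as the `head(genredata)` output:

```r
# Toy genre strings in the same pipe-delimited form as the notebook's data.
genres <- c("Action|Adventure|Fantasy|Sci-Fi", "Action|Thriller", "Documentary")

# fixed = TRUE treats "|" as a literal character, not regex alternation.
genre_list <- strsplit(genres, "|", fixed = TRUE)

# Count how often each individual genre appears.
genre_counts <- table(unlist(genre_list))
genre_counts
```

Counting individual genres this way sidesteps the 914 combined factor levels reported for the raw column.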
summary(moviedata)
title                             genres                     director               actor1                  actor2
Ben-HurÃÂ                  :   3  Drama               : 236                  : 104  Robert De Niro   :  49  Morgan Freeman :  20
HalloweenÃÂ                :   3  Comedy              : 209  Steven Spielberg:  26  Johnny Depp      :  41  Charlize Theron:  15
HomeÃÂ                     :   3  Comedy|Drama        : 191  Woody Allen     :  22  Nicolas Cage     :  33  Brad Pitt      :  14
King KongÃÂ                :   3  Comedy|Drama|Romance: 187  Clint Eastwood  :  20  J.K. Simmons     :  31                 :  13
PanÃÂ                      :   3  Comedy|Romance      : 158  Martin Scorsese :  20  Bruce Willis     :  30  James Franco   :  11
The Fast and the FuriousÃÂ :   3  Drama|Romance       : 152  Ridley Scott    :  17  Denzel Washington:  30  Meryl Streep   :  11
(Other)                    :5025  (Other)             :3910  (Other)         :4834  (Other)          :4829  (Other)        :4959
actor3               length         budget             director_fb_likes  actor1_fb_likes  actor2_fb_likes  actor3_fb_likes  total_cast_likes
              :  23  Min.   :  7.0  Min.   :2.180e+02  Min.   :    0.0    Min.   :     0   Min.   :     0   Min.   :    0.0  Min.   :     0
Ben Mendelsohn:   8  1st Qu.: 93.0  1st Qu.:6.000e+06  1st Qu.:    7.0    1st Qu.:   614   1st Qu.:   281   1st Qu.:  133.0  1st Qu.:  1411
John Heard    :   8  Median :103.0  Median :2.000e+07  Median :   49.0    Median :   988   Median :   595   Median :  371.5  Median :  3090
Steve Coogan  :   8  Mean   :107.2  Mean   :3.975e+07  Mean   :  686.5    Mean   :  6560   Mean   :  1652   Mean   :  645.0  Mean   :  9699
Anne Hathaway :   7  3rd Qu.:118.0  3rd Qu.:4.500e+07  3rd Qu.:  194.5    3rd Qu.: 11000   3rd Qu.:   918   3rd Qu.:  636.0  3rd Qu.: 13756
Jon Gries     :   7  Max.   :511.0  Max.   :1.222e+10  Max.   :23000.0    Max.   :640000   Max.   :137000   Max.   :23000.0  Max.   :656730
(Other)       :4982  NA's   :15     NA's   :492        NA's   :104        NA's   :7        NA's   :13       NA's   :23
fb_likes        critic_reviews  users_reviews   users_votes      score          aspect_ratio   gross              year
Min.   :     0  Min.   :  1.0   Min.   :   1.0  Min.   :      5  Min.   :1.600  Min.   : 1.18  Min.   :      162  Min.   :1916
1st Qu.:     0  1st Qu.: 50.0   1st Qu.:  65.0  1st Qu.:   8594  1st Qu.:5.800  1st Qu.: 1.85  1st Qu.:  5340988  1st Qu.:1999
Median :   166  Median :110.0   Median : 156.0  Median :  34359  Median :6.600  Median : 2.35  Median : 25517500  Median :2005
Mean   :  7526  Mean   :140.2   Mean   : 272.8  Mean   :  83668  Mean   :6.442  Mean   : 2.22  Mean   : 48468408  Mean   :2002
3rd Qu.:  3000  3rd Qu.:195.0   3rd Qu.: 326.0  3rd Qu.:  96309  3rd Qu.:7.200  3rd Qu.: 2.35  3rd Qu.: 62309438  3rd Qu.:2011
Max.   :349000  Max.   :813.0   Max.   :5060.0  Max.   :1689764  Max.   :9.500  Max.   :16.00  Max.   :760505847  Max.   :2016
                NA's   :50      NA's   :21                                      NA's   :329    NA's   :884        NA's   :108
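The summary reports `NA's` in several numeric columns and a blank level in some name columns, such as the blank `director` entry counted 104 times. A small sketch of counting both kinds of gap per column follows. The data frame below is a made-up stand-in for `moviedata`.

```r
# Toy data frame standing in for moviedata (values are made up).
demo <- data.frame(director = c("", "Ridley Scott", "Woody Allen"),
                   budget = c(NA, 2e7, 4.5e7),
                   gross = c(NA, NA, 6e7),
                   stringsAsFactors = FALSE)

# Missing values per column.
na_counts <- colSums(is.na(demo))

# Blank strings are not NA, so count them separately.
blank_directors <- sum(demo$director == "")

na_counts
blank_directors
```

Counting blanks separately matters because `is.na()` does not flag an empty string, so a blank name would otherwise pass as present.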