Choose one of David Robinson’s tidytuesday screencasts, watch the video, and summarise. https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ
You must follow the instructions below to get credits for this assignment.
Tidy Tuesday screencast: predicting horror movie ratings
October 22, 2019
Hint: What is the source of the data? What does each row represent? How many observations are there? What are the variables, and what do they mean?
The data comes from IMDb by way of Kaggle. Each row represents one horror movie and holds its information, such as the title, the release date, and the genre. There are 12 variables: title, genres, release date, release country, movie rating, review rating, movie run time, plot, cast, language, filming locations, and budget.
Hint: For example, importing data, understanding the data, data exploration, etc.
The first step was importing the data and then googling which variables were important. For example, the release date was not needed, so he dropped it. Next, he made graphs to compare variables and look for correlations, such as between budget and movie rating. After finding the variables that actually affected the horror movie rating, the last step was to make a graph showing how those variables affect the rating.
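The drop-a-column and compare-two-variables steps can be sketched in a few lines. This is not the code from the video; it is a minimal Python illustration under my own assumptions. The rows, the column names (release_date, budget, review_rating), and the helper functions drop_columns and pearson are all hypothetical, and the numbers are placeholders rather than real IMDb data.

```python
# Hypothetical rows: placeholder values, not real IMDb data.
movies = [
    {"title": "Movie A", "release_date": "1-Jan-15", "budget": 1_000_000, "review_rating": 5.0},
    {"title": "Movie B", "release_date": "2-Feb-16", "budget": 5_000_000, "review_rating": 6.5},
    {"title": "Movie C", "release_date": "3-Mar-17", "budget": 20_000_000, "review_rating": 4.0},
    {"title": "Movie D", "release_date": "4-Apr-17", "budget": 50_000, "review_rating": 6.0},
]


def drop_columns(rows, unwanted):
    """Return copies of the rows without the unwanted columns."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Step 1: remove a variable that is not needed (assumed here: release_date).
cleaned = drop_columns(movies, {"release_date"})

# Step 2: check how strongly two variables move together.
budgets = [m["budget"] for m in cleaned]
ratings = [m["review_rating"] for m in cleaned]
r = pearson(budgets, ratings)
print(f"correlation between budget and rating: {r:.2f}")
```

A correlation near 0 would match the idea that budget tells us little about the rating, while a value near 1 or -1 would suggest a strong relationship.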
In the video, he used box plots, as we did in class. Other than that, everything he did in the video was new to me.
The major finding of the analysis is that certain words, whether in the genre, the plot, the release country, or sometimes the cast, are predictors of horror movie ratings.
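One simple way to see how a word could act as a predictor is to compare the average rating of movies whose plot contains that word with the overall average. The sketch below is a simplified stand-in and not the model from the video. The plots, the ratings, the review_rating field name, and the mean_rating_by_word helper are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical plots and ratings: placeholders, not real data.
movies = [
    {"plot": "a family moves into a haunted house", "review_rating": 6.0},
    {"plot": "friends find a haunted cabin", "review_rating": 5.0},
    {"plot": "a shark attacks friends", "review_rating": 3.0},
    {"plot": "a family fights a shark", "review_rating": 4.0},
]


def mean_rating_by_word(rows, min_movies=2):
    """Average rating of the movies whose plot contains each word.

    Words that appear in fewer than min_movies plots are skipped,
    because one movie is too little evidence.
    """
    ratings_by_word = defaultdict(list)
    for row in rows:
        # set() counts each word once per movie, even if it repeats in the plot.
        for word in set(row["plot"].lower().split()):
            ratings_by_word[word].append(row["review_rating"])
    return {
        word: sum(ratings) / len(ratings)
        for word, ratings in ratings_by_word.items()
        if len(ratings) >= min_movies
    }


overall = sum(m["review_rating"] for m in movies) / len(movies)
word_means = mean_rating_by_word(movies)

# List words from most negative to most positive difference from the overall mean.
for word, mean in sorted(word_means.items(), key=lambda kv: kv[1]):
    print(f"{word}: {mean - overall:+.2f}")
```

Words with a large positive or negative difference are the ones that would be worth a closer look as predictors. A real analysis would also need far more movies per word than this toy example has.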
It was really interesting to see how some variables had a big effect on the movie rating while others had very little effect. For example, I expected budget to have a big effect on the ratings, with smaller-budget movies getting lower ratings, but it turned out that budget had no effect.