Choose one of David Robinson’s tidytuesday screencasts, watch the video, and summarise. https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ
You must follow the instructions below to get credits for this assignment.
The title of the Tidy Tuesday screencast that I picked for the assignment is “predicting horror movie ratings”.
This Tidy Tuesday was published on October 22, 2019.
Hint: What’s the source of the data; what does the row represent; how many observations?; what are the variables; and what do they mean?
The data comes from IMDb, by way of a data set posted on Kaggle. Each row represents one horror movie. The variables are the title of the movie, the genres, the release date, the country where it was released, the rating of the movie, the review rating, how long the movie is, the plot, the cast, the language of the film, and the filming location. The data set contains 3,328 observations.
Hint: For example, importing data, understanding the data, data exploration, etc.
Dave first opened the data and took a quick look at it before getting too deep into the analysis. He then wanted to know which horror movies were rated the highest on IMDb. Dave used the extract function to pull the year, which appears next to the name of each film in the title, into its own column so that it was easier to work with. He then started plotting the data so that it was easier to understand than a table of values. Dave spent the rest of the Tidy Tuesday making plots to figure out which horror movies had the highest ratings.
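As a rough sketch of the extract step described above, the code below pulls a four-digit year out of a title into a new column using extract from the tidyr package. The two movie titles are made up for illustration, and the new column name year and the exact pattern are my assumptions, not necessarily what Dave typed in the video.

```r
library(tidyr)

# Made-up rows for illustration; the real data has many more rows and columns.
horror_movies <- data.frame(
  title = c("Example Movie (2012)", "Another Film (2017)"),
  stringsAsFactors = FALSE
)

horror_movies <- extract(
  horror_movies,
  title,               # column to read from
  "year",              # name of the new column (assumed name)
  "\\((\\d{4})\\)$",   # capture four digits inside the parentheses at the end
  remove = FALSE,      # keep the original title column
  convert = TRUE       # turn the captured text into a number
)

# Typing the name on its own line prints the result.
horror_movies
```

Setting remove = FALSE keeps the full title, so the year is added as an extra column instead of replacing the original one.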
One thing that I noticed that we learned in class was the ggplot function. Dave used it so that he could look at the horror movie ratings in a plot rather than just a list. We learned how to use ggplot to plot the data needed for our assignments. Another thing that I noticed that we learned in class is that an object has to be typed again on its own line in order to be shown. For example, after Dave saved his changes to horror_movies, he needed to type horror_movies again at the bottom so that he could actually see his data.
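To illustrate both points, here is a small sketch of a ggplot histogram that is first saved to a name and then shown by typing the name again. The ratings are made up, and the column name review_rating and the object name rating_plot are assumptions used only for this example.

```r
library(ggplot2)

# Made-up ratings for illustration only.
horror_movies <- data.frame(
  review_rating = c(3.2, 4.5, 5.1, 5.8, 6.0, 6.4, 7.3)
)

# Saving the plot to a name stores it without displaying anything.
rating_plot <- ggplot(horror_movies, aes(x = review_rating)) +
  geom_histogram(binwidth = 1)

# Typing the name on its own line is what displays the plot.
rating_plot
```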
Dave found that budgets and ratings are not correlated, so high-budget movies do not tend to get higher ratings. He also found which terms were used the most in horror movie titles, which helped him understand what kinds of titles are most common among horror films. Another major finding was how many horror movies were listed under each movie rating, such as R, PG-13, Unrated, and Not Rated.
What I found the most interesting was when he looked at the other genres that horror movies were labeled with. I thought this was strange at first, since horror is a genre itself. After looking at the data, I understood that comedy and thriller are quite common alongside horror. I also wasn’t shocked to find how many comedic horror movies there are out there.