Animated Movies

Introduction

The dataset I chose is Animated Movies, which covers a range of titles, from cartoons for kids to animated movies for teens and adults. The data was pulled from the IMDb website. I chose this dataset because it includes many movies I enjoyed watching as a child, such as The Lego Movie and Ratatouille. The dataset provides each movie's title, rating out of 10, total votes, total gross collection in millions of dollars, genre, certificate, and description. I cleaned the dataset by filtering out rows with NAs, dividing the metascore by 10 so that it is on the same scale as the ratings, dividing the votes by 50 to make the numbers smaller, and separating the genres so that each genre gets its own row.

Project 2

library(tidyverse) #setting libraries
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(highcharter)
## Warning: package 'highcharter' was built under R version 4.3.3
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo 
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
setwd("C:/Users/asman/Documents/data110")
animatedmovies <- read_csv("TopAnimatedImDb.csv")
## Rows: 85 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): Title, Gross, Genre, Certificate, Director, Description, Runtime
## dbl (3): Rating, Metascore, Year
## num (1): Votes
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(animatedmovies)
## # A tibble: 6 × 11
##   Title           Rating  Votes Gross Genre Metascore Certificate Director  Year
##   <chr>            <dbl>  <dbl> <chr> <chr>     <dbl> <chr>       <chr>    <dbl>
## 1 Sen to Chihiro…    8.6 7.47e5 $10.… Adve…        96 U           Hayao M…  2001
## 2 The Lion King      8.5 1.04e6 $422… Adve…        88 U           Roger A…  1994
## 3 Hotaru no haka     8.5 2.72e5 <NA>  Dram…        94 U           Isao Ta…  1988
## 4 Kimi no na wa.     8.4 2.60e5 $5.0… Dram…        79 U           Makoto …  2016
## 5 Spider-Man: In…    8.4 5.10e5 $190… Acti…        87 U           Bob Per…  2018
## 6 Coco               8.4 4.92e5 $209… Adve…        81 U           Lee Unk…  2017
## # ℹ 2 more variables: Description <chr>, Runtime <chr>

Cleaning Dataset

animatedmovies1 <- animatedmovies %>%
  select(Title, Rating, Votes, Gross, Genre, Metascore, Certificate, Year, Runtime, Director) %>%
  filter(!is.na(Rating)) %>% # remove rows with missing values
  filter(!is.na(Gross)) %>%
  filter(!is.na(Metascore)) %>%
  mutate(metascore = Metascore / 10) %>% # rescale Metascore (0-100) to the 0-10 scale of Rating
  mutate(votes = Votes / 50) %>% # scale votes down to smaller numbers
  separate_rows(Genre, sep = ", ") # one row per genre, so a movie with several genres appears several times
animatedmovies1$Gross <- as.numeric(gsub("\\$|M", "", animatedmovies1$Gross)) # strip "$" and "M", then convert Gross to numeric (millions)
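To see what the last two cleaning steps do, here is a minimal sketch on a made-up two-row tibble (the titles and values are hypothetical, not taken from the dataset). It illustrates that `separate_rows()` repeats a movie once per genre, so the cleaned data counts movie-genre pairs rather than movies.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data that mimics the Genre and Gross columns
toy <- tibble(
  Title = c("Movie A", "Movie B"),
  Genre = c("Adventure, Comedy, Family", "Drama"),
  Gross = c("$10.06M", "$422.78M")
)

toy_clean <- toy %>%
  separate_rows(Genre, sep = ", ") %>%                  # one row per movie-genre pair
  mutate(Gross = as.numeric(gsub("\\$|M", "", Gross)))  # "$10.06M" becomes 10.06

nrow(toy_clean)              # 4 rows: 3 for Movie A, 1 for Movie B
n_distinct(toy_clean$Title)  # still only 2 movies
```

Any summary computed on the cleaned data (counts, averages, model fits) therefore weights multi-genre movies more heavily unless the rows are collapsed back to one per title first.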

Linear Regression Analysis

linearmodel <- lm(Rating ~  metascore, data = animatedmovies1) #equation
summary(linearmodel)
## 
## Call:
## lm(formula = Rating ~ metascore, data = animatedmovies1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50167 -0.22872 -0.05576  0.19833  0.57217 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.73243    0.25962  25.932  < 2e-16 ***
## metascore    0.14413    0.03153   4.571 1.43e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2794 on 97 degrees of freedom
## Multiple R-squared:  0.1772, Adjusted R-squared:  0.1687 
## F-statistic: 20.89 on 1 and 97 DF,  p-value: 1.433e-05

The model has the equation: Rating = 0.14(metascore) + 6.73

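The p-value for metascore (1.43e-05) is marked with three asterisks, meaning it is below 0.001, which suggests that metascore is a meaningful predictor of Rating. However, the adjusted R-squared of 0.1687 means the model explains only about 17% of the variation in Rating; the remaining 83% or so is not explained by this model. Note also that the model was fit after the genres were separated into rows, so a movie with several genres is counted more than once (99 observations from an 85-row file), which likely overstates the strength of the evidence.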
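As a worked example of reading the equation, the sketch below plugs values into the fitted line using the coefficients copied from the summary output. The helper name `predict_rating` is made up for illustration.

```r
# Coefficients copied from the summary(linearmodel) output
intercept <- 6.73243
slope     <- 0.14413

# Hypothetical helper: predicted IMDb rating for a rescaled (0-10) metascore
predict_rating <- function(metascore) intercept + slope * metascore

predict_rating(8.0)                         # 6.73243 + 0.14413 * 8 = 7.88547
predict_rating(9.6) - predict_rating(8.6)   # one extra metascore point adds the slope, 0.14413

# In a one-predictor model, the correlation is the square root of Multiple R-squared
sqrt(0.1772)                                # about 0.42
```

A correlation of roughly 0.42 is consistent with the reading above: the relationship is positive but leaves most of the variation in Rating unexplained.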

Diagnostic Plot

linearplot <- ggplot(animatedmovies1, aes(x = metascore, y = Rating)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "lightblue") +  
  labs(x = "Metascore",
       y = "Rating",
       title = "Linear Regression: Metascore vs. Rating")+  # Axis labels and title
  theme_minimal()
linearplot
## `geom_smooth()` using formula = 'y ~ x'
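The plot above shows the fitted line over the raw points. A residuals-versus-fitted plot is a common complementary check of the linear-model assumptions. The sketch below uses simulated data (the numbers are made up for illustration and are not the IMDb data) so that it runs on its own; with the real data, `linearmodel` would take the place of `fit`.

```r
library(ggplot2)

set.seed(110)  # simulated data only, not the IMDb dataset
sim <- data.frame(metascore = runif(60, min = 5, max = 10))
sim$Rating <- 6.7 + 0.14 * sim$metascore + rnorm(60, sd = 0.28)

fit <- lm(Rating ~ metascore, data = sim)

# Collect fitted values and residuals from the model
diag_df <- data.frame(fitted = fitted(fit), residual = resid(fit))

residplot <- ggplot(diag_df, aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +  # reference line at zero residual
  labs(x = "Fitted rating", y = "Residual", title = "Residuals vs. Fitted") +
  theme_minimal()
residplot
```

If the linear model is adequate, the points should scatter evenly around the dashed line with no curve or funnel shape.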

color <- c("lightblue", "lavender", "lightpink","maroon","darkred", "darkblue","darkolivegreen", "lightyellow", "orange") # custom fill colors, one per genre bar (9 genres)

Exploring with simple plots

Genres

simpleplot1 <- #simple plot for genre
  ggplot(animatedmovies1, aes(x = Genre)) +
  geom_bar(fill = color)+
  theme_minimal()
simpleplot1

Exploring Gross vs Votes

simpleplot2 <- 
  ggplot(animatedmovies1, aes(x = votes, y = Gross)) +
  geom_point(color = "darkolivegreen4")+
  labs(x = "Votes (divided by 50)", y = "Gross ($M)", title = "Scatterplot of Gross vs. Votes") # x uses the rescaled votes column
simpleplot2

Final Visualization

colors1 <- c("#2D767F", "#F88180", "#A7ACEC", "#245843", "#FDF4A9", "#324F7B", "#61305D", "#EF7B3E", "#DCF516")
highchart() |>
  hc_add_series(data = animatedmovies1,
                   type = "scatter", hcaes(
                     x = metascore,
                   y = Votes,
                   group = Genre,
                   size = Gross)) |> # size of point is the gross $ collection 
  hc_xAxis(title = list(text="Metascore (0-10 scale)")) |>
  hc_yAxis(title = list(text="Number of Votes")) |>
  hc_title(text = "Animated Movies: Metascore vs. Votes by Genre") |>
  hc_caption(text = "Source: IMDb")|> #source
  hc_chart(backgroundColor = "#F3E8D2")|>
  hc_colors(colors1)

Reflection Essay

B.

Animation is the process of putting together individual illustrations so that inanimate objects appear to move. The idea is that if drawings of the successive stages of an action are shown in rapid succession, the human eye perceives them as continuous motion. According to Britannica, one of the first devices to display animation was the phenakistoscope, a spinning cardboard disk that created the illusion of movement when viewed in a mirror (Kehr).

Work Cited

Kehr, Dave. “Animation.” Encyclopedia Britannica, 24 Feb. 2024, https://www.britannica.com/art/animation. Accessed 14 Apr. 2024.

C.

My data visualization is a scatterplot that displays the relationship between the number of votes and the metascore. The points are colored by genre (action, adventure, biography, comedy, crime, drama, family, fantasy, and sci-fi) and sized by gross collection in millions of dollars. An interesting pattern I noticed is that movies with more votes tended to have higher gross collections, whereas there was no clear relationship between gross and metascore. I also noticed that adventure and comedy appeared to be the more popular genres.
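One way to put numbers on these visual impressions is a pair of correlations computed on one row per movie. The sketch below is only an illustration: `pattern_check` is a hypothetical helper, and the toy rows are made up (chosen so that the first correlation is 1 and the second is 0); the real call would be `pattern_check(animatedmovies1)`.

```r
library(dplyr)

# Hypothetical helper: correlations on one row per movie
# (expects columns Title, Votes, Gross, metascore, as in animatedmovies1)
pattern_check <- function(df) {
  movies <- distinct(df, Title, .keep_all = TRUE)  # drop the per-genre duplicate rows
  c(
    votes_gross     = cor(movies$Votes, movies$Gross),
    metascore_gross = cor(movies$metascore, movies$Gross)
  )
}

# Made-up rows to show the call; movie "A" appears twice, as it would after separate_rows()
toy <- tibble(
  Title     = c("A", "A", "B", "C", "D"),
  Votes     = c(100, 100, 200, 300, 400),
  Gross     = c(10, 10, 20, 30, 40),
  metascore = c(8, 8, 6, 9, 7)
)
pattern_check(toy)
```

A value near 1 for `votes_gross` and near 0 for `metascore_gross` on the real data would support the pattern described above.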