The dataset I chose is Animated Movies, which covers a range of titles, from cartoons for kids to animated films for teens and adults. The data was pulled from the IMDb website. I chose this dataset because it includes many movies I really enjoyed watching as a child, such as The Lego Movie and Ratatouille. For each movie, the dataset provides the title, the rating out of 10, total votes, total gross collection in millions of dollars, genre, certificate, and a description. I cleaned the dataset by filtering out rows with NAs in Rating, Gross, or Metascore, dividing the metascore by 10 so that it is on the same scale as the ratings, dividing the votes by 50 to make the numbers smaller, and separating the genres so that each genre of a movie gets its own row.
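To make the cleaning steps concrete, here is a minimal sketch on a small made-up table. The titles and values are hypothetical, not rows from the IMDb file, and the object names `toy` and `toy_clean` are only for illustration. It assumes the dplyr and tidyr packages are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy rows shaped like the dataset's columns (values are made up)
toy <- data.frame(
  Title     = c("Movie A", "Movie B", "Movie C"),
  Rating    = c(8.0, 7.5, 7.9),
  Votes     = c(100000, 50000, 20000),
  Gross     = c("$10.50M", NA, "$200.00M"),
  Genre     = c("Adventure, Comedy", "Drama", "Adventure, Comedy, Family"),
  Metascore = c(90, 80, 75)
)

toy_clean <- toy %>%
  filter(!is.na(Rating), !is.na(Gross), !is.na(Metascore)) %>% # drop rows with NAs
  mutate(
    metascore = Metascore / 10,                   # 0-100 scale down to 0-10
    votes     = Votes / 50,                       # smaller vote numbers
    Gross     = as.numeric(gsub("\\$|M", "", Gross)) # "$10.50M" becomes 10.5
  ) %>%
  separate_rows(Genre, sep = ", ")                # one row per movie-genre pair

print(toy_clean)
print(nrow(toy_clean)) # 5: Movie A twice, Movie C three times, Movie B dropped
```

Note that after `separate_rows()` a movie with several genres appears on several rows, so any later step that counts or models rows is working with movie-genre pairs rather than individual movies.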
library(tidyverse) #setting libraries
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(highcharter)
## Warning: package 'highcharter' was built under R version 4.3.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
setwd("C:/Users/asman/Documents/data110")
animatedmovies <- read_csv("TopAnimatedImDb.csv")
## Rows: 85 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): Title, Gross, Genre, Certificate, Director, Description, Runtime
## dbl (3): Rating, Metascore, Year
## num (1): Votes
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(animatedmovies)
## # A tibble: 6 × 11
## Title Rating Votes Gross Genre Metascore Certificate Director Year
## <chr> <dbl> <dbl> <chr> <chr> <dbl> <chr> <chr> <dbl>
## 1 Sen to Chihiro… 8.6 7.47e5 $10.… Adve… 96 U Hayao M… 2001
## 2 The Lion King 8.5 1.04e6 $422… Adve… 88 U Roger A… 1994
## 3 Hotaru no haka 8.5 2.72e5 <NA> Dram… 94 U Isao Ta… 1988
## 4 Kimi no na wa. 8.4 2.60e5 $5.0… Dram… 79 U Makoto … 2016
## 5 Spider-Man: In… 8.4 5.10e5 $190… Acti… 87 U Bob Per… 2018
## 6 Coco 8.4 4.92e5 $209… Adve… 81 U Lee Unk… 2017
## # ℹ 2 more variables: Description <chr>, Runtime <chr>
animatedmovies1 <- animatedmovies %>%
select(Title, Rating, Votes, Gross, Genre, Metascore, Certificate, Year, Runtime, Director) %>%
filter(!is.na(Rating)) %>% #remove NAs
filter(!is.na(Gross)) %>%
filter(!is.na(Metascore)) %>%
mutate(metascore = Metascore / 10) %>% #rescale Metascore (0-100) to the 0-10 scale of Rating
mutate(votes = Votes / 50) %>% #scale votes down to smaller numbers
separate_rows(Genre, sep = ", ") #one row per genre, so multi-genre movies repeat
animatedmovies1$Gross <- as.numeric(gsub("\\$|M", "", animatedmovies1$Gross)) #making Gross value numeric and removing $ and M
linearmodel <- lm(Rating ~ metascore, data = animatedmovies1) #equation
summary(linearmodel)
##
## Call:
## lm(formula = Rating ~ metascore, data = animatedmovies1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.50167 -0.22872 -0.05576 0.19833 0.57217
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.73243 0.25962 25.932 < 2e-16 ***
## metascore 0.14413 0.03153 4.571 1.43e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2794 on 97 degrees of freedom
## Multiple R-squared: 0.1772, Adjusted R-squared: 0.1687
## F-statistic: 20.89 on 1 and 97 DF, p-value: 1.433e-05
The model has the equation: Rating = 0.14(metascore) + 6.73
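The p-value for metascore is marked with three asterisks (p < 0.001), which suggests that metascore is a statistically significant predictor of Rating. However, the adjusted R-squared of 0.1687 means the model explains only about 17% of the variation in Rating. In other words, roughly 83% of the variation in the data is not explained by this model. Because the genres were separated into rows, movies with several genres appear more than once among the 99 observations, so the standard errors and the p-value are likely optimistic.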
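As a quick worked example, the numbers from the summary can be plugged in by hand. The sketch below copies the coefficients and R-squared from the output above. The helper name `predict_rating` and the input metascore of 8 are illustrative assumptions, not part of the original analysis.

```r
# Coefficients copied from the summary(linearmodel) output
intercept <- 6.73243
slope     <- 0.14413

# Hypothetical helper: predicted Rating for a metascore on the 0-10 scale
predict_rating <- function(metascore) intercept + slope * metascore

predicted <- predict_rating(8) # 6.73243 + 0.14413 * 8 = 7.88547
print(round(predicted, 2))     # 7.89

# Adjusted R-squared recomputed from R-squared for a one-predictor model.
# n = 99 rows (97 residual degrees of freedom + 2 estimated coefficients)
r_squared <- 0.1772
n <- 99
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - 2)
print(round(adj_r_squared, 4)) # 0.1687
```

So a movie with a metascore of 8 would be predicted to have a rating of about 7.9, and the recomputed adjusted R-squared matches the value reported by `summary()`.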
linearplot <- ggplot(animatedmovies1, aes(x = metascore, y = Rating)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "lightblue") +
labs(x = "Metascore",
y = "Rating",
title = "Linear Regression: Metascore vs. Rating")+ # Axis labels and title
theme_minimal()
linearplot
## `geom_smooth()` using formula = 'y ~ x'
color <- c("lightblue", "lavender", "lightpink","maroon","darkred", "darkblue","darkolivegreen", "lightyellow", "orange") #adding custom colors
simpleplot1 <- #simple plot for genre
ggplot(animatedmovies1, aes(x = Genre, fill = Genre)) +
geom_bar(show.legend = FALSE) +
scale_fill_manual(values = color) + #map the custom colors to the genres
theme_minimal()
simpleplot1
simpleplot2 <-
ggplot(animatedmovies1, aes(x = votes, y = Gross)) +
geom_point(color = "darkolivegreen4")+
labs(x = "Votes (divided by 50)", y = "Gross ($M)", title = "Scatterplot of Gross vs Votes")
simpleplot2
colors1 <- c("#2D767F", "#F88180", "#A7ACEC", "#245843", "#A64942", "#324F7B", "#61305D", "#EF7B3E", "#34222E")
highchart() |>
hc_add_series(data = animatedmovies1,
type = "scatter", hcaes(
x = metascore,
y = Votes,
group = Genre,
size = Gross)) |> # size of point is the gross $ collection
hc_xAxis(title = list(text="Metascore Rating")) |>
hc_yAxis(title = list(text="Number of Votes")) |>
hc_title(text = "Animated Movies: Metascore Vs Votes by Genre") |>
hc_caption(text = "Source: IMDb")|> #source
hc_chart(backgroundColor = "#F3E8D2")|>
hc_colors(colors1)
Animation is the process of showing individual illustrations in rapid succession so that inanimate objects appear to move. The idea is that when drawings of the successive stages of an action are shown quickly, the human eye perceives them as continuous motion. According to Britannica, one of the first devices to produce animation was the phenakistoscope, a spinning cardboard disk that created the illusion of movement when viewed in a mirror (Kehr).
Work Cited
Kehr, Dave. “Animation.” Encyclopedia Britannica, 24 Feb. 2024, https://www.britannica.com/art/animation. Accessed 14 Apr. 2024.
My data visualization is a scatter plot that displays the relationship between the metascore and the number of votes. The points are colored by genre (action, adventure, biography, comedy, crime, drama, family, fantasy, and sci-fi) and sized by gross collection in millions of dollars. An interesting pattern I noticed is that movies with more votes tended to have higher gross collections, whereas there was no clear relationship between gross and metascore. I also noticed that adventure and comedy seemed to be the most popular genres.
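One way to check visual patterns like these numerically is with correlation coefficients, computed on one row per movie so that multi-genre titles are not counted twice. The sketch below uses made-up values chosen only so the arithmetic is easy to verify. It says nothing about the real dataset, and the object names are hypothetical.

```r
# Made-up rows in the shape of the cleaned table: one row per movie-genre pair
toy <- data.frame(
  Title     = c("A", "A", "B", "C", "C", "D"),
  Votes     = c(100, 100, 200, 300, 300, 400),
  Gross     = c(10, 10, 20, 30, 30, 40),
  metascore = c(8, 8, 6, 9, 9, 7)
)

# Keep the first row for each title so every movie is counted once
movies <- toy[!duplicated(toy$Title), ]

# Pearson correlations: values near 1 or -1 indicate a strong linear relation,
# values near 0 indicate little or no linear relation
cor_votes_gross <- cor(movies$Votes, movies$Gross)
cor_meta_gross  <- cor(movies$metascore, movies$Gross)

print(cor_votes_gross) # these toy values are perfectly linear
print(cor_meta_gross)  # these toy values have no linear relation
```

Applied to the real table, the same two `cor()` calls on the deduplicated `animatedmovies1` columns would give a rough numeric counterpart to what the scatter plot shows.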