1 Project overview

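This project uses the Pixar Films dataset from the TidyTuesday project for the week of 11 March 2025. The analysis combines two publicly available datasets: pixar_films, which holds film-level metadata (release date, runtime, and film rating), and public_response, which holds critic and audience scores (Rotten Tomatoes, Metacritic, CinemaScore, and Critics Choice).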

The main goal is to understand how Pixar films differ across time, rating categories, runtime, and public/critical reception. The project does not try to prove causality. Instead, it uses descriptive analysis, data transformation, and visualizations to identify patterns.

2 Research questions

The analysis is guided by the following questions:

  1. How has Pixar’s film output changed over time?
  2. Are longer Pixar films rated better, worse, or similarly to shorter films?
  3. Do audience-oriented and critic-oriented rating measures tell the same story?
  4. Which films perform strongest across multiple rating systems?
  5. Are some film rating categories associated with different runtime or score patterns?
  6. Which films have the largest gap between Rotten Tomatoes and Metacritic scores?

3 Required packages

library(data.table)
library(ggplot2)
library(readr)
library(dplyr)
library(tidyr)
library(stringr)
library(forcats)
library(RColorBrewer)
library(scales)
library(knitr)

4 Loading the TidyTuesday data

The data is read directly from the official TidyTuesday GitHub repository.

pixar_films <- read_csv(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-03-11/pixar_films.csv"
)

public_response <- read_csv(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-03-11/public_response.csv"
)

head(pixar_films)
head(public_response)

5 Data structure

str(pixar_films)
## spc_tbl_ [27 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ number      : num [1:27] 1 2 3 4 5 6 7 8 9 10 ...
##  $ film        : chr [1:27] "Toy Story" "A Bug's Life" "Toy Story 2" "Monsters, Inc." ...
##  $ release_date: Date[1:27], format: "1995-11-22" "1998-11-25" ...
##  $ run_time    : num [1:27] 81 95 92 92 100 115 117 111 98 96 ...
##  $ film_rating : chr [1:27] "G" "G" "G" "G" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   number = col_double(),
##   ..   film = col_character(),
##   ..   release_date = col_date(format = ""),
##   ..   run_time = col_double(),
##   ..   film_rating = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
str(public_response)
## spc_tbl_ [24 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ film           : chr [1:24] "Toy Story" "A Bug's Life" "Toy Story 2" "Monsters, Inc." ...
##  $ rotten_tomatoes: num [1:24] 100 92 100 96 99 97 74 96 95 98 ...
##  $ metacritic     : num [1:24] 95 77 88 79 90 90 73 96 95 88 ...
##  $ cinema_score   : chr [1:24] "A" "A" "A+" "A+" ...
##  $ critics_choice : num [1:24] NA NA 100 92 97 88 89 91 90 95 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   film = col_character(),
##   ..   rotten_tomatoes = col_double(),
##   ..   metacritic = col_double(),
##   ..   cinema_score = col_character(),
##   ..   critics_choice = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

The first dataset contains the film-level information, while the second dataset contains public and critic response variables. The shared variable is film, which makes it possible to merge the two datasets.

6 Converting to data.table

pixar_dt <- as.data.table(pixar_films)
response_dt <- as.data.table(public_response)

class(pixar_dt)
## [1] "data.table" "data.frame"
class(response_dt)
## [1] "data.table" "data.frame"

7 Extra work 1: Data quality checks

Before analysis, I check the number of rows, columns, duplicate films, and missing values.

data_quality <- data.table(
  dataset = c("pixar_films", "public_response"),
  rows = c(nrow(pixar_dt), nrow(response_dt)),
  columns = c(ncol(pixar_dt), ncol(response_dt)),
  duplicate_films = c(
    pixar_dt[, sum(duplicated(film))],
    response_dt[, sum(duplicated(film))]
  )
)

data_quality
missing_pixar <- pixar_dt[, lapply(.SD, function(x) sum(is.na(x)))]
missing_response <- response_dt[, lapply(.SD, function(x) sum(is.na(x)))]

missing_pixar
missing_response

This step is useful because missing values and duplicate rows can affect summaries and plots. There are no duplicate film titles in the main film-level files, so the merge can be done safely at the film level.

8 Merging datasets

This is one of the extra requirements for an A grade. I merge the film metadata with the public response data using the shared film column.

pixar_merged <- merge(
  pixar_dt,
  response_dt,
  by = "film",
  all.x = TRUE
)

head(pixar_merged)
dim(pixar_merged)
## [1] 27  9
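
To make the effect of all.x = TRUE concrete, here is a minimal sketch on toy data. The two small tables and their values are hypothetical and are not taken from the TidyTuesday files. A left join keeps every row of the first table and fills the response columns with NA where no match exists, which is why the merged table above keeps all 27 films even though public_response has only 24 rows.

```r
library(data.table)

# Hypothetical toy tables; names and values are illustrative only
films  <- data.table(film = c("A", "B", "C"), run_time = c(90, 100, 110))
scores <- data.table(film = c("A", "B"), metacritic = c(80, 70))

# Left join: every row of `films` is kept, unmatched rows get NA scores
left_joined <- merge(films, scores, by = "film", all.x = TRUE)

# Anti-join: list the films that have no row in `scores`
unmatched <- films[!scores, on = "film", film]

left_joined
unmatched
```

Running the anti-join on the real tables before merging would show which films are about to receive NA scores.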

9 Extra work 2: Feature engineering

I add several new variables to make the analysis more meaningful:

pixar_merged[, release_year := as.integer(format(release_date, "%Y"))]

pixar_merged[, decade := paste0(floor(release_year / 10) * 10, "s")]

pixar_merged[, runtime_group := fifelse(
  run_time < 90, "Shorter than 90 min",
  fifelse(run_time <= 105, "90-105 min", "Longer than 105 min")
)]

pixar_merged[, rating_group := fifelse(
  film_rating %in% c("G"), "G",
  fifelse(film_rating %in% c("PG"), "PG", "Other/Unknown")
)]

cinema_lookup <- data.table(
  cinema_score = c("A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D", "F"),
  cinema_score_numeric = c(100, 95, 90, 87, 83, 80, 77, 73, 70, 60, 40)
)

pixar_merged <- merge(
  pixar_merged,
  cinema_lookup,
  by = "cinema_score",
  all.x = TRUE
)

pixar_merged[, average_score := rowMeans(
  .SD,
  na.rm = TRUE
),
.SDcols = c("rotten_tomatoes", "metacritic", "critics_choice")]

pixar_merged[, rt_metacritic_gap := rotten_tomatoes - metacritic]

pixar_merged[, release_period := fifelse(
  release_year < 2010, "Before 2010",
  fifelse(release_year < 2020, "2010-2019", "2020 onwards")
)]

head(pixar_merged)
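
Two details of the feature engineering are easy to miss, so the sketch below illustrates them on hypothetical toy data (the films "X", "Y", "Z" and the helper column scores_available are made up for this example). First, rowMeans() with na.rm = TRUE averages whatever scores are present, and returns NaN when all of them are missing. Second, data.table's fcase() is one possible alternative to nested fifelse() calls for the runtime groups; this is a suggestion, not what the code above uses.

```r
library(data.table)

# Hypothetical toy data with different patterns of missing scores
toy <- data.table(
  film            = c("X", "Y", "Z"),
  rotten_tomatoes = c(90, 80, NA),
  metacritic      = c(80, NA, NA),
  critics_choice  = c(70, 60, NA),
  run_time        = c(85, 100, 120)
)

score_cols <- c("rotten_tomatoes", "metacritic", "critics_choice")

# Average of the available scores; a row with no scores at all becomes NaN
toy[, average_score := rowMeans(.SD, na.rm = TRUE), .SDcols = score_cols]

# Count how many scores each average is actually based on
toy[, scores_available := rowSums(!is.na(.SD)), .SDcols = score_cols]

# Same three runtime groups as above, written with fcase() instead of nested fifelse()
toy[, runtime_group := fcase(
  run_time < 90,   "Shorter than 90 min",
  run_time <= 105, "90-105 min",
  run_time > 105,  "Longer than 105 min"
)]

toy
```

Keeping a count such as scores_available next to the average makes it visible when a combined score rests on fewer than three rating systems.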

10 Extra work 3: A clean analysis dataset

I keep only complete rows for the variables used most often in the analysis. I also arrange the films by release order.

analysis_dt <- pixar_merged[
  !is.na(rotten_tomatoes) &
    !is.na(metacritic) &
    !is.na(critics_choice) &
    !is.na(run_time)
][order(number)]

analysis_dt[, .(
  film,
  release_year,
  run_time,
  film_rating,
  rotten_tomatoes,
  metacritic,
  critics_choice,
  cinema_score,
  average_score
)]

11 Required item: Filtering rows with data.table

The following table filters for Pixar films released from 2010 onwards with Rotten Tomatoes scores of 90 or higher.

high_rt_recent <- analysis_dt[
  release_year >= 2010 & rotten_tomatoes >= 90,
  .(film, release_year, film_rating, run_time, rotten_tomatoes, metacritic, critics_choice, average_score)
][order(-rotten_tomatoes)]

high_rt_recent

This filter shows which newer Pixar films had especially strong Rotten Tomatoes results. Filtering is useful because it lets us focus on a specific part of the dataset instead of looking at all films at once.

12 Required item: Aggregating data with data.table

12.1 Aggregation by film rating

rating_summary <- analysis_dt[
  ,
  .(
    films = .N,
    average_runtime = round(mean(run_time, na.rm = TRUE), 1),
    average_rotten_tomatoes = round(mean(rotten_tomatoes, na.rm = TRUE), 1),
    average_metacritic = round(mean(metacritic, na.rm = TRUE), 1),
    average_critics_choice = round(mean(critics_choice, na.rm = TRUE), 1),
    average_combined_score = round(mean(average_score, na.rm = TRUE), 1)
  ),
  by = film_rating
][order(-average_combined_score)]

rating_summary

12.2 Aggregation by release period

period_summary <- analysis_dt[
  ,
  .(
    films = .N,
    average_runtime = round(mean(run_time, na.rm = TRUE), 1),
    median_runtime = median(run_time, na.rm = TRUE),
    average_score = round(mean(average_score, na.rm = TRUE), 1),
    highest_score = round(max(average_score, na.rm = TRUE), 1),
    lowest_score = round(min(average_score, na.rm = TRUE), 1)
  ),
  by = release_period
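  # Sort periods chronologically; sorting the text labels directly would be alphabetical
][order(match(release_period, c("Before 2010", "2010-2019", "2020 onwards")))]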

period_summary

These aggregated tables provide a compact summary of the dataset. They make it easier to compare groups instead of interpreting every film separately.
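
Because release_period is stored as text, it sorts alphabetically by default, which places "Before 2010" after the later periods in tables and plot legends. A minimal sketch of one way to impose chronological order is to convert the column to a factor with explicit levels. The toy values below are hypothetical.

```r
library(data.table)

# Hypothetical toy data; scores are illustrative only
toy <- data.table(
  release_period = c("Before 2010", "2010-2019", "2020 onwards", "Before 2010"),
  average_score  = c(90, 80, 70, 94)
)

# Explicit chronological order of the three periods
period_levels <- c("Before 2010", "2010-2019", "2020 onwards")
toy[, release_period := factor(release_period, levels = period_levels)]

# keyby sorts the result by factor level, so groups appear chronologically
toy_summary <- toy[
  ,
  .(films = .N, mean_score = mean(average_score)),
  keyby = release_period
]

toy_summary
```

The same factor would also control the legend order in ggplot2, since discrete scales follow factor levels.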

13 Extra work 4: Ranking films across multiple rating systems

A single rating system may not tell the full story. For this reason, I rank films using a combined score based on Rotten Tomatoes, Metacritic, and Critics Choice.

top_films <- analysis_dt[
  order(-average_score),
  .(
    rank = seq_len(.N),
    film,
    release_year,
    film_rating,
    run_time,
    rotten_tomatoes,
    metacritic,
    critics_choice,
    average_score = round(average_score, 1)
  )
][1:10]

top_films

14 Extra work 5: Correlation table

This table checks whether the numerical rating variables tend to move together.

score_vars <- analysis_dt[, .(
  rotten_tomatoes,
  metacritic,
  critics_choice,
  run_time,
  cinema_score_numeric
)]

correlation_table <- round(cor(score_vars, use = "pairwise.complete.obs"), 2)

correlation_table
##                      rotten_tomatoes metacritic critics_choice run_time
## rotten_tomatoes                 1.00       0.80           0.85    -0.15
## metacritic                      0.80       1.00           0.86     0.00
## critics_choice                  0.85       0.86           1.00    -0.13
## run_time                       -0.15       0.00          -0.13     1.00
## cinema_score_numeric            0.62       0.53           0.58     0.00
##                      cinema_score_numeric
## rotten_tomatoes                      0.62
## metacritic                           0.53
## critics_choice                       0.58
## run_time                             0.00
## cinema_score_numeric                 1.00

The correlation table helps compare the rating measures. Stronger positive correlations suggest that two measures often increase together, while weaker correlations suggest that they capture somewhat different aspects of film reception.
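
One caveat: with use = "pairwise.complete.obs", each cell of the correlation table is computed from the films that have both of the two variables, so different cells can rest on different sets of films. This matters here because cinema_score_numeric is missing for at least one film (Soul has no CinemaScore in the final table). The sketch below shows the difference on hypothetical toy data; the variables a, b, and c are made up.

```r
# Hypothetical toy data: column c is missing in the last row,
# and the last row of b is an outlier
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 4, 6, 8, 100),
  c = c(1, 2, 3, 4, NA)
)

# Pairwise: each pair of columns uses all rows where both are present
pairwise <- cor(toy, use = "pairwise.complete.obs")

# Complete: only rows with no missing value in any column are used
complete <- cor(toy, use = "complete.obs")

# Number of rows behind each pairwise correlation
present <- !is.na(as.matrix(toy))
n_pairs <- crossprod(present + 0)

round(pairwise, 2)
round(complete, 2)
n_pairs
```

In the toy data, a and b are perfectly correlated on the four complete rows, but the pairwise version also uses the fifth row and is pulled below 1 by the outlier. Reporting the pairwise counts next to the correlations is one way to keep such differences visible.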

15 Visualization theme and palette

For the visualizations, I use a consistent minimal theme and ColorBrewer palettes.

main_palette <- brewer.pal(8, "Set2")
sequential_palette <- brewer.pal(9, "YlGnBu")

theme_project <- theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(size = 11),
    axis.title = element_text(face = "bold"),
    legend.position = "bottom"
  )

16 Plot 1: Pixar film releases over time

ggplot(analysis_dt, aes(x = release_year)) +
  geom_histogram(binwidth = 5, fill = main_palette[1], color = "white") +
  scale_x_continuous(breaks = pretty_breaks()) +
  labs(
    title = "Pixar film releases by year",
    subtitle = "Films are grouped into five-year intervals",
    x = "Release year",
    y = "Number of films"
  ) +
  theme_project

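The plot shows how the films in analysis_dt are distributed across time. Because analysis_dt keeps only films with complete scores, the counts can be lower than Pixar's full release list. The plot still provides background for the rest of the analysis, because later patterns may partly reflect changes in the number of films released during different periods.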

17 Plot 2: Runtime trend across release order

ggplot(analysis_dt, aes(x = number, y = run_time)) +
  geom_line(color = "grey60", linewidth = 0.8) +
  geom_point(aes(color = film_rating), size = 3) +
  geom_smooth(method = "loess", se = FALSE, color = "black", linewidth = 0.9) +
  scale_color_brewer(palette = "Set2") +
  labs(
    title = "Runtime trend across Pixar release order",
    subtitle = "The black smooth line shows the general runtime trend",
    x = "Release order",
    y = "Runtime in minutes",
    color = "Film rating"
  ) +
  theme_project

This plot uses multiple geom layers: line, points, and a smoothed trend line. It helps show whether Pixar films became longer or shorter across time.

18 Plot 3: Runtime distribution by film rating

ggplot(analysis_dt, aes(x = film_rating, y = run_time, fill = film_rating)) +
  geom_boxplot(alpha = 0.75, outlier.shape = NA) +
  # height = 0 keeps each dot at its true runtime; only horizontal jitter is applied
  geom_jitter(width = 0.15, height = 0, alpha = 0.65, size = 2) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Runtime distribution by film rating",
    subtitle = "Each dot represents one film",
    x = "Film rating",
    y = "Runtime in minutes",
    fill = "Film rating"
  ) +
  theme_project

The boxplot compares runtimes across rating groups. The added jittered points show individual films, making the distribution more transparent.

19 Plot 4: Rotten Tomatoes vs Metacritic

ggplot(analysis_dt, aes(x = metacritic, y = rotten_tomatoes)) +
  geom_point(aes(size = run_time, color = film_rating), alpha = 0.8) +
  geom_smooth(method = "lm", se = FALSE, color = "black", linewidth = 0.8) +
  scale_color_brewer(palette = "Dark2") +
  labs(
    title = "Relationship between Metacritic and Rotten Tomatoes scores",
    subtitle = "Point size represents runtime",
    x = "Metacritic score",
    y = "Rotten Tomatoes score",
    color = "Film rating",
    size = "Runtime"
  ) +
  theme_project

This plot compares two rating systems. If the points follow the fitted line closely, it suggests the two measures tell a similar story. If some points are far from the line, those films may be rated differently by the two systems.
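
One way to make "far from the line" concrete is to look at the residuals of the same kind of linear fit that geom_smooth(method = "lm") draws. The sketch below uses five rows copied from the final table in section 27 purely for illustration, so its fitted line differs from the one in the plot, which is based on all films in analysis_dt.

```r
library(data.table)

# Five films copied from the final table in this report (a subset, for illustration only)
subset_dt <- data.table(
  film            = c("Toy Story 2", "Ratatouille", "Onward", "Cars 2", "Cars 3"),
  metacritic      = c(88, 96, 61, 57, 59),
  rotten_tomatoes = c(100, 96, 88, 40, 69)
)

# Same model form as the fitted line in the plot
fit <- lm(rotten_tomatoes ~ metacritic, data = subset_dt)

# Residual = observed Rotten Tomatoes score minus the value predicted from Metacritic
subset_dt[, residual := as.numeric(residuals(fit))]

# Films furthest from the line come first
ranked <- subset_dt[order(-abs(residual))]

ranked
```

A positive residual means the film's Rotten Tomatoes score is higher than its Metacritic score would predict, and a negative residual means it is lower.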

20 Plot 5: Combined score over release order

ggplot(analysis_dt, aes(x = number, y = average_score)) +
  geom_line(color = "grey50", linewidth = 0.8) +
  geom_point(aes(color = release_period), size = 3) +
  geom_hline(yintercept = mean(analysis_dt$average_score, na.rm = TRUE), linetype = "dashed") +
  scale_color_brewer(palette = "Set1") +
  labs(
    title = "Combined rating score across Pixar films",
    subtitle = "Dashed line shows the overall average combined score",
    x = "Release order",
    y = "Average score across three rating systems",
    color = "Release period"
  ) +
  theme_project

The combined score reduces dependence on one rating system. This makes it easier to identify films that perform strongly across different forms of evaluation.

21 Plot 6: Top 10 films by combined score

top_10_plot <- analysis_dt[
  order(-average_score)
][1:10]

ggplot(top_10_plot, aes(x = fct_reorder(film, average_score), y = average_score, fill = film_rating)) +
  geom_col() +
  coord_flip() +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Top 10 Pixar films by combined score",
    subtitle = "Combined score averages Rotten Tomatoes, Metacritic, and Critics Choice",
    x = "Film",
    y = "Combined score",
    fill = "Film rating"
  ) +
  theme_project

This plot provides a clear ranking of the strongest films based on multiple rating measures. It is more balanced than using only one score.

22 Plot 7: Rating system heatmap for top films

top_heatmap <- analysis_dt[
  order(-average_score)
][1:12, .(
  film,
  Rotten_Tomatoes = rotten_tomatoes,
  Metacritic = metacritic,
  Critics_Choice = critics_choice
)]

top_heatmap_long <- melt(
  top_heatmap,
  id.vars = "film",
  variable.name = "rating_system",
  value.name = "score"
)

ggplot(top_heatmap_long, aes(x = rating_system, y = fct_reorder(film, score), fill = score)) +
  geom_tile(color = "white") +
  scale_fill_distiller(palette = "YlGnBu", direction = 1) +
  labs(
    title = "Heatmap of rating scores for top Pixar films",
    subtitle = "Darker cells indicate higher scores",
    x = "Rating system",
    y = "Film",
    fill = "Score"
  ) +
  theme_project

The heatmap makes it easy to compare the same films across several rating systems. This is useful because a film may perform very well in one system but less strongly in another.

23 Plot 8: Score gap between Rotten Tomatoes and Metacritic

gap_plot <- analysis_dt[
  order(abs(rt_metacritic_gap), decreasing = TRUE)
][1:12]

ggplot(gap_plot, aes(x = fct_reorder(film, rt_metacritic_gap), y = rt_metacritic_gap, fill = rt_metacritic_gap > 0)) +
  geom_col() +
  coord_flip() +
  scale_fill_brewer(palette = "Paired", labels = c("Metacritic higher", "Rotten Tomatoes higher")) +
  labs(
    title = "Largest gaps between Rotten Tomatoes and Metacritic",
    subtitle = "Positive values mean Rotten Tomatoes is higher than Metacritic",
    x = "Film",
    y = "Rotten Tomatoes minus Metacritic",
    fill = "Direction of gap"
  ) +
  theme_project

This plot identifies films where the two rating systems disagree most. A large positive gap means Rotten Tomatoes is higher than Metacritic, while a negative gap means Metacritic is higher.
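
The gap logic can be checked by hand on a few rows. The sketch below uses three films copied from the final table in section 27 and labels the direction of each gap explicitly. The direction column and its "No gap" label are additions for this example, not part of the analysis code above. Note that the plot's condition rt_metacritic_gap > 0 would group a gap of exactly zero with "Metacritic higher".

```r
library(data.table)

# Three films copied from the final table in this report
gaps <- data.table(
  film            = c("Ratatouille", "Cars 2", "Onward"),
  rotten_tomatoes = c(96, 40, 88),
  metacritic      = c(96, 57, 61)
)

# Positive gap: Rotten Tomatoes is higher; negative gap: Metacritic is higher
gaps[, rt_metacritic_gap := rotten_tomatoes - metacritic]

# Label the direction, treating a zero gap as its own category
gaps[, direction := fcase(
  rt_metacritic_gap > 0, "Rotten Tomatoes higher",
  rt_metacritic_gap < 0, "Metacritic higher",
  default = "No gap"
)]

# Largest disagreement first, regardless of sign
gaps <- gaps[order(-abs(rt_metacritic_gap))]

gaps
```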

24 Plot 9: Average score by runtime group and release period

runtime_period_summary <- analysis_dt[
  ,
  .(
    films = .N,
    average_score = mean(average_score, na.rm = TRUE)
  ),
  by = .(runtime_group, release_period)
]

ggplot(runtime_period_summary, aes(x = runtime_group, y = average_score, fill = release_period)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set3") +
  labs(
    title = "Average combined score by runtime group and release period",
    subtitle = "Comparing runtime categories across different periods",
    x = "Runtime group",
    y = "Average combined score",
    fill = "Release period"
  ) +
  theme_project

This plot adds a more detailed comparison by combining two grouping variables. It checks whether the relationship between runtime and scores looks different across release periods.

25 Plot 10: Missing data visualization

missing_long <- pixar_merged[, lapply(.SD, function(x) sum(is.na(x)))]
missing_long <- melt(
  missing_long,
  measure.vars = names(missing_long),
  variable.name = "variable",
  value.name = "missing_values"
)

ggplot(missing_long, aes(x = fct_reorder(variable, missing_values), y = missing_values)) +
  geom_col(fill = main_palette[3]) +
  coord_flip() +
  labs(
    title = "Missing values by variable",
    subtitle = "Checking data completeness after merging",
    x = "Variable",
    y = "Number of missing values"
  ) +
  theme_project

This plot is an extra diagnostic step. It shows whether some variables are less complete than others, which is important before interpreting results.

26 Extra work 6: Simple regression model

As an additional step, I estimate a simple linear model predicting the combined score from runtime, release year, and film rating. This is not meant to prove causality. It is only used as an exploratory model.

model_dt <- analysis_dt[
  !is.na(average_score) &
    !is.na(run_time) &
    !is.na(release_year) &
    !is.na(film_rating)
]

score_model <- lm(
  average_score ~ run_time + release_year + film_rating,
  data = model_dt
)

summary(score_model)
## 
## Call:
## lm(formula = average_score ~ run_time + release_year + film_rating, 
##     data = model_dt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -27.432  -6.031   3.040   8.163  15.351 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   1677.93483  918.38691   1.827   0.0853 .
## run_time        -0.08804    0.32345  -0.272   0.7888  
## release_year    -0.78891    0.45858  -1.720   0.1035  
## film_ratingPG    6.06937    5.72659   1.060   0.3040  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.41 on 17 degrees of freedom
## Multiple R-squared:  0.1598, Adjusted R-squared:  0.01153 
## F-statistic: 1.078 on 3 and 17 DF,  p-value: 0.3849

26.1 Interpretation of the regression model

The regression model examines whether a film’s run_time, release_year, and film_rating help explain differences in the average_score.

Overall, the model has limited explanatory power. The Multiple R-squared value is 0.1598, meaning that the variables included in the model explain only about 16% of the variation in the average score. The adjusted R-squared is much lower, at 0.0115, which suggests that after adjusting for the number of predictors, the model explains almost none of the variation in the outcome. This means that other factors not included in the model are likely much more important in explaining why some Pixar films receive higher or lower average scores.

The overall model is also not statistically significant. The F-statistic has a p-value of 0.3849, which is above the usual 0.05 threshold. This means that, taken together, run_time, release_year, and film_rating do not provide strong statistical evidence for predicting the average score in this dataset.

Looking at the individual coefficients, none of the predictors is statistically significant at the 5% level. run_time has a very small negative coefficient (about -0.09 points per additional minute), suggesting that longer films are associated with slightly lower average scores, but this relationship is weak and not significant. release_year also has a negative coefficient (about -0.79 points per year), suggesting that more recent films tend to have lower average scores in this sample, but again the result is not statistically significant (p = 0.10). The film_ratingPG coefficient is positive (about 6.1 points), suggesting that PG-rated films may have higher average scores than G-rated films, the reference category, once runtime and release year are held constant, but this effect is also not statistically significant.

Therefore, the model should be interpreted as exploratory rather than conclusive. However, including the model is still useful because it adds an analytical layer beyond the visualizations. Its weak fit is consistent with the idea that film reception depends on factors not measured here, such as story quality, cultural context, franchise popularity, nostalgia, marketing, and audience expectations.
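
The gap between the two R-squared values follows directly from the adjustment formula. As a worked check, the sketch below uses only numbers printed in the summary output above: an R-squared of 0.1598, 3 estimated slopes, and 17 residual degrees of freedom, which together imply 21 films. The variable names are chosen for this example.

```r
r_squared    <- 0.1598  # Multiple R-squared from the summary output
n_predictors <- 3       # run_time, release_year, film_ratingPG
residual_df  <- 17      # residual degrees of freedom from the summary output

# Observations = residual df + slopes + intercept
n_obs <- residual_df + n_predictors + 1

# Adjusted R-squared penalizes R-squared for the number of predictors
adjusted_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / residual_df

# Overall F-statistic and its p-value, rebuilt from R-squared
f_stat  <- (r_squared / n_predictors) / ((1 - r_squared) / residual_df)
p_value <- pf(f_stat, n_predictors, residual_df, lower.tail = FALSE)

round(adjusted_r_squared, 4)
round(f_stat, 3)
round(p_value, 4)
```

With only 21 films and three predictors, the penalty factor (n_obs - 1) / residual_df is 20 / 17, which is large enough to shrink an R-squared of about 0.16 to roughly 0.01.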

27 Extra work 7: Film-level interpretation table

This table combines the main transformed variables and creates a clear final output.

final_table <- analysis_dt[
  order(-average_score),
  .(
    film,
    release_year,
    film_rating,
    run_time,
    runtime_group,
    rotten_tomatoes,
    metacritic,
    critics_choice,
    cinema_score,
    average_score = round(average_score, 1),
    rt_metacritic_gap
  )
]

kable(
  final_table,
  caption = "Final film-level analysis table ordered by combined score"
)
Final film-level analysis table ordered by combined score

| film | release_year | film_rating | run_time | runtime_group | rotten_tomatoes | metacritic | critics_choice | cinema_score | average_score | rt_metacritic_gap |
|---|---|---|---|---|---|---|---|---|---|---|
| Toy Story 2 | 1999 | G | 92 | 90-105 min | 100 | 88 | 100 | A+ | 96.0 | 12 |
| Toy Story 3 | 2010 | G | 103 | 90-105 min | 98 | 92 | 97 | A | 95.7 | 6 |
| Finding Nemo | 2003 | G | 100 | 90-105 min | 99 | 90 | 97 | A+ | 95.3 | 9 |
| Inside Out | 2015 | PG | 95 | 90-105 min | 98 | 94 | 93 | A | 95.0 | 4 |
| Ratatouille | 2007 | G | 111 | Longer than 105 min | 96 | 96 | 91 | A | 94.3 | 0 |
| Up | 2009 | PG | 96 | 90-105 min | 98 | 88 | 95 | A+ | 93.7 | 10 |
| WALL-E | 2008 | G | 98 | 90-105 min | 95 | 95 | 90 | A | 93.3 | 0 |
| The Incredibles | 2004 | PG | 115 | Longer than 105 min | 97 | 90 | 88 | A+ | 91.7 | 7 |
| Toy Story 4 | 2019 | G | 100 | 90-105 min | 97 | 84 | 94 | A | 91.7 | 13 |
| Soul | 2020 | PG | 100 | 90-105 min | 96 | 83 | 93 | NA | 90.7 | 13 |
| Monsters, Inc. | 2001 | G | 92 | 90-105 min | 96 | 79 | 92 | A+ | 89.0 | 17 |
| Coco | 2017 | PG | 105 | 90-105 min | 97 | 81 | 89 | A+ | 89.0 | 16 |
| Finding Dory | 2016 | PG | 97 | 90-105 min | 94 | 77 | 89 | A | 86.7 | 17 |
| Incredibles 2 | 2018 | PG | 118 | Longer than 105 min | 93 | 80 | 86 | A+ | 86.3 | 13 |
| Cars | 2006 | G | 117 | Longer than 105 min | 74 | 73 | 89 | A | 78.7 | 1 |
| Brave | 2012 | PG | 93 | 90-105 min | 78 | 69 | 81 | A | 76.0 | 9 |
| Onward | 2020 | PG | 102 | 90-105 min | 88 | 61 | 79 | A- | 76.0 | 27 |
| Monsters University | 2013 | G | 104 | 90-105 min | 80 | 65 | 79 | A | 74.7 | 15 |
| The Good Dinosaur | 2015 | PG | 93 | 90-105 min | 76 | 66 | 75 | A | 72.3 | 10 |
| Cars 3 | 2017 | G | 102 | 90-105 min | 69 | 59 | 66 | A | 64.7 | 10 |
| Cars 2 | 2011 | G | 106 | Longer than 105 min | 40 | 57 | 67 | A- | 54.7 | -17 |

28 Main findings

The analysis suggests several main findings:

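  1. Pixar films vary in runtime, release period, and rating performance: among the 21 films with complete scores, runtimes range from 92 to 118 minutes and combined scores from 54.7 to 96.0.
  2. Rotten Tomatoes, Metacritic, and Critics Choice scores are strongly correlated (pairwise correlations of 0.80 to 0.86), but they do not always rank films in the same way.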
  3. The merged dataset allows a stronger analysis because film characteristics can be studied together with public and critic response.
  4. Runtime alone does not fully explain film reception. Some shorter and longer films both perform well.
  5. Looking at score gaps is useful because it identifies films where rating systems disagree.
  6. The top-ranked films score highly on all three rating systems rather than on just one.

29 Conclusion

This project used a real TidyTuesday dataset to analyze Pixar films using data.table transformations, dataset merging, grouped aggregation, filtering, and ggplot2 visualizations. The analysis goes beyond the minimum requirements by adding data quality checks, feature engineering, a combined rating index, correlation analysis, score-gap analysis, a missing-data visualization, and a simple exploratory regression model. The regression model did not identify strong statistical predictors of average score, which suggests that Pixar film reception cannot be explained well by simple structural variables such as runtime, release year, or rating alone.

The strongest part of the analysis is the combination of film metadata and public response data. This makes it possible to compare Pixar films not only by when they were released or how long they are, but also by how they were received across different rating systems.

30 Requirement checklist

| Requirement | Completed? | Where |
|---|---|---|
| Publicly accessible TidyTuesday dataset | Yes | Loading the TidyTuesday data |
| Filtering rows using data.table | Yes | Filtering section |
| Aggregating data using data.table | Yes | Aggregation section |
| At least 7 plots | Yes | 10 plots included |
| At least 3 ggplot2 geoms | Yes | histogram, point, line, smooth, boxplot, jitter, col, tile, hline |
| Merge datasets | Yes | pixar_films + public_response |
| Apply a theme | Yes | theme_project |
| Axis and plot titles | Yes | All plots use labs() |
| ColorBrewer palette | Yes | scale_*_brewer() and scale_fill_distiller() |
| Multiple geom layers on same plot | Yes | Plots 2, 3, 4, and 5 |
| Extra analysis beyond requirements | Yes | quality checks, engineered variables, ranking, correlation, model, missingness |
| Publish on RPubs/Medium | To do after knitting | Using RStudio Publish button |