2025-10-30

The Dataset

The dataset used in this analysis contains 10,000 TV show records with rich metadata including titles, original names and languages, genre IDs, countries of origin, popularity scores, voting metrics, overviews, poster and backdrop paths, and first air dates.

This project explores patterns among the highest-rated shows, examining which characteristics they share, and analyzes the data from several angles to understand it better.

The dataset was obtained from Kaggle: 10000 Popular TV Shows Data set (https://www.kaggle.com/datasets/riteshswami08/10000-popular-tv-shows-dataset-tmdb).

Research Questions

  • Which shows get the highest ratings (vote_average)?
  • Do highly rated shows also have high popularity?
  • Does language / country relate to rating?
  • Can we predict a show’s rating using popularity and vote_count?

Summary Statistics

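library(dplyr)  # provides %>% and summarize()

# Mean and standard deviation of rating, popularity, and vote count.
# tv_clean is the cleaned dataset prepared earlier in the analysis.
desc_table <- tv_clean %>%
  summarize(
    avg_rating_mean = mean(vote_average, na.rm = TRUE),
    avg_rating_sd   = sd(vote_average, na.rm = TRUE),
    popularity_mean = mean(popularity, na.rm = TRUE),
    popularity_sd   = sd(popularity, na.rm = TRUE),
    votes_mean      = mean(vote_count, na.rm = TRUE),
    votes_sd        = sd(vote_count, na.rm = TRUE)
  )

# Render the one-row summary as a table, rounded to 3 decimals.
knitr::kable(desc_table, digits = 3)

avg_rating_mean  avg_rating_sd  popularity_mean  popularity_sd  votes_mean  votes_sd
           6.55          2.315            7.826         10.551     230.099   872.623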

Ratings vs Popularity (ggplot scatter)

Ratings by Language

Popularity, Votes, and Rating (plotly 3D)

Average Rating by Language (plotly bar)

Statistical Model: Can we predict rating?

We used popularity and vote_count to predict vote_average (rating) in a linear regression.
Both coefficients are positive, meaning higher popularity and more votes are associated with slightly higher ratings.
However, the R² value shows that these factors explain only a small portion of the variation in ratings.
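
For reference, a model of this form can be fit with base R's lm(). The sketch below is only an illustration: tv_demo is a hypothetical, synthetic stand-in for tv_clean (which is not reproduced here), and its simulated numbers are not the dataset's. It shows where the coefficient table and the R² value come from.

```r
# Hypothetical stand-in for tv_clean: synthetic data, NOT the Kaggle dataset.
set.seed(42)
n <- 500
tv_demo <- data.frame(
  popularity = rexp(n, rate = 1 / 8),
  vote_count = round(rexp(n, rate = 1 / 230))
)

# Simulated ratings: weak dependence on both predictors plus large noise.
tv_demo$vote_average <- 6.4 + 0.01 * tv_demo$popularity +
  0.0003 * tv_demo$vote_count + rnorm(n, sd = 2)

# Fit the linear model vote_average ~ popularity + vote_count.
fit <- lm(vote_average ~ popularity + vote_count, data = tv_demo)

# Coefficient table: estimate, standard error, t value, p value.
coef_table <- summary(fit)$coefficients
print(round(coef_table, 5))

# Share of the variance in rating explained by the two predictors.
r_squared <- summary(fit)$r.squared
print(r_squared)
```

With the real data, replacing tv_demo with tv_clean should be the only change needed, assuming the column names match those used in the summary statistics.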

Linear Regression Results: vote_average ~ popularity + vote_count

Term         Estimate  Std. Error    t value  p value
(Intercept)   6.38136     0.02859  223.22087  < 0.001
popularity    0.01165     0.00250    4.66970  < 0.001
vote_count    0.00034     0.00003   11.20575  < 0.001
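
To make the size of these coefficients concrete, the sketch below plugs the estimates from the table above into the fitted equation. The helper predict_rating and the two example shows are hypothetical, chosen only for illustration; they are not rows from the dataset.

```r
# Coefficients copied from the regression table above.
coefs <- c(intercept = 6.38136, popularity = 0.01165, vote_count = 0.00034)

# Hypothetical helper: predicted rating for a given popularity and vote count.
predict_rating <- function(popularity, vote_count) {
  unname(coefs["intercept"] +
    coefs["popularity"] * popularity +
    coefs["vote_count"] * vote_count)
}

# Illustrative inputs (chosen for the example, not taken from the dataset).
low  <- predict_rating(popularity = 5,  vote_count = 100)
high <- predict_rating(popularity = 50, vote_count = 5000)
print(c(low = low, high = high))
```

Under these assumed inputs the predicted ratings differ by roughly 2.2 points, and most of that gap comes from the vote_count term rather than from popularity.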

Takeaways

  • Popular shows with many votes are not always the highest rated, but many highly rated shows also attract large audiences.
  • Some languages show slightly higher typical ratings, suggesting regional or cultural taste patterns.
  • Popularity and vote_count explain only part of the variation in rating; quality and hype aren’t identical.

Appendix: Data Notes / Source

Thank You!

Thank you for viewing my presentation!
I appreciate your time and attention.


Questions or Feedback?
vbhatia8@asu.edu