TV Shows & Movies in the Streaming Era

Part 1: Introduction & Data Overview

Introduction

Hello everyone! My name is Madison Clore. I am a graduating senior at Xavier University studying Finance and Business Analytics. After graduation, I will be working at Pension Corporation of America (PCA) as an Investment Advisor Representative.

I have always enjoyed watching TV shows and movies. They are part of how I spend free time with family and friends, whether that means watching a new series together or rewatching familiar favorites. Along the way, I have noticed how much streaming platforms have changed the way people discover and consume content, given the wide array of options across genres and formats. This shift made me curious about what kinds of movies and TV shows are actually available on these platforms and what patterns exist within their libraries. I wanted to explore whether certain characteristics, such as release year, duration, or content type, are more common than others and what that might reveal about modern viewing habits. I am also interested in how these broader content trends relate to highly rated or widely recognized shows, since certain titles consistently appear in rankings while others do not receive the same level of recognition.

For my Programming in Analytics course, I explored this area of interest by analyzing a Netflix Movies and TV Shows dataset, which contains detailed information on the movies and TV shows available on the Netflix streaming platform. Each observation represents a single title, described by 12 variables. Because it captures the same descriptive characteristics for both movies and TV shows, the dataset is well-suited for analyzing patterns in streaming content. This dataset can be accessed here: https://myxaviermy.sharepoint.com/:x:/g/personal/clorem_xavier_edu/IQCe_iBNpJ7HQLetGoc6Gt9ZAaCwr_h1hni_dClieuJM1N8?e=KM5CvN.

Data Dictionary

The Netflix Movies and TV Shows dataset has 8,809 observations and 12 variables. The variables include:

  • show_id: unique identifier assigned to each movie or TV show

  • type: indicates whether the entry is a movie or a TV show

  • title: title of the movie or TV show

  • director: director(s) of the movie or TV show

  • cast: main actors featured in the movie or TV show

  • country: country where the movie or TV show was produced

  • date_added: date the title was added to Netflix

  • release_year: year the movie or TV show was originally released

  • rating: content rating (e.g., TV-MA, TV-PG)

  • duration: length of movie in minutes or number of seasons for TV shows

  • listed_in: genre(s) associated with the title

  • description: brief summary of the movie or TV show

Summary Statistics

# A tibble: 2 × 5
  type    total_titles average_duration earliest_release latest_release
  <chr>          <int>            <dbl>            <dbl>          <dbl>
1 Movie           6132            99.6              1942           2021
2 TV Show         2677             1.76             1925           2024

The summary statistics reveal several differences between the movies and TV shows available on Netflix. Movies make up the majority of the dataset with 6,132 titles (about 70%), while TV shows account for 2,677. On average, movies run about 100 minutes, whereas TV shows average around 1.8 seasons, which suggests that shorter series formats are common within Netflix’s catalog. The release years show that Netflix includes both older and newer content, with movies ranging from 1942 to 2021 and TV shows spanning from 1925 to 2024. Overall, these statistics suggest that Netflix’s library is dominated by movies while still maintaining a substantial collection of television content.
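
For readers who want to reproduce a table like the one above, here is a minimal Python sketch of the grouped summary. It is illustrative only and is not the code behind this report. The sample rows are made up, and the sketch assumes that duration is stored as text such as "90 min" or "2 Seasons"; the helper names duration_value and summarize_by_type are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows; the real dataset has 8,809 titles.
# Assumption: duration is text such as "90 min" or "2 Seasons".
rows = [
    {"type": "Movie", "duration": "90 min", "release_year": 2020},
    {"type": "Movie", "duration": "110 min", "release_year": 1995},
    {"type": "TV Show", "duration": "1 Season", "release_year": 2021},
    {"type": "TV Show", "duration": "3 Seasons", "release_year": 2015},
]


def duration_value(text):
    """Return the leading number from a duration string like '90 min'."""
    return int(text.split()[0])


def summarize_by_type(rows):
    """Group titles by type and compute the four summary columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["type"]].append(row)

    summary = {}
    for content_type, items in groups.items():
        durations = [duration_value(r["duration"]) for r in items]
        years = [r["release_year"] for r in items]
        summary[content_type] = {
            "total_titles": len(items),
            # Minutes for movies, seasons for TV shows.
            "average_duration": sum(durations) / len(durations),
            "earliest_release": min(years),
            "latest_release": max(years),
        }
    return summary


summary = summarize_by_type(rows)
for content_type, stats in summary.items():
    print(content_type, stats)
```

Because average_duration mixes units (minutes for movies, seasons for TV shows), the two rows of the summary should be read separately rather than compared directly.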

Part 2: Descriptive Analysis

To explore these ideas further, I used a variety of visuals and tables to identify trends and patterns within Netflix’s content library.

1. Distribution of Titles on Netflix by Release Year

The distribution of titles on Netflix by release year is negatively skewed (it has a long left tail), with the majority of movies and TV shows released after 2015. Very little older content is available on the platform, especially titles released before 2000. The sharp increase in recent releases reflects the rapid expansion of Netflix’s library over time and may also suggest that audiences are more interested in newer content.
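
To make "negatively skewed" concrete, the sketch below computes the moment coefficient of skewness for a small, made-up list of release years in Python. The values are hypothetical and only mimic the shape described above (most titles recent, a few much older); a negative result indicates a long left tail.

```python
def skewness(values):
    """Population moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical release years: mostly recent, with a few older titles.
release_years = [1985, 2005, 2014, 2016, 2017, 2018,
                 2018, 2019, 2019, 2020, 2020, 2021]

release_skew = skewness(release_years)
print(f"skewness = {release_skew:.2f}")  # negative for a long left tail
```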

2. Delay Between Content Release & Netflix Availability

The distribution of the delay between content release and Netflix availability is positively skewed, with nearly 5,000 titles landing on the platform within the same year they were released. The number of titles decreases sharply after that, with very few taking more than 10 years to appear on Netflix. As the previous visual showed, Netflix’s catalog is heavily concentrated in newer content, so it makes sense that most titles are added relatively quickly after release. This pattern highlights how the streaming industry has accelerated content distribution, allowing movies and TV shows to move from initial release to streaming platforms within a short period of time.
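
The delay described here is the year a title was added to Netflix minus its release year. A minimal Python sketch of that calculation follows. It is illustrative only: the rows are made up, the date format ("September 25, 2021") is an assumption about how date_added is stored, and years_to_netflix is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows.
# Assumption: date_added is text in the form "September 25, 2021".
rows = [
    {"title": "A", "date_added": "September 25, 2021", "release_year": 2021},
    {"title": "B", "date_added": "January 1, 2020", "release_year": 2019},
    {"title": "C", "date_added": "March 15, 2019", "release_year": 2019},
    {"title": "D", "date_added": "July 4, 2018", "release_year": 2006},
    {"title": "E", "date_added": "", "release_year": 2015},  # missing date
]


def years_to_netflix(row):
    """Years between release and Netflix availability, or None if unknown."""
    raw = row["date_added"].strip()
    if not raw:
        return None
    added = datetime.strptime(raw, "%B %d, %Y")
    return added.year - row["release_year"]


# Drop titles with no date_added, then count titles per delay.
delays = [d for d in map(years_to_netflix, rows) if d is not None]
delay_counts = Counter(delays)

for delay, count in sorted(delay_counts.items()):
    print(f"{delay} year(s): {count} title(s)")
```

Counting whole-year differences means a title released in December and added the following January registers as a one-year delay, so the bars are best read as approximate.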

3. Content Ratings by Frequency

# A tibble: 10 × 2
   rating `number of titles`
   <chr>               <int>
 1 TV-MA                3208
 2 TV-14                2160
 3 TV-PG                 863
 4 R                     799
 5 PG-13                 490
 6 TV-Y7                 334
 7 TV-Y                  307
 8 PG                    287
 9 TV-G                  220
10 NR                     80

Netflix’s content ratings are dominated by TV-MA and TV-14 titles, which together account for about 61% of all titles in the platform’s catalog (5,368 of 8,809). This strong concentration of mature and teen-focused content is not particularly surprising, as adult audiences make up a large portion of Netflix’s subscriber base. In comparison, family-oriented ratings such as TV-G, TV-Y, and PG represent a much smaller share of the catalog, which suggests that children’s and family content serves as a supplemental offering rather than Netflix’s primary focus.
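
The shares discussed above follow directly from the frequency table. The short Python sketch below reproduces the arithmetic using the ten counts shown in the table and the 8,809 total from the data dictionary; the share helper is a hypothetical name used only for illustration.

```python
# Counts copied from the ratings table above (top 10 ratings only).
rating_counts = {
    "TV-MA": 3208, "TV-14": 2160, "TV-PG": 863, "R": 799, "PG-13": 490,
    "TV-Y7": 334, "TV-Y": 307, "PG": 287, "TV-G": 220, "NR": 80,
}
TOTAL_TITLES = 8809  # all titles in the dataset, per the data dictionary


def share(ratings):
    """Fraction of all titles carrying any of the given ratings."""
    return sum(rating_counts[r] for r in ratings) / TOTAL_TITLES


mature_teen = share(["TV-MA", "TV-14"])
family = share(["TV-G", "TV-Y", "PG"])

print(f"TV-MA + TV-14:     {mature_teen:.1%}")  # 60.9%
print(f"TV-G + TV-Y + PG:  {family:.1%}")       # 9.2%
```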

4. Relationship Between TV Show Release Year & Number of Seasons

The scatterplot shows that the vast majority of TV shows on Netflix have fewer than five seasons, and this pattern is consistent across all release years. The plot becomes noticeably denser after 2010, which aligns with Netflix’s broader expansion of its content library during that period. TV shows with 10 or more seasons are relatively rare, which suggests that the platform tends to feature shorter, more recent series rather than long-running traditional TV shows.

5. Average Time for Netflix to Add Title by Content Rating Group

Kids content appears on Netflix the fastest, averaging just under three years after release. Teen/Young Adult and Family content take the longest to arrive, at around six years on average, while Mature content falls in the middle at roughly 3.5 years. The faster turnaround for kids’ titles is plausible, as children’s programming tends to have a shorter relevance window and is frequently refreshed to keep younger audiences engaged with new content.
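
A comparison like this requires two steps: assigning each rating to a group, then averaging the release-to-Netflix delay within each group. The Python sketch below illustrates both. The mapping in RATING_GROUPS is an assumption made for this example, since the report does not list which ratings belong to each group, and the sample delays are invented so the output is easy to check.

```python
from collections import defaultdict

# Hypothetical mapping: the report does not specify the actual grouping.
RATING_GROUPS = {
    "TV-Y": "Kids", "TV-Y7": "Kids", "TV-G": "Kids",
    "PG": "Family", "TV-PG": "Family",
    "PG-13": "Teen/Young Adult", "TV-14": "Teen/Young Adult",
    "R": "Mature", "TV-MA": "Mature",
}

# Made-up (rating, delay in years) pairs.
rows = [
    ("TV-Y", 1), ("TV-Y7", 3),
    ("TV-PG", 6),
    ("TV-14", 7), ("TV-14", 5),
    ("TV-MA", 2), ("R", 5),
    ("NR", 10),  # unmapped ratings are skipped
]


def average_delay_by_group(rows):
    """Mean delay per rating group, ignoring ratings with no group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum of delays, count]
    for rating, delay in rows:
        group = RATING_GROUPS.get(rating)
        if group is None:
            continue
        totals[group][0] += delay
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


averages = average_delay_by_group(rows)
for group, avg in averages.items():
    print(f"{group}: {avg:.1f} years")
```

How borderline ratings such as TV-G or TV-PG are grouped can shift the averages noticeably, so the grouping rule is worth stating alongside any chart of this kind.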

Part 3: Secondary Data Source

This dataset contains 250 distinct TV shows from IMDb’s Top 250 TV Shows list, which represents some of the highest-rated series according to user reviews and ratings. Each row corresponds to a single TV show, and the original scraped variables include:

  • tvshow_title: name of the TV show

  • rating_of_show: content rating (e.g., TV-MA, TV-14)

  • series_type: type of show (TV series or mini-series)

  • stars_of_tvshow: IMDb user rating (out of 10)

  • start_year: year the show first aired

  • end_year: year the show ended (if applicable)

Additional variables were created during the data wrangling process to support the analysis. These include:

  • show_age: number of years since release

  • show_length: number of years the show ran

  • rating_group: grouped content rating category (Kids, Family, Teen, Mature)

This dataset is used as a secondary source to compare highly rated television content against Netflix’s overall catalog.
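
As an illustration of how show_age and show_length could be derived from start_year and end_year, here is a minimal Python sketch. It is not the wrangling code used for this report. The reference year, the treatment of shows with no end_year (counted as still running), and the add_derived helper are all assumptions made for the example.

```python
REFERENCE_YEAR = 2025  # assumption: the year used as "now" for show_age

# Made-up rows using the scraped column names listed above.
shows = [
    {"tvshow_title": "Show A", "start_year": 2008, "end_year": 2013},
    {"tvshow_title": "Show B", "start_year": 2019, "end_year": None},  # ongoing
]


def add_derived(show, reference_year=REFERENCE_YEAR):
    """Return a copy of the row with show_age and show_length added."""
    # Assumption: a missing end_year means the show is still running.
    end = show["end_year"] if show["end_year"] is not None else reference_year
    return {
        **show,
        "show_age": reference_year - show["start_year"],
        "show_length": end - show["start_year"],
    }


derived = [add_derived(s) for s in shows]
for row in derived:
    print(row["tvshow_title"], row["show_age"], row["show_length"])
```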

1. Availability of IMDb Top-Ranked TV Shows on Netflix

Of the IMDb Top 250 TV Shows, approximately 70 are in Netflix’s catalog, while around 180 are not available on the platform. This means Netflix carries just over one-quarter (about 28%) of the highest-rated shows, which is a notable gap given the overall size of its catalog. For viewers who prioritize highly rated content, this suggests that Netflix alone may not fully capture the breadth of top-performing TV series.
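
Checking availability comes down to matching titles between the two datasets. The Python sketch below shows one simple way to do that, assuming titles are matched on a normalized form (case-insensitive, extra whitespace removed). The titles are placeholders, and normalize is a hypothetical helper; the report does not state which matching rule was actually used.

```python
def normalize(title):
    """Lowercase a title and collapse runs of whitespace."""
    return " ".join(title.casefold().split())


# Placeholder titles standing in for the two datasets.
imdb_titles = ["Show A", "show b", "Show C", "Show D"]
netflix_titles = ["SHOW A ", "Show  B", "Show X"]

netflix_lookup = {normalize(t) for t in netflix_titles}
available = [t for t in imdb_titles if normalize(t) in netflix_lookup]
unavailable = [t for t in imdb_titles if normalize(t) not in netflix_lookup]

sample_share = len(available) / len(imdb_titles)
print(f"On Netflix: {len(available)}, not on Netflix: {len(unavailable)}")

# The same ratio with the approximate counts reported above.
report_share = 70 / 250
print(f"Reported share: {report_share:.0%}")  # 28%
```

Exact title matching can miss shows listed under slightly different names in each source, or match unrelated titles that share a name, so counts produced this way are best treated as approximate.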

2. Average TV Show Length Comparison

The difference in average length is notable: IMDb’s top-rated shows average nearly six seasons, while Netflix shows average fewer than two. This suggests that IMDb ratings may favor longevity, as longer-running series have more opportunities to build large, engaged audiences that actively rate and support them over time. In contrast, Netflix’s lower average aligns with its tendency toward shorter series runs and frequent cancellations.

Conclusion

Overall, this analysis explored which characteristics are most common within Netflix’s catalog and what they reveal about modern viewing habits. The results show a clear emphasis on recent content, with most titles released in recent years and added to the platform relatively quickly. Netflix’s library is also dominated by shorter TV series and mature or teen-focused content, which reflects both audience preferences and platform strategy. When compared to IMDb’s top-rated shows, a gap emerges, as Netflix includes only a portion of highly ranked series. Ultimately, these findings suggest that Netflix prioritizes newer, shorter-form content, while highly rated TV shows are more often associated with longevity and sustained audience engagement.