Dating Reality TV - A Guilty Pleasure!


How Reality-TV compares to other genres

Row

Dating Reality TV versus other IMDb genres

Inferences

  • The graph shows the average ratings of all TV series grouped by genre rated on IMDb
  • Action ranks #1 with an average rating of 6.95
  • Reality TV is among the top 10 genres and ranks 7th with an average rating of 3.83

The Rise of Dating Reality TV

Row

Dating Reality TV over the years

Inferences

  • Dating Reality TV took off in the 2000s with more and more new shows of this genre taking over the TV series market
  • The market saw a peak in 2022 and 2023 with shows like Too Hot To Handle gaining traction
  • The COVID-19 pandemic and increased time spent on social media might have influenced people’s content choices, reality TV potentially providing a much needed escape from real life

Top Shows (Rating and Popularity)

Row

Top 10 shows by rating and popularity

Inferences

  • The graph shows the top 10 shows by rating, i.e. the dating reality shows/ TV movies that are rated highly on IMDb
  • From the drop down, we can also toggle to the the shows that are most popular i.e. received the most number of votes or ratings on IMDb
  • Recipe for Love and Love on the Run have >9 ratings
  • Too Hot To Handle and Love is Blind are the most popular dating reality shows as of date with the most number of votes on IMDb

Shows with the highest global reach

Row

Shows with the highest number of regions aired

Inferences

  • This graph shows the number of countries in which a particular show/ title was aired
  • Love is Blind tops the list again, probably due to popularity and was aired in a total of 39 countries
  • It is followed closely by Too Hot To Handle: Brazil which was aired in 34 countries
  • This shows the far and wide global reach of trending shows like Love is Blind

What decade did global reach rise?

Row

Global reach every decade

Inferences

  • Plotting the global reach by decade, we can see that the globalization of dating reality series started recently i.e. in the 2020s
  • The pandemic, strict stay-at-home mandates and economic uncertainty probably encouraged people to indulge in some dramatic reality television.

Countries with the highest number of dating reality shows

Row

The United States consumes 500+ dating reality shows

Inferences

  • The choropleth shows the countries with the highest number of dating reality series on the market
  • US tops the charts with a massive 560 reality shows
  • English speaking countries like Canada, UK and Australia also have a plethora of dating reality TV options
  • Some emerging markets include India, Brazil, Italy and Switzerland

High Snackability

Column

Runtimes of dating reality shows

Inferences

  • I was also curious about the run times of reality television, especially since these shows are addictive and binge-worthy!
  • The density plots show that the run times are mostly between 40-60 mins
  • These run times make each episode highly snackable and keep you coming back for more!

Legacy franchises have an incredibly high number of seasons

Row

Shows with the highest number of seasons

Inferences

  • When I investigated the number of seasons in some legacy franchises like The Bachelor and The Bachelorette, the results were staggering
  • The Bachelor has a total of 30 seasons and 300+ episodes
  • This highlights the popularity of legacy franchises and their cultural relevance

Final Thoughts and References

References:
- IMDb.com, Inc. (2024, March 18). IMDb non-commercial datasets [Dataset]. Retrieved October 29, 2025, from https://developer.imdb.com/non-commercial-datasets

---
title: "Dating Reality TV - A Guilty Pleasure"
author: "Storytelling with Open Data by Nikita Walunjkar (4146512)"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
    theme: cosmo
    source_code: embed
    css: styles.css
---


```{r echo=FALSE, message=FALSE, warning=FALSE}
# Load libraries
library(readr)
library(tidyverse)
library(vroom) 
library(glue)
library(ggplot2)
library(data.table)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
library(plotly)
library(leaflet)
library(scales)
library(ggrepel)
library(tidytext)
library(forcats)
```

```{r echo=FALSE, message=FALSE, warning=FALSE}
setwd("~/RMIT/Semesters/2/Data Visualization and Communication/A3")

# titles      <- fread("data/title.basics.tsv")
# ratings     <- fread("data/title.ratings.tsv")
# episodes    <- fread("data/title.episode.tsv")
# akas        <- fread("data/title.akas.tsv")
```


```{r}
# # Detect dating reality TV series from the IMDb data
# dating_keywords <- "(?i)\\b(love|date|dating|bachelor|bachelorette|married|match|island|temptation|first\\s+dates|too\\s+hot)\\b"
# 
# dating_reality_series <- titles %>%
#    filter(
#      titleType == "tvSeries",
#      str_detect(genres %||% "", "(^|,)Reality-TV(,|$)"),
#      str_detect(primaryTitle %||% "", dating_keywords) |
#        str_detect(originalTitle %||% "", dating_keywords)
#    ) %>%
#    mutate(
#      startYear      = as.integer(startYear),
#      endYear        = as.integer(endYear),
#      runtimeMinutes = as.integer(runtimeMinutes)
#    )
# 
# # Join the ratings to the series data
# dating_reality_rated <- dating_reality_series %>%
#    inner_join(ratings, by = "tconst") %>%
#    mutate(
#      numVotes = as.integer(numVotes),
#      decade   = (startYear %/% 10) * 10
#    )
# 
# # Join episode data to series
# dating_reality_eps <- episodes %>%
#    semi_join(dating_reality_series, by = c("parentTconst" = "tconst")) %>%
#    mutate(
#      seasonNumber  = as.integer(seasonNumber),
#      episodeNumber = as.integer(episodeNumber)
#    )
# 
# # Filter regional data to only dating reality
# merged <- akas %>%
#   filter(titleId %in% dating_reality_series$tconst)
# 
# # Count the number of unique regions in the regional data
# aka_counts <- merged %>%
#   group_by(titleId) %>%
#   summarise(aka_count = n(),
#             unique_regions = n_distinct(region))
#
# # Prep average rating and popularity data for all TV series genres
# genres <- titles %>%
#   filter(titleType == "tvSeries") %>%
#   filter(genres != "\\N") %>%
#   left_join(ratings, by = "tconst") %>%
#   separate_rows(genres, sep = ",") %>%
#   group_by(tconst) %>%
#   mutate(
#     genre_rank = row_number(),
#     averageRating = if_else(genre_rank %in% c(2, 3), 0, averageRating)
#   ) %>%
#   ungroup()
```


```{r}
# Saving the files and reading them since the code takes a long time to execute

# write.csv(dating_reality_series, "data/dating_reality_series.csv", row.names = FALSE)
# write.csv(dating_reality_rated, "data/dating_reality_rated.csv", row.names = FALSE)
# write.csv(dating_reality_eps, "data/dating_reality_eps.csv", row.names = FALSE)
# write.csv(merged, "data/akas.csv", row.names = FALSE)
# write.csv(aka_counts, "data/aka_counts.csv", row.names = FALSE)
# write.csv(genres, "data/genres.csv", row.names = FALSE)
```

```{r}
# Reading the saved and filtered data files
dating_reality_series <- read.csv("data/dating_reality_series.csv")
dating_reality_rated <- read.csv("data/dating_reality_rated.csv")
dating_reality_eps <- read.csv("data/dating_reality_eps.csv")
akas <- read.csv("data/akas.csv")
aka_counts <- read.csv("data/aka_counts.csv")
genres <- read.csv("data/genres.csv")
```

```{r}
# Filtering some shows manually from all files
filter_list <- c("tt2373856", "tt3455408", "tt27664503", "tt14849626", "tt15336554", "tt7838238")

dating_reality_series <- dating_reality_series %>% 
  filter(!tconst %in% filter_list)

dating_reality_rated <- dating_reality_rated %>% 
  filter(!tconst %in% filter_list)

dating_reality_eps <- dating_reality_eps %>% 
  filter(!parentTconst %in% filter_list)

akas <- akas %>% 
  filter(!titleId %in% filter_list)

aka_counts <- aka_counts %>% 
  filter(!titleId %in% filter_list)
```

# Dating Reality TV - A Guilty Pleasure!

- Binge watching dating reality TV is one of my favorite guilty pleasures
- I've devoured one too many seasons and at a very high consumption rate!
- I'm pretty sure everyone loves occasionally engaging in dramatic content like Love Island, The Bachelor and Too Hot To Handle!
- In this flexdashboard, I analyse the IMDb non-commercial datasets, zooming into the Dating Reality TV genre and try to uncover global trends and reasons why this genre is SO DAMN ADDICTIVE!
- The dataset used here is: IMDb.com, Inc. (2024, March 18). IMDb non-commercial datasets [Dataset]. Retrieved October 29, 2025, from https://developer.imdb.com/non-commercial-datasets


*********************************

How Reality-TV compares to other genres
=====================================

Row
-------------------------------------

### Dating Reality TV versus other IMDb genres

```{r echo=FALSE, message=FALSE, warning=FALSE}
# Calculating the average rating and sum of votes for all genres
genres_viz <- genres %>% 
  group_by(genres) %>%
  summarise(
    avg_rating = mean(averageRating, na.rm = TRUE),
    count_titles = n(),
    sum_votes = sum(numVotes, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_rating))

# Top 10 genres by average rating
top10_genres <- genres_viz %>%
  arrange(desc(avg_rating)) %>%
  slice_head(n = 10)

# Color the Reality-TV column differently
top10_genres <- top10_genres %>%
  mutate(color_fill = ifelse(genres == "Reality-TV", "#ff69b4", "#3b82f6"))

ggplot(top10_genres, aes(x = reorder(genres, avg_rating), y = avg_rating, fill = color_fill)) +
  geom_col(alpha = 0.9) +
  coord_flip() +
  geom_text(aes(label = round(avg_rating, 2)), hjust = -0.2, size = 3.8) +
  scale_fill_identity() +
  labs(
    title = "Top 10 IMDb Genres by Average Rating",
    x = "Genre",
    y = "Average Rating"
  ) +
  theme_minimal(base_size = 12) +
  theme(panel.grid.minor = element_blank()) +
  ylim(0, max(top10_genres$avg_rating) + 1)
```

### Inferences

- The graph shows the average ratings of all TV series grouped by genre rated on IMDb
- Action ranks #1 with an average rating of 6.95
- Reality TV is among the top 10 genres and ranks 7th with an average rating of 3.83


*********************************

The Rise of Dating Reality TV
=====================================

Row
-------------------------------------

### Dating Reality TV over the years

```{r echo=FALSE, message=FALSE, warning=FALSE}
# Data Prep
reality_by_year <- dating_reality_rated %>%
  filter(!is.na(startYear), startYear >= 1990) %>%
  count(startYear) %>%
  arrange(startYear)

# Create cumulative frames
cum_df <- reality_by_year %>%
  tidyr::expand(frame = startYear, startYear) %>%
  filter(startYear <= frame) %>%
  left_join(reality_by_year, by = "startYear")

x_order <- sort(unique(reality_by_year$startYear))

# Build chart
p <- plot_ly(
  data = cum_df,
  x = ~startYear,
  y = ~n,
  frame = ~frame,        
  ids = ~startYear,     
  type = "bar",
  name = "Number of new shows",
  color = I("#ff69b4"),  
  hovertemplate = "Year %{x}<br>New series %{y}<extra></extra>",
  marker = list(line = list(color = "#ffffff", width = 1))
) %>%
  layout(
    title = "The Rise of Dating Reality TV",
    xaxis = list(title = "Start Year", categoryorder = "array", categoryarray = x_order),
    yaxis = list(title = "New Dating Reality Series")
  ) %>%
  animation_opts(
    frame = 200,      
    transition = 150, 
    easing = "linear"
  )

p
```

### Inferences

- Dating Reality TV took off in the 2000s with more and more new shows of this genre taking over the TV series market
- The market saw a peak in 2022 and 2023 with shows like Too Hot To Handle gaining traction
- The COVID-19 pandemic and increased time spent on social media might have influenced people's content choices, reality TV potentially providing a much needed escape from real life


*********************************

Top Shows (Rating and Popularity)
=====================================

Row
-------------------------------------

### Top 10 shows by rating and popularity

```{r echo=FALSE, message=FALSE, warning=FALSE}
# Prep data for ratings
top10_rated <- dating_reality_rated %>%
  filter(!is.na(averageRating)) %>%
  arrange(desc(averageRating), desc(numVotes), primaryTitle) %>%
  slice_head(n = 10) %>%
  transmute(title = primaryTitle, x = averageRating, aux = numVotes) %>%
  arrange(x, title)

# Prep data for popularity
top10_popular <- dating_reality_rated %>%
  filter(!is.na(numVotes)) %>%
  arrange(desc(numVotes), desc(averageRating), primaryTitle) %>%
  slice_head(n = 10) %>%
  transmute(title = primaryTitle, x = numVotes, aux = round(averageRating, 1)) %>%
  arrange(x, title)

# Plot
fig <- plot_ly(
  x = top10_rated$x,
  y = top10_rated$title,
  type = "bar",
  orientation = "h",
  hovertemplate = "<b>%{y}</b><br>Rating: %{x:.1f}<br>Votes: %{customdata}<extra></extra>",
  customdata = top10_rated$aux,
  marker = list(color = "#ff69b4")
)

fig <- fig %>% layout(
  title = list(text = "Top 10 Shows", x = 0.02),
  xaxis = list(title = "IMDb Rating", tickformat = ".1f"),
  yaxis = list(
    title = "",
    categoryorder = "array",
    categoryarray = top10_rated$title,
    automargin = TRUE,                  
    tickfont = list(size = 12)          
  ),
  margin = list(l = 240, r = 120, t = 60, b = 40),  
  bargap = 0.2,
  showlegend = FALSE,
  updatemenus = list(list(
    type = "dropdown",
    direction = "down",
    x = 1.08, y = 0.5,                  
    xanchor = "left", yanchor = "middle",
    bgcolor = "white",
    bordercolor = "lightgray",
    buttons = list(
      list(
        label = "Top 10 by Rating",
        method = "update",
        args = list(
          list(
            x = list(top10_rated$x),
            y = list(top10_rated$title),
            customdata = list(top10_rated$aux),
            hovertemplate = list("<b>%{y}</b><br>Rating: %{x:.1f}<br>Votes: %{customdata}<extra></extra>"),
            marker = list(list(color = "#ff69b4"))
          ),
          list(
            xaxis = list(title = "IMDb Rating", tickformat = ".1f"),
            yaxis = list(
              categoryorder = "array",
              categoryarray = top10_rated$title,
              automargin = TRUE,
              tickfont = list(size = 12)
            )
          )
        )
      ),
      list(
        label = "Top 10 by Popularity",
        method = "update",
        args = list(
          list(
            x = list(top10_popular$x),
            y = list(top10_popular$title),
            customdata = list(top10_popular$aux),
            hovertemplate = list("<b>%{y}</b><br>Votes: %{x:,}<br>Rating: %{customdata:.1f}<extra></extra>"),
            marker = list(list(color = "#ff69b4"))
          ),
          list(
            xaxis = list(title = "Number of Votes", tickformat = ",d"),
            yaxis = list(
              categoryorder = "array",
              categoryarray = top10_popular$title,
              automargin = TRUE,
              tickfont = list(size = 12)
            )
          )
        )
      )
    )
  ))
)

fig
```

### Inferences

- The graph shows the top 10 shows by rating, i.e. the dating reality shows/ TV movies that are rated highly on IMDb
- From the drop down, we can also toggle to the the shows that are most popular i.e. received the most number of votes or ratings on IMDb
- Recipe for Love and Love on the Run have >9 ratings
- Too Hot To Handle and Love is Blind are the most popular dating reality shows as of date with the most number of votes on IMDb

*********************************

Shows with the highest global reach
=====================================

Row
-------------------------------------

### Shows with the highest number of regions aired

```{r echo=FALSE, message=FALSE, warning=FALSE}
# Prep data
top_shows <- aka_counts %>%
  top_n(10, unique_regions) %>%
  left_join(dating_reality_series, by = c("titleId" = "tconst"))

# Plot
plot_ly(
  data = top_shows %>%
    arrange(desc(unique_regions)),
  x = ~unique_regions,
  y = ~reorder(primaryTitle, unique_regions),
  type = "bar",
  orientation = "h",
  hoverinfo = "text",
  hovertext = ~paste0(
    "<b>", primaryTitle, "</b>",
    "<br>Countries Aired: ", unique_regions
  ),
  marker = list(color = '#ff69b4')
) %>%
  layout(
    title = "Top Dating Reality Shows by Countires Aired",
    xaxis = list(title = "Number of Countries Aired"),
    yaxis = list(title = "", categoryorder = "total ascending"),
    hoverlabel = list(bgcolor = "white")
  )
```

### Inferences

- This graph shows the number of countries in which a particular show/ title was aired
- Love is Blind tops the list again, probably due to popularity and was aired in a total of 39 countries
- It is followed closely by Too Hot To Handle: Brazil which was aired in 34 countries
- This shows the far and wide global reach of trending shows like Love is Blind

*********************************

What decade did global reach rise?
=====================================

Row
-------------------------------------

### Global reach every decade

```{r echo=FALSE, message=FALSE, warning=FALSE}
# Merge regional data with start year data
merged_trend <- left_join(aka_counts, dating_reality_series, by = c("titleId" = "tconst"))

df <- merged_trend %>%
  filter(!is.na(startYear), startYear >= 1990, !is.na(unique_regions)) %>%
  mutate(unique_regions = as.numeric(unique_regions))

# Define decades
df_dec <- df %>%
  mutate(era = cut(startYear,
                   breaks = c(1990, 2000, 2010, 2020, 2030),
                   right = FALSE,
                   labels = c("1990s","2000s","2010s","2020s")))

# Plot boxplot
p3 <- ggplot(df_dec, aes(era, unique_regions)) +
  geom_boxplot(outlier.alpha = 0.3, fill = "lightpink", color = "deeppink4") +
  stat_summary(fun = median, geom = "point", size = 2, color = "deeppink4") +
  labs(title = "Countries aired by decade",
       x = NULL, y = "Countries aired") +
  theme_minimal(base_size = 12) +
  theme(panel.grid.minor = element_blank())
p3
```

### Inferences

- Plotting the global reach by decade, we can see that the globalization of dating reality series started recently i.e. in the 2020s
- The pandemic, strict stay-at-home mandates and economic uncertainty probably encouraged people to indulge in some dramatic reality television.

*********************************

Countries with the highest number of dating reality shows
=====================================

Row
-------------------------------------

### The United States consumes 500+ dating reality shows

```{r echo=FALSE, message=FALSE, warning=FALSE}
# Counts the regions in the regional aka file
region_counts <- akas %>%
  filter(region != "\\N") %>%
  count(region, name = "num_shows")

# Get world data including ISO 2 and ISO 3 country codes
world <- ne_countries(scale = "medium", returnclass = "sf") %>%
  select(name_long, iso_a2, iso_a3, geometry)

# Join this data to region counts
world_counts <- world %>%
  left_join(region_counts, by = c("iso_a2" = "region")) %>%
  mutate(num_shows = tidyr::replace_na(num_shows, 0L))

# Drop the geometry column
world_counts_df <- world_counts %>% st_drop_geometry()

# Define pink gradient for colorscale
pink_scale <- list(
  list(0.00, "#fff0f6"),
  list(0.25, "#ffd1e6"),
  list(0.50, "#ff9ec4"),
  list(0.75, "#ff4fa3"),
  list(1.00, "#7a003c")
)

plot_ly(
  data = world_counts_df,
  type = "choropleth",
  locations = ~iso_a3,
  z = ~num_shows,
  text = ~paste0(name_long, "<br><b>Shows:</b> ", scales::comma(num_shows)),
  hoverinfo = "text",
  colorscale = pink_scale,
  marker = list(line = list(color = "rgba(255,255,255,0.7)", width = 0.3)),
  zmin = 0,
  zmax = max(world_counts_df$num_shows, na.rm = TRUE)
) %>%
  colorbar(title = "Number of Shows") %>%
  layout(
    title = list(text = "Global Distribution of Dating Reality Shows"),
    geo = list(
      showframe = FALSE,
      showcoastlines = FALSE,
      projection = list(type = "natural earth"),
      bgcolor = "rgba(0,0,0,0)"
    )
  )
```

### Inferences

- The choropleth shows the countries with the highest number of dating reality series on the market
- US tops the charts with a massive 560 reality shows
- English speaking countries like Canada, UK and Australia also have a plethora of dating reality TV options
- Some emerging markets include India, Brazil, Italy and Switzerland

*********************************

High Snackability
=====================================

Column
-------------------------------------

### Runtimes of dating reality shows

```{r echo=FALSE, message=FALSE, warning=FALSE}
p3 <- dating_reality_rated %>%
  filter(!is.na(runtimeMinutes), runtimeMinutes > 0, runtimeMinutes < 200) %>%
  ggplot(aes(x = runtimeMinutes)) +
  geom_density(fill = "lightpink", alpha = 0.4, color = "deeppink", linewidth = 1) +
  labs(
    title = "Snackability of dating reality shows",
    x = "Runtime (minutes)",
    y = "Density"
  ) + 
  theme_minimal(base_size = 12) +
  theme(panel.grid.minor = element_blank())
p3
```

### Inferences

- I was also curious about the run times of reality television, especially since these shows are addictive and binge-worthy!
- The density plots show that the run times are mostly between 40-60 mins
- These run times make each episode highly snackable and keep you coming back for more!

*********************************

Legacy franchises have an incredibly high number of seasons
=====================================

Row
-------------------------------------

### Shows with the highest number of seasons

```{r echo=FALSE, message=FALSE, warning=FALSE}
# Season and episode count from the episodes dataset
seasons_count <- dating_reality_eps %>%
  filter(!is.na(seasonNumber), seasonNumber > 0) %>%
  distinct(parentTconst, seasonNumber) %>%         
  count(parentTconst, name = "seasons") %>%        
  left_join(
    dating_reality_eps %>%
      filter(!is.na(episodeNumber)) %>%
      count(parentTconst, name = "episodes"),
    by = "parentTconst"
  ) %>%
  left_join(
    dating_reality_series %>% select(tconst, primaryTitle, startYear),
    by = c("parentTconst" = "tconst")
  ) %>%
  mutate(
    primaryTitle = ifelse(is.na(primaryTitle), parentTconst, primaryTitle),
    episodes = ifelse(is.na(episodes), 0L, episodes)
  )

# Top 10 by number of seasons
top10_seasons <- seasons_count %>%
  arrange(desc(seasons), desc(episodes), primaryTitle) %>%
  slice_head(n = 10)

# Plot
p <- ggplot(
  top10_seasons,
  aes(
    x = reorder(primaryTitle, seasons),
    y = seasons,
    text = paste0(
      "Show: ", primaryTitle,
      "<br>Seasons: ", seasons,
      "<br>Episodes: ", episodes,
      ifelse(is.na(startYear), "", paste0("<br>Start year: ", startYear))
    )
  )
) +
  geom_col(fill = "#ff69b4") +
  coord_flip() +
  labs(
    title = "Top 10 by Number of Seasons",
    x = "Show",
    y = "Seasons"
  ) +
  theme_minimal()

ggplotly(p, tooltip = "text")
```

### Inferences

- When I investigated the number of seasons in some legacy franchises like The Bachelor and The Bachelorette, the results were staggering
- The Bachelor has a total of 30 seasons and 300+ episodes
- This highlights the popularity of legacy franchises and their cultural relevance

*********************************

# Final Thoughts and References

- Reality TV is in the top 10 highest rated genres on IMDb
- Dating Reality TV took off post 2021 and is still dominating screens worldwide
- Western audiences can't get enough: Dating reality has a high global reach in western markets and is expanding rapidly across other markets 
- Legacy franchises with repeat seasons and short run times of 40-60 minutes keep people coming back for more!
- Looks like Reality TV isn’t just my guilty pleasure — it’s a global one!

References:<br>
- IMDb.com, Inc. (2024, March 18). IMDb non-commercial datasets [Dataset]. Retrieved October 29, 2025, from https://developer.imdb.com/non-commercial-datasets