Introduction

This project examines YouTube trending videos across India, the United States, Canada, and the United Kingdom. The goal is to understand how video categories, engagement metrics, and publishing patterns influence trending content.

Data Sources

The data comes from the Kaggle YouTube Trending Videos Dataset and includes:

- INvideos.csv
- USvideos.csv
- CAvideos.csv
- GBvideos.csv

Category names are obtained from the YouTube category JSON files.

Description of the Data

The dataset contains trending videos collected during 2017–2018 across multiple countries. Variables include video title, category, views, likes, dislikes, comments, publish date, and country.

Data Preparation

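# Packages used throughout the report
library(tidyverse)  # read_csv(), dplyr verbs, tibble(), ggplot2
library(jsonlite)   # fromJSON()
library(corrplot)   # corrplot()
library(plotly)     # ggplotly()

# Read each country file and tag its rows with the country name
india <- read_csv("INvideos.csv") %>% mutate(country = "India")
usa <- read_csv("USvideos.csv") %>% mutate(country = "USA")
canada <- read_csv("CAvideos.csv") %>% mutate(country = "Canada")
uk <- read_csv("GBvideos.csv") %>% mutate(country = "United Kingdom")

# Combine the four countries into one data frame
youtube <- bind_rows(india, usa, canada, uk)

# Build a category ID -> name lookup from the India category file
cat_json <- fromJSON("IN_category_id.json")

category_lookup <- tibble(
  category_id = as.numeric(cat_json$items$id),
  category_name = cat_json$items$snippet$title
)

# Attach category names and drop rows with missing engagement metrics
youtube <- youtube %>%
  left_join(category_lookup, by = "category_id") %>%
  filter(!is.na(views),
         !is.na(likes),
         !is.na(dislikes),
         !is.na(comment_count))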
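
Because the lookup is built from the India category file only, it may be worth checking whether any category_id in the combined data has no matching name. The following is a minimal sketch of that check, not part of the original analysis. The tibbles `videos_demo` and `lookup_demo` are hypothetical stand-ins for `youtube` (before the join) and `category_lookup`, and their values are made up for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in data; in the report these would be `youtube`
# (before the join) and `category_lookup`.
videos_demo <- tibble(
  video_id    = c("a", "b", "c", "d"),
  category_id = c(10, 24, 29, 24)
)
lookup_demo <- tibble(
  category_id   = c(10, 24),
  category_name = c("Music", "Entertainment")
)

# anti_join() keeps rows of the first table that have no match in the
# second, so this lists category IDs that would get an NA category_name.
unmatched <- videos_demo %>%
  anti_join(lookup_demo, by = "category_id") %>%
  count(category_id, name = "n_rows")

print(unmatched)
```

If a check like this returns any rows on the real data, those videos would be grouped under a missing category name in the category-level figures.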

Figure 2: Average Views by Category

fig2_data <- youtube %>%
  group_by(category_name) %>%
  summarise(avg_views = mean(views, na.rm = TRUE))

ggplot(fig2_data,
       aes(x = reorder(category_name, avg_views), y = avg_views)) +
  geom_col(fill="darkgreen") +
  coord_flip() +
  labs(title="Average Views by Category",
       x="Category", y="Average Views") +
  theme_minimal()

Interpretation

This bar chart compares the mean view count of trending videos across YouTube categories. Categories with higher averages reach larger audiences per trending video, which points to broader viewer interest. Because views measure reach rather than interaction, the chart shows which content types gain the most visibility, not which ones draw the most likes or comments.
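
A mean can be pulled upward by a few very popular videos, so a median alongside it can help when reading category averages. The sketch below illustrates the comparison under that assumption. The tibble `demo` and its values are hypothetical, and the column names `median_views` and `n_videos` are introduced here for illustration only.

```r
library(dplyr)
library(tibble)

# Made-up rows: one category has a single very large view count.
demo <- tibble(
  category_name = c("Music", "Music", "Music", "News", "News", "News"),
  views         = c(1000, 2000, 900000, 40000, 50000, 60000)
)

# Mean, median, and row count per category, sorted by mean views.
category_summary <- demo %>%
  group_by(category_name) %>%
  summarise(
    avg_views    = mean(views),
    median_views = median(views),
    n_videos     = n(),
    .groups      = "drop"
  ) %>%
  arrange(desc(avg_views))

print(category_summary)
```

In this toy example the category with the highest mean has the lowest median, which is the kind of gap a single outlier can create.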

Figure 3: Views vs Likes Relationship

ggplot(youtube, aes(views, likes)) +
  geom_point(alpha=.2) +
  labs(title="Views vs Likes",
       x="Views", y="Likes") +
  theme_minimal()

Interpretation

The scatter plot reveals a strong positive relationship between views and likes. Videos with larger audiences tend to receive substantially more likes, indicating that viewer engagement generally increases with popularity. However, the spread of points suggests that some highly viewed videos receive disproportionately higher or lower engagement than others.
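
One way to make "disproportionately higher or lower engagement" concrete is a likes-per-view ratio. This is a sketch under that assumption rather than a measure used in the report. The data frame `demo` and the column name `like_rate` are hypothetical.

```r
# Made-up videos with different view and like counts.
demo <- data.frame(
  title = c("A", "B", "C"),
  views = c(1000, 50000, 2000000),
  likes = c(100, 500, 150000)
)

# Likes per view; rows with zero views get NA instead of a division by zero.
demo$like_rate <- ifelse(demo$views > 0, demo$likes / demo$views, NA_real_)

# Sort so that videos with the highest likes-per-view come first.
demo <- demo[order(demo$like_rate, decreasing = TRUE), ]

print(demo)
```

Sorting by such a ratio separates videos that are liked often relative to their audience from videos that are merely widely viewed.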

Figure 5: Distribution of View Counts

ggplot(youtube, aes(views)) +
  geom_histogram(bins = 40,
                 fill = "steelblue",
                 color = "white") +
  labs(title = "Distribution of View Counts",
       x = "Views", y = "Frequency") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Interpretation

This histogram shows the distribution of view counts among trending YouTube videos. The distribution is heavily right-skewed: most trending videos receive relatively moderate view counts, while a small number reach exceptionally high viewership. Views are therefore unevenly distributed across trending content, with a few viral videos accounting for a disproportionate share of total views.
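
Right skew can also be summarised numerically and viewed on a log scale. The sketch below shows one way to do both. The vector `views_demo` holds made-up values chosen only to mimic a skewed shape, and the names `skew_ratio`, `top_share`, and `p_log` are introduced here for illustration.

```r
library(ggplot2)

# Made-up view counts, chosen only to mimic a right-skewed shape.
views_demo <- c(500, 1200, 3000, 8000, 15000, 40000, 90000,
                250000, 2000000, 50000000)

# A mean far above the median is a simple numeric sign of right skew.
mean_views   <- mean(views_demo)
median_views <- median(views_demo)
skew_ratio   <- mean_views / median_views

# Share of all views that comes from the single most-viewed video.
top_share <- max(views_demo) / sum(views_demo)

# Same histogram idea on a log10 x-axis, which spreads out the low end.
# Zero-view rows would need to be dropped first, since log10(0) is -Inf.
p_log <- ggplot(data.frame(views = views_demo), aes(views)) +
  geom_histogram(bins = 10, fill = "steelblue", color = "white") +
  scale_x_log10(labels = scales::comma) +
  labs(x = "Views (log scale)", y = "Frequency") +
  theme_minimal()
```

On a log axis the bulk of moderately viewed videos is no longer compressed into the first few bins, which can make the shape of the distribution easier to read.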

Figure 6: Engagement Metrics Correlation

corr_data <- youtube %>%
  select(views, likes, dislikes, comment_count)

corr_matrix <- cor(corr_data)

corrplot(
  corr_matrix,
  method = "color",
  type = "upper",
  addCoef.col = "black",
  tl.col = "black",
  tl.srt = 45
)

title(
  main = "Correlation of YouTube Engagement Metrics",
  line = 2
)

Interpretation

The heatmap shows that engagement metrics are closely related. The strongest association is between views and likes, suggesting that audience appreciation generally increases as video exposure grows. Comment counts are also positively associated with views and likes, indicating that popular videos encourage more audience participation. The positive correlations across all metrics imply that trending videos typically generate multiple forms of engagement simultaneously, making engagement metrics useful indicators of overall video popularity.
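
`cor()` computes Pearson correlations by default, and these are sensitive to extreme values. A rank-based Spearman correlation is a common robustness check. The sketch below compares the two on a hypothetical data frame `metrics_demo` with made-up values. It is not a result from the dataset.

```r
# Made-up engagement metrics that all increase together, with one large video.
metrics_demo <- data.frame(
  views         = c(100, 1000, 10000, 100000, 1000000),
  likes         = c(5, 40, 300, 2500, 90000),
  comment_count = c(1, 8, 20, 400, 3000)
)

# Pearson measures linear association on the raw values.
pearson <- cor(metrics_demo, method = "pearson")

# Spearman measures monotonic association on the ranks.
spearman <- cor(metrics_demo, method = "spearman")

print(round(pearson, 2))
print(round(spearman, 2))
```

If the two matrices tell the same story on the real data, the conclusions drawn from the heatmap are less likely to depend on a handful of viral videos.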

Figure 7: Publishing Day Analysis

youtube <- youtube %>%
  mutate(
    # Day of week from the publish timestamp; weekdays() returns
    # day names in the current locale
    publish_day = weekdays(as.Date(publish_time))
  )

fig7_data <- youtube %>%
  count(publish_day)

ggplot(fig7_data,
       aes(x = reorder(publish_day, n),
           y = n)) +
  geom_col(fill = "orange") +
  coord_flip() +
  labs(
    title = "Trending Videos by Publishing Day",
    x = "Day of the Week",
    y = "Number of Trending Videos"
  ) +
  theme_minimal()

Interpretation

This bar chart shows the number of trending videos published on each day of the week. It helps show whether videos released on certain days appear more often on YouTube’s Trending page. Days with higher counts may reflect periods when creators publish more or when audiences engage more with new content.

The figure pools all four countries, so it shows overall publishing patterns rather than country-level differences. If certain days consistently have more trending videos, timing may play a role in visibility, although the counts alone cannot separate that effect from how many videos are uploaded on each day.
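
The chart orders days by count. If a calendar order is preferred, the day names can be stored as an ordered factor. The sketch below shows one way to do this without depending on the system locale. The tibble `demo`, its timestamps, and the names `day_levels` and `day_index` are hypothetical.

```r
library(dplyr)
library(tibble)

# Fixed English day names in calendar order, independent of the locale.
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

# Made-up publish timestamps in UTC.
demo <- tibble(
  publish_time = as.POSIXct(
    c("2017-11-13 17:13:01", "2017-11-14 08:00:00",
      "2017-11-18 21:30:00", "2017-11-20 05:45:00"),
    tz = "UTC"
  )
)

day_counts <- demo %>%
  mutate(
    # "%u" gives the ISO weekday number, 1 = Monday through 7 = Sunday.
    day_index   = as.integer(format(as.Date(publish_time, tz = "UTC"), "%u")),
    publish_day = factor(day_levels[day_index], levels = day_levels)
  ) %>%
  # .drop = FALSE keeps days with zero videos in the result.
  count(publish_day, .drop = FALSE)

print(day_counts)
```

Passing `publish_day` to the plot without `reorder()` would then show the bars from Monday to Sunday.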

Figure 8: Interactive Dashboard Summary

p <- ggplot(
  youtube,
  aes(
    x = views,
    y = likes,
    color = country,
    text = paste(
      "Title:", title,
      "<br>Country:", country,
      "<br>Category:", category_name,
      "<br>Views:", scales::comma(views),
      "<br>Likes:", scales::comma(likes),
      "<br>Comments:", scales::comma(comment_count)
    )
  )
) +
  geom_point(alpha = 0.5) +
  labs(
    title = "Interactive Views vs Likes by Country",
    x = "Views",
    y = "Likes",
    color = "Country"
  ) +
  theme_minimal()

ggplotly(p, tooltip = "text")

Interpretation

This interactive visualization summarizes the relationship between views and likes for trending videos across India, the United States, Canada, and the United Kingdom. Each point represents a trending video, while colors distinguish videos from different countries.

Unlike the static scatter plots presented earlier, this interactive figure allows users to explore individual videos by hovering over data points. The tooltip displays detailed information including the video title, country, category, views, likes, and comment count. This enables a deeper examination of engagement patterns and helps identify highly successful videos or potential outliers.

Overall, the visualization serves as a dashboard-style summary of audience engagement and provides an intuitive way to compare trending content across countries. The positive relationship between views and likes observed throughout the dataset is also evident in this figure, reinforcing the connection between audience reach and user engagement.

Conclusion

This project explored the characteristics of YouTube trending videos across India, the United States, Canada, and the United Kingdom. Through a series of visualizations, the analysis examined how video categories, audience engagement metrics, publishing patterns, and geographic location relate to the popularity of trending content.

The results showed that certain categories consistently dominate YouTube’s Trending page, while others attract larger average audiences despite appearing less frequently. Strong relationships were observed among views, likes, and comments, indicating that videos with greater reach generally receive higher levels of audience engagement. The analysis also revealed substantial variation in view counts across categories, suggesting that content type plays an important role in determining popularity.

Cross-country comparisons demonstrated that trending patterns differ among countries, reflecting variations in audience preferences and viewing behavior. The findings further suggest that publication timing may influence a video’s likelihood of appearing on the Trending page. The interactive visualization provided additional insight into engagement patterns and video performance across different regions.

Overall, the findings indicate that YouTube video popularity is associated with a combination of content category, audience engagement, publication timing, and geographic factors. By analyzing trending videos across multiple countries, this project describes the characteristics associated with successful content on one of the world’s largest video-sharing platforms. Future research could expand this analysis by incorporating additional countries, examining video titles and tags, or investigating how engagement patterns evolve over time.