This project examines YouTube trending videos across India, the United States, Canada, and the United Kingdom. The goal is to understand how video categories, engagement metrics, and publishing patterns influence trending content.
The data comes from the Kaggle YouTube Trending Videos Dataset and includes:
- INvideos.csv
- USvideos.csv
- CAvideos.csv
- GBvideos.csv
Category names are obtained from the YouTube category JSON files.
The dataset contains trending videos collected during 2017–2018 across multiple countries. Variables include video title, category, views, likes, dislikes, comments, publish date, and country.
library(tidyverse)  # read_csv(), dplyr verbs, ggplot2
library(jsonlite)   # fromJSON()
library(corrplot)   # corrplot()
library(plotly)     # ggplotly()
india <- read_csv("INvideos.csv") %>% mutate(country = "India")
usa <- read_csv("USvideos.csv") %>% mutate(country = "USA")
canada <- read_csv("CAvideos.csv") %>% mutate(country = "Canada")
uk <- read_csv("GBvideos.csv") %>% mutate(country = "United Kingdom")
youtube <- bind_rows(india, usa, canada, uk)
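# Build the category lookup. Only the India category file is read here,
# so any category_id missing from that file gets an NA category_name.
cat_json <- fromJSON("IN_category_id.json")
category_lookup <- tibble(
  category_id = as.numeric(cat_json$items$id),
  category_name = cat_json$items$snippet$title
)
# Attach category names and drop rows with missing engagement metrics.
youtube <- youtube %>%
  left_join(category_lookup, by = "category_id") %>%
  filter(!is.na(views),
         !is.na(likes),
         !is.na(dislikes),
         !is.na(comment_count))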
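The lookup above reads a single category file, while the text refers to category JSON files in the plural. A minimal sketch of combining one file per country follows. The helper `read_categories()` and the object `category_lookup_all` are hypothetical names, and the three non-India file names are assumptions that follow the `IN_category_id.json` naming pattern. Whether the per-country files actually differ in the IDs they list has not been checked here.

```r
library(jsonlite)
library(dplyr)

# Hypothetical helper: read one category file and tag it with a country label.
read_categories <- function(path, country) {
  items <- fromJSON(path)$items
  tibble(
    country = country,
    category_id = as.numeric(items$id),
    category_name = items$snippet$title
  )
}

# Assumed file names, patterned on "IN_category_id.json".
category_files <- c(
  "India" = "IN_category_id.json",
  "USA" = "US_category_id.json",
  "Canada" = "CA_category_id.json",
  "United Kingdom" = "GB_category_id.json"
)

# Read only the files that are present in the working directory.
present <- category_files[file.exists(category_files)]
category_lookup_all <- bind_rows(Map(read_categories, present, names(present)))

# A join on both keys would keep each country's own ID-to-name mapping:
# youtube %>% left_join(category_lookup_all, by = c("country", "category_id"))
```

Joining on `country` and `category_id` together is one design choice; taking the distinct union of all files and joining on `category_id` alone would be a simpler alternative if the names agree across countries.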
fig1_data <- youtube %>% count(category_name)
ggplot(fig1_data,
       aes(x = reorder(category_name, n), y = n)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Number of Trending Videos by Category",
       x = "Category", y = "Number of Videos") +
  theme_minimal()
This bar chart shows the number of trending videos in each YouTube category. Categories with longer bars appear more frequently on the Trending page. Every row of the data is counted, so a video that appears in the data more than once contributes to its category each time. The chart highlights which types of content dominate YouTube’s trending ecosystem by frequency; on its own it does not measure audience engagement.
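Because the chart counts rows, it can help to compare row counts with counts of distinct videos. The sketch below uses a small made-up table. The `video_id` column is an assumption about the data, since it is not listed among the variables above.

```r
library(dplyr)

# Toy stand-in for the combined data; video_id is assumed to identify a video.
toy <- tibble(
  video_id = c("a", "a", "a", "b", "c", "c"),
  category_name = c("Music", "Music", "Music", "Comedy", "Comedy", "Comedy")
)

# Rows per category: every appearance of a video is counted.
record_counts <- toy %>% count(category_name, name = "records")

# Distinct videos per category: each video is counted once.
video_counts <- toy %>%
  distinct(video_id, category_name) %>%
  count(category_name, name = "videos")

comparison <- left_join(record_counts, video_counts, by = "category_name")
comparison
```

In this toy table both categories have three records, but Music has one distinct video and Comedy has two, so the two ways of counting can rank categories differently.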
fig2_data <- youtube %>%
  group_by(category_name) %>%
  summarise(avg_views = mean(views, na.rm = TRUE))
ggplot(fig2_data,
       aes(x = reorder(category_name, avg_views), y = avg_views)) +
  geom_col(fill = "darkgreen") +
  coord_flip() +
  labs(title = "Average Views by Category",
       x = "Category", y = "Average Views") +
  theme_minimal()
This bar chart compares the average number of views received by trending videos across YouTube categories. Categories with higher averages reach larger audiences per trending video. Because view counts are heavily right-skewed, a few viral videos can pull a category’s mean upward, so the ranking reflects average reach rather than the views of a typical video in that category.
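One way to check how much a few very large values drive the averages is to report the median next to the mean. This is a sketch on made-up numbers, not the report's actual figures. The names `toy` and `view_summary` are illustrative.

```r
library(dplyr)

# Made-up view counts: one Music video is far larger than the rest.
toy <- tibble(
  category_name = c("Music", "Music", "Music", "News", "News", "News"),
  views = c(1e4, 2e4, 9e6, 5e4, 6e4, 7e4)
)

# Mean and median per category, side by side.
view_summary <- toy %>%
  group_by(category_name) %>%
  summarise(
    avg_views = mean(views),
    median_views = median(views),
    .groups = "drop"
  )

view_summary
```

Here Music has the higher mean only because of a single outlier, while News has the higher median.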
ggplot(youtube, aes(views, likes)) +
  geom_point(alpha = 0.2) +
  labs(title = "Views vs Likes",
       x = "Views", y = "Likes") +
  theme_minimal()
The scatter plot reveals a strong positive relationship between views and likes. Videos with larger audiences tend to receive substantially more likes, indicating that viewer engagement generally increases with popularity. However, the spread of points suggests that some highly viewed videos receive disproportionately higher or lower engagement than others.
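When both variables span several orders of magnitude, most points crowd near the origin on linear axes. A possible variant, sketched here on simulated data rather than the real dataset, puts both axes on a log scale. Adding 1 before the transform is an assumption made so that zero counts remain plottable.

```r
library(ggplot2)

# Simulated stand-in data; not the real trending videos.
set.seed(1)
toy <- data.frame(views = round(10^runif(500, 3, 8)))
toy$likes <- round(toy$views * 10^rnorm(500, -1.5, 0.4))

# Same scatter plot, with log10 axes to spread out the small values.
p_log <- ggplot(toy, aes(views + 1, likes + 1)) +
  geom_point(alpha = 0.2) +
  scale_x_log10(labels = scales::comma) +
  scale_y_log10(labels = scales::comma) +
  labs(title = "Views vs Likes (log scale)",
       x = "Views + 1", y = "Likes + 1") +
  theme_minimal()

p_log
```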
country_category <- youtube %>%
  group_by(country, category_name) %>%
  summarise(videos = n(), .groups = "drop")
ggplot(country_category,
       aes(x = category_name,
           y = videos,
           fill = country)) +
  geom_col(position = "dodge") +
  labs(
    title = "Trending Videos Across Countries and Categories",
    x = "Video Category",
    y = "Number of Trending Videos",
    fill = "Country"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )
This grouped bar chart compares the number of trending videos across different countries and content categories. Differences in bar heights show how trending content varies by country, highlighting regional preferences and viewing patterns. The chart helps identify which countries and categories contribute most to YouTube’s trending content.
ggplot(youtube, aes(views)) +
  geom_histogram(bins = 40,
                 fill = "steelblue",
                 color = "white") +
  labs(title = "Distribution of View Counts",
       x = "Views", y = "Frequency") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )
This histogram shows the distribution of view counts among trending YouTube videos. The distribution is heavily right-skewed, indicating that most trending videos receive relatively moderate view counts, while a small number achieve exceptionally high viewership. This pattern suggests that views are unevenly distributed across trending content, with only a few viral videos attracting very large audiences. Although rare, these highly viewed videos contribute substantially to overall platform engagement and popularity.
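The claim that a few videos account for a large part of total viewership can be quantified as the share of all views held by the top fraction of videos. The function `top_share()` below is a hypothetical helper, shown on made-up numbers rather than the report's data.

```r
# Hypothetical helper: share of total views held by the top `top` fraction.
top_share <- function(x, top = 0.01) {
  k <- ceiling(length(x) * top)          # number of videos in the top group
  sorted <- sort(x, decreasing = TRUE)   # largest view counts first
  sum(sorted[seq_len(k)]) / sum(x)
}

# Made-up example: 99 videos with 1,000 views and one with 901,000 views.
toy_views <- c(rep(1000, 99), 901000)
share_top_1pct <- top_share(toy_views, top = 0.01)
share_top_1pct
```

In the real data, `top_share(youtube$views)` would give the corresponding figure, assuming `youtube` is built as in the code above.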
corr_data <- youtube %>%
  select(views, likes, dislikes, comment_count)
corr_matrix <- cor(corr_data)
corrplot(
  corr_matrix,
  method = "color",
  type = "upper",
  addCoef.col = "black",
  tl.col = "black",
  tl.srt = 45
)
title(
  main = "Correlation of YouTube Engagement Metrics",
  line = 2
)
The heatmap shows that engagement metrics are closely related. The strongest association is between views and likes, suggesting that audience appreciation generally increases as video exposure grows. Comment counts are also positively associated with views and likes, indicating that popular videos encourage more audience participation. The positive correlations across all metrics imply that trending videos typically generate multiple forms of engagement simultaneously, making engagement metrics useful indicators of overall video popularity.
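`cor()` computes Pearson correlations by default, which can be dominated by a few extreme values in skewed data. As a rough robustness check, the same matrix can be computed on ranks with `method = "spearman"`. The sketch below uses simulated columns, so its numbers say nothing about the real dataset.

```r
# Simulated stand-in for two engagement metrics; not the real data.
set.seed(42)
views <- round(10^runif(200, 3, 7))
likes <- round(views * runif(200, 0.01, 0.05))
metrics <- cbind(views, likes)

# Pearson measures linear association; Spearman measures monotonic association.
pearson <- cor(metrics, method = "pearson")
spearman <- cor(metrics, method = "spearman")

round(pearson, 2)
round(spearman, 2)
```

If the two matrices agree closely, the conclusions drawn from the heatmap are less likely to be an artifact of a handful of viral videos.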
youtube <- youtube %>%
  mutate(
    publish_day = weekdays(as.Date(publish_time))
  )
fig7_data <- youtube %>%
  count(publish_day)
ggplot(fig7_data,
       aes(x = reorder(publish_day, n),
           y = n)) +
  geom_col(fill = "orange") +
  coord_flip() +
  labs(
    title = "Trending Videos by Publishing Day",
    x = "Day of the Week",
    y = "Number of Trending Videos"
  ) +
  theme_minimal()
This bar chart shows the number of trending videos published on each day of the week. Days with higher counts may indicate periods when creators are more active or when audiences are more engaged with new content. The counts alone cannot show that a given day makes a video more likely to trend, because they are not compared with the total number of videos uploaded on each day.
The figure pools all four countries, so it describes overall publishing behavior rather than country-specific patterns. If certain days consistently have more trending videos, timing may play a role in visibility and engagement.
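To read the days in calendar order and compare them as proportions, the weekday can be converted to an ordered factor and each count divided by the total. This is a sketch on made-up values. It assumes `weekdays()` returns English day names, which depends on the session locale.

```r
library(dplyr)

# Calendar order for the weekday labels (English locale assumed).
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

# Made-up publishing days standing in for youtube$publish_day.
toy <- tibble(
  publish_day = c("Friday", "Monday", "Friday", "Sunday", "Monday", "Friday")
)

# Count per day, convert to share of all rows, and sort by calendar order.
day_counts <- toy %>%
  count(publish_day) %>%
  mutate(
    publish_day = factor(publish_day, levels = day_levels),
    share = n / sum(n)
  ) %>%
  arrange(publish_day)

day_counts
```

Days that never occur in the data are simply absent from the result; the factor levels only control the ordering.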
p <- ggplot(
  youtube,
  aes(
    x = views,
    y = likes,
    color = country,
    text = paste(
      "Title:", title,
      "<br>Country:", country,
      "<br>Category:", category_name,
      "<br>Views:", scales::comma(views),
      "<br>Likes:", scales::comma(likes),
      "<br>Comments:", scales::comma(comment_count)
    )
  )
) +
  geom_point(alpha = 0.5) +
  labs(
    title = "Interactive Views vs Likes by Country",
    x = "Views",
    y = "Likes",
    color = "Country"
  ) +
  theme_minimal()
ggplotly(p, tooltip = "text")