This report analyzes the “Time-Wasters on Social Media” dataset, sourced from Kaggle, to identify the main drivers of productivity loss in the digital age. By examining a synthetic population of 1,000 users, the study tests common assumptions about geography, gender, and platform choice. The findings suggest that while social media usage is broadly consistent across countries, the intent behind usage, particularly boredom, may be an important factor associated with higher productivity loss.
Introduction:
Social media has become a major part of daily life, but it may reduce productivity and encourage addictive behavior. This analysis explores how user characteristics, platform choice, and usage patterns relate to productivity loss and addiction.
Methodology:
The analysis was conducted using R and the tidyverse suite of packages.
Data Source: The dataset was obtained from Kaggle and contains user metrics such as location, gender, platform type, and self-reported productivity loss.
Cleaning: Data was cleaned for consistency, including recoding categorical typos (e.g., “Barzil” to “Brazil”).
Visual Analysis: Six targeted visualizations (a histogram, bar charts, a scatterplot, and boxplots) were used to examine each variable.
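The cleaning step can be illustrated with a minimal sketch. It assumes a toy data frame named `toy` with invented values, not the Kaggle file, and uses `if_else()` as one possible way to correct a single misspelled label.

```r
library(dplyr)

# Toy data standing in for the Kaggle file; values are invented for illustration.
toy <- data.frame(
  Location = c("Brazil", "Barzil", "Japan", "Japan", "Vietnam")
)

# Step 1: count each distinct label so rare misspellings stand out.
toy %>% count(Location, sort = TRUE)

# Step 2: correct the misspelling and leave every other value untouched.
cleaned <- toy %>%
  mutate(Location = if_else(Location == "Barzil", "Brazil", Location))

cleaned %>% count(Location, sort = TRUE)
```

Counting labels before and after the fix is a quick way to confirm that the misspelled category has been merged into the correct one.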
1. How bad is the productivity loss for the average person in this study?
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.1 ✔ readr 2.2.0
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.3 ✔ tibble 3.3.1
✔ lubridate 1.9.5 ✔ tidyr 1.3.2
✔ purrr 1.2.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 1000 Columns: 31
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (12): Gender, Location, Profession, Demographics, Platform, Video_Categ...
dbl (16): UserID, Age, Income, Total_Time_Spent, Number_of_Sessions, Video_...
lgl (2): Debt, Owns_Property
time (1): Watch_Time
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ggplot(data, aes(x = Productivity_Loss)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "white") +
  labs(
    title = "Distribution of Productivity Loss",
    subtitle = "How social media impact is spread across all users",
    x = "Productivity Loss Score (0-10)",
    y = "Number of Users"
  )
The distribution of productivity loss across the dataset is bimodal, indicating that social media does not affect all users uniformly. The data clusters into two groups: a “low-impact” group peaking around a score of 2.5, and a larger “high-impact” group peaking between 6.0 and 7.0. A pronounced dip at the 5.0 mark separates these two populations. As a result, an “average” score is not representative of most users; they are either winning the battle for their focus or losing it.
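To illustrate why a mean can mislead for a two-peaked distribution, here is a minimal sketch on simulated scores. The cluster centres, sizes, and the cut-off at 5 are assumptions chosen to resemble the description above; this is not the Kaggle data.

```r
library(dplyr)

set.seed(42)
# Simulated scores from two clusters; NOT the Kaggle data.
toy <- data.frame(
  Productivity_Loss = c(rnorm(400, mean = 2.5, sd = 0.8),
                        rnorm(600, mean = 6.5, sd = 0.8))
)

overall_mean <- mean(toy$Productivity_Loss)

# Share of users whose score lies within 0.5 points of the overall mean
share_near_mean <- mean(abs(toy$Productivity_Loss - overall_mean) <= 0.5)

# Size and centre of each group when splitting at the dip (assumed at 5)
group_summary <- toy %>%
  mutate(Group = if_else(Productivity_Loss < 5, "low-impact", "high-impact")) %>%
  group_by(Group) %>%
  summarize(Users = n(), Mean_Loss = mean(Productivity_Loss))

overall_mean
share_near_mean
group_summary
```

In this simulation the overall mean falls in the gap between the clusters, so only a small share of simulated users sit near it, while the two group means describe their members much better.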
2. Where is the problem happening? Does geography influence social media impact?
country_ranking <- data %>%
  mutate(Location = recode(Location, "Barzil" = "Brazil")) %>%
  group_by(Location) %>%
  summarize(Avg_Loss = mean(Productivity_Loss, na.rm = TRUE)) %>%
  arrange(desc(Avg_Loss)) # highest to lowest

ggplot(country_ranking, aes(x = reorder(Location, Avg_Loss), y = Avg_Loss, fill = Avg_Loss)) +
  geom_col(width = 0.6) +
  coord_flip() + # sideways so country names are easier to read
  scale_fill_viridis_c(option = "plasma") + # color variation
  labs(
    title = "Ranked Productivity Loss by Country",
    x = "Country",
    y = "Average Productivity Loss (0-10)"
  )
This ranked bar chart shows Pakistan, Japan, and Vietnam reporting the highest average productivity loss (approximately 5.4), while Brazil shows the lowest in this dataset (near 4.5). Although the ranking gives a clear order, the more important finding is the narrow range between the top and bottom of the list: less than 1.0 point separates all ten countries on a 10-point scale. The differences between countries are therefore minimal, suggesting that productivity loss is broadly consistent across countries.
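The size of the gap between the top and bottom of a ranking can be computed directly. The sketch below uses a hypothetical `country_ranking` table with only four rows, filled with the approximate values quoted in the text rather than the exact averages from the dataset.

```r
library(dplyr)

# Hypothetical averages resembling the approximate values quoted in the text.
country_ranking <- data.frame(
  Location = c("Pakistan", "Japan", "Vietnam", "Brazil"),
  Avg_Loss = c(5.4, 5.4, 5.4, 4.5)
)

# Highest and lowest country averages, and the gap between them
spread <- country_ranking %>%
  summarize(
    Highest = max(Avg_Loss),
    Lowest = min(Avg_Loss),
    Gap = max(Avg_Loss) - min(Avg_Loss)
  )

spread
```

Reporting the gap alongside the ranking makes it harder to over-read small differences between neighbouring bars.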
3. The Demographics: Is it a specific gender? Does total time spent on social media differ significantly between genders?
gender_stats <- data %>%
  group_by(Gender) %>%
  summarize(
    Avg_Time = mean(Total_Time_Spent),
    Avg_Addiction = mean(Addiction_Level)
  )

# Plot for time
ggplot(gender_stats, aes(x = Gender, y = Avg_Time, fill = Gender)) +
  geom_col(width = 0.5) +
  labs(title = "Average Time Spent by Gender", y = "Minutes")
Social media engagement appears gender-neutral in this dataset: all groups spend nearly the same average amount of time on platforms. This suggests that gender is not a meaningful predictor of social media usage intensity here.
4. The App: Is one platform more “evil” than others? Do different social media platforms lead to different levels of productivity loss?
ggplot(data, aes(x = Platform, y = Productivity_Loss)) +
  geom_boxplot() +
  labs(title = "Productivity Loss by Platform")
The boxplot shows that productivity loss is relatively similar across all four platforms, with medians around 5 to 6. This suggests that no single platform dominates in causing productivity loss, but small differences exist.
TikTok and YouTube show slightly higher median productivity loss than Facebook and Instagram, indicating that users of video-heavy platforms may experience slightly greater productivity loss.
Instagram stands out not because it has the highest productivity loss, but because it has the widest spread of values. This means user experiences on Instagram vary more widely; some users report low productivity loss, while others report much higher values.
Checking these values using the summary statistics table:
TikTok and YouTube have higher median and mean productivity loss than Facebook and Instagram. Although the differences are not large, the consistent pattern suggests that video-based platforms may be associated with higher productivity loss.
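A per-platform summary of this kind could be computed along the following lines. This is a sketch on simulated data, assuming the column names `Platform` and `Productivity_Loss` used in the plotting code; the choice of median, mean, and IQR as summary columns is an assumption, not necessarily what the original table contained.

```r
library(dplyr)

set.seed(1)
# Simulated stand-in for the dataset; scores are random, not real results.
toy <- data.frame(
  Platform = sample(c("Facebook", "Instagram", "TikTok", "YouTube"),
                    200, replace = TRUE),
  Productivity_Loss = sample(0:10, 200, replace = TRUE)
)

platform_summary <- toy %>%
  group_by(Platform) %>%
  summarize(
    Users = n(),
    Median_Loss = median(Productivity_Loss),
    Mean_Loss = mean(Productivity_Loss),
    IQR_Loss = IQR(Productivity_Loss) # width of the box in a boxplot
  ) %>%
  arrange(desc(Median_Loss))

platform_summary
```

The IQR column corresponds to the height of each box in the boxplot, so it gives a numeric check on which platform has the widest spread.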
5. The Time: Does spending more time make it worse? How does the total time spent on social media relate to productivity loss?
ggplot(data, aes(x = Total_Time_Spent, y = Productivity_Loss)) +
  geom_jitter(alpha = 0.3, color = "blue") +
  geom_smooth(method = "loess", color = "red") +
  labs(
    title = "Relationship Between Time Spent and Productivity Loss",
    x = "Total Time Spent (Minutes)",
    y = "Productivity Loss"
  )
`geom_smooth()` using formula = 'y ~ x'
The red trend line is nearly flat, indicating little to no relationship between time spent on social media and productivity loss. This suggests that a user spending only a short amount of time on their phone can experience similar levels of productivity loss as someone who spends several hours.
Rather than showing a steady increase, the data appears widely scattered, indicating high variability and a weak association between the variables. Overall, time spent is a poor predictor of productivity loss, suggesting that the presence of distraction itself may be more important than the total duration of use.
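A flat trend line can be backed up with a number. The sketch below, on simulated data in which the two variables are independent by construction, computes a Spearman rank correlation and the R-squared of a straight-line fit. The use of Spearman's method is an assumption made here because the score is a bounded rating; it is not a step taken in the original analysis.

```r
set.seed(7)
# Simulated data: scores are drawn independently of time, NOT the Kaggle data.
toy <- data.frame(
  Total_Time_Spent = runif(300, min = 10, max = 300),
  Productivity_Loss = sample(0:10, 300, replace = TRUE)
)

# Rank correlation between time spent and productivity loss
rho <- cor(toy$Total_Time_Spent, toy$Productivity_Loss, method = "spearman")

# Share of variance in productivity loss explained by a linear fit on time
r_squared <- summary(lm(Productivity_Loss ~ Total_Time_Spent, data = toy))$r.squared

rho
r_squared
```

Values of both statistics close to zero would be consistent with the visual impression of a weak association.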
6. The Truth: Does the ‘Watch Reason’ serve as a better predictor of productivity loss than ‘Total Time Spent’?
ggplot(data, aes(x = Watch_Reason, y = Productivity_Loss, fill = Watch_Reason)) +
  geom_boxplot() +
  labs(
    title = "Productivity Loss by Watch Reason",
    x = "Reason for Watching",
    y = "Productivity Loss Score"
  )
It’s not the time that kills productivity; it’s the reason.
While total time spent on social media did not show a clear relationship with productivity loss, the reason for usage does appear to matter. Specifically, users who cite Boredom as their primary reason for watching show a higher median productivity loss (around 6.0) than those who cite habit or procrastination (around 5.0). This suggests that the mental state of the user, rather than just the time spent, is a key factor in how social media usage affects their daily output.
In other words, users in this dataset who turn to social media out of boredom tend to report a larger drop in productivity than those who use it out of habit or as intentional procrastination.
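The boxplot comparison is descriptive; whether the groups differ could also be checked formally. The sketch below is one possible way to do that, using a Kruskal-Wallis test from base R on simulated data where the Boredom group is shifted upward by construction. The test choice and the simulated values are assumptions, not part of the original analysis.

```r
library(dplyr)

set.seed(3)
# Simulated scores: "Boredom" is centred one point higher by construction.
toy <- data.frame(
  Watch_Reason = rep(c("Boredom", "Habit", "Procrastination"), each = 100),
  Productivity_Loss = c(rnorm(100, mean = 6, sd = 1.5),
                        rnorm(200, mean = 5, sd = 1.5))
)

# Median productivity loss per watch reason
reason_medians <- toy %>%
  group_by(Watch_Reason) %>%
  summarize(Median_Loss = median(Productivity_Loss))

# Rank-based test of whether the groups share the same distribution
kw <- kruskal.test(Productivity_Loss ~ Watch_Reason, data = toy)

reason_medians
kw$p.value
```

A small p-value would indicate that at least one group differs from the others; it would not by itself show that boredom causes the difference.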
Conclusion:
Contrary to the common assumption that total “Time Spent” is the primary driver of distraction, this analysis suggests that the reason for usage matters more. Users who engage with content out of Boredom report the highest median productivity loss, whereas those who cite habit or procrastination report lower levels. Because the dataset is synthetic and the analysis is descriptive, these patterns are associations rather than proof of cause. Even so, they suggest that reclaiming productivity may depend less on setting strict time limits than on managing the internal triggers and “Watch Reasons” that lead us to scroll in the first place.