An advocacy group hired us to investigate how different social media platforms influence users’ emotions while on the app. To characterize overall usage, we focus on time spent on the app, interaction with other users, and post engagement. Other possible factors to consider include gender and age. The overall goal is to understand how different social media platforms and usage patterns influence users’ dominant emotional state throughout the day and to predict how users’ emotional well-being is influenced by their social media use.
library(dplyr)
library(rpart)
library(rpart.plot)
library(caret)
library(fastDummies)
library(ggplot2)
data<-read.csv("/Users/vanessaveretelnikov/Desktop/DiDa_325/train.csv")
val_data<-read.csv("/Users/vanessaveretelnikov/Desktop/DiDa_325/val.csv")
test_data<-read.csv("/Users/vanessaveretelnikov/Desktop/DiDa_325/test.csv")
Emirhan Bulut created the data set through meticulous research and preparation. He has more than 10 years in the industry and is the founder of Quantum PIYA. The data was gathered through a “hypothetical survey”. Our approach to analyzing the data is to determine which factors significantly impact emotional state, how emotional states vary across different platforms, and how those platforms are used.
Link to download data: https://www.kaggle.com/datasets/emirhanai/social-media-usage-and-emotional-well-being
Age (text, chr): Age of the user [Needs to be changed to numerical]
Gender (categorical, chr): Gender of the user
Platform (categorical, chr): Social media platform used
Daily_Usage_Time..minutes. (numerical, int): Daily time spent on the platform in minutes
Posts_Per_Day (numerical, int): Number of posts made per day by user
Likes_Received_Per_Day (numerical, int): Number of likes received per day by user
Comments_Received_Per_Day (numerical, int): Number of comments received per day by user
Messages_Sent_Per_Day (numerical, int): Number of messages sent per day by user
Dominant_Emotion (categorical, chr): User’s dominant emotional state during the day
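Because `Age` is read in as character, it has to be converted before any numeric filter can be trusted. A minimal sketch of that conversion follows, using a small made-up data frame (`toy` and its values are illustrative assumptions, not rows from the survey). It also shows how rows with flipped columns surface as `NA` after conversion.

```r
# Hypothetical stand-in for the raw data: the third row has Age and Gender flipped
toy <- data.frame(
  Age    = c("25", "31", "Male", "28"),
  Gender = c("Female", "Male", "27", "Non-binary"),
  stringsAsFactors = FALSE
)

# as.numeric() turns anything that is not a number into NA (with a warning,
# silenced here), which makes flipped or malformed rows easy to spot
toy$Age_num <- suppressWarnings(as.numeric(toy$Age))

# Rows whose Age could not be parsed
bad_rows <- toy[is.na(toy$Age_num), ]
print(bad_rows)

# Keep only rows with a usable numeric age
toy_clean <- toy[!is.na(toy$Age_num), ]
str(toy_clean)
```

Converting first and filtering second avoids comparing ages as text, where "100" would sort before "90".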
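We dropped 77 rows that had some of the columns flipped or missing data. We also dropped the User_ID column, which gave each data point a unique number.
data_2 <- data %>%
  # Age is read as text; non-numeric values (flipped columns) become NA
  mutate(Age = suppressWarnings(as.numeric(Age))) %>%
  filter(!is.na(Age), Age <= 90) %>%
  na.omit() %>%
  select(-User_ID)
data_2 %>% head()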
Based on your social media usage, what is the dominant emotional state during the day?
_Method: Decision tree model, predictor based on usage_
Based on the decision tree model, what factor influences that emotional state the most?
_Method: Bar graph, take the points from the decision tree model and quantify the data set visually_
What is the dominant emotional state during the day for each social media app?
_Method: Bar graph, quantify the emotional state for each social media app (dodge)_
Is there a difference in which dominant emotional state a user feels based on their interaction with other users and the engagement they receive?
_Method: Bar graph, manipulate the data to see the dominant emotions for high vs low engagement (receiving and interacting)_
To figure out which part of social media use had the biggest impact on a user’s dominant emotion, we looked at the variable importance scores from a larger decision tree and graphed them. These scores tell us which predictors the model relied on the most when deciding how to classify people’s emotions.
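# Keep the usage predictors and the outcome
model_data <- data_2 %>%
  select(Daily_Usage_Time..minutes., Platform, Dominant_Emotion, Posts_Per_Day,
         Messages_Sent_Per_Day, Comments_Received_Per_Day, Likes_Received_Per_Day)
# Stratified 75/25 split on the outcome
set.seed(314159)
data_split <- createDataPartition(model_data$Dominant_Emotion, p = 0.75, list = FALSE)
train_data <- model_data[data_split, ]
test_data <- model_data[-data_split, ]  # note: replaces the test.csv data frame loaded earlier
# Classification tree for the dominant emotion
tree_model <- rpart(Dominant_Emotion ~ .,
                    data = train_data,
                    method = "class")
rpart.plot(
  tree_model,
  type = 2,
  extra = 104,  # per-class probabilities plus % of observations; 106 is meant for binary outcomes
  box.palette = "RdBu",
  shadow.col = "gray",
  nn = TRUE
)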
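The split above holds out 25% of the rows, so one natural check is to score the tree on them. The sketch below shows one way to do that with a confusion matrix and overall accuracy. It is an illustration under stated assumptions: the built-in `iris` data stands in for the survey so the code runs on its own, a simple `sample()` replaces `createDataPartition()`, and the names `fit`, `train`, and `test` are hypothetical. In the report, `tree_model` and `test_data` would take their place.

```r
library(rpart)

# iris is only a stand-in for the survey data so the sketch runs on its own
set.seed(314159)
idx   <- sample(nrow(iris), size = round(0.75 * nrow(iris)))
train <- iris[idx, ]
test  <- iris[-idx, ]

# Fit a classification tree on the training rows
fit <- rpart(Species ~ ., data = train, method = "class")

# Predict the class of each held-out row
pred <- predict(fit, newdata = test, type = "class")

# Confusion matrix: rows are predictions, columns are the true classes
conf_mat <- table(Predicted = pred, Actual = test$Species)
print(conf_mat)

# Overall accuracy: share of held-out rows classified correctly
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```

The off-diagonal cells show which classes get confused with each other, which a single accuracy number hides.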
# Factor importance: pull the variable importance scores from the fitted tree
imp <- tree_model$variable.importance
imp_df <- data.frame(
  predictor = names(imp),
  importance = as.numeric(imp)
) %>%
  arrange(desc(importance))
imp_df
# Palette runs from dark (most important) to light (least important)
Importance_Palette <- c("#29374d", "#314a70", "#305ea6", "#4fb3ff", "#7cc5fc", "#6ff7f7")
# Map fill inside aes() so scale_fill_manual() applies and the plot still works
# if the tree uses fewer than six predictors
ggplot(imp_df, aes(x = reorder(predictor, importance), y = importance,
                   fill = reorder(predictor, -importance))) +
  geom_col() +
  scale_x_discrete(labels = c("Platform" = "Platform",
                              "Posts_Per_Day" = "Posts per Day",
                              "Comments_Received_Per_Day" = "Comments Received per Day",
                              "Messages_Sent_Per_Day" = "Messages Sent per Day",
                              "Daily_Usage_Time..minutes." = "Daily Usage Time",
                              "Likes_Received_Per_Day" = "Likes Received per Day")) +
  labs(title = "Top Predictors of Dominant Emotion", x = "Predictor", y = "Importance Score") +
  scale_fill_manual(values = Importance_Palette) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1), legend.position = "none")
From the bar graph, “Likes Received per Day” stands out as the strongest predictor. This means that the number of likes a user gets did more to separate emotional states in the model than anything else measured. In other words, the amount of positive feedback people receive seems to matter more than simply how long they use the app or which platform they’re on. Likes can act as a form of social validation, so higher or lower engagement could influence how someone ends up feeling.
Right behind likes were “Daily Usage Time”, “Messages Sent per Day”, and “Comments Received per Day.” Apart from usage time, these are interaction-related measures, suggesting that emotions are more influenced by how long users spend on a platform, how much they engage with others, and how much others engage back with them. On the other hand, “Platform” and “Posts per Day” were the least important predictors. That means the specific app someone uses or how often they post didn’t matter nearly as much as the amount of feedback and interaction they received.
Overall, the model shows a clear pattern: emotions are shaped more by engagement than by simple activity. It isn’t the app you use; rather, it’s the feedback and attention you get while using it. This aligns with the idea that social interaction and perceived social approval play a major role in how we feel after being online.
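Raw importance scores are hard to compare at a glance, so it can help to express each one as a share of the total. The sketch below is a minimal illustration, again using the built-in `iris` data as a stand-in (an assumption so the code is self-contained). With the report's model, `tree_model` would replace the hypothetical `fit`.

```r
library(rpart)

# iris is a stand-in; in the report, tree_model would replace fit
fit <- rpart(Species ~ ., data = iris, method = "class")

# Named vector of importance scores, one entry per predictor the tree used
imp <- fit$variable.importance

# Share of total importance carried by each predictor, in percent
imp_share <- round(100 * imp / sum(imp), 1)
print(sort(imp_share, decreasing = TRUE))
```

Shares make statements such as "right behind" concrete, since they show how close the second and third predictors are to the first.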
# Combine the activity counts into two summary measures:
#   interaction = what the user sends out (posts + messages)
#   engagement  = what the user receives (likes + comments)
graph4 <- data_2 %>%
  mutate(interaction = Posts_Per_Day + Messages_Sent_Per_Day,
         engagement = Likes_Received_Per_Day + Comments_Received_Per_Day) %>%
  select(interaction, engagement, Dominant_Emotion)
ggplot(graph4, aes(x = engagement, y = interaction, fill = Dominant_Emotion)) +
  geom_point(shape = 21, size = 3, color = "black", stroke = .5) +
  xlim(0, 160) +
  ylim(0, 60) +
  labs(x = "Engagement (Likes + Comments Received / Day)",
       y = "Interaction (Posts + Messages Sent / Day)",
       title = "Dominant Emotion Felt Based on Usage") +
  scale_fill_manual(name = "Dominant Emotion",
                    values = c("red", "purple", "grey", "yellow", "lightblue", "darkblue"))
The graph shows that engagement and interaction rise together, and that happiness clusters at the high end of both. The more engagement you receive from other users via likes and comments, and the more you interact by posting and messaging other users, the happier you generally feel. The exceptions are two data points at the maximum end of engagement and interaction, where the dominant emotion is anxiety. On the other hand, if a user has low interaction and engagement, they are more likely to feel bored. If you are bored, spending time on social media might be a good way to make you feel better, as it provides a way to interact with others and alleviate boredom.
Anger, anxiety, sadness, and neutral had a larger spread; however, most of these points do not overlap with the happiness data points. This may indicate that the other users and the media a person interacts with contribute to the emotions they feel, so that emotion is not a one-to-one reflection of the quantity of their interaction and engagement.
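The high versus low comparison described above can also be put into numbers by splitting users at the median engagement and tabulating the emotions in each group. The sketch below uses a handful of made-up rows purely for illustration (the values in `toy` are assumptions, not survey data). With the report's data, `graph4` would take the place of `toy`.

```r
# Made-up rows for illustration only; column names follow the report's graph4
toy <- data.frame(
  engagement = c(10, 20, 35, 40, 90, 110, 130, 150),
  Dominant_Emotion = c("Boredom", "Boredom", "Neutral", "Sadness",
                       "Happiness", "Happiness", "Anxiety", "Happiness")
)

# Label each user as low or high engagement relative to the median
cutoff <- median(toy$engagement)
toy$engagement_level <- ifelse(toy$engagement > cutoff, "High", "Low")

# Share of each dominant emotion within the low and high groups (rows sum to 1)
emotion_share <- prop.table(
  table(toy$engagement_level, toy$Dominant_Emotion),
  margin = 1
)
print(round(emotion_share, 2))
```

The median is only one possible cutoff; quartiles or a threshold taken from the decision tree would work the same way.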
Based on our analysis, users are more likely to feel happiness when they spend more than 135 minutes per day on social media and receive 71 likes per day. This trend of higher engagement and interaction leading to happiness is visible in the scatter plot. The scatter plot also shows that a user was likely to experience an emotion other than happiness if they had less engagement and interaction on the platform, and that users with more engagement and interaction were less likely to be bored. Likes received per day, daily usage time, messages sent per day, and comments received per day were the top predictors of which emotion was dominant. However, emotions other than happiness and boredom were found to be less influenced by social media usage and more by the platform used. For example, Twitter users were predominantly angry or sad, while Facebook users were neutral or anxious.
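Thresholds like the ones quoted above can be read off a fitted tree's split rules. Assuming those figures come from the splits of `tree_model` (the report does not say so explicitly), the sketch below shows one way to list the rule leading to each leaf with `path.rpart()` from the rpart package. The built-in `iris` data is a stand-in so the code is self-contained, and `fit` is a hypothetical name.

```r
library(rpart)

# iris is a stand-in; in the report, tree_model would replace fit
fit <- rpart(Species ~ ., data = iris, method = "class")

# Terminal nodes (leaves) are the rows of fit$frame marked "<leaf>"
is_leaf  <- fit$frame$var == "<leaf>"
leaf_ids <- as.numeric(rownames(fit$frame)[is_leaf])

# path.rpart() returns the chain of split conditions from the root to each node
rules <- path.rpart(fit, nodes = leaf_ids, print.it = FALSE)

# Predicted class at each leaf, matched to its rule
leaf_class <- attr(fit, "ylevels")[fit$frame$yval[is_leaf]]
for (i in seq_along(rules)) {
  # Drop the leading "root" entry and join the remaining conditions
  cat(leaf_class[i], ":", paste(rules[[i]][-1], collapse = " & "), "\n")
}
```

Each printed line pairs a predicted class with the conditions that lead to it, which is where cutoffs on usage time or likes would appear for the survey model.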
Overall, our analysis shows how usage and the type of platform relate to the user’s dominant emotional state throughout the day. However, 77 data points were removed because of formatting issues. Their inclusion may have increased variability in the emotional states felt across platforms and enabled more accurate predictive models. Additionally, for future analysis, it may be worthwhile to investigate the media and interactions on each platform directly rather than inferring them from the reported emotion. This would help determine the direction of causality between media and emotion.