Introduction

An advocacy group hired us to investigate how different social media platforms influence users’ emotions while they are on the app. To characterize overall usage, we also focus on time spent on the app, interaction with other users, and engagement with posts; gender and age are other possible factors to consider. The overall goal is to understand how social media platforms and usage patterns influence users’ dominant emotional state during the day, and to predict users’ emotional well-being from their social media use.

library(dplyr)
library(rpart)
library(rpart.plot)
library(caret)
library(ggplot2)

# Folder holding the Kaggle CSV files; change this to match your machine
data_dir <- "/Users/vanessaveretelnikov/Desktop/DiDa_325"
data <- read.csv(file.path(data_dir, "train.csv"))
val_data <- read.csv(file.path(data_dir, "val.csv"))
# Named separately so the test split created in Question 2 does not overwrite it
kaggle_test_data <- read.csv(file.path(data_dir, "test.csv"))

Origin of the dataset

Emirhan Bulut created the dataset through meticulous research and preparation. He has more than 10 years of industry experience and is the founder of Quantum PIYA. The data were gathered through a “hypothetical survey”. Our approach is to determine which factors significantly affect emotional state, how emotional states vary across platforms, and how those platforms are used.

Link to download the data: https://www.kaggle.com/datasets/emirhanai/social-media-usage-and-emotional-well-being

Dataset

Age (text, chr): Age of the user; stored as text, so it must be converted to a number before any numeric comparison

Gender (categorical, chr): Gender of the user

Platform (categorical, chr): Social media platform used

Daily_Usage_Time..minutes. (numerical, int): Daily time spent on the platform in minutes

Posts_Per_Day (numerical, int): Number of posts made per day by user

Likes_Received_Per_Day (numerical, int): Number of likes received per day by user

Comments_Received_Per_Day (numerical, int): Number of comments received per day by user

Messages_Sent_Per_Day (numerical, int): Number of messages sent per day by user

Dominant_Emotion (categorical, chr): User’s dominant emotional state during the day

Cleaning the Data

We dropped 77 rows in which some columns were shifted or data were missing. We also dropped the User_ID column, which only assigned each row a unique number.

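data_2<-data %>% 
  # Age is stored as text, so this is a string comparison: shifted rows whose
  # Age holds a word (e.g. a gender) sort after "90" and are dropped
  filter(Age<=90) %>% 
  na.omit() %>% 
  select(-User_ID)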

data_2  %>% head()
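Because Age is read as text, the filter above compares strings rather than numbers. That happens to drop rows whose Age holds a word, but it would also keep an age of "100" (since "1" sorts before "9") or an empty string. A minimal base R sketch on a few hypothetical rows, assuming the same column names, that converts Age to an integer first so non-numeric values become NA and are dropped explicitly:

```r
# Hypothetical rows, including one whose Age and Gender were swapped
raw <- data.frame(
  User_ID = 1:6,
  Age = c("25", "31", "Male", "100", "", NA),
  Gender = c("Female", "Male", "27", "Female", "Male", "Female"),
  stringsAsFactors = FALSE
)

# String comparison: "100" and "" both sort before "90", so they pass
as_text <- raw$Age <= 90
print(as_text)

# Numeric comparison: non-numeric ages become NA and are dropped explicitly
raw$Age_num <- suppressWarnings(as.integer(raw$Age))
clean <- raw[!is.na(raw$Age_num) & raw$Age_num <= 90, ]
print(clean$User_ID)
```

Applied to the real data, the number of rows this version keeps should be checked against the 77 dropped rows reported above before swapping it in.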

Questions To Investigate

  1. Based on your social media usage, what is the dominant emotional state during the day?

     _Method: Decision tree models, with usage measures as predictors_
  2. Based on the decision tree model, what factor influences that emotional state the most?

     _Method: Bar graph of the variable importance scores from a decision tree model_
  3. What is the dominant emotional state during the day for each social media app?

     _Method: Dodged bar graph counting each dominant emotion on each social media app_
  4. Is there a difference in which dominant emotional state a user feels based on their interaction with other users and the engagement they receive?

     _Method: Scatter plot of engagement received (likes and comments) against interaction (posts and messages sent), colored by dominant emotion, to compare high vs. low engagement_

Analysis

Question 1: Based on your social media usage, what is the dominant emotional state during the day?

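We fit one decision tree per emotion, each classifying users as feeling that emotion or not, to better understand how a user’s dominant emotion relates to the specifics of their social media usage. Age and gender are excluded from these models, and all measures are per day. In each tree plot (rpart.plot with extra = 106), a box shows the predicted class, the estimated probability of the target emotion, and the percentage of training users who fall into that node.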

data_3<-data_2 %>% 
  select(-Age,-Gender)
data_3_numeric<-data_3 %>% 
  mutate(Dominant_Emotion_Anger=as.numeric(Dominant_Emotion=="Anger"),Dominant_Emotion_Happiness=as.numeric(Dominant_Emotion=="Happiness"),Dominant_Emotion_Neutral=as.numeric(Dominant_Emotion=="Neutral"),Dominant_Emotion_Sadness=as.numeric(Dominant_Emotion=="Sadness"),Dominant_Emotion_Boredom=as.numeric(Dominant_Emotion=="Boredom"),Dominant_Emotion_Anxiety=as.numeric(Dominant_Emotion=="Anxiety")) %>% 
  select(-Dominant_Emotion) %>% 
  mutate(Platform_Facebook=as.numeric(Platform=="Facebook"),Platform_Instagram=as.numeric(Platform=="Instagram"),Platform_LinkedIn=as.numeric(Platform=="LinkedIn"),Platform_Snapchat=as.numeric(Platform=="Snapchat"),Platform_Telegram=as.numeric(Platform=="Telegram"),Platform_Twitter=as.numeric(Platform=="Twitter"),Platform_Whatsapp=as.numeric(Platform=="Whatsapp")) %>% 
  select(-Platform)
happiness_model_data <-data_3_numeric %>% 
  select(-Dominant_Emotion_Anxiety,-Dominant_Emotion_Boredom,-Dominant_Emotion_Neutral,-Dominant_Emotion_Sadness,-Dominant_Emotion_Anger)

set.seed(314159)

h_data_split <- createDataPartition(happiness_model_data$Dominant_Emotion_Happiness, p = 0.75, list = FALSE)

h_train_data <- happiness_model_data[h_data_split, ]

h_test_data <- happiness_model_data[-h_data_split, ]


h_tree_model <- rpart(Dominant_Emotion_Happiness ~ .,
                    data = h_train_data,
                    method = "class")

rpart.plot(
  h_tree_model,
  type = 2, 
  extra = 106, 
  box.palette = "RdBu",
  shadow.col = "gray",
  nn = TRUE
)

Happiness: The path with the best chance of a user feeling happy is spending more than 135 minutes on social media and receiving at least 71 likes; this leaf holds 14% of the training users. Users who spend less than 135 minutes and receive fewer than 20 comments are likely to report an emotion other than happiness; that leaf holds 67% of the training users.
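As an aid for reading these plots, a sketch that recovers the two numbers rpart.plot prints in each leaf with extra = 106 directly from a fitted rpart object. It uses the kyphosis data that ships with rpart as a stand-in for one binary emotion model, since the Kaggle files are not bundled here. The column layout assumed for frame$yval2 (predicted class, per-class counts, per-class probabilities, node probability) reflects how rpart stores classification trees.

```r
library(rpart)

# kyphosis ships with rpart; here it stands in for one binary emotion model
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

fr <- fit$frame
leaves <- fr[fr$var == "<leaf>", ]
n_classes <- length(attr(fit, "ylevels"))

# Percentage printed in each box: share of training rows that land in the leaf
leaf_share <- 100 * leaves$n / fr$n[1]

# Decimal printed in each box with extra = 106: fitted probability of the second class
probs <- leaves$yval2[, 1 + n_classes + seq_len(n_classes), drop = FALSE]
prob_second <- probs[, 2]

print(data.frame(leaf = rownames(leaves), share_pct = round(leaf_share),
                 p_second = round(prob_second, 2)))
```

In the emotion trees the second class is 1 (the user reports that emotion), so the decimal is the estimated probability of the emotion, while the percentage only says how many users reach that leaf.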

neutral_model_data <-data_3_numeric %>% 
  select(-Dominant_Emotion_Anxiety,-Dominant_Emotion_Boredom,-Dominant_Emotion_Happiness,-Dominant_Emotion_Sadness,-Dominant_Emotion_Anger)

set.seed(314159)

n_data_split <- createDataPartition(neutral_model_data$Dominant_Emotion_Neutral, p = 0.75, list = FALSE)

n_train_data <- neutral_model_data[n_data_split, ]

n_test_data <- neutral_model_data[-n_data_split, ]


n_tree_model <- rpart(Dominant_Emotion_Neutral ~ .,
                    data = n_train_data,
                    method = "class")

rpart.plot(
  n_tree_model,
  type = 2, 
  extra = 106, 
  box.palette = "RdBu",
  shadow.col = "gray",
  nn = TRUE
)

Neutral: The number of comments received, split at 18, is the main dividing factor in whether a user feels neutral. Receiving fewer than 18 comments while spending less than 43 minutes on social media gives the best chance of feeling neutral (a leaf holding 7% of the training users). Receiving at least 18 comments and sending fewer than 31 messages gives the best chance of an emotion other than neutral (30% of the training users).

anxiety_model_data <-data_3_numeric %>% 
  select(-Dominant_Emotion_Happiness,-Dominant_Emotion_Boredom,-Dominant_Emotion_Neutral,-Dominant_Emotion_Sadness,-Dominant_Emotion_Anger)

set.seed(314159)

x_data_split <- createDataPartition(anxiety_model_data$Dominant_Emotion_Anxiety, p = 0.75, list = FALSE)

x_train_data <- anxiety_model_data[x_data_split, ]

x_test_data <- anxiety_model_data[-x_data_split, ]


x_tree_model <- rpart(Dominant_Emotion_Anxiety ~ .,
                    data = x_train_data,
                    method = "class")

rpart.plot(
  x_tree_model,
  type = 2, 
  extra = 106, 
  box.palette = "RdBu",
  shadow.col = "gray",
  nn = TRUE
)

Anxiety: Facebook may play a significant role in anxiety. Users who are not on Facebook and receive fewer than 20 likes are predicted not to be anxious (a leaf holding 28% of the training users). The best chance of a user feeling anxious comes from being on Facebook, receiving at least 11 comments, and making fewer than 3 posts (5% of the training users).

boredom_model_data <-data_3_numeric %>% 
  select(-Dominant_Emotion_Anxiety,-Dominant_Emotion_Neutral,-Dominant_Emotion_Happiness,-Dominant_Emotion_Sadness,-Dominant_Emotion_Anger)

set.seed(314159)

b_data_split <- createDataPartition(boredom_model_data$Dominant_Emotion_Boredom, p = 0.75, list = FALSE)

b_train_data <- boredom_model_data[b_data_split, ]

b_test_data <- boredom_model_data[-b_data_split, ]


b_tree_model <- rpart(Dominant_Emotion_Boredom ~ .,
                    data = b_train_data,
                    method = "class")

rpart.plot(
  b_tree_model,
  type = 2, 
  extra = 106, 
  box.palette = "RdBu",
  shadow.col = "gray",
  nn = TRUE
)

Boredom: Users who receive at least 23 likes and send at least 19 messages are the most likely not to experience boredom (a leaf holding 57% of the training users). Users who receive fewer than 18 likes and spend more than 58 minutes on a platform other than Facebook have the highest chance of being bored (7% of the training users).

sadness_model_data <-data_3_numeric %>% 
  select(-Dominant_Emotion_Anxiety,-Dominant_Emotion_Boredom,-Dominant_Emotion_Happiness,-Dominant_Emotion_Neutral,-Dominant_Emotion_Anger)

set.seed(314159)

s_data_split <- createDataPartition(sadness_model_data$Dominant_Emotion_Sadness, p = 0.75, list = FALSE)

s_train_data <- sadness_model_data[s_data_split, ]

s_test_data <- sadness_model_data[-s_data_split, ]


s_tree_model <- rpart(Dominant_Emotion_Sadness ~ .,
                    data = s_train_data,
                    method = "class")

rpart.plot(
  s_tree_model,
  type = 2, 
  extra = 106, 
  box.palette = "RdBu",
  shadow.col = "gray",
  nn = TRUE
)

Sadness: No path in this tree stands out as leading to sadness (the leaves involved hold under 5% of the training users). However, users who receive at least 19 likes, are not on Snapchat, and receive fewer than 10 comments have the best chance of an emotion other than sadness (24% of the training users).

anger_model_data <-data_3_numeric %>% 
  select(-Dominant_Emotion_Happiness,-Dominant_Emotion_Boredom,-Dominant_Emotion_Neutral,-Dominant_Emotion_Sadness,-Dominant_Emotion_Anxiety)

set.seed(314159)

g_data_split <- createDataPartition(anger_model_data$Dominant_Emotion_Anger, p = 0.75, list = FALSE)

g_train_data <- anger_model_data[g_data_split, ]

g_test_data <- anger_model_data[-g_data_split, ]


g_tree_model <- rpart(Dominant_Emotion_Anger ~ .,
                    data = g_train_data,
                    method = "class")

rpart.plot(
  g_tree_model,
  type = 2, 
  extra = 106, 
  box.palette = "RdBu",
  shadow.col = "gray",
  nn = TRUE
)

Anger: Unlike the other emotions, whether a user is predicted to be angry depends mostly on the platform used. Users who are not on Twitter, WhatsApp, or Telegram are predicted not to be angry (a leaf holding 65% of the training users). The best chance of anger comes from a specific path: being on Twitter, receiving at least 17 comments, and sending between 19 and 24 messages (7% of the training users).
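The six one-vs-rest blocks above repeat the same steps, and the test sets they create are never scored. A compact sketch of the same idea, under stated assumptions: it runs on a synthetic stand-in for data_2 (hypothetical columns and values, with one planted happiness rule), uses a single base R sample() split instead of caret's createDataPartition(), keeps Platform as a factor because rpart can split on categorical predictors without dummy columns, and reports each tree's held-out accuracy next to the accuracy of always answering "not this emotion". The helper fit_one_vs_rest() is hypothetical, not part of any package.

```r
library(rpart)

set.seed(314159)
# Synthetic stand-in for data_2: hypothetical values, not the Kaggle data
n <- 600
emotions <- c("Anger", "Anxiety", "Boredom", "Happiness", "Neutral", "Sadness")
sim <- data.frame(
  Platform = factor(sample(c("Facebook", "Instagram", "Twitter"), n, replace = TRUE)),
  Daily_Usage_Time = round(runif(n, 40, 200)),
  Likes_Received_Per_Day = round(runif(n, 5, 110)),
  Dominant_Emotion = sample(emotions, n, replace = TRUE),
  stringsAsFactors = FALSE
)
# Plant one rule so the happiness tree has a pattern to find
happy_rule <- sim$Daily_Usage_Time > 135 & sim$Likes_Received_Per_Day >= 71
sim$Dominant_Emotion[happy_rule] <- "Happiness"

# One shared 75/25 train/test split for all six models
train_idx <- sample(seq_len(n), size = floor(0.75 * n))

# Hypothetical helper: fit "this emotion vs. everything else" and score held-out rows
fit_one_vs_rest <- function(df, emotion, train_idx) {
  df$target <- factor(df$Dominant_Emotion == emotion, levels = c(FALSE, TRUE))
  df$Dominant_Emotion <- NULL
  model <- rpart(target ~ ., data = df[train_idx, ], method = "class")
  pred <- predict(model, df[-train_idx, ], type = "class")
  list(
    model = model,
    accuracy = mean(pred == df$target[-train_idx]),
    # Accuracy of always predicting "not this emotion"
    baseline = mean(df$target[-train_idx] == "FALSE")
  )
}

results <- lapply(emotions, function(e) fit_one_vs_rest(sim, e, train_idx))
names(results) <- emotions
scores <- t(sapply(results, function(r) c(accuracy = r$accuracy, baseline = r$baseline)))
print(round(scores, 3))
```

Because any single emotion covers a minority of users, the baseline is already high, so a tree is informative only if its held-out accuracy clearly beats that baseline.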

Question 2: Based on the decision tree model, what factor influences that emotional state the most?

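To find which aspect of social media use is most strongly tied to a user’s dominant emotion, we fit a single decision tree that classifies all six emotions at once from daily usage time, platform, posts, messages, comments, and likes, and graphed its variable importance scores. These scores show which predictors the tree relied on most when classifying users’ emotions. rpart’s scores also credit variables that serve as surrogate splits, so a predictor can score well even if it appears in few of the splits drawn in the tree.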


model_data <-data_2 %>% 
  select(Daily_Usage_Time..minutes., Platform, Dominant_Emotion, Posts_Per_Day, Messages_Sent_Per_Day, Comments_Received_Per_Day, Likes_Received_Per_Day )

set.seed(314159)

data_split <- createDataPartition(model_data$Dominant_Emotion, p = 0.75, list = FALSE)

train_data <- model_data[data_split, ]

test_data <- model_data[-data_split, ]


tree_model <- rpart(Dominant_Emotion ~ ., 
                    data = train_data,
                    method = "class")

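rpart.plot(
  tree_model,
  type = 2,
  extra = 104,           # per-class probabilities plus % of observations in each node
  box.palette = "auto",  # extra = 106 and "RdBu" are meant for two-class trees
  shadow.col = "gray",
  nn = TRUE
)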

#Factor Importance

imp <- tree_model$variable.importance
imp_df <- data.frame(
  predictor = names(imp),
  importance = as.numeric(imp)
) %>%
  arrange(desc(importance))
# Palette runs from darkest to lightest; the darkest goes to the most important predictor
Importance_Palette <- c("#29374d", "#314a70", "#305ea6", "#4fb3ff", "#7cc5fc", "#6ff7f7")

imp_df
ggplot(imp_df, aes(x = reorder(predictor, importance), y = importance,
                   fill = reorder(predictor, importance))) +
  geom_col() +
  # reorder() puts the least important level first, so the palette is reversed
  scale_fill_manual(values = rev(Importance_Palette)) +
  scale_x_discrete(labels = c("Platform" = "Platform", "Posts_Per_Day" = "Posts per Day",
                              "Comments_Received_Per_Day" = "Comments Received per Day",
                              "Messages_Sent_Per_Day" = "Messages Sent per Day",
                              "Daily_Usage_Time..minutes." = "Daily Usage",
                              "Likes_Received_Per_Day" = "Likes Received")) +
  labs(title = "Top Predictors of Dominant Emotion", x = "Predictor", y = "Importance Score") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1), legend.position = "none")

From the bar graph, “Likes Received per Day” stands out as the strongest predictor. In this tree, the number of likes a user receives carries more weight in classifying their emotional state than any other measured variable. In other words, the amount of positive feedback people receive appears to matter more than how long they use the app or which platform they’re on. Likes can act as a form of social validation, so higher or lower engagement could influence how someone ends up feeling.

Close behind likes were “Daily Usage Time”, “Messages Sent per Day”, and “Comments Received per Day”. Apart from usage time, these are interaction-related measures, suggesting that emotions are tied to how long users spend on a platform, how much they engage with others, and how much others engage back with them. By contrast, “Platform” and “Posts per Day” were the least important predictors: the specific app someone uses and how often they post mattered less to this tree than the amount of feedback and interaction they received.

Overall, the model shows a clear pattern: emotions are tied more to engagement than to simple activity. In this tree, it is less the app you use than the feedback and attention you get while using it. This aligns with the idea that social interaction and perceived social approval play a major role in how we feel after being online.
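rpart’s importance scores are computed on the training data. As a rough cross-check, one could compare them with permutation importance on held-out data: the drop in accuracy when a single predictor is shuffled. This is a swapped-in technique, not part of the original analysis, and the sketch uses rpart’s bundled kyphosis data as a stand-in for model_data.

```r
library(rpart)

set.seed(314159)
# kyphosis ships with rpart and stands in for model_data in this sketch
idx <- sample(nrow(kyphosis), size = 60)
train <- kyphosis[idx, ]
test <- kyphosis[-idx, ]
fit <- rpart(Kyphosis ~ Age + Number + Start, data = train, method = "class")

# rpart's own importance scores, rescaled to sum to roughly 100
builtin <- round(100 * fit$variable.importance / sum(fit$variable.importance))

# Permutation importance: mean drop in held-out accuracy when one predictor is shuffled
accuracy <- function(d) mean(predict(fit, d, type = "class") == d$Kyphosis)
base_acc <- accuracy(test)
predictors <- c("Age", "Number", "Start")
perm_drop <- sapply(predictors, function(v) {
  drops <- replicate(50, {
    shuffled <- test
    shuffled[[v]] <- sample(shuffled[[v]])
    base_acc - accuracy(shuffled)
  })
  mean(drops)
})

print(builtin)
print(round(perm_drop, 3))
```

A predictor whose shuffling barely changes held-out accuracy contributes little to the tree’s predictions on new users, whatever its built-in score.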

Question 3: What is the dominant emotional state during the day for each social media app?

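graph3 <- data_2 %>% 
  select(Platform, Dominant_Emotion)

# Named colours keep each emotion's colour fixed regardless of level order
emotion_colors <- c(Anger = "red", Anxiety = "purple", Boredom = "grey",
                    Happiness = "yellow", Neutral = "lightblue", Sadness = "darkblue")

ggplot(graph3, aes(x = Platform, fill = Dominant_Emotion)) +
  # preserve = "single" keeps bar widths equal when a platform lacks some emotions
  geom_bar(position = position_dodge(preserve = "single")) +
  labs(x = "Platform", y = "Count of Felt Dominant Emotion",
       title = "Dominant Emotion Felt on Each Social Media Platform") +
  scale_fill_manual(name = "Dominant Emotion", values = emotion_colors)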

The dominant emotional state differed greatly across platforms. Instagram users predominantly reported happiness, far more than any other emotion, and none reported boredom. LinkedIn users were predominantly bored, and none were happy. Twitter users were predominantly angry, with sadness close behind. Facebook, Snapchat, Telegram, and WhatsApp showed more variability in which emotion dominated. Facebook had no happy or angry users, and neutral and anxiety were the most common emotions. Snapchat had no angry users, and sadness was the most common emotion. Sadness was also the top emotion on Telegram, which had no happy users. On WhatsApp, anger was the most common emotion, with no reports of sadness or boredom.

These differences suggest that how content is consumed, and how each platform is structured, may influence which emotion dominates. The results could be used to raise awareness of which platforms may be generating negative emotions, and so may be worth limiting, and which could potentially be used to promote positive emotions.
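Raw counts in a dodged bar chart also reflect how many users each platform has. A base R sketch, on a few hypothetical rows standing in for graph3, that converts the counts to within-platform shares with table() and prop.table() and picks the most common emotion on each platform. Ties go to the emotion that comes first alphabetically, because which.max() returns the first maximum.

```r
# Hypothetical rows standing in for graph3 (Platform, Dominant_Emotion)
toy <- data.frame(
  Platform = c("Instagram", "Instagram", "Instagram", "LinkedIn",
               "LinkedIn", "Twitter", "Twitter", "Twitter"),
  Dominant_Emotion = c("Happiness", "Happiness", "Neutral", "Boredom",
                       "Boredom", "Anger", "Anger", "Sadness"),
  stringsAsFactors = FALSE
)

# Rows are platforms, columns are emotions
counts <- table(toy$Platform, toy$Dominant_Emotion)

# Each row divided by its total, so every platform's shares sum to 1
shares <- prop.table(counts, margin = 1)

# Most common emotion per platform
top_emotion <- colnames(counts)[apply(counts, 1, which.max)]
names(top_emotion) <- rownames(counts)

print(round(shares, 2))
print(top_emotion)
```

On the real data, the shares make platforms with many users and platforms with few directly comparable.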

Question 4: Is there a difference in which dominant emotional state a user feels based on their interaction with other users and the engagement they receive?

graph4 <- data_2 %>% 
  # Interaction = what a user sends out; engagement = what a user receives
  mutate(interaction = Posts_Per_Day + Messages_Sent_Per_Day,
         engagement = Likes_Received_Per_Day + Comments_Received_Per_Day) %>% 
  select(interaction, engagement, Dominant_Emotion)

ggplot(graph4, aes(x = engagement, y = interaction, fill = Dominant_Emotion)) +
  geom_point(shape = 21, size = 3, color = "black", stroke = 0.5) +
  xlim(0, 160) +
  ylim(0, 60) +
  labs(x = "Engagement (Likes + Comments Received / Day)",
       y = "Interaction (Posts + Messages Sent / Day)",
       title = "Dominant Emotion Felt Based on Usage") +
  scale_fill_manual(name = "Dominant Emotion",
                    values = c(Anger = "red", Anxiety = "purple", Boredom = "grey",
                               Happiness = "yellow", Neutral = "lightblue", Sadness = "darkblue"))

Engagement and interaction rise together, and users whose dominant emotion is happiness cluster at the high end of both: the more likes and comments users receive, and the more they post and message, the more often they report happiness. Two points at the maximum of both engagement and interaction are labeled anxiety. At the other end, users with low interaction and engagement are more likely to report boredom. This suggests that social media could relieve boredom by giving users a way to interact with others, although a survey like this cannot show that more usage would make a bored user feel better.

Anger, anxiety, sadness, and neutral points are more spread out, but most do not overlap with the happiness points. This may indicate that the people and content users interact with contribute to the emotions they feel, and that emotion is not a one-to-one reflection of the quantity of interaction and engagement.
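The pattern described above can also be summarised numerically. A base R sketch on hypothetical rows standing in for graph4: it builds the same engagement and interaction sums, takes the median of each per emotion with aggregate(), and measures how strongly the two sums move together with a Spearman rank correlation. The correlation is a swapped-in summary, not one the original analysis reports.

```r
# Hypothetical rows standing in for data_2's usage columns
toy <- data.frame(
  Posts_Per_Day = c(1, 2, 5, 6, 1, 3),
  Messages_Sent_Per_Day = c(10, 12, 40, 45, 8, 22),
  Likes_Received_Per_Day = c(8, 15, 90, 100, 5, 30),
  Comments_Received_Per_Day = c(3, 5, 30, 35, 2, 12),
  Dominant_Emotion = c("Boredom", "Boredom", "Happiness", "Happiness", "Neutral", "Anxiety"),
  stringsAsFactors = FALSE
)

# Same definitions as the scatter plot: what users send out vs. what they receive
toy$interaction <- toy$Posts_Per_Day + toy$Messages_Sent_Per_Day
toy$engagement <- toy$Likes_Received_Per_Day + toy$Comments_Received_Per_Day

# Median engagement and interaction for each dominant emotion
medians <- aggregate(cbind(engagement, interaction) ~ Dominant_Emotion, data = toy, FUN = median)
print(medians)

# Rank correlation between the two sums (1 means they rise together perfectly)
rho <- cor(toy$engagement, toy$interaction, method = "spearman")
print(rho)
```

On the real graph4 data, the same two calls would give per-emotion medians to set against the clusters seen in the scatter plot.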

Conclusions

Based on our analysis, users were most likely to report happiness when they spent more than 135 minutes on social media and received at least 71 likes per day. The scatter plot shows the same trend of higher engagement and interaction accompanying happiness, and users with less engagement and interaction were more likely to report an emotion other than happiness. Higher engagement and interaction were also associated with a lower likelihood of boredom. Likes received per day, daily usage time, messages sent per day, and comments received per day were the top predictors of the dominant emotion. However, emotions other than happiness and boredom appeared less tied to usage and more tied to the platform used. For example, Twitter users were predominantly angry or sad, while Facebook users were mostly neutral or anxious.

Overall, our analysis provides an overview of how usage and platform type relate to a user’s dominant emotional state during the day. However, the data come from a hypothetical survey, and 77 rows were removed because of formatting issues. Including those rows might have increased the variability of emotional states across platforms and enabled more accurate predictive models. For future analysis, it may be worthwhile to study the content and interactions on each platform directly rather than inferring them from the reported emotion. This would help determine the direction of causality between media and emotion.

---
title: "R Notebook"
output: html_notebook
---

# Introduction

An advocacy group hired us to investigate how different social media platforms influence users' emotions while on the app. Incorporating other factors, such as time spent on the app, interaction with other users, and post engagements, will be a focus in determining overall usage. Other possible factors to consider include gender and age. The overall goal is to understand how different social media platforms and usage patterns influence users' dominant emotional state throughout the day and to predict how users' emotional well-being is influenced by their social media use.

```{r}
library(dplyr)
library(rpart)
library(rpart.plot)
library(caret)
library(fastDummies)
library(ggplot2)
data<-read.csv("/Users/vanessaveretelnikov/Desktop/DiDa_325/train.csv")
val_data<-read.csv("/Users/vanessaveretelnikov/Desktop/DiDa_325/val.csv")
test_data<-read.csv("/Users/vanessaveretelnikov/Desktop/DiDa_325/test.csv")
```

### Origin of the dataset

Emirhan Bulut created the data set through meticulous research and preparation. He has over 10+ years in the industry and is the founder of Quantum PIYA. This data was gathered through a "hypothetical survey". Our approach to analyzing the data will be to determine the factors that are found to be significant in impacting emotional state, how emotional states vary across different media, and how they are used.

link to download data: https://www.kaggle.com/datasets/emirhanai/social-media-usage-and-emotional-well-being


### Dataset

__Age__ _(text, chr)_: Age of the user [Needs to be changed to numerical]

__Gender__ _(categorical, chr)_: Gender of the user

__Platform__ _(categorical, chr)_: Social media platform used

__Daily_Usage_Time..minutes.__ _(numerical, int)_: Daily time spent on the platform in minutes

__Posts_Per_Day__ _(numerical, int)_: Number of posts made per day by user

__Likes_Received_Per_Day__ _(numerical, int)_: Number of likes received per day by user

__Comments_Received_Per_Day__ _(numerical, int)_: Number of comments received per day by user

__Messages_Sent_Per_Day__ _(numerical, int)_: Number of messages sent per day by user

__Dominant_Emotion__ _(categorical, chr)_: User's dominant emotional state during the day

  
### Cleaning the Data

We dropped 77 rows that had some of the columns flipped or missing data. We also dropped the UserID column, which gave each data point a unique number.

```{r}
data_2<-data %>% 
  filter(Age<=90) %>% 
  na.omit(data_2) %>% 
  select(-User_ID)

data_2  %>% head()
```

# Questions To Investigate

1. Based on your social media usage, what is the dominant emotional state during the day? 

        _Method: Decision tree model, predictor based on usage_

2. Based on the decision tree model, what factor influences that emotional state the most?

        _Method: Bar Graph, take the points from the decision tree model and quantify the data set visually_ 

3. What is the dominant emotional state during the day for each social media app?

        _Method: Bar Graph, quantify the emotional state for each social media app (doge)_

4. Is there a difference in which dominant emotional state a user feels based on their interaction with other users and the engagement they receive?

        _Method: Bar Graph, manipulate the data to see the dominant emotions for high vs low engagement (rescuing and interacting)_

 
 
 
 
## Analysis

### Question 1: Based on your social media usage, what is the dominant emotional state during the day? 

We can examine the results of the decision tree that we made to better understand how a user's emotions are dictated by the specifics of their social media usage. All measures mentioned are based on usage per day.


```{r}
data_3<-data_2 %>% 
  select(-Age,-Gender)
```

```{r}
data_3_numeric<-data_3 %>% 
  mutate(Dominant_Emotion_Anger=as.numeric(Dominant_Emotion=="Anger"),Dominant_Emotion_Happiness=as.numeric(Dominant_Emotion=="Happiness"),Dominant_Emotion_Neutral=as.numeric(Dominant_Emotion=="Neutral"),Dominant_Emotion_Sadness=as.numeric(Dominant_Emotion=="Sadness"),Dominant_Emotion_Boredom=as.numeric(Dominant_Emotion=="Boredom"),Dominant_Emotion_Anxiety=as.numeric(Dominant_Emotion=="Anxiety")) %>% 
  select(-Dominant_Emotion) %>% 
  mutate(Platform_Facebook=as.numeric(Platform=="Facebook"),Platform_Instagram=as.numeric(Platform=="Instagram"),Platform_LinkedIn=as.numeric(Platform=="LinkedIn"),Platform_Snapchat=as.numeric(Platform=="Snapchat"),Platform_Telegram=as.numeric(Platform=="Telegram"),Platform_Twitter=as.numeric(Platform=="Twitter"),Platform_Whatsapp=as.numeric(Platform=="Whatsapp")) %>% 
  select(-Platform)
```

```{r}
happiness_model_data <-data_3_numeric %>% 
  select(-Dominant_Emotion_Anxiety,-Dominant_Emotion_Boredom,-Dominant_Emotion_Neutral,-Dominant_Emotion_Sadness,-Dominant_Emotion_Anger)

set.seed(314159)

h_data_split <- createDataPartition(happiness_model_data$Dominant_Emotion_Happiness, p = 0.75, list = FALSE)

h_train_data <- happiness_model_data[h_data_split, ]

h_test_data <- happiness_model_data[-h_data_split, ]


h_tree_model <- rpart(Dominant_Emotion_Happiness ~ .,
                    data = h_train_data,
                    method = "class")

rpart.plot(
  h_tree_model,
  type = 2, 
  extra = 106, 
  box.palette = "RdBu",
  shadow.col = "gray",
  nn = TRUE
)
```

_Happiness:_ The best chance of a user feeling happy (14%) as a result of their time spent online is to spend more than 135 minutes on social media and receive at least 71 likes. A user is likely (67%) to experience an emotion other than happiness if they use social media for less than 135 minutes and receive fewer than 20 comments.





```{r}
neutral_model_data <-data_3_numeric %>% 
  select(-Dominant_Emotion_Anxiety,-Dominant_Emotion_Boredom,-Dominant_Emotion_Happiness,-Dominant_Emotion_Sadness,-Dominant_Emotion_Anger)

set.seed(314159)

n_data_split <- createDataPartition(neutral_model_data$Dominant_Emotion_Neutral, p = 0.75, list = FALSE)

n_train_data <- neutral_model_data[n_data_split, ]

n_test_data <- neutral_model_data[-n_data_split, ]


n_tree_model <- rpart(Dominant_Emotion_Neutral ~ .,
                    data = n_train_data,
                    method = "class")

rpart.plot(
  n_tree_model,
  type = 2, 
  extra = 106, 
  box.palette = "RdBu",
  shadow.col = "gray",
  nn = TRUE
)
```

_Neutral:_ Receiving 18 comments is a dividing factor in whether a user feels neutral as a result of their usage. Getting less than 18 comments while spending less than 43 minutes on social media provides the best chance at a user feeling neutral (7%). Having at least 18 comments received and sending fewer than 31 messages gives the best chance (30%) of a user feeling an emotion other than neutral.



```{r}
anxiety_model_data <-data_3_numeric %>% 
  select(-Dominant_Emotion_Happiness,-Dominant_Emotion_Boredom,-Dominant_Emotion_Neutral,-Dominant_Emotion_Sadness,-Dominant_Emotion_Anger)

set.seed(314159)

x_data_split <- createDataPartition(anxiety_model_data$Dominant_Emotion_Anxiety, p = 0.75, list = FALSE)

x_train_data <- anxiety_model_data[x_data_split, ]

x_test_data <- anxiety_model_data[-x_data_split, ]


x_tree_model <- rpart(Dominant_Emotion_Anxiety ~ .,
                    data = x_train_data,
                    method = "class")

rpart.plot(
  x_tree_model,
  type = 2, 
  extra = 106, 
  box.palette = "RdBu",
  shadow.col = "gray",
  nn = TRUE
)
```

_Anxiety:_ As for anxiety, Facebook may play a significant role. Users not on Facebook who receive less than 20 likes are reportedly not anxious as a result of their usage (28%). The best chance of a user feeling anxious (5%) is produced when they are on Facebook, receive at least 11 comments, and create fewer than 3 posts.



```{r}
boredom_model_data <-data_3_numeric %>% 
  select(-Dominant_Emotion_Anxiety,-Dominant_Emotion_Neutral,-Dominant_Emotion_Happiness,-Dominant_Emotion_Sadness,-Dominant_Emotion_Anger)

set.seed(314159)

b_data_split <- createDataPartition(boredom_model_data$Dominant_Emotion_Boredom, p = 0.75, list = FALSE)

b_train_data <- boredom_model_data[b_data_split, ]

b_test_data <- boredom_model_data[-b_data_split, ]


b_tree_model <- rpart(Dominant_Emotion_Boredom ~ .,
                    data = b_train_data,
                    method = "class")

rpart.plot(
  b_tree_model,
  type = 2, 
  extra = 106, 
  box.palette = "RdBu",
  shadow.col = "gray",
  nn = TRUE
)
```

_Boredom:_ Users who receive at least 23 likes and send at least 19 messages are most likely (57%) to not experience boredom. On the other hand, those who get less than 18 likes and spend more than 58 minutes on a platform other than Facebook have the highest chance (7%) of being bored.


```{r}
sadness_model_data <-data_3_numeric %>% 
  select(-Dominant_Emotion_Anxiety,-Dominant_Emotion_Boredom,-Dominant_Emotion_Happiness,-Dominant_Emotion_Neutral,-Dominant_Emotion_Anger)

set.seed(314159)

s_data_split <- createDataPartition(sadness_model_data$Dominant_Emotion_Sadness, p = 0.75, list = FALSE)

s_train_data <- sadness_model_data[s_data_split, ]

s_test_data <- sadness_model_data[-s_data_split, ]


s_tree_model <- rpart(Dominant_Emotion_Sadness ~ .,
                    data = s_train_data,
                    method = "class")

rpart.plot(
  s_tree_model,
  type = 2, 
  extra = 106, 
  box.palette = "RdBu",
  shadow.col = "gray",
  nn = TRUE
)
```

_Sadness:_ As for sadness, there was not any significant chain that lends itself to a user being sad (<5%), however, those who receive at least 19 likes, are not on Snapchat, and receive fewer than 10 comments have the best chance (24%) of experiencing an emotion other than sadness.




```{r}
anger_model_data <-data_3_numeric %>% 
  select(-Dominant_Emotion_Happiness,-Dominant_Emotion_Boredom,-Dominant_Emotion_Neutral,-Dominant_Emotion_Sadness,-Dominant_Emotion_Anxiety)

set.seed(314159)

g_data_split <- createDataPartition(anger_model_data$Dominant_Emotion_Anger, p = 0.75, list = FALSE)

g_train_data <- anger_model_data[g_data_split, ]

g_test_data <- anger_model_data[-g_data_split, ]


g_tree_model <- rpart(Dominant_Emotion_Anger ~ .,
                    data = g_train_data,
                    method = "class")

rpart.plot(
  g_tree_model,
  type = 2, 
  extra = 106, 
  box.palette = "RdBu",
  shadow.col = "gray",
  nn = TRUE
)
```

_Anger:_ Unlike the other emotions, the factors that lead to a user not being angry due to their media usage are most dependent on the platforms being used. From this model, users who are not on Twitter, WhatsApp, or Telegram are likely (65%) not to be angry. Those who do report anger have the best chance (7%) of following a specific path. This path is as follows: being on Twitter, receiving at least 17 comments, and sending between 19 and 24 messages. 





### Question 2: Based on the decision tree model, what factor influences that emotional state the most?

To figure out which part of social media use had the biggest impact on a user’s dominant emotion, we looked at the variable importance scores from a larger decision tree and graphed them. These scores tell us which predictors the model relied on the most when deciding how to classify people’s emotions.
  
```{r}

model_data <-data_2 %>% 
  select(Daily_Usage_Time..minutes., Platform, Dominant_Emotion, Posts_Per_Day, Messages_Sent_Per_Day, Comments_Received_Per_Day, Likes_Received_Per_Day )

set.seed(314159)

data_split <- createDataPartition(model_data$Dominant_Emotion, p = 0.75, list = FALSE)

train_data <- model_data[data_split, ]

test_data <- model_data[-data_split, ]


tree_model <- rpart(Dominant_Emotion ~ ., 
                    data = train_data,
                    method = "class")

rpart.plot(
  tree_model,
  type = 2,
  extra = 106,
  box.palette = "RdBu",
  shadow.col = "gray",
  nn = TRUE
)
```
  
```{r}
#Factor Importance

imp <- tree_model$variable.importance
imp_df <- data.frame(
predictor = names(imp),
importance = as.numeric(imp)
) %>%
arrange(desc(importance))
Importance_Palette<-c("#29374d", "#314a70", "#305ea6", "#4fb3ff", "#7cc5fc", "#6ff7f7")

imp_df
ggplot(imp_df, aes(x = reorder(predictor, importance), y = importance)) +
geom_bar(stat = "identity", fill=Importance_Palette) +
  scale_x_discrete(labels=c("Platform"="Platform","Posts_Per_Day"="Posts per Day","Comments_Received_Per_Day"="Comments Received per Day","Messages_Sent_Per_Day"="Messages Sent per Day","Daily_Usage_Time..minutes."="Daily Usage","Likes_Received_Per_Day"="Likes Received"))+
labs(title = "Top Predictors of Dominant Emotion",x = "Predictor",y = "Importance Score") +
scale_fill_manual(values=Importance_Palette) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),legend.position="none")
```

From the bar graph, “Likes Received per Day” stands out as the strongest predictor. This means that how many likes a user gets plays a bigger role in shaping their emotional state than anything else measured. In other words, the amount of positive feedback people receive seems to matter more than simply how long they use the app or which platform they’re on. Likes can act as a form of social validation, so higher or lower engagement could influence how someone ends up feeling.

Close behind likes were “Daily Usage Time,” “Messages Sent per Day,” and “Comments Received per Day.” Apart from usage time, these are interaction-related measures, suggesting that emotions are linked to how long users spend on a platform, how much they engage with others, and how much others engage back with them. On the other hand, “Platform” and “Posts per Day” were the least important predictors. That means the specific app someone uses and how often they post mattered far less to the model than the amount of feedback and interaction they received.

Overall, the model shows a clear pattern: emotions are associated more with engagement than with simple activity. It is not so much the app you use as the feedback and attention you get while using it. This aligns with the idea that social interaction and perceived social approval play a major role in how we feel after being online.
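Raw importance scores from `rpart` have no natural scale, so the gaps between predictors are easier to judge as shares of the total. According to the rpart documentation, a variable's importance is the sum of the goodness-of-split measures for the splits where it is the primary variable, plus goodness times adjusted agreement for the splits where it serves as a surrogate. The sketch below rescales `variable.importance` to percentages, similar to the "Variable importance" section that `summary()` prints for an `rpart` model (which also rounds and drops very small values). It uses the built-in `iris` data as a stand-in; on the survey data, the same arithmetic would apply to `tree_model$variable.importance`.

```{r}
library(rpart)

# Stand-in tree on the built-in iris data
iris_tree <- rpart(Species ~ ., data = iris, method = "class")

# Raw importance: primary-split improvements plus agreement-weighted surrogate splits
raw_imp <- iris_tree$variable.importance

# Rescale to percentages of the total, most important first
pct_imp <- sort(100 * raw_imp / sum(raw_imp), decreasing = TRUE)
round(pct_imp, 1)
```

Because surrogate splits contribute, a variable can score well here even if it never appears as a split in the plotted tree.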


### Question 3: What is the dominant emotional state during the day for each social media app?
  
```{r}
graph3 <- data_2 %>% 
  select(Platform, Dominant_Emotion)

# Colours follow the alphabetical order of the emotion labels
# (Anger, Anxiety, Boredom, Happiness, Neutral, Sadness)
ggplot(graph3, aes(x = Platform, fill = Dominant_Emotion)) +
  geom_bar(position = "dodge") +
  labs(x = "Platform", y = "Number of Users", title = "Dominant Emotion Felt on Each Social Media Platform") +
  scale_fill_manual(name = "Dominant Emotion", values = c("red", "purple", "grey", "yellow", "lightblue", "darkblue"))
```

The dominant emotional state differed greatly across platforms. In this data set, Instagram users predominantly reported happiness, far more than any other emotion, and none reported boredom. LinkedIn users were predominantly bored, and none reported happiness. Twitter users were predominantly angry, although sadness was also common. Facebook, Snapchat, Telegram, and WhatsApp showed more variability in the dominant emotion. On Facebook, neutral and anxious were the most common states, and neither happiness nor anger appeared. On Snapchat, sadness was the dominant emotion and no anger was reported. Sadness was also the top emotion on Telegram, where no happiness was reported. On WhatsApp, anger was the dominant emotion, with no reports of sadness or boredom.

This suggests that how content is consumed, and how each platform is structured, may influence which emotion users feel most. These results could be used to raise awareness of platforms on which users might limit their time because they may be associated with negative emotions, and of platforms that could potentially be used to promote positive emotions.
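The bar chart answers Question 3 visually; the same answer can be computed as a table by counting emotions within each platform and taking the most frequent one. The sketch below does this in base R on a few invented rows (`toy_q3`, not survey data) so the logic is easy to check; on the real data, `data_2$Platform` and `data_2$Dominant_Emotion` would take their place. Note that `which.max()` returns the first maximum, so a tie resolves to the alphabetically first emotion.

```{r}
# Hypothetical rows for illustration only; these are not values from the survey
toy_q3 <- data.frame(
  Platform = c("Instagram", "Instagram", "Instagram", "LinkedIn", "LinkedIn",
               "Twitter", "Twitter", "Twitter"),
  Dominant_Emotion = c("Happiness", "Happiness", "Neutral", "Boredom", "Boredom",
                       "Anger", "Anger", "Sadness")
)

# Platform-by-emotion count table (rows and columns sorted alphabetically)
q3_counts <- table(toy_q3$Platform, toy_q3$Dominant_Emotion)

# Most frequent emotion in each row; which.max() picks the first column on ties
modal_emotion <- colnames(q3_counts)[apply(q3_counts, 1, which.max)]
names(modal_emotion) <- rownames(q3_counts)

# Share of each emotion within a platform (each row sums to 1)
round(prop.table(q3_counts, margin = 1), 2)
modal_emotion
```

The row proportions make platforms with different numbers of users comparable, which raw bar heights do not.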


### Question 4: Is there a difference in which dominant emotional state a user feels based on their interaction with other users and the engagement they receive?

```{r}
graph4 <- data_2 %>% 
  select(Posts_Per_Day, Likes_Received_Per_Day, Comments_Received_Per_Day, Messages_Sent_Per_Day,  Dominant_Emotion) 
```

```{r}
graph4 <- graph4 %>% 
  mutate(
    interaction = Posts_Per_Day + Messages_Sent_Per_Day,              # activity the user initiates
    engagement  = Likes_Received_Per_Day + Comments_Received_Per_Day  # feedback the user receives
  )
```

```{r}
graph4 <- graph4 %>% 
  select(interaction, engagement, Dominant_Emotion)

# Each point is one user, filled by their dominant emotion;
# points outside the axis limits are dropped with a warning
ggplot(graph4, aes(x = engagement, y = interaction, fill = Dominant_Emotion)) +
  geom_point(shape = 21, size = 3, color = "black", stroke = 0.5) +
  xlim(0, 160) +
  ylim(0, 60) +
  labs(x = "Engagement (Likes + Comments Received / Day)", y = "Interaction (Posts + Messages Sent / Day)", title = "Dominant Emotion Felt Based on Usage") +
  scale_fill_manual(name = "Dominant Emotion", values = c("red", "purple", "grey", "yellow", "lightblue", "darkblue"))
```

The graph shows that engagement and interaction rise together, and that users with high values on both tend to report happiness as their dominant emotion. The more engagement users receive through likes and comments, and the more they interact by posting and messaging, the more likely they are to report feeling happy. Two data points at the maximum end of engagement and interaction are an exception: those users reported anxiety. In contrast, users with low interaction and engagement were more likely to report boredom. If you are bored, spending time on social media might help, since it provides a way to interact with others; however, this plot alone cannot show whether interacting more reduces boredom or whether less-bored users simply interact more.

Anger, anxiety, sadness, and neutral points are more widely spread; however, most of them do not overlap with the happiness points. This may indicate that the people and media users interact with contribute to the emotions they feel, so emotion is not simply a one-to-one reflection of how much they interact and receive engagement.
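Because the dominant emotion is categorical, the pattern in the scatter plot is better summarized by the correlation between the two numeric composites and by each emotion's center and spread, rather than by a correlation with emotion itself. A base-R sketch on invented rows (`toy_q4`, not survey data) follows; it builds `interaction` and `engagement` exactly as in the `graph4` chunk, then uses the median for the center and the interquartile range (IQR) for the spread. On the real data, `graph4` would replace `toy_q4`.

```{r}
# Hypothetical rows for illustration only; these are not values from the survey
toy_q4 <- data.frame(
  Posts_Per_Day             = c(1, 2, 2, 3, 6, 5, 7, 4),
  Messages_Sent_Per_Day     = c(5, 8, 10, 20, 35, 30, 40, 45),
  Likes_Received_Per_Day    = c(10, 15, 20, 60, 90, 80, 100, 30),
  Comments_Received_Per_Day = c(2, 4, 5, 10, 25, 20, 30, 8),
  Dominant_Emotion = c("Boredom", "Boredom", "Neutral", "Anxiety",
                       "Happiness", "Happiness", "Happiness", "Anxiety")
)

# Same composite scores as the graph4 chunk
toy_q4$interaction <- toy_q4$Posts_Per_Day + toy_q4$Messages_Sent_Per_Day
toy_q4$engagement  <- toy_q4$Likes_Received_Per_Day + toy_q4$Comments_Received_Per_Day

# Pearson correlation between the two numeric composites
r_eng_int <- cor(toy_q4$engagement, toy_q4$interaction)

# Center (median) and spread (IQR) of engagement within each dominant emotion
eng_center <- tapply(toy_q4$engagement, toy_q4$Dominant_Emotion, median)
eng_spread <- tapply(toy_q4$engagement, toy_q4$Dominant_Emotion, IQR)

r_eng_int
rbind(median = eng_center, IQR = eng_spread)
```

The same `tapply()` calls with `toy_q4$interaction` give the vertical-axis summaries; a group with a large IQR corresponds to the wide spread of points described above.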



# Conclusions

Based on our analysis, users were more likely to report happiness when they spent more than 135 minutes per day on social media and received at least 71 likes per day. This association between higher engagement and interaction and happiness is also visible in the scatter plot, which further shows that users with less engagement and interaction were likely to report an emotion other than happiness, and that users with more engagement and interaction were less likely to report boredom. Likes received per day, daily usage time, messages sent per day, and comments received per day were the top predictors of which emotion was dominantly felt. However, emotions other than happiness and boredom appeared to vary less with social media usage and more with the platform used. For example, Twitter users were predominantly angry or sad, while Facebook users were mostly neutral or anxious.
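Cut-offs such as 135 minutes and 71 likes appear to correspond to split conditions along a root-to-leaf path in the classification tree. Rather than tracing such a path by eye on the plot, `rpart`'s `path.rpart()` can list the conditions leading to each leaf. The sketch below shows this on the built-in `iris` data as a stand-in; on the survey model, `tree_model` would replace `iris_rules_tree`. `rpart.plot::rpart.rules()` is another option that prints one rule per leaf.

```{r}
library(rpart)

# Stand-in tree on the built-in iris data
iris_rules_tree <- rpart(Species ~ ., data = iris, method = "class")

# Leaf (terminal) nodes are marked "<leaf>"; node numbers are the frame's row names
rules_frame <- iris_rules_tree$frame
is_leaf <- rules_frame$var == "<leaf>"
leaf_nodes <- as.integer(rownames(rules_frame))[is_leaf]

# For each leaf, the split conditions from the root down to that leaf
leaf_paths <- path.rpart(iris_rules_tree, nodes = leaf_nodes, print.it = FALSE)

# Predicted class of each leaf (yval is an index into the response levels)
leaf_class <- attr(iris_rules_tree, "ylevels")[rules_frame$yval[is_leaf]]

# Print each rule as "condition & condition -> class", dropping the "root" entry
for (k in seq_along(leaf_paths)) {
  cat(paste(leaf_paths[[k]][-1], collapse = " & "), "->", leaf_class[k], "\n")
}
```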

Overall, our analysis provides a broad picture of how usage and platform type relate to users' dominant emotional state throughout the day. However, 77 data points were removed because of formatting issues; including them may have increased the variability in emotional states across platforms and enabled more accurate predictive models. Additionally, future analyses could examine the media and interactions on each platform directly rather than inferring them from the reported emotion, which would help establish the direction of causality between media and emotion.