How do users’ Instagram behaviors—such as screen time, engagement actions, and content interaction—relate to their overall well‑being, lifestyle characteristics, and user engagement score?

Author

Geoffrey Kirnon & Aamiri Duckworth

Introduction

library(tidyverse)  # loads readr (read_csv), dplyr, and tidyr
library(DT)         # interactive tables via datatable()

# Read the CSV export (much faster than read_excel on a file this size)
# insta_data <- readxl::read_excel("instagram_usage_lifestyle.xlsx")
insta_data <- read_csv("instagram_usage_lifestyle.csv")

# Keep complete records for U.S. users only
us_only_insta_data <- insta_data %>%
  drop_na() %>%
  filter(country == "United States")


us_only_insta_data
# A tibble: 262,213 × 58
   user_id app_name    age gender country       urban_rural income_level
     <dbl> <chr>     <dbl> <chr>  <chr>         <chr>       <chr>       
 1      10 Instagram    35 Male   United States Urban       Low         
 2      25 Instagram    50 Female United States Urban       Middle      
 3      60 Instagram    40 Male   United States Suburban    Lower-middle
 4      71 Instagram    52 Female United States Suburban    Lower-middle
 5     106 Instagram    13 Female United States Rural       High        
 6     124 Instagram    51 Male   United States Urban       Upper-middle
 7     135 Instagram    36 Female United States Urban       Lower-middle
 8     139 Instagram    52 Female United States Urban       Middle      
 9     196 Instagram    37 Female United States Urban       High        
10     248 Instagram    18 Male   United States Suburban    Low         
# ℹ 262,203 more rows
# ℹ 51 more variables: employment_status <chr>, education_level <chr>,
#   relationship_status <chr>, has_children <chr>,
#   exercise_hours_per_week <dbl>, sleep_hours_per_night <dbl>,
#   diet_quality <chr>, smoking <chr>, alcohol_frequency <chr>,
#   perceived_stress_score <dbl>, self_reported_happiness <dbl>,
#   body_mass_index <dbl>, blood_pressure_systolic <dbl>, …

Data

The data for this project come from a Kaggle Social Media User Activity dataset. The original dataset contains 1,048,546 observations from users in multiple countries. For this analysis, the data were restricted to complete records from users in the United States, leaving 262,213 observations. Limiting the sample to one country keeps the analysis within a consistent geographic and cultural context.

In addition to filtering by country, observations with missing values were removed with drop_na(), so that the analyses use complete records. Working with complete cases keeps the sample consistent from one analysis to the next and makes the results easier to interpret.
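Before discarding incomplete rows, it can help to see where the missing values are and how many rows the step removes. The following is a minimal sketch on a made-up `toy` tibble (not the project data); the names `toy`, `missing_per_column`, `us_complete`, and `rows_removed` are illustrative assumptions.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the real data; all values are invented for illustration
toy <- tibble(
  country = c("United States", "United States", "Canada", "United States"),
  age = c(35, NA, 41, 52),
  sleep_hours_per_night = c(7, 6.5, NA, 8)
)

# Count the missing values in each column
missing_per_column <- toy %>%
  summarize(across(everything(), ~ sum(is.na(.x))))

# Mirror the report's steps: drop incomplete rows, then keep U.S. users
us_complete <- toy %>%
  drop_na() %>%
  filter(country == "United States")

# Rows lost to missingness and the country filter combined
rows_removed <- nrow(toy) - nrow(us_complete)
```

Running the same `summarize()` call on `insta_data` would show which columns drive the row loss, assuming the column names match those in the CSV.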

The dataset was further refined by selecting only the variables relevant to the goals of this study. These variables were grouped into four categories: demographics, well-being, engagement, and lifestyle.

Finally, each variable was checked to confirm it was stored in the correct format, distinguishing numerical from categorical data, and reviewed for extreme or unusual values that could influence the results. Together, these cleaning steps give a consistent dataset for the analysis and visualization that follow.
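One way such checks could look is sketched below. This is an illustration under assumptions, not the procedure used in this report: the `toy` data are invented, `flag_outliers()` is a hypothetical helper, and the 1.5 × IQR rule is one common convention for flagging unusual values.

```r
library(dplyr)

# Toy stand-in for the real data; all values are invented for illustration
toy <- tibble(
  gender = c("Male", "Female", "Female", "Male", "Female"),
  daily_active_minutes_instagram = c(120, 150, 180, 200, 900)
)

# Store a categorical variable as a factor rather than plain text
toy <- toy %>% mutate(gender = factor(gender))

# Hypothetical helper: TRUE for values outside the 1.5 * IQR fences
flag_outliers <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

toy <- toy %>%
  mutate(is_outlier = flag_outliers(daily_active_minutes_instagram))
```

Flagged rows are only candidates for review; whether to keep, cap, or drop them is a judgment call that depends on the variable.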

Table of Variables

| Category | Variable Name |
|---|---|
| Demographics | age |
| Demographics | gender |
| Demographics | user_id |
| Well-being | perceived_stress_score |
| Well-being | self_reported_happiness |
| Well-being | sleep_hours_per_night |
| Well-being | time_on_feed_per_day |
| Well-being | time_on_reels_per_day |
| Well-being | daily_active_minutes_instagram |
| Engagement | following_count |
| Engagement | user_engagement_score |
| Engagement | time_on_messages_per_day |
| Lifestyle | income_level |
| Lifestyle | education_level |
| Lifestyle | employment_status |
| Lifestyle | hobbies_count |
| Lifestyle | social_events_per_month |
| Lifestyle | travel_frequency_per_year |
| Lifestyle | time_on_explore_per_day |

clean_insta_data <- insta_data %>%
  filter(country == "United States") %>%
  # Keep only the columns used in the analyses below
  select(
    # Demographics
    user_id, age, gender,
    # Well-being variables
    perceived_stress_score, self_reported_happiness, sleep_hours_per_night,
    time_on_feed_per_day, time_on_reels_per_day, daily_active_minutes_instagram,
    # Engagement variables
    following_count, user_engagement_score, time_on_messages_per_day,
    # Lifestyle variables
    income_level, education_level, employment_status, hobbies_count,
    social_events_per_month, travel_frequency_per_year,
    time_on_explore_per_day
  )


datatable(clean_insta_data, rownames = FALSE, options = list(scrollX = TRUE))

library(knitr)

# WELL-BEING DATA FRAME
well_being_data_frame <- clean_insta_data %>%
  select(
    perceived_stress_score, self_reported_happiness, sleep_hours_per_night,
    time_on_feed_per_day, time_on_reels_per_day,
    daily_active_minutes_instagram, age, gender
  )


# ENGAGEMENT DRIVERS DATA FRAME
engagement_drivers_data_frame <- clean_insta_data %>%
  select(
    perceived_stress_score, self_reported_happiness, sleep_hours_per_night,
    time_on_feed_per_day, time_on_reels_per_day, daily_active_minutes_instagram,
    following_count, user_engagement_score, time_on_messages_per_day,
    age, gender
  )

# LIFESTYLE DATA FRAME
lifestyle_data_frame <- clean_insta_data %>%
  select(
    income_level, education_level, employment_status, hobbies_count,
    social_events_per_month, travel_frequency_per_year,
    daily_active_minutes_instagram, time_on_explore_per_day,
    time_on_reels_per_day, time_on_feed_per_day,
    age, gender
  )

# SUMMARY TABLE: mean of selected variables for U.S. users
clean_insta_kable <- clean_insta_data %>%
  summarize(
    `Stress Score` = mean(perceived_stress_score, na.rm = TRUE),
    `Self Reported Happiness` = mean(self_reported_happiness, na.rm = TRUE),
    `Sleep Hours` = mean(sleep_hours_per_night, na.rm = TRUE),
    `Daily Active Minutes` = mean(daily_active_minutes_instagram, na.rm = TRUE),
    `Following Count` = mean(following_count, na.rm = TRUE),
    `User Engagement` = mean(user_engagement_score, na.rm = TRUE)
  )

kable(clean_insta_kable, caption = "Average values of selected variables for U.S. users")
Average values of selected variables for U.S. users

| Stress Score | Self Reported Happiness | Sleep Hours | Daily Active Minutes | Following Count | User Engagement |
|---|---|---|---|---|---|
| 19.97684 | 5.500959 | 6.995965 | 188.0548 | 2601.999 | 1.643556 |
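
The same kind of summary can be written more compactly with `across()`, which applies one function to several columns at once. The sketch below uses an invented `toy` tibble rather than the project data, and the rounding to two decimals is an assumption made for readability, not something the table above does.

```r
library(dplyr)

# Toy stand-in for the real data; all values are invented for illustration
toy <- tibble(
  perceived_stress_score = c(18, 22, NA),
  sleep_hours_per_night = c(7, 6, 8)
)

# Mean of each listed column, ignoring missing values, rounded to 2 decimals
toy_means <- toy %>%
  summarize(across(
    c(perceived_stress_score, sleep_hours_per_night),
    ~ round(mean(.x, na.rm = TRUE), 2)
  ))
```

This form keeps the original column names, so display labels such as "Stress Score" would still need to be set separately, for example through the `col.names` argument of `kable()`.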

Results

Conclusions

library(leaflet)
ksu_lat <- 33.93795
ksu_lng <- -84.5203

leaflet() |>
  addTiles() |>
  addMarkers(
    lng = ksu_lng,
    lat = ksu_lat,
    popup = paste0(
      "<b>Kennesaw State University</b><br>",
      "Group DS-1<br>",
      "Emails:<br>",
      "aamiri@students.kennesaw.edu<br>",
      "teammate1@students.kennesaw.edu<br>",
      "teammate2@students.kennesaw.edu"
    )
  )

Future Studies