How do users’ Instagram behaviors—such as screen time, engagement actions, and content interaction—relate to their overall well‑being, lifestyle characteristics, and user engagement score?
Author
Geoffrey Kirnon & Aamiri Duckworth
Introduction
library(readxl)
library(DT)
library(tidyverse)
library(readr)

# Read the CSV file (much faster than read_excel)
# insta_data <- read_excel("instagram_usage_lifestyle.xlsx")
insta_data <- read_csv("instagram_usage_lifestyle.csv")

# Keep complete cases only, then restrict to US users
us_only_insta_data <- insta_data %>%
  drop_na() %>%
  filter(country == "United States")

us_only_insta_data
# A tibble: 262,213 × 58
user_id app_name age gender country urban_rural income_level
<dbl> <chr> <dbl> <chr> <chr> <chr> <chr>
1 10 Instagram 35 Male United States Urban Low
2 25 Instagram 50 Female United States Urban Middle
3 60 Instagram 40 Male United States Suburban Lower-middle
4 71 Instagram 52 Female United States Suburban Lower-middle
5 106 Instagram 13 Female United States Rural High
6 124 Instagram 51 Male United States Urban Upper-middle
7 135 Instagram 36 Female United States Urban Lower-middle
8 139 Instagram 52 Female United States Urban Middle
9 196 Instagram 37 Female United States Urban High
10 248 Instagram 18 Male United States Suburban Low
# ℹ 262,203 more rows
# ℹ 51 more variables: employment_status <chr>, education_level <chr>,
# relationship_status <chr>, has_children <chr>,
# exercise_hours_per_week <dbl>, sleep_hours_per_night <dbl>,
# diet_quality <chr>, smoking <chr>, alcohol_frequency <chr>,
# perceived_stress_score <dbl>, self_reported_happiness <dbl>,
# body_mass_index <dbl>, blood_pressure_systolic <dbl>, …
Data
The data for this project come from a Kaggle Social Media User Activity dataset. The original dataset contains 1,048,546 observations from users across multiple countries. For this analysis, the data were restricted to users in the United States, which keeps the analysis within a consistent geographic and cultural context.
Observations with missing values were also removed so that every analysis uses complete records. After both steps, 262,213 observations remain. Working with complete cases keeps the sample consistent across analyses and makes results easier to compare, although it assumes that the dropped rows do not differ systematically from the retained ones.
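Before dropping incomplete rows, it can help to see how much missing data there is and how many rows each step removes. The following is a minimal sketch on a small made-up tibble (`toy_data` and its values are hypothetical stand-ins for `insta_data`, not taken from the real file); it only illustrates the bookkeeping and is not the exact procedure used for this report.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for insta_data; values are invented for illustration
toy_data <- tibble(
  user_id = 1:6,
  country = c("United States", "United States", "Canada",
              "United States", "India", "United States"),
  sleep_hours_per_night = c(7.5, NA, 6.0, 8.0, 7.0, 6.5),
  perceived_stress_score = c(12, 20, NA, 15, 18, 9)
)

# Count missing values in each column before dropping anything
na_counts <- toy_data %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Track how many rows survive each cleaning step
n_all <- nrow(toy_data)
n_us <- toy_data %>%
  filter(country == "United States") %>%
  nrow()
n_us_complete <- toy_data %>%
  filter(country == "United States") %>%
  drop_na() %>%
  nrow()

na_counts
c(all = n_all, us = n_us, us_complete = n_us_complete)
```

Comparing `n_us` with `n_us_complete` shows how many US rows are lost to missing values, which is a quick way to judge whether complete-case analysis discards a meaningful share of the data.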
The dataset was further refined by selecting only variables relevant to the goals of this study. These variables were grouped into categories such as demographics, well-being, engagement, and lifestyle factors.
Finally, each variable was checked to confirm it was stored in the right format for analysis, with numerical and categorical variables distinguished. Extreme or unusual values that could influence the results were also identified. Together, these cleaning steps yield a consistent dataset for the analysis and visualizations that follow.
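The format and outlier checks described above could look something like the sketch below. It uses a small made-up tibble (`toy_data` is hypothetical), the ordering of the `income_level` categories is an assumption, and the 1.5 × IQR rule is one common convention for flagging extreme values rather than necessarily the criterion used in this report.

```r
library(dplyr)

# Hypothetical stand-in data; values are invented for illustration
toy_data <- tibble(
  income_level = c("Low", "Middle", "High", "Low",
                   "Middle", "High", "Low", "Middle"),
  daily_active_minutes_instagram = c(60, 75, 90, 80, 70, 85, 65, 600)
)

# Store the categorical variable as a factor.
# The level order below is an assumption about how the categories rank.
typed_data <- toy_data %>%
  mutate(income_level = factor(
    income_level,
    levels = c("Low", "Lower-middle", "Middle", "Upper-middle", "High")
  ))

# Flag values outside 1.5 * IQR of the quartiles (a common rule of thumb)
q <- quantile(typed_data$daily_active_minutes_instagram, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[2]] + 1.5 * iqr

flagged <- typed_data %>%
  mutate(is_outlier = daily_active_minutes_instagram < lower_fence |
           daily_active_minutes_instagram > upper_fence)

# Inspect the flagged rows before deciding whether to keep them
flagged %>% filter(is_outlier)
```

Flagging rather than deleting keeps the decision about extreme values visible, so later analyses can be run with and without them.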
Table of Variables
| Category | Variable Name |
|---|---|
| Demographics | age |
| Demographics | gender |
| Demographics | user_id |
| Well-being | perceived_stress_score |
| Well-being | self_reported_happiness |
| Well-being | sleep_hours_per_night |
| Well-being | time_on_feed_per_day |
| Well-being | time_on_reels_per_day |
| Well-being | daily_active_minutes_instagram |
| Engagement | following_count |
| Engagement | user_engagement_score |
| Engagement | time_on_messages_per_day |
| Lifestyle | income_level |
| Lifestyle | education_level |
| Lifestyle | employment_status |
| Lifestyle | hobbies_count |
| Lifestyle | social_events_per_month |
| Lifestyle | travel_frequency_per_year |
| Lifestyle | time_on_explore_per_day |
# Start from the complete-case, US-only data built earlier so this table
# matches the cleaning steps described in the Data section
clean_insta_data <- us_only_insta_data %>%
  # Select only the columns needed for the analyses
  select(
    # Demographics
    user_id, age, gender,
    # Well-being variables
    perceived_stress_score, self_reported_happiness, sleep_hours_per_night,
    time_on_feed_per_day, time_on_reels_per_day, daily_active_minutes_instagram,
    # Engagement variables
    following_count, user_engagement_score, time_on_messages_per_day,
    # Lifestyle variables
    income_level, education_level, employment_status, hobbies_count,
    social_events_per_month, travel_frequency_per_year, time_on_explore_per_day
  )

# Render as an interactive, horizontally scrollable table
datatable(clean_insta_data, rownames = FALSE, options = list(scrollX = TRUE))