2025-11-04

Course: DAT 301
Instructor: Dr. Neha Joshi

Description of the Dataset

The dataset was sourced from Kaggle (https://www.kaggle.com/datasets/prince7489/mental-health-and-social-media-balance-dataset/code). It records daily social media screen time for users of various ages and genders, along with related factors such as sleep quality, stress level, exercise frequency, and happiness index.

Subset of Data with 5 users
Age  Gender  Daily_Screen_Time.hrs.  Stress_Level.1.10.
44   Male    3.1                     6
30   Other   5.1                     8
23   Other   7.4                     7
36   Female  5.7                     8
34   Female  7.0                     7

Introduction to Simple Linear Regression

What is Simple Linear Regression?

  • A statistical method to model the relationship between:

    • One independent variable (X) - predictor
    • One dependent variable (Y) - response
  • Goal: Find the best-fitting straight line through data points

  • Applications:

    • Predicting Outcomes
    • Understanding Relationships
    • Making data-driven decisions

The Linear Regression Model

Mathematical Foundations

The simple linear regression model is expressed as:

\[Y = \beta_0 + \beta_1 X + \epsilon\]

where:

  • \(Y\) = Dependent variable (Stress Level)
  • \(X\) = Independent variable (Daily Screen Time in Hours)
  • \(\beta_0\) = Y-intercept (baseline stress)
  • \(\beta_1\) = Slope (change in stress per hour of Screen Time)
  • \(\epsilon\) = Random error term
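As a minimal sketch of how the two coefficients are estimated, ordinary least squares has the closed-form solutions slope = cov(X, Y) / var(X) and intercept = mean(Y) - slope × mean(X). The snippet below applies them to the five-user subset from the dataset description and compares the result with `lm()`. The variable names `screen`, `stress`, `b0`, `b1`, and `fit` are chosen here for illustration, and estimates from five rows will differ from those of the full dataset.

```r
# Five-user subset from the dataset description (illustration only)
screen <- c(3.1, 5.1, 7.4, 5.7, 7.0)   # Daily_Screen_Time.hrs.
stress <- c(6, 8, 7, 8, 7)             # Stress_Level.1.10.

# Closed-form least-squares estimates
b1 <- cov(screen, stress) / var(screen)   # slope
b0 <- mean(stress) - b1 * mean(screen)    # intercept

# The same estimates obtained from lm(), for comparison
fit <- lm(stress ~ screen)

print(c(intercept = b0, slope = b1))
print(coef(fit))
```

Both printed vectors should agree, since `lm()` solves the same least-squares problem.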

The Research Question

Investigating the relationship between:

  • Independent Variable (X): Daily Screen Hours
    • Time Spent on social media per day
  • Dependent Variable (Y): Stress Level
    • Self-reported stress score (1-10 scale)

Hypothesis: Increased daily social media screen time is associated with higher stress levels.

Dataset: Social Media and Mental Health Balance

Sample Size: 500 observations

Distribution of Daily Screen Hours

Summary Statistics: Mean: 5.53 hours | Median: 5.6 hours | SD: 1.73 hours

Distribution of Stress Levels

Summary Statistics (1-10 scale): Min: 2 | Q1: 6 | Q2 (Median): 7 | Q3: 8 | Max: 10
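Summary statistics of this kind can be computed in R with `mean()`, `median()`, `sd()`, and `quantile()`. The sketch below uses only the five-user subset from the dataset description so that it runs without the CSV file. The object names `subset_df`, `screen_stats`, and `stress_stats` are assumptions made for illustration, and the values will not match the full-sample figures reported above.

```r
# Five-user subset from the dataset description (illustration only)
subset_df <- data.frame(
  Daily_Screen_Time.hrs. = c(3.1, 5.1, 7.4, 5.7, 7.0),
  Stress_Level.1.10.     = c(6, 8, 7, 8, 7)
)

# Mean, median, and standard deviation of daily screen time (hours)
screen_stats <- c(
  mean   = mean(subset_df$Daily_Screen_Time.hrs.),
  median = median(subset_df$Daily_Screen_Time.hrs.),
  sd     = sd(subset_df$Daily_Screen_Time.hrs.)
)

# Five-number summary of stress level (1-10 scale)
stress_stats <- quantile(subset_df$Stress_Level.1.10.,
                         probs = c(0, 0.25, 0.5, 0.75, 1))
names(stress_stats) <- c("Min", "Q1", "Median", "Q3", "Max")

print(round(screen_stats, 2))
print(stress_stats)
```

Replacing `subset_df` with the full data frame read from the CSV should give the statistics for all observations.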

Linear Regression Analysis

Fitted Regression Equation:

\[\hat{Y} = 2.979 + 0.658 \times X\]

Model Statistics:

  • Slope (\(\beta_1\)): 0.658
    • Interpretation: Each additional hour of screen time is associated with a 0.658 point increase in stress level
  • Intercept (\(\beta_0\)): 2.979
    • Interpretation: Expected stress level with zero screen time
  • R Squared (Coefficient of Determination): 0.547
    • Interpretation: 54.7% of variance in stress is explained by daily screen time
  • Correlation: 0.74
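To make the fitted equation concrete, the sketch below plugs a few screen-time values into it and checks that the reported R Squared and correlation agree (in simple linear regression, R Squared equals the squared correlation). The helper `predict_stress` and the variable names are hypothetical, introduced only for this illustration; the coefficients are the ones reported above.

```r
# Coefficients as reported in the fitted regression equation
b0 <- 2.979   # intercept
b1 <- 0.658   # slope

# Hypothetical helper: predicted stress level for a given number of hours
predict_stress <- function(hours) b0 + b1 * hours

# Predicted stress at 5 hours: 2.979 + 0.658 * 5 = 6.269
print(predict_stress(5))

# Predictions for 2 and 8 hours of daily screen time
print(predict_stress(c(2, 8)))

# Square root of R Squared recovers the magnitude of the correlation
r_squared <- 0.547
print(round(sqrt(r_squared), 2))
```

Predictions are most trustworthy within the range of screen times observed in the data; values far outside that range are extrapolations.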

Interactive 2D Regression Plot (Plotly)

R Code: Creating the Regression Plot

# Load required libraries
library(ggplot2)
library(plotly)

# Load the data
data <- read.csv('Mental_Health_and_Social_Media_Balance_Dataset.csv')

# Fit the linear regression model
model <- lm(Stress_Level.1.10. ~ Daily_Screen_Time.hrs., data = data)

# Create the interactive Plotly scatter plot with the fitted regression line
plot_ly(data, x = ~Daily_Screen_Time.hrs.) %>%
  add_markers(y = ~Stress_Level.1.10.,
              marker = list(size = 8, color = 'maroon', opacity = 0.5),
              name = 'Data points') %>%
  # Fitted values from the model (assumes lm() dropped no rows for missing data)
  add_lines(y = fitted(model), name = 'Regression line') %>%
  layout(title = "Screen Hours vs Stress Level",
         xaxis = list(title = "Daily Screen Hours"),
         yaxis = list(title = "Stress Level"))

Key Findings & Conclusions

Summary Insights

  • Higher daily screen time is associated with higher self-reported stress (correlation 0.74), so excessive screen time may contribute to increased stress
  • Consider setting daily screen time limits

Limitations: Correlation does not imply causation; other factors may influence stress

– Reference –

The simple linear regression concepts used in this presentation were based on course lecture materials.