Introduction

This week’s goal is to practice identifying analytical, ethical, and epistemological issues with statistical models.

I selected my Week 8 Data Dive (“Regression Modeling”) for critique.

In that week, I built a linear regression model to predict Screen Time from Sleep Time and other variables.


Model Summary

The original model:

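# Simplified example: regress screen time on sleep time and age
model_week8 <- lm(ScreenTime ~ SleepTime + Age, data = screentime_data)
# Print coefficients, standard errors, and fit statistics
summary(model_week8)

  • Response: Screen Time (hours)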
  • Predictors: Sleep Time (hours), Age (years)

At first glance, the coefficients looked reasonable, but applying Week 14 concepts reveals several issues.


Analytical Issues

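  • Omitted Variables: Important factors such as device type, stress levels, and school/work demands were missing. If any of them is correlated with both Sleep Time and Screen Time, the Sleep Time coefficient absorbs part of its effect (omitted variable bias).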

  • Assumption Violations: The write-up included no clear check for linearity, normality of residuals, or constant variance (homoscedasticity), so any of these assumptions may have been violated without being noticed.

  • Small Sample Bias: If the dataset was small or unbalanced (e.g., skewed age groups), the coefficient estimates would be unstable and the model could easily overfit or mislead.
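
The omitted variable problem above can be illustrated with a minimal sketch on simulated data, not the Week 8 dataset. Stress stands in for an unmeasured confounder, and every coefficient in the simulation is an assumption chosen for illustration.

```r
# Synthetic data only: Stress is a hypothetical omitted variable,
# and all coefficients below are invented for illustration.
set.seed(42)
n <- 500
Stress     <- rnorm(n)                               # not measured in Week 8
SleepTime  <- 7 - 0.8 * Stress + rnorm(n, sd = 0.5)  # more stress, less sleep
ScreenTime <- 4 - 0.2 * SleepTime + 1.0 * Stress + rnorm(n, sd = 0.5)

# Model that omits Stress, mirroring the structure of the Week 8 model
short_fit <- lm(ScreenTime ~ SleepTime)

# Model that includes the confounder
full_fit <- lm(ScreenTime ~ SleepTime + Stress)

coef(short_fit)["SleepTime"]  # pulled well below the simulated -0.2
coef(full_fit)["SleepTime"]   # close to the simulated -0.2
```

Under these assumed values, the model without Stress attributes much of the stress effect to Sleep Time, while the model that includes Stress recovers a slope near the simulated value.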


Ethical Issues

  • Misleading Interpretations: The model risked implying causation (“sleep causes screen time”) when it only estimated an association. That is misleading if shared without context.

  • Representation Problems: If the dataset mostly contained young people or certain groups, results wouldn’t generalize fairly to broader populations.

  • Data Source Transparency: Without an explanation of where the screen time data came from and what its limitations are, readers could place more trust in the model than it deserves.


Epistemological Issues

  • What can we know from this model? The model only shows associations within this dataset, not real-world cause and effect.

  • Overconfidence Risk: R-squared measures in-sample fit, so a relatively good value could create false confidence in how well the model predicts screen time for people outside the dataset.

  • Bias in Variable Choice: Sleep Time was treated as a main predictor, but in reality, many unmeasured variables (e.g., stress, social media addiction) might be more powerful drivers.
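
One way to test the overconfidence concern is to compare in-sample R-squared with performance on rows the model never saw. The sketch below assumes a synthetic stand-in for screentime_data (the values are invented, not the Week 8 data) and a 70/30 split chosen only for illustration.

```r
# Synthetic stand-in for screentime_data; values are invented for illustration.
set.seed(1)
n <- 200
screentime_data <- data.frame(
  SleepTime = rnorm(n, mean = 7, sd = 1),
  Age       = sample(13:60, n, replace = TRUE)
)
screentime_data$ScreenTime <- 8 - 0.4 * screentime_data$SleepTime -
  0.03 * screentime_data$Age + rnorm(n, sd = 1.5)

# Hold out 30% of rows that the model never sees during fitting
test_rows <- sample(n, size = round(0.3 * n))
train <- screentime_data[-test_rows, ]
test  <- screentime_data[test_rows, ]

fit <- lm(ScreenTime ~ SleepTime + Age, data = train)

# In-sample R-squared, as reported by summary()
r2_train <- summary(fit)$r.squared

# Out-of-sample R-squared: 1 - SSE / SST on the held-out rows,
# using the training mean as the baseline prediction
pred      <- predict(fit, newdata = test)
sse       <- sum((test$ScreenTime - pred)^2)
sst       <- sum((test$ScreenTime - mean(train$ScreenTime))^2)
r2_test   <- 1 - sse / sst
rmse_test <- sqrt(mean((test$ScreenTime - pred)^2))

c(r2_train = r2_train, r2_test = r2_test, rmse_test = rmse_test)
```

A held-out R-squared well below the in-sample value, or an RMSE that is large relative to typical daily screen time, would suggest the fit statistics overstate predictive ability.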


Conclusion and Fixes

  • Model diagnosis: The Week 8 regression was simple, and that simplicity hid several important risks.

  • Fixes going forward:

    • Add more predictors (stress, work demands, mental health).
    • Perform proper diagnostic checks (residual plots, tests for normality and constant variance, and multicollinearity checks such as variance inflation factors).
    • Clarify that this model shows associations, not causal relationships.
    • Be careful about sample representation and explain data sources clearly.
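
The diagnostic checks listed above could look roughly like the following sketch, which uses base R only. It assumes a synthetic stand-in for screentime_data (invented values), computes a studentized Breusch-Pagan statistic by hand rather than through a package, and derives the variance inflation factor directly from its definition.

```r
# Synthetic stand-in for screentime_data; values are invented for illustration.
set.seed(7)
n <- 200
screentime_data <- data.frame(
  SleepTime = rnorm(n, mean = 7, sd = 1),
  Age       = sample(13:60, n, replace = TRUE)
)
screentime_data$ScreenTime <- 8 - 0.4 * screentime_data$SleepTime -
  0.03 * screentime_data$Age + rnorm(n, sd = 1.5)

fit <- lm(ScreenTime ~ SleepTime + Age, data = screentime_data)
res <- residuals(fit)

# Normality of residuals: Shapiro-Wilk test
# (a small p-value suggests the residuals are not normal)
shapiro_p <- shapiro.test(res)$p.value

# Constant variance: studentized Breusch-Pagan statistic computed by hand.
# Regress squared residuals on the predictors; n * R^2 is approximately
# chi-squared with one degree of freedom per predictor under homoscedasticity.
screentime_data$sq_res <- res^2
aux  <- lm(sq_res ~ SleepTime + Age, data = screentime_data)
bp   <- n * summary(aux)$r.squared
bp_p <- pchisq(bp, df = 2, lower.tail = FALSE)

# Multicollinearity: variance inflation factor, 1 / (1 - R^2) from regressing
# one predictor on the other. With two predictors the VIF is the same for both.
r2_sleep <- summary(lm(SleepTime ~ Age, data = screentime_data))$r.squared
vif      <- 1 / (1 - r2_sleep)

c(shapiro_p = shapiro_p, bp_p = bp_p, vif = vif)
```

Calling plot(fit, which = 1:2) on the same fitted model draws the residuals-versus-fitted and normal Q-Q plots, which are usually worth reading alongside the test results rather than relying on p-values alone.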