11.220 QR Recitation 5 Regression

Author

Chenhan Shao

Published

November 24, 2025

1. Data Preparation

1.1 Read the dataset into R Studio

setwd("C:/Users/Chenh/OneDrive - Massachusetts Institute of Technology/000_2025_Fall/11.220/recitation 5")

CHR <-    
  "CHR_QR_2025.csv" |>    
  read.csv()

1.2 Check what variables we have in this dataset

colnames(CHR)

 [1] "Name"                    "median_hhi"             
 [3] "food_env"                "exercise_oppo"          
 [5] "adult_ob"                "insufficient_sleep"     
 [7] "motor_vehicle_death"     "social_association"     
 [9] "poor_mental_health_days" "rac_seg"                
[11] "exercise_oppo_nominal"

1.3 Check how many percent of the data is missing for each variable

# check missing data      

CHR |> 
  summarise(across(everything(), ~ mean(is.na(.)) * 100)) |>
  t() |>       
  as.data.frame() |>       
  setNames("percent_missing")

                        percent_missing
Name                                  0
median_hhi                            0
food_env                              0
exercise_oppo                         0
adult_ob                              0
insufficient_sleep                    0
motor_vehicle_death                   0
social_association                    0
poor_mental_health_days               0
rac_seg                               0
exercise_oppo_nominal                 0

2. Regression

2.1 Regression with one continuous variable

2.1.1 Regress adult_ob on food_env

Note: the order of the two variables in the regression does matter here!

# run the regression  

model1 <- lm(adult_ob ~ food_env, data = CHR)  

# view the outcome  

stargazer(model1, type = "html", 
          out = "model1_results.html", 
          notes = c("Significance: * p<0.1; ** p<0.05; *** p<0.01"),            
          notes.append = FALSE)


	Dependent variable:

	adult_ob

food_env	-0.029^***
	(0.007)

Constant	0.497^***
	(0.050)


Observations	51
R²	0.268
Adjusted R²	0.253
Residual Std. Error	0.031 (df = 49)
F Statistic	17.944^*** (df = 1; 49)

Note:	Significance: * p<0.1; p<0.05; * p<0.01

2.1.2 Visualize the regression

if (!require(ggplot2)) {install.packages("ggplot2")
  library(ggplot2) } else {library(ggplot2) }

Loading required package: ggplot2

ggplot(CHR, aes(x = food_env, y = adult_ob)) + 
  geom_point() +   
  geom_smooth(method = "lm", se = FALSE, color = "orange") +
  labs(title = "Food Environment and Obesity", 
       x = "food_env", y = "adult_ob")+   
  theme_minimal()

`geom_smooth()` using formula = 'y ~ x'

2.1.2 Exercise: How to regress rac_reg on food_env and visualize the regression?

# type your codes here

2.2 Regression with one continuous variable + one categorical/ grouping variable

2.2.1 Regress adult_ob on food_env and exercise_oppo_nominal

model2 <- lm(adult_ob ~ food_env + exercise_oppo_nominal, data = CHR)  

# view the outcome  

stargazer(model2, type = "html", 
          out = "model1_results.html", 
          notes = c("Significance: * p<0.1; ** p<0.05; *** p<0.01"), 
          notes.append = FALSE)


	Dependent variable:

	adult_ob

food_env	-0.017^**
	(0.007)

exercise_oppo_nominallow	0.028^***
	(0.009)

Constant	0.396^***
	(0.058)


Observations	51
R²	0.382
Adjusted R²	0.356
Residual Std. Error	0.029 (df = 48)
F Statistic	14.826^*** (df = 2; 48)

Note:	Significance: * p<0.1; p<0.05; * p<0.01

2.2.2 Visualize the regression in a 2-D plot

2.2.2 Visualize the regression in a “3-D” plot

2.3 Regression with two continuous variables

2.3.1 Regress adult_ob on food_env and exercise_oppo

model3 <- lm(adult_ob ~ food_env + exercise_oppo, data = CHR)


stargazer(model3,
          type = "html",
          out = "model1_results.html",
          notes = c("Significance: * p<0.1; ** p<0.05; *** p<0.01"), 
          notes.append = FALSE)


	Dependent variable:

	adult_ob

food_env	-0.010
	(0.007)

exercise_oppo	-0.192^***
	(0.040)

Constant	0.513^***
	(0.042)


Observations	51
R²	0.503
Adjusted R²	0.482
Residual Std. Error	0.026 (df = 48)
F Statistic	24.279^*** (df = 2; 48)

Note:	Significance: * p<0.1; p<0.05; * p<0.01

2.3.2. Visualize the regression in a 3-D plane

2.4 Compare and contrast regression models

stargazer(
  model1, model2, model3,
  type = "html",
  out = "models_1_3_results.html",   # saved to Posit Cloud working directory
  notes = c("Significance: * p < 0.1; ** p < 0.05; *** p < 0.01"),
  notes.append = FALSE
)


	Dependent variable:

	adult_ob
	(1)	(2)	(3)

food_env	-0.029^***	-0.017^**	-0.010
	(0.007)	(0.007)	(0.007)

exercise_oppo_nominallow		0.028^***
		(0.009)

exercise_oppo			-0.192^***
			(0.040)

Constant	0.497^***	0.396^***	0.513^***
	(0.050)	(0.058)	(0.042)


Observations	51	51	51
R²	0.268	0.382	0.503
Adjusted R²	0.253	0.356	0.482
Residual Std. Error	0.031 (df = 49)	0.029 (df = 48)	0.026 (df = 48)
F Statistic	17.944^*** (df = 1; 49)	14.826^*** (df = 2; 48)	24.279^*** (df = 2; 48)

Note:	Significance: * p < 0.1; p < 0.05; * p < 0.01