For this simple linear regression I used the data set “Davis” from the statistical package “car”. The data set contains the recorded (measured) and self-reported heights and weights of 200 respondents, with heights in centimeters and weights in kilograms. Seventeen respondents have missing values in the variables used below, so the regressions are based on 183 respondents. I am focusing on the reported weight and recorded weight of the respondents.
library(car)  # car attaches carData, which contains the Davis data
head(Davis)
## sex weight height repwt repht
## 1 M 77 182 77 180
## 2 F 58 161 51 159
## 3 F 53 161 54 158
## 4 M 68 177 70 175
## 5 F 59 157 59 155
## 6 M 76 170 76 165
The following scatterplot shows the relationship between respondents' reported weight (horizontal axis) and recorded weight (vertical axis).
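# smooth=FALSE drops the nonparametric smoother; id.n=1 labels the single most unusual point
# (newer versions of car use id=list(n=1) instead of id.n=1)
scatterplot(weight ~ repwt, data=Davis, smooth=FALSE, id.n=1)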
## 12
## 12
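As we can see, the respondents' reported and recorded weights are very similar: most points fall close together along a single line, so there is little difference between the weight respondents reported and the weight that was actually recorded. The exception is observation 12, the one point the plot labels, which lies far from the rest.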
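The similarity can also be checked numerically. A minimal sketch in base R, using only the six rows printed by head(Davis) above (typed in by hand into a hypothetical data frame davis_head, so it illustrates the idea rather than summarizing the full data set), computes each respondent's reporting error as reported minus recorded weight:

```r
# The six rows printed by head(Davis) above, typed in by hand for illustration.
# This is NOT the full 200-row data set.
davis_head <- data.frame(
  sex    = c("M", "F", "F", "M", "F", "M"),
  weight = c(77, 58, 53, 68, 59, 76),   # recorded weight, kg
  repwt  = c(77, 51, 54, 70, 59, 76)    # reported weight, kg
)

# Reporting error: positive = over-reported, negative = under-reported
davis_head$diff <- davis_head$repwt - davis_head$weight
print(davis_head)

# Summaries of the reporting error for these six respondents
print(mean(davis_head$diff))        # average signed error, kg
print(mean(abs(davis_head$diff)))   # average absolute error, kg
print(cor(davis_head$weight, davis_head$repwt))
```

On the full data, the same columns come from Davis after library(car); rows with a missing repwt would need to be dropped first, for example with na.omit(Davis[c("weight", "repwt")]).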
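library(Zelig)
# reg1: reported weight regressed on recorded weight
reg1 <- zelig(repwt ~ weight, model="normal", data=Davis)
# reg2: the same model, also controlling for sex
reg2 <- zelig(repwt ~ weight + sex, model="normal", data=Davis)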
summary(reg1)
##
## Call:
## glm(formula = formula, weights = weights, family = gaussian,
## model = F, data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -84.750 -2.178 -0.172 2.293 18.639
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.75873 2.49803 6.308 2.1e-09 ***
## weight 0.75296 0.03676 20.484 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 57.51487)
##
## Null deviance: 34543 on 182 degrees of freedom
## Residual deviance: 10410 on 181 degrees of freedom
## (17 observations deleted due to missingness)
## AIC: 1264.8
##
## Number of Fisher Scoring iterations: 2
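The Call line in this summary shows that Zelig's model="normal" fits a Gaussian glm(). With the default identity link, that is the same model as ordinary least squares, so base R's lm() returns the same coefficients; lm(repwt ~ weight, data = Davis) would be the direct base-R counterpart of reg1. The sketch below checks the equivalence on the six rows printed by head(Davis), typed in by hand as a hypothetical data frame davis_head, so it is a toy fit and does not reproduce the estimates above:

```r
# Toy data: the six rows shown by head(Davis), typed in by hand
davis_head <- data.frame(
  sex    = c("M", "F", "F", "M", "F", "M"),
  weight = c(77, 58, 53, 68, 59, 76),   # recorded weight, kg
  repwt  = c(77, 51, 54, 70, 59, 76)    # reported weight, kg
)

# Gaussian GLM with the default identity link, as in the "Call:" lines above
fit_glm <- glm(repwt ~ weight + sex, family = gaussian, data = davis_head)

# Ordinary least squares on the same formula
fit_lm <- lm(repwt ~ weight + sex, data = davis_head)

# Both fits give the same point estimates
print(cbind(glm = coef(fit_glm), lm = coef(fit_lm)))
```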
summary(reg2)
##
## Call:
## glm(formula = formula, weights = weights, family = gaussian,
## model = F, data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -62.104 -2.707 0.002 2.586 22.925
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.52041 2.32791 10.104 < 2e-16 ***
## weight 0.56978 0.03837 14.850 < 2e-16 ***
## sexM 9.75113 1.17668 8.287 2.6e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 41.86271)
##
## Null deviance: 34543.0 on 182 degrees of freedom
## Residual deviance: 7535.3 on 180 degrees of freedom
## (17 observations deleted due to missingness)
## AIC: 1207.7
##
## Number of Fisher Scoring iterations: 2
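The deviance lines in the two summaries can be turned into more familiar quantities. For a Gaussian GLM with the identity link, the residual deviance is the residual sum of squares and the null deviance is the total sum of squares of repwt. Dividing the residual deviance by the residual degrees of freedom gives the dispersion parameter (the estimated error variance), its square root is the residual standard error in kilograms, and one minus the ratio of residual to null deviance is the usual R². This sketch only does arithmetic on the printed values:

```r
# Numbers copied from summary(reg1) and summary(reg2) above
null_dev  <- 34543                           # null deviance (total sum of squares of repwt)
resid_dev <- c(reg1 = 10410, reg2 = 7535.3)  # residual deviance (residual sum of squares)
resid_df  <- c(reg1 = 181, reg2 = 180)       # 183 observations minus 2 or 3 coefficients

# Dispersion = residual sum of squares / residual df (estimated error variance);
# compare with 57.51487 and 41.86271 in the summaries
dispersion <- resid_dev / resid_df
print(round(dispersion, 2))

# Residual standard error, in kg
print(round(sqrt(dispersion), 2))

# Share of the variation in reported weight explained by each model
r_squared <- 1 - resid_dev / null_dev
print(round(r_squared, 3))
```

Adding sex raises R² from about 0.70 to about 0.78 and shrinks the residual standard error from roughly 7.6 kg to 6.5 kg, consistent with the drop in AIC from 1264.8 to 1207.7.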
library(stargazer)
stargazer(reg1, reg2, type="text")  # side-by-side text table of both models
##
## ==============================================
## Dependent variable:
## ----------------------------
## repwt
## (1) (2)
## ----------------------------------------------
## weight 0.753*** 0.570***
## (0.037) (0.038)
##
## sexM 9.751***
## (1.177)
##
## Constant 15.759*** 23.520***
## (2.498) (2.328)
##
## ----------------------------------------------
## Observations 183 183
## Log Likelihood -630.422 -600.850
## Akaike Inf. Crit. 1,264.844 1,207.701
## ==============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
This table gives a more concrete look at the relationship between reported and recorded weight that we saw in the scatterplot above. There is a close relationship between the two, and it is statistically significant at the 1% level (p < 0.01), and therefore also at the 95% confidence level. Because reported weight (repwt) is the dependent variable, the coefficient means that for every additional kilogram of recorded weight, reported weight goes up by 0.753 kg on average. We also controlled for the sex of the respondents, and the weight coefficient remains significant at the 1% level: holding sex constant, every additional kilogram of recorded weight raises reported weight by 0.570 kg on average. The sexM coefficient shows that, at the same recorded weight, men report about 9.75 kg more than women.
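To make the coefficient interpretation concrete, the sketch below uses only numbers copied from the summaries above. It computes a 95% confidence interval for the weight slope in reg1 as a Wald interval from the t distribution with 181 residual degrees of freedom (one standard way to build it, not output from the original analysis), and the reg2 predictions for a woman and a man with the same recorded weight. The 70 kg value is a hypothetical example, not a respondent from the data.

```r
# reg1: weight slope and its standard error (residual df = 181)
b_weight_1  <- 0.75296
se_weight_1 <- 0.03676

# reg2: intercept, weight slope, and sexM coefficient
b0_2       <- 23.52041
b_weight_2 <- 0.56978
b_male_2   <- 9.75113

# 95% confidence interval for the reg1 weight slope (t distribution, 181 df)
ci_1 <- b_weight_1 + c(-1, 1) * qt(0.975, df = 181) * se_weight_1
print(round(ci_1, 3))

# reg2: predicted reported weight at a hypothetical recorded weight of 70 kg
recorded    <- 70
pred_female <- b0_2 + b_weight_2 * recorded   # women are the baseline category
pred_male   <- pred_female + b_male_2         # men add the sexM coefficient
print(c(female = pred_female, male = pred_male))
```

The gap between the two predictions equals the sexM coefficient at any recorded weight, because the model has no interaction between weight and sex.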