Introduction

For this simple linear regression I used the statistical package “car” and its “Davis” data set. The data set gives the reported and recorded heights (in centimeters) and weights (in kilograms) of 200 respondents. Of these, 17 are dropped from the models below because of missing values, which leaves 183 observations. I am focusing on the reported weight and recorded weight of the respondents.

library(car)
head(Davis)
##   sex weight height repwt repht
## 1   M     77    182    77   180
## 2   F     58    161    51   159
## 3   F     53    161    54   158
## 4   M     68    177    70   175
## 5   F     59    157    59   155
## 6   M     76    170    76   165
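
The model summaries further down report that 17 observations were deleted due to missingness. As a rough sketch of how to check where those rows come from, the snippet below counts missing values per column and the number of respondents with both weights present. The names n_total and n_used are introduced here only for illustration.

```r
library(car)  # provides Davis (via carData in recent versions of car)

# Total number of respondents in the data set
n_total <- nrow(Davis)

# Respondents with both a recorded and a reported weight
n_used <- sum(complete.cases(Davis[, c("weight", "repwt")]))

# Missing values per column
colSums(is.na(Davis))

# Rows available to a model of repwt on weight, and rows dropped
c(total = n_total, used = n_used, dropped = n_total - n_used)
```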

Analysis

The following scatterplot shows the relationship between the respondents' recorded weight and their reported weight.

scatterplot(weight ~ repwt, data=Davis, smooth=FALSE, id.n=1)

## 12 
## 12

As we can see, the respondents' reported and recorded weights are very similar. Apart from observation 12, which the plot labels as the most extreme point, there is little variation between the weight reported by the respondents and the actual weight recorded for this data set.
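
The scatterplot call labels observation 12. A minimal sketch for inspecting that row, assuming Davis is loaded through car as in the rest of this report, is to compute the gap between recorded and reported weight and pull out the row where it is largest. The names gap and worst are hypothetical and used only here.

```r
library(car)  # provides Davis (via carData in recent versions of car)

# Gap between recorded and reported weight, in kg (NA where repwt is missing)
gap <- Davis$weight - Davis$repwt

# Index of the row with the largest absolute gap; which.max() skips NAs
worst <- which.max(abs(gap))
Davis[worst, ]

# Summary of the gaps with that row set aside
summary(gap[-worst])
```

If the row turns out to be a data-entry problem rather than a genuine respondent, refitting the models without it would be a reasonable sensitivity check.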

library(Zelig)

reg1 <- zelig(repwt ~ weight, model="normal", data=Davis)

reg2 <- zelig(repwt ~ weight + sex, model="normal", data=Davis)
summary(reg1)
## 
## Call:
## glm(formula = formula, weights = weights, family = gaussian, 
##     model = F, data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -84.750   -2.178   -0.172    2.293   18.639  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 15.75873    2.49803   6.308  2.1e-09 ***
## weight       0.75296    0.03676  20.484  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 57.51487)
## 
##     Null deviance: 34543  on 182  degrees of freedom
## Residual deviance: 10410  on 181  degrees of freedom
##   (17 observations deleted due to missingness)
## AIC: 1264.8
## 
## Number of Fisher Scoring iterations: 2
summary(reg2)
## 
## Call:
## glm(formula = formula, weights = weights, family = gaussian, 
##     model = F, data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -62.104   -2.707    0.002    2.586   22.925  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 23.52041    2.32791  10.104  < 2e-16 ***
## weight       0.56978    0.03837  14.850  < 2e-16 ***
## sexM         9.75113    1.17668   8.287  2.6e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 41.86271)
## 
##     Null deviance: 34543.0  on 182  degrees of freedom
## Residual deviance:  7535.3  on 180  degrees of freedom
##   (17 observations deleted due to missingness)
## AIC: 1207.7
## 
## Number of Fisher Scoring iterations: 2
library(stargazer)
stargazer(reg1, reg2, type="text")
## 
## ==============================================
##                       Dependent variable:     
##                   ----------------------------
##                              repwt            
##                        (1)            (2)     
## ----------------------------------------------
## weight               0.753***      0.570***   
##                      (0.037)        (0.038)   
##                                               
## sexM                               9.751***   
##                                     (1.177)   
##                                               
## Constant            15.759***      23.520***  
##                      (2.498)        (2.328)   
##                                               
## ----------------------------------------------
## Observations           183            183     
## Log Likelihood       -630.422      -600.850   
## Akaike Inf. Crit.   1,264.844      1,207.701  
## ==============================================
## Note:              *p<0.1; **p<0.05; ***p<0.01

This table gives a more concrete look at the relationship between reported and recorded weight that we saw in the scatterplot above. Because both models regress reported weight (repwt) on recorded weight (weight), the coefficient on weight is the change in reported weight associated with a one-kilogram increase in recorded weight. In model (1), reported weight rises by 0.753 kg for every additional kilogram of recorded weight, and the relationship is statistically significant well beyond the 95% confidence level (p < 0.01). Model (2) controls for the sex of the respondents, and the relationship remains significant at the same level: holding sex constant, reported weight rises by 0.570 kg for every additional kilogram of recorded weight. The sexM coefficient indicates that, at the same recorded weight, men report about 9.751 kg more than women on average.
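
To make the coefficients in column (2) of the table concrete, the sketch below plugs them into the fitted equation by hand. The coefficient values are copied from the table. The helper predict_repwt and the example weight of 70 kg are assumptions made for illustration and are not part of the original analysis.

```r
# Coefficients copied from column (2) of the stargazer table
b_intercept <- 23.520
b_weight    <- 0.570   # kg of reported weight per kg of recorded weight
b_male      <- 9.751   # shift for male respondents, in kg

# Hypothetical helper: fitted reported weight for a recorded weight and sex
predict_repwt <- function(weight, male) {
  b_intercept + b_weight * weight + b_male * as.numeric(male)
}

# Fitted reported weight for a woman and a man, both recorded at 70 kg
predict_repwt(weight = 70, male = FALSE)
predict_repwt(weight = 70, male = TRUE)
```

The two results differ by exactly the sexM coefficient, which is what "controlling for sex" means in this model: the slope on weight is shared, and only the intercept shifts between the groups.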