Raven Shan

Loading and viewing the data

library(car)
data(Davis)
str(Davis)
'data.frame':   200 obs. of  5 variables:
 $ sex   : Factor w/ 2 levels "F","M": 2 1 1 2 1 2 2 2 2 2 ...
 $ weight: int  77 58 53 68 59 76 76 69 71 65 ...
 $ height: int  182 161 161 177 157 170 167 186 178 171 ...
 $ repwt : int  77 51 54 70 59 76 77 73 71 64 ...
 $ repht : int  180 159 158 175 155 165 165 180 175 170 ...

What Factors Influence Weight?

Creating a histogram to examine the data

library(ggplot2)
ggplot(Davis, aes(x = height)) + geom_histogram()

Simple Regression: Examining the relationship between height and weight

  • Height = independent variable (measured in cm)
  • Weight = dependent variable (measured in kg)
m1 <- lm(weight ~ height, data = Davis) 
summary(m1)

Call:
lm(formula = weight ~ height, data = Davis)

Residuals:
    Min      1Q  Median      3Q     Max 
-23.696  -9.506  -2.818   6.372 127.145 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) 25.26623   14.95042   1.690  0.09260 . 
height       0.23841    0.08772   2.718  0.00715 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 14.86 on 198 degrees of freedom
Multiple R-squared:  0.03597,   Adjusted R-squared:  0.0311 
F-statistic: 7.387 on 1 and 198 DF,  p-value: 0.007152

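The intercept (25.266) is the predicted weight, in kg, of a person 0 cm tall; this is an extrapolation far outside the observed heights and has no practical meaning. Each 1 cm increase in height is associated with a 0.238 kg increase in predicted weight (p = 0.007), although height alone explains under 4% of the variation in weight (R-squared = 0.036).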
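
To make the intercept and slope concrete, the fitted line can be evaluated by hand. The sketch below hard-codes the two coefficients printed in the summary above; the helper name predict_weight_m1 and the 170 cm example height are illustrative choices, not part of the original analysis.

```r
# Coefficients copied from summary(m1) above
b0 <- 25.26623  # intercept (kg)
b1 <- 0.23841   # slope (kg per cm of height)

# Hypothetical helper: predicted weight (kg) for a given height (cm)
predict_weight_m1 <- function(height_cm) b0 + b1 * height_cm

pred_170 <- predict_weight_m1(170)
print(pred_170)  # about 65.8 kg

# A 1 cm difference in height changes the prediction by b1
diff_1cm <- predict_weight_m1(171) - predict_weight_m1(170)
print(diff_1cm)
```

With the fitted object available, predict(m1, newdata = data.frame(height = 170)) should agree with this value up to the rounding of the printed coefficients.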

Simple Regression: Examining the relationship between sex and weight

  • Sex = independent variable
  • Weight = dependent variable
m2 <- lm(weight ~ sex, data = Davis)
summary(m2)

Call:
lm(formula = weight ~ sex, data = Davis)

Residuals:
    Min      1Q  Median      3Q     Max 
-21.898  -6.874  -1.866   5.102 108.134 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   57.866      1.150   50.32   <2e-16 ***
sexM          18.032      1.733   10.40   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.17 on 198 degrees of freedom
Multiple R-squared:  0.3534,    Adjusted R-squared:  0.3501 
F-statistic: 108.2 on 1 and 198 DF,  p-value: < 2.2e-16

Model 2 differs from model 1 in that the independent variable, sex, is a factor. Here, the intercept is the mean of the dependent variable for the reference group (females): an intercept of 57.866 means that females weigh 57.866 kg on average. The sexM coefficient is the difference in mean weight between the “dummy” group (males) and the reference group: a coefficient of 18.032 means that males weigh 18.032 kg more than females on average, or about 75.9 kg.
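
The link between a factor predictor and group means can be checked on a small example. The sketch below uses made-up toy numbers (not the Davis data) and shows that, with a two-level factor, the intercept is the reference-group mean and the coefficient is the difference between the two group means.

```r
# Toy data (made-up numbers, not from Davis)
toy <- data.frame(
  sex    = factor(c("F", "F", "F", "M", "M", "M")),
  weight = c(55, 60, 59, 70, 80, 78)
)

fit <- lm(weight ~ sex, data = toy)
b <- coef(fit)

# Group means computed directly
mean_f <- mean(toy$weight[toy$sex == "F"])
mean_m <- mean(toy$weight[toy$sex == "M"])

# Intercept vs. reference-group mean; sexM vs. difference in means
print(c(intercept = b[["(Intercept)"]], mean_f = mean_f))
print(c(sexM = b[["sexM"]], diff = mean_m - mean_f))
```

"F" is the reference group here only because it comes first alphabetically among the factor levels, which is also why females are the reference group in model 2.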

Multiple Regression: Examining the relationship between sex, height, and weight

m3 <- lm(weight ~ sex + height, data = Davis)
summary(m3)

Call:
lm(formula = weight ~ sex + height, data = Davis)

Residuals:
    Min      1Q  Median      3Q     Max 
-24.718  -7.159  -1.472   5.487  74.726 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 109.11420   14.20513   7.681 7.22e-13 ***
sexM         22.49801    2.08690  10.781  < 2e-16 ***
height       -0.31298    0.08649  -3.619 0.000376 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.81 on 197 degrees of freedom
Multiple R-squared:  0.3937,    Adjusted R-squared:  0.3875 
F-statistic: 63.95 on 2 and 197 DF,  p-value: < 2.2e-16

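Controlling for height, males weigh 22.498 kg more than females on average. Controlling for sex, each 1 cm increase in height is associated with a 0.313 kg decrease in weight. This negative sign is the opposite of the height coefficient in model 1 (0.238), so it should be checked before it is interpreted substantively.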
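
A height coefficient with an unexpected sign can sometimes be traced to a single unusual observation, and the large maximum residuals in the summaries above hint that one may be present. The sketch below uses synthetic data (not Davis) to show how one record with height and weight swapped can reverse a fitted slope. Whether anything similar applies to Davis is an assumption that would need to be checked against the data.

```r
# Synthetic data: weight rises exactly 0.9 kg per cm of height
height <- seq(150, 190, by = 2)
weight <- 0.9 * height - 95
clean  <- data.frame(height = height, weight = weight)

# Same data, but with the first record's height and weight swapped
swapped <- clean
swapped$height[1] <- clean$weight[1]
swapped$weight[1] <- clean$height[1]

# Fit the same simple regression to both versions
slope_clean   <- coef(lm(weight ~ height, data = clean))[["height"]]
slope_swapped <- coef(lm(weight ~ height, data = swapped))[["height"]]

print(c(clean = slope_clean, swapped = slope_swapped))
```

On real data, a scatterplot of weight against height would be a natural first check for a record like this.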

Creating a table that shows all 3 models side by side

library(texreg)
screenreg(list(m1, m2, m3))

==============================================
             Model 1    Model 2     Model 3   
----------------------------------------------
(Intercept)   25.27      57.87 ***  109.11 ***
             (14.95)     (1.15)     (14.21)   
height         0.24 **               -0.31 ***
              (0.09)                 (0.09)   
sexM                     18.03 ***   22.50 ***
                         (1.73)      (2.09)   
----------------------------------------------
R^2            0.04       0.35        0.39    
Adj. R^2       0.03       0.35        0.39    
Num. obs.    200        200         200       
RMSE          14.86      12.17       11.81    
==============================================
*** p < 0.001, ** p < 0.01, * p < 0.05

Describing R^2:

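R-squared is the share of the total variation in the dependent variable that the model's independent variables explain, so a higher value means a better fit to this sample. Because R-squared cannot decrease when a predictor is added, adjusted R-squared is the fairer comparison between models with different numbers of predictors. Model 3 is preferred on both measures: sex and height together explain about 39% of the variation in weight, compared with 35% for sex alone and 3% to 4% for height alone. Note, however, that the sexM coefficient grows from 18.03 in model 2 to 22.50 in model 3 rather than shrinking, so these estimates do not support the idea that males weigh more than females partly because they are typically taller.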
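
The adjusted R-squared values in the summaries can be reproduced from the plain R-squared values, which shows how the penalty for extra predictors works. The sketch below uses the standard adjustment formula; the helper name adj_r2 is illustrative, and the inputs are the rounded values printed above.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# R-squared values and predictor counts read from the summaries above (n = 200)
adj <- c(
  m1 = adj_r2(0.03597, 200, 1),
  m2 = adj_r2(0.3534,  200, 1),
  m3 = adj_r2(0.3937,  200, 2)
)

print(round(adj, 4))
```

With 200 observations and only one or two predictors the penalty is small, which is why R-squared and adjusted R-squared round to nearly the same values in the table.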

Testing the interaction between two independent variables: sex and height

m4 <- lm(weight ~ sex*height, data = Davis)
summary(m4)

Call:
lm(formula = weight ~ sex * height, data = Davis)

Residuals:
    Min      1Q  Median      3Q     Max 
-23.091  -6.331  -0.995   6.207  41.230 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  160.49748   13.45954  11.924  < 2e-16 ***
sexM        -261.82753   32.72161  -8.002 1.05e-13 ***
height        -0.62679    0.08199  -7.644 9.17e-13 ***
sexM:height    1.62239    0.18644   8.702 1.33e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.06 on 196 degrees of freedom
Multiple R-squared:  0.5626,    Adjusted R-squared:  0.556 
F-statistic: 84.05 on 3 and 196 DF,  p-value: < 2.2e-16

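The p-value for the sexM:height interaction (1.33e-15) is far below .05, so the interaction is statistically significant: the effect of height on weight depends on sex. For females, each 1 cm increase in height is associated with a 0.627 kg decrease in weight; for males, the slope is -0.627 + 1.622 = 0.996, that is, a 0.996 kg increase per cm.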
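
With an interaction term, the model fits a separate intercept and height slope for each sex. The sketch below hard-codes the coefficients printed in the summary above and combines them into the two sex-specific lines; the helper name predict_weight_m4 is illustrative.

```r
# Coefficients copied from summary(m4) above
b_int    <- 160.49748   # intercept for females (reference group)
b_sexM   <- -261.82753  # shift in intercept for males
b_height <- -0.62679    # height slope for females
b_inter  <- 1.62239     # additional height slope for males (sexM:height)

# Sex-specific slopes and the male intercept
slope_female <- b_height
slope_male   <- b_height + b_inter
int_male     <- b_int + b_sexM

# Hypothetical helper: predicted weight (kg) from sex ("F"/"M") and height (cm)
predict_weight_m4 <- function(sex, height_cm) {
  is_male <- as.numeric(sex == "M")
  b_int + b_sexM * is_male + b_height * height_cm + b_inter * is_male * height_cm
}

print(c(female = slope_female, male = slope_male))
```

The large negative sexM coefficient is the difference between the two lines at a height of 0 cm, so it should not be read as a weight difference at any realistic height.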
