Are you curious whether taller people make more money? Well, I am curious too. Let us find out together using a data sample from the textbook Data Analysis Using Regression and Multilevel/Hierarchical Models by Andrew Gelman and Jennifer Hill (Cambridge University Press, 2007).

The data come from a survey conducted in the early 1990s, and I’ve adjusted the dollar amounts for inflation so they’re in today’s dollars. The dataset records earnings along with demographic characteristics of 1,379 individuals, including their height in inches.

Method

I will go ahead and load the CSV data into the R console.

library(dplyr)  # provides tbl_df(); newer dplyr versions prefer as_tibble()
tbl_df(wages)
## # A tibble: 1,379 x 6
##        earn height    sex     race    ed   age
##       <dbl>  <dbl> <fctr>   <fctr> <int> <int>
##  1 79571.30  73.89   male    white    16    49
##  2 96396.99  66.23 female    white    16    62
##  3 48710.67  63.77 female    white    16    33
##  4 80478.10  63.22 female    other    16    95
##  5 82089.35  63.08 female    white    17    43
##  6 15313.35  64.53 female    white    15    30
##  7 47104.17  61.54 female    white    12    53
##  8 50960.05  73.29   male    white    17    50
##  9  3212.65  72.24   male hispanic    15    25
## 10 42996.64  72.40   male    white    12    30
## # ... with 1,369 more rows

From the data above, we have earn (how much each person earns), height, sex, race, ed (the number of years spent in school), and age.

For now, I am interested in finding out whether height has any effect on earnings. In order to do this, I have to fit a simple linear regression model.

Linear Regression Model:

modEarn<-lm(earn~height, data = wages)
summary(modEarn)
## 
## Call:
## lm(formula = earn ~ height, data = wages)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -47903 -19744  -5184  11642 276796 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -126523      14076  -8.989   <2e-16 ***
## height          2387        211  11.312   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29910 on 1377 degrees of freedom
## Multiple R-squared:  0.08503,    Adjusted R-squared:  0.08437 
## F-statistic:   128 on 1 and 1377 DF,  p-value: < 2.2e-16

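From the summary, I can see that the height coefficient is statistically significant, since its p-value (< 2e-16) is well below 0.05. Note, though, that the R-squared is only about 0.085, so height alone explains roughly 8.5% of the variation in earnings. With that in mind, I can proceed with the analysis of the model.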
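
The t value, p-value, and F-statistic in the summary can be reproduced by hand. Here is a minimal sketch that uses only the rounded numbers printed in the output above, so it agrees with the summary to a few digits only; the variable names are my own.

```r
# Rounded values copied from the summary(modEarn) output above
estimate  <- 2387   # slope for height
std_error <- 211    # standard error of that slope
df_resid  <- 1377   # residual degrees of freedom

# t statistic: how many standard errors the estimate lies from zero
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# With a single predictor, the F statistic equals the squared t statistic
f_value <- t_value^2

print(round(t_value, 2))
print(p_value < 2.2e-16)
print(round(f_value))
```

A t value this far from zero is what drives the tiny p-value. It says the slope is unlikely to be zero, not that height explains much of the variation in earnings.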

coef(modEarn)
## (Intercept)      height 
## -126523.359    2387.196
library(ggplot2)
plot(modEarn)  # base R diagnostic plots, including residuals vs. fitted and the normal Q-Q plot

Initial Result

The coefficients above show that every 1-inch increase in height is associated with a $2,387.20 increase in earnings. How true is this? Does it hold for both males and females? From the normal Q-Q plot and the residual plot, it looks like I am missing some other factors. Let me go ahead and create a multivariate (more precisely, multiple regression) model that interacts height with sex.
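
To make the slope concrete, here is a small sketch that plugs a few heights into the fitted line. The coefficients are copied from the `coef(modEarn)` output above; the helper `predict_earn` and the example heights are my own, chosen only for illustration.

```r
# Coefficients copied from coef(modEarn) above
intercept <- -126523.359
slope     <- 2387.196

# Hypothetical helper: predicted earnings (dollars) for a height in inches
predict_earn <- function(height) intercept + slope * height

heights   <- c(64, 65, 70)   # example heights, not taken from the data
predicted <- predict_earn(heights)
print(predicted)

# Moving up by one inch always adds exactly the slope
one_inch_gain <- predict_earn(65) - predict_earn(64)
print(one_inch_gain)
```

The intercept is the prediction at a height of 0 inches, which is far outside the range of the data, so it should not be read as a meaningful earnings figure.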

Multivariate Model

modEarn1<-lm(earn~height+sex+height:sex, data = wages)
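
In R's formula syntax, `height + sex + height:sex` can also be written as `height * sex`. The sketch below checks this on the ten rows printed earlier. This is an assumption-laden toy: ten rows are far too few for meaningful estimates, and the coefficients will not match the full-data model. The data frame name `wages10` is my own.

```r
# The ten rows shown in the tibble printout above (earn, height, sex only)
wages10 <- data.frame(
  earn   = c(79571.30, 96396.99, 48710.67, 80478.10, 82089.35,
             15313.35, 47104.17, 50960.05, 3212.65, 42996.64),
  height = c(73.89, 66.23, 63.77, 63.22, 63.08,
             64.53, 61.54, 73.29, 72.24, 72.40),
  sex    = factor(c("male", "female", "female", "female", "female",
                    "female", "female", "male", "male", "male"))
)

# Explicit main effects plus interaction term
fit_long  <- lm(earn ~ height + sex + height:sex, data = wages10)

# Shorthand: '*' expands to both main effects and their interaction
fit_short <- lm(earn ~ height * sex, data = wages10)

# Both formulas describe the same model, so the coefficients agree
print(all.equal(coef(fit_long), coef(fit_short)))
print(names(coef(fit_long)))
```

Because the factor levels sort alphabetically, `female` is the baseline and the sex-related coefficients are labelled `sexmale` and `height:sexmale`.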

Coefficients of Multivariate Model

coef(modEarn1)
##    (Intercept)         height        sexmale height:sexmale 
##    -12166.9667       564.5102    -30510.4336       701.4065

Further Analysis

From the coefficients above, using female as the baseline, a 1-inch increase in height for women is associated with an increase of roughly $565 in earnings, compared to roughly $1,266 (564.51 + 701.41) per inch for men.
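
The per-sex slopes follow from adding the interaction term to the baseline slope. Here is a short sketch of that arithmetic, with the values copied from the `coef(modEarn1)` output above; the variable names are my own.

```r
# Coefficients copied from coef(modEarn1) above
cf <- c("(Intercept)"    = -12166.9667,
        "height"         = 564.5102,
        "sexmale"        = -30510.4336,
        "height:sexmale" = 701.4065)

# Female is the baseline level, so her line uses the plain terms
slope_female     <- cf[["height"]]
intercept_female <- cf[["(Intercept)"]]

# For males, the sexmale and interaction terms shift intercept and slope
slope_male     <- cf[["height"]] + cf[["height:sexmale"]]
intercept_male <- cf[["(Intercept)"]] + cf[["sexmale"]]

print(c(female = slope_female, male = slope_male))
print(c(female = intercept_female, male = intercept_male))
```

In other words, the interaction model fits two separate lines, one per sex, and `height:sexmale` is the difference between their slopes.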

In Conclusion: Should you be worried about your height as a female? Well, many factors can determine how much more one person earns than another, regardless of sex. However, do height and sex contribute to the differences in earnings among individuals? From this analysis, I will say yes!