Are you curious whether taller people make more money? Well, I am curious too. Let us find out together using a data sample from the textbook Data Analysis Using Regression and Multilevel/Hierarchical Models by Andrew Gelman and Jennifer Hill (Cambridge University Press, 2007).
It is based on a survey conducted in the early 1990s, and I’ve adjusted the dollar amounts for inflation so they’re in today’s dollars. The dataset is a survey of earnings or wages that includes demographic characteristics of 1,379 individuals. One of the characteristics measured is height.
Method
I will go ahead and load the CSV data into the R console.
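I did not show the loading step itself, so here is a minimal sketch of what it could look like. The file path is a stand-in (the real file name is not given in this post), and the three data rows are simply the first three rows of the printed table, written to a temporary file so the snippet runs on its own.

```r
# Hypothetical sketch: the actual CSV file is not named in this post, so a
# temporary file holding the first three printed rows stands in for it.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "earn,height,sex,race,ed,age",
  "79571.30,73.89,male,white,16,49",
  "96396.99,66.23,female,white,16,62",
  "48710.67,63.77,female,white,16,33"
), csv_path)

# stringsAsFactors = TRUE turns sex and race into factors,
# matching the <fctr> columns in the printed tibble.
wages_demo <- read.csv(csv_path, stringsAsFactors = TRUE)
str(wages_demo)
```

With the real file, only the path passed to `read.csv()` would change.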
library(tibble)   # as_tibble() replaces tbl_df(), which dplyr has deprecated
as_tibble(wages)
## # A tibble: 1,379 x 6
## earn height sex race ed age
## <dbl> <dbl> <fctr> <fctr> <int> <int>
## 1 79571.30 73.89 male white 16 49
## 2 96396.99 66.23 female white 16 62
## 3 48710.67 63.77 female white 16 33
## 4 80478.10 63.22 female other 16 95
## 5 82089.35 63.08 female white 17 43
## 6 15313.35 64.53 female white 15 30
## 7 47104.17 61.54 female white 12 53
## 8 50960.05 73.29 male white 17 50
## 9 3212.65 72.24 male hispanic 15 25
## 10 42996.64 72.40 male white 12 30
## # ... with 1,369 more rows
From the data above, we have earn (how much each person earns), height (in inches), sex, race, ed (the number of years spent in school), and age.
For now, I am interested in finding out whether height has any effect on earnings. In order to do this, I have to fit a simple linear regression model.
Linear Regression Model:
modEarn<-lm(earn~height, data = wages)
summary(modEarn)
##
## Call:
## lm(formula = earn ~ height, data = wages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -47903 -19744 -5184 11642 276796
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -126523 14076 -8.989 <2e-16 ***
## height 2387 211 11.312 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 29910 on 1377 degrees of freedom
## Multiple R-squared: 0.08503, Adjusted R-squared: 0.08437
## F-statistic: 128 on 1 and 1377 DF, p-value: < 2.2e-16
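From the summary, I can see that my model is statistically significant, since the p-value is less than 0.05. Hence, I can proceed with the analysis of the model. Note, though, that the R-squared is only about 0.085, so height on its own accounts for roughly 8.5% of the variation in earnings.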
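The figures in the summary can be cross-checked by hand. The sketch below uses only numbers copied from the printed output (the t value for height and the residual degrees of freedom) and relies on two standard facts about a regression with a single predictor: the F-statistic is the square of the slope's t value, and R-squared equals F / (F + residual df).

```r
# Values copied from the summary output above
t_value <- 11.312   # t value for the height coefficient
df_resid <- 1377    # residual degrees of freedom

# Two-sided p-value for the slope
p_value <- 2 * pt(-abs(t_value), df_resid)
p_value < 0.05

# With one predictor, F = t^2 and R^2 = F / (F + residual df)
f_stat <- t_value^2
r_squared <- f_stat / (f_stat + df_resid)
round(c(F = f_stat, R2 = r_squared), 4)
```

The recomputed F and R-squared should land close to the 128 and 0.08503 reported in the summary, with small differences due to rounding in the printed t value.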
coef(modEarn)
## (Intercept) height
## -126523.359 2387.196
library(ggplot2)
plot(modEarn)
Initial Result
The coefficient above shows that each 1-inch increase in height is associated with a $2,387.20 increase in earnings. How true is this? Does this hold true for both males and females? From the normal Q-Q plot and the residual plot, it looks like I am missing some other factors. Let me go ahead and create a multivariate model interacting height with sex.
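To make the reading of the height slope concrete, here is a small sketch that plugs two heights into the fitted line. The intercept and slope are copied from the `coef(modEarn)` output; the helper `predict_earn()` and the example heights of 66 and 67 inches are my own choices for illustration.

```r
# Coefficients copied from coef(modEarn)
intercept <- -126523.359
slope <- 2387.196

# Hypothetical helper: predicted earnings for a height in inches
predict_earn <- function(height_in) intercept + slope * height_in

predicted <- predict_earn(c(66, 67))
diff(predicted)      # the gap between the two predictions is the slope

# Height at which the fitted line crosses zero earnings
zero_height <- -intercept / slope
zero_height
```

The line crosses zero at about 53 inches, so the negative intercept only reflects extrapolation far below the heights in the survey and should not be read literally.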
Multivariate Model
modEarn1<-lm(earn~height+sex+height:sex, data = wages)
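As a side note, R's formula syntax has a shorthand for this kind of model: `height * sex` expands to `height + sex + height:sex`. The sketch below checks that on simulated data; the data frame `demo` and every number in it are made up purely for illustration and are not the wages data.

```r
# Simulated stand-in data (NOT the wages survey)
set.seed(1)
n <- 200
demo <- data.frame(
  height = rnorm(n, mean = 67, sd = 4),
  sex = factor(sample(c("female", "male"), n, replace = TRUE))
)
demo$earn <- 1000 * demo$height + 5000 * (demo$sex == "male") +
  rnorm(n, sd = 10000)

# Long form, as written in this post, and the shorthand
fit_long <- lm(earn ~ height + sex + height:sex, data = demo)
fit_short <- lm(earn ~ height * sex, data = demo)

# Both formulas produce the same coefficients
all.equal(coef(fit_long), coef(fit_short))
```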
Coefficients of Multivariate Model
coef(modEarn1)
## (Intercept) height sexmale height:sexmale
## -12166.9667 564.5102 -30510.4336 701.4065
Further Analysis
From the coefficients above, using female as the baseline, a 1-inch increase in height is associated with roughly $565 more in earnings for women, compared with roughly $1,266 (564.51 + 701.41) for men.
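The per-sex slopes come from adding the interaction term to the baseline. Here is a short sketch of that arithmetic, using only the values printed by `coef(modEarn1)`; the vector `b` and its element names are just labels I chose.

```r
# Coefficients copied from coef(modEarn1); female is the baseline level
b <- c(intercept = -12166.9667, height = 564.5102,
       sexmale = -30510.4336, height_sexmale = 701.4065)

# Women: baseline intercept and slope
female_intercept <- b[["intercept"]]
female_slope <- b[["height"]]

# Men: baseline plus the sexmale and interaction adjustments
male_intercept <- b[["intercept"]] + b[["sexmale"]]
male_slope <- b[["height"]] + b[["height_sexmale"]]

c(female_slope = female_slope, male_slope = male_slope)
```

So the model fits two separate lines, one per sex, and the `height:sexmale` coefficient is the difference between their slopes.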
In Conclusion:
Should you be worried about your height as a female? Well, many factors determine how much more one person earns than another, regardless of sex. However, do height and sex contribute to the differences in earnings among individuals? From this analysis, I would say yes!