1. Load data from the file Griliches.csv and save it as wages. See the full description here.
wages <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/Ecdat/Griliches.csv")
2. Choose columns lw (natural logarithm of wage), expr (experience in years), age (age in years) and iq, and save them as small.
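# Select the columns by name rather than by position, so the code
# does not depend on the column order of the CSV file
small <- wages[c("lw", "expr", "age", "iq")]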
3. Plot a matrix of scatterplots for the pairs of variables chosen before. Which variables are correlated positively? Negatively?
library(GGally)
ggpairs(small)
All pairs are positively correlated, except expr and iq, which are negatively correlated.
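The signs read off the scatterplot matrix can be cross-checked numerically. The sketch below is only an illustration and not part of the original solution: cor_table is a hypothetical helper name, and the call at the end assumes that the data frame small from step 2 exists in the session.

```r
# Hypothetical helper (not from the original solution): rounded Pearson
# correlation matrix of all numeric columns of a data frame.
cor_table <- function(df, digits = 2) {
  numeric_cols <- df[vapply(df, is.numeric, logical(1))]
  round(cor(numeric_cols, use = "complete.obs"), digits)
}

# Assumption: `small` was created in step 2; the call is skipped otherwise.
if (exists("small") && is.data.frame(small)) print(cor_table(small))
```

A negative entry in the matrix marks a negatively correlated pair, which should agree with the coefficients shown in the upper triangle of the ggpairs plot.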
4. Run a multiple linear regression model that will show how the wage is affected by people’s age, experience and IQ level.
4.1. Run the model in R and provide the code you used.
model <- lm(data = small, lw ~ age + expr + iq)
summary(model)
##
## Call:
## lm(formula = lw ~ age + expr + iq, data = small)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.47799 -0.23336  0.00811  0.22212  1.29912
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3.2854082  0.1229667  26.718  < 2e-16 ***
## age          0.0750321  0.0047310  15.860  < 2e-16 ***
## expr        -0.0157532  0.0066863  -2.356   0.0187 *
## iq           0.0076099  0.0009684   7.858 1.34e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3449 on 754 degrees of freedom
## Multiple R-squared: 0.3561, Adjusted R-squared: 0.3535
## F-statistic: 139 on 3 and 754 DF, p-value: < 2.2e-16
4.2. Write the equation of the model using the coefficients you obtained in R.
\[ \widehat{\text{lw}} = 3.285 + 0.075 \times \text{age} - 0.016 \times \text{expr} + 0.008 \times \text{iq}. \]
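To see how the fitted equation is used, the sketch below plugs one set of values into it. The coefficients are copied from the summary output; the person (age 25, 2 years of experience, IQ 100) is a hypothetical example chosen only for illustration, and the result is a predicted logarithm of the wage, not the wage itself.

```r
# Coefficients copied from the summary(model) output
b <- c(intercept = 3.2854082, age = 0.0750321, expr = -0.0157532, iq = 0.0076099)

# Hypothetical person, chosen only for illustration
person <- c(age = 25, expr = 2, iq = 100)

# Predicted log wage: intercept plus the sum of coefficient * value
pred_lw <- unname(b["intercept"] + sum(b[names(person)] * person))
pred_lw
```

With the fitted model object available, predict(model, newdata = data.frame(age = 25, expr = 2, iq = 100)) should give the same value up to rounding of the copied coefficients.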
4.3. Which of the variables in the model affect wages significantly at the 5% level of significance?
All three independent variables: expr is significant at the 5% level (p = 0.0187), while age and iq are significant even at the 0.1% level.
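The significance statements follow from the t statistics and p-values in the coefficient table. As a rough check, the sketch below recomputes them from the estimates and standard errors copied from that table, using the 754 residual degrees of freedom reported in the output; the variable names are illustrative.

```r
# Estimates and standard errors copied from the summary(model) output
est <- c(age = 0.0750321, expr = -0.0157532, iq = 0.0076099)
se  <- c(age = 0.0047310, expr = 0.0066863, iq = 0.0009684)
df_resid <- 754

# t statistic = estimate / standard error; two-sided p-value from the t distribution
t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df = df_resid)

round(t_val, 3)
signif(p_val, 3)
p_val < 0.05   # TRUE marks a coefficient significant at the 5% level
```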
4.4. Interpret the coefficient of expr.
Holding age and IQ constant, each additional year of experience lowers the logarithm of the wage by about 0.016 on average, which corresponds to a wage roughly 1.6% lower.
4.5. How does wage change when age increases by one year (on average, all else equal)?
All else equal, when age increases by one year, the logarithm of the wage (the dependent variable in our data set) increases by 0.075 on average, which corresponds to a wage about 7.8% higher, since \(e^{0.075} \approx 1.078\).
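Because the dependent variable is a logarithm, a coefficient \(b\) translates into a percentage change of \(100 \times (e^{b} - 1)\) in the wage itself. The short sketch below applies this conversion to the age and expr coefficients copied from the summary output; the variable names are illustrative.

```r
# Coefficients copied from the summary(model) output
b <- c(age = 0.0750321, expr = -0.0157532)

# Percentage change in wage for a one-unit increase in each variable,
# all else equal: 100 * (exp(b) - 1)
pct_change <- 100 * (exp(b) - 1)
round(pct_change, 2)
```

For small coefficients the shortcut \(100 \times b\) gives nearly the same answer, which is why the effect of expr is often read directly as about -1.6%.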
5. Report the \(R^2\) of this model. Do you think the quality of this model is acceptable?
\(R^2 = 0.3561\), so the model explains about 36% of the variance in lw. This is not very bad, but not acceptable, since \(R^2\) is less than 0.6.
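The other quality measures in the summary output are tied to \(R^2\). As a sanity check, the sketch below recovers the adjusted \(R^2\) and the F statistic from \(R^2\), the number of predictors and the residual degrees of freedom, all copied from the output; the variable names are illustrative.

```r
r2       <- 0.3561   # Multiple R-squared from the output
k        <- 3        # number of predictors: age, expr, iq
df_resid <- 754      # residual degrees of freedom from the output
n        <- df_resid + k + 1   # number of observations

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F statistic on k and df_resid degrees of freedom
f_stat <- (r2 / k) / ((1 - r2) / df_resid)

round(c(adj_r2 = adj_r2, f_stat = f_stat), 4)
```

With the model object available, summary(model)$r.squared and summary(model)$adj.r.squared return these values directly.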
6. Plot residual graphs for this model. Can you conclude that there are non-linear patterns in the residuals (and, hence, in the relationships between variables)?
library(ggplot2)  # ggplot() comes from ggplot2; load it explicitly
small$residuals <- residuals(model)
ggplot(data = small, aes(x = age, y = residuals)) + geom_point()
ggplot(data = small, aes(x = expr, y = residuals)) + geom_point()
ggplot(data = small, aes(x = iq, y = residuals)) + geom_point()
No clear patterns are visible: the points are scattered randomly around zero, so the plots give no evidence of non-linear relationships.
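A single plot of residuals against fitted values with a smoothed trend line is another common way to look for non-linearity. The sketch below is one possible version using only base R graphics rather than ggplot2; resid_vs_fitted is a hypothetical helper name, and the call at the end assumes that the lm object model from step 4.1 exists in the session.

```r
# Hypothetical helper (not from the original solution): residuals against
# fitted values, with a zero line and a lowess trend line.
resid_vs_fitted <- function(fit) {
  d <- data.frame(fitted = fitted(fit), residuals = residuals(fit))
  plot(d$fitted, d$residuals, xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)                              # reference line at zero
  lines(lowess(d$fitted, d$residuals), col = "red")   # smoothed trend
  invisible(d)
}

# Assumption: `model` was fitted in step 4.1; the call is skipped otherwise.
if (exists("model") && inherits(model, "lm")) resid_vs_fitted(model)
```

A trend line that stays close to zero across the range of fitted values is consistent with a linear specification; plot(model, which = 1) draws a similar built-in diagnostic.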