library(ISLR2)
attach(Wage)
library(ggplot2)
library(splines)
# Boxplot of wage by marital status
boxplot(wage ~ maritl, data = Wage, col = "lightpink",
main = "Wage by Marital Status",
ylab = "Wage")
Comment: Married individuals tend to have higher median wages compared to other groups.
# Boxplot of wage by job class
boxplot(wage ~ jobclass, data = Wage, col = "lightpink",
main = "Wage by Job Class",
ylab = "Wage")
Comment: This plot shows that people in the “Information” job class typically earn more than those in the “Industrial” job class.
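The readings taken from the two boxplots can be checked numerically. The following is a minimal sketch, assuming the `Wage` data from `ISLR2`; `median_by` is a hypothetical helper introduced here only for illustration.

```r
library(ISLR2)

# Hypothetical helper: median wage within each level of a grouping factor
median_by <- function(group) tapply(Wage$wage, group, median)

med_maritl   <- median_by(Wage$maritl)    # one median per marital status
med_jobclass <- median_by(Wage$jobclass)  # one median per job class

print(round(med_maritl, 1))
print(round(med_jobclass, 1))
```

Comparing these medians with the centre lines of the boxes gives a quick sanity check on the comments above.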
fit_maritl <- lm(wage ~ maritl + bs(age, df=4), data = Wage)
summary(fit_maritl)
Call:
lm(formula = wage ~ maritl + bs(age, df = 4), data = Wage)
Residuals:
     Min       1Q   Median       3Q      Max
-101.909  -23.295   -4.165   14.610  208.639
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)          55.706      5.364  10.385  < 2e-16 ***
maritl2. Married     13.917      2.099   6.630 3.98e-11 ***
maritl3. Widowed     -4.424      9.281  -0.477    0.634
maritl4. Divorced    -2.812      3.409  -0.825    0.409
maritl5. Separated   -3.483      5.640  -0.618    0.537
bs(age, df = 4)1     41.198      8.820   4.671 3.13e-06 ***
bs(age, df = 4)2     61.392      6.918   8.874  < 2e-16 ***
bs(age, df = 4)3     49.476     10.714   4.618 4.04e-06 ***
bs(age, df = 4)4     21.754     12.624   1.723    0.085 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 39.43 on 2991 degrees of freedom
Multiple R-squared: 0.1097, Adjusted R-squared: 0.1073
F-statistic: 46.05 on 8 and 2991 DF, p-value: < 2.2e-16
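The four `bs(age, df = 4)` coefficients in the summary come from a cubic B-spline basis. With the default degree of 3 and no intercept column, `df = 4` leaves room for one interior knot, which `bs()` places at the median of `age`. The sketch below, which assumes the same `Wage` data, inspects the basis directly; `age_basis` is a name introduced only for this illustration.

```r
library(ISLR2)
library(splines)

# Build the same basis that the model formula constructs internally
age_basis <- bs(Wage$age, df = 4)

dim(age_basis)              # one row per observation, four basis columns
attr(age_basis, "degree")   # 3, i.e. a cubic spline (the default)
attr(age_basis, "knots")    # the single interior knot
quantile(Wage$age, 0.5)     # for comparison: the median age
```

Because the basis columns are not individually interpretable, the age effect is easier to read from the prediction plot than from the coefficient table.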
# Predict wage for each marital status across ages
age.grid <- seq(min(Wage$age), max(Wage$age), length.out = 100)
pred_df <- expand.grid(age = age.grid, maritl = levels(Wage$maritl))
pred_df$wage_pred <- predict(fit_maritl, newdata = pred_df)
ggplot(pred_df, aes(x = age, y = wage_pred, color = maritl)) +
  geom_line(linewidth = 1.2) +
  labs(title = "Predicted Wage by Age and Marital Status", y = "Predicted Wage")
Comment: This plot shows how predicted wage changes with age for each marital status. Because the model is additive (it has no age-by-status interaction), the curves are parallel by construction: the vertical gaps between them reflect the marital status coefficients, and the shared shape reflects the non-linear effect of age.
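The curves in this plot are point estimates only. A hedged sketch of how uncertainty could be added, assuming the same model and grid: `predict()` on an `lm` fit accepts `interval = "confidence"` and returns `fit`, `lwr`, and `upr` columns (95% by default). The data frame name `pred_ci` is introduced here for illustration.

```r
library(ISLR2)
library(splines)
library(ggplot2)

fit_maritl <- lm(wage ~ maritl + bs(age, df = 4), data = Wage)

# Same kind of grid as before: every age value crossed with every status
age.grid <- seq(min(Wage$age), max(Wage$age), length.out = 100)
pred_ci  <- expand.grid(age = age.grid, maritl = levels(Wage$maritl))

# Pointwise confidence intervals for the mean predicted wage
ci      <- predict(fit_maritl, newdata = pred_ci, interval = "confidence")
pred_ci <- cbind(pred_ci, ci)

ggplot(pred_ci, aes(x = age, y = fit, color = maritl, fill = maritl)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.15, color = NA) +
  geom_line(linewidth = 1) +
  labs(title = "Predicted Wage with 95% Confidence Bands",
       y = "Predicted Wage")
```

Wide, overlapping bands for the smaller groups would be consistent with their large standard errors in the coefficient table.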
fit_jobclass <- lm(wage ~ jobclass + bs(age, df=4), data = Wage)
summary(fit_jobclass)
Call:
lm(formula = wage ~ jobclass + bs(age, df = 4), data = Wage)
Residuals:
     Min       1Q   Median       3Q      Max
-106.161  -23.718   -4.942   15.909  197.755
Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)              49.906      5.320   9.380  < 2e-16 ***
jobclass2. Information   15.040      1.442  10.432  < 2e-16 ***
bs(age, df = 4)1         47.282      8.611   5.491 4.34e-08 ***
bs(age, df = 4)2         73.075      6.324  11.556  < 2e-16 ***
bs(age, df = 4)3         56.370     10.423   5.408 6.87e-08 ***
bs(age, df = 4)4         29.534     12.297   2.402   0.0164 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 39.22 on 2994 degrees of freedom
Multiple R-squared: 0.118, Adjusted R-squared: 0.1166
F-statistic: 80.14 on 5 and 2994 DF, p-value: < 2.2e-16
# Predict wage for each job class across ages
pred_df2 <- expand.grid(age = age.grid, jobclass = levels(Wage$jobclass))
pred_df2$wage_pred <- predict(fit_jobclass, newdata = pred_df2)
ggplot(pred_df2, aes(x = age, y = wage_pred, color = jobclass)) +
  geom_line(linewidth = 1.2) +
labs(title = "Predicted Wage by Age and Job Class", y = "Predicted Wage")
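Comment: This plot shows how predicted wage varies with age in each job class. “Information” sits above “Industrial” at every age; because the model has no interaction term, the gap is a constant 15.04 wage units, the value of the `jobclass2. Information` coefficient.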
Marital Status:
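Married individuals are predicted to earn about 13.9 wage units more than the never-married baseline at any given age (p < 0.001), after controlling for the non-linear effect of age. The widowed, divorced, and separated groups are not statistically distinguishable from that baseline. The age curve has the same shape for every marital status because the model is additive; only the baseline wage level differs between groups.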
Job Class:
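Workers in the “Information” job class are predicted to earn about 15.0 wage units more than those in “Industrial” at any given age (p < 0.001). The same non-linear age curve applies to both job classes, so the gap is constant across ages; this is imposed by the additive model rather than learned from the data.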
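Both fitted models are additive, so they cannot show whether the age profile itself differs between groups. One way to check this is to compare each model against a version with a group-by-age interaction using a nested F-test. This is a sketch under the assumption that the same `df = 4` spline is appropriate; the object names (`add_*`, `int_*`, `cmp_*`) are introduced here for illustration and are not part of the analysis above.

```r
library(ISLR2)
library(splines)

# Additive fits (as above): the group only shifts the age curve up or down
add_maritl   <- lm(wage ~ maritl   + bs(age, df = 4), data = Wage)
add_jobclass <- lm(wage ~ jobclass + bs(age, df = 4), data = Wage)

# Interaction fits: each group gets its own age curve
int_maritl   <- lm(wage ~ maritl   * bs(age, df = 4), data = Wage)
int_jobclass <- lm(wage ~ jobclass * bs(age, df = 4), data = Wage)

# Nested-model F-tests: does letting the curve vary by group reduce the RSS
# by more than chance would explain?
cmp_maritl   <- anova(add_maritl, int_maritl)
cmp_jobclass <- anova(add_jobclass, int_jobclass)
print(cmp_maritl)
print(cmp_jobclass)
```

A small p-value in either table would suggest that the parallel-curves assumption is too restrictive for that grouping. Small groups leave little data to estimate a separate curve, so the interaction coefficients for them may be poorly determined.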