library(ISLR2)
attach(Wage)
library(ggplot2)
library(splines)

7. The Wage data set contains a number of other features not explored in this chapter, such as marital status ( maritl), job class ( jobclass ), and others. Explore the relationships between some of these other predictors and wage, and use non-linear fitting techniques in order to fit flexible models to the data. Create plots of the results obtained, and write a summary of your findings.

library(ISLR2)
attach(Wage)
The following objects are masked from Wage (pos = 3):

    age, education, health, health_ins, jobclass, logwage, maritl, race,
    region, wage, year
# Boxplot of wage by marital status
boxplot(wage ~ maritl, data = Wage, col = "lightpink", 
        main = "Wage by Marital Status", 
        ylab = "Wage")

Comment: Married individuals tend to have higher median wages compared to other groups.

# Boxplot of wage by job class
boxplot(wage ~ jobclass, data = Wage, col = "lightpink", 
        main = "Wage by Job Class", 
        ylab = "Wage")

Comment: This plot show that people in the “Information” job class typically earn more than those in “Industrial” jobs.

library(splines)
fit_maritl <- lm(wage ~ maritl + bs(age, df=4), data = Wage)
summary(fit_maritl)

Call:
lm(formula = wage ~ maritl + bs(age, df = 4), data = Wage)

Residuals:
     Min       1Q   Median       3Q      Max 
-101.909  -23.295   -4.165   14.610  208.639 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)          55.706      5.364  10.385  < 2e-16 ***
maritl2. Married     13.917      2.099   6.630 3.98e-11 ***
maritl3. Widowed     -4.424      9.281  -0.477    0.634    
maritl4. Divorced    -2.812      3.409  -0.825    0.409    
maritl5. Separated   -3.483      5.640  -0.618    0.537    
bs(age, df = 4)1     41.198      8.820   4.671 3.13e-06 ***
bs(age, df = 4)2     61.392      6.918   8.874  < 2e-16 ***
bs(age, df = 4)3     49.476     10.714   4.618 4.04e-06 ***
bs(age, df = 4)4     21.754     12.624   1.723    0.085 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 39.43 on 2991 degrees of freedom
Multiple R-squared:  0.1097,    Adjusted R-squared:  0.1073 
F-statistic: 46.05 on 8 and 2991 DF,  p-value: < 2.2e-16
# Predict wage for each marital status across ages
age.grid <- seq(min(age), max(age), length=100)
pred_df <- expand.grid(age = age.grid, maritl = levels(Wage$maritl))
pred_df$wage_pred <- predict(fit_maritl, newdata = pred_df)

library(ggplot2)
ggplot(pred_df, aes(x = age, y = wage_pred, color = maritl)) +
  geom_line(size=1.2) +
  labs(title = "Predicted Wage by Age and Marital Status", y = "Predicted Wage")
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
Please use `linewidth` instead.

Comment: This plot illustrates how predicted wage changes with age for each group. Differences in the vertical position of the curves reflect the effect of marital status, while the shape reflects the non-linear effect of age.

fit_jobclass <- lm(wage ~ jobclass + bs(age, df=4), data = Wage)
summary(fit_jobclass)

Call:
lm(formula = wage ~ jobclass + bs(age, df = 4), data = Wage)

Residuals:
     Min       1Q   Median       3Q      Max 
-106.161  -23.718   -4.942   15.909  197.755 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)              49.906      5.320   9.380  < 2e-16 ***
jobclass2. Information   15.040      1.442  10.432  < 2e-16 ***
bs(age, df = 4)1         47.282      8.611   5.491 4.34e-08 ***
bs(age, df = 4)2         73.075      6.324  11.556  < 2e-16 ***
bs(age, df = 4)3         56.370     10.423   5.408 6.87e-08 ***
bs(age, df = 4)4         29.534     12.297   2.402   0.0164 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 39.22 on 2994 degrees of freedom
Multiple R-squared:  0.118, Adjusted R-squared:  0.1166 
F-statistic: 80.14 on 5 and 2994 DF,  p-value: < 2.2e-16
# Predict wage for each job class across ages
pred_df2 <- expand.grid(age = age.grid, jobclass = levels(Wage$jobclass))
pred_df2$wage_pred <- predict(fit_jobclass, newdata = pred_df2)

ggplot(pred_df2, aes(x = age, y = wage_pred, color = jobclass)) +
  geom_line(size=1.2) +
  labs(title = "Predicted Wage by Age and Job Class", y = "Predicted Wage")

Comment: This plot will show how wage trajectories differ between job classes across age, with “Information” showing a higher predicted wage at all ages.

Summary of Findings

Marital Status:

Married individuals tend to have higher wages than other groups, even after controlling for non-linear age effects. The effect of age on wage is similar in shape across marital statuses, but the baseline wage level differs.

Job Class:

“Information” job class consistently earns more than “Industrial.” The non-linear relationship between age and wage is present in both job classes, but the gap remains across ages.

LS0tCnRpdGxlOiAiUiBOb3RlYm9vayIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQoKYGBge3J9CmxpYnJhcnkoSVNMUjIpCmF0dGFjaChXYWdlKQpsaWJyYXJ5KGdncGxvdDIpCmxpYnJhcnkoc3BsaW5lcykKYGBgCgojIyMjIDcuIFRoZSBXYWdlIGRhdGEgc2V0IGNvbnRhaW5zIGEgbnVtYmVyIG9mIG90aGVyIGZlYXR1cmVzIG5vdCBleHBsb3JlZCBpbiB0aGlzIGNoYXB0ZXIsIHN1Y2ggYXMgbWFyaXRhbCBzdGF0dXMgKCBtYXJpdGwpLCBqb2IgY2xhc3MgKCBqb2JjbGFzcyApLCBhbmQgb3RoZXJzLiBFeHBsb3JlIHRoZSByZWxhdGlvbnNoaXBzIGJldHdlZW4gc29tZSBvZiB0aGVzZSBvdGhlciBwcmVkaWN0b3JzIGFuZCB3YWdlLCBhbmQgdXNlIG5vbi1saW5lYXIgZml0dGluZyB0ZWNobmlxdWVzIGluIG9yZGVyIHRvIGZpdCBmbGV4aWJsZSBtb2RlbHMgdG8gdGhlIGRhdGEuIENyZWF0ZSBwbG90cyBvZiB0aGUgcmVzdWx0cyBvYnRhaW5lZCwgYW5kIHdyaXRlIGEgc3VtbWFyeSBvZiB5b3VyIGZpbmRpbmdzLgoKYGBge3J9CiMgQm94cGxvdCBvZiB3YWdlIGJ5IG1hcml0YWwgc3RhdHVzCmJveHBsb3Qod2FnZSB+IG1hcml0bCwgZGF0YSA9IFdhZ2UsIGNvbCA9ICJsaWdodHBpbmsiLCAKICAgICAgICBtYWluID0gIldhZ2UgYnkgTWFyaXRhbCBTdGF0dXMiLCAKICAgICAgICB5bGFiID0gIldhZ2UiKQpgYGAKCioqQ29tbWVudDoqKiBNYXJyaWVkIGluZGl2aWR1YWxzIHRlbmQgdG8gaGF2ZSBoaWdoZXIgbWVkaWFuIHdhZ2VzIGNvbXBhcmVkIHRvIG90aGVyIGdyb3Vwcy4KCmBgYHtyfQojIEJveHBsb3Qgb2Ygd2FnZSBieSBqb2IgY2xhc3MKYm94cGxvdCh3YWdlIH4gam9iY2xhc3MsIGRhdGEgPSBXYWdlLCBjb2wgPSAibGlnaHRwaW5rIiwgCiAgICAgICAgbWFpbiA9ICJXYWdlIGJ5IEpvYiBDbGFzcyIsIAogICAgICAgIHlsYWIgPSAiV2FnZSIpCmBgYAoKKipDb21tZW50OioqIFRoaXMgcGxvdCBzaG93IHRoYXQgcGVvcGxlIGluIHRoZSAiSW5mb3JtYXRpb24iIGpvYiBjbGFzcyB0eXBpY2FsbHkgZWFybiBtb3JlIHRoYW4gdGhvc2UgaW4gIkluZHVzdHJpYWwiIGpvYnMuCgpgYGB7cn0KZml0X21hcml0bCA8LSBsbSh3YWdlIH4gbWFyaXRsICsgYnMoYWdlLCBkZj00KSwgZGF0YSA9IFdhZ2UpCnN1bW1hcnkoZml0X21hcml0bCkKYGBgCgpgYGB7cn0KIyBQcmVkaWN0IHdhZ2UgZm9yIGVhY2ggbWFyaXRhbCBzdGF0dXMgYWNyb3NzIGFnZXMKYWdlLmdyaWQgPC0gc2VxKG1pbihhZ2UpLCBtYXgoYWdlKSwgbGVuZ3RoPTEwMCkKcHJlZF9kZiA8LSBleHBhbmQuZ3JpZChhZ2UgPSBhZ2UuZ3JpZCwgbWFyaXRsID0gbGV2ZWxzKFdhZ2UkbWFyaXRsKSkKcHJlZF9kZiR3YWdlX3ByZWQgPC0gcHJlZGljdChmaXRfbWFyaXRsLCBuZXdkYXRhID0gcHJlZF9kZikKCmdncGxvdChwcmVkX2RmLCBhZXMoeCA9IGFnZSwgeSA9IHdhZ2VfcHJlZCwgY29sb3IgPSBtYXJpdGwpKSArCiAgZ2VvbV9saW5lKHNpemU9MS4yKSArCiAgbGFicyh0aXRsZSA9ICJQcmVkaWN0ZWQgV2FnZSBieSBBZ2UgYW5kIE1hcml0YWwgU3RhdHVzIiwgeSA9ICJQcmVkaWN0ZWQgV2FnZSIpCmBgYAoKKipDb21tZW50OioqIFRoaXMgcGxvdCBpbGx1c3RyYXRlcyBob3cgcHJlZGljdGVkIHdhZ2UgY2hhbmdlcyB3aXRoIGFnZSBmb3IgZWFjaCBncm91cC4gRGlmZmVyZW5jZXMgaW4gdGhlIHZlcnRpY2FsIHBvc2l0aW9uIG9mIHRoZSBjdXJ2ZXMgcmVmbGVjdCB0aGUgZWZmZWN0IG9mIG1hcml0YWwgc3RhdHVzLCB3aGlsZSB0aGUgc2hhcGUgcmVmbGVjdHMgdGhlIG5vbi1saW5lYXIgZWZmZWN0IG9mIGFnZS4KCmBgYHtyfQpmaXRfam9iY2xhc3MgPC0gbG0od2FnZSB+IGpvYmNsYXNzICsgYnMoYWdlLCBkZj00KSwgZGF0YSA9IFdhZ2UpCnN1bW1hcnkoZml0X2pvYmNsYXNzKQoKIyBQcmVkaWN0IHdhZ2UgZm9yIGVhY2ggam9iIGNsYXNzIGFjcm9zcyBhZ2VzCnByZWRfZGYyIDwtIGV4cGFuZC5ncmlkKGFnZSA9IGFnZS5ncmlkLCBqb2JjbGFzcyA9IGxldmVscyhXYWdlJGpvYmNsYXNzKSkKcHJlZF9kZjIkd2FnZV9wcmVkIDwtIHByZWRpY3QoZml0X2pvYmNsYXNzLCBuZXdkYXRhID0gcHJlZF9kZjIpCgpnZ3Bsb3QocHJlZF9kZjIsIGFlcyh4ID0gYWdlLCB5ID0gd2FnZV9wcmVkLCBjb2xvciA9IGpvYmNsYXNzKSkgKwogIGdlb21fbGluZShzaXplPTEuMikgKwogIGxhYnModGl0bGUgPSAiUHJlZGljdGVkIFdhZ2UgYnkgQWdlIGFuZCBKb2IgQ2xhc3MiLCB5ID0gIlByZWRpY3RlZCBXYWdlIikKYGBgCgoqKkNvbW1lbnQ6KiogVGhpcyBwbG90IHdpbGwgc2hvdyBob3cgd2FnZSB0cmFqZWN0b3JpZXMgZGlmZmVyIGJldHdlZW4gam9iIGNsYXNzZXMgYWNyb3NzIGFnZSwgd2l0aCAiSW5mb3JtYXRpb24iIHNob3dpbmcgYSBoaWdoZXIgcHJlZGljdGVkIHdhZ2UgYXQgYWxsIGFnZXMuCgojIyMjIFN1bW1hcnkgb2YgRmluZGluZ3MKCioqTWFyaXRhbCBTdGF0dXM6KioKCk1hcnJpZWQgaW5kaXZpZHVhbHMgdGVuZCB0byBoYXZlIGhpZ2hlciB3YWdlcyB0aGFuIG90aGVyIGdyb3VwcywgZXZlbiBhZnRlciBjb250cm9sbGluZyBmb3Igbm9uLWxpbmVhciBhZ2UgZWZmZWN0cy4gVGhlIGVmZmVjdCBvZiBhZ2Ugb24gd2FnZSBpcyBzaW1pbGFyIGluIHNoYXBlIGFjcm9zcyBtYXJpdGFsIHN0YXR1c2VzLCBidXQgdGhlIGJhc2VsaW5lIHdhZ2UgbGV2ZWwgZGlmZmVycy4KCioqSm9iIENsYXNzOioqCgoiSW5mb3JtYXRpb24iIGpvYiBjbGFzcyBjb25zaXN0ZW50bHkgZWFybnMgbW9yZSB0aGFuICJJbmR1c3RyaWFsLiIgVGhlIG5vbi1saW5lYXIgcmVsYXRpb25zaGlwIGJldHdlZW4gYWdlIGFuZCB3YWdlIGlzIHByZXNlbnQgaW4gYm90aCBqb2IgY2xhc3NlcywgYnV0IHRoZSBnYXAgcmVtYWlucyBhY3Jvc3MgYWdlcy4KCg==