library(ISLR2)
library(gam)
## Warning: package 'gam' was built under R version 4.4.3
## Loading required package: splines
## Loading required package: foreach
## Loaded gam 1.22-7
par(mfrow = c(1, 2))
boxplot(wage ~ maritl, data = Wage,
col = "lightblue",
main = "Wage vs Marital Status",
xlab = "Marital Status", ylab = "Wage",
cex.axis = 0.8)
boxplot(wage ~ jobclass, data = Wage,
col = "coral",
main = "Wage vs Job Class",
xlab = "Job Class", ylab = "Wage")
gam.fit <- gam(wage ~ s(age, df = 4) + maritl + jobclass, data = Wage)
summary(gam.fit)
##
## Call: gam(formula = wage ~ s(age, df = 4) + maritl + jobclass, data = Wage)
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -110.378 -23.260 -4.772 15.361 201.141
##
## (Dispersion Parameter for gaussian family taken to be 1497.157)
##
## Null Deviance: 5222086 on 2999 degrees of freedom
## Residual Deviance: 4476501 on 2990 degrees of freedom
## AIC: 30459.59
##
## Number of Local Scoring Iterations: NA
##
## Anova for Parametric Effects
## Df Sum Sq Mean Sq F value Pr(>F)
## s(age, df = 4) 1 199870 199870 133.499 < 2.2e-16 ***
## maritl 4 141136 35284 23.567 < 2.2e-16 ***
## jobclass 1 173181 173181 115.673 < 2.2e-16 ***
## Residuals 2990 4476501 1497
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Anova for Nonparametric Effects
## Npar Df Npar F Pr(F)
## (Intercept)
## s(age, df = 4) 3 30.202 < 2.2e-16 ***
## maritl
## jobclass
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow = c(1, 3))
plot(gam.fit, se = TRUE, col = "skyblue")
EDA:
1. Marital Status: The boxplot shows relationship
between marital status and wage. On average, workers who are Married
earn significantly higher wages than those who are “Never Married”,
“Divorced”, “Separated”, or “Widowed”.
2. Job Class: The second
boxplot shows that jobs in the Information sector generally pay a higher
average wage than jobs in the Industrial sector.
The GAM Plots:
1. Age (Non-linear): Holding marital status
and job class constant, age has a distinct non-linear, inverted U-shape
relationship with wage. Wages tend to increase rapidly from age 20 to
40, peak between ages 40 and 60, and then gradually decline as workers
approach retirement age.
2. Marital Status: The GAM isolates the
effect of marital status. Even when accounting for a worker’s age and
job class, being married is associated with a distinct upward “step” in
wage compared to being single or divorced.
3. Job Class: The model
confirms that moving from an industrial job to an information job
results in a positive step up in predicted wage, holding age and marital
status constant.