library(ISLR2)
library(gam)
## Warning: package 'gam' was built under R version 4.4.3
## Loading required package: splines
## Loading required package: foreach
## Loaded gam 1.22-7
par(mfrow = c(1, 2))
boxplot(wage ~ maritl, data = Wage, 
        col = "lightblue", 
        main = "Wage vs Marital Status", 
        xlab = "Marital Status", ylab = "Wage",
        cex.axis = 0.8)

boxplot(wage ~ jobclass, data = Wage, 
        col = "coral", 
        main = "Wage vs Job Class", 
        xlab = "Job Class", ylab = "Wage")

gam.fit <- gam(wage ~ s(age, df = 4) + maritl + jobclass, data = Wage)
summary(gam.fit)
## 
## Call: gam(formula = wage ~ s(age, df = 4) + maritl + jobclass, data = Wage)
## Deviance Residuals:
##      Min       1Q   Median       3Q      Max 
## -110.378  -23.260   -4.772   15.361  201.141 
## 
## (Dispersion Parameter for gaussian family taken to be 1497.157)
## 
##     Null Deviance: 5222086 on 2999 degrees of freedom
## Residual Deviance: 4476501 on 2990 degrees of freedom
## AIC: 30459.59 
## 
## Number of Local Scoring Iterations: NA 
## 
## Anova for Parametric Effects
##                  Df  Sum Sq Mean Sq F value    Pr(>F)    
## s(age, df = 4)    1  199870  199870 133.499 < 2.2e-16 ***
## maritl            4  141136   35284  23.567 < 2.2e-16 ***
## jobclass          1  173181  173181 115.673 < 2.2e-16 ***
## Residuals      2990 4476501    1497                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Anova for Nonparametric Effects
##                Npar Df Npar F     Pr(F)    
## (Intercept)                                
## s(age, df = 4)       3 30.202 < 2.2e-16 ***
## maritl                                     
## jobclass                                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow = c(1, 3))
plot(gam.fit, se = TRUE, col = "skyblue")


EDA:
1. Marital Status: The boxplot shows relationship between marital status and wage. On average, workers who are Married earn significantly higher wages than those who are “Never Married”, “Divorced”, “Separated”, or “Widowed”.
2. Job Class: The second boxplot shows that jobs in the Information sector generally pay a higher average wage than jobs in the Industrial sector.


The GAM Plots:
1. Age (Non-linear): Holding marital status and job class constant, age has a distinct non-linear, inverted U-shape relationship with wage. Wages tend to increase rapidly from age 20 to 40, peak between ages 40 and 60, and then gradually decline as workers approach retirement age.
2. Marital Status: The GAM isolates the effect of marital status. Even when accounting for a worker’s age and job class, being married is associated with a distinct upward “step” in wage compared to being single or divorced.
3. Job Class: The model confirms that moving from an industrial job to an information job results in a positive step up in predicted wage, holding age and marital status constant.