BLOG5

Nonparametric regression

Dataset: swiss

Here, I have run through code similar to the notes linked below to see how we can model fertility (Fertility) as a function of the percentage of Catholics (Catholic) in R's built-in swiss dataset. The notes use the ksmooth function: http://users.stat.umn.edu/~helwig/notes/smooth-notes.html

“Nonparametric regression, like linear regression, estimates mean outcomes for a given set of covariates. Unlike linear regression, nonparametric regression is agnostic about the functional form between the outcome and the covariates and is therefore not subject to misspecification error.”
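To make the contrast in the quote concrete, here is a minimal sketch that compares a straight-line fit with a kernel-weighted local mean (the Nadaraya-Watson idea) on the same data. The helper nw_smooth and the value h = 10 are my own illustrative choices, not something taken from the linked notes. Note that h here is the standard deviation of the Gaussian weights, which is not the same scale as the bandwidth argument of ksmooth (ksmooth scales its kernels so that their quartiles sit at ±0.25*bandwidth).

```r
# Nadaraya-Watson estimate at points x0, using Gaussian weights with sd h.
# nw_smooth and h are illustrative names, not part of any package.
nw_smooth <- function(x, y, x0, h) {
  sapply(x0, function(p) {
    w <- dnorm((x - p) / h)  # weight each observation by its distance to p
    sum(w * y) / sum(w)      # weighted mean of the responses
  })
}

grid   <- seq(min(swiss$Catholic), max(swiss$Catholic), length.out = 100)
fit_nw <- nw_smooth(swiss$Catholic, swiss$Fertility, grid, h = 10)
fit_lm <- lm(Fertility ~ Catholic, data = swiss)

plot(Fertility ~ Catholic, swiss, main = "Linear fit vs. kernel-weighted mean")
abline(fit_lm, lty = 2)  # parametric: one straight line for the whole range
lines(grid, fit_nw)      # nonparametric: a local weighted average at each x
```

The straight line is forced to have one slope everywhere, while the kernel estimate is free to bend wherever the data suggest it should.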

library(ggplot2)

summary(swiss)
   Fertility      Agriculture     Examination      Education    
 Min.   :35.00   Min.   : 1.20   Min.   : 3.00   Min.   : 1.00  
 1st Qu.:64.70   1st Qu.:35.90   1st Qu.:12.00   1st Qu.: 6.00  
 Median :70.40   Median :54.10   Median :16.00   Median : 8.00  
 Mean   :70.14   Mean   :50.66   Mean   :16.49   Mean   :10.98  
 3rd Qu.:78.45   3rd Qu.:67.65   3rd Qu.:22.00   3rd Qu.:12.00  
 Max.   :92.50   Max.   :89.70   Max.   :37.00   Max.   :53.00  
    Catholic       Infant.Mortality
 Min.   :  2.150   Min.   :10.80   
 1st Qu.:  5.195   1st Qu.:18.15   
 Median : 15.140   Median :20.00   
 Mean   : 41.144   Mean   :19.94   
 3rd Qu.: 93.125   3rd Qu.:21.70   
 Max.   :100.000   Max.   :26.60   
plot(swiss)


ggplot(swiss, aes(x=Catholic, y=Fertility)) + geom_point()


#install.packages("glue")
library("stats")
library("glue")
# Candidate bandwidths for the kernel smoother, from very wiggly to very smooth
bandwidths <- c(1, 5, 10, 15, 20, 25, 30, 40, 50)

# Loop over the bandwidths and plot each fit to judge visually which suits the data best
for (b in bandwidths) {
  smoothswiss <- ksmooth(swiss$Catholic, swiss$Fertility, "normal", bandwidth=b)
  plot(Fertility ~ Catholic, swiss, main=glue("bandwidth = {b}"))
  lines(smoothswiss)
}
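
Picking a bandwidth by eye is subjective, so as a rough cross-check here is a sketch that scores each bandwidth by leave-one-out prediction error. This is my own addition and not part of the linked notes; loo_mse is a hypothetical helper, and it relies on the x.points argument of ksmooth to predict at the held-out value. When the bandwidth is too small for a held-out point to have any neighbours in reach, ksmooth returns NA there, so that bandwidth gets an NA score.

```r
bandwidths <- c(1, 5, 10, 15, 20, 25, 30, 40, 50)

# Leave-one-out mean squared prediction error for one bandwidth.
# loo_mse is a hypothetical helper, not part of the stats package.
loo_mse <- function(b, x, y) {
  pred <- sapply(seq_along(x), function(i) {
    # Fit without observation i, then predict at its x value
    ksmooth(x[-i], y[-i], "normal", bandwidth = b, x.points = x[i])$y
  })
  mean((y - pred)^2)  # NA if any held-out point could not be predicted
}

cv <- sapply(bandwidths, loo_mse, x = swiss$Catholic, y = swiss$Fertility)
print(data.frame(bandwidth = bandwidths, loo_mse = cv))

best <- bandwidths[which.min(cv)]  # which.min ignores NA scores
```

The value in best is only a guide: with 47 observations the scores are noisy, so it is worth comparing it against the plots above rather than trusting it blindly.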