Kernel regression

Kernel regressin can be used to model non-linear relationship between the outcome variable and predictors. The most common Kernel regression is Nadaraya–Watson kernel regression. First we can write the regression model as

\[Y=f(x)+\epsilon \text{, and }\epsilon\sim N(0,\sigma^2)\] Then \(f(x)\) can be estimated by:

\[\hat{f}(x)=\frac{\sum_{i=1}^nK_h(x-x_i)y_i}{\sum_{i=1}^nK_h(x-x_i)} \tag{1}\] where \(K_h\) is a kernel function with a bandwidth \(h\)

\(\hat{f}(x)\) can also be estimated by the following steps: (1)

\[ \hat{f}(x)=\frac{1}{n}\sum_{i=1}^n w_i y_i\] and (2)

\[ w_i=\frac{K(\frac{x_i-x}{h})}{n^{-1}\sum_{j=1}^n K(\frac{x-x_j}{h})}\] Note, in step 2, for the denominator we run form \(j=1\) to \(n\), it will make the summation of step 1 only applied to numerator of step 2.

Here, is the definition of Kernel function from wiki “In nonparametric statistics, a kernel is a weighting function used in non-parametric estimation techniques. Kernels are used in kernel density estimation to estimate random variables’ density functions, or in kernel regression to estimate the conditional expectation of a random variable.A kernel is a non-negative real-valued integrable function K. For most applications, it is desirable to define the function to satisfy two additional requirements:”Normalization:\(\int _{-\infty }^{+\infty }K(u)du=1\) and Symmetry: \(K(-u)=K(u)\). Pdf of standard normal distribution is a commonly used kernel function.

We use a dataset from the book “The Elements of Statistical Learning” to show the nonlinear relationship between BMD and age among young people using the above kernel regression model. For more details on the dataset please see https://hastie.su.domains/ElemStatLearn/. There was an online course for this book taught by professors Hastie and Tibshirani, unfortunately, it seems they do not offer this free class anymore.

We import the data from the book website and sort the data set by age.

bone<-read.delim(file="https://hastie.su.domains/ElemStatLearn/datasets/bone.data", header = TRUE, sep = "\t",  dec = "." )
bone2<-bone[order(bone$age),]
head(bone2,5)

##     idnum  age gender      spnbmd
## 471   364 9.40 female 0.007072136
## 479   377 9.60   male 0.016977930
## 203   104 9.65 female 0.100025300
## 21      9 9.70 female 0.031367630
## 473   366 9.75   male 0.021748590

Next, we use two for loops to calculate the \(\hat{f}(x)\), note, for the first loop, data should be stored in a matrix and two loops were used, every time we should treat \(x\) in \(\hat{f}(x)\) as a fixed value. And we show the first 5 values of estimated \(\hat{f}(x)\) using the formula \((1)\). We used the normal kernel function and bandwidth 1 to conduct the estimations.

n<-nrow(bone2)
my_matrix1 <- matrix(NA, nrow=n, ncol=n)
my_matrix2 <- matrix(NA, nrow=n, ncol=n)  
for(i in 1:n)
{
  for(j in 1:n){
    my_matrix1[i,j]<-dnorm(bone2$age[i]-bone2$age[j])*bone2$spnbmd[j]
    my_matrix2[i,j]<-dnorm(bone2$age[i]-bone2$age[j])
}
} 
fx<-c()
for(i in 1:n){
fx[i]<-sum(my_matrix1[i,])/sum(my_matrix2[i,])
}
head(fx,5)

## [1] 0.05357814 0.05449697 0.05473984 0.05498847 0.05524311

Final results

Finally, we plot our final results and compare the results with the built in function ksmooth() in R with normal density and bandwidth 1.

z=ksmooth(bone2$age, bone2$spnbmd,"normal",1)
plot(bone2$age,bone2$spnbmd,xlab='age', ylab='Relative spinal bone mineral density')
lines(bone2$age,fx,type='l',col='blue',lwd=3)
lines(z$x,z$y,type='l',col='red',lwd=3)

We can see the blue line estimated by our home made function is a little bit smoother than the red line that estimated by the built in R function ksmooth(), and it seems the spinal bone mineral density for North American youth peaked at 13 or 14 year old. The following animation shows the changing of relative spinal bone mineral density with age.

If we want to completely understand the built in function ksmooth(), we should following the following steps to view the \(C\) language code used by the function

View(ksmooth)

We will get

function (x, y, kernel = c("box", "normal"), bandwidth = 0.5, 
  range.x = range(x), n.points = max(100L, length(x)), x.points) 
{
  if (missing(y) || is.null(y)) 
    stop("numeric y must be supplied.\nFor density estimation use density()")
  kernel <- match.arg(kernel)
  krn <- switch(kernel, box = 1L, normal = 2L)
  x.points <- if (missing(x.points)) 
    seq.int(range.x[1L], range.x[2L], length.out = n.points)
  else {
    n.points <- length(x.points)
    sort(x.points)
  }
  ord <- order(x)
  .Call(C_ksmooth, x[ord], y[ord], x.points, krn, bandwidth)
}

Here, we can see Ksmooth() is mainly written by \(C\) language since, it called C_Ksmooth, function. The follow C_ksmooth function were from the Winston Chang’s github mirror (https://github.com/wch/r-source/find/trunk) for compiled code built into the R interpreter. And the following web page gives every detailed methods to see all kind of R source codes.(https://stackoverflow.com/questions/19226816/how-can-i-view-the-source-code-for-a-function)

/*
 *  R : A Computer Language for Statistical Data Analysis
 *  Copyright (C) 1998-2016 The R Foundation
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, a copy is available at
 *  https://www.R-project.org/Licenses/
 */

#include <math.h>
#include <R.h>          /* for NA_REAL, includes math.h */
#include <Rinternals.h>

#ifdef ENABLE_NLS
#include <libintl.h>
#define _(String) dgettext ("stats", String)
#else
#define _(String) (String)
#endif

static double dokern(double x, int kern)
{
    if(kern == 1) return(1.0);
    if(kern == 2) return(exp(-0.5*x*x));
    return(0.0); /* -Wall */
}

static void BDRksmooth(double *x, double *y, R_xlen_t n,
               double *xp, double *yp, R_xlen_t np,
               int kern, double bw)
{
    R_xlen_t imin = 0;
    double cutoff = 0.0, num, den, x0, w;

    /* bandwidth is in units of half inter-quartile range. */
    if(kern == 1) {bw *= 0.5; cutoff = bw;}
    if(kern == 2) {bw *= 0.3706506; cutoff = 4*bw;}
    while(x[imin] < xp[0] - cutoff && imin < n) imin++;
    for(R_xlen_t j = 0; j < np; j++) {
    num = den = 0.0;
    x0 = xp[j];
    for(R_xlen_t i = imin; i < n; i++) {
        if(x[i] < x0 - cutoff) imin = i;
        else {
        if(x[i] > x0 + cutoff) break;
        w = dokern(fabs(x[i] - x0)/bw, kern);
        num += w*y[i];
        den += w;
        }
    }
    if(den > 0) yp[j] = num/den; else yp[j] = NA_REAL;
    }
}


// called only from  spline()  in ./ppr.f
NORET void F77_SUB(bdrsplerr)(void)
{
    error(_("only 2500 rows are allowed for sm.method=\"spline\""));
}

void F77_SUB(splineprt)(double* df, double* gcvpen, int* ismethod,
                  double* lambda, double *edf)
{
    Rprintf("spline(df=%5.3g, g.pen=%11.6g, ismeth.=%+2d) -> (lambda, edf) = (%.7g, %5.2f)\n",
        *df, *gcvpen, *ismethod, *lambda, *edf);
    return;
}

// called only from smooth(..., trace=TRUE)  in ./ppr.f :
void F77_SUB(smoothprt)(double* span, int* iper, double* var, double* cvar)
{
    Rprintf("smooth(span=%4g, iper=%+2d) -> (var, cvar) = (%g, %g)\n",
        *span, *iper, *var, *cvar);
    return;
}


SEXP ksmooth(SEXP x, SEXP y, SEXP xp, SEXP skrn, SEXP sbw)
{
    int krn = asInteger(skrn);
    double bw = asReal(sbw);
    x = PROTECT(coerceVector(x, REALSXP));
    y = PROTECT(coerceVector(y, REALSXP));
    xp = PROTECT(coerceVector(xp, REALSXP));
    R_xlen_t nx = XLENGTH(x), np = XLENGTH(xp);
    SEXP yp = PROTECT(allocVector(REALSXP, np));

    BDRksmooth(REAL(x), REAL(y), nx, REAL(xp), REAL(yp), np, krn, bw);
    SEXP ans = PROTECT(allocVector(VECSXP, 2));
    SET_VECTOR_ELT(ans, 0, xp);
    SET_VECTOR_ELT(ans, 1, yp);
    SEXP nm = allocVector(STRSXP, 2);
    setAttrib(ans, R_NamesSymbol, nm);
    SET_STRING_ELT(nm, 0, mkChar("x"));
    SET_STRING_ELT(nm, 1, mkChar("y"));
    UNPROTECT(5);
    return ans;
}

Kernel regression

Enwu Liu

2022-12-28

Kernel regression

Final results