3.1 Bandwidth selection (ISE)

Simulate 100 observations from a standard normal distribution. Using this sample, determine the best bandwidth for the density function by computing the ISE (Integrated Squared Error).

To do this, create a grid of candidate bandwidth values. For each one, compute the density estimate and compare it against the known standard normal density, measuring the distance between the two by the ISE. I recommend using the arguments n = 512, from = -3, to = 3 in density for appropriate validation. Create two plots: one of the ISE against the bandwidth grid, and one comparing your final optimal estimate with the true standard normal density.
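For reference, write $\hat f_h$ for the kernel estimate at bandwidth $h$ and $\varphi$ for the standard normal density. The ISE and a Riemann-sum approximation on an equally spaced grid of $m$ points over $[-3, 3]$ can be written as below. Restricting the integral to $[-3, 3]$ is itself an approximation, assumed acceptable because both densities are small in the tails.

```latex
\mathrm{ISE}(h) = \int_{-\infty}^{\infty} \bigl(\hat f_h(x) - \varphi(x)\bigr)^2 \, dx
  \approx \int_{-3}^{3} \bigl(\hat f_h(x) - \varphi(x)\bigr)^2 \, dx
  \approx \Delta x \sum_{i=1}^{m} \bigl(\hat f_h(x_i) - \varphi(x_i)\bigr)^2,
\qquad x_i = -3 + (i-1)\,\Delta x, \quad \Delta x = \frac{6}{m-1},
\qquad \hat h = \arg\min_{h} \mathrm{ISE}(h).
```

With $m = 100$ grid points, $\Delta x = 6/99 \approx 0.06$.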

Solution

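# Reproducible sample of 100 draws from N(0, 1)
set.seed(100)
data <- rnorm(100, mean = 0, sd = 1)

# Candidate bandwidths: 0.01, 0.02, ..., 1.00
bandwidth <- seq(0.01, 1, length.out = 100)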

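# ISE(h) for every candidate bandwidth h on the grid
ISE <- sapply(bandwidth, function(b) {
  # Kernel density estimate on 512 points over [-3, 3]
  density_est <- density(data, bw = b, kernel = "gaussian", n = 512, from = -3, to = 3)
  # Interpolate the estimate so it can be evaluated at arbitrary x
  estimated_density <- approxfun(density_est$x, density_est$y)
  # Squared pointwise error against the true N(0, 1) density
  f <- function(x) (estimated_density(x) - dnorm(x))^2

  # Riemann sum over [-3, 3]: grid spacing times the sum of squared errors
  xi <- seq(-3, 3, length.out = 100)
  dx <- xi[2] - xi[1]
  ISE_value <- dx * sum(f(xi))

  return(ISE_value)
})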

best_bandwidth <- bandwidth[which.min(ISE)]
cat("The best bandwidth is:", best_bandwidth, "\n")

## The best bandwidth is: 0.43
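As a cross-check, the ISE can also be computed directly on the 512-point grid that density() already returns (the n = 512, from = -3, to = 3 arguments recommended in the task), integrating with the trapezoidal rule instead of interpolating with approxfun. This is a sketch, not the required method, and the helper name ise_trapezoid is hypothetical. Because it regenerates the same sample with set.seed(100), its minimiser would be expected to sit at or next to the bandwidth reported above, although the two quadrature rules are not guaranteed to pick exactly the same grid point.

```r
# Hypothetical helper: ISE of a Gaussian KDE against the N(0, 1) density,
# integrated with the trapezoidal rule on density()'s own output grid
ise_trapezoid <- function(x, h, from = -3, to = 3, n = 512) {
  est <- density(x, bw = h, kernel = "gaussian", n = n, from = from, to = to)
  sq_err <- (est$y - dnorm(est$x))^2
  dx <- est$x[2] - est$x[1]
  # Trapezoidal rule: full weight at interior points, half weight at both ends
  dx * (sum(sq_err) - (sq_err[1] + sq_err[n]) / 2)
}

# Same sample and bandwidth grid as in the solution
set.seed(100)
x <- rnorm(100)
h_grid <- seq(0.01, 1, length.out = 100)

ise_trap <- sapply(h_grid, function(h) ise_trapezoid(x, h))
h_trap <- h_grid[which.min(ise_trap)]
cat("ISE-optimal bandwidth (trapezoidal rule):", h_trap, "\n")
```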

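# ISE against the bandwidth grid, with the minimiser marked
plot(bandwidth, ISE, type = "l", xlab = "Bandwidth h", ylab = "ISE", main = "ISE(h)")
abline(v = best_bandwidth, lty = 2)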

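# Final KDE at the ISE-optimal bandwidth
density_est_final <- density(data, bw = best_bandwidth, kernel = "gaussian",
                             n = 512, from = -3, to = 3)
# Upper y-limit large enough for both curves so neither is clipped
y_max <- max(density_est_final$y, dnorm(0))
plot(density_est_final, type = "l", lty = 1, ylim = c(0, y_max),
     xlab = "", ylab = "Density", main = "Kernel Density Estimation")

# True standard normal density for comparison
lines(density_est_final$x, dnorm(density_est_final$x), col = "red", lty = 2)
rug(data)

legend("topright", legend = c("Estimated via KDE", "Standard Normal Density"),
       col = c("black", "red"), lty = c(1, 2))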
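The ISE criterion can only be computed here because the true density is known. For context, a sketch comparing it with R's data-driven selectors from the stats package: bw.nrd0 (Silverman's rule of thumb, the default bandwidth of density()) and bw.ucv (unbiased, or least-squares, cross-validation, whose criterion estimates ISE(h) up to a term that does not depend on h). It also computes the asymptotically MISE-optimal bandwidth for a Gaussian kernel when the truth is N(0, 1), (4/3)^(1/5) n^(-1/5), which is roughly 0.42 for n = 100. None of this is part of the assigned method, and bw.ucv may warn if its minimum falls at the edge of its search range.

```r
# Same sample as in the solution: 100 draws from N(0, 1)
set.seed(100)
x <- rnorm(100)
n <- length(x)

# Silverman's rule of thumb (the default bandwidth of density())
h_silverman <- bw.nrd0(x)

# Unbiased (least-squares) cross-validation; its criterion estimates
# ISE(h) minus a constant that does not depend on h
h_ucv <- bw.ucv(x)

# Asymptotically MISE-optimal bandwidth for a Gaussian kernel when the
# true density is N(0, 1): (4/3)^(1/5) * sigma * n^(-1/5) with sigma = 1
h_amise <- (4 / 3)^(1 / 5) * n^(-1 / 5)

cat("bw.nrd0:", round(h_silverman, 3), "\n")
cat("bw.ucv: ", round(h_ucv, 3), "\n")
cat("AMISE (N(0, 1) oracle):", round(h_amise, 3), "\n")
```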

workshop3 assignment

Changlin liu

2024-03-06
