Non Parametric Modeling

Conditional variance estimation

For the purpose of this assignment, we will consider the aircarft dataset from the smpackage. The objective is to estimate the conditional variance of the variable Log of Weight (Y) given Year (X).

Variables: Log of Weight: is the logarithm of the the maximum take-off weight in kilograms, lgWeight. Year: is the year of first manufacture, Yr

Fit a nonparametric regression to data \((x_i,y_i)\) and save the estimated values \(\hat{m} (x_i)\).

To fit a nonparametric regression, we decided to utilize the logpolreg.R source, built by Pedro Delicado with the function locpolreg in order to fit a local polinomial regression.

The decision of the chosen polynomial and banwidth was only taken after having it used in other examples. In order to provide different examples it was chosen to utilize three different polynomials and plot the results. No method of minimizing error and/or cross validation was made in order to pursue this choice since it is not the objective in this situation.

#Load data and attach
data(aircraft)
help(aircraft)
attach(aircraft)

#Logarithm of Weight
lgWeight <- log(Weight)

# Regression of degree 2 polynomial of lgWeight against Yr
op <- par(mfrow=c(3,1))

# Local Polynomial Regression
fit_0 <- locpolreg(x=Yr,y=lgWeight,h=10,p=0,r=0,main="Local Polynomial Regression p=0,h=10",type.kernel="epan")
fit_1 <- locpolreg(x=Yr,y=lgWeight,h=10,p=1,r=0,main="Local Polynomial Regression p=1,h=10",type.kernel="epan")
fit_2 <- locpolreg(x=Yr,y=lgWeight,h=10,p=2,r=0,main="Local Polynomial Regression p=2,h=10",type.kernel="epan")

Transform the estimated residuals

To get the residuals, we followed the following formula \[z_i = log((y_i − \hat{m}(x_i))^2)\]

Which is basically taking the logarithm to what we normally know as the Error.

#Estimated Values m(xi)
mhat0 = unname(unlist(fit_0[1])) ;mhat1 = unname(unlist(fit_1[1])) ;mhat2 =unname(unlist(fit_2[1]))

#Compute the errors
error_sq0 = (lgWeight-mhat0)^2  ; error_sq1 = (lgWeight-mhat1)^2; error_sq2 = (lgWeight-mhat2)^2
errors = list(error_sq0,error_sq1,error_sq2)
  #Compute the estimated residuals
res0 = log(error_sq0)           ; res1 = log(error_sq1)         ; res2 = log(error_sq2)

Next, a non-parametric regression was fit to the data utilizing the residuals obtained as the y-axis. This estimated function will be called \(\hat{q}(x)\) and we will take it as an estimate of \(log \sigma?^2(x)\). Then

# Fit a nonparametric regression to data (xi,zi) 
op <- par(mfrow=c(3,1))
q_0 <- locpolreg(x=Yr,y=res0,h=10,p=0,r=0,main="Q(x) with p=0,h=10",type.kernel="epan")
q_1 <- locpolreg(x=Yr,y=res1,h=10,p=1,r=0,main="Q(x) with p=1,h=10",type.kernel="epan")
q_2 <- locpolreg(x=Yr,y=res2,h=10,p=2,r=0,main="Q(x) with p=2,h=10",type.kernel="epan")

To find the estimate, we can aproximate:
\(\sigma?^2(x) = e^{\hat{q}(x)}\)

# Estimate σ^2(x) by
hat_sigma_sqr0 = exp(unname(unlist(q_0[1])))
hat_sigma_sqr1 = exp(unname(unlist(q_1[1])))
hat_sigma_sqr2 = exp(unname(unlist(q_2[1])))
sigmas = c("hat_sigma_sqr0","hat_sigma_sqr1","hat_sigma_sqr3")

Now we plot the Epsilon squared against Year in order to be able to plot our variance and analyse its behaviour.

#Plot for Errors
op <- par(mfrow=c(3,1))
plot(Yr, errors[[1]]^2, ylab ="Error Squared", xlab = "Year", main="Plot for Polynomial 0")
lines(Yr,hat_sigma_sqr0, col="red", lwd=3)
plot(Yr, errors[[2]]^2, ylab ="Error Squared", xlab = "Year", main="Plot for Polynomial 1")
lines(Yr,hat_sigma_sqr1, col="blue", lwd=3)
plot(Yr, errors[[3]]^2, ylab ="Error Squared", xlab = "Year", main="Plot for Polynomial 2")
lines(Yr,hat_sigma_sqr2, col="darkred", lwd=3)

Finally we calculate the the confidence intervals and plot the original function of LogWeight against Year. The estimated function will be included, and this time we will calculate the confidence interval to portray the variability of our estimates.

#Plot of estimates with confidence interval
op <- par(mfrow=c(3,1))
plot(Yr,lgWeight,col="grey", ylab="Log(Weight)", xlab = "Year",  main="Local Polynomial Reg. with CI p=0")
lines(Yr,mhat0,lwd=2)
CIU <- mhat0 + 1.96*sqrt(hat_sigma_sqr0)
CIL <- mhat0 - 1.96*sqrt(hat_sigma_sqr0)
lines(Yr, CIU, col ="red",lty=2,lwd=2)
lines(Yr,CIL, col ="red",lty=2,lwd=2)

plot(Yr,lgWeight,col="grey", ylab="Log(Weight)", xlab = "Year",main="Local Polynomial Reg. with CI p=1")
lines(Yr,mhat1,lwd=2)
CIU <- mhat1 + 1.96*sqrt(hat_sigma_sqr1)
CIL <- mhat1 - 1.96*sqrt(hat_sigma_sqr1)
lines(Yr, CIU, col ="blue",lty=2,lwd=2)
lines(Yr,CIL, col ="blue",lty=2,lwd=2)

plot(Yr,lgWeight,col="grey", ylab="Log(Weight)", xlab = "Year",main="Local Polynomial Reg. with CI p=2")
lines(Yr,mhat2,lwd=2)
CIU <- mhat2 + 1.96*sqrt(hat_sigma_sqr2)
CIL <- mhat2 - 1.96*sqrt(hat_sigma_sqr2)
lines(Yr, CIU, col ="darkred",lty=2,lwd=2)
lines(Yr,CIL, col ="darkred",lty=2,lwd=2)

Non Parametric Modeling

Saul Garcia, Krishna Kalyan

10/13/2016

Practical Session

Conditional variance estimation

Fit a nonparametric regression to data \((x_i,y_i)\) and save the estimated values \(\hat{m} (x_i)\).

Transform the estimated residuals

Results