For the purpose of this assignment, we will consider the aircarft dataset from the smpackage. The objective is to estimate the conditional variance of the variable Log of Weight (Y) given Year (X).
Variables: Log of Weight: is the logarithm of the the maximum take-off weight in kilograms, lgWeight. Year: is the year of first manufacture, Yr
To fit a nonparametric regression, we decided to utilize the logpolreg.R source, built by Pedro Delicado with the function locpolreg in order to fit a local polinomial regression.
The decision of the chosen polynomial and banwidth was only taken after having it used in other examples. In order to provide different examples it was chosen to utilize three different polynomials and plot the results. No method of minimizing error and/or cross validation was made in order to pursue this choice since it is not the objective in this situation.
#Load data and attach
data(aircraft)
help(aircraft)
attach(aircraft)
#Logarithm of Weight
lgWeight <- log(Weight)
# Regression of degree 2 polynomial of lgWeight against Yr
op <- par(mfrow=c(3,1))
# Local Polynomial Regression
fit_0 <- locpolreg(x=Yr,y=lgWeight,h=10,p=0,r=0,main="Local Polynomial Regression p=0,h=10",type.kernel="epan")
fit_1 <- locpolreg(x=Yr,y=lgWeight,h=10,p=1,r=0,main="Local Polynomial Regression p=1,h=10",type.kernel="epan")
fit_2 <- locpolreg(x=Yr,y=lgWeight,h=10,p=2,r=0,main="Local Polynomial Regression p=2,h=10",type.kernel="epan")To get the residuals, we followed the following formula \[z_i = log((y_i − \hat{m}(x_i))^2)\]
Which is basically taking the logarithm to what we normally know as the Error.
#Estimated Values m(xi)
mhat0 = unname(unlist(fit_0[1])) ;mhat1 = unname(unlist(fit_1[1])) ;mhat2 =unname(unlist(fit_2[1]))
#Compute the errors
error_sq0 = (lgWeight-mhat0)^2 ; error_sq1 = (lgWeight-mhat1)^2; error_sq2 = (lgWeight-mhat2)^2
errors = list(error_sq0,error_sq1,error_sq2)
#Compute the estimated residuals
res0 = log(error_sq0) ; res1 = log(error_sq1) ; res2 = log(error_sq2) Next, a non-parametric regression was fit to the data utilizing the residuals obtained as the y-axis. This estimated function will be called \(\hat{q}(x)\) and we will take it as an estimate of \(log \sigma?^2(x)\). Then
# Fit a nonparametric regression to data (xi,zi)
op <- par(mfrow=c(3,1))
q_0 <- locpolreg(x=Yr,y=res0,h=10,p=0,r=0,main="Q(x) with p=0,h=10",type.kernel="epan")
q_1 <- locpolreg(x=Yr,y=res1,h=10,p=1,r=0,main="Q(x) with p=1,h=10",type.kernel="epan")
q_2 <- locpolreg(x=Yr,y=res2,h=10,p=2,r=0,main="Q(x) with p=2,h=10",type.kernel="epan")To find the estimate, we can aproximate:
\(\sigma?^2(x) = e^{\hat{q}(x)}\)
# Estimate σ^2(x) by
hat_sigma_sqr0 = exp(unname(unlist(q_0[1])))
hat_sigma_sqr1 = exp(unname(unlist(q_1[1])))
hat_sigma_sqr2 = exp(unname(unlist(q_2[1])))
sigmas = c("hat_sigma_sqr0","hat_sigma_sqr1","hat_sigma_sqr3")Now we plot the Epsilon squared against Year in order to be able to plot our variance and analyse its behaviour.
#Plot for Errors
op <- par(mfrow=c(3,1))
plot(Yr, errors[[1]]^2, ylab ="Error Squared", xlab = "Year", main="Plot for Polynomial 0")
lines(Yr,hat_sigma_sqr0, col="red", lwd=3)
plot(Yr, errors[[2]]^2, ylab ="Error Squared", xlab = "Year", main="Plot for Polynomial 1")
lines(Yr,hat_sigma_sqr1, col="blue", lwd=3)
plot(Yr, errors[[3]]^2, ylab ="Error Squared", xlab = "Year", main="Plot for Polynomial 2")
lines(Yr,hat_sigma_sqr2, col="darkred", lwd=3)Finally we calculate the the confidence intervals and plot the original function of LogWeight against Year. The estimated function will be included, and this time we will calculate the confidence interval to portray the variability of our estimates.
#Plot of estimates with confidence interval
op <- par(mfrow=c(3,1))
plot(Yr,lgWeight,col="grey", ylab="Log(Weight)", xlab = "Year", main="Local Polynomial Reg. with CI p=0")
lines(Yr,mhat0,lwd=2)
CIU <- mhat0 + 1.96*sqrt(hat_sigma_sqr0)
CIL <- mhat0 - 1.96*sqrt(hat_sigma_sqr0)
lines(Yr, CIU, col ="red",lty=2,lwd=2)
lines(Yr,CIL, col ="red",lty=2,lwd=2)
plot(Yr,lgWeight,col="grey", ylab="Log(Weight)", xlab = "Year",main="Local Polynomial Reg. with CI p=1")
lines(Yr,mhat1,lwd=2)
CIU <- mhat1 + 1.96*sqrt(hat_sigma_sqr1)
CIL <- mhat1 - 1.96*sqrt(hat_sigma_sqr1)
lines(Yr, CIU, col ="blue",lty=2,lwd=2)
lines(Yr,CIL, col ="blue",lty=2,lwd=2)
plot(Yr,lgWeight,col="grey", ylab="Log(Weight)", xlab = "Year",main="Local Polynomial Reg. with CI p=2")
lines(Yr,mhat2,lwd=2)
CIU <- mhat2 + 1.96*sqrt(hat_sigma_sqr2)
CIL <- mhat2 - 1.96*sqrt(hat_sigma_sqr2)
lines(Yr, CIU, col ="darkred",lty=2,lwd=2)
lines(Yr,CIL, col ="darkred",lty=2,lwd=2)After working on this set of exercises, we could understand how the Local Polynomial Regression works, and how to calculate its Confidence Interval. It is important to notice that for the purpose of this assignment, there was not a validation of the parameters chosen. In any case, by plotting the different models, we can compare the behaviour of different degrees of polynomials and visually see which one fits best the model. It is also important to consider our confidence intervals, since we want to avoid having a big variance, and our objective is to minimize the error.