The purpose of this assignment is to compare alternative variance estimators which do not require a previous estimation of the regression function. For this purpose, we will use the Boston housing Data provided in the Advanced Statistical Modeling course.
Normally when dealing with data sets, it is of importance to know the variance of our data, or model. The variance is represented by \(\sigma^2= V(\epsilon_{i})\). Normally we estimate it by having a regression function, since the residuals are obtained by the difference between the estimated value and the real value: \(\epsilon = \hat{y}_{i}-y_{i}\). In this assignment two alternative proposals will be approached.
For the Proposal of Rice (1984) and Proposal of Gasser, Sroka, and Jennen-Steinmetz (1986) are considered and a function is computed in order to obtain the value of each variance for a given x and y vectors.
The Proposal of Rice (1984) follows: \[\hat{\sigma}^2 = \frac{1}{2(n-1)}\sum_{i=2}^{n}(y_{i}-y_{i-1})^2\].
The Proposal of Gasser, Sroka, and Jennen-Steinmetz (1986) follows: \[\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=2}^{n}\frac{1}{a_{i}^2b_{i}^2+1}\tilde{\epsilon}_{i}^2\] where \[\tilde{\epsilon}_{i} = \hat{y}_{i} - y_{i} = a_{i}y_{i-1} + b_{i}y_{i+1} - y_{i}\] and \[\hat{y}_{i} = a_{i}y_{i-1} + b_{i}y_{i+1} = \frac{x_{i+1}-x_{i}}{x_{i+1}-x_{i-1}}y_{i-1} + \frac{x_{i}-x_{i-1}}{x_{i+1}-x_{i-1}}y_{i+1}\].
A function is built in order to receive an x and y vector, in this case being LSTAT and ROOM respectively. The function will compute the variance using Rice's method, and Gasser,Sroka and Jennen's method, which we will call Interpolation Estimator.
estimators <- function(x,y){
data = cbind(x,y)
if(is.unsorted(x, na.rm = FALSE, strictly = FALSE)){
data <- data[order(x),]
}
##################### Proposal of Rice ##################################
sum_sq = 0
for(i in 2:nrow(data)){
sum_sq = sum_sq + (data[i,2]-data[i-1,2])^2
}
Rice_var_estimator = unname(1/(2*(nrow(data)-1)) * sum_sq)
##########################################################################
################# Proposal of Gasser Sroka and Jennen Steinmetz #########
sum_sq = 0
# Build the estimator
for(i in 2:(nrow(data)-1)){
if(data[i-1,1] == data[i,1] && data[i+1,1] == data[i,1] ){
xminus = max(data[data[,1]<data[i,1],1])
xplus = min(data[data[,1]>data[i,1],1])
a = (xplus-data[i,1])/(xplus-xminus)
b = (data[i,1]-xminus)/(xplus-xminus)
} else{
a = (data[i+1,1]-data[i,1])/(data[i+1,1]-data[i-1,1])
b = (data[i,1]-data[i-1,1])/(data[i+1,1]-data[i-1,1])
}
y_est = a*data[i-1,2] + b*data[i+1,2]
error_sq = (y_est - data[i,2])^2
sum_sq = sum_sq + 1/(a^2+b^2+1)*error_sq
}
Interpolation_var_estimator = unname(sum_sq/(nrow(data)-2))
##########################################################################
#Return output
return(c("Rice Variance Estimator" = Rice_var_estimator,"Interpolation Variance Estimator" = Interpolation_var_estimator))
}
estimators <- sqrt(estimators(df$LSTAT,df$ROOM))
| Proposal | Variance |
|---|---|
| Rice Variance Estimator | 0.5315709 |
| Interpolation Variance Estimator | 0.5174054 |
In order to prove that these variance estimators from the mentioned proposals approach to the variance computed by a regression function, we will use the package sm, to fit a local polynomial regression with loess and non-parametric regression with sm.regression.
x= df$LSTAT ; y= df$ROOM
# Fit the local polynomail regression
fit1 <- loess(ROOM~LSTAT, df)
# Fit the Non-parametric regression
fit2 <- sm.regression(df$LSTAT, df$ROOM)
compared_estimators = c("Loess Model" = summary(fit1)$s,"SM Regression" = fit2$sigma,estimators)
| Model | Variance |
|---|---|
| Loess Model | 0.5033824 |
| SM Regression | 0.5097599 |
| Rice Variance Estimator | 0.5315709 |
| Interpolation Variance Estimator | 0.5174054 |
As seen in the results table, the \(\sigma^2\) can be estimated without previously fitting a regression function. the variances obtained by Rice and Gasser, Sroka, and Jennen-Steinmetz variance estimation methods, are very similar than the ones obtained by the regression functions.
Hereby the complete code in case it is to be attempted on your own.
### LOADING DATA
load("boston.Rdata")
names(boston.c)[13]<-'ROOM'
# [1] "TOWN" "TOWNNO" "TRACT" "LON" "LAT" "MEDV" "CMEDV"
# [8] "CRIM" "ZN" "INDUS" "CHAS" "NOX" "ROOM" "AGE"
#[15] "DIS" "RAD" "TAX" "PTRATIO" "B" "LSTAT"
#X = LSTAT Y = ROOM
df <- boston.c[c("ROOM", "LSTAT")]
x= df$LSTAT
y= df$ROOM
## FUNCTION
estimators <- function(x,y){
data = cbind(x,y)
if(is.unsorted(x, na.rm = FALSE, strictly = FALSE)){
data <- data[order(x),]
}
##################### Proposal of Rice ##################################
sum_sq = 0
for(i in 2:nrow(data)){
sum_sq = sum_sq + (data[i,2]-data[i-1,2])^2
}
Rice_var_estimator = unname(1/(2*(nrow(data)-1)) * sum_sq)
##########################################################################
################# Proposal of Gasser Sroka and Jennen Steinmetz #########
sum_sq = 0
# Build the estimator
for(i in 2:(nrow(data)-1)){
if(data[i-1,1] == data[i,1] && data[i+1,1] == data[i,1] ){
xminus = max(data[data[,1]<data[i,1],1])
xplus = min(data[data[,1]>data[i,1],1])
a = (xplus-data[i,1])/(xplus-xminus)
b = (data[i,1]-xminus)/(xplus-xminus)
} else{
a = (data[i+1,1]-data[i,1])/(data[i+1,1]-data[i-1,1])
b = (data[i,1]-data[i-1,1])/(data[i+1,1]-data[i-1,1])
}
y_est = a*data[i-1,2] + b*data[i+1,2]
error_sq = (y_est - data[i,2])^2
sum_sq = sum_sq + 1/(a^2+b^2+1)*error_sq
}
Interpolation_var_estimator = unname(sum_sq/(nrow(data)-2))
##########################################################################
#Return output
return(c("Rice Variance Estimator" = Rice_var_estimator,"Interpolation Variance Estimator" = Interpolation_var_estimator))
}
estimators <- sqrt(estimators(df$LSTAT,df$ROOM))
#### Load Library SM
library(sm)
x= df$LSTAT ; y= df$ROOM
# Fit the local polynomail regression
fit1 <- loess(ROOM~LSTAT, df)
# Fit the Non-parametric regression
fit2 <- sm.regression(df$LSTAT, df$ROOM)
compared_estimators = c("Loess Model" = summary(fit1)$s,"SM Regression" = fit2$sigma,estimators)