Summary

The purpose of this assignment is to compare alternative variance estimators which do not require a previous estimation of the regression function. For this purpose, we will use the Boston housing Data provided in the Advanced Statistical Modeling course.

Introduction

Normally when dealing with data sets, it is of importance to know the variance of our data, or model. The variance is represented by \(\sigma^2= V(\epsilon_{i})\). Normally we estimate it by having a regression function, since the residuals are obtained by the difference between the estimated value and the real value: \(\epsilon = \hat{y}_{i}-y_{i}\). In this assignment two alternative proposals will be approached.

Proposals

For the Proposal of Rice (1984) and Proposal of Gasser, Sroka, and Jennen-Steinmetz (1986) are considered and a function is computed in order to obtain the value of each variance for a given x and y vectors.

The Proposal of Rice (1984) follows: \[\hat{\sigma}^2 = \frac{1}{2(n-1)}\sum_{i=2}^{n}(y_{i}-y_{i-1})^2\].

The Proposal of Gasser, Sroka, and Jennen-Steinmetz (1986) follows: \[\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=2}^{n}\frac{1}{a_{i}^2b_{i}^2+1}\tilde{\epsilon}_{i}^2\] where \[\tilde{\epsilon}_{i} = \hat{y}_{i} - y_{i} = a_{i}y_{i-1} + b_{i}y_{i+1} - y_{i}\] and \[\hat{y}_{i} = a_{i}y_{i-1} + b_{i}y_{i+1} = \frac{x_{i+1}-x_{i}}{x_{i+1}-x_{i-1}}y_{i-1} + \frac{x_{i}-x_{i-1}}{x_{i+1}-x_{i-1}}y_{i+1}\].

Function

A function is built in order to receive an x and y vector, in this case being LSTAT and ROOM respectively. The function will compute the variance using Rice's method, and Gasser,Sroka and Jennen's method, which we will call Interpolation Estimator.

estimators <- function(x,y){
  data = cbind(x,y)
  if(is.unsorted(x, na.rm = FALSE, strictly = FALSE)){
    data <- data[order(x),]
  }
  
  
  ##################### Proposal of Rice ##################################
  sum_sq = 0
  for(i in 2:nrow(data)){
    sum_sq = sum_sq + (data[i,2]-data[i-1,2])^2
  }
  Rice_var_estimator = unname(1/(2*(nrow(data)-1)) * sum_sq)
 ########################################################################## 
  
  ################# Proposal of Gasser Sroka and Jennen Steinmetz #########
  sum_sq = 0
  # Build the estimator 
  for(i in 2:(nrow(data)-1)){
    if(data[i-1,1] == data[i,1] && data[i+1,1] == data[i,1] ){
      xminus = max(data[data[,1]<data[i,1],1])
      xplus =  min(data[data[,1]>data[i,1],1])
      a = (xplus-data[i,1])/(xplus-xminus)
      b = (data[i,1]-xminus)/(xplus-xminus)
    } else{
        a = (data[i+1,1]-data[i,1])/(data[i+1,1]-data[i-1,1])
        b = (data[i,1]-data[i-1,1])/(data[i+1,1]-data[i-1,1])
    }
    y_est = a*data[i-1,2] + b*data[i+1,2]
    error_sq = (y_est - data[i,2])^2
    sum_sq = sum_sq + 1/(a^2+b^2+1)*error_sq
  }
  Interpolation_var_estimator = unname(sum_sq/(nrow(data)-2))
  ##########################################################################
  
  #Return output
  return(c("Rice Variance Estimator" = Rice_var_estimator,"Interpolation Variance Estimator" = Interpolation_var_estimator))
}
estimators <- sqrt(estimators(df$LSTAT,df$ROOM))
Proposal Variance
Rice Variance Estimator 0.5315709
Interpolation Variance Estimator 0.5174054

Fitting Regression

In order to prove that these variance estimators from the mentioned proposals approach to the variance computed by a regression function, we will use the package sm, to fit a local polynomial regression with loess and non-parametric regression with sm.regression.

x= df$LSTAT ; y= df$ROOM

# Fit the local polynomail regression
fit1 <- loess(ROOM~LSTAT, df)
# Fit the Non-parametric regression
fit2 <- sm.regression(df$LSTAT, df$ROOM)
compared_estimators = c("Loess Model" = summary(fit1)$s,"SM Regression" = fit2$sigma,estimators)
Model Variance
Loess Model 0.5033824
SM Regression 0.5097599
Rice Variance Estimator 0.5315709
Interpolation Variance Estimator 0.5174054

Conclusions

As seen in the results table, the \(\sigma^2\) can be estimated without previously fitting a regression function. the variances obtained by Rice and Gasser, Sroka, and Jennen-Steinmetz variance estimation methods, are very similar than the ones obtained by the regression functions.

Code

Hereby the complete code in case it is to be attempted on your own.

### LOADING DATA
load("boston.Rdata")
names(boston.c)[13]<-'ROOM'
# [1] "TOWN"    "TOWNNO"  "TRACT"   "LON"     "LAT"     "MEDV"    "CMEDV"  
# [8] "CRIM"    "ZN"      "INDUS"   "CHAS"    "NOX"     "ROOM"      "AGE"    
#[15] "DIS"     "RAD"     "TAX"     "PTRATIO" "B"       "LSTAT"  

#X = LSTAT  Y = ROOM
df <- boston.c[c("ROOM", "LSTAT")]
x= df$LSTAT
y= df$ROOM


## FUNCTION

estimators <- function(x,y){
  data = cbind(x,y)
  if(is.unsorted(x, na.rm = FALSE, strictly = FALSE)){
    data <- data[order(x),]
  }
  
  
  ##################### Proposal of Rice ##################################
  sum_sq = 0
  for(i in 2:nrow(data)){
    sum_sq = sum_sq + (data[i,2]-data[i-1,2])^2
  }
  Rice_var_estimator = unname(1/(2*(nrow(data)-1)) * sum_sq)
 ########################################################################## 
  
  ################# Proposal of Gasser Sroka and Jennen Steinmetz #########
  sum_sq = 0
  # Build the estimator 
  for(i in 2:(nrow(data)-1)){
    if(data[i-1,1] == data[i,1] && data[i+1,1] == data[i,1] ){
      xminus = max(data[data[,1]<data[i,1],1])
      xplus =  min(data[data[,1]>data[i,1],1])
      a = (xplus-data[i,1])/(xplus-xminus)
      b = (data[i,1]-xminus)/(xplus-xminus)
    } else{
        a = (data[i+1,1]-data[i,1])/(data[i+1,1]-data[i-1,1])
        b = (data[i,1]-data[i-1,1])/(data[i+1,1]-data[i-1,1])
    }
    y_est = a*data[i-1,2] + b*data[i+1,2]
    error_sq = (y_est - data[i,2])^2
    sum_sq = sum_sq + 1/(a^2+b^2+1)*error_sq
  }
  Interpolation_var_estimator = unname(sum_sq/(nrow(data)-2))
  ##########################################################################
  
  #Return output
  return(c("Rice Variance Estimator" = Rice_var_estimator,"Interpolation Variance Estimator" = Interpolation_var_estimator))
}
estimators <- sqrt(estimators(df$LSTAT,df$ROOM))



#### Load Library SM
library(sm)

x= df$LSTAT ; y= df$ROOM

# Fit the local polynomail regression
fit1 <- loess(ROOM~LSTAT, df)
# Fit the Non-parametric regression
fit2 <- sm.regression(df$LSTAT, df$ROOM)
compared_estimators = c("Loess Model" = summary(fit1)$s,"SM Regression" = fit2$sigma,estimators)