Verifying the number of samples required to bound classification error under Hoeffding’s inequality

Two different linear classifiers will be applied to a 2-dimensional, linearly separable, binary classification problem. The two classes are separated by a random hyperplane, which serves as the target function to be approximated.

We’ll specify our tolerance (\(\epsilon\)) to allow at most a 5% error, with \(3\sigma\) confidence, i.e. a 99.73% chance of expected behavior; that is, \(\delta=P[|E_{in}-E_{out}|>\epsilon]=1-0.9973=0.0027\). Since Hoeffding’s inequality states \(P[|E_{in}-E_{out}|>\epsilon]\le 2e^{-2\epsilon^{2}N}\), requiring \(2e^{-2\epsilon^{2}N}\le\delta\) gives \(N\ge\ln(2/\delta)/(2\epsilon^{2})\approx 1321.5\), a Hoeffding sample size of 1322 points.
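The same computation in R (a minimal sketch; the variable names are illustrative):

eps   <- .05     # tolerance on |Ein - Eout|
delta <- .0027   # allowed probability of exceeding the tolerance
N     <- ceiling(log(2/delta) / (2 * eps^2))
N                # 1322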

It’s on samples of this size that our classifiers will be run: once on a primordial set of points that determines the in-sample error (\(E_{in}\)), then on another 1000 sets of 1322 points each, which yield an average out-of-sample error (\(E_{out}\)). From these values we’ll extract empirical tolerance and confidence levels.

Our primordial set looks like this:
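(The scatter plot can be reproduced along these lines, using the primordial_points generated in the imperative script below; column V3 holds the ±1 label:)

plot(primordial_points$V1, primordial_points$V2,
     col  = ifelse(primordial_points$V3 > 0, 'blue', 'red'),
     pch  = 20, xlab = 'x1', ylab = 'x2',
     main = 'Primordial set, colored by class')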

Algorithm definitions:

perceptron <- function(dataset, weights = vector('numeric', ncol(dataset))){
        
        # Prepend a bias column of 1s and drop the label column;
        # the resulting width matches the weight vector (bias + features)
        biased  <- function(x){
                return(cbind(replicate(nrow(x), 1), x[, -ncol(x)]))
        }
        
        # Sign of the dot product of each biased point with the current weights
        output  <- function(x){
                return(as.numeric(apply(x, 1, function(x) ifelse(x %*% weights > 0, 1, -1))))
        }
        
        # Update rule: move the weights toward the desired label
        # ((desired - current) is ±2 for a misclassified point, 0 otherwise)
        weightUpdate <- function(weights, point){
                desired <- point[length(point)]
                current <- output(biased(point))
                weights <- mapply(function(x, y) x + (desired - current) * y, weights, biased(point))
                return(as.numeric(weights))
        }
        
        # Pick one misclassified point uniformly at random
        samplePoint <- function(dataset){
                mis <- dataset[which(dataset[, ncol(dataset)] != output(biased(dataset))), ]
                return(mis[sample(nrow(mis), 1), ])
        }
        
        # Iterate until every point is classified correctly;
        # guaranteed to terminate on linearly separable data
        while(any(dataset[, ncol(dataset)] != output(biased(dataset)))){
                point   <- samplePoint(dataset)
                update  <- weightUpdate(weights, point)
                weights <- update
        }
        
        return(weights)
}
library(MASS)  # for ginv(), the Moore-Penrose pseudo-inverse

LinRegress <- function(dataset){
        
        # Prepend a bias column of 1s and drop the label column
        biased <- function(x){
                return(cbind(replicate(nrow(x), 1), x[, -ncol(x)]))
        }
        
        # One-shot least-squares fit: w = X⁺y
        weights <- ginv(as.matrix(biased(dataset))) %*% dataset[, ncol(dataset)]
        
        return(weights)
}
binLinearG <- function(dim = 2, n = 10, hp){
        
        # n random points, uniform in [-1,1]^dim
        points <- replicate(dim, runif(n, -1, 1))
        
        output <- function(points){
                # hp is a dim x dim matrix whose rows lie on the target hyperplane.
                # a solves hp %*% a = 1, so every row of hp satisfies a . x = 1;
                # hence b == 1 regardless of which row is sampled, and points are
                # labeled by which side of a . x = b they fall on.
                a  <- rowSums(solve(hp))
                b  <- a %*% hp[sample(nrow(hp), 1), ]
                y  <- as.numeric(apply(points, 1, function(x) ifelse(x %*% a > b, 1, -1)))
                return(y)
        }
        
        return(as.data.frame(cbind(points, output(points))))
}
# Standalone versions of the helpers above, used by the imperative script below
biased  <- function(dataset){
        return(cbind(replicate(nrow(dataset), 1), dataset[, -ncol(dataset)]))
}

output  <- function(dataset, weights){
        return(as.numeric(apply(dataset, 1, function(dataset) ifelse(dataset %*% weights > 0, 1, -1))))
}

Imperative Script:

#Generating our 2 linear hypotheses with a single set of N points and determining in-sample error:
N           <- 1322
hyper_plane <- replicate(2, runif(2, -1, 1))  # (assumed) 2 random points in [-1,1]^2 defining the target line
primordial_points <- binLinearG(n=N, hp=hyper_plane)
PercepWei <- perceptron(primordial_points)
RegresWei <- LinRegress(primordial_points)
EinPercep <- mean(primordial_points$V3 != output(biased(primordial_points), PercepWei))
EinRegres <- mean(primordial_points$V3 != output(biased(primordial_points), RegresWei))

#Computing error & confidence measures:
runs    <- 1000
results <- data.frame(matrix(ncol=4, nrow=runs))
colnames(results) <- c('PercepDev', 'PercepOverTol', 'RegresDev', 'RegresOverTol')

for(i in 1:runs){
        
        # Fresh out-of-sample set of N points, labeled by the same target
        random_set <- binLinearG(n=N, hp=hyper_plane)
        EoutPercep <- mean(random_set$V3 != output(biased(random_set), PercepWei))
        Percep_overTolerance <- as.numeric(abs(EinPercep - EoutPercep) > .05)
        EoutRegres <- mean(random_set$V3 != output(biased(random_set), RegresWei))
        Regres_overTolerance <- as.numeric(abs(EinRegres - EoutRegres) > .05)
        
        results[i,] <- c(abs(EinPercep - EoutPercep), Percep_overTolerance, abs(EinRegres - EoutRegres), Regres_overTolerance)
}

Conclusions:

Average results agree with our initial specifications for \(\epsilon\) and \(P[|E_{in}-E_{out}|>\epsilon]\):
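(The empirical figures are just the column means of the results frame, assuming the column names assigned in the script above:)

# Mean |Ein - Eout| and empirical P[|Ein - Eout| > eps] for each model
colMeans(results)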

Histogram/density plots also hint at something peculiar about the distribution of both models’ \(\epsilon\) values:
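(These plots can be reproduced along the following lines in base R, again assuming the column names above:)

# Density estimates of |Ein - Eout| for both hypotheses
plot(density(results$RegresDev), main = '|Ein - Eout| density', xlab = '|Ein - Eout|')
lines(density(results$PercepDev), lty = 2)
legend('topright', legend = c('Regression', 'Perceptron'), lty = c(1, 2))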

It is surprising to see such different shapes in the two distributions. Regression’s deviations are spread over a wider range (below .02) than the Perceptron’s (below .002). The Perceptron hypothesis guaranteed \(E_{in}=0\), while Regression’s allowed a certain nonzero \(E_{in}\). Under Hoeffding’s bound, Regression’s hypothesis leveraged that implicit in-sample error and still satisfied the inequality. The Perceptron hypothesis, which ambitiously guaranteed a nil \(E_{in}\), nevertheless proved reliable on the remaining random sets, bringing error precision to the limits of the double-precision floating-point format.

Further investigation into this precision behavior remains to be done.

Javier Prado