Two different linear classifiers will be applied to a 2-dimensional, linearly separable, binary space. This space is defined by a random hyperplane, which will be our target function to approximate.
We’ll specify our tolerance (\(\epsilon\)) to allow at most a 5% error, with a \(3\sigma\) confidence level, i.e. a 99.73% chance of expected behavior, so that \(P[|E_{in}-E_{out}|>\epsilon] \le 1-0.9973 = 0.0027\). This yields a Hoeffding sample size of 1322 points.
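For reference, that sample size follows directly from Hoeffding’s inequality with \(\epsilon=0.05\) and the bound set to 0.0027:

\[
P[|E_{in}-E_{out}|>\epsilon] \le 2e^{-2\epsilon^{2}N} \le 0.0027
\quad\Rightarrow\quad
N \ge \frac{\ln(2/0.0027)}{2\,(0.05)^{2}} \approx 1321.6,
\]

so we round up to \(N=1322\).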
Both classifiers will be run at this sample size: first on a primordial set of points that determines the in-sample error (\(E_{in}\)), then on another 1000 sets of 1322 points each, which yield an average out-of-sample error (\(E_{out}\)). From these values we’ll extract empirical tolerance and confidence levels.
The two classifiers (the Perceptron learning algorithm and Linear Regression via the pseudo-inverse), the data generator, and a couple of shared helpers are defined below:
perceptron <- function(dataset, weights = vector('numeric', ncol(dataset))) {
  # Prepend the bias coordinate x0 = 1 and drop the label column
  biased <- function(x) {
    return(cbind(replicate(nrow(x), 1), x[, -ncol(x)]))
  }
  # Current hypothesis: sign of the weighted sum for each point
  output <- function(x) {
    return(as.numeric(apply(x, 1, function(x) ifelse(x %*% weights > 0, 1, -1))))
  }
  # Perceptron update on a single point, using desired vs. predicted label
  weightUpdate <- function(weights, point) {
    desired <- point[length(point)]
    current <- output(biased(point))
    weights <- mapply(function(x, y) x + (desired - current) * y, weights, biased(point))
    return(as.numeric(weights))
  }
  # Pick one misclassified point at random
  samplePoint <- function(dataset) {
    mis <- dataset[which(dataset[, ncol(dataset)] != output(biased(dataset))), ]
    return(mis[sample(nrow(mis), 1), ])
  }
  # Iterate until every point is classified correctly (the data is linearly separable)
  while (any(dataset[, ncol(dataset)] != output(biased(dataset)))) {
    point <- samplePoint(dataset)
    update <- weightUpdate(weights, point)
    weights <- update
  }
  return(weights)
}
library(MASS)  # ginv() comes from the MASS package

LinRegress <- function(dataset) {
  # Prepend the bias coordinate x0 = 1 and drop the label column
  biased <- function(x) {
    return(cbind(replicate(nrow(x), 1), x[, -ncol(x)]))
  }
  # One-shot least-squares fit via the Moore-Penrose pseudo-inverse
  weights <- ginv(as.matrix(biased(dataset))) %*% dataset[, ncol(dataset)]
  return(weights)
}
binLinearG <- function(dim = 2, n = 10, hp) {
  # n points drawn uniformly from [-1, 1]^dim
  points <- replicate(dim, runif(n, -1, 1))
  # Label each point by the side of the target hyperplane hp it falls on
  output <- function(points) {
    a <- rowSums(solve(hp))                # normal vector: hp %*% a = 1 for every row of hp
    b <- a %*% hp[sample(nrow(hp), 1), ]   # offset: a . p for a point p on the hyperplane
    y <- as.numeric(apply(points, 1, function(x) ifelse(x %*% a > b, 1, -1)))
    return(y)
  }
  return(as.data.frame(cbind(points, output(points))))
}
# Top-level versions of the same helpers, used to evaluate a fixed weight vector on a dataset
biased <- function(dataset) {
  return(cbind(replicate(nrow(dataset), 1), dataset[, -ncol(dataset)]))
}
output <- function(dataset, weights) {
  return(as.numeric(apply(dataset, 1, function(dataset) ifelse(dataset %*% weights > 0, 1, -1))))
}
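The target hyper_plane passed to binLinearG below is assumed to be generated in an earlier step of the post; purely as a hypothetical sketch, one definition consistent with how binLinearG uses it (a square matrix whose rows are points on the target line) would be:

# Hypothetical sketch: two random points in [-1, 1]^2; binLinearG inverts this
# matrix, so its rows define the target line a . x = 1.
hyper_plane <- replicate(2, runif(2, -1, 1))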
# Generating our two linear hypotheses from a single set of N = 1322 points and determining their in-sample errors:
primordial_points <- binLinearG(n = 1322, hp = hyper_plane)
PercepWei <- perceptron(primordial_points)
RegresWei <- LinRegress(primordial_points)
EinPercep <- mean(primordial_points$V3 != output(biased(primordial_points), PercepWei))
EinRegres <- mean(primordial_points$V3 != output(biased(primordial_points), RegresWei))
# Computing error & confidence measures over 1000 fresh out-of-sample sets:
runs <- 1000
results <- data.frame(matrix(ncol = 4, nrow = runs))
for (i in 1:runs) {
  random_set <- binLinearG(n = 1322, hp = hyper_plane)
  # Out-of-sample error and tolerance check for the Perceptron hypothesis
  EoutPercep <- mean(random_set$V3 != output(biased(random_set), PercepWei))
  Percep_overTolerance <- as.numeric(abs(EinPercep - EoutPercep) > .05)
  # Out-of-sample error and tolerance check for the Linear Regression hypothesis
  EoutRegres <- mean(random_set$V3 != output(biased(random_set), RegresWei))
  Regres_overTolerance <- as.numeric(abs(EinRegres - EoutRegres) > .05)
  results[i, ] <- c(abs(EinPercep - EoutPercep), Percep_overTolerance,
                    abs(EinRegres - EoutRegres), Regres_overTolerance)
}
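A minimal sketch of how the summary figures reported below can be pulled out of results (the column names here are assumptions, matching the order used inside the loop):

# Average |Ein - Eout| and empirical frequency of exceeding the 5% tolerance,
# per classifier, in the column order filled above.
colnames(results) <- c('Percep_eps', 'Percep_over', 'Regres_eps', 'Regres_over')
colMeans(results)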
Average results agree with our initial specifications for \(\epsilon\) and \(P[|E_{in}-E_{out}|>\epsilon]\):
Perceptron:
* \(\epsilon_{average}\)=0.0010174 < 0.05
* \(P[|E_{in}-E_{out}|>\epsilon]\)=0 < 0.0027
Linear Regression:
* \(\epsilon_{average}\)=0.0063116 < 0.05
* \(P[|E_{in}-E_{out}|>\epsilon]\)=0 < 0.0027
Histogram/Density plots also hint at something peculiar about the distribution of both \(\epsilon\) values:
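A rough base-R sketch (not necessarily the original plotting code) that would draw comparable histogram/density overlays from the results collected above:

# Histogram + density of |Ein - Eout| for each classifier
par(mfrow = c(1, 2))
hist(results[, 1], breaks = 30, freq = FALSE, main = 'Perceptron', xlab = '|Ein - Eout|')
lines(density(results[, 1]))
hist(results[, 3], breaks = 30, freq = FALSE, main = 'Linear Regression', xlab = '|Ein - Eout|')
lines(density(results[, 3]))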
It is surprising to see such different shapes in the two distributions. Regression’s values are spread over a wider range (< 0.02) than the Perceptron’s (< 0.002). The Perceptron hypothesis guaranteed \(E_{in}=0\), while Regression’s allowed a certain \(E_{in}\). Under Hoeffding’s bound, Regression’s hypothesis was able to absorb that implicit in-sample error and still satisfy the inequality. It was the Perceptron hypothesis that ambitiously guaranteed a nil \(E_{in}\), and it proved reliable on the remaining random sets, driving the error differences down toward the limits of the double-precision floating-point format.
Further investigation into this precision remains to be performed.
Javier Prado