DATA 624 - Non-Linear Regression

Omer Ozeren

Linear Regression Models

Linear Regression model equations can be written either directly or indirectly in the form:

\[ \begin{align} y_i = b_0 + b_1x_{i1} + b_2x_{i2} + ... + b_Px_{iP} + \epsilon_i \tag{1} \end{align} \]

where \(y_i\) is the numeric response for the \(i\)-th sample, \(b_0\) is the intercept, \(b_j\) is the coefficient for the \(j\)-th of the \(P\) predictors, \(x_{ij}\) is the value of the \(j\)-th predictor for the \(i\)-th sample, and \(\epsilon_i\) is random error.

Linear Regression models

Goal: to minimize the sum of squared errors (SSE) or a function of the sum of squared errors

Minimize SSE

OLS - Ordinary Least Squares

PLS - Partial Least Squares

Minimize a function of the SSE

Penalized Models

Ridge Regression

\[ \begin{aligned} SSE_{L_2} = \sum_{i=1}^n (y_i - \hat{y_i})^2 + \lambda \sum_{j=1}^P \beta_j^2 \end{aligned} \]

Lasso

\[ \begin{aligned} SSE_{L_1} = \sum_{i=1}^n (y_i - \hat{y_i})^2 + \lambda \sum_{j=1}^P |{\beta_j}| \end{aligned} \]

Elastic Net

\[ \begin{aligned} SSE_{E_{net}} = \sum_{i=1}^n (y_i - \hat{y_i})^2 + \lambda_1 \sum_{j=1}^P \beta_j^2 + \lambda_2 \sum_{j=1}^P |{\beta_j}| \end{aligned} \]
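All three penalties can be fit in R with the glmnet package, which mixes the \(L_1\) and \(L_2\) terms through its alpha argument (alpha = 0 gives ridge, alpha = 1 gives the lasso, intermediate values give the elastic net). A minimal sketch, assuming a numeric predictor matrix x and a response vector y:

# Ridge (alpha = 0), lasso (alpha = 1), and elastic net (0 < alpha < 1)
library(glmnet)
ridge_fit <- glmnet(x, y, alpha = 0)
lasso_fit <- glmnet(x, y, alpha = 1)
enet_fit  <- glmnet(x, y, alpha = 0.5)
# Pick the penalty strength lambda by cross-validation
cv_lasso <- cv.glmnet(x, y, alpha = 1)
cv_lasso$lambda.min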

Linear Regression Pros and Cons

Advantages:

Disadvantages:



Non-Linear Regression

Non-linear regression equations take the form:

\[y = f(x,\beta) + \varepsilon\]

where \(f\) is a non-linear function of the predictors \(x\) and the parameters \(\beta\), and \(\varepsilon\) is random error.

Any model equation that cannot be written in the linear form \(y_i = b_0 + b_1x_{i1} + b_2x_{i2} + ... + b_Px_{iP} + \epsilon_i\) is non-linear!
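For example, the Michaelis-Menten model \(y = \frac{V_m x}{K + x} + \varepsilon\) cannot be rearranged into that linear form, but it can be fit with base R's nls(), which estimates the parameters by iterative non-linear least squares. A minimal sketch using the built-in Puromycin data (starting values are required):

# Non-linear least squares fit of a Michaelis-Menten curve
fit <- nls(rate ~ Vm * conc / (K + conc),
           data = Puromycin, subset = state == "treated",
           start = list(Vm = 200, K = 0.05))
summary(fit)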

Non-Linear Regression Pros and Cons

Advantages:

Disadvantages:



Non-Linear Regression Model Types

Neural Networks (NN)

Inspired by theories about how the brain works and the interconnectedness of its neurons

Multivariate Adaptive Regression Splines (MARS)

Splits each predictor into two groups using a “hinge” or “hockey-stick” function and models linear relationships to the outcome for each group separately

K-Nearest Neighbors (KNN)

Predicts new samples by finding the closest samples in the training set predictor space and taking the mean of their response values. K represents the number of closest samples to consider.

Support Vector Machines (SVM)

Robust regression that aims to minimize the effect of outliers. It predicts new values from a subset of the training data points called “support vectors” and, counter-intuitively, excludes the data points closest to the regression line from the prediction equation.

Neural Networks

Introduction

Computing examples

# Create Model
library(neuralnet)
# Fit a single-hidden-unit network, repeated 3 times from different random
# starting weights; linear.output = FALSE because this is a classification task
nn <- neuralnet(q03_symptoms ~ ., data = training, hidden = 1,
                linear.output = FALSE, rep = 3)

Computing examples
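The class probabilities below were produced from the fitted network; a minimal sketch of how such predictions can be generated with compute(), assuming a held-out data frame testing that contains just the predictor columns, in the same order as in training:

# Feed the test predictors through the trained network;
# net.result holds the predicted value for each output node
pred <- compute(nn, testing)
head(pred$net.result)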

##        [,1]       [,2]
## 1 0.9781103 0.01241033
## 2 0.9787063 0.01201374
## 3 0.9787063 0.01201374
## 4 0.9787063 0.01201374
## 6 0.9815269 0.01016525
## 7 0.9787063 0.01201374

Pros and Cons

Pros

Cons

Multivariate Adaptive Regression Splines

Introduction
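MARS builds its model from pairs of hinge functions, \(h(x - c) = \max(0, x - c)\) and \(h(c - x) = \max(0, c - x)\), placing the knots \(c\) at data-driven locations and fitting a separate slope on each side. In R the method is implemented in the earth package; a minimal sketch, assuming a data frame training with a numeric response y:

# Fit a MARS model: degree = 1 keeps additive hinge terms only,
# nprune caps the number of terms retained after pruning
library(earth)
mars_fit <- earth(y ~ ., data = training, degree = 1, nprune = 10)
summary(mars_fit)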

Pros and Cons

Pros

Cons

K-Nearest Neighbors

Introduction

Number of Neighbors

If k = 1, then the new instance is assigned to the class of its single nearest neighbor.

A small k may lead to over-fitting, while a large k may lead to under-fitting. To choose a proper value of k, one can rely on cross-validation or bootstrapping, as in the sketch below.
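A minimal sketch of choosing k by 10-fold cross-validation with the caret package, assuming a data frame training with numeric predictors and a factor response class:

# Evaluate k = 1, 3, ..., 21 with 10-fold cross-validation and keep the best value
library(caret)
set.seed(624)
knn_fit <- train(class ~ ., data = training, method = "knn",
                 trControl = trainControl(method = "cv", number = 10),
                 tuneGrid = data.frame(k = seq(1, 21, by = 2)))
knn_fit$bestTune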

Similarity/ Distance Metrics

The distance metric used for K-nearest neighbor classification can be “euclidean”, “maximum”, “manhattan”, “canberra”, “binary”, or “minkowski”.
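These are the method options of base R's dist() function; for example, the same pair of points yields different distances under different metrics:

# Distance between (0, 0) and (3, 4) under three metrics
pts <- rbind(c(0, 0), c(3, 4))
dist(pts, method = "euclidean")  # 5
dist(pts, method = "manhattan")  # 7
dist(pts, method = "maximum")    # 4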

# Read the raw data and prepare the column types for modeling
df <- read.table("C:/Users/OMERO/Documents/GitHub/DATA622/data.txt", header = TRUE, sep = ",")
df$label <- ifelse(df$label == "BLACK", 1, 0)  # encode the class label as 1/0
df$y <- as.numeric(df$y)
df$X <- as.factor(df$X)
## [1] 0.5833333
##       ALGO       AUC       ACC       TPR FPR TNR       FNR
## 1 KNN(K=3) 0.5833333 0.6428571 0.8888889 0.8 0.2 0.1111111
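The modeling code behind these metrics is not shown; a minimal sketch of fitting the k = 3 classifier with the class package, assuming df has been split into training and test sets with numeric predictor matrices train_x / test_x and label vectors train_y / test_y:

# k-nearest-neighbor classification with k = 3 (class::knn uses Euclidean distance)
library(class)
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 3)
mean(pred == test_y)  # accuracy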

Advantages & Disadvantages

Pros

- The cost of the learning process is zero
- Non-parametric: K-NN makes no assumptions about the underlying data distribution
- Pretty intuitive and simple
- No explicit training step

Cons

- K-NN is a slow algorithm: every prediction requires searching the entire training set
- The optimal number of neighbors must be tuned
- Imbalanced data causes problems, biasing predictions toward the majority class

Support Vector Machines

Introduction

SVM Applications

Soft vs. Hard Margins

Allow some misclassification by introducing a slack penalty variable (\(\xi\)).

Cost Penalty

The slack variable is regulated by the cost hyperparameter C:

- when C → 0, misclassification is cheap, so the boundary is less complex
- when C → ∞, the algorithm cannot afford to misclassify a single data point, producing a more complex boundary (overfitting)

SVM with High Cost Parameter

# Fit a hard-margin-like linear SVM by making misclassification very expensive
library(e1071)
svm.model <- svm(Species ~ ., data = iris.subset, type = "C-classification",
                 kernel = "linear", cost = 10000, scale = FALSE)
# Plot the data, highlight the support vectors, and draw the decision boundary
plot(x = iris.subset$Sepal.Length, y = iris.subset$Sepal.Width,
     col = iris.subset$Species, pch = 19)
points(iris.subset[svm.model$index, c(1, 2)], col = "blue", cex = 2)
w <- t(svm.model$coefs) %*% svm.model$SV   # weight vector
b <- -svm.model$rho                        # the negative intercept
abline(a = -b / w[1, 2], b = -w[1, 1] / w[1, 2], col = "red", lty = 5)

Support Vector Machines

Kernel Trick

All kernel functions take two feature vectors as arguments and return the scalar dot (inner) product of those vectors in the transformed feature space. A valid kernel is symmetric and positive semi-definite.

By performing convex quadratic optimization, we can rewrite the algorithm so that it depends only on these inner products and is therefore independent of the transforming function \(\phi\).
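A common choice is the radial basis function (RBF) kernel,

\[ K(u, v) = \exp(-\gamma \lVert u - v \rVert^2), \]

which computes the inner product in the transformed space without ever constructing \(\phi\) explicitly. In e1071 the kernel is selected with the kernel argument; a minimal sketch reusing the iris.subset data from the earlier slide:

# Non-linear decision boundary via the RBF kernel; gamma controls how fast
# the kernel's influence decays with distance
svm.rbf <- svm(Species ~ ., data = iris.subset, type = "C-classification",
               kernel = "radial", gamma = 0.5, cost = 1)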

Support Vector Machines

Model Tuning (Cost & Gamma)

# Search a grid of gamma and cost values, then refit the SVM with the best pair
tuned <- tune.svm(Species ~ ., data = iris.subset, gamma = 10^(-6:-1), cost = 10^(0:2))
model.tuned <- svm(Species ~ ., data = iris.subset, gamma = tuned$best.parameters$gamma,
                   cost = tuned$best.parameters$cost)

Support Vector Machines

Advantages & Disadvantages