2024-11-01

Introduction to Non-Linear Models

  • Purpose: Capture complex patterns in data.
  • Methods Covered: Support Vector Machines (SVM) for regression and classification.
  • Why SVM? Known for flexibility and robustness against outliers.

Overview of Support Vector Machines (SVM)

  • Origins: Foundations laid by Vladimir Vapnik and colleagues in the 1960s; the modern kernel-based formulation followed in the 1990s (Cortes & Vapnik, 1995).
  • Applications: Originally for classification; extended to regression.
  • Core Idea: Maximizing margin to improve model stability and generalization.

Support Vector Machines for Regression

  • Goal: Limit the influence of outliers on the fitted model.
  • Key Method: The ε-insensitive loss function.
  • Impact: Only points falling outside the ε zone contribute to the model; residuals inside it incur no penalty.
  • Outcome: Models that are less sensitive to noise in the data (see the loss sketch below).
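
A minimal sketch of the ε-insensitive loss in Python (the function name and the ε value are illustrative, not from the source):

```python
import numpy as np

def eps_insensitive_loss(residuals, eps=0.1):
    """epsilon-insensitive loss: zero inside the +/-eps tube,
    linear in |residual| - eps outside it."""
    return np.maximum(0.0, np.abs(residuals) - eps)

residuals = np.array([-0.5, -0.05, 0.0, 0.08, 0.3])
print(eps_insensitive_loss(residuals, eps=0.1))
# -> [0.4 0.  0.  0.  0.2]  (only the two large residuals are penalized)
```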

Mathematics of SVM Regression

  • Objective Function: \[ \text{Minimize: } C \sum_{i=1}^{n} L_{\epsilon}(y_i - \hat{y}_i) + \sum_{j=1}^{P} \beta_j^2 \]
  • Key Components:
    • \(L_{\epsilon}\): The ε-insensitive loss; errors smaller than ε contribute nothing.
    • Regularization (\(\sum \beta_j^2\)): Penalizes large coefficients to help avoid overfitting.
    • Support Vectors: The training points outside the ε band; only these define the regression function (see the sketch below).
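
The role of ε can be shown with a scikit-learn sketch; the synthetic sine data and parameter values here are assumptions for illustration. Widening the tube leaves fewer points outside it, so fewer support vectors remain:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(60, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=60)

# Larger epsilon -> wider tube -> fewer points left outside it
for eps in (0.01, 0.1, 0.5):
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    print(f"epsilon={eps}: {len(model.support_)} support vectors")
```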

Non-Linear Classification with SVM

  • Challenges: Non-linear boundaries often needed for classification.
  • Kernel Trick: Computes inner products in a transformed space, allowing non-linear boundaries without explicitly constructing the transformation (verified numerically below).
  • Kernel Types: Polynomial, RBF, Hyperbolic Tangent.
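
The kernel trick can be checked directly for a degree-2 polynomial kernel: the kernel value equals an inner product under an explicit feature map, even though the model never builds that map. A small numeric check, using the standard degree-2 map for 2-D inputs:

```python
import numpy as np

def poly_kernel(x, z, degree=2):
    """Polynomial kernel K(x, z) = (x . z + 1)^degree."""
    return (x @ z + 1.0) ** degree

def explicit_features(x):
    """Explicit degree-2 feature map for 2-D input:
    phi(x) = (1, sqrt(2)x1, sqrt(2)x2, x1^2, x2^2, sqrt(2)x1x2)."""
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(poly_kernel(x, z))                            # 4.0
print(explicit_features(x) @ explicit_features(z))  # 4.0, same value
```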

Margin and Support Vectors in Classification

  • Concept of Margin: Distance between decision boundary and nearest data points.
  • Support Vectors: Only these points (on the margin) influence the boundary.
  • Result: Large-margin classifiers are less likely to overfit (see the sketch below).
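
A short scikit-learn sketch (the synthetic blobs data is assumed for illustration) showing that a fitted classifier exposes exactly these boundary-defining points:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only these points determine the decision boundary;
# moving any other point (outside the margin) leaves it unchanged.
print("support vectors per class:", clf.n_support_)
print(clf.support_vectors_[:3])
```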

Mathematics of SVM Classification

  • Decision Function: \[ D(u) = \beta_0 + \sum_{i=1}^{n} y_i \alpha_i K(x_i, u) \]
  • Terms:
    • \(y_i\): Class labels, coded as ±1
    • \(\alpha_i\): Dual parameters, nonzero only for the support vectors
    • \(K(x_i, u)\): Kernel function, which gives the boundary its flexibility (reconstructed by hand below)
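
As a sanity check, the decision function above can be rebuilt by hand from a fitted scikit-learn model, whose dual_coef_ attribute stores the products \(y_i \alpha_i\) for the support vectors (the two-moons data and γ value are illustrative):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_moons(noise=0.2, random_state=0)
clf = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

u = X[:5]  # a few query points
# dual_coef_ holds y_i * alpha_i for each support vector
K = rbf_kernel(clf.support_vectors_, u, gamma=0.5)
manual = clf.dual_coef_ @ K + clf.intercept_
print(np.allclose(manual.ravel(), clf.decision_function(u)))  # True
```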

Kernel Functions in Detail

  • Linear Kernel: Best for linear relationships.
  • Polynomial Kernel: Captures polynomial relationships.
  • RBF Kernel: Effective for highly non-linear data.
  • Choosing a Kernel: Guided by the complexity of the data; cross-validation gives an empirical comparison (sketch below).
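
One practical way to choose is to compare kernels by cross-validation; a sketch on assumed synthetic two-moons data:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.25, random_state=1)

# On this non-linear data the RBF kernel should score highest
for kernel in ("linear", "poly", "rbf"):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(f"{kernel}: mean CV accuracy = {scores.mean():.3f}")
```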

Hyperparameter Tuning in SVM

  • Key Parameters:
    • Cost (C): Controls the trade-off between bias and variance; larger values penalize training errors more heavily.
    • Kernel Parameters: E.g., σ in the RBF kernel controls the scale over which points influence one another.
  • Tuning Strategy: Use cross-validation to find values that balance under- and overfitting (see the grid-search sketch below).
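
A grid-search sketch over C and the RBF scale. Note that scikit-learn parameterizes the RBF kernel with gamma rather than σ (gamma = 1/(2σ²)); the grid values and data are illustrative:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=2)

param_grid = {
    "C": [0.1, 1, 10, 100],       # bias/variance trade-off
    "gamma": [0.01, 0.1, 1, 10],  # RBF scale (sigma analog)
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```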

Practical Example: Classification Boundaries with RBF Kernel

  • Scenario: How decision-boundary complexity changes as the cost parameter varies.
  • Cost:
    • Low cost: Underfitting; the boundary is too simple.
    • High cost: Overfitting, with overly complex boundaries.
  • Outcome: An intermediate cost achieves a balanced boundary (demonstrated below).
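
The under/overfitting pattern can be reproduced on synthetic data (the dataset and parameter values are assumptions for illustration): a very low cost underfits, while a very high cost memorizes the training set at the expense of test accuracy:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.35, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

# Watch the train/test gap widen as C grows
for C in (0.01, 1, 10_000):
    clf = SVC(kernel="rbf", gamma=1.0, C=C).fit(X_tr, y_tr)
    print(f"C={C}: train={clf.score(X_tr, y_tr):.2f}, "
          f"test={clf.score(X_te, y_te):.2f}")
```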

Case Study: SVM for Grant Data Classification

  • Data Setup: Predict grant success with RBF kernel.
  • Steps:
    • Tune kernel and cost parameters.
    • Evaluate with reduced and full predictor sets.
  • Results: The reduced predictor set achieved ROC AUC = 0.898 (a stand-in version of the workflow is sketched below).
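
The grant data itself is not included here, so this sketch uses a hypothetical stand-in dataset to mirror the workflow: scale the predictors, tune C and the kernel parameter, and select by cross-validated ROC AUC:

```python
# Hypothetical stand-in for the grant data; X, y would come from
# the actual reduced or full predictor sets in the case study.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=4)

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = {"svc__C": [0.25, 1, 4, 16], "svc__gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(pipe, grid, scoring="roc_auc", cv=5).fit(X, y)
print("best ROC AUC:", round(search.best_score_, 3))
```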

Extensions of SVM

  • Least Squares SVM: Simplifies optimization.
  • Relevance Vector Machines (RVM): Bayesian analog with fewer vectors.
  • Graph Kernels: For chemistry and text mining applications.

Conclusion and Best Practices

  • Strengths of SVM:
    • Robust to noise, highly flexible for non-linear patterns.
  • Limitations:
    • Computationally intensive with large datasets.
  • Final Tip: Center and scale predictors for best performance (illustrated below).
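
To illustrate the final tip, a quick comparison on a standard dataset (the dataset choice is mine, not the source's); wrapping the SVM in a pipeline ensures the scaling is learned from the training folds only:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

raw = SVC(kernel="rbf")
scaled = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
# Centering and scaling typically improves SVM accuracy noticeably
print("raw:   ", cross_val_score(raw, X, y, cv=5).mean().round(3))
print("scaled:", cross_val_score(scaled, X, y, cv=5).mean().round(3))
```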