RANDOM FOREST

CRF (Classification Random Forest)

  • Dependent variable is categorical
  • Bootstrap sampling with replacement
  • Grow one tree on every bootstrap sample
  • Variables are randomly selected in every tree (at each split);
    importance of each variable is then measured across the forest
    (importance(), varImpPlot())
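  • By default no of trees (ntree) = 500; plot(rf) shows error against
    no of trees, which helps choose a smaller value
  • By default mtry = floor(sqrt(p)) for classification, where p = no of
    independent variables
  • mtry = no of variables randomly selected as candidates at every
    split for decision making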
  • Predict the class using predict(modelname, data)
  • For accuracy - methods
    1. Generate confusion matrix
    2. Load ROCR
      prob = predict(modelname, data, type = "prob")
      pred = prediction(prob[, 2], actual value of dep var)
      performance(pred, "auc")
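
The two accuracy methods above can be sketched as follows, assuming the randomForest and ROCR packages are installed. The mtcars data set and its am column (used here as a two-class dependent variable) are stand-ins chosen only for illustration, and predictions are made on the training data just to keep the sketch short.

```r
library(randomForest)
library(ROCR)

set.seed(42)                       # bootstrap sampling is random

# Illustrative data: mtcars with `am` as a categorical dependent variable
df <- mtcars
df$am <- factor(df$am)

# ntree is set to its default of 500; mtry is left at its default
rf <- randomForest(am ~ ., data = df, ntree = 500)

# Method 1: confusion matrix and accuracy
pred_class <- predict(rf, df)
conf_mat   <- table(actual = df$am, predicted = pred_class)
accuracy   <- sum(diag(conf_mat)) / sum(conf_mat)

# Method 2: AUC with ROCR, using the predicted probability of class "1"
prob     <- predict(rf, df, type = "prob")
pred_obj <- prediction(prob[, "1"], df$am)
auc      <- performance(pred_obj, "auc")@y.values[[1]]

print(conf_mat)
cat("accuracy:", accuracy, " auc:", auc, "\n")
```

In practice the same calls would be run on held-out data, since accuracy on the training data is optimistic.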

RRF (Regression Random Forest)

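predict(modelname, data)
- predict returns the average of the predictions of all trees
- Cross validation results can be measured using RMSE
- RMSE :
    pred <- predict(model, data)
    res  <- actual - pred
    rmse <- sqrt(mean(res^2))
- res holds the residuals. Out of Bag (OOB) error is the same kind of
  error, measured for each observation using only the trees whose
  bootstrap sample left that observation out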
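
As a small worked example of the RMSE steps above, here is a sketch in base R. The helper name rmse() and the numbers are made up for illustration; in real use predicted would come from predict(model, data).

```r
# Hypothetical helper: root mean squared error from two numeric vectors
rmse <- function(actual, predicted) {
  res <- actual - predicted        # residuals
  sqrt(mean(res^2))
}

# Made-up values for illustration
actual    <- c(3, 5, 7, 9)
predicted <- c(2, 5, 8, 11)

# Residuals are 1, 0, -1, -2, so the result is sqrt(mean(c(1, 0, 1, 4)))
rmse(actual, predicted)
```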


SVM

  1. Classification
  2. Regression

Concepts :

  • Hyperplane

  • Support Vector

  • Rules for generating Hyperplane & Support Vector
    1. Hyperplane must classify the data accurately
    2. Hyperplane must be equidistant from the support vectors of the
       two classes (maximum margin)
    3. To avoid overfitting, the hyperplane can be allowed to
       misclassify some points; the cost parameter controls this
  • Data that is not linearly separable uses a kernel to map it into a
    higher-dimensional space where a separating hyperplane exists;
    this is called the kernel trick
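
The idea behind the kernel trick can be illustrated in base R with an explicit feature map. This is not an SVM fit: a real kernel computes the mapping implicitly. The toy data and the threshold 2.25 are made up for illustration.

```r
# One-dimensional toy data: the "outer" class sits on both sides of "inner"
x        <- c(-3, -2, -1, 0, 1, 2, 3)
is_outer <- abs(x) > 1.5

# In one dimension a hyperplane is a single threshold.
# Try every threshold between neighbouring points, in both directions.
thresholds   <- (head(x, -1) + tail(x, -1)) / 2
separable_1d <- any(sapply(thresholds, function(t) {
  all((x > t) == is_outer) || all((x < t) == is_outer)
}))

# Add one dimension: z = x^2. Now one threshold on z separates the classes.
z               <- x^2
separable_added <- all((z > 2.25) == is_outer)

cat("separable in original space:", separable_1d, "\n")
cat("separable after adding x^2: ", separable_added, "\n")
```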


k-MEANS CLUSTERING

It is Unsupervised Learning
1. All Independent variables must be continuous
2. Scale Data
3. Works using the Euclidean distance formula
4. To get the optimum number of clusters calculate
- Within SS (WSS)
- Between SS (BSS)
- As a rule of thumb, the upper limit for k is sqrt of the total no of variables
5. Find the optimum number of clusters using the Elbow Method: plot WSS/BSS against k

cl_var <- kmeans(data, k)

cl_var$cluster holds the cluster assigned to every observation.
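
The steps above can be sketched in base R as follows. The iris measurements, the range of k and the final choice of k = 3 are assumptions for illustration; the plot uses total within SS, the most common form of the elbow plot.

```r
set.seed(123)                              # kmeans starts from random centres

# Steps 1-2: continuous variables only, scaled
data_scaled <- scale(iris[, 1:4])

# Step 4: total within-cluster SS for a range of k
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(data_scaled, centers = k, nstart = 20)$tot.withinss
})

# Step 5: elbow plot; look for the k after which WSS stops dropping sharply
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k", ylab = "Total within-cluster SS")

# Fit with the chosen k (3 is assumed here) and read the assignments
cl_var <- kmeans(data_scaled, centers = 3, nstart = 20)
table(cl_var$cluster)
```

cl_var$tot.withinss and cl_var$betweenss hold the within and between SS of the final fit, and they add up to cl_var$totss.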


MARKET BASKET ANALYSIS

  1. Support
  2. Confidence
  3. Lift
  4. Convert data to the transactions type
  5. Generate rules with apriori() from the arules package
  6. Sort and filter rules using confidence, support, lift.

inspect(rules[1:n])
plot(rules)    (after loading the arulesViz package)
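
To make support, confidence and lift concrete, here is a worked example in base R for the single rule bread -> butter. The baskets and the helper name support() are made up for illustration; arules computes the same measures for every rule it generates.

```r
# Five made-up transactions
baskets <- list(
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("bread", "milk"),
  c("milk", "eggs"),
  c("bread", "butter", "eggs")
)

# Hypothetical helper: share of baskets that contain all the given items
support <- function(items) {
  mean(sapply(baskets, function(b) all(items %in% b)))
}

# Rule: bread -> butter
supp_rule <- support(c("bread", "butter"))     # 3 of 5 baskets
conf_rule <- supp_rule / support("bread")      # of baskets with bread, share with butter
lift_rule <- conf_rule / support("butter")     # confidence relative to butter's base rate

cat("support:", supp_rule, " confidence:", conf_rule, " lift:", lift_rule, "\n")
```

A lift above 1 means the two items occur together more often than they would if they were independent.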


CROSS VALIDATION

  1. Hold Out
  2. k-Fold
  3. Repeated k-Fold
  4. LOOCV

HOLD OUT

  • Create a data partition with a 75:25 train:test split
  • Generate the model on train data
  • Calculate RMSE / classification table on train data
  • Apply the model on test data
  • Calculate RMSE / classification table on test data
  • Compare train RMSE with test RMSE
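
The hold-out steps can be sketched in base R as follows. The mtcars data, the lm() model and the rmse() helper are assumptions for illustration, and sample() stands in for a partition function such as caret's createDataPartition.

```r
set.seed(1)

# 75:25 split by random row indices
n         <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.75 * n))
train     <- mtcars[train_idx, ]
test      <- mtcars[-train_idx, ]

# Model is generated on train data only
model <- lm(mpg ~ wt + hp, data = train)

# Hypothetical helper for RMSE
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

rmse_train <- rmse(train$mpg, predict(model, train))
rmse_test  <- rmse(test$mpg,  predict(model, test))

cat("train RMSE:", rmse_train, " test RMSE:", rmse_test, "\n")
```

A test RMSE far above the train RMSE suggests the model is overfitting.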

k-FOLD

  • Choose k (no of folds)
  • trainControl to define the folds
  • train to apply the algorithm on the folds
  • Average RMSE (regression) / average accuracy (classification) of all folds
  • Compare average RMSE / average accuracy with whole-data RMSE / accuracy
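
To show what happens inside k-fold, here is a manual sketch in base R that does not use caret. The mtcars data, the lm() model, k = 5 and the rmse() helper are assumptions for illustration.

```r
set.seed(1)
k <- 5
n <- nrow(mtcars)

# Assign every row to one of k folds at random
fold_id <- sample(rep(1:k, length.out = n))

# Hypothetical helper for RMSE
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Each fold is the test set once; the other folds are the train set
fold_rmse <- sapply(1:k, function(i) {
  train <- mtcars[fold_id != i, ]
  test  <- mtcars[fold_id == i, ]
  model <- lm(mpg ~ wt + hp, data = train)
  rmse(test$mpg, predict(model, test))
})

avg_rmse <- mean(fold_rmse)

# RMSE of the same model fitted and evaluated on the whole data
full_model <- lm(mpg ~ wt + hp, data = mtcars)
full_rmse  <- rmse(mtcars$mpg, predict(full_model, mtcars))

cat("average fold RMSE:", avg_rmse, " whole-data RMSE:", full_rmse, "\n")
```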

REPEATED k-FOLD

  • Repeats k-fold several times with different random folds and
    averages the results

LOOCV

  • Leaves one observation out as the test set, trains on the rest,
    and repeats this for every observation
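
A minimal LOOCV sketch in base R, again assuming mtcars and an lm() model for illustration. The second calculation is a cross-check that holds for linear models only, where the leave-one-out error equals residual / (1 - leverage).

```r
n <- nrow(mtcars)

# Leave each observation out once, fit on the rest, predict the one left out
loo_pred <- sapply(1:n, function(i) {
  model <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
  predict(model, mtcars[i, , drop = FALSE])
})
loocv_rmse <- sqrt(mean((mtcars$mpg - loo_pred)^2))

# Cross-check without refitting (linear models only)
full_model    <- lm(mpg ~ wt + hp, data = mtcars)
loo_res       <- residuals(full_model) / (1 - hatvalues(full_model))
shortcut_rmse <- sqrt(mean(loo_res^2))

cat("LOOCV RMSE:", loocv_rmse, " shortcut:", shortcut_rmse, "\n")
```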