RANDOM FOREST

CRF (Classification Random Forest)

Dependent variable is categorical
Bootstrap sampling with replacement
Generate a tree on every bootstrap sample
At every split a random subset of variables is considered; importance is calculated from the out-of-bag observations
By default no. of trees = 500; this can be optimised by plotting the rf model (error vs no. of trees)
By default mtry = sqrt(no. of independent variables)
mtry >> no. of variables randomly selected at every split for decision making
Predict classification value using predict(modelname , data)
For accuracy - methods
1. Generate confusion matrix
2. Load ROCR
  prob = predict(modelname, data, type = "prob")
  prediction(prob of the positive class, actual value of dep var)
  performance(prediction variable, "auc")
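The classification steps above can be sketched as follows. This is a minimal example, not a definitive workflow: it assumes the randomForest and ROCR packages are installed, and the two-class subset of iris, the 75:25 split and all object names (dat, rf, conf_mat, auc) are illustrative choices that do not come from these notes.

```r
library(randomForest)
library(ROCR)

set.seed(42)

# Two-class subset of iris, so a single AUC value is defined (illustrative data)
dat <- droplevels(subset(iris, Species != "setosa"))

# Stratified 75:25 split so both classes appear in the test set
idx <- unlist(lapply(split(seq_len(nrow(dat)), dat$Species),
                     function(i) sample(i, round(0.75 * length(i)))))
train <- dat[idx, ]
test  <- dat[-idx, ]

# ntree = 500 is the package default; mtry is left at its default
rf <- randomForest(Species ~ ., data = train, ntree = 500, importance = TRUE)
# plot(rf)        # uncomment to see error against the number of trees
# importance(rf)  # uncomment to see variable importance

# Method 1: confusion matrix and accuracy
pred_class <- predict(rf, newdata = test)
conf_mat   <- table(Predicted = pred_class, Actual = test$Species)
accuracy   <- sum(diag(conf_mat)) / sum(conf_mat)

# Method 2: AUC with ROCR, using the probability of the second class level
prob     <- as.numeric(predict(rf, newdata = test, type = "prob")[, "virginica"])
pred_obj <- prediction(prob, test$Species)
auc      <- performance(pred_obj, "auc")@y.values[[1]]

print(conf_mat)
print(c(accuracy = accuracy, auc = auc))
```

ROCR treats the second factor level as the positive class, which is why the probability column for "virginica" is passed here.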

RRF (Regression Random Forest)

predict(modelname, data)
- predict generates the avg value across all trees
- Model can be validated using RMSE
- RMSE :
- pred <- predict(model, data)
- res <- actual - pred (residual; predict(model) without data gives out-of-bag predictions, and the error from those is the Out of Bag Error)
- rmse <- sqrt(mean(res^2))
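A small sketch of the RMSE calculation for a regression forest, assuming the randomForest package is installed. The mtcars data and the names rf_reg, rmse_oob and rmse_fit are illustrative only.

```r
library(randomForest)

set.seed(42)

# Regression forest: the dependent variable (mpg) is numeric
rf_reg <- randomForest(mpg ~ ., data = mtcars, ntree = 500)

# predict() without newdata returns out-of-bag predictions
pred_oob <- predict(rf_reg)
res      <- mtcars$mpg - pred_oob      # residual = actual - predicted
rmse_oob <- sqrt(mean(res^2))          # out-of-bag RMSE

# Predicting the training rows themselves gives an optimistic (lower) RMSE
pred_fit <- predict(rf_reg, newdata = mtcars)
rmse_fit <- sqrt(mean((mtcars$mpg - pred_fit)^2))

print(c(rmse_oob = rmse_oob, rmse_fit = rmse_fit))
```

The out-of-bag figure is the more honest of the two, since each observation is predicted only by trees that did not see it.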

SVM

Classification
Regression

Concepts :

Hyperplane
Support Vector
Rules for generating Hyperplane & Support Vector
1. Hyperplane must classify the data accurately
2. Hyperplane must be equidistant from SV of two classes
3. To avoid overfitting, the hyperplane can be allowed some misclassification, controlled by a cost parameter
Data that is not linearly separable uses a kernel to map it into a
higher dimension where a separating hyperplane exists; this is called the kernel trick
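A minimal sketch of an SVM classifier with a non-linear kernel and a cost setting, assuming the e1071 package is installed (these notes do not name a package for SVM, so this is one possible choice). The iris data and object names are illustrative.

```r
library(e1071)

set.seed(42)

# Radial kernel handles data that is not linearly separable;
# cost controls how heavily misclassified points are penalised
svm_fit <- svm(Species ~ ., data = iris, kernel = "radial", cost = 1)

# Number of support vectors that define the hyperplanes
n_sv <- svm_fit$tot.nSV

# Classification table on the data used for fitting
pred     <- predict(svm_fit, newdata = iris)
conf_mat <- table(Predicted = pred, Actual = iris$Species)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

print(n_sv)
print(conf_mat)
```

Raising cost fits the training data more tightly and lowering it gives a softer margin, so it is usually tuned rather than fixed.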

k-MEANS CLUSTERING

It is Unsupervised Learning
1. All independent variables must be continuous
2. Scale data
3. Works using the Euclidean distance formula
4. To get optimum clusters calculate
- Within SS
- Between SS
- Limiting value for k should be sqrt of total no. of variables
5. Calculate optimum clusters using the Elbow Method by plotting WSS (or WSS/BSS) against k

cl_var <- kmeans(data, k)

cl_var$cluster will assign cluster for every observation.
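The k-means steps above can be sketched in base R as follows. The iris measurements, the range of k tried and the final choice of k = 3 are illustrative assumptions.

```r
set.seed(42)

# Continuous variables only, scaled so no variable dominates the distance
scaled <- scale(iris[, 1:4])

# Within SS for a range of k, used for the elbow plot
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(scaled, centers = k, nstart = 20)$tot.withinss
})
# plot(k_values, wss, type = "b")  # uncomment to look for the elbow

# Final model with the chosen k
cl_var <- kmeans(scaled, centers = 3, nstart = 20)

# Within SS + Between SS = Total SS
ss_check <- cl_var$tot.withinss + cl_var$betweenss

# Cluster assigned to every observation
print(table(cl_var$cluster))
```

nstart = 20 reruns the algorithm from several random starts and keeps the best result, which makes the WSS curve less noisy.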

MARKET BASKET ANALYSIS

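Support >> share of transactions that contain the itemset
Confidence >> of the transactions containing A, the share that also contain B
Lift >> confidence of A => B divided by support of B (lift > 1 means A and B occur together more often than by chance)
Convert data to transactions type
Generate rules with arules (apriori)
Arrange rules using parameters for confidence, support, lift.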

inspect(rules[1:n])
library(arulesViz); plot(rules)
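A minimal sketch of the market basket steps, assuming the arules package is installed. The five baskets and the support and confidence thresholds are made up for illustration.

```r
library(arules)

# Toy baskets (illustrative data), one character vector per transaction
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk", "butter"),
  c("bread", "milk")
)

# Convert data to transactions type
trans <- as(baskets, "transactions")

# Generate rules above minimum support and confidence
rules <- apriori(trans,
                 parameter = list(support = 0.3, confidence = 0.6, minlen = 2),
                 control = list(verbose = FALSE))

# Arrange rules by lift and look at the top ones
rules_sorted <- sort(rules, by = "lift")
inspect(head(rules_sorted, 3))

# Support, confidence and lift for every rule as a data frame
rule_quality <- quality(rules_sorted)
```

minlen = 2 drops rules with an empty left-hand side, which are rarely useful for basket analysis.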

Cross Validation

Hold Out
k-Fold
Repeated k-Fold
LOOCV

HOLD OUT

Create data partition on data with 75:25 (train:test)
Generate model on train data
Calculate RMSE / Classification Table on train data
Apply model on test data
Calculate RMSE / Classification Table on test data
Compare RMSE train with RMSE test (a much higher test RMSE points to overfitting)
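The hold-out steps above can be sketched in base R. This uses sample() for the 75:25 partition instead of a dedicated partitioning function, and the mtcars data, the lm model and the rmse helper are illustrative assumptions.

```r
set.seed(42)

# 75:25 partition of the rows
n         <- nrow(mtcars)
train_idx <- sample(n, size = round(0.75 * n))
train     <- mtcars[train_idx, ]
test      <- mtcars[-train_idx, ]

# Generate model on train data only
fit <- lm(mpg ~ wt + hp, data = train)

# Hypothetical helper: root mean squared error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# RMSE on train data, then apply the model to test data
rmse_train <- rmse(train$mpg, predict(fit, newdata = train))
rmse_test  <- rmse(test$mpg,  predict(fit, newdata = test))

print(c(train = rmse_train, test = rmse_test))
```

With a data set this small the test RMSE depends heavily on which rows land in the test set, which is the main weakness of a single hold-out split.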

k Fold

Choose k
trainControl to define the folds
train to apply the algorithm on the folds
avg RMSE (regression) / avg Accuracy (classification) of all folds
Compare average RMSE / average Accuracy with whole data RMSE / Accuracy
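To show what happens inside k-fold, here is a manual base R sketch rather than the trainControl/train route. The choice of k = 5, the mtcars data and the lm model are illustrative assumptions.

```r
set.seed(42)

k <- 5
n <- nrow(mtcars)

# Randomly assign every row to one of k folds of near-equal size
folds <- sample(rep(seq_len(k), length.out = n))

# Train on k - 1 folds, test on the remaining fold, once per fold
fold_rmse <- sapply(seq_len(k), function(i) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
})

# Average RMSE of all folds
cv_rmse <- mean(fold_rmse)

# RMSE of the same model fitted on the whole data, for comparison
full_fit  <- lm(mpg ~ wt + hp, data = mtcars)
full_rmse <- sqrt(mean(residuals(full_fit)^2))

print(c(cv = cv_rmse, whole_data = full_rmse))
```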

Repeated k Fold

Repeats k-Fold several times with different random folds and averages the results

LOOCV

Leaves one out: k-Fold with k = no. of observations, so each observation is the test set once
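Repeated k-fold and LOOCV can both be sketched in base R. The helper cv_rmse, the 3 repeats of 5 folds, the mtcars data and the lm model are illustrative assumptions.

```r
# Hypothetical helper: one run of k-fold CV, returning the average fold RMSE
cv_rmse <- function(data, k) {
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  fold_rmse <- sapply(seq_len(k), function(i) {
    fit  <- lm(mpg ~ wt + hp, data = data[folds != i, ])
    test <- data[folds == i, ]
    sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
  })
  mean(fold_rmse)
}

set.seed(42)

# Repeated k-fold: 3 repeats of 5-fold CV, each with new random folds
repeated_rmse <- mean(replicate(3, cv_rmse(mtcars, k = 5)))

# LOOCV: fit on all rows but one, predict the row that was left out
loo_err <- sapply(seq_len(nrow(mtcars)), function(i) {
  fit <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
  mtcars$mpg[i] - predict(fit, newdata = mtcars[i, ])
})
loocv_rmse <- sqrt(mean(loo_err^2))

print(c(repeated_kfold = repeated_rmse, loocv = loocv_rmse))
```

LOOCV needs one model fit per observation, so it is practical for small data sets like this one but slow for large ones.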

Revision Notes

2019-07-12