Split Data into Training and Testing Sets

# Set seed for reproducibility
set.seed(123)

# Load caret package
library(caret)

## Loading required package: ggplot2

## Loading required package: lattice

# Create a stratified 80/20 training/testing split
# (createDataPartition samples within each level of Species)
trainIndex <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
trainData <- iris[trainIndex, ]
testData <- iris[-trainIndex, ]
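Because createDataPartition samples within each level of the outcome, the split should keep the three species balanced in both sets. A minimal sketch to confirm this, repeating the split so the block runs on its own (the dim() and table() checks are an addition for illustration, not part of the original analysis):

```r
# Re-create the split from this section so the check is self-contained
library(caret)
set.seed(123)

trainIndex <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
trainData <- iris[trainIndex, ]
testData <- iris[-trainIndex, ]

# Row counts: 120 training rows and 30 testing rows
dim(trainData)
dim(testData)

# Class counts: 40 per species in training, 10 per species in testing
table(trainData$Species)
table(testData$Species)
```

If the classes were imbalanced, the same check would show whether the training and testing proportions still match those of the full data.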

Train a Random Forest Model

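# Train a Random Forest model (method = "rf" requires the randomForest package)
# With no trControl supplied, caret tunes mtry using bootstrap resampling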
model <- train(Species ~ ., data = trainData, method = "rf")

# View the trained model
print(model)

## Random Forest 
## 
## 120 samples
##   4 predictor
##   3 classes: 'setosa', 'versicolor', 'virginica' 
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 120, 120, 120, 120, 120, 120, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##   2     0.9474229  0.9201435
##   3     0.9455403  0.9172896
##   4     0.9462277  0.9182848
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
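The output above reflects caret's defaults: 25 bootstrap resamples and an automatically chosen mtry grid. As a hedged sketch, the same call can be given an explicit resampling scheme and tuning grid. The names `ctrl`, `grid`, and `cvModel` and the choice of 5-fold cross-validation are assumptions for illustration, not the settings used in this document, so the accuracies will differ from those printed above.

```r
library(caret)
set.seed(123)

# Same stratified split as in this document
trainIndex <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
trainData <- iris[trainIndex, ]

# Assumption: 5-fold cross-validation instead of the default bootstrap
ctrl <- trainControl(method = "cv", number = 5)

# Explicit grid over the same mtry values caret tried by default
grid <- expand.grid(mtry = 2:4)

cvModel <- train(Species ~ ., data = trainData, method = "rf",
                 trControl = ctrl, tuneGrid = grid)

# Resampled accuracy per mtry value and the value that was selected
print(cvModel$results[, c("mtry", "Accuracy", "Kappa")])
print(cvModel$bestTune)
```

Cross-validation is usually faster than 25 bootstrap fits per candidate; with only 120 training rows, either scheme gives a fairly noisy estimate.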

Predict Test Data

# Make predictions on the test data
predictions <- predict(model, newdata = testData)

# Compare predictions with actual values
results <- data.frame(Actual = testData$Species, Predicted = predictions)
print(results)

##        Actual  Predicted
## 1      setosa     setosa
## 2      setosa     setosa
## 3      setosa     setosa
## 4      setosa     setosa
## 5      setosa     setosa
## 6      setosa     setosa
## 7      setosa     setosa
## 8      setosa     setosa
## 9      setosa     setosa
## 10     setosa     setosa
## 11 versicolor versicolor
## 12 versicolor versicolor
## 13 versicolor versicolor
## 14 versicolor versicolor
## 15 versicolor versicolor
## 16 versicolor versicolor
## 17 versicolor versicolor
## 18 versicolor versicolor
## 19 versicolor versicolor
## 20 versicolor versicolor
## 21  virginica  virginica
## 22  virginica  virginica
## 23  virginica  virginica
## 24  virginica  virginica
## 25  virginica  virginica
## 26  virginica versicolor
## 27  virginica  virginica
## 28  virginica versicolor
## 29  virginica  virginica
## 30  virginica  virginica
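A row-by-row listing becomes hard to read as the test set grows; a confusion table summarizes the same information. The sketch below rebuilds the 30 actual and predicted labels from the printed output above (rows 26 and 28 are the two virginica samples predicted as versicolor) rather than refitting the model. The names `actual`, `predicted`, `tab`, and `recall` are illustrative.

```r
# Rebuild the labels shown in the printed results (10 test rows per species)
lv <- c("setosa", "versicolor", "virginica")
actual <- factor(rep(lv, each = 10), levels = lv)

# Predictions match the actual labels except rows 26 and 28
predicted <- actual
predicted[c(26, 28)] <- "versicolor"

# Confusion table: rows are predicted classes, columns are actual classes
tab <- table(Predicted = predicted, Actual = actual)
print(tab)

# Per-class recall: correct predictions divided by actual count per class
recall <- diag(tab) / colSums(tab)
print(recall)  # setosa 1.0, versicolor 1.0, virginica 0.8
```

With the fitted objects in the session, `table(Predicted = predictions, Actual = testData$Species)` would give the same table directly.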

Calculate Accuracy

# Calculate the accuracy of the model
accuracy <- sum(results$Actual == results$Predicted) / nrow(results)
cat("Test accuracy of the model:", accuracy, "\n")

## Test accuracy of the model: 0.9333333
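An accuracy computed from 30 test samples carries considerable uncertainty. As a rough sketch, an exact binomial confidence interval can be attached to it with binom.test from base R's stats package. This assumes the test predictions are independent; the counts 28 and 30 are taken from the results above, and the variable names are illustrative.

```r
# Counts taken from the printed results: 28 of 30 test samples were correct
correct <- 28
total <- 30

accuracy <- correct / total

# Exact (Clopper-Pearson) 95% confidence interval for a binomial proportion
ci <- binom.test(correct, total)$conf.int

cat("Test accuracy:", round(accuracy, 4), "\n")
cat("95% CI:", round(ci[1], 3), "to", round(ci[2], 3), "\n")
```

The width of the interval is a reminder that a different random split could give a noticeably different test accuracy.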

Summary of Results

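This document demonstrates splitting the iris data into stratified training (120 rows) and testing (30 rows) sets, training a Random Forest model with the caret package, and evaluating its predictions on the test set. The selected model (mtry = 2) classified 28 of 30 test samples correctly (accuracy 0.933); both errors were virginica samples predicted as versicolor.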

Train and Test Data Split with Random Forest

BIOC599

Sheng Li, PhD @ Keck USC

2025-01-21
