Random Forest: A Study Guide with Mathematical and Coding Representation

1. Introduction to Random Forest

Definition

Random Forest is an ensemble learning technique that extends Bagging (Bootstrap Aggregation) by adding feature randomness in addition to data randomness. It trains multiple decision trees on different bootstrapped samples of the dataset and combines their predictions for improved performance and stability.

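Random Forest is widely used for both classification and regression tasks because it is robust and less prone to overfitting than a single decision tree. Support for missing data depends on the implementation, so check the library you use before relying on it.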


2. Mathematical Representation of Random Forest

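Random Forest works by training multiple decision trees and aggregating their outputs using:

  - Bootstrap Sampling:
      - Given a dataset \(D\) with \(N\) samples, multiple subsets \(D_b\) are created by randomly sampling with replacement.
      - Each subset is used to train an individual decision tree.

  - Feature Randomness:
      - Instead of considering all features at each split, only a random subset of features is considered.

  - Final Prediction:
      - For Regression: The final prediction is the average of all tree outputs:

        \[
        F(x) = \frac{1}{B} \sum_{b=1}^{B} f_b(x)
        \]

      - For Classification: The final prediction is the majority vote among all trees:

        \[
        F(x) = \operatorname{mode} \{ f_b(x) \}_{b=1}^{B}
        \]

where:

  - \(B\) is the total number of trees in the forest
  - \(f_b(x)\) is the prediction from the \(b\)-th tree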
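
The aggregation step can be checked against a fitted model. The sketch below assumes scikit-learn and uses the diabetes and iris toy datasets purely for illustration (neither the datasets nor the variable names come from the original text). It recomputes the average over the individual trees for regression and a hard majority vote for classification. Note that scikit-learn's classifier averages the trees' class probabilities rather than counting hard votes, so the two can differ when trees are not fully grown.

```python
import numpy as np
from sklearn.datasets import load_diabetes, load_iris
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Regression: the forest prediction is the mean of the per-tree predictions.
X_reg, y_reg = load_diabetes(return_X_y=True)
reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_reg, y_reg)
per_tree = np.stack([tree.predict(X_reg) for tree in reg.estimators_])  # shape (B, N)
manual_mean = per_tree.mean(axis=0)
print("Manual average matches predict():", np.allclose(manual_mean, reg.predict(X_reg)))

# Classification: count one vote per tree and keep the most common class.
X_clf, y_clf = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_clf, y_clf)
# Trees inside a fitted forest predict class indices as floats, so cast to int.
votes = np.stack([tree.predict(X_clf) for tree in clf.estimators_]).astype(int)
winning_index = np.array(
    [np.bincount(col, minlength=len(clf.classes_)).argmax() for col in votes.T]
)
hard_vote_labels = clf.classes_[winning_index]
agreement = np.mean(hard_vote_labels == clf.predict(X_clf))
print(f"Hard vote agrees with predict() on {agreement:.1%} of samples")
```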

By introducing randomness at both the data and feature levels, Random Forest decorrelates its trees, so averaging them reduces variance without a large increase in bias.
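
One standard way to make the variance claim concrete is the variance of an average of correlated predictors. The symbols \(\sigma^2\) and \(\rho\) are assumptions introduced here, not part of the original text: suppose each tree's prediction at \(x\) has variance \(\sigma^2\) and any two trees have pairwise correlation \(\rho\). Then:

\[
\operatorname{Var}\!\left( \frac{1}{B} \sum_{b=1}^{B} f_b(x) \right) = \rho\,\sigma^2 + \frac{1 - \rho}{B}\,\sigma^2
\]

As \(B\) grows, the second term shrinks toward zero and the variance approaches \(\rho\,\sigma^2\). Under these assumptions, adding trees helps only up to that floor, and feature randomness helps further by lowering \(\rho\).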


3. How Random Forest Builds on Decision Trees

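  - Bootstrap Sample Creation: Each tree is trained on a different randomly sampled subset.
  - Feature Selection: At each split, a random subset of features is considered instead of all features.
  - Independent Tree Training: Each tree is trained independently and makes predictions without interaction.
  - Aggregation: The final output is obtained via majority voting (classification) or averaging (regression).

This feature randomness helps improve generalization by reducing correlation among trees, making Random Forest more robust than a single decision tree.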
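
To show how bootstrap sampling, per-split feature randomness, independent training, and voting fit together, here is a minimal hand-rolled sketch. It assumes scikit-learn's `DecisionTreeClassifier` as the base learner and the iris dataset. The tree count of 25, the seeds, and all variable names are illustrative choices, not values from the original text, and this is not how a production library implements the algorithm.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

rng = np.random.default_rng(0)
n_trees = 25  # illustrative choice
n_train = len(X_train)
n_classes = len(np.unique(y))

trees = []
for _ in range(n_trees):
    # Data randomness: draw N row indices with replacement (a bootstrap sample).
    idx = rng.integers(0, n_train, size=n_train)
    # Feature randomness: max_features="sqrt" makes the tree consider a random
    # subset of the features at every split.
    tree = DecisionTreeClassifier(
        max_features="sqrt", random_state=int(rng.integers(0, 2**31 - 1))
    )
    # Independent training: each tree sees only its own bootstrap sample.
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Aggregation: each tree casts one vote and the most common label wins.
votes = np.stack([t.predict(X_test) for t in trees])  # shape (n_trees, n_test)
y_pred = np.array([np.bincount(col, minlength=n_classes).argmax() for col in votes.T])
manual_accuracy = np.mean(y_pred == y_test)
print(f"Accuracy of the hand-rolled forest: {manual_accuracy:.4f}")
```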


4. Python Implementation of Random Forest

Training a Random Forest Classifier

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Train a Random Forest Classifier
rf = RandomForestClassifier(n_estimators=100, max_features='sqrt', random_state=42)
rf.fit(X_train, y_train)

# Make predictions
y_pred_rf = rf.predict(X_test)
accuracy_rf = accuracy_score(y_test, y_pred_rf)

# Print results
print(f"Accuracy of Random Forest (100 trees): {accuracy_rf:.4f}")
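
A single 80/20 split of iris leaves only 30 test samples, so the accuracy above can shift noticeably with the split. As a hedged sketch, 5-fold cross-validation gives a steadier estimate and lets the forest be compared with a single decision tree. The fold count and the `models` dictionary are illustrative assumptions; on a dataset this small and easy, the two models may score about the same.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "single tree": DecisionTreeClassifier(random_state=42),
    "random forest": RandomForestClassifier(
        n_estimators=100, max_features="sqrt", random_state=42
    ),
}

cv_scores = {}
for name, model in models.items():
    # Accuracy on each of 5 stratified folds (the default for classifiers).
    scores = cross_val_score(model, X, y, cv=5)
    cv_scores[name] = scores
    print(f"{name}: {scores.mean():.4f} +/- {scores.std():.4f}")
```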

Feature Importance in Random Forest

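import matplotlib.pyplot as plt
import numpy as np

# Reuses `rf` and `iris` from the classifier example above.
# These are impurity-based importances; scikit-learn normalizes them to sum to 1.
feature_importances = rf.feature_importances_
features = np.array(iris.feature_names)

# Sort ascending so the most important feature is drawn at the top of the chart
order = np.argsort(feature_importances)

# Plot feature importances
plt.figure(figsize=(10, 5))
plt.barh(features[order], feature_importances[order], color='skyblue')
plt.xlabel("Feature Importance")
plt.ylabel("Feature")
plt.title("Feature Importance in Random Forest")
plt.tight_layout()
plt.show()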
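
The importances plotted above are computed from impurity reductions on the training data. A complementary check, sketched here under the assumption that scikit-learn's `permutation_importance` is available, shuffles one feature at a time on the held-out set and measures the drop in accuracy. The `n_repeats=10` setting is an illustrative choice, and the block refits the model so that it runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
rf.fit(X_train, y_train)

# Shuffle each feature 10 times on the test set and record the accuracy drop.
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)

# Print features from largest to smallest mean accuracy drop.
for i in result.importances_mean.argsort()[::-1]:
    print(
        f"{iris.feature_names[i]:<20} "
        f"{result.importances_mean[i]:.4f} +/- {result.importances_std[i]:.4f}"
    )
```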

5. Key Takeaways

  1. Random Forest extends Bagging by adding feature randomness, which decorrelates the trees, reduces overfitting, and improves generalization.
  2. Each tree is trained on a different bootstrap sample, which promotes diversity in predictions.
  3. Only a random subset of features is considered at each split, preventing dominant features from overshadowing others.
  4. Adding more trees does not by itself cause overfitting, because it only stabilizes the aggregated prediction. Very deep trees on noisy data can still overfit, so tree depth and leaf size remain worth tuning.
  5. Feature importance analysis helps identify the most relevant predictors, which makes it useful for feature selection. Impurity-based importances can favor features with many distinct values, so treat the ranking as a guide rather than a definitive ordering.

By using Random Forest, we can often improve predictive accuracy over a single decision tree while keeping the model robust.
