Partition Trees: A Study Guide with Mathematical and Coding
Representation
1. Introduction to Partition Trees
Definition
Partition trees, also known as decision trees, are hierarchical
models used for classification and regression. They recursively split
data into subsets based on feature values to create structured decision
paths.
Partition trees form the basis for advanced algorithms such as
Random Forests and XGBoost, making
them essential in modern data science.
2. Mathematical Representation of Partition
Trees
A partition tree works by selecting the best feature to split data at
each node. The selection is based on minimizing impurity, often measured
using Entropy or Gini Impurity.
Entropy-Based Split
To determine the best split: IG(S,A)=H(S)−∑v∈A|Sv||S|H(Sv) where: - IG(S,A) is the
Information Gain for splitting attribute A - H(S) is the entropy of the dataset S - Sv represents subsets after splitting by
A
Gini-Based Split
The Gini Impurity of a node is given by: G(X)=1−k∑i=1p2i where: - G(X) is the
impurity score - pi represents the
probability of each class in the node
The best split minimizes the weighted Gini impurity across resulting
partitions.
3. Partition Trees in Decision Trees
A decision tree model recursively partitions the feature space into
subregions.
- Stopping Criteria: The tree stops growing when:
- A predefined maximum depth is reached
- A minimum number of samples per node is met
- The split does not provide significant information gain
Example: Iris Dataset Split
A decision tree on the Iris dataset could split
based on petal length and width: -
If petal length < 2.45 → classify as Setosa
- If petal length > 2.45 and petal width < 1.75 →
classify as Versicolor - Otherwise, classify
as Virginica
4. Python Implementation of Partition Trees
Training a Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(criterion='gini', max_depth=3)
clf.fit(X_train, y_train)
print("Decision Tree Accuracy:", clf.score(X_test, y_test))
Visualizing the Tree
from sklearn import tree
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
tree.plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()
5. Key Takeaways
- Partition Trees use recursive binary splits to
classify data.
- Splitting criteria include Entropy and Gini
Impurity, chosen based on the dataset and computation
efficiency.
- Decision trees are prone to overfitting, requiring
hyperparameter tuning (e.g., max depth, min samples per leaf).
- Partition Trees are the foundation of ensemble
models such as Random Forests and Gradient Boosted Trees.
LS0tDQp0aXRsZTogIlBhcnRpdGlvbiBUcmVlcyINCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KLS0tDQoNCioqUGFydGl0aW9uIFRyZWVzOiBBIFN0dWR5IEd1aWRlIHdpdGggTWF0aGVtYXRpY2FsIGFuZCBDb2RpbmcgUmVwcmVzZW50YXRpb24qKg0KDQojIyAqKjEuIEludHJvZHVjdGlvbiB0byBQYXJ0aXRpb24gVHJlZXMqKg0KDQojIyMgKipEZWZpbml0aW9uKioNClBhcnRpdGlvbiB0cmVlcywgYWxzbyBrbm93biBhcyBkZWNpc2lvbiB0cmVlcywgYXJlIGhpZXJhcmNoaWNhbCBtb2RlbHMgdXNlZCBmb3IgY2xhc3NpZmljYXRpb24gYW5kIHJlZ3Jlc3Npb24uIFRoZXkgcmVjdXJzaXZlbHkgc3BsaXQgZGF0YSBpbnRvIHN1YnNldHMgYmFzZWQgb24gZmVhdHVyZSB2YWx1ZXMgdG8gY3JlYXRlIHN0cnVjdHVyZWQgZGVjaXNpb24gcGF0aHMuDQoNClBhcnRpdGlvbiB0cmVlcyBmb3JtIHRoZSBiYXNpcyBmb3IgYWR2YW5jZWQgYWxnb3JpdGhtcyBzdWNoIGFzICoqUmFuZG9tIEZvcmVzdHMqKiBhbmQgKipYR0Jvb3N0KiosIG1ha2luZyB0aGVtIGVzc2VudGlhbCBpbiBtb2Rlcm4gZGF0YSBzY2llbmNlLg0KDQotLS0NCg0KIyMgKioyLiBNYXRoZW1hdGljYWwgUmVwcmVzZW50YXRpb24gb2YgUGFydGl0aW9uIFRyZWVzKioNCg0KQSBwYXJ0aXRpb24gdHJlZSB3b3JrcyBieSBzZWxlY3RpbmcgdGhlIGJlc3QgZmVhdHVyZSB0byBzcGxpdCBkYXRhIGF0IGVhY2ggbm9kZS4gVGhlIHNlbGVjdGlvbiBpcyBiYXNlZCBvbiBtaW5pbWl6aW5nIGltcHVyaXR5LCBvZnRlbiBtZWFzdXJlZCB1c2luZyAqKkVudHJvcHkqKiBvciAqKkdpbmkgSW1wdXJpdHkqKi4NCg0KIyMjICoqRW50cm9weS1CYXNlZCBTcGxpdCoqDQpUbyBkZXRlcm1pbmUgdGhlIGJlc3Qgc3BsaXQ6DQpcWw0KSUcoUywgQSkgPSBIKFMpIC0gXHN1bV97diBcaW4gQX0gXGZyYWN7fFNfdnx9e3xTfH0gSChTX3YpDQpcXQ0Kd2hlcmU6DQotIFwoIElHKFMsIEEpIFwpIGlzIHRoZSAqKkluZm9ybWF0aW9uIEdhaW4qKiBmb3Igc3BsaXR0aW5nIGF0dHJpYnV0ZSBcKCBBIFwpDQotIFwoIEgoUykgXCkgaXMgdGhlIGVudHJvcHkgb2YgdGhlIGRhdGFzZXQgXCggUyBcKQ0KLSBcKCBTX3YgXCkgcmVwcmVzZW50cyBzdWJzZXRzIGFmdGVyIHNwbGl0dGluZyBieSBcKCBBIFwpDQoNCiMjIyAqKkdpbmktQmFzZWQgU3BsaXQqKg0KVGhlIEdpbmkgSW1wdXJpdHkgb2YgYSBub2RlIGlzIGdpdmVuIGJ5Og0KXFsNCkcoWCkgPSAxIC0gXHN1bV97aT0xfV57a30gcF9pXjINClxdDQp3aGVyZToNCi0gXCggRyhYKSBcKSBpcyB0aGUgaW1wdXJpdHkgc2NvcmUNCi0gXCggcF9pIFwpIHJlcHJlc2VudHMgdGhlIHByb2JhYmlsaXR5IG9mIGVhY2ggY2xhc3MgaW4gdGhlIG5vZGUNCg0KVGhlIGJlc3Qgc3BsaXQgbWluaW1pemVzIHRoZSB3ZWlnaHRlZCBHaW5pIGltcHVyaXR5IGFjcm9zcyByZXN1bHRpbmcgcGFydGl0aW9ucy4NCg0KLS0tDQoNCiMjICoqMy4gUGFydGl0aW9uIFRyZWVzIGluIERlY2lzaW9uIFRyZWVzKioNCg0KQSBkZWNpc2lvbiB0cmVlIG1vZGVsIHJlY3Vyc2l2ZWx5IHBhcnRpdGlvbnMgdGhlIGZlYXR1cmUgc3BhY2UgaW50byBzdWJyZWdpb25zLg0KDQotICoqU3RvcHBpbmcgQ3JpdGVyaWEqKjogVGhlIHRyZWUgc3RvcHMgZ3Jvd2luZyB3aGVuOg0KICAtIEEgcHJlZGVmaW5lZCBtYXhpbXVtIGRlcHRoIGlzIHJlYWNoZWQNCiAgLSBBIG1pbmltdW0gbnVtYmVyIG9mIHNhbXBsZXMgcGVyIG5vZGUgaXMgbWV0DQogIC0gVGhlIHNwbGl0IGRvZXMgbm90IHByb3ZpZGUgc2lnbmlmaWNhbnQgaW5mb3JtYXRpb24gZ2Fpbg0KDQojIyMgKipFeGFtcGxlOiBJcmlzIERhdGFzZXQgU3BsaXQqKg0KQSBkZWNpc2lvbiB0cmVlIG9uIHRoZSAqKklyaXMgZGF0YXNldCoqIGNvdWxkIHNwbGl0IGJhc2VkIG9uICoqcGV0YWwgbGVuZ3RoKiogYW5kICoqd2lkdGgqKjoNCi0gKipJZiBwZXRhbCBsZW5ndGggPCAyLjQ1Kiog4oaSIGNsYXNzaWZ5IGFzICpTZXRvc2EqDQotICoqSWYgcGV0YWwgbGVuZ3RoID4gMi40NSBhbmQgcGV0YWwgd2lkdGggPCAxLjc1Kiog4oaSIGNsYXNzaWZ5IGFzICpWZXJzaWNvbG9yKg0KLSAqKk90aGVyd2lzZSoqLCBjbGFzc2lmeSBhcyAqVmlyZ2luaWNhKg0KDQotLS0NCg0KIyMgKio0LiBQeXRob24gSW1wbGVtZW50YXRpb24gb2YgUGFydGl0aW9uIFRyZWVzKioNCg0KIyMjICoqVHJhaW5pbmcgYSBEZWNpc2lvbiBUcmVlIENsYXNzaWZpZXIqKg0KYGBgcHl0aG9uDQpmcm9tIHNrbGVhcm4udHJlZSBpbXBvcnQgRGVjaXNpb25UcmVlQ2xhc3NpZmllcg0KZnJvbSBza2xlYXJuLmRhdGFzZXRzIGltcG9ydCBsb2FkX2lyaXMNCmZyb20gc2tsZWFybi5tb2RlbF9zZWxlY3Rpb24gaW1wb3J0IHRyYWluX3Rlc3Rfc3BsaXQNCg0KIyBMb2FkIGRhdGFzZXQNCmlyaXMgPSBsb2FkX2lyaXMoKQ0KWF90cmFpbiwgWF90ZXN0LCB5X3RyYWluLCB5X3Rlc3QgPSB0cmFpbl90ZXN0X3NwbGl0KGlyaXMuZGF0YSwgaXJpcy50YXJnZXQsIHRlc3Rfc2l6ZT0wLjIsIHJhbmRvbV9zdGF0ZT00MikNCg0KIyBUcmFpbiBEZWNpc2lvbiBUcmVlIENsYXNzaWZpZXINCmNsZiA9IERlY2lzaW9uVHJlZUNsYXNzaWZpZXIoY3JpdGVyaW9uPSdnaW5pJywgbWF4X2RlcHRoPTMpDQpjbGYuZml0KFhfdHJhaW4sIHlfdHJhaW4pDQoNCnByaW50KCJEZWNpc2lvbiBUcmVlIEFjY3VyYWN5OiIsIGNsZi5zY29yZShYX3Rlc3QsIHlfdGVzdCkpDQpgYGANCg0KIyMjICoqVmlzdWFsaXppbmcgdGhlIFRyZWUqKg0KYGBgcHl0aG9uDQpmcm9tIHNrbGVhcm4gaW1wb3J0IHRyZWUNCmltcG9ydCBtYXRwbG90bGliLnB5cGxvdCBhcyBwbHQNCg0KcGx0LmZpZ3VyZShmaWdzaXplPSgxMiwgNikpDQp0cmVlLnBsb3RfdHJlZShjbGYsIGZlYXR1cmVfbmFtZXM9aXJpcy5mZWF0dXJlX25hbWVzLCBjbGFzc19uYW1lcz1pcmlzLnRhcmdldF9uYW1lcywgZmlsbGVkPVRydWUpDQpwbHQuc2hvdygpDQpgYGANCg0KLS0tDQoNCiMjICoqNS4gS2V5IFRha2Vhd2F5cyoqDQoxLiAqKlBhcnRpdGlvbiBUcmVlcyB1c2UgcmVjdXJzaXZlIGJpbmFyeSBzcGxpdHMqKiB0byBjbGFzc2lmeSBkYXRhLg0KMi4gKipTcGxpdHRpbmcgY3JpdGVyaWEgaW5jbHVkZSBFbnRyb3B5IGFuZCBHaW5pIEltcHVyaXR5KiosIGNob3NlbiBiYXNlZCBvbiB0aGUgZGF0YXNldCBhbmQgY29tcHV0YXRpb24gZWZmaWNpZW5jeS4NCjMuICoqRGVjaXNpb24gdHJlZXMgYXJlIHByb25lIHRvIG92ZXJmaXR0aW5nKiosIHJlcXVpcmluZyBoeXBlcnBhcmFtZXRlciB0dW5pbmcgKGUuZy4sIG1heCBkZXB0aCwgbWluIHNhbXBsZXMgcGVyIGxlYWYpLg0KNC4gKipQYXJ0aXRpb24gVHJlZXMgYXJlIHRoZSBmb3VuZGF0aW9uIG9mIGVuc2VtYmxlIG1vZGVscyoqIHN1Y2ggYXMgUmFuZG9tIEZvcmVzdHMgYW5kIEdyYWRpZW50IEJvb3N0ZWQgVHJlZXMuDQoNCg==