Here’s your detailed Module 14 Study Guide: Ensembling —
Quantifying the World with Dr. Slater
Includes full coverage: visuals, math, coding, examples, and
end-of-module questions.
Title: Module 14 – Ensembling: Voting, Stacking, and
Real-World Performance
Section 1: What Is Ensembling?
Definition:
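Ensembling is a technique where multiple models (base learners, which may be weak or strong on their own) are
combined into a single predictor that is often more accurate than any individual member.
Two main types:
- Voting – Combines outputs of independently trained
models. A majority (hard) vote or a weighted, probability-averaged (soft) vote decides the output. Bagging is a related but distinct technique: it trains copies of one model on bootstrap resamples and then aggregates them.
- Stacking (Stacked Generalization) – A meta-model
learns to combine the predictions of base models.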
Section 2: Voting Ensemble — Concept and Math
From Slide 1:
- Models: A, B, C
- Each model gives binary predictions
- Final decision = majority vote
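Example Voting Table Recap (accuracy against True Values in parentheses):
Model A: [1, 0, 1, 1, 0, 1, 1, 0, 0, 0] (80%)
Model B: [1, 1, 0, 1, 1, 0, 1, 0, 1, 1] (40%)
Model C: [1, 0, 0, 0, 1, 1, 1, 1, 0, 1] (70%)
Vote Winner: [1, 0, 0, 1, 1, 1, 1, 0, 0, 1]
True Values: [1, 0, 0, 1, 0, 1, 1, 1, 0, 0]
Vote Accuracy = 70%
Note: these percentages are recomputed from the vectors as transcribed here. The notes originally recorded 60% for Model B and 90% for the vote, which these vectors do not reproduce, so re-check the vectors against Slide 1.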
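A minimal sketch for recomputing the table from its own vectors. The helper names `majority_vote` and `accuracy` are illustrative, not from the slides. Run on the vectors as listed, it reproduces the Vote Winner row and gives 80%, 40%, and 70% for Models A, B, and C and 70% for the vote, so any percentage in the table that differs is worth checking against the slide.

```python
from collections import Counter

# Prediction vectors copied from the voting table above.
model_a = [1, 0, 1, 1, 0, 1, 1, 0, 0, 0]
model_b = [1, 1, 0, 1, 1, 0, 1, 0, 1, 1]
model_c = [1, 0, 0, 0, 1, 1, 1, 1, 0, 1]
y_true = [1, 0, 0, 1, 0, 1, 1, 1, 0, 0]


def majority_vote(*prediction_lists):
    """Return the most common label at each position (hard voting)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*prediction_lists)]


def accuracy(y_pred, y_ref):
    """Fraction of positions where the prediction equals the reference label."""
    return sum(p == t for p, t in zip(y_pred, y_ref)) / len(y_ref)


vote = majority_vote(model_a, model_b, model_c)
print("Vote winner:", vote)
for name, preds in [("A", model_a), ("B", model_b), ("C", model_c), ("Vote", vote)]:
    print(f"{name}: {accuracy(preds, y_true):.0%}")
```

With three binary voters there can be no ties, so the mode is always well defined here.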
Mathematical Notation:
Let \(M_i(x)\) be the prediction of model \(i\) on input \(x\).
Voting ensemble prediction:
\[ \hat{y} = \text{mode}(M_1(x), M_2(x), \ldots, M_n(x)) \]
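The mode formula describes hard voting, while the VotingClassifier example in Section 5 uses voting='soft'. The sketch below shows how the two can disagree on a single input. The three probabilities are made-up numbers chosen for illustration, and the 0.5 threshold is an assumption for the binary case.

```python
from collections import Counter

# Hypothetical P(class = 1) from three models for ONE input x (made-up numbers).
probs = [0.9, 0.4, 0.4]

# Hard voting: each model commits to a label first, then we take the mode.
hard_labels = [1 if p >= 0.5 else 0 for p in probs]
hard_vote = Counter(hard_labels).most_common(1)[0][0]

# Soft voting: average the probabilities first, then threshold once.
mean_prob = sum(probs) / len(probs)
soft_vote = 1 if mean_prob >= 0.5 else 0

print("hard labels:", hard_labels, "-> hard vote:", hard_vote)
print(f"mean probability: {mean_prob:.3f} -> soft vote: {soft_vote}")
```

Here one very confident model outweighs two mildly opposed ones under soft voting, whereas hard voting counts each model equally.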
Section 3: Stacking — Ensemble Diagram
From Slide 2:
- Models A, B, and C are trained on original data.
- Their predictions are collected as features for training a new model
φ (meta-model).
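- Out-of-Fold (OOF): Prevents data leakage. Each base model is
trained on k-1 folds and predicts the held-out k-th fold; rotating through all k folds gives every training row a prediction from a model that never saw it, and these predictions
become the meta-model's training features.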
Mathematical Flow:
Let:
- \(h_i(x)\): Output of base model \(i\)
- Meta-model φ gets the vector \([h_1(x), h_2(x), \ldots, h_n(x)]\)
- φ is trained to predict the true label \(y\) from this vector.
\[ \phi([h_1(x), h_2(x), \ldots, h_n(x)]) \approx y \]
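The out-of-fold flow can be sketched by hand with scikit-learn's `cross_val_predict`. This is one possible reading of the slide, not the course's reference implementation: the choice of base models, the use of class probabilities as \(h_i(x)\), and the helper name `stack_predict` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
base_models = [GaussianNB(), KNeighborsClassifier()]

# h_i(x): out-of-fold class probabilities. Each row is predicted by a copy of
# the model that was trained on the other 4 folds, so no row sees its own label.
oof_blocks = [
    cross_val_predict(model, X, y, cv=5, method="predict_proba")
    for model in base_models
]
meta_features = np.hstack(oof_blocks)  # [h_1(x), h_2(x)] for every training row

# phi: the meta-model is trained on the out-of-fold predictions only.
meta_model = LogisticRegression(max_iter=1000).fit(meta_features, y)

# For new data, the base models are refit on the full training set.
fitted_base = [model.fit(X, y) for model in base_models]


def stack_predict(X_new):
    """Hypothetical helper: phi([h_1(x), h_2(x)]) for new rows."""
    features = np.hstack([model.predict_proba(X_new) for model in fitted_base])
    return meta_model.predict(features)


print(meta_features.shape)
print(stack_predict(X[:5]))
```

scikit-learn's `StackingClassifier` automates this same pattern, so the manual version is mainly useful for seeing where leakage would enter if the meta-model were trained on in-sample predictions instead.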
Section 4: Real-World Impact of Ensembling
From Slide 3:
- SQuAD2 Leaderboard (Nov 2020):
  - Top 6 models were all ensembles.
  - Best single model: Rank 7.
  - Ensembling raised the exact-match (EM) score, e.g. 89.55 (single) → 90.72 (ensemble).
Section 5: Code Examples
VotingClassifier (Sklearn)
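# Soft-voting ensemble of three different classifiers on the iris dataset
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf1 = LogisticRegression(max_iter=1000)
clf2 = DecisionTreeClassifier(random_state=0)  # fixed seed for repeatable results
clf3 = SVC(probability=True, random_state=0)   # probability=True enables predict_proba, which soft voting needs
# voting='soft' averages predicted class probabilities; 'hard' would take the majority label
ensemble = VotingClassifier(
    estimators=[('lr', clf1), ('dt', clf2), ('svc', clf3)],
    voting='soft',
)
# Mean accuracy over 5 cross-validation folds
scores = cross_val_score(ensemble, X, y, cv=5)
print(f'Voting Ensemble Accuracy: {scores.mean():.4f}')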
StackingClassifier (Sklearn)
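# Stacking: GaussianNB and KNN as base learners, logistic regression as meta-model
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
base_learners = [
    ('gnb', GaussianNB()),
    ('knn', KNeighborsClassifier()),
]
meta = LogisticRegression(max_iter=1000)
# cv=5 makes the meta-model train on out-of-fold predictions of the base learners
stack_model = StackingClassifier(estimators=base_learners, final_estimator=meta, cv=5)
# Outer 5-fold cross-validation estimates the accuracy of the whole stack
scores = cross_val_score(stack_model, X, y, cv=5)
print(f'Stacking Ensemble Accuracy: {scores.mean():.4f}')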
Section 6: Why It Works
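Bias-Variance Tradeoff:
- Voting mainly lowers variance: combining several models smooths out their individual errors.
- Stacking can lower both bias and variance when the base models are diverse, because the meta-model learns how much to trust each one.
Theorem (Simplified; this is Condorcet's jury theorem): For binary classification with majority voting, if n classifiers (n odd, n ≥ 3) each have the same error rate p < 0.5 and their errors are
independent, the ensemble's error is lower than p and shrinks toward 0 as n grows. Real models make correlated errors, so the gain in practice is smaller.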
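The theorem can be checked numerically. Under the independence assumption, the number of wrong votes among n classifiers is binomial, so for odd n the majority is wrong with probability

\[ P(\text{majority wrong}) = \sum_{k=(n+1)/2}^{n} \binom{n}{k} p^k (1-p)^{n-k} \]

A minimal sketch follows; the function name `majority_vote_error` and the error rate p = 0.3 are illustrative assumptions, not values from the module.

```python
from math import comb


def majority_vote_error(n, p):
    """P(majority of n independent voters is wrong), each wrong with probability p.

    Assumes n is odd so that ties cannot occur.
    """
    k_min = n // 2 + 1  # smallest number of wrong votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


p = 0.3  # illustrative individual error rate (an assumption)
for n in (1, 3, 5, 11, 21):
    print(f"n={n:2d}  majority-vote error = {majority_vote_error(n, p):.4f}")
```

The same function shows why the p < 0.5 condition matters: with p = 0.6, three voters are wrong more often (0.648) than one voter alone.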
Final Takeaways:
- Voting works well when models are independent and diverse.
- Stacking is more powerful but requires careful
cross-validation.
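- Ensembles often, but not always, outperform a single model: the gain depends on the base models being
reasonably accurate and making different errors, and a strong single model can beat a poorly matched ensemble.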
- Real-world systems (e.g., top NLP systems) use ensembles for SOTA
performance.
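Whether an ensemble helps on a given dataset is an empirical question, so it is worth scoring the base models and the ensemble on the same folds. The sketch below reuses the classifiers from the Section 5 voting example; the shuffled `StratifiedKFold` splitter and the fixed seeds are assumptions added for repeatability, and no particular ranking is implied.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

base = {
    "lr": LogisticRegression(max_iter=1000),
    "dt": DecisionTreeClassifier(random_state=0),
    "svc": SVC(probability=True, random_state=0),
}
models = dict(base)
models["soft_vote"] = VotingClassifier(estimators=list(base.items()), voting="soft")

# One shared splitter so every model is scored on identical folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {
    name: cross_val_score(model, X, y, cv=cv).mean()
    for name, model in models.items()
}

# Print models from best to worst mean accuracy.
for name, acc in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name:10s} {acc:.4f}")
```

On a small, easy dataset like iris the differences between models are typically small, so a comparison like this is more informative on the course's own datasets.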
Questions for Dr. Slater:
- When stacking, how do we avoid overfitting with the meta-model?
- Are there known limits to ensemble size before it becomes
computationally inefficient?
- How does ensemble diversity affect final accuracy in non-binary
classification?
- Can ensemble models be interpreted easily for explainability?
- Are there practical techniques for ensembling deep learning models
like ConvNets or RNNs?