Ensemble Learning For Predictive Models
Ensemble learning is a general meta-approach to machine learning that seeks better predictive performance by combining the predictions from multiple models. By the end of this post, you will have a strong base knowledge of ensemble methods and how to use them to improve your machine learning models.
After reading this post you should know the following:
- What are Ensemble Methods?
- Types of Ensemble Methods
- Bagging
- Boosting
- Stacking
- Advantages & Disadvantages of Using Ensembles
- Code Examples
What is Ensemble Learning For Predictive Models?
Ensemble Learning methods use multiple weak learners (such as decision trees) to create a stronger learner that is able to make more accurate predictions than any single model could on its own.
This is done by combining different models and then taking their average or weighted-average prediction, depending on the type of ensemble method used.
The idea behind ensembles is that each individual model has its own strengths and weaknesses; combined, they can produce a stronger overall result than any one model could achieve alone.
Combining several weaker models also helps reduce overfitting: each model brings something unique to the table, which improves generalization when making predictions on unseen data outside the training set used to build the models.
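As a minimal sketch of the idea (using made-up class probabilities from three hypothetical models), the combined prediction can be a plain average or a weighted average of the individual predictions:

import numpy as np

# Hypothetical class-1 probabilities from three models for five samples
preds = np.array([
    [0.70, 0.20, 0.55, 0.90, 0.40],   # model A
    [0.60, 0.35, 0.45, 0.80, 0.50],   # model B
    [0.65, 0.25, 0.60, 0.85, 0.30],   # model C
])

# Simple average: every model gets an equal say
avg_pred = preds.mean(axis=0)

# Weighted average: models we trust more (the weights here are assumptions) count more
weights = np.array([0.5, 0.2, 0.3])
weighted_pred = np.average(preds, axis=0, weights=weights)

print(avg_pred.round(3))
print(weighted_pred.round(3))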
Types of Ensemble Methods
There are three main types of ensemble methods:
Bagging Algorithms (Bootstrap Aggregating)
Bagging works by randomly sampling from the original dataset with replacement in order to create multiple datasets which can then be used for training separate classifiers/models independently from each other.
Because each classifier is trained on a different sample of the data, their individual predictions differ, but averaging them at inference time gives a better overall result than relying on any single classifier alone.
This also means we do not need to know in advance which individual model performs best: the averaging step smooths out their individual errors.
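A minimal sketch of the bootstrap step, assuming a generic toy feature matrix X and labels y: each resampled dataset is drawn from the original with replacement, so some rows appear several times and others not at all.

import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))        # toy feature matrix (100 rows, 5 features)
y = rng.integers(0, 2, size=100)     # toy binary labels

n_models = 10
bootstrap_sets = []
for _ in range(n_models):
    # Sample row indices with replacement: the heart of bagging
    idx = rng.integers(0, len(X), size=len(X))
    bootstrap_sets.append((X[idx], y[idx]))

# Each (X_boot, y_boot) pair would train one independent model whose predictions
# are then averaged (or majority-voted) at inference time.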
Boosting Algorithms
Boosting algorithms, on the other hand, work by sequentially adding new weak learners, each one trained to correct the mistakes of the ensemble built so far. Samples that were misclassified in previous iterations are given greater weight, which helps reduce the bias of the combined model (a simplified version of this reweighting loop is sketched after the examples below).
In summary:
- The main idea is to bias the training data toward the examples that are hard to predict
- Ensemble members are added iteratively, each one focusing on correcting the wrong predictions of prior models
- Each model is given a weight based on its performance
- The final prediction is a weighted combination of all models
Examples include AdaBoost and XGBoost (Extreme Gradient Boosting).
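The following is a rough, simplified illustration of the reweighting idea on a synthetic dataset (it doubles the weight of misclassified samples rather than using the exact AdaBoost update):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Start with uniform sample weights
w = np.full(len(y), 1.0 / len(y))

for round_no in range(3):
    # Fit a weak learner (a decision stump) on the weighted data
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)
    wrong = stump.predict(X) != y
    print(f"round {round_no}: weighted error = {w[wrong].sum():.3f}")

    # Increase the weight of misclassified samples so the next
    # weak learner focuses on them, then renormalize
    w[wrong] *= 2.0
    w /= w.sum()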
Stacking Algorithms
While bagging trains models on resampled subsets of the dataset and boosting adds new models sequentially, stacking trains several different weak learners in parallel and combines their predictions with a meta-learner.
In stacking:
- The whole, unchanged dataset is passed to all the models
- Different types of models are used
- A final (meta) model then learns how to best combine their predictions, as sketched below
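A minimal sketch of this two-level idea, assuming generic toy data X and y: out-of-fold predictions from the base models become the input features of the meta-learner (scikit-learn's StackingClassifier, used in the code examples below, automates exactly this):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

base_models = [DecisionTreeClassifier(), KNeighborsClassifier()]

# Level 0: out-of-fold predicted probabilities from each base model become new features
meta_features = np.column_stack([
    cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
    for model in base_models
])

# Level 1: the meta-learner learns how to best combine the base predictions
meta_model = LogisticRegression()
meta_model.fit(meta_features, y)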
Advantages & Disadvantages Of Using Ensembles
The biggest advantage of ensembles is improved accuracy: combining several weaker models into one strong learner produces more accurate predictions than any of the individual models could make on its own.
This can also mean less manual effort spent tuning the hyper-parameters of any single model once the initial setup is complete.
However, there are drawbacks. Ensembles increase computational complexity: instead of running a single algorithm, we have to train and run many models and store the intermediate outputs they generate before producing the final prediction. In addition, if we are not careful, ensembles can still overfit on high-dimensional datasets, especially when the number of features far exceeds the number of observations.
Code Examples
First, we import our libraries:
import numpy as np
import pandas as pd
from keras.datasets import mnist
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.metrics import classification_report, precision_recall_fscore_support
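The code below assumes MNIST training and test splits. Since the data-loading step is not shown here, one reasonable way to build it (the subsample size of 2,500 is an assumption, chosen to keep training time manageable) is to load MNIST through Keras, flatten the 28x28 images into 784-dimensional vectors, and scale them to [0, 1]:

# Load MNIST and build the train/test splits used below
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Flatten 28x28 images into 784-dimensional vectors and scale to [0, 1]
X_train = X_train.reshape(len(X_train), -1) / 255.0
X_test = X_test.reshape(len(X_test), -1) / 255.0

# Take a random subsample of the training set (size is illustrative)
rng = np.random.default_rng(0)
idx = rng.choice(len(X_train), size=2500, replace=False)
X_train_sample, y_train_sample = X_train[idx], y_train[idx]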
Next, we define and train some base classifiers that we will use in the ensembles:
# Decision Tree
classifier_DT = DecisionTreeClassifier()
classifier_DT.fit(X_train_sample, y_train_sample)
# SVM
classifier_SVC = SVC(kernel = 'sigmoid')
classifier_SVC.fit(X_train_sample, y_train_sample)
# Logistic regression
classifier_LR = LogisticRegression()
classifier_LR.fit(X_train_sample, y_train_sample)
# KNN
classifier_KNN = KNeighborsClassifier(n_neighbors = 245)
classifier_KNN.fit(X_train_sample, y_train_sample)
Then we define the bagging classifier, train it, and evaluate it:
# Define the Bagging classifier
# (older scikit-learn versions call this parameter base_estimator instead of estimator)
classifier_bag = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10)
# Train
classifier_bag.fit(X_train_sample, y_train_sample)
# Evaluate
y_pred_bag = classifier_bag.predict(X_test)
score_bag = round(precision_recall_fscore_support(y_test, y_pred_bag, average = 'weighted')[2]*100,2)
print(f"Bagging F1-score: {score_bag}%")
Or we define our stacking classifier, train it, and evaluate it:
estimators = [
    ('DT', DecisionTreeClassifier()),
    ('SVC', SVC(kernel='sigmoid')),
    ('LR', LogisticRegression()),
    ('KNN', KNeighborsClassifier(n_neighbors=245)),
]
classifier_stack = StackingClassifier(estimators = estimators, final_estimator=LogisticRegression(), cv=10)
# Train
classifier_stack.fit(X_train_sample, y_train_sample)
# Evaluate
y_pred_stack = classifier_stack.predict(X_test)
score_stack = round(precision_recall_fscore_support(y_test, y_pred_stack, average = 'weighted')[2]*100,2)
print(f"Stacking F1-score: {score_stack}%")
Or we define our boosting classifier (AdaBoost), train it, and evaluate it:
# Define the AdaBoost classifier
# (older scikit-learn versions call this parameter base_estimator instead of estimator)
classifier_Ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(), n_estimators=10)
# Train
classifier_Ada.fit(X_train_sample, y_train_sample)
# Evaluate
y_pred_Ada = classifier_Ada.predict(X_test)
score_Ada = round(precision_recall_fscore_support(y_test, y_pred_Ada, average = 'weighted')[2]*100,2)
print(f"AdaBoost F1-score: {score_Ada}%")