Bagging (Bootstrap Aggregating) and Boosting are two ensemble methods with distinct approaches to improving model performance. While both combine multiple models, Bagging builds them in parallel to reduce variance, whereas Boosting builds them sequentially to reduce bias.
## Key Differences
| Feature | Bagging | Boosting |
|---|---|---|
| Model Training | Parallel | Sequential |
| Focus | Reduces variance | Reduces bias |
| Model Independence | Independent learners | Dependent learners |
| Overfitting | Helps avoid overfitting | May overfit if not tuned |
| Example Algorithms | Random Forest, `BaggingClassifier` | AdaBoost, `GradientBoostingClassifier` |
## Syntax Comparison
### Bagging

**What is it?**
A parallel ensemble method that trains base learners on random bootstrap subsets of the training data.

**Syntax:**

```python
from sklearn.ensemble import BaggingClassifier

model = BaggingClassifier(n_estimators=10)
```
**Explanation:**
- Reduces variance by averaging predictions from many diverse models.
- Best suited to high-variance base learners, such as fully grown decision trees.
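To make the "random subsets" idea concrete, here is a minimal hand-rolled sketch of what bagging does: draw a bootstrap sample for each learner, train independently, then majority-vote. This is illustrative only (on a synthetic dataset); `BaggingClassifier` handles all of this for you.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_estimators = 10
votes = np.zeros((n_estimators, len(X_test)), dtype=int)

for i in range(n_estimators):
    # Bootstrap: sample training rows with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    votes[i] = tree.predict(X_test)

# Aggregate: majority vote across the ensemble
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("Ensemble accuracy:", (ensemble_pred == y_test).mean())
```

Because each tree sees a different bootstrap sample, their individual errors are partly uncorrelated, and averaging them cancels much of the variance.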
### Boosting

**What is it?**
A sequential ensemble method in which each new model focuses on the mistakes made by the previous ones.

**Syntax:**

```python
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
```
**Explanation:**
- Reduces bias by iteratively improving on weak learners.
- Effective on structured/tabular datasets.
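The iterative improvement can be observed directly: `GradientBoostingClassifier.staged_predict` yields the ensemble's predictions after each boosting stage, so you can watch test accuracy change as learners are added. The synthetic dataset below is for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, random_state=0
)
model.fit(X_train, y_train)

# staged_predict yields predictions after each boosting stage
accs = [(pred == y_test).mean() for pred in model.staged_predict(X_test)]
print(f"Accuracy after stage 1: {accs[0]:.3f}")
print(f"Accuracy after stage 100: {accs[-1]:.3f}")
```

Early stages behave like a single weak learner; later stages correct its residual errors, which is exactly the bias reduction the bullet above describes.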
## Real-Life Use Case

### Dataset

Binary classification on tabular data. The code below uses scikit-learn's built-in breast cancer dataset as a stand-in for a customer-churn-style prediction task.
### Code Example
```python
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_breast_cancer

# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Bagging: 50 independent learners trained on bootstrap samples
bagging = BaggingClassifier(n_estimators=50, random_state=42)
bagging.fit(X_train, y_train)
bag_pred = bagging.predict(X_test)

# Boosting: 100 sequential learners, each correcting its predecessors
boosting = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, random_state=42
)
boosting.fit(X_train, y_train)
boost_pred = boosting.predict(X_test)

# Results
print("Bagging Accuracy:", accuracy_score(y_test, bag_pred))
print("Boosting Accuracy:", accuracy_score(y_test, boost_pred))
```
## Expected Output
- Accuracy scores for both models, printed side by side for comparison.
- Boosting often edges out Bagging on clean, well-preprocessed tabular data, though the gap depends on tuning.
## Common Mistakes
- ❌ Not tuning `learning_rate` or `n_estimators` in Boosting.
- ❌ Using Boosting on small or noisy datasets, where it tends to overfit.
- ❌ Assuming Bagging always improves weak learners; it mainly helps high-variance ones.
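As a hedged sketch of how the first mistake might be addressed, `learning_rate` and `n_estimators` can be tuned jointly with `GridSearchCV` (the grid values below are illustrative, not recommendations):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# A small, illustrative grid; real searches would cover a wider range
param_grid = {
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    n_jobs=-1,
)
search.fit(X, y)

print("Best params:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```

The two parameters trade off against each other: a lower `learning_rate` usually needs more estimators to reach the same accuracy, which is why they should be tuned together rather than in isolation.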
## When to Use What?

| Scenario | Preferred Method |
|---|---|
| High variance, low bias | Bagging |
| High bias, complex data patterns | Boosting |
| Small dataset with noise | Bagging |
| Large structured/tabular dataset | Boosting |