Voting Classifiers are ensemble methods that combine the predictions of several different models to improve overall performance. Scikit-learn provides VotingClassifier, which supports both hard voting (majority class prediction) and soft voting (averaging predicted class probabilities).
Key Characteristics
- Combines multiple classifiers
- Supports hard and soft voting
- Increases prediction stability
- Suitable for classification tasks
Basic Rules
- Use diverse base classifiers to maximize benefit.
- Use soft voting if all classifiers can predict probabilities.
- Ensure classifiers are well-tuned individually.
- Analyze the performance gain over the individual base models (see the sketch below).
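To make the last rule concrete, here is a minimal sketch, assuming the iris dataset and three common classifiers purely for illustration. It scores each base model with cross-validation, then scores the ensemble, so you can see whether voting actually helps:
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Illustrative dataset; substitute your own data.
X, y = load_iris(return_X_y=True)
estimators = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('dt', DecisionTreeClassifier(random_state=42)),
    ('nb', GaussianNB()),
]

# Cross-validated accuracy of each base model on its own...
for name, clf in estimators:
    print(name, cross_val_score(clf, X, y, cv=5).mean())

# ...and of the combined hard-voting ensemble.
ensemble = VotingClassifier(estimators=estimators, voting='hard')
print('ensemble', cross_val_score(ensemble, X, y, cv=5).mean())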
Syntax Table
SL NO | Technique | Syntax Example | Description |
---|---|---|---|
1 | Hard Voting | VotingClassifier(estimators=[...], voting='hard') | Majority class prediction |
2 | Soft Voting | VotingClassifier(estimators=[...], voting='soft') | Averages predicted probabilities |
Syntax Explanation
1. Hard Voting
What is it?
Combines predictions from each classifier and selects the majority class.
Syntax:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
model = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression()),
        ('dt', DecisionTreeClassifier()),
        ('svc', SVC())
    ],
    voting='hard'
)
Explanation:
- Each base model predicts a class.
- Final prediction is the class with the most votes.
- Does not require probability estimates.
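As a usage sketch, continuing with the model defined above and assuming a synthetic dataset from make_classification purely for illustration:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, used here only as a stand-in.
X, y = make_classification(n_samples=200, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model.fit(X_train, y_train)
print(model.predict(X_test[:5]))    # labels chosen by majority vote
print(model.score(X_test, y_test))  # overall accuracy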
2. Soft Voting
What is it?
Predicts the class label based on the average predicted probabilities from all models.
Syntax:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
model = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression()),
        ('nb', GaussianNB()),
        ('rf', RandomForestClassifier())
    ],
    voting='soft'
)
Explanation:
- Requires classifiers that implement the predict_proba() method.
- More nuanced than hard voting.
- Useful when classifiers differ in confidence.
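A similar usage sketch, again assuming synthetic data for illustration, shows that the soft-voting model exposes the averaged probabilities directly; the optional weights parameter of VotingClassifier also lets you give stronger models more influence:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model.fit(X_train, y_train)
print(model.predict_proba(X_test[:3]))  # per-class probabilities, averaged across models
print(model.predict(X_test[:3]))        # argmax of those averaged probabilities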
Real-Life Project: Voting Classifier on Breast Cancer Dataset
Project Name
Voting Classifier Comparison
Project Overview
Use multiple classifiers to predict breast cancer diagnosis.
Project Goal
Compare the accuracy of hard and soft voting ensembles.
Code for This Project
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Hard Voting
clf1 = LogisticRegression(max_iter=5000)  # raised from the default so the solver converges on the unscaled features
clf2 = RandomForestClassifier()
clf3 = GaussianNB()
hard_voting = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('nb', clf3)], voting='hard')
hard_voting.fit(X_train, y_train)
print("Hard Voting Accuracy:", accuracy_score(y_test, hard_voting.predict(X_test)))
# Soft Voting
soft_voting = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('nb', clf3)], voting='soft')
soft_voting.fit(X_train, y_train)
print("Soft Voting Accuracy:", accuracy_score(y_test, soft_voting.predict(X_test)))
Expected Output
- Accuracy scores for both hard and soft voting classifiers.
- Soft voting usually performs better if probabilities are well-calibrated.
Common Mistakes to Avoid
- Using soft voting with classifiers that don't support predict_proba() (see the check after this list).
- Using very similar models, which reduces the ensemble benefit.
- Ignoring individual model performance before ensembling.
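To guard against the first mistake, a small illustrative check (the variable names are arbitrary) confirms that every base estimator exposes predict_proba() before you request soft voting. SVC, for example, only provides it when constructed with probability=True:
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

candidates = [('lr', LogisticRegression()), ('nb', GaussianNB()), ('svc', SVC())]
for name, clf in candidates:
    # hasattr is False for SVC() because probability defaults to False.
    if not hasattr(clf, 'predict_proba'):
        print(name, "lacks predict_proba(); use voting='hard' or set probability=True")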
Further Reading Recommendation
Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning by Sarful Hassan
Available on Amazon