Voting Classifiers are ensemble methods that combine the predictions of several different models to improve overall performance. Scikit-learn provides VotingClassifier, which supports both hard voting (majority class prediction) and soft voting (averaging predicted class probabilities).
Key Characteristics
- Combines multiple classifiers
- Supports hard and soft voting
- Increases prediction stability
- Suitable for classification tasks
Basic Rules
- Use diverse base classifiers to maximize benefit.
- Use soft voting if all classifiers can predict probabilities.
- Ensure classifiers are well-tuned individually.
- Analyze the performance gain over the individual base models (see the sketch below).
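To make the last rule concrete, here is a minimal sketch, assuming the iris dataset and three common classifiers purely for illustration. It scores each base model with cross-validation, then scores the ensemble, so you can see whether voting actually helps:
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Illustrative dataset; substitute your own data.
X, y = load_iris(return_X_y=True)
estimators = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('dt', DecisionTreeClassifier(random_state=42)),
    ('nb', GaussianNB()),
]

# Cross-validated accuracy of each base model on its own...
for name, clf in estimators:
    print(name, cross_val_score(clf, X, y, cv=5).mean())

# ...and of the combined hard-voting ensemble.
ensemble = VotingClassifier(estimators=estimators, voting='hard')
print('ensemble', cross_val_score(ensemble, X, y, cv=5).mean())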
Syntax Table
SL NO | Technique | Syntax Example | Description |
---|---|---|---|
1 | Hard Voting | VotingClassifier(estimators=[...], voting='hard') | Majority class prediction |
2 | Soft Voting | VotingClassifier(estimators=[...], voting='soft') | Averages predicted probabilities |
Syntax Explanation
1. Hard Voting
What is it?
Combines predictions from each classifier and selects the majority class.
Syntax:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
model = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression()),
        ('dt', DecisionTreeClassifier()),
        ('svc', SVC())
    ],
    voting='hard'
)
Explanation:
- Each base model predicts a class.
- Final prediction is the class with the most votes.
- Does not require probability estimates.
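As a usage sketch, continuing with the model defined above and assuming a synthetic dataset from make_classification purely for illustration:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, used here only as a stand-in.
X, y = make_classification(n_samples=200, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model.fit(X_train, y_train)
print(model.predict(X_test[:5]))    # labels chosen by majority vote
print(model.score(X_test, y_test))  # overall accuracy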
2. Soft Voting
What is it?
Predicts the class label based on the average predicted probabilities from all models.
Syntax:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
model = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression()),
        ('nb', GaussianNB()),
        ('rf', RandomForestClassifier())
    ],
    voting='soft'
)
Explanation:
- Requires classifiers that implement the predict_proba() method.
- More nuanced than hard voting.
- Useful when classifiers differ in confidence.
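A similar usage sketch, again assuming synthetic data for illustration, shows that the soft-voting model exposes the averaged probabilities directly; the optional weights parameter of VotingClassifier also lets you give stronger models more influence:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model.fit(X_train, y_train)
print(model.predict_proba(X_test[:3]))  # per-class probabilities, averaged across models
print(model.predict(X_test[:3]))        # argmax of those averaged probabilities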
Real-Life Project: Voting Classifier on Breast Cancer Dataset
Project Name
Voting Classifier Comparison
Project Overview
Use multiple classifiers to predict breast cancer diagnosis.
Project Goal
Compare the accuracy of hard and soft voting ensembles.
Code for This Project
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Hard Voting
clf1 = LogisticRegression(max_iter=5000)  # raised from the default so the solver converges on the unscaled features
clf2 = RandomForestClassifier()
clf3 = GaussianNB()
hard_voting = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('nb', clf3)], voting='hard')
hard_voting.fit(X_train, y_train)
print("Hard Voting Accuracy:", accuracy_score(y_test, hard_voting.predict(X_test)))
# Soft Voting
soft_voting = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('nb', clf3)], voting='soft')
soft_voting.fit(X_train, y_train)
print("Soft Voting Accuracy:", accuracy_score(y_test, soft_voting.predict(X_test)))
Expected Output
- Accuracy scores for both hard and soft voting classifiers.
- Soft voting usually performs better if probabilities are well-calibrated.
Common Mistakes to Avoid
- Using soft voting with classifiers that don't support predict_proba() (see the check after this list).
- Using very similar models, which reduces the ensemble benefit.
- Ignoring individual model performance before ensembling.
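To guard against the first mistake, a small illustrative check (the variable names are arbitrary) confirms that every base estimator exposes predict_proba() before you request soft voting. SVC, for example, only provides it when constructed with probability=True:
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

candidates = [('lr', LogisticRegression()), ('nb', GaussianNB()), ('svc', SVC())]
for name, clf in candidates:
    # hasattr is False for SVC() because probability defaults to False.
    if not hasattr(clf, 'predict_proba'):
        print(name, "lacks predict_proba(); use voting='hard' or set probability=True")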
Further Reading Recommendation
Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning by Sarful Hassan
Available on Amazon