Multi-Class Classification Strategies in Scikit-learn

Multi-class classification is a supervised learning task where the goal is to assign each input sample to one of three or more classes. Scikit-learn provides several strategies to handle multi-class problems, including One-vs-Rest (OvR), One-vs-One (OvO), and native multiclass classifiers like RandomForestClassifier or LogisticRegression.

Key Characteristics

  • Handles more than two class labels
  • Supports One-vs-Rest (OvR) and One-vs-One (OvO) strategies
  • Can use native classifiers or meta-estimators
  • Works with both linear and nonlinear models

Basic Rules

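  • Choose OvR for efficiency on large datasets: it trains only N classifiers for N classes
  • OvO may work better for models that scale poorly with sample count, such as kernel SVMs, because each classifier sees only two classes' samples
  • Inspect the confusion matrix to understand per-class performance
  • Use a stratified train-test split so both splits preserve the original class proportions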
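
As a minimal sketch of the stratified-split rule, the snippet below compares class shares in the training and test splits. The Iris dataset and the variable names are assumptions chosen for illustration; any labeled dataset would do.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y asks the splitter to keep each class's share the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fraction of samples belonging to each class, per split
train_share = np.bincount(y_train) / len(y_train)
test_share = np.bincount(y_test) / len(y_test)
print("train:", train_share)
print("test: ", test_share)
```

Without stratify, the shares in a small test split can drift away from those of the full dataset, which makes per-class metrics harder to compare.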

Syntax Table

SL NO | Technique       | Syntax Example                            | Description
1     | One-vs-Rest     | OneVsRestClassifier(LogisticRegression()) | Trains one classifier per class vs all others
2     | One-vs-One      | OneVsOneClassifier(SVC())                 | Trains one classifier per class pair
3     | Native Support  | RandomForestClassifier()                  | Native support for multi-class
4     | Fit Model       | model.fit(X_train, y_train)               | Trains the chosen classifier
5     | Predict Classes | model.predict(X_test)                     | Returns predicted class labels

Syntax Explanation

1. One-vs-Rest (OvR)

What is it?
A strategy that fits one classifier per class, where each classifier distinguishes a class from all others.

Syntax:

from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
model = OneVsRestClassifier(LogisticRegression())

Explanation:

  • Suitable for linear models
  • Efficient on large datasets
  • Each classifier outputs a confidence score; highest score wins
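
To illustrate the "highest score wins" rule, here is a small sketch that fits OvR on the Iris dataset (an assumed example dataset) and reproduces predict by hand from the per-class scores. The names ovr, scores, and manual are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# One fitted binary classifier per class
print(len(ovr.estimators_))

# Each column holds one binary classifier's confidence score for its class
scores = ovr.decision_function(X)  # shape (n_samples, n_classes)

# Picking the class with the highest score should match predict()
manual = ovr.classes_[np.argmax(scores, axis=1)]
print(np.array_equal(manual, ovr.predict(X)))
```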

2. One-vs-One (OvO)

What is it?
A strategy that fits one classifier per class pair.

Syntax:

from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC
model = OneVsOneClassifier(SVC())

Explanation:

  • Builds N(N-1)/2 classifiers for N classes
  • Each classifier votes, and the class with the most votes wins
  • Can be effective when class boundaries are complex, since each classifier only has to separate two classes
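
The N(N-1)/2 count can be checked directly. The sketch below assumes the 10-class digits dataset bundled with scikit-learn; the variable names are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
ovo = OneVsOneClassifier(SVC()).fit(X, y)

n_classes = len(ovo.classes_)
expected_pairs = n_classes * (n_classes - 1) // 2

# One fitted binary classifier per unordered pair of classes
print(n_classes, len(ovo.estimators_), expected_pairs)
```

With 10 classes this gives 45 pairwise classifiers, compared with 10 for OvR, although each one is trained on a smaller subset of the data.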

3. Native Multi-Class Classifier

What is it?
Classifiers like Random Forest and Logistic Regression inherently support multi-class classification.

Syntax:

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()

Explanation:

  • No need to wrap with OvR or OvO
  • Handles non-linearity well; for imbalanced classes, consider setting class_weight="balanced"
  • Straightforward integration
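
A short sketch of native multi-class use, assuming the Iris dataset for illustration. The class_weight="balanced" setting is optional and shown only as one possible way to reweight classes; it is not required for multi-class support.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# No OvR/OvO wrapper: the forest accepts three class labels directly
rf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
rf.fit(X, y)

# predict_proba returns one probability column per class
proba = rf.predict_proba(X[:5])
print(rf.classes_)
print(proba.shape)
```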

4. Fit Model

What is it?
Trains the selected model on the labeled dataset.

Syntax:

model.fit(X_train, y_train)

Explanation:

  • Accepts feature matrix and label vector
  • Learns the decision boundaries between classes
  • Can be combined with grid search or pipelines
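
As one possible way to combine fit with a pipeline and grid search, the sketch below tunes the regularization strength C of a logistic regression. The dataset, the candidate C values, and the scaling step are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# make_pipeline names each step after its lowercased class name
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}

# fit() runs 5-fold cross-validation for every C, then refits the best pipeline
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
test_accuracy = search.score(X_test, y_test)
print(test_accuracy)
```

Keeping the scaler inside the pipeline means it is refit on each training fold, so no information from the validation fold leaks into preprocessing.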

5. Predict Classes

What is it?
Predicts the class labels for unseen test data.

Syntax:

predictions = model.predict(X_test)

Explanation:

  • Produces an array of predicted class labels
  • Useful for accuracy, confusion matrix, or F1 score evaluations
  • Can be used in real-time prediction systems
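
The evaluations mentioned above can be sketched as follows, again assuming the Iris dataset and a logistic regression model purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
predictions = model.predict(X_test)  # one predicted label per test row

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, predictions)
acc = accuracy_score(y_test, predictions)
# average="macro" weights every class equally, regardless of its size
macro_f1 = f1_score(y_test, predictions, average="macro")

print(cm)
print(acc, macro_f1)
```

The diagonal of the confusion matrix counts correct predictions per class, so off-diagonal entries show which classes are being confused with each other.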

Real-Life Project: Digit Recognition (scikit-learn digits dataset)

Project Overview

Use multi-class classification strategies to identify handwritten digits (0–9) from images.

Code Example

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load the bundled 8x8 digits dataset and make a stratified split
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, stratify=digits.target, random_state=42
)

# Train model with native multi-class support
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

Expected Output

  • Per-class precision, recall, and F1-scores
  • Overall accuracy of multi-class classifier

Common Mistakes to Avoid

  • ❌ Ignoring label distribution in train/test splits
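  • ❌ Using strictly binary classifiers (for example, a custom estimator) without wrapping them in OvR or OvO; scikit-learn's built-in classifiers already accept multi-class targets
  • ❌ Not evaluating the model with per-class metrics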
