Multi-class classification is a supervised learning task where the goal is to assign each input sample to one of three or more classes. Scikit-learn provides several strategies for handling multi-class problems, including One-vs-Rest (OvR), One-vs-One (OvO), and natively multi-class classifiers such as RandomForestClassifier or LogisticRegression.
Key Characteristics
- Handles more than two class labels
- Supports One-vs-Rest (OvR) and One-vs-One (OvO) strategies
- Can use native classifiers or meta-estimators
- Works with both linear and nonlinear models
Basic Rules
- Choose OvR when efficiency matters: it trains only one classifier per class, so it scales well to large datasets
- OvO trains each classifier on just two classes at a time, which can suit models with complex class boundaries or high per-sample training cost, such as SVMs
- Inspect the confusion matrix to understand per-class performance
- Use a stratified train-test split to preserve the class distribution (see the sketch after this list)
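As a minimal sketch of the last rule, assuming scikit-learn's built-in iris dataset as a stand-in three-class problem, stratify=y keeps each class's share the same in both splits:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y preserves per-class proportions in both the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
print(np.bincount(y_train), np.bincount(y_test))  # class counts stay proportional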
Syntax Table
SL NO | Technique | Syntax Example | Description |
---|---|---|---|
1 | One-vs-Rest | OneVsRestClassifier(LogisticRegression()) | Trains one classifier per class vs. all others |
2 | One-vs-One | OneVsOneClassifier(SVC()) | Trains one classifier per class pair |
3 | Native Support | RandomForestClassifier() | Natively supports multi-class |
4 | Fit Model | model.fit(X_train, y_train) | Trains the chosen classifier |
5 | Predict Classes | model.predict(X_test) | Returns predicted class labels |
Syntax Explanation
1. One-vs-Rest (OvR)
What is it?
A strategy that fits one classifier per class, where each classifier distinguishes a class from all others.
Syntax:
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
model = OneVsRestClassifier(LogisticRegression())
Explanation:
- Works well with linear models such as logistic regression
- Efficient on large datasets, since only one classifier is trained per class
- Each classifier outputs a confidence score for its class; the class with the highest score wins
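A minimal runnable sketch, assuming scikit-learn's built-in iris dataset as a stand-in three-class problem:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# One binary logistic-regression classifier is fitted per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_train, y_train)
print(len(ovr.estimators_))  # 3 -- one estimator for each of the 3 classes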
2. One-vs-One (OvO)
What is it?
A strategy that fits one classifier per class pair.
Syntax:
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC
model = OneVsOneClassifier(SVC())
Explanation:
- Builds N(N-1)/2 classifiers for N classes
- Each pairwise classifier votes; the class with the most votes wins
- Effective when class boundaries are complex
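A minimal runnable sketch under the same iris-dataset assumption as above:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# For 3 classes, 3 * (3 - 1) / 2 = 3 pairwise classifiers are fitted
ovo = OneVsOneClassifier(SVC())
ovo.fit(X_train, y_train)
print(len(ovo.estimators_))  # 3 pairwise SVC estimators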
3. Native Multi-Class Classifier
What is it?
Classifiers like Random Forest and Logistic Regression inherently support multi-class classification.
Syntax:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
Explanation:
- No need to wrap with OvR or OvO
- Handles non-linear decision boundaries well; class imbalance can be mitigated via the class_weight parameter
- Straightforward integration
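A minimal sketch, again assuming the iris dataset:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# The forest handles all three classes directly -- no OvR/OvO wrapper needed
model = RandomForestClassifier(random_state=42)
model.fit(X, y)
print(model.classes_)  # [0 1 2] -- all classes learned natively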
4. Fit Model
What is it?
Trains the selected model on the labeled dataset.
Syntax:
model.fit(X_train, y_train)
Explanation:
- Accepts feature matrix and label vector
- Learns the decision boundaries between classes
- Can be combined with grid search or pipelines
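As a sketch of the last point, assuming the iris dataset and an illustrative parameter grid:

from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# fit() works the same whether the estimator is bare or inside a pipeline
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
grid = GridSearchCV(pipe, {"logisticregression__C": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)  # refits the whole pipeline for every candidate C
print(grid.best_params_)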
5. Predict Classes
What is it?
Predicts the class labels for unseen test data.
Syntax:
predictions = model.predict(X_test)
Explanation:
- Produces an array of predicted class labels
- Useful for accuracy, confusion matrix, or F1 score evaluations
- Can be used in real-time prediction systems
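A minimal end-to-end sketch, assuming the iris dataset, that feeds the predictions into two common evaluations:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
predictions = model.predict(X_test)  # one predicted label per test row

print(accuracy_score(y_test, predictions))
print(confusion_matrix(y_test, predictions))  # rows = true class, columns = predicted class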
Real-Life Project: Handwritten Digit Recognition
Project Overview
Use multi-class classification to identify handwritten digits (0–9) from scikit-learn's built-in digits dataset (load_digits), a small MNIST-style collection of 8×8 grayscale images.
Code Example
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
# Load data
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.3, stratify=digits.target, random_state=42)
# Train model with native multi-class support
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
Expected Output
- Per-class precision, recall, and F1-scores
- Overall accuracy of the multi-class classifier
Common Mistakes to Avoid
- ❌ Ignoring the label distribution when splitting data (use a stratified train/test split)
- ❌ Using binary-only classifiers without wrapping them in OvR or OvO
- ❌ Not evaluating the model with class-specific metrics such as per-class recall or a confusion matrix