Confusion Matrix Explained using Scikit-learn

A confusion matrix is a performance measurement tool for machine learning classification. It compares actual target values with those predicted by the model to help evaluate classification accuracy, precision, recall, and more.

Key Characteristics

2D Matrix Format
Shows TP, TN, FP, FN
Supports Binary and Multiclass Evaluation
Foundation for Other Metrics
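
As a minimal sketch of how the four counts feed other metrics, the example below derives accuracy, precision, and recall by hand and compares them with scikit-learn's own scorers. The label lists are invented for illustration and are not part of the original article.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Toy labels, invented for illustration (1 = positive class)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels {0, 1}, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of all predicted positives, how many were right
recall = tp / (tp + fn)     # of all actual positives, how many were found

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# accuracy=0.75 precision=0.75 recall=0.75

# The hand-computed values should agree with the library scorers
assert abs(accuracy - accuracy_score(y_true, y_pred)) < 1e-12
assert abs(precision - precision_score(y_true, y_pred)) < 1e-12
assert abs(recall - recall_score(y_true, y_pred)) < 1e-12
```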

Basic Rules

Use with classification models.
Ideal for analyzing both class-wise and overall performance.
Normalize when class sizes differ, so that cell values are comparable across classes.
Visualize to identify patterns of errors.
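
To illustrate the normalization rule on a multiclass problem, here is a small sketch using the `labels` and `normalize` parameters of `confusion_matrix`. The three class names and the label lists are hypothetical, chosen only to make the rows easy to read.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical three-class labels, invented for illustration
y_true = ["cat", "cat", "cat", "dog", "dog", "bird", "bird", "bird", "bird"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "bird", "bird", "cat", "bird"]
labels = ["bird", "cat", "dog"]  # fixes the row/column order explicitly

# Raw counts: rows are actual classes, columns are predicted classes
raw = confusion_matrix(y_true, y_pred, labels=labels)
print(raw)
# [[3 1 0]
#  [0 2 1]
#  [0 0 2]]

# normalize="true" divides each row by its total, so every row sums to 1
row_norm = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")
print(row_norm.round(2))
```

With row normalization, the diagonal holds the per-class recall, which makes classes of different sizes directly comparable.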

Syntax Table

No.	Function	Syntax Example	Description
1	Import Function	`from sklearn.metrics import confusion_matrix`	Load the confusion matrix function
2	Generate Matrix	`confusion_matrix(y_true, y_pred)`	Build the raw matrix of classification counts
3	Plot Matrix	`ConfusionMatrixDisplay(confusion_matrix=cm).plot()`	Show the matrix as a heatmap

Syntax Explanation

1. Import Confusion Matrix Function

from sklearn.metrics import confusion_matrix

Explanation:

Loads the required function to compute the confusion matrix.
Used for manual inspection or metric derivation.

2. Generate Matrix

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)
print(cm)

Explanation:

Returns a matrix laid out as [[TN, FP], [FN, TP]] for binary classification with labels 0 and 1.
Shows how many samples were correctly or incorrectly classified.
Each row of the matrix represents the actual class.
Each column represents the predicted class.
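
Working through the five samples above by hand gives two true negatives, no false positives, one false negative, and two true positives. The sketch below reruns the same lists and reads individual cells by index to make the row/column convention concrete.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 0]
#  [1 2]]

# cm[i, j] counts samples whose actual class is i and predicted class is j
print(cm[1, 0])  # 1 -> one actual 1 was predicted as 0 (a false negative)
print(cm[0, 1])  # 0 -> no actual 0 was predicted as 1 (no false positives)
```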

3. Plot Confusion Matrix

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# cm is the matrix computed in the previous step
ConfusionMatrixDisplay(confusion_matrix=cm).plot()
plt.show()

Explanation:

Provides a visual representation (color-coded) of class performance.
Useful in presentations and quick analysis.
Makes misclassifications and class-wise performance easy to read at a glance.

Real-Life Project: Visualizing Model Performance with Confusion Matrix

Objective

Assess how well a classifier performs using raw and visual confusion matrix outputs.

Code Example

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Load dataset
data = pd.read_csv('classification_data.csv')
X = data.drop('target', axis=1)
y = data['target']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train classifier
model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Generate the confusion matrix and print the raw counts
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Plot the matrix as a heatmap
ConfusionMatrixDisplay(confusion_matrix=cm).plot()
plt.grid(False)
plt.title('Confusion Matrix')
plt.show()

Expected Output

Text matrix with raw classification counts.
Visual heatmap showing true positives, false positives, etc.
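
The file `classification_data.csv` is not supplied with the article, so the following sketch swaps in synthetic data from `make_classification` to give a version that runs as-is. The sample count, feature count, and `random_state` values are arbitrary assumptions, and the plotting step is omitted to keep the focus on the raw matrix.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary dataset standing in for classification_data.csv
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# stratify=y keeps the class ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# A fixed random_state makes the forest, and so the matrix, reproducible
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
print(cm)

# Every test sample lands in exactly one cell
print(cm.sum() == len(y_test))
```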

Common Mistakes

❌ Using confusion matrix for regression tasks.
❌ Misinterpreting axes (actual vs predicted).
❌ Ignoring normalization when class imbalance is present.
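
The imbalance pitfall can be sketched with a deliberately degenerate case: a "classifier" that always predicts the majority class. The 95/5 split below is synthetic and chosen only to make the effect obvious.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic imbalanced labels: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
# Degenerate predictions: always the majority class
y_pred = [0] * 100

# Accuracy looks strong because the majority class dominates
print(accuracy_score(y_true, y_pred))  # 0.95

# Raw counts: row 0 is [95, 0], row 1 is [5, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Row-normalized: the second row shows that no positive was ever detected
norm = confusion_matrix(y_true, y_pred, normalize="true")
print(norm)
```

The normalized second row reads [1.0, 0.0], meaning every actual positive was misclassified, which the 0.95 accuracy alone hides.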

Further Reading

📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning by Sarful Hassan (available on Amazon)
