Confusion Matrix Explained using Scikit-learn

A confusion matrix is a table for evaluating classification models. It counts how often each actual class was predicted as each class, and those counts are the basis for accuracy, precision, recall, and related metrics.

Key Characteristics

  • 2D Matrix Format
  • Shows true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN)
  • Supports Binary and Multiclass Evaluation
  • Foundation for Other Metrics

Basic Rules

  • Use with classification models.
  • Ideal for analyzing both class-wise and overall performance.
  • Normalize the counts (for example, normalize='true' in confusion_matrix) when class sizes differ, so rows can be compared as rates.
  • Visualize to identify patterns of errors.

Syntax Table

SL NO | Function        | Syntax Example                                      | Description
1     | Import Function | from sklearn.metrics import confusion_matrix        | Load the confusion matrix function
2     | Generate Matrix | confusion_matrix(y_true, y_pred)                    | Build the raw matrix of classification counts
3     | Plot Matrix     | ConfusionMatrixDisplay(confusion_matrix=cm).plot()  | Show the matrix as a heatmap

Syntax Explanation

1. Import Confusion Matrix Function

from sklearn.metrics import confusion_matrix

Explanation:

  • Loads the required function to compute the confusion matrix.
  • Used for manual inspection or metric derivation.

2. Generate Matrix

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)
print(cm)

Explanation:

  • For binary labels 0 and 1, returns the matrix [[TN, FP], [FN, TP]]; the example above gives [[2, 0], [1, 2]].
  • Shows how many samples were correctly or incorrectly classified.
  • Each row of the matrix represents the actual class.
  • Each column represents the predicted class.
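The four cells can be unpacked and turned into the metrics mentioned in the introduction. The following is a minimal sketch that reuses the toy labels above; the variable names tn, fp, fn, and tp are illustrative, and the formulas assume at least one positive prediction and one actual positive so that no division by zero occurs.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# For binary labels {0, 1}, ravel() flattens [[TN, FP], [FN, TP]] row by row
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all samples classified correctly
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that were found

print(tn, fp, fn, tp)  # 2 0 1 2
print(accuracy, precision, recall)
```

Here the single error is a false negative, so precision stays at 1.0 while recall drops to 2/3.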

3. Plot Confusion Matrix

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay(confusion_matrix=cm).plot()
plt.show()

Explanation:

  • Provides a visual representation (color-coded) of class performance.
  • Useful in presentations and quick analysis.
  • Makes misclassifications and class-wise performance easy to interpret.

Real-Life Project: Visualizing Model Performance with Confusion Matrix

Objective

Assess how well a classifier performs using raw and visual confusion matrix outputs.

Code Example

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Load dataset
data = pd.read_csv('classification_data.csv')
X = data.drop('target', axis=1)
y = data['target']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train classifier
model = RandomForestClassifier(random_state=42)  # fixed seed for reproducible results
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Generate, print, and plot confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
ConfusionMatrixDisplay(confusion_matrix=cm).plot()
plt.grid(False)
plt.title('Confusion Matrix')
plt.show()

Expected Output

  • Text matrix with raw classification counts.
  • Visual heatmap showing true positives, false positives, etc.
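When the target has more than two classes, class-wise metrics can be read from the matrix directly. The sketch below uses hypothetical three-class labels rather than the project dataset, and it assumes every class appears at least once in both the actual and predicted labels so that no row or column total is zero.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical three-class labels, for illustration only
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred)

# Rows are actual classes, columns are predicted classes,
# so the diagonal holds the correctly classified samples
correct = np.diag(cm)
recall_per_class = correct / cm.sum(axis=1)     # divide by row totals (actual counts)
precision_per_class = correct / cm.sum(axis=0)  # divide by column totals (predicted counts)
accuracy = correct.sum() / cm.sum()

print(cm)
print(recall_per_class)
print(precision_per_class)
print(accuracy)
```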

Common Mistakes

  • ❌ Using confusion matrix for regression tasks.
  • ❌ Misinterpreting axes (in scikit-learn, rows are actual classes and columns are predicted classes).
  • ❌ Ignoring normalization when class imbalance is present.
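The effect of normalization under class imbalance can be illustrated with a small example. The labels below are hypothetical (8 negatives, 2 positives); normalize="true" divides each row by the number of actual samples in that class.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical imbalanced labels: 8 negatives, 2 positives
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]

# Raw counts: 8 of 10 samples are correct, which looks good overall
raw = confusion_matrix(y_true, y_pred)

# Row-normalized rates: each row sums to 1, so classes of
# different sizes can be compared directly
by_actual = confusion_matrix(y_true, y_pred, normalize="true")

print(raw)
print(by_actual)
```

The raw counts suggest 80% accuracy, but the normalized second row shows that only half of the actual positives were detected.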

Further Reading

📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning by Sarful Hassan
