A confusion matrix is a performance measurement tool for machine learning classification. It compares actual target values with those predicted by the model to help evaluate classification accuracy, precision, recall, and more.
Key Characteristics
- 2D Matrix Format
- Shows true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN)
- Supports Binary and Multiclass Evaluation
- Foundation for Other Metrics
Basic Rules
- Use with classification models.
- Ideal for analyzing both class-wise and overall performance.
- Normalize if necessary for easier interpretation (see the normalization sketch after this list).
- Visualize to identify patterns of errors.
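Where normalization helps, scikit-learn's confusion_matrix accepts a normalize argument ('true', 'pred', or 'all'). A minimal sketch, using small illustrative label arrays rather than real data:
from sklearn.metrics import confusion_matrix
# Illustrative labels (hypothetical example data)
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]
# normalize='true' divides each row by the number of actual samples in that class,
# so each row sums to 1 and the diagonal reads as per-class recall
cm_norm = confusion_matrix(y_true, y_pred, normalize='true')
print(cm_norm)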
Syntax Table
SL NO | Function | Syntax Example | Description |
---|---|---|---|
1 | Import Function | from sklearn.metrics import confusion_matrix | Load the confusion matrix tool |
2 | Generate Matrix | confusion_matrix(y_true, y_pred) | Build the raw matrix of classification counts |
3 | Plot Matrix | ConfusionMatrixDisplay(confusion_matrix=cm).plot() | Show the matrix as a heatmap |
Syntax Explanation
1. Import Confusion Matrix Function
from sklearn.metrics import confusion_matrix
Explanation:
- Loads the required function to compute the confusion matrix.
- Used for manual inspection or metric derivation.
2. Generate Matrix
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)
print(cm)
Explanation:
- Returns a matrix [[TN, FP], [FN, TP]] for binary classification.
- Helps visualize how many samples were correctly or incorrectly classified.
- Each row of the matrix represents the actual class.
- Each column represents the predicted class.
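Because each cell count is available directly, per-class metrics can be derived by hand. A minimal sketch, continuing from the y_true/y_pred arrays above:
tn, fp, fn, tp = cm.ravel()   # flatten the 2x2 matrix into its four counts
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tn, fp, fn, tp)         # 2 0 1 2 for the example above
print(precision, recall)      # 1.0 and roughly 0.667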
3. Plot Confusion Matrix
from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt
ConfusionMatrixDisplay(confusion_matrix=cm).plot()
plt.show()
Explanation:
- Provides a visual representation (color-coded) of class performance.
- Useful in presentations and quick analysis.
- Makes misclassifications and per-class performance easy to interpret.
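As a shortcut, recent scikit-learn versions (1.0+) also provide ConfusionMatrixDisplay.from_predictions, which computes and plots the matrix in a single call; a minimal sketch:
from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt
# Builds the matrix from the labels and plots it in one step
ConfusionMatrixDisplay.from_predictions(y_true, y_pred)
plt.show()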
Real-Life Project: Visualizing Model Performance with Confusion Matrix
Objective
Assess how well a classifier performs using raw and visual confusion matrix outputs.
Code Example
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
# Load dataset
data = pd.read_csv('classification_data.csv')
X = data.drop('target', axis=1)
y = data['target']
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train classifier
model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Generate, print, and plot confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)  # raw counts in text form
ConfusionMatrixDisplay(confusion_matrix=cm).plot()
plt.grid(False)
plt.title('Confusion Matrix')
plt.show()
Expected Output
- Text matrix with raw classification counts.
- Visual heatmap showing true positives, false positives, etc.
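If the target classes have meaningful names, the heatmap axes can be labeled with them through the display_labels parameter; a minimal sketch, assuming the trained model from the project code above:
# model.classes_ holds the class labels seen during fit
ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=model.classes_).plot()
plt.title('Confusion Matrix with Class Labels')
plt.show()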
Common Mistakes
- ❌ Using a confusion matrix for regression tasks.
- ❌ Misinterpreting the axes (rows are actual classes, columns are predicted classes; a quick check is sketched below).
- ❌ Ignoring normalization when classes are imbalanced.
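To avoid the axes mix-up, one quick sanity check is that in scikit-learn each row corresponds to an actual class, so the row sums must equal the class counts in the true labels. A minimal sketch, assuming integer-encoded labels as in the project above:
import numpy as np
# Row sums of the confusion matrix should match the actual class counts
print(cm.sum(axis=1))                    # samples per actual class, from the matrix
print(np.bincount(np.asarray(y_test)))   # the same counts computed from the labels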