Evaluating Classification Models using Scikit-learn

Evaluating a classification model is crucial to ensure that it performs well and generalizes to new data. Scikit-learn provides a comprehensive suite of evaluation metrics and tools that help assess various aspects of model performance—accuracy, precision, recall, F1-score, ROC-AUC, and confusion matrices.

Key Characteristics of Classification Evaluation

  • Accuracy Measurement: Evaluates overall correctness.
  • Precision and Recall: Useful for imbalanced datasets.
  • F1 Score: Harmonic mean of precision and recall.
  • ROC-AUC: Measures the trade-off between true and false positive rates across thresholds.
  • Confusion Matrix: Breaks predictions down into true/false positives and negatives.

Basic Rules for Evaluation

  • Use different metrics for different goals (e.g., precision vs. recall).
  • For imbalanced classes, avoid relying solely on accuracy.
  • Use cross-validation to get reliable performance estimates.
  • Threshold tuning may improve recall/precision trade-offs.
  • Evaluate both train and test data to spot overfitting.
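
As a quick illustration of several of these rules, the sketch below scores a classifier with cross_val_score under several scoring names. The synthetic, imbalanced dataset from make_classification is assumed purely for the example.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic, imbalanced binary dataset (assumed purely for illustration)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
clf = LogisticRegression(max_iter=1000)

# Cross-validated estimates for several metrics; accuracy alone can look
# flattering on imbalanced data while recall stays much lower.
for scoring in ['accuracy', 'precision', 'recall', 'f1', 'roc_auc']:
    scores = cross_val_score(clf, X, y, cv=5, scoring=scoring)
    print(scoring, round(scores.mean(), 3))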

Syntax Table

SL NO | Metric | Syntax Example | Description
1 | Accuracy Score | accuracy_score(y_true, y_pred) | Overall correct predictions
2 | Precision Score | precision_score(y_true, y_pred) | Positive predictive value
3 | Recall Score | recall_score(y_true, y_pred) | True positive rate
4 | F1 Score | f1_score(y_true, y_pred) | Balance of precision and recall
5 | Confusion Matrix | confusion_matrix(y_true, y_pred) | Summary of prediction results
6 | Classification Report | classification_report(y_true, y_pred) | Full report including all metrics
7 | ROC-AUC Score | roc_auc_score(y_true, y_proba) | Area under ROC curve
8 | ROC Curve | roc_curve(y_true, y_proba) | False positive vs. true positive rate

Syntax Explanation

1. Accuracy Score

  • What is it? Measures the ratio of correct predictions to total predictions.
  • Syntax:
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
  • Explanation:
    • Best used for balanced datasets.
    • Not reliable when classes are imbalanced.
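
A minimal sketch with made-up labels shows why: on an imbalanced problem, a model that always predicts the majority class still reaches 90% accuracy.

from sklearn.metrics import accuracy_score

# Hypothetical labels: 9 negatives, 1 positive (assumed for illustration)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [0] * 10  # a "model" that always predicts the majority class

print(accuracy_score(y_true, y_pred))  # 0.9, despite missing the only positive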

2. Precision Score

  • What is it? Measures the correctness of positive predictions.
  • Syntax:
from sklearn.metrics import precision_score
precision = precision_score(y_test, y_pred)
  • Explanation:
    • High precision means fewer false positives.
    • Important in spam detection, fraud detection, etc.
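
A small assumed example: of the three positive predictions below, one is a false positive, so precision is 2/3.

from sklearn.metrics import precision_score

# Hypothetical labels (assumed for illustration)
y_true = [1, 0, 1, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0]  # three positive predictions, one is a false positive

print(precision_score(y_true, y_pred))  # 2 true positives / 3 predicted positives ≈ 0.667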

3. Recall Score

  • What is it? Measures how many actual positives were correctly predicted.
  • Syntax:
from sklearn.metrics import recall_score
recall = recall_score(y_test, y_pred)
  • Explanation:
    • High recall means fewer false negatives.
    • Critical in disease detection and safety applications.
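
In the same spirit, the assumed labels below contain three actual positives but the predictions find only one, so recall is 1/3 even though precision is perfect.

from sklearn.metrics import recall_score

# Hypothetical labels (assumed for illustration)
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0]  # two of the three actual positives are missed

print(recall_score(y_true, y_pred))  # 1 true positive / 3 actual positives ≈ 0.333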

4. F1 Score

  • What is it? Harmonic mean of precision and recall.
  • Syntax:
from sklearn.metrics import f1_score
f1 = f1_score(y_test, y_pred)
  • Explanation:
    • Best when precision and recall are both important.
    • More informative than accuracy on imbalanced data, since it ignores true negatives.
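
A short sketch on assumed labels contrasts the two: accuracy looks decent while F1 reflects the missed positives.

from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced labels (assumed for illustration)
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]  # only one of three positives found

print(accuracy_score(y_true, y_pred))  # 0.8
print(f1_score(y_true, y_pred))        # 0.5, penalizing the missed positives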

5. Confusion Matrix

  • What is it? Matrix showing counts of true positives, false positives, etc.
  • Syntax:
from sklearn.metrics import confusion_matrix
matrix = confusion_matrix(y_test, y_pred)
  • Explanation:
    • Helps identify the type of classification errors.
    • Visual tool for model diagnostics.
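
For the visual side, scikit-learn also provides ConfusionMatrixDisplay. The labels below are assumed for illustration; in practice pass y_test and y_pred from a fitted model.

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Hypothetical labels (assumed for illustration)
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]

ConfusionMatrixDisplay.from_predictions(y_true, y_pred)
plt.title('Confusion Matrix')
plt.show()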

6. Classification Report

  • What is it? Summary of all major metrics per class.
  • Syntax:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
  • Explanation:
    • Includes precision, recall, F1, and support per class.
    • Easy to understand and report model performance.
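
If the numbers need to be consumed programmatically rather than printed, the report can also be returned as a dictionary via output_dict=True. The labels below are assumed for illustration.

from sklearn.metrics import classification_report

# Hypothetical labels (assumed for illustration)
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

report = classification_report(y_true, y_pred, output_dict=True)
print(report['1']['precision'], report['1']['recall'])  # per-class metrics as plain floats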

7. ROC-AUC Score

  • What is it? Area under the ROC curve, measuring classifier quality.
  • Syntax:
from sklearn.metrics import roc_auc_score
auc = roc_auc_score(y_test, y_proba[:, 1])
  • Explanation:
    • Ranges from 0 to 1; 0.5 corresponds to random guessing, and values closer to 1 are better.
    • Requires predicted scores or probabilities rather than hard class labels (use predict_proba or decision_function).
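
A tiny assumed example, passing probability scores for the positive class rather than hard labels:

from sklearn.metrics import roc_auc_score

# Hypothetical true labels and positive-class probabilities (assumed for illustration);
# in practice use model.predict_proba(X_test)[:, 1]
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

print(roc_auc_score(y_true, y_score))  # 0.75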

8. ROC Curve

  • What is it? Computes the true positive and false positive rates at each decision threshold, which can then be plotted as the ROC curve.
  • Syntax:
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_test, y_proba[:, 1])
  • Explanation:
    • Allows threshold selection and visual analysis.
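
One hedged way to act on the threshold-selection point: pick the threshold that maximizes tpr - fpr (Youden's J statistic). The scores below are assumed for illustration.

import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical labels and positive-class probabilities (assumed for illustration)
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.2, 0.6, 0.4, 0.9, 0.1, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
best = np.argmax(tpr - fpr)   # Youden's J statistic: maximize TPR - FPR
print(thresholds[best])       # candidate decision threshold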

Real-Life Project: Evaluating a Medical Diagnosis Model

Project Name

Medical Diagnosis Classifier Evaluation

Project Overview

This project evaluates a logistic regression model for diagnosing diabetes using patient data. It showcases multiple evaluation techniques.

Project Goal

  • Train a binary classifier
  • Evaluate using accuracy, precision, recall, and AUC
  • Visualize ROC curve and confusion matrix

Code for This Project

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_curve, roc_auc_score, classification_report, confusion_matrix

# Load dataset
data = pd.read_csv('diabetes.csv')
X = data.drop('Outcome', axis=1)
y = data['Outcome']

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_proba[:,1]))
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

# ROC Plot
fpr, tpr, _ = roc_curve(y_test, y_proba[:, 1])
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.grid(True)
plt.show()

Expected Output

  • Full metric report and confusion matrix
  • ROC curve visual
  • Insightful evaluation of classification ability

Common Mistakes to Avoid

  • ❌ Using accuracy alone on imbalanced datasets
  • ❌ Ignoring ROC when using probabilistic classifiers
  • ❌ Not visualizing confusion matrix for error types
  • ❌ Not validating with cross-validation

Further Reading Recommendation

📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning
by Sarful Hassan
🔗 Available on Amazon

Also explore: