Evaluating Classification Models using Scikit-learn

Evaluating a classification model is crucial to ensure that it performs well and generalizes to new data. Scikit-learn provides a comprehensive suite of evaluation metrics and tools that help assess various aspects of model performance—accuracy, precision, recall, F1-score, ROC-AUC, and confusion matrices.

Key Characteristics of Classification Evaluation

  • Accuracy Measurement: Evaluates overall correctness.
  • Precision and Recall: Useful for imbalanced datasets.
  • F1 Score: Harmonic mean of precision and recall.
  • ROC-AUC: Measures the trade-off between true and false positive rates across thresholds.
  • Confusion Matrix: Breaks predictions down into true/false positives and negatives.

Basic Rules for Evaluation

  • Use different metrics for different goals (e.g., precision vs. recall).
  • For imbalanced classes, avoid relying solely on accuracy.
  • Use cross-validation to get reliable performance estimates.
  • Threshold tuning may improve recall/precision trade-offs.
  • Evaluate both train and test data to spot overfitting.
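
As a quick illustration of several of these rules, the sketch below scores a classifier with cross_val_score under several scoring names. The synthetic, imbalanced dataset from make_classification is assumed purely for the example.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic, imbalanced binary dataset (assumed purely for illustration)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
clf = LogisticRegression(max_iter=1000)

# Cross-validated estimates for several metrics; accuracy alone can look
# flattering on imbalanced data while recall stays much lower.
for scoring in ['accuracy', 'precision', 'recall', 'f1', 'roc_auc']:
    scores = cross_val_score(clf, X, y, cv=5, scoring=scoring)
    print(scoring, round(scores.mean(), 3))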

Syntax Table

SL NO | Metric | Syntax Example | Description
1 | Accuracy Score | accuracy_score(y_true, y_pred) | Overall correct predictions
2 | Precision Score | precision_score(y_true, y_pred) | Positive predictive value
3 | Recall Score | recall_score(y_true, y_pred) | True positive rate
4 | F1 Score | f1_score(y_true, y_pred) | Balance of precision and recall
5 | Confusion Matrix | confusion_matrix(y_true, y_pred) | Summary of prediction results
6 | Classification Report | classification_report(y_true, y_pred) | Full report including all metrics
7 | ROC-AUC Score | roc_auc_score(y_true, y_proba) | Area under ROC curve
8 | ROC Curve | roc_curve(y_true, y_proba) | False positive vs. true positive rate

Syntax Explanation

1. Accuracy Score

  • What is it? Measures the ratio of correct predictions to total predictions.
  • Syntax:
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
  • Explanation:
    • Best used for balanced datasets.
    • Not reliable when classes are imbalanced.
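
A minimal sketch with made-up labels shows why: on an imbalanced problem, a model that always predicts the majority class still reaches 90% accuracy.

from sklearn.metrics import accuracy_score

# Hypothetical labels: 9 negatives, 1 positive (assumed for illustration)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [0] * 10  # a "model" that always predicts the majority class

print(accuracy_score(y_true, y_pred))  # 0.9, despite missing the only positive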

2. Precision Score

  • What is it? Measures the correctness of positive predictions.
  • Syntax:
from sklearn.metrics import precision_score
precision = precision_score(y_test, y_pred)
  • Explanation:
    • High precision means fewer false positives.
    • Important in spam detection, fraud detection, etc.
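
A small assumed example: of the three positive predictions below, one is a false positive, so precision is 2/3.

from sklearn.metrics import precision_score

# Hypothetical labels (assumed for illustration)
y_true = [1, 0, 1, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0]  # three positive predictions, one is a false positive

print(precision_score(y_true, y_pred))  # 2 true positives / 3 predicted positives ≈ 0.667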

3. Recall Score

  • What is it? Measures how many actual positives were correctly predicted.
  • Syntax:
from sklearn.metrics import recall_score
recall = recall_score(y_test, y_pred)
  • Explanation:
    • High recall means fewer false negatives.
    • Critical in disease detection and safety applications.
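
In the same spirit, the assumed labels below contain three actual positives but the predictions find only one, so recall is 1/3 even though precision is perfect.

from sklearn.metrics import recall_score

# Hypothetical labels (assumed for illustration)
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0]  # two of the three actual positives are missed

print(recall_score(y_true, y_pred))  # 1 true positive / 3 actual positives ≈ 0.333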

4. F1 Score

  • What is it? Harmonic mean of precision and recall.
  • Syntax:
from sklearn.metrics import f1_score
f1 = f1_score(y_test, y_pred)
  • Explanation:
    • Best when precision and recall are both important.
    • More informative than accuracy on imbalanced data, since it ignores true negatives.
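
A short sketch on assumed labels contrasts the two: accuracy looks decent while F1 reflects the missed positives.

from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced labels (assumed for illustration)
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]  # only one of three positives found

print(accuracy_score(y_true, y_pred))  # 0.8
print(f1_score(y_true, y_pred))        # 0.5, penalizing the missed positives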

5. Confusion Matrix

  • What is it? Matrix showing counts of true positives, false positives, etc.
  • Syntax:
from sklearn.metrics import confusion_matrix
matrix = confusion_matrix(y_test, y_pred)
  • Explanation:
    • Helps identify the type of classification errors.
    • Visual tool for model diagnostics.
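
For the visual side, scikit-learn also provides ConfusionMatrixDisplay. The labels below are assumed for illustration; in practice pass y_test and y_pred from a fitted model.

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Hypothetical labels (assumed for illustration)
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]

ConfusionMatrixDisplay.from_predictions(y_true, y_pred)
plt.title('Confusion Matrix')
plt.show()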

6. Classification Report

  • What is it? Summary of all major metrics per class.
  • Syntax:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
  • Explanation:
    • Includes precision, recall, F1, and support per class.
    • Easy to understand and report model performance.
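
If the numbers need to be consumed programmatically rather than printed, the report can also be returned as a dictionary via output_dict=True. The labels below are assumed for illustration.

from sklearn.metrics import classification_report

# Hypothetical labels (assumed for illustration)
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

report = classification_report(y_true, y_pred, output_dict=True)
print(report['1']['precision'], report['1']['recall'])  # per-class metrics as plain floats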

7. ROC-AUC Score

  • What is it? Area under the ROC curve, measuring classifier quality.
  • Syntax:
from sklearn.metrics import roc_auc_score
auc = roc_auc_score(y_test, y_proba[:, 1])
  • Explanation:
    • Ranges from 0 to 1; 0.5 corresponds to random guessing, and values closer to 1 are better.
    • Requires predicted scores or probabilities rather than hard class labels (use predict_proba or decision_function).
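
A tiny assumed example, passing probability scores for the positive class rather than hard labels:

from sklearn.metrics import roc_auc_score

# Hypothetical true labels and positive-class probabilities (assumed for illustration);
# in practice use model.predict_proba(X_test)[:, 1]
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

print(roc_auc_score(y_true, y_score))  # 0.75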

8. ROC Curve

  • What is it? Computes the true positive and false positive rates at each decision threshold, which can then be plotted as the ROC curve.
  • Syntax:
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_test, y_proba[:, 1])
  • Explanation:
    • Allows threshold selection and visual analysis.
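
One hedged way to act on the threshold-selection point: pick the threshold that maximizes tpr - fpr (Youden's J statistic). The scores below are assumed for illustration.

import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical labels and positive-class probabilities (assumed for illustration)
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.2, 0.6, 0.4, 0.9, 0.1, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
best = np.argmax(tpr - fpr)   # Youden's J statistic: maximize TPR - FPR
print(thresholds[best])       # candidate decision threshold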

Real-Life Project: Evaluating a Medical Diagnosis Model

Project Name

Medical Diagnosis Classifier Evaluation

Project Overview

This project evaluates a logistic regression model for diagnosing diabetes using patient data. It showcases multiple evaluation techniques.

Project Goal

  • Train a binary classifier
  • Evaluate using accuracy, precision, recall, and AUC
  • Visualize ROC curve and confusion matrix

Code for This Project

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_curve, roc_auc_score, classification_report, confusion_matrix

# Load dataset
data = pd.read_csv('diabetes.csv')
X = data.drop('Outcome', axis=1)
y = data['Outcome']

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_proba[:,1]))
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

# ROC Plot
fpr, tpr, _ = roc_curve(y_test, y_proba[:, 1])
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.grid(True)
plt.show()

Expected Output

  • Full metric report and confusion matrix
  • ROC curve visual
  • Insightful evaluation of classification ability

Common Mistakes to Avoid

  • ❌ Using accuracy alone on imbalanced datasets
  • ❌ Ignoring ROC when using probabilistic classifiers
  • ❌ Not visualizing confusion matrix for error types
  • ❌ Not validating with cross-validation

Further Reading Recommendation

📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning
by Sarful Hassan
🔗 Available on Amazon

Also explore: