ROC Curve and AUC in Scikit-learn

The ROC (Receiver Operating Characteristic) curve is a graphical representation of a classifier’s performance across all classification thresholds. AUC (Area Under the Curve) summarizes the ROC curve into a single value that indicates the overall ability of the model to discriminate between classes.

Key Characteristics

  • Threshold-Independent Evaluation
  • Displays Trade-off Between TPR and FPR
  • AUC Ranges From 0 to 1
  • Useful for Binary and Multiclass Classification

Basic Rules

  • Use ROC when you care about ranking predictions.
  • AUC closer to 1 indicates better performance.
  • Use roc_curve for curve points.
  • Use roc_auc_score for summary metric.

Syntax Table

SL NO | Function  | Syntax Example                                     | Description
1     | ROC Curve | fpr, tpr, thresholds = roc_curve(y_true, y_score)  | Calculates FPR and TPR for all thresholds
2     | AUC Score | roc_auc_score(y_true, y_score)                     | Computes the area under the ROC curve
3     | Plot ROC  | plt.plot(fpr, tpr)                                 | Plots the ROC curve visually

Syntax Explanation

1. ROC Curve

What is it? Computes the false positive rate, true positive rate, and thresholds.

from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)

Explanation:

  • y_true contains the true binary labels (0 or 1).
  • y_score contains predicted probabilities or scores (not labels).
  • fpr: False Positive Rate at each threshold.
  • tpr: True Positive Rate at each threshold.
  • thresholds: Classification thresholds used to generate the ROC curve.
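
A minimal, self-contained sketch of the call (the labels and scores below are made-up values for illustration only):

from sklearn.metrics import roc_curve

y_true = [0, 0, 1, 1]              # true binary labels
y_score = [0.1, 0.4, 0.35, 0.8]    # predicted scores for the positive class

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr)         # [0.  0.  0.5 0.5 1. ]
print(tpr)         # [0.  0.5 0.5 1.  1. ]
print(thresholds)  # descending thresholds; the first entry is a sentinel above all scores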

2. AUC Score

What is it? Calculates the area under the ROC curve.

from sklearn.metrics import roc_auc_score
auc = roc_auc_score(y_true, y_score)

Explanation:

  • Returns a single scalar score.
  • Perfect model: AUC = 1.0
  • Random model: AUC = 0.5
  • Useful for comparing models or selecting best classifiers.
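
As a quick sanity check, the boundary cases can be reproduced with tiny hand-made inputs (illustrative values only):

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]

# Every positive is ranked above every negative -> perfect separation
print(roc_auc_score(y_true, [0.1, 0.2, 0.8, 0.9]))  # 1.0

# Every positive is ranked below every negative -> worse than random
print(roc_auc_score(y_true, [0.9, 0.8, 0.2, 0.1]))  # 0.0

AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one; 0.5 corresponds to a random ranking.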

3. Plot ROC

What is it? Visual representation of the ROC curve.

import matplotlib.pyplot as plt
plt.plot(fpr, tpr, label=f'AUC = {auc:.2f}')
plt.plot([0, 1], [0, 1], 'k--')  # baseline
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()

Explanation:

  • Visualizes the trade-off between sensitivity (TPR) and 1 − specificity (FPR).
  • The diagonal baseline shows the performance of a random model.
  • The higher the curve sits above the baseline, the better the model separates the classes.
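
Recent scikit-learn versions (1.0 and later) also ship a convenience helper that computes and draws the same curve in one call; a minimal sketch, assuming the fitted model and test split from the project below:

from sklearn.metrics import RocCurveDisplay
import matplotlib.pyplot as plt

# Computes probabilities, the ROC points, and the AUC internally
RocCurveDisplay.from_estimator(model, X_test, y_test)
plt.plot([0, 1], [0, 1], 'k--')  # random baseline for reference
plt.show()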

Real-Life Project: Evaluate Classifier with ROC Curve

Objective

Evaluate a binary classifier using ROC curve and AUC score.

Code Example

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Load and preprocess data
data = pd.read_csv('binary_classification.csv')
X = data.drop('target', axis=1)
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)  # stratify keeps class proportions in both splits

# Train model
model = LogisticRegression(max_iter=1000)  # raise max_iter so the solver converges on unscaled features
model.fit(X_train, y_train)
y_proba = model.predict_proba(X_test)[:, 1]

# ROC & AUC
fpr, tpr, thresholds = roc_curve(y_test, y_proba)
auc = roc_auc_score(y_test, y_proba)
print(f'AUC: {auc:.3f}')

# Plot
plt.plot(fpr, tpr, label=f'AUC = {auc:.2f}')
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.grid(True)
plt.show()

Expected Output

  • ROC curve displayed along with the diagonal random baseline for comparison.
  • AUC value printed to the console and shown in the plot legend.

Common Mistakes

  • ❌ Using class labels instead of probabilities for ROC.
  • ❌ Not stratifying splits for imbalanced data.
  • ❌ Misinterpreting AUC for multiclass tasks (see the sketch below).
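
On the last point, roc_auc_score can handle multiclass problems when it is given per-class probabilities and an explicit averaging strategy; a minimal sketch with made-up data:

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = [0, 1, 2, 2]
y_proba = np.array([[0.7, 0.2, 0.1],
                    [0.2, 0.6, 0.2],
                    [0.1, 0.2, 0.7],
                    [0.2, 0.3, 0.5]])  # one probability column per class, rows sum to 1

# One-vs-rest AUC, macro-averaged over the three classes
print(roc_auc_score(y_true, y_proba, multi_class='ovr', average='macro'))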

Further Reading

📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning by Sarful Hassan

🔗 Available on Amazon