ROC Curve and AUC in Scikit-learn

The ROC (Receiver Operating Characteristic) curve is a graphical representation of a classifier’s performance across all classification thresholds. AUC (Area Under the Curve) summarizes the ROC curve into a single value that indicates the overall ability of the model to discriminate between classes.

Key Characteristics

  • Threshold-Independent Evaluation
  • Displays Trade-off Between TPR and FPR
  • AUC Ranges From 0 to 1
  • Useful for Binary and Multiclass Classification

Basic Rules

  • Use ROC when you care about ranking predictions.
  • AUC closer to 1 indicates better performance.
  • Use roc_curve for curve points.
  • Use roc_auc_score for summary metric.

Syntax Table

SL NO | Function  | Syntax Example                                     | Description
1     | ROC Curve | fpr, tpr, thresholds = roc_curve(y_true, y_score)  | Calculates FPR and TPR for all thresholds
2     | AUC Score | roc_auc_score(y_true, y_score)                     | Computes the area under the ROC curve
3     | Plot ROC  | plt.plot(fpr, tpr)                                 | Plots the ROC curve visually

Syntax Explanation

1. ROC Curve

What is it? Computes the false positive rate, true positive rate, and thresholds.

from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)

Explanation:

  • y_true contains the true binary labels (0 or 1).
  • y_score contains predicted probabilities or scores (not labels).
  • fpr: False Positive Rate at each threshold.
  • tpr: True Positive Rate at each threshold.
  • thresholds: Classification thresholds used to generate the ROC curve.
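
A minimal, self-contained sketch of the call (the labels and scores below are made-up values for illustration only):

from sklearn.metrics import roc_curve

y_true = [0, 0, 1, 1]              # true binary labels
y_score = [0.1, 0.4, 0.35, 0.8]    # predicted scores for the positive class

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr)         # [0.  0.  0.5 0.5 1. ]
print(tpr)         # [0.  0.5 0.5 1.  1. ]
print(thresholds)  # descending thresholds; the first entry is a sentinel above all scores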

2. AUC Score

What is it? Calculates the area under the ROC curve.

from sklearn.metrics import roc_auc_score
auc = roc_auc_score(y_true, y_score)

Explanation:

  • Returns a single scalar score.
  • Perfect model: AUC = 1.0
  • Random model: AUC = 0.5
  • Useful for comparing models or selecting best classifiers.
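
As a quick sanity check, the boundary cases can be reproduced with tiny hand-made inputs (illustrative values only):

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]

# Every positive is ranked above every negative -> perfect separation
print(roc_auc_score(y_true, [0.1, 0.2, 0.8, 0.9]))  # 1.0

# Every positive is ranked below every negative -> worse than random
print(roc_auc_score(y_true, [0.9, 0.8, 0.2, 0.1]))  # 0.0

AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one; 0.5 corresponds to a random ranking.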

3. Plot ROC

What is it? Visual representation of the ROC curve.

import matplotlib.pyplot as plt
plt.plot(fpr, tpr, label=f'AUC = {auc:.2f}')
plt.plot([0, 1], [0, 1], 'k--')  # baseline
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()

Explanation:

  • Visualizes the trade-off between sensitivity (TPR) and 1 − specificity (FPR).
  • The diagonal baseline shows the performance of a random model.
  • The higher the curve sits above the baseline, the better the model separates the classes.
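
Recent scikit-learn versions (1.0 and later) also ship a convenience helper that computes and draws the same curve in one call; a minimal sketch, assuming the fitted model and test split from the project below:

from sklearn.metrics import RocCurveDisplay
import matplotlib.pyplot as plt

# Computes probabilities, the ROC points, and the AUC internally
RocCurveDisplay.from_estimator(model, X_test, y_test)
plt.plot([0, 1], [0, 1], 'k--')  # random baseline for reference
plt.show()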

Real-Life Project: Evaluate Classifier with ROC Curve

Objective

Evaluate a binary classifier using ROC curve and AUC score.

Code Example

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Load and preprocess data
data = pd.read_csv('binary_classification.csv')
X = data.drop('target', axis=1)
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)  # stratify keeps class proportions in both splits

# Train model
model = LogisticRegression(max_iter=1000)  # raise max_iter so the solver converges on unscaled features
model.fit(X_train, y_train)
y_proba = model.predict_proba(X_test)[:, 1]

# ROC & AUC
fpr, tpr, thresholds = roc_curve(y_test, y_proba)
auc = roc_auc_score(y_test, y_proba)
print(f'AUC: {auc:.3f}')

# Plot
plt.plot(fpr, tpr, label=f'AUC = {auc:.2f}')
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.grid(True)
plt.show()

Expected Output

  • ROC curve displayed along with the diagonal random baseline for comparison.
  • AUC value printed to the console and shown in the plot legend.

Common Mistakes

  • ❌ Using class labels instead of probabilities for ROC.
  • ❌ Not stratifying splits for imbalanced data.
  • ❌ Misinterpreting AUC for multiclass tasks (see the sketch below).
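
On the last point, roc_auc_score can handle multiclass problems when it is given per-class probabilities and an explicit averaging strategy; a minimal sketch with made-up data:

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = [0, 1, 2, 2]
y_proba = np.array([[0.7, 0.2, 0.1],
                    [0.2, 0.6, 0.2],
                    [0.1, 0.2, 0.7],
                    [0.2, 0.3, 0.5]])  # one probability column per class, rows sum to 1

# One-vs-rest AUC, macro-averaged over the three classes
print(roc_auc_score(y_true, y_proba, multi_class='ovr', average='macro'))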

Further Reading

📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning by Sarful Hassan

🔗 Available on Amazon