The ROC (Receiver Operating Characteristic) curve is a graphical representation of a classifier’s performance across all classification thresholds. AUC (Area Under the Curve) summarizes the ROC curve into a single value that indicates the overall ability of the model to discriminate between classes.
Key Characteristics
- Threshold-Independent Evaluation
- Displays Trade-off Between TPR and FPR
- AUC Ranges From 0 to 1
- Useful for Binary and Multiclass Classification
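For the multiclass case, roc_auc_score accepts the full probability matrix together with an explicit averaging strategy. A minimal sketch, assuming scikit-learn's built-in iris dataset as a stand-in three-class problem:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
X, y = load_iris(return_X_y=True)  # three-class toy dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # shape (n_samples, n_classes)
# multi_class='ovr' scores each class against the rest and averages the per-class AUCs
print(roc_auc_score(y_test, proba, multi_class='ovr'))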
Basic Rules
- Use ROC when you care about ranking predictions.
- AUC closer to 1 indicates better performance.
- Use roc_curve for curve points.
- Use roc_auc_score for the summary metric.
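To make the ranking rule concrete: AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A small sketch with hand-made labels and scores (the values are illustrative assumptions):
import numpy as np
from sklearn.metrics import roc_auc_score
y_true  = np.array([0, 0, 1, 1, 1])
y_score = np.array([0.2, 0.6, 0.4, 0.7, 0.9])
auc = roc_auc_score(y_true, y_score)
# Pairwise check: fraction of positive/negative pairs where the positive ranks higher
pos, neg = y_score[y_true == 1], y_score[y_true == 0]
pairwise = np.mean([(p > n) + 0.5 * (p == n) for p in pos for n in neg])
print(auc, pairwise)  # both print 0.8333...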
Syntax Table
| SL NO | Function | Syntax Example | Description |
|---|---|---|---|
| 1 | ROC Curve | fpr, tpr, thresholds = roc_curve(y_true, y_score) | Calculates FPR and TPR for all thresholds |
| 2 | AUC Score | roc_auc_score(y_true, y_score) | Computes area under the ROC curve |
| 3 | Plot ROC | plt.plot(fpr, tpr) | Plots the ROC curve visually |
Syntax Explanation
1. ROC Curve
What is it? Computes the false positive rate, true positive rate, and thresholds.
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)
Explanation:
- y_true contains the true binary labels (0 or 1).
- y_score contains predicted probabilities or scores (not labels).
- fpr: False Positive Rate at each threshold.
- tpr: True Positive Rate at each threshold.
- thresholds: Classification thresholds used to generate the ROC curve.
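A self-contained sketch of the call on a handful of hand-made labels and scores (the values are illustrative assumptions, and the exact contents of thresholds can vary slightly across scikit-learn versions):
from sklearn.metrics import roc_curve
y_true  = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]  # probabilities for the positive class
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr)         # [0.  0.  0.5 0.5 1. ]
print(tpr)         # [0.  0.5 0.5 1.  1. ]
print(thresholds)  # decreasing thresholds; the first is placed above the maximum score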
2. AUC Score
What is it? Calculates the area under the ROC curve.
from sklearn.metrics import roc_auc_score
auc = roc_auc_score(y_true, y_score)
Explanation:
- Returns a single scalar score.
- Perfect model: AUC = 1.0
- Random model: AUC = 0.5
- Useful for comparing models or selecting best classifiers.
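A short sketch illustrating the two reference points with hand-made scores (the values are assumptions for illustration):
from sklearn.metrics import roc_auc_score
y_true = [0, 0, 1, 1]
perfect_scores  = [0.1, 0.2, 0.8, 0.9]   # every positive outranks every negative
constant_scores = [0.5, 0.5, 0.5, 0.5]   # no ranking information at all
print(roc_auc_score(y_true, perfect_scores))   # 1.0
print(roc_auc_score(y_true, constant_scores))  # 0.5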
3. Plot ROC
What is it? Visual representation of the ROC curve.
import matplotlib.pyplot as plt
plt.plot(fpr, tpr, label=f'AUC = {auc:.2f}')
plt.plot([0, 1], [0, 1], 'k--') # baseline
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()
Explanation:
- Visualizes the trade-off between sensitivity and specificity.
- Baseline (diagonal) shows performance of a random model.
- The higher the curve sits above the baseline, the better the model.
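As an alternative, recent scikit-learn versions (1.0+) provide RocCurveDisplay, which builds the same plot directly from labels and scores; a brief sketch reusing the toy values from above:
import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay
y_true  = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
RocCurveDisplay.from_predictions(y_true, y_score)  # draws the ROC curve with AUC in the legend
plt.plot([0, 1], [0, 1], 'k--')  # add the random-model baseline for reference
plt.show()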
Real-Life Project: Evaluate Classifier with ROC Curve
Objective
Evaluate a binary classifier using ROC curve and AUC score.
Code Example
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt
# Load and preprocess data
data = pd.read_csv('binary_classification.csv')
X = data.drop('target', axis=1)
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)  # stratify keeps class balance in both splits
# Train model
model = LogisticRegression()
model.fit(X_train, y_train)
y_proba = model.predict_proba(X_test)[:, 1]
# ROC & AUC
fpr, tpr, thresholds = roc_curve(y_test, y_proba)
auc = roc_auc_score(y_test, y_proba)
# Plot
plt.plot(fpr, tpr, label=f'AUC = {auc:.2f}')
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.grid(True)
plt.show()
Expected Output
- ROC curve plotted against the random-model baseline.
- AUC value displayed in the plot legend.
Common Mistakes
- ❌ Using class labels instead of probabilities for ROC.
- ❌ Not stratifying splits for imbalanced data.
- ❌ Misinterpreting AUC for multiclass tasks.
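The sketch below shows one way to avoid these pitfalls. The CSV file and column names are taken from the project above, and the multiclass line is a hedged illustration rather than part of that project:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
data = pd.read_csv('binary_classification.csv')
X, y = data.drop('target', axis=1), data['target']
# Stratify the split so train and test keep the original class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# Pass probabilities (predict_proba), not hard labels (predict), to ROC/AUC
y_proba = model.predict_proba(X_test)[:, 1]
print(roc_auc_score(y_test, y_proba))
# For multiclass problems, pass the full probability matrix and pick an
# explicit averaging strategy, e.g.:
# roc_auc_score(y_multi, clf.predict_proba(X_multi), multi_class='ovr')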