Accuracy, precision, recall, and F1 score are the core classification metrics used to evaluate the performance of a machine learning model, and Scikit-learn provides a dedicated function for each in `sklearn.metrics`. Each metric offers a different perspective on model performance, which matters most in imbalanced classification problems.
Key Characteristics
- Accuracy: Measures overall correctness
- Precision: Focuses on positive predictive value
- Recall: Focuses on sensitivity or true positive rate
- F1 Score: Harmonic mean of precision and recall
Basic Rules
- Use `accuracy_score` for balanced datasets (see the sketch after this list for why it misleads on skewed data).
- Use `precision_score` when false positives are costly.
- Use `recall_score` when false negatives are costly.
- Use `f1_score` to balance precision and recall.
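To see why the first rule matters, here is a minimal sketch on made-up skewed labels (the 95/5 split is illustrative, not from a real dataset): a model that always predicts the majority class looks accurate while catching no positives at all.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical skewed labels: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # degenerate model: always predicts the majority class

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks impressive
print(recall_score(y_true, y_pred))    # 0.0  -- misses every positive
```

This is exactly the situation where recall and F1 are more informative than accuracy.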
Syntax Table
| SL NO | Metric | Function Name | Syntax Example | Description |
|---|---|---|---|---|
| 1 | Accuracy | `accuracy_score` | `accuracy_score(y_true, y_pred)` | Proportion of correct predictions |
| 2 | Precision | `precision_score` | `precision_score(y_true, y_pred)` | TP / (TP + FP) |
| 3 | Recall | `recall_score` | `recall_score(y_true, y_pred)` | TP / (TP + FN) |
| 4 | F1 Score | `f1_score` | `f1_score(y_true, y_pred)` | Harmonic mean of precision and recall |
Syntax Explanation
1. Accuracy
What is it? Overall proportion of correct predictions made by the model.
```python
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0, 1]  # ground-truth labels
y_pred = [0, 1, 0, 0, 1]  # model predictions
print(accuracy_score(y_true, y_pred))  # 0.8
```
Explanation:
- Compares how many predictions match the true values.
- Formula: `(TP + TN) / (TP + TN + FP + FN)`
- Best used when the dataset is balanced and classes occur with similar frequencies.
- Example output: `0.8` means 80% of predictions were correct.
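To connect the formula to the function, a short sketch that recomputes accuracy by hand from confusion-matrix counts, using the same toy labels as above:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# For binary labels {0, 1}, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print((tp + tn) / (tp + tn + fp + fn))  # 0.8, matching the formula
print(accuracy_score(y_true, y_pred))   # 0.8, the library agrees
```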
2. Precision
What is it? Measures the accuracy of positive predictions.
```python
from sklearn.metrics import precision_score

y_true = [0, 1, 1, 0, 1]  # ground-truth labels
y_pred = [0, 1, 0, 0, 1]  # model predictions
print(precision_score(y_true, y_pred))  # 1.0
```
Explanation:
- Formula: `TP / (TP + FP)`
- Answers the question: “Of all items labeled positive, how many were truly positive?”
- High precision is critical in applications like spam detection, where false positives are undesirable.
- Example output: `1.0` means every predicted positive was actually positive.
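One edge case worth knowing: if the model predicts no positives at all, TP + FP is zero and precision is undefined. scikit-learn warns by default; the `zero_division` parameter sets the value returned instead. A minimal sketch:

```python
from sklearn.metrics import precision_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 0, 0, 0, 0]  # model never predicts the positive class

# TP + FP == 0 here, so precision is undefined; zero_division picks the fallback
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0, without a warning
```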
3. Recall
What is it? Measures the completeness of positive predictions.
```python
from sklearn.metrics import recall_score

y_true = [0, 1, 1, 0, 1]  # ground-truth labels
y_pred = [0, 1, 0, 0, 1]  # model predictions
print(recall_score(y_true, y_pred))  # 0.6666666666666666
```
Explanation:
- Formula: `TP / (TP + FN)`
- Tells us how many actual positives were correctly identified.
- Important in medical testing or fraud detection, where missing positives is costly.
- Example output: `0.6666666666666666` (about 66.7%) means two of the three real positives were correctly predicted.
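To illustrate what recall does and does not measure, the sketch below uses a degenerate model that labels everything positive: recall becomes perfect while precision drops, which is the trade-off the F1 score in the next section summarizes.

```python
from sklearn.metrics import precision_score, recall_score

y_true = [0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 1, 1]  # degenerate model: everything is positive

print(recall_score(y_true, y_pred))     # 1.0 -- every real positive is caught
print(precision_score(y_true, y_pred))  # 0.6 -- but 2 of 5 predictions are wrong
```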
4. F1 Score
What is it? Combines precision and recall into a single score.
```python
from sklearn.metrics import f1_score

y_true = [0, 1, 1, 0, 1]  # ground-truth labels
y_pred = [0, 1, 0, 0, 1]  # model predictions
print(f1_score(y_true, y_pred))  # 0.8
```
Explanation:
- Formula: `2 * (precision * recall) / (precision + recall)`
- Provides a balanced metric when you care equally about precision and recall.
- Especially useful for datasets with class imbalance.
- Example output: `0.8` means the model has a good balance of precision and recall.
- Can be macro-, micro-, or weighted-averaged in multiclass settings using the `average` parameter, as shown in the sketch after this list.
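To make the `average` parameter concrete, a small sketch on hypothetical three-class labels (the values are made up for illustration):

```python
from sklearn.metrics import f1_score

# Hypothetical multiclass labels, for illustration only
y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 2, 2, 2, 1, 0, 1]

print(f1_score(y_true, y_pred, average='macro'))     # unweighted mean of per-class F1
print(f1_score(y_true, y_pred, average='micro'))     # computed from global TP/FP/FN counts
print(f1_score(y_true, y_pred, average='weighted'))  # per-class F1 weighted by class support
```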
Real-Life Project: Evaluate a Classifier on Imbalanced Dataset
Objective
Use F1 and recall to assess model performance on skewed data.
Code Example
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load data
data = pd.read_csv('imbalanced_classification.csv')
X = data.drop('target', axis=1)
y = data['target']

# Train-test split; stratify keeps the class ratio the same in both splits,
# which matters on imbalanced data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train model (max_iter raised so the solver converges on harder datasets)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate with all four metrics
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))
```
Expected Output
- Printed scores for all four metrics.
- Clear understanding of model behavior with imbalanced data.
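For a per-class breakdown of the same run, `classification_report` prints precision, recall, and F1 for each class along with its support; a short follow-up, assuming `y_test` and `y_pred` from the code above:

```python
from sklearn.metrics import classification_report

# On skewed data, the minority-class row is usually the one to read first
print(classification_report(y_test, y_pred))
```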
Common Mistakes
- ❌ Using accuracy on imbalanced datasets.
- ❌ Not considering the business context when selecting metrics.
- ❌ Ignoring precision-recall trade-offs (see the threshold sketch below).
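To explore that trade-off directly, one option is to move the decision threshold on predicted probabilities instead of using the default of 0.5. A hedged sketch, reusing the fitted `model`, `X_test`, and `y_test` from the project above:

```python
from sklearn.metrics import precision_score, recall_score

# Probability of the positive class for each test sample
proba = model.predict_proba(X_test)[:, 1]

# Lowering the threshold catches more positives (recall rises),
# usually at the cost of more false alarms (precision falls)
for threshold in (0.5, 0.3, 0.1):
    y_pred_t = (proba >= threshold).astype(int)
    p = precision_score(y_test, y_pred_t, zero_division=0)
    r = recall_score(y_test, y_pred_t)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```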
Further Reading
📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning by Sarful Hassan
📘 Available on Amazon