Accuracy, Precision, Recall, and F1 Score with Scikit-learn

Accuracy, precision, recall, and F1 score are the core classification metrics used to evaluate machine learning models, and Scikit-learn provides a function for each in sklearn.metrics. Each metric offers a different perspective on model performance, which matters most in imbalanced classification problems.

Key Characteristics

  • Accuracy: Measures overall correctness
  • Precision: Focuses on positive predictive value
  • Recall: Focuses on sensitivity or true positive rate
  • F1 Score: Harmonic mean of precision and recall

Basic Rules

  • Use accuracy_score for balanced datasets.
  • Use precision_score when false positives are costly.
  • Use recall_score when false negatives are costly.
  • Use f1_score to balance precision and recall (a comparison sketch of all four metrics follows this list).
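
To see why these rules matter, here is a minimal sketch (with hypothetical, hand-picked labels) comparing all four metrics on an imbalanced toy example, where a model that always predicts the majority class looks deceptively good on accuracy alone:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical imbalanced labels: 9 negatives, 1 positive
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
# A lazy model that always predicts the majority class
y_pred = [0] * 10

print(accuracy_score(y_true, y_pred))                    # 0.9 -- looks strong
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0 -- no positives predicted
print(recall_score(y_true, y_pred))                      # 0.0 -- misses the only positive
print(f1_score(y_true, y_pred, zero_division=0))         # 0.0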

Syntax Table

SL NO | Metric    | Function Name   | Syntax Example                  | Description
------|-----------|-----------------|---------------------------------|---------------------------------------
1     | Accuracy  | accuracy_score  | accuracy_score(y_true, y_pred)  | Proportion of correct predictions
2     | Precision | precision_score | precision_score(y_true, y_pred) | TP / (TP + FP)
3     | Recall    | recall_score    | recall_score(y_true, y_pred)    | TP / (TP + FN)
4     | F1 Score  | f1_score        | f1_score(y_true, y_pred)        | Harmonic mean of precision and recall

Syntax Explanation

1. Accuracy

What is it? Overall proportion of correct predictions made by the model.

from sklearn.metrics import accuracy_score
y_true = [0, 1, 1, 0, 1]  # ground-truth labels
y_pred = [0, 1, 0, 0, 1]  # model predictions
print(accuracy_score(y_true, y_pred))  # 0.8

Explanation:

  • Compares each prediction with the true label and counts the matches.
  • Formula: (TP + TN) / (TP + TN + FP + FN)
  • Best used when the dataset is balanced and classes occur with similar frequencies.
  • Example output: 0.8 means 80% of predictions were correct. (The sketch below verifies the formula against a confusion matrix.)
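
As a quick check, the same formula can be reproduced from the confusion matrix. A minimal sketch, reusing the labels from the example above:

from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print((tp + tn) / (tp + tn + fp + fn))  # 0.8 -- matches the formula
print(accuracy_score(y_true, y_pred))   # 0.8 -- same result from Scikit-learn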

2. Precision

What is it? Measures the accuracy of positive predictions.

from sklearn.metrics import precision_score
y_true = [0, 1, 1, 0, 1]  # ground-truth labels
y_pred = [0, 1, 0, 0, 1]  # model predictions
print(precision_score(y_true, y_pred))  # 1.0

Explanation:

  • Formula: TP / (TP + FP)
  • Answers the question: “Of all items labeled positive, how many were truly positive?”
  • High precision is critical in applications like spam detection, where false positives are costly.
  • Example output: 1.0 means every predicted positive was actually positive. (An edge-case sketch follows below.)
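
One practical edge case: if the model predicts no positives at all, TP + FP is zero and precision is undefined. A minimal sketch of Scikit-learn's zero_division parameter, which controls what is returned in that case:

from sklearn.metrics import precision_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 0, 0, 0, 0]  # the model never predicts the positive class

# TP + FP == 0, so precision is undefined; zero_division sets the fallback value
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0, without a warning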

3. Recall

What is it? Measures the completeness of positive predictions.

from sklearn.metrics import recall_score
y_true = [0, 1, 1, 0, 1]  # ground-truth labels
y_pred = [0, 1, 0, 0, 1]  # model predictions
print(recall_score(y_true, y_pred))  # 0.666...

Explanation:

  • Formula: TP / (TP + FN)
  • Tells us how many of the actual positives were correctly identified.
  • Important in medical testing or fraud detection, where missing positives is costly.
  • Example output: ≈0.667 means two of the three actual positives were correctly predicted. (A sketch of the pos_label parameter follows below.)
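
By default, recall_score treats the label 1 as the positive class. When labels are encoded differently (for example, as strings), the pos_label parameter selects which class counts as positive. A minimal sketch with hypothetical string labels:

from sklearn.metrics import recall_score

y_true = ['ham', 'spam', 'spam', 'ham', 'spam']
y_pred = ['ham', 'spam', 'ham', 'ham', 'spam']

# pos_label tells Scikit-learn which label is the positive class
print(recall_score(y_true, y_pred, pos_label='spam'))  # 0.666... -- 2 of 3 spams found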

4. F1 Score

What is it? Combines precision and recall into a single score.

from sklearn.metrics import f1_score
y_true = [0, 1, 1, 0, 1]  # ground-truth labels
y_pred = [0, 1, 0, 0, 1]  # model predictions
print(f1_score(y_true, y_pred))  # 0.8

Explanation:

  • Formula: 2 * (precision * recall) / (precision + recall)
  • Provides a balanced metric in cases where you care equally about precision and recall.
  • Especially useful for datasets with class imbalance.
  • Example output: 0.8 means the model has a good balance of precision and recall.
  • Can be macro, micro, or weighted averaged in multiclass settings using the average parameter, as sketched below.
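
A minimal sketch of those averaging options on a hypothetical three-class problem:

from sklearn.metrics import f1_score

# Hypothetical multiclass labels (three classes: 0, 1, 2)
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

print(f1_score(y_true, y_pred, average='macro'))     # unweighted mean of per-class F1
print(f1_score(y_true, y_pred, average='micro'))     # computed from global TP/FP/FN counts
print(f1_score(y_true, y_pred, average='weighted'))  # per-class F1 weighted by class support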

Real-Life Project: Evaluate a Classifier on Imbalanced Dataset

Objective

Use F1 and recall to assess model performance on skewed data.

Code Example

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load data
data = pd.read_csv('imbalanced_classification.csv')
X = data.drop('target', axis=1)
y = data['target']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)  # stratify keeps class proportions in both splits

# Train model
model = LogisticRegression(max_iter=1000)  # raise max_iter to help convergence
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))

Expected Output

  • Printed scores for all four metrics.
  • Clear understanding of model behavior with imbalanced data.

Common Mistakes

  • ❌ Using accuracy on imbalanced datasets.
  • ❌ Not considering the business context when selecting metrics.
  • ❌ Ignoring precision-recall trade-offs.

Further Reading

📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning by Sarful Hassan
