Evaluating machine learning models is crucial for understanding their performance and guiding model improvement. Scikit-learn provides a wide range of metrics for classification, regression, and clustering tasks.
Key Characteristics
- Task-Specific Metrics: Classification, regression, clustering
- Supports Binary, Multiclass, Multilabel Problems
- Customizable Scoring Options
- Easy Integration with GridSearchCV and cross_val_score
Basic Rules
- Choose metrics aligned with business or scientific goals.
- For imbalanced classes, use metrics beyond accuracy, such as precision, recall, or F1.
- For regression, report both an error metric (e.g. MSE) and a goodness-of-fit metric (e.g. R^2).
- Use make_scorer to create custom scoring functions.
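As a minimal sketch of the last rule, the snippet below wraps fbeta_score with make_scorer and passes the result to cross_val_score. The synthetic dataset, the choice of beta=2, and the name f2_scorer are assumptions made for illustration, not part of the original article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import cross_val_score

# Synthetic, mildly imbalanced data (illustrative stand-in for a real dataset).
X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)

# Custom scorer: F-beta with beta=2 weights recall more heavily than precision.
f2_scorer = make_scorer(fbeta_score, beta=2)

# Any scorer built this way can be passed through the `scoring` argument.
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring=f2_scorer
)
print("Mean F2 across folds:", scores.mean())
```

The same scorer object can also be given to the scoring parameter of GridSearchCV, so hyperparameters are selected on the metric that matters for the task.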
Syntax Table
| No. | Metric Type | Function Name | Syntax Example | Description |
|---|---|---|---|---|
| 1 | Classification | accuracy_score | accuracy_score(y_true, y_pred) | Proportion of correct predictions |
| 2 | Classification | precision_score | precision_score(y_true, y_pred) | True positives / predicted positives |
| 3 | Classification | recall_score | recall_score(y_true, y_pred) | True positives / actual positives |
| 4 | Classification | f1_score | f1_score(y_true, y_pred) | Harmonic mean of precision and recall |
| 5 | Regression | mean_squared_error | mean_squared_error(y_true, y_pred) | Average of squared errors |
| 6 | Regression | r2_score | r2_score(y_true, y_pred) | Coefficient of determination |
| 7 | Clustering | silhouette_score | silhouette_score(X, labels) | How well-separated the clusters are |
Syntax Explanation
1. Accuracy Score
What is it? Basic metric to evaluate classification.
from sklearn.metrics import accuracy_score
y_true = [0, 1, 1, 0]
y_pred = [0, 0, 1, 1]
print(accuracy_score(y_true, y_pred))
Explanation:
- Measures fraction of correctly classified instances.
- Misleading on imbalanced datasets: a model that always predicts the majority class can still score high.
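To see why accuracy can mislead, consider a hypothetical dataset with 95 negatives and 5 positives and a model that always predicts the negative class. The data below is made up for illustration; balanced_accuracy_score is used as one possible alternative, not the only one.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true_imb = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
y_pred_imb = [0] * 100

acc = accuracy_score(y_true_imb, y_pred_imb)           # 95 of 100 correct
bal = balanced_accuracy_score(y_true_imb, y_pred_imb)  # mean of per-class recall

print("Accuracy:", acc)
print("Balanced accuracy:", bal)
```

Accuracy comes out at 0.95 even though no positive is ever found, while balanced accuracy (the average of recall on each class) is 0.5, the same as guessing.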
2. Precision Score
What is it? Measures exactness in classification.
from sklearn.metrics import precision_score
print(precision_score(y_true, y_pred))
Explanation:
- High precision means few false positives.
- Useful when false positives are costly.
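The sketch below recomputes precision by hand from the confusion matrix for the same small example used above, to show where the number comes from. The variable name manual_precision is introduced here for illustration.

```python
from sklearn.metrics import confusion_matrix, precision_score

y_true = [0, 1, 1, 0]
y_pred = [0, 0, 1, 1]

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Precision = true positives / all predicted positives.
manual_precision = tp / (tp + fp)

print("By hand:", manual_precision)
print("precision_score:", precision_score(y_true, y_pred))
```

Here one of the two predicted positives is correct, so both values are 0.5.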
3. Recall Score
What is it? Measures completeness of classification.
from sklearn.metrics import recall_score
print(recall_score(y_true, y_pred))
Explanation:
- High recall means few false negatives.
- Important when missing positives is costly.
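Recall usually trades off against precision. One way to see this, assuming the classifier produces probability-like scores, is to lower the decision threshold: more samples are flagged as positive, so fewer positives are missed but more false alarms appear. The scores and the helper predict_at below are hypothetical.

```python
from sklearn.metrics import precision_score, recall_score

y_true = [0, 0, 1, 1]
# Hypothetical predicted probabilities for the positive class.
y_score = [0.1, 0.4, 0.35, 0.8]

def predict_at(threshold):
    """Label a sample positive when its score reaches the threshold."""
    return [int(s >= threshold) for s in y_score]

for threshold in (0.5, 0.3):
    y_pred = predict_at(threshold)
    print(
        "threshold", threshold,
        "recall", recall_score(y_true, y_pred),
        "precision", precision_score(y_true, y_pred),
    )
```

At threshold 0.5 only one of the two positives is caught (recall 0.5, precision 1.0); at 0.3 both are caught (recall 1.0) at the cost of one false positive (precision 2/3).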
4. F1 Score
What is it? Combines precision and recall.
from sklearn.metrics import f1_score
print(f1_score(y_true, y_pred))
Explanation:
- Useful when precision and recall are equally important.
- Balances false positives and false negatives.
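For multiclass problems, f1_score needs an averaging strategy, because the default average="binary" only applies to two-class labels. The sketch below uses a small made-up three-class example to compare macro and micro averaging.

```python
from sklearn.metrics import f1_score

# Made-up three-class labels for illustration.
y_true_mc = [0, 1, 2, 0, 1, 2]
y_pred_mc = [0, 2, 1, 0, 0, 1]

# Macro: compute F1 per class, then take the unweighted mean.
f1_macro = f1_score(y_true_mc, y_pred_mc, average="macro")
# Micro: pool true positives, false positives and false negatives over all classes.
f1_micro = f1_score(y_true_mc, y_pred_mc, average="micro")
# average=None returns the per-class scores.
f1_per_class = f1_score(y_true_mc, y_pred_mc, average=None)

print("Macro F1:", f1_macro)
print("Micro F1:", f1_micro)
print("Per-class F1:", f1_per_class)
```

Only class 0 is ever predicted correctly here, so the per-class scores are 0.8, 0 and 0. Macro averaging gives about 0.27, while micro averaging gives about 0.33, which for single-label multiclass data equals the accuracy.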
5. Mean Squared Error (MSE)
What is it? Common metric for regression.
from sklearn.metrics import mean_squared_error
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.1, 7.8]
print(mean_squared_error(y_true, y_pred))
Explanation:
- Penalizes larger errors more severely.
- Sensitive to outliers.
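To illustrate the outlier sensitivity, the sketch below compares MSE with mean absolute error (MAE) on the example above and on a copy where one prediction is far off. The outlier value 17.0 is made up, and RMSE is computed with NumPy to keep the result in the units of the target.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.1, 7.8]
# Same predictions, but the last one misses by 10 (hypothetical outlier).
y_pred_outlier = [2.5, 0.0, 2.1, 17.0]

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
mse_out = mean_squared_error(y_true, y_pred_outlier)
mae_out = mean_absolute_error(y_true, y_pred_outlier)

print("MSE:", mse, "-> with outlier:", mse_out)
print("MAE:", mae, "-> with outlier:", mae_out)
print("RMSE:", np.sqrt(mse))
```

A single bad prediction raises MAE from about 0.48 to about 2.8, but raises MSE from about 0.29 to about 25, because the error is squared before averaging.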
6. R^2 Score
What is it? Measures goodness of fit.
from sklearn.metrics import r2_score
print(r2_score(y_true, y_pred))
Explanation:
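- Equals 1 for a perfect fit and 0 for a model that always predicts the mean of y_true; it can be negative when the model does worse than that baseline.
- Closer to 1 means better prediction.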
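The sketch below evaluates three sets of predictions against the same targets: the reasonable predictions from above, a constant prediction equal to the mean of the targets, and a deliberately poor set (the targets in reverse order, chosen here purely for illustration).

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = [3.0, -0.5, 2.0, 7.0]

y_good = [2.5, 0.0, 2.1, 7.8]              # close to the targets
y_mean = [float(np.mean(y_true))] * 4       # always predict the mean
y_bad = [7.0, 2.0, -0.5, 3.0]               # targets reversed (illustrative)

r2_good = r2_score(y_true, y_good)
r2_mean = r2_score(y_true, y_mean)
r2_bad = r2_score(y_true, y_bad)

print("Good predictions:", r2_good)
print("Mean baseline:", r2_mean)
print("Poor predictions:", r2_bad)
```

The good predictions score about 0.96, the mean baseline scores 0, and the poor predictions score below 0, so R^2 should be read relative to the mean-prediction baseline rather than as a strict 0-to-1 scale.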
7. Silhouette Score (Clustering)
What is it? Evaluates cohesion and separation.
from sklearn.metrics import silhouette_score
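X = [[1, 2], [1, 4], [10, 2], [10, 4]]  # four 2-D points
labels = [0, 0, 1, 1]  # cluster assigned to each point
print(silhouette_score(X, labels))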
Explanation:
- Measures how well each point fits into its cluster.
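- Ranges from -1 to 1: values near 1 indicate dense, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.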
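A common use of the silhouette score is comparing candidate numbers of clusters. The sketch below assumes synthetic blob data from make_blobs and KMeans as the clustering algorithm; both are illustrative choices rather than part of the original example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three compact groups (illustrative only).
X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.6, random_state=0)

silhouettes = {}
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Mean silhouette coefficient over all samples for this clustering.
    silhouettes[k] = silhouette_score(X, labels)
    print("k =", k, "silhouette =", round(silhouettes[k], 3))

best_k = max(silhouettes, key=silhouettes.get)
print("Highest silhouette at k =", best_k)
```

The score needs at least two clusters and fewer clusters than samples, so k=1 cannot be evaluated this way.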
Real-Life Project: Evaluate a Classifier with F1 and Accuracy
Objective
Train and evaluate a logistic regression model using classification metrics.
Code Example
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
# Load dataset
data = pd.read_csv('binary_classification.csv')
X = data.drop('target', axis=1)
y = data['target']
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Model training
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Evaluation
print("Accuracy:", accuracy_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))
Expected Output
- Two printed lines: the accuracy and the F1 score on the held-out test set.
- Exact values depend on the dataset and the train-test split.
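For a fuller picture than two numbers, the confusion matrix and classification_report can be printed as well. Since binary_classification.csv is not available here, the sketch below substitutes synthetic data from make_classification; the dataset, the stratified split, and max_iter=1000 are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CSV used in the project above.
X, y = make_classification(n_samples=500, weights=[0.7, 0.3], random_state=42)

# stratify=y keeps the class ratio the same in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
# Per-class precision, recall, F1 and support in one text table.
report = classification_report(y_test, y_pred)

print(cm)
print(report)
```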
Common Mistakes
- ❌ Using accuracy alone on imbalanced data.
- ❌ Confusing precision and recall.
- ❌ Applying classification metrics to regression outputs, or regression metrics to class labels.
