Model Evaluation Metrics Overview in Scikit-learn

Evaluating machine learning models is crucial for understanding their performance and guiding model improvement. Scikit-learn provides a wide range of metrics for classification, regression, and clustering tasks.

Key Characteristics

Task-Specific Metrics: classification, regression, and clustering
Supports Binary, Multiclass, and Multilabel Problems
Customizable Scoring through the scoring parameter and make_scorer
Easy Integration with GridSearchCV and cross_val_score
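As a small illustration of the multiclass support mentioned above, the sketch below scores a toy three-class problem with f1_score. The label values are made up for illustration. For multiclass targets, the average parameter has to be set explicitly, because the default ("binary") only applies to two-class problems.

```python
from sklearn.metrics import f1_score

# Toy three-class labels (illustrative values, not from a real dataset)
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]

# Unweighted mean of the per-class F1 scores
macro_f1 = f1_score(y_true, y_pred, average="macro")
# F1 computed from the global counts of true/false positives and negatives
micro_f1 = f1_score(y_true, y_pred, average="micro")
# One F1 score per class, returned as an array
per_class_f1 = f1_score(y_true, y_pred, average=None)

print("macro:", macro_f1)
print("micro:", micro_f1)
print("per class:", per_class_f1)
```

Macro averaging treats every class equally, while micro averaging is dominated by the most frequent classes, so the choice depends on whether rare classes matter as much as common ones.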

Basic Rules

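Choose metrics aligned with business or scientific goals.
For imbalanced classes, look beyond accuracy to precision, recall, or F1.
For regression, report an error metric (such as MSE) alongside a goodness-of-fit metric (such as R^2).
Use make_scorer to wrap a custom metric function as a scorer for cross-validation or grid search.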
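The last rule can be sketched as follows. This is a minimal example under stated assumptions: the data is synthetic (generated with make_classification as a stand-in for a real dataset), and the choice of an F-beta score with beta=2 is only illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import cross_val_score

# Synthetic, imbalanced binary data (assumption: stands in for a real dataset)
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Custom scorer: F-beta with beta=2 weights recall more heavily than precision
f2_scorer = make_scorer(fbeta_score, beta=2)

model = LogisticRegression(max_iter=1000)
# The scorer is passed through the scoring parameter
scores = cross_val_score(model, X, y, cv=5, scoring=f2_scorer)

print("F2 per fold:", scores)
print("Mean F2:", scores.mean())
```

The same f2_scorer object can be passed as scoring to GridSearchCV, so hyperparameters are tuned against the metric that matches the goal.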

Syntax Table

No.	Metric Type	Function Name	Syntax Example	Description
1	Classification	`accuracy_score`	`accuracy_score(y_true, y_pred)`	Proportion of correct predictions
2	Classification	`precision_score`	`precision_score(y_true, y_pred)`	True positives / predicted positives
3	Classification	`recall_score`	`recall_score(y_true, y_pred)`	True positives / actual positives
4	Classification	`f1_score`	`f1_score(y_true, y_pred)`	Harmonic mean of precision and recall
5	Regression	`mean_squared_error`	`mean_squared_error(y_true, y_pred)`	Average of squared errors
6	Regression	`r2_score`	`r2_score(y_true, y_pred)`	Coefficient of determination
7	Clustering	`silhouette_score`	`silhouette_score(X, labels)`	How well-separated the clusters are

Syntax Explanation

1. Accuracy Score

What is it? Basic metric to evaluate classification.

from sklearn.metrics import accuracy_score
y_true = [0, 1, 1, 0]
y_pred = [0, 0, 1, 1]
print(accuracy_score(y_true, y_pred))

Explanation:

Measures the fraction of correctly classified instances.
Can be misleading on imbalanced datasets, where always predicting the majority class already scores high.
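To make the imbalance caveat concrete, here is a small sketch with made-up counts: 95 negatives, 5 positives, and a "model" that always predicts the majority class.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# 95 negatives and 5 positives (illustrative counts)
y_true = [0] * 95 + [1] * 5
# A trivial predictor that always outputs the majority class
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)

print("Accuracy:", acc)               # 0.95
print("Recall:", rec)                 # 0.0
print("Balanced accuracy:", bal_acc)  # 0.5
```

Accuracy looks strong even though no positive sample is ever found, which is why recall or balanced accuracy is worth reporting alongside it.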

2. Precision Score

What is it? The fraction of samples predicted positive that are actually positive.

from sklearn.metrics import precision_score
# Reuses y_true and y_pred from the accuracy example
print(precision_score(y_true, y_pred))  # 0.5

Explanation:

High precision means few false positives.
Useful when false positives are costly.
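The relationship between precision and false positives can be checked by hand from the confusion matrix. The sketch below uses the same toy labels as the accuracy example.

```python
from sklearn.metrics import confusion_matrix, precision_score

y_true = [0, 1, 1, 0]
y_pred = [0, 0, 1, 1]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Precision = true positives / all predicted positives
manual_precision = tp / (tp + fp)
library_precision = precision_score(y_true, y_pred)

print(manual_precision, library_precision)  # 0.5 0.5
```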

3. Recall Score

What is it? The fraction of actual positives that the model correctly identifies.

from sklearn.metrics import recall_score
# Reuses y_true and y_pred from the accuracy example
print(recall_score(y_true, y_pred))  # 0.5

Explanation:

High recall means few false negatives.
Important when missing positives is costly.
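One common way to trade precision for recall is to lower the decision threshold. The following is a rough sketch on synthetic data; the thresholds 0.5 and 0.3 are arbitrary choices for illustration, and the scores are computed on the training data only to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Synthetic binary data (assumption: stands in for a real dataset)
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Predicted probability of the positive class
proba = clf.predict_proba(X)[:, 1]

recalls = {}
for threshold in (0.5, 0.3):
    # Label a sample positive when its probability reaches the threshold
    y_pred = (proba >= threshold).astype(int)
    recalls[threshold] = recall_score(y, y_pred)
    print(
        f"threshold={threshold}: "
        f"recall={recalls[threshold]:.3f}, "
        f"precision={precision_score(y, y_pred):.3f}"
    )
```

Lowering the threshold can only add predicted positives, so recall never decreases, although precision typically drops.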

4. F1 Score

What is it? Combines precision and recall.

from sklearn.metrics import f1_score
# Reuses y_true and y_pred from the accuracy example
print(f1_score(y_true, y_pred))  # 0.5

Explanation:

Useful when precision and recall are equally important.
Balances false positives and false negatives.
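As a quick check of the harmonic-mean definition, the sketch below computes F1 by hand as 2PR / (P + R) and compares it with f1_score. The labels are made up so that precision and recall differ.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Illustrative labels: 1 true positive, 1 false positive, 2 false negatives
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]

p = precision_score(y_true, y_pred)  # 1 / 2
r = recall_score(y_true, y_pred)     # 1 / 3

# Harmonic mean of precision and recall
manual_f1 = 2 * p * r / (p + r)
library_f1 = f1_score(y_true, y_pred)

print("Manual F1:", manual_f1)
print("Library F1:", library_f1)
print("Arithmetic mean for comparison:", (p + r) / 2)
```

The harmonic mean (about 0.4 here) sits closer to the smaller of the two values than the arithmetic mean does, so a model cannot reach a high F1 by doing well on only one of precision or recall.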

5. Mean Squared Error (MSE)

What is it? Common metric for regression.

from sklearn.metrics import mean_squared_error
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.1, 7.8]
print(mean_squared_error(y_true, y_pred))

Explanation:

Penalizes larger errors more severely.
Sensitive to outliers.
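The outlier sensitivity can be seen by comparing MSE with mean absolute error (MAE) when a single prediction is far off. In this sketch the second prediction list is a made-up variant of the example above in which the last value misses by 10.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.1, 7.8]
# Same predictions, except the last one is off by 10 (illustrative outlier)
y_pred_outlier = [2.5, 0.0, 2.1, 17.0]

mse = mean_squared_error(y_true, y_pred)
mse_outlier = mean_squared_error(y_true, y_pred_outlier)
mae = mean_absolute_error(y_true, y_pred)
mae_outlier = mean_absolute_error(y_true, y_pred_outlier)

# RMSE puts the error back into the units of the target
rmse = np.sqrt(mse)

print("MSE:", mse, "->", mse_outlier)
print("MAE:", mae, "->", mae_outlier)
print("RMSE:", rmse)
```

One bad prediction raises MSE from roughly 0.29 to roughly 25, while MAE moves from about 0.48 to about 2.8, which shows how strongly squaring amplifies large errors.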

6. R^2 Score

What is it? Measures goodness of fit.

from sklearn.metrics import r2_score
# Reuses the regression y_true and y_pred from the MSE example
print(r2_score(y_true, y_pred))  # approximately 0.96

Explanation:

At most 1, which indicates a perfect fit; 0 corresponds to always predicting the mean of y_true.
Can be negative when the model fits worse than that mean baseline.
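A short sketch of the range of r2_score, using made-up values: a perfect prediction, a constant prediction equal to the mean of y_true, and a prediction that is worse than that constant.

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0, 4.0]

perfect_pred = [1.0, 2.0, 3.0, 4.0]
mean_pred = [2.5, 2.5, 2.5, 2.5]     # always predicts the mean of y_true
reversed_pred = [4.0, 3.0, 2.0, 1.0]  # systematically wrong

r2_perfect = r2_score(y_true, perfect_pred)
r2_mean = r2_score(y_true, mean_pred)
r2_reversed = r2_score(y_true, reversed_pred)

print(r2_perfect)   # 1.0
print(r2_mean)      # 0.0
print(r2_reversed)  # -3.0
```

The last case shows that R^2 is not bounded below by 0: the residual sum of squares (20) is four times the total sum of squares (5), giving 1 - 4 = -3.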

7. Silhouette Score (Clustering)

What is it? Evaluates cohesion and separation.

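from sklearn.metrics import silhouette_score
# X is the feature matrix; labels are the cluster assignments (for example from KMeans.fit_predict(X))
print(silhouette_score(X, labels))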

Explanation:

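Measures how well each point fits its own cluster compared with the nearest neighboring cluster.
Ranges from -1 to 1: values near 1 indicate well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.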
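Since the snippet above leaves X and labels undefined, here is a self-contained sketch. It assumes synthetic blob data from make_blobs and cluster labels from KMeans; neither choice comes from the original example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2-D data with three blobs (assumption: stands in for real features)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Cluster assignments from k-means with the same number of clusters
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

score = silhouette_score(X, labels)
print("Silhouette score:", score)
```

Repeating this for several values of n_clusters and comparing the scores is one common heuristic for choosing the number of clusters.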

Real-Life Project: Evaluate a Classifier with F1 and Accuracy

Objective

Train and evaluate a logistic regression model using classification metrics.

Code Example

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# Load dataset (placeholder file name; the CSV is expected to contain a binary 'target' column)
data = pd.read_csv('binary_classification.csv')
X = data.drop('target', axis=1)
y = data['target']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model training
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation
print("Accuracy:", accuracy_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))

Expected Output

Two printed lines, one with the accuracy and one with the F1 score on the test set.
The exact values depend on the dataset, so none are shown here.
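For readers without the CSV file, the following is a runnable variant of the project under stated assumptions: the data is synthetic (make_classification), the split is stratified, and classification_report is added to show per-class precision, recall, and F1 in one table. None of these choices come from the original project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, f1_score
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced binary data (assumption: replaces the CSV file)
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.7, 0.3], random_state=42
)

# Stratify so both splits keep the same class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print("Accuracy:", acc)
print("F1 Score:", f1)
print(report)
```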

Common Mistakes

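❌ Using accuracy alone on imbalanced data.
❌ Confusing precision (how many predicted positives are correct) with recall (how many actual positives are found).
❌ Applying classification metrics to regression outputs, or regression metrics to class labels.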

Further Reading

📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning by Sarful Hassan

