Custom Scoring Functions in Scikit-learn

Custom scoring functions in Scikit-learn allow users to define evaluation metrics tailored to specific business or domain requirements. These scoring functions can be used in model evaluation, cross-validation, and hyperparameter tuning.

Key Characteristics

  • Tailored to specific use cases and domain needs
  • Created with make_scorer and plugged into GridSearchCV, cross_val_score, and other model-selection tools
  • Can be based on predicted labels or predicted probabilities
  • Work for classification, regression, or clustering tasks

Basic Rules

  • Always return a single numeric value; by default, higher means a better model (error-style metrics can be handled with greater_is_better=False)
  • For both classification and regression, accept y_true and y_pred (a minimal sketch follows this list)
  • If the metric needs predicted probabilities, tell make_scorer (needs_proba=True before scikit-learn 1.4, response_method="predict_proba" from 1.4 onward)
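
For illustration, a minimal classification scoring function that follows these rules might look like the sketch below (the metric name and the 5x penalty for missed frauds are hypothetical, not part of scikit-learn):

import numpy as np

def fraud_cost_score(y_true, y_pred):
    # Hypothetical business metric: reward detected frauds, penalize missed frauds heavily
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    true_pos = np.sum((y_true == 1) & (y_pred == 1))
    false_neg = np.sum((y_true == 1) & (y_pred == 0))
    return float(true_pos - 5 * false_neg)  # single numeric value, higher is better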

Syntax Table

| SL NO | Technique | Syntax Example | Description |
|-------|-----------|----------------|-------------|
| 1 | Import Scorer | from sklearn.metrics import make_scorer | Loads the function used to create a custom scorer |
| 2 | Define Function | def my_score(y_true, y_pred): ... | Implements the custom metric logic |
| 3 | Create Scorer | scorer = make_scorer(my_score) | Converts the function into a scikit-learn-compatible scorer |
| 4 | Use in GridSearch | GridSearchCV(..., scoring=scorer) | Applies the custom scorer during tuning |
| 5 | Use in CV | cross_val_score(model, X, y, scoring=scorer) | Evaluates the model with the custom score |

Syntax Explanation

1. Import Scorer

What is it?
Function to convert a user-defined metric into a Scikit-learn scoring object.

Syntax:

from sklearn.metrics import make_scorer

Explanation:

  • Required to use custom scoring in model selection APIs
  • Enables compatibility with GridSearchCV and cross_val_score
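
As a quick illustration (the Iris data and accuracy_score are stand-ins, not part of the original example), the object returned by make_scorer is itself callable with the signature scorer(estimator, X, y), which is exactly what GridSearchCV and cross_val_score expect:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, accuracy_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

scorer = make_scorer(accuracy_score)  # wrap a built-in metric for illustration
print(scorer(model, X, y))            # scorer objects are called as scorer(estimator, X, y)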

2. Define Custom Function

What is it?
User-defined function that calculates a custom metric.

Syntax:

def my_score(y_true, y_pred):
    return custom_logic_here

Explanation:

  • Must accept y_true and y_pred (or y_true and y_score when working with probabilities or decision values)
  • Must return a single float; with the default greater_is_better=True, higher values indicate a better model
  • Can use NumPy, scikit-learn utilities, or domain-specific math (a concrete sketch follows this list)
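
As a concrete sketch, a domain-specific regression metric could penalize under-prediction more than over-prediction (the name asymmetric_cost and the 2x penalty are assumptions for illustration). Because lower values are better here, the next step wraps it with greater_is_better=False:

import numpy as np

def asymmetric_cost(y_true, y_pred):
    # Hypothetical forecasting metric: under-prediction costs twice as much as over-prediction
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    penalties = np.where(diff < 0, 2.0 * np.abs(diff), np.abs(diff))
    return float(np.mean(penalties))  # lower is better; handled via greater_is_better=False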

3. Create Scorer

What is it?
Converts the raw Python function into a Scikit-learn-compatible scorer.

Syntax:

scorer = make_scorer(my_score, greater_is_better=True)

Explanation:

  • greater_is_better=True (the default) tells Scikit-learn to maximize the score; set it to False for error-style metrics, which are then negated internally
  • For probability-based metrics, pass needs_proba=True (or response_method="predict_proba" in scikit-learn 1.4 and later)
  • Ensures integration with all model evaluation and selection tools (see the sketch after this list)
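
Continuing the hypothetical asymmetric_cost sketch from the previous step, the scorer could be created as follows (the commented probability example is illustrative and depends on your scikit-learn version):

from sklearn.metrics import make_scorer

# greater_is_better=False tells scikit-learn to negate the value internally,
# so model selection still maximizes the reported score
cost_scorer = make_scorer(asymmetric_cost, greater_is_better=False)

# For probability-based metrics: needs_proba=True before scikit-learn 1.4,
# response_method="predict_proba" from 1.4 onward (illustrative, not run here)
# proba_scorer = make_scorer(my_proba_metric, needs_proba=True)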

4. Use in GridSearch

What is it?
Applies the custom scoring metric during hyperparameter tuning.

Syntax:

from sklearn.model_selection import GridSearchCV
gs = GridSearchCV(model, param_grid, scoring=scorer)

Explanation:

  • Plug your custom scorer directly into grid search
  • Allows model selection based on your specific metric
  • Works with RandomizedSearchCV too
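
A self-contained sketch of this step is shown below; it wraps mean_absolute_error purely as a stand-in for a custom metric, and the dataset and parameter grid are illustrative:

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# An error metric, so lower is better and greater_is_better=False is required
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)

gs = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, scoring=mae_scorer, cv=5)
gs.fit(X, y)

print(gs.best_params_)  # parameters selected by the custom scorer
print(gs.best_score_)   # reported as a negative MAE because greater_is_better=False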

5. Use in Cross-Validation

What is it?
Evaluates the model performance using the custom metric during cross-validation.

Syntax:

from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, scoring=scorer)

Explanation:

  • Computes the score on each fold using your function
  • Provides a more reliable estimate of how the model will generalize
  • Returns an array of per-fold scores that can be averaged or plotted (a runnable sketch follows)
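
A runnable sketch of this step (the synthetic dataset and logistic regression model are stand-ins) might look like the following:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

f1_scorer = make_scorer(f1_score, pos_label=1)  # custom-wrapped metric
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring=f1_scorer)

print(scores)                         # one score per fold (a NumPy array)
print(scores.mean(), scores.std())    # summarize across folds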

Real-Life Project: Custom F1 Scoring for Fraud Detection

Project Overview

Optimize a classifier using a custom F1 scorer that emphasizes the fraud (minority) class.

Code Example

from sklearn.metrics import make_scorer, f1_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.datasets import make_classification

# Create imbalanced dataset
X, y = make_classification(n_classes=2, weights=[0.9, 0.1], n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

# Define custom F1 scorer
custom_f1 = make_scorer(f1_score, pos_label=1)

# Model tuning
param_grid = {'n_estimators': [50, 100]}
model = RandomForestClassifier(random_state=42)
gs = GridSearchCV(model, param_grid, scoring=custom_f1)
gs.fit(X_train, y_train)

# Report the chosen parameters and the F1 on the held-out test set
print("Best parameters:", gs.best_params_)
print("Test F1 (fraud class):", f1_score(y_test, gs.best_estimator_.predict(X_test), pos_label=1))

Expected Output

  • GridSearchCV selects the parameter set with the best F1 on the minority (fraud) class
  • The printed output shows the best parameters and the tuned model's F1 on the held-out test set

Common Mistakes to Avoid

  • ❌ Returning non-numeric values from the scoring function
  • ❌ Forgetting to use make_scorer
  • ❌ Using metrics incompatible with model type (e.g., F1 on regression)

Further Reading Recommendation

📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning by Sarful Hassan

🔗 Available on Amazon