Randomized Search CV with Scikit-learn

Randomized search is an efficient hyperparameter tuning method that samples a fixed number of parameter settings (n_iter) from specified distributions or lists. Grid search evaluates every combination. Randomized search evaluates only the sampled settings, so its cost depends on n_iter rather than on the size of the parameter space. This makes it faster for large parameter spaces.
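The cost difference can be checked by counting how many candidates each search evaluates on the same space. This is a minimal sketch that assumes the bundled iris dataset as stand-in data and a KNN model with n_neighbors from 1 to 29.

```python
# Compare how many candidates each search evaluates on the same space.
# Assumption: the bundled iris dataset stands in for real data.
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier()

# Grid search: every listed value is tried (29 candidates).
grid = GridSearchCV(model, {"n_neighbors": list(range(1, 30))}, cv=5)
grid.fit(X, y)

# Randomized search: only n_iter=10 settings are sampled from the distribution.
rand = RandomizedSearchCV(model, {"n_neighbors": randint(1, 30)},
                          n_iter=10, cv=5, random_state=42)
rand.fit(X, y)

n_grid = len(grid.cv_results_["params"])
n_rand = len(rand.cv_results_["params"])
print("grid candidates:", n_grid)    # 29
print("random candidates:", n_rand)  # 10
```

With a larger space (more parameters or finer ranges), the grid count grows multiplicatively while the randomized count stays at n_iter.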

Key Characteristics

  • Efficient for Large Search Spaces
  • Samples from Distributions
  • Supports Cross-Validation
  • Reduces Computation Time

Basic Rules

  • Use when the parameter space is large.
  • Specify each parameter as a scipy.stats distribution (e.g., uniform, randint) or as a list of values.
  • Control the number of iterations with n_iter.
  • Scale the input data when the model requires it (e.g., distance-based models such as KNN).

Syntax Table

No. | Function/Tool | Syntax Example | Description
1 | Import RandomizedSearchCV | from sklearn.model_selection import RandomizedSearchCV | Import the tool
2 | Define Distributions | param_dist = {'n_neighbors': randint(1, 30)} | Distributions for parameter sampling
3 | Set Up Randomized Search | search = RandomizedSearchCV(model, param_dist, n_iter=10, cv=5) | Create the search object
4 | Fit Search | search.fit(X, y) | Run the search and refit the best model
5 | Access Results | search.best_params_, search.best_score_ | Inspect the best parameters and score

Syntax Explanation

1. Import RandomizedSearchCV

What is it? A tool that randomly samples hyperparameter settings and evaluates each one with cross-validation.

Syntax:

from sklearn.model_selection import RandomizedSearchCV

Explanation:

  • Required before you can create a randomized search object.
  • Has the same interface as GridSearchCV but evaluates only a sampled subset of candidates, so it is faster on large search spaces.

2. Define Parameter Distributions

What is it? A dictionary that maps each parameter name to either a distribution (such as those in scipy.stats) or a list of candidate values.

Syntax:

from scipy.stats import randint
param_dist = {'n_neighbors': randint(1, 30)}

Explanation:

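  • Uses distributions from scipy.stats; randint(1, 30) draws integers from 1 to 29 because the upper bound is exclusive.
  • Can use uniform, randint, or plain lists; uniform(loc, scale) samples floats from [loc, loc + scale], and list values are sampled with equal probability.
  • Continuous distributions let the search try values that a fixed grid would never include.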
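To see what the search will actually try, scikit-learn's ParameterSampler (the sampler RandomizedSearchCV uses) can draw settings without fitting anything. The space below is a hypothetical example mixing a discrete distribution, a continuous one, and a list; p is the Minkowski power parameter of KNeighborsClassifier.

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import ParameterSampler

# Hypothetical search space for illustration only.
param_dist = {
    "n_neighbors": randint(1, 30),        # integers 1..29 (upper bound exclusive)
    "p": uniform(loc=1, scale=1),         # floats between 1 and 2
    "weights": ["uniform", "distance"],   # list values are picked with equal probability
}

# Draw 5 settings; random_state makes the draw repeatable.
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=0))
for s in samples:
    print(s)
```

If every entry is a list, the sampler draws without replacement; once any entry is a distribution, it samples with replacement, so duplicate settings are possible.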

3. Setup RandomizedSearchCV

What is it? Set the model, parameter space, and number of iterations.

Syntax:

search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=10, cv=5, scoring='accuracy')

Explanation:

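  • n_iter: Number of parameter settings sampled (default 10); more iterations cover the space better but take longer.
  • cv: Number of cross-validation folds (or a cross-validation splitter object).
  • scoring: Metric used to compare candidates (e.g., 'accuracy').
  • random_state: Optional; fixes the random sampling so results are reproducible.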
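The effect of random_state can be checked by running the same search twice with the same seed and comparing the sampled settings. This sketch assumes iris as stand-in data; run_search is a hypothetical helper written for this example.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in data

def run_search(seed):
    # Hypothetical helper: same estimator, space, and settings; only random_state varies.
    search = RandomizedSearchCV(
        estimator=KNeighborsClassifier(),
        param_distributions={"n_neighbors": randint(1, 30)},
        n_iter=10,
        cv=5,
        scoring="accuracy",
        random_state=seed,
    )
    search.fit(X, y)
    return search

a = run_search(42)
b = run_search(42)
# Same seed, same sampled settings in the same order.
same = a.cv_results_["params"] == b.cv_results_["params"]
print(same)  # True
```

With an integer cv and a classifier, the folds are deterministic, so fixing random_state is enough to make the whole search repeatable here.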

4. Fit the Randomized Search

What is it? Runs cross-validation for each sampled parameter setting and then refits the best one.

Syntax:

search.fit(X_scaled, y)

Explanation:

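  • For each of the n_iter sampled settings, fits and scores the model on every cross-validation fold (n_iter × cv fits in total).
  • By default (refit=True), then refits the best setting on the full dataset.
  • Because the cost depends on n_iter rather than on the size of the search space, it is usually much cheaper than an exhaustive grid search over the same space.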
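The fit count can be read back from the fitted search object. This is a sketch under the assumption of iris as stand-in data; n_splits_ and cv_results_ are attributes of the fitted search.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in data
n_iter, cv = 10, 5

search = RandomizedSearchCV(KNeighborsClassifier(), {"n_neighbors": randint(1, 30)},
                            n_iter=n_iter, cv=cv, random_state=42)
search.fit(X, y)

# One entry per sampled setting.
n_candidates = len(search.cv_results_["params"])
# Number of CV splits actually used.
n_splits = search.n_splits_
# Per-fold score columns: split0_test_score ... split4_test_score.
split_cols = [k for k in search.cv_results_ if k.startswith("split") and k.endswith("_test_score")]
print(n_candidates, n_splits, n_candidates * n_splits)  # 10 5 50
```

The 50 cross-validation fits are followed by one extra refit on all of X and y, which produces best_estimator_.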

5. Access Results

What is it? Retrieve the best configuration found and the corresponding model.

Syntax:

print(search.best_params_)
print(search.best_score_)

Explanation:

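  • best_params_ is the best parameter combination found among the sampled settings.
  • best_score_ is the mean cross-validated score of that combination.
  • best_estimator_ is the model refit on the full dataset with best_params_ (available when refit=True, the default).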
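Beyond the single best result, cv_results_ holds the scores of every sampled candidate and loads directly into a pandas DataFrame. A minimal sketch, assuming iris as stand-in data:

```python
import pandas as pd
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in data
search = RandomizedSearchCV(KNeighborsClassifier(), {"n_neighbors": randint(1, 30)},
                            n_iter=10, cv=5, scoring="accuracy", random_state=42)
search.fit(X, y)

# Rank all sampled candidates, not just the winner.
results = pd.DataFrame(search.cv_results_)
top = results.sort_values("rank_test_score")[
    ["param_n_neighbors", "mean_test_score", "std_test_score"]
]
print(top.head())

# best_estimator_ is already refit on all of X, y with best_params_, so it can predict directly.
best_model = search.best_estimator_
print(best_model.n_neighbors, best_model.predict(X[:3]))
```

Looking at std_test_score next to the mean helps judge whether the top few candidates are meaningfully different or just within fold-to-fold noise.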

Real-Life Project: Tuning KNN with Randomized Search

Objective

Efficiently tune the number of neighbors for a KNN classifier using randomized search.

Code Example

import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from scipy.stats import randint

# Load and preprocess data
data = pd.read_csv('classification_data.csv')
X = data.drop('target', axis=1)
y = data['target']
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Define model and search space
model = KNeighborsClassifier()
param_dist = {'n_neighbors': randint(1, 30)}

# Randomized search
search = RandomizedSearchCV(model, param_distributions=param_dist, n_iter=10, cv=5, scoring='accuracy', random_state=42)
search.fit(X_scaled, y)

# Results
print("Best Parameters:", search.best_params_)
print("Best Score:", search.best_score_)

Expected Output

  • The best sampled setting, e.g., {'n_neighbors': 7} (the actual value depends on the data and the sampled candidates).
  • Its mean cross-validation accuracy (best_score_).

Common Mistakes

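  • ❌ Leaving n_iter at its default of 10 without checking whether that is enough samples for the search space.
  • ❌ Using overly broad or unrealistic distributions, which waste iterations on poor regions of the space.
  • ❌ Omitting feature scaling for distance-based models such as KNN.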
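A related pitfall: the project code fits StandardScaler on all rows before cross-validation, so statistics from each validation fold leak into the scaling used for training. Putting the scaler inside a scikit-learn Pipeline refits it within each training fold. This sketch assumes iris as a stand-in for classification_data.csv; the step names "scaler" and "knn" are illustrative choices.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # stand-in for the CSV data

# The scaler is refit inside each CV training fold, so validation folds never influence it.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Pipeline parameters are addressed as <step name>__<parameter>.
param_dist = {"knn__n_neighbors": randint(1, 30)}

search = RandomizedSearchCV(pipe, param_distributions=param_dist,
                            n_iter=10, cv=5, scoring="accuracy", random_state=42)
search.fit(X, y)  # pass raw, unscaled features
print("Best Parameters:", search.best_params_)
print("Best Score:", search.best_score_)
```

Note that best_params_ now uses the prefixed key knn__n_neighbors, and best_estimator_ is the whole pipeline, so it scales new data automatically before predicting.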

Further Reading

📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning by Sarful Hassan
