Randomized search is an efficient hyperparameter tuning method that samples a fixed number of parameter settings from specified distributions. Unlike grid search, which evaluates every combination, randomized search explores a random subset, making it much faster on large parameter spaces.
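To make the contrast concrete, the small sketch below uses a hypothetical KNN-style search space (not from the project data) and counts how many candidate settings each approach would evaluate: grid search grows with the product of all value lists, while randomized search is capped at n_iter.

# Hypothetical search space: 29 * 2 * 2 = 116 combinations for grid search.
param_grid = {
    'n_neighbors': list(range(1, 30)),   # 29 values
    'weights': ['uniform', 'distance'],  # 2 values
    'p': [1, 2],                         # 2 values
}
n_grid = (len(param_grid['n_neighbors'])
          * len(param_grid['weights'])
          * len(param_grid['p']))
print("Grid search candidates:", n_grid)        # 116 settings, each cross-validated
print("Randomized search candidates:", 10)      # capped at n_iter, e.g. 10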
Key Characteristics
- Efficient for Large Search Spaces
- Samples from Distributions
- Supports Cross-Validation
- Reduces Computation Time
Basic Rules
- Use when the parameter space is large.
- Prefer distributions like `uniform`, `randint`, or simple lists.
- Control the number of iterations with `n_iter`.
- Always scale the input data if required by the model.
Syntax Table
| SL NO | Function/Tool | Syntax Example | Description |
|---|---|---|---|
| 1 | Import RandomizedSearchCV | `from sklearn.model_selection import RandomizedSearchCV` | Import the tool |
| 2 | Define Distributions | `param_dist = {'n_neighbors': randint(1, 30)}` | Distributions for parameter sampling |
| 3 | Setup Randomized Search | `search = RandomizedSearchCV(model, param_dist, n_iter=10, cv=5)` | Create the search object |
| 4 | Fit Search | `search.fit(X, y)` | Run the search and fit the model |
| 5 | Access Results | `search.best_params_, search.best_score_` | Access the best results |
Syntax Explanation
1. Import RandomizedSearchCV
What is it? Tool for random sampling of hyperparameters combined with cross-validation.
Syntax:
from sklearn.model_selection import RandomizedSearchCV
Explanation:
- Required to initialize the random search engine.
- Works similarly to GridSearchCV but is faster on large grids.
2. Define Parameter Distributions
What is it? Dictionary with values as distributions or lists.
Syntax:
from scipy.stats import randint
param_dist = {'n_neighbors': randint(1, 30)}
Explanation:
- Uses scipy’s distribution functions.
- Can use `uniform`, `randint`, or simple lists.
- Allows more flexibility than grid search.
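A short sketch of mixing continuous and discrete distributions with a plain list, using a hypothetical KNN-style search space; the call to rvs() is only shown to illustrate that distribution objects are sampled rather than enumerated.

from scipy.stats import randint, uniform

param_dist = {
    'n_neighbors': randint(1, 30),        # integers sampled from [1, 30)
    'p': uniform(1, 1),                   # floats sampled from [1, 2)
    'weights': ['uniform', 'distance'],   # lists are sampled uniformly at random
}

# Distribution objects expose rvs(), which RandomizedSearchCV calls internally.
print(param_dist['n_neighbors'].rvs(random_state=0))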
3. Setup RandomizedSearchCV
What is it? Configures the model, the parameter space, and the number of iterations.
Syntax:
search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=10, cv=5, scoring='accuracy')
Explanation:
- `n_iter`: Number of random combinations to try.
- `cv`: Number of cross-validation folds.
- `scoring`: Evaluation metric.
- `random_state`: Optional, for reproducibility.
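A minimal setup sketch, assuming `model` and `param_dist` from the previous steps. Note that the total work is roughly `n_iter * cv` fits (plus one final refit on the full data), which is what keeps the cost predictable.

search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_dist,
    n_iter=10,            # 10 sampled settings
    cv=5,                 # 5 folds per setting -> about 50 fits, plus the final refit
    scoring='accuracy',
    random_state=42,      # makes the sampled settings reproducible
    n_jobs=-1,            # optional: evaluate candidates in parallel
)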
4. Fit the Randomized Search
What is it? Trains the model using different parameter combinations.
Syntax:
search.fit(X_scaled, y)
Explanation:
- Internally fits models and evaluates using cross-validation.
- Much faster than exhaustive grid search.
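After fitting, the per-candidate scores are stored in `cv_results_`; a small sketch for inspecting them, assuming pandas is available:

import pandas as pd

search.fit(X_scaled, y)

# Every sampled setting, its mean CV score, and its rank.
results = pd.DataFrame(search.cv_results_)
print(results[['params', 'mean_test_score', 'rank_test_score']]
      .sort_values('rank_test_score'))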
5. Access Results
What is it? Get the optimal configuration and best model.
Syntax:
print(search.best_params_)
print(search.best_score_)
Explanation:
- `best_params_` shows the selected parameter combination.
- `best_score_` gives the best mean cross-validation score.
- `best_estimator_` returns the refit model.
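The refit best model can be used directly for prediction; a brief sketch, assuming `X_new` is new data preprocessed the same way as the training data:

best_model = search.best_estimator_      # estimator refit on all training data
predictions = best_model.predict(X_new)  # X_new is a hypothetical new feature matrix
print(search.best_params_, search.best_score_)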
Real-Life Project: Tuning KNN with Randomized Search
Objective
Efficiently tune the number of neighbors for a KNN classifier using randomized search.
Code Example
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from scipy.stats import randint
# Load and preprocess data
data = pd.read_csv('classification_data.csv')
X = data.drop('target', axis=1)
y = data['target']
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Define model and search space
model = KNeighborsClassifier()
param_dist = {'n_neighbors': randint(1, 30)}
# Randomized search
search = RandomizedSearchCV(model, param_distributions=param_dist, n_iter=10, cv=5, scoring='accuracy', random_state=42)
search.fit(X_scaled, y)
# Results
print("Best Parameters:", search.best_params_)
print("Best Score:", search.best_score_)
Expected Output
- Best sampled parameter (e.g., `{'n_neighbors': 7}`)
- Corresponding cross-validation accuracy
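A natural extension of the project, sketched below under the assumption that an unbiased estimate of generalization is wanted, is to hold out a test set before the search and score `best_estimator_` on it:

from sklearn.model_selection import train_test_split

# Hold out a test set before tuning so the reported score is unbiased.
# (Ideally the scaler would sit inside a pipeline; see Common Mistakes below.)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42, stratify=y)

search.fit(X_train, y_train)
print("Test accuracy:", search.best_estimator_.score(X_test, y_test))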
Common Mistakes
- ❌ Forgetting to set `n_iter` (defaults to 10).
- ❌ Using overly broad or unreasonable distributions.
- ❌ Omitting data scaling for distance-based models.
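One way to avoid the scaling mistake above is to place the scaler inside a Pipeline, so it is refit on each cross-validation split instead of on the full dataset; a minimal sketch, with parameter names using scikit-learn's `step__param` prefix convention:

from scipy.stats import randint
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([('scaler', StandardScaler()), ('knn', KNeighborsClassifier())])

# Prefix each parameter with the pipeline step name.
param_dist = {'knn__n_neighbors': randint(1, 30)}

search = RandomizedSearchCV(pipe, param_dist, n_iter=10, cv=5,
                            scoring='accuracy', random_state=42)
search.fit(X, y)   # raw features; scaling happens inside each CV fold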
Further Reading
- Scikit-learn RandomizedSearchCV Docs
- Scipy Distributions for Hyperparameter Search
- Random Search vs Grid Search Blog (ML Mastery)
