Grid Search for Hyperparameter Tuning in Scikit-learn

Grid search is an exhaustive search technique: it evaluates every combination of hyperparameter values in a user-defined grid and selects the combination that scores best. Scikit-learn provides GridSearchCV, which automates this search and uses cross-validation to score each combination.

Key Characteristics

  • Exhaustive Hyperparameter Search
  • Integrates with Cross-Validation
  • Returns Best Model Automatically
  • Tracks Scores for All Parameter Combinations

Basic Rules

  • Scale your features before fitting when the model is sensitive to feature scale (for example, distance-based models such as KNN).
  • Define a reasonable search space to avoid excessive computation.
  • Use the cv parameter to set the number of folds or the splitting strategy.
  • Use the scoring parameter to choose the metric to optimize, such as 'accuracy' or 'f1'.

Syntax Table

SL NO | Function/Tool | Syntax Example | Description
1 | Import GridSearchCV | from sklearn.model_selection import GridSearchCV | Imports the tool
2 | Define Param Grid | param_grid = {'n_neighbors': [3, 5, 7]} | Parameter space to search
3 | Setup Grid Search | grid = GridSearchCV(model, param_grid, cv=5) | Defines grid search object
4 | Fit Search | grid.fit(X, y) | Conducts the search and fits models
5 | Access Results | grid.best_params_, grid.best_score_ | Gets the best parameters and CV score

Syntax Explanation

1. Import GridSearchCV

What is it? The class that conducts exhaustive hyperparameter search with cross-validation.

Syntax:

from sklearn.model_selection import GridSearchCV

Explanation:

  • Required to instantiate and run grid search.
  • Resides in the model_selection module.

2. Define Parameter Grid

What is it? A dictionary of hyperparameters to try.

Syntax:

param_grid = {'n_neighbors': [3, 5, 7]}

Explanation:

  • Keys are parameter names, values are lists of values to search.
  • For a Pipeline or other nested estimator, address a step's parameter as <step>__<parameter> (two underscores), for example classifier__C.
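
Because every value of one parameter is paired with every value of the others, the grid size is the product of the list lengths. The sketch below uses ParameterGrid from sklearn.model_selection to enumerate the combinations before running a search. The extra keys weights and p are illustrative assumptions for a KNN classifier and are not part of the grid above.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid for a KNN classifier: 3 x 2 x 2 = 12 combinations.
param_grid = {
    'n_neighbors': [3, 5, 7],
    'weights': ['uniform', 'distance'],
    'p': [1, 2],  # 1 = Manhattan distance, 2 = Euclidean distance
}

# ParameterGrid expands the dictionary into one dict per combination.
combinations = list(ParameterGrid(param_grid))
n_combinations = len(combinations)

print("Combinations to evaluate:", n_combinations)  # 12
print("First combination:", combinations[0])
```

With cv=5, this grid would need 12 x 5 = 60 cross-validation fits, so checking the count first is a cheap way to keep the search space reasonable.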

3. Setup GridSearchCV

What is it? Configures the search strategy, model, and evaluation method.

Syntax:

grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, scoring='accuracy')

Explanation:

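  • cv: Number of cross-validation folds, or a cross-validation splitter object.
  • scoring: Metric to optimize; if omitted, the estimator's own score method is used.
  • refit=True (the default) retrains a model with the best parameters on the full dataset and exposes it as best_estimator_.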
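
As a minimal sketch of these options, the example below passes a splitter object instead of an integer for cv and optimizes macro-averaged F1 instead of accuracy. The Iris dataset, the StratifiedKFold settings, and the 'f1_macro' metric are assumptions chosen for illustration, not choices made in this guide.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# Small built-in dataset, used here only so the sketch is self-contained.
X, y = load_iris(return_X_y=True)

# A splitter object gives control over shuffling and reproducibility.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

grid = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={'n_neighbors': [3, 5, 7]},
    cv=cv,
    scoring='f1_macro',  # macro-averaged F1 across the three classes
    refit=True,          # retrain the best setting on all of X, y
)
grid.fit(X, y)

print("Best Parameters:", grid.best_params_)
print("Best macro F1:", grid.best_score_)
```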

4. Fit the Grid Search

What is it? Runs all combinations of hyperparameters and evaluates via cross-validation.

Syntax:

grid.fit(X_scaled, y)

Explanation:

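  • Trains one model per parameter combination per fold; with refit=True, it then trains a final model on all the data using the best combination.
  • Each hyperparameter setting is scored by its mean performance across the validation folds.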
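
To make the amount of work concrete, the sketch below fits a small search and reads the number of candidates and folds back from the fitted object. The Iris dataset is an assumption used only to keep the example self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

param_grid = {'n_neighbors': [3, 5, 7, 9]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X, y)

# cv_results_ holds one entry per parameter combination.
n_candidates = len(grid.cv_results_['params'])
# n_splits_ is the number of cross-validation folds actually used.
n_splits = grid.n_splits_
total_cv_fits = n_candidates * n_splits

print("Candidates:", n_candidates)        # 4
print("Folds:", n_splits)                 # 5
print("Cross-validation fits:", total_cv_fits)  # 20, plus one final refit
```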

5. Access the Results

What is it? Retrieves the best model, score, and parameter set.

Syntax:

print(grid.best_params_)
print(grid.best_score_)

Explanation:

  • best_params_ holds the best parameter combination; best_score_ is its mean cross-validated score.
  • best_estimator_ gives direct access to the model refitted with those parameters (available when refit=True).
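
Beyond the single best result, the scores for every combination are stored in cv_results_. A minimal sketch, assuming the Iris dataset for illustration, loads them into a pandas DataFrame and uses best_estimator_ for prediction:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [3, 5, 7]}, cv=5)
grid.fit(X, y)

# One row per parameter combination, sorted so the best rank comes first.
results = pd.DataFrame(grid.cv_results_)
summary = results[
    ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
].sort_values('rank_test_score')
print(summary)

# The refitted best model can be used like any other fitted estimator.
best_model = grid.best_estimator_
predictions = best_model.predict(X[:5])
print(predictions)
```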

Real-Life Project: Tuning a KNN Model with Grid Search

Objective

Use grid search to optimize the number of neighbors in a KNN classifier.

Code Example

import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

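# Load and scale data (assumes a CSV with feature columns and a 'target' column).
# Note: scaling all rows before cross-validation lets validation-fold statistics
# influence the scaler; wrapping the scaler and model in a Pipeline avoids this.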
data = pd.read_csv('classification_data.csv')
X = data.drop('target', axis=1)
y = data['target']
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Define model and search grid
model = KNeighborsClassifier()
param_grid = {'n_neighbors': [3, 5, 7, 9]}

# Grid Search
grid = GridSearchCV(model, param_grid, cv=5, scoring='accuracy')
grid.fit(X_scaled, y)

# Results
print("Best Parameters:", grid.best_params_)
print("Best Score:", grid.best_score_)

Expected Output

  • Best parameter set (e.g., {'n_neighbors': 5})
  • The mean cross-validated accuracy of that parameter set.
  • After fitting, the refitted best model is available as grid.best_estimator_.

Common Mistakes

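  • ❌ Not standardizing features before grid search on scale-sensitive models such as KNN.
  • ❌ Searching too many hyperparameters or values at once; the number of fits grows multiplicatively.
  • ❌ Using the test set inside grid search, which makes the reported score optimistic. Hold it out for a final evaluation.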
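
One way to avoid both the scaling and the test-set mistakes is sketched below: hold out a test set first, then search over a Pipeline so the scaler is refitted on the training folds only. The Iris dataset, the step names 'scaler' and 'knn', and the 80/20 split are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Set the test set aside before any tuning happens.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline, so each CV fold fits its own scaler.
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier()),
])

# Step parameters are addressed as <step>__<parameter>.
param_grid = {'knn__n_neighbors': [3, 5, 7, 9]}

grid = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

# The test set is touched once, after the search is finished.
test_accuracy = grid.score(X_test, y_test)
print("Best Parameters:", grid.best_params_)
print("CV Score:", grid.best_score_)
print("Test Accuracy:", test_accuracy)
```

The cross-validated score guides the choice of hyperparameters, while the test accuracy serves as the estimate of performance on unseen data.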

Further Reading

📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning by Sarful Hassan
