Grid search is an exhaustive search technique that evaluates every combination in a user-defined set of hyperparameter values and keeps the best-scoring one. Scikit-learn provides GridSearchCV, which automates this search and uses cross-validation to score each combination.
Key Characteristics
- Exhaustive Hyperparameter Search
- Integrates with Cross-Validation
- Returns Best Model Automatically
- Tracks Scores for All Parameter Combinations
Basic Rules
- Scale your features before fitting when the model is sensitive to feature scale (for example, KNN or SVM).
- Define a reasonable search space to avoid excessive computation.
- Use the cv parameter to control the cross-validation process.
- Combine with scoring metrics like accuracy, f1, etc.
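To make the "reasonable search space" rule concrete, the cost of a grid can be estimated before running it. The sketch below is only illustrative: the grid values are hypothetical, and it uses scikit-learn's ParameterGrid helper to count combinations.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid for a KNN classifier; the values are illustrative only.
param_grid = {
    "n_neighbors": [3, 5, 7, 9],
    "weights": ["uniform", "distance"],
    "p": [1, 2],
}
n_folds = 5  # same fold count as cv=5 used in this guide

# Every combination of the listed values is one candidate: 4 * 2 * 2.
n_candidates = len(ParameterGrid(param_grid))

# Each candidate is trained once per fold during cross-validation.
n_fits = n_candidates * n_folds

print(n_candidates, n_fits)  # 16 80
```

With the default refit=True, one additional fit on the full data follows the search. Adding one more parameter with three values would triple both numbers, which is why grids grow expensive quickly.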
Syntax Table
| No. | Function/Tool | Syntax Example | Description |
|---|---|---|---|
| 1 | Import GridSearchCV | from sklearn.model_selection import GridSearchCV | Imports the tool |
| 2 | Define Param Grid | param_grid = {'n_neighbors': [3, 5, 7]} | Parameter space to search |
| 3 | Setup Grid Search | grid = GridSearchCV(model, param_grid, cv=5) | Defines grid search object |
| 4 | Fit Search | grid.fit(X, y) | Conducts the search and fits models |
| 5 | Access Results | grid.best_params_, grid.best_score_ | Gets the best parameters and CV score |
Syntax Explanation
1. Import GridSearchCV
What is it? The class that conducts exhaustive hyperparameter search with cross-validation.
Syntax:
from sklearn.model_selection import GridSearchCV
Explanation:
- Required to instantiate and run grid search.
- Resides in the sklearn.model_selection module.
2. Define Parameter Grid
What is it? A dictionary of hyperparameters to try.
Syntax:
param_grid = {'n_neighbors': [3, 5, 7]}
Explanation:
- Keys are parameter names, values are lists of values to search.
- Can address parameters of nested estimators (for example, pipeline steps) with double-underscore names like classifier__C.
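The double-underscore naming can be illustrated with a small pipeline. This is a sketch under assumptions: the step names "scaler" and "classifier" and the choice of SVC (which has a C parameter) are picked for illustration and do not come from the project later in this guide.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Step names are arbitrary labels chosen for this sketch.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", SVC()),
])

# "<step name>__<parameter name>" addresses a parameter of a nested estimator.
param_grid = {
    "classifier__C": [0.1, 1, 10],
    "classifier__kernel": ["linear", "rbf"],
}

# Every key in the grid should be a valid parameter name of the pipeline.
valid_names = pipe.get_params().keys()
unknown = [name for name in param_grid if name not in valid_names]
print(unknown)  # []
```

Checking the keys against get_params() before fitting is a cheap way to catch a misspelled step or parameter name.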
3. Setup GridSearchCV
What is it? Configures the search strategy, model, and evaluation method.
Syntax:
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, scoring='accuracy')
Explanation:
- cv: Number of cross-validation folds.
- scoring: Metric to optimize.
- refit=True (the default) retrains the best configuration on the full data and allows access to the best model after the search.
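Besides an integer, cv also accepts a splitter object, which helps when shuffling and a fixed seed are wanted. The following sketch assumes the Iris dataset as a stand-in and picks f1_macro as the metric, since plain f1 expects a binary target.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset for illustration

# An explicit splitter makes the shuffling reproducible via random_state.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

grid = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7]},
    cv=cv,
    scoring="f1_macro",  # macro-averaged F1 works for multiclass targets
    refit=True,          # default: refit the best setting on all of X, y
)
grid.fit(X, y)

print(grid.best_params_, round(grid.best_score_, 3))
```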
4. Fit the Grid Search
What is it? Runs all combinations of hyperparameters and evaluates via cross-validation.
Syntax:
grid.fit(X_scaled, y)
Explanation:
- Trains multiple models internally.
- Cross-validation is done on each hyperparameter setting.
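What happens internally during fit can be inspected afterwards through the cv_results_ attribute. A minimal sketch, again assuming the Iris dataset as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset for illustration

grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}, cv=5)
grid.fit(X, y)

results = grid.cv_results_

# One entry per parameter combination, with the mean and spread across folds.
for params, mean, std in zip(
    results["params"], results["mean_test_score"], results["std_test_score"]
):
    print(params, f"mean={mean:.3f}", f"std={std:.3f}")

# One per-fold score array for each of the 5 folds (split0 ... split4).
fold_columns = [
    key for key in results
    if key.startswith("split") and key.endswith("_test_score")
]
print(len(results["params"]), len(fold_columns))  # 3 5
```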
5. Access the Results
What is it? Retrieves the best model, score, and parameter set.
Syntax:
print(grid.best_params_)
print(grid.best_score_)
Explanation:
- Returns best parameter combination and associated performance score.
- best_estimator_ gives direct access to the best-fit model.
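How these attributes relate to each other can be checked directly. The sketch below assumes the Iris dataset as a stand-in and the default refit=True, without which best_estimator_ is not available.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset for illustration

grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}, cv=5)
grid.fit(X, y)

best_model = grid.best_estimator_

# The refit model carries the winning parameter values.
same_params = (
    best_model.get_params()["n_neighbors"] == grid.best_params_["n_neighbors"]
)

# best_score_ is the mean cross-validated score of the winning combination.
same_score = (
    grid.best_score_ == grid.cv_results_["mean_test_score"][grid.best_index_]
)

# Calling predict on the search object delegates to best_estimator_.
same_predictions = bool((grid.predict(X[:5]) == best_model.predict(X[:5])).all())

print(same_params, same_score, same_predictions)  # True True True
```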
Real-Life Project: Tuning a KNN Model with Grid Search
Objective
Use grid search to optimize the number of neighbors in a KNN classifier.
Code Example
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
# Load and scale data
data = pd.read_csv('classification_data.csv')
X = data.drop('target', axis=1)
y = data['target']
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Define model and search grid
model = KNeighborsClassifier()
param_grid = {'n_neighbors': [3, 5, 7, 9]}
# Grid Search
grid = GridSearchCV(model, param_grid, cv=5, scoring='accuracy')
grid.fit(X_scaled, y)
# Results
print("Best Parameters:", grid.best_params_)
print("Best Score:", grid.best_score_)
Expected Output
- Best parameter set (e.g., {'n_neighbors': 5})
- Highest cross-validated score.
- Access to the best estimator.
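A possible variation on the project code places the scaler inside a Pipeline, so that it is fit only on the training portion of each fold rather than on the full dataset. This is a sketch, not the original project: the synthetic data from make_classification stands in for classification_data.csv, and the step names "scaler" and "knn" are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the CSV file used in the project above.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Scaling and the classifier are tuned and validated as a single estimator.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Parameters of a pipeline step use the "<step>__<parameter>" naming.
param_grid = {"knn__n_neighbors": [3, 5, 7, 9]}

grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)  # raw X: the scaler is refit on each training fold

print("Best Parameters:", grid.best_params_)
print("Best Score:", grid.best_score_)
```

The trade-off is slightly more setup in exchange for validation scores that are not influenced by statistics computed on the held-out folds.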
Common Mistakes
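- ❌ Not standardizing features before applying grid search to a scale-sensitive model such as KNN.
- ❌ Including too many hyperparameters (computational cost).
- ❌ Using the test set inside grid search, which leaks information and inflates the reported score.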
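One way to avoid the test-set mistake is to split the data first and run the search on the training portion only. The sketch below assumes the breast cancer dataset bundled with scikit-learn as a stand-in and an 80/20 split; both choices are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

# Hold out a test set before any tuning takes place.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])
grid = GridSearchCV(pipe, {"knn__n_neighbors": [3, 5, 7, 9]}, cv=5, scoring="accuracy")

# The search only ever sees the training data.
grid.fit(X_train, y_train)

# The test set is used once, to score the refit best model.
test_score = grid.score(X_test, y_test)

print("CV score:", grid.best_score_)
print("Test score:", test_score)
```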
