Ridge and Lasso Regression using Scikit-learn

Regularized regression techniques like Ridge and Lasso are powerful tools for handling multicollinearity and preventing overfitting in linear models. They add a penalty term to the loss function, shrinking coefficients and improving generalization. Scikit-learn provides both through the Ridge and Lasso classes.
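
Concretely, with feature matrix X, target y, coefficient vector w, n training samples, and regularization strength alpha (where ||w||₁ is the sum of absolute coefficient values), the objectives scikit-learn minimizes are:

Ridge: ||y − Xw||² + alpha · ||w||²
Lasso: (1 / (2n)) · ||y − Xw||² + alpha · ||w||₁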

Key Characteristics of Ridge and Lasso Regression

  • Regularization: Penalizes large coefficients to reduce overfitting.
  • Ridge (L2): Shrinks coefficients but keeps all variables.
  • Lasso (L1): Shrinks some coefficients to zero, enabling feature selection.
  • Works Like Linear Regression: Same fit/predict API, with an added alpha parameter controlling regularization strength.
  • Useful with Multicollinearity: Helps when predictors are correlated.

Basic Rules for Ridge and Lasso Regression

  • Standardize features before applying (use StandardScaler).
  • Use alpha to control the strength of the penalty.
  • Ridge is better for multicollinearity; Lasso is better for sparse feature selection.
  • Tune alpha using cross-validation (e.g., RidgeCV, LassoCV); a combined pipeline sketch follows this list.
  • Evaluate performance using RMSE and R² metrics.
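
These rules compose into one workflow. Here is a minimal sketch (on synthetic data, an illustrative choice) that chains StandardScaler and RidgeCV in a pipeline, so the scaling step and the model travel together:

from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import RidgeCV

# Synthetic data for illustration (100 samples, 5 features)
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# make_pipeline applies the scaler before RidgeCV sees the data
pipe = make_pipeline(StandardScaler(), RidgeCV(alphas=[0.1, 1.0, 10.0]))
pipe.fit(X, y)
print("Best alpha:", pipe.named_steps['ridgecv'].alpha_)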

Syntax Table

SL NO | Function          | Syntax Example                     | Description
1     | Ridge Model       | Ridge(alpha=1.0)                   | Adds L2 regularization
2     | Lasso Model       | Lasso(alpha=0.1)                   | Adds L1 regularization
3     | Scaling           | StandardScaler().fit_transform(X)  | Standardizes features
4     | Cross-Validation  | RidgeCV(alphas=[0.1, 1.0, 10.0])   | Finds optimal alpha
5     | Coefficients View | model.coef_                        | Displays model coefficients

Syntax Explanation

1. Ridge Regression

  • What is it? A linear regression model with L2 regularization that penalizes large coefficients to prevent overfitting.
  • Syntax:
from sklearn.linear_model import Ridge
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
  • Explanation:
    • Adds a penalty proportional to the sum of squared coefficients (scaled by alpha).
    • Helps in cases of multicollinearity, where correlated predictors make ordinary least-squares coefficients unstable.
    • Does not eliminate any features; coefficients shrink toward zero but rarely reach it exactly (see the sketch below).
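
To make the shrinkage visible, the following sketch (on synthetic data, an illustrative choice) fits Ridge at increasing alpha values and prints the coefficient norm, which decreases as the penalty grows:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)

# Stronger penalties shrink the coefficient vector toward zero
for alpha in [0.1, 1.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha}: coefficient norm = {np.linalg.norm(ridge.coef_):.2f}")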

2. Lasso Regression

  • What is it? A linear regression model with L1 regularization that can shrink some coefficients to zero for feature selection.
  • Syntax:
from sklearn.linear_model import Lasso
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
  • Explanation:
    • Adds a penalty proportional to the sum of absolute coefficient values (scaled by alpha).
    • Encourages sparse models by driving less important coefficients exactly to zero, as the sketch below demonstrates.
    • Well suited to datasets with many features, only some of which are informative.
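
The following sketch uses synthetic data where only two of six features carry signal (an illustrative setup) to show Lasso zeroing out the uninformative coefficients:

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Only 2 of the 6 features are informative by construction
X, y = make_regression(n_samples=100, n_features=6, n_informative=2, noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
print("Coefficients:", lasso.coef_.round(2))
print("Zeroed features:", [i for i, c in enumerate(lasso.coef_) if c == 0.0])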

3. Feature Scaling

  • What is it? Standardizing features so they contribute equally to the model.
  • Syntax:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
  • Explanation:
    • Strongly recommended before Ridge or Lasso, because the penalty is sensitive to feature scale.
    • Ensures the penalty is applied fairly across features; see the sketch below for correct train/test handling.
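
One detail the snippet above glosses over: with a train/test split, fit the scaler on the training data only and reuse its statistics on the test data, so no test-set information leaks into training. A minimal sketch:

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # apply the same statistics to test data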

4. Cross-Validation for Hyperparameter Tuning

  • What is it? A method to find the best alpha (regularization strength) using multiple train-test splits.
  • Syntax:
from sklearn.linear_model import RidgeCV
model_cv = RidgeCV(alphas=[0.1, 1.0, 10.0], cv=5)
model_cv.fit(X_train, y_train)
  • Explanation:
    • alphas is the list of candidate values; cv=5 selects among them via 5-fold cross-validation.
    • The best-performing alpha is chosen automatically and exposed as model_cv.alpha_. A companion LassoCV sketch follows.
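
LassoCV works the same way; when alphas is omitted it derives a grid of candidate values from the data. A minimal, self-contained sketch on synthetic data:

from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# With no alphas given, LassoCV builds its own grid and picks the best by 5-fold CV
lasso_cv = LassoCV(cv=5, random_state=0).fit(X, y)
print("Best alpha:", lasso_cv.alpha_)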

5. Evaluating the Model

  • What is it? Assessing model performance using prediction metrics.
  • Syntax:
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, pred))  # squared=False was removed in newer scikit-learn
r2 = r2_score(y_test, pred)
  • Explanation:
    • RMSE reports the typical prediction error in the same units as the target.
    • R² measures how much of the target's variance the features explain. (A newer RMSE helper is shown below.)
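
If your scikit-learn version is 1.4 or newer, the dedicated root_mean_squared_error helper is an idiomatic alternative (continuing the snippet above):

from sklearn.metrics import root_mean_squared_error
rmse = root_mean_squared_error(y_test, pred)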

Real-Life Project: Predicting Car Prices

Project Name

Car Price Prediction with Ridge and Lasso

Project Overview

This project uses Ridge and Lasso regression to predict used car prices based on engine size, age, mileage, and other numeric features. It compares the effect of regularization on model performance.

Project Goal

  • Compare Ridge and Lasso for price prediction
  • Visualize which features get eliminated by Lasso
  • Evaluate models using RMSE and R²

Code for This Project

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score

# Load data
data = pd.read_csv('used_cars.csv')
X = data.drop('Price', axis=1)
y = data['Price']

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Ridge
ridge = Ridge(alpha=1.0)
ridge.fit(X_train_scaled, y_train)
pred_ridge = ridge.predict(X_test_scaled)

# Lasso
lasso = Lasso(alpha=0.1)
lasso.fit(X_train_scaled, y_train)
pred_lasso = lasso.predict(X_test_scaled)

# Evaluate (np.sqrt keeps RMSE compatible across scikit-learn versions)
print("Ridge RMSE:", np.sqrt(mean_squared_error(y_test, pred_ridge)), "R2:", r2_score(y_test, pred_ridge))
print("Lasso RMSE:", np.sqrt(mean_squared_error(y_test, pred_lasso)), "R2:", r2_score(y_test, pred_lasso))

Expected Output

  • RMSE and R² values for both models
  • Lasso may drop features → zero coefficients
  • Ridge keeps all features but reduces overfitting

Common Mistakes to Avoid

  • ❌ Using unscaled data → skews regularization
  • ❌ Too high alpha → underfitting
  • ❌ Ignoring feature selection in Lasso
  • ❌ Comparing Lasso to OLS without scaling

Further Reading Recommendation

📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning
by Sarful Hassan
🔗 Available on Amazon

Also explore: