Feature importance analysis helps identify which input features have the most influence on a model’s predictions. This is crucial for interpretability, feature selection, and improving model performance. Scikit-learn offers multiple ways to compute feature importance, depending on the model type.
Key Characteristics
- Provides insight into model behavior
- Useful for feature selection and dimensionality reduction
- Supported by tree-based models, linear models, and permutation methods
- Can be visualized for better interpretability
Basic Rules
- Use model-specific `.feature_importances_` for tree-based models
- Use `.coef_` for linear models (after scaling)
- Apply `permutation_importance()` for model-agnostic insights
- Normalize or scale data for linear models to get accurate importances
Syntax Table
SL NO | Technique | Syntax Example | Description
---|---|---|---
1 | Tree-based Importance | `model.feature_importances_` | Returns importance scores for each feature
2 | Linear Model Coefficients | `model.coef_` | Coefficients representing feature weights
3 | Permutation Importance | `permutation_importance(model, X, y)` | Model-agnostic importance scores
4 | Visualizing Importance | `plt.barh(range(len(importances)), importances)` | Plots the importance scores
5 | Sorting Importances | `np.argsort(importances)[::-1]` | Ranks features from most to least important
Syntax Explanation
1. Tree-based Feature Importance
What is it?
Extracts feature importance directly from tree-based models like RandomForest or GradientBoosting.
Syntax:
model.feature_importances_
Explanation:
- Returns an array of importance scores (summing to 1).
- Measures the mean decrease in impurity contributed by each feature across all trees.
- Works with `RandomForestClassifier`, `GradientBoostingClassifier`, and other tree-based estimators.
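A minimal sketch of reading these scores from a fitted forest. The synthetic dataset and parameter values below are assumptions made purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative synthetic data (not the churn dataset used later)
X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# One score per feature; the scores sum to 1
print(forest.feature_importances_)
print(forest.feature_importances_.sum())  # ~1.0
```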
2. Linear Model Coefficients
What is it?
Uses the absolute magnitude of coefficients as a proxy for feature importance.
Syntax:
model.coef_
Explanation:
- Must scale features before interpretation (e.g., using `StandardScaler`).
- Positive/negative values indicate the direction of influence.
- Suitable for `LogisticRegression`, `Ridge`, `Lasso`, etc.
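A minimal sketch of inspecting coefficients after scaling; the pipeline, dataset, and parameters here are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)

# Scale first so coefficient magnitudes are comparable across features
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)

coefs = pipe.named_steps["logisticregression"].coef_[0]
# Rank by absolute size; the sign still tells you the direction of influence
ranking = np.argsort(np.abs(coefs))[::-1]
print(ranking)
print(coefs[ranking])
```

Keeping the scaler inside the pipeline ties the transformation to the model, so the coefficients are always computed on standardized inputs.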
3. Permutation Importance
What is it?
Measures the drop in model performance when a feature's values are randomly shuffled.
Syntax:
from sklearn.inspection import permutation_importance
results = permutation_importance(model, X_test, y_test)
Explanation:
- Model-agnostic; works with any estimator.
- Requires a fitted model and evaluation data.
- Results include `importances_mean` and `importances_std`.
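A minimal sketch, assuming a `RandomForestClassifier` and synthetic data chosen only to keep the example self-contained:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature n_repeats times and record the drop in the model's score
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean)
print(result.importances_std)
```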
4. Visualizing Importance
What is it?
Plots feature importances using a horizontal bar chart.
Syntax:
import matplotlib.pyplot as plt
plt.barh(range(len(importances)), importances)
Explanation:
- Provides a clear view of feature rankings.
- Combine with `argsort` to order features.
- Useful in presentations and model explainability.
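A small plotting sketch; the importance values and feature names below are made up for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical importance scores (e.g., taken from feature_importances_)
importances = np.array([0.05, 0.30, 0.10, 0.40, 0.15])
feature_names = [f"Feature {i}" for i in range(len(importances))]

order = np.argsort(importances)  # ascending, so the most important feature lands on top
plt.barh(range(len(importances)), importances[order])
plt.yticks(range(len(importances)), [feature_names[i] for i in order])
plt.xlabel("Importance")
plt.tight_layout()
plt.show()
```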
5. Sorting Importances
What is it?
Ranks feature indices based on importance.
Syntax:
import numpy as np
sorted_idx = np.argsort(importances)[::-1]
Explanation:
- Helps list top-N important features.
- Can be used to reorder plots or reduce feature space.
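A quick sketch of picking the top-N features, using made-up scores:

```python
import numpy as np

# Hypothetical importance scores, for illustration only
importances = np.array([0.02, 0.35, 0.08, 0.40, 0.15])

sorted_idx = np.argsort(importances)[::-1]  # indices from most to least important
top_n = 3
print(sorted_idx[:top_n])                   # indices of the 3 strongest features
print(importances[sorted_idx[:top_n]])

# The same indices can subset the data, e.g. X[:, sorted_idx[:top_n]]
```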
Real-Life Project: Customer Churn Prediction
Project Overview
Identify key drivers of customer churn using feature importance from a Random Forest model.
Code Example
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
import numpy as np
# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Get feature importances
importances = model.feature_importances_
sorted_idx = np.argsort(importances)
# Plot
plt.barh(range(len(importances)), importances[sorted_idx])
plt.yticks(range(len(importances)), [f"Feature {i}" for i in sorted_idx])
plt.xlabel("Importance")
plt.title("Feature Importance")
plt.show()
Expected Output
- Bar chart showing most to least important features
- Insight into which features affect churn decisions most
Common Mistakes to Avoid
- ❌ Interpreting unscaled coefficients from linear models
- ❌ Assuming correlation = importance
- ❌ Ignoring permutation variance in small datasets
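For the last point, a hedged sketch of checking whether permutation importances are stable relative to their variance; the two-standard-deviation cutoff is a common rule of thumb, and the small synthetic dataset is an assumption for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# A deliberately small dataset, where permutation variance matters most
X, y = make_classification(n_samples=200, n_features=8, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=30, random_state=0)

# Keep only features whose mean score drop clearly exceeds its spread across shuffles
stable = result.importances_mean - 2 * result.importances_std > 0
print(np.flatnonzero(stable))  # indices of features with a reliably positive effect
```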