Integration of Scikit-learn with Matplotlib and Seaborn

Integrating Scikit-learn with Matplotlib and Seaborn allows users to visualize data distributions, model performance, feature relationships, and decision boundaries. These visual insights are crucial for model evaluation, diagnostics, and presentations.

Key Characteristics

  • Enhances interpretability through visualizations
  • Useful for EDA (Exploratory Data Analysis) and model diagnostics
  • Compatible with Scikit-learn outputs such as predictions, feature importances, and confusion matrices
  • Enables plotting decision boundaries, correlation heatmaps, and distribution plots

Basic Rules

  • Use Matplotlib for low-level, customizable plotting
  • Use Seaborn for high-level, attractive statistical plots
  • Integrate visualizations at various steps: before training (EDA), during model evaluation, and after prediction
  • Convert NumPy arrays or Scikit-learn outputs into Pandas DataFrames for Seaborn compatibility (see the sketch below)
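
As an illustration of the last rule, a minimal sketch that wraps Scikit-learn's NumPy output in a DataFrame; the Iris dataset is used purely as an example:

import pandas as pd
from sklearn.datasets import load_iris

# Scikit-learn returns plain NumPy arrays; a DataFrame gives Seaborn named columns
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target  # labels become a named column usable for hue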

Syntax Table

SL NO | Task | Syntax Example | Description
1 | Import Libraries | import matplotlib.pyplot as plt; import seaborn as sns | Loads Matplotlib and Seaborn
2 | Plot Confusion Matrix | sns.heatmap(cm, annot=True) | Visualizes classification performance
3 | Plot Feature Distribution | sns.histplot(df['feature']) | Shows the distribution of a single feature
4 | Scatter Plot with Hue | sns.scatterplot(x=..., y=..., hue=...) | Visualizes feature relationships by class
5 | Decision Boundary (2D) | plt.contourf(xx, yy, Z) | Plots classifier decision regions

Syntax Explanation

1. Import Libraries

What is it?
Loads Matplotlib and Seaborn.

Syntax:

import matplotlib.pyplot as plt
import seaborn as sns

Explanation:

  • matplotlib.pyplot is used for flexible, low-level charting.
  • seaborn is built on top of Matplotlib, offering a simplified interface for statistical plots with built-in themes.
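
As a small illustration of that layering, a Seaborn theme restyles even plain Matplotlib figures. A minimal sketch, assuming a recent Seaborn version (0.11+) where sns.set_theme() is available:

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="whitegrid")  # Seaborn styling now applies to all Matplotlib figures
plt.plot([1, 2, 3], [2, 4, 1])    # even a low-level Matplotlib plot picks up the theme
plt.show()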

2. Plot Confusion Matrix

What is it?
Displays confusion matrix results as a heatmap.

Syntax:

sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted")
plt.ylabel("Actual")

Explanation:

  • cm is typically obtained via confusion_matrix(y_test, y_pred).
  • annot=True displays the numbers inside cells.
  • fmt='d' specifies integer format.
  • cmap='Blues' applies a blue gradient for clarity.
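
Putting it together, a minimal end-to-end sketch; the Iris dataset and logistic regression are used purely for illustration:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import seaborn as sns

# Train a simple classifier and compute its confusion matrix
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)
cm = confusion_matrix(y_test, model.predict(X_test))

# Render the matrix as an annotated heatmap
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()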

3. Plot Feature Distribution

What is it?
Visualizes the distribution of a single feature or class.

Syntax:

sns.histplot(df['feature'], kde=True)

Explanation:

  • Shows the frequency of data points within intervals.
  • kde=True overlays a Kernel Density Estimate curve.
  • Helpful for checking normality or skew in data.
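
A runnable sketch, using one Iris feature as an example (the column name comes from the dataset's own feature_names):

from sklearn.datasets import load_iris
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Histogram of one feature with a KDE overlay to judge skew and modality
sns.histplot(df['petal length (cm)'], kde=True)
plt.show()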

4. Scatter Plot with Hue

What is it?
Plots relationships between two numeric features, colored by class.

Syntax:

sns.scatterplot(x='feature1', y='feature2', hue='label', data=df)

Explanation:

  • Useful for visualizing separation or clusters by label.
  • hue maps point colors to the values of a categorical column.
  • Common in binary or multiclass classification visuals.
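
A minimal sketch on the Iris data; the 'label' column is added here just to drive the hue mapping:

from sklearn.datasets import load_iris
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['label'] = data.target  # categorical column used for color mapping

# Two numeric features, colored by class label
sns.scatterplot(x='petal length (cm)', y='petal width (cm)', hue='label', data=df)
plt.show()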

5. Plot Decision Boundary

What is it?
Shows the boundary regions learned by a classifier in 2D.

Syntax:

plt.contourf(xx, yy, Z, cmap=plt.cm.RdBu, alpha=0.6)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')

Explanation:

  • Requires a meshgrid (xx, yy) covering the feature space and predictions Z = model.predict(...) reshaped to match xx.shape.
  • contourf() fills the regions assigned to each predicted class.
  • Effective for classifiers like SVM, Logistic Regression, and KNN in 2D; the project below walks through the full workflow.

Real-Life Project: Visualizing Decision Boundaries in Iris Dataset

Project Overview

Visualize how a classifier (e.g., Logistic Regression) separates classes in the Iris dataset.

Code Example

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load and prepare data
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.iloc[:, [2, 3]].values  # use petal length and width
y = df['target']

# Standardize the two features (fit on the full set here to keep the plot simple;
# in a modeling pipeline, fit the scaler on the training split only)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)

# Meshgrid for plotting
x_min, x_max = X_scaled[:, 0].min() - 1, X_scaled[:, 0].max() + 1
y_min, y_max = X_scaled[:, 1].min() - 1, X_scaled[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                     np.arange(y_min, y_max, 0.01))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot
plt.figure(figsize=(10,6))
plt.contourf(xx, yy, Z, alpha=0.3)
sns.scatterplot(x=X_scaled[:, 0], y=X_scaled[:, 1], hue=y, palette="deep")
plt.title("Decision Boundary - Logistic Regression on Iris")
plt.xlabel("Petal Length (standardized)")
plt.ylabel("Petal Width (standardized)")
plt.show()

Expected Output

  • Scatter plot overlaid with decision regions
  • Differentiated classes via color-coded hues

Common Mistakes to Avoid

  • ❌ Passing raw NumPy arrays to Seaborn when a DataFrame with named columns would give clearer labels and legends
  • ❌ Training on scaled data but building the meshgrid or scatter in unscaled space (keep the model and the plot in the same feature space)
  • ❌ Forgetting to adjust figure size, titles, or axis labels for clarity

Further Reading Recommendation

📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning by Sarful Hassan

🔗 Available on Amazon