Logistic Regression for Classification in Scikit-learn

Logistic regression is a fundamental classification algorithm that models the probability of class membership using a logistic (sigmoid) function. Despite its name, logistic regression is used for binary and multi-class classification tasks. Scikit-learn offers a robust implementation through the LogisticRegression class.

Key Characteristics of Logistic Regression

  • Classification, Not Regression: Used for binary or multi-class classification.
  • Outputs Probabilities: Estimates the likelihood of each class.
  • Sigmoid Function: Converts linear combination of inputs to probability.
  • Interpretable Coefficients: Feature weights show the direction and strength of each feature's effect on the log-odds.
  • Supports Regularization: Includes L1 and L2 penalties for generalization.

Basic Rules for Logistic Regression

  • Target variable should be categorical (e.g., 0/1, or class labels).
  • Scale features for better convergence.
  • For multi-class targets, solvers such as 'lbfgs' and 'saga' fit a multinomial model by default (older Scikit-learn versions expose this via multi_class='multinomial').
  • Use solver='liblinear', 'saga', or 'lbfgs' depending on dataset size and penalty (see the sketch after this list).
  • Evaluate using metrics like accuracy, precision, recall, and ROC-AUC.
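
Putting these rules together, here is a minimal sketch of a scaled logistic regression pipeline. The synthetic data from make_classification is only illustrative; substitute your own X and y.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative synthetic data; replace with your own feature matrix and labels
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features, then fit an L2-regularized logistic model with the lbfgs solver
clf = Pipeline([
    ('scale', StandardScaler()),
    ('logreg', LogisticRegression(penalty='l2', solver='lbfgs', max_iter=1000))
])
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))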

Syntax Table

SL NO | Function | Syntax Example | Description
1 | Import Model | from sklearn.linear_model import LogisticRegression | Loads logistic regression class
2 | Create Model | LogisticRegression() | Initializes logistic classifier
3 | Train Model | model.fit(X_train, y_train) | Fits model to training data
4 | Predict Labels | model.predict(X_test) | Predicts class labels
5 | Predict Probabilities | model.predict_proba(X_test) | Gives class probabilities
6 | Evaluate Accuracy | accuracy_score(y_test, y_pred) | Measures classification performance

Syntax Explanation

1. Import and Initialize Model

  • What is it? Loads the logistic regression model for classification.
  • Syntax:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
  • Explanation:
    • Supports binary and multi-class classification.
    • You can set regularization type using penalty and solver method.

2. Fit the Model

  • What is it? Trains the model on labeled data.
  • Syntax:
model.fit(X_train, y_train)
  • Explanation:
    • Learns the coefficients of the logistic model.
    • Uses the sigmoid/logit function internally.

3. Predict Class Labels

  • What is it? Predicts the most likely class for new data.
  • Syntax:
y_pred = model.predict(X_test)
  • Explanation:
    • Returns the predicted class label (0 or 1 for binary problems; one of several labels for multi-class).
    • Useful for final decisions.

4. Predict Probabilities

  • What is it? Outputs the probability of each class.
  • Syntax:
probs = model.predict_proba(X_test)
  • Explanation:
    • Each row contains probabilities for each class.
    • Used in ROC curves and for threshold tuning (see the sketch below).
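
Building on the last point, a brief sketch of threshold tuning. It assumes the model and X_test objects from the steps above and a binary target; the 0.3 threshold is only an example.

# Probability of the positive class (second column for a binary problem)
probs = model.predict_proba(X_test)[:, 1]

# Apply a custom decision threshold instead of the default 0.5
threshold = 0.3  # illustrative value; tune it on a validation set
y_pred_custom = (probs >= threshold).astype(int)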

5. Evaluate Accuracy

  • What is it? Measures how often the model predicts correctly.
  • Syntax:
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
  • Explanation:
    • Compares predicted and actual labels.
    • Good for balanced datasets; for imbalanced ones, prefer F1 or ROC-AUC (sketched below).
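
For imbalanced data, a quick sketch of the alternative metrics mentioned above (assumes the fitted model, y_test, and y_pred from this section):

from sklearn.metrics import f1_score, roc_auc_score

# F1 balances precision and recall; ROC-AUC is computed from predicted probabilities
print("F1:", f1_score(y_test, y_pred))
print("ROC-AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))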

Real-Life Project: Spam Email Classification

Project Name

Spam Detector with Logistic Regression

Project Overview

This project classifies email messages as spam or not spam based on word frequencies and text features. Logistic regression offers a fast, interpretable, and effective solution.

Project Goal

  • Transform email text into numeric features
  • Train logistic model on labeled dataset
  • Evaluate prediction quality on unseen messages

Code for This Project

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load data
data = pd.read_csv('emails.csv')
X = data['message']
y = data['label']  # 0 = not spam, 1 = spam

# Text to numeric
vectorizer = CountVectorizer()
X_vec = vectorizer.fit_transform(X)

# Split
X_train, X_test, y_train, y_test = train_test_split(X_vec, y, test_size=0.2, random_state=42)

# Train
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

Expected Output

  • Accuracy score
  • Precision, recall, and F1-score report
  • Working binary spam classifier

Common Mistakes to Avoid

  • ❌ Not scaling numeric features if present
  • ❌ Using wrong solver for large datasets
  • ❌ Ignoring precision/recall on imbalanced data
  • ❌ Overfitting by including too many irrelevant features

Further Reading Recommendation

πŸ“˜ Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning
by Sarful Hassan
πŸ”— Available on Amazon

Also explore:

Polynomial Regression in Scikit-learn

Polynomial regression allows linear models to fit nonlinear relationships by adding polynomial terms to the feature set. This technique enhances model flexibility while retaining the interpretability of linear regression. Scikit-learn offers an easy-to-use interface via PolynomialFeatures combined with LinearRegression.

Key Characteristics of Polynomial Regression

  • Extends Linear Regression: Captures curved trends by adding polynomial powers.
  • Works with Pipelines: Seamlessly integrate with Pipeline for preprocessing.
  • Degree Parameter: Controls model complexity and fit.
  • Requires Feature Scaling: Higher-degree terms may cause numeric instability.
  • Used for Curve Fitting: Ideal for modeling nonlinear patterns.

Basic Rules for Polynomial Regression

  • Scale your features if using high-degree polynomials.
  • Avoid too high a degree to prevent overfitting.
  • Combine with regularization (e.g., Ridge) for robust models, as sketched after this list.
  • Use train/test split or cross-validation to validate performance.
  • Always visualize predictions vs. actual values.
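
A minimal sketch combining several of these rules: scaling, a degree-3 polynomial expansion, Ridge regularization, and cross-validation. The synthetic data is illustrative only.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Illustrative nonlinear data; replace with your own X and y
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(scale=0.5, size=200)

# Scale, expand to polynomial terms, then fit a regularized linear model
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(degree=3)),
    ('ridge', Ridge(alpha=1.0))
])
scores = cross_val_score(pipe, X, y, cv=5, scoring='r2')
print("Mean cross-validated R²:", scores.mean())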

Syntax Table

SL NO | Function | Syntax Example | Description
1 | Polynomial Generator | PolynomialFeatures(degree=2) | Adds squared and interaction terms
2 | Regression Model | LinearRegression() | Fits linear model on transformed features
3 | Pipeline Integration | Pipeline([...]) | Chains polynomial and regression together
4 | Feature Scaling | StandardScaler() | Normalizes features for stability
5 | Plotting Predictions | plt.plot(X, model.predict(X)) | Visualizes the polynomial fit

Syntax Explanation

1. PolynomialFeatures

  • What is it? Transforms input features into polynomial combinations (e.g., x, xΒ², xΒ³).
  • Syntax:
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
  • Explanation:
    • Adds polynomial terms to the dataset.
    • Degree controls the highest power.
    • Interaction terms between features are included.

2. LinearRegression

  • What is it? Performs ordinary least squares on the expanded polynomial feature set.
  • Syntax:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_poly, y)
  • Explanation:
    • Treats transformed input as standard linear regression.
    • Requires separate prediction using transformed data.

3. Pipeline Integration

  • What is it? Combines transformation and modeling into a single object.
  • Syntax:
from sklearn.pipeline import Pipeline
pipe = Pipeline([
  ('poly', PolynomialFeatures(degree=3)),
  ('model', LinearRegression())
])
pipe.fit(X, y)
  • Explanation:
    • Cleaner code for workflow.
    • Easy to evaluate and reuse.

4. Scaling (Optional)

  • What is it? Standardizes features to avoid dominance by larger magnitude terms.
  • Syntax:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
  • Explanation:
    • Important for high-degree models.
    • Reduces numerical instability.

5. Plotting

  • What is it? Visual representation of model’s curve fitting.
  • Syntax:
import matplotlib.pyplot as plt
plt.scatter(X, y)
plt.plot(X, pipe.predict(X))
plt.show()
  • Explanation:
    • Helps assess under- or overfitting visually (a sorting tip is sketched below).
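
One practical tip: if X is not sorted, plt.plot draws a jagged zig-zag rather than a smooth curve. A small sketch, assuming X is a single-column NumPy array and pipe is the fitted pipeline from step 3:

import numpy as np
import matplotlib.pyplot as plt

# Sort by the feature value so the fitted curve is drawn as one smooth line
order = np.argsort(X[:, 0])
plt.scatter(X, y, alpha=0.5)
plt.plot(X[order], pipe.predict(X[order]), color='red')
plt.show()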

Real-Life Project: Housing Price Curve Fitting

Project Name

Polynomial Regression on House Size vs. Price

Project Overview

This project predicts house prices using a nonlinear relationship between square footage and price. It shows how a polynomial regression model can fit curves better than a standard linear model.

Project Goal

  • Build a pipeline with polynomial features
  • Fit and evaluate nonlinear model
  • Visualize predicted vs. actual prices

Code for This Project

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Load dataset
data = pd.read_csv('house_data.csv')
X = data[['SquareFeet']].values
y = data['Price'].values

# Pipeline
model = Pipeline([
  ('poly', PolynomialFeatures(degree=2)),
  ('linreg', LinearRegression())
])

model.fit(X, y)

# Plot
plt.scatter(X, y)
plt.plot(X, model.predict(X), color='red')
plt.title('Polynomial Regression Fit')
plt.xlabel('Square Feet')
plt.ylabel('Price')
plt.show()

Expected Output

  • Curved regression line fitting the scatter plot
  • Better fit than standard linear model

Common Mistakes to Avoid

  • ❌ Using too high degree β†’ overfitting
  • ❌ Not scaling features β†’ poor performance on higher degrees
  • ❌ Forgetting to use fit_transform() β†’ pipeline breaks
  • ❌ Comparing results without visualization

Further Reading Recommendation

πŸ“˜ Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning
by Sarful Hassan
πŸ”— Available on Amazon

Also explore:

Ridge and Lasso Regression using Scikit-learn

Regularized regression techniques like Ridge and Lasso are powerful tools for handling multicollinearity and preventing overfitting in linear models. They add penalty terms to the loss function, shrinking coefficients and improving generalization. Scikit-learn provides both via Ridge and Lasso classes.

Key Characteristics of Ridge and Lasso Regression

  • Regularization: Penalizes large coefficients to reduce overfitting.
  • Ridge (L2): Shrinks coefficients but keeps all variables.
  • Lasso (L1): Shrinks some coefficients to zero, enabling feature selection.
  • Works Like Linear Regression: Similar API with added regularization strength.
  • Useful with Multicollinearity: Helps when predictors are correlated.

Basic Rules for Ridge and Lasso Regression

  • Normalize features before applying (use StandardScaler).
  • Use alpha to control the strength of the penalty.
  • Ridge is better for multicollinearity, Lasso for sparse feature selection.
  • Tune alpha using cross-validation (e.g., RidgeCV, LassoCV).
  • Evaluate performance using RMSE and RΒ² metrics.

Syntax Table

SL NO | Function | Syntax Example | Description
1 | Ridge Model | Ridge(alpha=1.0) | Adds L2 regularization
2 | Lasso Model | Lasso(alpha=0.1) | Adds L1 regularization
3 | Scaling | StandardScaler().fit_transform(X) | Standardizes features
4 | Cross-Validation | RidgeCV(alphas=[0.1, 1.0, 10.0]) | Finds optimal alpha
5 | Coefficients View | model.coef_ | Displays model coefficients

Syntax Explanation

1. Ridge Regression

  • What is it? A linear regression model with L2 regularization that penalizes large coefficients to prevent overfitting.
  • Syntax:
from sklearn.linear_model import Ridge
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
  • Explanation:
    • Adds a penalty equal to the square of the magnitude of coefficients.
    • Helps in cases of multicollinearity.
    • Does not eliminate any features.

2. Lasso Regression

  • What is it? A linear regression model with L1 regularization that can shrink some coefficients to zero for feature selection.
  • Syntax:
from sklearn.linear_model import Lasso
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
  • Explanation:
    • Adds a penalty equal to the absolute value of coefficients.
    • Encourages sparse models (zero coefficients for less important features).
    • Good for datasets with many features.

3. Feature Scaling

  • What is it? Standardizing features so they contribute equally to the model.
  • Syntax:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
  • Explanation:
    • Required before applying Ridge or Lasso.
    • Ensures penalty is applied fairly across features.

4. Cross-Validation for Hyperparameter Tuning

  • What is it? A method to find the best alpha (regularization strength) using multiple train-test splits.
  • Syntax:
from sklearn.linear_model import RidgeCV
model_cv = RidgeCV(alphas=[0.1, 1.0, 10.0], cv=5)
model_cv.fit(X_train, y_train)
  • Explanation:
    • alphas is a list of candidate values.
    • Automatically selects the best-performing value (a LassoCV sketch follows).
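
LassoCV works the same way for the L1 penalty; a brief sketch, assuming scaled training data as in the earlier steps:

from sklearn.linear_model import LassoCV

# Searches the candidate alphas with 5-fold cross-validation
lasso_cv = LassoCV(alphas=[0.01, 0.1, 1.0], cv=5)
lasso_cv.fit(X_train, y_train)
print("Best alpha:", lasso_cv.alpha_)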

5. Evaluating the Model

  • What is it? Assessing model performance using prediction metrics.
  • Syntax:
from sklearn.metrics import mean_squared_error, r2_score
pred = model.predict(X_test)
rmse = mean_squared_error(y_test, pred, squared=False)
r2 = r2_score(y_test, pred)
  • Explanation:
    • RMSE shows the average prediction error (newer Scikit-learn versions also provide this directly as root_mean_squared_error).
    • RΒ² reveals how well the features explain target variance.

Real-Life Project: Predicting Car Prices

Project Name

Car Price Prediction with Ridge and Lasso

Project Overview

This project uses Ridge and Lasso regression to predict used car prices based on engine size, age, mileage, and other numeric features. It compares the effect of regularization on model performance.

Project Goal

  • Compare Ridge and Lasso for price prediction
  • Visualize which features get eliminated by Lasso
  • Evaluate models using RMSE and RΒ²

Code for This Project

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score

# Load data
data = pd.read_csv('used_cars.csv')
X = data.drop('Price', axis=1)
y = data['Price']

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Ridge
ridge = Ridge(alpha=1.0)
ridge.fit(X_train_scaled, y_train)
pred_ridge = ridge.predict(X_test_scaled)

# Lasso
lasso = Lasso(alpha=0.1)
lasso.fit(X_train_scaled, y_train)
pred_lasso = lasso.predict(X_test_scaled)

# Evaluate
print("Ridge RMSE:", mean_squared_error(y_test, pred_ridge, squared=False))
print("Lasso RMSE:", mean_squared_error(y_test, pred_lasso, squared=False))

Expected Output

  • RMSE and RΒ² values for both models
  • Lasso may drop features → zero coefficients (checked in the sketch below)
  • Ridge keeps all features but reduces overfitting
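
To see which features Lasso actually zeroed out, a short follow-up sketch using the objects defined in the project code above:

import numpy as np

# Pair each feature name with its Lasso coefficient
coef_table = dict(zip(X.columns, lasso.coef_))
dropped = [name for name, c in coef_table.items() if np.isclose(c, 0)]
print("Coefficients:", coef_table)
print("Features dropped by Lasso:", dropped)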

Common Mistakes to Avoid

  • ❌ Using unscaled data β†’ skews regularization
  • ❌ Too high alpha β†’ underfitting
  • ❌ Ignoring feature selection in Lasso
  • ❌ Comparing Lasso to OLS without scaling

Further Reading Recommendation

πŸ“˜ Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning
by Sarful Hassan
πŸ”— Available on Amazon

Also explore:

Linear Regression with Scikit-learn

Linear regression is one of the simplest and most interpretable algorithms in machine learning. It models the relationship between one or more input variables and a continuous output variable by fitting a straight line (in simple regression) or hyperplane (in multiple regression). Scikit-learn offers a straightforward implementation of linear regression through the LinearRegression class.

Key Characteristics of Linear Regression

  • Continuous Target Variable: Predicts real-valued outputs.
  • Assumes Linearity: Relationship between features and target is linear.
  • Interpretability: Coefficients explain feature impact.
  • No Need for Scaling: Works without feature scaling (unlike regularized versions).
  • Fast and Efficient: Suitable for large datasets with linear patterns.

Basic Rules for Using Linear Regression

  • Ensure features are numerically encoded.
  • Check for linear relationship between inputs and output.
  • Remove multicollinearity among features if possible.
  • Split dataset into training and testing sets.
  • Evaluate model with RMSE or RΒ² score.

Syntax Table

SL NO | Function | Syntax Example | Description
1 | Import Model | from sklearn.linear_model import LinearRegression | Loads regression model class
2 | Create Model | model = LinearRegression() | Initializes model
3 | Train Model | model.fit(X_train, y_train) | Trains model on training data
4 | Make Predictions | y_pred = model.predict(X_test) | Predicts target values
5 | Evaluate RMSE | mean_squared_error(y_test, y_pred, squared=False) | Root Mean Squared Error
6 | Evaluate R² Score | r2_score(y_test, y_pred) | Measures goodness of fit

Syntax Explanation

1. Import and Initialize Model

  • What is it? Loads and prepares the regression model.
  • Syntax:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
  • Explanation:
    • Prepares a fresh instance of linear regression.
    • Default fits intercept and does not normalize features.

2. Train the Model

  • What is it? Fits the linear regression model to training data.
  • Syntax:
model.fit(X_train, y_train)
  • Explanation:
    • Learns the weights (coefficients) of input features.
    • Fits a line or hyperplane that minimizes squared error.

3. Make Predictions

  • What is it? Predicts target values using the trained model.
  • Syntax:
y_pred = model.predict(X_test)
  • Explanation:
    • Applies learned coefficients to unseen data.
    • Produces continuous-valued outputs.

4. Evaluate with RMSE

  • What is it? Measures average prediction error in the same unit as the target.
  • Syntax:
from sklearn.metrics import mean_squared_error
rmse = mean_squared_error(y_test, y_pred, squared=False)
  • Explanation:
    • Common metric for regression tasks.
    • Lower RMSE = better model.

5. Evaluate with RΒ² Score

  • What is it? Represents how much variance in the target is explained by features.
  • Syntax:
from sklearn.metrics import r2_score
r2 = r2_score(y_test, y_pred)
  • Explanation:
    • A value of 1 means a perfect fit; values near 0 (or even negative) indicate a poor fit.
    • Indicates the strength of the linear relationship.

Real-Life Project: Predicting House Prices

Project Name

House Price Prediction Using Linear Regression

Project Overview

This project demonstrates the use of linear regression to predict house prices based on features such as square footage, number of bedrooms, and location index.

Project Goal

  • Build and evaluate a linear regression model
  • Predict continuous house prices
  • Interpret coefficients to understand feature impact

Code for This Project

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load dataset
data = pd.read_csv('house_prices.csv')
X = data[['SqFt', 'Bedrooms', 'LocationIndex']]
y = data['Price']

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
rmse = mean_squared_error(y_test, y_pred, squared=False)
r2 = r2_score(y_test, y_pred)
print("RMSE:", rmse)
print("RΒ² Score:", r2)

Expected Output

  • RMSE value indicating prediction error
  • RΒ² score showing how well features explain price
  • Trained model ready for deployment or analysis (coefficient inspection is sketched below)
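
To interpret the model, a small sketch that prints each coefficient next to its feature name (uses the model and columns from the project code above):

# Inspect learned coefficients alongside their feature names
for name, coef in zip(['SqFt', 'Bedrooms', 'LocationIndex'], model.coef_):
    print(f"{name}: {coef:.2f}")
print("Intercept:", model.intercept_)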

Common Mistakes to Avoid

  • ❌ Using categorical variables without encoding
  • ❌ Failing to check for multicollinearity
  • ❌ Ignoring assumptions of linearity and homoscedasticity
  • ❌ Using RMSE aloneβ€”consider visualizing residuals

Further Reading Recommendation

πŸ“˜ Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning
by Sarful Hassan
πŸ”— Available on Amazon

Also explore:

Introduction to Supervised Learning in Scikit-learn

Supervised learning is one of the most common machine learning paradigms, where the algorithm learns a mapping between input features and known output labels. Scikit-learn provides a rich set of tools for building and evaluating supervised learning models for both classification and regression tasks.

Key Characteristics of Supervised Learning

  • Labeled Training Data: Requires input-output pairs for training.
  • Two Main Types: Classification (categorical target) and Regression (continuous target).
  • Model Evaluation: Uses metrics like accuracy, precision, RMSE, etc.
  • Generalization: Learns patterns to make predictions on unseen data.
  • Scikit-learn Friendly: Offers estimators, pipelines, and evaluation tools.

Basic Rules for Supervised Learning in Scikit-learn

  • Split data into train and test sets using train_test_split().
  • Select appropriate model type (LogisticRegression, RandomForestClassifier, etc.).
  • Fit the model using model.fit(X_train, y_train).
  • Predict using model.predict(X_test).
  • Evaluate with relevant metrics using sklearn.metrics.

Syntax Table

SL NO | Function | Syntax Example | Description
1 | Train-Test Split | train_test_split(X, y) | Splits data for training/testing
2 | Model Training | model.fit(X_train, y_train) | Trains the supervised model
3 | Make Predictions | model.predict(X_test) | Predicts outputs from test input
4 | Accuracy Score | accuracy_score(y_test, y_pred) | Measures performance (classification)
5 | RMSE Score | mean_squared_error(y_test, y_pred, squared=False) | Measures regression error

Syntax Explanation

1. Train-Test Split

  • What is it? Separates your dataset into training and testing sets.
  • Syntax:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
  • Explanation:
    • Prevents overfitting by evaluating on unseen data.
    • test_size=0.2 means 20% used for testing.

2. Model Training

  • What is it? Fits the model on training data.
  • Syntax:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
  • Explanation:
    • The model learns patterns from X_train to predict y_train.
    • Applies optimization based on selected algorithm.

3. Make Predictions

  • What is it? Uses the trained model to make predictions.
  • Syntax:
y_pred = model.predict(X_test)
  • Explanation:
    • Applies learned rules to test inputs.
    • Used to evaluate accuracy, error, or other performance metrics.

4. Accuracy Score (for Classification)

  • What is it? Measures the percentage of correct predictions.
  • Syntax:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
  • Explanation:
    • Works for classification problems.
    • 1.0 = perfect score, 0.0 = no correct predictions.

5. RMSE Score (for Regression)

  • What is it? Measures the average error in predictions.
  • Syntax:
from sklearn.metrics import mean_squared_error
rmse = mean_squared_error(y_test, y_pred, squared=False)
  • Explanation:
    • Evaluates how far predictions are from true values.
    • Lower RMSE indicates better performance.

Real-Life Project: Predicting Student Exam Pass/Fail

Project Name

Binary Classification for Exam Outcome Prediction

Project Overview

This project aims to predict whether a student will pass or fail an exam based on study hours and past performance using supervised learning.

Project Goal

  • Train a logistic regression classifier
  • Predict outcomes on new student records
  • Evaluate model accuracy

Code for This Project

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load data
data = pd.read_csv('student_scores.csv')
X = data[['StudyHours', 'PastScore']]
y = data['Pass']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))

Expected Output

  • Trained model using student study features
  • Predictions for pass/fail labels
  • Accuracy score between 0 and 1

Common Mistakes to Avoid

  • ❌ Not splitting data properly
  • ❌ Using regression for categorical outputs
  • ❌ Failing to evaluate model on test data
  • ❌ Skipping feature scaling (if needed by model type)

Further Reading Recommendation

πŸ“˜ Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning
by Sarful Hassan
πŸ”— Available on Amazon

Also explore:

Splitting Data into Train and Test Sets using Scikit-learn

Train-test splitting is a fundamental concept in machine learning. It ensures that models are trained on one portion of the data and evaluated on another, promoting generalization and preventing overfitting. Scikit-learn provides a simple and reliable utility for splitting datasets.

Key Characteristics of Train-Test Splitting

  • Ensures Generalization: Evaluates model performance on unseen data.
  • Randomization Support: Randomizes the dataset before splitting.
  • Custom Split Ratios: Allows flexible train/test proportions.
  • Stratification: Maintains class balance during classification splits.
  • Reproducibility: Controlled with random seed (random_state).

Basic Rules for Train-Test Splits

  • Always split before preprocessing or model training.
  • Use train_test_split() from sklearn.model_selection.
  • Stratify on target variable when dealing with classification problems.
  • Avoid data leakage by ensuring test data is untouched during training.
  • Use a fixed random_state to ensure reproducibility.

Syntax Table

SL NO | Function | Syntax Example | Description
1 | Import Function | from sklearn.model_selection import train_test_split | Imports splitter from Scikit-learn
2 | Basic Split | X_train, X_test, y_train, y_test = train_test_split(X, y) | Splits data into train/test
3 | Custom Ratio | train_test_split(X, y, test_size=0.3) | 70/30 split example
4 | Set Seed | train_test_split(X, y, random_state=42) | Ensures reproducible results
5 | Stratified Split | train_test_split(X, y, stratify=y) | Maintains label proportions

Syntax Explanation

1. Basic Train-Test Split

  • What is it? Separates features and target into training and testing groups.
  • Syntax:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
  • Explanation:
    • Default split is 75% training and 25% testing.
    • Random shuffling is performed before splitting.
    • Keeps feature (X) and target (y) aligned.

2. Custom Split Ratio

  • What is it? Allows control over the percentage allocated to the test set.
  • Syntax:
train_test_split(X, y, test_size=0.2)
  • Explanation:
    • 80% of data for training and 20% for testing.
    • Accepts float (0.2 = 20%) or int (e.g., 100 samples).
    • Ensure test size is not too small for model evaluation.

3. Stratified Splitting

  • What is it? Maintains label balance between train and test sets.
  • Syntax:
train_test_split(X, y, stratify=y)
  • Explanation:
    • Especially useful for imbalanced datasets.
    • Ensures proportion of each class is consistent.
    • Crucial for fair performance evaluation.

4. Reproducibility with Random Seed

  • What is it? Ensures same random split every run.
  • Syntax:
train_test_split(X, y, random_state=42)
  • Explanation:
    • Random shuffling can change results.
    • Setting random_state makes results reproducible.
    • Use same seed across experiments for consistency.

Real-Life Project: Splitting Heart Disease Dataset

Project Name

Train-Test Split for Predicting Heart Disease

Project Overview

The dataset includes various health metrics and a binary label indicating presence of heart disease. Proper train-test splitting will allow unbiased model evaluation.

Project Goal

  • Split data into train/test sets
  • Maintain label balance using stratification
  • Prepare data for preprocessing and modeling

Code for This Project

import pandas as pd
from sklearn.model_selection import train_test_split

# Load dataset
data = pd.read_csv('heart_disease.csv')
X = data.drop('target', axis=1)
y = data['target']

# Split with stratification and seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

Expected Output

  • 80% of data in X_train, y_train
  • 20% in X_test, y_test
  • Class distribution preserved (verified in the sketch below)
  • Reproducible split for modeling workflows
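
A quick sanity check on the stratified split, using the y_train and y_test objects created above:

# Confirm that stratification preserved the label proportions
print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))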

Common Mistakes to Avoid

  • ❌ Fitting preprocessing before splitting β†’ causes data leakage
  • ❌ Ignoring class imbalance β†’ skews evaluation metrics
  • ❌ Forgetting random_state β†’ inconsistent results
  • ❌ Confusing X and y order β†’ misaligned splits

Further Reading Recommendation

πŸ“˜ Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning
by Sarful Hassan
πŸ”— Available on Amazon

Also explore:

Mastering Encoding Categorical Variables in Scikit-learn

Many real-world datasets include categorical featuresβ€”such as city names, gender, or product typesβ€”that machine learning models cannot process directly. Encoding transforms these textual or symbolic values into numerical format suitable for model training. Scikit-learn offers multiple encoding strategies tailored for different use cases.

Key Characteristics of Categorical Encoding

  • Numeric Representations: Convert categories into integers or binary vectors.
  • Model-Friendly: Makes categorical data usable for statistical or ML models.
  • Multiple Strategies: Supports one-hot, ordinal, and frequency encoding.
  • Robustness: Can handle unknown categories and missing values.
  • Pipeline Compatible: Easily integrated into automated ML workflows.

Basic Rules for Encoding Categorical Variables

  • Use OneHotEncoder for nominal (unordered) categories.
  • Use OrdinalEncoder for ordinal (ordered) categories.
  • Handle unknown categories with handle_unknown='ignore'.
  • Always encode after imputing missing values.
  • Use ColumnTransformer to encode only relevant columns.

Syntax Table

SL NO | Function | Syntax Example | Description
1 | One-Hot Encoding | OneHotEncoder(sparse_output=False) | Creates binary columns for each category
2 | Ordinal Encoding | OrdinalEncoder() | Assigns ordered integers to categories
3 | Column Encoding | ColumnTransformer([...]) | Encodes selected columns in a pipeline
4 | Handling Unknowns | OneHotEncoder(handle_unknown='ignore') | Avoids errors on unseen categories
5 | Fit-Transform Logic | fit_transform() on train, transform() on test | Prevents data leakage
6 | Full Pipeline | Pipeline([...]) | Automates encoding with other preprocessing

Syntax Explanation

1. OneHotEncoder

  • What is it? Converts categories into multiple binary columns.
  • Syntax:
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(sparse_output=False)
X_encoded = encoder.fit_transform(X)
  • Explanation:
    • Suitable for nominal data like Color, City, Brand.
    • Each category gets its own binary column.
    • sparse_output=False returns a dense NumPy array instead of a sparse matrix (the parameter was named sparse in older Scikit-learn releases).

2. OrdinalEncoder

  • What is it? Assigns integer labels to each category based on order.
  • Syntax:
from sklearn.preprocessing import OrdinalEncoder
ord_enc = OrdinalEncoder()
X_ord = ord_enc.fit_transform(X)
  • Explanation:
    • Best for ordinal features like Education Level, Size, Rank.
    • Preserves order but not the distance between values; the order can be set explicitly, as sketched below.
    • Use caution when the categories are not truly ordered: models treat the integer codes as ordered magnitudes, which can mislead linear models in particular.
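
When the category order matters, it can be passed explicitly through the categories parameter. A sketch with hypothetical education levels:

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Explicit order: lower levels map to smaller integers
levels = pd.DataFrame({'Education_Level': ['Bachelor', 'High School', 'PhD']})
ord_enc = OrdinalEncoder(categories=[['High School', 'Bachelor', 'Master', 'PhD']])
print(ord_enc.fit_transform(levels))  # [[1.], [0.], [3.]]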

3. ColumnTransformer

  • What is it? Applies encoders to specific column sets.
  • Syntax:
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([
  ('city_ohe', OneHotEncoder(), ['City']),
  ('rank_ord', OrdinalEncoder(), ['Rank'])
])
X_transformed = ct.fit_transform(X)
  • Explanation:
    • Separates encoding logic by column type.
    • Maintains clean, modular pipeline structure.
    • Useful when combining with numeric transformations.

4. Handling Unknown Categories

  • What is it? Avoids errors during prediction on unseen categories.
  • Syntax:
OneHotEncoder(handle_unknown='ignore')
  • Explanation:
    • Ensures robustness across training and inference.
    • Skips encoding for new labels instead of raising errors.
    • Particularly useful in production pipelines.

5. Full Encoding Pipeline

  • What is it? Combines encoders with transformers and models.
  • Syntax:
from sklearn.pipeline import Pipeline
pipe = Pipeline([
  ('encoder', OneHotEncoder(handle_unknown='ignore'))
])
X_encoded = pipe.fit_transform(X)
  • Explanation:
    • Streamlines entire preprocessing.
    • Avoids duplication of logic across train/test.
    • Works well with model training or cross-validation.

Real-Life Project: Encoding Loan Application Data

Project Name

Encoding Categorical Features for Loan Approval Prediction

Project Overview

Loan application datasets include several categorical features such as marital status, employment type, and loan purpose. This project demonstrates proper encoding using OneHotEncoder and OrdinalEncoder to prepare such data for modeling.

Project Goal

  • Encode categorical fields using appropriate methods
  • Maintain clean format for downstream ML models
  • Prevent model errors due to unseen categories

Code for This Project

import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Sample dataset
df = pd.read_csv('loan_applications.csv')

# Define categorical features
nominal = ['Gender', 'Married', 'Loan_Purpose']
ordinal = ['Education_Level']

# Define transformers
ohe = OneHotEncoder(handle_unknown='ignore', sparse_output=False)  # 'sparse' in older Scikit-learn releases
ord_enc = OrdinalEncoder()  # named to avoid shadowing Python's built-in ord()

# Column transformer
preprocessor = ColumnTransformer([
  ('nominal_enc', ohe, nominal),
  ('ordinal_enc', ord_enc, ordinal)
])

X_encoded = preprocessor.fit_transform(df)

Expected Output

  • All categorical variables numerically encoded.
  • Robust handling of unknown labels.
  • Ready for use in classification or regression models.
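
To keep the encoded matrix readable, recent Scikit-learn versions let you recover the generated column names from the ColumnTransformer. A short sketch using the objects from the project code above:

# Recover readable column names for the encoded matrix
feature_names = preprocessor.get_feature_names_out()
encoded_df = pd.DataFrame(X_encoded, columns=feature_names)
print(encoded_df.head())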

Common Mistakes to Avoid

  • ❌ Using label encoding for nominal features
  • ❌ Not setting handle_unknown='ignore'
  • ❌ Forgetting to exclude target variable from encoding
  • ❌ Mixing fit/transform logic between train/test data

Further Reading Recommendation

πŸ“˜ Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning
by Sarful Hassan
πŸ”— Available on Amazon

Also explore:

Mastering Handling Missing Data with Scikit-learn

Missing data is a common issue in real-world datasets. Whether due to user omission, system error, or data corruption, missing values can affect model performance and bias predictions. Scikit-learn provides robust strategies to detect and handle missing values efficiently.

Key Characteristics of Missing Data Handling

  • Flexible Imputation Strategies: Mean, median, mode, or custom value.
  • Column and Row-wise Detection: Identify missing values per column or row.
  • Pipeline Integration: Handle missing values as part of preprocessing.
  • Support for Numeric and Categorical Data: Choose appropriate imputation per data type.
  • Constant Value Fill: Useful for flags, categories, or default fill-in.

Basic Rules for Handling Missing Data

  • Always check for missing values before preprocessing or modeling.
  • Use visualization (heatmaps, missingno) for exploration.
  • Fit imputation on training data, then apply it to test/validation sets (see the sketch after this list).
  • Choose imputation strategies based on column types and distributions.
  • Combine imputation with scaling and encoding in a pipeline.
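
The fit-on-train, transform-on-test rule above looks like this in practice; a minimal sketch assuming a numeric feature matrix X:

from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

# Learn the column means on the training split only, then reuse them on the test split
imputer = SimpleImputer(strategy='mean')
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)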

Syntax Table

SL NO | Function | Syntax Example | Description
1 | Detect Missing Values | df.isnull().sum() | Returns missing count per column
2 | Drop Rows with NaNs | df.dropna() | Removes rows that contain NaN
3 | Simple Imputer (mean) | SimpleImputer(strategy='mean') | Imputes numeric features with mean
4 | Simple Imputer (most_frequent) | SimpleImputer(strategy='most_frequent') | Categorical mode fill
5 | Constant Imputer | SimpleImputer(strategy='constant', fill_value=0) | Fill with custom value
6 | Pipeline Integration | Pipeline([...]) | Automates imputation within workflows

Syntax Explanation

1. Detect Missing Values

  • What is it? Identifies how many values are missing per column.
  • Syntax:
df.isnull().sum()
  • Explanation:
    • Use df.isnull() to get a Boolean mask of missing cells.
    • .sum() counts True (i.e., NaN) values column-wise.
    • First step in any missing data strategy.

2. Drop Rows with NaNs

  • What is it? Removes any rows that contain missing values.
  • Syntax:
df_cleaned = df.dropna()
  • Explanation:
    • Useful when missing data is minimal.
    • May reduce dataset size significantly.
    • Use with caution to avoid data loss.

3. SimpleImputer (Mean)

  • What is it? Replaces missing values with the mean of the column.
  • Syntax:
from sklearn.impute import SimpleImputer
imp = SimpleImputer(strategy='mean')
X_imputed = imp.fit_transform(X)
  • Explanation:
    • Suitable for continuous numeric data.
    • fit() learns column means from training data.
    • transform() applies imputation to missing values.

4. SimpleImputer (Most Frequent)

  • What is it? Fills missing values with the most frequent value in a column.
  • Syntax:
SimpleImputer(strategy='most_frequent')
  • Explanation:
    • Ideal for categorical or ordinal features.
    • Fills with the column's mode, so imputed values are always existing categories.
    • Safer than constant fill in unknown domains.

5. SimpleImputer (Constant Value)

  • What is it? Fills missing values with a fixed specified value.
  • Syntax:
SimpleImputer(strategy='constant', fill_value='Unknown')
  • Explanation:
    • Use for categorical placeholders or zero-fill.
    • Makes missingness explicit for some models.
    • Fill value must be type-compatible with column.

6. Pipeline Integration

  • What is it? Wraps imputation logic into a reproducible pipeline.
  • Syntax:
from sklearn.pipeline import Pipeline
pipe = Pipeline([
  ('imputer', SimpleImputer(strategy='median'))
])
X_clean = pipe.fit_transform(X)
  • Explanation:
    • Ensures same imputation is applied consistently.
    • Can be combined with scalers, encoders, and estimators.
    • Ideal for production and evaluation workflows.

Real-Life Project: Imputing Customer Demographics

Project Name

Cleaning and Imputing Missing Values in Customer Dataset

Project Overview

We will clean a dataset containing customer profiles, where Age, Income, and City columns contain missing values. Using different strategies per column type, we prepare the dataset for segmentation and modeling.

Project Goal

  • Impute numerical values (Age, Income) using mean/median.
  • Impute categorical fields (City) using most frequent or a placeholder.
  • Wrap transformation into a single pipeline.

Code for This Project

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer

# Sample dataset
customer_data = pd.read_csv('customer_data.csv')

num_cols = ['Age', 'Income']
cat_cols = ['City']

num_pipeline = Pipeline([
  ('imputer', SimpleImputer(strategy='mean'))
])

cat_pipeline = Pipeline([
  ('imputer', SimpleImputer(strategy='most_frequent'))
])

preprocessor = ColumnTransformer([
  ('num', num_pipeline, num_cols),
  ('cat', cat_pipeline, cat_cols)
])

X_cleaned = preprocessor.fit_transform(customer_data)

Expected Output

  • Clean matrix with no missing values.
  • Numeric fields filled with statistical values.
  • Categorical fields filled with top occurring value.
  • Ready for modeling or export.

Common Mistakes to Avoid

  • ❌ Applying imputation after scaling/encoding
  • ❌ Using test data during fit (data leakage)
  • ❌ Dropping rows with high info value
  • ❌ Using mean imputation for categorical columns

Further Reading Recommendation

πŸ“˜ Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning
by Sarful Hassan
πŸ”— Available on Amazon

Also explore:

Mastering Feature Engineering Techniques in Scikit-learn

Feature engineering is the process of transforming raw data into meaningful inputs that enhance model performance. It is one of the most critical steps in the machine learning workflow. Scikit-learn offers a variety of built-in tools and transformers that simplify and automate common feature engineering tasks.

Key Characteristics of Feature Engineering in Scikit-learn

  • Automation Ready: Easily integrate with pipelines for consistent transformation.
  • Custom Transformation: Create your own logic using FunctionTransformer or TransformerMixin.
  • Rich Toolkit: Includes polynomial features, interaction terms, binning, and more.
  • Compatibility: Works seamlessly with numeric, categorical, and datetime features.
  • Composable: Supports chaining and parallel processing through Pipeline and ColumnTransformer.

Basic Rules for Feature Engineering

  • Always explore your data visually before feature engineering.
  • Use domain knowledge to guide feature creation.
  • Avoid leakage by using only training data for fitting transformers.
  • Scale or encode features after feature creation.
  • Evaluate feature importance and drop irrelevant ones.

Syntax Table

SL NO | Function | Syntax Example | Description
1 | Polynomial Features | PolynomialFeatures(degree=2) | Adds polynomial and interaction terms
2 | Binning (Discretization) | KBinsDiscretizer(n_bins=3) | Converts continuous data into discrete bins
3 | Custom Transformation | FunctionTransformer(func) | Applies user-defined logic to data
4 | Feature Selection | SelectKBest(score_func, k=5) | Selects top-k features based on scoring
5 | Feature Union | FeatureUnion([...]) | Combines multiple transformers into one
6 | Column Transformer Integration | ColumnTransformer([...]) | Applies different engineering steps by column

Syntax Explanation

1. PolynomialFeatures

  • What is it? Generates new features by taking polynomial combinations of existing features.
  • Syntax:
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
  • Explanation:
    • Adds interaction terms and powers of features.
    • Useful for linear models capturing non-linear patterns.
    • Rapidly increases dimensionalityβ€”use with care.

2. KBinsDiscretizer

  • What is it? Discretizes continuous data into specified number of bins.
  • Syntax:
from sklearn.preprocessing import KBinsDiscretizer
binning = KBinsDiscretizer(n_bins=4, encode='ordinal')
X_binned = binning.fit_transform(X)
  • Explanation:
    • Converts numeric values into intervals.
    • Helps with models sensitive to non-linearity or ordinal relationships.
    • strategy options include ‘uniform’, ‘quantile’, and ‘kmeans’.

3. FunctionTransformer

  • What is it? Applies any custom function to transform your data.
  • Syntax:
from sklearn.preprocessing import FunctionTransformer
import numpy as np
log_transform = FunctionTransformer(np.log1p)
X_transformed = log_transform.fit_transform(X)
  • Explanation:
    • Simple wrapper around any callable function.
    • Keeps compatibility with pipelines.
    • Great for log transforms, scaling, or unit conversions.

4. SelectKBest

  • What is it? Selects top k features based on statistical test.
  • Syntax:
from sklearn.feature_selection import SelectKBest, f_classif
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)
  • Explanation:
    • Filters out weakly related features.
    • Common score functions: f_classif, chi2, mutual_info_classif.
    • Improves model performance and reduces overfitting.
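
To see which columns survived the selection, a brief follow-up sketch using the fitted selector from above:

# Boolean mask of the columns kept by SelectKBest
mask = selector.get_support()
print("Selected feature indices:", [i for i, keep in enumerate(mask) if keep])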

5. FeatureUnion

  • What is it? Combines outputs from multiple transformers.
  • Syntax:
from sklearn.pipeline import FeatureUnion
combined = FeatureUnion([
  ('poly', PolynomialFeatures(degree=2)),
  ('binned', KBinsDiscretizer(n_bins=3))
])
X_combined = combined.fit_transform(X)
  • Explanation:
    • Useful for parallel feature engineering.
    • All outputs are concatenated.
    • Each transformer runs independently.

6. ColumnTransformer

  • What is it? Applies specific transformers to selected columns.
  • Syntax:
from sklearn.compose import ColumnTransformer
transformer = ColumnTransformer([
  ('bin_age', KBinsDiscretizer(n_bins=3), ['Age']),
  ('poly_income', PolynomialFeatures(degree=2), ['Income'])
])
X_processed = transformer.fit_transform(data)
  • Explanation:
    • Great for structured datasets.
    • Allows fine-grained control over feature creation.
    • Keeps numeric and categorical workflows separate.

Real-Life Project: Feature Engineering on Titanic Dataset

Project Name

Creating Predictive Features for Titanic Survival Prediction

Project Overview

This project uses the Titanic dataset to demonstrate feature engineering techniques. We create new features like ‘FamilySize’ and ‘IsAlone’, bin age and fare, and apply polynomial features to improve model accuracy.

Project Goal

  • Derive new features from existing columns
  • Bin continuous variables into discrete categories
  • Apply transformations to prepare dataset for modeling

Code for This Project

import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer, FunctionTransformer, PolynomialFeatures
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Load dataset
data = pd.read_csv('titanic.csv')
data['FamilySize'] = data['SibSp'] + data['Parch'] + 1

data['IsAlone'] = (data['FamilySize'] == 1).astype(int)

num_cols = ['Age', 'Fare']
poly = PolynomialFeatures(degree=2, include_bias=False)
binner = KBinsDiscretizer(n_bins=3, encode='ordinal')

column_transform = ColumnTransformer([
  ('poly', poly, num_cols),
  ('bin', binner, num_cols)
])

X = column_transform.fit_transform(data)

Expected Output

  • New features: FamilySize, IsAlone
  • Polynomial features for Age, Fare
  • Binned versions of continuous columns

Common Mistakes to Avoid

  • ❌ Applying polynomial features on already scaled data
  • ❌ Using fit_transform() on test set
  • ❌ Creating features that leak future info (like survival-related info)
  • ❌ Not validating new features’ contribution to model accuracy

Further Reading Recommendation

πŸ“˜ Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning
by Sarful Hassan
πŸ”— Available on Amazon

Also explore:

Mastering Feature Scaling and Normalization with Scikit-learn

Feature scaling and normalization are essential preprocessing steps in machine learning, especially when models rely on distance-based calculations. Without proper scaling, features with larger ranges can dominate others, skewing model performance. Scikit-learn offers powerful tools for performing both normalization and standardization effectively.

Key Characteristics of Feature Scaling and Normalization

  • Standardization: Transforms features to have zero mean and unit variance.
  • Normalization: Scales feature values to a fixed range, typically [0, 1].
  • Model Compatibility: Improves performance for SVM, KNN, logistic regression, etc.
  • Column-wise Transformation: Applies scaling only to numeric columns.
  • Integration with Pipelines: Easily incorporated into machine learning pipelines.

Basic Rules for Scaling and Normalization

  • Always scale numeric features only.
  • Use StandardScaler for standardization and MinMaxScaler for normalization.
  • Fit scalers on training data, then apply (transform) to test data.
  • Combine with imputation if there are missing values.
  • Avoid scaling categorical variables unless encoded numerically.

Syntax Table

SL NO | Function | Syntax Example | Description
1 | Standard Scaling | StandardScaler() | Zero mean, unit variance
2 | Min-Max Scaling | MinMaxScaler() | Rescales to a 0–1 range
3 | Robust Scaling | RobustScaler() | Scales using median and IQR
4 | MaxAbs Scaling | MaxAbsScaler() | Scales by maximum absolute value
5 | Column Transformer | ColumnTransformer([...]) | Applies scaling only to selected columns
6 | Integration with Pipeline | Pipeline([...]) | Combines scaler with other preprocessing steps

Syntax Explanation

1. StandardScaler

  • What is it? Scales features by removing the mean and scaling to unit variance.
  • Syntax:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
  • Explanation:
    • Useful for SVM, logistic regression, PCA.
    • Transforms each column to have mean = 0, std = 1.
    • Sensitive to outliersβ€”may not be ideal for skewed data.

2. MinMaxScaler

  • What is it? Scales features to a defined range (default [0, 1]).
  • Syntax:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
  • Explanation:
    • Maintains the shape of the original distribution.
    • Suitable for neural networks, KNN.
    • Affected by outliersβ€”can squash non-outlier values.

3. RobustScaler

  • What is it? Scales features using median and interquartile range.
  • Syntax:
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)
  • Explanation:
    • Robust to outliers.
    • Ideal when data has extreme values.
    • Does not normalize distribution; just rescales.

4. MaxAbsScaler

  • What is it? Scales each feature by its maximum absolute value.
  • Syntax:
from sklearn.preprocessing import MaxAbsScaler
scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)
  • Explanation:
    • Retains sparsity in sparse data.
    • Scaled values always lie in [-1, 1].
    • Fast and simpleβ€”ideal for sparse input matrices.

5. ColumnTransformer

  • What is it? Applies scalers only to numeric columns in a structured dataset.
  • Syntax:
from sklearn.compose import ColumnTransformer
transformer = ColumnTransformer([
  ('scale', StandardScaler(), numeric_cols)
])
X_transformed = transformer.fit_transform(X)
  • Explanation:
    • Keeps other columns unchanged.
    • Supports integration with pipelines and encoders.
    • Cleaner code for mixed-type datasets.

6. Pipeline Integration

  • What is it? Wraps the scaler with other steps for reuse and automation.
  • Syntax:
from sklearn.pipeline import Pipeline
pipe = Pipeline([
  ('scale', StandardScaler())
])
X_ready = pipe.fit_transform(X)
  • Explanation:
    • Chain together multiple preprocessing steps.
    • Ensures consistent transformation in training/testing.
    • Simplifies deployment and reproducibility.

Real-Life Project: Scaling Features in a Regression Dataset

Project Name

Preprocessing and Scaling the Diabetes Dataset

Project Overview

This project demonstrates how to apply different scaling methods on a real-world regression dataset. We will scale the numeric features of Scikit-learn's built-in diabetes dataset (the Boston housing dataset has been removed from Scikit-learn) and prepare it for linear regression modeling.

Project Goal

  • Load the diabetes dataset
  • Apply standard scaling and min-max normalization
  • Compare effect on regression model performance

Code for This Project

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load dataset
X, y = load_diabetes(return_X_y=True)

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standard Scaling
scaler_std = StandardScaler()
X_train_std = scaler_std.fit_transform(X_train)
X_test_std = scaler_std.transform(X_test)

# Train and evaluate model
model_std = LinearRegression().fit(X_train_std, y_train)
y_pred_std = model_std.predict(X_test_std)
print("MSE with StandardScaler:", mean_squared_error(y_test, y_pred_std))

# Min-Max Scaling
scaler_minmax = MinMaxScaler()
X_train_minmax = scaler_minmax.fit_transform(X_train)
X_test_minmax = scaler_minmax.transform(X_test)

# Train and evaluate model
model_minmax = LinearRegression().fit(X_train_minmax, y_train)
y_pred_minmax = model_minmax.predict(X_test_minmax)
print("MSE with MinMaxScaler:", mean_squared_error(y_test, y_pred_minmax))

Expected Output

  • Two mean squared error (MSE) values showing the impact of each scaling method.
  • Scaled datasets ready for regression or other modeling.

Common Mistakes to Avoid

  • ❌ Scaling test data before fitting the scaler on training data
  • ❌ Forgetting to apply the same transformation to test data
  • ❌ Applying scaling to categorical features without encoding
  • ❌ Mixing scaling methods within a single dataset

Further Reading Recommendation

To gain deeper understanding and best practices for scaling and feature preparation:

πŸ“˜ Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning
by Sarful Hassan
πŸ”— Available on Amazon

Also explore: