Support Vector Machines (SVMs) are powerful and versatile classifiers that aim to find the optimal hyperplane separating different classes. SVMs are particularly effective in high-dimensional spaces and for datasets with a clear margin of separation. Scikit-learn provides SVC for classification tasks.
Key Characteristics of SVM
- Effective in High Dimensions: Works well even with thousands of features.
- Margin Maximization: Finds the widest margin between classes.
- Kernel Trick: Supports linear and non-linear classification using kernels.
- Robust to Overfitting: Especially when regularization is tuned.
- Binary Classifier: Inherently binary, but can be extended to multi-class problems with a one-vs-rest strategy (scikit-learn's SVC uses one-vs-one internally).
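To make the kernel trick concrete, here is a minimal sketch comparing a linear and an RBF kernel. The dataset is synthetic (two concentric rings from make_circles) and is an assumption chosen for illustration, since no straight line can separate it.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: an inner ring and an outer ring, not linearly separable.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Same data, two kernels: only the RBF kernel can bend the boundary.
linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)

print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"rbf kernel accuracy:    {rbf_acc:.2f}")
```

On data like this the RBF kernel should score noticeably higher, because it can draw a closed boundary around the inner ring.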
Basic Rules for Using SVM
- Use StandardScaler to normalize features before training.
- Select the kernel type (linear, rbf, poly) based on the problem.
- Tune C and gamma for better performance.
- For large datasets, use LinearSVC for speed.
- Always evaluate with cross-validation.
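The rules above can be combined in a few lines. This is a sketch, assuming the built-in breast cancer dataset as a stand-in for your own data: the scaler and the classifier are chained in a pipeline and scored with 5-fold cross-validation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset (an assumption for illustration): 30 numeric features, 2 classes.
X, y = load_breast_cancer(return_X_y=True)

# The pipeline refits the scaler inside every fold, so statistics from the
# held-out fold never leak into training.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

scores = cross_val_score(clf, X, y, cv=5)
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```

Putting the scaler inside the pipeline, rather than scaling the full dataset first, keeps each cross-validation fold honest.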
Syntax Table
| SL NO | Function | Syntax Example | Description |
|---|---|---|---|
| 1 | Import SVM | from sklearn.svm import SVC | Imports the SVM classifier |
| 2 | Instantiate Model | model = SVC(kernel='rbf') | Initializes the SVM with an RBF kernel |
| 3 | Fit Model | model.fit(X_train, y_train) | Trains the model |
| 4 | Predict Labels | model.predict(X_test) | Predicts class labels |
| 5 | Feature Scaling | StandardScaler().fit_transform(X) | Standardizes features for SVM |
Syntax Explanation
1. Import and Instantiate
- What is it? Import the SVM classifier and create an instance with a specified kernel.
- Syntax:
from sklearn.svm import SVC
model = SVC(kernel='rbf', C=1.0, gamma='scale')
- Explanation:
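- kernel='rbf' enables non-linear decision boundaries.
- C controls the trade-off between a wide margin and training errors; gamma sets the kernel width (how far the influence of a single training example reaches).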
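A small experiment can show what C and gamma do in practice. The sketch below uses synthetic make_moons data (an assumption for illustration), counts support vectors for several values of C, and checks that gamma='scale' matches the documented formula 1 / (n_features * X.var()).

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Synthetic, noisy two-class data.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

# Small C tolerates margin violations (wide margin, many support vectors);
# large C penalizes them heavily (narrower margin, usually fewer support vectors).
sv_counts = {}
for C in (0.01, 1.0, 100.0):
    model = SVC(kernel="rbf", C=C, gamma="scale").fit(X, y)
    sv_counts[C] = int(model.n_support_.sum())
    print(f"C={C:<6} support vectors: {sv_counts[C]}")

# gamma="scale" is documented as 1 / (n_features * X.var()); passing that
# number explicitly should reproduce the same decision values.
manual_gamma = 1.0 / (X.shape[1] * X.var())
scaled = SVC(kernel="rbf", gamma="scale").fit(X, y)
manual = SVC(kernel="rbf", gamma=manual_gamma).fit(X, y)
same = np.allclose(scaled.decision_function(X), manual.decision_function(X))
print("gamma='scale' matches manual formula:", same)
```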
2. Fit the Model
- What is it? Train the classifier.
- Syntax:
model.fit(X_train, y_train)
- Explanation:
- Finds the optimal hyperplane using support vectors.
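After fitting, the support vectors can be inspected directly. The following sketch assumes synthetic two-cluster data from make_blobs and a linear kernel, for which the hyperplane w·x + b = 0 is exposed through coef_ and intercept_.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic 2-D data with two clusters (an assumption for illustration).
X_train, y_train = make_blobs(
    n_samples=100, centers=2, cluster_std=1.0, random_state=0
)
model = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

# Only a subset of the training points ends up defining the boundary.
print("support vectors per class:", model.n_support_)
print("indices into X_train:", model.support_)
print("coordinates shape:", model.support_vectors_.shape)

# For a linear kernel, the decision value is w·x + b.
w, b = model.coef_[0], model.intercept_[0]
manual_scores = X_train @ w + b
matches = np.allclose(manual_scores, model.decision_function(X_train))
print("manual w·x + b matches decision_function:", matches)
```

Note that coef_ is only available with kernel='linear'; non-linear kernels have no explicit weight vector in the input space.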
3. Predict Labels
- What is it? Predict class of unseen instances.
- Syntax:
y_pred = model.predict(X_test)
- Explanation:
- Uses the learned boundary to classify inputs.
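For a binary problem, the predicted label is simply the side of the boundary a point falls on. This sketch, using synthetic make_classification data as an assumption, reproduces predict from the sign of decision_function.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = SVC(kernel="rbf").fit(X_train, y_train)
y_pred = model.predict(X_test)

# Signed score per sample: positive means the second class in model.classes_.
scores = model.decision_function(X_test)
from_scores = model.classes_[(scores > 0).astype(int)]

print("predict agrees with sign of decision_function:",
      np.array_equal(y_pred, from_scores))
```

The magnitude of each score gives a rough sense of how far a sample sits from the boundary, which can be useful when ranking borderline cases.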
4. Feature Scaling
- What is it? Normalize input features.
- Syntax:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
- Explanation:
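- Improves SVM performance by centering each feature to zero mean and scaling it to unit variance. Fit the scaler on the training data only and reuse it on the test data to avoid leakage.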
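As a sketch of scaling without leakage, the example below fits the scaler on the training rows and only transforms the test rows. The two feature columns are hypothetical values generated on very different scales.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical features: one in the tens of thousands, one between 0 and 1.
X = np.column_stack([
    rng.normal(50_000, 10_000, 200),
    rng.normal(0.5, 0.1, 200),
])
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print("train means:", X_train_scaled.mean(axis=0))
print("train stds: ", X_train_scaled.std(axis=0))
```

The test set will not have exactly zero mean or unit variance after this step, and that is expected: it is scaled with the statistics the model saw during training.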
Real-Life Project: Spam Email Detection with SVM
Project Name
SVM-based Spam Classifier
Project Overview
Build a binary classifier using SVM to distinguish between spam and legitimate emails using TF-IDF features.
Project Goal
- Train an SVM model on textual email data
- Evaluate using precision, recall, and F1
- Apply feature scaling before model training
Code for This Project
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score
# Load dataset
data = pd.read_csv('emails.csv')
X = data['text']
y = data['label'] # spam or ham
# Text vectorization
vectorizer = TfidfVectorizer()
X_vec = vectorizer.fit_transform(X)
# Scaling is optional here: TfidfVectorizer already L2-normalizes each row by default.
# If you do scale, with_mean=False keeps the sparse matrix sparse.
# scaler = StandardScaler(with_mean=False)
# X_scaled = scaler.fit_transform(X_vec)
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X_vec, y, test_size=0.3, random_state=42)
# Train model
model = SVC(kernel='linear')
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
Expected Output
- High accuracy for spam detection
- Detailed precision/recall/F1 report
- Linear kernel SVM trained on email features
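One possible refinement is to wrap the vectorizer and the classifier in a single pipeline, so the TF-IDF vocabulary is learned only from the data passed to fit. The sketch below is self-contained: the eight messages are invented for illustration and stand in for emails.csv, which is not available here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Tiny invented corpus (an assumption); a real project would load emails.csv.
texts = [
    "win free cash prize now",
    "free prize claim cash today",
    "urgent win cash offer",
    "claim free offer now",
    "project meeting moved monday",
    "lunch meeting tomorrow noon",
    "quarterly report attached review",
    "review project report monday",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# Vectorizer and classifier are fitted together on the training texts.
spam_clf = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
spam_clf.fit(texts, labels)

# Raw strings go straight in; the pipeline applies the fitted vectorizer.
new_messages = ["free cash prize offer", "project report meeting"]
predictions = spam_clf.predict(new_messages)
print(list(zip(new_messages, predictions)))
```

With a real dataset, the same pipeline object can be passed to train_test_split results or cross_val_score, so the vectorizer never sees the test emails during fitting.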
Common Mistakes to Avoid
- ❌ Using unscaled features → reduces performance
- ❌ Not tuning hyperparameters (C, gamma, kernel)
- ❌ Using SVM on large datasets without approximation (slow training)
- ❌ Ignoring class imbalance in spam/ham datasets
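Two of these mistakes, untuned hyperparameters and ignored class imbalance, can be addressed together. The following is a sketch under stated assumptions: the data is synthetic and roughly 90/10 imbalanced, the grid values are arbitrary starting points, and F1 is used as the selection metric.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic imbalanced data (about 90% / 10%), standing in for ham/spam.
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.9, 0.1], random_state=0
)

# class_weight="balanced" raises the penalty for errors on the rare class.
pipe = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))

# Parameter names are prefixed with the pipeline step name ("svc").
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
    "svc__kernel": ["rbf", "linear"],
}

# F1 on the minority class is more informative than accuracy here.
search = GridSearchCV(pipe, param_grid, scoring="f1", cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated F1:", search.best_score_)
```

The gamma values have no effect when the linear kernel is selected, so some grid combinations are redundant; a list of separate grids per kernel would avoid the wasted fits.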
Further Reading Recommendation
📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning
by Sarful Hassan
Also explore:
- 🔗 Scikit-learn SVM Docs: https://scikit-learn.org/stable/modules/svm.html
- 🔗 SVM Visualization Tools: https://github.com/glemaitre/svm-toy
- 🔗 Hyperparameter Tuning with GridSearchCV
