One-Class Support Vector Machine (One-Class SVM) is an unsupervised anomaly detection algorithm that learns a decision function to separate normal data points from outliers. It is particularly effective for problems where only normal data is available for training.
Key Characteristics
- Learns a boundary around normal data in feature space
- Based on SVM principles with a special loss formulation
- Separates the training data from the origin in feature space with maximum margin (the Schölkopf formulation)
- Works well on small to moderately sized datasets (SVM training scales poorly with sample count)
Basic Rules
- Always scale data before fitting the model
- Suitable when only normal examples are present in training
- Use `nu` to control the proportion of outliers (see the sketch after this list)
- Kernel selection is crucial for performance (e.g., 'rbf')
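Putting these rules together, here is a minimal end-to-end sketch. The data is synthetic and the nu value is an illustrative choice, not a recommendation:
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic data: train on normal points, test on a mix of normal and extreme points
rng = np.random.RandomState(42)
X_train = rng.normal(size=(200, 2))
X_test = np.vstack([rng.normal(size=(10, 2)), rng.uniform(4, 6, size=(5, 2))])

# Scale first: One-Class SVM is sensitive to feature magnitudes
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = OneClassSVM(kernel='rbf', nu=0.05)  # nu bounds the outlier fraction
model.fit(X_train_scaled)
print(model.predict(X_test_scaled))  # 1 = inlier, -1 = anomaly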
Syntax Table
SL NO | Technique | Syntax Example | Description |
---|---|---|---|
1 | Initialize Model | OneClassSVM(kernel='rbf', nu=0.05) | Creates SVM with RBF kernel and anomaly ratio |
2 | Fit Model | model.fit(X_train) | Learns the boundary from normal samples |
3 | Predict | model.predict(X_test) | Returns -1 for anomalies, 1 for inliers |
4 | Score Samples | model.decision_function(X_test) | Computes distance from decision boundary |
5 | Use in Pipeline | Pipeline([...]) | Wraps One-Class SVM with preprocessing |
Syntax Explanation
1. Initialize Model
What is it?
Creates a One-Class SVM model to detect anomalies using a kernel function.
Syntax:
from sklearn.svm import OneClassSVM
model = OneClassSVM(kernel='rbf', nu=0.05)
Explanation:
- `OneClassSVM()` initializes the model using a support vector machine formulation adapted for unsupervised outlier detection.
- `kernel='rbf'` means the model uses a radial basis function kernel for nonlinear separation.
- `nu` is a regularization parameter: it defines an upper bound on the fraction of anomalies and a lower bound on the fraction of support vectors.
- Higher `nu` values allow more points to be considered outliers (see the sketch after this list).
- Kernel options like `'linear'`, `'sigmoid'`, and `'poly'` can also be explored depending on the data shape.
- `nu` strongly influences the model's sensitivity to outliers and its generalization ability.
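As a rough illustration of how `nu` behaves, the sketch below fits the model on synthetic data with a few illustrative `nu` values; the fraction of training points flagged as outliers tends to track the chosen `nu`:
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.normal(size=(300, 2))

# The nu values below are illustrative, not recommendations
for nu in (0.01, 0.05, 0.2):
    model = OneClassSVM(kernel='rbf', nu=nu).fit(X)
    flagged = (model.predict(X) == -1).mean()
    print(f"nu={nu}: {flagged:.1%} of training points flagged as outliers")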
2. Fit Model
What is it?
Trains the One-Class SVM on a dataset containing only inliers.
Syntax:
model.fit(X_train)
Explanation:
- `fit()` builds the SVM model that defines the decision function separating normal from anomalous instances.
- The method assumes that `X_train` consists only of normal observations.
- Internally, the model tries to find a hyperplane (or, with the RBF kernel, an enclosing hypersphere) that contains most of the data.
- Proper scaling is crucial, as SVMs are sensitive to feature magnitudes.
- Always preprocess using `StandardScaler` or `MinMaxScaler` before fitting (see the sketch after this list).
- Training only on normal data makes the model learn the support boundary of the normal class distribution.
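To see why scaling matters, consider this sketch with made-up data where one feature is 1000x larger in magnitude than the other; without scaling, the RBF kernel is dominated by the large feature and can miss anomalies in the small one:
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(1)
# Second feature is 1000x larger in magnitude than the first
X_train = np.column_stack([rng.normal(size=500), rng.normal(scale=1000, size=500)])
x_new = np.array([[10.0, 0.0]])  # extreme in the small feature only

unscaled = OneClassSVM(kernel='rbf', nu=0.05).fit(X_train)
print("unscaled:", unscaled.predict(x_new))  # likely 1: the anomaly is missed

scaler = StandardScaler().fit(X_train)
scaled = OneClassSVM(kernel='rbf', nu=0.05).fit(scaler.transform(X_train))
print("scaled:", scaled.predict(scaler.transform(x_new)))  # -1: caught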
3. Predict
What is it?
Uses the trained model to classify new data points as either normal or anomalous.
Syntax:
predictions = model.predict(X_test)
Explanation:
- Predicts each sample in `X_test` as either `1` (normal) or `-1` (anomaly).
- Output can be used to trigger alerts, logs, or further inspection.
- You can convert the -1/1 results to a binary 0/1 flag using a simple list comprehension (see the sketch after this list).
- Helps in integrating with anomaly filtering pipelines or dashboards.
- Effective in real-time monitoring systems.
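A minimal sketch of the conversion and alerting idea mentioned above; the model and data here are synthetic stand-ins:
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
model = OneClassSVM(kernel='rbf', nu=0.05).fit(rng.normal(size=(200, 2)))
X_test = np.vstack([rng.normal(size=(5, 2)), [[6.0, 6.0]]])  # last row is extreme

predictions = model.predict(X_test)
flags = [1 if p == -1 else 0 for p in predictions]  # 1 = anomaly, 0 = normal
for i, flag in enumerate(flags):
    if flag:
        print(f"alert: sample {i} flagged as anomalous")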
4. Score Samples
What is it?
Measures how far each test instance is from the learned boundary.
Syntax:
scores = model.decision_function(X_test)
Explanation:
- This function returns real-valued scores: the more negative, the more likely a point is anomalous.
- Use scores to set a threshold rather than relying on `predict()` for strict binary labels (see the sketch after this list).
- This allows fine-tuning for specific recall or precision targets in deployment.
- Ideal for visualization (e.g., histograms of anomaly scores).
- Helps build custom logic for business-critical anomaly definitions.
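A sketch of custom thresholding on synthetic data; the 5th-percentile cutoff is an arbitrary illustrative choice, and in practice you would pick it from validation data:
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = rng.normal(size=(300, 2))
X_test = np.vstack([rng.normal(size=(20, 2)), rng.uniform(4, 6, size=(5, 2))])

model = OneClassSVM(kernel='rbf', nu=0.05).fit(X_train)
scores = model.decision_function(X_test)  # more negative = more anomalous

# Custom cutoff: 5th percentile of training scores, instead of predict()'s 0
threshold = np.percentile(model.decision_function(X_train), 5)
flagged = scores < threshold
print(f"{flagged.sum()} of {len(X_test)} samples fall below threshold {threshold:.3f}")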
5. Use in Pipeline
What is it?
Embeds One-Class SVM into a pipeline along with preprocessing steps like scaling.
Syntax:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipeline = Pipeline([
('scale', StandardScaler()),
('svm', OneClassSVM(kernel='rbf', nu=0.05))
])
pipeline.fit(X_train)
Explanation:
- Helps standardize the modeling workflow and reduce error risk.
- Ensures the same scaler is applied during both training and inference.
- Use it in cross-validation or `GridSearchCV` to optimize parameters.
- Simplifies deployment and automation for production environments.
- Can integrate more steps such as PCA or feature selection (see the sketch after this list).
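For example, here is a sketch of an extended pipeline with a PCA step; the component count of 5 and the synthetic training data are assumptions for illustration:
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM

pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=5)),  # illustrative component count
    ('svm', OneClassSVM(kernel='rbf', nu=0.05)),
])

X_train = np.random.RandomState(0).normal(size=(200, 10))
pipeline.fit(X_train)  # one call runs every step in order
print(pipeline.predict(X_train[:5]))  # 1 = inlier, -1 = anomaly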
Real-Life Project: Credit Card Fraud Detection
Project Overview
Use One-Class SVM to identify fraudulent transactions in a credit card dataset.
Code Example
import pandas as pd
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
# Load dataset
data = pd.read_csv('credit_card.csv')
X = data.drop(columns=['Class'])
y = data['Class'] # 0 = normal, 1 = fraud
# Use only normal transactions for training
X_train = X[y == 0]
X_test = X
# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train model
model = OneClassSVM(kernel='rbf', nu=0.05)
model.fit(X_train_scaled)
# Predict
y_pred = model.predict(X_test_scaled)
y_pred = [1 if i == -1 else 0 for i in y_pred] # Convert -1 (anomaly) to 1 (fraud)
print(classification_report(y, y_pred))
Expected Output
- Improved detection of the fraud class (1) using an unsupervised method
- Precision/recall varies based on `nu` and dataset size (see the sweep after this list)
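A quick way to explore that trade-off is to sweep `nu`, as in this sketch; it assumes X_train_scaled, X_test_scaled, and y from the project code above, and the `nu` values are illustrative:
from sklearn.metrics import precision_score, recall_score
from sklearn.svm import OneClassSVM

# Assumes X_train_scaled, X_test_scaled, and y from the project code above
for nu in (0.01, 0.05, 0.1):
    model = OneClassSVM(kernel='rbf', nu=nu).fit(X_train_scaled)
    y_pred = [1 if p == -1 else 0 for p in model.predict(X_test_scaled)]
    print(f"nu={nu}: precision={precision_score(y, y_pred):.3f}, "
          f"recall={recall_score(y, y_pred):.3f}")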
Common Mistakes to Avoid
- ❌ Forgetting to scale input data
- ❌ Misunderstanding `nu` as the contamination rate (it is only an upper bound on the fraction of outliers)
- ❌ Using it in supervised settings with labeled data (better to use classifiers there)
Further Reading Recommendations
- Scikit-learn One-Class SVM Documentation
- Original One-Class SVM Paper: "Estimating the Support of a High-Dimensional Distribution" (Schölkopf et al., 2001)
- Unsupervised Anomaly Detection Tutorial (Blog)