The Wine dataset is a classic multiclass classification dataset available in Scikit-learn. It contains the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The goal is to classify the wine based on 13 features such as alcohol content, ash, flavanoids, and more.
Key Characteristics
- Multiclass classification problem (3 classes)
- Target: Wine class labels (0, 1, 2)
- Features: Alcohol, Malic acid, Ash, Flavanoids, etc.
- Clean and well-structured dataset
Basic Rules
- Standardize features before training
- Use accuracy and confusion matrix for evaluation
- Try different classifiers (Logistic Regression, KNN, SVM)
- Use
stratify=y
to maintain class proportions
Syntax Table
SL NO | Step | Syntax Example | Description |
---|---|---|---|
1 | Load dataset | load_wine(return_X_y=True) |
Loads wine features and class labels |
2 | Train/test split | train_test_split(X, y, stratify=y, test_size=0.3) |
Ensures balanced class split |
3 | Standard scaling | StandardScaler().fit_transform(X_train) |
Scales features |
4 | Train classifier | LogisticRegression().fit(X_train, y_train) |
Trains a classification model |
5 | Evaluate model | confusion_matrix(y_test, y_pred) |
Shows prediction correctness per class |
Syntax Explanation
1. Load Dataset
What is it?
Loads the Wine dataset from Scikit-learn.
Syntax:
from sklearn.datasets import load_wine
X, y = load_wine(return_X_y=True)
Explanation:
X
contains 13 chemical features of wine samplesy
contains the class labels (0, 1, 2)
2. Train/Test Split
What is it?
Divides the dataset into training and testing subsets.
Syntax:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.3, random_state=42)
Explanation:
- Maintains class proportions in train and test sets
3. Standard Scaling
What is it?
Applies normalization to the input features.
Syntax:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
Explanation:
- Prevents features with larger scales from dominating the model
4. Train Classifier
What is it?
Fits a logistic regression classifier on the wine data.
Syntax:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
Explanation:
- Learns decision boundaries for each wine class
- Logistic Regression supports multiclass classification
5. Evaluate Model
What is it?
Assesses the model performance with a confusion matrix.
Syntax:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))
Explanation:
- Shows how many instances were correctly or incorrectly classified
Real-Life Project: Wine Type Prediction
Project Name
Wine Quality Classifier
Project Overview
Classify wines into one of three types using their chemical properties.
Project Goal
Develop a model that accurately identifies the wine class based on input features.
Code for This Project
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
# Load data
X, y = load_wine(return_X_y=True)
# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.3, random_state=42)
# Scale
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
# Predict & Evaluate
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))
Expected Output
- Confusion matrix and accuracy score
- High classification accuracy (typically >95%)
Common Mistakes to Avoid
- ❌ Not scaling features before model training
- ❌ Ignoring class imbalance in split
- ❌ Using binary classifiers for multiclass problems