Real-World Dataset: Wine Classification in Scikit-learn

The Wine dataset is a classic multiclass classification dataset bundled with Scikit-learn. It contains the results of a chemical analysis of 178 wines grown in the same region of Italy but derived from three different cultivars. The goal is to predict each wine's cultivar from 13 chemical features such as alcohol content, ash, and flavanoids.

Key Characteristics

Multiclass classification problem (3 classes)
Target: Wine class labels (0, 1, 2), one per cultivar
Features: 13 numeric measurements such as Alcohol, Malic acid, Ash, and Flavanoids
Clean and well-structured: 178 samples with no missing values
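A quick way to confirm these characteristics is to inspect the loaded arrays directly. This minimal sketch uses NumPy (a Scikit-learn dependency) and `np.bincount` to count samples per class; the expected values in the comments are the documented sizes of the dataset.

```python
import numpy as np
from sklearn.datasets import load_wine

# Load features and labels as NumPy arrays
X, y = load_wine(return_X_y=True)

print("Feature matrix shape:", X.shape)          # → (178, 13)
print("Classes:", np.unique(y))                  # → [0 1 2]
print("Samples per class:", np.bincount(y))      # → [59 71 48]
print("Any missing values:", np.isnan(X).any())  # → False
```

The class counts show the three classes are close in size but not identical, which is why a stratified split is still worthwhile.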

Basic Rules

Standardize features before training (fit the scaler on the training set only)
Use accuracy and a confusion matrix for evaluation
Try different classifiers (Logistic Regression, KNN, SVM)
Use stratify=y when splitting to preserve class proportions
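One way to try the three suggested classifiers on equal footing is to score each with cross-validation. This is a sketch under a few assumptions not in the original: it uses `make_pipeline` so scaling happens inside each fold, 5-fold `StratifiedKFold` with `random_state=42`, `n_neighbors=5` for KNN, and an RBF kernel for the SVM. Treat the resulting numbers as a rough comparison, not a benchmark.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Each pipeline standardizes inside every CV fold, so the scaler never sees the held-out fold
candidates = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN (k=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Stratified folds keep the class proportions in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for name, pipe in candidates.items():
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

Because every model is evaluated on the same folds, differences in mean accuracy reflect the classifiers rather than lucky splits.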

Syntax Table

No.	Step	Syntax Example	Description
1	Load dataset	`load_wine(return_X_y=True)`	Loads wine features and class labels
2	Train/test split	`train_test_split(X, y, stratify=y, test_size=0.3)`	Preserves class proportions in both splits
3	Standard scaling	`StandardScaler().fit_transform(X_train)`	Scales features to zero mean and unit variance
4	Train classifier	`LogisticRegression(max_iter=1000).fit(X_train, y_train)`	Trains a classification model
5	Evaluate model	`confusion_matrix(y_test, y_pred)`	Counts correct and incorrect predictions per class

Syntax Explanation

1. Load Dataset

What is it?
Loads the Wine dataset from Scikit-learn.

Syntax:

from sklearn.datasets import load_wine
X, y = load_wine(return_X_y=True)

Explanation:

X is a NumPy array of shape (178, 13) holding the chemical measurements for each wine sample
y is an array of 178 class labels (0, 1, 2)
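If you also want the feature and class names, call `load_wine()` without `return_X_y`; it returns a Bunch object whose `data`, `target`, `feature_names`, and `target_names` attributes hold the arrays and their labels. A minimal sketch:

```python
from sklearn.datasets import load_wine

# Without return_X_y, load_wine returns a Bunch with the data plus metadata
wine = load_wine()

X, y = wine.data, wine.target
print(X.shape, y.shape)      # → (178, 13) (178,)
print(wine.feature_names)    # 13 names, e.g. 'alcohol', 'malic_acid', 'flavanoids', ...
print(wine.target_names)     # → ['class_0' 'class_1' 'class_2']
```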

2. Train/Test Split

What is it?
Divides the dataset into training and testing subsets.

Syntax:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.3, random_state=42)

Explanation:

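stratify=y keeps the class proportions of the train and test sets close to those of the full dataset
test_size=0.3 holds out 30% of the samples for testing
random_state=42 makes the split reproducible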
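To see what `stratify=y` does, you can compare the class proportions of the full label array with those of the two splits. This sketch repeats the split from above so it runs on its own:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42
)

# Class proportions should be nearly identical across all three label arrays
for name, labels in [("full", y), ("train", y_train), ("test", y_test)]:
    proportions = np.bincount(labels, minlength=3) / len(labels)
    print(f"{name:>5}: n={len(labels):3d}  proportions={np.round(proportions, 3)}")
```

With 178 samples and `test_size=0.3`, Scikit-learn rounds the test set up to 54 samples, leaving 124 for training.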

3. Standard Scaling

What is it?
Standardizes each feature to zero mean and unit variance.

Syntax:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Explanation:

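Prevents features with larger numeric ranges (such as proline) from dominating distance- and gradient-based models
The scaler is fit on the training set only and then applied to the test set, so no test-set statistics leak into training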
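The effect of standardization can be checked numerically: after `fit_transform`, every training column has mean 0 and standard deviation 1, while the test set is transformed with the training statistics and is therefore only approximately centered. A short sketch:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42
)

# Raw feature spreads differ by orders of magnitude (e.g. proline vs. hue)
raw_std = X_train.std(axis=0)
print("Raw std per feature:", np.round(raw_std, 2))

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # learns mean_ and scale_ from training data only
X_test_s = scaler.transform(X_test)        # reuses the training statistics

print("Train means after scaling:", np.round(X_train_s.mean(axis=0), 6))
print("Train stds after scaling:", np.round(X_train_s.std(axis=0), 6))
# The test set is close to, but not exactly, zero mean and unit variance
print("Test means after scaling:", np.round(X_test_s.mean(axis=0), 2))
```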

4. Train Classifier

What is it?
Fits a logistic regression classifier on the wine data.

Syntax:

from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

Explanation:

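Learns a set of weights for each wine class, which together define the decision boundaries
LogisticRegression handles all three classes directly with its default lbfgs solver; max_iter=1000 raises the default limit of 100 iterations so the solver has room to converge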
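To see how the fitted model represents three classes, you can inspect `classes_`, `coef_` (one row of weights per class), and `predict_proba`. This sketch repeats the split and scaling from the previous steps so it is self-contained; looking at only the first three test samples is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# One coefficient vector (row) per class, one weight per feature
print("Classes:", model.classes_)
print("Coefficient matrix shape:", model.coef_.shape)

# Class probabilities for the first three test samples; each row sums to 1
proba = model.predict_proba(X_test[:3])
print(np.round(proba, 3))
print("Predicted classes:", model.predict(X_test[:3]))
```

The predicted class for each sample is the column with the highest probability.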

5. Evaluate Model

What is it?
Assesses model performance with a confusion matrix and the overall accuracy score.

Syntax:

from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))

Explanation:

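Each row of the confusion matrix is a true class and each column a predicted class; diagonal entries count correct predictions and off-diagonal entries count errors
Accuracy is the fraction of test samples classified correctly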
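The relationship between the confusion matrix and accuracy can be made explicit: accuracy is the diagonal sum divided by the total, and dividing each diagonal entry by its row sum gives per-class recall. The sketch below also prints `classification_report`, which the original walkthrough does not use but which summarizes precision, recall, and F1 per class.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

wine = load_wine()
X, y = wine.data, wine.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
# Diagonal = correct predictions; accuracy is the diagonal sum over all test samples
manual_accuracy = np.trace(cm) / cm.sum()
# Per-class recall: correct predictions in each row divided by that row's total
per_class_recall = np.diag(cm) / cm.sum(axis=1)

print(cm)
print("Accuracy (manual): ", round(manual_accuracy, 3))
print("Accuracy (sklearn):", round(accuracy_score(y_test, y_pred), 3))
print("Per-class recall:", np.round(per_class_recall, 3))
print(classification_report(y_test, y_pred, target_names=list(wine.target_names)))
```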

Real-Life Project: Wine Type Prediction

Project Name

Wine Type Classifier

Project Overview

Classify wines into one of three types using their chemical properties.

Project Goal

Develop a model that accurately identifies the wine class based on input features.

Code for This Project

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

# Load data
X, y = load_wine(return_X_y=True)

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.3, random_state=42)

# Scale
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict & Evaluate
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))
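The same project can be written with a Scikit-learn `Pipeline`, which bundles the scaler and the classifier so they are always fit on the training data and applied together. This is an alternative packaging, not a change to the method: it should produce the same predictions as the step-by-step code. Using the first test row as a "new" wine is only for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42
)

# Bundle scaling and the classifier so they are always fit and applied together
clf = Pipeline([
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression(max_iter=1000)),
])
clf.fit(X_train, y_train)  # the scaler is fit on X_train only

y_pred = clf.predict(X_test)  # raw X_test goes in; the pipeline scales it internally
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", round(acc, 3))

# Classify one "new" wine given as raw, unscaled measurements (here: the first test row)
new_sample = X_test[:1]
print("Predicted class:", clf.predict(new_sample)[0])
```

Because the pipeline stores the fitted scaler, new samples can be passed in raw form without remembering to call `transform` first.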

Expected Output

A 3×3 confusion matrix followed by the accuracy score
High classification accuracy (typically above 95%; the exact value depends on the split)
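Since accuracy depends on which samples land in the test set, one way to check the "typically above 95%" expectation on your own setup is to repeat the split with different `random_state` values. The choice of 20 seeds here is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_wine(return_X_y=True)

accuracies = []
for seed in range(20):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, test_size=0.3, random_state=seed
    )
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X_train, y_train)
    accuracies.append(clf.score(X_test, y_test))  # score() returns accuracy for classifiers

accuracies = np.array(accuracies)
print(f"mean={accuracies.mean():.3f}  min={accuracies.min():.3f}  max={accuracies.max():.3f}")
```

Reporting the spread across seeds gives a more honest picture than a single split.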

Common Mistakes to Avoid

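❌ Not scaling features before model training
❌ Splitting without stratify=y, which can shift class proportions between the train and test sets
❌ Using a binary-only classifier on a multiclass problem without a multiclass strategy such as one-vs-rest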
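The first mistake is easy to demonstrate with KNN, whose Euclidean distances are dominated by large-valued features such as proline when the data is not scaled. This sketch compares cross-validated accuracy with and without a `StandardScaler`; the 5-fold `StratifiedKFold` and default `n_neighbors` are assumptions for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Mistake: KNN on raw features, where large-valued features dominate the distances
unscaled = cross_val_score(KNeighborsClassifier(), X, y, cv=cv).mean()

# Fix: standardize inside a pipeline so each fold's scaler is fit on that fold's training data
scaled = cross_val_score(
    make_pipeline(StandardScaler(), KNeighborsClassifier()), X, y, cv=cv
).mean()

print(f"KNN without scaling: {unscaled:.3f}")
print(f"KNN with scaling:    {scaled:.3f}")
```

Expect a clearly higher score for the scaled pipeline; the exact gap depends on the folds.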

📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning by Sarful Hassan

