Classification
Classification is a supervised learning task where the goal is to predict the category or class label of given data points. It is used when the output variable is categorical.
How Classification Works
1. Training Phase:
A model learns the relationship between the input features and their corresponding class labels from a labeled dataset.
2. Testing Phase:
The trained model is used to classify new, unseen data points into one of the predefined classes.
Types of Classification:
- Binary classification
- Multi-class classification
- Multi-label classification
Examples of Classification Tasks:
- Email filtering (e.g., spam vs. not spam)
- Medical diagnosis (e.g., disease vs. no disease)
- Image recognition
- Sentiment analysis
Classification Algorithms
- Logistic Regression
- Decision Trees
- Support Vector Machines
- K-Nearest Neighbours
- Naive Bayes
- Random Forest
Evaluation Metrics for Classification
- Accuracy: percentage of samples classified correctly.
- Precision: of all positive predictions, how many are actually positive.
- Recall: of all actual positives, how many the model correctly identifies.
- F1 Score: the harmonic mean of precision and recall.
- ROC-AUC: the area under the ROC curve; measures how well the model ranks positives above negatives across all thresholds (see the sketch below).
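A quick way to make these metrics concrete is to compute them with scikit-learn. In the sketch below, the labels and scores are made up purely for illustration:

```python
# Hedged sketch: y_true, y_pred, and y_score are made-up values for illustration only.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

y_true  = [1, 0, 1, 1, 0, 1, 0, 0]                    # actual class labels
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                    # hard predictions from some classifier
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]    # predicted probabilities for the positive class

print(accuracy_score(y_true, y_pred))    # fraction of samples labeled correctly
print(precision_score(y_true, y_pred))   # of all predicted positives, how many are truly positive
print(recall_score(y_true, y_pred))      # of all actual positives, how many were found
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_score))    # ranking quality across thresholds (uses scores, not labels)
```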
Workflow of Classification
Data Collection: Gather labeled data.
Data Preprocessing: Clean, normalize, and split the data into training and testing sets.
Feature Selection: Identify the most relevant features.
Model Training: Train a classification model using a suitable algorithm.
Model Evaluation: Test the model on unseen data using evaluation metrics.
Prediction: Use the model for classifying new data.
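To tie the steps together, here is a compact sketch of the whole workflow using scikit-learn; the breast-cancer dataset, the 10-feature selection, and logistic regression are stand-ins for whatever data and algorithm a real project would use:

```python
# Hedged end-to-end sketch of the classification workflow (dataset and model are illustrative choices).
from sklearn.datasets import load_breast_cancer                  # data collection (built-in labeled data)
from sklearn.model_selection import train_test_split             # preprocessing: train/test split
from sklearn.preprocessing import StandardScaler                  # preprocessing: normalization
from sklearn.feature_selection import SelectKBest, f_classif      # feature selection
from sklearn.linear_model import LogisticRegression               # model training
from sklearn.metrics import classification_report                 # model evaluation
from sklearn.pipeline import make_pipeline

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = make_pipeline(StandardScaler(),
                      SelectKBest(f_classif, k=10),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)                                       # training phase
print(classification_report(y_test, model.predict(X_test)))       # evaluation on unseen data
print(model.predict(X_test[:1]))                                  # prediction for a "new" data point
```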
Decision Tree Algorithm
A Decision Tree is a supervised machine learning algorithm used for classification and regression tasks. It models decisions and their possible consequences as a tree-like structure of nodes, where:
- Internal Nodes: Represent a decision based on a feature.
- Branches: Indicate the outcome of a decision.
- Leaf Nodes: Represent the final outcome (class label or prediction).
The tree splits the data into subsets based on the feature that provides the most information gain or least impurity at each step.
Concepts in Decision Trees
Root Node: The starting point of the tree, representing the entire dataset.
Splitting: Dividing the data at a node into subsets based on a condition (e.g., feature > value).
Leaf Node: A terminal node that provides the final prediction.
Pruning: Reducing the size of the tree to avoid overfitting.
Impurity Measures (used for splitting):
- Gini Impurity: Measures the likelihood of incorrect classification if a random sample is classified based on the distribution.
- Entropy: Measures the randomness or impurity of a dataset.
- Information Gain: The reduction in entropy or impurity achieved by a split.
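The impurity measures above are easy to compute directly; the sketch below implements them with NumPy on a small made-up label array (the helper names are illustrative):

```python
# Hedged sketch of Gini impurity, entropy, and information gain (labels are made-up).
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)            # probability of misclassifying a random sample

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))         # randomness of the label distribution

def information_gain(parent, left, right):
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted      # reduction in entropy achieved by the split

labels = np.array([0, 0, 1, 1, 1, 1])
left, right = labels[:2], labels[2:]       # one candidate split
print(gini(labels), entropy(labels), information_gain(labels, left, right))
```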
Advantages of Decision Trees
- Simple to Understand: Mimics human decision-making.
- Interpretable: Provides clear visualization of the decision process.
- Non-Parametric: No assumptions about data distribution.
- Handles Non-linear Relationships: Can model complex patterns.
Disadvantages of Decision Trees
- Overfitting: Trees can grow too large and fit the training data perfectly, leading to poor generalization.
- Bias Toward Dominant Features: Sensitive to imbalanced datasets.
- Instability: Small changes in data can result in significantly different trees.
How Decision Trees Work
- Start at the Root Node: Evaluate all features and split the data based on the feature that provides the highest information gain or lowest Gini impurity.
- Recursive Splitting: Continue splitting each subset at child nodes until:
- All data points in a node belong to the same class.
- A stopping criterion (e.g., maximum depth, minimum samples) is met.
- Prediction: For a new data point, follow the path from the root node to a leaf node to determine the predicted class.
Python Code
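Below is a minimal sketch of a Decision Tree classifier with scikit-learn; the iris dataset and the chosen hyperparameters (Gini criterion, max_depth=3) are illustrative, not prescribed by the notes above:

```python
# Hedged Decision Tree sketch with scikit-learn (iris dataset used only as an example).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                    test_size=0.3, random_state=42)

tree = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)  # 'entropy' also supported
tree.fit(X_train, y_train)                       # training phase: learn splits that reduce impurity

y_pred = tree.predict(X_test)                    # follow root-to-leaf paths for unseen samples
print(accuracy_score(y_test, y_pred))
print(export_text(tree, feature_names=list(iris.feature_names)))  # human-readable view of the splits
```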
Bayes' Theorem
Bayes' Theorem is a fundamental concept in probability theory and statistics that describes the probability of an event, based on prior knowledge of conditions that might be related to the event. In machine learning, it is widely used for classification tasks, particularly in Naive Bayes classifiers.
Bayes' Theorem Formula
Bayes' Theorem is expressed mathematically as:

P(A|B) = P(B|A) × P(A) / P(B)

Where:
- P(A|B) is the posterior probability: the probability of event A occurring given that B has occurred.
- P(B|A) is the likelihood: the probability of event B occurring given that A has occurred.
- P(A) is the prior probability: the probability of event A occurring before observing event B.
- P(B) is the evidence or normalizing constant: the total probability of event B occurring (the denominator ensures the result is a valid probability).
Explanation of the Components
Prior Probability (P(A)): This represents what is known about A before any new data is observed. It's the initial belief or assumption about a class or event.
Likelihood (P(B|A)): This is the likelihood of observing the evidence B given that A is true. It quantifies how well the evidence supports the hypothesis.
Posterior Probability (P(A|B)): This is the probability of A occurring after considering the evidence B. It's the updated belief after observing the data.
Evidence (P(B)): This is the total probability of observing B across all possible outcomes. It is used to normalize the result so that the probabilities sum to 1.
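As a small worked example (with made-up spam-filter probabilities chosen only to illustrate the formula):

```python
# Hedged numeric example of Bayes' Theorem; all probabilities below are made up for illustration.
p_spam = 0.2                 # P(A): prior probability that an email is spam
p_word_given_spam = 0.6      # P(B|A): probability the word "offer" appears, given spam
p_word_given_ham = 0.05      # P(B|not A): probability the word appears, given not spam

# Evidence P(B) via the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior P(A|B): probability the email is spam, given the word appeared
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))   # 0.75: the prior of 0.2 is updated upward by the evidence
```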
Naive Bayes Classifier in Machine Learning
In machine learning, the Naive Bayes classifier is based on Bayes' Theorem, with the assumption that the features used for prediction are conditionally independent given the class label. This simplifies the calculation, hence the term "naive."
Steps for Naive Bayes Classification:
Compute the Prior Probability: The probability of each class occurring in the dataset.
Compute the Likelihood: Calculate the likelihood of each feature given the class label. For continuous features, this often assumes a Gaussian distribution.
Compute the Posterior Probability: For each class C, compute the posterior probability P(C|x) of the new data point x.
Classify the Data Point: Choose the class with the highest posterior probability as the predicted class.
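These steps map directly onto scikit-learn's GaussianNB, which estimates the priors and Gaussian likelihoods from the training data; the sketch below uses the iris dataset purely as an example:

```python
# Hedged Naive Bayes sketch (Gaussian likelihoods; iris dataset is an illustrative choice).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = GaussianNB()                    # priors and per-class Gaussians are estimated during fit
model.fit(X_train, y_train)
y_pred = model.predict(X_test)          # each prediction is the class with the highest posterior
print(accuracy_score(y_test, y_pred))
```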
Handling Class Imbalance in Support Vector Machines (SVM)
Class imbalance occurs when the number of samples in one class significantly exceeds those in another class. This can negatively impact the performance of Support Vector Machines (SVM), as the decision boundary may be biased towards the majority class, leading to poor performance on the minority class.
Challenges with Class Imbalance in SVM
- Bias Toward Majority Class: The SVM objective function may focus more on maximizing the margin for the majority class, ignoring the minority class.
- Skewed Decision Boundary: The decision boundary may shift closer to the minority class, reducing its classification performance.
- Poor Evaluation Metrics: Standard metrics like accuracy may not reflect the true performance of the model on imbalanced datasets.
Techniques to Address Class Imbalance in SVM
Class Weight Adjustment:
Assign higher weights to the minority class to balance the penalty during training. In Scikit-learn, this can be done with the class_weight parameter, either as class_weight='balanced' or by manually specifying a weight per class, as in the sketch below.
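A minimal sketch of both options; the synthetic 90/10 dataset and the {0: 1, 1: 10} weights are illustrative stand-ins for real data and tuned values:

```python
# Hedged class_weight sketch; the 90/10 split and the manual weights are illustrative.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic imbalanced dataset: roughly 90% class 0, 10% class 1
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# Option 1: weight classes inversely proportional to their frequencies
svm_balanced = SVC(kernel='rbf', class_weight='balanced').fit(X, y)

# Option 2: manually penalize mistakes on the minority class (label 1) ten times more
svm_manual = SVC(kernel='rbf', class_weight={0: 1, 1: 10}).fit(X, y)
```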
Resampling Techniques:
- Oversampling: Increase the number of minority class samples by duplicating or generating synthetic samples (e.g., using SMOTE).
- Undersampling: Reduce the number of majority class samples to match the minority class.
Example using SMOTE:
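A hedged sketch using the imbalanced-learn package (assumes it is installed; the synthetic dataset is illustrative):

```python
# Hedged SMOTE sketch; requires the imbalanced-learn package (pip install imbalanced-learn).
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
print(Counter(y))                                 # imbalanced: class 0 heavily outnumbers class 1

X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print(Counter(y_res))                             # minority class oversampled with synthetic points
```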
Change the Decision Threshold: Adjust the threshold for classifying a sample as the minority class based on the predicted probabilities.
Example:
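A hedged sketch; the 0.3 cutoff is an illustrative value (the implicit default threshold is 0.5), and the synthetic dataset stands in for real data:

```python
# Hedged decision-threshold sketch; the 0.3 cutoff is illustrative, not a recommended value.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

svm = SVC(kernel='rbf', probability=True).fit(X_train, y_train)   # enable probability estimates
probs = svm.predict_proba(X_test)[:, 1]                           # P(minority class) per sample

threshold = 0.3                                   # lower than the implicit 0.5 default
y_pred = (probs >= threshold).astype(int)         # more samples are now labeled as the minority class
```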
Kernel Selection: Use kernels that better separate the minority and majority classes, such as the RBF kernel, which can handle non-linear boundaries.
Feature Engineering: Transform or create features that emphasize differences between the classes. This can improve separability in the feature space.
Use of Alternative Metrics: Evaluate the model using metrics like:
- Precision, Recall, F1 Score
- Area Under the Receiver Operating Characteristic Curve (AUC-ROC)
- Precision-Recall Curve
Random Forest
Random Forest is a popular ensemble learning technique used for both classification and regression tasks. It combines the predictions of multiple decision trees to improve accuracy, reduce overfitting, and enhance generalization.
Key Features of Random Forest
- Ensemble Method: It creates multiple decision trees during training and combines their results for better performance.
- Bagging: It employs bootstrapping (sampling with replacement) to create different subsets of the training data for each tree.
- Random Feature Selection: At each split in a tree, only a random subset of features is considered, making the model less prone to overfitting.
- Majority Voting (Classification): For classification tasks, it uses the majority vote of the trees as the final prediction.
- Averaging (Regression): For regression tasks, it uses the average of the predictions from all trees.
How Random Forest Works
Bootstrap Aggregation (Bagging):
- Randomly sample the dataset with replacement to create multiple subsets of the data (one for each tree).
- Train each decision tree on a different subset.
Feature Randomness:
- At each split in a decision tree, a random subset of features is considered instead of evaluating all features.
- This adds diversity to the trees and prevents overfitting.
Prediction Aggregation:
- For classification: Take the majority vote across all decision trees.
- For regression: Compute the average of the predictions.
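The sketch below shows these three ideas (bagging, random feature subsets, vote aggregation) as they surface in scikit-learn's RandomForestClassifier; the dataset and hyperparameter values are illustrative:

```python
# Hedged Random Forest sketch with scikit-learn (iris dataset and hyperparameters are illustrative).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

rf = RandomForestClassifier(
    n_estimators=100,        # number of trees built on bootstrap samples (bagging)
    max_features='sqrt',     # random subset of features considered at each split
    random_state=42,
)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)              # majority vote across the trees
print(accuracy_score(y_test, y_pred))
print(rf.feature_importances_)           # per-feature importance scores
```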
Advantages of Random Forest
- Robust to Overfitting: Combines multiple trees, reducing the risk of overfitting compared to individual decision trees.
- Handles Non-linear Data: Captures complex relationships in data.
- Works Well with Missing Values: Can handle datasets with missing values by averaging predictions.
- Scales Well to Large Datasets: Performs efficiently with high-dimensional data.
- Feature Importance: Provides insights into feature significance, aiding interpretability.
Disadvantages of Random Forest
- Computationally Intensive: Training multiple decision trees can be slow, especially with large datasets.
- Not as Interpretable: While decision trees are interpretable, combining them into a forest makes the model harder to understand.
- Bias in Small Data: May struggle with small datasets if not tuned properly.
- Memory Usage: Requires more memory as it stores multiple trees.
Random Forest Algorithm
- Input: Dataset D with N samples, the number of trees T, and the number of features m to consider at each split.
- For Each Tree:
- Draw a bootstrap sample from D.
- Build a decision tree on the bootstrap sample.
- At each split, randomly select m features and choose the best split among them.
- Aggregate Predictions:
- For classification, use majority voting.
- For regression, compute the average of the tree predictions.
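To connect the algorithm to code, here is a hedged from-scratch sketch; the helper names (random_forest_fit, random_forest_predict) and default values are illustrative, and scikit-learn's DecisionTreeClassifier is reused as the base learner:

```python
# Hedged from-scratch sketch of the Random Forest algorithm (T trees, bootstrap samples,
# m random features per split); helper names and defaults are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest_fit(X, y, n_trees=25, m_features='sqrt', random_state=0):
    rng = np.random.default_rng(random_state)
    trees = []
    n = len(X)
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)                          # bootstrap: draw n rows with replacement
        tree = DecisionTreeClassifier(max_features=m_features,    # m features tried at each split
                                      random_state=int(rng.integers(10**6)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def random_forest_predict(trees, X):
    votes = np.stack([t.predict(X) for t in trees])               # shape: (n_trees, n_samples)
    # Majority vote per sample (classification)
    return np.apply_along_axis(lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
```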