Support Vector Machine

Support Vector Machines (SVMs) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier.

Input vectors that lie on the margin boundaries, and thus support the margin, are called "support vectors".

A hyperplane is a subspace of one dimension less than its ambient space. If a space is 3-dimensional, its hyperplanes are the 2-dimensional planes, while if the space is 2-dimensional, its hyperplanes are the 1-dimensional lines.


“The goal of support vector machines (SVM) is to find an optimal hyperplane that separates the data into classes.”

In SVM, we use training data to build a model that can best predict the test data. There are two approaches to fitting an SVM model with training data: Hard Margin SVM and Soft Margin SVM.

In the case of a hard-margin classifier, we find $w$ and $b$ such that the margin
$\phi(w) = \frac{2}{\lVert w \rVert}$ is maximized, subject to $y_i (w^T x_i + b) \ge 1$ for all $\{(x_i, y_i)\}$. Maximizing $2 / \lVert w \rVert$ is equivalent to minimizing $\frac{1}{2} w^T w$, which is the form the soft-margin objective below extends.
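A minimal sketch of this, assuming scikit-learn: SVC has no explicit hard-margin mode, so a very large value of the regularization parameter C (introduced formally below) is commonly used to approximate one on linearly separable data. The dataset and the value 1e6 are illustrative assumptions.

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    # Two well-separated clusters, so a separating hyperplane exists.
    X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.6, random_state=0)

    # A huge C leaves (almost) no room for violations: ~hard margin.
    hard = SVC(kernel='linear', C=1e6).fit(X, y)
    w = hard.coef_[0]
    print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
    print("number of support vectors:", hard.support_vectors_.shape[0])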

Unlike hard margin, soft margin does not require all the training data to be classified correctly; as a result, a soft-margin classifier may misclassify some of the training points. In return, it typically achieves higher prediction accuracy on test data than a hard-margin classifier.
To allow misclassification, the concept of a slack variable εi is introduced: εi measures how far the point xi falls on the wrong side of the margin boundary for its class.

In the case of a soft-margin classifier, we find $w$ and $b$ such that
$\phi(w) = \frac{1}{2} w^T w + C \sum_i \varepsilon_i$ is minimized, subject to $y_i (w^T x_i + b) \ge 1 - \varepsilon_i$ and $\varepsilon_i \ge 0$ for all $\{(x_i, y_i)\}$; $\varepsilon_i$ is positive only for the points that violate the margin.
The parameter C can be viewed as a way to control over-fitting.

For a given point with slack εi (the sketch after this list computes these values):

  • If εi = 0, the point lies on or beyond the margin on the correct side of the hyperplane; there is no violation.

  • If 0 < εi ≤ 1, the point is classified correctly but lies between the hyperplane and the margin, on the correct side of the hyperplane. This point exhibits a margin violation.

  • If εi > 1, the point is misclassified: it lies on the wrong side of the hyperplane, beyond its own class's margin.
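To make these cases concrete, here is a minimal sketch that recovers each training point's slack from a fitted linear SVC as εi = max(0, 1 − yi·f(xi)), where f is the model's decision function; the dataset and C value are illustrative assumptions.

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)
    y_pm = 2 * y - 1                         # relabel {0, 1} -> {-1, +1}

    clf = SVC(kernel='linear', C=1.0).fit(X, y_pm)
    eps = np.maximum(0.0, 1.0 - y_pm * clf.decision_function(X))

    print("no violation (eps == 0):        ", int(np.sum(eps == 0)))
    print("margin violation (0 < eps <= 1):", int(np.sum((eps > 0) & (eps <= 1))))
    print("misclassified (eps > 1):        ", int(np.sum(eps > 1)))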

C is a regularization parameter that controls the margin, as follows (the sketch after this list sweeps C to show the effect):

  • A small value of C makes the model more tolerant of margin violations and hence produces a larger margin.

  • A large value of C makes the constraints hard to ignore, and hence the model has a smaller margin.

  • As C tends to infinity, all constraints are enforced and the model becomes a hard-margin classifier.
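A minimal sketch of this trade-off, assuming scikit-learn's SVC; the dataset and the grid of C values are illustrative assumptions. The margin width is computed as 2/||w|| from the fitted coefficients.

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel='linear', C=C).fit(X, y)
        width = 2 / np.linalg.norm(clf.coef_[0])
        print(f"C={C:>6}: margin width = {width:.3f}, "
              f"support vectors = {clf.support_vectors_.shape[0]}")

Larger C typically yields a narrower margin and fewer support vectors here.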

SVMs can also classify data that is not linearly separable, using the kernel trick.

The method of classifying data by transforming it into a higher-dimensional space, where it becomes linearly separable, is called the "kernel trick". The trick is that the mapping is never computed explicitly: a kernel function returns the inner products in the higher-dimensional space directly.
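A minimal sketch of the kernel trick in practice, assuming scikit-learn's SVC and the make_circles toy dataset (one class nested inside the other, so no line in the original 2-D space separates them); the parameter values are illustrative assumptions.

    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    # Concentric rings: linearly inseparable in the original 2-D space.
    X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

    linear = SVC(kernel='linear', C=1.0).fit(X, y)
    rbf = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X, y)

    print("linear kernel accuracy:", linear.score(X, y))  # near chance level
    print("RBF kernel accuracy:   ", rbf.score(X, y))     # near 1.0

The RBF kernel corresponds to an implicit mapping into an infinite-dimensional space, yet the fit only ever evaluates the kernel on pairs of input points.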
