Parsers
Generation of Parse Tree Using:
- Top-down approach - decides which production to use
- Bottom-up approach - decides when to reduce
A phrase structure grammar is a 4-tuple (N, T, P, S), where N is the set of non-terminals, T the set of terminals, P the set of productions, and S the start symbol. For example:
S → Sa | ε
(a left-recursive grammar, since S appears as the leftmost symbol on the right-hand side of its own production)
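A top-down (recursive-descent) parser cannot use the production S → Sa directly, because it would recurse on S forever. A minimal sketch, assuming the standard rewrite of S → Sa | ε into the right-recursive equivalent S → aS | ε (both generate the language a*):

```python
# Minimal sketch (not from the source notes): a recursive-descent parser
# for the transformed, right-recursive grammar S -> a S | epsilon.

def parse_S(tokens, pos=0):
    """Parse S -> a S | epsilon; return the position after the match."""
    if pos < len(tokens) and tokens[pos] == 'a':
        return parse_S(tokens, pos + 1)   # consume 'a', then parse S again
    return pos                            # epsilon production

def accepts(s):
    tokens = list(s)
    return parse_S(tokens) == len(tokens)  # accept only if all input is consumed

print(accepts("aaa"))  # True  (in the language a*)
print(accepts("ab"))   # False (stray 'b' is not consumed)
```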
Regression analysis is a statistical process for estimating the relationships between variables. It can be used to build a model to predict the value of the target variable from the predictor variables.
Mathematically, a regression model is represented as y= f(X), where y is the target or dependent variable and X is the set of predictors or independent variables (x1, x2, …, xn).
If a linear regression model involves only one predictor variable, it is called a Simple Linear Regression (SLR) model:
f(X) = β0 + β1*x1 + ε
The β values are known as weights (β0 is also called the intercept, and the subsequent β1, β2, etc. are called coefficients). The error term ε is assumed to be normally distributed with constant variance.
Assumptions of Linear Regression
Assumption 1: The target (dependent) variable and the predictor (independent) variables should be continuous numerical values.
Assumption 2: There should be a linear relationship between the predictor variable and the target variable. A scatterplot of the predictor (x-axis) against the target (y-axis) can be used as a simple check to validate this assumption.
Assumption 3: There should not be any significant outliers in the data.
Assumption 4: The data is iid (Independent and identically distributed). In other words, one observation should not depend on another.
Assumption 5: The residuals (differences between the actual and predicted values) of a regression should not exhibit any pattern. That is, they should be homoscedastic (exhibit equal variance across all instances). This assumption can be validated by plotting a scatter plot of the residuals: if the residuals exhibit a pattern they are heteroscedastic; if they are randomly distributed they are homoscedastic.
Assumption 6: The residuals of the regression should be approximately normally distributed. This assumption can be checked by plotting a Normal Q-Q plot of the residuals; a sketch of both residual checks follows this list.
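The sketch below assumes matplotlib and scipy are available; the actual/predicted values are hypothetical, chosen only to illustrate the plots for Assumptions 5 and 6.

```python
# Illustrative residual diagnostics (not from the source notes).
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical observed targets and model predictions.
y_actual = np.array([3.1, 4.0, 5.2, 6.1, 6.9, 8.2])
y_pred   = np.array([3.0, 4.1, 5.0, 6.0, 7.1, 8.0])
residuals = y_actual - y_pred

# Assumption 5: residuals vs. predicted values should show no pattern (homoscedasticity).
plt.scatter(y_pred, residuals)
plt.axhline(0, linestyle='--')
plt.xlabel('Predicted value')
plt.ylabel('Residual')
plt.title('Residual plot')
plt.show()

# Assumption 6: a Normal Q-Q plot of the residuals should be roughly a straight line.
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()
```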
Implementation: Simple Linear Regression (SLR)
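A minimal SLR sketch, assuming scikit-learn is available; the single-predictor data below is hypothetical and only illustrates the intercept/coefficient terminology defined above.

```python
# Illustrative Simple Linear Regression fit (not from the source notes).
import numpy as np
from sklearn.linear_model import LinearRegression

# One predictor variable (x1) and one target (y), as in f(X) = b0 + b1*x1 + e.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # shape (n_samples, 1)
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])

model = LinearRegression().fit(X, y)

print("intercept (b0):  ", model.intercept_)
print("coefficient (b1):", model.coef_[0])
print("prediction for x1=6:", model.predict([[6.0]])[0])
```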
Compiler: a language translator that translates a program from one language to another.
How each compiler phase uses the symbol table:

| Phase | Usage |
|---|---|
| Lexical Analysis | Creates new entries for new identifiers |
| Syntax Analysis | Adds information regarding attributes such as type, scope, dimension, line of reference & line of use |
| Semantic Analysis | Uses the available information to check semantics; the table is updated |
| Intermediate Code Generation | Adds temporary variables |
| Code Optimization | Uses the information in the symbol table for machine-dependent optimization, considering addresses & aliased-variable information |
| Target Code Generation | Generates code using the address information of identifiers |
Each entry in the symbol table is associated with attributes that support the compiler in different phases.
Attributes are: Name, Size, Dimension, Type, Line of declaration, line of usage, Address.
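A minimal sketch (not from the notes) of a dictionary-based symbol table that stores the attributes listed above and supports the insert/update/lookup operations used across phases:

```python
# Illustrative symbol-table sketch: each identifier maps to the attributes
# listed above (name, size, dimension, type, line of declaration, lines of usage, address).
class SymbolTable:
    def __init__(self):
        self.entries = {}

    def insert(self, name, **attrs):
        # Called when a new identifier is first seen (lexical/syntax analysis).
        self.entries[name] = {
            "name": name, "size": None, "dimension": None, "type": None,
            "line_of_declaration": None, "lines_of_usage": [], "address": None,
        }
        self.entries[name].update(attrs)

    def update(self, name, **attrs):
        # Later phases (semantic analysis, code generation) fill in or use attributes.
        self.entries[name].update(attrs)

    def add_usage(self, name, line):
        self.entries[name]["lines_of_usage"].append(line)

    def lookup(self, name):
        return self.entries.get(name)

# Example usage (hypothetical identifier and values):
table = SymbolTable()
table.insert("count", type="int", size=4, line_of_declaration=3)
table.add_usage("count", 7)
table.update("count", address=0x1000)
print(table.lookup("count"))
```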
Support Vector Machines (SVM) are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples into one category or the other, making it a non-probabilistic binary linear classifier.
The input vectors that lie closest to the separating hyperplane and define the margin are called "support vectors".
A hyperplane is a subspace of one dimension less than its ambient space. If a space is 3-dimensional then its hyperplanes are the 2-dimensional planes, while if the space is 2-dimensional, its hyperplanes are the 1-dimensional lines.
“The goal of support vector machines (SVM) is to find an optimal hyperplane that separates the data into classes.”
In SVM, we use the training data to build a model that can best predict the test data. There are two approaches to fitting a model to the training data: Hard Margin SVM and Soft Margin SVM.
In the case of a hard margin classifier we find w and b such that
ø(w) = 2/||w|| is maximized, subject to yi(wᵀxi + b) ≥ 1 for every training point (xi, yi).
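As a standard restatement (not taken verbatim from these notes), maximizing the margin 2/||w|| is equivalent to minimizing (1/2)wᵀw, which is the form the soft-margin objective below extends:

```latex
% Equivalent primal form of the hard-margin problem
\min_{w,\,b} \; \tfrac{1}{2}\, w^{\top} w
\quad \text{subject to} \quad
y_i \,(w^{\top} x_i + b) \ge 1 \;\; \text{for all } i .
```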
Unlike a hard margin classifier, a soft margin classifier does not require all of the training data to be classified correctly. As a result, it may misclassify some of the training data; however, on average it has comparatively higher prediction accuracy on test data than a hard margin classifier.
The concept of a "slack variable" εi is introduced to allow misclassification, where εi represents the distance by which a point falls short of the margin boundary for its class.
In the case of a soft margin classifier we find w and b such that
ø(w) = (1/2)wᵀw + C∑εi is minimized, subject to yi(wᵀxi + b) ≥ 1 − εi and εi ≥ 0 for every training point (xi, yi); εi is non-zero only for points that violate the margin.
Parameter "C" can be viewed as a way to control over-fitting.
For a given point,
If 0 < εi ≤ 1 then the point is classified correctly but lies between the hyperplane and the margin on the correct side of the hyperplane. Such a point exhibits a margin violation.
If εi > 1 then the point is misclassified, lies on the wrong side of the hyperplane and beyond the margin.
C is a regularization parameter that controls the margin as follows:
A small value of C implies that the model is more tolerant of margin violations and hence has a larger margin.
A large value of C makes the constraints hard to ignore, and hence the model has a smaller margin.
When the value of C is infinity, then all the constraints are enforced and thus the SVM model is considered a hard-margin classifier
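A minimal sketch of this effect, assuming scikit-learn is available; the 2-D data and the C values are hypothetical, chosen only to show how the margin width 2/||w|| shrinks as C grows:

```python
# Illustrative effect of the regularization parameter C on a linear SVM
# (not from the source notes): small C tolerates margin violations (wider margin),
# large C penalizes them heavily (narrower margin, closer to hard-margin behaviour).
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D training data for two classes.
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

for C in (0.01, 1.0, 1000.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    margin_width = 2.0 / np.linalg.norm(w)   # geometric margin = 2 / ||w||
    print(f"C={C}: margin width = {margin_width:.3f}, "
          f"support vectors = {len(clf.support_vectors_)}")
```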
A small string utility: reverse each word in a string while keeping the words in their original order.

```python
def reverseWords(s: str) -> str:
    # Split on whitespace, reverse each word, and rejoin with single spaces.
    words = s.split()
    return ' '.join(word[::-1] for word in words)
```
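For instance, a hypothetical call:

```python
print(reverseWords("hello world"))  # -> "olleh dlrow"
```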