对比分析

Type-0 grammars又叫做unrestricted grammar，从它的名字可以看出，它是unrestricted的，即

No restrictions are made on the productions of an unrestricted grammar

那其它grammar（Type-1 grammars Context-sensitive grammar、Type-2 grammars Context-free grammar 、Type-3 grammars Regular grammar ），对各自的production的restriction是什么呢？下面从这个角度对这几种grammar进行对比分析，由上述Chomsky hierarchy所描述的层级可以看出，从Type 0 grammars到Type 3 grammars逐级增加restriction，所以在进行对比分析的时候，每个grammar只需要和它的升一级grammar进行对比即可。

Type-0 grammars:Unrestricted grammar formal definition

An unrestricted grammar is a formal grammar $ G=(N,\Sigma ,P,S) $, where $ N $ is a set of nonterminal symbols, $ \Sigma $ is a set of terminal symbols, $ N $ and $ \Sigma $ are disjoint, $ P $ is a set of production rules of the form $ \alpha \to \beta $ where $ \alpha $ and $ \beta $ are strings of symbols in $ N\cup \Sigma $ and $ \alpha $ is not the empty string, and $ S\in N $ is a specially designated start symbol. As the name implies, there are no real restrictions on the types of production rules that unrestricted grammars can have.

对比分析

和其他的grammar production相比，它的unrestriction体现在：

$\alpha$ 可以为terminal、non-terminal
$\beta$ 可以为empty string

Type-1 grammars:Context-sensitive grammar formal definition

A formal grammar G = (N, Σ, P, S), where N is a set of nonterminal symbols, Σ is a set of terminal symbols, P is a set of production rules, and S is the start symbol, is context-sensitive if all rules in P are of the form

α*A*β → αγβ

where A ∈ N, α,β ∈ (N∪Σ)* and γ ∈ (N∪Σ)+.

上述*和+都是使用的正则表达式的概念。

对比分析

和unrestricted grammar production相比，它的restriction在于：

γ不可为empty string
A（在产生式的左部）必须是non-terminal

上述两个restriction在Type-0 grammar中都不存在。

Type-2 grammars:Context-free grammar formal definitions

A context-free grammar G is defined by the 4-tuple:

$ G=(V,\Sigma ,R,S) $ where

1、V is a finite set; each element $ v\in V $ is called a nonterminal character or a variable. Each variable represents a different type of phrase or clause in the sentence. Variables are also sometimes called syntactic categories. Each variable defines a sub-language of the language defined by G.

2、Σ is a finite set of terminal*s, disjoint from *V, which make up the actual content of the sentence. The set of terminals is the alphabet of the language defined by the grammar G.

3、R is a finite relation from V to $ (V\cup \Sigma )^{} $, where the asterisk represents the Kleene star operation. The members of *R are called the (rewrite) rule*s or *production*s of the grammar. (also commonly symbolized by a *P)

NOTE: See also

production rule

rewriting

4、S is the start variable (or start symbol), used to represent the whole sentence (or program). It must be an element of V.

Production rule notation

A production rule in R is formalized mathematically as a pair $ (\alpha ,\beta )\in R $, where $ \alpha \in V $ is a nonterminal and $ \beta \in (V\cup \Sigma )^{} $ is a string of variables and/or terminals; rather than using ordered pair notation, production rules are usually written using an arrow operator with *α as its left hand side and β as its right hand side: $ \alpha \rightarrow \beta $.

It is allowed for β to be the empty string, and in this case it is customary to denote it by ε. The form $ \alpha \rightarrow \varepsilon $ is called an ε-production.

It is common to list all right-hand sides for the same left-hand side on the same line, using | (the pipe symbol) to separate them. Rules $ \alpha \rightarrow \beta _{1} $ and $ \alpha \rightarrow \beta _{2} $ can hence be written as $ \alpha \rightarrow \beta _{1}\mid \beta _{2} $. In this case, $ \beta _{1} $ and $ \beta _{2} $ is called the first and second alternative, respectively.

Rule application

For any strings $ u,v\in (V\cup \Sigma )^{} $, we say *u directly yields v, written as $ u\Rightarrow v\, $, if $ \exists (\alpha ,\beta )\in R $ with $ \alpha \in V $ and $ u_{1},u_{2}\in (V\cup \Sigma )^{} $ such that $ u\,=u_{1}\alpha u_{2} $ and $ v\,=u_{1}\beta u_{2} $. Thus, *v is a result of applying the rule $ (\alpha ,\beta ) $ to u.

Repetitive rule application

For any strings $ u,v\in (V\cup \Sigma )^{}, $ we say *u yields v, written as $ u{\stackrel {}{\Rightarrow }}v $ (or $ u\Rightarrow \Rightarrow v\, $ in some textbooks), if $ \exists k\geq 1\,\exists \,u_{1},\cdots ,u_{k}\in (V\cup \Sigma )^{} $ such that $ u=\,u_{1}\Rightarrow u_{2}\Rightarrow \cdots \Rightarrow u_{k}\,=v $. In this case, if $ k\geq 2 $ (i.e., $ u\neq v $), the relation $ u{\stackrel {+}{\Rightarrow }}v $ holds. In other words, $ ({\stackrel {*}{\Rightarrow }}) $ and $ ({\stackrel {+}{\Rightarrow }}) $ are the reflexive transitive closure (allowing a word to yield itself) and the transitive closure (requiring at least one step) of $ (\Rightarrow ) $, respectively.

Context-free language

The language of a grammar $ G=(V,\Sigma ,R,S) $ is the set

$ L(G)={w\in \Sigma ^{}:S{\stackrel {}{\Rightarrow }}w} $

A language L is said to be a context-free language (CFL), if there exists a CFG G, such that $ L\,=\,L(G) $.

Non-deterministic pushdown automata recognize exactly the context-free languages.

对比分析

Context-free grammar 的restriction在于

The left-hand side of the production rule is always a nonterminal symbol. This means that the symbol does not appear in the resulting formal language.

在type 1 grammar中，对产生式的左部中nonterminal symbol的个数并没有限制。

Type-3 grammars:Regular grammar formal definition

A right regular grammar (also called right linear grammar) is a formal grammar (N, Σ, P, S) such that all the production rules in P are of one of the following forms:

A → a, where A is a non-terminal in N and a is a terminal in Σ
A → aB, where A and B are non-terminals in N and a is in Σ
A → ε, where A is in N and ε denotes the empty string, i.e. the string of length 0.

In a left regular grammar (also called left linear grammar), all rules obey the forms

A → a, where A is a non-terminal in N and a is a terminal in Σ
A → Ba, where A and B are in N and a is in Σ
A → ε, where A is in N and ε is the empty string.

A regular grammar is a left or right regular grammar.

Some textbooks and articles disallow empty production rules, and assume that the empty string is not present in languages.

对比分析

与Type-2 grammars相比，Type-3 grammars的限制在于产生式右侧的non-terminal：

产生式的右侧最多只能够有一个non-terminal
产生式右侧的non-terminal只能够在最左侧或最右侧

显然，与Type-2 grammars相比，由于Type-3 grammar的production的右侧只能够有一个non-terminal，所以在按照它的production进行rewrite（扩展）的时候，只能够向一个方向进行扩展，所以最终的扩展结果只能够是一个线性的结构，所以它是linear grammar。而Type-2 grammar的production的右侧可以有多个non-terminal，所以在按照它的production进行rewrite（扩展）的时候，能够向多个方向进行扩展，所以最终的扩展结果是一个树形的结构，所以它不是linear grammar。

NOTE: 描述的结构

总结

上述所有grammar的production都可以支持递归，即都可以是recursive grammar。

NOTE: 描述的递归性

在Recursively enumerable language中有这样的描述：

All regular, context-free, context-sensitive and recursive languages are recursively enumerable.

对比分析By automaton

CFG的 automaton是pushdown automaton，我们知道pushdown automaton本质上来说就是stack，CFG本质上来说表示的是hierarchy结构/树形结构。