关于本章
本章节,首先描述上面是"formal",引出Formal language, 然后对它进行详细的分析。维基百科的Formal language内容是非常好的(全面,深入浅出),需要仔细阅读。
首先说明formal language,formal grammar,automata,Chomsky hierarchy等基本概念,建立理论框架。
然后按照Chomsky hierarchy的每一层进行说明,对Chomsky hierarchy的每一层的说明采用如下顺序,即:
- Type-3 grammars: Regular grammar
- Type-2 grammars: Context-free grammar
- Type-1 grammars: Context-sensitive grammar
- Type-0 grammars: Unrestricted grammar
这是因为从Type-3 grammars到Type-0 grammars,grammar越来越抽象,越来越难理解,这样由易到难会比较符合认知规律。
在了解了这些之后,我们不能够仅仅局限于此,而是应该走向更加宽广的理论:Theory of computation,因为按照2012 ACM Computing Classification System,前面所讨论的formal language,automata theory都属于此范轴。
理论往往不是孤立的,而是相互关联,相互借用的,相互启发的,在阅读Formal language时,你会发现作者对它进行了大量的发散,涉及到的学科有mathematics(尤其是 Mathematical logic), computer science, and linguistics。所以,在阅读的时候,就有必要梳理清楚概念之间的关系,抓住问题的本质,而不是被表面的描述识所扰乱。
学习计划--从入门到精通
Formal language所涉及的理论较多,并且大多数都是比较抽象的,刚开始学习(尤其对于缺乏使用programming language的人来说)可能会感觉比较吃力,以下是觉得比较好的学习计划:
首先搞清楚what is language、what is formal language、what is formal grammar、what is automata、What is the relationship between them,这样就建立起了建立起高屋建瓴的视野,这其实就是算入门了。推荐阅读下面的文章:
入门了之后,就需要建立起theory framework,最终我们会发现,使用Chomsky hierarchy就能够将整个理论给统一起来;然后再逐个进行细致分析,最后就能够融会贯通,这其实就是精通了。
Formal language
提及formal language,就得请出Noam Chomsky,因为下面的理论框架是由他所建立的,该theory framework的是按照Chomsky hierarchy来进行组织的:
| Chomsky hierarchy | Grammars | Languages | Abstract machines |
|---|---|---|---|
| Type-0 | Unrestricted | Recursively enumerable | Turing machine |
| — | (no common name) | Decidable | Decider |
| Type-1 | Context-sensitive | Context-sensitive | Linear-bounded |
| — | Positive range concatenation | Positive range concatenation* | PTIME Turing Machine |
| — | Indexed | Indexed* | Nested stack |
| — | — | — | Thread automaton |
| — | Linear context-free rewriting systems | Linear context-free rewriting language | restricted Tree stack automaton |
| — | Tree-adjoining | Tree-adjoining | Embedded pushdown |
| Type-2 | Context-free | Context-free | Nondeterministic pushdown |
| — | Deterministic context-free | Deterministic context-free | Deterministic pushdown |
| — | Visibly pushdown | Visibly pushdown | Visibly pushdown |
| Type-3 | Regular | Regular | Finite |
| — | — | Star-free | Counter-free (with aperiodic finite monoid) |
| — | Non-recursive | Finite | Acyclic finite |
Each category of languages, except those marked by a *, is a proper subset of the category directly above it. Any language in each category is generated by a grammar and by an automaton in the category in the same line.
Expressive power
Type-3->Type-2->Type-1->Type-0
Linear structure->hierarchy structure->graph->...
relation越来越复杂,structure越来越复杂;
Expressive power逐渐增强;
入门读物
入门读物推荐:
除此之外,推荐阅读如下巨著:
Cinderella Book VS Dragon Book
Introduction to Automata and Language Theory(aka Cinderella Book)是该领域的经典书籍。
Introduction to Automata and Language Theory(aka Cinderella Book)和Compilers: Principles, Techniques, and Tools (aka "Dragon Book")中的内容其实是紧密关联的,这不仅仅是因为Jeffrey D.Ullman参与了这两本书的编写,而是因为programming language是一种formal language,而Cinderella Book和Dragon Book其实都是在讲述和formal language相关的内容,当然,这些内容仅仅书中的一部分。Cinderella Book专注于讲述automata and language theory,而dragon book的内容则可以分为两个部分front end和back end,显然front end所讲述的内容就和automata and language theory紧密相关了。
参考
本项目的内容大多数来自: