7.1. FORMAL LANGUAGES
This section under major construction.
The theory that we consider in this chapter is baed on the following simple abstract notions. An alphabet is a set of symbols. A string over an alphabet is a sequence of symbols taken from that alphabet. A formal language over an alphabet is a set of strings over that alphabet. When addressing theoertical questions, we often use the binary alpahabet that consists of the two symbols 0 and 1. In practice, we use whatever alphabet is sutiable to the task at hand: the Roman alphabet for processing text; decimal digits for processing numbers, and the characters acgt for processing genetic data.
| Alphabet | Symbols | Symbol Name |
String Name |
|---|---|---|---|
| binary | 01 | bit | bitstring |
| Roman | abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ |
letter | word |
| decimal | 0123456789 | digit | integer |
| special | ~!@#$%^&*()_-+={[}]|\:;"'<,>.?/ | ||
| keyboard | Roman + decimal + special | keystroke | typescript |
| DNA | actg | gene | genome |
| monomer | ACDEFGHIKLMNPQRSTVWY | amino acid | protein |
| ASCII | see Appendix | byte | String |
| UNICODE | see Appendix | char | String |
The simplest way to specify a formal language is to enumerate its strings.
For example, we can list all bitstrings of length 3 by listing them.
000 001 010 011 100 101 110 111
One complication is that languages can be large or infinite. For the moment, we will use informal English descriptions to describe (possibly) infinite sets of strings.
| Language | true | false |
|---|---|---|
| Tail(2) next-to-last bit is 0 |
00 11101 111111010101 |
0 000010 11111111111 |
| Equal equal number of 0s and 1s |
10 110010 000011111100 |
0 11100 01010101010101010 |
Why not just work with precise, complete definitions?
This is the crux of the matter! Our goal is to
work with precise, complete definitions. This is
known as the specification problem for formal
languages. How is formal language related to computation?
Each formal language leads to the precise
statement of several specific computational problems.
Perhaps the most basic is the following: given a language
L and a bitstring x, determine whether x is in L or not.
This task is called the recognition problem
for formal languages.
With suitable interpretations of the meaning of the bitstrings,
we can define formal languages that relate to nontrivial
computational problems. For example, we might define
Prime to be the set of all bitstrings that are the binary
representation of a prime number. Solving the recognition
problem for this language involves understanding some mathematics,
and probably using a computer.
Formal languages are very important tools that are widely used
in practical applications. Natural languages like English,
programming languages like Java, and genetic codes are all examples
of formal languages. Specification, recognition, and related problems
for these sorts of languages are of critical importance in
modern computing.
