Formal Languages and Automata
Automata theory studies abstract machines and the problems they can solve. It defines what computation is and what its limits are. Formal languages are sets of strings defined by grammars or recognised by automata.
Alphabet, Strings, and Languages
An alphabet Σ is a finite set of symbols. A string is a finite sequence of symbols from Σ. The empty string is ε. A language is a set of strings over Σ. Operations include concatenation, union, intersection, complement, and Kleene star (L* = {ε} ∪ L ∪ LL ∪ ...).
Finite Automata
A Deterministic Finite Automaton (DFA) is (Q, Σ, δ, q₀, F) — states, alphabet, transition function, start state, and accepting states. It reads input one symbol at a time and transitions deterministically. A Nondeterministic FA (NFA) allows multiple transitions from a state on the same input. DFA and NFA recognise the same class of languages.
Regular Expressions
Regular expressions define regular languages using union (|), concatenation, and Kleene star (*). Example: (a|b)*abb matches strings over {a,b} ending in "abb". Every regular expression has an equivalent NFA and vice versa. Regular expressions are used in text processing, lexical analysis, and pattern matching.
Context-Free Grammars
A Context-Free Grammar (CFG) G = (V, T, P, S) has variables, terminals, productions, and a start symbol. Productions have the form A → α. CFGs generate context-free languages (CFLs), which include programming languages. Parse trees represent derivations. Ambiguous grammars have multiple parse trees for a string.
Pushdown Automata
A Pushdown Automaton (PDA) is a finite automaton augmented with a stack. It recognises context-free languages. The stack provides memory for matching nested structures (parentheses, HTML tags). Deterministic PDAs recognise a subset of CFLs used in most programming language parsers.
Turing Machines
A Turing Machine (TM) has an infinite tape, a head that reads/writes/moves, and states. It recognises recursively enumerable languages and decides recursive languages. The Church-Turing thesis states that anything intuitively computable is Turing-computable.
Chomsky Hierarchy
The Chomsky hierarchy classifies languages: Type 3 (regular, FA), Type 2 (context-free, PDA), Type 1 (context-sensitive, linear-bounded automaton), Type 0 (recursively enumerable, TM). Each level strictly contains the ones below it.
Summary
Automata and formal languages form the theoretical backbone of computer science. They define the capabilities and limitations of computation, underpin compiler design, and guide algorithm development.