Finite Automata

A major task in editing texts is to find the right position. Let’s see how this can be automated efficiently. As the decision whether a word occurs in the text depends on the whole text, we cannot expect to be any more efficient than linear in the length of the text. So, let’s stay at least with this. Also, as the presence of a word cannot be established by looking at a single letter, at least a finite amount of information needs to be carried over while traversing the text.

Deterministic finite automata

The concept of linearly traversing a text while keeping a finite amount of information is known as finite automata. More precisely, a deterministic finite automaton is given by the following data.

  • the alphabet Sigma the automaton works on
  • a fintie set Q of states
  • an initial state q_0 of Q
  • a transition function delta : Q x Sigma -> Q
  • a subset F of Q of accepting states

The transition function delta is lifted from letters to a function delta^* on words by iterating over the letters of the word. A word w is accepted by the automaton, if delta^*(q_0)(w) in F.

Non-deterministic finite automata

Since the power set of a finite set is still a finite set, we can allow non-determinism in the following sense.

  • instead of an initial state q_0, we take a subset I of Q, forming the possible initial states
  • the transition function is changed to have the signature delta : Q x Sigma -> 2^Q with the intended meaning that one of the elements of delta(q)(c) has to be chosen as successor state

The function delta is again lifted from letters to words by iteration over the letters of the word; this times by means of relational composition.

Non-deterministic automata can be transformed into deterministic ones, by taking 2^Q as the new state set, keeping track of the set of states the non-deterministic automaton can possibly be in.

Exercise. Work out the details.

Note. Power sets can easily be kept track of by bit vectors.

The languages definable by non-deterministic finite automata contains the single-word languages and is closed under the following operations.

  • union
  • juxtaposition, where LL' = { ww' | w in L and w' in L' }
  • Kleene closure, where L^* = { epsilon } union L union LL union LLL union ...

Additionally, the of the combined automaton is the sum of the sizes of the component automata plus O(1).

Exercise. Proof this.