Z Algorithm

Z algorithm is a linear time string matching algorithm which runs in O(n) complexity. Many programmers are still not aware this algorithm. It is used to find all occurrence of a  pattern P in longer text T, which is common string searching problem. We have already discussed naive inefficient implementation in previous post http://wikistack.com/string-matching-z-algorithm/ , Now let us discuss all about Z algorithm.

Z algorithm requires two components Z-boxes and Z-values or we can say a Z function

Now let us discuss what these are?

Let There is a string T=”WIKISWIKK”

Now Let us Define a function Zi(T)  where T is text string where Zi(T) is equal to longest sub-string that starts at i and is also prefix of T.

zalgo

From above tableZ[0] =9, it means length of the string, because WIKISWIKK is a longest prefix of itself .  

 Z[1] = 0, because The string starts from index 1 is IKISWIKK, not prefix of WIKISWIKK.  

 Z[2] = 0, because The string starts from index 2 is KISWIKK, not prefix of WIKISWIKK.      

 Z[3] = 0, because The string starts from index 3 is ISWIKK, not prefix of WIKISWIKK.  

 Z[4] = 0, because The string starts from index 4 is SWIKK, not prefix of WIKISWIKK.

 Z[5] = 3, because The string starts from index 5 is WIKK and WIK (length 3 ) is prefix of WIKISWIKK.

 Z[6] = 0, because The string starts from index 6 is IKK, not prefix of WIKISWIKK.

 Z[7] = 0, because The string starts from index 7 is KK , not prefix of WIKISWIKK.

 Z[8] = 0, because The string starts from index 8 is  K, not prefix of WIKISWIKK

                                                       

Now Take an example for string matching problem and apply above concept for search.

Let T=”WIKISWIKK” is text

P=”WIK” is pattern to be searched in text T.

Step for Z algorithm based P string searching in larger text T

  1. Create S=PT, where S is string formed by prepending pattern P to text T.
  2. Take a Z box like Z[L…..R],where l is left pointer and R is right pointer and assume that Z[L….R] is also a prefix of T.
  3. Initialize L=1 and R=0; here we are initializing L=1 , because already it is clear that Z[0] = length of string S. i.e a string is prefix of itself.
  4. Traverse through the string S starting from k=1, where k is another variable.
  5. If k > R calculate Z<sub>k</sub>(S) explicitly. If Z<sub>k</sub> > 0, set L = k and R := k+Z<sub>k</sub>-1.
  6. Else: p = k-L
    6.1. If Z<sub>p</sub> < R-k+1, set Z<sub>k</sub> = Z<sub>p</sub>.
    6.2. If Z<sub>p</sub> R-k+1, for each i, R +1 i < |S| do:
          6.2.1. If S(i) = S(i-k): increment i and continue.
          6.2.2. If S(i) S(i-k): set Z<sub>k</sub> = i-k, L = k, R = i-1 and break

( Above algo step is referenced by http://ivanyu.me/blog/2013/10/15/z-algorithm/  and below source code is translation of python program in c)