next up previous
Next: Entropy Encoding Summary Up: Lossless Compression Algorithms (Entropy Previous: Arithmetic Coding

Lempel-Ziv-Welch (LZW) Algorithm

The LZW algorithm is a very common compression technique.

Suppose we want to encode the Oxford Concise English dictionary which contains about 159,000 entries. Why not just transmit each word as an 18 bit number?

Problems:

Reference: Terry A. Welch, "A Technique for High Performance Data Compression", IEEE Computer, Vol. 17, No. 6, 1984, pp. 8-19.

The LZW Compression Algorithm can summarised as follows:

   w = NIL;
   while ( read a character k )
       {
         if wk exists in the dictionary
          w = wk;
         else
           add wk to the dictionary;
           output the code for w;
           w = k;
       }

Example:

Input string is "^WED^WE^WEE^WEB^WET". 


             w    k    output    index    symbol
          -----------------------------------------
            NIL   ^               
             ^    W      ^        256       ^W
             W    E      W        257       WE
             E    D      E        258       ED
             D    ^      D        259       D^
             ^    W       
            ^W    E     256       260      ^WE
             E    ^      E        261       E^
             ^    W
            ^W    E
           ^WE    E     260       262     ^WEE
             E    ^
            E^    W     261       263      E^W
             W    E
            WE    B     257       264      WEB
             B    ^      B        265       B^
             ^    W
            ^W    E
           ^WE    T     260       266     ^WET
             T   EOF     T

The LZW Decompression Algorithm is as follows:

 read a character k;
   output k;
   w = k;
   while ( read a character k )    
  /* k could be a character or a code. */
       {
         entry = dictionary entry for k;
         output entry;
         add w + entry[0] to dictionary;
         w = entry;
       }

Example (continued):

Input string is "^WED<256>E<260><261><257>B<260>T". 


             w      k    output    index    symbol
          -------------------------------------------
                    ^       ^
             ^      W       W       256       ^W
             W      E       E       257       WE
             E      D       D       258       ED
             D    <256>    ^W       259       D^
           <256>    E       E       260      ^WE
             E    <260>   ^WE       261       E^
           <260>  <261>    E^       262     ^WEE
           <261>  <257>    WE       263      E^W
           <257>    B       B       264      WEB
             B    <260>   ^WE       265       B^
           <260>    T       T       266     ^WET

next up previous
Next: Entropy Encoding Summary Up: Lossless Compression Algorithms (Entropy Previous: Arithmetic Coding
Dave Marshall
10/4/2001