The LZW algorithm is a very common compression technique.
Suppose we want to encode the Oxford Concise English dictionary which contains about 159,000 entries. Why not just transmit each word as an 18 bit number?
Problems:
Reference: Terry A. Welch, "A Technique for High Performance Data Compression", IEEE Computer, Vol. 17, No. 6, 1984, pp. 8-19.
The LZW Compression Algorithm can summarised as follows:
w = NIL; while ( read a character k ) { if wk exists in the dictionary w = wk; else add wk to the dictionary; output the code for w; w = k; }
Example:
Input string is "^WED^WE^WEE^WEB^WET". w k output index symbol ----------------------------------------- NIL ^ ^ W ^ 256 ^W W E W 257 WE E D E 258 ED D ^ D 259 D^ ^ W ^W E 256 260 ^WE E ^ E 261 E^ ^ W ^W E ^WE E 260 262 ^WEE E ^ E^ W 261 263 E^W W E WE B 257 264 WEB B ^ B 265 B^ ^ W ^W E ^WE T 260 266 ^WET T EOF T
The LZW Decompression Algorithm is as follows:
read a character k; output k; w = k; while ( read a character k ) /* k could be a character or a code. */ { entry = dictionary entry for k; output entry; add w + entry[0] to dictionary; w = entry; }
Example (continued):
Input string is "^WED<256>E<260><261><257>B<260>T". w k output index symbol ------------------------------------------- ^ ^ ^ W W 256 ^W W E E 257 WE E D D 258 ED D <256> ^W 259 D^ <256> E E 260 ^WE E <260> ^WE 261 E^ <260> <261> E^ 262 ^WEE <261> <257> WE 263 E^W <257> B B 264 WEB B <260> ^WE 265 B^ <260> T T 266 ^WET