Data Compression Review Part 2  

  The following review questions were covered in Labs 02 and 03 and are taken from the course website. NOT ALL questions from the course website were discussed in lab.
Please note that the information presented here does not necessarily represent a complete answer to each review question. It is intended to jog the memory of students who attended lab OR to provide a starting hint for those students who did not.


  Question 1: Describe what characteristics of a data set would be best compressed using each of the following algorithms (Arithmetic Coding, Run Length Encoding, LZW and variants, Huffman Coding, Front Compression). Explain why.
   
ARITHMETIC CODING - multi-symbol alphabets; offers speed, effectiveness, and low storage requirements
RUN LENGTH ENCODING - bitmap-type files, faxes (black and white), where long runs of identical symbols occur
LZW AND VARIANTS - files with similar sequences that recur throughout the data, graphics
HUFFMAN CODING - data sets where some symbols have much higher frequencies than others, text
FRONT COMPRESSION - data sets with concentrations of identical symbols that are unevenly distributed
 

  Question 2: Why is Run Length Encoding generally a poor choice for compressing text?  
   
Run Length Encoding relies on long 'runs' of identical symbols. In most cases, text (i.e. words) does not exhibit this property, so the encoded output can easily be longer than the original.
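As a quick illustration (a minimal Python sketch; the function name and the (symbol, count) pair output are assumptions made for demonstration), naive run-length encoding of an ordinary English word produces mostly runs of length one, so the output is larger than the input:

def rle_encode(data):
    # Naive run-length encoding: emit (symbol, run_length) pairs.
    if not data:
        return []
    runs = []
    current, count = data[0], 1
    for symbol in data[1:]:
        if symbol == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = symbol, 1
    runs.append((current, count))
    return runs

print(rle_encode("compression"))
# 11 characters in, 10 (symbol, count) pairs out: RLE has made the text longer.
print(rle_encode("wwwwwwwwbbbb"))
# 12 'pixels' in, only 2 pairs out: long runs (as in fax data) compress well.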
 

  Question 3: Give one real-life situation (be specific) where compression with loss of data is acceptable.
   
MP3 (sound), image compression (e.g. JPEG), etc.
 

  Question 4: What is one important aspect of a file that is usually lost when it gets compressed?  
   
Its meaning (e.g. when an image is compressed, the 'picture' itself is no longer there - just the encoding exists).
 

  Question 5: In Static Huffman Compression, if the resultant Huffman Tree is full and balanced, what can be said about the frequencies of the symbols?  
   
The frequencies must be close to equal. A full, balanced tree assigns every symbol a code of the same length, and Huffman's algorithm only produces that shape when no symbol is much more frequent than any other.
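A small Python sketch of Huffman tree construction (the heap-based implementation and the symbol names are assumptions for illustration): with four equally likely symbols the resulting tree is full and balanced, and every code is 2 bits long.

import heapq
from itertools import count

def huffman_codes(freqs):
    # Build a Huffman tree from {symbol: weight} and return {symbol: bit string}.
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix=""):
        if isinstance(node, tuple):      # internal node: recurse left (0) and right (1)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: record the accumulated code
            codes[node] = prefix or "0"
    walk(heap[0][2])
    return codes

# Four equally likely symbols -> a full, balanced tree: every code is 2 bits long.
print(huffman_codes({"a1": 0.25, "a2": 0.25, "a3": 0.25, "a4": 0.25}))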
 

  Question 6: Next to Arithmetic Coding, Huffman Coding is said to produce the best compression ratios, but LZW (and its variants) is the method of choice for image compression. Explain this discrepancy.
   
LZW compression works best when a data set contains sequences of symbols that recur throughout the data, whereas Huffman Coding only exploits the frequencies of individual symbols. Most images have sequential pixel patterns that occur frequently throughout the file, which is exactly what LZW's growing dictionary captures.
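A minimal Python sketch of LZW encoding (the byte-oriented starting dictionary is an assumption for illustration); a repeating 'pixel' pattern produces far fewer output codes than input symbols:

def lzw_encode(data):
    # Minimal LZW: emit dictionary indices; the dictionary grows as sequences repeat.
    dictionary = {chr(i): i for i in range(256)}  # start with all single-byte strings
    next_code = 256
    w = ""
    output = []
    for ch in data:
        wc = w + ch
        if wc in dictionary:
            w = wc                       # extend the current match
        else:
            output.append(dictionary[w])  # emit the longest known sequence
            dictionary[wc] = next_code    # remember the new, longer sequence
            next_code += 1
            w = ch
    if w:
        output.append(dictionary[w])
    return output

# A repeating pixel-row pattern: 40 input symbols, far fewer output codes.
row = "ABABABABAB" * 4
print(len(row), len(lzw_encode(row)))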
 

  Question 7: We want to compress a file using Run Length Encoding but all possible bit strings are used in the data so none are readily available for use as a repetition code. What do we do (2 solutions)?
   
SOLUTION 1 - Combine 2 symbols (e.g. merge 2 shades of gray in a grayscale image) and use the freed-up symbol as the escape code.
SOLUTION 2 - Do not use an escape code; instead, use the repetition of a symbol to signify a run (e.g. two identical symbols in a row followed by the numerical representation of the length of the run).
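A Python sketch of the second solution (the exact convention used here - a doubled symbol followed by the count of additional repetitions - is one assumed variant, not the only possibility):

def rle_encode_no_escape(data, max_run=255):
    # RLE without an escape code: a doubled symbol signals a run, and the value
    # that follows gives the number of additional repetitions (0..max_run).
    out = []
    i = 0
    while i < len(data):
        symbol = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == symbol and run < max_run + 2:
            run += 1
        if run == 1:
            out.append(symbol)                      # single symbol: copied literally
        else:
            out.extend([symbol, symbol, run - 2])   # doubled symbol + remaining count
        i += run
    return out

# 'aaaaa' -> ['a', 'a', 3]; isolated symbols pass through unchanged.
print(rle_encode_no_escape(list("aaaaabcc")))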
 

  Question 8: You are given a message that contains 4 symbols: a1, a2, a3 and a4
They appear with the following probabilities: a1 = 0.5, a2 = 0.3, a3 = 0.1, a4 = 0.1
They are currently encoded as straightforward two-bit values (i.e. 00, 01, 10 and 11)
The entropy for the message is calculated to be 1.34

Varying length codes have been assigned to the symbols as follows: a1 = 1, a2 = 01, a3 = 000, a4 = 001
A. Given the preceding information, what compression ratio would you expect in a typical message?
B. If the frequencies vary, at what point would it become pointless to use varying-length codes? (formula?)
 
   
PART A - Assume a message with 100 symbols...
SCENARIO 1 (fixed 2-bit codes): Expected Code Length = 100 symbols X 2 bits/symbol = 200 bits
SCENARIO 2 (varying-length codes): Expected Code Length = 100 symbols X ( 0.5 X 1 bit/symbol + 0.3 X 2 bits/symbol + 2 X ( 0.1 X 3 bits/symbol ) ) = 170 bits
Compression Ratio = 170/200 = 85%
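The same arithmetic in a few lines of Python (a sketch; the probabilities and codes are taken from the question):

probs = {"a1": 0.5, "a2": 0.3, "a3": 0.1, "a4": 0.1}
codes = {"a1": "1", "a2": "01", "a3": "000", "a4": "001"}

n = 100                                                            # symbols in the assumed message
fixed_bits = n * 2                                                 # fixed 2-bit codes: 200 bits
variable_bits = n * sum(probs[s] * len(codes[s]) for s in probs)   # varying-length codes: 170 bits
print(variable_bits / fixed_bits)                                  # 0.85 -> an 85% compression ratio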

PART B - As the differences between the frequencies approach zero (i.e. when all the frequencies are the same), varying-length codes no longer make sense: every symbol would end up with a code of the same length anyway.
Recall that Maximum Entropy occurs when the probabilities of all symbols are equal; Redundancy is defined as the difference between the Maximum Entropy and the Actual Entropy. When the redundancy approaches zero, there is nothing left for varying-length codes to exploit.
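A small Python sketch of that criterion (the function name is an assumption; maximum entropy for n equally likely symbols is log2(n)):

from math import log2

def redundancy(probs):
    # Redundancy = maximum entropy (log2 of alphabet size) minus actual entropy.
    actual = -sum(p * log2(p) for p in probs if p > 0)
    maximum = log2(len(probs))
    return maximum - actual

print(redundancy([0.5, 0.3, 0.1, 0.1]))      # > 0: varying-length codes can help
print(redundancy([0.25, 0.25, 0.25, 0.25]))  # 0.0: fixed-length codes are already optimal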
 


© Copyright 2002
Questions? Please Email: gwen@cpsc.ucalgary.ca
Last modified October 29, 2002