Feature Column from the AMS

Digital Revolution (II) - Compression Codes and Technologies

4. Lossy and non-lossy compression

When a compression algorithm is used in an applied environment, the choice of algorithm is often governed by the situation in which the compression will take place. An algorithm that is good for compressing text may not be useful for compressing a motion picture file, and an algorithm suited to compressing general audio may not be a good choice for speech. What are some of the issues that come up in applications? Obviously, the speed of encoding and decoding is important, but another factor, not yet discussed in detail, is whether one gets back exactly the original material when one decompresses it. If information is lost during the compression process, the algorithm is lossy. If the algorithm guarantees that the decompressed material is identical to what was started with, the compression is called non-lossy or lossless.
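The distinction can be made concrete with a short Python sketch. The lossless half uses the standard `zlib` module (DEFLATE, the same method gzip uses). The lossy half is a toy quantizer, a hypothetical illustration rather than any real codec, that keeps only which "bucket" of a fixed width each sample falls into. All function names here are made up for the example.

```python
import zlib

def lossless_roundtrip(data: bytes) -> bytes:
    """Compress with zlib (DEFLATE) and decompress again."""
    return zlib.decompress(zlib.compress(data))

def quantize(samples: list[int], step: int) -> list[int]:
    """Hypothetical lossy step: keep only the index of the bucket of
    width `step` that each sample falls in; the remainder is discarded."""
    return [s // step for s in samples]

def dequantize(indices: list[int], step: int) -> list[int]:
    """Rebuild approximate samples from the bucket indices."""
    return [q * step for q in indices]

sample_text = b"the quick brown fox jumps over the lazy dog. " * 50
assert lossless_roundtrip(sample_text) == sample_text  # exact recovery

samples = [0, 13, 27, 100, 255]
restored_samples = dequantize(quantize(samples, 16), 16)
print(restored_samples)  # [0, 0, 16, 96, 240]: close, but not the original
```

The quantized values take fewer distinct levels, which is what makes them cheaper to store, but the discarded remainders can never be recovered. That is exactly what makes the scheme lossy.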

Photographs provide an example of some of the issues involved. If you have a good digital camera, the number of pixels (picture elements) in each photograph will be very large. These detailed images are the ones you want for final viewing, whether you send them to Aunt Joan or print them out for your own use. However, you may also want to display the photos as small thumbnails to help you remember the content of each image. The thumbnails are compressed versions of the originals, which saves storage space and cuts down the time it takes to transfer the images to your computer. They may well be lossy versions of the originals, with the option of viewing the full, non-lossy version by clicking on a thumbnail. The detailed images themselves may be stored in lossless compressed form to save storage space. Compressed images save not only money, by reducing the cost of storage, but also time, when they are transmitted from one place to another. By contrast, text files typically must be compressed so that no information is lost when they are reconstructed. Insurance companies that compress records of customer policies, software that is distributed over the Internet, and a myriad of other settings require exact reconstruction after compression. In either case, compression often saves time, storage costs, or both.
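Why a thumbnail is lossy can be seen in a small Python sketch. It assumes one simple way of making a thumbnail, averaging each 2×2 block of a grayscale image into a single pixel; real image software may use quite different resampling and storage formats. The function name `make_thumbnail` and the pixel values are hypothetical.

```python
def make_thumbnail(image: list[list[int]]) -> list[list[int]]:
    """Average each 2x2 block of grayscale pixels into one pixel.
    Assumes the image has an even number of rows and columns."""
    h, w = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) // 4
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]

# Two different 4x4 images (16 pixels each)...
image_a = [[10, 20, 30, 40], [30, 40, 50, 60], [0, 0, 255, 255], [0, 0, 255, 255]]
image_b = [[25, 25, 45, 45], [25, 25, 45, 45], [0, 0, 255, 255], [0, 0, 255, 255]]

# ...shrink to the same 2x2 thumbnail (4 pixels), so the thumbnail alone
# cannot tell us which original it came from.
thumb_a = make_thumbnail(image_a)
thumb_b = make_thumbnail(image_b)
print(thumb_a)  # [[25, 45], [0, 255]]
```

Because distinct originals collapse to the same thumbnail, no decoder could reconstruct the original from it. This is why the full image has to be kept separately if exact viewing is wanted.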

As a regular user of the Internet and electronic mail, you may have examined or downloaded a variety of files. Many file extensions indicate that a file has been compressed. Examples include .gz; .sit (created by StuffIt or one of its relatives in the Apple Macintosh environment; one expander program for .sit files is StuffIt Expander); .sea (for self-extracting archive, a compressed file that carries its own decompression code and so expands automatically); .z and .Z (.Z files are created by the Unix compress program); and .zip (created by a variety of programs in the Windows environment). The .gz extension, often used for large PostScript files to save transmission time, is produced by a program called gzip, written by Jean-loup Gailly. Sometimes these compression systems are invisible to the user: clicking on such a file may cause software on one's computer to decompress it automatically and rename it so that it can be opened with a particular application.
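Python's standard `gzip` module reads and writes the same .gz format that the gzip program uses, so it offers a quick way to see such a file being created and restored. In this sketch the "PostScript" content is a made-up, highly repetitive fragment chosen only for illustration, and the file name `figure.ps.gz` is hypothetical.

```python
import gzip
import os
import tempfile

# A made-up, highly repetitive PostScript-like fragment (illustrative data only).
document = b"%!PS-Adobe-3.0\n" + b"0 0 moveto 100 100 lineto stroke\n" * 200

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "figure.ps.gz")

    # Write the data to a .gz file in gzip format.
    with gzip.open(path, "wb") as f:
        f.write(document)
    compressed_size = os.path.getsize(path)

    # Read it back through the decompressor.
    with gzip.open(path, "rb") as f:
        restored = f.read()

    # Every gzip file begins with the two "magic" bytes 0x1f 0x8b.
    with open(path, "rb") as f:
        magic = f.read(2)

print(len(document), "bytes ->", compressed_size, "bytes")
```

Repetitive content like this shrinks dramatically, and because gzip is lossless, `restored` is byte-for-byte identical to `document`.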