The Mathematics of Communication
Web resources on information theory range widely in difficulty. At the demanding end are the complete Short Course in Information Theory by David J.C. MacKay of the Cavendish Laboratory, Cambridge (for the mathematically adult reader only) and Shannon's 1948 classic A Mathematical Theory of Communication. Somewhat more accessible are the Primer on Information Theory by Thomas Schneider of the NIH Laboratory of Molecular Biology (with an interesting focus on the genetic code) and the very useful and elementary Overview of Information Theory at the website of Alcatel-Lucent, the intellectual heirs of Shannon's home at the old Bell Labs. Another nice elementary presentation is Basics of Information Theory by Dave Touretzky at Carnegie Mellon. Many more web references are listed, with annotations, on Entropy on the World Wide Web, posted by Chris Hillman (Washington).
1. Numbers, bases and logarithms
Take a number, say 565937. This number has an existence of its own independent of the numbering system used to represent it. It could be for example the population of a mid-sized city, or the salary of a mid-level executive.
The mathematics of communication addresses the problem of determining the cost of recording or transmitting this number and, beyond numbers, any conceivable message.
Staying with 565937, suppose we used for our numbering system a base different from 10. Here are some of the possibilities:
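As a minimal sketch, representations of 565937 in a few bases can be generated with a short Python routine. The helper name `to_base`, the digit alphabet `DIGITS`, and the particular bases listed are illustrative assumptions, not part of the original column.

```python
# Digit symbols for bases up to 36 (an assumed, conventional alphabet).
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base(n: int, base: int) -> str:
    """Return the representation of a non-negative integer n in the given base (2..36)."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        # Peel off the least significant digit at each step.
        n, remainder = divmod(n, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


n = 565937
for base in (2, 5, 8, 10, 16):  # illustrative choice of bases
    text = to_base(n, base)
    print(f"base {base:>2}: {text} ({len(text)} digits)")
```

The smaller the base, the longer the string of digits needed to write the same number.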
The trade-off between the size of the base and the length of the representation reflects what is called the amount of information carried by the number.
Information always depends on the context in which a signal is to be understood. In many cases, however, it is not practical to make up a different encoding system for each context. For example, in encoding numbers, one usually makes the hypothesis that in each place all digits are equally likely. So in writing a six-digit decimal number we are choosing one of one million possibilities. On the other hand, writing the same number in base 5 would imply that we are choosing one of the 1953125 possible 9-digit base-5 numbers.
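Under the equal-likelihood hypothesis, the number of possibilities is simply the base raised to the number of places. The following sketch checks the two counts quoted above and also prints their base-2 logarithms; the function name `count_strings` is a hypothetical helper introduced only for illustration.

```python
import math


def count_strings(base: int, length: int) -> int:
    """Number of digit strings of the given length when each place has `base` choices."""
    return base ** length


decimal_choices = count_strings(10, 6)  # six-digit decimal strings
base5_choices = count_strings(5, 9)     # nine-digit base-5 strings

print(decimal_choices)  # 1000000
print(base5_choices)    # 1953125

# The base-2 logarithm of each count: how many binary choices it corresponds to.
print(math.log2(decimal_choices))
print(math.log2(base5_choices))
```

Both logarithms come out close to 20, a little below it for the decimal count and a little above it for the base-5 count.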
To get a standard measure of this information, and for practical reasons as well, we use base 2. It takes 20 base-2 digits to write the number and consequently we say it has 20 bits (=Binary digITs) of information.
Since the decimal (base-10) representation gives the same information in 6 decimal digits, each digit in 565937 carries 20/6 = 3.33 bits of information. In general, the length of the base-2 representation of a number $n$ is very nearly $\log_2 n$; similarly the length of its base-10 representation will be very nearly $\log_{10} n$. For long strings of numbers these estimates become exact and the number of bits per decimal will be $\log_2 n / \log_{10} n = \log_2 10 = 3.3219\ldots$
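To see how close these estimates are, one can compare exact digit counts with the logarithms. This is only a sketch: `digit_length` is a hypothetical helper, and the 301-digit number `big` is an arbitrary example chosen to stand in for a "long string".

```python
import math


def digit_length(n: int, base: int) -> int:
    """Exact number of digits of a positive integer n in the given base."""
    length = 0
    while n > 0:
        n //= base
        length += 1
    return length


n = 565937
# Exact lengths next to the logarithmic estimates.
print(digit_length(n, 2), math.log2(n))
print(digit_length(n, 10), math.log10(n))
print(digit_length(n, 2) / digit_length(n, 10))  # 20/6 bits per decimal digit

# For a much longer number the ratio of lengths approaches log2(10).
big = 10 ** 300 + 7  # arbitrary 301-digit example
ratio = digit_length(big, 2) / digit_length(big, 10)
print(ratio, math.log2(10))
```

For the six-digit example the ratio is 20/6, while for the long number it already agrees with the base-2 logarithm of 10 to within about 0.01.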
Here is where logarithms come into the picture. More generally, for any number $n$ written in base $b$, where each digit is one of $b$ equally likely possibilities, the number of bits per digit will be $\log_2 n / \log_b n = \log_2 b$.
This is the basic observation that underlies the mathematical theory of communication.
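The bits-per-digit figure for a base b, namely the base-2 logarithm of b, is easy to tabulate. The sketch below assumes equally likely digits, as in the text; the function name `bits_per_digit` and the list of bases are illustrative choices.

```python
import math


def bits_per_digit(base: int) -> float:
    """Bits of information carried by one digit in the given base, assuming equally likely digits."""
    return math.log2(base)


for base in (2, 5, 8, 10, 16):  # illustrative choice of bases
    print(f"base {base:>2}: {bits_per_digit(base):.4f} bits per digit")
```

A binary digit carries exactly one bit, a hexadecimal digit exactly four, and a decimal digit the 3.3219 bits computed earlier.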