If someone says that information = uncertainty = entropy, then they are confused, or something was not stated that should have been. Taken literally, those equalities lead to a contradiction: the entropy of a system increases as the system becomes more disordered, so information would correspond to disorder.
If you always take information to be a decrease in uncertainty at the receiver, you will get straightened out:
R = Hbefore - Hafter.
where H is the Shannon uncertainty:
H = - sum (from i = 1 to number of symbols) Pi log2 Pi (bits per symbol)
and Pi is the probability of the ith symbol.
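As a quick check on the formula, here is a minimal Python sketch of H. The function name shannon_uncertainty and the two example distributions are mine, chosen only for illustration. Symbols with zero probability are skipped, since p log2 p goes to 0 as p goes to 0.

```python
import math

def shannon_uncertainty(probs):
    """Return H in bits per symbol for a list of symbol probabilities.

    Uses p * log2(1/p), which equals -p * log2(p).
    Zero-probability symbols contribute nothing and are skipped.
    """
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Four equally likely symbols (for example the four DNA bases).
h_uniform = shannon_uncertainty([0.25, 0.25, 0.25, 0.25])

# One symbol is certain, the other three never occur.
h_certain = shannon_uncertainty([1.0, 0.0, 0.0, 0.0])

print(h_uniform, h_certain)
```

The equiprobable case gives the maximum for four symbols, log2(4) = 2 bits, and the certain case gives 0 bits.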
Imagine that we are in communication and that we have agreed on an alphabet. Before I send you a bunch of characters, you are uncertain (Hbefore) as to what I'm about to send. After you receive a character, your uncertainty goes down (to Hafter). Hafter is never zero because of noise in the communication system. Your decrease in uncertainty is the information (R) that you gain.
Since Hbefore and Hafter are state functions, this makes R a function of state. It allows you to lose information (it's called forgetting). You can put information into a computer and then remove it in a cycle.
Many of the statements in the early literature assumed a noiseless channel, so the uncertainty after receipt is zero (Hafter=0). This leads to the SPECIAL CASE where R = Hbefore. But Hbefore is NOT "the uncertainty", it is the uncertainty of the receiver BEFORE RECEIVING THE MESSAGE.
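To make the difference between the general case and the noiseless special case concrete, here is a small Python sketch. The four-symbol alphabet and the "after" probabilities for the noisy channel are made-up numbers for illustration, not measurements; information_gain is a hypothetical helper name.

```python
import math

def shannon_uncertainty(probs):
    """Shannon uncertainty H in bits per symbol."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def information_gain(before, after):
    """R = Hbefore - Hafter, the decrease in uncertainty at the receiver."""
    return shannon_uncertainty(before) - shannon_uncertainty(after)

# Before receipt: four equally likely symbols, so Hbefore = 2 bits.
before = [0.25, 0.25, 0.25, 0.25]

# After receipt over a noisy channel (hypothetical numbers): the receiver
# is fairly sure which symbol was sent, but not certain.
after_noisy = [0.97, 0.01, 0.01, 0.01]

# After receipt over a noiseless channel: no doubt remains, Hafter = 0.
after_noiseless = [1.0, 0.0, 0.0, 0.0]

r_noisy = information_gain(before, after_noisy)
r_noiseless = information_gain(before, after_noiseless)

print(f"noisy channel:     R = {r_noisy:.3f} bits")
print(f"noiseless channel: R = {r_noiseless:.3f} bits")
```

Only in the noiseless case does R come out equal to Hbefore. With noise, R is smaller than Hbefore by exactly Hafter.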
A way to see this is to work out the information in a bunch of DNA binding sites.
Definition of "binding": many proteins stick to certain special spots on DNA to control genes by turning them on or off. The only thing that distinguishes one spot from another spot is the pattern of letters (nucleotide bases) there. How much information is required to define this pattern?
Here is an aligned listing of the binding sites for the cI and cro proteins of the bacteriophage (i.e., virus) named lambda:
alist 5.66 aligned listing of:
* 96/10/08 19:47:44, 96/10/08 19:31:56, lambda cI/cro sites
piece names from:
* 96/10/08 19:47:44, 96/10/08 19:31:56, lambda cI/cro sites
The alignment is by delila instructions
The book is from: -101 to 100
This alist list is from: -15 to 15

                      ------                   ++++++
                      111111--------- +++++++++111111
                      5432109876543210123456789012345
                      ...............................
OL1 J02459 35599 +  1 tgctcagtatcaccgccagtggtatttatgt
    J02459 35599 -  2 acataaataccactggcggtgatactgagca
OL2 J02459 35623 +  3 tttatgtcaacaccgccagagataatttatc
    J02459 35623 -  4 gataaattatctctggcggtgttgacataaa
OL3 J02459 35643 +  5 gataatttatcaccgcagatggttatctgta
    J02459 35643 -  6 tacagataaccatctgcggtgataaattatc
OR3 J02459 37959 +  7 ttaaatctatcaccgcaagggataaatatct
    J02459 37959 -  8 agatatttatcccttgcggtgatagatttaa
OR2 J02459 37982 +  9 aaatatctaacaccgtgcgtgttgactattt
    J02459 37982 - 10 aaatagtcaacacgcacggtgttagatattt
OR1 J02459 38006 + 11 actattttacctctggcggtgataatggttg
    J02459 38006 - 12 caaccattatcaccgccagaggtaaaatagt
                                            ^
Each horizontal line represents a DNA sequence, starting with the 5' end on the left, and proceeding to the 3' end on the right. The first sequence begins with: 5' tgctcag ... and ends with ... tttatgt 3'. Each of these twelve sequences is recognized by the lambda repressor protein (called cI) and also by the lambda cro protein.
What makes these sequences special so that these proteins like to stick to them? Clearly there must be a pattern of some kind.
Read the numbers on the top vertically. This is called a "numbar". Notice that position +7 always has a T (marked with the ^). That is, according to this rather limited data set, one or both of the proteins that bind here always require a T at that spot. Since the frequency of T is 1 and the frequencies of other bases there are 0, H(+7) = 0 bits. But if H were the information, that would make no sense whatsoever! This is a position where the protein requires information to be there.
What is really happening is that the protein has two states. In the BEFORE state, it is somewhere on the DNA, and is able to probe all 4 possible bases. Thus the uncertainty before binding is Hbefore = log2(4) = 2 bits. In the AFTER state, the protein has bound and the uncertainty is lower: Hafter(+7) = 0 bits. The information content, or sequence conservation, of the position is Rsequence(+7) = Hbefore - Hafter = 2 bits. That is a sensible answer. Notice that this gives Rsequence close to zero outside the sites, because there all four bases appear about equally often, so Hafter is close to 2 bits.
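The same calculation can be run at every position of the alignment. Here is a minimal Python sketch, assuming equiprobable bases in the BEFORE state (Hbefore = 2 bits) and using the twelve sequences from the listing above. The names SITES, FIRST_POSITION, uncertainty_after and rsequence are mine. The sketch applies no small-sample correction, so with only twelve sequences the values at unconserved positions will come out somewhat above zero.

```python
import math
from collections import Counter

# The twelve aligned lambda cI/cro sites, positions -15 to +15.
SITES = [
    "tgctcagtatcaccgccagtggtatttatgt",
    "acataaataccactggcggtgatactgagca",
    "tttatgtcaacaccgccagagataatttatc",
    "gataaattatctctggcggtgttgacataaa",
    "gataatttatcaccgcagatggttatctgta",
    "tacagataaccatctgcggtgataaattatc",
    "ttaaatctatcaccgcaagggataaatatct",
    "agatatttatcccttgcggtgatagatttaa",
    "aaatatctaacaccgtgcgtgttgactattt",
    "aaatagtcaacacgcacggtgttagatattt",
    "actattttacctctggcggtgataatggttg",
    "caaccattatcaccgccagaggtaaaatagt",
]
FIRST_POSITION = -15       # coordinate of the leftmost column
H_BEFORE = math.log2(4)    # 2 bits: four equally likely bases (assumption)

def uncertainty_after(column):
    """Shannon uncertainty (bits) of the base frequencies in one column."""
    n = len(column)
    counts = Counter(column)
    # (c/n) * log2(n/c) equals -f * log2(f) with f = c/n.
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def rsequence(position):
    """Rsequence(position) = Hbefore - Hafter(position), in bits."""
    column = [site[position - FIRST_POSITION] for site in SITES]
    return H_BEFORE - uncertainty_after(column)

for pos in range(FIRST_POSITION, FIRST_POSITION + len(SITES[0])):
    print(f"{pos:+3d}  {rsequence(pos):.2f} bits")
```

Position +7, where every sequence has a T, comes out at the full 2 bits. Plotting H instead of Rsequence would turn the picture upside down, with the conserved positions lowest.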
If you have uncertainty and information and entropy confused, I don't think you would be able to work through this problem. For one thing, one would get high information OUTSIDE the sites. Some people have published graphs like this.
A nice way to display binding site data so you can see them and grasp their meaning rapidly is by the sequence logo method. The sequence logo for the example above is at http://www-lecb.ncifcrf.gov/~toms/gallery/hawaii.fig1.gif.
More information about the theory of BEFORE and AFTER states is given in the papers http://www-lecb.ncifcrf.gov/~toms/paper/nano2 , http://www-lecb.ncifcrf.gov/~toms/paper/ccmm and http://www-lecb.ncifcrf.gov/~toms/paper/edmm.
Also there is the problem of finding genes:
A new Fourier transform approach for protein coding measure based on the format of the Z curve