Friday, 26 October 2012

What is this functionally specified information after all?

From complexity theory we know that any object can be represented by a string of symbols from a universal alphabet. So we shall concentrate on strings for convenience and clarity. Consider three strings of the same length (72 ASCII characters):
  1. "jjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjb"
  2. "4hjjjqq,dsgqjg8 ii0gakkkdffeyzndkk,.j fnldeeeddzpGCZZaQ12 9nnnsskuu6 s."
  3. "Take 500 g. of flour, 250 ml of water, 2 tsp. of sugar, 1egg. Mix, whip."

String 1 is highly periodic: it is Kolmogorov-simple (highly compressible), monotonic, redundant and highly specific.

String 2 is a collection of random symbols (or at least we assume they are genuinely random, for the sake of argument). It is far more complex than string 1, incompressible, and lacks specificity.
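Kolmogorov complexity itself is uncomputable, but the length of a string after compression by a general-purpose compressor gives a rough upper bound on it. The following is a minimal Python sketch for checking the contrast between strings 1 and 2. It uses the standard-library `zlib` module as a stand-in compressor, which is an assumption of this sketch and not a method taken from the post. The helper name `compressed_size` is also made up for this illustration.

```python
import zlib

# The three example strings, copied verbatim from the post.
STRINGS = {
    1: "jjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjbjjjb",
    2: "4hjjjqq,dsgqjg8 ii0gakkkdffeyzndkk,.j fnldeeeddzpGCZZaQ12 9nnnsskuu6 s.",
    3: "Take 500 g. of flour, 250 ml of water, 2 tsp. of sugar, 1egg. Mix, whip.",
}


def compressed_size(s: str) -> int:
    """Bytes after zlib compression at level 9: a crude upper-bound proxy for
    Kolmogorov complexity (which is uncomputable), including zlib's fixed overhead."""
    return len(zlib.compress(s.encode("ascii"), 9))


if __name__ == "__main__":
    for n, s in STRINGS.items():
        print(f"string {n}: {len(s)} chars -> {compressed_size(s)} bytes compressed")
```

You should see string 1 shrink well below its original length, while string 2 gains little or nothing. The exact byte counts depend on the compressor and its settings.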

String 3 is taken from a recipe, so, unlike the other two strings, it is a message. Importantly, string 3 carries associated functionally specified information. Failing to recognise this leads to a widespread error: the claim that strings 1, 2 and 3 are equiprobable outcomes of the hypothetical process of spontaneous information generation that might have given rise to biological function. In our example, the function associated with string 3 is the instruction to cook that dish.

Is it possible to rigorously define the functionally specified information associated with a symbolic string? Is it possible to measure it? The answer to both questions is yes. In the literature on the subject [Hazen et al. 2007, Szostak 2003], the information associated with a specific function f is defined as follows:

$$ I(f) = -\log_2 \frac{M(f)}{W} $$

Here M(f) > 0 is the number of strings prescribing function f, and W is the total number of possible strings of a given length (or up to a given length). Clearly, M(f) ≤ W, so I(f) ≥ 0. Reducing specificity (pushing the ratio M(f)/W towards 1) reduces the amount of specified information, and vice versa.
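The definition can be computed directly. Below is a minimal Python sketch. The function name `functional_information` and the toy setting are hypothetical and chosen only for illustration: strings of length 10 over a 4-letter alphabet, so W = 4^10.

```python
import math


def functional_information(m: int, w: int) -> float:
    """I(f) = -log2(M(f)/W) in bits, computed as log2(W/M(f)).

    m: number of strings prescribing function f (must be > 0)
    w: total number of possible strings (must satisfy m <= w)
    """
    if not 0 < m <= w:
        raise ValueError("require 0 < M(f) <= W")
    return math.log2(w / m)


# Hypothetical toy setting: strings of length n over a k-letter alphabet, so W = k**n.
k, n = 4, 10
w = k ** n
print(functional_information(1, w))        # one functional string: 20 bits
print(functional_information(2 ** 10, w))  # 1024 functional strings: 10 bits
print(functional_information(w, w))        # every string works: 0 bits
```

The code uses the form log2(W/M(f)) because it is algebraically identical to the definition. It also avoids printing a negative zero when M(f) = W.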

Note that the specificity of information in a string of symbols determines the probability distribution of a given symbol appearing at various positions in the string. For example, if we move the symbols "Take 500 g. of flour" from the beginning to the end of string 3, its functional integrity will be compromised. As one might expect, Douglas Axe [Axe 2004, Axe 2010a,b] reached the same conclusion after a series of studies of the functionality of protein domains in bacteria. He found that functional protein domains are deeply isolated in configuration space: the number of possible amino acid residue permutations vastly exceeds the number of permutations that preserve the given function. Neither highly periodic strings of type 1 nor random strings of type 2 can carry practically significant amounts of specific functional information.

The above is not taken into account in various experimental attempts to imitate the hypothetical spontaneous generation of function, for example in the form of meaningful text. I saw a discussion somewhere on the internet about an allegedly successful random generation of the first 24 consecutive symbols of a Shakespearian comedy. Even assuming such a sequence was generated, the real question is what the generating algorithm took for granted. The major problem with numerical experiments of this kind is that they assume a priori that an interpretation protocol is already built into the system (here, the English language).

Besides, the computer code in these experiments (e.g. the notorious Weasel program by R. Dawkins) is often tuned to drive the search towards a goal state in the form of some chosen phrase. There are also more subtle ways to sneak information about the distribution of solutions into the search. It is possible to drive the search implicitly even when the target phrase is not given in advance. In that case the search is still made aware of how likely it is to meet a solution in different areas of the search space. For example, the structure of words and sentences in a language gives us the mean frequencies of the various letters, and these frequencies can then be used to build random phrases. For more detail on how so-called active information about the distribution of solutions in the search space can be exploited during a search, see [Ewert et al. 2012].
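The point about a built-in goal state is easy to see in a Weasel-style search. The following is a minimal Python sketch and not Dawkins's original program. The population size (100 offspring), the 5% per-character mutation rate, keeping the parent in the selection pool, and the function names are all assumptions made for this illustration. Note that `score` cannot even be written without `TARGET`.

```python
import random
import string

# Target phrase used in Dawkins's Weasel illustration. It is handed to the search up front.
TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "


def score(candidate: str) -> int:
    """Count positions that already match TARGET: fitness is defined by the goal itself."""
    return sum(c == t for c, t in zip(candidate, TARGET))


def mutate(parent: str, rate: float, rng: random.Random) -> str:
    """Copy `parent`, replacing each character by a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in parent)


def weasel(offspring: int = 100, rate: float = 0.05, seed: int = 0,
           max_generations: int = 10_000) -> tuple[str, int]:
    """Cumulative selection towards TARGET (all parameters are illustrative assumptions)."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    generation = 0
    while parent != TARGET and generation < max_generations:
        brood = [mutate(parent, rate, rng) for _ in range(offspring)]
        # The parent stays in the pool, so the best score can never go down.
        parent = max(brood + [parent], key=score)
        generation += 1
    return parent, generation


if __name__ == "__main__":
    best, generations = weasel()
    print(f"reached {best!r} after {generations} generations")
```

The search converges quickly only because every candidate is graded position by position against the phrase it is supposed to "discover".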

Earlier we pointed out that strings are useful mathematical representations of reality. Consequently, the various configurations of actual material systems also carry certain quantities of functionally specified information. This information can be measured, and in fact this is already being done. [Durston et al. 2007] proposed a method to quantify functionally specified information in proteins. Their method is based on the reduction in functional uncertainty in functional amino acid sequences, compared with a null state in which any amino acid sequence is equiprobable (total loss of function).
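Below is a simplified Python sketch of the idea behind this measure. It sums, over the sites of an alignment, the drop from the null-state uncertainty log2(20) to the observed per-site Shannon uncertainty, giving a result in bits ("Fits", functional bits, in Durston et al.'s terminology). The sketch assumes a gap-free alignment of equal-length sequences and a uniform null state. The published method deals with practical issues that this sketch ignores. The function names and the toy alignment are hypothetical and are not real protein data.

```python
import math
from collections import Counter

AMINO_ACIDS = 20                 # size of the amino acid alphabet
H_NULL = math.log2(AMINO_ACIDS)  # per-site uncertainty when all residues are equiprobable


def site_entropy(column: str) -> float:
    """Shannon uncertainty (bits) of the residue distribution in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def functional_sequence_complexity(alignment: list[str]) -> float:
    """Sum over sites of (H_null - H_site), in bits.

    Simplified sketch: assumes gap-free, equal-length aligned sequences
    and a uniform null state over the 20 amino acids.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    columns = ("".join(seq[i] for seq in alignment) for i in range(length))
    return sum(H_NULL - site_entropy(col) for col in columns)


# Hypothetical toy alignment (not real protein data).
toy = ["MKVL", "MKIL", "MRVL"]
print(round(functional_sequence_complexity(toy), 2))  # 15.45
```

Fully conserved sites each contribute the maximum log2(20) ≈ 4.32 bits. Variable sites contribute less, which mirrors the reduction in uncertainty described above.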

An objective analysis of the functionally specified information found in nature leads to the following conclusions. Large quantities of functionally specified information are detected only in human artefacts (such as complex information processing systems and natural or computer languages) and in biosystems. Consequently, we are entitled to infer by induction that the origin of life is also artificial. Like any scientific inference, this one can, of course, be refuted experimentally. To refute it, it is sufficient to empirically demonstrate that:
  • There exists a mechanism, using only spontaneous and law-like causal factors, that is capable of generating and applying a protocol of information processing in a multipart system;
  • This mechanism can spontaneously generate, and unambiguously interpret, sufficiently long instructions that conform to the protocol. The instructions, written in a language compatible with the protocol, should carry at least 500 bits of information, as is the case for 72 ASCII characters (excluding special characters) of meaningful text such as string 3 in the example above. For biosystems, according to Durston, even granting the highest possible replication rate over the entire span of natural history (4.5 billion years), the information threshold is 140 bits (correspondingly, 20 ASCII text symbols).
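The character counts above are consistent with 7 bits per character. That is standard 7-bit ASCII, which is our reading of "excluding special characters" and is an assumption rather than something the text states. The Python sketch below checks the arithmetic. Keep in mind that it measures raw symbol capacity, an upper bound, not functional information I(f) in the sense defined earlier. The helper name `chars_for_bits` is hypothetical.

```python
import math

BITS_PER_ASCII_CHAR = 7  # assumption: standard 7-bit ASCII code per character


def chars_for_bits(bits: float, bits_per_char: int = BITS_PER_ASCII_CHAR) -> int:
    """Smallest number of characters whose combined raw capacity reaches `bits`."""
    return math.ceil(bits / bits_per_char)


print(chars_for_bits(500))       # 500/7 ≈ 71.4, rounded up: 72 characters
print(chars_for_bits(140))       # 140/7 = 20 characters
print(72 * BITS_PER_ASCII_CHAR)  # raw capacity of the 72-character string 3: 504 bits
```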


  1. Axe D. (2004) Estimating the Prevalence of Protein Sequences Adopting Functional Enzyme Folds. Journal of Molecular Biology 341(5):1295-1315.
  2. Axe D. (2010a) The Case Against a Darwinian Origin of Protein Folds. BIO-Complexity.
  3. Axe D. (2010b) The Limits of Complex Adaptation: An Analysis Based on a Simple Model of Structured Bacterial Populations. BIO-Complexity.
  4. Durston K.K., Chiu D.K.Y., Abel D.L., Trevors J.T. (2007) Measuring the functional sequence complexity of proteins. Theoretical Biology and Medical Modelling 4:47. doi:10.1186/1742-4682-4-47
  5. Ewert W., Dembski W., Marks R. (2012) Climbing the Steiner Tree: Sources of Active Information in a Genetic Algorithm for Solving the Euclidean Steiner Tree Problem. BIO-Complexity.
  6. Hazen R.M., Griffin P.L., Carothers J.M., Szostak J.W. (2007) Functional information and the emergence of biocomplexity. PNAS 104:8574-8581.
  7. Szostak J.W. (2003) Functional information: Molecular messages. Nature 423:689.
