Life's Greatest Secret (46 page)


Authors: Matthew Cobb

Ever since 1953, when Watson and Crick wrote those apparently simple words ‘the precise sequence of the bases is the code which carries the genetical information’, biologists have considered that the idea that genes contain information is intuitively obvious. Philosophers have not been so easily convinced, and over the past two decades a debate about genetic information has taken place, away from the gaze of biologists. The main issue that has preoccupied the philosophers is the exact nature of the kind of information that is in genes, and, indeed, whether there is something there that can strictly be called information. The fact that most scientists are unaware of these arguments is due partly to the divisions between academic disciplines and partly, I suspect, to the fact that many of my colleagues take a dim view of philosophy. This is unfortunate, because one of the jobs of philosophers is to explore the complexity that lurks in apparently straightforward concepts such as information. Indeed, it is possible that had philosophers paid more attention to the issue in the 1950s, they might have been able to persuade the theoreticians not to view the code literally as a code or as a language, and less time might have been wasted on fruitless speculation.
Many scientists would probably agree with Michael Apter and Lewis Wolpert, who argued in 1965 that genetic information is simply a metaphor or an analogy, a way of describing what genes contain and how they exert their effects.
38
Apter and Wolpert claimed that the most thorough definition of information, as described in Shannon’s communication theory, does not apply to genetic information because the whole point of genetic information is that it does something, it has a function, a meaning, whereas Shannon’s view of information has no place for meaning. The difficulty involved in expressing the content of DNA in Shannon’s terms can be shown by trying to calculate the information content of a genome. The problems begin at the beginning: it is unclear whether the fundamental unit should be a single base, with four alternative states (and therefore two bits of information), or a codon – three bases, with sixty-four alternative states (and therefore six bits of information) – or the output of the system, with twenty-one alternative states (twenty amino acids and ‘stop’), and therefore a little over four bits of information, or five if rounded up to whole bits. Calculations based on each of these approaches would produce different answers, and in all cases it is not clear what the outcome would mean.
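The arithmetic behind these figures can be checked with a short sketch. It assumes, purely for illustration, that every state of a unit is equally likely, so that the information per unit is the base-2 logarithm of the number of states. The 3,000-base stretch and the function names are hypothetical and are not taken from the text.

```python
import math

# Candidate "units" from the passage and their number of alternative states.
UNIT_STATES = {
    "base": 4,                 # A, C, G or T
    "codon": 4 ** 3,           # 64 possible triplets
    "amino acid or stop": 21,  # twenty amino acids plus 'stop'
}


def bits_per_unit(n_states: int) -> float:
    """Maximum Shannon information per unit, assuming equiprobable states."""
    return math.log2(n_states)


def total_bits(n_bases: int) -> dict:
    """Information in a stretch of n_bases under each choice of unit.

    Assumes n_bases is a multiple of three and that the whole stretch is
    read as codons (an illustrative simplification).
    """
    n_codons = n_bases // 3
    return {
        "base": n_bases * bits_per_unit(4),
        "codon": n_codons * bits_per_unit(64),
        "amino acid or stop": n_codons * bits_per_unit(21),
    }


if __name__ == "__main__":
    for name, states in UNIT_STATES.items():
        print(f"{name}: {bits_per_unit(states):.2f} bits per unit")
    # Hypothetical 3,000-base stretch, chosen only for illustration.
    for name, bits in total_bits(3000).items():
        print(f"{name}: {bits:.0f} bits in total")
```

Under this uniform assumption the per-unit figures are 2, 6 and roughly 4.39 bits. The base and codon totals coincide, because a codon is simply three bases, whereas the amino-acid total is lower; with real, unequal frequencies the figures would shift again. None of the numbers says anything about what the sequence does.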
Two other possibilities highlight the problem associated with using Shannon’s measure of information on data from molecular genetics. First, imagine two stretches of DNA of identical lengths, containing the same proportions of the four bases but in different orders. According to Shannon, the information content of those stretches of DNA, if calculated using each base, would be identical, and yet they would almost certainly have differing gene products that would affect the fitness of the organism in various ways – the biological content of their information would not be alike. Second, there is no agreed answer as to whether most of the DNA sequences in our genome, which have no apparent function and do not seem to be subject to natural selection, contain information or not. Most biologists would probably say not, because they would link information with function, whereas a mathematician would probably argue that they do. Although from Shannon’s point of view a sequence of junk DNA contains as much information as a sequence of codons from a protein-encoding gene, that is clearly not the case from the point of view of the cell, the organism or natural selection. Despite these obstacles, some scientists and philosophers continue to claim that DNA does contain Shannon information and have applied information theory to data from molecular genetics.
39
None of these attempts has yet convinced the scientific community as a whole.
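The first of these two cases can be made concrete with a minimal sketch. It estimates Shannon entropy per base from base frequencies alone (a deliberately crude estimate, and an assumption of this example), and the two twelve-base sequences are invented for illustration. The codon table is the standard genetic code written compactly.

```python
import math
from collections import Counter

# Standard genetic code, bases in the order T, C, A, G; '*' marks stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def entropy_per_base(seq: str) -> float:
    """Shannon entropy in bits per base, from base frequencies only."""
    n = len(seq)
    # Sorting makes the result depend only on composition, not on order.
    probs = sorted(count / n for count in Counter(seq).values())
    return -sum(p * math.log2(p) for p in probs)


def translate(seq: str) -> str:
    """Translate complete codons, reading from the first base."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Two invented sequences: same length, same base composition, different order.
seq_a = "ATGGCCAAATGA"
seq_b = "".join(sorted(seq_a))  # the same bases, rearranged

if __name__ == "__main__":
    for seq in (seq_a, seq_b):
        print(seq, f"{entropy_per_base(seq):.3f} bits/base", translate(seq))
```

On this measure the two sequences are indistinguishable, yet they translate into different peptides, which is the mismatch between Shannon's measure and biological content described above.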
Towards the end of his life, the theoretical population geneticist John Maynard Smith (1920–2004) began to explore the role of information in biology. In 1995 he wrote a book with Eörs Szathmáry entitled The Major Transitions in Evolution, in which they described the evolution of life as a set of changes in the way in which information is stored and transmitted.
40
For example, the evolution of multicellular organisms altered how information is transmitted and stored, with the appearance of differentiation between cells, underpinned by spatially and temporally modulated gene regulation. The most recent evolution of an information transfer system is the one we are using at this very moment – the appearance of language in humans.
In 2000, Maynard Smith wrote the first of a series of articles in which he explored the nature of genetic information and exchanged views with philosophers of biology.
41
Maynard Smith put evolution at the heart of his view of genetic information and where it comes from: ‘DNA contains information that has been programmed by natural selection’, he stated, and as a consequence the quantity and quality of genetic information has increased over the past 3.8 billion years.
42
From this point of view, natural selection is the coder that has given the DNA sequence meaning: ‘genomic information is “meaningful” in that it generates an organism able to survive in the environment in which selection has acted’, he wrote.
43
In other words, genes provide the cell with instructions that have been encoded through natural selection: that is the nature of genetic information.
Genetic information is not like the effect of the environment – most scientists and philosophers consider that environmental factors, although they have shaped genetic information through natural selection and form the conditions that allow genes to be expressed, do not themselves contain information (some philosophers disagree).
44
For example, although changes in temperature can alter the expression of sex-determining genes in crocodiles, thereby changing the sex ratio of a population, the meaning of increased temperature is not the production of more male crocodiles. ‘It is for this reason that we speak of genes carrying information during development, and of environmental fluctuations not doing so’, argued Maynard Smith.
45
For Ulrich Stegmann, a biologist turned philosopher at the University of Aberdeen, DNA contains information that is conditionally expressed. One way of thinking about this is that DNA sequences act a bit like a recipe. Protein synthesis proceeds in a step-by-step fashion, where each step depends on an external factor (codons in DNA and then in mRNA), in the same way that a recipe determines the order in which a cook puts together the ingredients and uses the utensils.
46
The idea of a gene as a computer program is another popular metaphor, according to which the program responds to input conditions in various ways and, depending on those inputs, produces various consistent outputs.
47
However, these are only metaphors. Genes are not programs or recipes, and organisms are not computers or cakes.
The first systematic critic of the concept of genetic information was the philosopher of biology Sahotra Sarkar, of the University of Texas at Austin. For Sarkar, like Wolpert and Apter in 1965, genetic information is ‘little more than a metaphor that masquerades as a theoretical concept’.
48
Sarkar’s critique rests partly on the fact that in eukaryotes, with their complex system of gene splicing, the DNA sequence does not correspond to the amino acid sequence. Strictly speaking, our genes therefore do not correspond to Crick’s definition of genetic information, because the DNA sequence has to be processed and mediated before it appears as an amino acid sequence. Sarkar also points out that genetic information differs from artificial codes because it is impossible to back-translate a protein sequence into a DNA sequence, owing to the redundancy of the genetic code, the presence of introns in eukaryotes and the existence of multiple splicing. For Sarkar, genetic information therefore fails what he calls the test of reverse differential specificity, and, he argues, the concept has ceased to be a useful tool for discovery.
49
To my mind, Sarkar’s critique does not invalidate the use of the term information when discussing the content of genes. Instead, it underlines that genetic information is not like other kinds of information. Neither does this critique undermine the existence of a genetic code: a particular codon will produce a particular amino acid – the triplet of bases represents and encodes that amino acid. That is a code. The fact that you cannot reliably back-translate from amino acid into DNA may disqualify the use of the word code for a philosopher, but it does not for a scientist, or for a member of the public.
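The asymmetry at issue here, in which a codon fixes an amino acid while an amino acid leaves the codon undetermined, can be sketched in a few lines. The example uses the standard genetic code and ignores introns and splicing altogether; the function names and the sample peptides are illustrative choices, not anything drawn from the authors discussed.

```python
from collections import Counter
from math import prod

# Standard genetic code, bases in the order T, C, A, G; '*' marks stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

# Number of codons that encode each amino acid (or stop).
DEGENERACY = Counter(CODON_TABLE.values())


def translate(dna: str) -> str:
    """Forward direction: each codon maps to exactly one amino acid."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_back_translations(peptide: str) -> int:
    """Reverse direction: how many DNA sequences encode this peptide."""
    return prod(DEGENERACY[aa] for aa in peptide)


if __name__ == "__main__":
    print(translate("ATGAAATGG"))          # one DNA sequence, one peptide
    print(count_back_translations("MKW"))  # few candidate DNA sequences
    print(count_back_translations("LSR"))  # many candidate DNA sequences
```

Translation is a function in one direction only: a three-residue peptide of leucine, serine and arginine could have come from any of 216 DNA sequences, even before splicing is taken into account.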
As the philosopher Peter Godfrey-Smith has pointed out, part of the problem flows from the fact that the meaning of the word code as used to describe the content of genes is not strictly identical to the word code as used in other contexts (Godfrey-Smith nevertheless thinks that it is legitimate to use the term code in molecular genetics).
50
The genetic code is not an artificially designed system; it is a phrase that describes the sixty-four ways in which a part of one molecule (messenger RNA) binds with part of another (a tRNA), which in turn binds with another (an amino acid), the detail of which can only be fully understood in an evolutionary context. Sarkar put it pithily: ‘DNA is, ultimately, a molecule and not a language’.
51
DNA is a replicating molecule that, in the right context, leads to the production of certain chemical sequences through the information it contains.
Despite these philosophical clarifications, at first glance the genetic code does indeed look like an artificial code, and the initial assumption was that it therefore came with the associated baggage of such an artefact, such as strictly logical rules and the ability to back-translate. This apparent similarity between the genetic code and artificial codes beguiled many scientists in the 1950s as they tried to crack the code using mathematical principles. Interpreting the genetic code in terms of precise analogies, strict definitions and exact parallels to artificial systems will almost certainly fail, because the genetic code, like every other aspect of biology, has not been designed. It is part of life, and has evolved. It can be properly understood only in its historical, biological context. That was the lesson of the doomed attempts to break the code in the 1950s, and it should guide us today in trying to understand what is in our genes.
For some philosophers, describing the content of genes as information suggests that DNA determines all the characters of an organism in an absolute and unmediated fashion. This critique is misplaced, because in reality few, if any, scientists hold such extreme views. There is a rule of thumb in reading popular science reporting (or, indeed, a scientific paper): if an article describes ‘the gene for’ something, you are almost certainly reading an over-simplistic account. Genes rarely do just one thing; even if a gene produces only one kind of protein, that protein can have different consequences in different contexts.
The gene that got me interested in studying the effects of genes on behaviour, back in 1976, was a Drosophila gene called dunce that was identified in Seymour Benzer’s lab – flies with a mutation in this gene show defects in learning and memory.
52
Dunce might seem to be a gene ‘for’ learning or memory, and it primarily codes for an enzyme that affects the level of an intracellular signalling molecule called cAMP, which has been implicated in learning in a wide range of organisms. But through multiple splicing dunce can produce seventeen separate proteins, varying in length from 521 to 1,209 amino acids. Mutations in this gene can affect a wide range of characters apart from learning and memory, including female fertility and the insect’s responses to organophosphates.
53
In the light of this knowledge, what exactly dunce is ‘for’ escapes easy definition. Although we know what it does under some circumstances, and what happens when specific parts of the gene are mutated, that does not mean that the gene has a single function. And remember, dunce is nothing special, it is just one gene out of billions that exist throughout nature.
Many of those philosophers who criticise the idea that genes contain information rightly point out that DNA can do nothing on its own, emphasising the role that proteins play in life.
54
This is hardly a major criticism – it is true of all representations, codes or languages. The printed symbols that you are looking at represent words and ultimately concepts that I have encoded onto paper, but they mean nothing until they are read. That does not stop them from being part of a language, and does not undermine their fundamental importance in communication. As to the essential role of proteins, Crick said basically the same thing in his 1957 lecture:
the main function of the genetic material is to control (not necessarily directly) the synthesis of proteins. … Once the central and unique role of proteins is admitted there seems little point in genes doing anything else.
55
Some of these critics argue that DNA is merely one of many factors, including the environment, that equally determine the life-cycles of organisms – this is called the parity thesis.
56
There are a handful of scientists who agree with this extreme position and argue that proteins, the environment, or the cell’s metabolism, play a role that is equal to, or greater than, DNA in determining the characteristics of organisms.
57
These scientists remain in a very small minority, because the overwhelming evidence is against their view. It does not correspond to what happens in our laboratories, where DNA is manipulated, altered and transferred according to gene-centred experimental protocols, and where the expected outcome occurs. When students in my laboratory take genes from three separate organisms and combine them, using a regulatory gene from yeast to drive the expression of a jellyfish gene that encodes fluorescent protein so that a single cell in a maggot’s nose glows, the determining causal factor is the genes. The environment, the cell, the maggot, and the ingenious humans who designed the experiment are all permissive factors that had to be in the correct state for the genes to produce their desired effect, but the contribution of these peripheral conditions to the outcome is qualitatively unlike the contribution of the DNA. In this case, the genes function exactly as if they contained information that determines the outcome, because they do.
