Life's Greatest Secret (34 page)

Authors: Matthew Cobb

* For example, an ACU codon in mRNA would bind with a UGA anticodon on a tRNA molecule.
UPDATE
Nearly half a century has passed since the final word of the genetic code was read. In the intervening years, science has made substantial advances, some of which seem to challenge the fundamental discoveries that were made in the heady years of 1944–67. The closing chapters bring the history of the genetic code up to date, showing what happened in the intervening decades.
–     TWELVE     –
SURPRISES AND SEQUENCES
All of the researchers involved in cracking the genetic code agreed on two basic principles. First, they assumed that there was a one-to-one correspondence between the DNA sequence of a gene and the corresponding amino acid sequence – what Crick called colinearity. Second, they considered that the genetic code and the way in which genes functioned were universal, so ‘anything found to be true of E. coli must also be true of elephants’, as Jacques Monod put it at Cold Spring Harbor in 1961.
1
Neither of these principles was required for genetics to work or for the genetic code to be cracked, but they made sense and they gave a universal significance to the models and interpretations that were being developed. They also ensured that the new science of molecular genetics fitted into the Darwinian framework according to which all life had a single origin and could therefore be assumed to share fundamental processes. Crick later said that those who were studying the genetic code had ‘a boundless optimism that the basic concepts involved were rather simple and probably much the same in all living things.’
2
Within ten years of the final word in the genetic code being read, it became obvious that such boundless optimism was unfounded, as the assumptions of colinearity and the universality of the genetic code were proved to be wrong.
*
In autumn 1977, several linked papers appeared in Proceedings of the National Academy of Sciences and in the new journal Cell, which had been set up three years earlier with the ambitious aim of being ‘a journal of exciting biology’.
3
For once, the reality lived up to the hype, as the articles announced that, in viruses, genes were not necessarily continuous stretches of DNA but instead could be spread out along a sequence, split into several pieces.
4
What Watson called the bombshell discovery of split genes had first been announced at the Cold Spring Harbor meeting in the summer of 1977, and the scientific community was abuzz with the implications. It was soon found that mammalian genes shared this property, which contrasted sharply with the strictly continuous organisation of genes in bacteria. The surprise and excitement felt by researchers is shown by the unprecedented language used in the title of the first paper in that issue of Cell, by Louise Chow, Richard Roberts and their colleagues, which described ‘An amazing sequence arrangement’ of viral nucleic acids. Scientists rarely use words like ‘amazing’ in their professional publications.
The reason for the excitement was simple: nearly twenty-five years of assumptions about gene structure had been overthrown by a completely unexpected discovery. Within a few months scientists were revelling in what was widely described as a revolution (Crick called it a mini-revolution).
5
*
In eukaryotic cells (that is, in cells with a nucleus, so in all multicellular organisms and in single-celled organisms such as yeast) it turns out that genes often contain many bases that are not used to make a protein. As a result there is often no colinearity between the DNA sequence and the amino acid sequence of the protein. Between the start and the stop codons of a gene, there may be huge chunks of non-coding DNA that have no relation to the final protein. In an article in Nature, Wally Gilbert named these apparently irrelevant non-coding sequences introns (from ‘intragenic regions’); the DNA sequences that are expressed in protein were called exons.
6
Most eukaryotic DNA is a patchwork of exons and introns. Introns are generally around forty bases in length, but they can be very large – for example, one of the introns in the human dystrophin gene is more than 300,000 bases long.
7
In some rare cases, the intron of one gene can even contain a completely separate, protein-encoding gene.
8
The existence of introns means that the cell has to process the genetic message before it can be turned into protein. The initial RNA transcript of the DNA was named pre-mRNA – it contains all the irrelevant introns, but these are immediately snipped away and the two new ends of the mRNA molecule joined together (‘spliced’) to form a messenger RNA sequence that corresponds to the final amino acid product of the gene. The spliced molecule also carries untranslated regions at the beginning and end of the mRNA sequence, which tell the cell how the gene is to be expressed and processed. This splicing is done by tiny cellular structures made of RNA and protein, known clumsily as spliceosomes (some RNA molecules can splice themselves, without the aid of a spliceosome). The beginning and end of an intron are marked by specific sequences that are recognised by the spliceosome and which indicate which bits of the pre-mRNA molecule need to be snipped out.
9
It is this spliced version of mRNA, called mature mRNA, that contains an RNA sequence that is colinear with the amino acid sequence of the protein and is used by the cell in protein synthesis.
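The splicing step described above can be sketched in a few lines of Python. This is only an illustration: the toy transcript, the intron coordinates and the function name `splice` are hypothetical, and the intron positions are supplied by hand rather than recognised from boundary sequences as a spliceosome would do.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove the given introns (0-based, end-exclusive) and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron itself
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


# Hypothetical toy transcript: exons in upper case, introns in lower case.
pre_mrna = "AUGGCU" + "guaaguuuucag" + "GAAUGG" + "guacccag" + "UAA"
introns = [(6, 18), (24, 32)]  # hand-picked coordinates of the two introns

mature_mrna = splice(pre_mrna, introns)
print(mature_mrna)  # AUGGCUGAAUGGUAA
```

Only the spliced string reads as an uninterrupted run of triplets (AUG GCU GAA UGG UAA), which is the sense in which the mature mRNA, and not the original transcript, is colinear with the protein.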
Despite the initial amazement of the scientific community, the existence of all this non-coding DNA in eukaryotic organisms was soon welcomed by researchers, as it seemed to provide an answer to the nagging suspicion, first voiced by Burnet in 1956, that not all of the DNA in a genome actually contributes to producing proteins.
10
If important chunks of the genome were composed of what Wally Gilbert called ‘a matrix of silent DNA’, this would explain the situation, even if in 1959 Crick had described this possibility as unattractive.
11
Since Gilbert’s first description, there has been a long-running debate about where introns come from – some scientists have argued that even the earliest genomes had introns, but most now think that introns appeared with the evolution of the eukaryotes, because there is no evidence that any prokaryotic organism – single-celled organisms with no nucleus – ever had introns, or possessed the complex cellular machinery required for splicing them out.
12
Why introns evolved is still unclear.
Splicing is not just a matter of snipping out a few irrelevant bases. It allows the production of different proteins from a single gene, because under different conditions different exons can be spliced together – this is called alternative splicing. A single DNA sequence can give rise to several mRNA sequences, depending on a variety of external factors, including the type of cell that the gene is expressed in. Currently, the largest known number of mRNAs that can be produced by a single gene is 38,016. These mRNAs are encoded by the Drosophila gene Dscam, which has four clusters of exons, containing twelve, forty-eight, thirty-three and two alternative exons respectively.
13
Many of the 38,016 potential Dscam proteins differ only slightly, but this variability is of major functional significance because it means that the fly’s neurons differ. The consequence is that Dscam proteins help determine the intricate way that those neurons interconnect, shaping the brain.
14
The DNA sequence can contain an astonishing degree of complexity.
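The figure of 38,016 follows directly from the four cluster sizes given above, on the assumption that each mRNA uses exactly one alternative from every cluster and that the choices are independent. A quick check (the variable names are illustrative only):

```python
from math import prod

# Number of alternatives in each of the four Dscam exon clusters, as given in the text.
alternatives_per_cluster = [12, 48, 33, 2]

# One choice per cluster, chosen independently, so the counts multiply.
possible_mrnas = prod(alternatives_per_cluster)
print(possible_mrnas)  # 38016
```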
Until the discovery of introns, it had been assumed that gene mutation primarily involved point mutations – changes in a single base that would either lead to a different amino acid being inserted into the protein, or, if the base were deleted, would produce a frame-shift mutation in which the remaining bases of the genetic sequence would be read in a novel series of triplet codons, which would often be nonsense, as Crick had suggested in 1961. With the discovery of introns, it was realised that a mutation at the beginning or end of an intron could radically alter the structure of the translated protein by allowing new DNA sequences from the intron to be included in the coding region of the gene, thereby providing an additional source of genetic novelty. The two principal researchers involved in the discovery of what were initially called ‘split genes’ or ‘genes in pieces’ were Richard Roberts of Cold Spring Harbor and Phillip Sharp of MIT, and in 1993 they won the Nobel Prize in Physiology or Medicine for their work.
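The difference between a point mutation and a frame-shift mutation, as described above, can be seen by reading a sequence in triplets before and after the change. The sequence below is a made-up example and the helper `codons` is hypothetical; the sketch only shows how the reading frame behaves, not what the altered protein would be.

```python
def codons(sequence: str) -> list[str]:
    """Split a sequence into consecutive triplets, dropping any incomplete tail."""
    return [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]


original = "AUGGCUGAAUGGUAA"                       # made-up message
point_mutant = original[:4] + "A" + original[5:]    # one base substituted (C -> A)
deletion_mutant = original[:4] + original[5:]       # the same base deleted

print(codons(original))         # ['AUG', 'GCU', 'GAA', 'UGG', 'UAA']
print(codons(point_mutant))     # ['AUG', 'GAU', 'GAA', 'UGG', 'UAA']
print(codons(deletion_mutant))  # ['AUG', 'GUG', 'AAU', 'GGU']
```

The substitution changes a single triplet, whereas the deletion changes every triplet downstream of it, including the loss of the final UAA stop codon from the reading frame.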
*
Two years after the discovery of introns, the scientific world was shaken yet again. The last word of the genetic code to be deciphered was the stop codon UGA (nicknamed opal), in 1967. In November 1979, a group at Cambridge discovered that in human mitochondria – small energy-producing structures found in all eukaryotic cells, which contain their own DNA and ribosomes – UGA does not encode stop but instead produces an amino acid, tryptophan.
15
The genetic code is not strictly universal; even more surprisingly, the same organism – you – contains two different genetic codes, one in your genomic DNA, the other in your mitochondrial DNA.
This fact tells us something fundamental about the history of life on our planet. In 1967, the US biologist Lynn Margulis began arguing that mitochondria were not merely micro-structures within eukaryotic cells but were remnants of a single-celled organism that had fused with the ancestor of all eukaryotic organisms, billions of years ago, probably as part of a symbiotic relationship. She was not the first to come up with this idea – in the early years of the twentieth century, both Paul Portier and Ivan Wallin suggested that mitochondria might be symbionts.
16
Margulis argued that these symbiotic bacteria subsequently found themselves trapped in every one of our cells and lost all their independence, but not their own, separate genome – a tiny ring of DNA about 16,500 base pairs long (in comparison, the human nuclear genome contains about 3 billion base pairs). (Genes and genomes are measured in ‘base pairs’ because of the two strands of the DNA double helix – for each base there is a complementary base on the other strand, forming a base pair.)
It appears that all mitochondria, in all the eukaryotes on the planet, have a common ancestor that was alive more than 1.5 billion years ago. The ancestors of plants subsequently incorporated another microbe in the same way, thus gaining their power-generating chloroplast organelles and the ability to gain energy from sunlight.
17
In the cases of both mitochondria and chloroplasts, there are arguments over exactly what kind of microbe fused with what, and above all the speed with which the fusion took place, but most scientists now think that in each case there was a single event that enabled what was effectively a hybrid organism to grow larger and to acquire the energy required by more complex organisms.
18
The extremely small nature of the mitochondrial genome, and its peculiar use of codons, can be explained in terms of the history of this symbiotic relationship. The mitochondrial genome codes for very few proteins – most of the other genes were lost before or shortly after fusion with our ancestors or were incorporated into the genomic DNA of the host – so the appearance of a new function for a codon in mitochondrial DNA through mutation would not have had an important effect on the symbiont, most of whose needs were provided by the host cell.
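The effect of reassigning UGA can be made concrete by translating one message with two codon tables. The sketch below builds the standard code from its usual compact one-letter listing and then changes only the UGA entry, which is the single reassignment discussed above; the real mitochondrial code differs in a few other codons too, and these are deliberately left out. The message and the names `STANDARD`, `MITOCHONDRIAL_UGA_ONLY` and `translate` are illustrative assumptions.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' marks stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumption for illustration: only UGA is reassigned (stop -> tryptophan, 'W').
MITOCHONDRIAL_UGA_ONLY = {**STANDARD, "UGA": "W"}


def translate(mrna: str, code: dict[str, str]) -> str:
    """Read triplets from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


message = "AUGGCUUGAGAAUAA"  # made-up message: AUG GCU UGA GAA UAA
print(translate(message, STANDARD))                # MA
print(translate(message, MITOCHONDRIAL_UGA_ONLY))  # MAWE
```

Read with the standard code, the message ends at UGA after two amino acids; read with UGA reassigned, translation carries on to the UAA stop.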
Mitochondria are not alone in having a non-standard genetic code. In 1985, it was discovered that single-cell ciliates – tiny organisms such as Paramecium – show variants of the nuclear genetic code that have appeared several times during evolution. In some species of ciliate, UAA and UAG code for glutamine rather than stop, with only UGA encoding stop; in others, UGA codes for tryptophan.
19
Sometimes UGA and UAG have been recoded by natural selection to code for extra amino acids, not generally found in life – selenocysteine and pyrrolysine, respectively.
20
This can occur by altering the genetic code only in particular genes. For example, the human genome contains a handful of genes in which UGA has been recoded to encode selenocysteine.
21
In these cases part of the mRNA for these genes instructs the cell to insert selenocysteine when it reads UGA; in all our other genes, UGA retains its normal stop function.
22
A recent study of 5.6 trillion base pairs of DNA from more than 1,700 samples of bacteria and bacteriophages isolated from natural environments, including on the human body, revealed that in a substantial proportion of the sequences, stop codons had been reassigned to code for amino acids, and an investigation of hitherto unstudied microbes revealed that in one group UGA had been reassigned from stop to code for glycine.
23
There are even cases of novel codons being used to start translation. For example, in 2012, it was discovered that in some unusual circumstances in the mammalian immune system, the genetic message does not begin with the normal AUG codon but can be initiated from a CUG codon, which normally codes for leucine.
24
More than fifteen alternative or non-canonical genetic codes are known to exist, and it can be assumed that more remain to be discovered.
25
The non-canonical codes generally involve the reassignment of stop codons; this may indicate that there is something about the machinery involved in stop codons that makes them particularly susceptible to change, or it may simply be that as long as the organism can still code stop using another codon, reassigning one stop codon to an amino acid does not cause those organisms any major physiological or evolutionary difficulties.
26
The exact process by which codon change takes place has been the focus of a great deal of theoretical and experimental research, and several hypotheses have been put forward to explain how variant codes might arise. The current front-runner is called the codon capture model, and was first put forward in 1987 by Jukes and Osawa. According to this model, random effects such as genetic drift can lead to the disappearance of a particular codon in a given genome; similar effects then lead to that codon being ‘captured’ by a tRNA that carries another amino acid.
27
A recent experimental study of genetically engineered bacteria in which some codons had been artificially replaced supported this model, and even suggested that reassignment of codons could be advantageous in some circumstances, providing the organism with expanded functions.
28
