Arrival of the Fittest: Solving Evolution's Greatest Puzzle
Author: Andreas Wagner
But we got more than we asked for. Thousands of man-years have already been poured into this endeavor, and the end is not near. To the contrary, the more we learn, the more strands of this web become evident, the more complex and tangled it seems. The road from genotype to phenotype extends to the horizon and beyond.
Throughout the twentieth century, many evolutionary biologists were undistracted by all this complexity. Basking in the glow of the modern synthesis, they were blissfully focused on the genotype. And this focus became even greater after Watson and Crick’s work had stirred the ocean of our ignorance, and after new technology to read the letter sequence of DNA molecules had been developed. This technology spawned a new research field known as molecular evolutionary biology, whose subject was variation in amino acid and DNA strings. The earliest incarnation of this technology was about as inefficient as Muybridge’s zoopraxiscope: a year’s work would reveal no more than a few hundred letters. By the mid-1980s, however, its efficiency had increased more than tenfold, enough to read short sequences of DNA from multiple individuals in a population.
43
When molecular evolutionists took advantage of this technology, they discovered something nobody had expected: enormous amounts of genetic variation, everywhere, even in organisms that had not changed for many millennia.
One early molecular evolution study focused on alcohol dehydrogenase, an enzyme that helps detoxify ethanol. We have a gene for it, and so do fruit flies. No one knows whether they get as high on fermented fruit as any Skid Row wino, but they certainly are attracted to it, and they need this enzyme to prevent alcohol poisoning. In 1983, Martin Kreitman from Harvard University found that the DNA from a small sample of fruit flies contained forty-three different DNA text variants in this gene.
44
Similar variants occur in humans. One of them causes a form of alcohol intolerance where blotches erupt on the faces and bodies of sensitive individuals, a condition so widespread among people with Asian ancestry that it is known as “Asian flush.”
45
But what Kreitman did not find in the alcohol dehydrogenase gene was even more telling. Most of the mutations in this gene were silent. They changed the DNA sequence, but not the amino acid sequence of alcohol dehydrogenase. This is possible because the genetic code is redundant: more than one three-letter word can encode the same amino acid. And it was surprising. Even with a redundant code, there should have been many more amino-acid-changing mutations, because mutations tend to sprinkle genes randomly with letter changes. Something had happened to these mutations.
The something was natural selection. Because these changes impaired the enzyme, natural selection had weeded them out long before Kreitman got to see them.
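The baseline behind this argument can be estimated with a short calculation. The sketch below is illustrative rather than Kreitman's actual analysis: it assumes the standard genetic code, treats every single-letter change as equally likely, and weights all amino-acid-encoding codons equally. The names `CODON_TABLE` and `classify_point_mutations` are made up for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutations():
    """Count silent, amino-acid-changing, and stop-creating single-letter changes."""
    counts = {"silent": 0, "changing": 0, "stop": 0}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # start only from codons that encode an amino acid
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a change
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = CODON_TABLE[mutant]
                if new_aa == aa:
                    counts["silent"] += 1
                elif new_aa == "*":
                    counts["stop"] += 1
                else:
                    counts["changing"] += 1
    return counts


counts = classify_point_mutations()
total = sum(counts.values())
print(counts, f"silent fraction = {counts['silent'] / total:.2f}")
```

Under these simplifying assumptions only about one random letter change in four is silent. A gene in which most observed variants are silent therefore departs sharply from what unfiltered mutation would produce.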
Kreitman’s discovery and others like it illustrate a fact that is easily overlooked: The revolutions in evolutionary thought are different from other scientific revolutions. Whereas the revolution of quantum physics in the early twentieth century, for example, gave rise to a worldview incompatible with that of classical physics, revolutions in evolutionary biology leave core elements of previous theories intact.
46
Rather than overturning the past, they deepen and sharpen it. They add layers of clarity and resolution, as well as new dimensions. The film Seabiscuit added color, music, dialogue, and the sound of hoofbeats to the first recording of Sallie Gardner’s ride, but it didn’t invalidate Muybridge’s revelation about the nature of galloping. Where Darwin used the natural world to infer the power of selection, the modern synthesis could see it in the ebb and flow of gene frequencies, and molecular evolutionists found it in DNA signatures, such as the excess of silent mutations. In doing so, they dissolved a fog of confusion that Darwin left behind. (Some of the fog, because the molecular revolution taught us more about genotypic than phenotypic change, the heart of the origination problem.)
The amount of variation Kreitman found in the alcohol dehydrogenase gene is not unusual. Animal and plant populations are chock-full of genetic variation. Genetic variants even occur in populations of living fossils whose phenotypes have not changed for many millions of years, such as the coelacanth, a strange fish thought to be extinct until a live specimen was found in 1938.
47
Their abundance raised questions that occupy molecular evolutionists to this day. Do most of them matter for phenotypic evolution? Are they necessary or irrelevant for life’s innovations? Their mere existence underlines how hard it is to understand phenotypic innovation and how it emerges from genetic change.
The ability to read a thousand letters of DNA text was still impressive in the 1980s. But a thousand letters are nothing compared to an organism’s genome, the totality of its DNA. Human DNA is three billion letters long, ten times longer than the Encyclopædia Britannica. Every single one of the trillions of cells in our body contains a copy of it, packed into our forty-six chromosomes. Even the DNA of a bacterium like Escherichia coli has four and a half million letters, more than War and Peace, one of the longest novels ever written. DNA sequencing technology needed to get much better to read just the genome of a single individual, let alone to catalog variation in an entire population.
48
The impetus to develop this technology would come from the Human Genome Project, one of the largest international research collaborations ever, initiated in 1990 and spearheaded by the U.S. National Institutes of Health. This is not a coincidence, for the project aimed at understanding genes that cause disease, a special kind of new phenotype. Fierce competition to this publicly funded effort arose in 1998 from the company Celera Genomics and its founder, the biologist and entrepreneur Craig Venter. They managed to sequence the genome at a tenth of the cost, and crossed the finish line simultaneously with the publicly funded project in 2000, when a first draft human genome was published.
49
The human genome is another major milestone of biology that revealed a host of genetic information: how many genes we have, what proteins they encode, and so on. “The blueprint of life” is what President Bill Clinton called it in his 2000 State of the Union address. But if so, it is a very odd blueprint, one that we cannot use to build what it depicts, or even to guide a repairman to fix a problem. Because thus far, the genome has guarded the secrets of our phenotype well. Many had hoped, for example, that the genome would give us yes-or-no answers to the question of whether a person would get a genetic disease. But here is what Craig Venter himself had to say about our ability to predict disease in a 2010 interview with the German magazine Der Spiegel:
We have, in truth, learned nothing from the genome other than probabilities. How does a 1 or 3 percent increased risk for something translate into the clinic? It is useless information.
50
This assessment is stark, but it holds a grain of truth. You guessed the reason: The relationship between genotype and phenotype is complex beyond imagination. The Human Genome Project is only a mile marker on the journey from genotype to phenotype. It’s not anywhere close to the end of the road.
Whatever its limitations, the genome project had many other benefits. One of them was that it whipped DNA sequencing technologies to blazing speeds. While in the year 2000 one person could read up to a million DNA letters in twenty-four hours, sequencing machines available in 2008 could already read up to a billion letters within the same span, and technologies have gotten much faster since then. As of this writing, full sequencing of a human genome costs little more than $1,000, and by the time you read this, the cost may have dropped to pennies. These technologies allow us to study genomic variation in large human populations and in many other organisms. They have transformed population genetics into population genomics.
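A back-of-the-envelope calculation shows what this speedup means for a whole genome. The sketch below uses only the figures quoted in the text (three billion letters, a million letters per day in 2000, a billion per day in 2008). It assumes each letter is read exactly once, which understates real sequencing effort because practical methods read every position several times. The function name `days_to_read` is invented for this example.

```python
HUMAN_GENOME_LETTERS = 3_000_000_000  # "three billion letters", from the text


def days_to_read(letters, letters_per_day):
    """Days needed to read `letters` once at a constant daily rate."""
    return letters / letters_per_day


# Peak daily rates quoted in the text for each year.
for year, rate in [(2000, 1_000_000), (2008, 1_000_000_000)]:
    days = days_to_read(HUMAN_GENOME_LETTERS, rate)
    print(f"{year}: {days:,.0f} days (about {days / 365:.1f} years)")
```

At the 2000 rate, a single pass over one human genome would occupy one person for roughly eight years. At the 2008 rate, the same pass takes about three days.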
Population genomics is the end of the road for studying genotypes. The same cannot be said for the phenotype. The molecular biology work that began in the mid-1950s to unravel the functions of proteins and their interactions continues undiminished. But in the 1990s it had to take a new tack to progress further. For processes like insulin signaling, it had previously identified key genes, the proteins they encoded, what these proteins do, and which of them interact.
51
All this information is like a who-is-who and a who-knows-whom of the cell. In the 1990s it became clear that such a catalog would fall short of predicting phenotypes, such as whether a person would develop diabetes. It fails to capture many subtleties that matter: the number of protein molecules involved, how firm their handshakes are, and so on. Dozens of different kinds of molecules contribute to diabetes, each contributing only a few percent to increased disease risk, but each conspiring with multiple others in subtle (and ill-understood) ways to cause disease. For these reasons it would not get us anywhere to merely list these molecules and their properties. We need to understand exactly how these molecular parts cooperate to form a whole phenotype.
The only tools that could offer this integration are mathematical. Equations can encapsulate a wealth of experimental data and describe how the concentrations and activities of molecules change over time.
52
And these activities are key to understanding phenotypes. For example, in type 2 diabetes the body shows insulin resistance, a phenotype different from that of a healthy individual: The pancreas releases insulin, but the liver reacts sluggishly. Somewhere along the signaling chain starting at the insulin receptor, the handshake of signaling molecules has become too weak (or too firm).
53
And this change percolates down the signaling chain to cause disease and suffering. Only the rigorous quantification of mathematics can help us understand such subtleties. No mere catalog of molecules could achieve that.
There is only one catch with the equations that can describe molecular phenotypes: They are not simple. They have many variables—molecules and their interactions—distilled from decades of experiments. They cannot be solved with pencil and paper. They are beyond the capabilities of even today’s most skilled mathematicians. Their solution requires computers.
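To make the idea concrete, here is a deliberately tiny sketch of such equations and their numerical solution. It is a toy, not a published model of insulin signaling: it has two variables instead of dozens, every rate constant is invented, and the function name `simulate` and the parameter `k_handshake` are hypothetical. The equations are stepped forward in time with the simple Euler method.

```python
def simulate(k_handshake, insulin=1.0, k_off=1.0, k_act=2.0, k_deact=1.0,
             dt=0.01, t_end=50.0):
    """Integrate a two-step signaling chain with Euler steps.

    receptor: fraction of receptors activated by insulin
    signal:   fraction of a downstream molecule activated by the receptor
    All rate constants are made-up illustration values.
    """
    receptor, signal = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        # Binding strength ("handshake") controls receptor activation.
        d_receptor = k_handshake * insulin * (1.0 - receptor) - k_off * receptor
        # Active receptors switch on the downstream molecule.
        d_signal = k_act * receptor * (1.0 - signal) - k_deact * signal
        receptor += dt * d_receptor
        signal += dt * d_signal
    return receptor, signal


healthy = simulate(k_handshake=1.0)
resistant = simulate(k_handshake=0.1)  # tenfold weaker handshake
print(f"healthy signal:   {healthy[1]:.3f}")
print(f"resistant signal: {resistant[1]:.3f}")
```

In this toy, weakening the first handshake tenfold lowers the downstream signal even though the insulin level is unchanged, a crude analogue of a sluggish response. A system this small could still be solved by hand; the point of the sketch is only that the same stepping procedure carries over to systems with many more variables, where hand calculation is hopeless.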
Computers have become as essential to twenty-first-century biology as digital cameras have become to photography.
Computers do more than just run scientific equipment (from ultracold freezers to espresso machines); they are now instruments in their own right. Like the microscopes of the seventeenth century, they allow us to travel a new world, one so small that the most powerful imaging technologies, including electron microscopes, cannot resolve it: the world of molecules. Indeed, computers are the microscopes of the twenty-first century. They help us understand molecular webs that Darwin did not even know existed.
This centrality of computing is new, because for much of its history, biology was limited by data. Early explorers had to voyage for years to discover new life forms in faraway lands. Even early in the molecular era isolating a single gene could be years of work. No longer. Thanks to ever-accelerating technology, thousands of ever-growing databases are overflowing with biological information, not just about genes and genomes, but also about the millions of molecular parts living things harbor, about what these parts do and which other parts they interact with. Every year now, gigabytes and terabytes of new data enter these databases. A new generation of scientists—computational biologists—uses only knowledge gathered by others, and no longer experiments with living organisms. Biologists are being transformed into information scientists, with access to nearly limitless data. The limits exist in our imagination, and in our skills to detect laws of nature in that data.
These skills will surely be challenged, because the puzzle of how new phenotypes come into being has stymied science for more than a century. It’s one thing to recognize that phenotypes are like enormous pointillist paintings, created one molecular change at a time. It’s another to use that insight to understand how those paintings are actually created. The challenge is daunting, even on the smallest scale of proteins like the alcohol dehydrogenase that stands between you and Death by Happy Hour, since there are more ways to string amino acids together than there are hydrogen atoms in the entire universe. Referring to random change, recited like a mantra since Darwin’s time, as a source of all innovation is about as helpful as Anaximander’s argument that humans originated inside fish. It sweeps our ignorance under the rug by giving it a different name. This doesn’t mean that mutations don’t matter, or that natural selection isn’t absolutely necessary.
54
But given the staggering odds, selection is not enough. We need a principle that accelerates innovation.
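The staggering odds mentioned above follow from simple counting. The sketch below assumes twenty kinds of amino acids and uses 10^80 as a commonly quoted order-of-magnitude estimate for the number of atoms in the observable universe; that estimate, the 100-amino-acid example length, and the function names are assumptions added for illustration, not figures from the text.

```python
AMINO_ACID_TYPES = 20          # the standard amino acids
ATOMS_IN_UNIVERSE = 10 ** 80   # assumed rough estimate, order of magnitude only


def sequences_of_length(n):
    """Number of distinct amino acid strings with exactly n positions."""
    return AMINO_ACID_TYPES ** n


def shortest_length_exceeding(limit):
    """Smallest string length whose sequence count is larger than `limit`."""
    n = 1
    while sequences_of_length(n) <= limit:
        n += 1
    return n


crossover = shortest_length_exceeding(ATOMS_IN_UNIVERSE)
print(f"Strings of length {crossover} already outnumber the assumed atom count.")
# An illustrative 100-amino-acid protein, shorter than many real proteins:
print(f"Strings of length 100: about 10^{len(str(sequences_of_length(100))) - 1}")
```

Under these assumptions the number of possible strings overtakes the atom count at a length of only a few dozen amino acids, far shorter than typical proteins.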