Arrival of the Fittest: Solving Evolution's Greatest Puzzle
Authors: Andreas Wagner
Answering this question needs more than just computers. It also needs a librarian’s love of texts. A young Chilean researcher named Evandro Ferrada brought just this love to Zürich when he joined our group of researchers to get his Ph.D. He had already studied proteins and become skillful at mining huge protein databases for information about proteins, from their folds to their smallest atomic details. I had seen Evandro’s quiet, pensive personality before, in people whose minds constantly grapple with the deep mysteries of life. Perhaps this is why he agreed to work on this problem, because the structure of protein space is just such a mystery: one that not only is challenging and profound but also can be unraveled. What’s more, it also holds the secret to protein innovability.
Evandro focused on enzymes because they are an extremely diverse group of proteins—no surprise, since they catalyze more than five thousand different chemical reactions. They are also especially well studied: Thousands of them scattered throughout the library have been mapped. Their locations are precisely known, and we can use computers to analyze them. Evandro asked his computer to choose a pair of proteins with the same fold, but in different places on the same genotype network.
50
He then explored a small neighborhood around the first protein, and listed all known proteins in it, together with their function. After that, he explored the neighborhood of the second protein, and listed all known proteins and their functions in its neighborhood. Finally, he compared these lists, asking simply whether they were different, whether proteins in the two neighborhoods had different functions. He then chose another protein pair, yet another pair, and so on, asking the same question for them, until he had explored hundreds of pairs and their neighborhoods.
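The comparison described above can be sketched in a few lines of Python. This is a toy illustration only, not the actual analysis: the four-letter sequences, the function labels, the `known` dictionary, and all helper names are hypothetical, and a simple count of differing letters (Hamming distance) is assumed as the measure of how close two texts are.

```python
from typing import Dict, Set


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length texts differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def neighborhood_functions(center: str, known: Dict[str, str], radius: int) -> Set[str]:
    """Functions of all known texts within `radius` letter changes of `center`."""
    return {
        func
        for seq, func in known.items()
        if seq != center and hamming(seq, center) <= radius
    }


def fraction_unique(funcs_a: Set[str], funcs_b: Set[str]) -> float:
    """Share of functions that occur in only one of the two neighborhoods."""
    union = funcs_a | funcs_b
    if not union:
        return 0.0
    return len(funcs_a ^ funcs_b) / len(union)


# Hypothetical toy library: sequence -> annotated function (made-up data).
known = {
    "AAAA": "kinase",
    "AAAC": "lyase",
    "AACA": "ligase",
    "CCAA": "kinase",
    "CCAC": "lyase",
    "CCGA": "oxidase",
    "CCAG": "hydrolase",
}

# The pair under study: two texts with the same label, two letters apart.
first, second = "AAAA", "CCAA"
funcs_first = neighborhood_functions(first, known, radius=1)
funcs_second = neighborhood_functions(second, known, radius=1)

print(sorted(funcs_first))    # ['ligase', 'lyase']
print(sorted(funcs_second))   # ['hydrolase', 'lyase', 'oxidase']
print(fraction_unique(funcs_first, funcs_second))  # 0.75
```

In this made-up example, three of the four functions seen near the pair turn up in only one of the two neighborhoods. Repeating the same comparison over hundreds of pairs is the kind of tally the study relied on.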
The final answer was simple. The neighborhoods of two proteins contain mostly different functions, even if the two proteins are close together in the library. For instance, even proteins that differ in fewer than 20 percent of their amino acids have neighborhoods whose proteins differ in most of their functions. The protein library has neighborhoods that are highly diverse, just like the metabolic library. And just as with metabolism, this diversity makes vast genotype networks ideal for exploring the library, helping populations to discover texts with new meaning while preserving old and useful meaning.
Both metabolic and protein libraries are full of genotype networks composed of synonymous texts that reach far through a vast multidimensional hypercube, and both harbor unimaginably many diverse neighborhoods. They have much in common with each other, but little with human libraries. And that’s not surprising: They were here long before us.
At least three billion years before us. That’s when proteins took over most of life’s jobs from RNA. They did so for a good reason. Because they have many more building blocks—twenty different amino acids compared to the four nucleotides of RNA—nature could write more texts with proteins. In an alphabet of four letters, you can write about one million different ten-letter strings, whereas an alphabet of twenty letters allows more than ten trillion such strings—ten million times more. This vastly larger number of protein texts increases further with longer texts. More texts mean more shapes, more chemical reactions you can catalyze, more tasks you can perform.
51
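The string counts quoted above are plain powers of the alphabet size, and they can be checked directly. The short Python sketch below does so; the function name `count_texts` is just an illustrative choice.

```python
def count_texts(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of a given length over a given alphabet."""
    return alphabet_size ** length


rna_texts = count_texts(4, 10)        # four nucleotides, ten letters
protein_texts = count_texts(20, 10)   # twenty amino acids, ten letters
ratio = protein_texts // rna_texts    # exact, because 20**10 / 4**10 == 5**10

print(rna_texts)      # 1048576 (about one million)
print(protein_texts)  # 10240000000000 (about ten trillion)
print(ratio)          # 9765625 (about ten million)

# The gap widens with length: for texts of n letters the ratio is 5**n.
print(count_texts(20, 100) // count_texts(4, 100) == 5 ** 100)  # True
```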
But RNA did come before proteins, and for this reason alone it deserves an honorable place in the pantheon of biological innovation. Without innovations made by the first replicators, we would not be here. And our job would be incomplete without understanding their innovability.
Fortunately, there are many parallels between RNA and proteins that can help us understand RNA innovability. We can organize RNA texts into a hypercubic library—not quite as large as that of proteins, but still formidable—where similar texts are near each other and dissimilar texts are far apart. This library also exists in many dimensions, meaning that its neighborhoods are much larger than in three-dimensional space—near any one text are many others. The meaning of many RNA texts is also expressed in a language of shapes, because RNA chains are highly flexible, like proteins. They bend and twist in space, organizing themselves into elaborate folds, like proteins.
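To get a feel for how crowded such a neighborhood is, one can count the texts that differ from a given RNA text in exactly one letter. The Python sketch below is illustrative only; the example sequence and the helper names are assumptions. For comparison, a corner of an ordinary three-dimensional cube has just three adjacent corners.

```python
from typing import Iterator

RNA_ALPHABET = "ACGU"


def one_letter_neighbors(text: str, alphabet: str = RNA_ALPHABET) -> Iterator[str]:
    """Yield every text that differs from `text` in exactly one position."""
    for i, original in enumerate(text):
        for letter in alphabet:
            if letter != original:
                yield text[:i] + letter + text[i + 1:]


def neighbor_count(length: int, alphabet_size: int) -> int:
    """Each position can change to any of the other letters of the alphabet."""
    return length * (alphabet_size - 1)


neighbors = list(one_letter_neighbors("GGAUCC"))
print(len(neighbors))           # 18 immediate neighbors for a six-letter text
print(neighbor_count(100, 4))   # 300 for a hundred-letter RNA text
print(neighbor_count(100, 20))  # 1900 for a hundred-letter protein text
```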
Unfortunately, the parallels end with the recalcitrance of RNA molecules to reveal their shape. Experiments have traced this shape only for a few hundred RNAs, a paltry number compared to the many thousands of proteins whose form and function we know. Therefore, what we can do for proteins—compare many naturally occurring molecules to map the library—is not yet possible for RNA.
52
Thanks to the Austrian scientist Peter Schuster and his associates, though, the RNA library is not a lost cause. One of the grandfathers of computational biology in Europe, Schuster is now a retired professor at the University of Vienna, where he had taught since the 1970s. A first encounter with Schuster seems to confirm the stereotypes that many Europeans have of Austrians. A jovial man with a generous girth and a wry sense of humor, Schuster would not have been out of place in the traditional Viennese cafés of the last days of the Austro-Hungarian Empire, where formidably well-educated polymaths held forth on everything from psychoanalysis to quantum theory. He is a scientist in that tradition, a purebred intellectual conversant with a broad variety of subjects. Not taking himself too seriously, Schuster opines with a tongue-in-cheek attitude, peppering the gravest discourse with humorous asides. He epitomizes an oft-repeated saying about how Austrians view life and its many challenges: “The situation may be hopeless, but it is never serious.”
There’s a broad mind and an incisive intellect, however, beneath the surface of Schuster’s jovial demeanor. He was among the first to propose how an RNA world might have originated.
53
And his research group developed computer programs that predict an important aspect of an RNA text’s molecular meaning, its secondary structure phenotype.
54
RNA secondary structure is what emerges first when an RNA string folds. As the string twists and bends and curls, some of its nucleotides pair with one another and create short stretches of double helices in the molecule, much like DNA’s famous spiral staircase. The secondary structure is a pattern of multiple such helices connected by stretches of intervening single-stranded text, all formed by a single molecule. Like the sheets and helices of proteins, these helices are the flowers that self-organize into the final bouquet of a three-dimensional fold.
55
Not only was Schuster able to compute RNA’s secondary structures from their nucleotide sequences, but his group’s computer programs were also blazingly fast. They could predict hundreds of these molecular shapes within seconds. (To this day, we cannot do this for the more complex three-dimensional RNA fold.) With programs as fast as this, one can begin to map the RNA library. And even though we are still miles from understanding RNA’s complete fold and function, the secondary structure is very important on its own: If a mutation in the letter sequence of an RNA molecule disrupts its secondary structure, the molecule can no longer fold properly in three dimensions. Secondary structure is essential for the molecular meaning of RNA molecules, just as there can be no bouquet without flowers. And that’s a very good reason to study it.
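To make the idea of computing a secondary structure from a letter sequence concrete, here is a minimal Python sketch. It is not the software developed by Schuster's group. As a stand-in it uses the much simpler textbook scheme of maximizing the number of base pairs (the Nussinov algorithm). The pairing rules, the minimum loop size of three, and the example sequences are all assumptions made for illustration.

```python
# Allowed pairs: Watson-Crick plus the G-U wobble pair (an assumption here).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired letters enclosed by a pair


def fold(seq: str) -> str:
    """Return one dot-bracket structure with the maximum number of base pairs."""
    n = len(seq)
    if n == 0:
        return ""
    # best[i][j] holds the maximum number of pairs within seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: letter i stays unpaired
            for k in range(i + MIN_LOOP + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    after = best[k + 1][j] if k < j else 0
                    score = max(score, 1 + best[i + 1][k - 1] + after)
            best[i][j] = score

    # Trace back through the table to recover one optimal set of pairs.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue  # too short to contain any pair
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + MIN_LOOP + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                after = best[k + 1][j] if k < j else 0
                if best[i][j] == 1 + best[i + 1][k - 1] + after:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    return "".join(structure)


original = "GGGAAAACCC"
neutral_mutant = "GGGACAACCC"     # one letter changed inside the loop
disruptive_mutant = "GGGAAAACCA"  # one letter changed inside the helix

print(fold(original))                             # (((....)))
print(fold(neutral_mutant) == fold(original))     # True
print(fold(disruptive_mutant) == fold(original))  # False
```

Matching parentheses mark paired letters that form a helix, and dots mark unpaired letters. In this toy case, a change in the loop leaves the predicted shape intact, while a change in the helix removes one of its pairs.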
Schuster’s researchers found a bewildering number of potential molecular meanings in the RNA library, all of them expressed as shapes. For example, RNA strings that are merely one hundred letters long can already form 10^23 different shapes. Many natural RNA molecules are much longer, and such longer texts can form many more shapes.
56
What is more, texts with the same shapes are organized much like in the protein library. They form connected networks that reach far through the library, allowing you to revise any one text in little steps, radically, while leaving its molecular meaning unchanged.
57
And just as in the protein library, different neighborhoods are more like medieval villages than cookie-cutter suburbs. Each neighborhood contains many different shapes, and any two neighborhoods do not share many of them.
58
All this hints that innovability in RNA follows the same rules as in proteins. And recent experiments show that this is indeed the case.
In an ingenious experiment performed in the year 2000, Erik Schultes and David Bartel from the Massachusetts Institute of Technology blazed a trail through the RNA library.
59
The experiment started from two short RNA texts with fewer than a hundred letters each. The texts are far apart in the library and differ in many letters, but they are not just any two strings. Both molecules are enzymes (ribozymes, because they are composed of RNA rather than protein). Each of them wiggles into a different three-dimensional shape and catalyzes a different reaction. The first molecule can cleave an RNA string into two pieces, while the second does the exact opposite, joining two RNA strings by fusing their ends with atomic bonds. Let’s call these enzymes the “splitter” and the “fuser.”
If you already had a splitter, and you needed to find a fuser somewhere in the library, would that be easy or hard? And what about the opposite, creating a splitter from a fuser? In other words, can you create a specific molecular innovation from either one of these molecules by exploring the library as evolution would? If you were ignorant about genotype networks, you would think that should be impossible, because the two molecules are far apart. And even if it were possible, it might be exceedingly difficult, since a single misstep that creates a defective molecule spells death in evolution.
Undaunted, Schultes and Bartel started from one of the molecules and walked toward the other, modifying its letter sequence step by step while requiring that each such step preserve the molecule’s function, just as natural selection would demand. They used their chemical knowledge to predict viable steps through the library, manufactured each candidate mutant as an RNA string, and asked whether it could still catalyze the same reaction as its ancestor. If not, they tried a different step.
60
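The logic of such a function-preserving walk can be sketched in Python. This is a toy model under loudly stated assumptions, not a reconstruction of the experiment. The `keeps_function` test (the first and last letters must be able to pair) is a made-up stand-in for a laboratory assay, the five-letter sequences are invented, and candidate steps are tried simply from left to right rather than being chosen with chemical knowledge. The toy also keeps a single function throughout and does not model the switch from one function to the other.

```python
from typing import Callable, List

# Assumed pairing rules for the toy function test (Watson-Crick plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def keeps_function(text: str) -> bool:
    """Hypothetical assay: the molecule 'works' if its two ends can pair."""
    return (text[0], text[-1]) in PAIRS


def neutral_walk(start: str, target: str,
                 is_viable: Callable[[str], bool]) -> List[str]:
    """Walk from `start` toward `target` one letter at a time.

    A step is accepted only if the changed text is still viable. If a
    candidate fails, the next differing position is tried instead. The walk
    stops when no viable step toward the target remains.
    """
    if len(start) != len(target):
        raise ValueError("start and target must have equal length")
    current = start
    path = [current]
    progressed = True
    while progressed:
        progressed = False
        for i, (have, want) in enumerate(zip(current, target)):
            if have == want:
                continue
            candidate = current[:i] + want + current[i + 1:]
            if is_viable(candidate):
                current = candidate
                path.append(current)
                progressed = True
                break  # restart the scan from the new position in the library
    return path


path = neutral_walk("GAAAC", "AUUUU", keeps_function)
print(path)
# ['GAAAC', 'GUAAC', 'GUUAC', 'GUUUC', 'GUUUU', 'AUUUU']
print(all(keeps_function(text) for text in path))  # True
```

In this example, changing the first letter fails the test at every early attempt, so the walk keeps trying other positions. Only after the last letter has changed does the first-letter change become viable, which lets the walk reach the target without ever passing through a defective text.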
What they found may no longer surprise you. Starting from the fuser, they were able to change forty letters in small steps toward the splitter without changing the molecule’s ability to fuse two RNA strings. And starting from the splitter, they could also change about forty letters in small steps toward the fuser without changing its ability to split an RNA molecule. About halfway between the two molecules, something fascinating happened: Fewer than three further steps completely transformed the function of either molecule. They changed the fuser into a splitter and vice versa.
61
Like many good experiments, this one carries more than one powerful message. The first is that many RNA texts can express the molecular meaning of the starting fuser and splitter molecules. Second, trails connect these molecules in the library, and they allow you to find a new meaningful text, even if each step must preserve the old meaning. (Genotype networks make all this possible.) Third, while you walk along one of these trails, the innovation you are searching for will appear at some point in a small neighborhood near you.
The experiment used a single reader to explore the library, not the huge populations that do so in real evolutionary time. And what’s more, this reader did not take blind, random steps, but was guided by the biochemical knowledge of expert scientists—its steps were designed to stay on the genotype network. This left a lingering doubt in my mind. Could genotype networks also help real evolution—blindly evolving RNA populations—innovate? It would take another ten years to find the answer, which came from an evolution experiment in my Zürich laboratory.
Most people think of evolution as glacially slow, unfolding on time scales much longer than our brief life span. While that is true for human evolution, where a mere fifty generations span a thousand years, many other organisms have much shorter generation times, such as E. coli, which reproduces every twenty minutes. Fifty of its generations pass in less than a day. And an RNA molecule can replicate in mere seconds, using the kind of molecular copying machine that replicates DNA.
62
You could fit thousands of its generations in a single day.
Fast-replicating organisms and molecules allow ambitious experiments to reenact evolution in the laboratory. Such laboratory evolution experiments monitor how evolution transforms entire populations over many generations. RNA molecules are especially attractive for such experiments, and for the same reason that they were central to early life. They contain both a genotype that can replicate and mutate, and a selectable molecular phenotype, in a single, extremely compact, evolvable package.