Authors: William Poundstone
IT WILL BE PLAUSIBLE to many that Shannon’s knowledge and vision gave him an edge in picking technology stocks. In the 1950s and 1960s, Shannon stood on the cusp of history. He foresaw the digital revolution and bet his money on it. The average Wall Street analyst, much less the average investor, could not have guessed the future so well as Shannon did.
It is unlikely that this would or should convince a diehard believer in market efficiency. Nearly all of Shannon’s gain came from three smart (lucky?) picks. Three data points do not have much statistical significance. Scientific proof demands repeatability.
Repeatability has been the nub of the broad reappraisal of the efficient market hypothesis (EMH) in the academic literature. Starting in the 1980s, computers and databases allowed finance scholars to winnow historical data for investor biases supposedly demonstrating market inefficiency. They found scores of such biases, each impressive enough to merit a journal article.
Among the “irrational” effects discussed in the literature are the P/E effect (“value stocks” with low price-to-earnings ratios supposedly do better than others), the size effect (small companies have higher returns than large), the January effect (stock markets post higher returns in January), the Monday effect (poor returns on Monday), and even a weather effect (market returns correlate with sunny days).
Few of the reported biases could pass the repeatability test. Once an “effect” was reported, another study would come along, with more data or more realistic assumptions, showing that the original effect was less statistically significant than reported, or never existed at all, or had vanished since the first publication, possibly because people started trying to exploit it.
“I have personally tried to invest money, my clients’ money and my own, in every single anomaly and predictive device that academics have dreamed up,” complained economist and portfolio manager Richard Roll in 1992. “And I have yet to make a nickel on any of these supposed market inefficiencies…If there’s nothing investors can exploit in a systematic way, time in and time out, then it’s very hard to say that information is not being properly incorporated into stock prices.”
Most efficient market economists concede that there are anecdotal cases of egregious market inefficiencies. They shrug them off. Those traders or hedge funds that seem to beat the market are just lucky and will eventually blow up like LTCM or Eifuku. No one truly achieves excess risk-adjusted return.
The other side of the debate has often done a meager job of answering this challenge. Many papers barely address how one might exploit the reported biases. How would you make money off the weather effect, for instance? If the effect is genuine, the weather forecast for Manhattan gives a small edge in predicting that day’s NYSE performance. Okay, you could buy stocks in sunny New York and sell them short in foggy London (if that’s what the forecasts call for). Unlike a good hedge, there is no logical necessity that stocks can’t drop in New York and rise in London, whatever the weather. You could lose out on both ends of the trade. This risk and the large transaction costs (the weather changes every day) make this scheme an unlikely candidate for excess risk-adjusted return.
There is little overlap between the “effects” reported in the literature and those in use by successful arbitrageurs. Most of the studies concern relatively simple stock-picking or market-timing systems, the stuff of investor fads. The few investors who successfully pursue fundamental analysis over extended periods are judges of people as well as P/E ratios. Warren Buffett’s excess return probably resides in what he reads between the lines of balance sheets. This is unlikely to be captured in any model crunching “official” figures from databases.
In a 1984 speech, Buffett asked his listeners to imagine that all 225 million Americans pair off and bet a dollar on the outcome of a coin toss. The one who calls the toss incorrectly is eliminated and pays his dollar to the one who was correct.
The next day, the winners pair off and play the same game with each other, each now betting $2. Losers are eliminated and that day’s winners end up with $4. The game continues with a new toss at doubled stakes each day. After twenty tosses, 215 people will be left in the game. Each will have over a million dollars.
According to Buffett, some of these people will write books on their methods: How I Turned a Dollar into a Million in Twenty Days Working Thirty Seconds a Morning. Some will badger ivory-tower economists who say it can’t be done: “If it can’t be done, why are there 215 of us?” “Then some business school professor will probably be rude enough to bring up the fact that if 225 million orangutans had engaged in a similar exercise, the result would be the same—215 egotistical orangutans with 20 straight winning flips.”
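The arithmetic behind the story is easy to check. The sketch below uses the 225 million starting field from Buffett’s published version of the speech; each round halves the field and doubles each survivor’s stake.

```python
# Buffett's coin-flip tournament: each round eliminates half the
# players and doubles each surviving player's bankroll.
players = 225_000_000  # starting field, per the published speech
stake = 1              # dollars each player bets on day one

for day in range(20):
    players //= 2      # losers are eliminated each day
    stake *= 2         # winners bet their doubled bankroll the next day

print(players)  # 214 -- roughly the "215" survivors in the story
print(stake)    # 1048576 -- each survivor holds over a million dollars
```

Twenty halvings of 225 million leave about 215 players by pure chance, which is the point of the orangutan punch line: no skill is required to produce a handful of perfect records.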
What sort of evidence ought to convince us that someone can pick stocks well enough to beat the market? Every year, the Morningstar ratings identify mutual fund managers who have done much better than the market or their peers. A few of these managers stay near the top of the ratings for many years in a row. Their funds’ ads leave the distinct impression that these track records have predictive power going forward (ignoring the fine print). But as Buffett’s tale suggests, there must inevitably be a small group of very, very lucky managers who achieve very long and impressive track records.
It makes sense to measure track records in decisions rather than years. The more profitable decisions the better. It is also more convincing (less orangutan-like) when outside observers can understand at least some of the logic behind the stock picks. Stock-picking is often subjective. It is based on so many factors that it is hard for an investor, or anyone else, to understand what a fund manager is doing. You are unlikely to convince a skeptic that a manager’s return is not just luck when no one else can understand the logic of his stock picks.
ONE OF THE BEST CASES for beating the stock market involves a scheme called statistical arbitrage. To make money in the market, you have to buy low and sell high. Why not use a computer to tell you which stocks are low and which are high? In concept, that is statistical arbitrage. Fundamental analysts look at scores of factors, many of them numerical, in deciding which stocks to buy. If there is any validity to this process, then it ought to be possible to automate it.
Ed Thorp began pursuing this idea as early as 1979. It emerged as one of the discoveries of what became known as the “Indicators Project” at Princeton-Newport. Jerome Baesel, a former UC Irvine professor whom Thorp had talked into coming to Princeton-Newport full-time, was in charge of the research.
The fundamental analyst usually buys stock to hold for months, years, or decades. The longer you hold a stock, the harder it is to beat the market by much. Say you are convinced that a stock is selling for 80 percent of its “real” value, a nice discount. If the market comes around to your way of thinking in a year’s time, you will be able to sell the stock for a 25 percent profit (on top of any other return: the 25 percentage points are how much you “beat the market” by).
If instead the market takes twenty years to realize that it has undervalued the stock, this slow reappraisal adds only about 1.1 percent to your annual return over those twenty years. The long-term investor who intends to beat the market must find stocks that are seriously undervalued now and must have a crystal ball on the distant future. Both are formidable requirements.
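The 25-percent-versus-1.1-percent comparison falls out of a one-line annualization formula. Here is a quick illustrative check:

```python
# Excess return from buying at a discount to "real" value, as a
# function of how long the market takes to close the gap.
discount_price = 0.80                 # paying 80% of fair value
total_gain = 1 / discount_price - 1   # 25% total gain when the gap closes

def annualized(total_gain, years):
    """Convert a total gain into a compound annual rate."""
    return (1 + total_gain) ** (1 / years) - 1

print(round(annualized(total_gain, 1), 4))   # 0.25   -> 25% if it takes a year
print(round(annualized(total_gain, 20), 4))  # 0.0112 -> about 1.1% a year over 20
```

The same 25 percent gain is a handsome edge realized in one year and a negligible one spread over twenty, which is why the short term matters so much in what follows.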
Thorp and Baesel focused instead on the short term. They had the software pick out the stocks that had gone up or down the most, percentage-wise, in the previous two weeks, adjusted for dividends and stock splits. These were companies that had surprised the market with news, good or bad. They found that the up stocks had a strong tendency to fall back in the near term, while the down stocks tended to rise.
This is exactly the opposite of what “momentum investors” bet on happening. It accords well with the truism that the market overreacts to good news, bad news—and sometimes to no news at all. Then the emotion fades and the pendulum swings back.
Thorp and Baesel experimented with portfolios in which they bought the “most down” stocks and sold short the “most up.” As long as they bought enough stocks, this provided a decent hedge against general market movements. They concluded they could make about a 20 percent annual return. Ironically, that was the stumbling block. Princeton-Newport was already making that and more with its other trades. (The years 1980–82 were an especially hot streak, with annual returns of 28, 29, and 30 percent after the 20 percent fees had been deducted.) The returns of the most-up, most-down portfolios were also more variable than Princeton-Newport’s other trades.
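A minimal sketch of the most-up, most-down idea might look like the following. Everything here (the function name, the tickers, the two-stock cutoff) is hypothetical; the text does not describe Thorp and Baesel’s actual code or parameters.

```python
# Toy mean-reversion sketch: rank stocks by trailing two-week return,
# buy the biggest losers, short the biggest winners, in equal dollars.
def most_up_most_down(two_week_returns, n=2):
    """two_week_returns: dict of ticker -> split/dividend-adjusted return."""
    ranked = sorted(two_week_returns, key=two_week_returns.get)
    longs = ranked[:n]     # most-down stocks: bet on a bounce back
    shorts = ranked[-n:]   # most-up stocks: bet on a pullback
    # Equal long and short exposure hedges against broad market moves.
    return {t: +1.0 / n for t in longs} | {t: -1.0 / n for t in shorts}

returns = {"AAA": 0.12, "BBB": -0.09, "CCC": 0.03,
           "DDD": -0.15, "EEE": 0.07, "FFF": -0.01}
portfolio = most_up_most_down(returns)
print(portfolio)  # long DDD and BBB, short EEE and AAA, weights of 0.5 each
```

Because the long and short sides are dollar-balanced, the portfolio’s net exposure is zero; its profit depends on the overreaction reversing, not on which way the market moves.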
Brilliant as the concept was, Princeton-Newport had no use for it. The Indicators Project was quietly tabled.
In 1982 or 1983, Jerry Bamberger independently got almost the same idea. Bamberger worked for Morgan Stanley in New York. He came up with a most-up, most-down system that was apparently superior to the discarded one at Princeton-Newport, for its returns were steadier. Bamberger began trading with it for Morgan Stanley in 1983. The system worked, and Morgan Stanley expanded it massively under Bamberger’s boss, Nunzio Tartaglia. Tartaglia got much of the credit.
Feeling unappreciated, Bamberger quit his job. He then came across an ad offering to bankroll people who had promising low-risk trading strategies. The ad had been placed by Princeton-Newport Partners.
Bamberger met with Thorp in Newport Beach and explained his system there. Bamberger’s system reduced risk by dividing the stocks into industry groups. It had counterbalancing long and short positions in each industry group. Thorp concluded that it was a real improvement and agreed to fund Bamberger.
They began testing the system in Newport Beach. Bamberger was a chain-smoker. Thorp, a competitive runner who measured his pulse daily, had a policy of not hiring smokers. They compromised by letting Bamberger go outside for cigarettes. Bamberger was also forbidden to go into the computer room, whose gigabyte hard drives, each the size of a washing machine, were reputedly vulnerable to the tiniest airborne mote.
Thorp noticed that Bamberger brought in the same brown-bagged lunch day after day. “How often do you have a tuna salad sandwich for lunch?” he asked.
“Every day for the last six years,” Bamberger answered.
Bamberger’s trading system worked well in computer simulations. Thorp and Regan set up a new venture named BOSS Partners, for Bamberger plus Oakley Sutton Securities. Based in New York, BOSS began managing money for Princeton-Newport, $30 to $60 million. It earned 25 to 30 percent annualized in 1985. This return eroded over the next couple of years. By 1987 it was down to 15 percent, no longer competitive with Princeton-Newport’s other opportunities.
The problem was apparently competition. Tartaglia continued to expand Morgan Stanley’s statistical arbitrage operation. By 1988 Tartaglia’s team was buying and selling $900 million worth of stock. Bamberger would often be trying to buy the same temporarily bargain-priced stock as Morgan Stanley, driving up the price. This cut into the profit.
Bamberger, who had made a good deal of money, decided to retire. BOSS was closed down. Finally, according to stories, Morgan Stanley’s operation suffered a substantial loss. The bank closed down its statistical arbitrage business too.
Thorp continued to tinker with statistical arbitrage. He replaced Bamberger’s division by industry groups with a more flexible “factor analysis” system. The system analyzed stocks by how their price moves correlated with factors such as the market indexes, inflation, the price of gold, and so on. This gave better control of risk. Princeton-Newport managed to launch the improved system, called STAR (short for “statistical arbitrage”), the month after Giuliani’s raid on the Princeton offices. STAR made a return of 25 percent, or 20 percent after fees. Then the partnership dissolved and the idea was put aside for a third time.
After Princeton-Newport closed, Thorp took some time off. He was out of the business of investing other people’s money for about a year. Like a compulsive gambler, he could not stay away long. He discovered some irresistible opportunities in Japanese warrants. By late 1990, he was trading them.
One of Thorp’s former investors suggested that he start a new statistical arbitrage operation. Thorp decided to start a new hedge fund, Ridgeline Partners, for this purpose. “I had an interest list that had accumulated,” Thorp said, of people “looking to invest in anything I might be doing. So I just made phone calls and before the day was done, we were ‘full.’” Ridgeline Partners began business in August 1994.
Ridgeline’s capacity was capped at about $300 million. By expansive 1990s standards, that was only a midsize hedge fund. Thorp wanted to make sure he could keep oversight on his staff. He also wanted the fund small enough that its own actions did not adversely affect returns. As it was, Ridgeline traded about 4 million shares per trading day. It was routinely accounting for something like half of a percent of the NYSE volume.
The operation was highly automated. On a typical morning, when Thorp first logged onto his trading computers, it was three hours later in New York and something like a million shares had already been traded. Steve Mizusawa had joined the new venture. It was Mizusawa’s job to scan the Bloomberg news for any surprise announcements that could upset the trades. Because of their unpredictability, mergers, spin-offs, and reorganizations were bad for the scheme. At the announcement of such news, Mizusawa put the affected companies on a “restricted list” of stocks to avoid in new trades.
According to Thorp, each trade had about a half-percent edge. Half of that went to transaction costs. The remaining quarter-of-a-percent profit on each trade added up to handsome returns. Ridgeline did even better than Princeton-Newport, averaging 18 percent per year after fees from 1994 to 2002.
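As a rough illustration of how a quarter-percent net edge becomes a double-digit annual return: the turnover figure below is merely implied by the numbers in the text, not one Thorp reports.

```python
# Back-of-envelope: a tiny per-trade edge, applied many times, compounds
# into large annual returns. Illustrative assumptions only.
gross_edge = 0.005             # about half a percent per trade (per the text)
costs = gross_edge / 2         # half the edge lost to transaction costs
net_edge = gross_edge - costs  # 0.0025 profit per dollar traded

annual_return = 0.18           # Ridgeline's 1994-2002 average after fees
turnovers = annual_return / net_edge
print(net_edge)   # 0.0025
print(turnovers)  # ~72 -> capital recycled roughly 72 times a year
```

This is the statistical-arbitrage business model in miniature: the edge on any single trade is minuscule, so the return comes from enormous volume, which is why Ridgeline was trading millions of shares a day.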
As a demonstration that “fat tails” need not be fatal, in 1998, the year of the Russian default, Ridgeline Partners made a return of 47 percent after fees.
Ridgeline had much competition. Among the most successful operations are Ken Griffin’s Citadel Investment Group, James Simons’s Medallion Fund, and D. E. Shaw and Co. Each is larger than Ridgeline was, managing billions of dollars. The managers are more or less in the Thorp mold: Simons is a former SUNY Stony Brook mathematician, Shaw a Stanford-educated computer scientist, and Griffin a Harvard physics undergraduate who began trading in his dorm room. Frank Meyer, one of Princeton-Newport’s early investors, set up Griffin’s hedge fund.
Medallion Fund’s employees include astrophysicists, number theorists, computer scientists, and linguists. Job applicants are expected to give a talk on their scientific research. “The advantage scientists bring into the game,” explained Simons, “is less their mathematical or computational skills than their ability to think scientifically. They are less likely to accept an apparent winning strategy that might be a mere statistical fluke.”
Each statistical arbitrage operation competes against the others to scoop up the so-called free money created by market inefficiency. All successful operations revise their software constantly to keep pace with changing markets and the changing nature of their competition.
The inexplicable aspect of Thorp’s achievement was his continuing ability to discover new market inefficiencies, year after year, as old ones played out. This is a talent, like discovering new theorems or jazz improvisations. Statistical arbitrage is nonetheless a few degrees easier to understand than the intuitive trading of more conventional portfolio managers. It is an algorithm, the trades churned out by lines of computer code. The success of statistical arbitrage operations makes a case that there are persistent classes of market inefficiencies and that Kelly-criterion-guided money management can use them to achieve higher-than-market return without ruinous risk. For that reason, funds like Ridgeline, Medallion, and Citadel probably pose a clearer challenge to efficient market theorists than even Berkshire Hathaway.
In May 1998 Thorp reported that his investments had grown at an average 20 percent annual return (with 6 percent standard deviation) over 28.5 years. “To help persuade you that this may not be luck,” Thorp wrote, “I estimate that…I have made $80 billion worth of purchases and sales (‘action,’ in casino language) for my investors. This breaks down into something like one and a quarter million individual ‘bets’ averaging about $65,000 each, with on average hundreds of ‘positions’ in place at any one time. Over all, it would seem to be a moderately ‘long run’ with a high probability that the excess performance is more than chance.”
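Thorp’s claim can be put in standard-error terms. The 10 percent market benchmark below is an illustrative assumption (the text gives only Thorp’s own mean and standard deviation); the point is how many standard errors the track record sits above any plausible benchmark.

```python
# Rough significance check: 20% mean annual return, 6% standard
# deviation, 28.5 years. Benchmark of 10% is an assumed figure.
from math import sqrt

mean, sd, years = 0.20, 0.06, 28.5
benchmark = 0.10
t_stat = (mean - benchmark) / (sd / sqrt(years))
print(round(t_stat, 1))  # ~8.9 standard errors above the assumed benchmark
```

By the coin-toss logic of Buffett’s story, a record this many standard errors from the mean is vanishingly unlikely to belong to one of the lucky orangutans.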