How Music Got Free
Author: Stephen Witt
The institute was a division of the Fraunhofer Society, a massive state-run research organization with dozens of campuses across the country—Germany’s answer to Bell Labs. Fraunhofer allocated taxpayer money toward promising research across a wide variety of academic disciplines, and, as the research matured, brokered commercial relationships with large consumer industrial firms. For a stake in the future revenues of Brandenburg’s ideas, Fraunhofer offered state-of-the-art supercomputers, high-end acoustic equipment, professional intellectual property expertise, and skilled engineering manpower.
The last was critical. Brandenburg’s method was complex, and required several computationally demanding mathematical operations to be conducted simultaneously. 1980s computing technology was barely up to the task, and algorithmic efficiency was key. Brandenburg needed a virtuoso, a caffeine-addled superstar who could translate graduate-level mathematical concepts into flawless computer code. At Fraunhofer he found his man: a 26-year-old computer programmer by the name of Bernhard Grill.
Grill was shorter than Brandenburg and his manner was far more calm. His face was broad and friendly and he wore his sandy hair a little long. He spoke more loudly than Brandenburg, with more passion, and conversations with him were composed and natural. He told jokes, too, jokes that were—well, not all that funny either, but certainly better than Brandenburg’s.
In the world of audio, Grill stood out, for it was possible to imagine him as something other than an engineer. Like Brandenburg, he was Bavarian, but his attitude was more bohemian. He had a relaxed, wonkish nature to him, and was the sort of person who, had he lived in America, might have favored sandals and a Hawaiian shirt. Perhaps it was his background. While Brandenburg’s father was himself a professor, and most of the other Fraunhofer researchers hailed from the upper middle class, Grill’s father had worked in a factory. For Brandenburg, a university education had been a given, practically a birthright, but for Grill it had real meaning.
In his own way he had rebelled against the typisch Deutsch mentality. His original passion had been music. At a young age Grill had taken up the trumpet, and by his teens he was practicing six hours a day. During a brief period in his early 20s he had played professionally in a nine-piece swing band. When the economic realities of that career choice became apparent, he’d returned to engineering, and ended up studying computers. But music remained close to his heart, and over the years he amassed an enormous, eclectic collection of recorded music from a variety of obscure genres. His other hobby was building loudspeakers.
Brandenburg and Grill were joined by four other Fraunhofer researchers. Heinz Gerhäuser oversaw the institute’s audio research group; Harald Popp was a hardware specialist; Ernst Eberlein was a signal processing expert; Jürgen Herre was another graduate student whose mathematical prowess rivaled Brandenburg’s own. In later years this group would refer to themselves as “the original six.”
Beginning in 1987, they took on the full-time task of creating commercial products based on Brandenburg’s patent. The group saw two potential avenues for development. First, Brandenburg’s compression algorithm could be used to “stream” music—that is, send it directly to the user from a central server, as Seitzer had envisioned. Alternatively, Brandenburg’s compression algorithm could be used to “store” music—that is, create replayable music files that the user would keep on a personal computer. Either way, size mattered, and getting the compression ratio to 12 to 1 was the key.
It was slow going. Computing was still emerging from its homebrew origins, and the team built most of its equipment by hand. The lab was a sea of cables, speakers, signal processors, CD players, woofers, and converters. Brandenburg’s algorithm had to be coded directly onto programmable chips, a process that could take days. Once a chip was created, the team would use it to compress a ten-second sample from a compact disc, then compare it with the original to see if they could hear the difference. When they could—which, in the early days, was almost always—they refined the algorithm and tried again.
They started at the top, with the piccolo, then worked down the scale. Grill, who had obsessed over acoustics since childhood, could see at once that the compression technology was far from being marketable. Brandenburg’s algorithm generated a variety of unpredictable errors, and at times it was all Grill could do to take inventory. Sometimes, the encoding was “muddy,” as if the music were being played underwater. Sometimes it “hissed,” like static from an AM radio. Sometimes there was “double-speak,” as if the same recording had been overlaid twice. Worst of all was “pre-echo,” a peculiar phenomenon where ghostly remnants of musical phrases popped up several milliseconds early.
Brandenburg’s math was elegant, even beautiful, but it couldn’t fully account for the messy reality of perception. To truly model human hearing, they needed human test subjects. And these subjects required training to understand the vocabulary of failure as well as Grill did. And once this expertise was established, it would have to be submitted to thousands upon thousands of controlled, randomized, double-blind trials.
Grill approached this time-consuming endeavor with enthusiasm. He was what they called a “golden ear”: he could distinguish between microtones and pick up on frequencies normally available only to children and dogs. He approached the sense of hearing the way a perfumer approached the sense of smell, and this sharpened sense allowed him to name and grade certain sensory phenomena—certain aspects of reality, really—that others could never know.
Charged with selecting the reference material, Grill combed his massive compact disc archive for every conceivable form of music: funk, jazz, rock, R&B, metal, classical—every genre except rap, which he disliked. He wanted to throw everything he could find at Brandenburg’s algorithm, to be sure it could handle every conceivable case. Funded by Fraunhofer’s generous research budget, Grill went beyond music to become a collector of exotic noise. He found recordings of fast talkers with difficult accents. He found recordings of birdcalls and crowd noise. He found recordings of clacking castanets and mistuned harpsichords. His personal favorite came from a visit to Boeing headquarters in Seattle, where, in the gift shop, he found a collection of audio samples from roaring jet engines.
Under Grill’s direction, Fraunhofer also purchased several pairs of thousand-dollar Stax headphones. Made in Japan, these “electrostatic earspeakers” were the size of bricks and required their own dedicated amplifiers. They were impractical and expensive, but Grill considered the Stax to be the finest piece of equipment in the history of audio. They revealed every imperfection with grating clarity, and the ability to isolate these digital glitches spurred a cycle of continuous improvement.
Like a shrinking ray, the compression algorithm could target different output sizes. At half size, the files sounded decent. At quarter size, they sounded OK. In March 1988, Brandenburg isolated a recording of a piano solo, then dialed the encoding ratio as low as he dared—all the way down to Seitzer’s crazy stretch goal of one-twelfth CD size. The resulting encoding was lousy with errors. Brandenburg would later say the pianist sounded “drunk.” But even so, this experiment in uneasy listening gave him confidence, and he began to see for the first time how Seitzer’s vision might be achieved.
Increases in processing power spurred progress. Within a year Brandenburg’s algorithm was handling a wide variety of recorded music. The team hit a milestone with the 1812 Overture, then another with Tracy Chapman, then another with a track by Gloria Estefan (Grill was on a Latin kick). In late 1988, the team made its first sale, and shipped a hand-built decoder to the first ever end user of mp3 technology: a tiny radio station run by missionaries on the remote Micronesian island of Saipan.
But one audio source was proving intractable: what Grill, with his imperfect command of English, called “the lonely voice.” (He meant “lone.”) Human speech could not, in isolation, be psychoacoustically masked. Nor could you use Huffman’s pattern recognition approach—the essence of speech was its dynamic nature, its plosives and sibilants and glottal stops. Brandenburg’s shrinking algorithm could handle symphonies, guitar solos, cannons, even “Oye Mi Canto,” but it still couldn’t handle a newscast.
Stuck, Brandenburg isolated samples of “lonely” voices. The first was a recording of a difficult German dialect that had plagued audio engineers for years. The second was a snippet of Suzanne Vega singing the opening bars of “Tom’s Diner,” her 1987 radio hit. Perhaps you remember the a cappella intro to “Tom’s Diner.” It goes like this:
Dut dut duh dut
Dut dut duh dut
Dut dut duh dut
Dut dut duh dut
Vega had a beautiful voice, but on the early stereo encodings it sounded as if there were rats scratching at the tape.
In 1989, Brandenburg defended his thesis and was awarded his PhD. He then took the voice samples with him on a fellowship to AT&T’s Bell Labs in Murray Hill, New Jersey. There, he worked with James Johnston, a specialist in voice encoding.
Johnston was the Newton to Brandenburg’s Leibniz—independently, he had hit upon an identical mathematical approach to psychoacoustic modeling, at almost exactly the same time. After an initial period spent marking territory, the two decided to cooperate. Throughout 1989, listening tests continued in parallel in Erlangen and Murray Hill, but the American test subjects proved less patient than the Germans. After listening to the same rat-eaten, four-second sample of “Tom’s Diner” several hundred times, the volunteers at Bell Labs revolted, and Brandenburg was forced to finish the experiment on his own. He was there in New Jersey, listening to Suzanne Vega, when the Berlin Wall came down.
Johnston was impressed by Brandenburg. He’d spent his life around academic researchers and was accustomed to brilliance, but he’d never seen anybody work so hard. Their collaboration spurred several breakthroughs, and soon the scratching rats were banished. In early 1990, Brandenburg returned to Germany with a nearly finished product in hand. Many compressed samples now revealed a state of perfect “transparency”: even to a discriminating listener like Grill, using the best equipment, they were indistinguishable from the original compact discs.
Impressed, AT&T officially graced the technology with its imprimatur and a modicum of corporate funding. Thomson, a French consumer electronics concern, also began to provide money and technical support. Both firms were seeking an edge in psychoacoustics, as this long-ignored academic discipline was suddenly white hot. Research teams from Europe, Japan, and the United States had been working on the same problem, and other large corporations were jockeying for position. Many had thrown their weight behind Fraunhofer’s better-established competitors. Seeking to mediate, the Moving Picture Experts Group (MPEG)—the standards committee that even today decides which technology makes it to the consumer marketplace—convened a contest in Stockholm in June 1990 to conduct formalized listening tests for the competing methods.
As the ’90s opened, MPEG was preparing for a decade of disruption, shaping technological standards for near-future technologies like high-definition television and the digital video disc. Being moving picture experts, the committee had first focused exclusively on video quality. Audio encoding problems were an afterthought, one they’d tackled only after Brandenburg pointed out that there was no longer much of a market for silent movies. (This was the sort of joke that Brandenburg liked to make.)
An MPEG endorsement might mean a fortune in licensing fees, but Brandenburg knew it would be tough to get.
The Stockholm contest was to be graded against ten audio benchmarks: an Ornette Coleman solo, the Tracy Chapman song “Fast Car,” a trumpet solo, a glockenspiel, a recording of fireworks, two separate bass solos, a ten-second castanet sample, a snippet of a newscast, and a recording of Suzanne Vega performing “Tom’s Diner.” (The last was suggested by Fraunhofer.) The judges were neutral participants, selected from a group of Swedish graduate students. And, as MPEG needed undamaged ears that could still hear high-pitched frequencies, the evaluators skewed young.
Fourteen different groups submitted entries to the MPEG trials—the high-stakes version of a middle school science fair. On the eve of the contest, the competing groups conducted informal demonstrations. Brandenburg was confident his group would win. He felt that access to Zwicker’s seminal research, still untranslated from German, gave him an insurmountable edge.
The next day a room full of fair-haired, clear-eared Scandinavian virgins spent the morning listening to “Fast Car” ripped 14 different ways. The listeners scored the results for sound quality on a five-point scale. After tabulating the answers, MPEG announced the results—it was a tie! At the top was Fraunhofer, locked in a statistical dead heat with a rival group called MUSICAM. No one else was close.
Fraunhofer’s strong showing in the contest was unexpected. They were a dark horse candidate from a research institution, a bunch of graduate students competing against established corporate players.
MUSICAM was more representative of the typical MPEG contest winner—a well-funded consortium of inventors from four different European universities, with deep ties to the Dutch corporation Philips, which held the patents on the compact disc. MUSICAM also had several German researchers on staff, and Brandenburg suspected this was not a coincidence. They’d had access to Zwicker’s untranslated research, too.
MPEG had not anticipated a tie, and had not made provisions to break one. Fraunhofer’s approach provided better audio quality with less data, but MUSICAM’s required less processing power. Brandenburg felt this disparity worked in his favor, as computer processing speed improved with each new chip cycle, and doubled every 24 months or so. Improving bandwidth was more difficult, as it required digging up city streets and replacing thousands of miles of cable. Thus, Brandenburg felt, MPEG should look to conserve bandwidth rather than processing cycles, and he repeatedly made this argument to the audio committee. But he felt he was being ignored.