One of the company’s initial projects involved retrieving the information necessary to update a collection of about 40,000 records. Previously, the corporate client had performed this process annually using an in-house staff at a cost of nearly $4 per record. After switching to the WorkFusion platform, the client was able to update the records monthly at a cost of just 20 cents each. WorkFusion has found that, as the system’s machine learning algorithms incrementally automate the process further, costs typically drop by about 50 percent after one year and still another 25 percent after a second year of operation.13
Cognitive Computing and IBM Watson
In the fall of 2004, IBM executive Charles Lickel had dinner with a small team of researchers at a steakhouse near Poughkeepsie, New York. Members of the group were taken aback when, at precisely seven o’clock, people suddenly began standing up from their tables and crowding around a television in the bar area. It turned out that Ken Jennings, who had already won more than fifty straight matches on the TV game show Jeopardy!, was once again attempting to extend his historic winning streak. Lickel noticed that the restaurant’s patrons were so engaged that they abandoned their dinners, returning to finish their steaks only after the match concluded.14
That incident, at least according to many recollections, marked the genesis of the idea to build a computer capable of playing—and beating the very best human champions at—Jeopardy!*
IBM had a long history of investing in high-profile projects called “grand challenges” that showcased the company’s technology while delivering the kind of organic marketing buzz that just can’t be purchased at any price. In a previous grand challenge, more than seven years earlier, IBM’s Deep Blue computer had defeated world chess champion Garry Kasparov in a six-game match—an event that forever anchored the IBM brand to the historic moment when a machine first achieved dominance in the game of chess. IBM executives wanted a new grand challenge that would captivate the public and position the company as a clear technology leader—and, in particular, combat any perception that the information technology innovation baton had passed from Big Blue to Google or to start-up companies emerging out of Silicon Valley.
As the idea for a Jeopardy!-based grand challenge that would culminate in a televised match between the best human competitors and an IBM computer began to gain traction with the company’s top managers, the computer scientists who would have to actually build such a system initially pushed back aggressively. A Jeopardy! computer would require capabilities far beyond anything that had been demonstrated previously. Many researchers feared that the company risked failure or, even worse, embarrassment on national television.
Indeed, there was little reason to believe that Deep Blue’s triumph at chess would be extensible to Jeopardy! Chess is a game with precise rules that operate within a strictly limited domain; it is almost ideally suited to a computational approach. To a significant extent, IBM succeeded simply by throwing powerful, customized hardware at the problem. Deep Blue was a refrigerator-sized system packed with processors that were designed specifically for playing chess. “Brute force” algorithms leveraged all that computing power by considering every conceivable move given the current state of the game. Then for each of those possibilities, the software looked many moves ahead, weighing potential actions by both players and iterating through countless permutations—a laborious process that nearly always produced the optimal course of action. Deep Blue was fundamentally an exercise in pure mathematical calculation; all the information the computer needed to play the game was provided in a machine-friendly format it could process directly. There was no requirement for the machine to engage with its environment like a human chess player.
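For readers who want a concrete picture of this kind of brute-force search, the sketch below shows a minimal game-tree (minimax) routine in Python. It illustrates the general technique only, not IBM’s code: the legal_moves, apply_move, and evaluate functions are hypothetical placeholders for the chess-specific logic that Deep Blue implemented in custom hardware.

```python
# Toy illustration of brute-force game-tree search ("look many moves ahead,
# weighing potential actions by both players"). The callables `legal_moves`,
# `apply_move`, and `evaluate` are hypothetical placeholders for game-specific logic.

def minimax(state, depth, maximizing, legal_moves, apply_move, evaluate):
    """Search `depth` moves ahead, assuming both players pick their best option."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state), None

    best_move = None
    if maximizing:
        best_score = float("-inf")
        for move in moves:
            score, _ = minimax(apply_move(state, move), depth - 1, False,
                               legal_moves, apply_move, evaluate)
            if score > best_score:
                best_score, best_move = score, move
    else:
        best_score = float("inf")
        for move in moves:
            score, _ = minimax(apply_move(state, move), depth - 1, True,
                               legal_moves, apply_move, evaluate)
            if score < best_score:
                best_score, best_move = score, move
    return best_score, best_move
```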
Jeopardy! presented a dramatically different scenario. Unlike chess, it is essentially open-ended. Nearly any subject that would be accessible to an educated person—science, history, film, literature, geography, and popular culture, to name just a few—is fair game. A computer would also face an entire range of daunting technical challenges. Foremost among these was the need to comprehend natural language: the computer would have to receive information and provide its responses in the same format as its human competitors. The hurdle for succeeding at Jeopardy! is especially high because the show has to be not just a fair contest but also an engaging form of entertainment for its millions of television viewers. The show’s writers often intentionally weave humor, irony, and subtle plays on words into the clues—in other words, the kind of inputs that seem almost purposely designed to elicit ridiculous responses from a computer. As an IBM document describing the Watson technology points out: “We have noses that run, and feet that smell. How can a slim chance and a fat chance be the same, but a wise man and a wise guy are opposites? How can a house burn up as it burns down? Why do we fill in a form by filling it out?”15
A Jeopardy! computer would have to successfully navigate routine language ambiguities of that type while also exhibiting a level of general understanding far beyond what you’d typically find in computer algorithms designed to delve into mountains of text and retrieve relevant answers. As an example, consider the clue “Sink it & you’ve scratched.” That clue was presented in a show televised in July 2000 and appeared on the top row of the game board—meaning that it was considered to be very easy. Try searching for that phrase using Google, and you’ll get page after page of links to web pages about removing scratches from stainless-steel kitchen sinks. (That’s assuming you exclude the exact match on a website about past Jeopardy! matches.) The correct response—“What is the cue ball?”—completely eludes Google’s keyword-based search algorithm.*
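A toy example makes the limitation concrete. The scorer below is a hypothetical illustration rather than a description of Google’s actual ranking system: it simply counts the words a document shares with the clue, and on that measure a page about polishing kitchen sinks beats a page about billiards even though only the latter contains the answer.

```python
# Toy keyword-overlap scorer: ranks documents by how many clue words they share.
# The two sample "documents" are invented for illustration only.

def keyword_score(query: str, document: str) -> int:
    query_words = set(query.lower().replace("&", "and").split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words)

clue = "Sink it & you've scratched"
docs = {
    "kitchen": "how to remove scratches from a stainless steel kitchen sink",
    "billiards": "in pool, pocketing the cue ball is called a scratch",
}

for name, text in docs.items():
    print(name, keyword_score(clue, text))
# The kitchen page shares the word "sink", while the billiards page shares
# nothing verbatim -- even though it is the one that holds the right answer.
```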
All these challenges were well understood by David Ferrucci, the artificial intelligence expert who eventually assumed leadership of the team that built Watson. Ferrucci had previously managed a small group of IBM researchers focused on building a system that could answer questions provided in natural language format. The team entered their system, which they named “Piquant,” in a contest run by the National Institute of Standards and Technology—the same government agency that sponsored the machine translation contest in which Google prevailed. In the contest, the competing systems had to churn through a defined set of about a million documents and come up with the answers to questions, and they were subject to no time limit at all. In some cases, the algorithms would grind away for several minutes before returning an answer.16
This was a dramatically easier challenge than playing Jeopardy!, where the clues could draw on a seemingly limitless body of knowledge and where the machine would have to generate consistently correct responses within a few seconds in order to have any chance against top human players. Piquant (as well as its competitors) was not only slow; it was also inaccurate. The system was able to answer questions correctly only about 35 percent of the time—not an appreciably better success rate than you could get by simply typing the question into Google’s search engine.17
When Ferrucci’s team tried to build a prototype Jeopardy!-playing system based on the Piquant project, the results were uniformly dismal. The idea that Piquant might someday take on a top Jeopardy! competitor like Ken Jennings seemed laughable. Ferrucci recognized that he would have to start from scratch—and that the project would be a major undertaking spanning as much as half a decade. He received the green light from IBM management in 2007 and set out to build, in his words, “the most sophisticated intelligence architecture the world has ever seen.”18 To do this, he drew on resources from throughout the company and put together a team consisting of artificial intelligence experts from within IBM as well as from top universities, including MIT and Carnegie Mellon.19
Ferrucci’s team, which eventually grew to include about twenty researchers, began by building a massive collection of reference information that would form the basis for Watson’s responses. This amounted to about 200 million pages of information, including dictionaries and reference books, works of literature, newspaper archives, web pages, and nearly the entire content of Wikipedia. Next they collected historical data for the Jeopardy! quiz show. Over 180,000 clues from previously televised matches became fodder for Watson’s machine learning algorithms, while performance metrics from the best human competitors were used to refine the computer’s betting strategy.20
Watson’s development required thousands of separate algorithms, each geared toward a specific task—such as searching within text; comparing dates, times, and locations; analyzing the grammar in clues; and translating raw information into properly formatted candidate responses.
Watson begins by pulling apart the clue, analyzing the words, and attempting to understand what exactly it should look for. This seemingly simple step can, in itself, be a tremendous challenge for a computer. Consider, for example, a clue that appeared in a category entitled “Lincoln Blogs” and was used in training Watson: “Secretary Chase just submitted this to me for the third time; guess what, pal. This time I’m accepting it.” In order to have any chance at responding correctly, the machine would first need to understand that the initial instance of the word “this” acts as a placeholder for the answer it should seek.21
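A crude way to picture this first step is the toy “focus” detector sketched below. Watson’s actual clue analysis relied on full syntactic and semantic parsing; the simple pattern match here is only a hypothetical stand-in meant to show what “finding the placeholder” means.

```python
import re

# Toy "focus" detector: find the pronoun that stands in for the sought answer.
# A hypothetical simplification; Watson's real clue analysis used full parsing.

def find_focus(clue: str):
    match = re.search(r"\b(this|these|it)\b", clue, flags=re.IGNORECASE)
    return match.group(0) if match else None

clue = ("Secretary Chase just submitted this to me for the third time; "
        "guess what, pal. This time I'm accepting it.")
print(find_focus(clue))  # prints "this" -- the first occurrence, i.e. the placeholder
```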
Once it has a basic understanding of the clue, Watson simultaneously launches hundreds of algorithms, each of which takes a different approach as it attempts to extract a possible answer from the massive corpus of reference material stored in the computer’s memory. In the example above, Watson would know from the category that “Lincoln” is important, but the word “blogs” would likely be a distraction: unlike a human, the machine wouldn’t comprehend that the show’s writers were imagining Abraham Lincoln as a blogger.
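That fan-out of independent answer-generating strategies can be sketched roughly as follows. The three generators and their outputs are invented placeholders; Watson ran hundreds of far more sophisticated components against its 200-million-page corpus.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of launching many independent candidate-answer generators at once and
# pooling their output. The generators below are hypothetical placeholders.

def title_lookup(clue):    return ["resignation", "Treasury"]
def passage_search(clue):  return ["resignation", "Salmon P. Chase"]
def date_reasoner(clue):   return ["1864"]

def generate_candidates(clue, generators):
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda gen: gen(clue), generators)
    candidates = set()
    for result in results:
        candidates.update(result)
    return candidates

clue = "Secretary Chase just submitted this to me for the third time..."
print(generate_candidates(clue, [title_lookup, passage_search, date_reasoner]))
```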
As the competing search algorithms reel in hundreds of possible answers, Watson begins to rank and compare them. One technique used by the machine is to plug the potential answer into the original clue so that it forms a statement, and then go back out to the reference material and look for corroborating text. So if one of the search algorithms manages to come up with the correct response “resignation,” Watson might then search its dataset for a statement something like “Secretary Chase just submitted resignation to Lincoln for the third time.” It would find plenty of close matches, and the computer’s confidence in that particular answer would rise. In ranking its candidate responses, Watson also relies on reams of historical data; it knows precisely which algorithms have the best track records for various types of questions, and it listens far more attentively to the top performers. Watson’s ability to rank correctly worded natural language answers and then determine whether or not it has sufficient confidence to press the Jeopardy! buzzer is one of the system’s defining characteristics, and a quality that places it on the frontier of artificial intelligence. IBM’s machine “knows what it knows”—something that comes easily to humans but eludes nearly all computers when they delve into masses of unstructured information intended for people rather than machines.
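The ranking step described above, substituting a candidate back into the clue, counting corroborating passages, weighting each source algorithm by its historical accuracy, and buzzing only when confidence clears a threshold, can be sketched roughly as follows. The corpus snippets, weights, and threshold are all invented for illustration.

```python
# Rough sketch of the ranking step: count corroborating passages for each
# candidate, weight the score by the generating algorithm's past accuracy,
# and only "buzz" when the weighted confidence clears a threshold.
# Corpus snippets, weights, and threshold are invented for illustration.

CORPUS = [
    "Secretary Chase submitted his resignation to Lincoln for the third time",
    "Lincoln finally accepted Chase's resignation in 1864",
    "Chase later served on the Supreme Court",
]

# Hypothetical historical accuracy of each candidate-generating algorithm.
ALGORITHM_WEIGHTS = {"passage_search": 0.8, "title_lookup": 0.5, "date_reasoner": 0.3}

BUZZ_THRESHOLD = 1.0


def evidence_score(candidate: str) -> int:
    """Count corpus passages that mention the candidate answer."""
    return sum(candidate.lower() in passage.lower() for passage in CORPUS)


def rank_candidates(candidates):
    """Return (candidate, weighted score) pairs, best first."""
    scored = [
        (answer, evidence_score(answer) * ALGORITHM_WEIGHTS.get(source, 0.1))
        for answer, source in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [("resignation", "passage_search"), ("1864", "date_reasoner")]
best_answer, confidence = rank_candidates(candidates)[0]
if confidence >= BUZZ_THRESHOLD:
    print(f"Buzz with: What is {best_answer}? (confidence {confidence:.2f})")
else:
    print("Not confident enough to buzz.")
```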
Watson prevailed over Jeopardy! champions Ken Jennings and Brad Rutter in two matches televised in February 2011, giving IBM the massive publicity surge it had hoped for. Well before the media frenzy surrounding that remarkable accomplishment began to fade, a far more consequential story began to unfold: IBM launched its campaign to leverage Watson’s capabilities in the real world. One of the most promising areas is medicine. Repurposed as a diagnostic tool, Watson offers the ability to extract precise answers from a staggering amount of medical information that might include textbooks, scientific journals, clinical studies, and even physicians’ and nurses’ notes for individual patients. No single doctor could possibly approach Watson’s ability to delve into vast collections of data and discover relationships that might not be obvious—especially if the information is drawn from sources that cross boundaries between medical specialties.*
By 2013, Watson was helping to diagnose problems and refine patient treatment plans at major medical facilities, including the Cleveland Clinic and the University of Texas’s MD Anderson Cancer Center.
As part of their effort to turn Watson into a practical tool, IBM researchers confronted one of the primary tenets of the big data revolution: the idea that prediction based on correlation is sufficient, and that a deep understanding of causation is usually both unachievable and unnecessary. A new feature they named “WatsonPaths” goes beyond simply providing an answer and lets researchers see the specific sources Watson consulted, the logic it used in its evaluation, and the inferences it made on its way to generating an answer. In other words, Watson is gradually progressing toward offering more insight into why something is true. WatsonPaths is also being used as a tool to help train medical students in diagnostic techniques. Less than three years after a team of humans succeeded in building and training Watson, the tables have—at least to a limited extent—been turned, and people are now learning from the way the system reasons when presented with a complex problem.22