Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
by Cathy O'Neil

Yet from society’s perspective, a simple hunt for government services puts a big target on the back of poor people, leading a certain number of them toward false promises and high-interest loans. Even considered strictly from an economic point of view, it’s a drain on the system. The fact that people need food stamps in the first place represents a failing of the market economy. The government, using tax dollars, attempts to compensate for it, with the hope that food stamp recipients will eventually be able to fully support themselves. But the lead aggregators push them toward needless transactions, leaving a good number of them with larger deficits, and even more dependent on public assistance. The WMD, while producing revenue for search engines, lead aggregators, and marketers, is a leech on the economy as a whole.

A regulatory system for WMDs would have to measure such hidden costs, while also incorporating a host of non-numerical values. This is already the case for other types of regulation. Though economists may attempt to calculate costs for smog or agricultural runoff, or the extinction of the spotted owl, numbers can never express their value. And the same is often true of fairness and the common good in mathematical models. They’re concepts that reside only in the human mind, and they resist quantification. And since humans are in charge of making the models, they rarely go the extra mile or two to even try. It’s just considered too difficult. But we need to impose human values on these systems, even at the cost of efficiency. For example, a model might be programmed to make sure that various ethnicities or income levels are represented within groups of voters or consumers. Or it could highlight cases in which people in certain zip codes pay twice the average for certain services. These approximations may be crude, especially at first, but they’re essential. Mathematical models should be our tools, not our masters.
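As a rough illustration of the kind of check the zip-code example above describes, here is a minimal sketch, with entirely hypothetical prices and zip codes, that flags any zip code whose average price for a service is at least twice the overall average. A real audit would need real billing data and a more careful notion of "average," but the logic is this simple.

```python
from collections import defaultdict

def flag_overcharged_zip_codes(transactions, threshold=2.0):
    """Flag zip codes whose average price is at least `threshold` times
    the average across all transactions. `transactions` is a list of
    (zip_code, price) pairs; all values here are hypothetical."""
    overall_avg = sum(price for _, price in transactions) / len(transactions)

    by_zip = defaultdict(list)
    for zip_code, price in transactions:
        by_zip[zip_code].append(price)

    return {
        zip_code: sum(prices) / len(prices)
        for zip_code, prices in by_zip.items()
        if sum(prices) / len(prices) >= threshold * overall_avg
    }

# Toy data: most neighborhoods pay about $50, one pays about $150.
sample = [("10021", 48), ("10021", 52), ("10022", 50), ("10022", 49),
          ("11201", 51), ("11201", 47), ("11215", 50), ("11215", 53),
          ("10475", 148), ("10475", 152)]
print(flag_overcharged_zip_codes(sample))  # {'10475': 150.0}
```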

The achievement gap, mass incarceration, and voter apathy are big, nationwide problems that no free market or mathematical algorithm will fix. So the first step is to get a grip on our techno-utopia, that unbounded and unwarranted hope in what algorithms and technology can accomplish. Before asking them to do better, we have to admit they can’t do everything.

To disarm WMDs, we also need to measure their impact and conduct algorithmic audits. The first step, before digging into the software code, is to carry out research. We’d begin by treating the WMD as a black box that takes in data and spits out conclusions. This person has a medium risk of committing another crime, this one has a 73 percent chance of voting Republican, this teacher ranks in the lowest decile. By studying these outputs, we could piece together the assumptions behind the model and score them for fairness.
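As a sketch of what that black-box treatment might look like in practice, the snippet below (all names and data are hypothetical, not any auditor's actual code) scores a batch of records with an opaque model and compares how often each group receives an adverse output. Large gaps between groups are the first thing to investigate.

```python
from collections import defaultdict

def audit_black_box(score_fn, records, group_key, adverse_threshold=0.5):
    """Treat `score_fn` as a black box: score every record, then compare the
    rate of adverse outputs (e.g. "high risk") across groups.
    `records` are dicts; `group_key` names the attribute to compare on."""
    counts = defaultdict(lambda: [0, 0])  # group -> [adverse, total]
    for record in records:
        score = score_fn(record)          # the opaque model under audit
        group = record[group_key]
        counts[group][1] += 1
        if score >= adverse_threshold:
            counts[group][0] += 1
    return {group: adverse / total for group, (adverse, total) in counts.items()}

# Hypothetical use: if one zip code is labeled high risk three times as often
# as another with similar records, the audit flags the model for closer review.
# rates = audit_black_box(recidivism_model, case_records, group_key="zip_code")
```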

Sometimes, it is all too clear from the get-go that certain WMDs are only primitive tools, which hammer complexity into simplicity, making it easier for managers to fire groups of people or to offer discounts to others. The value-added model used in New York public schools, for example, the one that rated Tim Clifford a disastrous 6 one year and then a high-flying 96 a year later, is a statistical farce. If you plot year-to-year scores on a chart, the dots are nearly as randomly placed as hydrogen atoms in a room. Many of the math students in those very schools could study those statistics for fifteen minutes and conclude, with confidence, that the scores measure nothing. Good teachers, after all, tend to be good one year after the next. Unlike, say, relief pitchers in baseball, they rarely have great seasons followed by disasters. (And also unlike relief pitchers, their performance resists quantitative analysis.)
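That claim about the year-to-year scatter is easy to check once you have two years of scores for the same teachers: a score that measured a stable trait would correlate strongly with itself from one year to the next. A minimal sketch with made-up numbers (real value-added data would be needed to reproduce O'Neil's point):

```python
import statistics  # statistics.correlation requires Python 3.10+

# Hypothetical value-added scores for the same five teachers in consecutive years.
year1 = [20, 80, 50, 70, 35]
year2 = [75, 60, 15, 85, 40]

# Pearson correlation; for this toy data it comes out around 0.14, far below
# what a measure of a stable trait like teaching quality should produce.
print(statistics.correlation(year1, year2))
```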

There’s no fixing a backward model like the value-added model. The only solution in such a case is to ditch the unfair system. Forget, at least for the next decade or two, about building tools to measure the effectiveness of a teacher. It’s too complex to model, and the only available data are crude proxies. The model is simply not good enough yet to inform important decisions about the people we trust to teach our children. That’s a job that requires subtlety and context. Even in the age of Big Data, it remains a problem for humans to solve.

Of course, the human analysts, whether the principal or administrators, should consider lots of data, including the students’ test scores. They should incorporate positive feedback loops. These are the angelic cousins of the pernicious feedback loops we’ve come to know so well. A positive loop simply provides information to the data scientist (or to the automatic system) so that the model can be improved. In this case, it’s simply a matter of asking teachers and students alike if the evaluations make sense for them, if they understand and accept the premises behind them. If not, how could they be enhanced? Only when we have an ecosystem with positive feedback loops can we expect to improve teaching using data. Until then it’s just punitive.

It is true, as data boosters are quick to point out, that the human brain runs internal models of its own, and they’re often tinged with prejudice or self-interest. So its outputs—in this case, teacher evaluations—must also be audited for fairness. And these audits have to be carefully designed and tested by human beings, and afterward automated. In the meantime, mathematicians can get to work on devising models to help teachers measure their own effectiveness and improve.

Other audits are far more complicated. Take the criminal recidivism models that judges in many states consult before sentencing prisoners. In these cases, since the technology is fairly new, we have a before and an after. Have judges’ sentencing patterns changed since they started receiving risk analysis from the WMD? We’ll see, no doubt, that a number of the judges ran similarly troubling models in their heads long before the software arrived, punishing poor prisoners and minorities more severely than others. In some of those cases, conceivably, the software might temper their judgments. In others, not. But with enough data, patterns will become clear, allowing us to evaluate the strength and the tilt of the WMD.
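The before-and-after comparison can be kept very simple: compute the sentencing gap between groups in cases decided before the risk tool arrived and again afterward, and see whether the gap narrowed or widened. A minimal sketch, with hypothetical field names and no claim about what real court data would show:

```python
from collections import defaultdict
from statistics import mean

def average_sentence_by_group(cases, group_key="defendant_group",
                              length_key="sentence_months"):
    """Average sentence length per group; `cases` is a list of dicts with
    hypothetical field names."""
    by_group = defaultdict(list)
    for case in cases:
        by_group[case[group_key]].append(case[length_key])
    return {group: mean(lengths) for group, lengths in by_group.items()}

# Hypothetical audit: compare the gap before and after the tool was introduced.
# before = average_sentence_by_group([c for c in cases if c["year"] < 2012])
# after  = average_sentence_by_group([c for c in cases if c["year"] >= 2012])
```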

If we find (as studies have already shown) that the recidivism models codify prejudice and penalize the poor, then it’s time to take a look at the inputs. In this case, they include loads of birds-of-a-feather connections. They predict an individual’s behavior on the basis of the people he knows, his job, and his credit rating—details that would be inadmissible in court. The fairness fix is to throw out that data.
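Read as an engineering step, "throw out that data" is just feature removal: strip the acquaintance, employment, and credit-style proxies from the inputs before the model is trained or consulted. A minimal sketch, with hypothetical field names standing in for whatever a real recidivism model ingests:

```python
# Hypothetical proxy fields that describe a defendant's circumstances and
# connections rather than his own conduct.
PROXY_FIELDS = {"acquaintance_arrests", "employment_status", "credit_score", "zip_code"}

def strip_proxy_features(record):
    """Return a copy of an input record with birds-of-a-feather proxies removed,
    so the model can only see conduct-based fields."""
    return {key: value for key, value in record.items() if key not in PROXY_FIELDS}

raw = {"prior_convictions": 1, "offense_severity": 3,
       "acquaintance_arrests": 4, "employment_status": "unemployed",
       "zip_code": "10475"}
print(strip_proxy_features(raw))  # {'prior_convictions': 1, 'offense_severity': 3}
```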

But wait, many would say. Are we going to sacrifice the accuracy of the model for fairness? Do we have to dumb down our algorithms?

In some cases, yes. If we’re going to be equal before the law, or be treated equally as voters, we cannot stand for systems that drop us into different castes and treat us differently.

Movements toward auditing algorithms are already afoot. At Princeton, for example, researchers have launched the Web Transparency and Accountability Project. They create software robots that masquerade online as people of all stripes—rich, poor, male, female, or suffering from mental health issues. By studying the treatment these robots receive, the academics can detect biases in automated systems from search engines to job placement sites. Similar initiatives are taking root at universities like Carnegie Mellon and MIT.
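One common design for such robot audits (a sketch of the general technique, not the Princeton project's actual code) is the paired persona: submit two profiles that are identical except for a single attribute and compare what each is shown.

```python
def paired_persona_audit(fetch_results, base_profile, attribute, value_a, value_b):
    """Query a site with two personas that differ in exactly one attribute and
    report what is shown to one but not the other. `fetch_results(profile)` is a
    stand-in for the crawler; every name here is hypothetical."""
    persona_a = {**base_profile, attribute: value_a}
    persona_b = {**base_profile, attribute: value_b}

    results_a = set(fetch_results(persona_a))
    results_b = set(fetch_results(persona_b))

    return {
        "only_shown_to_a": results_a - results_b,
        "only_shown_to_b": results_b - results_a,
    }

# e.g. paired_persona_audit(crawler, {"age": 35, "city": "Pittsburgh"},
#                           "gender", "male", "female")
# to compare the job ads each persona is shown.
```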

Academic support for these initiatives is crucial. After all, to police the WMDs we need people with the skills to build them. Their research tools can replicate the immense scale of the WMDs and retrieve data sets large enough to reveal the imbalances and injustice embedded in the models. They can also build crowdsourcing campaigns, so that people across society can provide details on the messaging they’re receiving from advertisers or politicians. This could illuminate the practices and strategies of microtargeting campaigns.

Not all of them would turn out to be nefarious. Following the 2012 presidential election, for example, ProPublica built what it called a Message Machine, which used crowdsourcing to reverse-engineer the model for the Obama campaign’s targeted political ads. Different groups, as it turned out, heard glowing remarks about the president from different celebrities, each one presumably targeted for a specific audience. This was no smoking gun. But by providing information and eliminating the mystery behind the model, the Message Machine reduced (if only by a tad) grounds for dark rumors and suspicion. That’s a good thing.

If you consider mathematical models as the engines of the digital economy—and in many ways they are—these auditors are opening the hoods, showing us how they work. This is a vital step, so that we can equip these powerful engines with steering wheels—and brakes.

Auditors face resistance, however, often from the web giants, which are the closest thing we have to information utilities. Google, for example, has prohibited researchers from creating scores of fake profiles in order to map the biases of the search engine.

Facebook, too. The social network’s rigorous policy to tie users to their real names severely limits the research outsiders can carry out there. The real-name policy is admirable in many ways, not least because it pushes users to be accountable for the messages they post. But Facebook also must be accountable to all of us—which means opening its platform to more data auditors.

The government, of course, has a powerful regulatory role to play, just as it did when confronted with the excesses and tragedies of the first industrial revolution. It can start by adapting and then enforcing the laws that are already on the books.

As we discussed in the chapter on credit scores, the civil rights laws referred to as the Fair Credit Reporting Act (FCRA) and the Equal Credit Opportunity Act (ECOA) were meant to ensure fairness in credit scoring. The FCRA guarantees that a consumer can see the data going into their score and correct any errors, and the ECOA prohibits linking race or gender to a person’s score.

These regulations are not perfect, and they desperately need updating. Consumer complaints are often ignored, and there’s nothing explicitly keeping credit-scoring companies from using zip codes as proxies for race. Still, they offer a good starting point. First, we need to demand transparency. Each of us should have the right to receive an alert when a credit score is being used to judge or vet us. And each of us should have access to the information being used to compute that score. If it is incorrect, we should have the right to challenge and correct it.

Next, the regulations should expand to cover new types of credit companies, like Lending Club, which use newfangled e-scores to predict the risk that we’ll default on loans. They should not be allowed to operate in the shadows.

The Americans with Disabilities Act (ADA), which protects people with medical issues from being discriminated against at work, also needs an update. The law currently prohibits medical exams as part of an employment screening. But we need to update it to take into account Big Data personality tests, health scores, and reputation scores. They all sneak around the law, and they shouldn’t be able to. One possibility already under discussion would extend protection of the ADA to include “predicted” health outcomes down the road. In other words, if a genome analysis shows that a person has a high risk for breast cancer, or for Alzheimer’s, that person should not be denied job opportunities.

We must also expand the Health Insurance Portability and Accountability Act (HIPAA), which protects our medical information, in order to cover the medical data currently being collected by employers, health apps, and other Big Data companies. Any health-related data collected by brokers, such as Google searches for medical treatments, must also be protected.

If we want to bring out the big guns, we might consider moving toward the European model, which stipulates that any data collected must be approved by the user, as an opt-in. It also prohibits the reuse of data for other purposes. The opt-in condition is all too often bypassed by having a user click on an inscrutable legal box. But the “not reusable” clause is very strong: it makes it illegal to sell user data. This keeps it from the data brokers whose dossiers feed toxic e-scores and microtargeting campaigns. Thanks to this “not reusable” clause, the data brokers in Europe are much more restricted, assuming they follow the law.

Finally, models that have a significant impact on our lives, including credit scores and e-scores, should be open and available to the public. Ideally, we could navigate them at the level of an app on our phones. In a tight month, for example, a consumer could use such an app to compare the impact of unpaid phone and electricity bills on her credit score and see how much a lower score would affect her plans to buy a car. The technology already exists. It’s only the will we’re lacking.
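The what-if comparison such an app would run is straightforward once the scoring model is opened up. A minimal sketch, assuming a hypothetical `score(profile)` function exposed by the bureau and made-up field names:

```python
def compare_scenarios(score, profile, scenarios):
    """Score the consumer's current profile and several hypothetical ones,
    returning the change each scenario would cause. `score` stands in for an
    opened-up credit-scoring model; field names are invented for illustration."""
    baseline = score(profile)
    return {name: score({**profile, **changes}) - baseline
            for name, changes in scenarios.items()}

# Hypothetical use: what happens to my score if I skip a bill this month?
# deltas = compare_scenarios(bureau_score, my_profile, {
#     "skip phone bill":       {"missed_payments": my_profile["missed_payments"] + 1},
#     "skip electricity bill": {"missed_payments": my_profile["missed_payments"] + 1,
#                               "utility_delinquency": True},
# })
```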

On a summer day in 2013, I took the subway to the southern tip of Manhattan and walked to a large administrative building across from New York’s City Hall. I was interested in building mathematical models to help society—the opposite of WMDs. So I’d signed on as an unpaid intern in a data analysis group within the city’s Housing and Human Services Departments. The number of homeless people in the city had grown to sixty-four thousand, including twenty-two thousand children. My job was to help create a model that would predict how long a homeless family would stay in the shelter system and to pair each family with the appropriate services. The idea was to give people what they needed to take care of themselves and their families and to find a permanent home.
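As a rough sketch of the kind of model described here (not the city's actual system, and with entirely made-up data), one could start with a simple regression from family characteristics to length of stay and refine from there:

```python
# Minimal sketch: predict length of shelter stay from a few family
# characteristics. All numbers are invented; a real model would draw on the
# city's case records and far richer features.
from sklearn.linear_model import LinearRegression

# [number of children, parent employed (0/1), prior shelter stays]
features = [[3, 0, 2], [1, 1, 0], [2, 0, 1], [4, 0, 3], [1, 1, 1]]
stay_length_days = [420, 90, 250, 510, 130]

model = LinearRegression().fit(features, stay_length_days)

# Predicted stay for a new family: two children, parent unemployed, one prior stay.
print(model.predict([[2, 0, 1]]))
```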
