Authors: Aaron Swartz
So there was probably a year or two of delay there. And in retrospect, we used that time to lay the groundwork for what came later. But that's not what it felt like at the time. At the time, it felt like we were going around telling people that these bills were awful, and in return, they told us that they thought we were crazy. I mean, we were kids wandering around waving our arms about how the government was going to censor the Internet. It does sound a little crazy. You can ask Larry tomorrow. I was constantly telling him what was going on, trying to get him involved, and I'm pretty sure he just thought I was exaggerating. Even I began to doubt myself. It was a rough period. But when the bill came back and started moving again, suddenly all the work we had done started coming together. All the folks we talked to about it suddenly began getting really involved and getting others involved. Everything started snowballing. It happened so fast.
I remember there was one week where I was having dinner with a friend in the technology industry, and he asked what I worked on, and I told him about this bill. And he said, “Wow! You need to tell people about that.” And I just groaned. And then, just a few weeks later, I remember I was chatting with this cute girl on the subway, and she wasn't in technology at all, but when she heard that I was, she turned to me very seriously and said, “You know, we have to stop 'SOAP.'” So, progress, right?
But, you know, I think that story illustrates what happened during those couple weeks, because the reason we won wasn't because I was working on it or Reddit was working on it or Google was working on it or Tumblr or any other particular person. It was because there was this enormous mental shift in our industry. Everyone was thinking of ways they could help, often really clever, ingenious ways. People made videos. They made infographics. They started PACs. They designed ads. They bought billboards. They wrote news stories. They held meetings. Everybody saw it as their responsibility to help. I remember at one point during this period I held a meeting with a bunch of start-ups in New York, trying to encourage everyone to get involved, and I felt a bit like I was hosting one of these Clinton Global Initiative meetings, where I got to turn to every start-up in the... every start-up founder in the room and be like, “What are you going to do? And what are you going to do?” And everyone was trying to one-up each other.
If there was one day the shift crystallized, I think it was the day of the hearings on SOPA in the House, the day we got that phrase, “It's no longer OK not to understand how the Internet works.” There was just something about watching those clueless members of Congress debate the bill, watching them insist they could regulate the Internet and a bunch of nerds couldn't possibly stop them. They really brought it home for people that this was happening, that Congress was going to break the Internet, and it just didn't care.
I remember when this moment first hit me. I was at an event, and I was talking, and I got introduced to a U.S. senator, one of the strongest proponents of the original COICA bill, in fact. And I asked him why, despite being such a progressive, despite giving a speech in favor of civil liberties, why he was supporting a bill that would censor the Internet. And, you know, that typical politician smile he had suddenly faded from his face, and his eyes started burning this fiery red. And he started shouting at me, said, “Those people on the Internet, they think they can get away with anything! They think they can just put anything up there, and there's nothing we can do to stop them! They put up everything! They put up our nuclear missiles, and they just laugh at us! Well, we're going to show them! There's got to be laws on the Internet! It's got to be under control!”
Now, as far as I know, nobody has ever put up the U.S.'s nuclear missiles on the Internet. I mean, it's not something I've heard about. But that's sort of the point. He wasn't having a rational concern, right? It was this irrational fear that things were out of control. Here was this man, a United States senator, and those people on the Internet, they were just mocking him. They had to be brought under control. Things had to be under control. And I think that was the attitude of Congress. And just as seeing that fire in that senator's eyes scared me, I think those hearings scared a lot of people. They saw this wasn't the attitude of a thoughtful government trying to resolve trade-offs in order to best represent its citizens. This was more like the attitude of a tyrant. And so the citizens fought back.
The wheels came off the bus pretty quickly after that hearing. First the Republican senators pulled out, and then the White House issued a statement opposing the bill, and then the Democrats, left all alone out there, announced they were putting the bill on hold so they could have a few further discussions before the official vote. And that was when, as hard as it was for me to believe, after all this, we had won. The thing that everyone said was impossible, that some of the biggest companies in the world had written off as kind of a pipe dream, had happened. We did it. We won.
And then we started rubbing it in. You all know what happened next. Wikipedia went black. Reddit went black. Craigslist went black. The phone lines on Capitol Hill flat-out melted. Members of Congress started rushing to issue statements retracting their support for the bill that they were promoting just a couple days ago. And it was just ridiculous. I mean, there's a chart from the time that captures it pretty well. It says something like “January 14th” on one side and has this big, long list of names supporting the bill, and then just a few lonely people opposing it; and on the other side, it says “January 15th,” and now it's totally reversed: everyone is opposing it, just a few lonely names still hanging on in support.
I mean, this really was unprecedented. Don't take my word for it, but ask former senator Chris Dodd, now the chief lobbyist for Hollywood. He admitted, after he lost, that he had masterminded the whole evil plan. And he told the New York Times he had never seen anything like it during his many years in Congress. And everyone I've spoken to agrees. The people rose up, and they caused a sea change in Washington: not the press, which refused to cover the story (just coincidentally, their parent companies all happened to be lobbying for the bill); not the politicians, who were pretty much unanimously in favor of it; and not the companies, who had all but given up trying to stop it and decided it was inevitable. It was really stopped by the people, the people themselves. They killed the bill dead; so dead that when members of Congress propose something now that even touches the Internet, they have to give a long speech beforehand about how it is definitely not like SOPA; so dead that when you ask congressional staffers about it, they groan and shake their heads like it's all a bad dream they're trying really hard to forget; so dead that it's kind of hard to believe this story, hard to remember how close it all came to actually passing, hard to remember how this could have gone any other way. But it wasn't a dream or a nightmare; it was all very real.
And it will happen again. Sure, it will have yet another name, and maybe a different excuse, and probably do its damage in a different way. But make no mistake: The enemies of the freedom to connect have not disappeared. The fire in those politicians' eyes hasn't been put out. There are a lot of people, a lot of powerful people, who want to clamp down on the Internet. And to be honest, there aren't a whole lot who have a vested interest in protecting it from all of that. Even some of the biggest companies, some of the biggest Internet companies, to put it frankly, would benefit from a world in which their little competitors could get censored. We can't let that happen.
Now, I've told this as a personal story, partly because I think big stories like this one are just more interesting at human scale. The director J. D. Walsh says good stories should be like the poster for Transformers. There's a huge evil robot on the left side of the poster and a huge, big army on the right side of the poster. And in the middle, at the bottom, there's just a small family trapped in the middle. Big stories need human stakes. But mostly, it's a personal story, because I didn't have time to research any of the other part of it. But that's kind of the point. We won this fight because everyone made themselves the hero of their own story. Everyone took it as their job to save this crucial freedom. They threw themselves into it. They did whatever they could think of to do. They didn't stop to ask anyone for permission. You remember how Hacker News readers spontaneously organized this boycott of GoDaddy over their support of SOPA? Nobody told them they could do that. A few people even thought it was a bad idea. It didn't matter. The senators were right: The Internet really is out of control. But if we forget that, if we let Hollywood rewrite the story so it was just big company Google who stopped the bill, if we let them persuade us we didn't actually make a difference, if we start seeing it as someone else's responsibility to do this work and it's our job just to go home and pop some popcorn and curl up on the couch to watch Transformers, well, then next time they might just win. Let's not let that happen.
In 2000, at the age of thirteen, Aaron Swartz coauthored the RDF Site Summary (RSS) 1.0 specification, which became the first major standard for syndicating website and blog content through feeds. It was published a few days after he turned fourteen. It is no easy task to work out a technical standard with nearly a dozen other people, something many adults lack both the patience and maturity to do. I call attention to it because Swartz's technical achievements show that he practiced what he preached, a very rare quality. He wanted openness, debate, rationality, and critical thinking, and he refused to cut corners, even at the age of thirteen.
RSS itself was fundamentally about sharing, taking the content out of its presented form on a website and allowing it to be redistributed and aggregated by other individuals and entities. Another of Swartz's projects, the webpage authoring tool Markdown (2004, co-designed with John Gruber), was a lightweight tool to easily generate webpages and blog posts by turning marked-up text into HTML. Both point to one of Swartz's central driving passions: making the creation, distribution, and freedom of information as easy and frictionless as possible.
Swartz's technical skills were obviously superior, but what differentiated him from most programmers, even some of the greatest open-source gurus, was the way he went about his technical projects. Rather than retreating into a “cathedral” of elite programmers, he wanted to keep things simple, include people, and welcome them in by making things as accessible as he could. The technical projects he chose perfectly mirrored this instinct. They all point to his later, more explicitly political work, where two projects stand out: first, the tor2web proxy project, intended to make hidden deep websites accessible to everyday web users and not just techies; and second, the anonymous leak platform DeadDrop, now known as SecureDrop and currently deployed at the New Yorker (as Strongbox), The Guardian, and elsewhere. Swartz saw the deep web as a good platform for sharing information anonymously, and described tor2web this way: “the idea was to kind of produce this hybrid where people could publish stuff using Tor and make it so that anyone on the Internet could view it.” That, in essence, was his technical philosophy: to build things for anyone on the Internet, not just hackers.
Swartz's remarkable achievement was that he managed to merge political activism and technical knowhow to a degree managed by few before (perhaps Edward Felten's analysis of DRM methods and advocacy against them come closest). His technical efforts to ease and democratize the creation and flow of information aligned perfectly with his political ideals of openness, transparency, and reform. That the Internet is growing farther from his ideals rather than closer signals just how much we lost with him.
The following is an excerpt from Aaron Swartz's A Programmable Web: An Unfinished Work, published in 2013 by Morgan & Claypool. Excerpted by permission of Morgan & Claypool Publishers. (Ed.)
If you are like most people I know (and, since you're reading this book, you probably are, at least in this respect), you use the Web. A lot. In fact, in my own personal case, the vast majority of my days are spent reading or scanning web pages: a scroll through my webmail client to talk with friends and colleagues, a weblog or two to catch up on the news of the day, a dozen short articles, a flotilla of Google queries, and the constant turn to Wikipedia for a stray fact to answer a nagging question.
All fine and good, of course; indeed, nigh indispensable. And yet, it is sobering to think that little over a decade ago none of this existed. Email had its own specialized applications, weblogs had yet to be invented, articles were found on paper, Google was yet unborn, and Wikipedia not even a distant twinkle in Larry Sanger's eye.
And so, it is striking to consider (almost shocking, in fact) what the world might be like when our software turns to the Web just as frequently and casually as we do. Today, of course, we can see the faint, future glimmers of such a world. There is software that phones home to find out if there's an update. There is software where part of its content (the help pages, perhaps, or some kind of catalog) is streamed over the Web. There is software that sends a copy of all your work to be stored on the Web. There is software specially designed to help you navigate a certain kind of web page. There is software that consists of a certain kind of web page. There is software (the so-called “mashups”) that consists of a web page combining information from two other web pages. And there is software that, using “APIs,” treats other web sites as just another part of the software infrastructure, another function it can call to get things done.
Our computers are so small and the Web so great and vast that this last scenario seems like part of an inescapable trend. Why wouldn't you depend on other web sites whenever you could, making their endless information and bountiful abilities a seamless part of yours? And so, I suspect, such uses will become increasingly common until, one day, your computer is as tethered to the Web as you yourself are now.
It is sometimes suggested that such a future is impossible, that making a Web that other computers could use is the fantasy of some (rather unimaginative, I would think) sci-fi novelist. That it would only happen in a world of lumbering robots and artificial intelligence and machines that follow you around, barking orders while intermittently unsuccessfully attempting to persuade you to purchase a new pair of shoes.
So it is perhaps unsurprising that one of the critics who has expressed something like this view, Cory Doctorow, is in fact a rather imaginative sci-fi novelist (amongst much else). Doctorow's complaint is expressed in his essay “Metacrap: Putting the torch to seven straw-men of the meta-utopia.” It is also reprinted in his book of essays Content: Selected Essays on Technology, Creativity, Copyright, and the Future of the Future (2008, Tachyon Publications), which is likewise available online.
Doctorow argues that any system that collects accurate “metadata” (the kind of machine-processable data that will be needed to make this dream of computers using-the-Web come true) will run into seven inescapable problems: people lie, people are lazy, people are stupid, people don't know themselves, schemas aren't neutral, metrics influence results, and there's more than one way to describe something. Doctorow proposes that, rather than trying to get people to provide data, we should look at the data they produce incidentally while doing other things (like how Google looks at the links people make when they write web pages) and use that instead.
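The parenthetical about Google points at link analysis, which builds a ranking purely from links people made for their own reasons. As a rough illustration only (not Google's actual system, which is far more involved), here is a minimal PageRank-style power iteration in Python; the three-page link graph, the damping factor of 0.85, and the fixed 50 iterations are assumptions chosen for the sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages using only the links their authors happened to make.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform score

    for _ in range(iterations):
        # Every page gets a small baseline share ("random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its score over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical link graph: which pages link to which.
links = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}
scores = pagerank(links)
print(max(scores, key=scores.get))  # prints c.example
```

No one was asked to label c.example as important; its top score comes entirely from the two links other pages happened to make, which is the kind of incidental data Doctorow prefers to explicit metadata.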
Doctorow is, of course, attacking a strawman. Utopian fantasies of honest, complete, unbiased data about everything are obviously impossible. But who was trying for that anyway? The Web is rarely perfectly honest, complete, and unbiased, but it's still pretty damn useful. There's no reason making a Web for computers to use can't be the same way.
I have to say, however, the idea's proponents do not escape culpability for these utopian perceptions. Many of them have gone around talking about the “Semantic Web” in which our computers would finally be capable of “machine understanding.” Such a framing (among other factors) has attracted refugees from the struggling world of artificial intelligence, who have taken it as another opportunity to promote their life's work.
Instead of the “let's just build something that works” attitude that made the Web (and the Internet) such a roaring success, they brought the formalizing mindset of mathematicians and the institutional structures of academics and defense contractors. They formed committees to form working groups to write drafts of ontologies that carefully listed (in 100-page Word documents) all possible things in the universe and the various properties they could have, and they spent hours in Talmudic debates over whether a washing machine was a kitchen appliance or a household cleaning device.
With them has come academic research and government grants and corporate R&D and the whole apparatus of people and institutions that scream “pipe dream.” And instead of spending time building things, they've convinced people interested in these ideas that the first thing we need to do is write standards. (To engineers, this is absurd from the start: standards are things you write after you've got something working, not before!)
And so the “Semantic Web Activity” at the World Wide Web Consortium (W3C) has spent its time writing standard upon standard: the Extensible Markup Language (XML), the Resource Description Framework (RDF), the Web Ontology Language (OWL), tools for Gleaning Resource Descriptions from Dialects of Languages (GRDDL), the Simple Protocol And RDF Query Language (SPARQL) (as created by the RDF Data Access Working Group (DAWG)).
Few have received any widespread use, and those that have (XML) are uniformly scourges on the planet, offenses against hardworking programmers that have pushed out sensible formats (like JSON) in favor of overly complicated hairballs with no basis in reality (I'm not done yet! More on this in chapter 5).
Instead of getting existing systems to talk to each other and writing up the best practices, these self-appointed guarantors of the Semantic Web have spent their time creating their own little universe, complete with Semantic Web databases and programming languages. But databases and programming languages, while far from perfect, are largely solved problems. People already have their favorites, which have been tested and hacked to work in all sorts of unusual environments, and folks are not particularly inclined to learn a new one, especially for no good reason. It's hard enough getting people to share data as it is, harder to get them to share it in a particular format, and completely impossible to get them to store it and manage it in a completely new system.
And yet this is what Semantic Webheads are spending their time on. It's as if to get people to use the Web, they started writing a new operating system that had the Web built-in right at the core. Sure, we might end up there someday, but insisting that people do that from the start would have doomed the Web to obscurity from the beginning.
All of which has led “web engineers” (as this series' title so cutely calls them) to tune out and go back to doing real work, not wanting to waste their time with things that don't exist and, in all likelihood, never will. And it's led many who have been working on the Semantic Web, in the vain hope of actually building a world where software can communicate, to burn out and tune out and find more productive avenues for their attentions.
For example, look at Sean B. Palmer. In his influential piece, “Ditching the Semantic Web?,” he proclaims “It's not prudent, perhaps even not moral (if that doesn't sound too melodramatic), to work on RDF, OWL, SPARQL, RIF, the broken ideas of distributed trust, CWM, Tabulator, Dublin Core, FOAF, SIOC, and any of these kinds of things” and says not only will he “stop working on the Semantic Web” but “I will, moreover, actively dissuade anyone from working on the Semantic Web where it distracts them from working on” more practical projects.
It would be only fair here to point out that I am not exactly an unbiased observer. For one thing, Sean, like just about everyone else I cite in the book, is a friend. We met through working on these things together but have since kept in touch and share emails about what we're working on and are just generally nice to each other. And the same goes for almost all the other people I cite and criticize.
Moreover, the reason we were working together is that I too did my time in the Semantic Web salt mines. My first web application was a collaboratively written encyclopedia, but my second, which aggregated news headlines from sites around the Web, led me into a downward spiral that ended with many years spent on RDF Core Working Groups and an ultimate decision to get out of the world of computers altogether.
Obviously, that didn't work out quite as planned. Jim Hendler, another friend and one of the AI transplants I've just spent so much time taking a swing at, asked me if I'd write a bit on the subject to kick off a new series of electronic books he's putting together. I'll do just about anything for a little cash (just kidding; I just wanted to get published (just kidding; I've been published plenty of times (just kidding; not that many times (just kidding; I've never been published (just kidding; I have, but I just wanted more practice (just kidding; I practice plenty (just kidding; I never practice (just kidding; I just wanted to publish a book (just kidding; I just wanted to write a book (just kidding; it's easy to write a book (just kidding; it's a death march (just kidding; it's not so bad (just kidding; my girlfriend left me (just kidding; I left her (just kidding, just kidding, just kidding))))))))))))))) and so here I am again, rehashing all the old ground and finally getting my chance to complain about what a mistake all the Semantic Web folks have made.
Yet, as my little thought experiment above has hopefully made clear, the programmable web is anything but a pipe dream: it is today's reality and tomorrow's banality. No software developer will remain content to limit themselves only to things on the user's own computer. And no web site developer will be content to limit their site only to users who interact with it directly.
Just as the interlinking power of the World Wide Web sucked all available documents into its maw, encouraging people to digitize them, convert them into HTML, give them a URL, and put them on the Internet (hell, as we speak, Google is even doing this to entire libraries), the programmable Web will pull all applications within its grasp. The benefits that come from being connected are just too powerful to ultimately resist.
There will, of course, be challenges to business models (as there always are with new technologies), especially for those who make their money off of gating up and charging access to data. But such practices simply aren't tenable in the long term, legally or practically (let alone morally). Under US law, facts aren't copyrightable (thanks to the landmark Supreme Court decision in Feist v. Rural Telephone Service), and databases are just collections of facts. (Some European countries have special database rights, but such extensions have been fervently opposed in the US.)
But even if the law didn't get in the way, there's so much value in sharing data that most data providers will eventually come around. Sure, providing a website where people can look things up can be plenty valuable, but it's nothing compared to what you can do when you combine that information with others.
To take an example from my own career, look at the website OpenSecrets.org. It collects information about who's contributing money to US political candidates and displays nice charts and tables about the industries that have funded the campaigns of presidential candidates and members of Congress.
Similarly, another website provides a wealth of information about Congressional earmarks: the funding requests that members of Congress slip into bills, requiring a couple million dollars be given to someone for a particular pet project. (The $398 million “Bridge to Nowhere” is the most famous example.)
Both are fantastic sites and are frequently used by observers of American politics, to good effect. But imagine how much better they would be if you put them together: you could search for major campaign contributors who had received large earmarks.
Note that this isn't the kind of “mashup” that can be achieved with today's APIs. APIs only let you look at the data in a particular way, typically the way that the hosting site looks at it. So with OpenSecrets' API you can get a list of the top contributors to a candidate. But this isn't enough for the kind of question we're interested in: you'd need to compare each earmark against each donor to see if they match. It requires real access to the data.
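To make the contrast concrete, the question above is what a database would call a join across both complete datasets. Below is a minimal Python sketch, assuming bulk access to both; the records, field names, the $1 million threshold, and the naive name-matching rule are all hypothetical, and real matching of donors to earmark recipients would need much more careful entity resolution. It uses a hash join, which gives the same result as comparing each earmark against each donor but avoids the nested loop.

```python
from collections import defaultdict

# Hypothetical bulk records; a real analysis would load full data downloads.
contributions = [
    {"donor": "Acme Corp", "candidate": "Rep. Smith", "amount": 50_000},
    {"donor": "Acme Corp", "candidate": "Sen. Jones", "amount": 20_000},
    {"donor": "Widget LLC", "candidate": "Rep. Smith", "amount": 5_000},
]
earmarks = [
    {"recipient": "Acme Corp", "sponsor": "Rep. Smith", "amount": 2_000_000},
    {"recipient": "Town of Elm", "sponsor": "Sen. Jones", "amount": 750_000},
]


def normalize(name):
    """Naive matching key: lowercase, drop periods and commas, collapse spaces."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())


def donors_with_earmarks(contributions, earmarks, min_earmark=1_000_000):
    """Return (recipient, total donated, earmark amount) for donors who got large earmarks."""
    # Build side: total giving per normalized donor name.
    given = defaultdict(int)
    for c in contributions:
        given[normalize(c["donor"])] += c["amount"]

    # Probe side: check every earmark against the donor totals.
    matches = []
    for e in earmarks:
        key = normalize(e["recipient"])
        if e["amount"] >= min_earmark and key in given:
            matches.append((e["recipient"], given[key], e["amount"]))
    return matches


print(donors_with_earmarks(contributions, earmarks))
# [('Acme Corp', 70000, 2000000)]
```

The point is the shape of the computation: it has to see every donor and every earmark at once, which a per-candidate "top contributors" API call cannot provide.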