Big Data’s Shadow
Every click you take in cyberspace can be tracked, your cell phone broadcasts your geolocation, and all of your purchases and phone calls are cataloged somewhere. Taken together, this information can be analyzed to paint a picture of you, one that, increasingly, others can see. It might define who you are and let those who hold it predict what you will do in the future. The result is a loss of privacy. The problem with such data, so-called big data, is that its sheer pervasiveness magnifies its effect.
Big Data
● In an increasingly networked world, personal information is widely collected and widely available. As the storehouse of personal data has grown, so have governmental and commercial efforts to use this data for their own purposes.
● Commercial enterprises solicit new customers with targeted ads. Governments use the data to identify and locate previously unknown terror suspects—so-called clean skins, who are not in any intelligence database. We have discovered that we can link together individual bits of data to build a picture of a person that is more detailed than the individual parts.
● Big data offers all kinds of opportunities to those who have access to it. But this also comes at a price: It creates an ineradicable trove of information about us as individuals, making it increasingly difficult to safeguard our privacy.
● If the government collects data to build a picture of, for example, a previously undetected terrorist threat, it can also, if it is so minded, use this capability to build a picture of its political opponents. That navigable web of data poses threats in the Free World and, perhaps even more so, in authoritarian nations.
In thinking about this capability and the opportunities and threats it presents, we sometimes talk out of both sides of our mouths.
● Early in this century, there was significant hype surrounding the government’s launch of one such big data program, known as Total Information Awareness. It was a research project of the Defense Advanced Research Projects Agency (DARPA) in the immediate aftermath of the 9/11 terror attacks.
● DARPA’s working premise was that advanced data-analysis techniques could be used to search the information space of commercial and public-sector data and identify threat signatures indicative of a terrorist threat. Because this would have given the government access to vast quantities of data about individuals, it was decried as the harbinger of Big Brother and was eventually killed.
● Compare that public condemnation and the government’s reflexive response with the later, almost universal criticism of the intelligence sector’s failure to connect the dots before another terrorist plot: the attempt by the young Nigerian Umar Farouk Abdulmutallab to detonate an explosive aboard a jumbo jetliner on an international flight bound for the United States on Christmas Day in 2009. Abdulmutallab, also known as the “underwear bomber,” was later sentenced in a U.S. court to life in prison.
● In that instance, we were told that we did not perform enough data analysis. We failed to link National Security Agency intercepts to airline travel records and State Department reports.
● The conundrum arises because the analytical techniques of big data are fundamentally similar to those used by traditional law-enforcement agencies. We use analytic algorithms to take a lead (a single piece of information as a starting point) and follow it to identify connections.
● This is what the police do on a daily basis, but in a big data system, computers operate on a far larger set of data, and that data is much more readily subject to analysis and manipulation. As a result, the differences in degree between what the police used to do and what computer analysis can do today tend to become differences in kind.
● The ability to collect and analyze vast quantities of data is a fundamental change caused by technological advances that cannot be stopped or slowed. The phenomenon derives from two related yet distinct trends: increases in computing power and decreases in data storage costs.
Big Databases
● In the late 1980s, practically the dawn of time for personal computers, the Department of Justice went to a great deal of trouble to create a database with information about the criminal records of known offenders. These records were kept in disparate local, state, and federal databases.
● All of these records were generally public and, in theory, available for inspection by the media and private citizens. But in practice, the information was so widely scattered that no crusading journalist or enterprising individual could incur the expense of finding it all and creating a comprehensive dossier on any individual.
● Only the federal government had both the need and the resources to undertake the task of creating the precursor of what is today the National Crime Information Center. At very great expense, the Department of Justice began to collect criminal records on a small number of criminals who were of national interest.
● Then, large data-collection and data-aggregation companies, such as Experian and ChoicePoint, began to harvest—by hand—public records from government databases. These data-aggregation companies systematically compile birth records, credit and conviction records, real estate transactions and liens, bridal registries, and even kennel club records. One company, Acxiom, estimates that it holds, on average, approximately 1,500 pieces of data on each adult American.
● Anyone with enough data and sufficient computing power can develop a detailed picture of virtually any identifiable individual.
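The aggregation described above can be illustrated with a minimal Python sketch. It is not any company's actual method: the three record sources, the field names, and the people shown are all invented for illustration, and matching records on an exact name plus birth date is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical records from three separate "public" sources (all data invented).
COURT_RECORDS = [
    {"name": "Pat Example", "born": "1970-01-01", "conviction": "none on file"},
]
PROPERTY_RECORDS = [
    {"name": "Pat Example", "born": "1970-01-01", "address": "1 Sample Street"},
    {"name": "Lee Sample", "born": "1985-05-05", "address": "9 Test Avenue"},
]
KENNEL_RECORDS = [
    {"name": "Pat Example", "born": "1970-01-01", "dog_breed": "beagle"},
]


def build_dossiers(*sources):
    """Merge records that share the same (name, birth date) key into one profile."""
    dossiers = defaultdict(dict)
    for source in sources:
        for record in source:
            # Assumption: an exact name and birth date identify one person.
            key = (record["name"], record["born"])
            for field, value in record.items():
                if field not in ("name", "born"):
                    dossiers[key][field] = value
    return dict(dossiers)


dossiers = build_dossiers(COURT_RECORDS, PROPERTY_RECORDS, KENNEL_RECORDS)
for (name, born), profile in sorted(dossiers.items()):
    print(name, born, profile)
```

Each source on its own says little. The merged profile for the invented "Pat Example" combines all three, which is the sense in which the assembled picture is more detailed than its parts.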
Big Data Failure
● Two of the 9/11 terrorists made reservations on American Airlines Flight 77. Their names also happened to be on the CIA’s watch list. But we didn’t connect those two pieces of information. If we had, we could have identified their home addresses from information they provided to the airline.
● And a simple cross-check would have discovered that three other individuals associated with these addresses—one of them named Mohammed Atta—also had made flight reservations on September 11th.
● If we had cross-checked the callback phone number that Atta gave to the airline, we’d likely have discovered that five other individuals also had provided that same phone number to reservation agents for purposes of confirming their own flight reservations on September 11th.
● And had we looked in one more place in the airline database, we would have discovered the name of yet one more individual who used the same frequent flyer number as had one of the men on the CIA watch list. Then, if we had branched out to public sources, we would have found that two more individuals shared living arrangements—that is, they had the same address.
● Finally, the remaining six individuals associated with hijacking four commercial airplanes on that date could have been identified through a routine review of the U.S. Immigration and Naturalization Service’s records (the expired visa/illegal entry list). One terrorist was on that list, and five others had public records of having lived with him or with one another. (The hijacked planes were flown into the World Trade Center in New York and into the Pentagon near Washington DC; the fourth crashed in an empty field in Pennsylvania.)
● And all, of course, shared the common characteristic of making reservations on flights for the morning of September 11th. In short, as a Department of Defense review committee concluded, with just seven clicks of the mouse through existing databases, all 19 terrorists could have been identified and linked to one another.
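The chain of cross-checks above follows a general pattern: start from known leads, then repeatedly pull in anyone who shares an address or phone number with someone already identified. The Python sketch below is one hedged way to express that pattern. The records, the placeholder identifiers, the `LINK_FIELDS` choice, and the function `expand_from_leads` are all hypothetical; this is not a reconstruction of the review committee's analysis or of any real system.

```python
from collections import defaultdict

# Invented reservation records; people and values are placeholders.
RESERVATIONS = [
    {"person": "A", "address": "addr-1", "phone": "phone-1"},
    {"person": "B", "address": "addr-1", "phone": "phone-2"},
    {"person": "C", "address": "addr-2", "phone": "phone-2"},
    {"person": "D", "address": "addr-3", "phone": "phone-3"},
]
LINK_FIELDS = ("address", "phone")  # assumed fields worth cross-checking


def expand_from_leads(records, leads, max_rounds):
    """Return {person: round in which they were first linked to a lead}."""
    index = defaultdict(set)       # (field, value) -> people who supplied it
    by_person = defaultdict(list)  # person -> that person's records
    for rec in records:
        by_person[rec["person"]].append(rec)
        for field in LINK_FIELDS:
            index[(field, rec[field])].add(rec["person"])

    found = {person: 0 for person in leads}
    frontier = set(leads)
    for round_no in range(1, max_rounds + 1):
        next_frontier = set()
        for person in frontier:
            for rec in by_person[person]:
                for field in LINK_FIELDS:
                    for other in index[(field, rec[field])]:
                        if other not in found:
                            found[other] = round_no
                            next_frontier.add(other)
        if not next_frontier:
            break  # no new connections; stop early
        frontier = next_frontier
    return found


linked = expand_from_leads(RESERVATIONS, leads={"A"}, max_rounds=3)
print(sorted(linked.items()))  # [('A', 0), ('B', 1), ('C', 2)]
```

In this toy data, "B" shares an address with the lead and "C" shares a phone number with "B", so both are reached within two rounds, while "D" shares nothing and is never linked. The same mechanism that finds real associates would also sweep in anyone who merely shared an address or phone number, which is part of the privacy concern the lecture raises.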
Big Data Success
● Ra’ed al-Banna—a Jordanian who attempted to enter the United States via Chicago on June 14, 2003—was probably a clean skin (a terrorist with no known record). He was carrying a valid business visa in his Jordanian passport and outwardly appeared to be an unremarkable business traveler from the Middle East.
● The Department of Homeland Security operates a sophisticated data analysis program called the Automated Targeting System (ATS) to assess the comparative risks of arriving passengers. Homeland Security uses ATS to decide who to stop and talk to and who to let through easily. The system has become essential, given the sheer volume of travelers to the United States.
● ATS flagged al-Banna for heightened scrutiny. He was pulled from the main line of entrants at Chicago’s O’Hare Airport and was individually questioned.
● During the interview, al-Banna’s answers were inconsistent and evasive, so much so that the U.S. Customs and Border Protection officer who conducted the interview decided to deny his application for entry and ordered him returned to his point of origin. As a matter of routine, al-Banna’s photograph and fingerprints were collected before he was sent on his way.
● The story might have ended there, because Customs and Border Protection officers reject entry applications daily for a host of reasons. But al-Banna proved to be an unusual case.
● More than a year later, in February 2005, a car filled with explosives rolled into a crowd of military and police recruits in the town of Hillah, Iraq. More than 125 people died—the largest death toll for a single incident in Iraq until that time.
● The suicide bomber’s hand and forearm were found chained to the steering wheel of the exploded car. After U.S. forces took fingerprints, they were found to match those collected from al-Banna in Chicago 20 months earlier.
Questions to Consider
It is shocking and disturbing that the government had information in hand that might have been used to prevent 9/11. How can we avoid that problem in the future without giving too much information to the government?
Do you use E-ZPass, or a similar automatic toll-paying device? Do you worry about the data being collected? If not, why not? If you do, why?