Scandal in the Age of Big Data
Newsrooms across the world have been preoccupied, over the last few days, with the question of how to handle the Ashley Madison data dump. But, as Connor Tomas O’Brien asks, is this just the beginning of a Big Data scandal that will never end?
Some days, it would be fascinating to sit in on the frantic meetings at major news organisations, as they try to figure out how best to exploit all the scandalous data they have piling up on their hard drives. While, for decades (centuries!), journalists struggled to collect enough information worth reporting on, the opposite is now sometimes the case. There are journalists sitting on vast repositories of information that are too damaging and outrageous to publish without causing great harm.
The huge Ashley Madison data dump – which contains 33 million names and addresses linked to the extramarital dating service – provides a case study on what happens when the media are handed the journalistic equivalent of an atom bomb. A good deal of the media’s power comes from their ability to destroy lives, but too much power can debilitate. The questions you must ask yourself if you’re in possession of an atom bomb, after all, are more numerous and more varied than those you must ask if you’re in possession of a BB gun.
The existence of the web must be frustrating for news organisations, especially those with even the loosest ethical standards, because those in the darkest corners of the internet always act on scandalous information more quickly and with much less hesitation. When vast swathes of private, intimate or incriminating information are dumped, it is rarely broadsheets or tabloids that break the news. Instead, that data finds its way into the hands of users on sites like 4Chan and 8Chan, where a cloak of anonymity – and a general atmosphere of unscrupulous libertarianism – rewards users for rapidly collating, untangling and disseminating data, regardless of its potential for harm. While news organisations devise strategies for responding responsibly to big leaks, the information floats around the web for hours or days, in a strange limbo, both accessible and not – somehow too scandalous to be legitimately reported on, even as it rockets to the top of every ‘trending stories’ list on every social network.
The trick, for news organisations, is how to capitalise on interest in Big Data scandals without losing credibility, or being perceived as exploitative. As media theorist Nathan Jurgenson – who works as a researcher for Snapchat – suggests, one strategy is to couch reporting on data leaks in the language of concern, while at the same time providing readers with enough clues for them to find the incriminating information themselves on 4Chan threads where links to the real dark web beckon.
While this happens, sites like Have I Been Pwned? ingest the incriminating datasets, scrub them, and make them easily accessible to regular users. Usually, these sites claim that they offer restricted access to the datasets so that affected users can ‘check if they have an account that has been compromised in a data breach’, but this justification doesn’t hold up to close scrutiny. After all, when it comes to sites like Ashley Madison, Adult Friend Finder, or YouPorn – all of which have been hacked – it usually isn’t what is contained within a user’s account that is private, but the very existence of the account itself. In that context, search engines like Have I Been Pwned? simply act as catalysts, speeding up the dissemination of sensitive data, and accelerating the harm that information might cause. News outlets, of course, rarely acknowledge this. Once a dataset appears on Have I Been Pwned?, publications can simply point readers directly toward search boxes they can use to comb through stolen datasets, under the pretence that they are providing a form of community service.
Update: Over the space of several hours, Troy Hunt, the Microsoft developer responsible for Have I Been Pwned?, watched as media outlets instructed readers to use his site to search for the names of others. As a result, the site now makes it more difficult for users to access the Ashley Madison database – opening up space for less scrupulous Have I Been Pwned? competitors to take the site’s place.
The rules on how aggressively a publication should capitalise on leaked data seem to differ depending on who is implicated in the dump, and the ability of publications to control the flow of the data being leaked. If those implicated in a dump are exclusively public figures, for example, news organisations can readily justify publishing leaked data on the grounds of public interest, especially if that information first passes through a whistleblowing organisation like WikiLeaks, which can assist in working with multiple broadsheet publications simultaneously to boost the likelihood of a ‘responsible’ rollout. When dumped data about Very Important People can’t be seen to have legitimate public interest, though, things get trickier. Tabloids, for example, tend to traffic in benign invasions of celebrity privacy, but when that violation reaches a zenith – for example, when a dump of nude celebrity photographs was leaked last year – these publications face a difficult decision: to follow regular protocols and re-publish these images, or to refuse to publish on ethical grounds, recognising that traffic will inevitably be lost to competitors with fewer reservations. More recently, when nude images of young Australian women were shared online without their consent, media outlets were forced to consider how to report on a scandalous data leak without sensationalising or promoting further sharing of the pictures.
In the case of Ashley Madison, the problem is that the sheer scale and breadth of the data dump means that the leaked information transcends all previously established classes of scandal. The Ashley Madison dataset, with its hundreds of millions of fields to pick through, undoubtedly contains information – somewhere – that could legitimately be reported on, but it likely contains much more information that, probably, should not be amplified. Ethically, how justifiable is it for a journalist or a news organisation to build stories from a dataset that might be responsible for ruining the lives of many millions of their readers? How do you capitalise on scandal without being viewed as opportunistic; generate clicks without sacrificing goodwill?
A day in, publications are trying to tell many different kinds of stories about the Ashley Madison data, though most are – perhaps surprisingly – showing some degree of restraint in terms of how much identifying information they are publishing about the site’s users. A piece on Buzzfeed, titled ‘I Talked To My Cheating Ex After Finding His Email In The Ashley Madison Hack’, managed to strike a creative balance, with the author utilising the dataset as the foundation for a personal narrative – albeit, one related to almost a million readers in less than 12 hours. Large and established tabloids, meanwhile, are moving slowly, apparently in a bid to tease out the amount of data they have at their disposal, revealing just enough to string readers – who might lack the technical know-how to access the dataset on their own – along with rumours and gossip.
With tens of millions of accounts to sift through – and the possibility, always, of more major data dumps from other private platforms like Snapchat or WhatsApp – how long can any of this continue? Will journalists, as Paul Ford has – half-facetiously – suggested, end up developing a program to automate the writing and publishing of the millions of articles yet to be written about individual Ashley Madison users, until every user of the site is made the subject of their own piece?
With masses of once-private information leaking in a semi-controlled fashion from here to eternity, perhaps, we are now at the very beginning of a Big Data scandal that will simply never end. After a couple of decades of moving around the web, most of us lack the ability to even recall how and where our private data resides – and exactly what information about us we should be worried about, were it made public. Over the next few years, much of this forgotten but intimate data will boomerang back, returning from the Cloud to collide with all of us – especially, perhaps, those of us who believe we have nothing to hide. ‘This hack could be ruinous – personally, professionally, financially – for [the directly affected] and their families. But for everyone else, it could haunt every email, private message, text and transaction across an internet where privacy has been taken for granted,’ writes John Herrman at The Awl. ‘Welcome to the first day of the rest of your internet.’