Big Deal: when Machines know better

Big Data is widely used by businesses to gather information on their consumers. But it’s quite evident that the same Big Data may be used against the businesses as well.

This post is part of the Big Data series being published this week on Kaspersky Business. The first, introductory post is available here.

The Target retail chain made headlines late last year over an immense leak of shoppers’ payment information: sophisticated, stealthy PoS malware had been used to steal data on ~70 million people. But a couple of years earlier, Target was covered extensively in the New York Times article How Companies Learn Your Secrets. Target’s analysts and marketers had learned how to identify mothers-to-be in order to barrage them with ads and offers related to their status: prenatal vitamins, maternity clothing, etc. Pregnant women are a gold mine for merchants, since they are, for good reason, guaranteed spenders.

Target has a nationwide database on its buyers, one of tremendous size, as one may guess. The company’s analysts created a model that allowed them to identify “most likely pregnant” women from a number of circumstantial features and subtle changes in their buying patterns. For instance, some women load up on supplements like calcium, magnesium and zinc in the first 20 weeks of their pregnancy, switch to scent-free soap, etc.
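To make the idea concrete, here is a minimal sketch of how such a score could be computed from changes in a shopper's basket. It is not Target's actual model, whose details are not given here: the product names, the equal weights, the threshold and the function names `pregnancy_score` and `is_flagged` are all assumptions made up for illustration.

```python
# Hypothetical signal products and weights, invented for illustration only.
SIGNAL_WEIGHTS = {
    "calcium_supplement": 0.25,
    "magnesium_supplement": 0.25,
    "zinc_supplement": 0.25,
    "scent_free_soap": 0.25,
}


def pregnancy_score(baseline, recent):
    """Sum the weights of signal products that are new in the recent basket.

    `baseline` is what the shopper used to buy, `recent` is what they buy now.
    Only products that appear in `recent` but not in `baseline` count,
    because the signal is the change in behavior, not the purchase itself.
    """
    new_items = set(recent) - set(baseline)
    return sum(w for item, w in SIGNAL_WEIGHTS.items() if item in new_items)


def is_flagged(baseline, recent, threshold=0.5):
    """Return True when the score reaches the (assumed) threshold."""
    return pregnancy_score(baseline, recent) >= threshold


# A shopper who suddenly adds supplements and switches soap.
changed = is_flagged(
    baseline=["scented_soap", "bread"],
    recent=["scent_free_soap", "calcium_supplement", "zinc_supplement", "bread"],
)

# A shopper who has always bought calcium shows no change in pattern.
unchanged = is_flagged(
    baseline=["calcium_supplement", "bread"],
    recent=["calcium_supplement", "bread"],
)
```

Even this toy version shows why the approach raises privacy questions: the shopper never states anything, yet a handful of ordinary purchases is enough to put her into a category.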


The pregnancy detection model (which required, well, no effort from the buyers themselves) turned out to be very accurate. Too accurate, even: one day an angry father came to a Target store and demanded that a manager explain why his daughter, still in high school, had received an advertising package for expectant mothers. A few days later, however, the same father was apologizing to the store managers: he had just found out that his daughter was indeed pregnant.

Target’s analytics algorithm found this out first, which was a big win for the company’s marketers (sort of). But what was it for that girl? Apparently she had a different view of her privacy than Target’s “Machine”.

As shown above, Big Data may help to extract a great deal of personal information about anyone, even about matters as private as pregnancy. It’s not hard to imagine this information being used for something much less “innocent” than nudging people into buying goods: industrial espionage, for instance. We hear a lot of stories about cybercriminals gathering all kinds of personal information on the employees of a target business to make their spear-phishing attacks succeed, so that they can then establish a foothold within the victim company’s infrastructure and exfiltrate whatever data they want. Why not use Big Data for the same purposes?

Well, of course, Big Data isn’t something petty crooks can pick up off the shelf, so this example may look a bit far-fetched. But then again, some 10-15 years ago, global cyber-espionage was the stuff of a bold sci-fi author’s fantasies. Then in 2012 we caught “Flame”.

In other words, when it comes to cut-throat competition between transnational corporations or covert cyberwarfare between mutually hostile nations, using Big Data to prepare a large-scale cyberattack, with personal data as an Archimedes’ lever, doesn’t look fantastic at all. Especially given the troves of personal data that people – mid-to-top level corporate workers included – tend to feed to social networking sites these days.

Aside from this, is there a way to ensure that Big Data mining won’t yield any specific trade secrets for the wrong parties? Back in 2012, Steve Durbin, global vice president of the Information Security Forum (ISF), wrote in Gigaom:

Organizations are part of often complex, global and interdependent supply chains, which can be their weakest link. Information is what binds supply chains together, ranging from simple mundane data, to trade or commercial secrets and intellectual property – loss of which can lead to reputational damage and financial or legal penalties. There is a key role for information security in coordinating the contracting and provisioning of business relationships, including outsourcers, offshorers and supply chain and cloud providers.

In other words, loosen your grasp on information and, probably, lose it all. Now, the Bigger the Data is, the stronger the grasp has to be. The question is how to do it in practice.
