Data is not oil, and other observations on data protection legislation
Published on August 30, 2018
Oleg Abdurashitov, Head of Public Affairs, APAC
Who Owns Your Data?
Just like oil, if not more so, data makes our world run. It is not surprising, then, that new regulations from Europe to Australia attempt to answer who owns, controls and has access to data, each within its own country-specific legal culture.
However, the oil analogy has its limits when dealing with personal data. In a practical sense, you don’t actually own the data; you generate it. Data does not exist unless there is a platform enabling data generation and collection, and unless you interact with this platform. Making a phone call, browsing the web, ordering pizza online, putting on a fitness tracker, or even walking down a street with CCTV cameras generates tons of data across multiple platforms.
Most of the data you generate is not really that personal. But just like oil, data becomes more valuable when processed. If refined crude oil gives you jet fuel, refined data gives you an individual. Sometimes this de-anonymization of data happens by consent, as when you authorize an electronic payment. Often, though, it is possible to de-anonymize data without users’ explicit consent by combining different sources of data or applying additional algorithms.
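To make the idea of combining sources concrete, here is a minimal sketch of a so-called linkage attack. Everything in it is hypothetical: the two datasets, the field names and the choice of quasi-identifiers (postal code, birth year, gender) are assumptions made for illustration, not a description of any real platform. It shows how records with no names attached can regain a name once a second source sharing the same attributes is available.

```python
from collections import defaultdict

# Hypothetical "anonymous" dataset: no names, only quasi-identifiers.
fitness_records = [
    {"zip": "10115", "birth_year": 1984, "gender": "F", "avg_daily_steps": 11200},
    {"zip": "10115", "birth_year": 1990, "gender": "M", "avg_daily_steps": 4300},
    {"zip": "20095", "birth_year": 1975, "gender": "M", "avg_daily_steps": 8700},
]

# Hypothetical second source that does carry names.
loyalty_cards = [
    {"name": "Alice Example", "zip": "10115", "birth_year": 1984, "gender": "F"},
    {"name": "Bob Example", "zip": "10115", "birth_year": 1990, "gender": "M"},
    {"name": "Carl Example", "zip": "20095", "birth_year": 1975, "gender": "M"},
    {"name": "Dan Example", "zip": "20095", "birth_year": 1975, "gender": "M"},
]

# Assumed set of attributes that both sources happen to share.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def quasi_key(record):
    """Build a tuple of the shared attributes for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymous, named):
    """Attach a name to every anonymous record that matches exactly one person."""
    names_by_key = defaultdict(list)
    for person in named:
        names_by_key[quasi_key(person)].append(person["name"])

    linked = []
    for record in anonymous:
        candidates = names_by_key.get(quasi_key(record), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            linked.append({"name": candidates[0], **record})
    return linked


reidentified = link(fitness_records, loyalty_cards)
for row in reidentified:
    print(row["name"], row["avg_daily_steps"])
```

In this toy data the first two fitness records match exactly one loyalty card each and so regain a name, while the third stays ambiguous because two people share the same attributes. The fewer people who share a given combination of attributes, the easier the link becomes.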
Tons of personal data is now willingly generated by ourselves on social media platforms, making it easy to identify a user on the fly, and some researchers warned about the consequences of social platforms for privacy as early as 2010. Since then the business model of collecting and selling data generated by users has become so lucrative that many companies from traditional industries such as aviation or retail are now redesigning their business processes to generate even more of it.
A Case for Necessity – or Lack of It
The recent Cambridge Analytica scandal has challenged the long-standing notion that people are happy to let platforms use all of their data in exchange for free services. In countries like Norway, people are even willing to pay so that the platforms collect as little data as possible (The Norwegian Data Protection Authority, 2016).
GDPR Article 25 requires that “only personal data which are necessary for each specific purpose of the processing are processed”. For instance, a good anti-virus product does not need to identify who owns a machine to perform its function. Our antivirus collects data to understand the machine’s environment, identify changes to the system and alert on those deemed malicious, but it does so to identify the malaise, not the patient. Our product doesn’t intentionally collect data that might be used for personalization, nor do we specifically process it to extract personal data; in our business model this is simply impractical. Some reports paint anti-virus software as a magic tool for data collection, but in reality we hold far less personal information about a user than is commonly assumed, and we anonymize it wherever possible (see our Data Processing Principles for more).
There are very good reasons why, in many other industries, personal data is intentionally anonymized before being used. These reasons include minimizing the risks of bias, potential abuse of power or violation of privacy. Take medical research or population studies. Researchers need to know many very personal, even intimate details about their study subjects, but they do not need to know their names or be able to recognize them in a supermarket – and probably neither should numerous social apps.
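As an illustration of how a study might keep the details while dropping the names, below is a minimal sketch that strips direct identifiers and replaces them with a keyed token. The field names, the list of direct identifiers and the `pseudonymize` helper are all assumptions made for this example. Strictly speaking this is pseudonymization rather than full anonymization: whoever holds the key can rebuild the link, and the remaining attributes may still narrow down who a person is.

```python
import hashlib
import hmac

# Assumed list of fields that identify a person directly (hypothetical schema).
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(record, secret_key, id_field="email"):
    """Drop direct identifiers and add a keyed token in their place.

    The token is an HMAC-SHA256 of the chosen identifier, so the same person
    always maps to the same token under one key, but the token cannot be
    recomputed from the identifier without that key.
    """
    token = hmac.new(
        secret_key, record[id_field].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["subject_token"] = token
    return cleaned


# Hypothetical study records for the same participant at two visits.
study_key = b"keep-this-key-away-from-the-researchers"
visit_1 = {"name": "Alice Example", "email": "alice@example.org",
           "phone": "555-0100", "age_band": "30-39", "blood_pressure": "118/76"}
visit_2 = {"name": "Alice Example", "email": "alice@example.org",
           "phone": "555-0100", "age_band": "30-39", "blood_pressure": "121/79"}

safe_1 = pseudonymize(visit_1, study_key)
safe_2 = pseudonymize(visit_2, study_key)
print(safe_1["subject_token"] == safe_2["subject_token"])
```

The two visits can still be tied to the same participant through the shared token, which is what the research needs, while the name, email and phone number never reach the analysis dataset.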
However, given that personalized data is easy to obtain, there is little evidence that platforms are reviewing whether anonymous data alone would suffice for their functions. With users mostly unaware of how much data they generate, digital platforms, both private and public, will continue to amass, share and process identifiable data.
Better Protection or Less Access?
Databases rich in personal details make a perfect target for cybercriminals and other malicious actors. The more platforms collect identifiable data, the higher the risk of a breach. The more data they collect, the higher the stakes: a single breach may compromise other platforms and result in painful consequences for users.
But these large collections of identifiable data come in handy for security agencies. Not surprisingly, they are the most active lobbyists for the new data protection laws that compel platforms to grant them access to data through localization requirements (as in India) or extraterritorial jurisdiction (as in Australia or the EU). Experts, however, warn that the rise of Big Data policing may distort policing work, introduce bias into algorithms or unnecessarily expose innocent citizens. Another risk is that several perfectly legitimate databases might be combined with hacked information, or even used on their own, to stage a complex information warfare operation – just as our researchers predicted back in 2016.
Hence, in addition to taking measures to secure data platforms better, we should ensure more transparency around actual data practices, in particular the unconstrained sharing of potentially identifiable data between platforms.
Most importantly, we need better-informed users, who are aware of their hyperconnected environments and of what can be done with the data they generate. In fact, users are already not that willing to hand over their contact lists to a service provider – they just don’t know that these lists are often fetched once they log in with a social media profile. More transparency about data processing is required, including when data is passed to a third party; opting out of data generation should be much easier than it currently is (in short, it is nearly impossible). Kaspersky Lab’s Transparency Initiative, which offers independent validation of data and engineering practices, is one of the potential ways forward.
In the end, data practices are born out of balancing the interests of regulators, users and data-driven businesses, and in the next few years we will see them change. The time to take part in this change is now.