A great deal of COVID-19-relevant information is potentially available in the digital world.
- Users of social networks voluntarily provide extensive personal information, usually including demographics (age, sex) and location.
- Users of mobile networks provide the information needed to receive and pay for the service, and also provide location information.
- Consumers who seek health information might voluntarily provide additional information.
Location data from mobile devices has been an area of intense interest for governments in the past few weeks. The mobile network knows your location, whether you are in your home country or roaming internationally.
Many countries have worked with the providers of communication services and infrastructure to progressively improve this location information, primarily as a means of improving the accuracy with which mobile users can call for help in the event of emergencies (see Marcus, 2010; Marcus, 2014).
Privacy challenges
However, use of personally identifiable data is restricted in most democratic, developed countries. The European Union implements the General Data Protection Regulation (GDPR) (European Union, 2016), which is based on the recognition of individual privacy as a human right. That the EU has adopted a coherent overall horizontal framework for privacy is generally positive; however, the framework is relatively inflexible. This lack of flexibility becomes obvious now, when a nimble response is needed to a deep threat to the lives and safety of Europeans.
The use of data that is not personally identifiable is in general unrestricted, and several legal instruments at EU level actively encourage making non-personal data and public sector information available as a means of promoting economic efficiency (European Union, 2018 and 2019).
For commercial use of personally identifiable data, the GDPR puts a number of common-sense rules in place. The user must be told how the data will be used, to which third parties it will be provided and how they will use the data, how long data will be retained, and more.
The GDPR’s scope does not cover use of personally identifiable data collected by governments for purposes of law enforcement, which is a member-state competence.
Common practice in most developed democratic countries involves some combination of these elements:
- Data that is not personally identifiable (including anonymised data), or non-personal data, is subject to few if any restrictions.
- In order to collect data that is personally identifiable but that contains no content, public authorities must meet a fairly modest standard of proof of need. This tends to be the case for call data records (an indication as to who has been called from a telephone or internet device) and for user location data.
- In order to collect data that is personally identifiable and that contains actual content, a fairly high standard of proof of need must be met. Typically, an independent third party such as a magistrate must be convinced that there are valid grounds to suspect the individual, for instance of a past or likely future crime.
In order to understand how these broad principles interact with likely needs in terms of combatting COVID-19, it is useful to reflect on some of the use cases in which big data has been applied.
Ways in which big data has been used to date
There are three main forms of use that have been prominent to date: (1) strategic planning; (2) the tracking of (possibly infected) individuals; and (3) the provision of advice to concerned and possibly infected individuals.
Strategic planning
One of the most immediate and most promising uses of big data in combatting COVID-19 has been as a means of prediction, analysis, and strategic planning for national governments and national health authorities.
Epi-risk, for example, is a predictive model that looks at how the disease moves from one city to another as a function of air travel and commuting patterns. It draws on statistics on the number of known cases and deaths provided by national authorities, combined with air traffic data from the OAG database. The hope is that additional data from social networks can also be integrated. According to the lead researcher, “What we do as computer scientists and computational epidemiologists is provide [the doctors, nurses, and public health people in the field] with intelligence to anticipate the move of the enemy” (Waltz, 2020a).
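To give a sense of how a model of this general kind can work, the following is a minimal sketch of a discrete-time metapopulation SIR model in which cities are coupled by a mixing matrix. It is not the Epi-risk model itself: the function name, the mixing matrix, and the values of `beta` and `gamma` are illustrative assumptions, and a real model would calibrate such inputs against case counts and air travel data.

```python
def simulate_spread(populations, mixing, beta, gamma, initial_infected, days):
    """Toy metapopulation SIR model (illustrative assumption, not Epi-risk).

    populations[i]      : number of residents of city i
    mixing[i][j]        : share of city i's contacts that occur with city j
                          (each row should sum to 1); a stand-in for air
                          travel and commuting flows
    beta, gamma         : daily transmission and recovery rates (hypothetical)
    initial_infected[i] : infected residents of city i on day 0
    Returns a list with the infected count per city for each day.
    """
    n = len(populations)
    susceptible = [populations[c] - initial_infected[c] for c in range(n)]
    infected = [float(x) for x in initial_infected]
    history = [list(infected)]

    for _ in range(days):
        prevalence = [infected[j] / populations[j] for j in range(n)]
        next_susceptible, next_infected = [], []
        for c in range(n):
            # Force of infection in city c: prevalence everywhere, weighted
            # by how much city c mixes with each other city.
            force = beta * sum(mixing[c][j] * prevalence[j] for j in range(n))
            new_infections = min(susceptible[c], force * susceptible[c])
            recoveries = gamma * infected[c]
            next_susceptible.append(susceptible[c] - new_infections)
            next_infected.append(infected[c] + new_infections - recoveries)
        susceptible, infected = next_susceptible, next_infected
        history.append(list(infected))
    return history


if __name__ == "__main__":
    # Two hypothetical cities; the outbreak starts in the first one only.
    history = simulate_spread(
        populations=[1_000_000, 500_000],
        mixing=[[0.99, 0.01], [0.02, 0.98]],
        beta=0.3,
        gamma=0.1,
        initial_infected=[10, 0],
        days=30,
    )
    print("Infected per city after 30 days:", history[-1])
```

Even in this simplified form, the second city acquires infections only because of the off-diagonal mixing terms, which is why travel and commuting data matter so much for prediction.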
As another example, an analysis of the evolution of the disease in China (Li et al, 2020) may serve to clarify the degree to which individuals who were not known to be infected contributed to the spread of the disease. The authors found that undocumented cases (ie cases that had not been reported) were only half as contagious as documented cases. Nonetheless, because some 86% of cases probably went unreported, they estimated that between 82% and 90% of all Chinese cases nationwide from 10–23 January were infected by people whose infections were undocumented. To estimate mobility between Chinese cities around Chinese New Year (which was 25 January in 2020), the researchers extrapolated from 1.7 billion travel records from 2018 collected by internet company Tencent. This serves to demonstrate that big data can play a crucial role in valuable analyses.
Strategic planning in Austria, Italy and Germany has used mobile location data provided by mobile network operators. Mobility data from Deutsche Telekom is used to estimate the degree to which the German population is complying with requests or orders to stay at home. In Italy, data provided by mobile network operators Telecom Italia, Vodafone and WindTre demonstrates that movements exceeding 300-500 metres in the Lombardy region are down by some 60% since 21 February, the date on which the first case in the region was identified. In Austria, A1 Telekom Austria Group is feeding mobility data into a third-party tool that is more typically used to estimate how crowded a ski area will become, but in this case can be used to estimate the effectiveness of social distancing (Reuters, 2020).
A common feature of all these strategic uses of big data is that they generally either do not rely on personally identifiable data at all, or rely only on anonymised data. This avoids most if not all privacy concerns.
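The kind of aggregate mobility indicator described above can be sketched as follows. This is an illustrative assumption about how such a statistic might be computed, not a description of what any operator actually does: the record layout, the 500-metre threshold, the minimum group size, and the function names are all hypothetical. The input carries no user identifier, and groups that are too small are suppressed, which is one common safeguard but does not on its own guarantee anonymity.

```python
from collections import defaultdict


def aggregate_mobility(records, distance_threshold_m=500, min_group_size=3):
    """Share of observations per (region, period) that moved beyond a threshold.

    records: iterable of (region, period, distance_m) tuples with no user
             identifier (hypothetical layout).
    Groups with fewer than min_group_size observations are dropped so that
    small groups cannot be singled out.
    """
    totals = defaultdict(int)
    movers = defaultdict(int)
    for region, period, distance_m in records:
        totals[(region, period)] += 1
        if distance_m > distance_threshold_m:
            movers[(region, period)] += 1
    return {
        key: movers[key] / count
        for key, count in totals.items()
        if count >= min_group_size
    }


def relative_change(shares, region, baseline_period, current_period):
    """Relative change in the mover share between two periods (-0.6 = down 60%)."""
    baseline = shares[(region, baseline_period)]
    current = shares[(region, current_period)]
    return (current - baseline) / baseline


if __name__ == "__main__":
    # Made-up sample data for a hypothetical region, for illustration only.
    sample = [
        ("region_a", "baseline", 800), ("region_a", "baseline", 1200),
        ("region_a", "baseline", 100), ("region_a", "baseline", 900),
        ("region_a", "lockdown", 50), ("region_a", "lockdown", 700),
        ("region_a", "lockdown", 20), ("region_a", "lockdown", 10),
        ("region_b", "baseline", 900),  # too few observations: suppressed
    ]
    shares = aggregate_mobility(sample)
    print(shares)
    print(relative_change(shares, "region_a", "baseline", "lockdown"))
```

Only the aggregated shares leave the function, so a health authority consuming the output would see how much movement has fallen in a region without seeing any individual's trajectory.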
This approach can be said to be much “less invasive than the approach taken by countries like China, Taiwan and South Korea, which use smartphone location readings to trace the contacts of individuals who have tested positive or to enforce quarantine orders” (Reuters, 2020). Indeed, Austrian privacy advocate Max Schrems observed that, “As long as the [mobile location] data is properly anonymized, this is clearly legal” (Reuters, 2020).