Privacy Challenges in Geodata and Open Data
- Author: kth.diva-portal.org
- Full Title: Privacy Challenges in Geodata and Open Data
- Category: articles
- Document Tags: #geospatial
- URL: http://kth.diva-portal.org/smash/get/diva2:1781428/FULLTEXT01.pdf
Highlights
- Data safeguarding and ethics are important considerations for anyone working with geographically explicit data (henceforth, geodata). Although debates about Geographic Information Systems (GIS), geodata, ethics, and privacy/surveillance have existed for a long time in the geographic literature, going back to early critiques directed at Geographic Information Science (Pickles, 1995), the recent open data revolution raises the need to further explore the intersection between sharing/dissemination of geodata and data privacy. ‘Geodata’ refers to any information that describes a location, and it is important that these data can become open data to promote transparency, collaboration, and innovation in fields such as urban planning, environmental management, and disaster response. Given the sensitive nature of geodata, which by definition identify locations and therefore potentially individuals, geodata present a specific case that requires special attention (Goodchild et al., 2022; Keßler & McKenzie, 2018; Kounadi & Leitner, 2014; Kounadi & Resch, 2018; Kwan et al., 2004; Zipper et al., 2019).
- The question of privacy and geodata has been debated since GIS became widespread in the early 1990s, and especially after the emergence of ‘Critical GIS’ studies (see Goss, 1995; Otoole, 1994). Initially, geodata privacy concerns were mostly limited to databases set up by government agencies. This was also before smartphones and other mobile devices that could record positions became widespread. As electronic devices began recording big data, especially geodata, some have claimed we live in a society of omniopticon (Zhang & McKenzie, 2022) and sousveillance (Mann et al., 2003), where a few with the help of technology can monitor the many. Consequently, the question of geodata privacy has gained public attention in recent decades.
- Geographers have examined the balance between open research and privacy. Tullis and Kar (2021) address replicability and reproducibility challenges in GIS while emphasising ethical data usage in the context of disaster analytics. Cann and Price (2022) discuss ethical issues in location tracking technology during the COVID-19 pandemic, highlighting the need for international frameworks and involvement of geographic communities. Chen and Poorthuis (2021) introduce an R package for reproducibly identifying home locations from mobile data, while emphasising the importance of responsible big data research. These examples showcase the active work in this field. As geodata are increasingly used beyond the disciplinary boundaries of geography (e.g., epidemiology, urban planning, crime and security, conservation), it becomes essential that geographers promote the importance of geodata privacy beyond our discipline through collaboration with policymakers and other stakeholders in different fields.
- ‘Geodata privacy’ refers to a set of principles and practices put in place to protect the confidentiality of individuals, groups, communities or organisations about whom geographically explicit data are collected for research, commercial or other purposes (Keßler & McKenzie, 2018). This includes protecting data collected by geospatial applications and services, such as Global Positioning System (GPS) tracking and location-based services, as well as data from collection exercises initiated by government, academic, commercial, or other research organisations, against unauthorised access, use, sharing and disclosure. We use ‘geodata privacy’ to refer to any privacy issue (a situation where an expectation of confidentiality is violated) that arises from sharing geodata, whether as raw data or as findings. Violations of geodata privacy do not necessarily stem from disregard for ethical practice, but rather from a lack of knowledge or information. For instance, such violations can happen when the data are believed to be anonymised.
- The running and cycling activity logging application ‘Strava’ produces a ‘Global Heatmap’ (https://www.strava.com/heatmap), which shows ‘heat’ made by aggregated, public activities (Strava, 2022). While most users explored this map to find popular running routes, an international security student managed to identify supposedly secret military bases and operations. Soldiers stationed at these bases were using Strava to log their runs, inadvertently mapping United States military bases in Afghanistan, Turkish military patrols in Syria, and a possible guard patrol in the Russian area of operations in Syria (Hsu, 2018). This was deemed a privacy issue from an operations security standpoint.
- A potential solution is to use the data to build a useful aggregate measure that retains the important information. In their work aggregating individual-level register data in Sweden, Andersson and Malmberg (2015) created individualised neighbourhoods (also called ‘egohoods’ or bespoke neighbourhoods) by expanding a buffer around a specific location until it encompasses the K nearest neighbours, and then computed aggregate statistics for the population contained in each buffer. If we choose a large enough K, we can minimise the risk of identification, since every released statistic then summarises K people. Such a multi-scalar representation of geographical context allows for more nuanced estimation of how neighbourhood effects differ across demographic groups than some other methods of geomasking, and provides researchers with a more secure geodata environment (Wang et al., 2022).
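The geodata-privacy highlight notes that violations can occur even when data are believed to be anonymised. A minimal sketch of why, assuming a hypothetical pseudonymised GPS log (`PINGS`) and a naive "most frequent night-time grid cell" heuristic (`likely_home`, also hypothetical and not the method of the Chen and Poorthuis package): replacing names with opaque IDs does not stop a trace from pointing at where someone sleeps.

```python
from collections import Counter
from datetime import datetime

# Hypothetical pseudonymised GPS log: (pseudonym, ISO timestamp, lat, lon).
# Names were replaced by opaque IDs, so the data may *look* anonymised.
PINGS = [
    ("u1", "2023-03-01T23:10:00", 59.3471, 18.0732),
    ("u1", "2023-03-02T02:45:00", 59.3469, 18.0733),
    ("u1", "2023-03-02T09:15:00", 59.3326, 18.0649),
    ("u1", "2023-03-02T23:30:00", 59.3472, 18.0731),
    ("u1", "2023-03-03T13:00:00", 59.3293, 18.0686),
]


def grid_cell(lat, lon, precision=3):
    """Snap a coordinate to a grid cell by rounding (3 decimals is about 110 m north-south)."""
    return (round(lat, precision), round(lon, precision))


def likely_home(pings, night_start=22, night_end=6):
    """Hypothetical naive heuristic: each pseudonym's most frequent night-time grid cell."""
    night_cells = {}
    for uid, ts, lat, lon in pings:
        hour = datetime.fromisoformat(ts).hour
        if hour >= night_start or hour < night_end:  # keep only late-night pings
            night_cells.setdefault(uid, Counter())[grid_cell(lat, lon)] += 1
    return {uid: cells.most_common(1)[0][0] for uid, cells in night_cells.items()}
```

Calling `likely_home(PINGS)` returns a single cell for `u1`, built from the three late-night pings while the daytime pings are ignored. No name was ever in the data, yet the output is effectively a home address to within a city block.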
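The Strava highlight rests on a simple property of aggregate heatmaps: aggregation hides individuals only where many people contribute. The sketch below is a simplified, hypothetical grid-count aggregation. `heatmap`, `ACTIVITY_POINTS` and the 0.01-degree cell size are assumptions, and this is not Strava's actual pipeline. A busy park yields one hot cell with 50 contributors, but a remote site visited by two athletes still produces a non-empty cell. In an otherwise empty region, that cell alone outlines where they went.

```python
import math
from collections import defaultdict

# Hypothetical public activity points: (athlete_id, lat, lon).
# A busy city park used by many athletes, and a remote site used by only two.
ACTIVITY_POINTS = [
    *[(f"city_{i}", 40.7812, -73.9665) for i in range(50)],
    ("remote_a", 31.6250, 65.7150),
    ("remote_a", 31.6251, 65.7152),
    ("remote_b", 31.6252, 65.7151),
]


def heatmap(points, cell_deg=0.01):
    """Aggregate points into square grid cells and count distinct contributors per cell."""
    contributors = defaultdict(set)
    for athlete, lat, lon in points:
        # Integer cell index along each axis; floor keeps negative longitudes consistent.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        contributors[cell].add(athlete)
    return {cell: len(ids) for cell, ids in contributors.items()}
```

Only counts are published, never raw tracks. Even so, the remote cell shows up with "heat" from just two people. In a sparsely used area, that is enough to reveal a pattern of activity, which is essentially how the military sites became visible.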
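The egohood construction attributed to Andersson and Malmberg (2015) can be sketched as follows. This is a minimal illustration, not the authors' implementation. `PEOPLE`, `egohood`, the planar coordinates in metres, the binary attribute and the choice of an arbitrary focal point are all hypothetical. For each focal location, the buffer grows until it contains the K nearest people. Only the buffer radius and an aggregate share are released, never the individual points.

```python
import math

# Hypothetical individual-level records: (x, y, has_attribute), coordinates in metres.
PEOPLE = [
    (0, 0, 1), (12, 3, 0), (4, 15, 1), (20, 18, 0), (35, 5, 1),
    (300, 300, 0), (310, 305, 0), (295, 320, 1), (330, 290, 0),
]


def egohood(focal, people, k):
    """Expand a buffer around `focal` until it contains the k nearest people,
    then return (buffer radius, share of those people with the attribute)."""
    if not 1 <= k <= len(people):
        raise ValueError("k must be between 1 and the number of people")
    # Rank everyone by distance to the focal location; the k closest form the egohood.
    ranked = sorted(people, key=lambda p: math.dist(focal, p[:2]))
    nearest = ranked[:k]
    radius = math.dist(focal, nearest[-1][:2])  # distance to the k-th nearest person
    share = sum(p[2] for p in nearest) / k      # aggregate statistic released instead of points
    return radius, share
```

With K = 4 around (0, 0), the buffer reaches about 26.9 m and the released share is 0.5. Raising K widens the buffer and pools more people into each statistic. This is the trade-off the highlight describes: lower identification risk in exchange for coarser spatial detail.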