Exploring the Assets and Detriments of Spatial Big Data Research

Recently, Xun Shi, Professor of Geography at Dartmouth College, gave a presentation for the Geospatial Fellows Webinar Series on implementing a bottom-up approach to epidemic modeling. The predominant top-down approach in scientific research establishes general models or laws and then applies them to problems, as a means of simplifying representations of the real world and generalizing patterns. Shi points out that the controlled, simplified, and deterministic properties of this approach don’t pair well with the complexities that geographic problems introduce. With this in mind, Shi’s research modeled COVID-19 at the individual level using a bottom-up approach, where the availability of big data on individuals’ mobility and high-performance computing capabilities made possible a meaningful assessment of how COVID-19 spread within a specific city in China. Based on simple rules, the bottom-up approach allowed for the modeling of local spaces and individuals, their interactions, stochastic (randomly determined) processes, and feedbacks, all of which helped build an understanding of the geographic complexities of COVID-19’s spread that classical SIR models (a top-down approach) fail to fully capture.
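To make the bottom-up idea concrete, here is a minimal sketch of an individual-level simulation. It is not Shi’s model: the function name `simulate_outbreak`, every parameter value, and the choice to send each person to a random place each day (a stand-in for real mobility data) are my own assumptions for illustration. It only shows how simple local rules and random draws can produce an epidemic curve without a global equation.

```python
import random

def simulate_outbreak(n_people=200, n_places=10, days=30,
                      p_transmit=0.05, recovery_days=7, seed=0):
    """Toy bottom-up epidemic model (illustrative assumptions throughout).

    Returns a list of (susceptible, infectious, recovered) counts per day.
    """
    rng = random.Random(seed)
    state = ["S"] * n_people          # "S", "I", or "R" for each individual
    days_infected = [0] * n_people
    state[0] = "I"                    # a single seed case
    history = []

    for _ in range(days):
        # Rule 1: each person visits one place today (random stand-in for mobility data).
        visits = [rng.randrange(n_places) for _ in range(n_people)]

        # Count infectious people present at each place.
        infectious_at = {}
        for person, place in enumerate(visits):
            if state[person] == "I":
                infectious_at[place] = infectious_at.get(place, 0) + 1

        # Rule 2: a susceptible person is infected with the probability that
        # at least one of the k infectious co-visitors transmits.
        newly_infected = []
        for person, place in enumerate(visits):
            if state[person] == "S":
                k = infectious_at.get(place, 0)
                if k and rng.random() < 1 - (1 - p_transmit) ** k:
                    newly_infected.append(person)

        # Rule 3: infectious people recover after a fixed number of days.
        for person in range(n_people):
            if state[person] == "I":
                days_infected[person] += 1
                if days_infected[person] >= recovery_days:
                    state[person] = "R"

        for person in newly_infected:
            state[person] = "I"

        history.append((state.count("S"), state.count("I"), state.count("R")))
    return history
```

Running the function with different `seed` values gives different outbreaks from the same rules, which is the stochastic behavior a deterministic SIR model averages away.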

Within Shi’s research, however, the assets and detriments of using big data in spatial research are a critical topic worth addressing. In sourcing data, Shi notes that a colleague of his worked closely with the Chinese CDC and other local governmental agencies to retrieve human mobility data (both individual and aggregated). The aggregated data, sourced from China Unicom, a telecommunications company, revealed the movements of clusters of individuals from one spatial unit to another during a given time. While the individual data provided this information as well on a person-to-person basis, there were gaps and missing data that made the aggregated data more effective and reliable to use.
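As a rough illustration of what aggregated mobility data looks like, the sketch below collapses individual trip records into counts of people moving between spatial units in a given hour. The record layout, the function name `aggregate_flows`, and the `min_count` threshold are all hypothetical; I do not know how China Unicom or Shi’s team actually processed the data. Dropping small counts is a common privacy safeguard that I have added as an assumption, not something described in the talk.

```python
from collections import Counter

def aggregate_flows(trips, min_count=5):
    """Aggregate individual trips into origin-destination flows.

    trips: iterable of (person_id, origin_unit, dest_unit, hour) tuples
           (a hypothetical record layout).
    Returns {(origin_unit, dest_unit, hour): number_of_distinct_people},
    keeping only flows with at least min_count people.
    """
    # De-duplicate so each person is counted once per flow.
    distinct = set(trips)

    # Drop the person identifier and count people per (origin, dest, hour).
    counts = Counter((origin, dest, hour) for _, origin, dest, hour in distinct)

    # Suppress small flows, which could single out individuals.
    return {flow: n for flow, n in counts.items() if n >= min_count}
```

For example, six people moving from unit "A" to unit "B" at hour 8 would be reported as one flow of size 6, while a lone traveler from "B" to "C" would be left out of the output.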

Even though the Chinese government has established a regime of mass surveillance throughout the country, in which a ‘lack of digital privacy’ is the norm, the use of personal mobility data by researchers and other parties opens up a dialogue about the ethics and responsibilities associated with using big data. Although Shi took disaggregation and de-identification measures in processing the data for use, the raw data initially obtained contains sensitive information that could raise ethical concerns and fears if weaponized to ‘expose’ and ‘single out’ individuals (e.g., the Chinese government using the EpiForest visualizations to identify, locate, and punish the point-source individuals of a mass transmission chain). To push back against these concerns, the pedagogy of investigating GIS data should teach researchers to critique the ethics of ‘troublesome knowledge’, understand the implications of using it, and address those implications within their own work. Doing so lets researchers mitigate, or at least forewarn their audience about, the externalities of data politics (Crampton, 2018). Additionally, if the expansion of capabilities and applications for spatial big data research is paired with an evolving framework for developing stronger moral reasoning skills among scholars (students, professors, researchers, etc.), then practical ethics and moral responsibility will become more commonplace within GIS, allowing for more transparency and caution when using big data (DiBiase, 2017).

Zook et al. (2017) further the argument for expanding the capabilities and applications of spatial big data research, establishing ten simple rules to help researchers create an atmosphere of responsible big data research. Rule 3, guarding against the reidentification of your data, addresses the location and privacy concern raised above; minimizing the vectors of reidentification is a technique Shi probably took into account when using this confidential data from the Chinese government. One other rule worth noting is Rule 1, acknowledging that data are people and can do harm. Because big data often represents individual humans, and the manipulation or publication of such data can unintentionally have negative effects on them, researchers carry a burden of empathy and responsibility. By acknowledging that their intentions may have adverse implications, they create an ethic that furthers the value of using big data in spatial research. By nature, our world operates on implicit trust and consent to power structures, so although individuals may not have a say in how their ‘big data’ is used by researchers, there should be an underlying belief that their information will be used confidentially for them, not against them. While the path for expanding spatial big data research isn’t crystal clear, I believe that for big data to become more ethical, an educational framework of moral reasoning and accountability needs to be applied to both researchers and hosts of big data. Without it, the expansion of big data for reproduction purposes presents a vast array of uncertainties, from privacy issues to a lack of empathy towards those one’s research may impact.

Sources

Crampton, J. (2018). GIS and Critical Ethics. The Geographic Information Science & Technology Body of Knowledge (2nd Quarter 2018 Edition), John P. Wilson (ed.). DOI: 10.22224/gistbok/2018.2.8
DiBiase, D. (2017). Professional and Practical Ethics of GIS&T. The Geographic Information Science & Technology Body of Knowledge (2nd Quarter 2017 Edition), John P. Wilson (ed.). DOI: 10.22224/gistbok/2017.2.2
Zook M, Barocas S, boyd d, Crawford K, Keller E, Gangadharan SP, et al. (2017) Ten simple rules for responsible big data research. PLoS Comput Biol 13(3): e1005399. https://doi.org/10.1371/journal.pcbi.1005399
