See Linnet Taylor's most recent Society & Space contributions: No place to hide? The ethics and analytics of tracking mobility using mobile phone data
The social sciences are engaged in a trans-disciplinary debate on the meaning and use of new forms of digital data. One of the most important repercussions from Dalton and Thatcher’s call (2014) for a critical data studies has been an awareness that researchers need to continually sensitise themselves to the contextualities of data’s production and use (Kitchin, 2014; Graham and Shelton, 2013; Nissenbaum, 2010). This short essay responds to this ongoing debate, laying out the case for such an awareness and asking how we might better operationalise it in data studies. If researchers working with the new data sources – and geographers in particular – can learn to think across contexts in a more inclusive way, it may take us further toward realising big data’s promise as a tool for social scientific research.
Like Dalton and Thatcher, I use the terminology of ‘big data’ as central to the process of imagining a more contextually aware data studies, since it is precisely because of ‘bigness’ that context tends to disappear. ‘Big’ can easily become a synonym for ‘universal’ in ways that can be both unreflexive and insidious. For instance, a focus on the analytical challenges of large and complex datasets tends to crowd out a more inclusive perspective in favour of a focus on the most active online population – the US – because it provides the greatest breadth of data. ‘Big’ is powerful, it is epistemologically deterministic (Cherlet, 2013), and it suggests a truthiness that gets in the way of reflexivity.
The power and traction that big data’s truthiness currently enjoys – the idea of its universality and cultural flatness – tells us something about our own academic context. We are operating in a time of economic austerity that is decimating the capacity of the public sector to collect and act on its own data, and of geopolitical instabilities that are generating a desire for clarity and operationalisable research. Both these factors have played a role in the tremendous discursive power of big data in social scientific research and governance. It is supposed to produce solutions for every problem, despite our currently imperfect understanding of its risks and biases, and it is seen as essential to economic recovery and the creation of opportunity. There is even research funding being directed towards instrumenting people to perceive it more positively (EScience Center, 2015).
We, as researchers, inevitably play a role in this instrumentation, either proactive or resistant. The current call for reflexivity in critical geography, in particular, is a response to this involvement. Can we get beyond these pressures? Can we realistically engage with the global scale on which digital data are produced, the diversity inherent in their production, and the ways in which that diversity is in turn processed out of sight? Possibly not. Puschmann and Burgess (2014) in their study of the metaphors of big data show that it is perceived as something wild and non-human, despite the fact that all digital data is produced in socially mediated ways. The commonly used terminology of ‘data in the wild’ is a convenient fiction because it deemphasises this social mediation and absolves the researcher from the unwieldy process of understanding the unfamiliar languages, cultures and institutional and political landscapes in which much data is generated. Big data analytical processes contribute to a sense that context is too big a problem to tackle, particularly since merging and linking datasets often creates exponentially more contexts to take into account. So how can the contextual be accessed and included in accounts of how big data is operating? And how can data’s diversity be understood on a more global and inclusive scale?
One step towards answering these questions is to become more conscious of the radical asymmetries of power and technology that shape big data’s production. Dalton and Thatcher recommend that researchers pay attention to the differential power geometries highlighted by Massey (1993), but data studies presents us with layered power geometries of both activity and data produced from that activity. This makes it necessary to examine the unevenness in the way that born-digital data are produced, collected and manipulated. Mark Graham (2015) in particular has called attention to the asymmetric ways that digital data represents those in lower-income countries and the global South, full of gaps, unknown spaces and biases that are hard to measure. A micro-level analysis of how connectivity has been becoming available to lower-income and marginalised groups (Taylor, 2015) demonstrates that access to the kinds of technologies that generate data as a by-product (primarily mobile phones and the internet) is highly uneven and interrupted.
This unevenness in data production suggests that big data’s universality is at best a methodologically necessary illusion supported by publication bias: high-profile journals are keen to publish innovative big data analytics but do not demand that researchers are specific about the shortcomings of their data. In fact, knowing the shortcomings of one’s data is also a challenge, since there is little research that explains what is missing. In particular, now that more than half of mobile phones are owned by the global non-elite (ITU, 2013) it is easy to confuse globally available data with globally representative data. People in lower-income places tend to produce sparser and less granular data because they have access to previous-generation devices. Further, fewer types of survey data are available on those areas, making it harder to gauge the validity of what is available. This means that data about lower-income places (i.e. most of the world) cannot tell us as much as data about higher-income places, yet sweeping claims are being made for it in terms of transforming human life and opportunities. These should be examined.
The patchiness of big data is also related to who controls its production. Power over big data analytics is oligarchic, at least where those data arise as a by-product of corporate-mediated processes such as communication, internet use or the use of sensors. Just as Morozov (2013) has warned that when we reify ‘the internet’ we are in danger of empowering certain interests over others, reifying big data as the ‘god’s eye view’ (Pentland, 2011) risks handing over the power to understand data to the private sector interests that control much of the access and analysis. For example, corporate power often modulates the way that people become data producers through practices such as zero-basing, where new users in developing countries are offered mobile internet in a monopolistic model (e.g. Facebook’s internet.org) including only a few ‘partner’ web services, limiting and skewing the signals they emit when they are online.
These asymmetries make it important to acknowledge the power politics that determine which data we get to see, and which remains uncollected, unanalysed or otherwise inaccessible. Although data production is global, the power to use data has much more of a core-periphery dynamic since most people worldwide do not have the chance to manipulate, channel or analyse data about themselves or their communities. Rather than gaining agency as conscious volunteers of data, the majority are instead becoming subjects of ‘invisible systems’ (Bowker and Star, 2000: 33) where technology firms and governments merely appropriate their data doubles for economic and political control. As Mann (2015) has pointed out, big data does not necessarily represent or empower just by existing. Instead, it gains representative weight where people can gain control over the signals they are emitting and transform that control into economic and political leverage.
If researchers can gain a clearer idea of these particular whos and whats of big data, we may better understand how to use it. Perceptions of what data is for and what may be done with it differ radically depending on one’s location, because so too do concepts that are taken for granted by social scientists as semantically stable, such as open data, privacy, volunteered data, and even the internet (as with the example of zero-basing, where ‘the internet’ differs by device).
This diverse view of data’s origins, meaning and use makes a strong case for interdisciplinary and transdisciplinary research. Instead, research on big data is subject to a strong pressure from funders to become extra-disciplinary by collaborating more with enterprise and helping to generate innovation. This suggests that data studies is at risk of orienting itself towards complementing this kind of marketable research, for instance by filling the gaps that such research tends to leave open around privacy and ethics. One role for critical research on data, then, is to de-instrument people and sensitise them to the diverse contexts of data’s use and production. In contrast, a lack of attention to this diversity makes it possible to flatten out data’s difficult unevenness, and inevitably diverts attention from the way data may serve certain populations at the expense of others, or channel resources to some places at the expense of others. For a data studies to be critical, it also needs to become more global. To do this, we must learn from those who are mapping these new colonial landscapes, and start to rise to the challenge of finding a more global perspective on the meaning and uses of data.