More than Data Collectors: Valuing Data Expertise Beyond Professional Science

This post is part of Bill of Health’s ongoing blog symposium on Critical Studies of Citizen Science in Biomedical Research. Responding to controversies over the validity of contributions by patients and patient groups to biomedical research, Sabina Leonelli turns in this post to the environmental sciences in order to examine the construction of expertise in participatory research. Background on the symposium is here. You can call up all of the symposium contributions already published by clicking here.

By Sabina Leonelli

In the context of biomedical research, particularly following the emergence of evidence-based medicine, extensive debate surrounds the choice and evaluation of appropriate sources of evidence. Contributions from patients and patients’ groups generate substantial scientific, ethical and methodological controversies, and clinicians often regard with suspicion any dataset gathered by non-professionals outside controlled conditions. At the same time, the collection of health data through social media is becoming increasingly visible as a potential source of information, with some defenders of precision medicine going so far as to value those strategies of data collection over traditional clinical trials. To shed some light on this situation and its potential pitfalls, it is useful to consider the experience of fields such as botany, ethology, oceanography, environmental science and ornithology, which have long relied extensively on contributions by non-professionals. With the recent emergence of “citizen science” projects, these contributions are often construed as data offerings, with citizens volunteering to collect data that can be fed into scientific projects, as in the well-known cases of eBird and eOceans. Just as often, researchers conceptualise these offerings as an opportunity to get hold of data that would otherwise remain beyond their reach. Due to their location, occupation or hobbies, citizens may be able to document circumstances and events that interest researchers but that researchers themselves lack the resources and personnel to identify and witness in person. Thus, citizens are positioned as precious data collectors, whose engagement with science consists in the provision of potential sources of evidence.

Meanwhile, a line is drawn between such data collectors and professional scientists as the veritable data experts, who can direct and situate citizens’ efforts to procure the raw materials for scientific investigation. In this configuration, key questions such as what the evidence will be useful for, and whether the data will be of good enough quality to play that role, are left to the scientists in charge, and data collectors remain peripheral to the formulation and development of research projects and related knowledge claims.

I want to challenge this construal of data expertise, which I find both misleading and damaging to the quality of scientific outputs. In my view (detailed in this book), data expertise involves the ability to link a given dataset to a knowledge claim, in order to show that the data at hand can serve as evidence for the truth of that claim. Using data as evidence in this sense certainly involves many professional scientific skills, such as a knowledge of statistics and of the phenomena being analyzed. It also involves knowing an awful lot about where data come from and how precisely they were generated (what data scientists call ‘data provenance’), so as to be able to situate the data correctly. This view of data expertise undermines the portrayal of data collectors as having nothing to offer to the design and development of research projects. To be sure, people who collect data may well be unaware of the specific research projects that their data may be used to carry out, but this does not mean that they lack relevant expert knowledge, or that they are uninformed about what it is that they are documenting. It is the data collectors who possess the relevant understanding of the specific circumstances in which data are produced, as well as related skills such as the ability to distinguish unusual behaviors or morphologies when, e.g., observing the fauna and flora at a given location, and to judge the best position and time of day for taking a measurement. The same holds in biomedicine, where the observations and experiences of patients and their families are widely recognised as valuable to clinical research (as shown by the widespread attempts of research platforms such as PatientsLikeMe to harness these data).

Taking full advantage of such data expertise to improve scientific knowledge production is crucial. This is particularly evident at a time when open and detailed documentation of processes of knowledge production is increasingly recognized as key to the quality, integrity and re-usability of research outputs (particularly data). As variously argued by sociologists and philosophers of science such as Brian Wynne, Sheila Jasanoff, Hasok Chang and Helen Longino, and as I discussed at length in relation to research on plant pathogens in the UK, scientific research at its best feeds on multiple forms of knowledge and involves consultation among a variety of expertises and viewpoints. This also, incidentally, encourages participation in research by non-scientists, and disrupts views (and practices) of science as an ivory tower whose inner workings are inaccessible and unintelligible to bystanders. This commitment to co-production and social engagement lies at the core of the Open Science and Open Data policies recently proposed by the European Commission, and is beautifully instantiated by citizen science projects such as those promoted by Doing It Together Science, which take advantage of the expertise of data providers at all stages of research planning and development. It is also a commitment motivating the research undertaken within the Data Studies research group at the University of Exeter, where we track and analyze the journeys of various types of data across biological and biomedical contexts (within and beyond professional scientific circles, and in both low-income and high-income countries) to investigate the epistemology and social embedding of data-intensive research.

Stay tuned for more posts in the Citizen Science Blog Symposium!
