The aerial view of the concept of collecting data is beautiful. What could be better than high-quality information carefully examined to give a p-value less than .05? The potential for leveraging these results into narrow papers in high-profile journals, never to be checked except by other independent studies costing thousands, even tens of thousands, of dollars, is a moral imperative to honor those who put the time and effort into collecting that data.
However, many of us who have actually performed data analyses, managed large data sets and analyses, and curated data sets have concerns about the details. The first concern is that someone who is not regularly involved in the analysis of data may not understand the choices involved in statistical testing. Special problems arise if data are to be combined from independent experiments and considered comparable. How heterogeneous were the study populations? Do the underlying data fulfill the assumptions of each test? Can it be ruled out that the differences found are due to chance or to improper correction for complex features of the data set?
A second concern held by some is that a new class of research person will emerge – people who have very little mathematical and computational training but analyze data for their own ends, possibly stealing from the research productivity of those who have invested much of their career in these very areas, or even use the data to try to prove what the original investigators had posited before data collection! There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “empirical parasites.”
Wait, wait, sorry, that was an incredibly stupid argument. I don’t know how I could have even come up with something like that… It’s probably something more like this:
A second concern held by some is that a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”
Yes, that’s it: open science could lead the way to research parasites analyzing other people’s data. I now look forward to the many other subtle insights on science that the editors of NEJM have to offer.