The Fault in Our Software

Although the vast majority of scientific articles fly well below the radar of the mainstream media, every once in a while one gets caught in the net. A few weeks ago, a research paper attracted a lot of public attention. It wasn’t about some fancy new drug running amok and causing dreadful side-effects or a bitter battle over the rights to a groundbreaking new technology. It was a fairly math- and statistics-heavy paper that found a flaw in an analysis program used in neuroimaging research.

Soon after the article came out and the media took hold of the situation (with gusto), I received a flood of emails, messages, and tags on Facebook and Twitter. These came from well-meaning friends and colleagues who had read the stories and were concerned. So what was all the fuss about?

The headlines were along the lines of “Decades of brain research down the drain” (I’m paraphrasing, but if anything my version is less dramatic; just google it). Several scientists have already come out to explain that the whole thing has been blown out of proportion. In fact, it’s a typical example of irresponsible science reporting (see this previous post). After all, people love a good story. And that’s often all that matters.

Inaccurate reporting of science is nothing new.

The “damage” is exaggerated.

Not to state the obvious, but I feel like it’s worth emphasizing that it’s not all brain research that is affected by this bug. Brain imaging is a great tool, and over the past few decades its use in neuroscience has flourished. But neuroscientists use many, many other techniques to investigate the brain. This bug has nothing whatsoever to do with most brain research.

It’s not even all imaging research that’s affected by the bug. We have so many different neuroimaging techniques – like PET, CT, NIRS, SPECT – that I’m expecting we’ll run out of palatable acronyms soon. MRI is just one of them, and functional MRI (fMRI) is a single application of this imaging technique.

A new take on an old problem.

Not since the infamous Case of the Undead Salmon (2009) has fMRI attracted so much criticism and attention. In fact, the salmon study and the paper describing the bug are similar: the flaws they highlight mainly pertain to what is known as task-based fMRI.

Here, a subject is presented with a stimulus or instructed to perform a task. The resulting tiny changes in blood flow and oxygenation are disentangled from the brain’s massive “background” activity and all kinds of other (for these purposes) irrelevant signals from inside and outside the body. In fMRI, the brain is divided up into many small units of space called voxels. To find out if the tiny changes caused by the stimulus are distinguishable from the background, a statistical test is applied to each voxel (and there are tens of thousands of them).

However, every time you run a statistical test you have a certain chance of getting a false positive, and the more tests you run, the higher the chance that at least one of them comes up falsely positive. Some form of correction for running the test many times needs to take place. In a nutshell, the Undead Salmon paper showed that if you don’t apply a correction at all, you’ll see things going on in the brain that should definitely not be there (because the salmon is … well, dead).
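To put rough numbers on the multiple-testing problem, here is a minimal sketch in Python. It rests on assumptions of mine, not on anything from either paper: the tests are treated as independent (real voxels are spatially correlated, so this is only an approximation), the per-test threshold is the conventional 0.05, and 50,000 stands in for "tens of thousands" of voxels. The function names are made up for illustration, and the Bonferroni correction shown is simply the most basic correction there is, not the approach the new paper examined.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests tests.

    Assumes the tests are independent, which is a simplification
    for fMRI voxels.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the overall error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05          # conventional per-test threshold (assumed)
    n_voxels = 50_000     # illustrative voxel count (assumed)

    # Uncorrected: the chance of at least one false positive climbs quickly.
    for n in (1, 10, 100, n_voxels):
        print(f"{n:>6} tests, uncorrected: {familywise_error_rate(alpha, n):.4f}")

    # Corrected: each voxel is held to a much stricter threshold.
    strict = bonferroni_threshold(alpha, n_voxels)
    print(f"Bonferroni per-voxel threshold: {strict:.2e}")
    print(f"Overall error rate after correction: "
          f"{familywise_error_rate(strict, n_voxels):.4f}")
```

With a single test the chance of a false positive is just the threshold itself, but with tens of thousands of uncorrected tests a false positive somewhere becomes a near certainty. That is the dead salmon in a nutshell.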

The new paper showed that one approach used to limit the number of false positives, implemented in several commonly used fMRI analysis programs, doesn’t work. This failure had two causes: a bug in the code of one of the programs and, as the paper showed, the fact that fMRI data violate an important statistical assumption needed for the approach to be valid (basically, because the characteristics of the data do not fit the analysis strategy, the result is unreliable).

Both a bug in the code and an inherent problem with the analysis are to blame.

The reality in my case.

After reading the news, I read the actual paper. Several times, in fact, and I’m still not sure I fully understand it. It’s not really my research focus. Although I do use fMRI, I do it in an entirely different way. My project actually repurposes fMRI, which is one of the reasons why I like it so much: I get to do a lot of creative, innovative thinking in my work.

It also comes with the seemingly obvious yet still underestimated realization that making something new – or putting an existing technique to new use – is very, very hard. In my field, my peers and I rely heavily on people far smarter than me (this isn’t humility, I’m talking objectively smarter here). These are the biomedical engineers, physicists, statisticians, computer scientists, and bioinformaticians who develop the tools used in neuroimaging research. Ask any imaging researcher: these people are not “para-researchers”. Their role is not “just” supportive, it’s fundamental.

Hoping the hyperbole brings about change.

The trouble is, most of the time we use these tools to test hypotheses without thinking much about how they’re doing the things that they do. That’s understandable in a way – these things can be very, very complicated. It’s just not what biomedical researchers (especially those with a medicine/biology background) are trained to do.

The stakes are high for research software developers.

But incidents like these give us reason to stop and think. It’s a fact that people make mistakes, and if your role is as important as developing a tool that will be used by thousands of researchers, the stakes are much higher. When I mess up, the damage is usually far less widespread and hence controllable.

But that doesn’t mean we can’t do something to help. As the authors of the article pointed out, this problem would probably have been discovered much earlier if people shared their data. If the data had been accessible, someone would have realized that something was amiss much sooner, not after the software had been in use for a whole 15 years and thousands of papers had been published.

Data sharing would have limited the damage.

Many research software developers encourage people to use their tools and openly share their experience with others so that bugs can be identified and fixed (all three software packages assessed in the paper are open source). Sadly, the way we currently do science is stubbornly resistant to that kind of policy.