Trial by p-values: Preliminary thoughts on the Jens Förster report

I’ve just had a quick look at the report (available at Retraction Watch) that led to the investigation of Jens Förster for possible data manipulation. It makes the case that the data in three of Förster’s papers are statistically highly improbable, largely because the means for the levels of various three-level factors tend to fall almost exactly on a straight line. There are also claims that the data are far too consistent across independent studies, that the effect sizes are implausibly large, and that the samples are demographically implausible.
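
For readers who want the mechanics: the standard way to quantify (non)linearity in a three-level design is to split the between-group variation into a linear and a quadratic contrast and test the quadratic term against within-group error. The report’s actual procedure is more elaborate than this, but here is a minimal sketch (my own, assuming equal group sizes and equally spaced levels; the function name is mine):

```python
import numpy as np
from scipy import stats

def nonlinearity_test(groups):
    """F-test of the quadratic (deviation-from-linearity) contrast in a
    one-way design with three equally spaced levels and equal group
    sizes.  `groups` is a 3 x n array; a tiny F (p near 1) means the
    three group means lie almost exactly on a straight line."""
    groups = np.asarray(groups, dtype=float)
    n = groups.shape[1]
    m = groups.mean(axis=1)
    # The orthogonal quadratic contrast (1, -2, 1) carries all of the
    # deviation from linearity; its sum of squares has 1 df.
    ss_quad = n * (m[0] - 2 * m[1] + m[2]) ** 2 / 6
    ss_within = ((groups - m[:, None]) ** 2).sum()
    df_within = 3 * n - 3
    F = ss_quad / (ss_within / df_within)
    return F, stats.f.sf(F, 1, df_within)
```

In any single study, near-perfect linearity is unremarkable; the report’s argument concerns the distribution of such statistics across many independent samples, where the quadratic term should not hug zero as consistently as it reportedly does.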

It is dry, depressing reading.

From the comments I’ve seen on Retraction Watch and Twitter, some people are already convinced. For my part, I’m reserving judgment until the psychological/statistical community has had time to complete its “post-publication peer review” of the report. To stimulate discussion, here are some thoughts I had after a first read:

1. The report analyzes just three papers. Förster is a highly productive researcher with more than 50 papers. How were these three chosen? Were there other papers by Förster that did not show any questionable patterns?

2. The F-test for linearity assumes continuous DVs, but some of the DVs are discrete (they come from rating scales). The simulations at the end of the report suggest that the test might be robust to violations of this assumption, but are those simulations themselves valid and based on reasonable assumptions? (A minimal version of such a check is sketched after this list.)

3. The control studies were selected through a search for single-factor, three-level designs. Do the control studies involve the same type of data? Did the selection process for the control studies mimic the selection process that led to the identification of the three questionable papers?

4. Could p-hacking give rise to a linear pattern? (This idea is from @bahnik; a quick simulation of it is sketched below.)
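
On point 2, the check one would want is straightforward to describe: simulate null data on a discrete rating scale and see whether the linearity test’s p-values remain uniform. Below is a minimal sketch of such a simulation, continuing the code above. It is my own construction, not the report’s; the 7-point scale, the latent normal, and all parameter values are illustrative assumptions.

```python
# Continues the sketch above (reuses numpy, scipy, nonlinearity_test).
rng = np.random.default_rng(0)

n_sims, n_per_group = 10_000, 20
pvals = np.empty(n_sims)
for i in range(n_sims):
    # Null model: all three groups share one latent normal distribution,
    # which is then coarsened onto a 1-7 rating scale.
    latent = rng.normal(loc=4.0, scale=1.5, size=(3, n_per_group))
    ratings = np.clip(np.rint(latent), 1, 7)
    pvals[i] = nonlinearity_test(ratings)[1]

# If discreteness does not distort the test, the p-values should be
# roughly uniform, with about 5% of simulations in each tail.
print((pvals < 0.05).mean(), (pvals > 0.95).mean())
```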

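On point 4, a quick simulation can also show how selective reporting alone might produce suspicious linearity. The sketch below, again my own construction continuing the code above, models a researcher who reruns a study until the three means look “clean” and reports only that dataset; the effect size, tolerance, and stopping rule are illustrative assumptions. Note that the true means here are themselves exactly linear: the point is that selection makes the observed means hug the line more tightly than sampling error should allow.

```python
# Continues the sketches above (reuses rng and nonlinearity_test).
def hacked_study(n=20, effect=0.5, tol=0.15, max_tries=100):
    """Rerun a three-group study until the means look 'clean':
    monotone, with the middle mean close to the midpoint of the
    outer two.  Report only that dataset."""
    for _ in range(max_tries):
        g = rng.normal(loc=[0.0, effect, 2 * effect],
                       scale=1.0, size=(n, 3)).T
        m = g.mean(axis=1)
        if m[0] < m[1] < m[2] and abs(m[1] - (m[0] + m[2]) / 2) < tol:
            return g
    return g  # give up and report the last attempt

pvals = np.array([nonlinearity_test(hacked_study())[1]
                  for _ in range(1_000)])
# Selection pushes the nonlinearity p-values toward 1: the reported
# studies look "too linear" even though no data point was fabricated.
print(np.median(pvals))
```

Whether anything like this could plausibly operate at the scale the report describes is, of course, a separate question.
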
If this is going to be a trial by p-values, I hope that we can make sure that Jens Förster gets a fair hearing!