Monday, June 24, 2013

JOC's estimated data manipulation rate? 0.4% of submissions

In the middle of a terribly interesting article by Stephen Ritter about Amos Smith's Organic Letters data-integrity editorial comes a noteworthy estimate of the rate of data manipulation:
C. Dale Poulter, a chemistry professor at the University of Utah and editor of the Journal of Organic Chemistry, knows these problems well. They are part of the reason he enlists the help of a data analyst. 
When it comes to checking reported data, a reviewer or data analyst must make sure the spectra, elemental analyses, and other data required by the journal are there, Poulter explains. The presented data are then reviewed to be certain there aren’t any blatant misinterpretations. Any anomalies reviewers find could be the result of a simple mistake—such as a typo, math error, or loading the wrong data set. Or they might point to data manipulation. 
The occurrence of data manipulation remains rare, Poulter says — only about a dozen cases out of the 3,000 manuscripts submitted to his journal each year. These cases aren’t reflected in paper corrections or retractions, he notes, because the papers are not accepted for publication.
This works out to about a 0.4% hit rate (12 out of 3,000), which is pretty low.

One assumes that reviewers are reasonably vigilant about such matters, but should there be a financial incentive to detect data manipulation? I wonder what a $250 bounty for provable data manipulation (paid to the volunteer peer reviewer who detects it) would do for data integrity.

(Of course, the unintended consequence would be people setting up fake professors and sending in bogus article submissions to get some extra cash for reviewing...) 


  1. One would imagine the actual number could be considerably higher, no? I doubt they closely check the supporting information and other data before they have some idea that the paper will be accepted. Would it be safe to say that 12 is the number of papers rejected primarily for discovered data manipulation?

  2. I always had at least one reviewer who checked my supporting info pretty carefully. Even though they let me get away with a lot of free-style prose and figures, they would notice an NMR resonance that I put in by mistake (easy enough to do when you're copying and pasting the format from a previous molecule and then just changing the shift numbers and adding or deleting a few). There was always someone who would say that it seems to be a phantom resonance and ask me to explain myself. I couldn't even get away with posting my bad elemental analysis and hoping they wouldn't care too much that it was >0.5% off, or that they thought, like me, that an elemental analysis requirement where the instrument error exceeds what the journal demands is a foolish and stupid requirement, especially for organics. There was always someone who saw that and made a comment that "it seems the Elemental Analysis is not ideal". No shit it's not ideal, but I'm not about to run it for the fourth time when the boss complains about 'costs' — and are you really going to sink the paper for that?

    Probably where you do want to cheat is by fudging the yield or ee numbers, especially in new methods papers, and there is no way reading the SI carefully will help you catch that. The journal's stringent requirements for supporting data would not catch any of it either. That's why the true rate is definitely more than 0.4% if you count that kind of data fudging as misconduct. Then to catch such a paper you've just got to hope a reviewer smells something fishy... like they should have with that Buchwald paper on Pd catalysis. No way it could have been that selective. Pd? Please...

  3. I agree with Andre. The 0.4% is meaningless unless the percentage of papers that were rigorously checked is known. If only 2% got a sufficiently thorough review, then suddenly 0.4% actually sounds pretty terrible.

    With respect to elemental analyses, I've seen a lot of sketchy ones that were far too perfect. I'm talking CHN with all numbers agreeing within 0.02 when the instrument error's about 0.3%. I fear that some people are cherry picking their numbers from multiple analyses ("I like the C from run 1 and the H from run 4...") or are doing even worse things.
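The "too perfect" CHN worry above is easy to make concrete. As a hedged sketch (not from the thread — the atomic masses are standard rounded values, and the example compound and "found" numbers are made up for illustration), here is how one might compute theoretical CHN percentages for a molecular formula and check whether reported values agree within a journal-style ±0.4 percentage-point window:

```python
# Hedged sketch: compare "found" CHN percentages against theory.
# Atomic masses are standard rounded values; the caffeine example and
# the "found" numbers are invented for illustration.
MASSES = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def theoretical_percent(formula):
    """formula: dict of element -> atom count, e.g. caffeine C8H10N4O2."""
    mw = sum(MASSES[el] * n for el, n in formula.items())
    return {el: 100.0 * MASSES[el] * n / mw for el, n in formula.items()}

def within_tolerance(found, theory, tol=0.4):
    """True if every reported element agrees with theory within tol points."""
    return all(abs(found[el] - theory[el]) <= tol for el in found)

caffeine = {"C": 8, "H": 10, "N": 4, "O": 2}
theory = theoretical_percent(caffeine)  # C ~49.48, H ~5.19, N ~28.85

print(within_tolerance({"C": 49.30, "H": 5.30, "N": 28.60}, theory))  # True
print(within_tolerance({"C": 48.90, "H": 5.30, "N": 28.60}, theory))  # False
```

Agreement to within 0.02 on every element, as described above, would be an order of magnitude tighter than a typical ~0.3% instrument error — which is exactly why it looks suspicious.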

  4. Using See Arr Oh's method of article estimation...

    Current JOC issue has 57 articles (papers and notes, not counting Perspective), so estimate 26*57 articles per year published = about 1500 articles accepted (maybe an actual range of 1300-1700?). That sets a floor (I would think) for the number of articles receiving data checking - the number of articles checked is probably higher than that. 3000 would actually be a plausible number - 50% acceptance doesn't sound ridiculous, and if an article is sent out for review, it probably would be deemed worthy of data checking. That would also likely mean that the data falsification rate isn't much higher than 1%.
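    The back-of-envelope estimate above can be written out explicitly. A minimal sketch, using only the numbers quoted in the post and this comment (57 articles/issue, 26 issues/year, 3,000 submissions, 12 flagged cases) and assuming the accepted-article count is the floor for how many papers were data-checked:

    ```python
    # Back-of-envelope reproduction of the comment's estimate.
    # All inputs come from the post and the comment; the "checked = accepted"
    # assumption is the commenter's floor, not an established fact.
    articles_per_issue = 57
    issues_per_year = 26
    accepted = articles_per_issue * issues_per_year   # 1482, "about 1500"

    submissions = 3000                                # Poulter's figure
    acceptance_rate = accepted / submissions          # ~49%, plausible

    flagged = 12                                      # manipulation cases/year
    # If only the ~1500 accepted-quality papers were actually checked,
    # the implied manipulation rate among checked papers:
    rate_if_checked_only = flagged / accepted         # ~0.8%

    print(f"{accepted} accepted, {acceptance_rate:.0%} acceptance, "
          f"{rate_if_checked_only:.2%} implied rate")
    ```

    On those assumptions the implied rate lands near 0.8% — consistent with the comment's conclusion that the falsification rate "isn't much higher than 1%".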

  5. The mixing and matching of CHN numbers is not too smart because you can get caught. Usually an outside company sends you a list for each sample. Of course no one checks them, but whatever. Usually people just repeat runs until the errors all line up below 0.4%, which seems to me like it defeats the purpose. I once had to do an analysis for a metal complex, and the really pure stuff I had all got used up for reactions and getting yields, etc... I had a batch of 80%-90% pure stuff by NMR that I was using to explore chemistry of the thing, and the NMR said the impurities were other complexes. So, I thought, "maybe one impurity has an extra CO, another one has a CO less, it might all balance out". The thing came back with a perfect CHN, when I wouldn't dare to use the batch for NMR spectra that I later put in the article. And this is a metal complex, where the elemental analysis actually makes sense to do since you're not doing a fractional separation or passing it through a column, and you might get some salt impurities which you don't see in NMR... Totally useless technique that bleeds money and time.

    Then again, a friend of mine in grad school had a jackass paper where his EA was off sometimes by 10%, and it was published. Got in somehow. Maybe the reviewers were like me and he decided to ignore his crappy data out of righteous anger at the technique. Or they were lazy. Who knows.

    1. Okay, maybe not 10%, but definitely a few percent at least. He probably did accidentally lose a hair from his eyebrow, or something when preparing his sample to send it out to the analytics company. But the NMR was clean.
