35,000 papers may need to be retracted for image doctoring, says new paper

Yes, you read that headline right.

In a new preprint posted to bioRxiv, image sleuths scanned hundreds of papers published over a seven-year period in Molecular and Cellular Biology (MCB), a journal of the American Society for Microbiology (ASM). The researchers were Arturo Casadevall of Johns Hopkins University, Elisabeth Bik of uBiome, Ferric Fang of the University of Washington (also on the board of directors of our parent non-profit organization), Roger Davis of the University of Massachusetts (and former MCB editor), and Amy Kullas, ASM’s publication ethics manager. They found 59 potentially problematic papers, of which five were retracted. Extrapolating from these findings and those of another paper that measured duplication rates, the researchers propose that tens of thousands of papers might need to be purged from the literature. That 35,000 figure is double the number of retractions we’ve tallied so far in our database, which goes back to the 1970s. We spoke with the authors about their findings, and about how to prevent bad images from getting published in the first place.

Retraction Watch: You found 59 potential instances of inappropriate duplication — how did you define this, and validate that the images were problematic?

Arturo Casadevall: Images were spotted by Elisabeth, who has a remarkable ability to detect problems based on patterns. She then sent them to Ferric Fang and me, and all three of us needed to agree for a figure to be classified as problematic.

Elisabeth Bik: I used the exact same criteria as in the mBio study in which I scanned 20,000 papers [described here by RW]. We flagged three types of duplications:

  1. Duplication of the exact same panel (e.g. a Western blot strip or a photo of cells) within the same paper, presented as representing different experiments.
  2. Duplication of a panel with a shift (e.g. two photos that show an area of overlap, or a Western blot that was shifted or rotated).
  3. Duplication within a photo (e.g. two lanes within the same Western blot, or the same cell visible multiple times within the same photo).

Note that these were all apparent duplications – instances where I judged that bands or photos looked unexpectedly similar. I am not perfect, but when in doubt, I did not flag it.

I did not flag apparent splices for this set.

Roger Davis: All of the identified image anomalies were then subjected to [Office of Research Integrity] forensic image analysis to formally confirm the presence of image problems.

RW: Among the 59 cases, 42 were corrected, but only five papers have been retracted. Does that surprise and/or disappoint you?

AC: I think we expected that most image problems were the result of errors in assembling figures, so the 10% retraction rate was not surprising.

RD: Authors of papers with image problems were contacted with a request for the original data.  The constructed figures and original data were then examined using ORI forensic tools. Decisions to take no action, to correct, or to retract were made by rigorously following COPE guidelines.

EB: We trusted the authors when they said that the duplication was the result of an error. Our goal is to make sure that the science is correct, not to punish. In most cases, the error could be corrected, so that others will be able to use the right datasets for future experiments and citations. The papers that were retracted were those where we felt that there were too many errors to be corrected, or where misconduct was suspected.

RW: No action was taken in 12 papers. The reasons you state are: “origin from laboratories that had closed (2 papers), resolution of the issue in correspondence (4 papers), and occurrence of the event more than six years earlier (6 papers).” Are these reasonable explanations, in your opinion?

AC: The ASM had a policy to investigate allegations that were not older than 6 years.  I think this is reasonable. The 6 year limit is based on the ORI statute of limitations – this is the justification employed by ASM.

Ferric Fang: The ORI established a 6-year limit in 2005 after learning from experience that it is impractical to pursue allegations of misconduct when more than 6 years have elapsed, and ASM Journals has had a similar experience. The NIH only requires investigators to retain research records for 3 years from the date of submission of the final financial report, and the NSF similarly requires retention of records for 3 years after the submission of “all required reports.” This underscores the importance of trying to address questionable data in a timely manner.

EB: This age limit is used by many publishers, not just ASM, and there are several good reasons to use it. It is very hard to pursue those older cases. Most labs do not save lab notebooks and blots/films that are older than that time frame, and postdocs and graduate students have already moved on. It is almost impossible to track down errors that happened that long ago. Papers older than 5 years also have a lower chance of being cited, and other studies might already have either confirmed or rejected the findings from older papers with duplicated images.

There are some duplications in older papers (not just MCB’s) that are suggestive of an intention to mislead and that might benefit from being discussed or flagged, but this is my personal opinion, not necessarily that of ASM or my co-authors.

RW: You extrapolate that if 10% of the MCB papers with duplicated images needed to be retracted, then 35,000 papers throughout the literature may need the same. How did you perform that calculation, and what assumptions is it based on?

EB: We extrapolated the results from previous studies to the rest of the literature. In our previous study, in which we analyzed 20,000 papers, we found that 3.8% contained duplicated images. We know that the percentage of duplicated images varies by journal, for a wide variety of reasons (different editorial processes, variable levels of peer review, different author demographics). Since this percentage was calculated on papers from 40 different journals with different impact factors, it serves as a reasonable representation of the whole body of biomedical literature. The 10.6% is the percentage of papers that were retracted in the MCB dataset. Granted, this was a much smaller dataset than the one in the mBio paper, but it was one that was examined closely.

If there are 8,778,928 biomedical publications indexed in PubMed from 2009 to 2016, and 3.8% contain a problematic image, and 10.6% (CI 1.5-19.8%) of that group contain images of sufficient concern to warrant retraction, then we can estimate that approximately 35,000 (CI 6,584-86,911) papers are candidates for retraction due to image duplication.
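The extrapolation is a product of three numbers. The minimal Python sketch below reproduces the point estimate using only the figures quoted above; the function and variable names are illustrative. Reading the 10.6% as 5 retractions out of the 47 flagged MCB papers on which action was taken (59 flagged minus the 12 with no action) is an assumption that happens to match the counts reported earlier; the authors do not spell it out.

```python
# Point-estimate extrapolation: PubMed papers (2009-2016)
# x share with a duplicated image x share of those warranting retraction.

PUBMED_PAPERS_2009_2016 = 8_778_928  # figure quoted in the interview
DUPLICATION_RATE = 0.038             # 3.8% from the 20,000-paper mBio study
RETRACTION_SHARE = 0.106             # 10.6% point estimate from the MCB dataset


def retraction_candidates(total: int, dup_rate: float, retract_share: float) -> float:
    """Expected number of papers that would warrant retraction."""
    return total * dup_rate * retract_share


# Assumption (not stated by the authors): 10.6% = 5 retractions among the
# 47 flagged papers that led to action (59 flagged - 12 with no action).
flagged, no_action, retracted = 59, 12, 5
retract_share_from_counts = retracted / (flagged - no_action)

estimate = retraction_candidates(PUBMED_PAPERS_2009_2016, DUPLICATION_RATE, RETRACTION_SHARE)

print(f"share from counts: {retract_share_from_counts:.3f}")  # share from counts: 0.106
print(f"point estimate: {estimate:,.0f}")                     # point estimate: 35,362
```

The result, about 35,400, is the figure the authors round to 35,000. The sketch does not reproduce the quoted interval: multiplying the 1.5% and 19.8% bounds by the same 3.8% rate gives roughly 5,000 to 66,000, whereas 6,584 and 86,911 are what the same product yields with a 5% duplication rate, so the interval may rest on a different input; how it was derived has not been confirmed here.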

RW: 35,000 papers sounds like a lot, but, as you note, it is a small fraction of the total number of papers published. Should working scientists, who rely on the integrity of the scientific literature, feel concerned about the number of potentially problematic papers that appear, for all intents and purposes, 100% valid?

AC: The number is large in absolute terms but small as a fraction of the total literature. I think scientists need to be aware that there are problem papers out there and just be cautious with any published information. To me, being cautious is always good scientific practice.

EB: Agreed. Errors can be found anywhere, not just in scientific papers. It is reassuring to know that most are the result of errors, not science misconduct. Studies like ours are also meant to raise awareness among editors and peer reviewers. Catching these errors before publication is a much better strategy than after publication. In this current study we show that investing some additional time during the editorial process to screen for image problems is worth the effort, and can save time down the road, in case duplications are discovered after publication. I hope that our study will result in more journals following in the footsteps of ASM by starting to pay attention to these duplications and other image problems, before they publish their papers.

RW: You note that it takes six hours for editorial staffers to address image issues in a published paper, but 30 minutes to screen images before publication. That’s a powerful demonstration of the benefits of screening. What barriers could prevent that from happening?

AC: The 30 minutes was the time taken by the production department to screen the figures. I think the major impediment to having screening implemented widely is cost and finding the people with the right expertise.

Amy Kullas: The 30 minutes does not refer to editorial time, but the time taken by the ASM image specialists to screen the figures.

RW: Even after screening was introduced at MCB, you still found that 4% of papers included inappropriate manipulations. How should we think about that?

AC: Screening is not perfect.

FF: The MCB pre-publication screening process is not designed to detect the kind of image duplication that Elies Bik is able to detect. The MCB staff screen for obvious instances of splicing, etc. that do not comply with journal guidelines for image presentation. The screening may incidentally deter or detect other types of image problems, but it is not designed to do so.

EB: I expect this number to continue to go down. Both peer reviewers and editors are getting better at recognizing these problems; we are just starting to recognize them. I also expect, unfortunately, that people who really want to commit science misconduct will get better at photoshopping and will generate images that cannot be recognized as fake by the human eye.

RW: Are there ways to reduce the rate of image duplication, besides pre-publication screening?

AC: Yes, we suggest that one mechanism for reducing these types of problems is to have someone else in the group assemble the figures.  At the very least that would mean a second set of eyes looking at the figures.

EB: Other solutions would be to better train peer reviewers to recognize duplications, and to develop software to detect manipulated images. We also need to raise more awareness and point out that these duplications are not allowed, so that authors can recognize these issues before submitting the manuscripts, and even adopt policies to not allow photoshopping or other science misconduct practices in their lab.

Source: retractionwatch.com