This lesson is still being designed and assembled (Pre-Alpha version)

Scientific integrity, Open Science and reproducibility

Overview

Teaching: 60 min
Exercises: 60-90 min
Questions
  • What is scientific integrity and what is the link to Open Science and reproducibility?

  • What is Open Science and which aspects are important to me?

  • What is reproducibility and why should I care about it?

Objectives
  • Understand the connections between scientific integrity, Open Science and reproducibility

  • Name the requirements for designing, carrying out and reporting research projects so that scientific integrity is respected

  • Discriminate between so-called negative and positive results

  • List all/many of the dimensions of Open Science

  • Explain why and know where to preregister studies

  • Apply these concepts when reading about research

1. What is scientific integrity and what is the link to Open Science and reproducibility?

Scientific/research integrity at the University of Zurich

Often when the term “Scientific integrity” comes up one would think about topics such as

Note that conflicts of interest can also be the subject of studies: https://doi.org/10.1186/s13643-020-01318-5

For each of these topics we link to the University of Zurich regulations here; most other universities will have corresponding regulations in place. These topics are not the main interest of this course, however. We will instead focus on the aspects of research integrity discussed below.

National and international guidance documents on research integrity

Several guidance documents exist, see three European examples here:

⇒ We will have a brief look at each of the documents and work on the Swiss document in more detail.

LERU: Towards a research integrity culture at Universities

In a summary chapter, the guidance document states what universities should do to empower sound research:

Improve the design and conduct of research:

Improve the soundness of reporting

⇒ The points in bold are topics of this course and directly related to reproducibility as we will see below and later.

European code of conduct for research integrity

The EU code states that good research practices are based on fundamental principles of research integrity.

⇒ You will find these same main principles in the Swiss guidance document! Adhering to the principles of reliability, honesty and accountability requires, among other things, working reproducibly and openly.

The Swiss code of conduct for scientific integrity

The same principles occur in the Swiss document, here with a direct pointer to reproducibility:

“Reliability, honesty, respect, and accountability are the basic principles of scientific integrity. They underpin the independence and credibility of science and its disciplines as well as the accountability and reproducibility of research findings and their acceptance by society. As a system operating according to specific rules, science has a responsibility to create the structures and an environment that foster scientific integrity.”

Quiz on the Swiss code of conduct for scientific integrity

For these questions, please read or search the Code until page 26.

Audience

At which of the following groups of people is the code of conduct aimed?

  • researchers at research performing institutions
  • educators at higher education institutions
  • administrative staff at research performing institutions
  • students at higher education institutions

Solution

T researchers at research performing institutions
T educators at higher education institutions
F administrative staff at research performing institutions
T students at higher education institutions

Reliability

For reliability, researchers need to use, e.g.,

  • appropriate study designs
  • the most current methods
  • simple analysis methods
  • transparent reporting
  • traceable materials and data

Solution

T appropriate study designs
F the most current methods
F simple analysis methods
T transparent reporting
T traceable materials and data

Computer code

The Code does not mention reproducible code (in the sense of computer code) directly. Find a passage where the use of reproducible code is implied by the standards of Chapter 4. Copy the entire bullet point or just the relevant verb.

Solution

The Code states “Researchers should design, undertake, analyse, document, and publish their research with care and with an awareness of their responsibility to society, the environment, and nature.” Using a scripting language for data analysis and providing the corresponding code hence caters to the “documenting” step.

Negative results

The non-publication of so-called negative results can be seen as a violation of scientific integrity. Find the behavior in Chapter 5 of the Code which this can be related to.

Solution

The Code lists “omitting or withholding data and data sources” as a behavior that is an example of scientific misconduct.

Example

Publication of negative results

Therapeutic fashion and publication bias: the case of anti-arrhythmic drugs in heart attack

  • In the 1970s, it was found that the local anaesthetic drug lignocaine (lidocaine) suppressed arrhythmias after heart attacks.
  • It was difficult to recognise that this claim was wrong from small clinical trials that looked only at effects on arrhythmias and not at outcomes that really matter, such as deaths.
  • Large clinical trials in the late 1980s showed that the drugs actually increased mortality.
  • The results of Hampton and co-authors’ small but negative trial regarding the anti-arrhythmic agent lorcainide were not published because no journal was willing to do so at the time.
  • A cumulative meta-analysis of previous anti-arrhythmic trials would have helped avoid tens of thousands of unnecessarily early deaths, even more so if results like those of Hampton and co-authors had been available.
  • With the words ‘publication bias’ in the title, the trial results could finally be published in the early 1990s:
    Therapeutic fashion and publication bias: the case of anti-arrhythmic drugs in heart attack

J Hampton https://journals.sagepub.com/doi/10.1177/0141076815608562

Bottom line: This is a very impressive example of the consequences of non-publication of “negative” results. The authors themselves are not to blame; they maintained their integrity as researchers. The example shows that the publication of all results is indeed a principle of research integrity in the sense of the integrity of the research record as a whole.
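
The cumulative meta-analysis mentioned in the example can be sketched in a few lines of Python. This is only a minimal illustration, assuming a fixed-effect inverse-variance model on log odds ratios. The trial counts and the function names are invented for this sketch and are not data from the anti-arrhythmic trials.

```python
import math

# Hypothetical trials in chronological order:
# (year, deaths_treated, n_treated, deaths_control, n_control).
# These counts are invented for illustration; they are NOT real trial data.
TRIALS = [
    (1975, 3, 40, 2, 40),
    (1978, 5, 60, 3, 60),
    (1980, 6, 50, 2, 50),
    (1984, 12, 150, 7, 150),
]


def log_odds_ratio(d_t, n_t, d_c, n_c):
    """Return the log odds ratio and its variance (0.5 continuity correction)."""
    a, b = d_t + 0.5, n_t - d_t + 0.5
    c, d = d_c + 0.5, n_c - d_c + 0.5
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def cumulative_meta_analysis(trials):
    """Pool the trials one at a time with fixed-effect inverse-variance weights."""
    results = []
    sum_w = 0.0
    sum_wy = 0.0
    for year, d_t, n_t, d_c, n_c in trials:
        y, var = log_odds_ratio(d_t, n_t, d_c, n_c)
        w = 1 / var
        sum_w += w
        sum_wy += w * y
        pooled = sum_wy / sum_w
        se = math.sqrt(1 / sum_w)
        results.append((
            year,
            math.exp(pooled),              # pooled odds ratio so far
            math.exp(pooled - 1.96 * se),  # lower 95% confidence limit
            math.exp(pooled + 1.96 * se),  # upper 95% confidence limit
        ))
    return results


for year, odds_ratio, lower, upper in cumulative_meta_analysis(TRIALS):
    print(f"up to {year}: pooled OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Each printed line pools all trials up to that year, so the confidence interval narrows as evidence accumulates. A trial that is never published simply does not enter the sum, which is how publication bias distorts the pooled estimate.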

 

2. What is Open Science?

Let's play the game “Open up your research”

https://www.openscience.uzh.ch/de/moreopenscience/game.html

Dimensions of Open Science

Which decisions did Emma need to take in the game?

Solution

  1. Involve a librarian?
  2. Write a data management plan?
  3. Preregister her research plan?
  4. Make her data FAIR?
  5. Publish Open Access?
  6. Publish data and/or code?

UNESCO recommendation on Open Science

In 2021 UNESCO published its Recommendation on Open Science. From UNESCO's point of view, Open Science is a tool that helps to create a sustainable future. In the boldface part of the quote we see the link between Open Science, scientific integrity and reproducibility:

“Building on the essential principles of academic freedom, research integrity and scientific excellence, open science sets a new paradigm that integrates into the scientific enterprise practices for reproducibility, transparency, sharing and collaboration resulting from the increased opening of scientific contents, tools and processes.”

Image credit: UNESCO Recommendation on Open Science, CC-BY-SA.

Optional: Read the full recommendation text at https://en.unesco.org/science-sustainable-future/open-science/recommendation.

Open Science made easy by the Open Science in Psychology/Social Science initiatives

The Open Science in Psychology/Social Science initiatives summarize and explain the practice of Open Science in seven steps: https://osf.io/hktmf/. Some of these steps were also part of Emma’s decision process. Here we show an abbreviated version of the seven steps:

Image credit: Eva Furrer, unlicensed, abbreviated version of https://osf.io/hktmf/.

We will revisit the following steps in this lesson:

  1. Create OSF account (use easy infrastructure for collaboration)
  2. Preregister your own studies
  3. Open Data
  4. Reproducible Code
  5. Open Access (preprints)

What is preregistration?

The “Open up your research” game and the seven steps above refer to preregistration. But what is preregistration? The Texas sharpshooter cartoon shows an unregistered experiment: the shooter first shoots and then draws the bull's-eyes around his shots. He did not preregister where he wanted to shoot before shooting.

Image credit: Illustration by Dirk-Jan Hoek, CC-BY.

When a researcher preregisters a study, the design and precise goal of the study are declared openly in advance: the bull's-eye is drawn.

Origins of preregistration: clinical trials

A clinical trial is an experiment involving human volunteers, for example in the development of a new drug. Registration of clinical trials, i.e. announcing that a trial will be conducted and what its goal is before any data are collected, has been standard since the late 1990s. It is considered a scientific, ethical and moral responsibility for all trials because:

  • Informed decisions are difficult under publication bias and selective reporting, i.e. the non-publication of negative results and the focus on publishing positive results, which might not reflect the original goals. Publication bias and selective reporting result in a biased view of the situation.
  • Describing clinical trials in progress simplifies the identification of research gaps.
  • The early identification of potential problems contributes to improvements in the quality of clinical trials.

Since its 2008 revision, the Declaration of Helsinki has required: “Every clinical trial must be registered […]”

Registries (non-exhaustive list)

Here is a list of registries where (pre)registration can be done:

  • ClinicalTrials.gov: US and international registry for clinical trials, first of its kind, established 1997: https://clinicaltrials.gov/

  • OSF: General purpose registry, also a research management tool (not just for preregistration), embargo possible for up to 4 years: https://osf.io/

  • AsPredicted: General purpose registry, protocols can be private forever, possibility to automatically delete an entry after 24 hours:
    https://aspredicted.org/

  • Preclinicaltrials.eu: Comprehensive listing of preclinical animal study protocols:
    https://preclinicaltrials.eu/

  • PROSPERO: International prospective register of systematic reviews:
    https://www.crd.york.ac.uk/prospero/

Quiz on registration

Does registration show an effect?

The graphic shows all large randomized controlled trials supported by the National Heart, Lung, and Blood Institute (NHLBI) between 1970 and 2012 that evaluated drugs or dietary supplements for the treatment or prevention of cardiovascular disease, together with their reported outcome measure. Trials were included if direct costs exceeded $500,000 per year, participants were adult humans, and the primary outcome was cardiovascular risk, disease or death.


Image Credit: R Kaplan and V Irvin https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0132382, CC-BY.

What is the difference between what you observe before and after the year 2000 in this graphic?

Solution

Before 2000 one sees many positive effects, i.e. treatments that lower the relative risk of cardiovascular disease, alongside some null effects, and in general the effects are larger. Once registration of the primary outcome becomes mandatory, less outcome switching can occur and many more null effects are reported. The policy change helped to overcome this particular aspect of selective reporting.
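
Why outcome switching produces so many apparent positive effects can be illustrated with simple arithmetic. The sketch below assumes that a treatment has no true effect, that each candidate outcome is tested at a significance level of 0.05, and that the outcomes are independent. The function name is hypothetical and only serves this illustration.

```python
def chance_of_false_positive(n_outcomes, alpha=0.05):
    """Probability that at least one of n independent null outcomes has p < alpha."""
    return 1 - (1 - alpha) ** n_outcomes


for n in (1, 5, 10, 20):
    chance = chance_of_false_positive(n)
    print(f"{n:2d} candidate outcomes -> {chance:.0%} chance of a spurious 'positive' result")
```

With a single preregistered primary outcome the chance stays at 5%. With ten candidate outcomes to choose from after the fact, it is already about 40%.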

 

3. What is reproducibility?

Reproducibility vs replicability

Reproducibility refers to the ability of a researcher to duplicate the results of a prior study using the same materials as were used by the original investigator. This requires, at minimum, the sharing of data sets, relevant metadata, analytical code, and related software.

Replicability refers to the ability of a researcher to duplicate the results of a prior study if the same procedures are followed but new data are collected.

See S Goodman et al. https://www.science.org/doi/10.1126/scitranslmed.aaf5027 for a finer-grained discussion of the concepts.
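
The two definitions can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the measurements are invented, the function names `fingerprint` and `bootstrap_mean` are hypothetical, and the "newly collected" data are simulated.

```python
import hashlib
import json
import random
import statistics


def fingerprint(data):
    """Return a SHA-256 checksum so that others can confirm they hold the same data."""
    return hashlib.sha256(json.dumps(data).encode("utf-8")).hexdigest()


def bootstrap_mean(data, seed, n_resamples=200):
    """Bootstrap estimate of the mean; the seed fixes the random resampling."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    ]
    return statistics.fmean(means)


# Hypothetical measurements, invented for illustration.
original_data = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4, 4.9, 4.0]

# Reproduction: same data, same code, same seed.
first_run = bootstrap_mean(original_data, seed=2024)
second_run = bootstrap_mean(original_data, seed=2024)

# Replication: same procedure, but newly collected (here: simulated) data.
rng = random.Random(7)
new_data = [round(rng.gauss(4.5, 0.5), 1) for _ in original_data]
replication = bootstrap_mean(new_data, seed=2024)

print("data checksum:", fingerprint(original_data)[:12])
print("reproduced exactly:", first_run == second_run)
print("original estimate:", round(first_run, 2))
print("replication estimate:", round(replication, 2))
```

Sharing the data (with a checksum), the code and the random seed is what makes the second run identical to the first. The replication uses new data, so its estimate can only be expected to be similar, not identical.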

What is reproducibility?

“This is exactly how it seems when you try to figure out how authors got from a large and complex data set to a dense paper with lots of busy figures. Without access to the data and the analysis code, a miracle occurred. And there should be no miracles in science.”

See artwork by Sidney Harris at http://www.sciencecartoonsplus.com/ for an illustration of the remark “I think you should be more explicit here in step two” when a miracle occurs.

The quote is from F Markowetz https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0850-7. In this publication the author asks what working reproducibly means for his daily work and comes up with “Five selfish reasons to work reproducibly”, which is also the title of the paper.

“Working transparently and reproducibly has a lot to do with empathy: put yourself into the shoes of one of your collaboration partners and ask yourself, would that person be able to access my data and make sense of my analyses. Learning the tools of the trade will require commitment and a massive investment of your time and energy. A priori it is not clear why the benefits of working reproducibly outweigh its costs.”

In this course we will learn about some of the tools Markowetz lists in his paper.

(Anti-)Example from the Markowetz paper

How bright promise in cancer testing fell apart

Image Credit: adapted from the open access article by K Baggerly and K Coombes https://projecteuclid.org/journals/annals-of-applied-statistics/volume-3/issue-4/Deriving-chemosensitivity-from-cell-lines--Forensic-bioinformatics-and-reproducible/10.1214/09-AOAS291.full.

From G Kolata https://www.nytimes.com/2011/07/08/health/research/08genes.html.

“When Juliet Jacobs found out she had lung cancer, she was terrified, but realized that her hope lay in getting the best treatment medicine could offer. So she got a second opinion, then a third. In February of 2010, she ended up at Duke University, where she entered a research study whose promise seemed stunning.

Doctors would assess her tumor cells, looking for gene patterns that would determine which drugs would best attack her particular cancer. She would not waste precious time with ineffective drugs or trial-and-error treatment. The Duke program — considered a breakthrough at the time — was the first fruit of the new genomics, a way of letting a cancer cell’s own genes reveal the cancer’s weaknesses.

But the research at Duke turned out to be wrong. Its gene-based tests proved worthless, and the research behind them was discredited. Ms. Jacobs died a few months after treatment, and her husband and other patients’ relatives have retained lawyers.”

Markowetz wonders in his paper why no one noticed these issues before it was too late. He concludes that the reason was that the data and analysis were not transparent and required forensic bioinformatics to untangle.

That forensic bioinformatics was provided by K Baggerly and K Coombes https://projecteuclid.org/journals/annals-of-applied-statistics/volume-3/issue-4/Deriving-chemosensitivity-from-cell-lines--Forensic-bioinformatics-and-reproducible/10.1214/09-AOAS291.full:

“Poor documentation hid an off-by-one indexing error affecting all genes reported, the inclusion of genes from other sources, including other arrays (the outliers), and a sensitive/resistant label reversal.”

Bottom line: Data analyses that are done using reproducible code and that are documented well are easier to check, both for the analysts themselves and for others. Such practices decrease the chance of errors like those in this example, and this benefit outweighs the effort and time they cost.
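
An off-by-one indexing error of the kind quoted above is easy to make and easy to overlook. The following Python sketch shows one way such an error can arise and how a simple documented check exposes it. The gene names, scores and function names are invented for illustration and do not reproduce the analysis examined by Baggerly and Coombes.

```python
# Hypothetical gene names and scores, invented for illustration only.
GENES = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
SCORES = [0.1, 2.7, 0.3, 3.1, 0.2]
THRESHOLD = 2.0


def significant_rows_one_based(scores, threshold):
    """Return 1-based row numbers of scores above the threshold."""
    return [i + 1 for i, s in enumerate(scores) if s > threshold]


def lookup_buggy(genes, rows):
    """BUG: treats 1-based row numbers as 0-based list indices."""
    return [genes[r] for r in rows if r < len(genes)]


def lookup_fixed(genes, rows):
    """Convert 1-based row numbers to 0-based indices before the lookup."""
    return [genes[r - 1] for r in rows]


def selection_is_consistent(genes, scores, selected, threshold):
    """Sanity check: every reported gene must itself score above the threshold."""
    score_of = dict(zip(genes, scores))
    return all(score_of[g] > threshold for g in selected)


rows = significant_rows_one_based(SCORES, THRESHOLD)
for name, lookup in (("buggy", lookup_buggy), ("fixed", lookup_fixed)):
    selected = lookup(GENES, rows)
    ok = selection_is_consistent(GENES, SCORES, selected, THRESHOLD)
    print(f"{name}: reported {selected}, check passed: {ok}")
```

The buggy lookup silently reports the neighbours of the genes that were actually selected. When the code and data are available, a reader can run a check like `selection_is_consistent` in seconds instead of reconstructing the analysis forensically.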

   

Episode challenge

A waste of 1000 research papers

Read the article “A Waste of 1000 Research Papers” by Ed Yong (The Atlantic, 27.5. 2019).

Question 1

Find situations in the article where attention to publication bias, preregistration and data sharing could have helped to avoid such waste. Copy the corresponding lines from the article and name one or two reasons why you think that those concepts could have helped.

Question 2

Use smart search terms to find the concepts so that you do not need to read the entire research article.

Question 3

Go to the research article of Border et al. that is mentioned in Yong’s article and find out which of the above concepts have been respected in this article. Justify with citations.

Question 4

What are your overall conclusions?

Solution

No solution provided here.

Key Points

  • Scientific integrity, Open Science and reproducibility are connected.

  • All three themes are important for the trustworthiness of research results.

  • The tools that will be taught in this course help to increase trustworthiness.