Coding²: Exploring reproducibility in qualitative data analysis (QDA) through research software development

* Czech Academy of Sciences, Institute of Philosophy
@hlageek@sciences.social, hladik@flu.cas.cz

Acknowledgements

TNA Fellowship

TCDH

Requal team

Martin Hájek

Michal Škvrňák

Nina Fárová

My research

  • computational sociology of science
  • topic modeling

Science mapping based on topics SocArXiv

Topic differences among funding agencies (WIP)

Qualitative coding

Interpretative technique in social and health sciences

  • Tools
    • manual
    • non-dedicated software (MS Word, MS Excel)
    • CAQDAS - Computer-assisted qualitative data analysis software
  • Approaches
    • Codebook applied (e.g., content analysis)
    • Codebook built from material (e.g., grounded theory)
  • Material
    • documents (various types/media)
      • e.g., newspaper articles
    • interview and focus group transcripts
    • field notes

Coding is just one way to conduct qualitative research!

Reproducibility & replication in qualitative research

replicability is desirable in the humanities: by that, I mean that many empirical studies in the humanities should indeed be such that an independent repetition of it, using similar or different methods and conducted under similar circumstances, can be carried out.

(Peels 2019)

we argue that there are limits to replicability across all fields; but in some fields, including parts of the humanities, these limits severely undermine the value of replication to account for the value of research.

(Penders, Holbrook, and de Rijcke 2019)

Most of the arguments with respect to the role of data and methods in RR [repetitive research] are valid primarily for research that operates with datasets that represent the domain being investigated as well as with algorithmic implementations of the method of analysis.

(Schöch 2023)

Repetitive research as a spectrum?

(Peng 2011)

Transparency and openness

CAQDAS

  • dominated by proprietary solutions
    • non-equitable access
    • problem for teaching (licenses for students)
    • difficulties in collaboration


QualCoder - feature-rich desktop software

Taguette - browser-based basic and collaborative coding

QCoder - lightweight coding package

OpenQDA - new collaborative platform (launching this month)

Can CAQDAS help in making qualitative research more “repeatable” and open?

  • CAQDAS is point-and-click, and interpretations belong to researchers
    • however, CAQDAS is where qualitative coding and computer coding meet (coding²)
  • motivations to develop our own CAQDAS
    • RQDA, a CAQDAS package for the R environment, was deprecated in 2021
    • rethinking qualitative research (coding) from the perspective of transparency, collaboration, openness and reproducibility
    • pragmatic approach - moving the dial of qualitative coding on the reproducibility spectrum
    • unassuming and unobtrusive approach - respecting qualitative researchers, not enforcing a new qualitative coding paradigm

Database schema RQDA

Open source helps!

Database schema reQual

reQual

  • reQual is now a working prototype / MVP
  • tested for teaching qualitative methods at two Czech universities
  • piloted by researchers

reQual’s contributions to “repeatability” and openness

  • Implemented
    • free and open source
    • logs of user actions
    • export of coded data and codebook
    • collaboration
      • coding overlap and consensus
      • attributes of coders
  • In progress
    • exchangeable projects (REFI-QDA standard)
    • support for deidentification/anonymization of data

Coders’ agreement

  • mutual overlap of coding among ~90 students
  • visual (qualitative) examination rather than a metric
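
The mutual overlap mentioned above could be prepared for a visual display (e.g., a heatmap) along the following lines. This is only a minimal sketch, not reQual's implementation: it assumes that each coder's segments for one code are stored as (start, end) character offsets, and it uses Jaccard overlap of coded characters as one possible way to fill the coder-by-coder matrix. The coder names, the data, and the helper functions `covered` and `overlap_matrix` are hypothetical.

```python
from itertools import combinations

# Hypothetical data: each coder's segments for one code in one document,
# stored as (start, end) character offsets with the end exclusive.
segments = {
    "coder_a": [(0, 50), (120, 180)],
    "coder_b": [(30, 70)],
    "coder_c": [(200, 240)],
}

def covered(spans):
    """Return the set of character positions covered by a list of spans."""
    positions = set()
    for start, end in spans:
        positions.update(range(start, end))
    return positions

def overlap_matrix(segments):
    """Jaccard overlap of coded characters for every pair of coders."""
    cover = {coder: covered(spans) for coder, spans in segments.items()}
    result = {}
    for a, b in combinations(sorted(cover), 2):
        union = cover[a] | cover[b]
        shared = cover[a] & cover[b]
        # Two coders who coded nothing at all get an overlap of 0.0.
        result[(a, b)] = len(shared) / len(union) if union else 0.0
    return result

overlaps = overlap_matrix(segments)
for pair, value in overlaps.items():
    print(pair, round(value, 3))
```

The resulting pairwise values are meant as input for a plot that is inspected qualitatively, in line with the slide's preference for visual examination over a single agreement coefficient.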

Coders’ consensus
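
One way to think about consensus is as the share of coders who applied a given code at each position in the text. The sketch below illustrates that idea under stated assumptions: segments are (start, end) character offsets with the end exclusive, and the function `consensus_profile` and the example data are hypothetical, not taken from reQual.

```python
from collections import Counter

def consensus_profile(segments, text_length):
    """Share of coders who applied the code at each character position."""
    counts = Counter()
    for spans in segments.values():
        positions = set()
        for start, end in spans:
            # Clip spans to the document so stray offsets are ignored.
            positions.update(range(max(start, 0), min(end, text_length)))
        # Each coder counts at most once per position.
        counts.update(positions)
    n_coders = len(segments)
    if n_coders == 0:
        return [0.0] * text_length
    return [counts[i] / n_coders for i in range(text_length)]

# Hypothetical example: three coders, a ten-character document.
example = {
    "coder_a": [(0, 5)],
    "coder_b": [(3, 8)],
    "coder_c": [(4, 6)],
}
profile = consensus_profile(example, text_length=10)
print([round(p, 2) for p in profile])
```

A profile like this could be rendered as shading over the text, so that passages coded by most coders stand out from passages coded by only a few.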

Sensitivity of coding to coders’ attributes

Visions of the future

  • support for data attributes and cases (analytical units)
  • launching a service
  • more analytical tools
  • always more optimizations and UI improvements
  • re-integration from GUI back to R