• Open access
  • Published: 27 May 2020

How to use and assess qualitative research methods

  • Loraine Busetto (ORCID: orcid.org/0000-0002-9228-7875) 1,
  • Wolfgang Wick 1,2 &
  • Christoph Gumbinger 1

Neurological Research and Practice, volume 2, Article number: 14 (2020)


This paper aims to provide an overview of the use and assessment of qualitative research methods in the health sciences. Qualitative research can be defined as the study of the nature of phenomena and is especially appropriate for answering questions of why something is (not) observed, assessing complex multi-component interventions, and focussing on intervention improvement. The most common methods of data collection are document study, (non-)participant observations, semi-structured interviews and focus groups. For data analysis, field notes and audio recordings are transcribed into protocols and transcripts, and coded using qualitative data management software. Measures such as checklists, reflexivity, sampling strategies, piloting, co-coding, member checking and stakeholder involvement can be used to enhance and assess the quality of the research conducted. Using qualitative designs in addition to quantitative ones will equip us with better tools to address a greater range of research problems, and to fill in blind spots in current neurological research and practice.

The aim of this paper is to provide an overview of qualitative research methods, including hands-on information on how they can be used, reported and assessed. This article is intended for beginning qualitative researchers in the health sciences as well as experienced quantitative researchers who wish to broaden their understanding of qualitative research.

What is qualitative research?

Qualitative research is defined as "the study of the nature of phenomena", including "their quality, different manifestations, the context in which they appear or the perspectives from which they can be perceived", but excluding "their range, frequency and place in an objectively determined chain of cause and effect" [1]. This formal definition can be complemented with a more pragmatic rule of thumb: qualitative research generally includes data in the form of words rather than numbers [2].

Why conduct qualitative research?

Because some research questions cannot be answered using (only) quantitative methods. For example, one Australian study addressed the issue of why patients from Aboriginal communities often present late or not at all to specialist services offered by tertiary care hospitals. Using qualitative interviews with patients and staff, it found one of the most significant access barriers to be transportation problems, including some towns and communities simply not having a bus service to the hospital [ 3 ]. A quantitative study could have measured the number of patients over time or even looked at possible explanatory factors – but only those previously known or suspected to be of relevance. To discover reasons for observed patterns, especially the invisible or surprising ones, qualitative designs are needed.

While qualitative research is common in other fields, it is still relatively underrepresented in health services research. The latter field is more traditionally rooted in the evidence-based medicine paradigm, as seen in "research that involves testing the effectiveness of various strategies to achieve changes in clinical practice, preferably applying randomised controlled trial study designs (...)" [4]. This focus on quantitative research, and specifically on randomised controlled trials (RCTs), is visible in the idea of a hierarchy of research evidence, which assumes that some research designs are objectively better than others and that choosing a "lesser" design is only acceptable when the better ones are not practically or ethically feasible [5, 6]. Others, however, argue that an objective hierarchy does not exist and that, instead, the research design and methods should be chosen to fit the specific research question at hand – "questions before methods" [2, 7, 8, 9]. This means that even when an RCT is possible, some research problems require a different design that is better suited to addressing them. Arguing in JAMA, Berwick uses the example of rapid response teams in hospitals, which he describes as "a complex, multicomponent intervention – essentially a process of social change" susceptible to a range of different context factors, including leadership or organisation history. According to him, "[in] such complex terrain, the RCT is an impoverished way to learn. Critics who use it as a truth standard in this context are incorrect" [8]. Instead of limiting oneself to RCTs, Berwick recommends embracing a wider range of methods, including qualitative ones, which for "these specific applications, (...) are not compromises in learning how to improve; they are superior" [8].

Research problems that can be approached particularly well using qualitative methods include assessing complex multi-component interventions or systems (of change), addressing questions beyond "what works" towards "what works for whom, when, how and why", and focussing on intervention improvement rather than accreditation [7, 9, 10, 11, 12]. Using qualitative methods can also help shed light on the "softer" side of medical treatment. For example, while quantitative trials can measure the costs and benefits of neuro-oncological treatment in terms of survival rates or adverse effects, qualitative research can help provide a better understanding of patient or caregiver stress, the visibility of illness, or out-of-pocket expenses.

How to conduct qualitative research?

Given that qualitative research is characterised by flexibility, openness and responsivity to context, the steps of data collection and analysis are not as separate and consecutive as they tend to be in quantitative research [13, 14]. As Fossey puts it: "sampling, data collection, analysis and interpretation are related to each other in a cyclical (iterative) manner, rather than following one after another in a stepwise approach" [15]. The researcher can make educated decisions with regard to the choice of methods, how they are implemented, and to which and how many units they are applied [13]. As shown in Fig. 1, this can involve several back-and-forth steps between data collection and analysis, where new insights and experiences can lead to adaptation and expansion of the original plan. Some insights may also necessitate a revision of the research question and/or the research design as a whole. The process ends when saturation is achieved, i.e. when no relevant new information can be found (see also below: sampling and saturation). For reasons of transparency, it is essential that all decisions, as well as the underlying reasoning, be well documented.

figure 1

Iterative research process

While it is not always explicitly addressed, qualitative methods reflect a different underlying research paradigm than quantitative research (e.g. constructivism or interpretivism as opposed to positivism). The choice of methods can be based on the respective underlying substantive theory or theoretical framework used by the researcher [ 2 ].

Data collection

The methods of qualitative data collection most commonly used in health research are document study, observations, semi-structured interviews and focus groups [ 1 , 14 , 16 , 17 ].

Document study

Document study (also called document analysis) refers to the review by the researcher of written materials [ 14 ]. These can include personal and non-personal documents such as archives, annual reports, guidelines, policy documents, diaries or letters.


Observations

Observations are particularly useful to gain insights into a certain setting and actual behaviour – as opposed to reported behaviour or opinions [13]. Qualitative observations can be either participant or non-participant in nature. In participant observations, the observer is part of the observed setting, for example a nurse working in an intensive care unit [18]. In non-participant observations, the observer is "on the outside looking in", i.e. present in but not part of the situation, trying not to influence the setting by their presence. Observations can be planned (e.g. for 3 h during the day or night shift) or ad hoc (e.g. as soon as a stroke patient arrives at the emergency room). During the observation, the observer takes notes on everything, or on certain pre-determined parts, of what is happening around them, for example focusing on physician-patient interactions or communication between different professional groups. Written notes can be taken during or after the observations, depending on feasibility (which is usually lower during participant observations) and acceptability (e.g. when the observer is perceived to be judging the observed). Afterwards, these field notes are transcribed into observation protocols. If more than one observer was involved, field notes are taken independently, but notes can be consolidated into one protocol after discussions. Advantages of conducting observations include minimising the distance between the researcher and the researched, the potential discovery of topics that the researcher did not realise were relevant, and gaining deeper insights into the real-world dimensions of the research problem at hand [18].

Semi-structured interviews

Hijmans & Kuyper describe qualitative interviews as "an exchange with an informal character, a conversation with a goal" [19]. Interviews are used to gain insights into a person's subjective experiences, opinions and motivations – as opposed to facts or behaviours [13]. Interviews can be distinguished by the degree to which they are structured (i.e. a questionnaire), open (e.g. free conversation or autobiographical interviews) or semi-structured [2, 13]. Semi-structured interviews are characterised by open-ended questions and the use of an interview guide (or topic guide/list) in which the broad areas of interest, sometimes including sub-questions, are defined [19]. The pre-defined topics in the interview guide can be derived from the literature, previous research or a preliminary method of data collection, e.g. document study or observations. The topic list is usually adapted and improved at the start of the data collection process as the interviewer learns more about the field [20]. Across interviews, the focus on the different (blocks of) questions may differ, and some questions may be skipped altogether (e.g. if the interviewee is not able or willing to answer them, or out of concern for the total length of the interview) [20]. Qualitative interviews are usually not conducted in written format, as this impedes the interactive component of the method [20]. In comparison to written surveys, qualitative interviews have the advantage of being interactive and allowing unexpected topics to emerge and be taken up by the researcher. This can also help overcome a provider- or researcher-centred bias often found in written surveys, which, by nature, can only measure what is already known or expected to be of relevance to the researcher. Interviews can be audio- or video-taped; sometimes, however, it is only feasible or acceptable for the interviewer to take written notes [14, 16, 20].

Focus groups

Focus groups are group interviews conducted to explore participants' expertise and experiences, including how and why people behave in certain ways [1]. Focus groups usually consist of 6–8 people and are led by an experienced moderator following a topic guide or "script" [21]. They can involve an observer who takes note of the non-verbal aspects of the situation, possibly using an observation guide [21]. Depending on researchers' and participants' preferences, the discussions can be audio- or video-taped and transcribed afterwards [21]. Focus groups are useful for bringing together homogeneous (or, to a lesser extent, heterogeneous) groups of participants with relevant expertise and experience on a given topic on which they can share detailed information [21]. Focus groups are a relatively easy, fast and inexpensive method to gain access to information on interactions in a given group, i.e. "the sharing and comparing" among participants [21]. Disadvantages include less control over the process and a lesser extent to which each individual may participate. Moreover, focus group moderators need experience, as do those tasked with the analysis of the resulting data. Focus groups can be less appropriate for discussing sensitive topics that participants might be reluctant to disclose in a group setting [13]. In addition, attention must be paid to the emergence of "groupthink" as well as possible power dynamics within the group, e.g. when patients are awed or intimidated by health professionals.

Choosing the “right” method

As explained above, the school of thought underlying qualitative research assumes no objective hierarchy of evidence and methods. This means that each choice of single or combined methods has to be based on the research question that needs to be answered and a critical assessment with regard to whether or to what extent the chosen method can accomplish this – i.e. the “fit” between question and method [ 14 ]. It is necessary for these decisions to be documented when they are being made, and to be critically discussed when reporting methods and results.

Let us assume that our research aim is to examine the (clinical) processes around acute endovascular treatment (EVT), from the patient's arrival at the emergency room to recanalization, with the aim of identifying possible causes of delay and/or other causes of sub-optimal treatment outcomes. As a first step, we could conduct a document study of the relevant standard operating procedures (SOPs) for this phase of care – are they up to date and in line with current guidelines? Do they contain any mistakes, irregularities or uncertainties that could cause delays or other problems? Regardless of the answers to these questions, the results have to be interpreted based on what they are: a written outline of what care processes in this hospital should look like. If we want to know what they actually look like in practice, we can conduct observations of the processes described in the SOPs. These results can (and should) be analysed in themselves, but also in comparison to the results of the document analysis, especially as regards relevant discrepancies. Do the SOPs outline specific tests for which no equipment can be observed, or tasks to be performed by specialised nurses who are not present during the observation? It might also be possible that the written SOP is outdated, but the actual care provided is in line with current best practice. In order to find out why these discrepancies exist, it can be useful to conduct interviews. Are the physicians simply not aware of the SOPs (because their existence is limited to the hospital's intranet), do they actively disagree with them, or does the infrastructure make it impossible to provide the care as described? Another rationale for adding interviews is that some situations (or all of their possible variations for different patient groups or the day, night or weekend shift) cannot practically or ethically be observed.
In this case, it is possible to ask those involved to report on their actions – bearing in mind that this is not the same as actual observation. A senior physician's or hospital manager's description of certain situations might differ from that of a nurse or junior physician, maybe because they intentionally misrepresent facts or maybe because different aspects of the process are visible or important to them. In some cases, it can also be relevant to consider to whom the interviewee is disclosing this information – someone they trust, someone they are otherwise not connected to, or someone they suspect or know to be in a potentially "dangerous" power relationship to them. Lastly, a focus group could be conducted with representatives of the relevant professional groups to explore how and why exactly they provide care around EVT. The discussion might reveal discrepancies (between SOPs and actual care, or between different physicians) and motivations – to the researchers as well as to the focus group members, who might not have been aware of them themselves. For the focus group to deliver relevant information, attention has to be paid to its composition and conduct, for example to make sure that all participants feel safe to disclose sensitive or potentially problematic information, and that the discussion is not dominated by (senior) physicians only. The resulting combination of data collection methods is shown in Fig. 2.

figure 2

Possible combination of data collection methods

Attributions for icons: “Book” by Serhii Smirnov, “Interview” by Adrien Coquet, FR, “Magnifying Glass” by anggun, ID, “Business communication” by Vectors Market; all from the Noun Project

The combination of multiple data sources as described for this example can be referred to as "triangulation", in which multiple measurements are carried out from different angles to achieve a more comprehensive understanding of the phenomenon under study [22, 23].

Data analysis

To analyse the data collected through observations, interviews and focus groups, these need to be transcribed into protocols and transcripts (see Fig. 3). Interviews and focus groups can be transcribed verbatim, with or without annotations for behaviour (e.g. laughing, crying, pausing) and with or without phonetic transcription of dialects and filler words, depending on what is expected or known to be relevant for the analysis. In the next step, the protocols and transcripts are coded, that is, marked (or tagged, labelled) with one or more short descriptors of the content of a sentence or paragraph [2, 15, 23]. Jansen describes coding as "connecting the raw data with 'theoretical' terms" [20]. In a more practical sense, coding makes raw data sortable. This makes it possible to extract and examine all segments describing, say, a tele-neurology consultation from multiple data sources (e.g. SOPs, emergency room observations, staff and patient interviews). In a process of synthesis and abstraction, the codes are then grouped, summarised and/or categorised [15, 20]. The end product of the coding or analysis process is a descriptive theory of the behavioural pattern under investigation [20]. The coding process is performed using qualitative data management software, the most common being NVivo, MaxQDA and Atlas.ti. It should be noted that these are data management tools which support the analysis performed by the researcher(s) [14].
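The core idea behind coding – tagging text segments with short descriptors so that they become sortable and retrievable across data sources – can be illustrated with a minimal sketch in plain Python. This is purely conceptual: real projects use the dedicated software named above, and all segment texts, sources and code names here are invented examples.

```python
# Conceptual sketch of qualitative coding and retrieval.
# In practice this is done in dedicated software (e.g. MaxQDA, Atlas.ti);
# the segments, sources and codes below are invented for illustration.

segments = [
    {"source": "SOP", "text": "Tele-neurology consult precedes transfer.",
     "codes": ["tele-neurology"]},
    {"source": "observation", "text": "Video link to the neurologist took 12 min to set up.",
     "codes": ["tele-neurology", "delay"]},
    {"source": "staff interview", "text": "We often skip the tele-consult at night.",
     "codes": ["tele-neurology", "night shift"]},
    {"source": "patient interview", "text": "The ambulance ride felt very long.",
     "codes": ["delay"]},
]

def retrieve(code):
    """Return all segments tagged with `code`, across all data sources."""
    return [s for s in segments if code in s["codes"]]

# Extract every segment about tele-neurology, regardless of data source.
for s in retrieve("tele-neurology"):
    print(f'{s["source"]}: {s["text"]}')
```

The point of the sketch is only that a code functions as a cross-source index: once segments are tagged, all material on a topic can be pulled together for the synthesis and abstraction steps described above.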

figure 3

From data collection to data analysis

Attributions for icons: see Fig. 2 , also “Speech to text” by Trevor Dsouza, “Field Notes” by Mike O’Brien, US, “Voice Record” by ProSymbols, US, “Inspection” by Made, AU, and “Cloud” by Graphic Tigers; all from the Noun Project

How to report qualitative research?

Protocols of qualitative research can be published separately and in advance of the study results. However, the aim is not the same as for RCT protocols, i.e. to pre-define and set in stone the research questions and primary or secondary endpoints. Rather, it is a way to describe the research methods in detail, which might not be possible in the results paper given journals' word limits. Qualitative research papers are usually longer than their quantitative counterparts to allow for deep understanding and so-called "thick description". In the methods section, the focus is on transparency of the methods used, including why, how and by whom they were implemented in the specific study setting, so as to enable a discussion of whether and how this may have influenced data collection, analysis and interpretation. The results section usually starts with a paragraph outlining the main findings, followed by more detailed descriptions of, for example, the commonalities, discrepancies or exceptions per category [20]. Here it is important to support the main findings with relevant quotations, which may add information, context, emphasis or real-life examples [20, 23]. It is subject to debate in the field whether it is relevant to state the exact number or percentage of respondents supporting a certain statement (e.g. "Five interviewees expressed negative feelings towards XYZ") [21].

How to combine qualitative with quantitative research?

Qualitative methods can be combined with other methods in multi- or mixed-methods designs, which "[employ] two or more different methods […] within the same study or research program rather than confining the research to one single method" [24]. Reasons for combining methods can be diverse, including triangulation for corroboration of findings, complementarity for illustration and clarification of results, expansion to extend the breadth and range of the study, explanation of (unexpected) results generated with one method with the help of another, or offsetting the weakness of one method with the strength of another [1, 17, 24, 25, 26]. The resulting designs can be classified according to when, why and how the different quantitative and/or qualitative data strands are combined. The three most common types of mixed-methods design are the convergent parallel design, the explanatory sequential design and the exploratory sequential design. The designs are shown with examples in Fig. 4.

figure 4

Three common mixed methods designs

In the convergent parallel design, a qualitative study is conducted in parallel to and independently of a quantitative study, and the results of both studies are compared and combined at the stage of interpretation. Using the above example of EVT provision, this could entail setting up a quantitative EVT registry to measure process times and patient outcomes in parallel to conducting the qualitative research outlined above, and then comparing results. Amongst other things, this would make it possible to assess whether interview respondents' subjective impressions of patients receiving good care match modified Rankin Scores at follow-up, or whether observed delays in care provision are exceptions or the rule when compared to the process times documented in the registry. In the explanatory sequential design, a quantitative study is carried out first, followed by a qualitative study to help explain the quantitative results. This would be an appropriate design if the registry alone had revealed relevant treatment delays and the qualitative study were used to understand where and why they occurred, and how they could be remedied. In the exploratory sequential design, the qualitative study is carried out first and its results help to inform and build the quantitative study in the next step [26]. If the qualitative study around EVT provision had shown a high level of dissatisfaction among the staff members involved, a quantitative questionnaire investigating staff satisfaction could be set up in the next step, informed by the qualitative findings on the topics about which dissatisfaction had been expressed. Amongst other things, the questionnaire design would make it possible to widen the reach of the research to more respondents from different (types of) hospitals, regions, countries or settings, and to conduct sub-group analyses for different professional groups.

How to assess qualitative research?

A variety of assessment criteria and lists have been developed for qualitative research, ranging in their focus and comprehensiveness [ 14 , 17 , 27 ]. However, none of these has been elevated to the “gold standard” in the field. In the following, we therefore focus on a set of commonly used assessment criteria that, from a practical standpoint, a researcher can look for when assessing a qualitative research report or paper.

Checklists

Assessors should check the authors' use of and adherence to the relevant reporting checklists (e.g. the Standards for Reporting Qualitative Research (SRQR)) to make sure all items relevant to this type of research are addressed [23, 28]. Discussions of quantitative measures in addition to, or instead of, these qualitative measures can be a sign of lower quality of the research (paper). Providing and adhering to a checklist for qualitative research contributes to an important quality criterion for qualitative research, namely transparency [15, 17, 23].


Reflexivity

While methodological transparency and complete reporting are relevant for all types of research, some additional criteria must be taken into account for qualitative research. This includes what is called reflexivity, i.e. sensitivity to the relationship between the researcher and the researched, including how contact was established and maintained, and the background and experience of the researcher(s) involved in data collection and analysis. Depending on the research question and the population to be researched, this can be limited to professional experience, but it may also include gender, age or ethnicity [17, 27]. These details are relevant because in qualitative research, as opposed to quantitative research, the researcher as a person cannot be isolated from the research process [23]. It may influence the conversation when an interviewed patient speaks to an interviewer who is a physician, or when an interviewee is asked to discuss a gynaecological procedure with a male interviewer; the reader must therefore be made aware of these details [19].

Sampling and saturation

The aim of qualitative sampling is for all variants of the objects of observation that are deemed relevant for the study to be present in the sample, "to see the issue and its meanings from as many angles as possible" [1, 16, 19, 20, 27], and to ensure "information-richness" [15]. An iterative sampling approach is advised, in which data collection (e.g. five interviews) is followed by data analysis, followed by more data collection to find variants that are lacking in the current sample. This process continues until no new (relevant) information can be found and further sampling becomes redundant – which is called saturation [1, 15]. In other words: qualitative data collection finds its end point not a priori, but when the research team determines that saturation has been reached [29, 30].
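The iterative "sample until saturation" stopping rule can be sketched schematically. This is a deliberately mechanical caricature: in real studies the saturation judgement is made by the research team, not by an algorithm, and the interview rounds and codes below are invented stand-ins.

```python
# Schematic illustration of the saturation stopping rule: collect a batch of
# data, analyse it, and stop once a round of analysis yields no new codes.
# Rounds and codes are invented; real saturation is a team judgement.

interview_rounds = [
    {"transport", "bus service", "cost"},   # codes found in round 1
    {"cost", "language barrier"},           # round 2 still adds a new variant
    {"transport", "cost"},                  # round 3 adds nothing new
]

def collect_until_saturation(rounds):
    """Return (all codes found, number of rounds conducted)."""
    known = set()
    for i, codes in enumerate(rounds, start=1):
        new = codes - known
        if not new:
            return known, i  # saturation: this round added no new codes
        known |= new
    return known, len(rounds)

codes, stopped_after = collect_until_saturation(interview_rounds)
```

Here sampling stops after the third round, since it contributes no code not already present – the mechanical analogue of "no relevant new information can be found".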

This is also the reason why most qualitative studies use deliberate instead of random sampling strategies. This is generally referred to as "purposive sampling", in which researchers pre-define which types of participants or cases they need to include so as to cover all variations that are expected to be of relevance, based on the literature, previous experience or theory (i.e. theoretical sampling) [14, 20]. Other types of purposive sampling include (but are not limited to) maximum variation sampling, critical case sampling, and extreme or deviant case sampling [2]. In the above EVT example, a purposive sample could include all relevant professional groups and/or all relevant stakeholders (patients, relatives) and/or all relevant times of observation (day, night and weekend shifts).

Assessors of qualitative research should check whether the considerations underlying the sampling strategy were sound and whether or how researchers tried to adapt and improve their strategies in stepwise or cyclical approaches between data collection and analysis to achieve saturation [ 14 ].

Piloting

Good qualitative research is iterative in nature, i.e. it goes back and forth between data collection and analysis, revising and improving the approach where necessary. One example of this is the pilot interview, in which different aspects of the interview (especially the interview guide, but also, for example, the site of the interview or whether the interview can be audio-recorded) are tested with a small number of respondents, evaluated and revised [19]. In doing so, the interviewer learns which wording or types of questions work best, or what the best length of an interview is for patients who have trouble concentrating for an extended time. Of course, the same reasoning applies to observations or focus groups, which can also be piloted.

Co-coding

Ideally, coding should be performed by at least two researchers, especially at the beginning of the coding process, when a common approach must be defined, including the establishment of a useful coding list (or tree) and a common meaning of individual codes [23]. An initial sub-set of transcripts, or all of them, can be coded independently by the coders and then compared and consolidated after regular discussions in the research team. This ensures that codes are applied consistently to the research data.

Member checking

Member checking, also called respondent validation , refers to the practice of checking back with study respondents to see if the research is in line with their views [ 14 , 27 ]. This can happen after data collection or analysis or when first results are available [ 23 ]. For example, interviewees can be provided with (summaries of) their transcripts and asked whether they believe this to be a complete representation of their views or whether they would like to clarify or elaborate on their responses [ 17 ]. Respondents’ feedback on these issues then becomes part of the data collection and analysis [ 27 ].

Stakeholder involvement

In those niches where qualitative approaches have been able to evolve and grow, a new trend has seen the inclusion of patients and their representatives not only as study participants (i.e. “members”, see above) but as consultants to and active participants in the broader research process [ 31 , 32 , 33 ]. The underlying assumption is that patients and other stakeholders hold unique perspectives and experiences that add value beyond their own single story, making the research more relevant and beneficial to researchers, study participants and (future) patients alike [ 34 , 35 ]. Using the example of patients on or nearing dialysis, a recent scoping review found that 80% of clinical research did not address the top 10 research priorities identified by patients and caregivers [ 32 , 36 ]. In this sense, the involvement of the relevant stakeholders, especially patients and relatives, is increasingly being seen as a quality indicator in and of itself.

How not to assess qualitative research

The above overview does not include certain items that are routine in assessments of quantitative research. What follows is a non-exhaustive, non-representative, experience-based list of the quantitative criteria often applied to the assessment of qualitative research, as well as an explanation of the limited usefulness of these endeavours.

Protocol adherence

Given the openness and flexibility of qualitative research, it should not be assessed by how well it adheres to pre-determined and fixed strategies – in other words: its rigidity. Instead, the assessor should look for signs of adaptation and refinement based on lessons learned from earlier steps in the research process.

Sample size

For the reasons explained above, qualitative research does not require specific sample sizes, nor does it require that the sample size be determined a priori [ 1 , 14 , 27 , 37 , 38 , 39 ]. Sample size can only be a useful quality indicator when related to the research purpose, the chosen methodology and the composition of the sample, i.e. who was included and why.


Randomisation

While some authors argue that randomisation can be used in qualitative research, this is not commonly the case, as neither its feasibility nor its necessity or usefulness has been convincingly established for qualitative research [13, 27]. Relevant disadvantages include the negative impact of an overly large sample size as well as the possibility (or probability) of selecting "quiet, uncooperative or inarticulate individuals" [17]. Qualitative studies do not use control groups, either.

Interrater reliability, variability and other “objectivity checks”

The concept of “interrater reliability” is sometimes used in qualitative research to assess the extent to which the coding approach overlaps between two co-coders. However, it is not clear what this measure tells us about the quality of the analysis [ 23 ]. This means that such scores can be included in qualitative research reports, preferably with some additional information on what the score means for the analysis, but it is not a requirement. Relatedly, it is not relevant for the quality or “objectivity” of qualitative research to separate the people who recruited the study participants from those who collected and analysed the data. Experience even shows that it might be better to have the same person or team perform all of these tasks [ 20 ]. First, when researchers introduce themselves during recruitment, this can enhance trust when the interview takes place days or weeks later with the same researcher. Second, when the audio-recording is transcribed for analysis, the researcher who conducted the interviews will usually remember the interviewee and the specific interview situation during data analysis. This can provide additional context for the interpretation of the data, e.g. on whether something might have been meant as a joke [ 18 ].
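If a team nonetheless chooses to report an agreement score, Cohen's kappa is a common choice. The sketch below is illustrative only: both coders' label lists are invented, and, as noted above, such a score by itself says little about the quality of the analysis.

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Agreement between two coders on the same segments, corrected for chance."""
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: probability both coders pick the same code at random
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two co-coders to ten interview segments
coder1 = ["barrier", "facilitator", "barrier", "context", "barrier",
          "facilitator", "context", "barrier", "facilitator", "context"]
coder2 = ["barrier", "facilitator", "context", "context", "barrier",
          "facilitator", "context", "barrier", "barrier", "context"]
print(round(cohens_kappa(coder1, coder2), 2))  # 0.7 for these invented data
```

A kappa near 1 indicates near-perfect agreement; the point made above stands, however: reporting such a score is optional and should be accompanied by an explanation of what it means for the analysis.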

Not being quantitative research

Being qualitative rather than quantitative research should not be used as an assessment criterion if it is applied irrespective of the research problem at hand. Similarly, qualitative research should not be required to be combined with quantitative research per se – unless mixed methods research is judged to be inherently better than single-method research. In that case, the same criterion should be applied to quantitative studies without a qualitative component.

Conclusions

The main take-away points of this paper are summarised in Table 1 . We aimed to show that, if conducted well, qualitative research can answer specific research questions that cannot be adequately answered using (only) quantitative designs. Seeing qualitative and quantitative methods as equal will help us become more aware and critical of the “fit” between the research problem and our chosen methods: I can conduct an RCT to determine the reasons for transportation delays of acute stroke patients – but should I? It also provides us with a greater range of tools to tackle a greater range of research problems more appropriately and successfully, filling in the blind spots on one half of the methodological spectrum to better address the whole complexity of neurological research and practice.

Availability of data and materials

Not applicable.


Abbreviations

EVT: Endovascular treatment

RCT: Randomised Controlled Trial

SOP: Standard Operating Procedure

SRQR: Standards for Reporting Qualitative Research

References

Philipsen, H., & Vernooij-Dassen, M. (2007). Kwalitatief onderzoek: nuttig, onmisbaar en uitdagend [Qualitative research: useful, indispensable and challenging]. In L. PLBJ & H. TCo (Eds.), Kwalitatief onderzoek: Praktische methoden voor de medische praktijk [Qualitative research: Practical methods for medical practice] (pp. 5–12). Houten: Bohn Stafleu van Loghum.

Punch, K. F. (2013). Introduction to social research: Quantitative and qualitative approaches . London: Sage.

Kelly, J., Dwyer, J., Willis, E., & Pekarsky, B. (2014). Travelling to the city for hospital care: Access factors in country aboriginal patient journeys. Australian Journal of Rural Health, 22 (3), 109–113.

Nilsen, P., Ståhl, C., Roback, K., & Cairney, P. (2013). Never the twain shall meet? - a comparison of implementation science and policy implementation research. Implementation Science, 8 (1), 1–12.

Howick, J., Chalmers, I., Glasziou, P., Greenhalgh, T., Heneghan, C., Liberati, A., Moschetti, I., Phillips, B., & Thornton, H. (2011). The 2011 Oxford CEBM levels of evidence (introductory document) . Oxford Centre for Evidence-Based Medicine. https://www.cebm.net/2011/06/2011-oxford-cebm-levels-evidence-introductory-document/ .

Eakin, J. M. (2016). Educating critical qualitative health researchers in the land of the randomized controlled trial. Qualitative Inquiry, 22 (2), 107–118.

May, A., & Mathijssen, J. (2015). Alternatieven voor RCT bij de evaluatie van effectiviteit van interventies!? Eindrapportage. In Alternatives for RCTs in the evaluation of effectiveness of interventions!? Final report .

Berwick, D. M. (2008). The science of improvement. Journal of the American Medical Association, 299 (10), 1182–1184.

Christ, T. W. (2014). Scientific-based research and randomized controlled trials, the “gold” standard? Alternative paradigms and mixed methodologies. Qualitative Inquiry, 20 (1), 72–80.

Lamont, T., Barber, N., Jd, P., Fulop, N., Garfield-Birkbeck, S., Lilford, R., Mear, L., Raine, R., & Fitzpatrick, R. (2016). New approaches to evaluating complex health and care systems. BMJ, 352 , i154.

Drabble, S. J., & O’Cathain, A. (2015). Moving from Randomized Controlled Trials to Mixed Methods Intervention Evaluation. In S. Hesse-Biber & R. B. Johnson (Eds.), The Oxford Handbook of Multimethod and Mixed Methods Research Inquiry (pp. 406–425). London: Oxford University Press.

Chambers, D. A., Glasgow, R. E., & Stange, K. C. (2013). The dynamic sustainability framework: Addressing the paradox of sustainment amid ongoing change. Implementation Science : IS, 8 , 117.

Hak, T. (2007). Waarnemingsmethoden in kwalitatief onderzoek. In L. PLBJ & H. TCo (Eds.), Kwalitatief onderzoek: Praktische methoden voor de medische praktijk . [Observation methods in qualitative research] (pp. 13–25). Houten: Bohn Stafleu van Loghum.

Russell, C. K., & Gregory, D. M. (2003). Evaluation of qualitative research studies. Evidence Based Nursing, 6 (2), 36–40.

Fossey, E., Harvey, C., McDermott, F., & Davidson, L. (2002). Understanding and evaluating qualitative research. Australian and New Zealand Journal of Psychiatry, 36 , 717–732.

Yanow, D. (2000). Conducting interpretive policy analysis (Vol. 47). Thousand Oaks: Sage University Papers Series on Qualitative Research Methods.

Shenton, A. K. (2004). Strategies for ensuring trustworthiness in qualitative research projects. Education for Information, 22 , 63–75.

van der Geest, S. (2006). Participeren in ziekte en zorg: meer over kwalitatief onderzoek. Huisarts en Wetenschap, 49 (4), 283–287.

Hijmans, E., & Kuyper, M. (2007). Het halfopen interview als onderzoeksmethode [The half-open interview as research method]. In L. PLBJ & H. TCo (Eds.), Kwalitatief onderzoek: Praktische methoden voor de medische praktijk (pp. 43–51). Houten: Bohn Stafleu van Loghum.

Jansen, H. (2007). Systematiek en toepassing van de kwalitatieve survey [Systematics and implementation of the qualitative survey]. In L. PLBJ & H. TCo (Eds.), Kwalitatief onderzoek: Praktische methoden voor de medische praktijk (pp. 27–41). Houten: Bohn Stafleu van Loghum.

Pv, R., & Peremans, L. (2007). Exploreren met focusgroepgesprekken: de ‘stem’ van de groep onder de loep [Exploring with focus group conversations: the “voice” of the group under the magnifying glass]. In L. PLBJ & H. TCo (Eds.), Kwalitatief onderzoek: Praktische methoden voor de medische praktijk (pp. 53–64). Houten: Bohn Stafleu van Loghum.

Carter, N., Bryant-Lukosius, D., DiCenso, A., Blythe, J., & Neville, A. J. (2014). The use of triangulation in qualitative research. Oncology Nursing Forum, 41 (5), 545–547.

Boeije, H. (2012). Analyseren in kwalitatief onderzoek: Denken en doen [Analysis in qualitative research: Thinking and doing]. Den Haag: Boom Lemma uitgevers.

Hunter, A., & Brewer, J. (2015). Designing Multimethod Research. In S. Hesse-Biber & R. B. Johnson (Eds.), The Oxford Handbook of Multimethod and Mixed Methods Research Inquiry (pp. 185–205). London: Oxford University Press.

Archibald, M. M., Radil, A. I., Zhang, X., & Hanson, W. E. (2015). Current mixed methods practices in qualitative research: A content analysis of leading journals. International Journal of Qualitative Methods, 14 (2), 5–33.

Creswell, J. W., & Plano Clark, V. L. (2011). Choosing a Mixed Methods Design. In Designing and Conducting Mixed Methods Research . Thousand Oaks: SAGE Publications.

Mays, N., & Pope, C. (2000). Assessing quality in qualitative research. BMJ, 320 (7226), 50–52.

O'Brien, B. C., Harris, I. B., Beckman, T. J., Reed, D. A., & Cook, D. A. (2014). Standards for reporting qualitative research: A synthesis of recommendations. Academic Medicine : Journal of the Association of American Medical Colleges, 89 (9), 1245–1251.

Saunders, B., Sim, J., Kingstone, T., Baker, S., Waterfield, J., Bartlam, B., Burroughs, H., & Jinks, C. (2018). Saturation in qualitative research: Exploring its conceptualization and operationalization. Quality and Quantity, 52 (4), 1893–1907.

Moser, A., & Korstjens, I. (2018). Series: Practical guidance to qualitative research. Part 3: Sampling, data collection and analysis. European Journal of General Practice, 24 (1), 9–18.

Marlett, N., Shklarov, S., Marshall, D., Santana, M. J., & Wasylak, T. (2015). Building new roles and relationships in research: A model of patient engagement research. Quality of Life Research : an international journal of quality of life aspects of treatment, care and rehabilitation, 24 (5), 1057–1067.

Demian, M. N., Lam, N. N., Mac-Way, F., Sapir-Pichhadze, R., & Fernandez, N. (2017). Opportunities for engaging patients in kidney research. Canadian Journal of Kidney Health and Disease, 4 , 2054358117703070–2054358117703070.

Noyes, J., McLaughlin, L., Morgan, K., Roberts, A., Stephens, M., Bourne, J., Houlston, M., Houlston, J., Thomas, S., Rhys, R. G., et al. (2019). Designing a co-productive study to overcome known methodological challenges in organ donation research with bereaved family members. Health Expectations . 22(4):824–35.

Piil, K., Jarden, M., & Pii, K. H. (2019). Research agenda for life-threatening cancer. European Journal Cancer Care (Engl), 28 (1), e12935.

Hofmann, D., Ibrahim, F., Rose, D., Scott, D. L., Cope, A., Wykes, T., & Lempp, H. (2015). Expectations of new treatment in rheumatoid arthritis: Developing a patient-generated questionnaire. Health Expectations : an international journal of public participation in health care and health policy, 18 (5), 995–1008.

Jun, M., Manns, B., Laupacis, A., Manns, L., Rehal, B., Crowe, S., & Hemmelgarn, B. R. (2015). Assessing the extent to which current clinical research is consistent with patient priorities: A scoping review using a case study in patients on or nearing dialysis. Canadian Journal of Kidney Health and Disease, 2 , 35.

Elsie Baker, S., & Edwards, R. (2012). How many qualitative interviews is enough? In National Centre for Research Methods Review Paper . National Centre for Research Methods. http://eprints.ncrm.ac.uk/2273/4/how_many_interviews.pdf .

Sandelowski, M. (1995). Sample size in qualitative research. Research in Nursing & Health, 18 (2), 179–183.

Sim, J., Saunders, B., Waterfield, J., & Kingstone, T. (2018). Can sample size in qualitative research be determined a priori? International Journal of Social Research Methodology, 21 (5), 619–634.


Funding

No external funding.

Author information

Authors and Affiliations

Department of Neurology, Heidelberg University Hospital, Im Neuenheimer Feld 400, 69120, Heidelberg, Germany

Loraine Busetto, Wolfgang Wick & Christoph Gumbinger

Clinical Cooperation Unit Neuro-Oncology, German Cancer Research Center, Heidelberg, Germany

Wolfgang Wick


Contributions

LB drafted the manuscript; WW and CG revised the manuscript; all authors approved the final version.

Corresponding author

Correspondence to Loraine Busetto .

Ethics declarations

Ethics approval and consent to participate; Consent for publication; Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Busetto, L., Wick, W. & Gumbinger, C. How to use and assess qualitative research methods. Neurol. Res. Pract. 2 , 14 (2020). https://doi.org/10.1186/s42466-020-00059-z

Received : 30 January 2020

Accepted : 22 April 2020

Published : 27 May 2020

DOI : https://doi.org/10.1186/s42466-020-00059-z


Keywords

  • Qualitative research
  • Mixed methods
  • Quality assessment

Neurological Research and Practice

ISSN: 2524-3489


Criteria for Good Qualitative Research: A Comprehensive Review

  • Regular Article
  • Open access
  • Published: 18 September 2021
  • Volume 31 , pages 679–689, ( 2022 )

  • Drishti Yadav   ORCID: orcid.org/0000-0002-2974-0323 1  

70k Accesses

22 Citations

72 Altmetric

This review aims to synthesize a published set of evaluative criteria for good qualitative research. The aim is to shed light on existing standards for assessing the rigor of qualitative research encompassing a range of epistemological and ontological standpoints. Using a systematic search strategy, published journal articles that deliberate criteria for rigorous research were identified. Then, references of relevant articles were surveyed to find noteworthy, distinct, and well-defined pointers to good qualitative research. This review presents an investigative assessment of the pivotal features in qualitative research that can permit the readers to pass judgment on its quality and to commend it as good research when objectively and adequately utilized. Overall, this review underlines the crux of qualitative research and accentuates the necessity to evaluate such research by the very tenets of its being. It also offers some prospects and recommendations to improve the quality of qualitative research. Based on the findings of this review, it is concluded that quality criteria are the aftereffect of socio-institutional procedures and existing paradigmatic standpoints. Owing to the paradigmatic diversity of qualitative research, a single and specific set of quality criteria is neither feasible nor anticipated. Since qualitative research is not a cohesive discipline, researchers need to educate and familiarize themselves with applicable norms and decisive factors to evaluate qualitative research from within its theoretical and methodological framework of origin.



“… It is important to regularly dialogue about what makes for good qualitative research” (Tracy, 2010 , p. 837)

What counts as good qualitative research is highly debatable. Qualitative research encompasses numerous methods, grounded in diverse philosophical perspectives. Bryman et al. ( 2008 , p. 262) suggest that “It is widely assumed that whereas quality criteria for quantitative research are well‐known and widely agreed, this is not the case for qualitative research.” Hence, the question “how to evaluate the quality of qualitative research” has been continuously debated. These debates on the assessment of qualitative research have taken place across many areas of science and technology. Examples include various areas of psychology: general psychology (Madill et al., 2000 ); counseling psychology (Morrow, 2005 ); and clinical psychology (Barker & Pistrang, 2005 ), and other disciplines of social sciences: social policy (Bryman et al., 2008 ); health research (Sparkes, 2001 ); business and management research (Johnson et al., 2006 ); information systems (Klein & Myers, 1999 ); and environmental studies (Reid & Gough, 2000 ). In the literature, these debates are driven by the view that the blanket application to qualitative research of criteria developed around the positivist paradigm is improper. Such debates reflect the wide range of philosophical backgrounds within which qualitative research is conducted (e.g., Sandberg, 2000 ; Schwandt, 1996 ). This methodological diversity has led to the formulation of different sets of criteria applicable to qualitative research.

Among qualitative researchers, the dilemma of governing the measures to assess the quality of research is not a new phenomenon, especially when the virtuous triad of objectivity, reliability, and validity (Spencer et al., 2004 ) is not adequate. Occasionally, the criteria of quantitative research are used to evaluate qualitative research (Cohen & Crabtree, 2008 ; Lather, 2004 ). Indeed, Howe ( 2004 ) claims that the prevailing paradigm in educational research is scientifically based experimental research. Hypotheses and conjectures about the preeminence of quantitative research can weaken the worth and usefulness of qualitative research by neglecting the importance of matching the research paradigm, the epistemological stance of the researcher, and the choice of methodology to the purpose of the research. Researchers have been cautioned about this in “paradigmatic controversies, contradictions, and emerging confluences” (Lincoln & Guba, 2000 ).

In general, qualitative research comes from a very different paradigmatic stance and intrinsically demands distinctive criteria for evaluating good research and the varieties of research contributions that can be made. This review attempts to present a series of evaluative criteria for qualitative researchers, arguing that their choice of criteria needs to be compatible with the unique nature of the research in question (its methodology, aims, and assumptions). This review aims to assist researchers in identifying some of the indispensable features or markers of high-quality qualitative research. In a nutshell, the purpose of this systematic literature review is to analyze the existing knowledge on high-quality qualitative research and to verify the existence of research studies dealing with the critical assessment of qualitative research based on the concept of diverse paradigmatic stances. Contrary to the existing reviews, this review also suggests some critical directions to follow to improve the quality of qualitative research in different epistemological and ontological perspectives. It is also intended to provide guidelines for accelerating future developments and dialogues among qualitative researchers in the context of assessing qualitative research.

The rest of this review article is structured as follows: Sect. Methods describes the method followed for performing this review. Sect. Criteria for Evaluating Qualitative Studies provides a comprehensive description of the criteria for evaluating qualitative studies, followed by a summary of strategies to improve the quality of qualitative research in Sect. Improving Quality: Strategies . Sect. How to Assess the Quality of the Research Findings? provides details on how to assess the quality of the research findings. After that, some quality checklists (as tools to evaluate quality) are discussed in Sect. Quality Checklists: Tools for Assessing the Quality . Finally, the review ends with concluding remarks, together with some prospects for enhancing the quality and usefulness of qualitative research in the social and techno-scientific research community, in Sect. Conclusions, Future Directions and Outlook .

Methods

For this review, a comprehensive literature search was performed across several databases using generic search terms such as qualitative research , criteria , etc. The following databases were chosen based on the high number of results: IEEE Xplore, ScienceDirect, PubMed, Google Scholar, and Web of Science. The following keywords (and their combinations using the Boolean connectives OR/AND) were adopted for the literature search: qualitative research, criteria, quality, assessment, and validity. Synonyms for these keywords were collected and arranged in a logical structure (see Table 1 ). All publications in journals and conference proceedings from 1950 to 2021 were considered for the search. Additional articles extracted from the references of the papers identified in the electronic search were also included. A large number of publications on qualitative research were retrieved during the initial screening. Hence, to restrict the search to works focusing on criteria for good qualitative research, an inclusion criterion was incorporated into the search string.
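The Boolean combination described above (synonyms joined with OR within a group, groups joined with AND) can be built mechanically. In the sketch below the synonym groups are invented stand-ins for those listed in Table 1:

```python
# Hypothetical synonym groups standing in for the review's Table 1
groups = [
    ["qualitative research", "qualitative study"],
    ["criteria", "standards"],
    ["quality", "rigor"],
]

# OR within a group, AND between groups
query = " AND ".join(
    "(" + " OR ".join(f'"{term}"' for term in group) + ")"
    for group in groups
)
print(query)
# ("qualitative research" OR "qualitative study") AND ("criteria" OR "standards") AND ("quality" OR "rigor")
```

The resulting string can be pasted into database search interfaces that accept Boolean syntax, keeping the query identical across databases.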

From the selected databases, the search retrieved a total of 765 publications. Then, the duplicate records were removed. After that, based on the title and abstract, the remaining 426 publications were screened for their relevance by using the following inclusion and exclusion criteria (see Table 2 ). Publications focusing on evaluation criteria for good qualitative research were included, whereas those works which delivered theoretical concepts on qualitative research were excluded. Based on the screening and eligibility, 45 research articles were identified that offered explicit criteria for evaluating the quality of qualitative research and were found to be relevant to this review.

Figure  1 illustrates the complete review process in the form of PRISMA flow diagram. PRISMA, i.e., “preferred reporting items for systematic reviews and meta-analyses” is employed in systematic reviews to refine the quality of reporting.

Figure 1. PRISMA flow diagram illustrating the search and inclusion process. N represents the number of records.

Criteria for Evaluating Qualitative Studies

Fundamental Criteria: General Research Quality

Various researchers have put forward criteria for evaluating qualitative research, which have been summarized in Table 3 . Also, the criteria outlined in Table 4 effectively deliver the various approaches to evaluate and assess the quality of qualitative work. The entries in Table 4 are based on Tracy’s “Eight big‐tent criteria for excellent qualitative research” (Tracy, 2010 ). Tracy argues that high-quality qualitative work should formulate criteria focusing on the worthiness, relevance, timeliness, significance, morality, and practicality of the research topic, and the ethical stance of the research itself. Researchers have also suggested a series of questions as guiding principles to assess the quality of a qualitative study (Mays & Pope, 2020 ). Nassaji ( 2020 ) argues that good qualitative research should be robust, well informed, and thoroughly documented.

Qualitative Research: Interpretive Paradigms

All qualitative researchers follow highly abstract principles which bring together beliefs about ontology, epistemology, and methodology. These beliefs govern how the researcher perceives and acts. The net, which encompasses the researcher’s epistemological, ontological, and methodological premises, is referred to as a paradigm, or an interpretive structure, a “Basic set of beliefs that guides action” (Guba, 1990 ). Four major interpretive paradigms structure the qualitative research: positivist and postpositivist, constructivist interpretive, critical (Marxist, emancipatory), and feminist poststructural. The complexity of these four abstract paradigms increases at the level of concrete, specific interpretive communities. Table 5 presents these paradigms and their assumptions, including their criteria for evaluating research, and the typical form that an interpretive or theoretical statement assumes in each paradigm. Moreover, for evaluating qualitative research, quantitative conceptualizations of reliability and validity are proven to be incompatible (Horsburgh, 2003 ). In addition, a series of questions have been put forward in the literature to assist a reviewer (who is proficient in qualitative methods) for meticulous assessment and endorsement of qualitative research (Morse, 2003 ). Hammersley ( 2007 ) also suggests that guiding principles for qualitative research are advantageous, but methodological pluralism should not be simply acknowledged for all qualitative approaches. Seale ( 1999 ) also points out the significance of methodological cognizance in research studies.

Table 5 reflects that criteria for assessing the quality of qualitative research are the aftermath of socio-institutional practices and existing paradigmatic standpoints. Owing to the paradigmatic diversity of qualitative research, a single set of quality criteria is neither possible nor desirable. Hence, the researchers must be reflexive about the criteria they use in the various roles they play within their research community.

Improving Quality: Strategies

Another critical question is: “How can qualitative researchers ensure that the abovementioned quality criteria are met?” Lincoln and Guba ( 1986 ) delineated several strategies to strengthen each criterion of trustworthiness. Other researchers (Merriam & Tisdell, 2016 ; Shenton, 2004 ) have also presented such strategies. A brief description of these strategies is shown in Table 6 .

It is worth mentioning that generalizability is also an integral part of qualitative research (Hays & McKibben, 2021 ). In general, the guiding principle pertaining to generalizability speaks about inducing and comprehending knowledge to synthesize interpretive components of an underlying context. Table 7 summarizes the main metasynthesis steps required to ascertain generalizability in qualitative research.

Figure  2 reflects the crucial components of a conceptual framework and their contribution to decisions regarding research design, implementation, and applications of results to future thinking, study, and practice (Johnson et al., 2020 ). The synergy and interrelationship of these components signify their role at different stages of a qualitative research study.

Figure 2. Essential elements of a conceptual framework.

In a nutshell, to assess the rationale of a study, its conceptual framework and research question(s), quality criteria must take account of the following: lucid context for the problem statement in the introduction; well-articulated research problems and questions; precise conceptual framework; distinct research purpose; and clear presentation and investigation of the paradigms. These criteria would expedite the quality of qualitative research.

How to Assess the Quality of the Research Findings?

The inclusion of quotes or similar research data enhances the confirmability in the write-up of the findings. The use of expressions (for instance, “80% of all respondents agreed that” or “only one of the interviewees mentioned that”) may also quantify qualitative findings (Stenfors et al., 2020 ). On the other hand, the persuasive reason for “why this may not help in intensifying the research” has also been provided (Monrouxe & Rees, 2020 ). Further, the Discussion and Conclusion sections of an article also prove robust markers of high-quality qualitative research, as elucidated in Table 8 .
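Quantified expressions of the kind cited above can be derived directly from the coding results. A toy sketch with invented data (participant IDs and the endorsement pattern are hypothetical):

```python
# Hypothetical record of which interviewees endorsed a given theme
endorsed = {"P01": True, "P02": True, "P03": False, "P04": True, "P05": True}

agreed = sum(endorsed.values())           # count of endorsing interviewees
share = 100 * agreed / len(endorsed)      # as a percentage of all respondents
print(f"{share:.0f}% of all respondents agreed ({agreed} of {len(endorsed)})")
# 80% of all respondents agreed (4 of 5)
```

As the cited caveat notes, whether such quantification strengthens or weakens a qualitative write-up is itself contested.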

Quality Checklists: Tools for Assessing the Quality

Numerous checklists are available to speed up the assessment of the quality of qualitative research. However, if used uncritically and without regard to the research context, these checklists may be counterproductive. Such lists and guiding principles may assist in pinpointing the markers of high-quality qualitative research, but, given the enormous variation in authors’ theoretical and philosophical contexts, strict adherence to a checklist says little about whether the findings can be applied in your setting. A combination of such checklists might be appropriate for novice researchers. Some of these checklists are listed below:

The most commonly used framework is Consolidated Criteria for Reporting Qualitative Research (COREQ) (Tong et al., 2007 ). This framework is recommended by some journals to be followed by the authors during article submission.

Standards for Reporting Qualitative Research (SRQR) is another checklist that has been created particularly for medical education (O’Brien et al., 2014 ).

Also, Tracy ( 2010 ) and Critical Appraisal Skills Programme (CASP, 2021 ) offer criteria for qualitative research relevant across methods and approaches.

Further, researchers have also outlined different criteria as hallmarks of high-quality qualitative research. For instance, the “Road Trip Checklist” (Epp & Otnes, 2021 ) provides a quick reference to specific questions to address different elements of high-quality qualitative research.

Conclusions, Future Directions, and Outlook

This work presents a broad review of the criteria for good qualitative research. In addition, this article presents an exploratory analysis of the essential elements in qualitative research that can enable the readers of qualitative work to judge it as good research when objectively and adequately utilized. In this review, some of the essential markers that indicate high-quality qualitative research have been highlighted. I scope them narrowly to achieve rigor in qualitative research and note that they do not completely cover the broader considerations necessary for high-quality research. This review points out that a universal and versatile one-size-fits-all guideline for evaluating the quality of qualitative research does not exist. In other words, this review also emphasizes the non-existence of a set of common guidelines among qualitative researchers. In unison, this review reinforces that each qualitative approach should be treated uniquely on account of its own distinctive features for different epistemological and disciplinary positions. Owing to the sensitivity of the worth of qualitative research towards the specific context and the type of paradigmatic stance, researchers should themselves analyze what approaches can be and must be tailored to ensemble the distinct characteristics of the phenomenon under investigation. Although this article does not assert to put forward a magic bullet and to provide a one-stop solution for dealing with dilemmas about how, why, or whether to evaluate the “goodness” of qualitative research, it offers a platform to assist the researchers in improving their qualitative studies. This work provides an assembly of concerns to reflect on, a series of questions to ask, and multiple sets of criteria to look at, when attempting to determine the quality of qualitative research. Overall, this review underlines the crux of qualitative research and accentuates the need to evaluate such research by the very tenets of its being. 
Bringing together the vital arguments and delineating the requirements that good qualitative research should satisfy, this review strives to equip researchers and reviewers alike to make well-informed judgments about the worth and significance of the qualitative research under scrutiny. In a nutshell, a comprehensive portrayal of the research process (from the context of research to the research objectives, research questions and design, theoretical foundations, and from approaches to collecting data to analyzing the results and deriving inferences) greatly enhances the quality of a qualitative study.

Prospects: A Road Ahead for Qualitative Research

Irrefutably, qualitative research is a vibrant and evolving discipline in which different epistemological and disciplinary positions have their own characteristics and importance. Not surprisingly, owing to the evolving and varied features of qualitative research, no consensus has been reached to date. Researchers have raised various concerns and proposed several recommendations for editors and reviewers on conducting reviews of critical qualitative research (Levitt et al., 2021; McGinley et al., 2021). The following are some prospects and recommendations put forward towards the maturation of qualitative research and its quality evaluation:

In general, most manuscript and grant reviewers are not qualitative experts, and they are therefore likely to prefer a broad set of criteria. However, researchers and reviewers need to keep in mind that it is inappropriate to apply the same approaches and standards to all qualitative research. Future work therefore needs to focus on educating researchers and reviewers about the criteria for evaluating qualitative research from within the appropriate theoretical and methodological context.

There is an urgent need to critically reassess some well-known and widely accepted tools (including checklists such as COREQ and SRQR) and to interrogate their applicability in different contexts, along with their epistemological ramifications.

Efforts should be made towards creating more space for creativity, experimentation, and a dialogue between the diverse traditions of qualitative research. This would potentially help to avoid the enforcement of one's own set of quality criteria on the work carried out by others.

Moreover, journal reviewers need to be aware of various methodological practices and philosophical debates.

It is pivotal to highlight the expressions and considerations of qualitative researchers and bring them into a more open and transparent dialogue about assessing qualitative research in techno-scientific, academic, sociocultural, and political arenas.

Frequent debate on the use of evaluative criteria is required to resolve still-open issues, including the applicability of a single set of criteria in multi-disciplinary contexts. Such debates would not only benefit qualitative researchers themselves but, more importantly, help augment the well-being and vitality of the entire discipline.

To conclude, I speculate that these criteria, and my perspective, may transfer to other methods, approaches, and contexts. I hope that they spark dialogue and debate, about criteria for excellent qualitative research and about the underpinnings of the discipline more broadly, and thereby help improve the quality of qualitative studies. I also anticipate that this review will help researchers reflect on the quality of their own research and substantiate their research designs, and help reviewers evaluate qualitative research for journals. On a final note, I pinpoint the need to formulate a framework (encompassing the prerequisites of a qualitative study) through the cohesive efforts of qualitative researchers from different disciplines and with different theoretic-paradigmatic origins. I believe that tailoring such a framework of guiding principles paves the way for qualitative researchers to consolidate the status of qualitative research in the wide-ranging open science debate. Dialogue on this issue across different approaches is crucial for the future prospects of socio-techno-educational research.

Amin, M. E. K., Nørgaard, L. S., Cavaco, A. M., Witry, M. J., Hillman, L., Cernasev, A., & Desselle, S. P. (2020). Establishing trustworthiness and authenticity in qualitative pharmacy research. Research in Social and Administrative Pharmacy, 16 (10), 1472–1482.
Barker, C., & Pistrang, N. (2005). Quality criteria under methodological pluralism: Implications for conducting and evaluating research. American Journal of Community Psychology, 35 (3–4), 201–212.

Bryman, A., Becker, S., & Sempik, J. (2008). Quality criteria for quantitative, qualitative and mixed methods research: A view from social policy. International Journal of Social Research Methodology, 11 (4), 261–276.

Caelli, K., Ray, L., & Mill, J. (2003). ‘Clear as mud’: Toward greater clarity in generic qualitative research. International Journal of Qualitative Methods, 2 (2), 1–13.

CASP (2021). CASP checklists. Retrieved May 2021 from https://casp-uk.net/casp-tools-checklists/

Cohen, D. J., & Crabtree, B. F. (2008). Evaluative criteria for qualitative research in health care: Controversies and recommendations. The Annals of Family Medicine, 6 (4), 331–339.

Denzin, N. K., & Lincoln, Y. S. (2005). Introduction: The discipline and practice of qualitative research. In N. K. Denzin & Y. S. Lincoln (Eds.), The sage handbook of qualitative research (pp. 1–32). Sage Publications Ltd.
Elliott, R., Fischer, C. T., & Rennie, D. L. (1999). Evolving guidelines for publication of qualitative research studies in psychology and related fields. British Journal of Clinical Psychology, 38 (3), 215–229.

Epp, A. M., & Otnes, C. C. (2021). High-quality qualitative research: Getting into gear. Journal of Service Research . https://doi.org/10.1177/1094670520961445

Guba, E. G. (1990). The paradigm dialog. In Alternative paradigms conference, mar, 1989, Indiana u, school of education, San Francisco, ca, us . Sage Publications, Inc.

Hammersley, M. (2007). The issue of quality in qualitative research. International Journal of Research and Method in Education, 30 (3), 287–305.

Haven, T. L., Errington, T. M., Gleditsch, K. S., van Grootel, L., Jacobs, A. M., Kern, F. G., & Mokkink, L. B. (2020). Preregistering qualitative research: A Delphi study. International Journal of Qualitative Methods, 19 , 1609406920976417.

Hays, D. G., & McKibben, W. B. (2021). Promoting rigorous research: Generalizability and qualitative research. Journal of Counseling and Development, 99 (2), 178–188.

Horsburgh, D. (2003). Evaluation of qualitative research. Journal of Clinical Nursing, 12 (2), 307–312.

Howe, K. R. (2004). A critique of experimentalism. Qualitative Inquiry, 10 (1), 42–46.

Johnson, J. L., Adkins, D., & Chauvin, S. (2020). A review of the quality indicators of rigor in qualitative research. American Journal of Pharmaceutical Education, 84 (1), 7120.

Johnson, P., Buehring, A., Cassell, C., & Symon, G. (2006). Evaluating qualitative management research: Towards a contingent criteriology. International Journal of Management Reviews, 8 (3), 131–156.

Klein, H. K., & Myers, M. D. (1999). A set of principles for conducting and evaluating interpretive field studies in information systems. MIS Quarterly, 23 (1), 67–93.

Lather, P. (2004). This is your father’s paradigm: Government intrusion and the case of qualitative research in education. Qualitative Inquiry, 10 (1), 15–34.

Levitt, H. M., Morrill, Z., Collins, K. M., & Rizo, J. L. (2021). The methodological integrity of critical qualitative research: Principles to support design and research review. Journal of Counseling Psychology, 68 (3), 357.

Lincoln, Y. S., & Guba, E. G. (1986). But is it rigorous? Trustworthiness and authenticity in naturalistic evaluation. New Directions for Program Evaluation, 1986 (30), 73–84.

Lincoln, Y. S., & Guba, E. G. (2000). Paradigmatic controversies, contradictions and emerging confluences. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (2nd ed., pp. 163–188). Sage Publications.

Madill, A., Jordan, A., & Shirley, C. (2000). Objectivity and reliability in qualitative analysis: Realist, contextualist and radical constructionist epistemologies. British Journal of Psychology, 91 (1), 1–20.

Mays, N., & Pope, C. (2020). Quality in qualitative research. Qualitative Research in Health Care . https://doi.org/10.1002/9781119410867.ch15

McGinley, S., Wei, W., Zhang, L., & Zheng, Y. (2021). The state of qualitative research in hospitality: A 5-year review 2014 to 2019. Cornell Hospitality Quarterly, 62 (1), 8–20.

Merriam, S., & Tisdell, E. (2016). Qualitative research: A guide to design and implementation. San Francisco, CA: Jossey-Bass.

Meyer, M., & Dykes, J. (2019). Criteria for rigor in visualization design study. IEEE Transactions on Visualization and Computer Graphics, 26 (1), 87–97.

Monrouxe, L. V., & Rees, C. E. (2020). When I say… quantification in qualitative research. Medical Education, 54 (3), 186–187.

Morrow, S. L. (2005). Quality and trustworthiness in qualitative research in counseling psychology. Journal of Counseling Psychology, 52 (2), 250.

Morse, J. M. (2003). A review committee’s guide for evaluating qualitative proposals. Qualitative Health Research, 13 (6), 833–851.

Nassaji, H. (2020). Good qualitative research. Language Teaching Research, 24 (4), 427–431.

O’Brien, B. C., Harris, I. B., Beckman, T. J., Reed, D. A., & Cook, D. A. (2014). Standards for reporting qualitative research: A synthesis of recommendations. Academic Medicine, 89 (9), 1245–1251.

O’Connor, C., & Joffe, H. (2020). Intercoder reliability in qualitative research: Debates and practical guidelines. International Journal of Qualitative Methods, 19 , 1609406919899220.

Reid, A., & Gough, S. (2000). Guidelines for reporting and evaluating qualitative research: What are the alternatives? Environmental Education Research, 6 (1), 59–91.

Rocco, T. S. (2010). Criteria for evaluating qualitative studies. Human Resource Development International . https://doi.org/10.1080/13678868.2010.501959

Sandberg, J. (2000). Understanding human competence at work: An interpretative approach. Academy of Management Journal, 43 (1), 9–25.

Schwandt, T. A. (1996). Farewell to criteriology. Qualitative Inquiry, 2 (1), 58–72.

Seale, C. (1999). Quality in qualitative research. Qualitative Inquiry, 5 (4), 465–478.

Shenton, A. K. (2004). Strategies for ensuring trustworthiness in qualitative research projects. Education for Information, 22 (2), 63–75.

Sparkes, A. C. (2001). Myth 94: Qualitative health researchers will agree about validity. Qualitative Health Research, 11 (4), 538–552.

Spencer, L., Ritchie, J., Lewis, J., & Dillon, L. (2004). Quality in qualitative evaluation: A framework for assessing research evidence.

Stenfors, T., Kajamaa, A., & Bennett, D. (2020). How to assess the quality of qualitative research. The Clinical Teacher, 17 (6), 596–599.

Taylor, E. W., Beck, J., & Ainsworth, E. (2001). Publishing qualitative adult education research: A peer review perspective. Studies in the Education of Adults, 33 (2), 163–179.

Tong, A., Sainsbury, P., & Craig, J. (2007). Consolidated criteria for reporting qualitative research (COREQ): A 32-item checklist for interviews and focus groups. International Journal for Quality in Health Care, 19 (6), 349–357.

Tracy, S. J. (2010). Qualitative quality: Eight “big-tent” criteria for excellent qualitative research. Qualitative Inquiry, 16 (10), 837–851.


Open access funding provided by TU Wien (TUW).

Author information

Authors and Affiliations

Faculty of Informatics, Technische Universität Wien, 1040, Vienna, Austria

Drishti Yadav


Corresponding author

Correspondence to Drishti Yadav .

Ethics declarations

Conflict of interest.

The author declares no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Yadav, D. Criteria for Good Qualitative Research: A Comprehensive Review. Asia-Pacific Edu Res 31 , 679–689 (2022). https://doi.org/10.1007/s40299-021-00619-0


Accepted : 28 August 2021

Published : 18 September 2021

Issue Date : December 2022



Keywords
  • Qualitative research
  • Evaluative criteria


What Is Qualitative Research? | Methods & Examples

Published on June 19, 2020 by Pritha Bhandari . Revised on June 22, 2023.

Qualitative research involves collecting and analyzing non-numerical data (e.g., text, video, or audio) to understand concepts, opinions, or experiences. It can be used to gather in-depth insights into a problem or generate new ideas for research.

Qualitative research is the opposite of quantitative research , which involves collecting and analyzing numerical data for statistical analysis.

Qualitative research is commonly used in the humanities and social sciences, in subjects such as anthropology, sociology, education, health sciences, history, etc.

Some example qualitative research questions:

  • How does social media shape body image in teenagers?
  • How do children and adults interpret healthy eating in the UK?
  • What factors influence employee retention in a large organization?
  • How is anxiety experienced around the world?
  • How can teachers integrate social issues into science curriculums?

Table of contents

  • Approaches to qualitative research
  • Qualitative research methods
  • Qualitative data analysis
  • Advantages of qualitative research
  • Disadvantages of qualitative research
  • Other interesting articles
  • Frequently asked questions about qualitative research

Qualitative research is used to understand how people experience the world. While there are many approaches to qualitative research, they tend to be flexible and focus on retaining rich meaning when interpreting data.

Common approaches include grounded theory, ethnography , action research , phenomenological research, and narrative research. They share some similarities, but emphasize different aims and perspectives.

Note that qualitative research is at risk for certain research biases including the Hawthorne effect , observer bias , recall bias , and social desirability bias . While not always totally avoidable, awareness of potential biases as you collect and analyze your data can prevent them from impacting your work too much.


Each of the research approaches involve using one or more data collection methods . These are some of the most common qualitative methods:

  • Observations: recording what you have seen, heard, or encountered in detailed field notes.
  • Interviews:  personally asking people questions in one-on-one conversations.
  • Focus groups: asking questions and generating discussion among a group of people.
  • Surveys : distributing questionnaires with open-ended questions.
  • Secondary research: collecting existing data in the form of texts, images, audio or video recordings, etc.
For example, in a study of a company's culture, you might combine several of these methods:

  • You take field notes with observations and reflect on your own experiences of the company culture.
  • You distribute open-ended surveys to employees across all the company’s offices by email to find out if the culture varies across locations.
  • You conduct in-depth interviews with employees in your office to learn about their experiences and perspectives in greater detail.

Qualitative researchers often consider themselves “instruments” in research because all observations, interpretations and analyses are filtered through their own personal lens.

For this reason, when writing up your methodology for qualitative research, it’s important to reflect on your approach and to thoroughly explain the choices you made in collecting and analyzing the data.

Qualitative data can take the form of texts, photos, videos and audio. For example, you might be working with interview transcripts, survey responses, fieldnotes, or recordings from natural settings.

Most types of qualitative data analysis share the same five steps:

  • Prepare and organize your data. This may mean transcribing interviews or typing up fieldnotes.
  • Review and explore your data. Examine the data for patterns or repeated ideas that emerge.
  • Develop a data coding system. Based on your initial ideas, establish a set of codes that you can apply to categorize your data.
  • Assign codes to the data. For example, in qualitative survey analysis, this may mean going through each participant’s responses and tagging them with codes in a spreadsheet. As you go through your data, you can create new codes to add to your system if necessary.
  • Identify recurring themes. Link codes together into cohesive, overarching themes.
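As a toy illustration of steps 3 to 5, the sketch below tags interview excerpts with codes and links the codes into themes. The coding system, excerpts, codes, and themes are entirely hypothetical, and in practice researchers typically use qualitative data analysis software rather than ad-hoc scripts:

```python
from collections import defaultdict

# Step 3: a small, hypothetical coding system mapping keywords to codes
coding_system = {
    "overtime": "workload",
    "deadline": "workload",
    "manager": "supervision",
    "praise": "recognition",
}

# Step 4: assign codes to (made-up) interview excerpts
excerpts = [
    "I often work overtime to hit every deadline.",
    "My manager rarely gives praise for good work.",
]
coded = defaultdict(list)
for excerpt in excerpts:
    # collect the distinct codes triggered by keywords in this excerpt
    codes_found = {code for kw, code in coding_system.items() if kw in excerpt.lower()}
    for code in codes_found:
        coded[code].append(excerpt)

# Step 5: link codes together into broader (hypothetical) themes
themes = {
    "job demands": ["workload"],
    "management style": ["supervision", "recognition"],
}
theme_excerpts = {
    theme: sorted({e for code in codes for e in coded.get(code, [])})
    for theme, codes in themes.items()
}
```

A real project would refine the codes iteratively as new ideas emerge from the data, rather than fixing a keyword list up front.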

There are several specific approaches to analyzing qualitative data. Although these methods share similar processes, they emphasize different concepts.

Qualitative research often tries to preserve the voice and perspective of participants and can be adjusted as new research questions arise. Qualitative research is good for:

  • Flexibility

The data collection and analysis process can be adapted as new ideas or patterns emerge. They are not rigidly decided beforehand.

  • Natural settings

Data collection occurs in real-world contexts or in naturalistic ways.

  • Meaningful insights

Detailed descriptions of people’s experiences, feelings and perceptions can be used in designing, testing or improving systems or products.

  • Generation of new ideas

Open-ended responses mean that researchers can uncover novel problems or opportunities that they wouldn’t have thought of otherwise.


Researchers must consider practical and theoretical limitations in analyzing and interpreting their data. Qualitative research suffers from:

  • Unreliability

The real-world setting often makes qualitative research unreliable because of uncontrolled factors that affect the data.

  • Subjectivity

Due to the researcher’s primary role in analyzing and interpreting data, qualitative research cannot be replicated . The researcher decides what is important and what is irrelevant in data analysis, so interpretations of the same data can vary greatly.

  • Limited generalizability

Small samples are often used to gather detailed data about specific contexts. Despite rigorous analysis procedures, it is difficult to draw generalizable conclusions because the data may be biased and unrepresentative of the wider population .

  • Labor-intensive

Although software can be used to manage and record large amounts of text, data analysis often has to be checked or performed manually.

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

Statistics

  • Chi square goodness of fit test
  • Degrees of freedom
  • Null hypothesis

Methodology

  • Discourse analysis
  • Control groups
  • Mixed methods research
  • Non-probability sampling
  • Quantitative research
  • Inclusion and exclusion criteria

Research bias

  • Rosenthal effect
  • Implicit bias
  • Cognitive bias
  • Selection bias
  • Negativity bias
  • Status quo bias

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses . Qualitative methods allow you to explore concepts and experiences in more detail.

There are five common approaches to qualitative research :

  • Grounded theory involves collecting data in order to develop new theories.
  • Ethnography involves immersing yourself in a group or organization to understand its culture.
  • Narrative research involves interpreting stories to understand how people make sense of their experiences and perceptions.
  • Phenomenological research involves investigating phenomena through people’s lived experiences.
  • Action research links theory and practice in several cycles to drive innovative changes.

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organizations.

There are various approaches to qualitative data analysis , but they all share five steps in common:

  • Prepare and organize your data.
  • Review and explore your data.
  • Develop a data coding system.
  • Assign codes to the data.
  • Identify recurring themes.

The specifics of each step depend on the focus of the analysis. Some common approaches include textual analysis , thematic analysis , and discourse analysis .

Cite this Scribbr article


Bhandari, P. (2023, June 22). What Is Qualitative Research? | Methods & Examples. Scribbr. Retrieved March 19, 2024, from https://www.scribbr.com/methodology/qualitative-research/


Qualitative Study


  • 1 University of Nebraska Medical Center
  • 2 GDB Research and Statistical Consulting
  • 3 GDB Research and Statistical Consulting/McLaren Macomb Hospital
  • PMID: 29262162
  • Bookshelf ID: NBK470395

Qualitative research is a type of research that explores and provides deeper insights into real-world problems. Instead of collecting numerical data points or intervening or introducing treatments as in quantitative research, qualitative research helps generate hypotheses and further investigate and understand quantitative data. Qualitative research gathers participants' experiences, perceptions, and behavior. It answers the hows and whys instead of how many or how much. It can be structured as a stand-alone study relying purely on qualitative data, or as part of mixed-methods research that combines qualitative and quantitative data. This review introduces the reader to some basic concepts, definitions, terminology, and applications of qualitative research.

Qualitative research, at its core, asks open-ended questions whose answers are not easily put into numbers, such as ‘how’ and ‘why’. Because of the open-ended nature of the research questions at hand, qualitative research design is often not linear in the way quantitative design is. One of the strengths of qualitative research is its ability to explain processes and patterns of human behavior that can be difficult to quantify. Phenomena such as experiences, attitudes, and behaviors can be difficult to accurately capture quantitatively, whereas a qualitative approach allows participants themselves to explain how, why, or what they were thinking, feeling, and experiencing at a certain time or during an event of interest. Quantifying qualitative data is certainly possible, but at its core qualitative analysis looks for themes and patterns that can be difficult to quantify, and it is important to ensure that the context and narrative of qualitative work are not lost by trying to quantify something that is not meant to be quantified.

However, although qualitative research is sometimes placed in opposition to quantitative research, as if the two approaches and their associated philosophical paradigms necessarily ‘compete’ against each other, qualitative and quantitative work are neither opposites nor mutually exclusive. For instance, qualitative research can help expand and deepen understanding of data or results obtained from quantitative analysis. Say a quantitative analysis has determined that there is a correlation between length of stay and level of patient satisfaction; why does this correlation exist? This dual-focus scenario shows one way in which qualitative and quantitative research can be integrated.

Examples of Qualitative Research Approaches


Ethnography

Ethnography as a research design has its origins in social and cultural anthropology, and involves the researcher being directly immersed in the participant’s environment. Through this immersion, the ethnographer can use a variety of data collection techniques with the aim of producing a comprehensive account of the social phenomena that occurred during the research period. That is to say, the researcher’s aim with ethnography is to immerse themselves in the research population and come out of it with accounts of actions, behaviors, events, etc. through the eyes of someone involved in the population. Direct involvement of the researcher with the target population is one benefit of ethnographic research because it can then be possible to find data that is otherwise very difficult to extract and record.

Grounded Theory

Grounded Theory is the “generation of a theoretical model through the experience of observing a study population and developing a comparative analysis of their speech and behavior.” As opposed to quantitative research which is deductive and tests or verifies an existing theory, grounded theory research is inductive and therefore lends itself to research that is aiming to study social interactions or experiences. In essence, Grounded Theory’s goal is to explain for example how and why an event occurs or how and why people might behave a certain way. Through observing the population, a researcher using the Grounded Theory approach can then develop a theory to explain the phenomena of interest.


Phenomenology

Phenomenology is defined as the “study of the meaning of phenomena or the study of the particular”. At first glance, it might seem that Grounded Theory and Phenomenology are quite similar, but upon careful examination the differences can be seen. At its core, phenomenology looks to investigate experiences from the perspective of the individual. Phenomenology is essentially looking into the ‘lived experiences’ of the participants and aims to examine how and why participants behaved a certain way, from their perspective. Herein lies one of the main differences between Grounded Theory and Phenomenology: Grounded Theory aims to develop a theory for social phenomena through an examination of various data sources, whereas Phenomenology focuses on describing and explaining an event or phenomenon from the perspective of those who have experienced it.

Narrative Research

One of qualitative research’s strengths lies in its ability to tell a story, often from the perspective of those directly involved in it. Reporting on qualitative research involves including details and descriptions of the setting involved and quotes from participants. This detail is called ‘thick’ or ‘rich’ description and is a strength of qualitative research. Narrative research is rife with the possibilities of ‘thick’ description as this approach weaves together a sequence of events, usually from just one or two individuals, in the hopes of creating a cohesive story, or narrative. While it might seem like a waste of time to focus on such a specific, individual level, understanding one or two people’s narratives for an event or phenomenon can help to inform researchers about the influences that helped shape that narrative. The tension or conflict of differing narratives can be “opportunities for innovation”.

Research Paradigm

Research paradigms are the assumptions, norms, and standards that underpin different approaches to research. Essentially, research paradigms are the ‘worldview’ that informs research. It is valuable for researchers, both qualitative and quantitative, to understand what paradigm they are working within, because understanding the theoretical basis of research paradigms allows researchers to understand the strengths and weaknesses of the approach being used and adjust accordingly. Different paradigms have different ontologies and epistemologies. Ontology is defined as the “assumptions about the nature of reality”, whereas epistemology is defined as the “assumptions about the nature of knowledge” that inform the work researchers do. It is important to understand the ontological and epistemological foundations of the research paradigm researchers are working within to allow for a full understanding of the approach being used and the assumptions that underpin the approach as a whole. Further, it is crucial that researchers understand their own ontological and epistemological assumptions about the world in general, because these assumptions will necessarily impact how they interact with research. A discussion of the research paradigm is not complete without describing positivist, postpositivist, and constructivist philosophies.

Positivist vs Postpositivist

To further understand qualitative research, we need to discuss positivist and postpositivist frameworks. Positivism is the philosophy that the scientific method can and should be applied to the social as well as the natural sciences. Essentially, positivist thinking insists that the social sciences should use natural science methods in their research, which stems from the positivist ontology that an objective reality exists fully independent of our individual perceptions of the world. Quantitative research is rooted in positivist philosophy, which can be seen in the value it places on concepts such as causality, generalizability, and replicability.

Conversely, postpositivists argue that social reality can never be one hundred percent explained, only approximated. Indeed, qualitative researchers have insisted that there are “fundamental limits to the extent to which the methods and procedures of the natural sciences could be applied to the social world”, and postpositivist philosophy is therefore often associated with qualitative research. An example of positivist versus postpositivist values in research might be that positivist philosophies value hypothesis-testing, whereas postpositivist philosophies value the ability to formulate a substantive theory.


Constructivism is a subcategory of postpositivism. Most researchers invested in postpositivist research are constructivists as well, meaning they hold that there is no objective external reality; rather, reality is constructed. Constructivism is a theoretical lens that emphasizes the dynamic nature of our world: “Constructivism contends that individuals’ views are directly influenced by their experiences, and it is these individual experiences and views that shape their perspective of reality”. Constructivist thought focuses on how ‘reality’ is not a fixed certainty; experiences, interactions, and backgrounds give people a unique view of the world. Unlike positivist views, constructivism contends that there is not necessarily an ‘objective’ reality we all experience. This is the ‘relativist’ ontological view that reality and the world we live in are dynamic and socially constructed. Therefore, qualitative scientific knowledge can be inductive as well as deductive.

So why is it important to understand the differences in assumptions among these philosophies and approaches to research? Fundamentally, the assumptions underpinning the research tools a researcher selects set the assumptions for the rest of the research and can even change the role of the researcher. For example, is the researcher an ‘objective’ observer, as in positivist quantitative work? Or is the researcher an active participant in the research itself, as in postpositivist qualitative work? Understanding the philosophical basis of the research undertaken allows researchers to fully understand the implications of their work and their role within it, and to reflect on their own positionality and bias as they pertain to the research they are conducting.

Data Sampling

The better the sample represents the intended study population, the more likely the researcher is to encompass the varying factors at play. The following are examples of participant sampling and selection:

Purposive sampling – selection based on the researcher’s judgement of which participants will be most informative.

Criterion sampling – selection based on pre-identified factors.

Convenience sampling – selection based on availability.

Snowball sampling – selection by referral from other participants or from people who know potential participants.

Extreme case sampling – targeted selection of rare cases.

Typical case sampling – selection of regular or average participants.
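To make the distinctions concrete, some of the mechanical selection rules above can be sketched in code. This is an illustrative sketch only: the participant pool, attributes, and referral links are invented, and purposive sampling is omitted because it rests on researcher judgement rather than a fixed rule.

```python
# Hypothetical participant pool; all names and attributes are invented.
pool = [
    {"name": "A", "age": 16, "smoker": True,  "referred_by": None},
    {"name": "B", "age": 17, "smoker": False, "referred_by": "A"},
    {"name": "C", "age": 15, "smoker": True,  "referred_by": "A"},
    {"name": "D", "age": 18, "smoker": True,  "referred_by": "C"},
]

# Criterion sampling: select on a pre-identified factor (here: current smokers).
criterion_sample = [p for p in pool if p["smoker"]]

# Convenience sampling: take whoever is most readily available (here: the first two).
convenience_sample = pool[:2]

# Snowball sampling: start from one seed participant and follow referrals outward.
def snowball(pool, seed_name):
    selected, frontier = [], [seed_name]
    while frontier:
        current = frontier.pop()
        selected.extend(p for p in pool if p["name"] == current)
        frontier.extend(p["name"] for p in pool if p["referred_by"] == current)
    return selected

snowball_sample = snowball(pool, "A")  # the seed plus everyone reachable via referrals
```

In practice, of course, these decisions are made by the researcher during recruitment, not by software; the sketch only shows how the selection criteria differ.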

Data Collection and Analysis

Qualitative research uses several techniques, including interviews, focus groups, and observation. [1] [2] [3] Interviews may be unstructured, with open-ended questions on a topic to which the interviewer adapts in response, or structured, with a predetermined set of questions that every participant is asked. Interviews are usually conducted one on one and are appropriate for sensitive topics or topics needing in-depth exploration. Focus groups are often held with 8-12 target participants and are used when group dynamics and collective views on a topic are desired. Researchers can be participant-observers, sharing the experiences of the subjects, or non-participant (detached) observers.

While quantitative research design prescribes a controlled environment for data collection, qualitative data collection may take place in a central location or in the participants’ environment, depending on the study goals and design. Qualitative research can generate a large amount of data. Data are transcribed and may then be coded manually or with Computer Assisted Qualitative Data Analysis Software (CAQDAS) such as ATLAS.ti or NVivo.
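What “coding” amounts to can be illustrated with a minimal sketch. The participant IDs, excerpts, and code labels below are all invented; CAQDAS packages such as ATLAS.ti and NVivo provide this retrieve-by-code operation (plus much more) on real transcripts.

```python
from collections import defaultdict

# Hypothetical coded transcript: each excerpt carries one or more
# researcher-assigned code labels. All content here is invented.
coded_segments = [
    ("P01", "My friends all smoked, so I felt I had to as well.", ["peer pressure"]),
    ("P02", "Cigarettes were cheap at the shop near the park.", ["cost", "access"]),
    ("P03", "I worried about what smoking would do to my lungs.", ["health concerns"]),
    ("P04", "Everyone at the park offered me one eventually.", ["peer pressure", "access"]),
]

# Retrieval by code -- the core operation CAQDAS tools automate: collect every
# excerpt tagged with a given code so it can be compared across participants.
excerpts_by_code = defaultdict(list)
for participant, excerpt, codes in coded_segments:
    for code in codes:
        excerpts_by_code[code].append((participant, excerpt))

# Code frequencies give a first overview before thematic interpretation begins.
code_counts = {code: len(items) for code, items in excerpts_by_code.items()}
```

The interpretive work of deciding which codes apply, and how codes cluster into themes, remains with the researcher; the software only manages and retrieves the tagged data.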

After the coding process, qualitative research results can take various forms. They may be a synthesis and interpretation presented with excerpts from the data, or they may take the form of themes and theory or model development.


To standardize and facilitate the dissemination of qualitative research outcomes, the healthcare team can use two reporting standards. The Consolidated Criteria for Reporting Qualitative Research or COREQ is a 32-item checklist for interviews and focus groups. The Standards for Reporting Qualitative Research (SRQR) is a checklist covering a wider range of qualitative research.

Examples of Application

Many times a research question will start with qualitative research, which helps generate the research hypothesis that can then be tested with quantitative methods. After the data are collected and analyzed with quantitative methods, qualitative methods can be used to dive deeper into the data for a better understanding of what the numbers truly mean and their implications. Qualitative methods can then help clarify the quantitative data and refine the hypothesis for future research. Furthermore, with qualitative research, researchers can explore subjects that are poorly studied with quantitative methods, including opinions, individuals’ actions, and social science research.

A good qualitative study design starts with a clearly defined and stated goal or objective. The target population needs to be specified, and the method for obtaining information from the study population must be carefully detailed to ensure that no part of the target population is omitted. A collection method should be selected that obtains the desired information without overly limiting the collected data, because the information sought is often not well compartmentalized. Finally, the design should ensure adequate methods for analyzing the data. An example may help clarify some of these aspects of qualitative research.

A researcher wants to decrease the number of teenagers who smoke in their community. The researcher could begin by asking current teen smokers why they started smoking through structured or unstructured interviews (qualitative research). The researcher can also get together a group of current teenage smokers and conduct a focus group to help brainstorm factors that may have prevented them from starting to smoke (qualitative research).

In this example, the researcher has used qualitative research methods (interviews and focus groups) to generate a list of ideas about both why teens start to smoke and what factors may have prevented them from starting. Next, the researcher compiles these data. The researcher found that, hypothetically, peer pressure, health issues, cost, being considered “cool,” and rebellious behavior all might increase or decrease the likelihood of teens starting to smoke.

The researcher creates a survey asking teen participants to rank how important each of the above factors is in either starting smoking (for current smokers) or not smoking (for current non-smokers). This survey provides specific numbers (ranked importance of each factor) and is thus a quantitative research tool.
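As a sketch of how such ranked survey data might be aggregated (all factor names and numbers below are invented for illustration), each participant ranks the factors from 1 (most important) upward, and the mean rank per factor shows where to focus:

```python
# Hypothetical ranked-importance responses; 1 = most important, 5 = least.
factors = ["peer pressure", "health", "cost", "being cool", "rebellion"]
responses = [
    {"peer pressure": 1, "health": 2, "cost": 4, "being cool": 3, "rebellion": 5},
    {"peer pressure": 1, "health": 3, "cost": 5, "being cool": 2, "rebellion": 4},
    {"peer pressure": 2, "health": 1, "cost": 4, "being cool": 3, "rebellion": 5},
]

# Mean rank per factor; a lower mean rank indicates higher importance.
mean_rank = {f: sum(r[f] for r in responses) / len(responses) for f in factors}

# Focus follow-up qualitative work on the highest-ranked factor.
top_factor = min(mean_rank, key=mean_rank.get)
```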

The researcher can use the results of the survey to focus efforts on the one or two highest-ranked factors. Let us say the researcher found that health was the major factor that keeps teens from starting to smoke, and peer pressure was the major factor that contributes to teens starting to smoke. The researcher can return to qualitative research methods to dive deeper into each of these for more information. Because the researcher wants to focus on how to keep teens from starting to smoke, they concentrate on the peer pressure aspect.

The researcher can conduct interviews and/or focus groups (qualitative research) about what types and forms of peer pressure are commonly encountered, where the peer pressure comes from, and where smoking first starts. The researcher hypothetically finds that peer pressure often occurs after school at the local teen hangouts, mostly the local park. The researcher also hypothetically finds that peer pressure comes from older, current smokers who provide the cigarettes.

The researcher could also make observations at the local teen hangouts (qualitative research), taking notes on who is smoking, who is not, and what observable factors are at play in the peer pressure to smoke. The researcher finds a local park where many local teenagers hang out and sees that a shady, overgrown area of the park is where the smokers tend to gather. The researcher notes that the smoking teenagers buy their cigarettes from a convenience store adjacent to the park, where the clerk does not check identification before selling cigarettes. These observations fall under qualitative research.

If the researcher returns to the park and counts how many individuals smoke in each region of the park, this numerical data would be quantitative research. Based on the researcher's efforts thus far, they conclude that local teen smoking and teenagers who start to smoke may decrease if there are fewer overgrown areas of the park and the local convenience store does not sell cigarettes to underage individuals.

The researcher could try to have the parks department reassess the shady areas to make them less conducive to smoking, or work to limit the convenience store’s sales of cigarettes to underage individuals. The researcher would then cycle back to qualitative methods, asking the at-risk population about their perceptions of the changes and what factors are still at play, as well as quantitative research on teen smoking rates in the community, the incidence of new teen smokers, and other measures.

Copyright © 2024, StatPearls Publishing LLC.

Qualitative research methods: when to use them and how to judge them


K. Hammarberg, M. Kirkman, S. de Lacey, Qualitative research methods: when to use them and how to judge them, Human Reproduction, Volume 31, Issue 3, March 2016, Pages 498–501, https://doi.org/10.1093/humrep/dev334


In March 2015, an impressive set of guidelines for best practice on how to incorporate psychosocial care in routine infertility care was published by the ESHRE Psychology and Counselling Guideline Development Group (ESHRE Psychology and Counselling Guideline Development Group, 2015). The authors report that the guidelines are based on a comprehensive review of the literature and we congratulate them on their meticulous compilation of evidence into a clinically useful document. However, when we read the methodology section, we were baffled and disappointed to find that evidence from research using qualitative methods was not included in the formulation of the guidelines. Despite stating that ‘qualitative research has significant value to assess the lived experience of infertility and fertility treatment’, the group excluded this body of evidence because qualitative research is ‘not generally hypothesis-driven and not objective/neutral, as the researcher puts him/herself in the position of the participant to understand how the world is from the person's perspective’.

Qualitative and quantitative research methods are often juxtaposed as representing two different world views. In quantitative circles, qualitative research is commonly viewed with suspicion and considered lightweight because it involves small samples which may not be representative of the broader population, it is seen as not objective, and the results are assessed as biased by the researchers' own experiences or opinions. In qualitative circles, quantitative research can be dismissed as over-simplifying individual experience in the cause of generalisation, failing to acknowledge researcher biases and expectations in research design, and requiring guesswork to understand the human meaning of aggregate data.

As social scientists who investigate psychosocial aspects of human reproduction, we use qualitative and quantitative methods, separately or together, depending on the research question. The crucial part is to know when to use what method.

The peer-review process is a pillar of scientific publishing. One of the important roles of reviewers is to assess the scientific rigour of the studies from which authors draw their conclusions. If rigour is lacking, the paper should not be published. As with research using quantitative methods, research using qualitative methods is home to the good, the bad and the ugly. It is essential that reviewers know the difference. Rejection letters are hard to take but more often than not they are based on legitimate critique. However, from time to time it is obvious that the reviewer has little grasp of what constitutes rigour or quality in qualitative research. The first author (K.H.) recently submitted a paper that reported findings from a qualitative study about fertility-related knowledge and information-seeking behaviour among people of reproductive age. In the rejection letter one of the reviewers (not from Human Reproduction) lamented, ‘Even for a qualitative study, I would expect that some form of confidence interval and paired t-tables analysis, etc. be used to analyse the significance of results'. This comment reveals the reviewer's inappropriate application to qualitative research of criteria relevant only to quantitative research.

In this commentary, we give illustrative examples of questions most appropriately answered using qualitative methods and provide general advice about how to appraise the scientific rigour of qualitative studies. We hope this will help the journal's reviewers and readers appreciate the legitimate place of qualitative research and ensure we do not throw the baby out with the bath water by excluding or rejecting papers simply because they report the results of qualitative studies.

In psychosocial research, ‘quantitative’ research methods are appropriate when ‘factual’ data are required to answer the research question; when general or probability information is sought on opinions, attitudes, views, beliefs or preferences; when variables can be isolated and defined; when variables can be linked to form hypotheses before data collection; and when the question or problem is known, clear and unambiguous. Quantitative methods can reveal, for example, what percentage of the population supports assisted conception, their distribution by age, marital status, residential area and so on, as well as changes from one survey to the next (Kovacs et al., 2012); the number of donors and donor siblings located by parents of donor-conceived children (Freeman et al., 2009); and the relationship between the attitude of donor-conceived people to learning of their donor insemination conception and their family ‘type’ (one or two parents, lesbian or heterosexual parents; Beeson et al., 2011).

In contrast, ‘qualitative’ methods are used to answer questions about experience, meaning and perspective, most often from the standpoint of the participant. These data are usually not amenable to counting or measuring. Qualitative research techniques include ‘small-group discussions’ for investigating beliefs, attitudes and concepts of normative behaviour; ‘semi-structured interviews’, to seek views on a focused topic or, with key informants, for background information or an institutional perspective; ‘in-depth interviews’ to understand a condition, experience, or event from a personal perspective; and ‘analysis of texts and documents’, such as government reports, media articles, websites or diaries, to learn about distributed or private knowledge.

Qualitative methods have been used to reveal, for example, potential problems in implementing a proposed trial of elective single embryo transfer, where small-group discussions enabled staff to explain their own resistance, leading to an amended approach (Porter and Bhattacharya, 2005). Small-group discussions among assisted reproductive technology (ART) counsellors were used to investigate how the welfare principle is interpreted and practised by health professionals who must apply it in ART (de Lacey et al., 2015). When legislative change meant that gamete donors could seek identifying details of people conceived from their gametes, parents needed advice on how best to tell their children. Small-group discussions were convened to ask adolescents (not known to be donor-conceived) to reflect on how they would prefer to be told (Kirkman et al., 2007).

When a population cannot be identified, such as anonymous sperm donors from the 1980s, a qualitative approach with wide publicity can reach people who do not usually volunteer for research and reveal (for example) their attitudes to proposed legislation to remove anonymity with retrospective effect (Hammarberg et al., 2014). When researchers invite people to talk about their reflections on experience, they can sometimes learn more than they set out to discover. In describing their responses to proposed legislative change, participants also talked about people conceived as a result of their donations, demonstrating various constructions and expectations of relationships (Kirkman et al., 2014).

Interviews with parents in lesbian-parented families generated insight into the diverse meanings of the sperm donor in the creation and life of the family (Wyverkens et al., 2014). Oral and written interviews also revealed the embarrassment and ambivalence surrounding sperm donors evident in participants in donor-assisted conception (Kirkman, 2004). The way in which parents conceptualise unused embryos and why they discard rather than donate was explored and understood via in-depth interviews, showing how and why the meaning of those embryos changed with parenthood (de Lacey, 2005). In-depth interviews were also used to establish the intricate understanding by embryo donors and recipients of the meaning of embryo donation and the families built as a result (Goedeke et al., 2015).

It is possible to combine quantitative and qualitative methods, although great care should be taken to ensure that the theory behind each method is compatible and that the methods are being used for appropriate reasons. The two methods can be used sequentially (first a quantitative then a qualitative study or vice versa), where the first approach is used to facilitate the design of the second; they can be used in parallel as different approaches to the same question; or a dominant method may be enriched with a small component of an alternative method (such as qualitative interviews ‘nested’ in a large survey). It is important to note that free text in surveys represents qualitative data but does not constitute qualitative research. Qualitative and quantitative methods may be used together for corroboration (hoping for similar outcomes from both methods), elaboration (using qualitative data to explain or interpret quantitative data, or to demonstrate how the quantitative findings apply in particular cases), complementarity (where the qualitative and quantitative results differ but generate complementary insights) or contradiction (where qualitative and quantitative data lead to different conclusions). Each has its advantages and challenges (Brannen, 2005).

Qualitative research is gaining increased momentum in the clinical setting and carries different criteria for evaluating its rigour or quality. Quantitative studies generally involve the systematic collection of data about a phenomenon, using standardized measures and statistical analysis. In contrast, qualitative studies involve the systematic collection, organization, description and interpretation of textual, verbal or visual data. The particular approach taken determines to a certain extent the criteria used for judging the quality of the report. However, research using qualitative methods can be evaluated (Dixon-Woods et al., 2006; Young et al., 2014) and there are some generic guidelines for assessing qualitative research (Kitto et al., 2008).

Although the terms ‘reliability’ and ‘validity’ are contentious among qualitative researchers (Lincoln and Guba, 1985) with some preferring ‘verification’, research integrity and robustness are as important in qualitative studies as they are in other forms of research. It is widely accepted that qualitative research should be ethical, important, intelligibly described, and use appropriate and rigorous methods (Cohen and Crabtree, 2008). In research investigating data that can be counted or measured, replicability is essential. When other kinds of data are gathered in order to answer questions of personal or social meaning, we need to be able to capture real-life experiences, which cannot be identical from one person to the next. Furthermore, meaning is culturally determined and subject to evolutionary change. The way of explaining a phenomenon—such as what it means to use donated gametes—will vary, for example, according to the cultural significance of ‘blood’ or genes, interpretations of marital infidelity and religious constructs of sexual relationships and families. Culture may apply to a country, a community, or other actual or virtual group, and a person may be engaged at various levels of culture. In identifying meaning for members of a particular group, consistency may indeed be found from one research project to another. However, individuals within a cultural group may present different experiences and perceptions or transgress cultural expectations. That does not make them ‘wrong’ or invalidate the research. Rather, it offers insight into diversity and adds a piece to the puzzle to which other researchers also contribute.

In qualitative research the objective stance is obsolete, the researcher is the instrument, and ‘subjects’ become ‘participants’ who may contribute to data interpretation and analysis (Denzin and Lincoln, 1998). Qualitative researchers defend the integrity of their work by different means: trustworthiness, credibility, applicability and consistency are the evaluative criteria (Leininger, 1994).


A report of a qualitative study should contain the same robust procedural description as any other study. The purpose of the research, how it was conducted, procedural decisions, and details of data generation and management should be transparent and explicit. A reviewer should be able to follow the progression of events and decisions and understand their logic because there is adequate description, explanation and justification of the methodology and methods (Kitto et al., 2008).


Credibility is the criterion for evaluating the truth value or internal validity of qualitative research. A qualitative study is credible when its results, presented with adequate descriptions of context, are recognizable to people who share the experience and those who care for or treat them. As the instrument in qualitative research, the researcher defends its credibility through practices such as reflexivity (reflection on the influence of the researcher on the research), triangulation (where appropriate, answering the research question in several ways, such as through interviews, observation and documentary analysis) and substantial description of the interpretation process; verbatim quotations from the data are supplied to illustrate and support their interpretations (Sandelowski, 1986). Where excerpts of data and interpretations are incongruent, the credibility of the study is in doubt.


Applicability, or transferability of the research findings, is the criterion for evaluating external validity. A study is considered to meet the criterion of applicability when its findings can fit into contexts outside the study situation and when clinicians and researchers view the findings as meaningful and applicable in their own experiences.

Larger sample sizes do not produce greater applicability. Depth may be sacrificed to breadth or there may be too much data for adequate analysis. Sample sizes in qualitative research are typically small. The term ‘saturation’ is often used in reference to decisions about sample size in research using qualitative methods. Emerging from grounded theory, where filling theoretical categories is considered essential to the robustness of the developing theory, data saturation has been expanded to describe a situation where data tend towards repetition or where data cease to offer new directions and raise new questions (Charmaz, 2005). However, the legitimacy of saturation as a generic marker of sampling adequacy has been questioned (O'Reilly and Parker, 2013). Caution must be exercised to ensure that a commitment to saturation does not assume an ‘essence’ of an experience in which limited diversity is anticipated; each account is likely to be subtly different and each ‘sample’ will contribute to knowledge without telling the whole story. Increasingly, it is expected that researchers will report the kind of saturation they have applied and their criteria for recognising its achievement; an assessor will need to judge whether the choice is appropriate and consistent with the theoretical context within which the research has been conducted.
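One common, if contested, operationalization of such a saturation criterion can be sketched in code. This is purely an illustration under assumed criteria, not a procedure endorsed by the authors: stop sampling once a fixed number of consecutive interviews contributes no code that has not already been seen.

```python
# Sketch of a "no new codes in `window` consecutive interviews" stopping rule.
# `coded_interviews` is a sequence of code sets, one per interview in order.
def reached_saturation(coded_interviews, window=3):
    seen, run = set(), 0
    for codes in coded_interviews:
        new = set(codes) - seen        # codes not encountered in earlier interviews
        seen |= set(codes)
        run = 0 if new else run + 1    # count consecutive interviews with nothing new
        if run >= window:
            return True
    return False

# Hypothetical code sets from five successive interviews (labels invented).
interviews = [
    {"stigma", "cost"}, {"cost", "family"}, {"family"},
    {"stigma"}, {"cost"},
]
```

As the text stresses, an assessor would still need to judge whether any such rule is appropriate to the theoretical context; a mechanical criterion cannot replace that judgement.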

Sampling strategies are usually purposive, convenience, theoretical or snowball. Maximum variation sampling may be used to seek representation of diverse perspectives on the topic. Homogeneous sampling may be used to recruit a group of participants with specified criteria. The threat of bias is irrelevant; participants are recruited and selected specifically because they can illuminate the phenomenon being studied. Rather than being predetermined by statistical power analysis, qualitative study samples are dependent on the nature of the data, the availability of participants and where those data take the investigator. Multiple data collections may also take place to obtain maximum insight into sensitive topics. For instance, the question of how decisions are made for embryo disposition may involve sampling within the patient group as well as from scientists, clinicians, counsellors and clinic administrators.


Consistency, or dependability of the results, is the criterion for assessing reliability. This does not mean that the same result would necessarily be found in other contexts but that, given the same data, other researchers would find similar patterns. Researchers often seek maximum variation in the experience of a phenomenon, not only to illuminate it but also to discourage fulfilment of limited researcher expectations (for example, negative cases or instances that do not fit the emerging interpretation or theory should be actively sought and explored). Qualitative researchers sometimes describe the processes by which verification of the theoretical findings by another team member takes place (Morse and Richards, 2002).

Research that uses qualitative methods is not, as it seems sometimes to be represented, the easy option, nor is it a collation of anecdotes. It usually involves a complex theoretical or philosophical framework. Rigorous analysis is conducted without the aid of straightforward mathematical rules. Researchers must demonstrate the validity of their analysis and conclusions, resulting in longer papers and occasional frustration with the word limits of appropriate journals. Nevertheless, we need the different kinds of evidence that are generated by qualitative methods. The experience of health, illness and medical intervention cannot always be counted and measured; researchers need to understand what they mean to individuals and groups. Knowledge gained from qualitative research methods can inform clinical practice, indicate how to support people living with chronic conditions and contribute to community education and awareness about people who are (for example) experiencing infertility or using assisted conception.

Each author drafted a section of the manuscript and the manuscript as a whole was reviewed and revised by all authors in consultation.

No external funding was either sought or obtained for this study.

The authors have no conflicts of interest to declare.

Beeson D, Jennings P, Kramer W. Offspring searching for their sperm donors: how family types shape the process. Hum Reprod 2011;26:2415–2424.

Brannen J. Mixing methods: the entry of qualitative and quantitative approaches into the research process. Int J Soc Res Methodol 2005;8:173–184.

Charmaz K. Grounded theory in the 21st century: applications for advancing social justice studies. In: Denzin NK, Lincoln YS (eds). The Sage Handbook of Qualitative Research. California: Sage Publications Inc., 2005.

Cohen D, Crabtree B. Evaluative criteria for qualitative research in health care: controversies and recommendations. Ann Fam Med 2008;6:331–339.

de Lacey S. Parent identity and ‘virtual’ children: why patients discard rather than donate unused embryos. Hum Reprod 2005;20:1661–1669.

de Lacey SL, Peterson K, McMillan J. Child interests in assisted reproductive technology: how is the welfare principle applied in practice? Hum Reprod 2015;30:616–624.

Denzin N, Lincoln Y. Entering the field of qualitative research. In: Denzin NK, Lincoln YS (eds). The Landscape of Qualitative Research: Theories and Issues. Thousand Oaks: Sage, 1998, 1–34.

Dixon-Woods M, Bonas S, Booth A, Jones DR, Miller T, Shaw RL, Smith JA, Young B. How can systematic reviews incorporate qualitative research? A critical perspective. Qual Res 2006;6:27–44.

ESHRE Psychology and Counselling Guideline Development Group. Routine Psychosocial Care in Infertility and Medically Assisted Reproduction: A Guide for Fertility Staff, 2015. http://www.eshre.eu/Guidelines-and-Legal/Guidelines/Psychosocial-care-guideline.aspx.

Freeman T, Jadva V, Kramer W, Golombok S. Gamete donation: parents' experiences of searching for their child's donor siblings or donor. Hum Reprod 2009;24:505–516.

Goedeke S, Daniels K, Thorpe M, Du Preez E. Building extended families through embryo donation: the experiences of donors and recipients. Hum Reprod 2015;30:2340–2350.

Hammarberg K, Johnson L, Bourne K, Fisher J, Kirkman M. Proposed legislative change mandating retrospective release of identifying information: consultation with donors and Government response. Hum Reprod 2014;29:286–292.

Kirkman M. Saviours and satyrs: ambivalence in narrative meanings of sperm provision. Cult Health Sex 2004;6:319–336.

Kirkman M, Rosenthal D, Johnson L. Families working it out: adolescents' views on communicating about donor-assisted conception. Hum Reprod 2007;22:2318–2324.

Kirkman M, Bourne K, Fisher J, Johnson L, Hammarberg K. Gamete donors' expectations and experiences of contact with their donor offspring. Hum Reprod 2014;29:731–738.

Kitto S, Chesters J, Grbich C. Quality in qualitative research. Med J Aust 2008;188:243–246.

Kovacs GT, Morgan G, Levine M, McCrann J. The Australian community overwhelmingly approves IVF to treat subfertility, with increasing support over three decades. Aust N Z J Obstetr Gynaecol 2012;52:302–304.

Leininger M. Evaluation criteria and critique of qualitative research studies. In: Morse J (ed). Critical Issues in Qualitative Research Methods. Thousand Oaks: Sage, 1994, 95–115.

Lincoln YS, Guba EG. Naturalistic Inquiry. Newbury Park, CA: Sage Publications, 1985.

Morse J, Richards L. Readme First for a Users Guide to Qualitative Methods. Thousand Oaks: Sage, 2002.

O'Reilly M, Parker N. ‘Unsatisfactory saturation’: a critical exploration of the notion of saturated sample sizes in qualitative research. Qual Res 2013;13:190–197.

Porter M, Bhattacharya S. Investigation of staff and patients' opinions of a proposed trial of elective single embryo transfer. Hum Reprod 2005;20:2523–2530.

Sandelowski M. The problem of rigor in qualitative research. Adv Nurs Sci 1986;8:27–37.

Wyverkens E, Provoost V, Ravelingien A, De Sutter P, Pennings G, Buysse A. Beyond sperm cells: a qualitative study on constructed meanings of the sperm donor in lesbian families. Hum Reprod 2014;29:1248–1254.

Young K, Fisher J, Kirkman M. Women's experiences of endometriosis: a systematic review of qualitative research. J Fam Plann Reprod Health Care 2014;41:225–234.

  • conflict of interest
  • credibility
  • qualitative research
  • quantitative methods


Qualitative Research – Methods, Analysis Types and Guide

Qualitative Research

Qualitative research is a type of research methodology that focuses on exploring and understanding people’s beliefs, attitudes, behaviors, and experiences through the collection and analysis of non-numerical data. It seeks to answer research questions through the examination of subjective data, such as interviews, focus groups, observations, and textual analysis.

Qualitative research aims to uncover the meaning and significance of social phenomena, and it typically involves a more flexible and iterative approach to data collection and analysis compared to quantitative research. Qualitative research is often used in fields such as sociology, anthropology, psychology, and education.

Qualitative Research Methods

Common qualitative research methods include the following:

One-to-One Interview

This method involves conducting an interview with a single participant to gain a detailed understanding of their experiences, attitudes, and beliefs. One-to-one interviews can be conducted in-person, over the phone, or through video conferencing. The interviewer typically uses open-ended questions to encourage the participant to share their thoughts and feelings. One-to-one interviews are useful for gaining detailed insights into individual experiences.

Focus Groups

This method involves bringing together a group of people to discuss a specific topic in a structured setting. The focus group is led by a moderator who guides the discussion and encourages participants to share their thoughts and opinions. Focus groups are useful for generating ideas and insights, exploring social norms and attitudes, and understanding group dynamics.

Ethnographic Studies

This method involves immersing oneself in a culture or community to gain a deep understanding of its norms, beliefs, and practices. Ethnographic studies typically involve long-term fieldwork and observation, as well as interviews and document analysis. Ethnographic studies are useful for understanding the cultural context of social phenomena and for gaining a holistic understanding of complex social processes.

Text Analysis

This method involves analyzing written or spoken language to identify patterns and themes. Text analysis can be quantitative or qualitative. Qualitative text analysis involves close reading and interpretation of texts to identify recurring themes, concepts, and patterns. Text analysis is useful for understanding media messages, public discourse, and cultural trends.

Case Study

This method involves an in-depth examination of a single person, group, or event to gain an understanding of complex phenomena. Case studies typically involve a combination of data collection methods, such as interviews, observations, and document analysis, to provide a comprehensive understanding of the case. Case studies are useful for exploring unique or rare cases, and for generating hypotheses for further research.

Process of Observation

This method involves systematically observing and recording behaviors and interactions in natural settings. The observer may take notes, use audio or video recordings, or use other methods to document what they see. Process of observation is useful for understanding social interactions, cultural practices, and the context in which behaviors occur.

Record Keeping

This method involves keeping detailed records of observations, interviews, and other data collected during the research process. Record keeping is essential for ensuring the accuracy and reliability of the data, and for providing a basis for analysis and interpretation.

Surveys

This method involves collecting data from a large sample of participants through a structured questionnaire. Surveys can be conducted in person, over the phone, through mail, or online. Surveys are useful for collecting data on attitudes, beliefs, and behaviors, and for identifying patterns and trends in a population.

Qualitative data analysis is the process of turning unstructured data into meaningful insights. It involves extracting and organizing information from sources such as interviews, focus groups, and surveys. The goal is to understand people’s attitudes, behaviors, and motivations.

Qualitative Research Analysis Methods

Qualitative Research analysis methods involve a systematic approach to interpreting and making sense of the data collected in qualitative research. Here are some common qualitative data analysis methods:

Thematic Analysis

This method involves identifying patterns or themes in the data that are relevant to the research question. The researcher reviews the data, identifies keywords or phrases, and groups them into categories or themes. Thematic analysis is useful for identifying patterns across multiple data sources and for generating new insights into the research topic.
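The grouping-and-tallying part of thematic analysis can be supported (though never replaced) by simple scripts. The following Python sketch is purely illustrative: the excerpts, codes, and theme groupings are all invented, and in practice dedicated qualitative data management software usually handles this step.

```python
# Illustrative only: a toy thematic-analysis pass in which short interview
# excerpts have already been assigned codes, and codes are grouped into
# broader themes defined by the researcher. All data here is invented.
from collections import Counter

# Hypothetical coded excerpts: (excerpt, code) pairs from an interview.
coded_excerpts = [
    ("I never know how long the wait will be", "uncertainty"),
    ("Nobody explained the results to me", "communication"),
    ("The nurses always took time to listen", "communication"),
    ("I felt left in the dark about next steps", "uncertainty"),
    ("My family helped me cope", "support"),
]

# Researcher-defined grouping of codes into themes (an analytic judgement,
# not something software can decide).
themes = {
    "information needs": {"uncertainty", "communication"},
    "coping resources": {"support"},
}

def theme_counts(excerpts, themes):
    """Tally how many coded excerpts fall under each theme."""
    counts = Counter()
    for _, code in excerpts:
        for theme, codes in themes.items():
            if code in codes:
                counts[theme] += 1
    return counts

print(theme_counts(coded_excerpts, themes))
# Counter({'information needs': 4, 'coping resources': 1})
```

The analytic work (deciding which codes belong to which theme) remains a researcher judgement; the script only keeps the bookkeeping consistent.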

Content Analysis

This method involves analyzing the content of written or spoken language to identify key themes or concepts. Content analysis can be quantitative or qualitative. Qualitative content analysis involves close reading and interpretation of texts to identify recurring themes, concepts, and patterns. Content analysis is useful for identifying patterns in media messages, public discourse, and cultural trends.

Discourse Analysis

This method involves analyzing language to understand how it constructs meaning and shapes social interactions. Discourse analysis can involve a variety of methods, such as conversation analysis, critical discourse analysis, and narrative analysis. Discourse analysis is useful for understanding how language shapes social interactions, cultural norms, and power relationships.

Grounded Theory Analysis

This method involves developing a theory or explanation based on the data collected. Grounded theory analysis starts with the data and uses an iterative process of coding and analysis to identify patterns and themes in the data. The theory or explanation that emerges is grounded in the data, rather than preconceived hypotheses. Grounded theory analysis is useful for understanding complex social phenomena and for generating new theoretical insights.

Narrative Analysis

This method involves analyzing the stories or narratives that participants share to gain insights into their experiences, attitudes, and beliefs. Narrative analysis can involve a variety of methods, such as structural analysis, thematic analysis, and discourse analysis. Narrative analysis is useful for understanding how individuals construct their identities, make sense of their experiences, and communicate their values and beliefs.

Phenomenological Analysis

This method involves analyzing how individuals make sense of their experiences and the meanings they attach to them. Phenomenological analysis typically involves in-depth interviews with participants to explore their experiences in detail. Phenomenological analysis is useful for understanding subjective experiences and for developing a rich understanding of human consciousness.

Comparative Analysis

This method involves comparing and contrasting data across different cases or groups to identify similarities and differences. Comparative analysis can be used to identify patterns or themes that are common across multiple cases, as well as to identify unique or distinctive features of individual cases. Comparative analysis is useful for understanding how social phenomena vary across different contexts and groups.
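To illustrate the mechanics of comparing theme frequencies across groups, here is a small Python sketch. The groups, themes, and counts are all hypothetical; within-group proportions are used so that groups of different sizes can be compared.

```python
# Illustrative only: comparing how often themes occur in two hypothetical
# participant groups. The counts are invented; real comparative analysis
# rests on interpretation, not on frequencies alone.

# Theme tallies per group (hypothetical).
group_counts = {
    "patients":   {"access barriers": 9, "trust": 4, "cost": 7},
    "clinicians": {"access barriers": 3, "trust": 8, "cost": 2},
}

def theme_shares(counts):
    """Convert raw tallies to within-group proportions so groups of
    different sizes can be compared."""
    total = sum(counts.values())
    return {theme: round(n / total, 2) for theme, n in counts.items()}

for group, counts in group_counts.items():
    print(group, theme_shares(counts))
```

Proportions like these only flag where two groups diverge; interpreting why they diverge is the qualitative work.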

Applications of Qualitative Research

Qualitative research has many applications across different fields and industries. Here are some examples of how qualitative research is used:

  • Market Research: Qualitative research is often used in market research to understand consumer attitudes, behaviors, and preferences. Researchers conduct focus groups and one-on-one interviews with consumers to gather insights into their experiences and perceptions of products and services.
  • Health Care: Qualitative research is used in health care to explore patient experiences and perspectives on health and illness. Researchers conduct in-depth interviews with patients and their families to gather information on their experiences with different health care providers and treatments.
  • Education: Qualitative research is used in education to understand student experiences and to develop effective teaching strategies. Researchers conduct classroom observations and interviews with students and teachers to gather insights into classroom dynamics and instructional practices.
  • Social Work : Qualitative research is used in social work to explore social problems and to develop interventions to address them. Researchers conduct in-depth interviews with individuals and families to understand their experiences with poverty, discrimination, and other social problems.
  • Anthropology : Qualitative research is used in anthropology to understand different cultures and societies. Researchers conduct ethnographic studies and observe and interview members of different cultural groups to gain insights into their beliefs, practices, and social structures.
  • Psychology : Qualitative research is used in psychology to understand human behavior and mental processes. Researchers conduct in-depth interviews with individuals to explore their thoughts, feelings, and experiences.
  • Public Policy : Qualitative research is used in public policy to explore public attitudes and to inform policy decisions. Researchers conduct focus groups and one-on-one interviews with members of the public to gather insights into their perspectives on different policy issues.

How to Conduct Qualitative Research

Here are some general steps for conducting qualitative research:

  • Identify your research question: Qualitative research starts with a research question or set of questions that you want to explore. This question should be focused and specific, but also broad enough to allow for exploration and discovery.
  • Select your research design: There are different types of qualitative research designs, including ethnography, case study, grounded theory, and phenomenology. You should select a design that aligns with your research question and that will allow you to gather the data you need to answer your research question.
  • Recruit participants: Once you have your research question and design, you need to recruit participants. The number of participants you need will depend on your research design and the scope of your research. You can recruit participants through advertisements, social media, or through personal networks.
  • Collect data: There are different methods for collecting qualitative data, including interviews, focus groups, observation, and document analysis. You should select the method or methods that align with your research design and that will allow you to gather the data you need to answer your research question.
  • Analyze data: Once you have collected your data, you need to analyze it. This involves reviewing your data, identifying patterns and themes, and developing codes to organize your data. You can use different software programs to help you analyze your data, or you can do it manually.
  • Interpret data: Once you have analyzed your data, you need to interpret it. This involves making sense of the patterns and themes you have identified, and developing insights and conclusions that answer your research question. You should be guided by your research question and use your data to support your conclusions.
  • Communicate results: Once you have interpreted your data, you need to communicate your results. This can be done through academic papers, presentations, or reports. You should be clear and concise in your communication, and use examples and quotes from your data to support your findings.
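As a minimal sketch of the "analyze data" and "communicate results" steps above (all participants, quotes, and codes are invented), coded segments can be indexed so that exemplar quotes are easy to retrieve per code when writing up:

```python
# Illustrative only: organizing coded transcript segments so that, when
# writing up results, exemplar quotes can be retrieved per code. The
# segment text and codes are invented.
from collections import defaultdict

# Hypothetical segments: (participant ID, quote, assigned codes).
segments = [
    ("P01", "I kept asking but got no answer", ["communication"]),
    ("P02", "The costs came as a complete surprise", ["cost", "communication"]),
    ("P03", "My sister came to every appointment", ["support"]),
]

def quotes_by_code(segments):
    """Index participant quotes under each code they were assigned."""
    index = defaultdict(list)
    for participant, text, codes in segments:
        for code in codes:
            index[code].append(f'"{text}" ({participant})')
    return index

index = quotes_by_code(segments)
for code, quotes in index.items():
    print(code, "->", quotes)
```

Keeping participant IDs attached to quotes also supports transparent reporting, since each quoted finding can be traced back to its source segment.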

Examples of Qualitative Research

Here are some real-time examples of qualitative research:

  • Customer Feedback: A company may conduct qualitative research to understand the feedback and experiences of its customers. This may involve conducting focus groups or one-on-one interviews with customers to gather insights into their attitudes, behaviors, and preferences.
  • Healthcare : A healthcare provider may conduct qualitative research to explore patient experiences and perspectives on health and illness. This may involve conducting in-depth interviews with patients and their families to gather information on their experiences with different health care providers and treatments.
  • Education : An educational institution may conduct qualitative research to understand student experiences and to develop effective teaching strategies. This may involve conducting classroom observations and interviews with students and teachers to gather insights into classroom dynamics and instructional practices.
  • Social Work: A social worker may conduct qualitative research to explore social problems and to develop interventions to address them. This may involve conducting in-depth interviews with individuals and families to understand their experiences with poverty, discrimination, and other social problems.
  • Anthropology : An anthropologist may conduct qualitative research to understand different cultures and societies. This may involve conducting ethnographic studies and observing and interviewing members of different cultural groups to gain insights into their beliefs, practices, and social structures.
  • Psychology : A psychologist may conduct qualitative research to understand human behavior and mental processes. This may involve conducting in-depth interviews with individuals to explore their thoughts, feelings, and experiences.
  • Public Policy: A government agency or non-profit organization may conduct qualitative research to explore public attitudes and to inform policy decisions. This may involve conducting focus groups and one-on-one interviews with members of the public to gather insights into their perspectives on different policy issues.

Purpose of Qualitative Research

The purpose of qualitative research is to explore and understand the subjective experiences, behaviors, and perspectives of individuals or groups in a particular context. Unlike quantitative research, which focuses on numerical data and statistical analysis, qualitative research aims to provide in-depth, descriptive information that can help researchers develop insights and theories about complex social phenomena.

Qualitative research can serve multiple purposes, including:

  • Exploring new or emerging phenomena : Qualitative research can be useful for exploring new or emerging phenomena, such as new technologies or social trends. This type of research can help researchers develop a deeper understanding of these phenomena and identify potential areas for further study.
  • Understanding complex social phenomena : Qualitative research can be useful for exploring complex social phenomena, such as cultural beliefs, social norms, or political processes. This type of research can help researchers develop a more nuanced understanding of these phenomena and identify factors that may influence them.
  • Generating new theories or hypotheses: Qualitative research can be useful for generating new theories or hypotheses about social phenomena. By gathering rich, detailed data about individuals’ experiences and perspectives, researchers can develop insights that may challenge existing theories or lead to new lines of inquiry.
  • Providing context for quantitative data: Qualitative research can be useful for providing context for quantitative data. By gathering qualitative data alongside quantitative data, researchers can develop a more complete understanding of complex social phenomena and identify potential explanations for quantitative findings.

When to use Qualitative Research

Here are some situations where qualitative research may be appropriate:

  • Exploring a new area: If little is known about a particular topic, qualitative research can help to identify key issues, generate hypotheses, and develop new theories.
  • Understanding complex phenomena: Qualitative research can be used to investigate complex social, cultural, or organizational phenomena that are difficult to measure quantitatively.
  • Investigating subjective experiences: Qualitative research is particularly useful for investigating the subjective experiences of individuals or groups, such as their attitudes, beliefs, values, or emotions.
  • Conducting formative research: Qualitative research can be used in the early stages of a research project to develop research questions, identify potential research participants, and refine research methods.
  • Evaluating interventions or programs: Qualitative research can be used to evaluate the effectiveness of interventions or programs by collecting data on participants’ experiences, attitudes, and behaviors.

Characteristics of Qualitative Research

Qualitative research is characterized by several key features, including:

  • Focus on subjective experience: Qualitative research is concerned with understanding the subjective experiences, beliefs, and perspectives of individuals or groups in a particular context. Researchers aim to explore the meanings that people attach to their experiences and to understand the social and cultural factors that shape these meanings.
  • Use of open-ended questions: Qualitative research relies on open-ended questions that allow participants to provide detailed, in-depth responses. Researchers seek to elicit rich, descriptive data that can provide insights into participants’ experiences and perspectives.
  • Purposive and diverse sampling: Qualitative research often involves purposive sampling, in which participants are selected based on specific criteria related to the research question. Researchers may also seek to include participants with diverse experiences and perspectives to capture a range of viewpoints.
  • Data collection through multiple methods: Qualitative research typically involves the use of multiple data collection methods, such as in-depth interviews, focus groups, and observation. This allows researchers to gather rich, detailed data from multiple sources, which can provide a more complete picture of participants’ experiences and perspectives.
  • Inductive data analysis: Qualitative research relies on inductive data analysis, in which researchers develop theories and insights based on the data rather than testing pre-existing hypotheses. Researchers use coding and thematic analysis to identify patterns and themes in the data and to develop theories and explanations based on these patterns.
  • Emphasis on researcher reflexivity: Qualitative research recognizes the importance of the researcher’s role in shaping the research process and outcomes. Researchers are encouraged to reflect on their own biases and assumptions and to be transparent about their role in the research process.

Advantages of Qualitative Research

Qualitative research offers several advantages over other research methods, including:

  • Depth and detail: Qualitative research allows researchers to gather rich, detailed data that provides a deeper understanding of complex social phenomena. Through in-depth interviews, focus groups, and observation, researchers can gather detailed information about participants’ experiences and perspectives that may be missed by other research methods.
  • Flexibility : Qualitative research is a flexible approach that allows researchers to adapt their methods to the research question and context. Researchers can adjust their research methods in real-time to gather more information or explore unexpected findings.
  • Contextual understanding: Qualitative research is well-suited to exploring the social and cultural context in which individuals or groups are situated. Researchers can gather information about cultural norms, social structures, and historical events that may influence participants’ experiences and perspectives.
  • Participant perspective : Qualitative research prioritizes the perspective of participants, allowing researchers to explore subjective experiences and understand the meanings that participants attach to their experiences.
  • Theory development: Qualitative research can contribute to the development of new theories and insights about complex social phenomena. By gathering rich, detailed data and using inductive data analysis, researchers can develop new theories and explanations that may challenge existing understandings.
  • Validity : Qualitative research can offer high validity by using multiple data collection methods, purposive and diverse sampling, and researcher reflexivity. This can help ensure that findings are credible and trustworthy.

Limitations of Qualitative Research

Qualitative research also has some limitations, including:

  • Subjectivity : Qualitative research relies on the subjective interpretation of researchers, which can introduce bias into the research process. The researcher’s perspective, beliefs, and experiences can influence the way data is collected, analyzed, and interpreted.
  • Limited generalizability: Qualitative research typically involves small, purposive samples that may not be representative of larger populations. This limits the generalizability of findings to other contexts or populations.
  • Time-consuming: Qualitative research can be a time-consuming process, requiring significant resources for data collection, analysis, and interpretation.
  • Resource-intensive: Qualitative research may require more resources than other research methods, including specialized training for researchers, specialized software for data analysis, and transcription services.
  • Limited reliability: Qualitative research may be less reliable than quantitative research, as it relies on the subjective interpretation of researchers. This can make it difficult to replicate findings or compare results across different studies.
  • Ethics and confidentiality: Qualitative research involves collecting sensitive information from participants, which raises ethical concerns about confidentiality and informed consent. Researchers must take care to protect the privacy and confidentiality of participants and obtain informed consent.
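One common response to the reliability concern above is co-coding: two researchers code the same segments independently and then check their agreement. The Python sketch below computes simple percent agreement on invented data; real studies often report chance-corrected statistics such as Cohen's kappa instead.

```python
# Illustrative only: a simple percent-agreement check between two coders
# who independently coded the same six segments. The codes are invented.

coder_a = ["trust", "cost", "trust", "access", "cost", "trust"]
coder_b = ["trust", "cost", "access", "access", "cost", "trust"]

def percent_agreement(a, b):
    """Share of segments to which both coders assigned the same code."""
    if len(a) != len(b):
        raise ValueError("coders must rate the same segments")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)

print(round(percent_agreement(coder_a, coder_b), 2))  # 0.83
```

Disagreements flagged this way are typically discussed until the coders reach consensus or refine the code definitions, which is where the real reliability gain comes from.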


About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


  • Open access
  • Published: 19 March 2024

Ethical challenges in global research on health system responses to violence against women: a qualitative study of policy and professional perspectives

  • Natalia V. Lewis 1   na1 ,
  • Beatriz Kalichman 2   na1 ,
  • Yuri Nishijima Azeredo 2 ,
  • Loraine J. Bacchus 3 &
  • Ana Flavia d’Oliveira 2  

BMC Medical Ethics volume  25 , Article number:  32 ( 2024 ) Cite this article


Background

Studying global health problems requires international multidisciplinary teams. Such multidisciplinarity and multiculturalism create challenges in adhering to a set of ethical principles across different country contexts. Our group on health system responses to violence against women (VAW) included two universities in a European high-income country (HIC) and four universities in low- and middle-income countries (LMICs). This study aimed to investigate professional and policy perspectives on the types, causes of, and solutions to ethical challenges specific to the ethics approval stage of global research projects on health system responses to VAW.

Methods

We used the Network of Ethical Relationships model, framework method, and READ approach to analyse qualitative semi-structured interviews (n = 18) and policy documents (n = 27). In March-July 2021, we recruited a purposive sample of researchers and members of Research Ethics Committees (RECs) from the five partner countries. Interviewees signposted policies and guidelines on research ethics, including VAW.

Results

We developed three themes with eight subthemes summarising ethical challenges across three contextual factors. The global nature of the group contributed towards power and resource imbalance between HIC and LMICs and differing RECs’ rules. Location of the primary studies within health services highlighted differing rules between university RECs and health authorities. There were diverse conceptualisations of VAW and vulnerability of research participants between countries and limited methodological and topic expertise in some LMIC RECs. These factors threatened the timely delivery of studies and had a negative impact on researchers and their relationships with RECs and HIC funders. Most researchers felt frustrated and demotivated by the bureaucratised, uncoordinated, and lengthy approval process. Participants suggested redistributing power and resources between HICs and LMICs, involving LMIC representatives in developing funding agendas, better coordination between RECs and health authorities and capacity strengthening on ethics in VAW research.


Conclusions

The process of ethics approval for global research on health system responses to VAW should be more coordinated across partners, with equal power distribution between HICs and LMICs, researchers and RECs. While some of these objectives can be achieved through education for RECs and researchers, the power imbalance and differing rules should be addressed at the institutional, national, and international levels. Three of the authors were also research participants, which had the potential to introduce bias into the findings. However, rigorous reflexivity practices mitigated against this. This insider perspective was also a strength, as it allowed us to access and contribute to more nuanced understandings to enhance the credibility of the findings. It also helped to mitigate against unequal power dynamics.



Background

Violence against women (VAW) is a global public health and clinical problem leading to increased mortality and morbidity among women and their children [ 1 ]. Globally, 27% of ever-partnered women aged 15–49 years have experienced physical and/or sexual intimate partner violence in their lifetime, with 13% experiencing it in the past year. Low-income countries reported higher prevalence compared with high-income countries [ 2 ]. Health systems have a crucial role in a multisector response to VAW through identifying and supporting people who have experienced violence [ 3 ]. Prior research identified considerable system-, organisation-, and individual- level barriers to health system responses to VAW, especially in low-income and middle-income countries (LMICs) [ 4 ] and proposed a framework for improving health system readiness to address VAW [ 5 ]. In the past decade, governments and other funders in high-income countries (HIC) made substantial investments in global research addressing the Sustainable Development Goals, including elimination of VAW [ 6 ].

Studying VAW as a global public health and clinical problem requires collaboration between researchers from different disciplines and countries. Such multidisciplinarity and multiculturalism create challenges in adhering to a single set of ethical standards across differing country-specific contexts characterised by power and resource inequalities. Research activities happen in contexts that reflect both global and local cultural and social dynamics; because research ethics regulations vary not only across countries but also across fields of knowledge, multidisciplinary, multi-country research is bound to face specific challenges. Members of global research groups are embedded within teams and organisations that have differing resources, structures, cultures, and politics, and these organisations are in turn shaped by differing economic, social, and political environments. The provision of funding and research capacity from HICs to LMICs exacerbates existing power imbalances. Which ethical standards take precedence: those developed by the international community, by the HIC funder and grant-holding institution, by the LMICs where the research takes place, or all of the above?

Studies on VAW fall into the category of sensitive topics because they impose an additional emotional burden on, and a threat to the physical and social selves of, participants and researchers. The sensitivity of VAW research is also determined by its exploration of culturally and politically rooted issues of social control, coercion and domination, the interests of powerful people, the ‘sacred’ concepts of family relations and power, and the lived realities of people who have experienced or used violence [ 7 ]. This heightened sensitivity gives rise to additional ethical dilemmas concerning the principles of respect for persons, confidentiality, justice, beneficence, and nonmaleficence [ 8 ]. Global groups studying health system responses to VAW must resolve these dilemmas while applying international, funder-, and country-specific ethical requirements to the sociocultural and economic context, VAW services, and health systems in LMICs. How can LMIC researchers adhere to all these ethical requirements while protecting their cultural diversity and the safety of their research participants, communities, and researchers?

Previous studies have acknowledged ethical challenges in global research [ 9 ], including studies on VAW [ 8 , 10 ]. Ethical and methodological challenges in global research on VAW are interlinked, and both can undermine the quality of the data and findings [ 11 , 12 ]. Recent theoretical developments have equipped researchers with tools for exploring and addressing ethical challenges in global research. Reid et al. [ 13 ] created the ‘4Ps’ model (Place, People, Principle and Precedent) for analysing and developing solutions to ethical conflicts in global research. Morrison et al. [ 14 ] developed the Network of Ethical Relationships (NER) model in the context of global population health research. The NER model identified relational challenges within research teams and with Research Ethics Committees (RECs), funders, and participants, which were embedded in a complex and conflicting normative framework of HIC and LMIC legal rules, societal norms, moral values, and institutional rules. These relational challenges were explained by differing cultural backgrounds, REC requirements and participant values, and conflicting requirements between HIC and LMIC RECs and funding procedures. However, to our knowledge, no studies have addressed ethical challenges in global research on health system responses to VAW. This study aimed to investigate professional and policy perspectives on the types and causes of, and solutions to, ethical challenges specific to the REC approval stage of a global research programme on health system responses to VAW.

Study context: the global health research group

This paper draws on our experience as a global research group on health system responses to VAW in LMICs. The partnership included two universities in a European HIC and four LMIC universities (one South American, one in the Middle-Eastern region, and two in different South Asian countries). The group, funded by a government agency in the European HIC, aimed to: (i) develop and pilot-test LMIC-specific interventions in sexual and reproductive health services addressing VAW, (ii) strengthen the research capacity of HIC and LMIC universities, and (iii) evaluate the capacity strengthening activities (Fig.  1 ).

Fig. 1 Global research group on health system responses to violence against women in low- and middle-income countries

The countries were diverse amongst themselves, with different health systems and research infrastructures, which shaped all aspects of the research, including ethics approval. At the macro level, the HIC funding agency dictated the financial and management structure of the global research group. The funder created conditions for maintaining power at the lead contracting university in the HIC, which held the grant and distributed funds quarterly to the five partner universities. Two group co-directors were also based at the HIC universities. Each LMIC university had a local principal investigator or co-principal investigators with a team of researchers and PhD students responsible for country-specific primary studies. HIC researchers supported the study designs and methodological development, were involved in the capacity strengthening workstream, and led on syntheses of findings from the LMIC primary studies.

At the meso level, the composition of the group also created conditions for maintaining power at HIC universities because their researchers had more methodological expertise and experience. However, researchers in both HIC universities and two LMIC universities brought together extensive expertise and experience in the VAW topic. We proactively explored and addressed potential power imbalances through meetings and research capacity strengthening activities. As described in a separate publication, we carried out a baseline evaluation and mapping exercise of the research capacities within and across all country teams [ 15 ]. The evaluation showed that while the HIC teams included more mid-career and senior researchers with extensive methodological expertise in health system responses to VAW, their knowledge of the health systems and socio-cultural-historical contexts in the partner LMICs was limited. In contrast, the LMIC teams had a greater proportion of early career researchers and less expertise in some research methods. However, they were well embedded within the local communities and health care systems where the primary research was conducted. They also possessed a high degree of local knowledge regarding power dynamics between different stakeholders, processes for engaging with them, and political and cultural sensitivity. It is important to acknowledge that the power imbalance was partly influenced by the nature of the work being done by the different teams. The LMIC partners were primarily responsible for fieldwork, a role typically assigned to early career researchers, whereas the HIC partners focussed more on supporting instrument development, data analysis, and the capacity strengthening work package, which required researchers with greater experience.

To reflect on the power imbalances between the teams, we organised a participatory workshop with researchers from all country teams [ 15 ]. We agreed on shared values, identified barriers, and planned capacity strengthening activities. The shared values included: mutual learning, respect, fair opportunity, clear boundaries, honesty, and transparency. LMIC researchers identified barriers related to limited methodological expertise, access to training courses, information technologies, and English-language skills for academic writing, which we targeted through the capacity strengthening activities. We mapped areas of methodological expertise within countries and identified opportunities for mentoring and mutual learning across partners. For example, the South American team, which was involved in developing ethical principles for the WHO Multi-Country Study on Women’s Health and Domestic Violence Against Women [ 16 ], delivered ethics training to the whole group. Researchers from the Middle-Eastern and European teams co-delivered a workshop on measurement and routinely collected data. Researchers from the South American and South Asian teams co-led a workshop on how we use feminist principles and theory in our research on VAW. Mutual learning took place through joint development of protocols and research tools for primary studies, a virtual peer support and education group for early career researchers, virtual monthly team meetings, and annual face-to-face and hybrid workshops in partner countries.

During one of the group meetings, an LMIC researcher raised concerns about the group adopting terminology used by HIC policy makers and funders which perpetuated the existing power imbalance. They argued that the term “capacity building” implied a lack of research capacity in LMICs, which they found disempowering. This conflicted with the shared values of respect and mutual learning within our group, as well as the extensive experience in VAW research present in the South American and South Asian partner universities. As a result, we revised our terminology and replaced the term ‘capacity building’ with ‘capacity strengthening’.

From the start, the group made efforts to carry out equitable work despite inequitable conditions, whilst navigating variations in institutional research ethics review requirements and adhering to diverse regulations imposed by academic and health system institutions. To conduct primary studies, we had to obtain ethics approvals from two HIC and four LMIC universities, as well as additional approvals from the health authorities in all LMICs. This process highlighted power dynamics between the different countries and institutions involved. Tensions arose because of the disparate policies and practices. We encountered challenges that were not previously reported in the literature. The two HIC university RECs had conflicting requirements regarding the sequence of ethics approvals among group partners. The REC at the lead HIC university encouraged a local ethics review where possible because the local REC would possess the most relevant expertise to assess the ethics application for research undertaken in the country concerned. This approach aimed to prevent contradictory responses from two separate REC decisions. In contrast, the second HIC university insisted that their REC would require reviewing all studies involving their staff, irrespective of the country involved and whether a local review was already being conducted. They required the local ethics approvals to be provided for their final decision. The conflicting requirements had repercussions on the timely execution of the primary studies in LMICs. The lead HIC REC advised that if the LMIC teams had undergone a local research ethics review and received a favourable ethical opinion, they could commence the research activities specified in their ethics applications and favourable opinions. In contrast, the other HIC REC insisted that the project should not commence until full ethical approval had been obtained from their university, alongside local ethical approval.

Another challenge at the meso level arose from the differing policies and practices for research data management. The HIC funder and two HIC university RECs requested detailed data management plans compliant with the European Union General Data Protection Regulation. However, partners in LMICs were unable to fully comply with the same standards due to different legal requirements in their respective countries and varying policies and processes for research data management within their universities. As a temporary solution, the lead HIC university granted all LMIC principal investigators/co-investigators and their researchers an honorary status enabling them to access secure departmental file storage. The partners signed a Data Sharing Agreement and a Data Repository Agreement for using the Research Data Repository at the lead HIC university to store and share the research data underpinning outputs from the primary studies.

Study design

The international team of researchers with backgrounds in psychology (NVL, YNA), policy (BK), medicine (AFDO, NVL), and social science (LJB) conducted a qualitative study comprising semi-structured interviews and a document review of ethics policies and guidelines. Our positioning within critical realist ontology [ 17 ] and feminist epistemologies and methodologies [ 18 ] influenced the choice of a qualitative research design to explore the contextual factors and processes shaping researchers’ and REC members’ experiences during the ethics approval phase of global research projects on health system responses to VAW. Our approach was also informed by discourses on decolonising global health research [ 19 ] and epistemic injustice in academic global health [ 20 ]. We recognised the existence of international and institutional hierarchies, that (post)colonial legacies shape the field of global research on VAW, and that systemic changes are needed to shift the hierarchies of power [ 19 ]. We believed that the experiences and views of the HIC and LMIC researchers and participants were equally credible. We acknowledged that researchers and participants would impact on each other, and that the researchers’ backgrounds would influence data production and analysis. We challenged epistemic injustice by fostering co-creation of knowledge between researchers and study participants with similar experiences. The authors who conducted interviews (NVL, BK) were members of the same global health group; three authors (NVL, LJB, AFDO) were also interviewed as research participants. These authors were not involved in the analysis of their own transcripts.

We followed the READ (ready your material, extract data, analyse data, distil your findings) approach for document review [ 21 ] and the framework method [ 22 ] for data analysis. While interviews explored individual experiences of ethics approval for global research on health system responses to VAW, the review of policies and guidelines allowed us to contextualise these experiences. Concepts from the NER model were used as sensitising devices that informed the analysis [ 14 ].

Data collection

We conducted online semi-structured qualitative interviews in March-July 2021. The size of the interview data set was informed by the model and concept of information power [ 23 ]. We assumed that our study would need fewer participants because of its narrow aim, the high specificity of participants for the study aim, the established NER model, strong interview dialogue, and cross-case analysis.

We used a purposive sampling strategy to recruit researchers and REC members with rich and diverse experience of ethics approvals for global research, representing the five partner countries in our global health research group. The study was advertised via an email sent to the group mailing list and snowballed via professional networks. Interested individuals emailed the study researchers, who confirmed eligibility, provided further information, and arranged interviews on Zoom/Teams. Interviews were conducted in the language of choice of the interviewees. Participants provided verbal informed consent. The topic guides explored experiences of applying ethics policies and guidelines in practice, following REC processes, obtaining ethics approvals, challenges faced, and proposed causes and solutions (Additional file 1 ). Interviews were audio recorded, professionally transcribed, checked, and anonymised.

We identified policies and guidelines on research ethics through interview participants, electronic searches, and reference checking. Two researchers (NVL, BK) searched the websites of the HIC and LMIC universities and RECs involved in the research, as well as those of HIC funders and think tanks, using the terms “violence”, “women”, “ethic*”, and “guideline”. We retrieved, screened, and selected documents meeting our inclusion criteria: international, national, and institutional policies and guidelines from the five partner countries that discussed global research and/or research on VAW.

We started data analysis while conducting interviews to refine the topic guides for further interviews and to identify additional documents. For the document review, we customised an Excel proforma [ 21 ] to extract data on title, author, year, source, target audience, key messages, and data relevant to global research and studies on VAW. During data extraction, we made notes about how each document addressed ethical issues in global research and/or research on VAW. Interview transcripts and documents were imported into NVivo 12 for data management and coding. The analysis was conducted using a combination of inductive and deductive approaches. Researchers (NVL, BK, AFDO) worked on their own subsets of transcripts and documents in English and the local language. These researchers read and re-read two interview transcripts and independently manually coded text relevant to the research questions. The researchers compared initial codes and developed a ‘working analytical framework’ which they then applied to their subsets of transcripts and documents in NVivo [ 22 ]. The framework was refined through four cycles of revisions during the coding process. We then grouped our codes into candidate themes, mapped them onto the constructs of the NER model in an Excel framework matrix in English, and developed final analytical themes. Researchers (NVL, BK) wrote descriptive accounts of the analytical themes with illustrative quotes. The study team met regularly to discuss the codes, themes, matrix, and descriptive accounts, paying attention to the similarities and differences within and between interviews and documents, countries, and institutions.

We conducted 18 interviews with researchers ( n  = 11) and REC members ( n  = 7) representing all five partner countries and a wide range of professional experience (2–25 years) (Table  1 ).

Interviews in English ( n  = 15) and local language ( n  = 3) lasted between 27 and 80 min (mean 46 min). Despite support from local researchers, we could not recruit REC members from one South-Asian LMIC. When approached, REC members declined participation, explaining that the study was not supported by their institutional and national RECs and that it was difficult to speak about issues and challenges that might be seen as being against their government. In addition, they felt that the study should have been conducted in collaboration with their RECs, with some of the members as co-authors (email correspondence). In contrast, the REC at the lead HIC university transferred our ethics application to a different faculty to prevent a conflict of interest.

We analysed 27 documents (4 international, 17 national, 6 institutional), categorised into educational material, guidelines, legal documents, policy documents, reports, standard operational procedures, and statements (Table  2 ).

The reports by HIC funders, produced in partnership with researchers from different countries, included analysis of global inequalities and consent. The HIC was the only country with documents detailing how to operate as an international research funder, although LMICs had documents detailing international partnerships. Two LMICs had national ethics regulations, while all other partners had institutional documents.

Interview participants signposted the same high-level policies and guidelines on ethics in global research: the Helsinki declaration [ 24 ], the Council for International Organizations of Medical Sciences’ (CIOMS) International Ethical Guidelines for Biomedical Research Involving Human Subjects [ 25 ], and the Nuffield Council’s guidelines [ 26 ]. National and institutional research ethics policies and guidelines were built on the international principles and standards, which were tailored to the local context. All guidelines for global research stipulated compliance with international and national laws and regulations and required ethics approvals in the countries where research activities took place and in the country funding the study. Documents from HICs and LMICs highlighted the importance of respecting local societal norms and of conducting research that benefits local communities and strengthens local capacities.

Our framework analysis generated 20 thematic codes, 12 candidate themes, and 3 final analytical themes summarising ethical challenges at the approval stage resulting from the global nature of the group, the location of primary studies within health systems, and the VAW topic. Within each theme, we reported perspectives on causes, impact, and solutions across the interviews and documents (Table  3 , Fig.  2 ).

Fig. 2 NER model for global research on health system responses to violence against women, research ethics committees’ approval stage

Challenges resulting from the global nature of the group

The location of research collaborators in an HIC and several LMICs contributed to challenges caused by factors in the following areas.

Differing power and resources

Documentary and interview data suggested that a hierarchical power imbalance between HIC and LMIC countries could be a contextual factor at the macro and meso levels. The power imbalance contributed to ambiguity and frustration among researchers and REC members, and to tensions between HIC and LMIC partners and between researchers, RECs and funders. The Nuffield report emphasised issues of power imbalance and the possible differences and conflicts between ethics committees in different countries [ 26 ]. In contrast, major European HIC funders of global research imposed ethics standards and processes that were based on their national laws as a global benchmark:

“[HIC funder name] is governed by [HIC name] law. The legislation supporting this policy relates to work carried out in the [HIC name]. We expect researchers to use similar standards and principles for any research outside the [HIC name]." ([HIC funder] Policy 2021).

Most interview participants perceived the power imbalance between HICs (i.e., funders) and LMICs as a macro-level barrier. Some researchers and REC members felt that HIC policy makers and funders imposed their research agenda on LMICs, used funding as a mechanism for compliance, and set ethics regulations that did not suit the local context or were difficult to comply with because of limited research capacity and differing structures and resources in LMICs:

“It is much more likely when the project is managed by a general PI [principal investigator] of an institution from the global north, with funding from the global north, [that] countries from the global south that often participate with research participants and less with the thinking people, have much more difficulty to establish the limits and characteristics, the local peculiarities. So, I think it's more of a question of politics and power within research than the questions of Ethics Committees.” Interviewee 7, University REC and Hospital REC member, South American LMIC.

Some interview participants perceived the setting of funding priorities for global research by HICs without engaging LMIC researchers and policy makers as a way of recolonising their research agenda. A proposed solution was to engage local communities in research priority setting with funders:

“The only research funding is from international aid. Agencies become bureaucracies. In countries like our government doesn’t dictate policy, doesn’t set policy, it’s international aid that does. They dictate. Environment, this, that, that, what environment for god’s sake? We have so many wars here, every time we rebuild it gets destroyed. They set it in relation to their own priorities. It didn’t used to be like that. It used to be that agencies came, discussed, etc. and we together did things. Now, everything is on website, take it or leave it. The ethics is part and parcel of this approach of trying to say, “Wait a minute, do not colonise us that way too.” Interviewee 11, University REC member, Middle-Eastern LMIC.

While HIC funders mandated "that research performed in partner countries is conducted in accordance with regulations and to a standard no less stringent than those applicable in the [European HIC]" ([HIC funder] Policy 2021), some interviewees thought this was problematic due to the lack of consideration for the LMIC context in which research is conducted. For example, LMIC and HIC researchers agreed that some LMIC universities did not have the policies, processes, and resources for implementing stringent HIC requirements for data management. While their local RECs scrutinised the application sections on study design, methods, and funding, they did not request the detailed data management plan that was an essential part of the HIC ethics applications.

Differing REC rules

The international and local policies and guidelines required ethics approvals from RECs in the funder country and in the countries where research activities took place. At the meso level, power was in the hands of multiple RECs in the HIC and LMICs. Each REC interpreted and applied the universal ethics principles and international policies and guidelines differently. This resulted in the challenge of ‘differing REC rules’, with varied requirements, processes, and timeframes that lacked coordination and challenged the timely delivery of primary studies across LMICs. Researchers had to ‘problem-solve’ ethical approval conundrums themselves because RECs did not talk to each other and exercised power through standardisation of their approval processes, which many researchers described as highly bureaucratised, predominantly biomedical, and severely outdated. The power was in the RECs’ hands, and researchers had to abide by REC rules with which they often did not agree. To mitigate these tensions, researchers used informal support from more experienced colleagues in their teams and partner countries, adapted study documentation previously approved by their RECs, and submitted ethics applications to the RECs “where you know people to make it easy” (Interviewee 13, University researcher, South-Asian LMIC). REC members followed their in-house standard operating procedures and best practice examples of previously approved research projects, and felt that they adequately supported researchers throughout the application process. REC members thought that they treated local and global projects equally. They also thought that the international studies were more trustworthy because they had a higher level of funder scrutiny and the ability to recruit the best local experts.

Interviewees suggested measures at the macro level for mitigating power imbalances between HICs and LMICs. They proposed proactively lobbying HIC funders about LMIC priorities "to rethink research ethics in a way that is compatible with our local and regional [context]" (Interviewee 11, University REC member, Middle-Eastern LMIC). At the meso level, interviewees highlighted the importance of mutual learning and respecting country-specific contexts, transparent communication, agreed partnership-wide practices for country-specific informed consent, safeguarding, and data management, and helping each other with developing ethics applications and responding to REC queries:

“Particularly in a global context, learning from others, because there is often a perception that it might be what people may consider the gold standard. It may not be, there may be more innovative ways to manage ethics and other regulatory approval processes from our global partners”. Interviewee 3, University REC member, European HIC.

To comply with the HIC funder’s requirements and improve their data safety, LMIC researchers wanted local policies for data management, additional funding to buy encrypted equipment and secure data storage, and education for local RECs and researchers on data management plans:

“…lobby and advocate for a strict data governance section within the [ethics application] pro forma that the ethical committee has." Interviewee 18, University researcher, South-Asian LMIC.

Challenge resulting from the location of primary studies within the health system

The location of primary studies in LMIC health care services was a contextual factor at the meso level, which contributed to the challenge of reconciling the differing requirements and rules of the university REC and the health authority REC.

Differing rules between university RECs and health authorities

Some LMIC researchers identified differing rules between university RECs and health authorities as a barrier which caused ambiguity and delays in local ethics approvals. To conduct research with health care professionals and patients, it was necessary to obtain ethics approvals from an academic REC and ethics and/or regulatory approvals from the relevant health authority (e.g., Ministry of Health, municipal health authority, healthcare setting). In two LMICs, this parallel process created an extra challenge for researchers. They experienced confusion, frustration, and delays because the two bodies had differing perspectives on the same issues and their approval processes were not coordinated:

“A problem that I always have, the university has a standard informed consent, and they understand that informed consent starts with a lot of data on the interviewee. When they send it to the municipality, the municipality says to me, "Oh, this is no good, this informed consent, we don't like it. You can't ask for all of this information," and I agree. Then I have to do something in between, because I have to negotiate with the two agencies, they ask for different things.” Interviewee 5, University researcher, South-American LMIC.

The conflicting rules could be explained by the varying perceived roles and responsibilities of research and health care approval bodies. Although REC members from both universities and health authorities felt responsible for the safety of research participants, the latter thought that they had better knowledge of their services and therefore an additional responsibility for research participants as service users:

“Our concern is with protection of the users of our healthcare system within our jurisdiction, this is our chief concern. Even for very simple research the most important thing is how the municipality treat its healthcare service users. What guides us is mainly what the healthcare system means here in our city, because that is what we work with, the healthcare system users as research participants. So what guides us is the healthcare service, its logic, it’s dynamic, a research project can’t muddle with the services' dynamic or the work of the healthcare professionals.” Interviewee 6, Municipal REC member, South American LMIC.

Interviewees wanted more coordination between university and health system RECs to harmonise and expedite the two approval processes.

" a coordination between the ethics board and the ministries itself. Like some kind of internal platform between the [cabinet work] and the policy level. I wish there was something like that so that the process would be a bit easier. Or some person from the ethics itself would be more cooperative and would help us to coordinate with them somehow so that the bureaucratic process is shortened." Interviewee 17, University researcher, South-Asian LMIC.

Challenges resulting from the VAW topic

VAW as a research topic was another factor at the macro and meso levels, requiring additional labour, time and resources for obtaining ethics approvals in LMICs. Documents and interview data identified additional VAW-specific challenges in the following areas.

Differing conceptualisation of VAW

Sometimes the funder’s conceptualisation of VAW as a research topic differed from the view of the local REC and researchers due to country-specific societal norms and the political situation, resulting in ambiguity and frustration for researchers and REC members. One REC member from the LMIC experiencing protracted armed conflict thought that the conceptualisation of VAW in their country had been shaped primarily by the views of international aid agencies/research funders, with ‘gender’ often being a substitute word for ‘women’ and VAW being researched as an interpersonal problem in isolation from the chronic violence at the community and society levels. Such a narrow conceptualisation could influence the choice of research tools and produce biased findings. A researcher from the same LMIC explained that their country-specific political and social context required researchers to defend their choice of international partners to get ethics approval for VAW research:

“VAW is perceived as a problem that should be treated on a local level. It is a sensitive topic rooted in the culture and religion. In our culture, religion, values, and traditions are strongly expressed and strongly engaged even within administration and research. You need to defend your research not only from an ethical point of view, but also from intention of why you’re doing this research with international partners.” Interviewee 10, University researcher, Middle-Eastern LMIC.

Differing conceptualisation of vulnerability

While most documents and interviewees acknowledged that certain groups of research participants were more vulnerable than others and needed extra protection, only the VAW-specific ethics guideline [ 29 ] and some experienced VAW researchers acknowledged vulnerabilities and protection for the researchers themselves. According to REC members, they treated VAW like any sensitive topic and required proof of safeguarding and support resources for research participants. We found divergent views on the concept of vulnerability when applied in the VAW research context. In generic research ethics documents, the vulnerability of research participants was defined in terms of impaired capacity to consent: certain groups had a limited ability to understand the nature of research and make an informed decision about taking part, and faced the possibility of being exploited and harmed by research. Several documents referred to vulnerable groups generically without providing specific definitions or clarifications regarding the types of individuals or conditions they encompassed. Other sources listed vulnerable groups, all of which were at risk of experiencing VAW – i.e., victims of traumatic events and sexual abuse, pregnant/lactating women, all women, women from orthodox communities, and individuals disadvantaged by gender.

In contrast, the VAW-specific ethics document [ 29 ] and some researchers from the HIC and South Asian LMIC-2 recognised women who have experienced violence as capable of participating in research. The primary concern regarding women’s vulnerability in relation to participating in VAW research stemmed from the potential risk of experiencing further violence. Therefore, the protection measures focused on ensuring safety and confidentiality and on signposting to specialist VAW services. Researchers from two partner countries emphasised that all women who have experienced violence have agency, and some are empowered by their lived experience; they should therefore not be regarded as incapable of providing informed consent to participate in research. One researcher highlighted differing HIC and LMIC societal norms regarding the vulnerability of research participants, with the former fostering power among participants:

“I really like that the European context is stricter because it really ensures safety, security, privacy of the women. Especially when we are working with vulnerable groups. I think it comes out of respect for participants. Because I think in the [South Asian LMIC-2] context, vulnerable groups are sympathised and not empathised, maybe. Because out of sympathy you only feel pity for these women, and you don’t respect them as humans. When you respect a person then you would definitely think about how the researchers protect these women from being harmed or revictimized.” Interviewee 18, University researcher, South-Asian LMIC.

High-level guidelines and all interviewees acknowledged the need for additional time and effort to address issues of vulnerability in ethics applications by ensuring confidentiality and safety and providing referral/signposting to specialist VAW services. They highlighted the importance of allocating adequate time for the lengthy ethics approval processes, which sometimes delayed the commencement of the primary studies. Interviewees suggested allocating at least six months for obtaining ethics approvals.

"The special nature of this research topic [VAW] demands that safety concerns be considered from the very beginning of a study through its implementation and dissemination. This means that violence research will likely require a longer timeframe and a greater investment of resources to ensure these issues are fully addressed." (WHO Recommendations 2001 [ 29 ]).

From the perspective of healthcare RECs, established policies and care pathways should ensure the safety and appropriate care of patients affected by violence who are identified through studies on VAW and health:

“We have to read it [ethics application] carefully and make sure that if the researcher discovers that the woman is in fact experiencing violence it is notified in the national database. We have to be mindful of those things since they are health policies, the research project cannot go against our health policy, our care policy. We have a violence department here in the municipality, people that work solely with this, so a project like this has to know this exists and have a dialog with this area. We ask “what are you going to do when you see the person is experiencing violence? What care will you offer? How will you do it? What are you going to offer this person?”. We have to see if everything was thought of, otherwise this person will come here, do the research, get the data and just leave.” Interviewee 6, Municipal REC member, South American LMIC.

Limited REC methodological and topic expertise

Researchers from the two South Asian LMICs felt frustrated with the limited VAW methodological and subject expertise of their RECs, which dismissed qualitative and mixed methods, verbal informed consent, and remote data collection. They highlighted the importance of educating RECs and researchers on the specifics of ethics applications for VAW research. For instance, one LMIC interviewee produced a resource sheet on VAW research for her institutional REC to support their ethics application and strengthen REC capacity. They noticed that their institutional REC application process improved over time as more VAW projects were undertaken. REC members also talked about continuing training and dialogue between RECs, and between RECs and researchers, to strengthen capacity for the ethical conduct of research. The interviewees agreed that the changes should occur at the institutional level:

“Simultaneously making sure ethics boards are having proper policies, guidelines, regulations, and qualified people who are able to review the ethics and provide substantial feedback to applicants. Not depending on who you know within the community and the ethics committee to push your application through.” Interviewee 14, University researcher, South-Asian LMIC.

Discussion

This qualitative study of professional and policy perspectives generated three themes summarising and explaining challenges in global research on VAW and health at the ethics approval stage. The global nature of the research contributed to differing power dynamics and resource distribution between the HIC and LMICs and to discrepant REC rules across countries and institutions. HIC and LMIC researchers tried to mitigate the conflicting REC rules by collaborating and supporting each other during the ethics application process. However, they lacked the autonomy and capacity to shift power away from the HIC or to harmonise rules across RECs. The location of the primary studies in LMIC healthcare services contributed to divergent institutional rules across academic RECs and health authorities, which researchers tried to reconcile by negotiating the differences. The VAW topic contributed to differing conceptualisations of VAW and participants’ vulnerability, and to limited methodological and topic expertise in some LMIC RECs, which researchers addressed by helping RECs to develop capacity.

These contextual factors had a negative impact on researchers’ and teams’ morale and on the relationships between researchers, RECs, and HIC funders. Furthermore, they posed a substantial risk to the timely completion of studies. Most researchers felt frustrated and demotivated by the hierarchical, bureaucratised, uncoordinated, and lengthy approval processes. Participants suggested several strategies to address the power imbalances and challenges identified in the study. These included advocating for the involvement of LMIC representatives in shaping HIC funding agendas for global health research, prompting a redistribution of power between the HIC and LMICs at the macro and meso levels, fostering coordination between academic RECs and health authorities and between HIC and LMIC RECs, and prioritising capacity strengthening on ethics in VAW research. While these issues were present in all countries, their manifestations varied in form and degree due to disparities in research infrastructure and healthcare systems.

Our analysis was informed by the NER model for global population health research [ 30 ] which we applied to the topic of global research on health system responses to VAW. Our study confirmed findings on ethical challenges in global health research reported in prior literature [ 14 , 31 ] and discovered new challenges specific to the REC approval stage of studies on VAW as part of the global health agenda. These challenges were multifactorial and resulted from the global nature of the research group (disparities in power and resources, divergent RECs’ rules), location of primary study within LMIC health system (differing rules between university RECs and health authorities), and the topic of VAW (differing conceptualisation of VAW and vulnerability, limited methodological and topic expertise).

Our finding on the power asymmetry between HICs and LMICs as the major systemic driver of ethical challenges supports the current discourse on decolonising research agendas and building equitable global health research partnerships [ 32 ]. While all interviewees and most high-level policies acknowledged the power imbalance and advocated for equitable partnerships, researchers and REC members felt that HIC funders continued to dictate global health research agendas and impose their own institutional rules and societal norms on LMIC partners. Indeed, our interviewees perceived the agendas and rules prescribed by HIC funders and policy makers as a form of recolonisation which reinforced inequalities between HICs and LMICs at the macro level and jeopardised research integrity. Our global health research group tried to redress power imbalances through reflexivity about positionality during the research process, which helped to establish and maintain equitable relationships within and between HIC and LMIC teams. However, our efforts at the meso (group) level could not change the power asymmetry between the HIC funder/RECs and LMIC researchers/RECs. As suggested by our findings and prior literature, rebalancing power requires interventions at the level of HIC policy makers and funders and of HIC and LMIC RECs [ 10 , 33 ].

Our finding on the challenge of disjointed academic RECs and health system authorities which imposed differing rules and lacked communication with each other is consistent with prior literature that found highly bureaucratised, disjointed, and lengthy ethics approval processes across HICs and LMICs [ 14 ]. Our interviewees’ suggestions for improving consistency and joined-up working amongst HIC/LMIC and academic RECs/health authorities support recommendations for more collaborative capacity strengthening and harmonisation across RECs in global research projects [ 34 ].

Our finding on differing conceptualisations of VAW reflects previous research that reported a lack of consensus regarding the definition of VAW and the terminology used by researchers, practitioners, and research participants [ 35 ]. In the context of global research, the differences in definitions of VAW used by HIC funders, LMIC researchers and REC members were rooted in country-specific socio-political contexts. Some LMIC interviewees felt that HIC governments and funders imposed research agendas which defined VAW as a relationship problem and did not recognise the intersecting systemic violence and lived experience of people in a war-torn LMIC. Nor did they acknowledge the ways in which political conflict can exacerbate different forms of gender-based violence. In contrast, LMIC participants living in countries affected by armed conflict acknowledged the complex interplay between individual, relationship, community, and societal factors that put people at risk of experiencing and using violence. This finding supports recommendations for inclusive agenda setting for global research, emphasising the importance of involving HIC funders, LMIC governments and researchers in setting priorities and co-designing research programmes that address the unique needs of LMICs and align with their socio-political contexts [ 33 ].

Our finding on the differing conceptualisations of the vulnerability of research participants in global research on VAW could be explained by cultural variation in the concept of gender roles across societies and by the feminist ethos of VAW research. As highlighted by our interview participants, such conceptual differences have implications for the choice of research methodology and for advocacy for participants. Feminist theories and approaches widely used and accepted in HICs might not offer a useful framework for transforming the realities of women experiencing violence in LMICs. Generalising their validity without tailoring to LMIC-specific political and cultural contexts might hamper the very efforts to end violence [ 36 ]. Similarly, when applying methods and advocacy tools developed in HICs to different LMICs, global research groups should consider the distinct contextual factors and actively seek the input of those with local expertise and knowledge, to ensure the best possible recruitment, experience for research participants and data generated [ 37 ].

The ethical debates surrounding the inclusion of women who experience violence as research participants revolve around ensuring their protection while also avoiding their undue exclusion from participating [ 38 ]. It is acknowledged that women who experience violence are not a homogeneous group, and therefore, considerations must be made to ensure their diverse experiences and perspectives are represented in research [ 39 , 40 ].

Prior research has produced convincing evidence regarding the challenges in global research during the ethics approval stage [ 34 ]. Future research should identify and evaluate policies and interventions that aim to address the causes of these challenges.

Strengths and limitations

This study combined findings from qualitative interviews and a complementary documentary analysis on ethics in global research on VAW and health. The use of qualitative methodology matched the objective of illuminating and contextualising the subjective experiences of researchers and REC members regarding obtaining ethics approval for global research projects. We added credibility to our findings by integrating the results of the interview and document analyses, involving three researchers from the HIC and LMICs in data coding, discussing candidate and final analytical themes as a whole team, and providing supporting quotes. We contributed to the transferability of our findings to similar contexts and participant groups by drawing a geographically diverse sample from one HIC and four LMICs across Europe, South America, the Middle East, and South Asia, and by reporting the socio-demographic characteristics of the participants. Table 1 shows that our purposive sampling strategy produced a maximum variation participant group in terms of countries, roles, and years of relevant experience. However, transferability was limited by recruiting from the five partner countries within one global health research group. The sub-group from one South Asian LMIC did not include REC members.

Throughout the study, we critically examined and reflected on our own roles and on the influences of our values, assumptions, and experiences on the data we generated and the analysis we produced. Our dual role as qualitative researchers and members of the global health group with direct experience of obtaining research ethics approvals allowed us to provide valuable insights, interpretations, and perspectives that contributed to the depth and richness of the findings. However, we acknowledge that the dual roles of author and research participant among three of the authors could also be seen as a limitation, with the potential to introduce bias, as our perspectives may have influenced the interpretation of the findings. To mitigate this, we employed rigorous reflexivity practices, continuously interrogating our biases and the impact of our involvement on the outcomes. Simultaneously, this insider perspective constituted a strength, enabling us to access and contribute to more nuanced understandings and to ensure that researchers’ voices were accurately represented. In turn, this strengthened the credibility and relevance of the findings. It also helped to address potential prejudices and power imbalances through shared decision making in the interpretation and writing of the paper.

Conclusions

Global research on health system responses to VAW generated additional challenges during applications for ethics approval across HIC and LMIC partners. These challenges were driven by power and resource asymmetries between HICs and LMICs; differing rules between RECs and between academic RECs and health authorities; varying conceptualisations of VAW and participant vulnerability; and limited methodological and topic expertise in some LMIC RECs. The challenges had a negative impact on researchers’ relationships with RECs and funders, imposed additional emotional labour on researchers, and threatened the timely delivery of the research programme. The process of ethics approval for global research on health system responses to VAW requires greater flexibility to accommodate country-specific contexts, with equal power distribution between HICs and LMICs, and between researchers and RECs. While some of these objectives can be achieved by educating individual REC members, researchers, and funders, the power asymmetry and the differing rules and contextualisation should be addressed at the meso (institutional) and macro (country) levels.

Conducting global research on health system responses to VAW is essential for developing evidence-based interventions. Although a higher level of scrutiny at the ethics approval stage might be justified, it should not hinder research on this topic, since findings are needed to identify gaps in service provision and to inform the development of evidence-based interventions. By upholding high ethical standards in global research on health system responses to VAW, we ensure a comprehensive and evidence-based approach to addressing these issues, which in turn improves outcomes for women who have experienced violence.

Availability of data and materials

Due to the sensitivity of the data involved, these data are published as a controlled dataset at the University of Bristol Research Data Repository, data.bris, at https://doi.org/10.5523/bris.3qs252vyomger219n9f5fcv5u3 [ 41 ]. The metadata record published openly by the repository at this location clearly states how the data can be accessed by bona fide researchers. Requests for access will be considered by the University of Bristol Data Access Committee, which will assess the motives of potential data re-users before deciding to grant access to the data. No authentic request for access will be refused, and re-users will not be charged for any part of this process.


Abbreviations

HIC: High-income country

LMIC: Low- and middle-income country

REC: Research Ethics Committee

VAW: Violence against women

WHO: World Health Organisation

Butchart A, Mikton C, Dahlberg LL, Krug EG. Global status report on violence prevention 2014. Inj Prev. 2015;21(3):213. https://doi.org/10.1136/injuryprev-2015-041640 .

Sardinha L, Maheu-Giroux M, Stockl H, Meyer SR, Garcia-Moreno C. Global, regional, and national prevalence estimates of physical or sexual, or both, intimate partner violence against women in 2018. Lancet. 2022;399(10327):803–13. https://doi.org/10.1016/S0140-6736(21)02664-7 .


Garcia-Moreno C, Hegarty K, d’Oliveira AF, Koziol-McLain J, Colombini M, Feder G. The health-systems response to violence against women. Lancet. 2015;385(9977):1567–79. https://doi.org/10.1016/S0140-6736(14)61837-7 .


Colombini M, Dockerty C, Mayhew SH. Barriers and facilitators to integrating health service responses to intimate partner violence in low- and middle-income countries: a comparative health systems and service analysis. Stud Fam Plann. 2017;48(2):179–200. https://doi.org/10.1111/sifp.12021 .

Colombini M, Mayhew SH, Garcia-Moreno C, d’Oliveira AF, Feder G, Bacchus LJ. Improving health system readiness to address violence against women and girls: a conceptual framework. BMC Health Serv Res. 2022;22(1):1429. https://doi.org/10.1186/s12913-022-08826-1 .

OECD. Global outlook on financing for sustainable development 2023: no sustainability without equity. Paris: OECD Publishing; 2022.


Schraiber LB, D’Oliveira AF, Portella AP, Menicucci E. Gender-based violence in public health: challenges and achievements. Cien Saude Colet. 2009;14(4):1019–27. https://doi.org/10.1590/s1413-81232009000400009 .

Fontes LA. Ethics in violence against women research: the sensitive, the dangerous, and the overlooked. Ethics Behav. 2004;14(2):141–74. https://doi.org/10.1207/s15327019eb1402_4 .

Steinert JI, Nyarige DA, Jacobi M, Kuhnt J, Kaplan L. A systematic review on ethical challenges of ‘field’ research in low-income and middle-income countries: respect, justice and beneficence for research staff? BMJ glob. 2021;6(7):e005380. https://doi.org/10.1136/bmjgh-2021-005380 .


Weber S, Hardiman M, Kanja W, Thomas S, Robinson-Edwards N, Bradbury-Jones C. Towards ethical international research partnerships in gender-based violence research: insights from research partners in Kenya. Viol Against Women. 2022;28(11):2909–31. https://doi.org/10.1177/10778012211035798 .

Ellsberg M, Heise L, Peña R, Agurto S, Winkvist A. Researching domestic violence against women: methodological and ethical considerations. Stud Fam Plann. 2001;32(1):1–16. https://doi.org/10.1111/j.1728-4465.2001.00001.x .


Kelmendi K. Violence against women: methodological and ethical issues. Psychology. 2013;4:559–65. https://doi.org/10.4236/psych.2013.47080 .

Reid C, Calia C, Guerra C, Grant L, Anderson M, Chibwana K, et al. Ethics in global research: Creating a toolkit to support integrity and ethical action throughout the research journey. Res Ethics. 2021;17(3):359–74. https://doi.org/10.1177/1747016121997522 .

Morrison K, Tomsons S, Gomez A, Forde M. Network of ethical relationships model for global North-South population health research. Glob Public Health. 2018;13(7):819–42. https://doi.org/10.1080/17441692.2016.1276948 .

Hawcroft C, Rossi E, Tilouche N, d’Oliveira AF, Bacchus LJ. Engaging early career researchers in a global health research capacity-strengthening programme: a qualitative study. Health Res Policy Syst. 2023;21(1):19. https://doi.org/10.1186/s12961-022-00949-5 .

García-Moreno C, Jansen HA, Ellsberg M, Heise L, Watts C. WHO multi-country study on women’s health and domestic violence against women. Initial results on prevalence, health outcomes and women’s responses. Geneva, Switzerland: World Health Organization [WHO]; 2005.


Alderson P. Critical realism for health and illness research. A practical introduction. Bristol, UK: Policy Press; 2021.

Wigginton B, Lafrance MN. Learning critical feminist research: A brief introduction to feminist epistemologies and methodologies. Feminism & Psychology. 0(0):0959353519866058. https://doi.org/10.1177/0959353519866058 .

Khan M, Abimbola S, Aloudat T, Capobianco E, Hawkes S, Rahman-Shepherd A. Decolonising global health in 2021: a roadmap to move from rhetoric to reform. BMJ Glob Health. 2021;6(3):e005604. https://doi.org/10.1136/bmjgh-2021-005604 .

Bhakuni H, Abimbola S. Epistemic injustice in academic global health. Lancet Glob Health. 2021;9(10):e1465–70. https://doi.org/10.1016/S2214-109X(21)00301-6 .

Dalglish SL, Khalid H, McMahon SA. Document analysis in health policy research: the READ approach. Health Pol Plan. 2020;35(10):1424–31. https://doi.org/10.1093/heapol/czaa064 .

Gale NK, Heath G, Cameron E, Rashid S, Redwood S. Using the framework method for the analysis of qualitative data in multi-disciplinary health research. BMC Med Res Methodol. 2013;13:117. https://doi.org/10.1186/1471-2288-13-117 .

Malterud K, Siersma VD, Guassora AD. Sample size in qualitative interview studies: guided by information power. Qual Health Res. 2016;26(13):1753–60. https://doi.org/10.1177/1049732315617444 .

World Medical Association. World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA. 2013;310(20):2191–4. https://doi.org/10.1001/jama.2013.281053 .


CIOMS. International ethical guidelines for health-related research involving humans. fourth edition. Geneva: Council for International Organizations of Medical Sciences; 2016.

Nuffield Council on Bioethics. The ethics of research related to healthcare in developing countries. London: Nuffield Council on Bioethics; 2002.

Nuffield Council on Bioethics. The ethics of research related to healthcare in developing countries: a follow-up discussion paper. London: Nuffield Council on Bioethics; 2005.

Nuffield Council on Bioethics. Research in global health emergencies: ethical issues. London: Nuffield Council on Bioethics; 2020.

World Health Organisation. Putting women first: Ethical and safety recommendations for research on domestic violence against women. Geneva: WHO; 2001.

Tomsons S, Morrison K, Gomez A, Forde M. Ethical issues facing North- South research teams. Glob Popul Health Res. Final report. 2013. https://doi.org/10.13140/RG.2.1.4337.9920 .

Shanks K, Paulson J. Ethical research landscapes in fragile and conflict-affected contexts: understanding the challenges. Res Ethics. 2022;18(3):169–92. https://doi.org/10.1177/17470161221094134 .

Kumar M, Atwoli L, Burgess RA, Gaddour N, Huang KY, Kola L, et al. What should equity in global health research look like? Lancet. 2022;400(10347):145–7. https://doi.org/10.1016/S0140-6736(22)00888-1 .

Dodson J, UKCDS. Building partnerships of equals. The role of funders in equitable and effective international development collaborations: UK Collaborative on Development Sciences 2017.

Ng LC, Hanlon C, Yimer G, Henderson DC, Fekadu A. Ethics in global health research: the need for balance. Lancet Glob Health. 2015;3(9):e516–7. https://doi.org/10.1016/S2214-109X(15)00095-9 .

McGarry J, Ali P. Researching domestic violence and abuse in healthcare settings: Challenges and issues. J Res Nurs. 2016;21(5–6):465–76. https://doi.org/10.1177/1744987116650923 .

Reverter S. Epistemologies of violence against women. A proposal from the South. Cogent Soc Sci. 2022;8(1):2038356. https://doi.org/10.1080/23311886.2022.2038356 .

Adams V, Biehl J. The work of evidence in critical global health. Med Anthropol Theory. 2016;3(2):100.

Cook E, Markham S, Parker J, John A, Barnicot K, McManus S. Risk, responsibility, and choice in research ethics. Lancet Psychiatry. 2022;9(1):5–6. https://doi.org/10.1016/S2215-0366(21)00434-X .

Garcia-Moreno C, Zimmerman C, Morris-Gehring A, Heise L, Amin A, Abrahams N, et al. Addressing violence against women: a call to action. Lancet. 2015;385(9978):1685–95. https://doi.org/10.1016/S0140-6736(14)61830-4 .

Ellsberg M, Heise L. Researching violence against women: practical guidelines for researchers and activists. Geneva: World Health Organization; 2005.

Lewis N, Feder G. Ethical challenges in global health research on violence against women: a qualitative study of policy and professional perspectives [dataset]. 2023. https://doi.org/10.5523/bris.3qs252vyomger219n9f5fcv5u3 .



Acknowledgements

Research Fellow Sandi Dheensa, Bristol Medical School, UK, for conducting a qualitative interview. Professor Jonathan Ives, Bristol Medical School, UK, for advice on the study design, policy documents, and target journal.

Funding

This research was funded by the NIHR (17/63/125) using UK aid from the UK Government to support global health research. The views expressed in this publication are those of the authors and not necessarily those of the NIHR or the UK government.

Author information

Natalia V. Lewis and Beatriz Kalichman are joint first authors.

Authors and Affiliations

Centre for Academic Primary Care, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK

Natalia V. Lewis

Department of Preventive Medicine, Medical School, University of São Paulo, São Paulo, Brazil

Beatriz Kalichman, Yuri Nishijima Azeredo & Ana Flavia d’Oliveira

Department of Global Health and Development, London School of Hygiene and Tropical Medicine, London, UK

Loraine J. Bacchus



Contributions

NVL and LJB designed the study. NVL and AFDO supervised the study. NVL and BK collected data. NVL, BK, and AFDO analysed data. NVL, BK, YNA, LJB, and AFDO contributed to the interpretation of the study findings. NVL and BK wrote the first draft. All authors contributed to three revisions and read and approved the final manuscript.

Authors’ information

NVL, PhD in Health Psychology, Senior Research Fellow in Primary Care, is a mixed methods health services researcher with a medical clinical background and a specialism in health system responses to VAW. BK, MSc in Urban and Regional Planning exploring the importation of the theory of gentrification from HICs, is a PhD candidate in Collective Health investigating how power relations between partners shape global health research partnerships, with a special focus on the knowledge produced. YNA, MSc on violence perpetrated by professionals in health services, PhD in Collective Health studying the relations between the development of technology and clinical expertise. LJB, BSc MA PhD, Professor of Global Health, is a social scientist whose research focuses on the development and evaluation of complex interventions within health systems and services that address violence against women and against men in same-sex relationships. AFDO, PhD in Preventive Medicine, has been studying VAW for the last 25 years. She participated in the development and revisions of the WHO ethical guidelines on VAW.

Corresponding author

Correspondence to Natalia V. Lewis .

Ethics declarations

Ethics approval and consent to participate

This study was conducted according to the principles of the Declaration of Helsinki. To prevent a conflict of interest in interviews with our own Faculty REC members, the ethics application was reviewed by the Faculty of Social Sciences and Law Research Ethics Committee at the University of Bristol, UK (22 February 2021, ref 116510). Participants provided verbal informed consent before the interview started. Documents for the policy review were publicly available online. We anonymised interviewees and did not reference national and institutional documents, to protect the anonymity of the study participants.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Lewis, N.V., Kalichman, B., Azeredo, Y.N. et al. Ethical challenges in global research on health system responses to violence against women: a qualitative study of policy and professional perspectives. BMC Med Ethics 25 , 32 (2024). https://doi.org/10.1186/s12910-024-01034-y


Received : 14 July 2023

Accepted : 08 March 2024

Published : 19 March 2024

DOI : https://doi.org/10.1186/s12910-024-01034-y

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Keywords

  • Health research
  • International collaboration
  • Global health
  • Research ethics
  • Ethical issues
  • Qualitative study


Research articles using qualitative methods

  • Open access
  • Published: 15 March 2024

Young people's experiences of physical activity insecurity: a qualitative study highlighting intersectional disadvantage in the UK

  • Caroline Dodd-Reynolds 1 ,
  • Naomi Griffin 2 ,
  • Phillippa Kyle 3 ,
  • Steph Scott 2 ,
  • Hannah Fairbrother 4 ,
  • Eleanor Holding 5 ,
  • Mary Crowder 5 ,
  • Nicholas Woodrow 5 &
  • Carolyn Summerbell 1  

BMC Public Health volume  24 , Article number:  813 ( 2024 ) Cite this article

231 Accesses

12 Altmetric

Metrics details

Intersecting socioeconomic and demographic reasons for physical activity (PA) inequalities are not well understood for young people at risk of experiencing marginalisation and living with disadvantage. This study explored young people’s experiences of PA in their local area, and the associated impacts on opportunities for good physical and emotional health and wellbeing.

Seven local youth groups were purposively sampled from disadvantaged areas across urban, rural and coastal areas of England, including two that were specifically for LGBTQ + young people. Each group engaged in three interlinked focus groups which explored young people’s perceptions and lived experience of PA inequalities. Data were analysed using an inductive, reflexive thematic approach to allow for flexibility in coding.

Fifty-five young people aged 12–21 years of different sexualities, genders and ethnicities took part. Analysis yielded four themes: PA experiences across spaces; resigned to a lack of inclusivity and ‘belonging’; safety first; complexities in access and accessibility. Young people felt more comfortable being active in spaces that were simpler to navigate, particularly outdoor locations largely based in nature. In contrast, local gyms and sports clubs, and the school environment in general, were often spoken about in negative terms and as spaces where they felt insecure, unsafe or uncomfortable. It was common for these young people to feel excluded from PA, often linked to their gender and sexuality. Lived experiences or fears of being bullied and harassed in many activity spaces were a powerful message; in contrast, young people perceived their local youth club as a safe space. Intersecting barriers related to deprivation, gender and sexuality, accessibility, disability, Covid-19, affordability, ethnicity, and proximity of social networks. A need emerged for safe spaces in which young people can come together within the local community and choose to be active.


The overarching concept of ‘physical activity insecurity’ emerged as a significant concern for the young people in this study. We posit that PA insecurity in this context can be described as a limited or restricted ability to be active, reinforced by worries and lived experiences of feeling uncomfortable, insecure, or unsafe.

Peer Review reports

Three in four adolescents do not meet global physical activity (PA) guidelines [ 1 ] and the annual global cost of inactivity is estimated to be in excess of $67.5 billion [ 2 ]. Adolescent inactivity is unequally distributed between nations, as well as within societies [ 3 ]; in England, only 47% of 13–16-year-olds met national PA guidelines in 2022/23 [ 4 ]. Physical activity is linked to 13 of the 2030 UN sustainable development goals (SDGs), including SDG3 good health and well-being, SDG4 quality education, and SDG10 reduced inequalities [ 1 ]. Through their global action plan, the World Health Organisation (WHO) [ 1 ] presents a mission to ensure access to safe and enabling environments along with diverse opportunities for PA, targeting a 15% relative reduction in inactivity for adults and adolescents by 2030. Despite this global focus, clear gaps in knowledge around policy development and implementation have been highlighted [ 3 ], with a need for supportive policies, environments, and opportunities [ 5 ] for children and young people to be active.

In this paper, we define physical activity as “people moving, acting and performing within culturally specific spaces and contexts, and influenced by a unique array of interests, emotions, ideas, instructions and relationships” ([ 6 ], p. 5). Like health, PA is heavily influenced by intersecting socioeconomic and demographic factors [ 7 , 8 ], yet PA has the potential to improve health equity [ 9 ]. In England, epidemiological data show that children and young people are less likely to meet PA guidelines according to low affluence, gender (girls and 'other'), and ethnicity (Black, Asian, Mixed and Other non-white/non-white British) [ 4 ]. Evidence suggests, however, that individual determinants of young people’s PA are variable and diverse and include previous PA, PE/school sports, independent mobility and active transport, education level and other health behaviours such as alcohol consumption [ 10 , 11 ]. A comprehensive systematic review of over 18-year-olds [ 12 ] reported 117 correlates of PA across a range of demographic, biological, psychological, behavioural, social and environmental factors.

The direct relationship between socioeconomic status (SES) and children’s PA is particularly unclear, with umbrella systematic review evidence [ 13 ] suggesting mixed findings in terms of whether SES is a determinant of PA, though the same study demonstrated a positive association between SES and PA for adults. Individual factors such as parental income and parental occupation, along with payment of fees/equipment did, however, show some evidence of an association with children and adolescent PA [ 13 ]. Whilst the authors note the small number of studies available for children and adolescents, a lack of causal evidence and differing measurement tools which might contribute to the uncertainty around SES and PA, we suggest also that quantitative evidence may well fail to capture the complexity of children and young people’s PA in different spaces. Indeed, a qualitative review of limited extant literature concerning socioeconomic position and experiences of barriers to PA [ 14 ] highlighted issues such as social support, accessibility and environment, and experiences (particularly gendered) of health and other behaviours, but importantly noted that those in low socioeconomic position areas had a good understanding of PA benefits. Better understanding is required regarding the complexity of PA experiences for children and young people living with disadvantage.

Within the PA literature, systems approaches are evolving to map and understand networks and mechanisms within complex systems, ultimately aiming to reduce health inequalities [ 15 , 16 ], and a systems-based framework for action forms a key component of the WHO’s global strategy [ 1 ]. To support this work, better understanding is needed of the dynamic, contextual mechanisms which underpin various agents in local systems [ 17 ], for example through a better understanding of young people’s personal, or direct, ‘lived experiences’ of PA. Engaging in dialogue with young people at the heart of local communities offers a deeper and more nuanced understanding of place-based PA challenges and opportunities.

In general, individuals transitioning from childhood to adulthood are underserved in PA research, yet experiences earlier in life have a lasting effect on adult health and health behaviours [ 7 ]. The 2016 Lancet Commission on Adolescent Health and Wellbeing [ 18 ] recommended setting clear objectives for change, based on local needs, and highlighted a gap for young people at risk of being socially and economically marginalised, including LGBT + (lesbian, gay, bisexual, trans and others) groups. Adolescents and those on the fringes of adulthood (hereafter referred to as young people) therefore present a critical but wide-ranging group with whom we must seek to better understand PA inequalities, particularly in the context of widening place-based inequality and deprivation and the syndemic ‘shock’ of the Covid-19 pandemic [ 19 , 20 ]. Accordingly, we have applied the concept of intersectionality [ 21 , 22 ] to explore the complex and intersecting factors which influence access to, and experiences of, PA.

We have recently reported young people’s nuanced understandings of the malleable and dynamic relationships between socioeconomic circumstance and health [ 23 ] and in this paper, we focused on PA specifically. We explored young people’s experiences of PA in their local area, and the associated impacts on opportunities for good physical and emotional health and wellbeing. In doing so we worked with young people who were already at risk of experiencing social and health inequalities across England, UK.

This paper drew on data from a larger project [ 23 ] in which a series of three interlinked qualitative focus groups were undertaken with six groups of young people who attended local community youth groups between February and June 2021. For the present study, we recruited a further group (December 2021) to ensure diversity in terms of gender and sexual orientation. In total, 55 participants aged 12–21 years, from seven youth groups across three regions of England, took part. Each youth group took part in three interlinked focus groups exploring health and health inequalities (21 focus groups in total). Two regions were in the north of England (South Yorkshire (SY), n  = 2; North East (NE), n  = 3) and one was in the south of England (London (L), n  = 2). All regions fell within the most deprived quintile of the 2019 English indices of multiple deprivation (IMD), where quintile 1 is the most deprived. At participant level, IMD quintiles ranged from 1 to 3. The project commenced during the Covid-19 pandemic, when the UK experienced several lockdown periods. Due to social distancing restrictions, all focus groups were conducted online except for those with two youth groups, which were held in person (one due to digital exclusion and another recruited once restrictions had lifted sufficiently). Focus groups lasted approximately 1.5 h. Further details on methodological and ethical challenges and full procedures are described elsewhere [ 23 , 24 ]. Ethical approval was granted by the School of Health and Related Research (ScHARR) Ethics Committee at the University of Sheffield and the Department of Sport and Exercise Sciences Ethics Committee at Durham University.

We adopted a purposive sampling strategy, designed to encapsulate maximum variation in perspectives and diversity [ 25 ]. Our sample was guided by the breadth and focus of the research question(s); demands placed on participants; depth of data likely to be generated; pragmatic constraints; and the analytic goals and purpose of the overall project [ 25 , 26 ]. Our final sample included young people of different sexualities, gender and ethnicity across urban and rural and coastal areas (see Table  1 ).

Youth workers invited group members to participate and shared an information video and project overview before researchers attended youth group sessions to discuss the study, build rapport and provide more detailed information sheets. These sessions were all held online during lockdown, except for two in-person groups, which were visited by the researchers. Written consent was obtained for all participants and, for those under 16 years, opt-in consent from parents/guardians was also gained. Participants were asked to provide basic demographic information, including postcode, to calculate IMD.
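The paper does not describe how IMD was derived from postcodes. As a purely illustrative sketch (not the authors' procedure), one common route is to match each postcode to its published English IMD 2019 decile via the official lookup dataset and then collapse deciles to quintiles, where quintile 1 is the most deprived:

```python
# Illustrative only: the postcode-to-decile lookup is a separate published
# dataset; here we assume each participant's IMD decile is already known.
def imd_decile_to_quintile(decile: int) -> int:
    """Collapse an English IMD 2019 decile (1 = most deprived) to a quintile."""
    if not 1 <= decile <= 10:
        raise ValueError("IMD deciles run from 1 to 10")
    # Deciles 1-2 -> quintile 1, 3-4 -> 2, ..., 9-10 -> 5.
    return (decile + 1) // 2
```

Under this hypothetical mapping, the participant-level range of quintiles 1–3 reported above would correspond to deciles 1–6.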

Data generation

Topic guides were developed [ 23 ], giving careful consideration to activities and language used around health inequalities. These were piloted and revised with two other partner youth organisations through early public involvement and engagement work. Youth workers helped facilitate sessions, and at least four and two researchers were present for online and in-person sessions, respectively (NG, NW, MC, EH, HF, CDR, VE). The same groups of researchers worked across the 21 focus groups in different sites to ensure consistency in process. All focus groups began with introductions and a warm-up activity, followed by the main activity (in smaller breakout groups) and finally a close and cool-down activity. The three interlinked focus groups held with each youth group explored: (1) children and young people's understandings of health and wellbeing as a human right (via participatory concept mapping, see Jessiman [ 27 ] for an example), (2) children and young people's perceptions of the social determinants of health (sharing ideas about contemporary news articles relevant to health inequalities) and (3) children and young people's understandings of the ways young people can take action in their local area. Focus groups were recorded via encrypted Dictaphones and transcribed verbatim, with data anonymised at the point of transcription. Contextual field notes were taken by researchers.

Thematic analysis is a well-established approach to qualitative inquiry in health-related research that allows for the depth and richness of qualitative data to guide analysis [ 28 ]. We used an inductive, reflexive thematic approach to allow for flexibility in coding [ 26 ] and to make sure our analysis adequately captured the views of the young people themselves [ 29 ]. The approach was rigorously tested through the piloting of methods, regular analysis meetings, and sense-checking sessions (with participants) to validate themes [ 30 ]. For a full description of the original reflexive thematic analysis process [ 26 , 31 ] please see Fairbrother et al. [ 23 ]. In brief, an initial coding frame was developed, with key codes and overarching themes discussed (linked to young people’s perspectives on the relationship between socioeconomic circumstances and health) and agreed upon by the wider research team. Once these core themes were established, an additional in-depth phase of reflexive analysis was undertaken (NH, PK, CDR, CS) to specifically explore PA, which had arisen continually across the initial analysis but had not been developed as a theme. As before [ 23 ], we emphasised a creative and active approach to the analysis which followed an inherently ‘interpretative reflexive process’ ([ 26 ], p. 334). CDR, PK and NG were immersed in the data, continually reflecting upon, questioning and revisiting it during the analysis process. Regular analysis meetings took place to reflect and discuss, and a new coding framework was developed and agreed by CDR, PK and NG, from which themes were developed. The qualitative data management software NVivo-12 was used to support data management.

Our analysis yielded four central themes: (1) PA experiences across spaces; (2) resigned to a lack of inclusivity and ‘belonging’; (3) safety first; (4) complexities in access and accessibility. However, the themes naturally interrelate, and the overarching concept of ‘PA insecurity’ emerged as a significant concern for the young people who generously shared their personal experiences with us. In what follows, each interlinked focus group session is denoted S1, S2 or S3.

Physical activity experiences across spaces

The types of spaces in which young people felt able, or not able, to be active were crucial and formed the backdrop to their PA-related experiences and interactions with others. These are contextually linked here to later themes which provide further depth on how PA might or might not be enacted by young people within those spaces.

Across sites, there were differential responses in terms of ‘things to do’ in the local area. Inner city areas had fewer green and blue spaces but presented more organised opportunities in the locality. In rural areas young people had to travel to engage in social activities. Whilst attitudes towards PA were generally positive, young people in the NE and SY perceived a lack of things to do where they lived that did not cost money or require private or unreliable public transport. A salient sub-theme developed around local opportunities for activity, with one group highlighting the resulting ease with which sedentary activities displaced other activities:

Facilitator : ‘Do you prefer to play on consoles or do you prefer to go outside and run around and have exercise’? NE2, S2 : ‘If there’s nothing to do, then I will stay in the house, but if there is something to do, then I might as well just go outside’.

At first glance, this apathy perhaps represents a lack of self-efficacy, often described as an individual-level determinant of PA. However, being physically active was far from simplistic and the young people described many associated challenges including closure of local amenities such as bowling and trampoline parks, with investment instead made in a nearby seaside town. For example, they described complexities around access to the nearest swimming pool. This was free in summer but not in the immediate locality, and thus required adult facilitation to enable the young people to travel to and access the pool, resulting in a structural barrier preventing them from taking part in something which was important to them within their existing social networks:

NE2,S2: ‘ Well just going out with friends and my dad saw that – I don’t know where – but he said, “Do you want to go?” “Yeah.” So he’ll get on the bus and he’ll go around and he got us in the baths ’. NE2,S2: ‘ He goes around…and picks up children ’.

Spaces that were simpler to navigate included outdoor locations, largely based in nature, which for a number of the young people evoked a sense of freedom and well-being: ‘ there’s a big, massive field and a couple of times a week I take my dog there so he can meet other dogs. Take him for a big walk… is good for your health. It’s good for my dog .’ (NE2, S1.)

Blue spaces were perceived similarly by those living near the coast: ‘ I like going to the beach… I just like the sea. It’s calm and obviously there’s a long way to walk as well ’ (NE2, S1).

Some indoor PA spaces, particularly swimming pools, were also described as places which evoked calmness and wellbeing. The following young person reflected on this in relation to how they felt in water:

‘ And it’s funny because when I first thought about the swimming pool, I didn’t think about it in terms of the physical exercise being good for me but obviously that is good. It’s much more that when I’m just completely submerged in water I feel very calm and I think it’s a bit of a shock to the system which can be nice, to be cold, suddenly very cold, and then get warmed up afterwards. So it’s kind of the pool and then also having a nice cup of tea when I get home after. For my mind and body I’d say… ’. (S1, S1).

Other indoor spaces such as gyms and sports clubs were spoken of in terms of being more for purposeful PA (i.e. exercise or sport); however, the young people tended to speak less positively about their experiences, highlighting feelings of discomfort and self-consciousness. In doing so, gender-based concerns often intersected: ‘ I did trampolining competitively…I was just getting to a point where I wasn’t comfortable. Because I was still having to wear the girl’s uniform …when you look at the differences between the uniforms, it's really stark ’. (NE1, S2). Similarly, in the gym setting, young people highlighted a perceived lack of security: ‘ Because gyms are enclosed spaces, there’s like dodgy blokes who are all like pumped and I’d rather not be around them. It’s just not my idea of fun ’ (SY2, S3). The ‘gym’ was repeatedly referred to in one focus group as negatively impacting upon self-esteem: ‘ I just hate it because like if you’re 16, the gym I used to go to had like a lot of older body building people so I’d feel like they were just watching me and I’d feel really uncomfortable about it’ , (SY1, S1). Conversations, however, also spoke of a need for inclusivity in the gym environment, noting feelings of pressure (as a female) ‘…if there’s guys looking at them they might not want them looking because obviously they’ll be looking in places they don’t want them looking. But also…we shouldn’t have to have girl-only gyms, everyone should be like integrated. ’ In the same conversation, concerns of racist behaviour were linked directly to the local area by another young person: ‘ And also racism as well, you could experience a lot of racism in gyms if you’re living in predominantly a white town.’ (SY2, S3).

The final space that permeated discussions was an institutional one: the school or college environment. For the most part, narratives drew on personal experiences, often negative. For some, opportunities to be active in the institutional space had been removed altogether, something which was beyond their control: ‘ we didn’t do PE for about, a good three years because…we now needed to concentrate on our GCSEs… ’ (NE1,S3). Here, decisions made by adults in school created barriers, and the young people were aware of the gravity of lost opportunities to be active in the institutional space: ‘ …a lot of kids were missing out on that physical education and a way to exercise. That might have been students’ only way of exercise ’. This highlights how a lack of support from adults in positions of power can affect young people’s engagement in PA. Conversely, such support could be a positive intervention, illustrating how the unequal distribution of support from teachers can impact significantly on young people’s PA experience in the education setting. In this example, one young person clearly highlighted a link between PA and psychological wellbeing:

‘ I managed to convince the teachers to let me do double rugby… two outside of school, one in school, like one club in school, two clubs outside of school and then two PE’s… because it’s an aggressive sport, I can get out all my aggression…I’m good with my team and I’m friends with all the people in it’ , (NE3, S3).

This theme illustrates that the young people were well aware of physical spaces in which they might be active but highlights the importance that young people attach to feelings of safety, security, and freedom, and how these can intersect with other characteristics. Physical activity spaces may be more conducive to positive mental health if they are larger and open, without interference from other people and where young people might feel less threatened by a perceived, or actual, need to conform.

Resigned to a lack of inclusivity and ‘belonging’

For many of the young people, there was a commentary around feeling excluded from PA, particularly sport, linked to gender and sexuality. This appeared to evoke a sense of resignation, even at such a young age, of having had to give up trying to access certain types of PA due to feeling a lack of inclusivity.

‘ If you feel that you can’t participate in a sport, then your physical health is going to decline, just from the sake of a trans person just trying to negotiate – if you go to a game. Which dressing room are you going to use? You avoid that completely to keep yourself safe or you have then out yourself to people. ’ (NE1, S2).

Such experiences extended to PE lessons, sometimes with a sense of finality and relief, with one trans young person seemingly ‘owning’ that exclusion:

‘ Participant 2: I’m not doing P.E…I also have it on my notes saying that I can’t do PE after going through physiotherapy. Facilitator 2: And is that good, do you think, because you don’t really want to do it? Participant 2: Yeah. ’ (NE3, S3).

For some trans and non-binary young people who were engaging with PA at school, there appeared to be some support and understanding from staff, but this was not enough on its own: ‘ teachers keep coming and talking to me about joining in with the boys and I’ve finally got my mum to agree to let me go. But the teachers keep saying they’ll talk to her, but then my mum keeps saying she’ll ring the school but she never does ’ (NE3, S3). Another young person, who had not yet fully ‘come out’ as non-binary at school, further illustrated negative experiences of gendered PE lessons: ‘ So I go into the girls’ PE and I’m sick of it because I go in and it’s just like, “hi girls!!” and I’m just like just kill me now ’. (NE3, S3).

For others, non-gendered opportunities in PE were desirable, avoiding traditional school curriculum activities that implied boys and girls taking part separately. One participant admitted to having hidden in the toilets to avoid PE because: ‘ I despise football but it was the only thing we did for about six months ’ and suggested a need for more ‘ variety ’ and ‘ more inclusive sports .’ (NE1, S3).

Safety first

Young people’s access to and engagement with PA in certain spaces was foregrounded by a need to feel safe in those spaces, both in terms of physical and emotional safety. For many this was linked to fear of crime and substance abuse in the local park: ‘ You could literally go…and it’s probably got either a bag that’s had something in it, alcohol bottles or needles. It’s quite terrifying ’ (SY1, S2). Others highlighted particular situations which required avoidance: ‘… the dealing’s worse…that’s where all the fights happen…that basically…makes it more dangerous for people to be outside’ (L2, S2). When referring to the end of lockdown, and people drinking supermarket-bought alcohol ‘… outside in the open, like, and in huge groups…’ , one participant noted a fear for safety outdoors which intersected with worries of racist behaviour: ‘… so, like, that’s one of the things that makes me a little bit, like, scared, like, I wonder, like, would they, like, say something, like, racist to me?’ (SY1, S2).

Active travel was also explained as problematic, but for some a necessity which required precautions for ‘girls’, who it was suggested (by a group of boys) should ‘ Put a key in their fingers ’ (NE2, S2), and careful planning for one participant: ‘ I had to find a whole new route home so I didn’t get harassed and beat up ’. (NE3, S3.5). For one group, avoidance of crimes in progress was critical: ‘ people constantly starting fires ’ (NE2, S1). Fear of harassment outdoors seemed entrenched for many, sometimes linked to gender: ‘ Catcalling, and being followed… harder to feel safe… even in…broad daylight ’ (L2, S3), and at other times to a generational influence on feelings of fear in the local area: ‘… after a certain time, 4 or 5 o’clock, my nan used to say “time to go back now” because she knew that that’s when all the dodgy people would come out really, even if it wasn’t necessarily dark earlier. ’ (SY1, S1).

For some, fears were more nuanced and centred around avoidance of bullying and transphobia: ‘ I think, particularly for trans and LGBTQ people, it’s difficult to feel secure in a sport … I don’t feel safe going to the club because if they find out I’m trans, they’ll just pick me out ’ (NE1, S2). This extended from access to sports clubs, to open spaces where young people in one LGBTQ + youth group were in agreement about fears of harassment based on their gender or sexuality:

Facilitator 2: When you’re walking around and you’re out and about, how do you feel? Participant 5: I feel like I’m in danger and scared….who feels unsafe out and about? Participant 2: I do. Participant 1: I do too. (NE3, S3)

Together, these points exemplify a need for safe spaces in which young people at risk of marginalisation and living with deprivation can come together, connect within the local community and choose to be active: ‘ Like, because you can be at a park and then you can get harassed easily… You could barely go [to a park] without that [harassment] happening…Makes me not want to leave my house .’ (NE1, S1). The youth groups themselves were seen as places of familiarity ‘ I’ve been here for ages ’, where young people can ‘ socialise, have fun, play a range of games, [make] new friends and bonding[sic] ’(NE2, S1) in a safe environment where social connections can be made. As such, youth groups might be pivotal in providing the kind of leadership and support required for PA access and engagement, perhaps even just in terms of open space provision: ‘ Even though we have the consoles, we don’t really use them that often. We’re mostly just outside… ’ (NE2, S1), as well as facilitating the social networks needed for young people to even consider PA.

Complexities in access & accessibility

Whilst intersecting elements of PA access feature throughout other themes, it is important to draw specific attention to intersecting barriers relating to accessibility, including disability, Covid-19, affordability, and proximity of social networks with whom to engage in PA. In illustrating this, we draw on some of the issues highlighted in earlier themes.

Provision of physically accessible green spaces with appropriate facilities and equipment was an intersecting issue linked to crime and affordability, in the experience of one young person living in an area of poverty:

‘ …where I live and work, like football is huge. We've got a few football pitches, you’ve got to pay to play certain places especially for young people, so it's very, very difficult and what they tend to do, they’ll climb the cage gates and that leads to trouble and…stuff. If we had access to open free football pitches, that would be quite beneficial…I don’t think there’s…enough active stuff. Over the last couple of years, I've noticed they put in like monkey bars and other gym stuff [in parks]. ’ (L1, S1).

In some discussions, disability was an intersecting issue, adding to the complexity of PA access and highlighting affordability and a desire to be active with peers:

Participant 4: It’s not easy [to get about town with transport] if you have a hearing impairment and I know that from experience Participant 1: But it can be easy though, depends on who your friends are, so me and my friends kind of live close [ind] and with covid, we have friends that kind of live further an it’s a little bit harder [ind] Facilitator: so why it is harder, what stops that Participant 1: It’s probably either their parents not letting them go out further to come meet us [ind] and they may not be able to afford to…go out (NE2, S1)

This theme illustrates that the complexity of intersecting barriers to PA appears particularly pronounced and nuanced for some of these young people. As a result, solutions are likely to be similarly complex.

Considering the naturally intersecting themes in this study, we posit that the overarching concept of ‘physical activity insecurity’ emerged as a significant concern for the young people who generously shared their personal experiences (see Fig.  1 ). Physical activity insecurity is not an established term within the literature. To date and to the best of our knowledge, just one paper has linked it to families’ low readiness to provide opportunities for PA, where food insecurity was already being experienced: an “inability to provide sufficient health-promoting MVPA for children” ([ 32 ], p. 41). We note the distinction with food insecurity, which is well-recognised and formally defined as “limited or uncertain availability of nutritionally adequate and safe foods or limited or uncertain ability to acquire acceptable foods in socially acceptable ways.” ([ 33 ], p. 193). Here, we propose a new conceptualisation for PA insecurity, beyond simply providing a space for PA to be ‘secure’ and in recognition of the complexity of PA as a behaviour which is navigated cerebrally, socially, and politically within a situated space [ 6 ]. Young people in our study were very much aware of the spaces and opportunities for PA and associated potential benefits but were challenged by how the wider social and physical environment responded to them and reinforced feelings of inaccessibility. Here we draw on Friere’s concept of critical consciousness [ 34 ], which refers to an individual’s awareness of oppressive systemic forces in society, a sense of efficacy to work against oppression, to illustrate that instead of internalising the inaccessibility of certain spaces, young people actively highlighted the ways in which spaces were not set up with their access needs in mind. We thus define PA insecurity as a limited or restricted ability to be active, reinforced by worries and experiences of feeling uncomfortable, emotionally or physically unsafe. 
We suggest this can result from oppressive practices, lack of inclusion and disadvantage. Our findings suggest that PA insecurity can be experienced by any young person at risk of experiencing marginalisation and living with disadvantage, particularly where intersectional barriers overlap. However, it seems particularly nuanced for transgender and non-binary young people, for example in dealing with harassment and/or exclusion due to gender discrimination. We suggest that the young people in our study may never be able to contemplate PA until they feel safer, supported and included by society. We explore this further, later in the discussion, in terms of existing theory related to feelings of oppression and discrimination in disablism [ 35 , 36 ].

Fig. 1 Physical activity insecurity experienced by young people

Challenges linked to gender and sexuality within a sporting context have been widely documented in the sociology literature; for example, Anderson’s [ 37 ] inclusive masculinity theory suggests a trend towards reduced sexism and “homohysteria” in recent years. However, Pope [ 38 ] argues that though men’s attitudes to women in sport may be slowly changing for the better, overtly misogynistic masculinities are still prominent. Whilst our work was not grounded in theories of gender and sexuality, our sample comprised one trans masculine, one gender-fluid, four non-binary, nine trans male, 19 female and 21 male participants (Table  1 ), and our data certainly highlight that non-cisgender individuals felt unable to be their true authentic selves around PA. Little else is understood about the lived experiences of LGBTQ + youths in the PA domain, and what exists tends to consider school PE/sport provision [ 39 ]. Research does, however, support the notion that sexual and gender minority youths, particularly transgender young people, avoid PA settings due to feeling unsafe and uncomfortable [ 40 , 41 ]. Herrick and Duncan [ 42 ] similarly highlight a need for safe, inclusive PA spaces for LGBTQ + adults, as well as a need for an intersectional approach to explore PA complexity and avoidance of elitist and inaccessible terms such as ‘athlete’.

Intersectionality highlights the multiple intersecting identities of individuals and groups and how they interact and can compound each other in relation to oppression and inequality [ 21 , 22 ]. In our findings, intersecting socioeconomic and demographic challenges raised by participants included deprivation (as per our sampling strategy), ableism, crime and safety, affordability, and racism, as well as inequalities related to gender and sexuality. The young people in our study were cognisant of the ways in which different vulnerabilities can interact and compound each other, for example, exclusions related to homophobia, transphobia and ableism being further compounded by income inequality. They also discussed the links between accessibility and place, where some young people have greater opportunities to be involved in PA due to where they live, with closer proximity meaning greater access and affordability. Participants also reflected on how experiences of racism, sexism and/or homophobia in PA spaces increased the likelihood of disengagement. The young people who accessed LGBTQ + specific youth groups reflected on opportunities afforded to them to play or be active with others like them in a safe space, but highlighted that those spaces are not accessible to all, due to limited capacity and often a need to travel (affordability). Further insight into wider health inequalities as experienced by the LGBTQ + groups can be found in our linked paper by Griffin et al. [ 43 ]. In the present paper, we highlight a need to understand the complex ways in which intersectional disadvantages can compound each other and, in doing so, exacerbate PA insecurity.

We suggest that our findings of largely internalised feelings of insecurity, discomfort and a lack of safety represent facets of oppression and undermined psycho-emotional wellbeing. As such, there appear to be parallels with the concept of ‘psycho-emotional disablism and internalised oppression’ [ 35 , 36 , 44 , 45 ], where “internalised oppression…can undermine someone’s psycho-emotional well-being and sense of self” ([ 44 ] p. 24). Reeve [ 44 ] further notes that the emphasis on removing psycho-emotional barriers should not lie with the individual, but rather with society. We posit that this has important implications for our findings. Reeve [ 45 , 46 ] describes how indirect psycho-social disablism can reflect experiences of structural barriers; for example, the “experience of being faced with an inaccessible building can evoke an emotional response such as anger or hurt at being excluded” ([ 46 ], p. 106). In our study, young people described such barriers (e.g. changing rooms and uniforms) which, drawing on Reeve [ 47 ], might alone be characterised as solely socio-structural barriers to PA, had we not had insight into how these experiences made the young people feel marginalised or resigned to inactivity. We suggest that the young people in our study similarly evoked elements of internalised oppression and discrimination in relation to PA, particularly in terms of feeling resigned to a lack of inclusivity and belonging. Importantly, we did not ask the young people in our study about their disability status, and therefore do not apply this theory through a disability lens per se. Rather, we consider here how psycho-emotional disablism might be applied through an intersectional lens, given the strong similarities in the challenges experienced and internalised by our young people.
Given this, we suggest that simply adapting or removing structural barriers is insufficient to enable safe PA access for these young people, particularly those identifying as LGBTQ + . Researchers, practice partners and policy-makers need to work with young people to better understand their experiences, and to facilitate trustworthy relationships with PA within society.

Our findings also suggest that compassion, understanding and allyship from a trusted adult may be critical for young people to feel safe and secure, and thus give their trust and permission to engage in PA. Support from adults in positions of power had a strong influence on young people’s (lack of) engagement in PA, for example teachers in the institutional space. This point is supported by work which explored the relationship between a trusted adult and adolescent health and education outcomes [ 48 ], where young people outlined the need for mutual respect, patience and willingness of an adult to go the ‘extra mile’ in enabling them to engage in positive health behaviours. One US-based group has gone as far as to develop bespoke physical education teacher support around inclusive athletics for LGBTQ youth [ 49 ], though how this might work in practice across multiple PA settings has yet to be explored. We suggest that young people, particularly those experiencing intersectional barriers to PA, should be included in decisions relating to PA policy and the design of PA spaces themselves. This may, in the longer term, reduce reliance on trusted adults.

Understanding what a secure PA space might actually look like for young people with shared challenges and/or protected characteristics is a clearly needed next step. Yet there exists a dearth of contextual evidence around how young people at risk of experiencing marginalisation and living with disadvantage experience PA in their local environment. More broadly, we acknowledge a need for whole-system action to improve young people’s PA experiences in the spaces that they have access to. This extends beyond provision of what might be perceived as physically safe spaces (e.g. safe playing, walking or cycling infrastructure) to inclusive language and action linked to changing facilities, clothing, and creating opportunities for PA within existing trusted networks such as youth groups. Though, as we have noted, responsibility for inclusivity should not lie with the individual, hearing the voices of young people in terms of who is needed, where they are needed and how spaces could be made more inclusive is critical in this respect [ 50 , 51 ].

In the UK, the PA landscape is driven by ‘top-down’ national policy agendas [ 52 ] and responsibility for young people’s PA provision is devolved across numerous sectors at local level. Given gaps in knowledge of PA policy development and implementation for young people [ 3 ] and the need for supportive policies, environments and opportunities to strengthen those national policy efforts [ 5 ], we suggest that further work might look to local groups and networks to co-produce, with young people at risk of marginalisation and living with disadvantage [ 53 ], guidance on what secure PA spaces might look like and who is required to facilitate them. Such work must carefully consider the views of young people, trusted adults (e.g. youth workers) and others involved in provision of PA such as service providers, teachers and local authorities.


Strengths and limitations

Fieldwork took place during periods of Covid-related lockdown, and some comments from participants may reflect challenges which were exacerbated at this point in time. Though there was ethnic diversity across the overall sample, this was largely limited to the southern sites. We also acknowledge potential limitations of recruitment through existing youth organisations, which may exclude the voices of young people who are unable to engage with this provision. Nevertheless, as noted by Fairbrother et al. [ 23 ], working with youth groups enabled us to have the support of Youth Workers in refining topic guides and facilitating participant engagement, as well as providing an invaluable source of trusted support for participants [ 24 ].

Young people were recruited through youth groups and trusted youth leaders were very much part of the process. In each sampling site, the same group of young people engaged in three focus groups, and the building of rapport through this process provided open and honest reflections. We note the rigour of analysis as a strength in this work, particularly the sense-checking of themes with the young people.

Future research

Future research should build on these findings and work with young people at risk of experiencing marginalisation and living with disadvantage to explore what safe PA spaces and associated PA policies might consist of. Further diversity in sampling is also important. Finally, consideration should be paid to whether PA insecurity can be measured, for example via an assessment tool.

Conclusions

We argue that the voices of young people at risk of marginalisation and living with deprivation, including LGBTQ + youths, must be heard in the context of their own embodied PA experiences, in order to mediate PA inequalities. Young people articulated a clear and in-depth understanding of the spaces in which they do (or do not) experience PA. They provided a powerful narrative which suggests PA insecurity as central to their lived experiences of PA, often highlighting intersecting barriers to PA which resulted in feelings of internalised oppression and undermined psycho-emotional well-being. We highlight a need for accessible and affordable safe spaces within the local community, where young people can come together and have the ability to be active. Such safe spaces will likely require the facilitation and support of trusted adults to help manage the complexity of challenges associated with PA for these young people.

Availability of data and materials

The datasets generated and analysed during the current study are not publicly available due to privacy reasons but are available from the corresponding author on reasonable request.

World Health Organisation. Global action plan on physical activity 2018–2030: more active people for a healthier world. Geneva: World Health Organization; 2018.

Ding D, Lawson KD, Kolbe-Alexander TL, Finkelstein EA, Katzmarzyk PT, Van Mechelen W, Pratt M. The economic burden of physical inactivity: a global analysis of major non-communicable diseases. Lancet. 2016;388:1311–24.

van Sluijs EMF, Ekelund U, Crochemore-Silva I, Guthold R, Ha A, Lubans D, Oyeyemi AL, Ding D, Katzmarzyk PT. Physical activity behaviours in adolescence: current evidence and opportunities for intervention. Lancet. 2021;398:429–42.

Sport England. Active Lives Children and Young People’s Survey. Academic Year 2022–2023. 2023. https://sportengland-production-files.s3.eu-west-2.amazonaws.com/s3fs-public/202312/Active%20Lives%20Children%20and%20Young%20People%20Survey%20-%20academic%20year%202022-23%20report.pdf?VersionId=3N7GGWZMKy88UPsGfnJVUZkaTklLwB_L . Accessed 22 Feb 2024.

Chalkley A, Milton K. A critical review of national physical activity policies relating to children and young people in England. J Sport Health Sci. 2021;10(3):255–62.

Piggin J. What is physical activity? A holistic definition for teachers, researchers and policy makers. Front Sports Act Living. 2020;2:72. https://doi.org/10.3389/fspor.2020.00072 .

Pickett K, Taylor-Robinson D, Erlam, J. The Child of the North: Building a fairer future after COVID-19. The Northern Health Science Alliance and N8 Research Partnership. 2021.  https://www.thenhsa.co.uk/app/uploads/2022/01/Child-of-the-North-Report-FINAL-1.pdf .

Sport England. Active Lives Adult Survey November 2020–21 Report. 2022. https://sportengland-production-files.s3.eu-west-2.amazonaws.com/s3fs-public/2021-10/Active%20Lives%20Adult%20Survey%20May%202020-21%20Report.pdf . Accessed 20 April 2023.

International Society for Physical Activity and Health (ISPAH). ISPAH’s Eight Investments That Work for Physical Activity. 2020. www.ISPAH.org/Resources . Accessed 26 May 2023.

Condello G, Puggina A, Aleksovska A, et al. Behavioral determinants of physical activity across the life course: a “DEterminants of DIet and Physical ACtivity” (DEDIPAC) umbrella systematic literature review. Int J Behav Nutr Phys Act. 2017;14:58. https://doi.org/10.1186/s12966-017-0510-2 .

García-Fernández J, González-López JR, Vilches-Arenas A, et al. Determinants of physical activity performed by young adults. Int J Environ Res Public Health. 2019;16(21):4061. https://doi.org/10.3390/ijerph16214061 .

Choi J, Lee M, Lee JK, et al. Correlates associated with participation in physical activity among adults: a systematic review of reviews and update. BMC Public Health. 2017;17:356. https://doi.org/10.1186/s12889-017-4255-2 .

O’Donoghue G, Kennedy A, Puggina A, et al. Socioeconomic determinants of physical activity across the life course: a “DEterminants of DIet and Physical ACtivity” (DEDIPAC) umbrella literature review. Plos One. 2018;13(1):e0190737. https://doi.org/10.1371/journal.pone.0190737 .

Alliott O, Ryan M, Fairbrother H, et al. Do adolescents’ experiences of the barriers to and facilitators of physical activity differ by socioeconomic position? A systematic review of qualitative evidence. Obes Rev. 2022;23(3):e13374. https://doi.org/10.1111/obr.13374 .

Nobles J, Fox C, InmanWard A, et al. Navigating the river(s) of systems change: a multi-methods, qualitative evaluation exploring the implementation of a systems approach to physical activity in Gloucestershire. BMJ Open. 2022;12:e063638. https://doi.org/10.1136/bmjopen-2022-063638 .

Cavill N, Richardson D, Faghy M, Bussell C, Rutter H. Using system mapping to help plan and implement city-wide action to promote physical activity. J Public Health Res. 2020;9(3):1759. https://doi.org/10.4081/jphr.2020.1759 .

Rigby BP, Dodd-Reynolds CJ, Oliver EJ. The understanding, application and influence of complexity in national physical activity policy-making. Health Res Pol Syst. 2022;20(1):59. https://doi.org/10.1186/s12961-022-00864-9 .

Patton GG, Sawyer SM, Santelli JS, et al. Our future: a Lancet commission on adolescent health and wellbeing. The Lancet. 2016;387:2423–78.

Bambra C, Riordan R, Ford J, et al. The COVID-19 pandemic and health inequalities. J Epidemiol Community Health. 2020;74:964–8.

Scott S, McGowan VJ, Visram S. I’m gonna tell you about how Mrs Rona has affected me. Exploring young people’s experiences of the COVID-19 pandemic in North East England: a qualitative diary-based study. Int J Environ Res Public Health. 2021;18(7):3837. https://doi.org/10.3390/ijerph18073837 .

Crenshaw K. Demarginalizing the intersection of race and sex: a black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. University of Chicago Legal Forum. 1989;1:139–67.

Hill Collins P, Bilge S. Intersectionality. Malden: Polity Press; 2016. ISBN 978-0-7456-8448-2.

Fairbrother H, Woodrow N, Crowder M, et al. ‘It all kind of links really’: young people’s perspectives on the relationship between socioeconomic circumstances and health. Int J Environ Res Public Health. 2022;19(6):3679. https://doi.org/10.3390/ijerph19063679 .

Woodrow N, Fairbrother H, Crowder M, et al. Exploring inequalities in health with young people through online focus groups: navigating the methodological and ethical challenges. Qual Res J. 2022;22(2):197–208. https://doi.org/10.1108/QRJ-06-2021-0064 .

Mason J. Qualitative Researching. 3rd ed. Los Angeles: SAGE; 2018.

Braun V, Clarke V. One size fits all? What counts as quality practice in (reflexive) thematic analysis? Qual Res Psychol. 2021;18(3):328–52. https://doi.org/10.1080/14780887.2020.1769238 .

Jessiman P, Powell K, Williams P, et al. A systems map of the determinants of child health inequalities in England at the local level. Plos One. 2021;16:e0245577.

Braun V, Clarke V. What can “thematic analysis” offer health and wellbeing researchers? Int J Qual Stud Health Wellbeing. 2014;9(1):26152. https://doi.org/10.3402/qhw.v9.26152 .

Fielden A, Sillence E, Little L. Children’s understandings’ of obesity, a thematic analysis. Int J Qual Stud Health Well Being. 2011;6(3):7170.

Hadi MA, José CS. Ensuring rigour and trustworthiness of qualitative research in clinical pharmacy. Int J Clin Pharm. 2016;38(3):641–6.

Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol. 2006;3(2):77–101. https://doi.org/10.1191/1478088706qp063oa .

Gunter KB, Jackson J, Tomayko EJ, et al. Food insecurity and physical activity insecurity among rural Oregon families. Prev Med Rep. 2017;8:38–41.

Cook JT, Frank DA. Food security, poverty, and human development in the United States. Ann NY Acad Sci. 2008;1136:193–209. https://doi.org/10.1196/annals.1425.001 .

Freire P. Education for Critical Consciousness. New York: Seabury Press; 1974.

Thomas C. Female Forms: Experiencing and Understanding Disability. Buckingham: Open University Press; 1999.

Thomas C. Sociologies of Disability and Illness: Contested Ideas in Disability Studies and Medical Sociology. Basingstoke: Palgrave Macmillan; 2007.

Anderson E. Inclusive masculinity: the changing nature of masculinities. New York: Routledge; 2019. https://doi.org/10.4324/9780203871485

Pope S, Williams J, Cleland J. Men’s football fandom and the performance of progressive and misogynistic masculinities in a ‘new age’ of UK women’s sport. Sociology. 2022;56(4):730–48.

Greenspan SB, Griffith C, Murtagh EF. LGBTQ Youths’ school athletic experiences: A 40-year content analysis in nine flagship journals. J LGBT Issues Couns. 2017;11(3):190–200. https://doi.org/10.1080/15538605.2017.1346492 .

Greenspan SB, Griffith C, Watson RJ. LGBTQ+ Youth’s experiences and engagement in physical activity: a comprehensive content analysis. Adolescent Res Rev. 2019;4:169–85. https://doi.org/10.1007/s40894-019-00110-4 .

Greenspan SB, Griffith C, Hayes CR, et al. LGBTQ + and ally youths’ school athletics perspectives: a mixed-method analysis. J LGBT Youth. 2019;16(4):403–34. https://doi.org/10.1080/19361653.2019.1595988 .

Herrick SSC, Duncan LR. A qualitative exploration of LGBTQ+ and intersecting identities within physical activity contexts. J Sport Exerc Psychol. 2018;40(6):325–35.

Griffin N, Crowder M, Kyle P, et al. ‘Bigotry is all around us, and we have to deal with that’: Exploring LGBTQ+ young people’s experiences and understandings of health inequalities in North East England. SSM - Qual Res Health. 2023;3:100263. https://doi.org/10.1016/j.ssmqr.2023.100263 .

Reeve D. Psycho-emotional disablism in the lives of people experiencing mental distress. In: Anderson J, Sapey B, Spandler H, editors. Distress or disability? Proceedings of a symposium held at Lancaster Disability, 15–16 November 2011. Lancaster: Centre for Disability Research, Lancaster University; 2012. p. 24–9.

Reeve D. Psycho-emotional disablism and internalised oppression. In: Swain J, French S, Barnes C, et al., editors. Disabling Barriers – Enabling Environments. 3rd ed. London: Sage; 2014. p. 92–8.

Reeve D. Psycho-emotional disablism: The missing link? In: Watson N, Roulstone A, Thomas C, editors. Routledge Handbook of Disability Studies. London: Routledge; 2012. p. 78–92.

Reeve D. Negotiating psycho-emotional dimensions of disability and their influence on identity constructions. Disabil Soc. 2002;17(5):493–508.

Whitehead R, Pringle J, Scott E, et al. The relationship between a trusted adult and adolescent health and education outcomes. NHS Health Scotland: Edinburgh; 2019.

Greenspan SB, Whitcomb S, Griffith C. Promoting affirming school athletics for LGBTQ youth through professional development. J Educ Psychol Consult. 2018;29(1):68–88. https://doi.org/10.1080/10474412.2018.1482217 .

Lister NB, Baur LA, Felix JF, Hill AJ, Marcus C, Reinehr T, et al. Child and adolescent obesity. Nat Rev Dis Primers. 2023;9(1):24. https://doi.org/10.1038/s41572-023-00435-4 .

Pickett K, Taylor-Robinson D, et al. The Child of the North: Building a fairer future after COVID-19, the Northern Health Science Alliance and N8 Research Partnership. 2021. https://www.thenhsa.co.uk/app/uploads/2022/01/Child-of-the-North-Report-FINAL-1.pdf

Oliver EJ, Hanson CL, Lindsey IA. Exercise on referral: evidence and complexity at the nexus of public health and sport policy. Int J Sport Policy Pol. 2016;8(4):731–6. https://doi.org/10.1080/19406940.2016.1182048 .

Smith B, Williams O, Bone L, & the Moving Social Work Coproduction Collective. Co-production: a resource to guide co-producing research in the sport, exercise, and health sciences. Qual Res Sport, Exerc Health. 2022;15(2):159–87. https://doi.org/10.1080/2159676X.2022.2052946 .


The authors would like to thank the members of our stakeholder steering group for their support and input throughout the project. We thank members of the youth organisations who piloted and provided feedback on our data generation tools and methods. We also thank Emily Tupper and Vanessa Er who provided support for some of the focus groups and Matt Egan for his involvement in the wider project. Finally, we thank the young people and youth organisations that took part in the research for their contributions, insights and enthusiasm.

This project was funded by the National Institute for Health and Care Research (NIHR) School for Public Health Research (SPHR) (grant reference number PD-SPH-2015). The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.

Author information

Authors and Affiliations

Department of Sport and Exercise Sciences, Fuse, Durham University, Durham, DH1 3LA, UK

Caroline Dodd-Reynolds & Carolyn Summerbell

Population Health Sciences Institute, Fuse, Newcastle University, Newcastle, NE1 4LP, UK

Naomi Griffin & Steph Scott

Newcastle University Business School, Fuse, Newcastle University, Newcastle, NE1 4SE, UK

Phillippa Kyle

Health Sciences School, University of Sheffield, Sheffield, S10 2LA, UK

Hannah Fairbrother

ScHARR, University of Sheffield, Sheffield, S1 4DA, UK

Eleanor Holding, Mary Crowder & Nicholas Woodrow


CDR was involved in conception and design of the study, data collection and analysis, drafting of manuscript; NG was involved in data collection and analysis; PK was involved in data analysis; SS was involved in data analysis; HF was involved in conception and study design, data analysis; EH was involved in data collection; MC was involved in data collection and analysis; NW was involved in data collection and analysis. CS was involved in conception and design of the study, data analysis, drafting of manuscript. All authors edited drafts and agreed the final submitted manuscript.

Corresponding author

Correspondence to Caroline Dodd-Reynolds .

Ethics declarations

Ethics approval and consent to participate

Ethical approval was granted by the School of Health and Related Research (ScHARR) Ethics Committee at the University of Sheffield, and the Department of Sport and Exercise Sciences Ethics Committee at Durham University. All methods were undertaken in accordance with the relevant guidelines and regulations of these institutions. All participants involved in the study provided informed consent, and those under 16 also provided parental/guardian consent. All participants were made aware that data collection would remain anonymous and that they would not be identified. No direct quotes used in the study are attributed or traceable to any named individual.

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Dodd-Reynolds, C., Griffin, N., Kyle, P. et al. Young people's experiences of physical activity insecurity: a qualitative study highlighting intersectional disadvantage in the UK. BMC Public Health 24 , 813 (2024). https://doi.org/10.1186/s12889-024-18078-9

Download citation

Received : 31 May 2023

Accepted : 12 February 2024

Published : 15 March 2024

DOI : https://doi.org/10.1186/s12889-024-18078-9


Keywords
  • Physical activity insecurity
  • Adolescents
  • Young people
  • Disadvantage
  • Deprivation


Research articles using qualitative methods

  • Open access
  • Published: 13 June 2023

Patients’ satisfaction with heroin-assisted treatment: a qualitative study

  • Rune Ellefsen 1 ,
  • Linda Elise Couëssurel Wüsthoff 1 , 2 &
  • Espen Ajo Arnevik 1  

Harm Reduction Journal volume  20 , Article number:  73 ( 2023 ) Cite this article

2337 Accesses

3 Citations

19 Altmetric

Heroin-assisted treatment (HAT) involves supervised dispensing of medical heroin (diacetylmorphine) for people with opioid use disorder. Clinical evidence has demonstrated the effectiveness of HAT, but little is known about the self-reported satisfaction among the patients who receive this treatment. This study presents the first empirical findings about the patients’ experiences of, and satisfaction with, HAT in the Norwegian context.

Qualitative in-depth interviews with 26 patients in HAT were carried out one to two months after their enrollment. An inductive thematic analysis was conducted to identify the main benefits and challenges that the research participants experienced with this treatment. The benefits were then weighed against the challenges in order to assess the participants’ overall level of treatment satisfaction.

Analysis identified three areas of experienced benefits and three areas of challenges of being in this treatment. It outlines how the participants’ everyday lives are impacted by being in the treatment and how this results from the treatment’s medical, relational, or configurational dimensions, respectively. We found an overall high level of treatment satisfaction among the participants. The identification of experienced challenges reveals factors that reduce satisfaction and thus may hinder treatment retention and positive treatment outcomes.


The study demonstrates a novel approach to qualitatively investigate patients’ treatment satisfaction across different treatment dimensions. The findings have implications for clinical practice by pointing out key factors that inhibit and facilitate patients’ satisfaction with HAT. The identified importance of the socio-environmental factors and relational aspect of the treatment has further implications for the provision of opioid agonist treatment in general.

Heroin-assisted treatment (HAT) is an intensive form of treatment for opioid use disorder (OUD) that involves the dispensing of medical heroin (diacetylmorphine) from clinics where additional psychosocial interventions and support services are often available. Internationally, this medication is typically not available in a take-home form, unlike other medications used in opioid agonist treatment (OAT). Norwegian HAT patients are expected to be present in the clinic twice daily for supervised intake of medical heroin.

HAT is considered an evidence-based approach for a highly vulnerable patient group [ 1 , 2 ]. Results from randomized controlled trials suggest that HAT can be effective in reducing crime and illicit heroin use [ 3 ] and that the target group stays in this treatment longer than in traditional OAT [ 4 , 5 ].

A large body of research in the OUD field demonstrates that the risk of overdose is high before entering treatment and even greater when treatment is terminated [ 6 ]. Around 1 out of 10 OAT patients in Norway terminate their treatment annually [ 7 ], while a recent systematic review found that between 20 and 84% of OAT patients remain in treatment [ 8 ]. For people with OUD, it is particularly important to adjust and tailor treatment options to different needs in order to facilitate longer treatment retention and reduce serious harm [ 9 ]. HAT is therefore considered an important option for this particular group of patients.

Treatment satisfaction among patients who receive addiction treatment is considered an issue of significance in both clinical practice and research [ 10 , 11 ]. Treatment satisfaction includes patient evaluation of their own experiences of receiving treatment and health-care services [ 12 ]. Patients who are satisfied with OAT tend to stay in treatment longer [ 9 ]. A survey including 1939 patients in outpatient treatment for substance use disorders (SUD) found that satisfaction was positively associated with either the completion of treatment or longer treatment retention, which is further related to favorable treatment outcomes [ 13 ].

Patients’ treatment satisfaction is not limited to the pharmacological aspect of treatment but is also influenced by socio-environmental factors associated with the clinical staff [ 14 , 15 ]. These factors include the staff’s continuity, their personal beliefs about illicit drug use, their preferred methods of treatment, and their therapeutic skills [ 16 , 17 ]. Patients’ satisfaction with the medication offered in OAT also positively influences their satisfaction with other interventions offered in the treatment centers [ 18 ]. Patient satisfaction with prescribed medical heroin cannot, therefore, be detached from the setting in which HAT is provided and the way clinicians provide it [ 19 ].

How clinicians relate to and provide services to patients is fundamental to the patients’ experience of the treatment [ 20 ]. In treatment with medical heroin, one study found that unfavorable interactions with providers of medical heroin or hydromorphone treatment had the strongest independent effect on how patients’ satisfaction changed over time [ 21 ]. Patients see relational dynamics, such as those related to their trust in the clinicians and the clinical environment, as issues that are significant predictors of satisfaction over time [ 22 ]. Given the significance of relational factors for patients’ overall perceptions of treatment, scholars have called for future studies to help determine the inhibitors and facilitators of positive patient–clinician relationships [ 21 ]. Our study answers this call by employing an innovative multidimensional approach where we distinguish between the three key dimensions that are crucial for patients’ satisfaction with HAT: the medical dimension (the diacetylmorphine), the relational dimension (the patient–clinician interactions), and the configurational dimension (the configuration of the treatment).

Only a limited number of studies have examined satisfaction among HAT patients. Moreover, the studies that exist are primarily based on quantitative methodology. Qualitative data enable greater insight into the social and relational aspects of treatment [ 23 ]. Qualitative studies have also demonstrated their ability to identify treatment outcomes that are often overlooked in clinical trials, like the treatment’s positive impact on self-esteem [ 24 ]. The success of HAT thus suggests that the pharmacology of the drug is not the only key to a favorable treatment outcome [ 25 ]. This study therefore employs a multidimensional approach that covers the medical, relational, and configurational dimensions of HAT. This enables us to identify which dimension of HAT is associated with each benefit and challenge that the patients experience in this treatment.

Study context

Norway’s first HAT clinics opened in 2022 as a five-year trial project for people with OUD who have not benefited sufficiently from existing OAT. HAT exists in Canada and seven European countries, but the treatment is configured slightly differently across these countries [ 26 ]. Provision of medical heroin is the core component of this treatment, while other aspects, such as the number of doses provided per day, may differ.

Norwegian HAT is organized in the specialized health services as a part of the established system for OAT. However, HAT differs from traditional OAT through its different operating methods (e.g., higher intensity) and the use of medical heroin. The configuration of HAT in Norway means that patients may spend up to two hours at the clinic daily. This involves time spent in the waiting room, the prescreening conversation, the injection room, and the mandatory observation (minimum 20 min) for those who inject the medication. This reflects the intensity of this treatment and illustrates why the clinical environment and patient–clinician interactions are crucial for how this treatment is experienced.

Norwegian HAT patients are offered two heroin intakes daily, with the option of less frequent attendance. For the periods not covered by medical heroin, the patients were offered take-home methadone during this study. New national OAT guidelines introduced after this study also made slow-release oral morphine available for HAT patients to take home. At the time of this study, the majority of enrolled HAT patients administered the diacetylmorphine by injection, while about 5% used the alternative oral route of administration. In this national context, HAT is provided in two designated clinics as part of a trial project. The clinics are primarily staffed by nurses, social educators, social workers, and medical doctors with specialization in addiction medicine. Here, the patients are offered basic health checks in addition to the medications, while staff engage in milieu therapy and assist users on issues like finances and housing. These services may not be provided in all HAT clinics internationally [ 27 ].

HAT in Norway is open-ended, with no time limit for patients, although continuation after the five-year trial period is uncertain. The psychosocial support and care offered are voluntary, and no engagement with additional services is required. Poly-drug use among the patients is acknowledged by clinicians and does not in itself lead to sanctions. However, an observational screening by the staff is carried out before medication is provided, and the dosage is reduced if a patient is too affected by other drugs or alcohol.

At the time of writing (June 2023), about 70 patients were enrolled in HAT in Norway’s two largest cities (Oslo and Bergen). Bergen’s clinic was using a temporary location at the time of this study, with a limited capacity of only 20 patients (the capacity increased to 40 in 2023). The clinic in Oslo has a physical capacity of between 70 and 90 patients, according to staff. The stated aim is to increase the total number to between 150 and 300 patients nationally within the period of the trial project [ 7 ]. It is still uncertain whether HAT will become a part of the established OAT treatment services and be expanded after the trial period ends. Empirical studies involving primary data and results from Norwegian HAT are as yet non-existent. The present study thus fills a knowledge gap by providing insights into the patients’ initial experiences and satisfaction with HAT in Norway.

Participants, setting, and data collection

This study is based on interviews with 26 individuals (Oslo: N = 19, Bergen: N = 7) enrolled in HAT. The participants were between 31 and 68 years of age (mean age 47) and consisted of 20 males and 6 females. Two participants received heroin in tablet form while the rest injected it. Interviews were conducted four to eight weeks after the participants started HAT. The interviews took place inside or just outside the HAT clinics between March and July 2022. The users’ frequent visits to the HAT clinics made it possible to meet them repeatedly and build trust before conducting the interviews. Participants were not offered any compensation.

A six-person research team carried out the interviews, including two peer researchers with lived experience of OUD from the OAT user organization ProLAR Nett. The researchers had no previous relations to the recruited participants. Questions for the semi-structured interviews were designed to capture patients’ positive and negative experiences with HAT and its impact on their everyday lives. This included questions about what they were most satisfied with and what they found most challenging in HAT, what they thought could be different in HAT, as well as questions regarding the medications, the relationship with clinicians, and the treatment scheme. Interviews were conducted shortly after HAT was established in Norway, meaning that the clinics were in a start-up phase with fewer patients than planned for under normal operation.

Interviews were recorded and lasted on average 41 min. They were conducted after medication intake to allow a calmer setting for conversation. The researchers spent considerable time in the clinics to establish rapport before recruiting participants. We used the same opioid intoxication scoring tool as used in Norwegian HAT to make sure researchers never asked for consent or conducted interviews if patients were too affected by the medication. The tool is a translated version of the one used in Danish HAT clinics [ 28 ]. Recruiting participants for the interviews was challenging: we had to meet users at a time when they were both willing to participate and not too affected by the medications to do so. Participants’ capacity and willingness to talk about the issues raised in interviews varied greatly. This is likely related to their varying capability for self-reflection and self-expression, their mental state on the day of the interview, and the related influence of medical heroin and other drugs [ 29 ]. To accommodate these challenges and the varying accessibility of the participants, we employed a flexible approach. This involved considering each patient’s state and situation at the time of every scheduled interview, which led to numerous postponements and cancellations of interviews. For some patients, we facilitated their participation by dividing the interview into several shorter conversations.

All participant names below are pseudonyms. To protect anonymity, we also omitted information about the city in which participants had been enrolled and their age.

Transcribed interviews were coded and analyzed following the principles of a flexible inductive thematic analysis [ 30 ]. We started out identifying and distinguishing between the experienced benefits and challenges of being in HAT, including their positive and negative impacts on patients. The analytical process involved creating and revising codes (themes) to capture the most prevalent benefits and challenges, resulting in three elements of the treatment that were beneficial and three that were particularly challenging. The resulting structure of codes is used to report our findings, where we also link each benefit and challenge to the dimensions (the medical, relational or configurational) of the treatment that produced them (see Table 1 ).

Benefits of the treatment

Three aspects of HAT stand out as beneficial: the access to medical heroin, the positive patient–clinician relations, and the supportive environment of the clinic.

Access to medical heroin

This aspect stands out as the most crucial to the participants and is related to the medical dimension of the treatment. Access to medical heroin was beneficial in two ways: First, it helped to reduce the stress linked with constant pressure to acquire money for illicit drugs. Secondly, the daily clinic visits introduced new routines that—combined with medical heroin—provided energy and hope.

Ingrid compared her life before and after receiving medical heroin:

I would wrap drugs into portion packs, sell them on the streets, being all day around people that are stressed and who want drugs and do not have enough money […] And in the middle of it, maybe if you're lucky you'll sell some sex on the corner too, right, it's just… from that life to being able to sort of wake up, come here to get medicine and then go home, go to bed and sleep two hours on the couch, like. […] It brings a calm and a peace over my everyday life and my life which is completely… well, I've spent over €200 a day on drugs.

Entering treatment with medical heroin alleviated the constant financial pressure to raise money for heroin, as Anne explained: “Life has changed in the way that it has become more quiet at home.” She and her partner had needed €600 a day to avoid withdrawal. Many participants sold drugs to finance their own use of illegal heroin before entering HAT. Karl used to sell amphetamines but explained how the pressure to sell changed once he “no longer needed as much money” to stay well. Martin, like others, said: “I don’t sell drugs like I used to, since I started here.”

Related positive impacts of medical heroin were outlined by Geir: “It simply enables me to use my brain capacity for something else than chasing heroin.” Reducing the stress of hustling money impacted the participants’ lives, as Tor explained: “I love that I have stability now. That I know, every day, I don’t need to stress about it. That I have what I need. It means a lot, like, to my quality of life.”

The medical heroin created predictability regarding a need that had to be covered in the users’ daily life. These positive impacts are likely to have broader mental health benefits, as alleviating stress positively contributes to people’s recovery processes [ 31 ]. Many participants compared the stress of acquiring money for illegal heroin with the demand of meeting up frequently in the clinic. Fredrik said “Even if we have to come here two times a day, and it is kind of impractical and stuff, it’s peanuts,” because before entering treatment there were not “enough hours around the clock to stress about money.”

Reduced financial pressure further helped make the second positive impact of medical heroin possible: new routines and positive energy. Despite the intensity of the treatment, many users found the routines and structure of regular clinic visits positive. The behavioral change of new daily routines brought about by receiving medical heroin in HAT was described by Arild: “I just notice how much easier it is to get out now and getting things done at home, and I eat more.”

Turid pointed out a cognitive shift related to having more time: “Clearly, I have much more time to stake out the life and path I wish for. […] I feel that I can take a look around me and look for opportunities with my eyes wide open.”

HAT structured the lives of participants through the regular clinic visits. These visits and the regular medication intake impacted their everyday routines. Alex felt that entering treatment both offered routine and positive energy: “At last, I have something I can go to regularly. And now I notice that I am starting to get inspired again.”

Erik noticed similar life changes: “It has become a lot better.” He explained “Earlier I just woke up, right? Now I have something to go to.” Erik started to laugh and said that treatment was “almost like being at work again” as he got into a daily “rhythm.” Several participants referred to being in treatment as a job. Ingrid was one of them: “To come here, the rhythm of having something to fill my days with […] It’s like… I see it as my job [laughs] to come here every day. And it has become a very nice job.”

Turid initially “thought it would be a problem” with the intensity of the treatment, but the medical heroin made it “easier to go outside” because “my body feels lighter. It’s easier to just be, to exist.” Many patients referred to the benefits of medical heroin compared to methadone because medical heroin made them function better, both cognitively and physically.

Receiving medical heroin has been found to promote changes in patients’ outlook [ 1 ], while studies also find that regular supervised intake of medical heroin provides valuable stability and routines to patients’ lives [ 24 ].

Patient–clinician relations

The positive relations between patients and clinical staff were prevalent in the patients’ stories and refer to the relational dimension of the treatment. This dimension overlaps with what others have conceptualized as “the therapeutic relationship” [ 32 ] or “everyday interactions” between users and providers of treatment services [ 33 ]. Participants described two aspects of the positive patient–clinician relations: the respectful interactions with staff, and the experience of having an influence on their own treatment.

Referring to previous treatment experiences, Anne described her entry into HAT: “Wherever you go really, you are used to being met with raised eyebrows or a kind of skepticism. And I just have to praise the people working here. I like them all.” Many patients voiced that the staff treated them better than what they were used to, and often contrasted this to earlier experiences in traditional OAT. Anne described HAT as “the exact opposite” and continued: “I struggled with anxiety, struggled to get out and meet people. And that’s not how it works here at all. Like, it’s like I am a totally different person here.” The positive relationship with clinicians mattered.

Thomas put into words a difference between traditional OAT and HAT: “I think it’s kind of… it’s a better culture here, I think. I think we are met with more respect, and that they have a different approach to us as drug users.” Making similar comparisons, Erik said: “It’s like night and day!” and explained the difference: “You get treated for who you are, and it’s not the rules and regime that you have in OAT.”

Geir was positively surprised by HAT: “People are treated respectfully, and kind of get… they aren’t kicked around, and then they behave a lot better. There is a better unison, really.” Fredrik was most satisfied with “those who work here, and kind of the whole atmosphere, the whole way of being welcomed.”

These positive relations with clinicians represented something unusual to several participants. Ingrid said “one simply isn’t used to being met with openness and trust and humanity.” Similarly, Stian found HAT to be unlike his former experiences:

I feel they have knowledge. Perhaps not every person of the staff. Some are new to learning about it, but they behave professionally towards us. And those who are in charge, in particular, have great understanding for the issues, and they adjust the treatment to us.

The reciprocal trust and respect between staff and patients stand out as a key feature of what makes their interaction a positive experience for participants. Stian further felt this treatment was tailored to patients, and not the other way around. This brings us to the second positive aspect of patient–clinician interactions: patients having their voices heard and having influence on their own treatment.

Stian continued to explain: “You can tell them what you have taken. As long as it does not go against what you are about to take [medical heroin], you get the dose you are supposed to get, and you have influence on the dosage.” He described his experience of both increasing and decreasing the dosage: “My voice gets heard—it’s user participation, as it is so nicely called.”

Erika experienced having a great impact on her own treatment: “For example, we adjust my dosage upwards when I need a bigger dose.” Ingrid shared the same experience: “Yes, they listen to what I say, right?”

Adjusting and finding the right dosage was a major issue among the participants. Influence on dosages was mentioned as a clear sign of being heard and taken seriously by staff. Ola described how the medical doctor had “been absolutely fantastic” in following up on him and in “finding a dose that is adapted to me, that makes me able to feel well and good.”

The positive patient–clinician relations are likely to contribute to a strengthened feeling of self-worth, as opposed to the stigma that patients experienced in other settings. Experience of stigmatization in treatment for OUD is generally a major barrier to treatment entry and retention [ 34 ]. The positive experiences of interacting regularly with the clinicians added important meaning to the clinic visits. Positive relations are generally important for the recovery processes of persons in OAT [ 35 ].

The supportive environment

This beneficial aspect of the treatment was pointed out by many participants and is related to the configurational dimension of the treatment. The configurational dimension covers the way in which the treatment is organized and configured; for example, the services that are provided and that may differ among clinics. Two attributes of the supportive environment were emphasized as beneficial: First, the variety of psychosocial and medical assistance offered and secondly, the way the clinic and treatment provided for a new structure around—and safer setting for—the patients’ heroin use.

When asked if he would recommend HAT to others, Erik answered “Absolutely!” He continued explaining why: “Help is provided here, you know, and they are here for you.” Most participants gave examples of support and assistance they had received from staff. Fredrik said “Firstly, they helped me sort out my finances. So I am working on that now actually.” Thomas offered another example:

When I started here, I got appliances in place in my flat, firstly. […] When I took my medication just now, the nurse said to me: “Have you been to Jysk [warehouse selling beds and bedding products] yet?” And I just said, “No. That’s true… I have a voucher for Jysk.” I will try going there later today to use it.

Fredrik expressed that he was “almost allergic to social workers” but said that what the social worker in HAT had been able to get in place for him was “brilliant in every way.” He received help applying for economic support from the social service, and described assistance with health appointments and a hospital visit. Entering HAT seemed to lower the threshold for using services for many participants, by making patients more able and willing to follow up on different social and health issues.

Anne told us: “I have already been called in to a check with the heart specialist because of a strange sound in the heart. That’s something the medical doctor started right away. […] And then there is that job project.”

Marius was satisfied with the help he received: “I think I have received good help from the social worker, ’cause I am actually in the middle of a housing crisis.”

Fredrik described benefits of being assisted in health issues:

They figured out that I had a very low level of vitamin D, so now I get a vitamin D supplement here, every day. And I have talked to the social worker…. With his help I have booked an appointment at the dentist's and things like that.

Elin had also received “lots of” help and emphasized the impact it had on her: “I am shocked. I have gotten a hope I didn’t have before.” The supportive environment triggered Elin’s hope.

Participants did not follow up or follow through on all the opportunities and assistance that the clinic provided or offered. Nevertheless, many described that they had initiated and followed up more on issues that were important to their everyday lives, health, and quality of life. Birger eagerly told us: “Now I am starting to work tomorrow” through help from the staff. He continued: “I have never worked in my whole life.”

Several participants described the second benefit of the supportive environment; feeling cared for and being safer in HAT than they were outside of treatment. Erika said “At least we have a social worker, a psychologist, and we have six nice nurses and good medical doctors… we now have a good team around us.” Ola had years of experiences from the health-care system before entering HAT: “I have had all kinds of diseases, and I have never been in a unit or anything where I have felt this much at home and welcomed and so well taken care of.”

If a patient in HAT has an overdose or is too heavily affected by medical heroin intake, clinicians are prepared to provide instant medical aid. Patients also inject in a clinical setting with clean syringes and medical-grade heroin, which avoids many of the problems caused by illicit street heroin, such as abscesses and other health risks. Vein scanners are available to help users find suitable veins for injection, while clinicians also offer guidance for injecting into the large muscles as an alternative to the veins.

A feeling of being safer and taken care of was important for the participants’ perception of the treatment (see also [ 36 ]). Reidar underlined his hope for care in HAT: “It’s the medical things. That they can follow up on my health, my kidneys, and that they are able to help me, like getting me off the methadone.” Martin said this about HAT: “It has helped, not mainly about the money, but firstly about my own life. I don’t get overdoses. I don’t get abscesses. I don’t get this and that. It’s safety.”

The patients’ experiences of HAT as a supportive environment are closely related to the positive patient–clinician relationships and the previously described benefits of medical heroin. The latter enables patients to use the supportive environment, while the former makes the clinic visits something positive.

The intake of medical heroin overseen by health personnel offered a secure setting with assistance and care that marked a radical shift from the participants’ heroin use patterns prior to HAT [see [ 19 ] for similar findings]. The barriers for participants to make use of services and assistance also seemed to be lowered by entering HAT. This increased use of psychosocial support is likely to positively impact the patients’ quality of life and treatment outcomes [ 11 ].

Challenges of the treatment

Three aspects of HAT stand out as challenging to the participants: the treatment scheme, the clinic rules, as well as the increased downtime and uncertainties about HAT’s future.

Treatment scheme

This aspect stands out as the most important challenge for the participants. It is related to the configurational dimension of the treatment, and particularly the way in which the treatment and medication provision is organized. Patients described two main challenges: First, the limited types, level, and frequency of medications available in HAT; and secondly, the inconveniences of having to show up at the clinic twice a day.

Related to the former, what Ingrid found most challenging was: “Perhaps that it’s only open two times a day [laughs]. It should have been a third time just before the night. It’s hard to get the medication to cover myself around the clock.” The opening hours of the Norwegian HAT clinics are limited to the periods of about 8 am–12 pm and 2–5 pm. Most patients received take-home methadone to cover the evenings and nights, but many disliked its negative side effects or lack of desirable effect. As a result, many chose to buy illegal heroin and other drugs to cover this period. Ingrid hardly used methadone at all and wanted to replace it with slow-release morphine, if possible. Erika said “I think there should be a broader choice of medications. There ought to be a lot more types of morphine. I would have liked to get a morphine tablet.”

Complaints about methadone were widespread (see also [ 36 ]). Fredrik did not like it: “Methadone, it’s like… it makes you well, but oh my god you get so parked in the head that it’s mad. In that way, that’s one of the few things that I see as somewhat negative here.” Marius thought some things in HAT should be changed:

The most important change, I think, would be to open up for other drugs as well. Amphetamines and… yeah. Like, I am here because I have a problematic relationship with heroin, and if the goal is to abstain fully from drug crimes or that type of lifestyle, then those [amphetamines] ought to be offered here as well. At least acknowledge that people are using them, and that it should be allowed to use them here.

Similarly, Thomas explained “I am also dependent on amphetamines” and he wished that HAT could offer amphetamines to those dependent on them.

The experience of receiving too little heroin or take-home medication for the evening and night led many participants to buy illegal drugs. Thomas explained “Here they say: ‘It’s supposed to be enough.’ No, it’s not, because I get fucking sick.” Thomas and many others addressed this challenge of getting by during the nights.

Reidar described such an incident: “What the hell, I get very sick and have a lot of pain and such when I am being stepped down [receiving reduced dosages]. It makes me have to buy heroin.” Many patients used illegal drugs in addition to the legal medication they received, but this was clearly influenced by what the clinic offered.

Some participants used illicit heroin preemptively to avoid waking up sick, because the medical heroin they received was usually not enough to avoid symptoms of withdrawal. Others described being able to skip illegal heroin during the night, but this was dependent on getting an appropriate afternoon heroin dosage in the clinic.

The question of dosage and how to get enough and suitable medication to avoid withdrawal were key issues that featured across interviews. There is no set maximum dosage of medical heroin in Norwegian HAT. Several participants wished to obtain a higher heroin dosage. Anne was not satisfied: “I think the adjustments of my dose are somewhat slow.” She still expressed understanding for the reason behind it, which was her irregular attendance in the clinic.

Anne and a few others had problems with getting to the clinic during opening hours, which relates to the second experienced challenge of the treatment scheme: its intensity. The intensity of the treatment scheme was described as challenging because the frequent visits within limited opening hours reduced patients’ opportunities for other activities. Patients also spent much time inside the clinics in addition to traveling back and forth for each visit. Kjetil found it challenging that HAT “takes quite some time.”

When asked if the treatment made it easier to follow up on other things he wanted to do, Ola replied: “Yeeah [hesitant]. It’s both ways, really, but it’s the thing that you have to show up here two times – you must keep that in the back of your mind all the time. You need to adjust the rest of your life around that.” Like many other participants, Ola underlined that the frequent visits “might make it harder to plan other things.” Martin described the mixed feeling about the treatment’s intensity:

I come every day, twice a day, twice a day. I must like it, or otherwise, what the fuck? I don’t like to wake up, I don’t like to be here at 8 am every day. I come every day, but it’s hard.

Erika wanted a part of the treatment scheme to be different: “The thing about picking up twice daily; it could be allowed to get heroin to take on a vacation and out of here, but that’s not possible.” Stian also described challenges of going on vacation. He was prepared to go on vacation with the methadone he would be given to take with him, but: “I have sorted out some Oxycodone illegally, like, to use as a supplement.” These and other individual challenges, particularly for those who disliked methadone, featured in several interviews. Their solution was usually to buy illegal heroin and other drugs.

The patients’ use of illegal drugs while in HAT should be seen in relation to the configuration of the treatment. This includes the types, frequency, and levels of medication offered, and whether these make patients feel covered against symptoms of withdrawal. Several participants discussed going with relatives on holiday trips but found this challenging or impossible because of the limited alternative take-home medications that could replace medical heroin. The intensity of the treatment could also create extraordinary challenges for patients with physical disabilities or traveling distances of more than one hour to the clinic. These burdens of the treatment scheme have also been emphasized as negative by HAT patients in other studies [ 21 ].

Clinic rules

Clinic rules were described as challenging by many participants. This is related to the configurational dimension of the treatment, and specifically the way in which patients’ medication intake and behavior are regulated during their clinic visits. Clinic rules were challenging in two ways: They were experienced as too strict or unfounded, and the clinicians’ enforcement of rules and sanctions negatively influenced their relations with patients.

The clinic rules most often referred to by participants were those regulating behavior in different sections of the clinic; in the waiting, screening, injection, and observation rooms. In the injection room, these rules set a limit of 20 min with three attempts to inject the medication. Rules also cover the visual-based screening of patients before heroin intake and a minimum of 20 min of mandatory observation after injection. The visual screening before intake involves the potential for a reduced dose, or denial of medical heroin, if patients are too affected by other drugs or alcohol. Such dosage reduction was at times perceived as an unreasonable sanction or even punishment.

Turid gave an example: “There’s a few others who have gotten their doses reduced because they have been quite loaded when they arrived here, and they experienced it as a sanction.” She had received a reduced dose herself after she told clinicians that she had taken some pills:

Not that I am going around lying to them, but I will not tell them the next time if I have a slip on pills or something else. They will have to figure that out themselves. I may get punished for it if I say something. […] They reduce your dose and in addition they demand that you are supposed to get by and stay well.

The experienced sanctions influenced the relationship with clinicians. Håvard had also had his dosage reduced: “Yes, if I have been too drugged, yes, I have. When you are too loaded you are not served.” He found these experiences hard to talk about, but he and other participants usually expressed understanding for having their dose reduced.

Fredrik, however, described a rule that many experienced as unfounded:

What I find somewhat strange is the rule that you are not allowed to shoot in the groin, and those things. I cannot understand…. I get the feeling that this is a rule that is created for you [providers of treatment] more than us. […] I think people should be able to shoot where they used to. It’s not a problem in the injection room [a supervised drug consumption site], so why should it be a problem here?

Marius had negative experiences of the rules related to late arrival as well as the limited time and injection attempts in the injection room:

If you arrive one minute too late, you are not let in. And you have 20 minutes to inject. And they are quite meticulous about it… to begin with, it was three attempts. But now it has dropped to two, but they describe it as three. Like, there’s quite a few things I experience as somewhat strict. […] To me, it has become a big issue to actually make it on time, simply.

Some participants raised the negative implications of the groin-injection ban and the 20-min limit in the injection room, which caused stress for patients in a way that could lead to bad decisions and unsafe injection practices.

There had been incidents in the clinic where patients covertly injected in the groin, despite the ban. The ban was a big issue for participants who had been injecting in the groin for years because they had few or no alternative veins to use. After being caught injecting in the groin, Thomas was called to a “serious talk” with the staff about the rules and what was expected of him. Repeatedly failing to abide by the rules could lead to suspension from HAT for a limited time period. Serious violations of rules, like violent behavior or severe threats of violence, may lead to immediate discharge. No patients had been discharged on these grounds between HAT’s start in January 2022 and the time of writing (June 2023).

Martin found the mandatory 20-min observation period in the clinic after each injection to be troublesome:

Sometimes when we are done shooting, why do they tell us to sit down 20 minutes? You don’t need 20 minutes, I don’t understand. Maybe to check; that’s alright. Sometimes some people who come here get mad at them [staff], they scream and this and that. And sometimes when the time is over, they [staff] tell you, go! It’s finished – go!

What Martin describes here, and part of what Marius said above, concerns not the rules themselves but the way in which they are enforced by clinicians. This brings us to the second challenge of clinic rules: their enforcement sometimes negatively impacted patient–clinician relations.

Some patients reacted to the arguments staff used to legitimize the clinic rules, as these seemed not to be primarily based on medical reasoning or considerations of the patients’ health. Patients expressed that some clinic rules could contribute to discontent among users, a worsened atmosphere in the clinic, and a tenser relationship with the clinicians.

Birger mentioned what he perceived as the most demanding part of the treatment rules: “It has to be the thing with waiting for the dose.” All patients have to remain in the waiting room together with other patients before they are allowed into the screening room prior to heroin intake. Stian mentioned wanting the opportunity “to report criticism” about clinic rules and how the clinic is run to the staff and leadership.

While many were clear that several clinic rules were frustrating and negative, a widespread theme was still that HAT was experienced as much better than traditional OAT, because the HAT rules were perceived as more lax and as involving fewer sanctions (see also [ 37 , 38 ]). However, HAT still involves a set of rules and potential sanctions that regulate the patients’ behavior in the clinic. Other studies also find that patients have negative experiences with the enforcement of clinic rules in HAT [ 21 ].

Downtime and uncertainties

This challenge includes having too much downtime and uncertainties about the future of HAT. Concerning the latter, uncertainty about whether HAT will be terminated after the 5-year trial project ends relates to the configurational dimension of the treatment, while concern for the clinic milieu as patient numbers increase is linked to the treatment’s relational dimension. The challenges of downtime are also related to the configurational dimension of the treatment, especially regarding which activities are offered and possible for patients in HAT.

While the benefits of having more time as a result of access to medical heroin were described above, this also involved challenges. Many participants described how downtime caused unrest or boredom. Fredrik was one of them: “With all the spare time I suddenly have gotten, there’s a bit of time to sit and ponder about life, and about everything that did not turn out the way it should.” Ingrid thought the hours between the two daily clinic visits were the worst period: “It’s kinda like I wonder a bit about what more we are to do eventually, yeah. And I know there are more people that are kind of calling for something more.”

When Thomas was asked if he felt less or more socially isolated after entering HAT, he explained:

Suddenly I am sitting there you know with my flat, empty apartment, and myself and I don’t know what the fuck to do, you know. And it becomes… it’s been kind of empty […] I get kind of scared by it. It’s kind of gloomy, right? So I hope they make some initiative down there [in HAT], someone should have done that here, you know, taken some initiative for people to being able to… ’cause there are loads of us here who want to do things now.

Thomas explained another implication of having too much downtime: “I know several people here are still just going downtown, you know, pushing, or going there because they don’t have anything else to do maybe. I don’t fucking know, like. I am doing the same thing myself.”

Karl echoed several participants when saying: “I had expected it to be something more, like, than just coming here for a shot, kind of.” He continued: “It would have been good to have something to do during the days. It’s of course possible to continue as before, going downtown pushing drugs… that’s an option too.” Obviously, getting more free time does not automatically have a positive influence on recovery processes [ 38 ].

Karl and other participants could receive assistance in HAT with job training, courses they might be interested in, or seeing a psychologist, and the social workers and other staff also helped patients access activities outside of HAT if they wished. Even so, many did not use these opportunities. Several participants missed social and cultural activities to fill their time between the first and second clinic visit, but what HAT offered did not seem to cover what they sought. Stian had therefore taken an initiative on his own: “We are trying to, a bit on our own, to set up some space for music rehearsal somewhere close to the clinic, so we can have something to do after the shot.” He said that this was missing “also for patients who don’t like music, but who just want to draw, or have some place to socialize.”

The second set of challenges concerns uncertainty about what the future of HAT will be with more patients and what will happen if HAT is terminated after the 5-year trial period. Anne felt well taken care of in HAT but realized that the situation would change:

That’s why I am so chuffed that I am in on this in the start-up. Oh my god, all those people [clinicians], and they have so much time as well. Person number 50 is not going to have the same experience that I had as one of the first 15 persons.

Håvard voiced similar concerns: “The only thing I am thinking about is the number of people that will be coming here. That’s what I am thinking. It can be problematic.”

The concern over increasing numbers of patients relates to the atmosphere and milieu in the clinic. The reduced clinician time per patient was one issue raised, but participants also voiced concerns about patients queuing outside the clinic before opening hours, fearing this might lead to quarrels about who was to enter first.

Stian was among the first patients to start receiving medical heroin. He emphasized the solidarity and absence of thieving in the beginning, but noted that “it seems like there has been some lately.” The concern about more conflicts among patients as their number rises was voiced by several participants. Thomas was one of them: “There’s been a few troublemakers coming lately who have triggered the atmosphere in a bit of a negative direction.”

Ola mentioned a challenge that few others raised: “I don’t like the drug scene in this town. I have always tried to avoid it. That’s the only negative thing with HAT, that it’s a gathering of users in one place.” And thinking about the future, he said “When there’s more participants, users, in this system, it means we are going to be gathered and meet every day and it’s gonna involve, most likely, people you don’t want to meet.” Similar concerns also featured among HAT patients in previous studies [ 21 ].

The challenging uncertainties in HAT involved concerns that the atmosphere among users, the positive experience of being in the clinic, and the good relations with clinicians could deteriorate. Concerns were also raised about whether HAT will continue after the 5-year trial period and how patients will be affected if medical heroin becomes unavailable. Martin raised this issue: “It’s a good place, but how long will it last? I don’t know.”

Thomas related the uncertain future of HAT to his experience from another treatment trial project for SUD:

I was part of RusFact [Flexible Assertive Community Treatment for people with SUD] that was just down the road here, but it was a trial project that was terminated, like suddenly. And it’s an example of… I have been through so many processes with so many people which are like… I open up, I enter it with full energy, show up, and I have really fucking done it, showing up. Even with my disability I went there every fucking day, showing up at what I was supposed to show up for, and those kinds of things. Then, suddenly it’s just: ‘Well, now we are dissolving this, because it was only a trial project.’ So all the relations you made there, now it’s just – fuck you, kind of.

The issue of concerns over HAT’s future was not among the most prominent challenges in our data. However, research from other HAT trials has shown that patients have experienced the exit from such trial projects as “tumultuous,” and that they were “anxious about their future” as the trial neared its end and they were to be transitioned to treatments that had failed them in the past [ 27 ].

Discussion

This qualitative study from the Norwegian context has outlined the three most prevalent benefits and challenges of being in HAT, as seen from the patients’ perspective. Access to medical heroin, the positive patient–clinician relations, and the supportive environment of the clinic and wider treatment were experienced as the main benefits. The most challenging aspects were the intense treatment scheme and its limited medication options, the strict clinic rules, and the increased downtime and concerns over HAT’s continuation. Assessing patients’ treatment satisfaction in this study thus involves weighing their experienced benefits against these challenges.

It is evident that participants were more satisfied than dissatisfied with entering and being in treatment one to two months after their enrollment. From the changes in everyday life that participants described after entering treatment, it is also clear that their quality of life improved in certain areas. Being in HAT helped make their everyday lives safer, more predictable and stable, with less constant pressure to commit crimes or obtain money in undesirable ways (for similar findings, see [ 3 , 39 , 40 , 41 , 42 , 43 ]). These benefits are likely to contribute positively to treatment retention. The fact that many patients experienced a welcoming environment in HAT, new positive routines, valuable relations with clinicians, and gratitude for the supportive treatment environment seems, in sum, to give participants a greater sense of self-determination. This is also related to their experience of being heard and having an influence on their own treatment, which enhances patients’ “relational autonomy” [ 42 ]. These positive treatment outcomes are likely to assist patients’ recovery processes by contributing to a “positive sense of identity apart from one’s condition while rebuilding a life despite or within the limitations imposed by that condition” [ 43 ]. People with OUD, whether in or out of treatment, view social life and daily activities as most important to their recovery processes [ 44 ]. The sum of benefits seemingly gives many patients stronger personal relationships, involving social inclusion and increased self-determination in their everyday life: benefits that have been emphasized as fundamental to the quality of life of people with OUD [ 45 ].

Each benefit and each challenge was related to different dimensions of the treatment (see Table 1 ). The medical dimension of the treatment, primarily the stable access to medical heroin, covered patients’ opioid dependency but had key social, financial, and mental health implications, which often contributed positively to participants’ quality of life. The relational dimension of the treatment involved regular positive interactions with staff that constituted a meaningful activity in itself and a counterweight to the stigma that patients experienced in other settings (see also [ 46 , 47 , 48 ]). The relational dimension was most often described as challenging in relation to staff’s enforcement of clinic rules and when staff members were perceived as lacking knowledge about heroin use and injection practices. The benefits identified with the relational dimension and the positive patient–clinician relationships were closely related to the experienced absence of hostility and intrusive controls. Most of these patients had strong negative experiences from earlier treatment in traditional OAT.

While HAT patients are likely to have a longer series of negative experiences with OAT than the average OAT patient, and Norwegian OAT has changed substantially over the years, the difference in atmosphere and relationships still stands out as a core contrast between HAT and traditional OAT. HAT’s configuration seemingly enables closer relationships in a milieu where patients and clinicians get to know each other through both formal and informal modes of daily interaction. This is usually not the case in traditional OAT.

Despite HAT’s restrictive regulations, which are in principle similar to, and in some ways more restrictive than, those of traditional OAT, the patients were still much more satisfied with HAT. The medication was one crucial difference, but the experience of staff being flexible and respectful toward patients and their expressed needs stands out as equally important (see also [ 49 , 50 , 51 ]). The clinicians seemed to exercise flexibility in a way that alleviated some of the challenges of the restrictive treatment setting.

Most of the experienced challenges related to the configurational dimension of the treatment, which involves the way it is organized and provided to patients. The treatment scheme’s intensity and the limited range, level, and frequency of medications offered were the most prevalent challenges in the data. This duality in the patients’ experience reflects a general tension between the patients’ need for and satisfaction with structured care in HAT and their dislike of the restrictive setting in which it is provided.

This study covers the short-term impacts of the treatment, where the patients’ satisfaction concerns the transition into treatment and the experience at the onset of HAT. Satisfaction may of course change over time, but at this point the benefits of being in HAT outweighed the challenges involved. Some areas of the treatment still stand out as particularly challenging. The participants’ lives are bound up in a highly intense treatment scheme, which may become more challenging if the benefits dissolve over time: for example, if the clinic environment and interaction with staff become negative experiences. The results of this study should thus be interpreted in relation to the phase of the treatment that the data cover, as well as the fact that HAT was newly established and not yet in normal operation with the full number of patients.

Explaining how the different dimensions of the treatment produce the specific benefits and challenges that patients experience helps indicate the trajectories and mechanisms through which specific elements of the treatment produce certain outcomes. This helps fill a gap regarding why and how HAT produces these outcomes. While access to medical heroin seems to produce similar positive outcomes for patients in HAT across countries, such as reducing illicit heroin use and contributing to safer injection practices [ 50 ], the relational dimension of the treatment is likely to differ more across clinics and countries, as it depends more on contextual factors such as the organizational culture of the clinic, the type of staff, and their views of OUD patients [ 51 ].

These findings may be used to create more user-oriented services. Assessing the impact of clinic rules, and whether their intentions are achieved, seems particularly important in HAT across contexts. Whether opening hours or the number of medication doses offered daily should be expanded, or whether groin injections should be allowed in combination with guidance and follow-up, are issues that could be considered in the Norwegian context. In Germany and Switzerland, HAT clinics offer up to three and five medication doses daily, wider opening hours, and supervised groin injections [ 52 ]. Switzerland also has positive experiences with take-home medical heroin for stable patients [ 53 ]. Such user-oriented configurations of HAT may make it more flexible toward patients’ needs. However, take-home medical heroin may weaken the positive impact of the therapeutic relationship between patients and clinicians, which is based on the frequent clinic visits.

Knowledge generated from this study may inform current and future HAT programs in Norway and beyond. As this study covers a phase of HAT with a high level of satisfaction, it may be used as a point of comparison in later studies of patients’ satisfaction with HAT domestically and as a template for similar qualitative studies abroad. The insights about what is experienced as positive and negative across each treatment dimension could be useful for OAT in general, where the insights about what enables the positive patient–clinician relations seem particularly important.

Conclusion

This study employed a novel multidimensional approach to investigate patients’ self-reported satisfaction across the medical, relational, and configurational dimensions of HAT. The data show that the benefits clearly outweigh the challenges when the patients’ experienced benefits with HAT are weighed against their experienced challenges. We thus found a high level of treatment satisfaction among patients. The findings have implications for clinical practice by pointing out key areas of the treatment that should be maintained, or potentially changed, to ensure a high level of satisfaction over time. The identified importance of socio-environmental factors and the relational dimension of the treatment also has broader implications for the provision of OUD treatment services more generally. It provides insight into the key factors that make the patient–clinician relationship positive, meaningful, and therapeutic in itself.

Availability of data and materials

The interview transcripts generated during the interviews in this study are not publicly available to preserve the confidentiality of the participants.


Abbreviations

HAT: Heroin-assisted treatment

OAT: Opioid agonist treatment

OUD: Opioid use disorder

SUD: Substance use disorder

References

Bell J, van der Waal R, Strang J. Supervised injectable heroin: a clinical perspective. Can J Psychiatry. 2017;62(7):451–6.

Ferri M, Davoli M, Perucci CA. Heroin maintenance for chronic heroin-dependent individuals. Cochrane Database Syst Rev. 2011;2011(12):1–48.

Smart R, Reuter P. Does heroin-assisted treatment reduce crime? A review of randomized controlled trials. Addiction. 2021;117(3):518–31.

Oviedo-Joekes E, Guh D, Brissette S, Marchand K, MacDonald S, Lock K, et al. Hydromorphone compared with diacetylmorphine for long-term opioid dependence: a randomized clinical trial. JAMA Psychiat. 2016;73(5):447–55.

Strang J, Groshkova T, Uchtenhagen A, van den Brink W, Haasen C, Schechter MT, et al. Heroin on trial: systematic review and meta-analysis of randomised trials of diamorphine-prescribing as treatment for refractory heroin addiction. Br J Psychiatry. 2015;207(1):5–14.

Skeie I, Clausen T, Hjemsæter AJ, Landheim AS, Monsbakken B, Thoresen M, et al. Mortality, causes of death, and predictors of death among patients on and off opioid agonist treatment: results from a 19-year cohort study. Eur Addict Res. 2022;28(5):358–67.

Bech AB, Bukten A, Lobmaier P, Skeie I, Lillevold PH, Clausen T. Statusrapport 2021: Siste år med gamle LAR-retningslinjer [Status report 2021: Last year of the old OAT guidelines]. Oslo: Norwegian Centre for Addiction Research; 2022.

Klimas J, Hamilton MA, Gorfinkel L, Adam A, Cullen W, Wood E. Retention in opioid agonist treatment: a rapid review and meta-analysis comparing observational studies and randomized controlled trials. Syst Rev. 2021;10:216. https://doi.org/10.1186/s13643-021-01764-9 .

Kelly SM, O’Grady KE, Brown BS, Mitchell SG, Schwartz RP. The role of patient satisfaction in methadone treatment. Am J Drug Alcohol Abuse. 2010;36(3):150–4.

Pérez de los Cobos J, Trujols J, Valderrama JC, Valero S, Puig T. Patient perspectives on methadone maintenance treatment in the Valencia Region: dose adjustment, participation in dosage regulation, and satisfaction with treatment. Drug Alcohol Depend. 2005;79(3):405–12.

Vanderplasschen W, Naert J, Vander Laenen F, De Maeyer J. Treatment satisfaction and quality of support in outpatient substitution treatment: opiate users’ experiences and perspectives. Drugs Educ Prev Policy. 2014;22(3):272–80.

Crow R, Gage H, Hampson S, Hart J, Kimber A, Storey L, et al. The measurement of satisfaction with healthcare: implications for practice from a systematic review of the literature. Health Technol Assess. 2002;6(32):1–244.

Hser Y-I, Evans E, Huang D, Anglin DM. Relationship between drug treatment services, retention, and outcomes. Psychiatr Serv. 2004;55(7):767–74.

Gjersing LR, Butler T, Caplehorn JRM, Belcher JM, Matthews R. Attitudes and beliefs towards methadone maintenance treatment among Australian prison health staff. Drug Alcohol Rev. 2007;26(5):501–8.

Gjersing L, Waal H, Caplehorn JR, Gossop M, Clausen T. Staff attitudes and the associations with treatment organisation, clinical practices and outcomes in opioid maintenance treatment. BMC Health Serv Res. 2010;10(1):194.

Deering D, Horn J, Frampton CMA. Clients’ perceptions of opioid substitution treatment: an input to improving the quality of treatment. Int J Ment Health Nurs. 2012;21(4):330–9.

Lilly R, Quirk A, Rhodes T, Stimson GV. Sociality in methadone treatment: understanding methadone treatment and service delivery as a social process. Drugs Educ Prev Policy. 2000;7(2):163–78.

Alcaraz S, Viladrich C, Trujols J, Siñol N, et al. Heroin-dependent patient satisfaction with methadone as a medication influences satisfaction with basic interventions delivered by staff to implement methadone maintenance treatment. Patient Prefer Adherence. 2018;12:1203–11.

Blanken P, van den Brink W, Hendriks VM, Huijsman IA, Klous MG, Rook EJ, et al. Heroin-assisted treatment in the Netherlands: history, findings, and international context. Eur Neuropsychopharmacol. 2010;20(Suppl. 2):S105–58.

Lilly R, Quirk A, Rhodes T, Stimson GV. Juggling multiple roles: staff and client perceptions of keyworker roles and the constraints on delivering counselling and support services in methadone treatment. Addict Res. 1999;7(4):267–89.

Marchand K, Palis H, Guh D, Lock K, MacDonald S, Brissette S, et al. A multi-methods and longitudinal study of patients’ perceptions in injectable opioid agonist treatment: implications for advancing patient-centered methodologies in substance use research. J Subst Abuse Treat. 2022;132:108512.

Anstice S, Strike CJ, Brands B. Supervised methadone consumption: client issues and stigma. Subst Use Misuse. 2009;44(6):794–808.

Romo N, Poo M, Ballesta R. From illegal poison to legal medicine: a qualitative research in a heroin-prescription trial in Spain. Drug Alcohol Rev. 2009;28(2):186–95.

Oviedo-Joekes E, Marchand K, Lock K, Chettiar J, Marsh DC, Brissette S, et al. A chance to stop and breathe: participants’ experiences in the North American Opiate Medication Initiative clinical trial. Addict Sci Clin Pract. 2014;9(21):1–10.

Bell J, Dru A, Fischer B, Levit S, Sarfraz MA. Substitution therapy for heroin addiction. Subst Use Misuse. 2002;37(8–10):1149–78.

Fischer B, Oviedo-Joekes E, Blanken P, Haasen C, Rehm J, Schechter MT, et al. Heroin-assisted treatment (HAT) a decade later: a brief update on science and politics. J Urban Health. 2007;84(4):552–62.

Boyd S, Murray D, et al. Telling our stories: heroin-assisted treatment and SNAP activism in the Downtown Eastside of Vancouver. Harm Reduct J. 2017;14(27):1–14.

Johansen KS. Heroinbehandling i Danmark: en undersøgelse av brugere av behandling [Heroin-treatment in Denmark: an investigation of the users and treatment]. Glostrup: KABS Viden; 2013:221-2. Accessed 25 May 2023 from https://viden.kabs.dk/wp-content/uploads/2018/05/kabs_heroinbehandling-dk_elektronisk_5.pdf

Wright S, Klee H, Reid P. Interviewing illicit drug users: observations from the field. Addict Res. 1998;6(6):517–35.

Braun V, Clarke V. One size fits all? What counts as quality practice in (reflexive) thematic analysis? Qual Res Psychol. 2021;18(3):328–52.

Cleck JN, Blendy JA. Making a bad thing worse: adverse effects of stress on drug addiction. J Clin Investig. 2008;118(2):454–61.

Meier PS, Barrowclough C, Donmall MC. The role of the therapeutic alliance in the treatment of substance misuse: a critical review of the literature. Addiction. 2005;100(3):304–16.

Skatvedt A. The importance of “empty gestures” in recovery: being human together. Symb Interact. 2017;40(3):396–413.

Harris J, McElrath K. Methadone as social control: institutionalized stigma and the prospect of recovery. Qual Health Res. 2012;22(6):810–24.

Gilchrist G, Moskalewicz J, Nutt R, Love J, Germeni E, Valkova I, et al. Understanding access to drug and alcohol treatment services in Europe: a multi-country service users’ perspective. Drugs Educ Prev Policy. 2014;21(2):120–30.

Mayer S, Fowler A, Brohman I, Fairbairn N, Boyd J, Kerr T, et al. Motivations to initiate injectable hydromorphone and diacetylmorphine treatment: a qualitative study of patient experiences in Vancouver, Canada. Int J Drug Policy. 2020;85(102930):1–9.

Demaret I, Lemaître A, Ansseau M. Staff concerns in heroin-assisted treatment centres. J Psychiatr Ment Health Nurs. 2012;19(6):563–7.

Carlisle VR, Maynard OM, Bagnall D, Hickman M, Shorrock J, Thomas K, et al. Should I stay or should I go? A qualitative exploration of stigma and other factors influencing opioid agonist treatment journeys. Int J Environ Res Public Health. 2023;20(2):1526.

Jozaghi E. ‘SALOME gave my dignity back’: the role of randomized heroin trials in transforming lives in the Downtown Eastside of Vancouver, Canada. Int J Qual Stud Health Well Being. 2014;9:1–9.

McCall J, Phillips JC, Estafan A, Caine V. Exploring the experiences of staff working at an opiate assisted treatment clinic: an interpretive descriptive study. Appl Nurs Res. 2019;45:45–51.

Bell J, van der Waal R, Strang J. Supervised injectable heroin: a clinical perspective. Can J Psychiatry. 2017;62(7):451–6.

Lago RR, Bógus CM, Peter E. An exploration of the relational autonomy of people with substance use disorders: constraints and limitations. Int J Ment Heal Addict. 2020;18(2):277–92.

Davidson L, Tondora J, O’Connell MJ, Kirk T, Rockholz P, Evans AC. Creating a recovery-oriented system of behavioral health care: moving from concept to reality. Psychiatr Rehabil J. 2007;31(1):23–31.

Scherbaum N, Specka M. Factors influencing the course of opiate addiction. Int J Methods Psychiatr Res. 2008;17(S1):S39–44.

De Maeyer J, Vanderplasschen W, Broekaert E. Exploratory study on drug users’ perspectives on quality of life: more than health-related quality of life? Soc Indic Res. 2008;90(1):107–26.

Riley F, Harris M, Ahmed D, Moore H, Poulter L, Towl G, et al. ‘I feel like I found myself again’ – rethinking ‘recovery’ in a qualitative exploration of Heroin Assisted Treatment (HAT) service users’ experiences. Harm Reduct J. 2022.

Poulter HL, Walker T, Ahmed D, Moore HJ, Riley F, Towl G, Harris M. More than just ‘free heroin’: caring whilst navigating constraint in the delivery of diamorphine assisted treatment. Int J Drug Policy. 2023;116:104025. https://doi.org/10.1016/j.drugpo.2023.104025 .

Riley F, Harris M, Poulter HL, Moore HJ, Ahmed D, Towl G, Walker T. ‘This is hardcore’: a qualitative study exploring service users’ experiences of Heroin-Assisted Treatment (HAT) in Middlesbrough, England. Harm Reduct J. 2023. https://doi.org/10.1186/s12954-023-00785-y .

Marchand K, Foreman J, MacDonald S, Harrison S, Schechter MT, Oviedo-Joekes E. Building healthcare provider relationships for patient-centered care: a qualitative study of the experiences of people receiving injectable opioid agonist treatment. Subst Abuse Treat Prev Policy. 2020;15(1):7.

Martins MLF, Wilthagen EA, Oviedo-Joekes E, Beijnen JH, de Grave N, Uchtenhagen A, et al. The suitability of oral diacetylmorphine in treatment-refractory patients with heroin dependence: a scoping review. Drug Alcohol Depend. 2021;227(108984):1–15.

McNair R, Monaghan M, Montgomery P. Heroin assisted treatment for key health outcomes in people with chronic heroin addictions: a context-focused systematic review. Drug Alcohol Depend. 2023. https://doi.org/10.1016/j.drugalcdep.2023.109869 .

Zador D, Lintzeris N, van der Waal R, Miller P, Metrebian N, Strang J. The fine line between harm reduction and harm production – development of a clinical policy on femoral (Groin) injecting. Eur Addict Res. 2008;14(4):213–8.

Meyer M, Strasser J, Köck P, Walter M, Vogel M, Dürsteler KM. Experiences with take-home dosing in heroin-assisted treatment in Switzerland during the COVID-19 pandemic - is an update of legal restrictions warranted? Int J Drug Policy. 2022. https://doi.org/10.1016/j.drugpo.2021.103548 .



This study was part of a mixed-methods research evaluation of the 5-year HAT trial project in Norway: www.bit.ly/heroin-assisted-treatment . The overarching evaluation is led by the Norwegian Centre for Addiction Research at the University of Oslo. The authors gratefully acknowledge the assistance of the staff of the HAT clinics in Oslo and Bergen in helping with participant recruitment. We particularly appreciate the patient participants for their time and thoughtful contributions during the interviews. The research team who contributed to the data collection for this study, in addition to the first author, includes Heloise Sverdrup Lund, Tiril Kyrkjebø, Christina Hansen, Lene Midtsundstad and Turid Hveding Wik. We would like to thank Ronny Bjørnestad, who contributed to planning the study, developing the interview guides and facilitating data collection. Thanks also to Silvana De Pirro for input on previous drafts.

Open access funding provided by University of Oslo (including Oslo University Hospital). This research was funded by the Norwegian Directorate of Health and Oslo University Hospital. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funders.

Author information

Authors and affiliations

Section for Clinical Addiction Research, Oslo University Hospital, PO Box 4959 Nydalen, 0424, Oslo, Norway

Rune Ellefsen, Linda Elise Couëssurel Wüsthoff & Espen Ajo Arnevik

The Norwegian Centre for Addiction Research, University of Oslo, PO Box 1039 Blindern, 0315, Oslo, Norway

Linda Elise Couëssurel Wüsthoff



RE contributed to investigation, methodology, formal analysis, and writing—original draft. LCW was involved in project administration, resources, and writing—review and editing. EAA contributed to funding acquisition, conceptualization, methodology, resources, and writing—review and editing.

Corresponding author

Correspondence to Rune Ellefsen .

Ethics declarations

Ethics approval and consent to participate

The project was approved by the Regional Ethical Committee (195733) as well as the local Data Protection Officers in Bergen Health Trust (3061–3061) and Oslo University Hospital (20/27594). All data were handled and stored in line with regulations for health research and sensitive personal data. Written informed consent was obtained from the participants prior to the study procedures.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Ellefsen, R., Wüsthoff, L.E.C. & Arnevik, E.A. Patients’ satisfaction with heroin-assisted treatment: a qualitative study. Harm Reduct J 20 , 73 (2023). https://doi.org/10.1186/s12954-023-00808-8


Received : 18 April 2023

Accepted : 06 June 2023

Published : 13 June 2023

DOI : https://doi.org/10.1186/s12954-023-00808-8


Keywords

  • Diacetylmorphine
  • Patient satisfaction
  • Qualitative research
  • Socio-environmental factors

Harm Reduction Journal

ISSN: 1477-7517

Research articles using qualitative methods

  • Open access
  • Published: 18 March 2024

A mixed methods analysis of the medication review intervention centered around the use of the ‘Systematic Tool to Reduce Inappropriate Prescribing’ Assistant (STRIPA) in Swiss primary care practices

  • Katharina Tabea Jungo 1 , 13 ,
  • Michael J. Deml 2 ,
  • Fabian Schalbetter 1 ,
  • Jeanne Moor 1 , 3 ,
  • Martin Feller 1 ,
  • Renata Vidonscky Lüthold 1 , 12 ,
  • Johanna Alida Corlina Huibers 4 ,
  • Bastiaan Theodoor Gerard Marie Sallevelt 5 ,
  • Michiel C Meulendijk 6 ,
  • Marco Spruit 6 , 7 , 8 ,
  • Matthias Schwenkglenks 9 , 10 , 11 ,
  • Nicolas Rodondi 1 , 3 &
  • Sven Streit 1  

BMC Health Services Research volume  24 , Article number:  350 ( 2024 ) Cite this article



Electronic clinical decision support systems (eCDSS), such as the ‘Systematic Tool to Reduce Inappropriate Prescribing’ Assistant (STRIPA), have become promising tools for assisting general practitioners (GPs) with conducting medication reviews in older adults. Little is known about how GPs perceive eCDSS-assisted recommendations for pharmacotherapy optimization. The aim of this study was to explore the implementation of a medication review intervention centered around STRIPA in the ‘Optimising PharmacoTherapy In the multimorbid elderly in primary CAre’ (OPTICA) trial.

We used an explanatory mixed methods design combining quantitative and qualitative data. First, quantitative data about the acceptance and implementation of eCDSS-generated recommendations from GPs ( n  = 21) and their patients ( n  = 160) in the OPTICA intervention group were collected. Then, semi-structured qualitative interviews were conducted with GPs from the OPTICA intervention group ( n  = 8), and interview data were analyzed through thematic analysis.

In quantitative findings, GPs reported spending an average of 13 min per patient preparing the eCDSS, 10 min performing the medication review, and 5 min discussing prescribing recommendations with the patient. On average, 3.7 recommendations were generated per patient (SD = 1.8), of which one recommendation to stop or start a medication was reported to be implemented per patient in the intervention group (SD = 1.2). Overall, GPs found the STRIPA useful and acceptable. They particularly appreciated its ability to generate recommendations based on large amounts of patient information. During qualitative interviews, GPs reported that the main reasons for limited implementation of STRIPA were related to problems with data sourcing (e.g., incomplete data imports), preparation of the eCDSS (e.g., time expenditure for updating and adapting information), its functionality (e.g., technical problems downloading PDF recommendation reports), and the appropriateness of recommendations.


Qualitative findings help explain the relatively low implementation of recommendations demonstrated by quantitative findings, but also show GPs’ overall acceptance of STRIPA. Our results provide crucial insights for adapting STRIPA to make it more suitable for regular use in future primary care settings (e.g., necessity to improve data imports).

Trial registration

Clinicaltrials.gov NCT03724539, date of first registration: 29/10/2018.


Globally, the proportion of adults with multimorbidity has increased in past decades [ 1 , 2 ]. More than 50% of older adults aged ≥ 65 years have several chronic conditions [ 3 ]. The coexistence of ≥ 2 chronic conditions is commonly referred to as multimorbidity [ 4 ]. Multimorbidity is usually accompanied by polypharmacy, which can be defined as the concurrent, regular intake of ≥ 5 medications [ 5 ]. The higher the number of medications used, the more likely older adults are to have potentially inappropriate polypharmacy, which consists not only of the use of inappropriate medications but also of prescribing omissions [ 6 , 7 , 8 , 9 , 10 ]. The use of potentially inappropriate medications, highly prevalent in older adults with multimorbidity and polypharmacy [ 11 ], is associated with an increased risk of adverse drug events, falls, and cognitive decline in older adults [ 12 , 13 , 14 , 15 , 16 ]. This in turn is associated with increased health services use, such as hospitalizations or emergency department visits, and higher healthcare costs. Hence, optimizing medication use in older adults with multimorbidity and polypharmacy is a crucial task.

However, performing medication reviews is time-consuming and can be challenging, especially in a context in which the time allocated to treating individual patients is short, as is commonly the case in primary care settings, and large amounts of patient information need to be processed (e.g., medications, diagnoses, lab values, patient preferences). Considering the new possibilities available through the digital revolution, electronic clinical decision support systems (eCDSS) can be useful tools for supporting healthcare professionals when performing medication reviews. eCDSS are software-based tools capable of managing large amounts of data and designed to be a direct aid to clinical decision making [ 17 ]. They can match information, such as evidence-based clinical recommendations (e.g., guidelines), with patient information and thereby generate patient-specific recommendations.

One such eCDSS is the ‘Systematic Tool to Reduce Inappropriate Prescribing’ Assistant (STRIPA). It is based on the algorithms of the ‘Screening Tool to Alert doctors to Right Treatment’ (START) and the ‘Screening Tool of Older Person’s Prescriptions’ (STOPP), version 2 [ 18 ]. The STOPP/START criteria are the most widely used and extensively studied explicit screening tools for detecting potentially inappropriate prescribing in older patients in Europe [ 19 , 20 ]. While the STOPP criteria highlight situations of potentially inappropriate medication use (e.g., overprescribing, drug-drug interactions, drug-disease interactions, incorrect dosages), the START criteria indicate potential prescribing omissions. The STRIPA generates patient-specific recommendations based on the STOPP and START criteria by considering medication lists, diagnoses, and selected lab values [ 21 ]. It is thus a promising tool for optimizing pharmacotherapy in older adults and has been tested in two clinical trials to determine whether its use can improve clinical outcomes (the European multicenter hospital-based OPERAM trial in Switzerland, the Netherlands, Belgium and Ireland [ 22 , 23 ], and the OPTICA trial in Swiss primary care settings [ 24 , 25 , 26 ]).
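To make the rule-matching idea concrete, the screening step that STOPP/START-style tools perform can be sketched as follows. This is a deliberately simplified illustration with invented rules; it is not the actual STOPP/START version 2 criteria, nor the STRIPA implementation.

```python
# Toy illustration of rule-based prescribing screening in the spirit of
# STOPP/START (rules simplified for illustration; NOT the real criteria).
patient = {
    "age": 78,
    "diagnoses": {"atrial_fibrillation", "osteoporosis"},
    "medications": {"diazepam"},
}

def screen(p):
    """Match a patient record against a few illustrative screening rules."""
    recs = []
    # STOPP-style rule: flag potentially inappropriate medication use.
    if "diazepam" in p["medications"] and p["age"] >= 65:
        recs.append("STOPP-like: review benzodiazepine use in an older adult")
    # START-style rules: flag potential prescribing omissions.
    if ("atrial_fibrillation" in p["diagnoses"]
            and not {"warfarin", "apixaban"} & p["medications"]):
        recs.append("START-like: consider anticoagulation for atrial fibrillation")
    if "osteoporosis" in p["diagnoses"] and "vitamin_d" not in p["medications"]:
        recs.append("START-like: consider vitamin D supplementation")
    return recs

for rec in screen(patient):
    print(rec)
```

A production eCDSS such as STRIPA additionally incorporates lab values, dosage checks and drug-drug interaction logic, but the core pattern, matching coded patient data against explicit criteria to emit patient-specific recommendations, is the same.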

The use of eCDSS has been shown to be beneficial for certain medication-related outcomes, such as reductions of medication errors, improvements in prescribing quality and decreases in the use of potentially inappropriate medications, which in turn leads to increased medication safety [ 27 , 28 , 29 ]. However, the evidence supporting the use of eCDSS largely focuses on hospital settings and results are mixed for primary care settings [ 30 ]. More specifically, current evidence shows high variability in the effectiveness and implementation of such tools in primary care settings and reports implementation challenges (e.g., time-consuming data entry, alert fatigue) [ 31 , 32 , 33 , 34 ]. Such documented problems related to implementing these tools can be hypothesized to have negatively influenced the impact of their use. Consequently, studying eCDSS implementation in primary care settings is crucial, as this will influence the future development of effective implementation strategies. In this context, the present study aimed to explore the implementation of the medication review intervention centered on the use of the STRIPA during the ‘Optimising PharmacoTherapy In the multimorbid elderly in primary CAre’ (OPTICA) trial conducted in Swiss primary care settings by using an explanatory mixed-methods approach. Our goal was to analyze the number of prescribing recommendations generated and implemented, the time expenditure for performing the intervention, and the key themes emerging from interviewing general practitioners (GPs) about their use of the intervention.

This research was embedded in the OPTICA trial [ 26 ], a cluster randomized controlled trial in Swiss primary care practices conducted by an interdisciplinary and interprofessional team (e.g., GPs, epidemiologists). The main goal of this trial was to investigate whether a structured medication review intervention centered around the use of an eCDSS, namely the ‘Systematic Tool to Reduce Inappropriate Prescribing’ Assistant (STRIPA), helps to improve medication appropriateness and reduce prescribing omissions in older multimorbid adults with polypharmacy, compared to a medication discussion between GPs and patients [ 24 , 25 , 26 ]. The details of the trial protocol and the baseline characteristics of study participants have previously been reported [ 24 , 25 ]. Fig. 1 provides an overview of the different steps of the intervention. In addition to detecting potential overuse, underuse, and misuse of drugs, STRIPA generated prescribing recommendations to prevent drug-drug interactions and inappropriate dosages by combining implicit and explicit tools to improve appropriate prescribing [ 21 ]. The version of the STRIPA used for the OPTICA trial had been adapted for use in primary care settings from the version used in the OPERAM trial conducted in four European countries, in which the medication review intervention was performed during hospitalization [ 22 , 23 , 35 ]. The data on medications, coded diagnoses, laboratory values, and vital signs originating from the electronic health records (EHR) of participating GPs and their patients were imported into the STRIPA by the study team after they were obtained from the ‘Family Medicine ICPC-Research using Electronic Medical Records’ (FIRE) EHR database [ 36 ]. Trial participants were ≥ 65 years old, had ≥ 3 chronic conditions, regularly used ≥ 5 medications and were followed up for 12 months. In the intervention arm, GPs used the STRIPA to perform a medication review and engaged in shared decision-making with patients. Trial results were inconclusive on whether the medication review intervention centered around the use of an eCDSS led to an improvement in medication appropriateness or a reduction in prescribing omissions at 12 months compared to a medication discussion in line with usual care (without medication review). Nevertheless, the intervention was safely delivered without causing any harm to patients and led to the implementation of several prescribing recommendations [ 26 ].

Fig. 1 Schema of the six steps of the OPTICA study intervention using the ‘Systematic Tool to Reduce Inappropriate Prescribing’ (STRIP) assistant. Adapted from Jungo et al. [ 24 ]

Study design

In this sub-study, we used a mixed methods design in which we combined information collected from participating GPs on generated and implemented prescribing recommendations with semi-structured interviews with GPs from the OPTICA intervention group. In an explanatory approach, we first collected quantitative data, which we subsequently sought to further explain and understand through qualitative methods [ 37 ]. We reported the findings of this study according to the CRISP statement [ 38 ].


In both the quantitative and qualitative parts of the research project, the study participants were the GPs who were randomly assigned to the intervention arm of the OPTICA trial ( n  = 21).

Data collection

Quantitative component

Since, during the trial, all GPs from the OPTICA intervention group had access to the medication review intervention centered around STRIPA and were asked to perform it with their recruited patients, we invited all of them to report information on the use of the intervention in the REDCap study database. This covered the numbers of generated and implemented prescribing recommendations, which are relevant outcomes for studying the implementation of a medication review intervention. In addition, GPs had the option of providing free-text responses on why they did not implement any prescribing recommendations. KTJ verified the entries in REDCap and completed them with information available in STRIPA. The following variables were collected for each recommendation generated: name of the recommendation, type of the recommendation, whether the recommendation was presented to the patient, and (if applicable) whether the recommendation was implemented. Furthermore, GPs directly reported the time used to prepare and conduct the medication review as well as the time spent on shared decision-making with the patient. Quantitative data were collected between May 2019 and February 2020.

Qualitative component

We performed semi-structured interviews with a purposive sample of intervention group GPs who had been included in the OPTICA study. Interviews were conducted by FS in Swiss German and transcribed verbatim to High German. The interview guide included questions related to GPs’ attitudes towards treating older adults with multimorbidity and polypharmacy, the conduct of the medication review intervention tested during the OPTICA trial, and GPs’ general attitudes towards the use of eCDSS for optimizing prescribing practices (Appendix 1 in the Supporting Material). Preliminary quantitative data were used to inform the interview guide (e.g., quantitative findings about the implementation of prescribing recommendations and the use of the eCDSS, such as “We saw that it took around 40 minutes to prepare and perform the intervention. How does that compare to your experience during the trial when conducting the intervention?”), so that GPs could provide information on their perspective. Interviews were audio-recorded and transcribed into text for analysis. Interviews were conducted between October 2019 and February 2020.

Data analysis

We described participant baseline characteristics and performed descriptive analyses. We calculated the total number of recommendations generated per study participant in the OPTICA intervention arm. We then calculated the number of recommendations physicians reported having discussed with patients and the number implemented after shared decision-making. In addition, we calculated the average time spent on preparing and conducting medication reviews and the average duration of shared decision-making consultations. Since some variables were non-normally distributed (assessed visually), we present means (standard deviation) as well as medians (interquartile range). We performed all analyses with Stata 15.1 (StataCorp, College Station, TX, USA) [ 39 ].
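The study reports both mean (SD) and median (IQR) because of the skewed distributions. As an illustrative equivalent of the Stata descriptives, the same summaries can be computed with Python's standard library (the counts below are made up, not trial data):

```python
import statistics

# Hypothetical per-patient counts of generated recommendations
# (illustrative only; not the OPTICA trial data).
counts = [2, 3, 3, 4, 5, 1, 6, 3, 2, 8]

mean = statistics.mean(counts)
sd = statistics.stdev(counts)      # sample standard deviation
median = statistics.median(counts)
q1, _, q3 = statistics.quantiles(counts, n=4)  # quartiles; IQR spans q1..q3

print(f"mean = {mean:.1f} (SD {sd:.1f})")
print(f"median = {median:g} (IQR {q1:g}-{q3:g})")
```

Reporting the median and IQR alongside the mean is the appropriate choice here, since a few patients with many recommendations can pull the mean well above the typical value.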

We analyzed the qualitative data with thematic analysis, a commonly used approach for identifying and analyzing patterns in qualitative data [ 40 ]. We used a mix of deductive and inductive coding, with deductive coding allowing us to expand on specific findings from the quantitative results and inductive coding allowing us to interpret any surprising findings we had not expected. Three of the investigators (KTJ, MJD, FS) contributed to the identification of themes. Consensus was reached by discussing the themes that were independently identified. In addition, we used the Framework method by Gale et al. to structure our analyses [ 41 ]. We used the software TamsAnalyzer to code and organize qualitative data into meaningful themes [ 42 ].
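A minimal sketch of the bookkeeping step in such a coding workflow, tallying how often each code is applied across transcripts before grouping codes into candidate themes (the code labels below are hypothetical, not the study's actual codebook; software such as TamsAnalyzer handles this in practice):

```python
from collections import Counter

# Hypothetical coded interview segments. Deductive codes derive from the
# quantitative findings; inductive codes capture unexpected themes.
coded_segments = [
    "time_expenditure", "data_import", "time_expenditure",
    "functionality", "recommendation_suitability", "data_import",
    "time_expenditure",
]

# Tally how often each code was applied across transcripts, a common
# first step when looking for candidate themes.
frequencies = Counter(coded_segments)
for code, n in frequencies.most_common():
    print(code, n)
```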

Baseline characteristics

There were a total of 21 GPs and 160 of their patients in the intervention group. Table  1 provides baseline characteristics of the GPs and patients in the OPTICA intervention group.

Quantitative findings

Table 2 shows the expenditure of time, per patient, for the preparation of the STRIPA and the conduct of the medication review intervention, as well as the duration of the discussion with the patient. The drag-and-drop function to assign drugs to medical conditions in the STRIPA had been used for 133 of the 160 patients in the intervention group, by 20 of the 21 GPs. GPs in the intervention group conducted a mean of 6 medication reviews (median = 7). At least one prescribing recommendation had been generated for 130 of these 133 patients (97.7%). A total of 704 prescribing recommendations had been generated for patients in the intervention group [ 26 ]. For the 133 patients, an average of 3.7 STOPP/START recommendations (SD 1.8, range: 0–11, median = 3, IQR = 2–5) was generated by STRIPA per patient. The mean number of STOPP recommendations generated by STRIPA was 2.3 (SD 1.3, range: 0–7, median = 2, IQR = 1–3) per patient, and the mean number of generated START recommendations was 1.3 (SD 1.2, range: 0–6, median = 1, IQR = 1–2). For 53 patients in the intervention group, 10 of the GPs provided information on the implementation of prescribing recommendations. For 31 of these 53 patients (58.5%), at least one prescribing recommendation was reported to have been implemented. On average, 1 recommendation to stop or start a medication was reported as implemented per patient (SD = 1.2, median = 1, IQR = 0–2). The most common reasons GPs reported for not implementing the prescribing recommendations were: beliefs that current prescriptions were beneficial for patients, recommendations that were not suitable for patients, and bad experiences with previous medication changes.
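As a quick arithmetic check, the proportions quoted in the quantitative results follow directly from the reported counts (a minimal sketch using only figures from the text; variable names are ours):

```python
# Counts reported in the quantitative results.
patients_screened = 133            # patients for whom the drag-and-drop function was used
patients_with_any_generated = 130  # >= 1 recommendation generated
patients_with_impl_data = 53       # patients with implementation data reported
patients_with_any_implemented = 31 # >= 1 recommendation implemented

generation_rate = patients_with_any_generated / patients_screened
implementation_rate = patients_with_any_implemented / patients_with_impl_data

print(f"generated for {generation_rate:.1%} of patients")       # 97.7%
print(f"implemented for {implementation_rate:.1%} of patients")  # 58.5%
```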

Qualitative findings

Overall, semi-structured interviews were conducted with 8 of the 21 GPs randomized to the intervention group. The qualitative results allowed us to focus more specifically on GPs’ perspectives on, and experiences with, STRIPA and to support our understanding of the limited implementation documented in the quantitative findings (e.g., significant time expenditure and limited implementation of prescribing recommendations). GPs generally appreciated that the STRIPA was able to manage a large amount of data and to generate different types of prescribing recommendations, such as discontinuing or initiating medications. Despite this general appreciation, we identified the following themes as barriers to GPs’ use of STRIPA: the length of time needed to prepare the STRIPA; problems with data sources and poor data quality; sub-optimal functionality; limited practicability of recommendations; and problems related to the implementation of recommendations.


Most GPs mentioned that the coding of diagnoses (to ICPC-2) in their EHR systems was a time-consuming and cumbersome task because most did not routinely use it prior to the beginning of the trial. GPs found the expenditure of time to prepare the STRIPA, including the coding of diagnoses, too high. For instance, one GP (male, 57 years) stated, “I was a little overwhelmed by the administrative burden”. It also became clear that the lengthy time expenditure involved in preparing the STRIPA would be a limiting factor for the tool’s future use: “if time expenditure remains that high, the STRIPA has no chance of being used in clinical practice” (GP, male, 44 years). It was also stated that this long preparation time would not have made it possible for GPs to use the tool during consultations with patients present.

Data import

Another major theme involved the sub-optimal completeness of data imported from EHR systems to the web-based STRIPA, which created additional work for GPs. Problems with data imports were multifaceted. First, not all information needed for STRIPA use was systematically captured in EHR systems and fully exported to the FIRE project database. For instance, this concerned unstructured information in text fields and lab values for which the FIRE team had not yet standardized imports into their database. Second, there was a time lag of up to a couple of weeks because, as explained above, data were transferred via data exports from the physicians’ EHR systems to the FIRE project database and then back to the STRIPA. This required data to be updated and verified once they were in the STRIPA. Overall, GPs expressed that this time-consuming data updating and correcting was a limiting factor for future use of the STRIPA: “I had to capture quite a lot of information by hand, and that is of course terribly tedious and time-consuming and thus not suitable for daily practice” (GP, male, 44 years Footnote 1 ). Some GPs mentioned that they would have appreciated an automated data transfer from the EHR system used in their GP office to the STRIPA, as this would have facilitated their use of the tool.

Functions and features

Overall, GPs reported being satisfied with the functions and features of the STRIPA. For instance, GPs appreciated STRIPA’s ability to incorporate a wide variety of values into the analysis (i.e., different lab values, medication lists, diagnoses, vital signs), which they would not have been able to do manually. Further, GPs described how they appreciated the varied types of prescribing recommendations, since this highlighted different types of prescribing-related problems. However, not all GPs thought the tool was intuitive to use. Further, some GPs reported technical problems when using the tool (e.g., long buffering when loading a new page or the next step of the analyses, problems with downloading PDF reports). GPs also noted a learning effect (e.g., after getting to know the tool, GPs were able to perform subsequent reviews faster).

GPs’ perceptions of the suitability and practicability of recommendations

GPs reported being satisfied with the overall quality of recommendations. However, GPs emphasized that recommendations were not always suitable, practicable or clinically relevant. First, due to the above-mentioned problems with data imports, recommendations were sometimes not applicable to patients. For example, there may have been valid reasons why certain medications were prescribed at certain doses, and these reasons were not captured in the STRIPA. Second, recommendations were sometimes not suitable because of their seasonality (e.g., the influenza vaccine: most GPs used the STRIPA in spring 2019, which did not correspond to the influenza vaccination season). Furthermore, GPs usually did not add the influenza vaccine to the regular medications listed in their EHR systems, which is why the recommendation to vaccinate appeared irrespective of whether the patient had been vaccinated the previous fall. Third, in some cases, the STRIPA could not use all the information provided (e.g., it did not capture that some medications had several active ingredients). In some instances, GPs reported not implementing certain recommendations because they did not believe these recommendations would change patients’ health status or well-being.

Further, some recommendations were perceived as too basic and therefore not useful for experienced GPs. One GP put it like this: “Some of the information provided is not necessary for an experienced general practitioner” (GP, male, 44 years). In some instances, the STRIPA generated prescribing recommendations that were already known to the GPs but had deliberately not been implemented for specific reasons, such as patient preferences. Another GP explicitly stated that he had wished for more “courageous” recommendations, which would have gone beyond the “evident” recommendations and challenged his previous prescribing decisions. GPs, however, also emphasized how the generation of only a few recommendations for some patients confirmed their prescribing decisions and their work as physicians: “I was happy that the medication was not questioned in general. Otherwise, I would have had to doubt the quality of my work” (GP, male, 44 years). The recommendations, or rather the lack thereof, were perceived as a confirmation of quality work by some GPs.

Implementation of prescribing recommendations

The implementation of prescribing recommendations generated by the STRIPA was one of the themes discussed during the interviews. In general, GPs confirmed the relatively low implementation rate, with only a fraction of recommendations being implemented, which is in line with the quantitative findings from our first step. However, interviews showed differences between GPs in terms of how many recommendations they reported having implemented. Because the STRIPA sometimes did not capture all nuances of patient health status, GPs often had valid reasons to reject generated recommendations. Consequently, only a small percentage of recommendations was presented to and discussed with patients. One GP, however, also told us that while he was not able to implement many recommendations directly, seeing them in the tool helped him become aware of potential prescribing problems. With regard to the implementation of recommendations that they deemed feasible, some GPs reported challenges with respect to presenting them to patients. One GP expressed it like this: “You have to be careful not to make yourself ‘lower’ than you are as a doctor. You should radiate a certain competence and not give the impression ‘I need a computer to help me treat you.’ Otherwise, it’ll be too complicated” (male, 44 years).

Finally, the overall impression of GPs was that the STRIPA was a potentially useful tool, but that its functionality was not ideal for regular use in clinical practice. For instance, a GP (male, 57 years) said, “The STRIPA is actually very useful, even in the way in which it works right now, but it is too complex for everyday use.” Another GP (male, 44 years) echoed this sentiment, “If the STRIPA wants to get a chance, it has to run a lot smarter,” meaning that data entry should be fully automated. Overall, while some GPs stated that their expectations were met, others stated that they were disappointed by the tool.

This mixed-methods study set out to explore the conduct of a medication review intervention centered around the use of the STRIPA in a real-life clinical setting during the OPTICA trial, a cluster-randomized controlled trial conducted in Swiss primary care settings. Our quantitative findings show that the expenditure of time for the preparation and use of the STRIPA as well as for the discussion of the recommendations generated was substantial, which may have limited the overall implementation of the intervention. Further, a small percentage of recommendations generated by the tool were presented to patients and implemented. The qualitative part of the study helped to explain the quantitative findings and showed that the main reasons for limited implementation of the STRIPA were related to problems with the data source, preparation of the eCDSS and its functionality, as well as the practicability of generated prescribing recommendations.

Time factor

Both our quantitative and qualitative findings showed that substantial time expenditures were required to prepare the STRIPA, to run analyses and to discuss recommendations with patients. This finding is in line with the results of a process evaluation of a deprescribing intervention based on an eCDSS, in which GPs mainly reported that retrieving additional information for the use of the tool was time-consuming and inconvenient [ 32 ]. A previous study on the efficiency of medication reviews performed with the STRIPA showed that the time expenditure declined as professionals gained more experience (e.g., from around 20 to around 10 min per review) [ 43 ]. Unfortunately, we do not have data to compare the time needed for medication reviews based on the STRIPA with that of other medication reviews performed by the same GPs in our sample.

Data handling

Another major implementation challenge that we observed involved problems with data imports and the cumbersome nature of manual data entry, which was partly needed to add or update missing or incorrect information. In the OPTICA trial, the purpose of using data from electronic health records was to facilitate data entry for GPs. Despite this, most GPs reported that they had to spend a relatively large amount of time manually updating and adding information, as shown by the quantitative data (e.g., coding diagnoses, updating medication lists due to frequent changes in older multimorbid patients). In most cases, this was due to time lags since the latest exports to the FIRE project database, which may have rendered an update necessary. There were also issues because not all data from the physicians' electronic health record systems could be exported to FIRE (e.g., unstructured text information or certain lab values collected with different measurement units in different reference laboratories) and because different EHR systems exported data differently (e.g., reporting of medications and diagnoses at every encounter vs. reporting only when changes are made in the record). Some GPs criticized "missing information" in the data that had been imported into the STRIPA from their electronic health record systems via the FIRE project database. This may have resulted from GPs not knowing how data exported to the FIRE project were structured (i.e., that they were limited to selected values, or that data had to be present in the EHR system for a certain amount of time before inclusion in an export, which is why last-minute updates before an export may not have been captured).

A further key finding, shown by the quantitative data and explained by the qualitative data, was the relatively low implementation rate of recommendations generated by the tool. These findings are similar to previous ones from trials testing an eCDSS based on the STOPP/START criteria in hospital settings [ 23 , 44 , 45 ], one of which showed that 15% of all prescribing recommendations were implemented, while another showed that 62% of patients had ≥ 1 recommendation successfully implemented 2 months post-recommendation. Additionally, previous research on the usability of eCDSS-assisted deprescribing found that 32% of GPs reported not having implemented any recommendation [ 33 ]. Interestingly, there seemed to be wide variability between different GPs in previous studies. For instance, researchers found that while some GPs implemented nearly all generated recommendations, others implemented few or none [ 32 ]. While there is limited data about this in our study due to the small sample size, our findings suggest variability between GPs with regard to the implementation of prescribing recommendations (with the mean number of recommendations implemented per GP ranging from 0.3 to 2.3). Furthermore, previous research has shown that more experienced healthcare professionals were more likely to disregard and reject recommendations [ 46 ]. Of note, a low implementation rate is not necessarily bad; GPs may have had valid reasons for not implementing recommendations (e.g., a recommendation not being appropriate for the patient), and it is not expected that every single prescribing recommendation be implemented. A critical review by clinicians of the prescribing recommendations generated by eCDSS is always required, as these tools can support clinicians but not replace their clinical judgment.
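The per-GP variability described above can be made concrete with a minimal sketch: for each GP, divide implemented recommendations by generated ones and compare the rates. All counts below are hypothetical, for illustration only, and are not OPTICA trial data.

```python
# Illustrative sketch of per-GP variability in implementing eCDSS prescribing
# recommendations (all numbers are hypothetical, not OPTICA trial data).
from statistics import mean

# hypothetical counts of recommendations generated vs. implemented per GP
gps = {
    "GP1": {"generated": 12, "implemented": 1},
    "GP2": {"generated": 9,  "implemented": 7},
    "GP3": {"generated": 15, "implemented": 0},
}

# implementation rate per GP, and the mean rate across GPs
rates = {gp: d["implemented"] / d["generated"] for gp, d in gps.items()}
overall = mean(rates.values())
print({gp: round(r, 2) for gp, r in rates.items()})
print(round(overall, 2))
```

Even in this toy example, individual rates range from 0 to roughly 0.8, mirroring the wide between-GP spread reported in previous studies.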

The reasons for implementation problems reported in the literature were similar to what we found in our qualitative analysis [ 32 , 33 ]. First, the eCDSS did not capture all relevant patient-specific information, which is why some recommendations were not appropriate. This aligns with findings from the OPERAM trial, which tested the STRIPA in hospital settings across four European countries and in which the medication review intervention was performed during hospitalization [ 45 ]. Second, there were difficulties in implementing recommendations when prescribing decisions had been made by other medical specialists. Third, GPs' or patients' hesitancy toward medication changes was a major barrier to implementing recommendations. This is also reflected in the findings of the OPERAM trial, in which the main reason for not implementing a recommendation was patients' reluctance to change their medication use [ 45 , 47 ]. These challenges need to be considered when further developing eCDSS. Despite the potentially low immediate implementation of recommendations, research shows that eCDSS can be useful tools for starting reflections and discussions about patients' medication use [ 48 ]. Hence, eCDSS-based interventions can positively influence GPs' prescribing behaviors, as GPs have reported an increased awareness of prescribing problems after using a CDSS [ 33 ].

Even though some GPs reported a learning effect when performing medication reviews using the STRIPA, we retrospectively assume that an average of 6 medication reviews may not have been enough to benefit from this learning effect. Performing such a small number of medication reviews with the STRIPA may not have allowed GPs to incorporate the tool into their workflow in an efficient manner. Fragmented workflows are a commonly reported problem linked to the use of eCDSS, as these tools are often designed without considering human information processing and behavior [ 46 ]. While providing assistance to participating GPs during the study intervention, our study team noticed that computer literacy differed between participating GPs. We assume that this influenced STRIPA use during the trial. Consequently, better integrating the STRIPA into GPs' routine clinical practice and adapting it to the computer literacy levels of individual GPs may be crucial for successful implementation of eCDSS in primary care settings.

Willingness to use eCDSS

Our findings showed that, overall, GPs would be willing to use eCDSS such as the STRIPA for medication reviews if the above-mentioned issues were addressed. This openness to using eCDSS is reflected in previous research [ 32 ]. In one study, 65% of respondents mentioned that they would be willing to use an eCDSS in routine practice if it were integrated into their EHR system [ 33 ]. In addition, data entry would have to be minimal so that the additional time required to use the tool is as short as possible. Further, the algorithms behind eCDSS must be updated regularly (e.g., with the latest guidelines) [ 48 ]. Finally, our research clearly shows that simply providing new eCDSS to GPs is not sufficient and does not automatically translate into implementation of prescribing recommendations. GPs need to be supported with communication strategies for shared decision-making with patients and strategies for overcoming their own barriers to addressing inappropriate prescribing.

Overall, the qualitative findings suggest that GPs were dissatisfied with recurring problems when using the STRIPA (e.g., problems with data entry, generation of recommendations that GPs did not deem useful). Consequently, apart from solving technical issues and improving data imports, it will be crucial to present recommendations in a way that GPs perceive as useful. This is crucial because, rather than spending their energy discarding recommendations they do not find useful, GPs should be able to focus on recommendations that can usefully inform prescribing decisions for older adults with multimorbidity and polypharmacy.

Need for interoperable electronic health record systems in Swiss primary care settings

Direct, fully automated imports from the physicians' EHR systems into the STRIPA would not have been technically feasible due to the multiple different EHR software providers used in the German-speaking region of Switzerland. It thus made sense to collaborate with the FIRE project, as this was the best available option for operationalizing EHR data for a clinical trial with an eCDSS in Switzerland. This mixed-methods study, however, shows this approach's limitations. This should be a wake-up call for Swiss software developers to adopt industry standards that make different EHR systems compatible with one another (e.g., feeding data from one software into another, combining data from different software). In the future, this would allow easier use of eCDSS such as the STRIPA. In addition, efforts should be made to make the coding of ICPC-2 diagnoses more common in Swiss primary care settings. At the moment, diagnostic coding is not commonly done in routine care, which affects the feasibility of implementing tools like the STRIPA.

Increasingly digitalized healthcare systems and readily available health data will allow the widespread use of eCDSS in the future. However, digitalization alone will not provide a sufficient basis for eCDSS to be used efficiently. Clinical practice and research must address the shortcomings identified in our research and in previous studies. In particular, approaches need to be developed to better integrate eCDSS into clinical workflows in primary care settings. Furthermore, EHR systems must become more interoperable so that eCDSS can draw reliably on data from different sources. If these challenges are successfully addressed, eCDSS can become useful tools for supporting physicians in primary care settings in optimizing prescribing practices.

Strengths & limitations

The combined analysis of quantitative and qualitative data allowed for better data triangulation and strengthened our findings. However, this mixed-methods study has several limitations. First, since there were problems generating PDF reports at the end of STRIPA use, we had to retrospectively collect information on the prescribing recommendations by manually exporting them from the STRIPA. This came with the downside that we could only see which recommendations were generated, but not which ones had been accepted by GPs, which is why we had to rely on self-reported information from GPs regarding their acceptance of prescribing recommendations. Second, despite sending multiple reminders to GPs, we were faced with a small sample size and a significant amount of missing quantitative data, as only 7 out of 21 GPs reported information about implementing prescribing recommendations, and only 8 out of 21 GPs agreed to be interviewed. Further, the sample mostly consisted of male GPs, which, in addition to the small sample size, could have limited the generalizability of our findings. Next, we acknowledge that the GPs who agreed to participate in the OPTICA trial and the qualitative interviews were likely not representative of all GPs practicing in Swiss primary care settings. Finally, we did not consider patient perspectives on the conduct of the medication review intervention, which represents an important opportunity for future studies.

Overall, GPs found the STRIPA useful, particularly due to its ability to generate recommendations based on large amounts of data. During the OPTICA trial, however, general practitioners only discussed and implemented a fraction of the recommendations generated by the STRIPA. Issues related to the STRIPA’s usability, general practitioners’ high expectations about the tool’s functionalities, data management, and time expenditure involved with preparing the STRIPA for analysis were important barriers described during semi-structured interviews. The qualitative findings help explain the low acceptance and implementation rate of the recommendations. Due to a learning effect, a decline in the expenditure of time needed to perform medication reviews with the STRIPA would be expected if GPs continued to use this tool more regularly and with more patients. In its current form, it is unlikely that the STRIPA will be implemented more broadly. Our results, however, are crucial for designing and adapting eCDSS like STRIPA in a meaningful way to make them more feasible and acceptable to providers and more suitable for regular use in primary care settings on a larger scale, as this will become increasingly possible in the context of digitalized healthcare systems.

Data availability

We will make the data for this study available to other researchers upon request. The data will be made available for scientific research purposes, after the proposed analysis plan has been approved. Data and documentation will be made available through a secure file exchange platform after approval of the proposal. In addition, a data transfer agreement must be signed (which defines obligations that the data requester must adhere to regarding privacy and data handling). Deidentified participant data limited to the data used for the proposed project will be made available, along with a data dictionary and annotated case report forms. For data access, please contact the corresponding author.

Several GPs were male and 44 years old at the time of the interview.

Roig JJ, Souza D, Oliveras-Fabregas A, Minobes-Molina E, Cancela MdC, Galbany-Estragués P. Trends of multimorbidity in 15 European countries: a population-based study in community-dwelling adults aged 50 and over. Research Square; 2020.

Chowdhury SR, Chandra Das D, Sunna TC, Beyene J, Hossain A. Global and regional prevalence of multimorbidity in the adult population in community settings: a systematic review and meta-analysis. EClinicalMedicine. 2023;57:101860.


Marengoni A, Angleman S, Melis R, Mangialasche F, Karp A, Garmen A, et al. Aging with multimorbidity: a systematic review of the literature. Ageing Res Rev. 2011;10(4):430–9.


Johnston MC, Crilly M, Black C, Prescott GJ, Mercer SW. Defining and measuring multimorbidity: a systematic review of systematic reviews. Eur J Public Health. 2019;29(1):182–9.

Masnoon N, Shakib S, Kalisch-Ellett L, Caughey GE. What is polypharmacy? A systematic review of definitions. BMC Geriatr. 2017;17(1):230.

Bazargan M, Smith JL, King EO. Potentially inappropriate medication use among hypertensive older African-American adults. BMC Geriatr. 2018;18(1):238.

Simões PA, Santiago LM, Maurício K, Simões JA. Prevalence of potentially inappropriate medication in the older Adult Population within Primary Care in Portugal: a nationwide cross-sectional study. Patient Prefer Adherence. 2019;13:1569–76.

Roux B, Sirois C, Simard M, Gagnon ME, Laroche ML. Potentially inappropriate medications in older adults: a population-based cohort study. Fam Pract. 2020;37(2):173–9.


Nothelle SK, Sharma R, Oakes A, Jackson M, Segal JB. Factors associated with potentially inappropriate medication use in community-dwelling older adults in the United States: a systematic review. Int J Pharm Pract. 2019;27(5):408–23.

Kuijpers MA, van Marum RJ, Egberts AC, Jansen PA. Relationship between polypharmacy and underprescribing. Br J Clin Pharmacol. 2008;65(1):130–3.

Jungo KT, Streit S, Lauffenburger JC. Utilization and Spending on Potentially Inappropriate Medications by US Older Adults with Multiple Chronic Conditions using Multiple Medications. Arch Gerontol Geriatr. 2021;93:104326. https://doi.org/10.1016/j.archger.2020.104326 .

Xing XX, Zhu C, Liang HY, Wang K, Chu YQ, Zhao LB, et al. Associations between potentially inappropriate medications and adverse Health outcomes in the Elderly: a systematic review and Meta-analysis. Ann Pharmacother. 2019;53(10):1005–19.

Masumoto S, Sato M, Maeno T, Ichinohe Y, Maeno T. Potentially inappropriate medications with polypharmacy increase the risk of falls in older Japanese patients: 1-year prospective cohort study. Geriatr Gerontol Int. 2018;18(7):1064–70.

Koyama A, Steinman M, Ensrud K, Hillier TA, Yaffe K. Long-term cognitive and functional effects of potentially inappropriate medications in older women. The journals of gerontology Series A, Biological sciences and medical sciences. 2014;69(4):423–9.

Liew TM, Lee CS, Goh Shawn KL, Chang ZY. Potentially inappropriate prescribing among older persons: a Meta-analysis of Observational studies. Annals Family Med. 2019;17(3):257–66.


Fabbietti P, Ruggiero C, Sganga F, Fusco S, Mammarella F, Barbini N, et al. Effects of hyperpolypharmacy and potentially inappropriate medications (PIMs) on functional decline in older patients discharged from acute care hospitals. Arch Gerontol Geriatr. 2018;77:158–62.

Hernandez G, Garin O, Dima AL, Pont A, Martí Pastor M, Alonso J, et al. EuroQol (EQ-5D-5L) validity in assessing the quality of life in adults with Asthma: cross-sectional study. J Med Internet Res. 2019;21(1):e10178.

Huibers CJA, Sallevelt BTGM, de Groot DA, Boer MJ, van Campen JPCM, Davids CJ, et al. Conversion of STOPP/START version 2 into coded algorithms for software implementation: a multidisciplinary consensus procedure. Int J Med Informatics. 2019;125:110–7.

O’Mahony D, O’Sullivan D, Byrne S, O’Connor MN, Ryan C, Gallagher P. STOPP/START criteria for potentially inappropriate prescribing in older people: version 2. Age Ageing. 2015;44(2):213–8.

Alshammari H, Al-Saeed E, Ahmed Z, Aslanpour Z. Reviewing potentially inappropriate medication in hospitalized patients over 65 using Explicit Criteria: a systematic literature review. Drug Healthc Patient Saf. 2021;13:183–210.

Drenth-van Maanen AC, Leendertse AJ, Jansen PAF, Knol W, Keijsers C, Meulendijk MC, et al. The systematic Tool to reduce Inappropriate Prescribing (STRIP): combining implicit and explicit prescribing tools to improve appropriate prescribing. J Eval Clin Pract. 2018;24(2):317–22.

Adam L, Moutzouri E, Baumgartner C, Loewe AL, Feller M, M’Rabet-Bensalah K, et al. Rationale and design of OPtimising thERapy to prevent avoidable hospital admissions in Multimorbid older people (OPERAM): a cluster randomised controlled trial. BMJ Open. 2019;9(6):e026769.


Blum MR, Sallevelt BTGM, Spinewine A, O’Mahony D, Moutzouri E, Feller M, et al. Optimizing therapy to prevent Avoidable Hospital admissions in Multimorbid older adults (OPERAM): cluster randomised controlled trial. BMJ. 2021;374:n1585.

Jungo KT, Rozsnyai Z, Mantelli S, Floriani C, Löwe AL, Lindemann F, et al. Optimising PharmacoTherapy in the multimorbid elderly in primary CAre’ (OPTICA) to improve medication appropriateness: study protocol of a cluster randomised controlled trial. BMJ open. 2019;9(9):e031080.

Jungo KT, Meier R, Valeri F, Schwab N, Schneider C, Reeve E, et al. Baseline characteristics and comparability of older multimorbid patients with polypharmacy and general practitioners participating in a randomized controlled primary care trial. BMC Fam Pract. 2021;22(1):123.

Jungo KT, Ansorg AK, Floriani C, Rozsnyai Z, Schwab N, Meier R, et al. Optimising prescribing in older adults with multimorbidity and polypharmacy in primary care (OPTICA): cluster randomised clinical trial. BMJ. 2023;381:e074054.

Jia P, Zhang L, Chen J, Zhao P, Zhang M. The effects of clinical decision support systems on Medication Safety: an overview. PLoS ONE. 2016;11(12):e0167683–e.

Reis WC, Bonetti AF, Bottacin WE, Reis AS Jr., Souza TT, Pontarolo R, et al. Impact on process results of clinical decision support systems (CDSSs) applied to medication use: overview of systematic reviews. Pharm Pract. 2017;15(4):1036.


Monteiro L, Maricoto T, Solha I, Ribeiro-Vaz I, Martins C, Monteiro-Soares M. Reducing potentially inappropriate prescriptions for older patients using computerized decision support tools: systematic review. J Med Internet Res. 2019;21(11):e15385.

Scott IA, Pillans PI, Barras M, Morris C. Using EMR-enabled computerized decision support systems to reduce prescribing of potentially inappropriate medications: a narrative review. Therapeutic Adv drug Saf. 2018;9(9):559–73.

Bryan C, Boren SA. The use and effectiveness of electronic clinical decision support tools in the ambulatory/primary care setting: a systematic review of the literature. Inform Prim Care. 2008;16(2):79–91.

Rieckert A, Sommerauer C, Krumeich A, Sönnichsen A. Reduction of inappropriate medication in older populations by electronic decision support (the PRIMA-eDS study): a qualitative study of practical implementation in primary care. BMC Fam Pract. 2018;19(1):110.

Rieckert A, Teichmann AL, Drewelow E, Kriechmayr C, Piccoliori G, Woodham A, et al. Reduction of inappropriate medication in older populations by electronic decision support (the PRIMA-eDS project): a survey of general practitioners’ experiences. J Am Med Inf Association: JAMIA. 2019;26(11):1323–32.

Bell H, Garfield S, Khosla S, Patel C, Franklin BD. Mixed methods study of medication-related decision support alerts experienced during electronic prescribing for inpatients at an English hospital. Eur J Hosp Pharmacy: Sci Pract. 2019;26(6):318–22.

Crowley EK, Sallevelt B, Huibers CJA, Murphy KD, Spruit M, Shen Z, et al. Intervention protocol: OPtimising thERapy to prevent avoidable hospital admission in the multi-morbid elderly (OPERAM): a structured medication review with support of a computerised decision support system. BMC Health Serv Res. 2020;20(1):220.

Chmiel C, Bhend H, Senn O, Zoller M, Rosemann T. The FIRE project: a milestone for research in primary care in Switzerland. Swiss Med Wkly. 2011;140:w13142.

Creswell J, Plano Clark V. Designing and conducting mixed methods research. Los Angeles: SAGE; 2011.

Phillips WR, Sturgiss E, Glasziou P, Olde Hartman TC, Orkin AM, Prathivadi P, et al. Improving the reporting of primary care research: Consensus Reporting Items for Studies in Primary Care (the CRISP statement). Ann Fam Med. 2023:3029.

StataCorp. Stata Statistical Software: Release 17. College Station, TX: StataCorp LLC; 2021.

Braun V, Clarke V. Using thematic analysis in psychology. Qualitative Res Psychol. 2006;3(2):77–101.

Gale NK, Heath G, Cameron E, Rashid S, Redwood S. Using the framework method for the analysis of qualitative data in multi-disciplinary health research. BMC Med Res Methodol. 2013;13(1):117.

Weinstein M. TAMS Analyzer 4.0 [Computer software]; 2010. Available from: https://tamsys.sourceforge.io/osxtams/docs/basic/TA%20User%20Guide.pdf

Meulendijk MC, Spruit MR, Willeboordse F, Numans ME, Brinkkemper S, Knol W, et al. Efficiency of clinical decision support systems improves with experience. J Med Syst. 2016;40(4):76.

O’Mahony D, Gudmundsson A, Soiza RL, Petrovic M, Jose Cruz-Jentoft A, Cherubini A, et al. Prevention of adverse drug reactions in hospitalized older patients with multi-morbidity and polypharmacy: the SENATOR* randomized controlled clinical trial. Age Ageing. 2020;49(4):605–14.

Sallevelt BTGM, Huibers CJA, Heij JMJO, Egberts TCG, van Puijenbroek EP, Shen Z, et al. Frequency and Acceptance of clinical decision support system-generated STOPP/START signals for hospitalised older patients with polypharmacy and Multimorbidity. Drugs Aging. 2022;39(1):59–73.

Sutton RT, Pincock D, Baumgart DC, Sadowski DC, Fedorak RN, Kroeker KI. An overview of clinical decision support systems: benefits, risks, and strategies for success. Npj Digit Med. 2020;3(1):17.

Huibers CJA, Sallevelt B, Heij J, O’Mahony D, Rodondi N, Dalleur O, et al. Hospital physicians’ and older patients’ agreement with individualised STOPP/START-based medication optimisation recommendations in a clinical trial setting. Eur Geriatr Med. 2022;13(3):541–52.


Peiris DP, Joshi R, Webster RJ, Groenestein P, Usherwood TP, Heeley E, et al. An electronic clinical decision support tool to assist primary care providers in cardiovascular disease risk management: development and mixed methods evaluation. J Med Internet Res. 2009;11(4):e51.

Acknowledgements
The authors would like to thank the general practitioners participating in the OPTICA trial for participating in this research, in particular those who were in the intervention group and provided the information for this implementation evaluation. Thanks go to the CTU Bern for their support in conducting the OPTICA trial. KTJ is funded by a Postdoc.Mobility Fellowship from the Swiss National Science Foundation (P500PM_206728). KTJ was a member of the Junior Investigator Intensive Program of the US Deprescribing Research Network, which is funded by the National Institute on Aging (R24AG064025).

Funding

This work was funded by the Swiss National Science Foundation, within the framework of the National Research Programme 74 (NRP74) under contract number 407440_167465 (to SS and NR).

Author information

Authors and affiliations

Institute of Primary Health Care (BIHAM), University of Bern, Bern, Switzerland

Katharina Tabea Jungo, Fabian Schalbetter, Jeanne Moor, Martin Feller, Renata Vidonscky Lüthold, Nicolas Rodondi & Sven Streit

Institute of Sociological Research, University of Geneva, Geneva, Switzerland

Michael J. Deml

Department of General Internal Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland

Jeanne Moor & Nicolas Rodondi

Geriatrics, Department of Geriatric Medicine, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands

Johanna Alida Corlina Huibers

Department of Clinical Pharmacy, University Medical Center Utrecht, Utrecht, Utrecht, The Netherlands

Bastiaan Theodoor Gerard Marie Sallevelt

Public Health and Primary Care (PHEG), Leiden University Medical Center, Leiden University, Leiden, Netherlands

Michiel C Meulendijk & Marco Spruit

Leiden Institute of Advanced Computer Science (LIACS), Faculty of Science, Leiden University, Leiden, Netherlands

Marco Spruit

Department of Information and Computing Sciences, Utrecht University, Utrecht, Netherlands

Health Economics Facility, Department of Public Health, University of Basel, Basel, Switzerland

Matthias Schwenkglenks

Institute of Pharmaceutical Medicine (ECPM), University of Basel, Basel, Switzerland

Epidemiology, Biostatistics and Prevention Institute (EBPI), University of Zurich, Zurich, Switzerland

Graduate School for Health Sciences, University of Bern, Bern, Switzerland

Renata Vidonscky Lüthold

Center for Healthcare Delivery Sciences (C4HDS), Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, United States

Katharina Tabea Jungo


Contributions

KTJ, MJD, and SS designed the mixed-methods implementation study. KTJ and FS acquired the qualitative data. KTJ, FS, and MJD analyzed the qualitative data. KTJ, MSp, MSch, NR, SS acquired the quantitative data. KTJ analyzed the quantitative data. KTJ drafted the first draft of the manuscript with help from MJD and SS. All authors (KTJ, MJD, FS, JM, MF, RL, CJAH, BTGMS, MCM, MSp, MSch, NR, SS) reviewed and edited the manuscript and approved the final version.

Corresponding author

Correspondence to Katharina Tabea Jungo .

Ethics declarations

Ethical approval and consent to participate

The ethics committee of the canton of Bern (Switzerland) and the Swiss regulatory authority (Swissmedic) approved the study protocol of the OPTICA trial (BASEC ID: 2018–00914), including the conduct of this mixed-methods evaluation. All study participants provided informed consent to participate in the trial. All methods were performed in accordance with the relevant guidelines and regulations (e.g., the Declaration of Helsinki).

Consent for publication

Not required.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Jungo, K.T., Deml, M.J., Schalbetter, F. et al. A mixed methods analysis of the medication review intervention centered around the use of the ‘Systematic Tool to Reduce Inappropriate Prescribing’ Assistant (STRIPA) in Swiss primary care practices. BMC Health Serv Res 24 , 350 (2024). https://doi.org/10.1186/s12913-024-10773-y


Received : 15 August 2023

Accepted : 23 February 2024

Published : 18 March 2024

DOI : https://doi.org/10.1186/s12913-024-10773-y


Keywords

  • Multimorbidity
  • Polypharmacy
  • Primary care
  • Medication optimization
  • Electronic clinical decision support system
  • Mixed methods research




Published on 18.3.2024 in Vol 26 (2024)

Outcomes and Costs of the Transition From a Paper-Based Immunization System to a Digital Immunization System in Vietnam: Mixed Methods Study

Authors of this article:


Original Paper

  • Thi Thanh Huyen Dang 1 , MD, PhD   ; 
  • Emily Carnahan 2 , MPH   ; 
  • Linh Nguyen 3 , MPH   ; 
  • Mercy Mvundura 2 , PhD   ; 
  • Sang Dao 3 , MPH   ; 
  • Thi Hong Duong 1 , MD, PhD   ; 
  • Trung Nguyen 1 , MPH   ; 
  • Doan Nguyen 1 , MD   ; 
  • Tu Nguyen 3 , MSc   ; 
  • Laurie Werner 2 , MPA   ; 
  • Tove K Ryman 4 , MPH, PhD   ; 
  • Nga Nguyen 3 , MD, PhD  

1 National Expanded Program on Immunization, National Institute of Hygiene and Epidemiology, Hanoi, Vietnam

2 PATH, Seattle, WA, United States

3 PATH, Hanoi, Vietnam

4 Bill & Melinda Gates Foundation, Seattle, WA, United States

Corresponding Author:

Nga Nguyen, MD, PhD

1101, 11th floor, Hanoi Towers

49 Hai Ba Trung Street

Hanoi, 100000

Phone: 84 243936221 ext 130

Email: [email protected]

Background: The electronic National Immunization Information System (NIIS) was introduced nationwide in Vietnam in 2017. Health workers were expected to use the NIIS alongside the legacy paper-based system. Starting in 2018, Hanoi and Son La provinces transitioned to paperless reporting. Interventions to support this transition included data guidelines and training, internet-based data review meetings, and additional supportive supervision visits.

Objective: This study aims to assess (1) changes in NIIS data quality and use, (2) changes in immunization program outcomes, and (3) the economic costs of using the NIIS versus the traditional paper system.

Methods: This mixed methods study took place in Hanoi and Son La provinces. It analyzed pre- and postintervention data from various sources, including the NIIS, household and health facility surveys, and interviews, to measure NIIS data quality, data use, and immunization program outcomes. Financial data were collected at the national, provincial, district, and health facility levels through record review and interviews. An activity-based costing approach was conducted from a health system perspective.

Results: NIIS data timeliness significantly improved from pre- to postintervention in both provinces. For example, the mean number of days from birth date to NIIS registration before and after intervention dropped from 18.6 (SD 65.5) to 5.7 (SD 31.4) days in Hanoi ( P <.001) and from 36.1 (SD 94.2) to 11.7 (SD 40.1) days in Son La ( P <.001). Data from Son La showed that completeness and accuracy improved, while Hanoi exhibited mixed results, possibly influenced by the COVID-19 pandemic. Data use improved: at postintervention, 100% (667/667) of facilities in both provinces used NIIS data for activities beyond monthly reporting, compared with 34.8% (202/580) in Hanoi and 29.4% (55/187) in Son La at preintervention. Across nearly all antigens, the percentage of children who received the vaccine on time was higher in the postintervention cohort than in the preintervention cohort. Up-front costs associated with developing and deploying the NIIS were estimated at US $0.48 per child in the study provinces. The commune health center level showed cost savings from changing from the paper system to the NIIS, mainly driven by human resource time savings. At the administrative level, changing from the paper system to the NIIS resulted in incremental costs, as some costs increased, such as labor costs for supportive supervision and additional capital costs for equipment associated with the NIIS.

Conclusions: The Hanoi and Son La provinces successfully transitioned to paperless reporting while maintaining or improving NIIS data quality and data use. However, improvements in data quality were not associated with improvements in the immunization program outcomes in both provinces. The COVID-19 pandemic likely had a negative influence on immunization program outcomes, particularly in Hanoi. These improvements entail up-front financial costs.


Since 2017, the National Immunization Information System (NIIS) in Vietnam has been used nationwide by immunization facilities from the national, provincial, district, and commune levels to capture, store, and access immunization data [ 1 ]. The NIIS is a digital system that includes an immunization registry comprising individual-level, longitudinal information on vaccine doses administered and a logistics management system for vaccines and related supplies. As of June 2022, the NIIS has recorded data from approximately 31 million clients and has been used across 15,000 immunization facilities.

Although there is a growing body of evidence on the outcomes associated with digital systems to support vaccine service delivery in low- and middle-income countries [ 2 - 9 ], there are still many questions regarding how they affect data quality, use, and vaccination outcomes in practice. A challenge that has emerged in multiple country contexts is the parallel or dual reporting required when digital systems are introduced and health workers are expected to continue to use the legacy paper-based forms in addition to the new digital system. For example, Tanzanian facilities that had transitioned entirely to using an electronic immunization registry had higher odds of system use compared with those maintaining parallel electronic and paper-based systems [ 10 ].

Moreover, there is limited evidence on the costs of development and implementation of the system, recurrent costs of the system, and cost implications of eliminating a parallel paper system at the service delivery level [ 11 , 12 ]. The Vietnamese experience of introducing and scaling digital tools for immunization presents an opportunity to help fill these evidence gaps.

History of the NIIS

Before the introduction of the NIIS, a paper-based immunization registry and vaccine stock management system were used. Health workers captured data on paper forms, which were compiled in a monthly report. The paper-based system created a significant workload for health workers, and the immunization data were often delayed or incomplete, limiting the availability of reliable data at the higher levels of the health system [ 13 ].

From 2009 to 2012, the National Expanded Program on Immunization (NEPI), in collaboration with PATH and the World Health Organization, developed and piloted a logistics management tracking system for vaccines and related supplies (VaxTrak) and an immunization registry (ImmReg) at the commune and district levels. After the pilot phase, ImmReg expanded to all districts in the pilot province, and VaxTrak was scaled nationwide. An evaluation of ImmReg in 2015 showed that the system was highly accepted by health workers and improved vaccine coverage and timeliness [ 6 ]. In 2014 and 2015, NEPI and PATH integrated ImmReg and VaxTrak into a single comprehensive software system. From 2016 to 2018, the Vietnam General Department of Preventive Medicine and PATH developed the NIIS based on the pilot software and scaled it nationwide [ 1 ].

Transition to a Paperless System

Beginning in 2018, the government of Vietnam collaborated with PATH to provide technical support to strengthen NIIS implementation and transition to a paperless system in 2 provinces: Hanoi and Son La. This work was funded by the Bill & Melinda Gates Foundation and was implemented from 2018 to 2022.

We hypothesized that transitioning to a paperless system would improve data quality and data use in the intervention areas. We hypothesized that if data quality and data use improved, these changes could lead to improvements in immunization program outcomes. The objective of this study was to examine the short-term outcomes and costs associated with the NIIS and the transition to paperless reporting, focusing on three main categories:

  • What are the changes in data quality and data use because of the paperless transition interventions?
  • What are the changes in immunization program outcomes (timeliness, dropout rates, and coverage) because of the paperless transition interventions?
  • What are the incremental financial costs associated with developing, deploying, and maintaining the NIIS, including the economic cost implications of transitioning to paperless reporting?

This mixed methods study aimed to evaluate the short-term outcomes and costs associated with the transition to a paperless system. Mixed methods were used to quantify the observed changes in outcomes and costs and to qualitatively describe why and how changes occurred. Pre-post analyses were conducted to understand the short-term changes in data quality, data use, and immunization program outcomes. Financial data were extracted from the project and partners’ records, and interviews were conducted at various levels of the health system to inform the cost analysis. This study was conducted from July 2019 to November 2021.

This study was conducted in 2 intervention provinces, Hanoi and Son La. These provinces were selected for the transition to a paperless system because they have a variety of geographic, demographic, and health system characteristics that may influence digital readiness. Hanoi, the capital city, is primarily urban with a high population density, high immigration rate, good infrastructure, and many private-sector and fee-based facilities. In contrast, Son La is a mountainous border province primarily composed of rural districts and has low population density, limited resources, and few fee-based facilities.

Each province provides immunizations in public district health centers, hospitals, and commune health centers (CHCs) as well as private fee-based immunization facilities (FIFs). Primary data collection for this study also occurred at the national level, for example, to capture financial costs from the project and partners related to the development of the NIIS.

The transition to a paperless system and overall immunization activities in Vietnam were impacted by the COVID-19 pandemic, starting in 2020. The government mandated social distancing lockdowns multiple times in each province, which meant that individuals were not allowed to leave their home without special authorization, and nonessential businesses were closed. Immunization services were disrupted and, in some cases, unavailable, as health care workers were occupied with the COVID-19 response.

Multimedia Appendix 1 includes an overview of the characteristics of the study provinces (Table S1 in Multimedia Appendix 1 ) and dates when COVID-19 social distancing was applied in each province (Table S2 in Multimedia Appendix 1 ).


A technical working group composed of the Ministry of Health, NEPI, PATH, and Viettel (the NIIS developer) oversees the implementation of the NIIS. A readiness assessment was conducted from June to July 2019 to provide the technical working group with information about the progress, needs, and challenges of transitioning to a fully paperless immunization system [ 14 ]. NEPI and PATH designed interventions (summarized in Textbox 1 ) to support the transition to paperless reporting based on the readiness assessment results. Interventions included detailed implementation guidelines and standard operating procedures for the transition to paperless reporting, internet-based data review meetings, additional supportive supervision visits, and Zalo (Vietnam’s popular social media and chat app) groups for end users to exchange knowledge and experiences in NIIS use.

Key interventions

  • Guidelines and training on the shift to paperless reporting : implementation guidelines and standard operating procedures for the transition to paperless were implemented through a training of trainers and cascaded training approach for health workers.
  • Data quality and data use guidelines and training for health workers at the province, district, and commune health center levels.
  • Internet-based data review meetings at the district level where all communes share progress on paperless transition. Challenges identified through these meetings were used to prioritize areas for support. Initially, these meetings were held monthly but later shifted to quarterly.
  • Additional supportive supervision visits from the government and PATH at district and commune facilities to support data quality, data use, and the overall transition to paperless. During the COVID-19 pandemic, these shifted to internet-based supportive supervision visits. Internet-based supportive supervision guidelines and trainings were developed for districts and provinces.
  • Zalo groups for end users (at least 1 National Immunization Information System [NIIS] user per facility) in each district to exchange knowledge and experiences in NIIS use. Zalo is a popular social media and chat app in Vietnam.

Facilities began transitioning to paperless reporting in November 2019 in Son La and in January 2020 in Hanoi. All facilities in both provinces had retired the paper-based immunization management logbook and completely transitioned to paperless reporting using the NIIS as of January 2020. Although all sites officially report using the NIIS, some paper-based systems are still used to comply with various inspections, payment procedures, or requirements from other ministries.

In addition to the key interventions ( Textbox 1 ), an e-learning system and e-immunization cards were piloted at a smaller scale. The e-learning system was developed and piloted in 6 districts (in the 2 provinces) to train managers at the national, regional, provincial, and district levels and facility health workers (CHCs, FIFs, and hospitals) on using the NIIS. The e-immunization card, a mobile phone app that allows parents or clients to access their individual demographic information and vaccination data, was also developed and launched in the 2 provinces.

Study Design and Data Collection

Various methods were used to collect data related to each of the study aims (data quality and use, immunization performance, and costing) in the 2 provinces. This section describes the study design, data sources, and data collection approach for each study objective. Table S3 in Multimedia Appendix 1 summarizes the data collection methods across the 3 study aims.

Data Quality and Use

A pre-post study design was used to assess changes in data quality and use. Data collection included self-administered facility assessments, household surveys, and facility surveys conducted at a sample of CHCs, FIFs, and hospitals; details of the methodology have been published elsewhere [ 14 ]. The same methods were used for pre- and postintervention data collection, with the addition of in-depth interviews with Expanded Program on Immunization (EPI) officers at the district and commune levels at postintervention.

The self-administered facility assessment was sent via email to all immunization facilities to collect basic information regarding infrastructure, capacity, and NIIS data use.

Facility surveys were conducted on a purposively selected sample of districts, communes, hospitals, and FIFs in each province. Purposive sampling was performed in consultation with NEPI and provincial Centers for Disease Control and Prevention (CDC) to select a mix of facility types (fee based, private, and public), geographies (urban, semiurban, rural, and mountainous), and experiences with NIIS. The study was designed to survey the same facilities at pre- and postintervention. At preintervention, 8 FIFs and 7 hospitals were included; at postintervention, 5 FIFs and 7 hospitals were included (the change was owing to 3 FIFs that had closed by the time of the postintervention survey). In each facility, 20 clients in the paper logbook were randomly selected. The facility surveys captured structured information about the facility, the use of the NIIS, and demographic and immunization information for the 20 sampled clients.

A household survey was conducted in a sample of households with children aged <2 years to capture demographic and immunization information about the children from their home-based immunization cards. The household survey was conducted in the same purposively selected communes as the facility surveys. Within each commune, villages or living quarters were selected for convenience, and all households with children in the defined age range were included.

The NEPI and PATH staff trained data collectors, who used the KoboToolbox platform for data entry for all data collection forms. More details on the sampling approach and a full list of facilities included in the different evaluations conducted as part of this study are included in Table S4 in Multimedia Appendix 1 .

Structured interviews were conducted to understand the factors related to data quality and data use. CDC staff led the interviews with immunization managers at the district level and health workers at the commune level. Interviews involving NEPI and CDC personnel were performed by the PATH staff. Interviews were conducted over the phone and in person at the interviewees’ workplaces, lasting around 35 minutes. The interviews were recorded with consent and transcribed.

Immunization Program Outcomes

A pre-post study design was used to assess changes in immunization program outcomes in the 2 provinces using NIIS data. NIIS data were exported for a pre- and postintervention cohort of children in each province. The commencement dates of paperless reporting differed between the 2 project provinces: Son La began in November 2019, while Hanoi followed in January 2020. Accordingly, the preintervention cohort comprises children born between July 1 and September 30, 2018, while the postintervention cohort comprises those born between July 1 and September 30, 2020. Each child’s immunization information was analyzed for their first 12 months of life.
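The cohort assignment described above amounts to a date-window check. The sketch below is only an illustration of that rule (the actual cohorts were drawn from the NIIS export):

```python
from datetime import date

# Cohort birth-date windows as defined in the study design.
PRE = (date(2018, 7, 1), date(2018, 9, 30))
POST = (date(2020, 7, 1), date(2020, 9, 30))

def assign_cohort(birth_date: date):
    """Return the study cohort for a child's birth date, or None if out of window."""
    if PRE[0] <= birth_date <= PRE[1]:
        return "preintervention"
    if POST[0] <= birth_date <= POST[1]:
        return "postintervention"
    return None
```

For example, a child born on August 15, 2018, falls in the preintervention cohort and would then be followed for their first 12 months of life.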

The costing study used a mixed methods approach, combining primary data collection using a microcosting approach with secondary data collection from financial record reviews. The costs of all activities were estimated from the perspectives of the implementing organizations (the Hanoi and Son La provinces for NIIS system use, and Viettel, PATH, and NEPI) and hence take the health system perspective. No client costs were included.

We captured the costs for the different activities of implementing the NIIS from the software design and development activities to the deployment, accounting for the costs of the different partners engaged in this process. For the partner leading the software design, development, and deployment, we include the costs of the infrastructure, server, bandwidth, technical support, help desk, training, and maintenance and operations of the system. For the NEPI and the subnational levels, the costs included those for development of training materials, conducting staff training, data entry, meetings, and internet setup, at facilities where it was needed.

Data on the incremental financial costs for designing, developing, and deploying the NIIS software were obtained through the NEPI and partner organization expenditure records review. We also obtained information from each EPI administration and health facility in the sample on the expenditures for NIIS-related training and meetings and other deployment costs, including costs for data back entry and internet or phone setup at facilities.

Data on the recurrent financial costs of the NIIS were obtained via interviews with the head of each facility or person in charge of the facility finances in each study facility in the 2 provinces. The NEPI provided records on hardware inventory and repair requests, which were used to estimate replacement rates and maintenance costs for equipment.

Data Analysis

This section describes the variables and data analysis approach for each study objective.

For qualitative data analysis, we first transcribed all the interviews. Then, a member of the research team, who was trained on qualitative data methods, was assigned to code the transcripts using Microsoft Excel. This coding process followed a content analysis approach and involved a 3-level coding process: initially, open coding was applied to 5 transcripts to identify major themes; subsequently, the research team held discussions to reach a consensus on the major themes and any emergent themes; and finally, a final codebook was created before coding the remaining transcripts. This approach helped to gain a comprehensive understanding of health workers’ perspectives on improving data quality and data use from the NIIS as well as to identify the barriers and facilitators of immunization coverage.

In the analysis of quantitative data, we used Stata (version 14; StataCorp) as our statistical tool. For categorical variables, we used the chi-square test, and in cases where the expected cell counts were <5, Fisher exact test was used. For continuous variables, we first checked the variable distribution, and given the absence of a normal distribution, the Wilcoxon-Mann-Whitney U test was used to compare group differences. In addition, to investigate the relationship between immunization outcomes and its determinants, we conducted a multivariable logistic regression analysis. The threshold for statistical significance was set at P <.05.
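The analysis itself was run in Stata, but the test-selection rule for categorical variables can be illustrated with a short sketch: compute the expected cell counts of the contingency table and fall back to the Fisher exact test whenever any expected count is below 5. Function names here are ours, and only the selection step is shown, not the tests themselves:

```python
def expected_counts(table):
    """Expected cell counts of a contingency table under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def choose_categorical_test(table):
    """Chi-square test, unless any expected count is <5 (then Fisher exact)."""
    if any(e < 5 for row in expected_counts(table) for e in row):
        return "Fisher exact"
    return "chi-square"
```

With balanced counts such as [[20, 30], [40, 50]] the chi-square test applies, whereas a sparse table such as [[2, 3], [4, 50]] triggers the Fisher exact test.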

The data use outcome of interest was the percentage of facilities using NIIS data to inform specified routine activities (eg, making monthly vaccination plans).

The data quality outcomes of interest were the quantitative measures of timeliness, completeness, and accuracy. Information from the household and health facility surveys was compared with the NIIS data to assess data quality. Refer to Table S5 in Multimedia Appendix 1 for the definitions of the data quality indicators.

Data exported from the NIIS were cleaned. Duplicate records, unreliable data (eg, vaccination date before birth date), and records with “lost to follow-up” status were excluded. The primary outcome of interest was on-time vaccination, determined by the recommended age for vaccine delivery according to the NEPI vaccination schedule [ 15 ]. The secondary immunization program outcomes of interest were dropout rates and full vaccination coverage. Multimedia Appendix 1 provides details on NIIS data cleaning and definitions for the primary and secondary outcomes of interest (Table S6 in Multimedia Appendix 1 ).
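The cleaning rules and the on-time definition can be sketched in a few lines. The record fields and schedule values below are illustrative assumptions only; Tables S5 and S6 in Multimedia Appendix 1 and the NEPI vaccination schedule [15] are authoritative:

```python
from datetime import date

# Hypothetical recommended ages (in days) for illustration only;
# the NEPI vaccination schedule is the authoritative source.
SCHEDULE_DAYS = {"BCG": 30, "Penta3": 120, "MCV1": 270}

def clean_and_flag(records):
    """Drop duplicates, unreliable rows, and lost-to-follow-up records,
    then flag each remaining vaccination as on time or not."""
    seen, cleaned = set(), []
    for r in records:
        key = (r["child_id"], r["antigen"])
        if key in seen:                              # duplicate record
            continue
        if r["vaccination_date"] < r["birth_date"]:  # unreliable data
            continue
        if r["status"] == "lost to follow-up":
            continue
        seen.add(key)
        age_days = (r["vaccination_date"] - r["birth_date"]).days
        cleaned.append({**r, "on_time": age_days <= SCHEDULE_DAYS[r["antigen"]]})
    return cleaned
```

Each exclusion branch corresponds to one of the cleaning rules named in the text; the on-time flag compares the child's age at vaccination against the recommended age.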

Quantitative data were analyzed using Stata (version 14). We computed the total incremental costs per health system level associated with the NIIS implementation. We also estimated the cost per child for the NIIS implementation activities. For this analysis, we allocated a proportion of the costs for NIIS implementation to the 2 provinces according to their annual birth cohort size relative to the national birth cohort. The annual birth cohort for Vietnam in 2019 was approximately 1.5 million [ 16 ], whereas the 2 study provinces (Hanoi and Son La) had an annual birth cohort of 162,000 and 25,000, respectively, based on data from the NIIS, representing approximately 12% (187,000/1,500,000) of the annual birth cohort of Vietnam. The NIIS implementation costs were spread over 5 birth cohorts in these cost-per-child calculations, as the system implementation was done over the 5-year period.
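The per-child allocation described above reduces to simple arithmetic. The sketch below uses a hypothetical placeholder total of US $3.6 million, chosen only to make the arithmetic concrete; the study's actual cost inputs are those reported in the text and Table 6:

```python
def cost_per_child(total_cost_usd, province_births, national_births, n_cohorts=5):
    """Allocate national NIIS implementation costs to the study provinces by
    their share of the national birth cohort, then spread the allocated
    amount over n_cohorts annual birth cohorts."""
    allocated = total_cost_usd * (province_births / national_births)
    return allocated / (province_births * n_cohorts)

# Study provinces: ~187,000 annual births out of ~1.5 million nationally,
# spread over 5 birth cohorts, with a hypothetical total cost.
example = cost_per_child(3_600_000, 187_000, 1_500_000)
```

Note that the provincial birth count cancels, so the result equals total_cost / (national_births × n_cohorts); the allocation share matters only when provincial costs differ from the national average.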

To estimate the economic costs associated with service delivery and reporting using either the paper-based system, electronic system, or both, we used an activity-based costing approach. Ingredients or components of the activities were quantified for each resource type, including human resource time use for different immunization activities where there would be a change when using the NIIS versus the paper system, capital costs for equipment and supplies, and recurrent costs for internet connectivity and equipment maintenance attributable to using the NIIS or the paper system. The quantification was done by conducting interviews at the study facilities using structured costing questionnaires. The unit cost of each resource was obtained from secondary data sources, and the total costs for each activity by resource type were estimated. To obtain the total costs, we aggregated the costs for each activity by resource type. We estimated the recurrent costs per facility and per child, with the latter based on only 1 birth cohort as these are annual recurrent costs.
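The ingredients aggregation described above amounts to multiplying each resource quantity by its unit cost and summing per activity. A minimal sketch, with invented resource names and unit costs:

```python
def activity_costs(ingredients, unit_costs):
    """Total cost per activity: sum of quantity x unit cost over resources."""
    return {
        activity: sum(qty * unit_costs[res] for res, qty in resources.items())
        for activity, resources in ingredients.items()
    }

# Illustrative inputs only: staff hours and internet months per activity.
ingredients = {
    "monthly reporting": {"staff_hours": 4, "internet_months": 1},
    "data entry": {"staff_hours": 10},
}
unit_costs = {"staff_hours": 2.0, "internet_months": 5.0}
```

In the study, the quantities came from structured costing interviews at the facilities and the unit costs from secondary sources; comparing the resulting totals under the paper system versus the NIIS yields the incremental cost or saving.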

Ethical Considerations

This study served as the end-line evaluation activity within the project’s work plan. The project was a collaborative effort between the NEPI and PATH. This evaluation constitutes 1 of the project activities outlined in the project documents submitted to the Vietnam Ministry of Health. As per the regulations set forth by the Vietnamese government, the project documents were reviewed and certified by units within the Ministry of Health and other relevant ministries before receiving approval from the Ministry of Health. The study was reviewed, considered as project evaluation, and approved by the Vietnam Ministry of Health (decision 1996/QĐ-BYT), and this study does not require ethics review in accordance with the circular 04/TT-BYT issued by the Vietnam Ministry of Health [ 17 ]. This circular [ 17 ] regulates the establishment, functions, tasks, and rights of research ethics committees, and it is specified that only research involving human subjects necessitates research ethics committee approval before implementation and supervision during the research process. In addition, the study was reviewed by PATH’s US-based Research Determination Committee, which concluded that the activity did not involve “human subjects” as defined in the US Government 45 Code of Federal Regulations 46.102(e) [ 18 ] and did not require US ethics review.

In addition, before conducting the interviews, comprehensive information was provided to all participants, encompassing the study’s objectives, participant rights, and the strict confidentiality measures applied to protect their personal information. Informed consent was obtained from each participant before starting the interviews. Each qualitative interviewee received VN ₫150.000 (US $6.50) as payment for their time. In facilities selected for the costing study, health facility staff received VN ₫400.000 (US $17) for their time participating in structured costing interviews. Interviews were conducted in private settings to ensure confidentiality. All information was coded and only accessible to the study team, and data privacy was emphasized during training of the data collection team. The NEPI provided official permission for the use of the data extracted from the NIIS. All identifying data were coded, and names were eliminated before data analysis.

Data quality and use were measured through the NIIS data export, household surveys, and facility surveys and further explained through qualitative interviews.

Data Quality

The NIIS data quality evaluation considered the attributes of timeliness, completeness, and accuracy.

On the basis of the data exported from the NIIS, timeliness significantly improved from pre- to postintervention for all indicators across all health system levels in both provinces ( Table 1 ). Between pre- and postintervention, there was a significant decrease in the mean number of days from birth date to NIIS registration (Hanoi: 18.6, SD 65.5 to 5.7, SD 31.4 days; P <.001 and Son La: 36.1, SD 94.2 to 11.7, SD 40.1; P <.001). Across all health system levels (CHCs, FIFs, and hospitals), there were significant decreases in the mean number of days from the injection date to when the injection was updated in the NIIS. Stock transactions (only assessed at the CHC level) also showed a significant decrease in the mean number of days from stock arrival date to NIIS voucher date (Hanoi: 10.5, SD 36.1 to 5.2, SD 19.8 days; P <.001 and Son La: 13.4, SD 38.1 to 6.5, SD 23.5; P <.001).

a N/A: not applicable.

b NIIS: National Immunization Information System.

c Dependent variables were not normally distributed; therefore, the Wilcoxon-Mann-Whitney U test was used.


Completeness was assessed by comparing information from the household and facility surveys with the information from the NIIS ( Table 2 ). At the CHC level, completeness of registration, client information, and injection information captured in the NIIS significantly improved between pre- and postintervention in Hanoi and Son La. At the FIF level in Hanoi, there was a decline in the percentage of clients registered in the NIIS; however, among those registered, there was an increase in the completeness of client information. At the FIF level in Son La, there was increased completeness of registration, client information, and immunization information. At the hospital level, there was an improvement in the percentage of clients registered in the NIIS and a decline in the completeness of client information in both provinces, and the completeness of immunization information remained unchanged at 100% (Hanoi: preintervention n=63, postintervention n=153; Son La: preintervention n=41, postintervention n=74) in both provinces at pre- and postintervention.

c HH: household.

d FIF: fee-based immunization facility.

The accuracy of demographic and immunization information was assessed by comparing information on clients’ personal immunization cards (captured through the household survey) with their information entered in the NIIS. The percentage of clients with demographic information matched between the 2 sources significantly increased in Hanoi (199/217, 91.7% to 340/353, 96.3%; P =.02) and Son La (101/119, 84.9% to 107/107, 100%; P <.001) from pre- to postintervention ( Table 3 ). The percentage of injections with immunization information matched between the 2 sources also significantly increased in Son La (1037/1216, 85.3% to 1097/1097, 100%; P <.001) but significantly decreased in Hanoi (2147/2188, 98.1% to 3786/3981, 95.1%; P =.01).

The interviews indicated that respondents at all levels (national, provincial, district, and facility levels) and across both provinces had a strong understanding of data quality, defined as timeliness, completeness, and accuracy. Most respondents (13/16, 81%) participated in the intervention training on data quality and data use and indicated that it was useful for their work and that they had applied what they learned:

I have participated in the training course on data quality and data use last year. After the training, my knowledge and skill in data quality and data use are better, so I applied to my daily work, I usually check the input data to make them complete and accurate before entering into the system. [CHC staff]

All respondents rated their facility’s data quality in the NIIS as “good” or “very good” in terms of timeliness, completeness, and accuracy. All respondents mentioned human resources as the most important factor associated with data quality, including health workers’ knowledge and skills (in data entry, analysis, quality assessment, and use), understanding the importance of data quality, and bandwidth to support the immunization program when working across health areas.

Data use was measured through facility assessments asking health workers about the activities that the NIIS data were used to inform. In the preintervention survey, 34.8% (202/580) of the facilities in Hanoi and 29.4% (55/187) of the facilities in Son La indicated that they had used the NIIS data for additional activities beyond monthly reporting. In the postintervention survey, 100% (Hanoi: 468/468 and Son La: 199/199) of the facilities in both provinces indicated using the NIIS data for activities beyond monthly reporting. Table S7 in Multimedia Appendix 1 shows the frequency by activity. At postintervention, the most common uses of data among facilities were to inform monthly vaccination plans, campaign plans, or annual immunization plans. At the management level, the most common use of the NIIS data was to evaluate the performance of health facilities. From the qualitative interviews, the most frequently mentioned obstacle to using the data was the lack of health workers’ capacity for data analysis and use.

On-time vaccination was the primary immunization performance outcome of interest. Secondary outcomes were dropout rates and vaccination coverage (refer to the Tables S8-12 in Multimedia Appendix 1 ).

Study Population Characteristics

Immunization outcomes were assessed by comparing the NIIS data for a cohort of children pre- and postintervention in the 2 provinces. After data cleaning, 81,485 children were included in the sample. Their population characteristics are summarized in Table S8 in Multimedia Appendix 1 . In Hanoi, there were small but significant differences in the ethnicity and rural and urban location of children in the pre- and postintervention cohorts. In Son La, there was also a significant difference in the ethnicity of children in the pre- and postintervention cohorts. In both provinces, there were significant differences in the percentage of children vaccinated primarily from FIFs versus CHCs at pre- and postintervention.

On-Time Vaccination by Antigen

Across all antigens, apart from the measles-containing vaccine first dose in Hanoi, the percentage of children who received the vaccine on time increased from pre- to postintervention, and nearly all increases were statistically significant ( Table 4 ).

a BCG: bacillus Calmette-Guérin.

b Penta: pentavalent.

c MCV1: measles-containing vaccine first dose.

In the multiple logistic regression analysis, children at postintervention were approximately 1.18 times as likely in Hanoi and 1.69 times as likely in Son La to receive timely administration of pentavalent (Penta) 3 compared with those in the preintervention cohort ( P <.001; Table 5 ). In Hanoi, there was no significant difference in timely Penta 3 vaccination by gender or ethnicity, but children in urban areas were 1.6 times as likely to receive Penta 3 on time compared with those in rural areas. In Son La, there was also no difference by gender, but Thai children were 1.3 times as likely to receive Penta 3 on time compared with other ethnicities. In both provinces, children who were mostly vaccinated from FIFs were more likely to receive Penta 3 on time compared with children mostly vaccinated from CHCs.
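The estimates above are covariate-adjusted odds ratios from multiple logistic regression on the cohort data. As a minimal illustration of the underlying quantity only, the unadjusted odds ratio of on-time vaccination (post- vs preintervention) can be computed from a 2×2 table; the counts below are hypothetical and are not the study's data.

```python
# Unadjusted odds ratio for on-time Penta 3, post- vs preintervention.
# The study reports covariate-adjusted estimates from multiple logistic
# regression; this sketch shows only the raw 2x2-table analogue.
# All counts are hypothetical.

def odds_ratio(on_time_post, late_post, on_time_pre, late_pre):
    """OR = (postintervention odds of on-time) / (preintervention odds)."""
    return (on_time_post / late_post) / (on_time_pre / late_pre)

# Hypothetical cohorts of 1000 children per period.
or_post_vs_pre = odds_ratio(on_time_post=800, late_post=200,
                            on_time_pre=750, late_pre=250)
print(round(or_post_vs_pre, 2))  # 1.33
```

An odds ratio above 1 indicates greater odds of timely vaccination in the postintervention cohort, matching the direction of the reported adjusted estimates.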

A separate logistic regression analysis for on-time full vaccination is included in Table S12 in Multimedia Appendix 1

On-Time Full Immunization Coverage

Figure 1 shows the full immunization coverage over time for each birth cohort. In Hanoi, 87.8% (29,649/33,752) of the children in the preintervention cohort and 81.9% (29,167/35,611) of the children in the postintervention cohort ( P <.001) had reached full immunization before their first birthday. In contrast, there was an increase in the percentage of children reaching full immunization before their first birthday in Son La, from 63.3% (3947/6233) preintervention to 88.3% (5040/5705) postintervention ( P <.001). The results for immunization coverage by antigen for the pre- and postintervention birth cohorts in each province are included in Table S9 in Multimedia Appendix 1.
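The pre/post coverage comparisons above are tests of two proportions. Using the reported Hanoi counts (29,649/33,752 vs 29,167/35,611), a standard two-proportion z-test, sketched here with only the Python standard library (the study's exact test may differ), reproduces P<.001.

```python
import math

def two_proportion_p(success1, n1, success2, n2):
    """Two-sided P value for a two-proportion z-test (pooled variance)."""
    p1, p2 = success1 / n1, success2 / n2
    pooled = (success1 + success2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # P(|Z| > |z|) for a standard normal, via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

# Hanoi full-immunization-by-first-birthday counts from the text.
p_value = two_proportion_p(29649, 33752, 29167, 35611)
print(p_value < 0.001)  # True
```

With samples this large, even the ~6 percentage point decline in Hanoi yields a z statistic above 20, so the P value is far below .001.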


Up-Front Costs of NIIS Implementation

The up-front NIIS software design, development, and deployment took approximately 2300 person-months of labor from 2015 to 2020, and the partner costs for this labor were estimated at approximately US $1.75 million ( Table 6 ). Most of the software developer costs (US $1,233,152/US $1,745,712, 71%) pertained to the deployment of the system. In addition, there were up-front costs per facility for implementing the NIIS, including costs for back entry of data, internet setup (where needed), training of users, and meetings. At the national and provincial levels, the bulk of the up-front costs was spent on training and meetings. Because trainings were paid for by the higher administrative levels, training costs were low or zero at the district and health facility levels, where the larger cost share was for deploying the NIIS.

a NIIS: National Immunization Information System.

The up-front costs for the NIIS implementation allocated to the 2 study provinces were approximately US $419,000 ( Table 7 ). When these costs are allocated over 5 birth cohorts, the cost per child for the NIIS implementation was estimated at US $0.48.

b NEPI: National Expanded Program on Immunization.
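The per-child figure is simply the allocated up-front cost spread over 5 annual birth cohorts. The cohort size used below (about 174,600 children per year across both provinces) is an assumption back-calculated from the reported figures, not a number stated in the text.

```python
# Up-front NIIS cost per child = allocated cost / (birth cohorts served
# x children per cohort). The combined annual cohort size is an assumption
# back-calculated from the reported US $0.48 per child.
upfront_cost_usd = 419_000      # up-front costs allocated to the 2 provinces
birth_cohorts = 5               # annualization period used in the study
children_per_cohort = 174_600   # assumed combined annual birth cohort

cost_per_child = upfront_cost_usd / (birth_cohorts * children_per_cohort)
print(round(cost_per_child, 2))  # 0.48
```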

Recurrent Costs for the NIIS

The software developer estimated the annual recurrent costs for system operation at US $85,000. At health facilities, the average monthly economic cost of health worker labor for immunization-related activities under the paper system was estimated at US $146 ( Table 8 ). In comparison, the monthly labor cost with the NIIS was US $67, less than half the labor cost of the paper system. The largest labor time savings from the transition to the NIIS came from reduced time spent on organizing immunization sessions, data management, and reporting. However, the NIIS also added activities for staff, including checking for duplicates and introducing e-immunization cards. The NIIS also added new facility costs, including recurrent costs for internet, printing, and SMS text messaging reminders and the capital costs of equipment, amounting to an average of US $28 per facility. At the same time, the NIIS saved printing costs, as registers and ledgers no longer needed to be printed; these averaged US $4.51 per month, or US $58 per year. Overall, the total monthly costs per facility were lower with the NIIS (US $95) than with the paper system (US $151).
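The reported facility totals can be approximately reconstructed from the components given in the paragraph: labor plus register printing for the paper system, and labor plus the new recurrent and capital items for the NIIS. This is a rounding-level reconstruction, not the study's full cost model.

```python
# Approximate monthly economic cost per facility (US $), reconstructed
# from the components reported in the text.
paper_labor = 146.00
paper_printing = 4.51                       # registers and ledgers
paper_total = paper_labor + paper_printing  # ~151

niis_labor = 67.00
niis_new_items = 28.00                      # internet, printing, SMS, equipment
niis_total = niis_labor + niis_new_items    # 95

print(round(paper_total), round(niis_total))  # 151 95
```

The component sums round to the reported totals of US $151 (paper) and US $95 (NIIS).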

Table 9 presents similar costs for the administrative levels. At most levels, except the provincial CDC, labor costs are lower with the NIIS than with the paper system. At the CDC, labor for supportive supervision is the largest share of costs, and these costs increase with the NIIS compared with the paper system. At the district health centers, labor costs for activities such as management and reporting decline with the NIIS, so overall labor costs are lower. As mentioned above, implementing the NIIS incurs additional costs, including recurrent costs for internet, printing, and equipment maintenance and capital costs for equipment, which makes the total monthly costs for the NIIS higher than for the paper system. The change from the paper system to the NIIS therefore brings incremental costs at the administrative levels.

b N/A: not applicable.

Although labor costs decrease at most health system levels with the transition to the NIIS, in practice, this may not translate into budget line savings, as staff are retained and their time is reallocated. The financial costs associated with the paper system primarily entail printing registers, but these expenses are relatively minor in comparison with those incurred with the electronic system. The estimated annual financial recurrent costs per child for the NIIS are US $3.17, accounting for the annual costs of the server (with the relevant portion allocated to the study provinces) and the costs for capital equipment and internet. There are incremental financial costs to the health system when moving from the legacy paper system to the electronic system.

Principal Findings

Health workers at the province, district, and CHC levels successfully transitioned from the legacy paper-based system to the NIIS for paperless reporting, proving that the transition was possible in 2 very different provincial contexts. Although the transition to paperless reporting results in lower labor costs at the facility, district, and national levels, it requires incremental financial recurrent costs (estimated at US $3.17 per child per year) to maintain the NIIS.

This study found improvements in data quality in Hanoi and Son La between pre- and postintervention. There were significant improvements in data timeliness at all levels of the health system and in both provinces. It is likely that the interventions contributed to this improvement because the guidelines, trainings, data review meetings, and supportive supervision visits emphasized the importance of timely data. Timeliness indicators performed better in Hanoi than in Son La, which may be because of different service delivery practices; in Hanoi, vaccines are delivered on a weekly basis, whereas in Son La, they are delivered monthly. In addition, in Hanoi, more clients visit FIFs, which deliver immunization daily.

In Son La, there were also improvements across nearly all the completeness and accuracy indicators. However, there were mixed results in terms of data completeness and accuracy in Hanoi, with some statistically significant declines in quality. This may be partly because Hanoi was more affected by the COVID-19 pandemic than Son La; there were 6 rounds of mandated social distancing across all 30 districts in Hanoi. When immunization services were available, facilities may have been understaffed if human resources were reassigned to the COVID-19 response and they may have faced higher demand from a backlog of children who could not be vaccinated during social distancing. Overburdened health workers may prioritize service delivery over data quality.

The observed differences in provincial results may also have been influenced by the way interventions were delivered in each province. The facilities in Son La received timely support through internet-based supervisory assistance, whereas Hanoi maintained in-person supportive supervision, which was less timely and limited the number of facilities that could be visited. Facilities in Son La also received refresher trainings, which Hanoi did not, as Hanoi had an ongoing focus on COVID-19 vaccination.

There were large increases from pre- to postintervention in the percentage of facilities using the NIIS data to inform their activities. Results from the costing analysis showed that at the CHCs, health workers’ time spent on immunization program data entry and reporting declined as the transition to paperless resulted in efficiencies and reduced workload. It is possible that this saved time was dedicated to data use.

For most activities, a higher percentage of facilities in Son La reported using NIIS data compared with Hanoi. These results may have also been influenced by the COVID-19 pandemic or the differences in project interventions. In addition, Hanoi has a more mobile population, with clients frequently moving between facilities. This implies that vaccination plans informed by NIIS data may not be as accurate or comprehensive compared with a location such as Son La, where there is less population movement.

Although only 44.7% (84/188) of facilities in Son La and 27.7% (125/451) of facilities in Hanoi reported using the NIIS to evaluate data quality, previous studies have shown that data use can drive data quality improvements [ 19 ]. Through the qualitative interviews, health workers recommended building data quality indicators directly into the NIIS to help users understand their data quality easily.
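One way to act on that recommendation, sketched here purely as an illustration rather than as an NIIS feature, is to compute simple indicators such as record timeliness (the share of doses entered into the registry within some threshold of administration) directly from registry records. The record structure and the 7-day threshold below are hypothetical.

```python
from datetime import date

# Illustrative data quality indicator: share of vaccination records
# entered into the registry within `threshold_days` of administration.
# The record fields and the 7-day threshold are assumptions, not NIIS.

def timeliness_indicator(records, threshold_days=7):
    timely = sum(
        1 for r in records
        if (r["entry_date"] - r["vaccination_date"]).days <= threshold_days
    )
    return timely / len(records)

records = [
    {"vaccination_date": date(2021, 7, 1), "entry_date": date(2021, 7, 3)},
    {"vaccination_date": date(2021, 7, 1), "entry_date": date(2021, 7, 20)},
    {"vaccination_date": date(2021, 7, 5), "entry_date": date(2021, 7, 5)},
]
print(timeliness_indicator(records))  # 2 of 3 records entered on time
```

Built-in indicators like this would let facility staff see their own data quality without separate analysis capacity.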

The study also found that on-time vaccination rates improved for most antigens in the 2 project provinces compared with the preintervention period. With more timely data captured in the NIIS, health workers know which vaccines a child is due for and can send reminders or follow up to deliver the vaccines on time. For clients using the e-immunization card, the card may also have contributed to timely vaccination by reminding them of their vaccination schedule and due dates. However, the on-time vaccination rate for the first dose of measles in Hanoi decreased after the intervention, which may have been owing to the COVID-19 lockdowns from July to September 2021, when the postintervention cohort was aged 9 to 12 months. In Hanoi, many parents prefer waiting to receive the measles-rubella vaccine at 12 months, which is delivered at FIFs, over the monovalent measles vaccine, which is part of the standard EPI schedule at 9 months.
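On-time status for a dose is determined by comparing the administration date with a schedule-derived due window. The sketch below is illustrative only: the 120-day due age (Penta 3 at roughly 4 months) and the 30-day grace period are assumptions, not the study's exact operational definition.

```python
from datetime import date, timedelta

# Whether a dose was given "on time": administered no later than the
# schedule-derived due date plus a grace period. The 120-day due age and
# 30-day grace period are illustrative assumptions.

def on_time(birth_date, dose_date, due_age_days=120, grace_days=30):
    due_by = birth_date + timedelta(days=due_age_days + grace_days)
    return dose_date <= due_by

print(on_time(date(2021, 1, 10), date(2021, 5, 20)))  # True: within window
print(on_time(date(2021, 1, 10), date(2021, 8, 1)))   # False: past the window
```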

In Son La, improvements in data quality and on-time vaccination were also associated with improvements in vaccination coverage. Full immunization at 12 months was 88.3% (5040/5705) in the postintervention cohort compared with 63.3% (3947/6233) in the preintervention cohort. A previous evaluation of ImmReg (the earlier version of the NIIS) in another province in Vietnam, Ben Tre, also found improvements in on-time vaccination and vaccination coverage when comparing pre- and postintroduction of the digital immunization registry, and these improvements were sustained or increased 1 year later [ 6 ]. However, in Hanoi, despite improvements in on-time vaccination, there were declines in vaccination coverage, again likely because of the COVID-19 pandemic. According to the 2021 annual EPI report for Vietnam, full immunization coverage decreased nationwide from 96.8% in 2020 to 87.1% in 2021 [ 20 ].

Introducing these new electronic systems, which have improved data quality and timeliness and could potentially improve immunization outcomes, comes at a cost, as up-front financial investments have to be made for software design, development, and deployment. The cost per child for NIIS development and deployment was estimated at US $0.48 in the 2 study provinces in Vietnam. This is much lower than that estimated in Tanzania and Zambia, the only other low- and middle-income countries with comparable cost estimates for the development of electronic immunization registries [ 11 , 12 ]. In our study, costs were annualized over 5 birth cohorts, whereas in Tanzania and Zambia, costs were annualized over 3 birth cohorts. Even if we were to use 3 birth cohorts for this analysis, our estimated costs for Vietnam would still be lower. The estimated lower costs per child for Vietnam could be in part because most CHCs had existing computers, and hence, there was no mass procurement of equipment required, which is not the case in many other countries. In Vietnam, the equipment and connectivity costs were shared across multiple health programs, further reducing the costs borne by the immunization program, unlike in Tanzania and Zambia, where immunization was the first health area to be digitalized at the health facility level. The availability of electricity at the health facility level in Vietnam also reduced costs, as there was no need for the procurement of alternative power sources such as solar chargers. In addition, software development in Vietnam was conducted by an in-country telecommunications partner that provided in-kind and pro bono services and support, some of which could not be quantified, resulting in lower costs than if done through an external organization.

Our study found that at the CHC level, there are cost savings for health workers’ time use and other resources such as printing with the NIIS compared with the paper system, and at the administrative level, there are incremental costs. The NIIS results in some savings in labor time, although some new activities are added for the staff. There are also recurring costs, such as ongoing technical support and maintenance provided by Viettel, the need for refresher training at the local level for new staff, and expenses associated with equipment maintenance and replacement as well as connectivity provided by local government. These recurrent costs are important to consider for sustainability.

Limitations
Our study has several limitations. First, we conducted the study in 2 provinces and had small sample sizes for the costing results, facility survey results for data quality, and other results based on provider interviews. The small sample size and purposive sampling limit the generalizability of our findings; however, 2 distinct provinces with a mix of facility types were intentionally sampled so that the results can be used to understand the costs and outcomes of the transition to paperless reporting in a range of settings. Second, the time frame for the evaluation was short, which limited the magnitude of the change that could be observed. Third, there were minor differences in the data collection approaches and responses between pre- and postintervention. For example, the urban sampling approach was adjusted for the feasibility of data collection, and the response rate for facility assessment was lower at postintervention. Fourth, for data collected through provider interviews, such as for the costing study, potential recall bias arises due to the retrospective nature of the inquiries, as interviewees were asked to recall costs from past periods. Future studies should consider collecting costing data at different phases of the project to reduce recall bias. This would also make costing data available to inform decision-making at various implementation phases. Our study may also have underestimated the costs associated with the paper system, as we did not incorporate expenses related to printing paper home-based records and invitation letters. These materials are printed to assist the caregivers in monitoring their children's vaccination history or reminding them when their child is due for vaccination. We also have not accounted for the saved labor costs for health care workers who would deliver the invitation letters to households.
Another limitation is that the COVID-19 pandemic affected the originally planned timelines for data collection and immunization service delivery. We did not attempt to account for the impact of the pandemic on the findings regarding data quality and data use presented in this study.

Conclusions
Health workers in the 2 provinces successfully transitioned to paperless reporting while maintaining or improving data quality. We recommend that other provinces in Vietnam transition to paperless reporting by introducing the guidelines and standard operating procedures used in Hanoi and Son La and providing ongoing support through trainings, data review meetings, supportive supervision, and peer networks. Future studies should monitor data quality and immunization outcomes in other provinces as well as the sustainability of the observed changes in Hanoi and Son La.

Introducing these new electronic systems comes with costs—both up-front and recurrent—but there are advantages, as seen in the improvements in data quality and on-time vaccination. In Vietnam, stakeholders should plan and budget for the sustainability of the system at each level of the health system, given the recurrent costs including repairing and replacement of equipment, connectivity, refresher training, software system, and supportive supervision. Other countries planning to implement similar interventions should plan to collect costing data throughout to inform decision-making and budgeting.

Acknowledgments
The authors thank the Bill & Melinda Gates Foundation for providing support for this study and the National Immunization Information System (NIIS) Technical Working Group members for their leadership in designing, developing, and deploying the NIIS. The authors thank the data collectors who participated in this study. Finally, the authors acknowledge the invaluable collaboration of the health workers, managers, and leaders in Hanoi and Son La provinces.

Data Availability

The data sets generated and analyzed during this study are available from the corresponding author on reasonable request.

Conflicts of Interest

None declared.

Multimedia Appendix 1: Supplementary tables.

  1. Dang H, Dao S, Carnahan E, Kawakyu N, Duong H, Nguyen T, et al. Determinants of scale-up from a small pilot to a national electronic immunization registry in Vietnam: qualitative evaluation. J Med Internet Res. 2020;22(9):e19923.
  2. Yohana E, Mitiku S, Kayumba PC, Swalehe O. Electronic immunization registry in improving vaccine supply chain availability in Tanga City Council, Tanzania. Rwanda J Med Health Sci. 2021;4(2):223-236.
  3. Siddiqi DA, Abdullah S, Dharma VK, Shah MT, Akhter MA, Habib A, et al. Using a low-cost, real-time electronic immunization registry in Pakistan to demonstrate utility of data for immunization programs and evidence-based decision making to achieve SDG-3: insights from analysis of big data on vaccines. Int J Med Inform. 2021;149:104413.
  4. Uddin MJ, Shamsuzzaman M, Horng L, Labrique A, Vasudevan L, Zeller K, et al. Use of mobile phones for improving vaccination coverage among children living in rural hard-to-reach areas and urban streets of Bangladesh. Vaccine. 2016;34(2):276-283.
  5. Chen L, Du X, Zhang L, van Velthoven MH, Wu Q, Yang R, et al. Effectiveness of a smartphone app on improving immunization of children in rural Sichuan Province, China: a cluster randomized controlled trial. BMC Public Health. 2016;16(1):909.
  6. Nguyen NT, Vu HM, Dao SD, Tran HT, Nguyen TX. Digital immunization registry: evidence for the impact of mHealth on enhancing the immunization system and improving immunization coverage for children under one year old in Vietnam. Mhealth. 2017;3:26.
  7. Gilbert SS, Thakare N, Ramanujapuram A, Akkihal A. Assessing stability and performance of a digitally enabled supply chain: retrospective of a pilot in Uttar Pradesh, India. Vaccine. 2017;35(17):2203-2208.
  8. Gilbert SS, Bulula N, Yohana E, Thompson J, Beylerian E, Werner L, et al. The impact of an integrated electronic immunization registry and logistics management information system (EIR-eLMIS) on vaccine availability in three regions in Tanzania: a pre-post and time-series analysis. Vaccine. 2020;38(3):562-569.
  9. Fritz J, Herrick T, Gilbert SS. Estimation of health impact from digitalizing last-mile logistics management information systems (LMIS) in Ethiopia, Tanzania, and Mozambique: a lives saved tool (LiST) model analysis. PLoS One. 2021;16(10):e0258354.
  10. Carnahan E, Ferriss E, Beylerian E, Mwansa FD, Bulula N, Lyimo D, et al. Determinants of facility-level use of electronic immunization registries in Tanzania and Zambia: an observational analysis. Glob Health Sci Pract. 2020;8(3):488-504.
  11. Mvundura M, Di Giorgio L, Lymo D, Mwansa FD, Ngwegwe B, Werner L. The costs of developing, deploying and maintaining electronic immunisation registries in Tanzania and Zambia. BMJ Glob Health. 2019;4(6):e001904.
  12. Mvundura M, Di Giorgio L, Vodicka E, Kindoli R, Zulu C. Assessing the incremental costs and savings of introducing electronic immunization registries and stock management systems: evidence from the better immunization data initiative in Tanzania and Zambia. Pan Afr Med J. 2020;35(Suppl 1):11.
  13. World Health Organization, PATH. Optimize: Vietnam report. PATH. 2013. URL: https://media.path.org/documents/TS_opt_vietnam_rpt.pdf [accessed 2024-01-29]
  14. Duong H, Dao S, Dang H, Nguyen L, Ngo T, Nguyen T, et al. The transition to an entirely digital immunization registry in Ha Noi province and Son La Province, Vietnam: readiness assessment study. JMIR Form Res. 2021;5(10):e28096.
  15. Minh An DT, Lee JK, Van Minh H, Trang NT, Thu Huong NT, Nam YS, et al. Timely immunization completion among children in Vietnam from 2000 to 2011: a multilevel analysis of individual and contextual factors. Glob Health Action. 2016;9(1):29189.
  16. Vietnam: country information. Gavi, The Vaccine Alliance. URL: https://www.gavi.org/programmes-impact/country-hub/west-pacific/vietnam [accessed 2022-09-28]
  17. Circular 4/TT-BYT on establishment, functions, tasks and rights of research ethics committees, 05 March 2020. Vietnam Ministry of Health. URL: https://thuvienphapluat.vn/van-ban/Bo-may-hanh-chinh/Thong-tu-4-TT-BYT-2020-thanh-lap-chuc-nang-nhiem-vu-Hoi-dong-dao-duc-trong-nghien-cuu-y-sinh-hoc-440446.aspx [accessed 2024-01-25]
  18. 45 CFR 46.102. Code of Federal Regulations. URL: https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-A/part-46/subpart-A/section-46.102 [accessed 2024-01-25]
  19. Werner L, Seymour D, Puta C, Gilbert S. Three waves of data use among health workers: the experience of the better immunization data initiative in Tanzania and Zambia. Glob Health Sci Pract. 2019;7(3):447-456.
  20. National immunization program and vaccine delivery: the national expanded program on immunization. World Bank. URL: https://openknowledge.worldbank.org/server/api/core/bitstreams/022f4737-59de-4f41-960f-e82f05093402/content [accessed 2024-02-27]


Edited by T Leung, T de Azevedo Cardoso; submitted 14.12.22; peer-reviewed by V Horner, N Mejia, M Muhonde; comments to author 07.06.23; revised version received 28.07.23; accepted 26.01.24; published 18.03.24.

©Thi Thanh Huyen Dang, Emily Carnahan, Linh Nguyen, Mercy Mvundura, Sang Dao, Thi Hong Duong, Trung Nguyen, Doan Nguyen, Tu Nguyen, Laurie Werner, Tove K Ryman, Nga Nguyen. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.03.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.


  • Perspective
  • Published: 06 March 2024

Artificial intelligence and illusions of understanding in scientific research

  • Lisa Messeri   ORCID: orcid.org/0000-0002-0964-123X 1   na1 &
  • M. J. Crockett   ORCID: orcid.org/0000-0001-8800-410X 2 , 3   na1  

Nature volume  627 ,  pages 49–58 ( 2024 ) Cite this article

18k Accesses

3 Citations

697 Altmetric

Metrics details

  • Human behaviour
  • Interdisciplinary studies
  • Research management
  • Social anthropology

Scientists are enthusiastically imagining ways in which artificial intelligence (AI) tools might improve research. Why are AI tools so attractive and what are the risks of implementing them across the research pipeline? Here we develop a taxonomy of scientists’ visions for AI, observing that their appeal comes from promises to improve productivity and objectivity by overcoming human shortcomings. But proposed AI solutions can also exploit our cognitive limitations, making us vulnerable to illusions of understanding in which we believe we understand more about the world than we actually do. Such illusions obscure the scientific community’s ability to see the formation of scientific monocultures, in which some types of methods, questions and viewpoints come to dominate alternative approaches, making science less innovative and more vulnerable to errors. The proliferation of AI tools in science risks introducing a phase of scientific enquiry in which we produce more but understand less. By analysing the appeal of these tools, we provide a framework for advancing discussions of responsible knowledge production in the age of AI.




Argyle, L. P. et al. Out of one, many: using language models to simulate human samples. Polit. Anal. 31 , 337–351 (2023).

Aher, G., Arriaga, R. I. & Kalai, A. T. Using large language models to simulate multiple humans and replicate human subject studies. In Proc. 40th International Conference on Machine Learning (eds Krause, A. et al.) 337–371 (JMLR.org, 2023).

Binz, M. & Schulz, E. Using cognitive psychology to understand GPT-3. Proc. Natl Acad. Sci. USA 120 , e2218523120 (2023).

Ornstein, J. T., Blasingame, E. N. & Truscott, J. S. How to train your stochastic parrot: large language models for political texts. Github , https://joeornstein.github.io/publications/ornstein-blasingame-truscott.pdf (2023).

He, S. et al. Learning to predict the cosmological structure formation. Proc. Natl Acad. Sci. USA 116 , 13825–13832 (2019).

Article   MathSciNet   CAS   PubMed   PubMed Central   ADS   Google Scholar  

Mahmood, F. et al. Deep adversarial training for multi-organ nuclei segmentation in histopathology images. IEEE Trans. Med. Imaging 39 , 3257–3267 (2020).

Teixeira, B. et al. Generating synthetic X-ray images of a person from the surface geometry. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 9059–9067 (Institute of Electrical and Electronics Engineers, 2018).

Marouf, M. et al. Realistic in silico generation and augmentation of single-cell RNA-seq data using generative adversarial networks. Nat. Commun. 11 , 166 (2020).

Watts, D. J. A twenty-first century science. Nature 445 , 489 (2007).

boyd, d. & Crawford, K. Critical questions for big data. Inf. Commun. Soc. 15 , 662–679 (2012). This article assesses the ethical and epistemic implications of scientific and societal moves towards big data and provides a parallel case study for thinking about the risks of artificial intelligence .

Jolly, E. & Chang, L. J. The Flatland fallacy: moving beyond low–dimensional thinking. Top. Cogn. Sci. 11 , 433–454 (2019).

Yarkoni, T. & Westfall, J. Choosing prediction over explanation in psychology: lessons from machine learning. Perspect. Psychol. Sci. 12 , 1100–1122 (2017).

Radivojac, P. et al. A large-scale evaluation of computational protein function prediction. Nat. Methods 10 , 221–227 (2013).

Bileschi, M. L. et al. Using deep learning to annotate the protein universe. Nat. Biotechnol. 40 , 932–937 (2022).

Barkas, N. et al. Joint analysis of heterogeneous single-cell RNA-seq dataset collections. Nat. Methods 16 , 695–698 (2019).

Demszky, D. et al. Using large language models in psychology. Nat. Rev. Psychol. 2 , 688–701 (2023).

Karjus, A. Machine-assisted mixed methods: augmenting humanities and social sciences with artificial intelligence. Preprint at https://arxiv.org/abs/2309.14379 (2023).

Davies, A. et al. Advancing mathematics by guiding human intuition with AI. Nature 600 , 70–74 (2021).

Peterson, J. C., Bourgin, D. D., Agrawal, M., Reichman, D. & Griffiths, T. L. Using large-scale experiments and machine learning to discover theories of human decision-making. Science 372 , 1209–1214 (2021).

Ilyas, A. et al. Adversarial examples are not bugs, they are features. Preprint at https://doi.org/10.48550/arXiv.1905.02175 (2019)

Semel, B. M. Listening like a computer: attentional tensions and mechanized care in psychiatric digital phenotyping. Sci. Technol. Hum. Values 47 , 266–290 (2022).

Gil, Y. Thoughtful artificial intelligence: forging a new partnership for data science and scientific discovery. Data Sci. 1 , 119–129 (2017).

Checco, A., Bracciale, L., Loreti, P., Pinfield, S. & Bianchi, G. AI-assisted peer review. Humanit. Soc. Sci. Commun. 8 , 25 (2021).

Thelwall, M. Can the quality of published academic journal articles be assessed with machine learning? Quant. Sci. Stud. 3 , 208–226 (2022).

Dhar, P. Peer review of scholarly research gets an AI boost. IEEE Spectrum spectrum.ieee.org/peer-review-of-scholarly-research-gets-an-ai-boost (2020).

Heaven, D. AI peer reviewers unleashed to ease publishing grind. Nature 563 , 609–610 (2018).

Conroy, G. How ChatGPT and other AI tools could disrupt scientific publishing. Nature 622 , 234–236 (2023).

Nosek, B. A. et al. Replicability, robustness, and reproducibility in psychological science. Annu. Rev. Psychol. 73 , 719–748 (2022).

Altmejd, A. et al. Predicting the replicability of social science lab experiments. PLoS ONE 14 , e0225826 (2019).

Yang, Y., Youyou, W. & Uzzi, B. Estimating the deep replicability of scientific findings using human and artificial intelligence. Proc. Natl Acad. Sci. USA 117 , 10762–10768 (2020).

Youyou, W., Yang, Y. & Uzzi, B. A discipline-wide investigation of the replicability of psychology papers over the past two decades. Proc. Natl Acad. Sci. USA 120 , e2208863120 (2023).

Rabb, N., Fernbach, P. M. & Sloman, S. A. Individual representation in a community of knowledge. Trends Cogn. Sci. 23 , 891–902 (2019). This comprehensive review paper documents the empirical evidence for distributed cognition in communities of knowledge and the resultant vulnerabilities to illusions of understanding .

Rozenblit, L. & Keil, F. The misunderstood limits of folk science: an illusion of explanatory depth. Cogn. Sci. 26 , 521–562 (2002). This paper provided an empirical demonstration of the illusion of explanatory depth, and inspired a programme of research in cognitive science on communities of knowledge .

Hutchins, E. Cognition in the Wild (MIT Press, 1995).

Lave, J. & Wenger, E. Situated Learning: Legitimate Peripheral Participation (Cambridge Univ. Press, 1991).

Kitcher, P. The division of cognitive labor. J. Philos. 87 , 5–22 (1990).

Hardwig, J. Epistemic dependence. J. Philos. 82 , 335–349 (1985).

Keil, F. in Oxford Studies In Epistemology (eds Gendler, T. S. & Hawthorne, J.) 143–166 (Oxford Academic, 2005).

Weisberg, M. & Muldoon, R. Epistemic landscapes and the division of cognitive labor. Philos. Sci. 76 , 225–252 (2009).

Sloman, S. A. & Rabb, N. Your understanding is my understanding: evidence for a community of knowledge. Psychol. Sci. 27 , 1451–1460 (2016).

Wilson, R. A. & Keil, F. The shadows and shallows of explanation. Minds Mach. 8 , 137–159 (1998).

Keil, F. C., Stein, C., Webb, L., Billings, V. D. & Rozenblit, L. Discerning the division of cognitive labor: an emerging understanding of how knowledge is clustered in other minds. Cogn. Sci. 32 , 259–300 (2008).

Sperber, D. et al. Epistemic vigilance. Mind Lang. 25 , 359–393 (2010).

Wilkenfeld, D. A., Plunkett, D. & Lombrozo, T. Depth and deference: when and why we attribute understanding. Philos. Stud. 173 , 373–393 (2016).

Sparrow, B., Liu, J. & Wegner, D. M. Google effects on memory: cognitive consequences of having information at our fingertips. Science 333 , 776–778 (2011).

Fisher, M., Goddu, M. K. & Keil, F. C. Searching for explanations: how the internet inflates estimates of internal knowledge. J. Exp. Psychol. Gen. 144 , 674–687 (2015).

De Freitas, J., Agarwal, S., Schmitt, B. & Haslam, N. Psychological factors underlying attitudes toward AI tools. Nat. Hum. Behav. 7 , 1845–1854 (2023).

Castelo, N., Bos, M. W. & Lehmann, D. R. Task-dependent algorithm aversion. J. Mark. Res. 56 , 809–825 (2019).

Cadario, R., Longoni, C. & Morewedge, C. K. Understanding, explaining, and utilizing medical artificial intelligence. Nat. Hum. Behav. 5 , 1636–1642 (2021).

Oktar, K. & Lombrozo, T. Deciding to be authentic: intuition is favored over deliberation when authenticity matters. Cognition 223 , 105021 (2022).

Bigman, Y. E., Yam, K. C., Marciano, D., Reynolds, S. J. & Gray, K. Threat of racial and economic inequality increases preference for algorithm decision-making. Comput. Hum. Behav. 122 , 106859 (2021).

Claudy, M. C., Aquino, K. & Graso, M. Artificial intelligence can’t be charmed: the effects of impartiality on laypeople’s algorithmic preferences. Front. Psychol. 13 , 898027 (2022).

Snyder, C., Keppler, S. & Leider, S. Algorithm reliance under pressure: the effect of customer load on service workers. Preprint at SSRN https://doi.org/10.2139/ssrn.4066823 (2022).

Bogert, E., Schecter, A. & Watson, R. T. Humans rely more on algorithms than social influence as a task becomes more difficult. Sci Rep. 11 , 8028 (2021).

Raviv, A., Bar‐Tal, D., Raviv, A. & Abin, R. Measuring epistemic authority: studies of politicians and professors. Eur. J. Personal. 7 , 119–138 (1993).

Cummings, L. The “trust” heuristic: arguments from authority in public health. Health Commun. 29 , 1043–1056 (2014).

Lee, M. K. Understanding perception of algorithmic decisions: fairness, trust, and emotion in response to algorithmic management. Big Data Soc. 5 , https://doi.org/10.1177/2053951718756684 (2018).

Kissinger, H. A., Schmidt, E. & Huttenlocher, D. The Age of A.I. And Our Human Future (Little, Brown, 2021).

Lombrozo, T. Explanatory preferences shape learning and inference. Trends Cogn. Sci. 20 , 748–759 (2016). This paper provides an overview of philosophical theories of explanatory virtues and reviews empirical evidence on the sorts of explanations people find satisfying .

Vrantsidis, T. H. & Lombrozo, T. Simplicity as a cue to probability: multiple roles for simplicity in evaluating explanations. Cogn. Sci. 46 , e13169 (2022).

Johnson, S. G. B., Johnston, A. M., Toig, A. E. & Keil, F. C. Explanatory scope informs causal strength inferences. In Proc. 36th Annual Meeting of the Cognitive Science Society 2453–2458 (Cognitive Science Society, 2014).

Khemlani, S. S., Sussman, A. B. & Oppenheimer, D. M. Harry Potter and the sorcerer’s scope: latent scope biases in explanatory reasoning. Mem. Cognit. 39 , 527–535 (2011).

Liquin, E. G. & Lombrozo, T. Motivated to learn: an account of explanatory satisfaction. Cogn. Psychol. 132 , 101453 (2022).

Hopkins, E. J., Weisberg, D. S. & Taylor, J. C. V. The seductive allure is a reductive allure: people prefer scientific explanations that contain logically irrelevant reductive information. Cognition 155 , 67–76 (2016).

Weisberg, D. S., Hopkins, E. J. & Taylor, J. C. V. People’s explanatory preferences for scientific phenomena. Cogn. Res. Princ. Implic. 3 , 44 (2018).

Jerez-Fernandez, A., Angulo, A. N. & Oppenheimer, D. M. Show me the numbers: precision as a cue to others’ confidence. Psychol. Sci. 25 , 633–635 (2014).

Kim, J., Giroux, M. & Lee, J. C. When do you trust AI? The effect of number presentation detail on consumer trust and acceptance of AI recommendations. Psychol. Mark. 38 , 1140–1155 (2021).

Nguyen, C. T. The seductions of clarity. R. Inst. Philos. Suppl. 89 , 227–255 (2021). This article describes how reductive and quantitative explanations can generate a sense of understanding that is not necessarily correlated with actual understanding .

Fisher, M., Smiley, A. H. & Grillo, T. L. H. Information without knowledge: the effects of internet search on learning. Memory 30 , 375–387 (2022).

Eliseev, E. D. & Marsh, E. J. Understanding why searching the internet inflates confidence in explanatory ability. Appl. Cogn. Psychol. 37 , 711–720 (2023).

Fisher, M. & Oppenheimer, D. M. Who knows what? Knowledge misattribution in the division of cognitive labor. J. Exp. Psychol. Appl. 27 , 292–306 (2021).

Chromik, M., Eiband, M., Buchner, F., Krüger, A. & Butz, A. I think I get your point, AI! The illusion of explanatory depth in explainable AI. In 26th International Conference on Intelligent User Interfaces (eds Hammond, T. et al.) 307–317 (Association for Computing Machinery, 2021).

Strevens, M. No understanding without explanation. Stud. Hist. Philos. Sci. A 44 , 510–515 (2013).

Ylikoski, P. in Scientific Understanding: Philosophical Perspectives (eds De Regt, H. et al.) 100–119 (Univ. Pittsburgh Press, 2009).

Giudice, M. D. The prediction–explanation fallacy: a pervasive problem in scientific applications of machine learning. Preprint at PsyArXiv https://doi.org/10.31234/osf.io/4vq8f (2021).

Hofman, J. M. et al. Integrating explanation and prediction in computational social science. Nature 595 , 181–188 (2021). This paper highlights the advantages and disadvantages of explanatory versus predictive approaches to modelling, with a focus on applications to computational social science .

Shmueli, G. To explain or to predict? Stat. Sci. 25 , 289–310 (2010).

Article   MathSciNet   Google Scholar  

Hofman, J. M., Sharma, A. & Watts, D. J. Prediction and explanation in social systems. Science 355 , 486–488 (2017).

Logg, J. M., Minson, J. A. & Moore, D. A. Algorithm appreciation: people prefer algorithmic to human judgment. Organ. Behav. Hum. Decis. Process. 151 , 90–103 (2019).

Nguyen, C. T. Cognitive islands and runaway echo chambers: problems for epistemic dependence on experts. Synthese 197 , 2803–2821 (2020).

Breiman, L. Statistical modeling: the two cultures. Stat. Sci. 16 , 199–215 (2001).

Gao, J. & Wang, D. Quantifying the benefit of artificial intelligence for scientific research. Preprint at arxiv.org/abs/2304.10578 (2023).

Hanson, B. et al. Garbage in, garbage out: mitigating risks and maximizing benefits of AI in research. Nature 623 , 28–31 (2023).

Kleinberg, J. & Raghavan, M. Algorithmic monoculture and social welfare. Proc. Natl Acad. Sci. USA 118 , e2018340118 (2021). This paper uses formal modelling methods to demonstrate that when companies all rely on the same algorithm to make decisions (an algorithmic monoculture), the overall quality of those decisions is reduced because valuable options can slip through the cracks, even when the algorithm performs accurately for individual companies .

Article   MathSciNet   CAS   PubMed   PubMed Central   Google Scholar  

Hofstra, B. et al. The diversity–innovation paradox in science. Proc. Natl Acad. Sci. USA 117 , 9284–9291 (2020).

Hong, L. & Page, S. E. Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proc. Natl Acad. Sci. USA 101 , 16385–16389 (2004).

Page, S. E. Where diversity comes from and why it matters? Eur. J. Soc. Psychol. 44 , 267–279 (2014). This article reviews research demonstrating the benefits of cognitive diversity and diversity in methodological approaches for problem solving and innovation .

Clarke, A. E. & Fujimura, J. H. (eds) The Right Tools for the Job: At Work in Twentieth-Century Life Sciences (Princeton Univ. Press, 2014).

Silva, V. J., Bonacelli, M. B. M. & Pacheco, C. A. Framing the effects of machine learning on science. AI Soc. https://doi.org/10.1007/s00146-022-01515-x (2022).

Sassenberg, K. & Ditrich, L. Research in social psychology changed between 2011 and 2016: larger sample sizes, more self-report measures, and more online studies. Adv. Methods Pract. Psychol. Sci. 2 , 107–114 (2019).

Simon, A. F. & Wilder, D. Methods and measures in social and personality psychology: a comparison of JPSP publications in 1982 and 2016. J. Soc. Psychol. https://doi.org/10.1080/00224545.2022.2135088 (2022).

Anderson, C. A. et al. The MTurkification of social and personality psychology. Pers. Soc. Psychol. Bull. 45 , 842–850 (2019).

Latour, B. in The Social After Gabriel Tarde: Debates and Assessments (ed. Candea, M.) 145–162 (Routledge, 2010).

Porter, T. M. Trust in Numbers: The Pursuit of Objectivity in Science and Public Life (Princeton Univ. Press, 1996).

Lazer, D. et al. Meaningful measures of human society in the twenty-first century. Nature 595 , 189–196 (2021).

Knox, D., Lucas, C. & Cho, W. K. T. Testing causal theories with learned proxies. Annu. Rev. Polit. Sci. 25 , 419–441 (2022).

Barberá, P. Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data. Polit. Anal. 23 , 76–91 (2015).

Brady, W. J., McLoughlin, K., Doan, T. N. & Crockett, M. J. How social learning amplifies moral outrage expression in online social networks. Sci. Adv. 7 , eabe5641 (2021).

Article   PubMed   PubMed Central   ADS   Google Scholar  

Barnes, J., Klinger, R. & im Walde, S. S. Assessing state-of-the-art sentiment models on state-of-the-art sentiment datasets. In Proc. 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (eds Balahur, A. et al.) 2–12 (Association for Computational Linguistics, 2017).

Gitelman, L. (ed.) “Raw Data” is an Oxymoron (MIT Press, 2013).

Breznau, N. et al. Observing many researchers using the same data and hypothesis reveals a hidden universe of uncertainty. Proc. Natl Acad. Sci. USA 119 , e2203150119 (2022). This study demonstrates how 73 research teams analysing the same dataset reached different conclusions about the relationship between immigration and public support for social policies, highlighting the subjectivity and uncertainty involved in analysing complex datasets .

Gillespie, T. in Media Technologies: Essays on Communication, Materiality, and Society (eds Gillespie, T. et al.) 167–194 (MIT Press, 2014).

Leonelli, S. Data-Centric Biology: A Philosophical Study (Univ. Chicago Press, 2016).

Wang, A., Kapoor, S., Barocas, S. & Narayanan, A. Against predictive optimization: on the legitimacy of decision-making algorithms that optimize predictive accuracy. ACM J. Responsib. Comput. , https://doi.org/10.1145/3636509 (2023).

Athey, S. Beyond prediction: using big data for policy problems. Science 355 , 483–485 (2017).

del Rosario Martínez-Ordaz, R. Scientific understanding through big data: from ignorance to insights to understanding. Possibility Stud. Soc. 1 , 279–299 (2023).

Nussberger, A.-M., Luo, L., Celis, L. E. & Crockett, M. J. Public attitudes value interpretability but prioritize accuracy in artificial intelligence. Nat. Commun. 13 , 5821 (2022).

Zittrain, J. in The Cambridge Handbook of Responsible Artificial Intelligence: Interdisciplinary Perspectives (eds. Voeneky, S. et al.) 176–184 (Cambridge Univ. Press, 2022). This article articulates the epistemic risks of prioritizing predictive accuracy over explanatory understanding when AI tools are interacting in complex systems.

Shumailov, I. et al. The curse of recursion: training on generated data makes models forget. Preprint at arxiv.org/abs/2305.17493 (2023).

Latour, B. Science In Action: How to Follow Scientists and Engineers Through Society (Harvard Univ. Press, 1987). This book provides strategies and approaches for thinking about science as a social endeavour .

Franklin, S. Science as culture, cultures of science. Annu. Rev. Anthropol. 24 , 163–184 (1995).

Haraway, D. Situated knowledges: the science question in feminism and the privilege of partial perspective. Fem. Stud. 14 , 575–599 (1988). This article acknowledges that the objective ‘view from nowhere’ is unobtainable: knowledge, it argues, is always situated .

Harding, S. Objectivity and Diversity: Another Logic of Scientific Research (Univ. Chicago Press, 2015).

Longino, H. E. Science as Social Knowledge: Values and Objectivity in Scientific Inquiry (Princeton Univ. Press, 1990).

Daston, L. & Galison, P. Objectivity (Princeton Univ. Press, 2007). This book is a historical analysis of the shifting modes of ‘objectivity’ that scientists have pursued, arguing that objectivity is not a universal concept but that it shifts alongside scientific techniques and ambitions .

Prescod-Weinstein, C. Making Black women scientists under white empiricism: the racialization of epistemology in physics. Signs J. Women Cult. Soc. 45 , 421–447 (2020).

Mavhunga, C. What Do Science, Technology, and Innovation Mean From Africa? (MIT Press, 2017).

Schiebinger, L. The Mind Has No Sex? Women in the Origins of Modern Science (Harvard Univ. Press, 1991).

Martin, E. The egg and the sperm: how science has constructed a romance based on stereotypical male–female roles. Signs J. Women Cult. Soc. 16 , 485–501 (1991). This case study shows how assumptions about gender affect scientific theories, sometimes delaying the articulation of what might be considered to be more accurate descriptions of scientific phenomena .

Harding, S. Rethinking standpoint epistemology: What is “strong objectivity”? Centen. Rev. 36 , 437–470 (1992). In this article, Harding outlines her position on ‘strong objectivity’, by which clearly articulating one’s standpoint can lead to more robust knowledge claims .

Oreskes, N. Why Trust Science? (Princeton Univ. Press, 2019). This book introduces the reader to 20 years of scholarship in science and technology studies, arguing that the tools the discipline has for understanding science can help to reinstate public trust in the institution .

Rolin, K., Koskinen, I., Kuorikoski, J. & Reijula, S. Social and cognitive diversity in science: introduction. Synthese 202 , 36 (2023).

Hong, L. & Page, S. E. Problem solving by heterogeneous agents. J. Econ. Theory 97 , 123–163 (2001).

Sulik, J., Bahrami, B. & Deroy, O. The diversity gap: when diversity matters for knowledge. Perspect. Psychol. Sci. 17 , 752–767 (2022).

Lungeanu, A., Whalen, R., Wu, Y. J., DeChurch, L. A. & Contractor, N. S. Diversity, networks, and innovation: a text analytic approach to measuring expertise diversity. Netw. Sci. 11 , 36–64 (2023).

AlShebli, B. K., Rahwan, T. & Woon, W. L. The preeminence of ethnic diversity in scientific collaboration. Nat. Commun. 9 , 5163 (2018).

Campbell, L. G., Mehtani, S., Dozier, M. E. & Rinehart, J. Gender-heterogeneous working groups produce higher quality science. PLoS ONE 8 , e79147 (2013).

Nielsen, M. W., Bloch, C. W. & Schiebinger, L. Making gender diversity work for scientific discovery and innovation. Nat. Hum. Behav. 2 , 726–734 (2018).

Yang, Y., Tian, T. Y., Woodruff, T. K., Jones, B. F. & Uzzi, B. Gender-diverse teams produce more novel and higher-impact scientific ideas. Proc. Natl Acad. Sci. USA 119 , e2200841119 (2022).

Kozlowski, D., Larivière, V., Sugimoto, C. R. & Monroe-White, T. Intersectional inequalities in science. Proc. Natl Acad. Sci. USA 119 , e2113067119 (2022).

Fehr, C. & Jones, J. M. Culture, exploitation, and epistemic approaches to diversity. Synthese 200 , 465 (2022).

Nakadai, R., Nakawake, Y. & Shibasaki, S. AI language tools risk scientific diversity and innovation. Nat. Hum. Behav. 7 , 1804–1805 (2023).

National Academies of Sciences, Engineering, and Medicine et al. Advancing Antiracism, Diversity, Equity, and Inclusion in STEMM Organizations: Beyond Broadening Participation (National Academies Press, 2023).

Winner, L. Do artifacts have politics? Daedalus 109 , 121–136 (1980).

Eubanks, V. Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (St. Martin’s Press, 2018).

Littmann, M. et al. Validity of machine learning in biology and medicine increased through collaborations across fields of expertise. Nat. Mach. Intell. 2 , 18–24 (2020).

Carusi, A. et al. Medical artificial intelligence is as much social as it is technological. Nat. Mach. Intell. 5 , 98–100 (2023).

Raghu, M. & Schmidt, E. A survey of deep learning for scientific discovery. Preprint at arxiv.org/abs/2003.11755 (2020).

Bishop, C. AI4Science to empower the fifth paradigm of scientific discovery. Microsoft Research Blog www.microsoft.com/en-us/research/blog/ai4science-to-empower-the-fifth-paradigm-of-scientific-discovery/ (2022).

Whittaker, M. The steep cost of capture. Interactions 28 , 50–55 (2021).

Liesenfeld, A., Lopez, A. & Dingemanse, M. Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators. In Proc. 5th International Conference on Conversational User Interfaces 1–6 (Association for Computing Machinery, 2023).

Chu, J. S. G. & Evans, J. A. Slowed canonical progress in large fields of science. Proc. Natl Acad. Sci. USA 118 , e2021636118 (2021).

Park, M., Leahey, E. & Funk, R. J. Papers and patents are becoming less disruptive over time. Nature 613 , 138–144 (2023).

Frith, U. Fast lane to slow science. Trends Cogn. Sci. 24 , 1–2 (2020). This article explains the epistemic risks of a hyperfocus on scientific productivity and explores possible avenues for incentivizing the production of higher-quality science on a slower timescale .

Stengers, I. Another Science is Possible: A Manifesto for Slow Science (Wiley, 2018).

Lake, B. M. & Baroni, M. Human-like systematic generalization through a meta-learning neural network. Nature 623 , 115–121 (2023).

Feinman, R. & Lake, B. M. Learning task-general representations with generative neuro-symbolic modeling. Preprint at arxiv.org/abs/2006.14448 (2021).

Schölkopf, B. et al. Toward causal representation learning. Proc. IEEE 109 , 612–634 (2021).

Mitchell, M. AI’s challenge of understanding the world. Science 382 , eadm8175 (2023).

Sartori, L. & Bocca, G. Minding the gap(s): public perceptions of AI and socio-technical imaginaries. AI Soc. 38 , 443–458 (2023).

Download references




This article is part of the research topic: Research on Teaching Strategies and Skills in Different Educational Stages.

Physical education (PE) students' reflections about the learning outcomes of different teaching methods: a mixed methods study (provisionally accepted)

  • 1 Nord University, Norway


Teaching in higher education is still mainly delivered as lectures, even though research on student-active instruction methods points to more motivated students, higher enjoyment, and better learning outcomes. The purpose of this study was to gain better insight into how physical education (PE) students assessed their learning outcomes in relation to different pedagogical approaches. A master's course in PE was planned and implemented using the following eight learning approaches: lectures; practical exercises on themes from the lectures; discussions during lectures; discussions outside of lectures; planning and exercises for peer students; individual work preparing to write an academic text; individual work writing the academic text; and reading for an exam. The study used a mixed methods design, combining quantitative data from students' evaluations of the eight learning approaches on a Likert-type scale with in-depth qualitative data from follow-up interviews with some of the same students, with the aim of explaining the main findings. Quantitative data about the students' reflections on the learning outcomes of the different approaches were collected from 59 students at three points in time (2021, 2022, and 2023), after they had finished the course in the fifth semester of a master's programme in PE. The quantitative analyses showed that the students reported the highest learning outcomes from practical exercises and the lowest learning outcomes from lectures. In-depth interviews with seven randomly selected students were also conducted to obtain their reflections on the different learning approaches.
Qualitative analyses of the in-depth interviews indicated that practical activities enabled students to relate theory to practice, kept them active, and were seen as relevant to their future work, whereas the quality of lectures depended on the characteristics of the teacher, and lectures were often experienced as long and unstimulating. Based on these results, we recommend that student teachers in higher education acquire the ability to plan and execute practical lessons related to the themes covered in lectures, and that they involve students more in discussions during lectures.

Keywords: student teachers, learning outcomes, Lectures, practical activities, Pedagogical methods

Received: 05 Jan 2024; Accepted: 19 Mar 2024.

Copyright: © 2024 Sørensen and Lagestad. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: PhD. Arne Sørensen, Nord University, Bodø, Norway

Front Psychol

Quantitative and Qualitative Approaches to Generalization and Replication–A Representationalist View

In this paper, we provide a re-interpretation of qualitative and quantitative modeling from a representationalist perspective. In this view, both approaches attempt to construct abstract representations of empirical relational structures. Whereas quantitative research uses variable-based models that abstract from individual cases, qualitative research favors case-based models that abstract from individual characteristics. Variable-based models are usually stated in the form of quantified sentences (scientific laws). This syntactic structure implies that sentences about individual cases are derived using deductive reasoning. In contrast, case-based models are usually stated using context-dependent existential sentences (qualitative statements). This syntactic structure implies that sentences about other cases are justifiable by inductive reasoning. We apply this representationalist perspective to the problems of generalization and replication. Using the analytical framework of modal logic, we argue that the modes of reasoning are often not only applied to the context that has been studied empirically, but also on a between-contexts level. Consequently, quantitative researchers mostly adhere to a top-down strategy of generalization, whereas qualitative researchers usually follow a bottom-up strategy of generalization. Depending on which strategy is employed, the role of replication attempts is very different. In deductive reasoning, replication attempts serve as empirical tests of the underlying theory. Therefore, failed replications imply a faulty theory. From an inductive perspective, however, replication attempts serve to explore the scope of the theory. Consequently, failed replications do not question the theory per se, but help to shape its boundary conditions. We conclude that quantitative research may benefit from a bottom-up generalization strategy as it is employed in most qualitative research programs. Inductive reasoning forces us to think about the boundary conditions of our theories and provides a framework for generalization beyond statistical testing. In this perspective, failed replications are just as informative as successful replications, because they help to explore the scope of our theories.


Qualitative and quantitative research strategies have long been treated as opposing paradigms. In recent years, there have been attempts to integrate both strategies. These “mixed methods” approaches treat qualitative and quantitative methodologies as complementary, rather than opposing, strategies (Creswell, 2015 ). However, whilst it acknowledges that both strategies have their benefits, this “integration” remains purely pragmatic. Hence, mixed methods methodology does not provide a conceptual unification of the two approaches.

Lacking a common methodological background, qualitative and quantitative research methodologies have developed rather distinct standards with regard to the aims and scope of empirical science (Freeman et al., 2007 ). These different standards affect the way researchers handle contradictory empirical findings. For example, many empirical findings in psychology have failed to replicate in recent years (Klein et al., 2014 ; Open Science Collaboration, 2015 ). This “replication crisis” has been discussed on statistical, theoretical and social grounds and continues to have a wide impact on quantitative research practices like, for example, open science initiatives, pre-registered studies and a re-evaluation of statistical significance testing (Everett and Earp, 2015 ; Maxwell et al., 2015 ; Shrout and Rodgers, 2018 ; Trafimow, 2018 ; Wiggins and Chrisopherson, 2019 ).

However, qualitative research seems to be hardly affected by this discussion. In this paper, we argue that the latter is a direct consequence of how the concept of generalizability is conceived in the two approaches. Whereas most of quantitative psychology is committed to a top-down strategy of generalization based on the idea of random sampling from an abstract population, qualitative studies usually rely on a bottom-up strategy of generalization that is grounded in the successive exploration of the field by means of theoretically sampled cases.

Here, we show that a common methodological framework for qualitative and quantitative research methodologies is possible. We accomplish this by introducing a formal description of quantitative and qualitative models from a representationalist perspective: both approaches can be reconstructed as special kinds of representations for empirical relational structures. We then use this framework to analyze the generalization strategies used in the two approaches. These turn out to be logically independent of the type of model. This has wide implications for psychological research. First, a top-down generalization strategy is compatible with a qualitative modeling approach. This implies that mainstream psychology may benefit from qualitative methods when a numerical representation turns out to be difficult or impossible, without the need to commit to a “qualitative” philosophy of science. Second, quantitative research may exploit the bottom-up generalization strategy that is inherent to many qualitative approaches. This offers a new perspective on unsuccessful replications by treating them not as scientific failures, but as a valuable source of information about the scope of a theory.

The Quantitative Strategy–Numbers and Functions

Quantitative science is about finding valid mathematical representations for empirical phenomena. In most cases, these mathematical representations have the form of functional relations between a set of variables. One major challenge of quantitative modeling consists in constructing valid measures for these variables. Formally, to measure a variable means to construct a numerical representation of the underlying empirical relational structure (Krantz et al., 1971 ). For example, take the behaviors of a group of students in a classroom: “to listen,” “to take notes,” and “to ask critical questions.” One may now ask whether it is possible to assign numbers to the students, such that the relations between the assigned numbers are of the same kind as the relations between the values of an underlying variable, like, e.g., “engagement.” The observed behaviors in the classroom constitute an empirical relational structure, in the sense that for every student-behavior tuple, one can observe whether it is true or not. These observations can be represented in a person × behavior matrix 1 (compare Figure 1 ). Given that this relational structure satisfies certain conditions (i.e., the axioms of a measurement model), one can assign numbers to the students and the behaviors, such that the relations between the numbers resemble the corresponding empirical relations. For example, if there is a unique ordering in the empirical observations with regard to which person shows which behavior, the assigned numbers have to constitute a corresponding unique ordering as well. Such an ordering coincides with the person × behavior matrix forming a triangle-shaped relation and is formally represented by a Guttman scale (Guttman, 1944 ). There are various measurement models available for different empirical structures (Suppes et al., 1971 ). In the case of probabilistic relations, Item-Response models may be considered a special kind of measurement model (Borsboom, 2005 ).

Figure 1.

Constructing a numerical representation from an empirical relational structure; Due to the unique ordering of persons with regard to behaviors (indicated by the triangular shape of the relation), it is possible to construct a Guttman scale by assigning a number to each of the individuals, representing the number of relevant behaviors shown by the individual. The resulting variable (“engagement”) can then be described by means of statistical analyses, like, e.g., plotting the frequency distribution.
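
As a toy illustration (not part of the original paper), the Guttman criterion can be checked in a few lines of Python. The persons, behaviors, and scores below are invented; the check simply tests whether the behavior sets form a chain under set inclusion, which is equivalent to the triangle-shaped matrix described above.

```python
# Invented person × behavior data: each person maps to the set of
# behaviors observed for him or her.
matrix = {
    "A": {"listen", "take notes", "ask critical questions"},
    "B": {"listen", "take notes"},
    "C": {"listen", "take notes"},
    "D": {"listen"},
    "E": set(),
}

def is_guttman_scale(rows):
    """True iff the behavior sets can be linearly ordered by inclusion."""
    ordered = sorted(rows.values(), key=len)
    return all(a <= b for a, b in zip(ordered, ordered[1:]))

def guttman_scores(rows):
    """Assign each person the number of behaviors shown (the scale value)."""
    return {person: len(shown) for person, shown in rows.items()}

if is_guttman_scale(matrix):
    print(guttman_scores(matrix))
    # {'A': 3, 'B': 2, 'C': 2, 'D': 1, 'E': 0}
```

The triangle shape of the matrix corresponds exactly to the chain of behavior sets under inclusion; the assigned numbers simply count the behaviors shown.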

Although essential, measurement is only the first step of quantitative modeling. Consider a slightly richer empirical structure, where we observe three additional behaviors: “to doodle,” “to chat,” and “to play.” As above, one may ask whether there is a unique ordering of the students with regard to these behaviors that can be represented by an underlying variable (i.e., whether the matrix forms a Guttman scale). If this is the case, we may assign corresponding numbers to the students and call this variable “distraction.” In our example, such a representation is possible. We can thus assign two numbers to each student, one representing his or her “engagement” and one representing his or her “distraction” (compare Figure 2 ). These measurements can now be used to construct a quantitative model by relating the two variables by a mathematical function. In the simplest case, this may be a linear function. This functional relation constitutes a quantitative model of the empirical relational structure under study (like, e.g., linear regression). Given the model equation and the rules for assigning the numbers (i.e., the instrumentations of the two variables), the set of admissible empirical structures is limited from all possible structures to a rather small subset. This constitutes the empirical content of the model 2 (Popper, 1935 ).

Figure 2.

Constructing a numerical model from an empirical relational structure; Since there are two distinct classes of behaviors that each form a Guttman scale, it is possible to assign two numbers to each individual, correspondingly. The resulting variables (“engagement” and “distraction”) can then be related by a mathematical function, which is indicated by the scatterplot and red line on the right hand side.
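
The step from measurement to a functional model can likewise be sketched. The engagement and distraction scores below are invented, and the linear relation between them is estimated by ordinary least squares, implemented from scratch for transparency:

```python
# Invented scale values for five students.
engagement = [3, 2, 2, 1, 0]
distraction = [0, 1, 1, 2, 3]

def ols(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = ols(engagement, distraction)
print(slope, intercept)  # -1.0 3.0
```

With these invented scores, distraction is a monotone decreasing function of engagement, which matches the example law discussed later in the text.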

The Qualitative Strategy–Categories and Typologies

The predominant type of analysis in qualitative research consists in category formation. By constructing descriptive systems for empirical phenomena, it is possible to analyze the underlying empirical structure at a higher level of abstraction. The resulting categories (or types) constitute a conceptual frame for the interpretation of the observations. Qualitative researchers differ considerably in the way they collect and analyze data (Miles et al., 2014 ). However, despite the diverse research strategies followed by different qualitative methodologies, from a formal perspective, most approaches build on some kind of categorization of cases that share some common features. The process of category formation is essential in many qualitative methodologies, like, for example, qualitative content analysis, thematic analysis, grounded theory (see Flick, 2014 for an overview). Sometimes these features are directly observable (like in our classroom example), sometimes they are themselves the result of an interpretative process (e.g., Scheunpflug et al., 2016 ).

In contrast to quantitative methodologies, there have been few attempts to formalize qualitative research strategies (compare, however, Rihoux and Ragin, 2009 ). However, there are several statistical approaches to non-numerical data that deal with constructing abstract categories and establishing relations between these categories (Agresti, 2013 ). Some of these methods are very similar to qualitative category formation on a conceptual level. For example, cluster analysis groups cases into homogeneous categories (clusters) based on their similarity under a distance metric.
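
As a minimal sketch of this conceptual similarity (not part of the original paper), the following groups invented binary behavior profiles by single-link merging under the Hamming distance; real cluster analyses use dedicated libraries and richer metrics:

```python
# Invented binary behavior profiles (1 = behavior observed).
cases = {
    "A": (1, 1, 1, 0, 0, 0),
    "B": (1, 1, 0, 1, 0, 0),
    "C": (1, 1, 0, 1, 0, 0),
    "D": (0, 0, 0, 1, 1, 1),
}

def hamming(u, v):
    """Number of positions in which two profiles disagree."""
    return sum(a != b for a, b in zip(u, v))

def single_link(profiles, threshold):
    """Merge clusters whenever any cross-cluster pair is within threshold."""
    clusters = [{name} for name in profiles]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                close = any(hamming(profiles[a], profiles[b]) <= threshold
                            for a in clusters[i] for b in clusters[j])
                if close:
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

print([sorted(c) for c in single_link(cases, 2)])  # [['A', 'B', 'C'], ['D']]
```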

Although category formation can be formalized in a mathematically rigorous way (Ganter and Wille, 1999 ), qualitative research hardly acknowledges these approaches. 3 However, in order to find a common ground with quantitative science, it is certainly helpful to provide a formal interpretation of category systems.

Let us reconsider the above example of students in a classroom. The quantitative strategy was to assign numbers to the students with regard to variables and to relate these variables via a mathematical function. We can analyze the same empirical structure by grouping the behaviors to form abstract categories. If the aim is to construct an empirically valid category system, this grouping is subject to constraints, analogous to those used to specify a measurement model. The first and most important constraint is that the behaviors must form equivalence classes, i.e., within categories, behaviors need to be equivalent, and across categories, they need to be distinct (formally, the relational structure must obey the axioms of an equivalence relation). When objects are grouped into equivalence classes, it is essential to specify the criterion for empirical equivalence. In qualitative methodology, this is sometimes referred to as the tertium comparationis (Flick, 2014 ). One possible criterion is to group behaviors such that they constitute a set of specific common attributes of a group of people. In our example, we might group the behaviors “to listen,” “to take notes,” and “to doodle,” because these behaviors are common to the cases B, C, and D, and they are also specific for these cases, because no other person shows this particular combination of behaviors. The set of common behaviors then forms an abstract concept (e.g., “moderate distraction”), while the set of persons that show this configuration forms a type (e.g., “the silent dreamer”). Formally, this amounts to identifying the maximal rectangles in the underlying empirical relational structure (see Figure 3 ). This procedure is very similar to the way we constructed a Guttman scale, the only difference being that we now use different aspects of the empirical relational structure. 4 In fact, the set of maximal rectangles can be determined by an automated algorithm (Ganter, 2010 ), just like the dimensionality of an empirical structure can be explored by psychometric scaling methods. Consequently, we can identify the empirical content of a category system or a typology as the set of empirical structures that conforms to it. 5 Whereas the quantitative strategy was to search for scalable sub-matrices and then relate the constructed variables by a mathematical function, the qualitative strategy is to construct an empirical typology by grouping cases based on their specific similarities. These types can then be related to one another by a conceptual model that describes their semantic and empirical overlap (see Figure 3 , right hand side).

Figure 3.

Constructing a conceptual model from an empirical relational structure; Individual behaviors are grouped to form abstract types based on them being shared among a specific subset of the cases. Each type constitutes a set of specific commonalities of a class of individuals (this is indicated by the rectangles on the left hand side). The resulting types (“active learner,” “silent dreamer,” “distracted listener,” and “troublemaker”) can then be related to one another to explicate their semantic and empirical overlap, as indicated by the Venn-diagram on the right hand side.
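
A brute-force sketch of this procedure (with invented data; efficient formal concept analysis algorithms such as Ganter's NextClosure scale far better) enumerates all maximal rectangles of a binary person × behavior relation as extent–intent pairs:

```python
from itertools import combinations

# Invented person × behavior relation, loosely mirroring the classroom
# example: B, C, and D share exactly {listen, take notes, doodle}.
relation = {
    "A": {"listen", "take notes", "ask"},
    "B": {"listen", "take notes", "doodle"},
    "C": {"listen", "take notes", "doodle"},
    "D": {"listen", "take notes", "doodle"},
    "E": {"doodle", "chat", "play"},
}

def concepts(rel):
    """All (extent, intent) pairs that form maximal rectangles."""
    persons = list(rel)
    found = set()
    for r in range(1, len(persons) + 1):
        for group in combinations(persons, r):
            # behaviors common to every person in the group ...
            intent = set.intersection(*(rel[p] for p in group))
            if not intent:
                continue
            # ... and every person showing all of those behaviors
            extent = {p for p in persons if intent <= rel[p]}
            found.add((frozenset(extent), frozenset(intent)))
    return found

# the "silent dreamer" configuration is specific to exactly B, C, and D
print((frozenset({"B", "C", "D"}),
       frozenset({"listen", "take notes", "doodle"})) in concepts(relation))
# True
```

Each pair returned is a maximal rectangle: enlarging either the extent or the intent would break the rectangle, which is exactly the defining property of a formal concept.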

Variable-Based Models and Case-Based Models

In the previous section, we have argued that qualitative category formation and quantitative measurement can both be characterized as methods to construct abstract representations of empirical relational structures. Instead of focusing on different philosophical approaches to empirical science, we tried to stress the formal similarities between both approaches. However, it is worth also exploring the dissimilarities from a formal perspective.

Following the above analysis, the quantitative approach can be characterized by the use of variable-based models, whereas the qualitative approach is characterized by case-based models (Ragin, 1987 ). Formally, we can identify the rows of an empirical person × behavior matrix with a person-space, and the columns with a corresponding behavior-space. A variable-based model abstracts from the single individuals in a person-space to describe the structure of behaviors on a population level. A case-based model, on the contrary, abstracts from the single behaviors in a behavior-space to describe individual case configurations on the level of abstract categories (see Table 1 ).

Table 1. Variable-based models and case-based models.

From a representational perspective, there is no a priori reason to favor one type of model over the other. Both approaches provide different analytical tools to construct an abstract representation of an empirical relational structure. However, since the two modeling approaches make use of different information (person-space vs. behavior-space), this comes with some important implications for the researcher employing one of the two strategies. These are concerned with the role of deductive and inductive reasoning.

In variable-based models, empirical structures are represented by functional relations between variables. These are usually stated as scientific laws (Carnap, 1928 ). Formally, these laws correspond to logical expressions of the form

∀i : y_i = f(x_i)

In plain text, this means that y is a function of x for all objects i in the relational structure under consideration. For example, in the above example, one may formulate the following law: for all students in the classroom it holds that “distraction” is a monotone decreasing function of “engagement.” Such a law can be used to derive predictions for single individuals by means of logical deduction: if the above law applies to all students in the classroom, it is possible to calculate the expected distraction from a student's engagement. An empirical observation can now be evaluated against this prediction. If the prediction turns out to be false, the law can be refuted based on the principle of falsification (Popper, 1935 ). If a scientific law repeatedly withstands such empirical tests, it may be considered to be valid with regard to the relational structure under consideration.
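
This deductive test can be illustrated with a minimal sketch (not part of the original paper); both the law and the observations are invented, and a single deviating case suffices to refute the universal claim:

```python
# Invented law: for every student, distraction = 3 - engagement.
def predicted_distraction(engagement):
    return 3 - engagement

# Invented observations: (engagement, distraction) per student.
observations = {"Anna": (3, 0), "Ben": (1, 2), "Carla": (2, 2)}

# Collect every observation that contradicts the deduced prediction.
refuting = {name for name, (e, d) in observations.items()
            if predicted_distraction(e) != d}
print(refuting)  # {'Carla'}
```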

In case-based models, there are no laws about a population, because the model does not abstract from the cases but from the observed behaviors. A case-based model describes the underlying structure in terms of existential sentences. Formally, this corresponds to a logical expression of the form

∃i : XYZ(i)

In plain text, this means that there is at least one case i for which the condition XYZ holds. For example, the above category system implies that there is at least one active learner. This is a statement about a singular observation. It is impossible to deduce a statement about another person from an existential sentence like this. Therefore, the strategy of falsification cannot be applied to test the model's validity in a specific context. If one wishes to generalize to other cases, this is accomplished by inductive reasoning instead. If we observed one person that fulfills the criteria for calling him or her an active learner, we can hypothesize that there may be other persons that are identical to the observed case in this respect. However, we do not arrive at this conclusion by logical deduction, but by induction.
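
A minimal sketch of such an existential claim (the type configuration and the cases are invented):

```python
# Invented defining configuration of the type "active learner".
active_learner = {"listen", "take notes", "ask critical questions"}

# Invented observed cases.
observed = {
    "A": {"listen", "take notes", "ask critical questions"},
    "B": {"listen", "take notes", "doodle"},
}

# ∃i: XYZ(i) holds here, because case A shows the full configuration.
exists_active_learner = any(active_learner <= behaviors
                            for behaviors in observed.values())
print(exists_active_learner)  # True
```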

Despite this important distinction, it would be wrong to conclude that variable-based models are intrinsically deductive and case-based models are intrinsically inductive. 6 Both types of reasoning apply to both types of models, but on different levels. Based on a person-space, in a variable-based model one can use deduction to derive statements about individual persons from abstract population laws. There is an analogous way of reasoning for case-based models: because they are based on a behavior space, it is possible to deduce statements about singular behaviors. For example, if we know that Peter is an active learner, we can deduce that he takes notes in the classroom. This kind of deductive reasoning can also be applied on a higher level of abstraction to deduce thematic categories from theoretical assumptions (Braun and Clarke, 2006 ). Similarly, there is an analog for inductive generalization from the perspective of variable-based modeling: since the laws are only quantified over the person-space, generalizations to other behaviors rely on inductive reasoning. For example, it is plausible to assume that highly engaged students tend to do their homework properly–however, in our example this behavior has never been observed. Hence, in variable-based models we usually generalize to other behaviors by means of induction. This kind of inductive reasoning is very common when empirical results are generalized from the laboratory to other behavioral domains.

Although inductive and deductive reasoning are used in both qualitative and quantitative research, it is important to stress the different roles of induction and deduction when models are applied to cases. A variable-based approach implies drawing conclusions about cases by means of logical deduction; a case-based approach implies drawing conclusions about cases by means of inductive reasoning. In the following, we build on this distinction to differentiate between qualitative (bottom-up) and quantitative (top-down) strategies of generalization.

Generalization and the Problem of Replication

We will now extend the formal analysis of quantitative and qualitative approaches to the question of generalization and replicability of empirical findings. For this sake, we have to introduce some concepts of formal logic. Formal logic is concerned with the validity of arguments. It provides conditions to evaluate whether certain sentences (conclusions) can be derived from other sentences (premises). In this context, a theory is nothing but a set of sentences (also called axioms). Formal logic provides tools to derive new sentences that must be true, given the axioms are true (Smith, 2020 ). These derived sentences are called theorems or, in the context of empirical science, predictions or hypotheses . On the syntactic level, the rules of logic only state how to evaluate the truth of a sentence relative to its premises. Whether or not sentences are actually true, is formally specified by logical semantics.

On the semantic level, formal logic is intrinsically linked to set-theory. For example, a logical statement like “all dogs are mammals,” is true if and only if the set of dogs is a subset of the set of mammals. Similarly, the sentence “all chatting students doodle” is true if and only if the set of chatting students is a subset of the set of doodling students (compare Figure 3 ). Whereas, the first sentence is analytically true due to the way we define the words “dog” and “mammal,” the latter can be either true or false, depending on the relational structure we actually observe. We can thus interpret an empirical relational structure as the truth criterion of a scientific theory. From a logical point of view, this corresponds to the semantics of a theory. As shown above, variable-based and case-based models both give a formal representation of the same kinds of empirical structures. Accordingly, both types of models can be stated as formal theories. In the variable-based approach, this corresponds to a set of scientific laws that are quantified over the members of an abstract population (these are the axioms of the theory). In the case-based approach, this corresponds to a set of abstract existential statements about a specific class of individuals.

In contrast to mathematical axiom systems, empirical theories are usually not considered to be necessarily true. This means that even if we find no evidence against a theory, it is still possible that it is actually wrong. We may know that a theory is valid in some contexts, yet it may fail when applied to a new set of behaviors (e.g., if we use a different instrumentation to measure a variable) or a new population (e.g., if we draw a new sample).

From a logical perspective, the possibility that a theory may turn out to be false stems from the problem of contingency. A statement is contingent if it is both possibly true and possibly false. Formally, we introduce two modal operators: □ to designate logical necessity, and ◇ to designate logical possibility. Semantically, these operators are very similar to the existential quantifier, ∃, and the universal quantifier, ∀. Whereas ∃ and ∀ refer to the individual objects within one relational structure, the modal operators □ and ◇ range over so-called possible worlds: a statement is possibly true if and only if it is true in at least one accessible possible world, and a statement is necessarily true if and only if it is true in every accessible possible world (Hughes and Cresswell, 1996 ). Logically, possible worlds are mathematical abstractions, each consisting of a relational structure. Taken together, the relational structures of all accessible possible worlds constitute the formal semantics of necessity, possibility and contingency. 7

In the context of an empirical theory, each possible world may be identified with an empirical relational structure like the above classroom example. Given the set of intended applications of a theory (the scope of the theory, one may say), we can now construct possible world semantics for an empirical theory: each intended application of the theory corresponds to a possible world. For example, a quantified sentence like “all chatting students doodle” may be true in one classroom and false in another one. In terms of possible worlds, this would correspond to a statement of contingency: “it is possible that all chatting students doodle in one classroom, and it is possible that they don't in another classroom.” Note that in the above expression, “all students” refers to the students in only one possible world, whereas “it is possible” refers to the fact that there is at least one possible world for each of the specified cases.
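
These semantics can be mirrored in a small sketch (not part of the original paper), with each invented classroom serving as one possible world; ◇ then corresponds to any() and □ to all() over the accessible worlds:

```python
# Invented worlds: each world is one observed classroom.
worlds = [
    {"chatting": {"D", "E"}, "doodling": {"B", "C", "D", "E"}},  # classroom 1
    {"chatting": {"X", "Y"}, "doodling": {"X"}},                 # classroom 2
]

def holds(world):
    # "all chatting students doodle" is true iff chatters ⊆ doodlers
    return world["chatting"] <= world["doodling"]

possibly_true = any(holds(w) for w in worlds)      # ◇: at least one world
necessarily_true = all(holds(w) for w in worlds)   # □: every world

print(possibly_true, necessarily_true)  # True False
```

The sentence is thus contingent in this invented example: true in classroom 1, false in classroom 2.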

To apply these possible world semantics to quantitative research, let us reconsider how generalization to other cases works in variable-based models. Due to the syntactic structure of quantitative laws, we can deduce predictions for singular observations from an expression of the form ∀i : y_i = f(x_i). Formally, the logical quantifier ∀ ranges only over the objects of the corresponding empirical relational structure (in our example this would refer to the students in the observed classroom). But what if we want to generalize beyond the empirical structure we actually observed? The standard procedure is to assume an infinitely large, abstract population from which a random sample is drawn. Given the truth of the theory, we can deduce predictions about what we may observe in the sample. Since we usually deal with probabilistic models, we can evaluate our theory by means of the conditional probability of the observations, given the theory holds. This concept of conditional probability is the foundation of statistical significance tests (Hogg et al., 2013 ), as well as Bayesian estimation (Watanabe, 2018 ). In terms of possible world semantics, the random sampling model implies that all possible worlds (i.e., all intended applications) can be conceived as empirical sub-structures from a greater population structure. For example, the empirical relational structure constituted by the observed behaviors in a classroom would be conceived as a sub-matrix of the population person × behavior matrix. It follows that, if a scientific law is true in the population, it will be true in all possible worlds, i.e., it will be necessarily true. Formally, this corresponds to an expression of the form

□ (∀i : y_i = f(x_i))

The statistical generalization model thus constitutes a top-down strategy for dealing with individual contexts that is analogous to the way variable-based models are applied to individual cases (compare Table 1 ). Consequently, if we apply a variable-based model to a new context and find out that it does not fit the data (i.e., there is a statistically significant deviation from the model predictions), we have reason to doubt the validity of the theory. This is what makes the problem of low replicability so important: we observe that the predictions are wrong in a new study; and because we apply a top-down strategy of generalization to contexts beyond the ones we observed, we see our whole theory at stake.

Qualitative research, on the contrary, follows a different strategy of generalization. Since case-based models are formulated by a set of context-specific existential sentences, there is no need for universal truth or necessity. In contrast to statistical generalization to other cases by means of random sampling from an abstract population, the usual strategy in case-based modeling is to employ a bottom-up strategy of generalization that is analogous to the way case-based models are applied to individual cases. Formally, this may be expressed by stating that the observed qualia exist in at least one possible world, i.e., the theory is possibly true:

◇ (∃i : XYZ(i))

This statement is analogous to the way we apply case-based models to individual cases (compare Table 1 ). Consequently, the set of intended applications of the theory does not follow from a sampling model, but from theoretical assumptions about which cases may be similar to the observed cases with respect to certain relevant characteristics. For example, if we observe that certain behaviors occur together in one classroom, following a bottom-up strategy of generalization, we will hypothesize why this might be the case. If we do not replicate this finding in another context, this does not question the model itself, since it was a context-specific theory all along. Instead, we will revise our hypothetical assumptions about why the new context is apparently less similar to the first one than we originally thought. Therefore, if an empirical finding does not replicate, we are more concerned about our understanding of the cases than about the validity of our theory.

Whereas statistical generalization provides us with a formal (and thus somewhat more objective) apparatus to evaluate the universal validity of our theories, the bottom-up strategy forces us to think about the class of intended applications on theoretical grounds. This means that we have to ask: what are the boundary conditions of our theory? In the above classroom example, following a bottom-up strategy, we would build on our preliminary understanding of the cases in one context (e.g., a public school) to search for similar and contrasting cases in other contexts (e.g., a private school). We would then re-evaluate our theoretical description of the data and explore what makes cases similar or dissimilar with regard to our theory. This enables us to expand the class of intended applications alongside the theory.

Of course, neither of these strategies is superior per se . Rather, they rely on different assumptions and may thus be more or less adequate in different contexts. The statistical strategy relies on the assumption of a universal population and invariant measurements. This means we assume that (a) all samples are drawn from the same population and (b) all variables refer to the same behavioral classes. If these assumptions are true, statistical generalization is valid and therefore provides a valuable tool for the testing of empirical theories. The bottom-up strategy of generalization relies on the idea that contexts may be classified as more or less similar based on characteristics that are not part of the model being evaluated. If such a similarity relation across contexts is feasible, the bottom-up strategy is valid as well. Depending on the strategy of generalization, replication of empirical research serves two very different purposes. Following the (top-down) principle of generalization by deduction from scientific laws, replications are empirical tests of the theory itself, and failed replications question the theory on a fundamental level. Following the (bottom-up) principle of generalization by induction to similar contexts, replications are a means to explore the boundary conditions of a theory. Consequently, failed replications question the scope of the theory and help to shape the set of intended applications.

We have argued that quantitative and qualitative research are best understood by means of the structure of the employed models. Quantitative science mainly relies on variable-based models and usually employs a top-down strategy of generalization from an abstract population to individual cases. Qualitative science prefers case-based models and usually employs a bottom-up strategy of generalization. We further showed that failed replications have very different implications depending on the underlying strategy of generalization. Whereas in the top-down strategy, replications are used to test the universal validity of a model, in the bottom-up strategy, replications are used to explore the scope of a model. We will now address the implications of this analysis for psychological research with regard to the problem of replicability.

Modern-day psychology almost exclusively follows a top-down strategy of generalization. Given the quantitative background of most psychological theories, this is hardly surprising. Following the general structure of variable-based models, the individual case is not the focus of the analysis. Instead, scientific laws are stated on the level of an abstract population. Therefore, when applying the theory to a new context, a statistical sampling model seems to be the natural consequence. However, this is not the only possible strategy. From a logical point of view, there is no reason to assume that a quantitative law like ∀ i : y i = f ( x i ) implies that the law is necessarily true, i.e., □(∀ i : y i = f ( x i )). Instead, one might just as well define the scope of the theory following an inductive strategy. 8 Formally, this would correspond to the assumption that the observed law is possibly true, i.e., ◇(∀ i : y i = f ( x i )). For example, we may discover a functional relation between “engagement” and “distraction” without referring to an abstract universal population of students. Instead, we may hypothesize under which conditions this functional relation may be valid and use these assumptions to inductively generalize to other cases.
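The two modal readings contrasted in this paragraph can be written side by side; a minimal LaTeX rendering (using the standard amssymb symbols \Box for necessity and \Diamond for possibility):

```latex
% Top-down reading: the law holds in every possible world
% (every admissible context), i.e., it is necessarily true.
\Box\bigl(\forall i : y_i = f(x_i)\bigr)

% Bottom-up reading: the law holds in at least one possible world
% (the contexts already observed), i.e., it is possibly true.
\Diamond\bigl(\forall i : y_i = f(x_i)\bigr)
```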

If we take this seriously, this would require us to specify the intended applications of the theory: in which contexts do we expect the theory to hold? Or, equivalently, what are the boundary conditions of the theory? These boundary conditions may be specified either intensionally, i.e., by giving external criteria for contexts being similar enough to the ones already studied to expect a successful application of the theory, or extensionally, i.e., by enumerating the contexts where the theory has already been shown to be valid. These boundary conditions need not be restricted to the population we refer to, but include all kinds of contextual factors. Therefore, adopting a bottom-up strategy, we are forced to think about these factors and make them an integral part of our theories.

In fact, there is good reason to believe that bottom-up generalization may be more adequate in many psychological studies. Apart from the pitfalls associated with statistical generalization that have been extensively discussed in recent years (e.g., p-hacking, underpowered studies, publication bias), it is worth reflecting on whether the underlying assumptions are met in a particular context. For example, many samples used in experimental psychology are not randomly drawn from a large population, but are convenience samples. If we use statistical models with non-random samples, we have to assume that the observations vary as if drawn from a random sample. This may indeed be the case for randomized experiments, because all variation between the experimental conditions apart from the independent variable will be random due to the randomization procedure. In this case, a classical significance test may be regarded as an approximation to a randomization test (Edgington and Onghena, 2007 ). However, if we interpret a significance test as an approximate randomization test, we test not for generalization but for internal validity. Hence, even if we use statistical significance tests when assumptions about random sampling are violated, we still have to use a different strategy of generalization. This issue has been discussed in the context of small-N studies, where variable-based models are applied to very small samples, sometimes consisting of only one individual (Dugard et al., 2012 ). The bottom-up strategy of generalization that is employed by qualitative researchers provides such an alternative.
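The randomization test mentioned here (following Edgington and Onghena, 2007 ) can be sketched as follows. The two conditions and their values are invented for illustration; the resulting p-value licenses an inference about this particular random assignment (internal validity), not a statistical generalization to an abstract population.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scores in two experimental conditions (values invented).
control = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.3])
treatment = np.array([6.2, 6.8, 5.9, 7.1, 6.5, 6.0])

observed = treatment.mean() - control.mean()

# Randomization test: re-assign condition labels at random many times
# and count how often a difference at least as large arises by chance.
pooled = np.concatenate([control, treatment])
n_treat = len(treatment)
n_iter = 10_000
count = 0
for _ in range(n_iter):
    perm = rng.permutation(pooled)
    diff = perm[:n_treat].mean() - perm[n_treat:].mean()
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / n_iter
print(f"observed difference: {observed:.2f}, randomization p ≈ {p_value:.4f}")
```

Because the reference distribution is generated from the randomization itself, no assumption about random sampling from a population is needed, which is exactly why the test speaks to internal validity rather than generalization.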

Another important issue in this context is the question of measurement invariance. If we construct a variable-based model in one context, the variables refer to those behaviors that constitute the underlying empirical relational structure. For example, we may construct an abstract measure of “distraction” using the observed behaviors in a certain context. We will then use the term “distraction” as a theoretical term referring to the variable we have just constructed to represent the underlying empirical relational structure. Let us now imagine we apply this theory to a new context. Even if the individuals in our new context are part of the same population, we may still get into trouble if the observed behaviors differ from those used in the original study. How do we know whether these behaviors constitute the same variable? We have to ensure that in any new context, our measures are valid for the variables in our theory. Without a proper measurement model, this will be hard to achieve (Buntins et al., 2017 ). Again, we are faced with the necessity to think of the boundary conditions of our theories. In which contexts (i.e., for which sets of individuals and behaviors) do we expect our theory to work?
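A minimal simulation illustrates the problem of measurement invariance raised above (the "distraction" label, the loadings, and all numbers are hypothetical): if two contexts offer different behavioral indicators of what we take to be the same variable, the two resulting scores need not agree.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# A latent "distraction" trait (hypothetical) drives observable behaviors.
latent = rng.normal(0, 1, n)

# Context A: four behaviors strongly coupled to the trait.
items_a = latent[:, None] + rng.normal(0, 0.5, (n, 4))
# Context B: four *different* behaviors, more loosely coupled to the trait.
items_b = 0.4 * latent[:, None] + rng.normal(0, 1.0, (n, 4))

# Construct a score for each context by averaging its items.
score_a = items_a.mean(axis=1)
score_b = items_b.mean(axis=1)

# If both scores measured the same variable equally well, they would be
# almost perfectly correlated; the attenuated correlation shows why
# measurement invariance across contexts cannot simply be assumed.
r = np.corrcoef(score_a, score_b)[0, 1]
print(f"correlation between the two 'distraction' scores: r ≈ {r:.2f}")
```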

If we follow the rationale of inductive generalization, we can explore the boundary conditions of a theory with every new empirical study. We thus widen the scope of our theory by comparing successful applications in different contexts and unsuccessful applications in similar contexts. This may ultimately lead to a more general theory, maybe even one of universal scope. However, unless we have such a general theory, we might be better off if we treat unsuccessful replications not as a sign of failure, but as a chance to learn.

Author Contributions

MB conceived the original idea and wrote the first draft of the paper. MS helped to further elaborate and scrutinize the arguments. All authors contributed to the final version of the manuscript.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


Acknowledgments

We would like to thank Annette Scheunpflug for helpful comments on an earlier version of the manuscript.

1 A person × behavior matrix constitutes a very simple relational structure that is common in psychological research. This is why it is chosen here as a minimal example. However, more complex structures are possible, e.g., by relating individuals to behaviors over time, with individuals nested within groups etc. For a systematic overview, compare Coombs ( 1964 ).

2 This notion of empirical content applies only to deterministic models. The empirical content of a probabilistic model consists in the probability distribution over all possible empirical structures.

3 For example, neither the SAGE Handbook of qualitative data analysis edited by Flick ( 2014 ) nor the Oxford Handbook of Qualitative Research edited by Leavy ( 2014 ) mention formal approaches to category formation.

4 Note also that the described structure is empirically richer than a nominal scale. Therefore, a reduction of qualitative category formation to be a special (and somehow trivial) kind of measurement is not adequate.

5 It is possible to extend this notion of empirical content to the probabilistic case (this would correspond to applying a latent class analysis). But, since qualitative research usually does not rely on formal algorithms (neither deterministic nor probabilistic), there is currently little practical use of such a concept.

6 We do not elaborate on abductive reasoning here, since, given an empirical relational structure, the concept can be applied to both types of models in the same way (Schurz, 2008 ). One could argue that the underlying relational structure is not given a priori but has to be constructed by the researcher and will itself be influenced by theoretical expectations. Therefore, abductive reasoning may be necessary to establish an empirical relational structure in the first place.

7 We shall not elaborate on the metaphysical meaning of possible worlds here, since we are only concerned with empirical theories [but see Tooley ( 1999 ), for an overview].

8 Of course, this also means that it would be equally reasonable to employ a top-down strategy of generalization using a case-based model by postulating that □(∃ i : XYZ i ). The implications for case-based models are certainly worth exploring, but lie beyond the scope of this article.

  • Agresti A. (2013). Categorical Data Analysis, 3rd Edn. Wiley Series In Probability And Statistics . Hoboken, NJ: Wiley. [ Google Scholar ]
  • Borsboom D. (2005). Measuring the Mind: Conceptual Issues in Contemporary Psychometrics . Cambridge: Cambridge University Press; 10.1017/CBO9780511490026 [ CrossRef ] [ Google Scholar ]
  • Braun V., Clarke V. (2006). Using thematic analysis in psychology . Qual. Res. Psychol . 3 , 77–101. 10.1191/1478088706qp063oa [ CrossRef ] [ Google Scholar ]
  • Buntins M., Buntins K., Eggert F. (2017). Clarifying the concept of validity: from measurement to everyday language . Theory Psychol. 27 , 703–710. 10.1177/0959354317702256 [ CrossRef ] [ Google Scholar ]
  • Carnap R. (1928). The Logical Structure of the World . Berkeley, CA: University of California Press. [ Google Scholar ]
  • Coombs C. H. (1964). A Theory of Data . New York, NY: Wiley. [ Google Scholar ]
  • Creswell J. W. (2015). A Concise Introduction to Mixed Methods Research . Los Angeles, CA: Sage. [ Google Scholar ]
  • Dugard P., File P., Todman J. B. (2012). Single-Case and Small-N Experimental Designs: A Practical Guide to Randomization Tests 2nd Edn . New York, NY: Routledge; 10.4324/9780203180938 [ CrossRef ] [ Google Scholar ]
  • Edgington E., Onghena P. (2007). Randomization Tests, 4th Edn. Statistics. Hoboken, NJ: CRC Press; 10.1201/9781420011814 [ CrossRef ] [ Google Scholar ]
  • Everett J. A. C., Earp B. D. (2015). A tragedy of the (academic) commons: interpreting the replication crisis in psychology as a social dilemma for early-career researchers . Front. Psychol . 6 :1152. 10.3389/fpsyg.2015.01152 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Flick U. (Ed.). (2014). The Sage Handbook of Qualitative Data Analysis . London: Sage; 10.4135/9781446282243 [ CrossRef ] [ Google Scholar ]
  • Freeman M., Demarrais K., Preissle J., Roulston K., St. Pierre E. A. (2007). Standards of evidence in qualitative research: an incitement to discourse . Educ. Res. 36 , 25–32. 10.3102/0013189X06298009 [ CrossRef ] [ Google Scholar ]
  • Ganter B. (2010). Two basic algorithms in concept analysis , in Lecture Notes In Computer Science. Formal Concept Analysis, Vol. 5986 , eds Hutchison D., Kanade T., Kittler J., Kleinberg J. M., Mattern F., Mitchell J. C., et al. (Berlin, Heidelberg: Springer Berlin Heidelberg; ), 312–340. 10.1007/978-3-642-11928-6_22 [ CrossRef ] [ Google Scholar ]
  • Ganter B., Wille R. (1999). Formal Concept Analysis . Berlin, Heidelberg: Springer Berlin Heidelberg; 10.1007/978-3-642-59830-2 [ CrossRef ] [ Google Scholar ]
  • Guttman L. (1944). A basis for scaling qualitative data . Am. Sociol. Rev . 9 :139 10.2307/2086306 [ CrossRef ] [ Google Scholar ]
  • Hogg R. V., Mckean J. W., Craig A. T. (2013). Introduction to Mathematical Statistics, 7th Edn . Boston, MA: Pearson. [ Google Scholar ]
  • Hughes G. E., Cresswell M. J. (1996). A New Introduction To Modal Logic . London; New York, NY: Routledge; 10.4324/9780203290644 [ CrossRef ] [ Google Scholar ]
  • Klein R. A., Ratliff K. A., Vianello M., Adams R. B., Bahník Š., Bernstein M. J., et al. (2014). Investigating variation in replicability . Soc. Psychol. 45 , 142–152. 10.1027/1864-9335/a000178 [ CrossRef ] [ Google Scholar ]
  • Krantz D. H., Luce D., Suppes P., Tversky A. (1971). Foundations of Measurement Volume I: Additive And Polynomial Representations . New York, NY; London: Academic Press; 10.1016/B978-0-12-425401-5.50011-8 [ CrossRef ] [ Google Scholar ]
  • Leavy P. (2014). The Oxford Handbook of Qualitative Research . New York, NY: Oxford University Press; 10.1093/oxfordhb/9780199811755.001.0001 [ CrossRef ] [ Google Scholar ]
  • Maxwell S. E., Lau M. Y., Howard G. S. (2015). Is psychology suffering from a replication crisis? what does “failure to replicate” really mean? Am. Psychol. 70 , 487–498. 10.1037/a0039400 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Miles M. B., Huberman A. M., Saldaña J. (2014). Qualitative Data Analysis: A Methods Sourcebook, 3rd Edn . Los Angeles, CA; London; New Delhi; Singapore; Washington, DC: Sage. [ Google Scholar ]
  • Open Science Collaboration (2015). Estimating the reproducibility of psychological science . Science 349 :aac4716. 10.1126/science.aac4716 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Popper K. (1935). Logik Der Forschung . Wien: Springer; 10.1007/978-3-7091-4177-9 [ CrossRef ] [ Google Scholar ]
  • Ragin C. (1987). The Comparative Method : Moving Beyond Qualitative and Quantitative Strategies . Berkeley, CA: University Of California Press. [ Google Scholar ]
  • Rihoux B., Ragin C. (2009). Configurational Comparative Methods: Qualitative Comparative Analysis (QCA) and Related Techniques . Thousand Oaks, CA: Sage Publications, Inc; 10.4135/9781452226569 [ CrossRef ] [ Google Scholar ]
  • Scheunpflug A., Krogull S., Franz J. (2016). Understanding learning in world society: qualitative reconstructive research in global learning and learning for sustainability . Int. Journal Dev. Educ. Glob. Learn. 7 , 6–23. 10.18546/IJDEGL.07.3.02 [ CrossRef ] [ Google Scholar ]
  • Schurz G. (2008). Patterns of abduction . Synthese 164 , 201–234. 10.1007/s11229-007-9223-4 [ CrossRef ] [ Google Scholar ]
  • Shrout P. E., Rodgers J. L. (2018). Psychology, science, and knowledge construction: broadening perspectives from the replication crisis . Annu. Rev. Psychol . 69 , 487–510. 10.1146/annurev-psych-122216-011845 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Smith P. (2020). An Introduction To Formal Logic . Cambridge: Cambridge University Press. 10.1017/9781108328999 [ CrossRef ] [ Google Scholar ]
  • Suppes P., Krantz D. H., Luce D., Tversky A. (1971). Foundations of Measurement Volume II: Geometrical, Threshold, and Probabilistic Representations . New York, NY; London: Academic Press. [ Google Scholar ]
  • Tooley M. (Ed.). (1999). Necessity and Possibility. The Metaphysics of Modality . New York, NY; London: Garland Publishing. [ Google Scholar ]
  • Trafimow D. (2018). An a priori solution to the replication crisis . Philos. Psychol . 31 , 1188–1214. 10.1080/09515089.2018.1490707 [ CrossRef ] [ Google Scholar ]
  • Watanabe S. (2018). Mathematical Foundations of Bayesian Statistics. CRC Monographs On Statistics And Applied Probability . Boca Raton, FL: Chapman And Hall. [ Google Scholar ]
  • Wiggins B. J., Chrisopherson C. D. (2019). The replication crisis in psychology: an overview for theoretical and philosophical psychology . J. Theor. Philos. Psychol. 39 , 202–217. 10.1037/teo0000137 [ CrossRef ] [ Google Scholar ]

