Nested case-control studies

Affiliation.

  • 1 Department of Epidemiology and Biostatistics, School of Medicine, University of California, San Francisco 94143-0560.
  • PMID: 7845919
  • DOI: 10.1006/pmed.1994.1093

The nested case-control study design (or the case-control in a cohort study) is described here and compared with other designs, including the classic case-control and cohort studies and the case-cohort study. In the nested case-control study, cases of a disease that occur in a defined cohort are identified and, for each, a specified number of matched controls is selected from among those in the cohort who have not developed the disease by the time of disease occurrence in the case. For many research questions, the nested case-control design potentially offers impressive reductions in costs and efforts of data collection and analysis compared with the full cohort approach, with relatively minor loss in statistical efficiency. The nested case-control design is particularly advantageous for studies of biologic precursors of disease. To advance its prevention research agenda, NIH might be encouraged to maintain a registry of new and existing cohorts, with an inventory of data collected for each; to foster the development of specimen banks; and to serve as a clearinghouse for information about optimal storage conditions for various types of specimens.

  • Case-Control Studies*
  • Cohort Studies
  • Preventive Medicine

nested case control study

Quick reference.

A case control study that utilizes cases and control subjects already being studied for another purpose; often part of the larger population of a cohort study. The cases are those that arise in the larger population; the controls are other members of the same study population age- and sex-matched, but without the condition of interest. This is a nested portion of a larger group for all of whom some relevant information already exists.

From:   nested case control study   in  A Dictionary of Public Health »

Subjects: Medicine and health — Public Health and Epidemiology


PRINTED FROM OXFORD REFERENCE (www.oxfordreference.com). (c) Copyright Oxford University Press, 2023. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single entry from a reference work in OR for personal use (for details see Privacy Policy and Legal Notice ).

date: 23 February 2024


Nested case-control studies: advantages and disadvantages

  • Philip Sedgwick , reader in medical statistics and medical education 1
  • 1 Centre for Medical and Healthcare Education, St George’s, University of London, London, UK
  • p.sedgwick{at}sgul.ac.uk

Researchers investigated whether antipsychotic drugs were associated with venous thromboembolism. A population based nested case-control study design was used. Data were taken from the UK QResearch primary care database consisting of 7 267 673 patients. Cases were adult patients with a first ever record of venous thromboembolism between 1 January 1996 and 1 July 2007. For each case, up to four controls were identified, matched by age, calendar time, sex, and practice. Exposure to antipsychotic drugs was assessed on the basis of prescriptions on, or during the 24 months before, the index date. 1

There were 25 532 eligible cases (15 975 with deep vein thrombosis and 9557 with pulmonary embolism) and 89 491 matched controls. The primary outcome was the odds ratios for venous thromboembolism associated with antipsychotic drugs adjusted for comorbidity and concomitant drug exposure. When adjusted using logistic regression to control for potential confounding, prescription of antipsychotic drugs in the previous 24 months was significantly associated with an increased occurrence of venous thromboembolism compared with non-use (odds ratio 1.32, 95% confidence interval 1.23 to 1.42). The researchers concluded that prescription of antipsychotic drugs was associated with venous thromboembolism in a large primary care population.

Which of the following statements, if any, are true?

a) The nested case-control study is a retrospective design

b) The study design minimised selection bias compared with a case-control study

c) Recall bias was minimised compared with a case-control study

d) Causality could be inferred from the association between prescription of antipsychotic drugs and venous thromboembolism

Statements a, b, and c are true, whereas d is false.

The aim of the study was to investigate whether prescription of antipsychotic drugs was associated with venous thromboembolism. A nested case-control study design was used. The study design was an observational one that incorporated the concept of the traditional case-control study within an established cohort. This design overcomes some of the disadvantages associated with case-control studies, 2 while incorporating some of the advantages of cohort studies. 3 4

Data for the study above were extracted from the UK QResearch primary care database, a computerised register of anonymised longitudinal medical records for patients registered at more than 500 UK general practices. Patient data were recorded prospectively, the database having been updated regularly as patients visited their GP. Cases were all adult patients in the register with a first ever record of venous thromboembolism between 1 January 1996 and 1 July 2007. There were 25 532 cases in total. For each case, up to four controls were identified from the register, matched by age, calendar time, sex, and practice. In total, 89 491 matched controls were obtained. Data relating to prescriptions for antipsychotic drugs on, or during the 24 months before, the index date were extracted for the cases and controls. The index date was the date in the register when venous thromboembolism was recorded for the case. The cases and controls were compared to ascertain whether exposure to prescription of antipsychotic drugs was more common in one group than in the other. Despite the data for the cases and controls being collected prospectively, the nested case-control study is described as retrospective ( a is true) because it involved looking back at events that had already taken place and been recorded in the register.

Selection bias is of particular concern in the traditional case-control study. Described in a previous question, 5 selection bias is the systematic difference between the study participants and the population they are meant to represent with respect to their characteristics, including demographics and morbidity. Cases and controls are often selected through convenience sampling. Cases are typically recruited from hospitals or general practices because they are convenient and easily accessible to researchers. Controls are often recruited from the same hospital clinics or general practices as the cases. Therefore, the selected cases may not be representative of the population of all cases. Equally, the controls might not be representative of otherwise healthy members of the population. The above nested case-control study was population based, with the QResearch primary care database incorporating a large proportion of the UK population. The cases and controls were selected from the database and therefore should be more representative of the population than those in a traditional case-control study. Hence, selection bias was minimised by using the nested case-control study design ( b is true).

The traditional case-control study involves participants recalling information about past exposure to risk factors after identification as a case or control. The study design is prone to recall bias, as described in a previous question. 6 Recall bias is the systematic difference between cases and controls in the accuracy of information recalled. Recall bias will exist if participants have selective preconceptions about the association between the disease and past exposure to the risk factor(s). Cases may, for example, recall information more accurately than controls, possibly because of an association with the disease or outcome. Although in the study above the cases and controls were identified retrospectively, the data for the QResearch primary care database were collected prospectively. Therefore, there was no reason for any systematic differences between groups of study participants in the accuracy of the information collected. Therefore, recall bias was minimised compared with a traditional case-control study ( c is true).

Not all of the patient records in the UK QResearch primary care database were used to explore the association between prescription of antipsychotic drugs and development of venous thromboembolism. A nested case-control study was used instead, with cases and controls matched on age, calendar time, sex, and practice. This was because it was statistically more efficient to control for the effects of age, calendar time, sex, and practice by matching cases and controls on these variables at the design stage, rather than controlling for their potential confounding effects when the data were analysed. The matching variables were considered to be important factors that could potentially confound the association between prescription of antipsychotic drugs and venous thromboembolism, but they were not of interest as potential risk factors in themselves. Matching in case-control studies has been described in a previous question. 7

Unlike a traditional case-control study, the data in the example above were recorded prospectively. Therefore, it was possible to determine whether prescription of antipsychotic drugs preceded the occurrence of venous thromboembolism. Nonetheless, only association, and not causation, can be inferred from the results of the above nested case-control study ( d is false)—that is, those people who were exposed to prescribed antipsychotic drugs were more likely to have developed venous thromboembolism. This is because the observed association between prescribed antipsychotic drugs and occurrence of venous thromboembolism may have been due to confounding. In particular, it was not possible to measure and then control for, through statistical analysis, all factors that may have affected the occurrence of venous thromboembolism.

The example above is typical of a nested case-control study; the health records for a group of patients that have already been collected and stored in an electronic database are used to explore the association between one or more risk factors and a disease or condition. The management of such databases means it is possible for a variety of studies to be undertaken, each investigating the risk factors associated with different diseases or outcomes. Nested case-control studies are therefore relatively inexpensive to perform. However, the major disadvantage of nested case-control studies is that not all pertinent risk factors are likely to have been recorded. Furthermore, because many different healthcare professionals will be involved in patient care, risk factors and outcome(s) will probably not have been measured with the same accuracy and consistency throughout. It may also be problematic if the diagnosis of the disease or outcome changes with time.

Cite this as: BMJ 2014;348:g1532

Competing interests: None declared.

  • 1 Parker C, Coupland C, Hippisley-Cox J. Antipsychotic drugs and risk of venous thromboembolism: nested case-control study. BMJ 2010;341:c4245.
  • 2 Sedgwick P. Case-control studies: advantages and disadvantages. BMJ 2014;348:f7707.
  • 3 Sedgwick P. Prospective cohort studies: advantages and disadvantages. BMJ 2013;347:f6726.
  • 4 Sedgwick P. Retrospective cohort studies: advantages and disadvantages. BMJ 2014;348:g1072.
  • 5 Sedgwick P. Selection bias versus allocation bias. BMJ 2013;346:f3345.
  • 6 Sedgwick P. What is recall bias? BMJ 2012;344:e3519.
  • 7 Sedgwick P. Why match in case-control studies? BMJ 2012;344:e691.


EP717 Module 5 - Epidemiologic Study Designs – Part 2:

Case-control studies.


A Nested Case-Control Study


Now consider a hypothetical prospective cohort study among 89,949 women in whom the investigators took blood samples and froze them at baseline for possible future use. After following the cohort for 12 years, the investigators wanted to investigate a possible association between the pesticide DDT and breast cancer. Since they had frozen blood samples collected at baseline, they had the option of having the samples tested for DDT levels. If they had done this, they would have found breast cancer in 360 of the 13,636 women with high DDT levels and in 1,079 of the 76,313 women with low DDT levels.

If they had had this data, they could have calculated the risk ratio:

RR = (360/13,636) / (1,079/76,313) = 1.87
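The risk ratio above can be checked with a few lines of Python. This is only a minimal sketch: the function name `risk_ratio` and its parameter names are illustrative choices, and the counts are the ones that appear in the formula.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Cumulative-incidence (risk) ratio of the exposed versus the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Counts taken from the risk ratio formula in the text.
rr = risk_ratio(360, 13_636, 1_079, 76_313)
print(f"RR = {rr:.2f}")
```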

However, the cost of analyzing each sample for DDT was $20, and to analyze all of them would have cost close to $1.8 million. So, like the previous study, the exposure data was very costly.

Although this was a prospective cohort study, we could regard the cohort as a source population and conduct a case-control study drawing samples from the cohort . We could, for example, analyze the blood samples on all of the women who had developed breast cancer during the 12 year follow up and on 2,878 randomly selected samples from the women without breast cancer (i.e., twice as many controls as cases). This would be described as a nested case-control study , i.e., nested within a cohort study.

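The results might have looked like this: 360 of the 1,439 cases and 432 of the 2,878 controls had high DDT levels, leaving 1,079 cases and 2,446 controls with low levels. With a and c denoting the exposed and unexposed cases, and b and d the exposed and unexposed controls:

Odds Ratio = (a/c) / (b/d) = (360/1,079) / (432/2,446) = 1.89 over the 12 year follow up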

So, they could achieve an odds ratio that is very close to what the risk ratio would have been at a much lower cost: (1,439+2,878) x $20 = $86,340.
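The odds ratio and the cost comparison can be reproduced as follows. This is a sketch under the assumptions stated in the text ($20 per sample, all 1,439 cases plus 2,878 controls analyzed); the names `odds_ratio`, `cost_nested` and `cost_full` are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Exposure odds ratio.

    a, c: exposed and unexposed cases
    b, d: exposed and unexposed controls
    """
    return (a / c) / (b / d)


COST_PER_SAMPLE = 20  # dollars per DDT assay, as stated in the text
N_COHORT = 89_949
N_CASES = 1_439
N_CONTROLS = 2_878  # twice as many controls as cases

or_nested = odds_ratio(a=360, b=432, c=1_079, d=2_446)
cost_nested = (N_CASES + N_CONTROLS) * COST_PER_SAMPLE
cost_full = N_COHORT * COST_PER_SAMPLE

print(f"OR = {or_nested:.2f}")
print(f"nested study cost = ${cost_nested:,}; full cohort cost = ${cost_full:,}")
```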

The odds ratio is a legitimate measure of association, and, when the outcome of interest is uncommon, it provides a good estimate of what the risk ratio would have been if a cohort study had been possible. When looking at increasingly common outcomes, the odds ratio gives estimates that are more extreme than the risk ratio, i.e., further away from the null value. 
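To see how the odds ratio drifts away from the risk ratio as the outcome becomes more common, the sketch below holds a hypothetical risk ratio fixed at 2.0 and varies the baseline risk in the unexposed group. The risk ratio and the baseline risks are invented for illustration and do not come from the DDT example.

```python
def odds_ratio_from_risks(risk_exposed, risk_unexposed):
    """Odds ratio implied by the risks in the exposed and unexposed groups."""
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed


TRUE_RR = 2.0  # hypothetical risk ratio, held constant
baseline_risks = [0.001, 0.01, 0.10, 0.30]  # hypothetical risks in the unexposed

results = [(p0, odds_ratio_from_risks(TRUE_RR * p0, p0)) for p0 in baseline_risks]
for p0, or_value in results:
    print(f"baseline risk {p0:.3f}: RR = {TRUE_RR:.2f}, OR = {or_value:.2f}")
```

With a rare outcome the two measures nearly coincide; as the baseline risk grows, the odds ratio moves further from the null than the risk ratio.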

Not surprisingly, the interpretation of an odds ratio is therefore similar to the interpretation of a risk ratio.

  • The null value (no difference) is 1.0.
  • Odds ratios > 1 suggest an increase in risk
  • Odds ratios < 1 suggest a decrease in risk

The odds ratio above would be interpreted as follows:

"Women with high DDT blood levels at baseline had 1.89 times the odds of developing breast cancer compared to women with low blood levels of DDT during the 12 year observation period."

Calculate the odds ratio for the association between playing video games and development of hypertension. Interpret the odds ratio you calculate in a sentence. See if you can do both of these correctly before looking at the answer.


Content ©2021. All Rights Reserved. Date last modified: April 21, 2021. Wayne W. LaMorte, MD, PhD, MPH


7.2 - Advanced Case-Control Designs

Nested case-control study

This is a case-control study within a cohort study. At the beginning of the cohort study \((t_0)\), members of the cohort are assessed for risk factors. Cases and controls are identified subsequently at time \(t_1\). The control group is selected from the risk set (cohort members who do not meet the case definition at \(t_1\)). Typically, the nested case-control study includes less than 20% of the parent cohort.
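Selecting controls from the risk set can be sketched as below. This is a simplified illustration, not a prescribed procedure: it assumes each cohort member has a single exit time (event or end of follow-up), treats everyone still under follow-up after a case's event time as that case's risk set, and uses an invented six-person cohort. The function name and data layout are hypothetical.

```python
import random


def sample_risk_set_controls(cohort, controls_per_case, seed=0):
    """For each case, draw controls from members still at risk at the case's event time.

    cohort maps a person id to (exit_time, is_case), where exit_time is the
    event time for cases and the end of follow-up for everyone else.
    """
    rng = random.Random(seed)
    matched = {}
    for case_id, (event_time, is_case) in cohort.items():
        if not is_case:
            continue
        # Risk set: everyone else whose follow-up extends beyond the event time.
        risk_set = [
            pid
            for pid, (exit_time, _) in cohort.items()
            if pid != case_id and exit_time > event_time
        ]
        k = min(controls_per_case, len(risk_set))
        matched[case_id] = rng.sample(risk_set, k)
    return matched


# Hypothetical cohort: id -> (exit time in years, developed disease?)
cohort = {
    "p1": (2.0, True),
    "p2": (5.0, False),
    "p3": (3.5, True),
    "p4": (1.0, False),
    "p5": (6.0, False),
    "p6": (4.0, False),
}
matched = sample_risk_set_controls(cohort, controls_per_case=2)
print(matched)
```

Note that under this scheme a person who becomes a case later (here p3) can still serve as a control for an earlier case, because they were disease-free at that earlier time.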

Advantages of nested case-control

Disadvantages

Nested case-control studies can be matched , not matched , or counter-matched.

Matching cases to controls according to baseline measurements of one or several confounding variables is done to control for the effect of those confounding variables. A counter-matched study, in contrast, matches cases to controls who have a different baseline exposure level for a risk factor. The counter-matched study design is used to specifically assess the impact of this risk factor; it is especially good for assessing the potential interaction (effect modification!) of the secondary risk factor and the primary risk factor. Counter-matched controls are randomly selected from different strata of risk factor exposure levels in order to maximize variation in risk exposures among the controls. For example, in a study of the risk for bladder cancer from alcohol consumption, you might match cases to controls who smoke different amounts to see if the effect of smoking is only evident at a minimum level of exposure.
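A very small sketch of counter-matched selection follows. It assumes only two exposure strata (a hypothetical `heavy_smoker` flag) and one control per case, drawn at random from the stratum the case does not belong to. All identifiers and records are invented for illustration, and a real analysis would also have to account for this deliberately non-random sampling, which the sketch omits.

```python
import random


def counter_match(cases, candidates, rng):
    """Pair each case with one control drawn from the opposite exposure stratum."""
    pairs = []
    for case in cases:
        opposite = [p for p in candidates if p["heavy_smoker"] != case["heavy_smoker"]]
        control = rng.choice(opposite)
        pairs.append((case["id"], control["id"]))
    return pairs


rng = random.Random(1)
# Hypothetical cases and candidate controls.
cases = [
    {"id": "case1", "heavy_smoker": True},
    {"id": "case2", "heavy_smoker": False},
]
candidates = [
    {"id": "c1", "heavy_smoker": True},
    {"id": "c2", "heavy_smoker": False},
    {"id": "c3", "heavy_smoker": False},
    {"id": "c4", "heavy_smoker": True},
]
pairs = counter_match(cases, candidates, rng)
print(pairs)
```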

Example of a Nested Case-Control Study: Familial, psychiatric, and socioeconomic risk factors for suicide in young people: a nested case-control study. In a cohort study of risk factors for suicide, Agerbo et al. (2002) enrolled 496 young people who had committed suicide during 1981-97 in Denmark, matched for sex, age, and time to 24,800 controls. Each case was matched to a representative random subsample of 50 people born the same year.

Advantages of the nested case-control design in diagnostic research

BMC Medical Research Methodology volume  8 , Article number:  48 ( 2008 ) Cite this article


Despite its benefits, it is uncommon to apply the nested case-control design in diagnostic research. We aim to show advantages of this design for diagnostic accuracy studies.

We used data from a full cross-sectional diagnostic study comprising a cohort of 1295 consecutive patients who were selected on their suspicion of having deep vein thrombosis (DVT). We drew nested case-control samples from the full study population with case:control ratios of 1:1, 1:2, 1:3 and 1:4 (per ratio 100 samples were taken). We calculated diagnostic accuracy estimates for two tests that are used to detect DVT in clinical practice.

Estimates of diagnostic accuracy in the nested case-control samples were very similar to those in the full study population. For example, for each case:control ratio, the positive predictive value of the D-dimer test was 0.30 in the full study population and 0.30 in the nested case-control samples (median of the 100 samples). As expected, variability of the estimates decreased with increasing sample size.

Our findings support the view that the nested case-control study is a valid and efficient design for diagnostic studies and should also be (re)appraised in current guidelines on diagnostic accuracy research.


In diagnostic research it is essential to determine the accuracy of a test to evaluate its value for medical practice [ 1 ]. Diagnostic test accuracy is assessed by comparing the results of the index test with the results of the reference standard in the same patients. Given the cross-sectional nature of a diagnostic accuracy question, the design may be referred to as a cross-sectional cohort design. The (cohort) characteristic by which the study subjects (cohort members) are selected is 'the suspicion of the target disease', defined by the presence of particular symptoms or signs [ 2 ]. The collected study data allow for calculation of all diagnostic accuracy parameters of the index test, such as sensitivity, specificity, odds ratio, receiver operating characteristic (ROC) curve and predictive values, i.e. the probabilities of presence and absence of the disease given the index test result(s).

Subjects are not always selected on their initial suspicion of having the disease but often on the true presence or absence of the disease among those who underwent the reference test in routine care practice, which merely reflects a cross-sectional case-control design [ 3 , 4 ]. Appraisal of such conventional case-control design in diagnostic accuracy research has been limited due to its problems related to the incorrect sampling of cases and controls [ 3 – 7 ]. These problems may be overcome by applying a nested (cross-sectional) case-control study design, which may be advantageous over a full (cross-sectional) cohort design. The rationale, strengths and limitations of a nested case-control approach in epidemiology studies have widely been discussed in the literature [ 8 – 11 ], but not so much in the context of diagnostic accuracy research [ 6 ].

We therefore aim to show advantages of the nested case-control design for addressing diagnostic accuracy questions and discuss its pros and cons in relation to a conventional case-control design and to the full (cross sectional) cohort design in this domain. We will illustrate this with data from a recently conducted diagnostic accuracy study.

Case-control versus nested case-control design

The essence of a case-control study is that cases with the condition under study arise in a source population and controls are a representative sample of this same source population. The entire population is not studied, which would be a full cohort study or census approach; rather, a random sample from the source population is studied [ 12 ]. A major flaw inherent to case-control studies, described as early as 1959 [ 13 ], is the difficulty of ensuring that cases and controls are a representative sample of the same source population. In a nested case-control study the cases emerge from a well-defined source population and the controls are sampled from that same population. The main difference between a case-control and a nested case-control study is that in the former the cases and controls are sampled from a source population of unknown size, whereas the latter is 'nested' in an existing predefined source population of known size. This source population can be a group or cohort of subjects that is followed over time or not.

The term 'cohort' commonly refers to a group of subjects followed over time in etiologic or prognostic research. But in essence, time is no prerequisite for the definition of a cohort. A cohort is a group of subjects that is defined by the same characteristic. This characteristic can be a particular birth year, a particular living area, or the presence of a particular sign or symptom that makes them suspected of having a particular disease, as in diagnostic research. Accordingly, a cross-sectional study can either be a cross-sectional case-control study or a cross-sectional cohort study.

Case-control and nested case-control design in diagnostic accuracy research

In diagnostic accuracy research the case-control design is incorrectly applied when subjects are selected from routine care databases. First, this design commonly leads to biased estimates of diagnostic accuracy of the index test due to referral or (partial) verification bias [ 4 , 14 – 18 ]. In routine care, physicians selectively refer patients for additional tests, including the reference test, based on previous test results. This is good clinical practice but a bad starting point for diagnostic research. As said, for diagnostic research purposes all subjects suspected of the target disease preferably undergo the index test(s) plus reference test irrespective of previous test results. Second, selection of patients with a negative reference test result as 'controls' may lead to inclusion of controls that correspond to a different clinical domain, i.e. patients who underwent the reference test but not necessarily because they were similarly suspected of the target condition [ 16 , 17 ]. A third disadvantage of such case-control design is that absolute probabilities of disease presence given the index test results, i.e. the predictive values or post-test probabilities, that are the desired parameters for patient care, cannot be obtained. Cases and controls are sampled from a source population of unknown size. The total number of patients that were initially suspected of the target disease based on the presence of symptoms or signs, i.e. the true source population, is commonly unknown as in routine care patients are hardly classified by their symptoms and signs at presentation [ 18 ]. Hence, the sampling fraction of cases and controls is unknown and valid estimates of the absolute probabilities of disease presence cannot be calculated [ 12 ].

A nested case-control study in diagnostic research includes the full population or cohort of patients suspected of the target disease. The 'true' disease status is obtained for all these patients with the reference standard. Hence, there is no referral or partial verification bias. The results of the index tests can then be obtained for all subjects with the target condition but only for a sample of the subjects without the target condition. Usually all patients with the target disease are included, but this could as well be a sample of the cases. Besides the absence of bias, all measures of diagnostic accuracy, including the positive and negative predictive values, can simply be obtained by weighting the controls with the inverse of the case-control sampling fraction, as explained in Figure 1 .

Figure 1. Theoretical example of a full study population and a nested case-control sample. The index test result and the outcome are obtained for all patients of the study population. The case-control ratio was 1:4 (sampling fraction (SF) = 160/400 = 0.40). Valid diagnostic accuracy measures can be obtained from the nested case-control sample by multiplying the controls by 1/sampling fraction. For example, the positive predictive value (PPV) of a full study population can be calculated with a/(a + b), in this example 30/(30 + 100) = 0.23. In a nested case-control sample the PPV is calculated with a/(a + (1/SF)*b), in this example: 30/(30 + 2.5*40) = 0.23. In a case-control sample, however, the controls are sampled from a source population of unknown size. Therefore, the sampling fraction is unknown and a valid estimate of the PPV cannot be calculated.
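The weighting in Figure 1 can be written out as a short sketch. The counts are those given in the caption; the function names and the unweighted `naive` comparison are illustrative additions meant to show what goes wrong when the sampling fraction is ignored.

```python
def ppv_full(a, b):
    """PPV in the full population: a = test-positive cases, b = test-positive controls."""
    return a / (a + b)


def ppv_nested(a, b_sampled, sampling_fraction):
    """PPV in a nested sample, scaling sampled controls by 1/sampling fraction."""
    return a / (a + b_sampled / sampling_fraction)


sf = 160 / 400  # 160 of the 400 controls were sampled
full = ppv_full(30, 100)
nested = ppv_nested(30, 40, sf)
naive = 30 / (30 + 40)  # ignores the sampling fraction and overstates the PPV

print(f"full = {full:.2f}, nested (weighted) = {nested:.2f}, unweighted = {naive:.2f}")
```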

Potential advantages of a nested case-control design in diagnostic research

The nested case-control study design can be advantageous over a full cross-sectional cohort design when actual disease prevalence in subjects suspected of a target condition is low, the index test is costly to perform, or if the index test is invasive and may lead to side effects. Under these conditions, one limits patient burden and saves time and money as the index test is performed in only a sample of the control subjects.

Furthermore, the nested case-control design is of particular value when stored data (serum, images etc.) of an existing study population are re-analysed for diagnostic research purposes. Using a nested case-control design, only data of a sample of the full study population need to be retrieved and analysed without having to perform a new diagnostic study from the start. This may for example apply to evaluation of tumour markers to detect cancer, but also for imaging or electrophysiology tests.

Diagnostic accuracy estimates derived from a nested case-control study, should be virtually identical to a full cohort analysis. However, the variability of the accuracy estimates will increase with decreasing sample size. We illustrate this with data of a diagnostic study on a cohort of patients who were suspected of DVT.

A cross-sectional study was performed among a cohort of adult patients suspected of deep vein thrombosis (DVT) in primary care. This suspicion was primarily defined by the presence of a painful and swollen or red leg that existed no longer than 30 days. Details on the setting, data collection and main results have been described previously. [ 19 , 20 ] In brief, the full study population included 1295 consecutive patients who visited one of the participating primary care physicians with above symptoms and signs of DVT. Patients were excluded if pulmonary embolism was suspected. The general practitioner systematically documented information on patient history and physical examination. Patient history included information such as age, gender, history of malignancy, and recent surgery. Physical examination included swelling of the affected limb and difference in circumference of the calves calculated as the circumference (in centimetres) of affected limb minus circumference of unaffected limb, further referred to as calf difference test. Subsequently, all patients were referred to undergo D-dimer testing. In line with available guidelines and previous studies, the D-dimer test result was considered abnormal if the test yielded a D-dimer level ≥ 500 ng/ml. [ 21 , 22 ] Finally, they all underwent the reference test, i.e. repeated compression ultrasonography (CUS) of the lower extremities. In patients with a normal first CUS measurement, the CUS was repeated after seven days. DVT was considered present if one CUS measurement was abnormal. The echographist was blinded to the results of patient history, physical examination, and the D-dimer assay.

Nested case-control samples

Nested case-control samples were drawn from the full study population (n = 1295). In all samples, we always included all 289 cases with DVT. Controls were randomly sampled from the 1006 subjects without DVT. We applied four different and frequently used case-control ratios, i.e. one control for each case (1:1), two controls for each case (1:2), three controls for each case (1:3) and four controls for each case (1:4). For example, a sample with a case-control ratio of 1:1 contained 289 cases and 289 random subjects out of 1006 controls (sampling fraction 289/1006 = 0.287). In the 1:4 approach, we sampled with replacement. For each case-control ratio, 100 nested case-control samples were drawn.
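The sampling scheme described above might be sketched as follows. This is an assumption-laden illustration rather than the authors' code (they report using SPSS and S-plus): controls are represented only by integer ids, the seed is arbitrary, and sampling with replacement is used only when more controls are requested than exist, which here happens for the 1:4 ratio.

```python
import random

N_CASES, N_CONTROLS = 289, 1006
N_REPLICATES = 100


def draw_controls(ratio, rng):
    """Draw ratio * N_CASES control ids from the N_CONTROLS subjects without DVT."""
    n_needed = ratio * N_CASES
    control_ids = range(N_CONTROLS)
    if n_needed > N_CONTROLS:
        # More controls requested than exist (1:4), so sample with replacement.
        return rng.choices(control_ids, k=n_needed)
    return rng.sample(control_ids, n_needed)


rng = random.Random(2008)  # arbitrary seed, for reproducibility only
samples = {
    ratio: [draw_controls(ratio, rng) for _ in range(N_REPLICATES)]
    for ratio in (1, 2, 3, 4)
}
for ratio, replicates in samples.items():
    print(f"1:{ratio} -> {len(replicates)} samples of {len(replicates[0])} controls")
```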

Statistical analysis

We focussed on two important diagnostic tests for DVT, i.e. the dichotomous D-dimer test and the continuous calf difference test. The latter was specifically chosen as it allowed for the estimation and thus comparison of the area under the ROC curve (ROC area). Diagnostic accuracy measures of both tests were estimated for the four case-control ratios and compared with those obtained from the full study population. Measures of diagnostic accuracy included sensitivity and specificity, positive and negative predictive values and the odds ratio (OR) for the D-dimer test, and the OR and the ROC area for the calf difference test.

In the analysis of the nested case-control samples, we weighted the sampled controls by the inverse of the sampling fraction (1/sampling fraction) corresponding to the case-control ratio (1:1 = 3.48; 1:2 = 1.74; 1:3 = 1.16; 1:4 = 0.87). For each case-control ratio, the point estimates and variability were determined. The median estimate of the 100 samples was considered as the point estimate. Analyses were performed using SPSS version 12.0 and S-plus version 6.0.
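The weights quoted above follow directly from the cohort counts: with 1006 controls available and 289 × k sampled at a 1:k ratio, the inverse sampling fraction is 1006 / (289 × k). The Python sketch below reproduces them and shows how weighting restores the cohort-level predictive values. The function names are assumptions, and the 2 × 2 counts in the example are invented for illustration, not taken from the study's tables.

```python
N_CASES, N_CONTROLS = 289, 1006  # cohort counts from the text


def control_weight(ratio):
    """Inverse sampling fraction for controls at a 1:ratio design."""
    return N_CONTROLS / (ratio * N_CASES)


def predictive_values(tp, fn, fp_sampled, tn_sampled, weight):
    """PPV and NPV after scaling sampled control counts back to the cohort.

    Cases are never sampled, so only the control cells are weighted.
    """
    fp = fp_sampled * weight
    tn = tn_sampled * weight
    return tp / (tp + fp), tn / (tn + fn)


weights = {ratio: round(control_weight(ratio), 2) for ratio in (1, 2, 3, 4)}
print(weights)  # {1: 3.48, 2: 1.74, 3: 1.16, 4: 0.87}

# Hypothetical 1:1 sample: 289 cases and 289 sampled controls (made-up cells).
tp, fn = 271, 18
fp_sampled, tn_sampled = 178, 111

ppv_w, npv_w = predictive_values(tp, fn, fp_sampled, tn_sampled, control_weight(1))
ppv_raw, npv_raw = predictive_values(tp, fn, fp_sampled, tn_sampled, 1.0)
print(f"weighted NPV {npv_w:.3f} vs unweighted NPV {npv_raw:.3f}")
```

Sensitivity and specificity are computed within cases and within controls respectively, so they need no weighting; only measures that mix the two groups, such as the predictive values, depend on it.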

In the full study population, the prevalence of DVT was 22% (n = 289), the D-dimer test was abnormal in 69% of the patients (n = 892) and the mean difference in calf circumference was 2.3 cm (Table 1). As a result of the sampling ratios (1:1, 1:2, 1:3 and 1:4), the prevalence of DVT in the nested case-control samples was 50%, 33%, 25% and 20%, respectively. The distributions of the test characteristics in the control samples were similar to those of the patients without DVT in the full study population (Table 1).

In the full study population the sensitivity and negative predictive value were high for the D-dimer test, 0.94 and 0.96, respectively (Table 2 ), whereas the specificity and positive predictive value were relatively low. The OR for the calf difference test was 1.44 and the ROC area was 0.69.

The average estimates of diagnostic accuracy for each of the four case-control ratios were similar to the corresponding estimates of the full study population (Figure 2). For example, the negative predictive value of the D-dimer test was 0.955 both in the full study population and for each of the four case-control ratios. The OR of the calf difference test was 1.44 in the full study population, and the ORs derived from the nested case-control samples were on average also 1.44.

Figure 2

Estimates of diagnostic accuracy of the D-dimer test and calf difference test for the 100 nested case-control samples with case-control ratios ranging from 1:1 to 1:4. The boxes indicate mean values and corresponding interquartile ranges (25th and 75th percentile). Whiskers indicate 2.5th and 97.5th percentiles. The dotted lines represent the values estimated in the full study population.

The use of (conventional) case-control studies in diagnostic research has often been associated with biased estimates of diagnostic accuracy, due to the incorrect sampling of subjects [3–6, 18]. Moreover, this study design does not allow for the estimation of the desired absolute disease probabilities. We discussed and showed that a case-control study nested within a well defined cohort of known size, consisting of subjects suspected of a particular target disease, can yield valid estimates of the diagnostic accuracy of an index test, including the absolute probabilities of disease presence or absence. Diagnostic accuracy parameters derived from a full (cross-sectional) cohort of patients suspected of DVT were similar to the estimates derived from various nested case-control samples averaged over 100 simulations. As expected, the variability decreased with an increasing number of controls, making the measures estimated in the larger case-control samples more precise.

As discussed, the number of subjects for whom the index test results need to be retrieved can be reduced substantially with a nested case-control design. Hence, the nested case-control design is particularly advantageous when the prevalence of the target condition in the cohort of patients suspected of the target disease is low, when the index test results are costly or difficult to collect, and for re-analysing stored images or specimens. However, the precision of the diagnostic accuracy measures will be hampered by increased variability when too few control patients are included.

Rutjes et al nicely discussed limitations of different study designs in diagnostic research [6]. They proposed the 'two-gate design with representative sampling' (which resembles the nested case-control design in this paper) as a valid design. We confirmed their proposition with a quantitative analysis of a diagnostic study. Rutjes et al suggested not using the term 'nested case-control', to prevent confusion with etiologic studies where this design is commonly applied. Indeed, diagnostic and etiologic research differ fundamentally, first and foremost on the concept of time. Diagnostic accuracy studies are, in contrast to etiologic studies, typically cross-sectional in nature. Furthermore, diagnostic associations between index and reference tests are purely descriptive, whereas in etiologic studies causal associations and potential confounding are involved. Despite these major differences, we believe there is no reason not to use the term nested case-control study in diagnostic research as well. The term inherently refers to the method of sampling study subjects, which can be the same in a diagnostic or etiologic setting, and has no direct bearing on the other issues typically related to etiologic case-control studies.

Our findings support the view that the nested case-control study is a valid and efficient design for diagnostic studies. We believe that the nested case-control approach should be applied more often in diagnostic research, and also be (re)appraised in current guidelines on diagnostic methodology.

Knottnerus JA, van Weel C, Muris JW: Evaluation of diagnostic procedures. BMJ. 2002, 324 (7335): 477-480. 10.1136/bmj.324.7335.477.

Knottnerus JA, Muris JW: Assessment of the accuracy of diagnostic tests: the cross-sectional study. J Clin Epidemiol. 2003, 56 (11): 1118-1128. 10.1016/S0895-4356(03)00206-3.

Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JHP, Bossuyt PMM: Empirical evidence of design-related bias in studies of diagnostic tests. JAMA. 1999, 282: 1061-1066. 10.1001/jama.282.11.1061.

Rutjes AW, Reitsma JB, Di Nisio M, Smidt N, van Rijn JC, Bossuyt PM: Evidence of bias and variation in diagnostic accuracy studies. CMAJ. 2006, 174 (4): 469-476.

Whiting P, Rutjes AW, Reitsma JB, Glas AS, Bossuyt PM, Kleijnen J: Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med. 2004, 140 (3): 189-202.

Rutjes AW, Reitsma JB, Vandenbroucke JP, Glas AS, Bossuyt PM: Case-control and two-gate designs in diagnostic accuracy studies. Clin Chem. 2005, 51 (8): 1335-1341. 10.1373/clinchem.2005.048595.

Kraemer H: Evaluating Medical Tests. 1992, London, UK , Sage Publications

Mantel N: Synthetic retrospective studies and related topics. Biometrics. 1973, 29 (3): 479-486. 10.2307/2529171.

Essebag V, Genest J, Suissa S, Pilote L: The nested case-control study in cardiology. Am Heart J. 2003, 146 (4): 581-590. 10.1016/S0002-8703(03)00512-X.

Ernster VL: Nested case-control studies. Prev Med. 1994, 23 (5): 587-590. 10.1006/pmed.1994.1093.

Langholz B: Case-Control Study, Nested. Encyclopedia of Biostatistics. Edited by: Armitage P, Colton T. 2005, New York, John Wiley & Sons, 646-665. 2nd edition.

Rothman KJ, Greenland S: Modern epidemiology. 1998, Philadelphia, Lippincott-Raven Publishers, 2nd edition.

Mantel N, Haenszel W: Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959, 22 (4): 719-748.

Ransohoff DF, Feinstein AR: Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med. 1978, 299 (17): 926-930.

Begg CB, Greenes RA: Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics. 1983, 39: 207-215. 10.2307/2530820.

Knottnerus JA, Leffers JP: The influence of referral patterns on the characteristics of diagnostic tests. J Clin Epidemiol. 1992, 45: 1143-1154. 10.1016/0895-4356(92)90155-G.

van der Schouw YT, van Dijk R, Verbeek ALM: Problems in selecting the adequate patient population from existing data files for assessment studies of new diagnostic tests. J Clin Epidemiol. 1995, 48: 417-422. 10.1016/0895-4356(94)00144-F.

Oostenbrink R, Moons KG, Bleeker SE, Moll HA, Grobbee DE: Diagnostic research on routine care data: prospects and problems. J Clin Epidemiol. 2003, 56 (6): 501-506. 10.1016/S0895-4356(03)00080-5.

Oudega R, Hoes AW, Moons KG: The Wells rule does not adequately rule out deep venous thrombosis in primary care patients. Ann Intern Med. 2005, 143 (2): 100-107.

Oudega R, Moons KG, Hoes AW: Limited value of patient history and physical examination in diagnosing deep vein thrombosis in primary care. Fam Pract. 2005, 22 (1): 86-91. 10.1093/fampra/cmh718.

Perrier A, Desmarais S, Miron M, de Moerloose P, Lepage R, Slosman D, Didier D, Unger P, Patenaude J, Bounameaux H: Non-invasive diagnosis of venous thromboembolism in outpatients. Lancet. 1999, 353: 190-195. 10.1016/S0140-6736(98)05248-9.

Schutgens RE, Ackermark P, Haas FJ, Nieuwenhuis HK, Peltenburg HG, Pijlman AH, Pruijm M, Oltmans R, Kelder JC, Biesma DH: Combination of a normal D-dimer concentration and a non-high pretest clinical probability score is a safe strategy to exclude deep venous thrombosis. Circulation. 2003, 107 (4): 593-597. 10.1161/01.CIR.0000045670.12988.1E.

Pre-publication history

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/8/48/prepub

Acknowledgements

For this research project we received financial support from the Netherlands Organization for Scientific Research, grant number: ZON-MW904-66-112. The funding source had no influence on the design, data analysis and report of this study.

Author information

Authors and affiliations.

Julius Center for Health Sciences and Primary Care, University Medical Center, Utrecht, The Netherlands

Cornelis J Biesheuvel, Yvonne Vergouwe, Ruud Oudega, Arno W Hoes, Diederick E Grobbee & Karel GM Moons

The Children's Hospital at Westmead, Sydney, Australia

Cornelis J Biesheuvel

Corresponding author

Correspondence to Karel GM Moons .

Additional information

Competing interests.

The authors declare that they have no competing interests.

Authors' contributions

All authors commented on the draft and the interpretation of the findings, read and approved the final manuscript. CJB was responsible for the design, statistical analysis and wrote the original manuscript. YV was responsible for the design and statistical analysis. RO was responsible for the data collection. AWH was responsible for expertise in case-control design. DEG and KGMM were responsible for conception and design of the study and coordination.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article.

Biesheuvel, C.J., Vergouwe, Y., Oudega, R. et al. Advantages of the nested case-control design in diagnostic research. BMC Med Res Methodol 8 , 48 (2008). https://doi.org/10.1186/1471-2288-8-48

Received: 07 March 2008

Accepted: 21 July 2008

Published: 21 July 2008

DOI: https://doi.org/10.1186/1471-2288-8-48

BMC Medical Research Methodology

ISSN: 1471-2288

Encyclopedia of Behavioral Medicine, p. 1473

Nested Study

  • J. Rick Turner
  • Reference work entry
  • First Online: 20 October 2020

Nested case-control study

A nested case-control study is one that is “nested” within a cohort study.

In many cohort studies, all subjects provide a wide range of information at the time of recruitment, e.g., results from a physical examination, answers to multiple questionnaires, blood and urine samples, and results from imaging techniques. Because of the large numbers of subjects in these studies and the cost of analyzing some biological samples, some of these resources are often not analyzed in detail at the time of collection, but are stored for future use. The nested case-control study is performed using subjects who develop the disease of interest in due course, and control subjects who are selected from those who were disease-free at the time the case subjects (those who developed the disease) were diagnosed.

The appropriate data sets and samples are then retrieved and analyzed for these two subsets (cases and controls) of the original cohort recruited into...

References and Further Readings

Webb, P., Bain, C., & Pirozzo, S. (2005). Essential epidemiology: An introduction for students and health professionals . Cambridge: Cambridge University Press.

Author information

Authors and affiliations.

Campbell University College of Pharmacy and Health Sciences, Buies Creek, NC, USA

J. Rick Turner

Corresponding author

Correspondence to J. Rick Turner .

Editor information

Editors and affiliations.

Behavioral Medicine Research Center, Department of Psychology, University of Miami, Miami, FL, USA

Marc D. Gellman

Copyright information

© 2020 Springer Nature Switzerland AG

About this entry

Cite this entry.

Turner, J.R. (2020). Nested Study. In: Gellman, M.D. (eds) Encyclopedia of Behavioral Medicine. Springer, Cham. https://doi.org/10.1007/978-3-030-39903-0_1046

DOI : https://doi.org/10.1007/978-3-030-39903-0_1046

Published : 20 October 2020

Publisher Name : Springer, Cham

Print ISBN : 978-3-030-39901-6

Online ISBN : 978-3-030-39903-0

eBook Packages : Medicine Reference Module Medicine

What Is a Case-Control Study? | Definition & Examples

Published on February 4, 2023 by Tegan George . Revised on June 22, 2023.

A case-control study is an observational study design that compares a group of participants possessing a condition of interest to a very similar group lacking that condition. Here, the participants possessing the attribute of study, such as a disease, are called the “case,” and those without it are the “control.”

It’s important to remember that the case group is chosen because they already possess the attribute of interest. The point of the control group is to facilitate investigation, e.g., studying whether the case group systematically exhibits that attribute more than the control group does.

Case-control studies are a type of observational study often used in fields like medical research, environmental health, or epidemiology. While most observational studies are qualitative in nature, case-control studies can also be quantitative , and they often are in healthcare settings. Case-control studies can be used for both exploratory and explanatory research , and they are a good choice for studying research topics like disease exposure and health outcomes.

A case-control study may be a good fit for your research if it meets the following criteria.

  • Data on exposure (e.g., to a chemical or a pesticide) are difficult to obtain or expensive.
  • The disease associated with the exposure you’re studying has a long incubation period or is rare or under-studied (e.g., AIDS in the early 1980s).
  • The population you are studying is difficult to contact for follow-up questions (e.g., asylum seekers).

Retrospective cohort studies use existing secondary research data, such as medical records or databases, to identify a group of people with a common exposure or risk factor and to observe their outcomes over time. Case-control studies conduct primary research , comparing a group of participants possessing a condition of interest to a very similar group lacking that condition in real time.

Case-control studies are common in fields like epidemiology, healthcare, and psychology.

You would then collect data on your participants’ exposure to contaminated drinking water, focusing on variables such as the source of said water and the duration of exposure, for both groups. You could then compare the two to determine if there is a relationship between drinking water contamination and the risk of developing a gastrointestinal illness.

Example: Healthcare case-control study

You are interested in the relationship between the dietary intake of a particular vitamin (e.g., vitamin D) and the risk of developing osteoporosis later in life. Here, the case group would be individuals who have been diagnosed with osteoporosis, while the control group would be individuals without osteoporosis.

You would then collect information on dietary intake of vitamin D for both the cases and controls and compare the two groups to determine if there is a relationship between vitamin D intake and the risk of developing osteoporosis.

Example: Psychology case-control study

You are studying the relationship between early-childhood stress and the likelihood of later developing post-traumatic stress disorder (PTSD). Here, the case group would be individuals who have been diagnosed with PTSD, while the control group would be individuals without PTSD.

Case-control studies are a solid research method choice, but they come with distinct advantages and disadvantages.

Advantages of case-control studies

  • Case-control studies are a great choice if you have any ethical considerations about your participants that could preclude you from using a traditional experimental design.
  • Case-control studies are time efficient and fairly inexpensive to conduct because they require fewer subjects than other research methods.
  • If there were multiple exposures leading to a single outcome, case-control studies can incorporate that. As such, they truly shine when used to study rare outcomes or outbreaks of a particular disease.

Disadvantages of case-control studies

  • Case-control studies, like other observational studies, run a high risk of research biases. They are particularly susceptible to observer bias, recall bias, and interviewer bias.
  • When the exposure under study is very rare, attempting to conduct a case-control study can be very time consuming and inefficient.
  • Case-control studies in general have low internal validity and are not always credible.

Case-control studies by design focus on one singular outcome. This makes them very rigid and not generalizable , as no extrapolation can be made about other outcomes like risk recurrence or future exposure threat. This leads to less satisfying results than other methodological choices.

A case-control study differs from a cohort study because cohort studies are more longitudinal in nature and do not necessarily require a control group .

While one may be added if the investigator so chooses, members of the cohort are primarily selected because of a shared characteristic among them. In particular, retrospective cohort studies are designed to follow a group of people with a common exposure or risk factor over time and observe their outcomes.

Case-control studies, in contrast, require both a case group and a control group, as suggested by their name, and usually are used to identify risk factors for a disease by comparing cases and controls.

A case-control study differs from a cross-sectional study because case-control studies are naturally retrospective in nature, looking backward in time to identify exposures that may have occurred before the development of the disease.

On the other hand, cross-sectional studies collect data on a population at a single point in time. The goal here is to describe the characteristics of the population, such as their age, gender identity, or health status, and understand the distribution and relationships of these characteristics.

Cases and controls are selected for a case-control study based on their inherent characteristics. Participants already possessing the condition of interest form the “case,” while those without form the “control.”

Keep in mind that by definition the case group is chosen because they already possess the attribute of interest. The point of the control group is to facilitate investigation, e.g., studying whether the case group systematically exhibits that attribute more than the control group does.

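The strength of the association between an exposure and a disease in a case-control study is usually measured with the odds ratio (OR). The relative risk (RR) generally cannot be calculated directly from case-control data, because the investigator fixes the proportion of cases; when the outcome is rare, the OR approximates the RR.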
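As a rough illustration of how an odds ratio is obtained from a 2 × 2 table of exposure by case status, the Python sketch below computes the OR with an approximate 95% confidence interval based on the standard error of the log odds ratio (Woolf's method). The function name and the counts are made up for the example and do not come from any study discussed here.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio with an approximate 95% CI from the log-OR standard error."""
    or_ = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    se_log_or = math.sqrt(
        1 / exposed_cases
        + 1 / unexposed_cases
        + 1 / exposed_controls
        + 1 / unexposed_controls
    )
    lower = math.exp(math.log(or_) - 1.96 * se_log_or)
    upper = math.exp(math.log(or_) + 1.96 * se_log_or)
    return or_, lower, upper


# Hypothetical counts: 40 of 100 cases and 20 of 100 controls were exposed.
or_, lower, upper = odds_ratio(40, 60, 20, 80)
print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1 suggests an association in the sample, but as with any observational design it says nothing by itself about causation.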

No, case-control studies cannot establish causality as a standalone measure.

As observational studies , they can suggest associations between an exposure and a disease, but they cannot prove without a doubt that the exposure causes the disease. In particular, issues arising from timing, research biases like recall bias , and the selection of variables lead to low internal validity and the inability to determine causality.

Sources in this article

George, T. (2023, June 22). What Is a Case-Control Study? | Definition & Examples. Scribbr. Retrieved February 22, 2024, from https://www.scribbr.com/methodology/case-control-study/
Schlesselman, J. J. (1982). Case-Control Studies: Design, Conduct, Analysis (Monographs in Epidemiology and Biostatistics, 2) (Illustrated). Oxford University Press.

  • Volume 3, Issue 1
  • Association of vaginal oestradiol and the rate of breast cancer in Denmark: registry based, case-control study, nested in a nationwide cohort
  • http://orcid.org/0000-0003-4549-8271 Amani Meaidi 1 ,
  • Nelsan Pourhadi 1 ,
  • Ellen Christine Løkkegaard 2 , 3 ,
  • Christian Torp-Pedersen 4 , 5 and
  • http://orcid.org/0000-0001-6506-2569 Lina Steinrud Mørch 1
  • 1 Cancer Surveillance and Pharmacoepidemiology , Danish Cancer Institute , Copenhagen , Denmark
  • 2 Department of Gynecology and Obstetrics , Nordsjællands Hospital , Hillerod , Denmark
  • 3 Clinical Medicine , Copenhagen University , Copenhagen , Denmark
  • 4 Department of Cardiology , Nordsjaellands Hospital , Hillerod , Denmark
  • 5 Public Health , Copenhagen University , Copenhagen , Denmark
  • Correspondence to Dr Amani Meaidi, Cancer Surveillance and Pharmacoepidemiology, Danish Cancer Institute, Copenhagen, 2100, Denmark; amani-meaidi{at}live.dk

Objective To estimate the rate of breast cancer associated with use of vaginal oestradiol tablets according to duration and intensity of their use.

Design Registry based, case-control study, nested in a nationwide cohort.

Setting Based in Denmark using the civil registration system, the national registry of medicinal product statistics, the Danish cancer registry, the Danish birth registry, and statistics Denmark.

Participants Women aged 50-60 years in year 2000 or turning 50 years during the study period of 1 January 2000 to 31 December 2018 were included. Exclusions were a history of cancer, mastectomy, use of systemic hormone treatment, use of the levonorgestrel releasing intrauterine system, or use of vaginal oestrogen treatments other than oestradiol tablets. To each woman who developed breast cancer during follow-up (18 997), five women in the control group (94 985) were incidence density matched by birth year.

Main outcome measure The main outcome was pathology confirmed breast cancer diagnosis.

Results 2782 (14.6%) women with breast cancer (cases) and 14 999 (15.8%) women with no breast cancer diagnosis (controls) had been exposed to vaginal oestradiol tablets, with 234 cases and 1232 controls having been in treatment for at least four years at a high intensity (>50 micrograms per week). Increasing durations and intensities of use (cumulative dose/cumulative duration) of vaginal oestradiol tablets were not associated with increasing rates of breast cancer. Compared with never-use, cumulative use of vaginal oestradiol for more than nine years was associated with an adjusted hazard ratio of 0.87 (95% confidence interval 0.69 to 1.11). Results were similar in women who had long term use (at least four years) at high intensity (>50-70 micrograms per week), with an adjusted hazard ratio of 0.93 (95% confidence interval 0.81 to 1.08).

Conclusions Use of vaginal oestradiol tablets was not associated with increased breast cancer rate compared with never-use. Increasing duration and intensity of use was not associated with increased rates of breast cancer.

  • Hormone replacement therapy
  • Breast neoplasms

Data availability statement

No data are available. Raw data used to conduct this study is only accessible through approval from the Danish Data Protection Agency and the Danish Health Data Board. Although anonymised, the data was available on an individual level, making data sharing restricted by the General Data Protection Regulation of EU law.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:  http://creativecommons.org/licenses/by-nc/4.0/ .

https://doi.org/10.1136/bmjmed-2023-000753

WHAT IS ALREADY KNOWN ON THIS TOPIC

While systemic oestrogen treatment has been associated with increased risk of breast cancer development, use of vaginal oestrogen has been suggested to be risk-free

The effect of duration and intensity of vaginal oestrogen use on breast cancer risk is uncertain

WHAT THIS STUDY ADDS

In this nationwide study, use of vaginal oestradiol tablets was not associated with a significant increased risk of breast cancer

This finding remained even in women who were in treatment for >nine years and in women who used the drug long term at high intensity (>50 micrograms per week)

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE, OR POLICY

The findings add reassurance to the breast cancer safety of vaginal oestrogen use in women

Introduction

Breast cancer is the most common cancer in women, affecting around 7.8 million women worldwide with 2.3 million new cases and 700 000 deaths annually. 1 Knowledge about external risk factors would potentially enable preventive actions.

Vaginally administered oestrogen is the primary pharmaceutical treatment for the genitourinary syndrome of menopause, a condition caused by the physiological oestrogen deficiency following menopause. 2 Around 50% of postmenopausal women will have the syndrome, with symptoms such as vaginal irritation, recurrent urogenital infections, and dyspareunia. 2 While systemic menopausal hormone treatment with oestrogen has been linked to an increased risk of breast cancer, use of vaginal oestrogen has been suggested to be risk-free. 3 However, studies have not been able to account for both duration and intensity of vaginal oestrogen use when assessing the association with breast cancer risk, which is highly relevant considering the variation in dosage and time of use among women in need of vaginal oestrogen. 3 4 Furthermore, vaginal oestrogen treatment has previously been associated with an increased risk of endometrial cancer. 4–6 This link to an oestrogen sensitive cancer further necessitates high quality evidence on the breast cancer safety of vaginal oestrogen treatment, especially considering the rise in use. 7

Nested in a Danish, nationwide, female population, we aimed to investigate the risk of breast cancer among women using vaginal oestradiol tablets.

Study population

In Denmark, access to healthcare is freely available to all Danish citizens. We conducted a nationwide, nested, case-control study using the following Danish national registries: (1) the civil registration system, which contains information about sex and vital status of all Danish citizens; (2) the national registry of medicinal product statistics, which includes information about all redeemed prescriptions at Danish pharmacies since 1 January 1995; (3) the Danish cancer registry, which includes all cancer cases since 1 January 1943; (4) the national registry of patients, which comprises information about discharge diagnoses and surgical codes on all somatic hospital admissions since 1 January 1976; (5) the Danish national birth registry, which holds information about all live births and stillbirths since 1 January 1973; and (6) statistics Denmark, which provides a yearly update on the education and income status of all Danish citizens. 8–12

We identified incident breast cancer cases and, for each, randomly chose controls with no breast cancer who were matched by birth year. Women were eligible if they were recorded between 1 January 2000 and 31 December 2018 in a nationwide population of all Danish women aged 50-60 years on 1 January 2000 or turning 50 years during the study period. Eligible women had no history of cancer (except non-melanoma skin cancer), mastectomy, or prior use of systemic hormone treatment, the hormone-releasing intrauterine system, vaginal oestrogen treatments other than oestradiol tablets, or anti-oestrogen medications.

Data sources and definitions of exclusion criteria are provided in online supplemental table S1 . The personal identification number given to all Danish citizens at birth or immigration allowed reliable linkage between data sources. The year of initiation of the study period and the age restriction of the study population were defined to ensure almost complete exposure history of local and systemic hormone treatment on all included women.

Breast cancer

The cancer registry contains records of all incidences of malignant neoplasms in the Danish population from 1943 and onwards. 10 A woman was considered a case with incident breast cancer from the date of a first time invasive breast cancer diagnosis in the Danish cancer registry (the International Classification of Diseases, 10th revision, codes C500-C509). 10 On the date of diagnosis, five women with no breast cancer (controls) were incidence density matched by birth year to each case of breast cancer. The national pathology registry provided information on oestrogen receptor positivity of the breast cancers (systemised nomenclature of medicine code F29521).
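The incidence density matching described above can be illustrated with a minimal Python sketch. It is not the authors' code (the study was analysed in R); the record layout, the function name `sample_controls`, and the simplification that follow-up ends only at diagnosis (ignoring death, emigration, and other censoring) are assumptions made for illustration.

```python
import random
from datetime import date

def sample_controls(cohort, case_id, n_controls=5, rng=None):
    """Draw controls for one case by incidence density (risk-set) sampling.

    cohort maps a person id to {"birth_year": int, "dx_date": date or None}.
    A woman is eligible if she shares the case's birth year and has no
    breast cancer diagnosis on or before the case's diagnosis date, so a
    woman diagnosed later can still serve as a control until then.
    """
    rng = rng or random.Random()
    case = cohort[case_id]
    index_date = case["dx_date"]
    risk_set = [
        pid for pid, person in cohort.items()
        if pid != case_id
        and person["birth_year"] == case["birth_year"]
        and (person["dx_date"] is None or person["dx_date"] > index_date)
    ]
    # Fewer than n_controls may be available in a small risk set.
    return rng.sample(risk_set, min(n_controls, len(risk_set)))
```

Sampling from the women still at risk on the index date, rather than from women who never develop the disease, is what lets the resulting odds ratio be read as a rate ratio.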

Vaginal oestradiol tablets

The exposure of interest was treatment with the vaginal oestradiol tablet as this form is by far the most commonly used type of vaginal oestrogen among Danish women. 7 During the study period, use of any vaginal oestradiol drug formulation required a prescription from a physician. Vaginal oestradiol tablets were available in doses of 10 µg and 25 µg.

Daily updated, individual level information about prescription redemption of vaginal oestradiol tablet was provided by the national registry of medicinal product statistics, which holds information on all prescriptions filled by the Danish population since 1995. 9 The registry receives its information electronically from the digital accounting systems of Danish pharmacies that primarily use the systems to secure reimbursement from the national health service. 9

A woman was considered to be using vaginal oestradiol tablets if she redeemed at least one prescription of the drug. Using the anatomical therapeutic chemical code (ATC) of oestradiol drug formulations (G03CA03) and conditioning on vaginal tablet administration, information on exposure status was obtained from the national registry of medicinal product statistics. 9 This national registry provided information on the date of redeemed prescription, size of drug unit, as well as size and number of redeemed packages. 9 This information was updated daily for each woman during the study period.

Vaginal oestradiol tablets are recommended to be taken once a day for the first two weeks of treatment followed by a maintenance dose of two tablets per week. However, the dosage may be adjusted up or down according to the urogenital symptoms of the woman. The validated programme “medicinMacro”, accessible from GitHub ( www.github.com ) in the “tagteam/heaven” R package, was used to calculate the most likely daily dose. The programme also calculated the duration and timing of use and non-use of vaginal oestradiol tablets according to information on the date and amount of purchased drug; the recommended default, minimum, and maximum dosages; and the prescription pattern of up to the five most recent prescriptions. 13 14

Potential confounders

Potential confounding factors, such as educational level and yearly income, were obtained from Statistics Denmark. Information about polycystic ovary syndrome, endometriosis, and chronic obstructive pulmonary disease (a surrogate measure for health threatening smoking) was identified from the national registry of patients, and data for redeemed prescriptions on bisphosphonates and diabetes mellitus medication were provided by the national registry of medicinal product statistics. 9 11 Information about parity was extracted from the national birth registry. 12

Statistical analysis

Conditional logistic regression models provided adjusted hazard rate ratios and corresponding 95% confidence intervals of breast cancer according to duration, intensity, and user status of vaginal oestradiol tablets at time of index date (date of diagnosis or matching). Duration was calculated as the cumulative duration of use at time of the index date without consideration to breaks in treatment. In a sensitivity analysis, duration was categorised according to the maximum length of continuous use without treatment breaks. Intensity of use was calculated as the cumulative dose of vaginal oestradiol tablets redeemed at time of index divided by the cumulative duration of use and categorised into low (≤25 μg/week), medium (>25-50 μg/week), high (>50-70 μg/week), and very high (>70 μg/week) intensity of use. User status was categorised into current use (use within 0-2 months prior to index date), recent use (use 2-24 months prior to index date), and previous use (>24 months prior to index date).
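The intensity and user status categories defined above can be written out as a short Python sketch. The function names are hypothetical, durations are assumed to be expressed in weeks, and the handling of the shared boundary at exactly 2 months (assigned here to current use) is an assumption, since the text lists it under both current and recent use.

```python
def intensity_category(cumulative_dose_ug, duration_weeks):
    """Categorise mean intensity of use (µg/week) with the stated cut-offs."""
    if duration_weeks <= 0:
        raise ValueError("duration of use must be positive")
    intensity = cumulative_dose_ug / duration_weeks
    if intensity <= 25:
        return "low"
    if intensity <= 50:
        return "medium"
    if intensity <= 70:
        return "high"
    return "very high"

def user_status(months_since_last_use):
    """Categorise timing of use relative to the index date.

    None means the woman never used vaginal oestradiol tablets.
    """
    if months_since_last_use is None:
        return "never"
    if months_since_last_use <= 2:   # assumption: 2 months counts as current
        return "current"
    if months_since_last_use <= 24:
        return "recent"
    return "previous"
```

For instance, a cumulative dose of 1620 µg over 32 weeks gives about 50.6 µg/week, which falls just inside the high category.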

Women who had never used vaginal oestradiol tablets and other hormone treatments constituted the reference group in all analyses. The potential confounding factors described above were included in the statistical models.

Sensitivity analyses were conducted on the subpopulation of cases with breast cancers positive for oestrogen receptors and their matched controls as well as on the subpopulation of women with a Charlson comorbidity index score of zero (definition in online supplemental table S1 ).

All analyses were repeated with one year lag time. The level of statistical significance was set at P<0.05. Data was analysed using R statistical software (RStudio version 4.2.1). 15

Patient and public involvement

No patients or members of the public were involved in the design, analysis, or writing up of the study because the research project was undertaken by a small research group without funds or staff for patient and public involvement measures. The results of the study will, nevertheless, be disseminated to the public and health professionals by press releases and presentations at scientific conferences.

A total of 18 997 women with breast cancer and 94 985 women in the control group were identified ( figure 1 ). Characteristics of cases and controls are presented in table 1 . The overall prevalence of vaginal oestradiol use was 2782 (14.6%) among women with breast cancer and 14 999 (15.8%) among population controls. In the control group, 5261 (5.5%) of 94 985 women currently used, 3291 (3.5%) recently used, and 6447 (6.8%) previously used vaginal oestradiol tablets. The corresponding prevalence estimates were similar in women with breast cancer ( table 1 ). Median age at initiation of vaginal oestradiol tablets was 57 years (interquartile range 53-61) among women in the control group, median cumulative duration of use was 8.1 months (2.8-27.4), median cumulative dose was 1620 μg (750-5625), and the median intensity of use was 50.5 µg/week (49.5-55.0). The characteristics of vaginal oestradiol use were similar in breast cancer cases ( table 1 ).

Study flowchart

Characteristics of the study population and its use of vaginal oestradiol tablets

Use of vaginal oestradiol tablets was not associated with a significant increase in breast cancer rate compared with never-use ( figure 2 ). Increased cumulative duration of use did not imply increased rates of breast cancer ( figure 2 ); the adjusted hazard ratio for more than nine years of cumulative use was estimated to be 0.87 (95% confidence interval 0.69 to 1.11). The absence of association with duration of use persisted when only considering consistent, uninterrupted use ( online supplemental figure S1 ).

Rate of breast cancer according to duration of use of vaginal oestradiol tablets. Adjusted for educational level; yearly income; history of polycystic ovary syndrome, endometriosis, chronic obstructive pulmonary disease, use of bisphosphonates, or diabetes mellitus; and parity at index date

No consistent association was observed between intensity of use and rate of breast cancer ( figure 3 ). A total of 227 cases and 1186 controls were exposed to vaginal oestradiol tablets for more than four years with an intensity of more than 50-70 µg/week, which is above the recommended maintenance dose of 20-50 µg/week. In these users, the adjusted hazard ratio of breast cancer was found to be 0.93 (95% confidence interval 0.81 to 1.08) compared with never-use.

Rate of breast cancer according to duration and intensity of use of vaginal oestradiol tablets. Intensity categories were low (≤25 µg/week), medium (>25-50 µg/week), high (>50-70 µg/week), and very high (>70 µg/week). Adjusted for educational level, yearly income, polycystic ovary syndrome, endometriosis, chronic obstructive pulmonary disease, use of bisphosphonates, diabetes mellitus, and parity at index date

Results remained robust when stratifying according to user status at time of diagnosis or matching ( figure 4 ).

Rate of breast cancer according to timing, duration, and intensity of use of vaginal oestradiol tablets. Timing of use: current use (0-2 months within index date), recent use (2 months-2 years within index date), previous use (>2 years prior to index date). Adjusted for educational level, yearly income, polycystic ovary syndrome, endometriosis, chronic obstructive pulmonary disease, use of bisphosphonates, diabetes mellitus, and parity at index date

The lack of consistent association between duration and intensity of use of vaginal oestradiol tablets and breast cancer rate persisted in the subpopulation of oestrogen-receptor positive breast cancer cases ( online supplemental figure S2 ) and in healthy women with a Charlson comorbidity index score of zero ( online supplemental figure S3 ), respectively.

Sensitivity analyses including a one year lag time did not materially change the main estimates ( online supplemental figure S4 ).

In this real-world, nationwide, Danish population, increasing duration and intensity of use of vaginal oestradiol tablets was not found to be associated with an increased risk of breast cancer.

Several studies have found orally administered oestrogen-only treatment to be associated with an increased risk of breast cancer. 3 A meta-analysis of individual participants worldwide reported a hazard ratio of 1.33 in current users of oral oestrogen-only treatment compared with no use. 3 The meta-analysis reports a hazard ratio of 1.09 with use of vaginal oestrogen without further consideration to intensity of use. 3 Similarly, a prospective cohort study by the Women’s Health Initiative of 45 663 postmenopausal women did not find any association between vaginal oestrogen use and breast cancer risk, but did not investigate the association according to duration or intensity of use. 4 A nationwide observational study from Finland reported that use of vaginal oestrogen for less than five years was not associated with an increased risk of breast cancer, but the study did not have sufficient power to study the effect of use of more than five years or the role of the intensity of use. 16

The apparent absence of association between use of vaginal oestradiol tablets and development of breast cancer has previously been explained by the low dose of oestradiol absorbed into the blood with vaginal application of low-dose oestrogen. 17 18 Our study suggests that use of >50-70 µg/week for more than four years is not associated with increased breast cancer risk (hazard ratio 0.93 (95% confidence interval 0.81-1.08)) compared with never-use. Use of >50-70 µg/week corresponds to 2.5-fold to 3.5-fold more than the recommended weekly maintenance dose of the currently marketed low dose 10 µg vaginal oestradiol tablet.
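The fold comparison follows from the maintenance regimen of two tablets per week stated in the methods. A short worked check in Python, with illustrative variable names that are not from the study:

```python
TABLET_UG = 10        # currently marketed low dose tablet
TABLETS_PER_WEEK = 2  # recommended maintenance regimen

maintenance_ug_per_week = TABLET_UG * TABLETS_PER_WEEK  # 20 µg/week

def fold_of_maintenance(intensity_ug_per_week):
    """Express a weekly intensity as a multiple of the maintenance dose."""
    return intensity_ug_per_week / maintenance_ug_per_week

print(fold_of_maintenance(50), fold_of_maintenance(70))  # 2.5 3.5
```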

To our knowledge, our study is the first to report on the breast cancer risk with vaginal oestradiol tablets according to duration and intensity of use. One strength of our study is its nationwide design with a large unselected study population. Additionally, a strength is the use of high quality registry data with accurate and continuously updated data for breast cancer diagnoses and vaginal oestradiol prescriptions as well as medical conditions, reproductive factors, and education. These data allow adjustment for several known risk factors for breast cancer and potential confounders. Use of registries covering the entire Danish population eliminated recall bias, minimised selection bias, provided a long study period, and resulted in no missing data for exposure, outcome, and covariates for all eligible study patients. Thus, no cases or controls were selected on missing data. Cancer diagnoses were from the cancer registry, in which all cancer diagnoses are histologically verified, further enhancing validity. 8 For all women included in the study, we had at least five years of prescription history (establishment of the National Prescription Registry in Denmark was in 1995).

Considering the observational nature of our study, the main limitation is potential existence of bias by unknown or unmeasured confounders. Women who were adherent users of vaginal oestradiol tablets could potentially be healthier than women who did not use the tablets because adherence to a long term, expensive treatment for a physiological condition might be more likely in women prioritising health and having a favourable socioeconomic status. This potential healthy user bias may have biased our results towards the null. However, the results remained robust in a subpopulation of all healthy women with a Charlson comorbidity index score of zero. Furthermore, as in many other countries, in Denmark, high socioeconomic position has been associated with higher incidence of breast cancer, including in our study (data not shown). 19 Thus, if healthy user bias was present in our study, the direction of the bias would not necessarily cause an underestimation of the association. Finally, we did adjust for education and income in our study.

Despite controlling for several potential confounders, we cannot exclude the occurrence of residual confounding and unmeasured confounding. Obesity has been associated with an increased risk of breast cancer, and obesity is expected to be more common among women who do not use vaginal oestradiol tablets because their oestrogen production in lipid tissue likely decreases the need for exogenous oestrogen. 20 Thus, not adjusting for obesity might have caused an underestimation of the association between vaginal oestradiol and breast cancer. However, obesity is highly (and inversely) correlated with educational status in Denmark, and we did adjust for such. 21

Conclusions

Alongside recent studies suggesting that vaginal oestrogen treatment may safely be used by many women who have had breast cancer, this study adds reassurance to the breast cancer safety of vaginal oestrogen treatment. 22 However, considering the globally prevalent and often life-long indication and increasing use of vaginal oestrogen by postmenopausal women, further studies, especially in other populations, are warranted to confirm drug safety for all potential patients.

Ethics approval

This study was approved by the Danish Data Protection Agency and the Danish Health Data Board. Registry-based studies are not subject to ethics approval in Denmark.

  • World Cancer Research Fund/American Institute for Cancer Research
  • Portman DJ ,
  • Gass MLS , Vulvovaginal Atrophy Terminology Consensus Conference Panel
  • Collaborative Group on Hormonal Factors in Breast Cancer
  • Crandall CJ ,
  • Andrews CA , et al
  • Keiding N , et al
  • Bhupathiraju SN ,
  • Grodstein F ,
  • Stampfer MJ , et al
  • Goukasian I ,
  • Lidegaard O
  • Pedersen CB
  • Pottegård A ,
  • Schmidt SAJ ,
  • Wallach-Kildemoes H , et al
  • Gjerstorff ML
  • Schmidt M ,
  • Sandegaard JL , et al
  • Bliddal M ,
  • Pottegård A , et al
  • Torp-Pedersen C
  • Støvring H ,
  • Rostgaard K , et al
  • R Core Team
  • Lyytinen H ,
  • Pukkala E ,
  • Ylikorkala O
  • Notelovitz M ,
  • Nanavati N , et al
  • Sørdal T , et al
  • Dalton SO ,
  • Engholm G , et al
  • Berrino F ,
  • Abagnato CA , et al
  • Sundhedsstyrelsen
  • Jensen M-B , et al

Contributors LSM and AM designed the study. AM conducted the data management, and NP and CT-P ran the analysis. The manuscript was drafted by AM and LSM, but all authors contributed to the final manuscript. All authors contributed to the interpretation of the findings. All authors approved the final version and made the decision to submit for publication. All authors had access to the statistically analysed data. AM and CT-P had access to and verified the raw data. AM is the guarantor of the overall content, takes responsibility for the work and conduct of the study, and controlled the decision to publish. The corresponding author attests that all listed authors meet the authorship criteria and that no others meeting the criteria have been omitted. Transparency: The lead author (the guarantor) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: no support from any organisation for the submitted work; CTP has received grants from Novo Nordisk and Bayer outside of the current study; AM, EL, and LSM signed an authorship agreement with no financial benefit with Novo Nordisk after submission of this manuscript but before publication.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

A plea to stop using the case‐control design in retrospective database studies

Martijn J. Schuemie

1 Observational Health Data Sciences and Informatics, New York, New York

2 Epidemiology Analytics, Janssen Research and Development, Titusville, New Jersey

3 Department of Biostatistics, University of California, Los Angeles, California

Patrick B. Ryan

4 Department of Biomedical Informatics, Columbia University Medical Center, New York, New York

Kenneth K.C. Man

5 Centre for Safe Medication Practice and Research, Department of Pharmacology and Pharmacy, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong

6 Research Department of Practice and Policy, UCL School of Pharmacy, London, UK

7 Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands

8 Department of Social Work and Social Administration, Faculty of Social Science, The University of Hong Kong, Pokfulam, Hong Kong

Ian C.K. Wong

Marc A. Suchard

9 Department of Biomathematics, University of California, Los Angeles, California

10 Department of Human Genetics, University of California, Los Angeles, California

George Hripcsak

11 Medical Informatics Services, NewYork‐Presbyterian Hospital, New York, New York

The case‐control design is widely used in retrospective database studies, often leading to spectacular findings. However, results of these studies often cannot be replicated, and the advantage of this design over others is questionable. To demonstrate the shortcomings of applications of this design, we replicate two published case‐control studies. The first investigates isotretinoin and ulcerative colitis using a simple case‐control design. The second focuses on dipeptidyl peptidase‐4 inhibitors and acute pancreatitis, using a nested case‐control design. We include large sets of negative control exposures (where the true odds ratio is believed to be 1) in both studies. Both replication studies produce effect size estimates consistent with the original studies, but also generate estimates for the negative control exposures showing substantial residual bias. In contrast, applying a self‐controlled design to answer the same questions using the same data reveals far less bias. Although the case‐control design in general is not at fault, its application in retrospective database studies, where all exposure and covariate data for the entire cohort are available, is unnecessary, as other alternatives such as cohort and self‐controlled designs are available. Moreover, by focusing on cases and controls it opens the door to inappropriate comparisons between exposure groups, leading to confounding for which the design has few options to adjust. We argue that this design should no longer be used in these types of data. At the very least, negative control exposures should be used to prove that the concerns raised here do not apply.

1. BACKGROUND

Case‐control 1 studies consider the question “are persons with a specific disease outcome exposed more frequently to a specific agent than those without the disease?” Thus, the central idea is to compare “cases,” ie, individuals that experience the outcome of interest with “controls,” ie, individuals that did not experience the outcome of interest. Often, one matches controls to cases based on characteristics such as age and sex to make them more comparable. The comparison focuses on differential exposure to the agents of interest in the two groups; greater exposure amongst the cases than amongst the controls suggests a possible positive association. Perhaps the greatest success of the case‐control design is its contribution to the evidence that smoking causes lung cancer. 2 Since this landmark finding, researchers increasingly have applied the design, occasionally generating spectacular findings leading to headlines in major news outlets, such as a recent study linking anticholinergic drugs to an increased risk of dementia, 3 or another recent study linking immunosuppressants to a lower risk of Parkinson's disease. 4
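The comparison of exposure between cases and controls is usually summarised as an odds ratio. The sketch below computes a crude odds ratio with a Woolf (log-based) confidence interval from a 2×2 table. It is illustrative only: the function name is an assumption, and a matched design such as those discussed here would normally be analysed with conditional logistic regression instead.

```python
import math

def odds_ratio(exposed_cases, unexposed_cases,
               exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with a Woolf (log-based) confidence interval.

    Returns (odds_ratio, lower, upper). Ignores matching and covariates.
    """
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of the log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper
```

An odds ratio above 1 corresponds to greater exposure amongst the cases than amongst the controls.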

The case‐control design was originally developed to support prospective studies in situations where data on subjects were costly to acquire, and study budgets did not allow for recruiting and following large cohorts. 5 However, its application in retrospective database studies, for example those using electronic health records and insurance claims where longitudinal person‐level data have already been captured and the analysis is performed solely within the resident data, has become commonplace. Because the data already exist, and their acquisition is a sunk cost, the efficiency argument that initially justified the design becomes a moot point. The added value over other designs for these types of retrospective data is therefore questionable. Along these lines, Schneeweiss provided a similar conceptual argument against this design in 2010 6 ; our prior research published in 2013 showed empirical evidence that it is prone to substantial bias when using retrospective data, 7 and others have reported on the lack of reproducibility of such case‐control studies in 1988. 8 Yet, despite all these prior concerns, the number of publications applying this design has increased dramatically over time. Using a specific PubMed query, we identify 1801 publications in the last 5 years that employ the case‐control design in a retrospective database study (see Supplementary Materials).

Here, we aim to describe the mechanisms that lead to bias in clinical case‐control studies using retrospective longitudinal data. We reproduce two published studies to demonstrate this problem, showing that both studies are likely severely biased, and show that an alternative design is equally viable given the same data and demonstrates considerably less bias. We not only argue against further use of the case‐control design in situations where other designs such as a comparative cohort analysis or self‐controlled design are equally viable, but we also highlight a set of study diagnostics that researchers should perform if they insist on using a case‐control design anyway.

1.1. Issues with the case‐control design

One root cause of trouble is that the case‐control design focuses on the definition of cases and their controls, while the main research question centers on exposed and unexposed. Being “exposed” is often defined as the presence of some intervention in the patient's record at the time of the outcome, while “unexposed” indicates the absence of such an intervention. However, this raises the question of whether exposed and unexposed are comparable. Is someone who has received a treatment for some serious ailment really comparable to someone else who did not receive that treatment, even when matching on age and sex, or nesting in a subpopulation with a particular disease? While other designs such as the new‐user cohort design make this comparison explicit, the case‐control design obfuscates this essential question of comparability.

A related issue stems from the fact that the case‐control design is anchored on the date of the outcome. Any covariates that we may capture to adjust for differences relative to this date might actually be captured after the exposure of interest was started, thus opening the possibility of erroneously adjusting for causal intermediaries. This problem becomes uncomfortably clear if we try to rephrase a case‐control study as the equivalent cohort study. New‐user cohort studies are anchored on the date the exposure starts, thus allowing baseline covariates to be captured before exposure and ruling out causal intermediaries. However, often the unexposed group implied by a case‐control study has no well‐defined index date when follow‐up starts.

One final argument against using a case‐control design when the data are already available to perform a cohort study is efficiency: by using only a sample of the controls, information on the other controls is discarded, and a case‐control study estimate will therefore always have lower precision than the estimate of an equivalent cohort study using all data.
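The size of this precision loss can be gauged with a commonly cited rule of thumb that is not stated in the text: near the null, sampling m controls per case yields a statistical efficiency of roughly m / (m + 1) relative to using all available controls. A minimal sketch, with a hypothetical function name:

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency of 1:m control sampling relative to using
    all controls. Rule-of-thumb value m / (m + 1), valid near the null."""
    m = controls_per_case
    if m <= 0:
        raise ValueError("controls_per_case must be positive")
    return m / (m + 1)

for m in (1, 2, 4, 10):
    print(m, round(relative_efficiency(m), 3))
```

Under this approximation, four controls per case retain about 80% of the information, so the loss is modest but never zero, which is the point being made when the full data are already in hand.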

We replicate two recent studies that represent a range of approaches taken in case‐control studies. The first study, by Crockett et al, 9 investigates the effect of isotretinoin on the risk of several outcomes including ulcerative colitis (UC) using a fairly simple design. The second study, by Chou et al, 10 investigates dipeptidyl peptidase‐4 (DPP‐4) inhibitors on the risk of acute pancreatitis, employing a more complex design with nesting in a cohort of type‐2 diabetes mellitus patients, and additional confounding adjustment through covariates included in a multivariable regression. We replicate these two studies as faithfully as possible, and additionally include a set of negative control exposures that are not believed to cause the outcomes of interest such that their true odds ratios (ORs) should be equal to 1. Applying the same design used in the replication studies to these controls allows us to quantify residual bias in the design. 11 , 12
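One simple way to summarise what negative control estimates reveal about residual bias is sketched below. This is an illustration rather than the authors' calibration method; the function name and input layout are assumptions. If the design were unbiased, roughly 5% of the 95% confidence intervals would be expected to exclude 1, and the mean log odds ratio would sit near 0.

```python
import math

def summarize_negative_controls(estimates):
    """Summarise negative control estimates.

    estimates: list of (odds_ratio, ci_lower, ci_upper) tuples, one per
    negative control exposure, for which the true odds ratio is assumed
    to be 1. Returns (fraction of intervals excluding 1, mean log OR).
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    excluding_one = sum(1 for _, lower, upper in estimates
                        if lower > 1 or upper < 1)
    mean_log_or = sum(math.log(or_) for or_, _, _ in estimates) / len(estimates)
    return excluding_one / len(estimates), mean_log_or
```

A fraction well above 0.05, or a mean log odds ratio far from 0, would point to systematic error that the matching and adjustment did not remove.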

An overview of our methods is provided next. The full protocol, attached as supplementary information, provides specific details such as the codes used to identify disease conditions and drug exposures, and the list of negative control outcomes. The computer code for executing our study is available as open source software at https://github.com/OHDSI/StudyProtocols/tree/master/EvaluatingCaseControl .

1.2. Data source

For our analyses, we use the IBM® MarketScan® Commercial Claims and Encounters Database (CCAE), which represents longitudinal data from individuals enrolled in United States employer‐sponsored insurance health plans. The data include adjudicated health insurance claims (eg, inpatient, outpatient, and outpatient pharmacy) as well as enrollment data from large employers and health plans that provide private healthcare coverage to employees, their spouses, and dependents. Additionally, CCAE captures laboratory tests for a subset of the covered lives. This administrative claims database includes a variety of fee‐for‐service, preferred provider organizations, and capitated health plans, captured from March 2000 up to and including September 2017. The data are transformed to the OMOP Common Data Model version 5.1.

In contrast, the Crockett study used the PharMetrics Patient‐Centric Database (IMS Health, Watertown, MA), another US insurance claims database. The Chou study used the Taiwanese National Health Insurance Research Database.

1.3. Crockett study replication

Both the original study by Crockett et al and our replication define cases as subjects having at least three health‐care contacts with a UC diagnosis code on three different dates, or one UC diagnosis code and exposure to a drug used in UC treatment. Controls are selected from the overall population, matching on age (with 2‐year caliper), gender, and length of enrollment (90‐day caliper). The index date is defined as the date of the outcome for cases, and the date exactly 12 months after enrollment start for controls. The exposure of interest is isotretinoin. Subjects are considered “exposed” if they are exposed any time in the 12 months prior to the index date. One noteworthy difference with the original study is that our replication does not match on region or health plan since those data are not available in the CCAE database.
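The matching rule can be expressed as a small predicate. This Python sketch is illustrative and is not taken from the published study code; the field names and the choice to treat the calipers as inclusive are assumptions.

```python
def is_match(case, candidate, age_caliper_years=2, enrollment_caliper_days=90):
    """Check the matching criteria: same gender, age within the caliper,
    and length of enrollment within the caliper (both inclusive here)."""
    return (
        candidate["gender"] == case["gender"]
        and abs(candidate["age"] - case["age"]) <= age_caliper_years
        and abs(candidate["enrollment_days"] - case["enrollment_days"])
        <= enrollment_caliper_days
    )

def eligible_controls(case, pool):
    """Return every candidate in the pool who satisfies the matching rule."""
    return [person for person in pool if is_match(case, person)]
```

Note that nothing in this rule refers to exposure or to the indication for treatment, which is why matched cases and controls can still differ systematically in their reasons for being exposed.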

1.3.1. Chou study replication

Both the original study by Chou et al and our replication only consider cases and controls from a patient cohort who had at least one outpatient or inpatient diagnosis of type‐2 diabetes mellitus and who fill at least one prescription of oral antihyperglycemic agents. The cohort entry date is defined as the prescribing date of the first claim of oral antihyperglycemic agents. To be eligible for the study cohort, patients need to be 18 years old and have claims data for a continuous period of at least 12 months before the cohort entry date and 6 months after the cohort entry date. Cases are defined as subjects having an inpatient diagnosis of acute pancreatitis, with the index date set to the date of diagnosis. Up to four controls are selected by matching on age (within a 1‐year caliper), gender, and time in cohort (within a 1‐year caliper), with the index date set to the index date of the case to which the controls are matched. Subjects are considered exposed if they are exposed any time in the 30 days prior to the index date.
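The observation requirement and the 30‐day exposure window can be sketched as follows. The function names, the day counts used to approximate 12 and 6 months, and the representation of exposure as inclusive (start, end) date pairs are assumptions for illustration; the age criterion is omitted.

```python
from datetime import date, timedelta

def has_required_observation(obs_start, obs_end, cohort_entry,
                             days_before=365, days_after=183):
    """True if continuous observation covers about 12 months before and
    about 6 months after the cohort entry date."""
    return ((cohort_entry - obs_start).days >= days_before
            and (obs_end - cohort_entry).days >= days_after)

def is_exposed(exposure_periods, index_date, window_days=30):
    """True if any exposure period overlaps the window before the index date.

    exposure_periods: list of (start, end) date pairs, both inclusive.
    """
    window_start = index_date - timedelta(days=window_days)
    return any(start <= index_date and end >= window_start
               for start, end in exposure_periods)
```

Because the window is anchored on the index date rather than on the start of treatment, a subject counts as exposed regardless of how long before the window the exposure began.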

Additionally, the following risk factors of acute pancreatitis are included in the conditional logistic regression outcome model, based on occurrence in the year before the index date: gallstone disease, alcohol‐related disease, hypertriglyceridemia, cystic fibrosis, neoplasm, obesity, and tobacco use. Both studies also adjust for the Diabetes Complications Severity Index to account for the potential impacts of the severity of diabetes on the risk of acute pancreatitis, and include exposure to drugs that might be associated with acute pancreatitis (furosemide, NSAIDs, corticosteroids, antibiotics, and cancer drugs).
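The control selection described for the Chou replication can be sketched in a few lines. This is a minimal illustration and not the study's actual implementation: the Subject record, its field names, and the select_controls function are hypothetical, and the sketch assumes that age and time in cohort have already been evaluated at the case's index date. It also excludes every case from the control pool, which is a simplification.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    # Hypothetical record; field names are assumptions made for this sketch.
    subject_id: int
    age: float            # age in years at the case's index date
    gender: str
    days_in_cohort: int   # days since cohort entry at the case's index date
    is_case: bool


def select_controls(case, candidates, max_controls=4,
                    age_caliper_years=1.0, time_caliper_days=365, rng=None):
    """Pick up to max_controls non-cases that fall within the matching calipers."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    eligible = [
        c for c in candidates
        if not c.is_case
        and c.gender == case.gender
        and abs(c.age - case.age) <= age_caliper_years
        and abs(c.days_in_cohort - case.days_in_cohort) <= time_caliper_days
    ]
    # Sample without replacement; return all eligible subjects if fewer exist.
    return rng.sample(eligible, min(max_controls, len(eligible)))
```

Each selected control would then inherit the index date of its matched case, as described above.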

1.3.2. Negative control exposures

We identify a large set of negative control exposures where we are confident they are not causally associated with the outcome of interest, such that we can assume the true OR is 1. First, we generate a candidate list of negative control exposures by identifying exposures with no evidence of being causally related to the outcome of interest. 13 We search for this evidence in the literature through MeSH headings 14 and natural language processing, 15 spontaneous reports of adverse events, 16 and product labels in the US 17 and Europe. 18 We then reverse sort the candidate exposures by prevalence in the longitudinal database and manually curate until at least 35 negative controls remain. For the nested case‐control study (the Chou et al replication), we define a nesting cohort for each negative control exposure by selecting one of the primary indications of each drug. The negative controls are listed in Appendix 1 of the protocol.

1.3.3. Alternative study design

To demonstrate that alternative designs with better means to adjust for confounding are equally viable in retrospective database studies, we estimate the same effects using the self‐controlled case series (SCCS) design. 19 The SCCS compares the rate of outcomes during exposure to the rate of outcomes during all unexposed time, whether before, between, or after exposures. It is a Poisson regression that is conditioned on the person. Thus, it seeks to answer the following question: “Given that a patient has the outcome, is the outcome more likely during exposed time compared to nonexposed time?” Because the SCCS is a self‐controlled design, where each person serves as their own control, it is immune to confounding due to factors that are constant over time.
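To make the idea of conditioning on the person concrete, the following sketch estimates an incidence rate ratio from the simplest possible SCCS model: a single binary exposure and no other covariates. Conditional on a person's total number of outcomes, each outcome falls in exposed time with probability rho*e / (rho*e + u), where e and u are that person's exposed and unexposed days and rho is the rate ratio. The function name and input layout are assumptions made for this illustration, and the sketch leaves out the age and season adjustment and the pre‐exposure window used in the analyses reported here.

```python
import math


def sccs_irr(people):
    """Estimate the incidence rate ratio from a minimal SCCS model.

    people: iterable of tuples
        (events_exposed, events_unexposed, days_exposed, days_unexposed).
    Assumes at least one event in exposed time and one in unexposed time
    among people who have both kinds of time.
    """
    # People without any outcome carry no information in a case series.
    cases = [p for p in people if p[0] + p[1] > 0]
    if sum(p[0] for p in cases) == 0 or sum(p[1] for p in cases) == 0:
        raise ValueError("need events in both exposed and unexposed time")

    def score(beta):
        # Derivative of the conditional log-likelihood with respect to log(rho).
        rho = math.exp(beta)
        return sum(x - (x + y) * rho * e / (rho * e + u)
                   for x, y, e, u in cases)

    # The score decreases monotonically in beta, so bisection finds its root.
    lo, hi = -20.0, 20.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.exp((lo + hi) / 2)
```

Because every term compares a person's exposed time only with that same person's unexposed time, any characteristic that is constant within a person cancels out of the estimate.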

Crockett et al defined cases and controls to be exposed if they were exposed in the 365 days prior to the index date. Therefore, when approximating that study using the SCCS design, we define subjects to be exposed starting on the day after treatment initiation and stopping 365 days after the end of their last prescription, allowing for a 30‐day gap between prescriptions. When approximating the Chou study, we define subjects to be exposed starting on the day after treatment initiation and stopping 30 days after the end of their last prescription, also allowing for a 30‐day gap between prescriptions. For both studies, we exclude the first 365 days of observation to establish exposure status at the start of follow‐up, add a pre‐exposure window of 30 days to counter any time‐varying effects due to contra‐indications, and adjust for age and season by assuming a constant effect of age and season within each calendar month and using five‐knot cubic splines to model the effect across months.
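The exposure definition above can be illustrated with a short sketch that merges prescriptions separated by at most 30 days into one treatment episode and then derives the exposed window for each episode. The function name, the input layout, and the choice to measure the gap from the end of one prescription to the start of the next are assumptions made for this example; it does not handle exposed windows that overlap once the extension is added.

```python
from datetime import date, timedelta


def exposure_windows(prescriptions, max_gap_days=30, extension_days=365):
    """Return (exposed_start, exposed_end) for each treatment episode.

    prescriptions: list of (start_date, end_date) tuples for one subject.
    Use extension_days=365 for the Crockett approximation and 30 for Chou.
    """
    episodes = []
    for start, end in sorted(prescriptions):
        if episodes and (start - episodes[-1][1]).days <= max_gap_days:
            # Gap is small enough: extend the current episode.
            episodes[-1][1] = max(episodes[-1][1], end)
        else:
            # Gap is too large (or first prescription): open a new episode.
            episodes.append([start, end])
    # Exposure starts the day after initiation and stops extension_days
    # after the end of the last prescription in the episode.
    return [(s + timedelta(days=1), e + timedelta(days=extension_days))
            for s, e in episodes]
```

Calling the function with extension_days=30 gives the Chou‐style definition, and the default of 365 gives the Crockett‐style definition.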

In the Crockett replication, we identify 122 192 cases of UC, which are matched to 366 576 controls (original study: 4428 cases and 21 832 controls). In the Chou replication, we identify 6799 cases of acute pancreatitis, which are matched to 27 196 controls (original study: 1957 cases and 7828 controls).

2.1. Comparability of exposed to unexposed

Table 1 reports the percentage of the 1049 exposed and 487 719 unexposed in our Crockett replication that have an occurrence of select diagnosis and medication codes in the year up to a month prior to the exposure start (for exposed) or index date (for unexposed). We exclude the last month to minimize the risk of detecting precursors of the outcome. Table 1 also reports the standardized difference of means (S.Diff) for each characteristic. Typically, an absolute S.Diff >0.1 is considered to indicate imbalance between the exposure groups. Many characteristics exceed this threshold.
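For a binary characteristic such as the presence of a diagnosis code, a standardized difference can be computed from the two group proportions alone. The sketch below uses the commonly used pooled‐variance form for proportions; the text does not spell out which formula was applied, so treat this as one plausible reading, and the function name is hypothetical.

```python
import math


def standardized_difference(p_exposed, p_unexposed):
    """Standardized difference of two proportions (exposed minus unexposed)."""
    pooled_variance = (p_exposed * (1 - p_exposed)
                       + p_unexposed * (1 - p_unexposed)) / 2
    if pooled_variance == 0:
        return 0.0  # both groups are all-0 or all-1 for this characteristic
    return (p_exposed - p_unexposed) / math.sqrt(pooled_variance)


def is_imbalanced(p_exposed, p_unexposed, threshold=0.1):
    """Flag a characteristic whose absolute S.Diff exceeds the threshold."""
    return abs(standardized_difference(p_exposed, p_unexposed)) > threshold
```

Because the measure is scaled by the pooled standard deviation rather than by a standard error, it does not grow with sample size, which is why a fixed threshold such as 0.1 can be applied to groups as different in size as the exposed and unexposed here.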

Patient characteristics of exposed and unexposed in the Crockett replication. S.Diff: Standardized difference of means (exposed – unexposed)

Table 2 reports the same statistics for the 4933 exposed and 29 062 unexposed in the Chou replication. One characteristic exceeds our predefined threshold, and the S.Diff is positive for almost all, suggesting those exposed to DPP‐4 inhibitors are overall sicker than those who are not exposed.

Patient characteristics of exposed and unexposed in the Chou replication

2.2. Case‐control effect size estimates and residual bias

The left panel of Figure 1 shows the OR estimate and its standard error for isotretinoin exposure both in the original study by Crockett et al (OR = 4.36, 95% confidence interval (CI) = 1.97 to 9.66) and in our replication study (OR = 3.51, CI = 3.11 to 3.96). The figure also includes the negative control exposure estimates, executed under the same study design. The full list of negative controls and their estimates can be found in Supplementary Table S1. Note that, even though for the negative controls the true odds ratio is assumed to be 1, almost all have an estimated OR that would be called significantly (p < 0.05) greater than 1.
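The standard error and nominal p‐value behind each reported estimate can be recovered from the OR and its confidence limits. The sketch below assumes the intervals are Wald intervals that are symmetric on the log scale, which the text does not state explicitly; the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper):
    """Standard error of log(OR), assuming a symmetric Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def nominal_p(odds_ratio, lower, upper):
    """Two-sided p-value for OR = 1 that ignores systematic error."""
    z = math.log(odds_ratio) / se_from_ci(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))
```

Applied to the replication estimate (OR = 3.51, CI = 3.11 to 3.96), se_from_ci gives a standard error of roughly 0.06 on the log scale, far smaller than that of the original study because of the much larger number of cases.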

Figure 2. Estimates from the self‐controlled case series (SCCS) design for the exposures of interest (yellow diamond) and the negative control exposures (blue dots) for the Crockett replication (left) and the Chou replication (right). Estimates below the dashed line have p < 0.05. Estimates in the orange area have a calibrated p < 0.05.

The right panel of Figure 1 shows the OR estimate and its standard error for DPP‐4 inhibitor exposure both in the original study by Chou et al (OR = 1.04; CI = 0.89 to 1.21) and in our replication study (OR = 1.12, CI = 1.03 to 1.22). This figure also includes the negative control exposure estimates, executed under the same study design. Many negative controls show spurious statistically significant associations. Furthermore, the fact that the orange boundary (which indicates where the calibrated p‐value 12 is equal to 0.05) deviates strongly from the dashed line (indicating where the uncalibrated p‐value is equal to 0.05) demonstrates that overall the variability of the estimates of the negative controls is greater than can be explained by random error alone. Our replication study yields a statistically significant effect under a nominal p‐value, which does not account for systematic error. This finding is unlikely to inform us about the true effect size; rather, it reflects the range of estimates one would expect to observe using this design even when no true effect exists.
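The difference between a nominal and a calibrated p‐value can be illustrated with a simplified sketch. It fits a normal "empirical null" to the negative control estimates and then tests a new estimate against that null instead of against OR = 1. Note that the fit below is a simple method‐of‐moments approximation chosen to keep the example short; it is not the fitting procedure of the cited calibration method, and all function names are hypothetical.

```python
import math
import statistics


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def fit_null_moments(log_estimates, standard_errors):
    """Approximate mean and SD of systematic error from negative controls."""
    null_mean = statistics.mean(log_estimates)
    total_variance = statistics.variance(log_estimates)
    sampling_variance = statistics.mean([se ** 2 for se in standard_errors])
    # Spread beyond what random error explains is attributed to bias.
    systematic_variance = max(0.0, total_variance - sampling_variance)
    return null_mean, math.sqrt(systematic_variance)


def calibrated_p(log_estimate, se, null_mean, null_sd):
    """p-value against the empirical null rather than against log(OR) = 0."""
    z = (log_estimate - null_mean) / math.sqrt(null_sd ** 2 + se ** 2)
    return two_sided_p(z)
```

With negative controls whose log estimates centre well above zero, an estimate sitting in the middle of that cloud has a tiny nominal p‐value but a calibrated p‐value close to 1, which is the situation described above.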

Figure 1. Estimates from the original study (purple triangle), our replication (yellow diamond), and our replication applied to the negative control exposures (blue dots) for the Crockett replication (left) and the Chou replication (right). Estimates below the dashed line have p < 0.05. Estimates in the orange area have a calibrated p < 0.05.

2.3. Self‐controlled case series effect size estimates and residual bias

The left panel of Figure  2 shows the incidence rate ratio (IRR) estimates generated using the SCCS for the question addressed in the Crockett study as well as the negative controls. A much smaller effect (IRR = 1.64; CI = 1.30 to 2.06) is observed for isotretinoin than in both the original case‐control study and our replication, but in the SCCS analysis, this effect clearly stands out compared to the negative controls, suggesting there might be a true effect. The right panel of Figure  2 shows the effect estimates for DPP‐4 inhibitors (IRR = 1.11; CI = 1.04 to 1.19) are similar to the case‐control studies. Even though the negative controls show less bias than when using the case‐control design, this estimate would still be consistent with the effects observed for the negative controls. The full list of SCCS estimates can be found in the Supplementary Table S2.

3. DISCUSSION

Our replications of the two published case‐control studies produce similar effect size estimates to the original studies, with highly overlapping CIs, but we also show that these findings are likely caused by bias. This explains why others studying the same questions have found different answers. For example, Bernstein et al 20 reported no association between isotretinoin and UC (OR = 1.16, CI = 0.56 to 2.20), and Singh et al 21 reported a positive association between sitagliptin (a DPP‐4 inhibitor) and acute pancreatitis (OR = 2.01, CI = 1.19 to 3.38). Our use of negative controls clearly shows that these case‐control designs tend to produce biased estimates. We hypothesize that this bias is due to confounding that stems from differences between the exposure groups and show that exposed and unexposed indeed have different baseline characteristics.

This brings us to our main criticism of applying the case‐control design when other designs are viable, ie, by focusing on cases and controls, this design hides the question of comparability between exposed and unexposed. As a consequence, applications of the case‐control design often imply inappropriate comparisons between exposure groups, where especially the unexposed group is often ill‐defined. This, in turn, leads to confounding, and because the design anchors on the outcome date rather than the exposure date, our hands are tied when we try to adjust for this confounding, because we must avoid including causal intermediaries.

We have attempted to reframe the Crockett and Chou studies as comparative new‐user cohort studies, but found it impossible to identify appropriate comparator groups with index dates. For example, when should follow‐up start for people not exposed to isotretinoin in the cohort‐study equivalent of the Crockett study? Rather than being a limitation of the cohort design, we believe these difficulties reveal the inappropriateness of the comparison implied by these case‐control designs.

As an alternative, we reframed the studies as SCCS 19 instead. Because of the self‐controlled nature of the SCCS design, it addresses the type of confounding observed for the case‐control studies, and may often be suitable for use in these types of data. 22 Consequently, our analysis of the same data using the SCCS design yields considerably less bias, as observed through the negative control estimates.

A typical response to the biases reported here is that these findings do not apply to any given researcher's specific case‐control study: people agree that the two example studies we show here are problematic, but argue that if we had only modified the designs in a specific way, all the issues we observe would go away. For example, the use of propensity scores 23 or disease risk scores 24 could address between‐person confounding in case‐control studies, but our concern with the use of these scores is the danger of accidentally including intermediary variables, as well as other issues reported earlier. 25 In general, we find it hard to justify such blind faith in the ability of minor tweaks to remedy the weaknesses inherent to the case‐control design for these types of data and argue that empirical evidence is needed to support such claims. Moreover, it is important to remember that the examples we chose reflect what is actually used in published case‐control studies today.

4. CONCLUSIONS

We cannot stress enough that our criticism does not apply to the case‐control design in general. In theory, even a case‐control design applied in a retrospective database study may well be unbiased if all issues raised in this paper are addressed. However, our main point is that, in such database studies, where all exposure and covariate data are available for the entire cohort, it is unnecessary to conduct a case‐control study, as other alternatives such as the cohort and SCCS designs are available. Outside of such database studies, a case‐control design may be completely justified to reduce the cost of acquiring the necessary data.

In this paper, we have highlighted several diagnostics that can reveal issues with a case‐control study design. If researchers insist on using the case‐control design, we believe the onus lies on them to use these diagnostics to verify that the concerns raised here do not apply to their study.

Supporting information

SIM_8215‐Supp‐0001‐ProtocolRevisionV02.pdf

SIM_8215‐Supp‐0002PublicationCountV02.pdf

SIM_8215‐Supp‐0003‐SupplementaryTableS1.csv

SIM_8215‐Supp‐0004SupplementaryTableS2.csv

ACKNOWLEDGEMENTS

This work was partially supported through National Science Foundation Grants IIS 1251151  and DMS 1264153, and NIH Grants R01 LM006910 and U01 HG008680. Infrastructure to carry out the project was funded, in part, by Janssen Research and Development.

DATA AVAILABILITY STATEMENT

All 77 odds ratio estimates and corresponding confidence intervals produced in this study have been included as Supplementary Materials.

Schuemie MJ, Ryan PB, Man KKC, Wong ICK, Suchard MA, Hripcsak G. A plea to stop using the case‐control design in retrospective database studies. Statistics in Medicine. 2019;38:4199–4208. doi:10.1002/sim.8215

