
- Volume 20, Issue 1
- Observational research methods. Research design II: cohort, cross sectional, and case-control studies

- Department of Accident and Emergency Medicine, Taunton and Somerset Hospital, Taunton, Somerset, UK
- Correspondence to: Dr C J Mann; tonygood{at}doctors.org.uk
Cohort, cross sectional, and case-control studies are collectively referred to as observational studies. Often these studies are the only practicable method of studying various problems, for example, studies of aetiology, instances where a randomised controlled trial might be unethical, or if the condition to be studied is rare. Cohort studies are used to study incidence, causes, and prognosis. Because they measure events in chronological order they can be used to distinguish between cause and effect. Cross sectional studies are used to determine prevalence. They are relatively quick and easy but do not permit distinction between cause and effect. Case-control studies compare groups retrospectively. They seek to identify possible predictors of outcome and are useful for studying rare diseases or outcomes. They are often used to generate hypotheses that can then be studied via prospective cohort or other studies.
- research methods
- cohort study
- case-control study
- cross sectional study
http://dx.doi.org/10.1136/emj.20.1.54
Cohort, cross sectional, and case-control studies are often referred to as observational studies because the investigator simply observes. No interventions are carried out by the investigator. With the recent emphasis on evidence based medicine and the formation of the Cochrane Database of randomised controlled trials, such studies have been somewhat glibly maligned. However, they remain important because many questions can be efficiently answered by these methods and sometimes they are the only methods available.
The objective of most clinical studies is to determine one of the following: prevalence, incidence, cause, prognosis, or effect of treatment. It is therefore useful to remember which type of study is most commonly associated with each objective (table 1).
While an appropriate choice of study design is vital, it is not sufficient. The hallmark of good research is the rigour with which it is conducted. A checklist of the key points in any study, irrespective of the basic design, is given in box 1.
Study purpose
The aim of the study should be clearly stated.
Sample
The sample should accurately reflect the population from which it is drawn.
The source of the sample should be stated.
The sampling method should be described and the sample size should be justified.
Entry criteria and exclusions should be stated and justified.
The number of patients lost to follow up should be stated and explanations given.
Control group
The control group should be easily identifiable.
The source of the controls should be explained—are they from the same population as the sample?
Are the controls matched or randomised to minimise bias and confounding?
Quality of measurements and outcomes
Validity—are the measurements used regarded as valid by other investigators?
Reproducibility—can the results be repeated or is there a reason to suspect they may be a “one off”?
Blinded—were the investigators or subjects aware of their subject/control allocation?
Quality control—has the methodology been rigorously adhered to?
Completeness
Compliance—did all patients comply with the study?
Drop outs—how many failed to complete the study?
Missing data—how much are unavailable and why?
Distorting influences
Extraneous treatments—other interventions that may have affected some but not all of the subjects.
Confounding factors—Are there other variables that might influence the results?
Appropriate analysis—Have appropriate statistical tests been used?
All studies should be internally valid. That is, the conclusions can be logically drawn from the results produced by an appropriate methodology. For a study to be regarded as valid it must be shown that it has indeed demonstrated what it says it has. A study that is not internally valid should not be published because the findings cannot be accepted.
The question of external validity relates to the value of the results of the study to other populations, that is, the generalisability of the results. For example, a study showing that 80% of the Swedish population has blond hair might be used to make a sensible prediction of the prevalence of blond hair in other Scandinavian countries, but would be invalid if applied to most other populations.
Every published study should contain sufficient information to allow the reader to analyse the data with reference to these key points.
In this article each of the three important observational research methods will be discussed with emphasis on their strengths and weaknesses. In so doing it should become apparent why a given study used a particular research method and which method might best answer a particular clinical problem.
COHORT STUDIES
These are the best method for determining the incidence and natural history of a condition. The studies may be prospective or retrospective and sometimes two cohorts are compared.
Prospective cohort studies
A group of people is chosen who do not have the outcome of interest (for example, myocardial infarction). The investigator then measures a variety of variables that might be relevant to the development of the condition. Over a period of time the people in the sample are observed to see whether they develop the outcome of interest (that is, myocardial infarction).
In single cohort studies those people who do not develop the outcome of interest are used as internal controls.
Where two cohorts are used, one group has been exposed to or treated with the agent of interest and the other has not, thereby acting as an external control.
Retrospective cohort studies
These use data already collected for other purposes. The methodology is the same but the study is performed post hoc. The cohort is “followed up” retrospectively. The study period may be many years but the time to complete the study is only as long as it takes to collate and analyse the data.
Advantages and disadvantages
The use of cohorts is often mandatory as a randomised controlled trial may be unethical; for example, you cannot deliberately expose people to cigarette smoke or asbestos. Thus research on risk factors relies heavily on cohort studies.
As cohort studies measure potential causes before the outcome has occurred the study can demonstrate that these “causes” preceded the outcome, thereby avoiding the debate as to which is cause and which is effect.
A further advantage is that a single study can examine various outcome variables. For example, cohort studies of smokers can simultaneously look at deaths from lung, cardiovascular, and cerebrovascular disease. This contrasts with case-control studies as they assess only one outcome variable (that is, whatever outcome the cases have entered the study with).
Cohorts permit calculation of the effect of each variable on the probability of developing the outcome of interest (relative risk). However, where a certain outcome is rare then a prospective cohort study is inefficient. For example, studying 100 A&E attenders with minor injuries for the outcome of diabetes mellitus will probably produce only one patient with the outcome of interest. The efficiency of a prospective cohort study increases as the incidence of any particular outcome increases. Thus a study of patients with a diagnosis of deliberate self harm in the 12 months after initial presentation would be efficiently studied using a cohort design.
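The relative risk mentioned above can be illustrated with a short calculation. The following is a minimal sketch only; the function name and the counts are hypothetical and are not taken from any study cited in this article.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of the outcome in the exposed cohort divided by the risk in the unexposed cohort."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 1000 exposed and 1000 unexposed people followed for the same period.
rr = relative_risk(exposed_cases=30, exposed_total=1000,
                   unexposed_cases=10, unexposed_total=1000)
print(f"Relative risk: {rr:.2f}")
```

A relative risk above 1 suggests that the outcome is more frequent in the exposed group; it says nothing by itself about confounding or bias.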
Another problem with prospective cohort studies is the loss of some subjects to follow up. This can significantly affect the outcome. Taking incidence analysis as an example (incidence = new cases per period of time), it can be seen that the loss of a few cases will seriously affect the numerator and hence the calculated incidence. The rarer the condition, the more significant this effect.
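The effect of loss to follow up on a calculated incidence can be sketched numerically. All figures below are hypothetical and chosen only to show why the same number of missed cases matters more when the condition is rare.

```python
def incidence_per_1000(new_cases, population, years=1.0):
    """New cases per 1000 people per year."""
    return 1000 * new_cases / (population * years)


def underestimate(true_cases, missed_cases, population):
    """Fraction by which incidence is underestimated when some cases are lost to follow up."""
    true_rate = incidence_per_1000(true_cases, population)
    observed_rate = incidence_per_1000(true_cases - missed_cases, population)
    return 1 - observed_rate / true_rate


# Hypothetical cohort of 10 000 people followed for one year, with 3 cases lost to follow up.
rare = underestimate(true_cases=10, missed_cases=3, population=10_000)
common = underestimate(true_cases=500, missed_cases=3, population=10_000)
print(f"Rare outcome underestimated by {rare:.1%}")
print(f"Common outcome underestimated by {common:.1%}")
```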
Retrospective studies are much cheaper as the data have already been collected. One advantage of such a study design is the lack of bias because the outcome of current interest was not the original reason for the data to be collected. However, because the cohort was originally constructed for another purpose it is unlikely that all the relevant information will have been rigorously collected.
Retrospective cohorts also suffer the disadvantage that people with the outcome of interest are more likely to remember certain antecedents, or exaggerate or minimise what they now consider to be risk factors (recall bias).
Where two cohorts are compared one will have been exposed to the agent of interest and one will not. The major disadvantage is the inability to control for all other factors that might differ between the two groups. These factors are known as confounding variables.
A confounding variable is independently associated with both the variable of interest and the outcome of interest. For example, lung cancer (outcome) is less common in people with asthma (variable). However, it is unlikely that asthma in itself confers any protection against lung cancer. It is more probable that the incidence of lung cancer is lower in people with asthma because fewer asthmatics smoke cigarettes (confounding variable). There are a virtually infinite number of potential confounding variables that, however unlikely, could just explain the result. In the past this has been used to suggest that there is a genetic influence that makes people want to smoke and also predisposes them to cancer.
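The asthma and smoking example can be made concrete by stratifying on the suspected confounder. The sketch below uses invented counts in which the risk of lung cancer depends only on smoking, yet the crude comparison makes asthma look protective because fewer of the hypothetical asthmatics smoke.

```python
# Hypothetical counts: (lung cancer cases, people) by asthma status within each smoking stratum.
strata = {
    "smokers": {"asthma": (10, 100), "no_asthma": (40, 400)},
    "non_smokers": {"asthma": (9, 900), "no_asthma": (6, 600)},
}


def risk(cases, people):
    return cases / people


def crude_relative_risk(strata):
    """Relative risk for asthma v no asthma, ignoring smoking status."""
    totals = {"asthma": [0, 0], "no_asthma": [0, 0]}
    for groups in strata.values():
        for group, (cases, people) in groups.items():
            totals[group][0] += cases
            totals[group][1] += people
    return risk(*totals["asthma"]) / risk(*totals["no_asthma"])


def stratum_relative_risks(strata):
    """Relative risk for asthma v no asthma within each smoking stratum."""
    return {name: risk(*g["asthma"]) / risk(*g["no_asthma"])
            for name, g in strata.items()}


crude = crude_relative_risk(strata)
by_stratum = stratum_relative_risks(strata)
print(f"Crude relative risk: {crude:.2f}")
for name, rr in by_stratum.items():
    print(f"Relative risk among {name}: {rr:.2f}")
```

When the stratum specific estimates agree with each other but differ from the crude estimate, as here, the stratifying variable is behaving as a confounder. Stratification can only deal with confounders that have been measured.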
The only way to eliminate all possibility of a confounding variable is via a prospective randomised controlled study. In this type of study each type of exposure is assigned by chance and so confounding variables should be present in equal numbers in both groups.
Finally, problems can arise as a result of bias. Bias can occur in any research and reflects the potential that the sample studied is not representative of the population it was drawn from and/or the population at large. A classic example is using employed people, as employment is itself associated with generally better health than unemployment. Similarly people who respond to questionnaires tend to be fitter and more motivated than those who do not. People attending A&E departments should not be presumed to be representative of the population at large.
How to run a cohort study
If the data are readily available then a retrospective design is the quickest method. If high quality, reliable data are not available a prospective study will be required.
The first step is the definition of the sample group. Each subject must have the potential to develop the outcome of interest (for example, circumcised men should not be included in a cohort designed to study paraphimosis). Furthermore, the sample population must be representative of the general population if the study is primarily looking at the incidence and natural history of the condition (descriptive).
If however the aim is to analyse the relation between predictor variables and outcomes (analytical) then the sample should contain as many patients likely to develop the outcome as possible, otherwise much time and expense will be spent collecting information of little value.
Cohort studies
Cohort studies describe incidence or natural history.
They analyse predictors (risk factors) thereby enabling calculation of relative risk.
Cohort studies measure events in temporal sequence thereby distinguishing causes from effects.
Retrospective cohorts where available are cheaper and quicker.
Confounding variables are the major problem in analysing cohort studies.
Subject selection and loss to follow up are major potential causes of bias.
Each variable studied must be accurately measured. Variables that are relatively fixed, for example, height need only be recorded once. Where change is more probable, for example, drug misuse or weight, repeated measurements will be required.
To minimise the potential for missing a confounding variable all probable relevant variables should be measured. If this is not done the study conclusions can be readily criticised. All patients entered into the study should also be followed up for the duration of the study. Losses can significantly affect the validity of the results. To minimise this, as much information about the patient as possible (name, address, telephone, GP, etc) needs to be recorded as soon as the patient is entered into the study. Regular contact should be made; it is hardly surprising if the subjects have moved or lost interest and become lost to follow up if they are only contacted at 10 year intervals!
Beware: follow up is usually easier in people who have been exposed to the agent of interest and this may lead to bias.
There are many famous examples of cohort studies including the Framingham heart study, 2 the UK study of doctors who smoke 3 and Professor Neville Butler's studies on British children born in 1958. 4 A recent example of a prospective cohort study by Davey Smith et al was published in the BMJ 5 and a retrospective cohort design was used to assess the use of A&E departments by people with diabetes. 6
CROSS SECTIONAL STUDIES
These are primarily used to determine prevalence. Prevalence equals the number of cases in a population at a given point in time. All the measurements on each person are made at one point in time. Prevalence is vitally important to the clinician because it influences considerably the likelihood of any particular diagnosis and the predictive value of any investigation. For example, knowing that ascending cholangitis in children is very rare enables the clinician to look for other causes of abdominal pain in this patient population.
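The influence of prevalence on the predictive value of an investigation follows from Bayes' theorem and can be sketched as below. The sensitivity and specificity of 90% are hypothetical values chosen for illustration and do not describe any particular test.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a person with a positive test result truly has the condition."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Same hypothetical test applied in populations with different prevalence.
for prevalence in (0.001, 0.1, 0.5):
    ppv = positive_predictive_value(prevalence, sensitivity=0.9, specificity=0.9)
    print(f"Prevalence {prevalence:.1%}: positive predictive value {ppv:.1%}")
```

With these assumed figures a positive result is far more likely to be a false positive when the condition is rare, which is why knowledge of prevalence shapes the interpretation of any test.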
Cross sectional studies are also sometimes used to infer causation, although on their own they cannot distinguish cause from effect.
At one point in time the subjects are assessed to determine whether they were exposed to the relevant agent and whether they have the outcome of interest. Some of the subjects will not have been exposed nor have the outcome of interest. This clearly distinguishes this type of study from the other observational studies (cohort and case controlled) where reference to either exposure and/or outcome is made.
The advantage of such studies is that subjects are not deliberately exposed, treated, or left untreated, and hence there are seldom ethical difficulties. Only one group is used, data are collected only once, and multiple outcomes can be studied; thus this type of study is relatively cheap.
Many cross sectional studies are done using questionnaires. Alternatively each of the subjects may be interviewed. Table 2 lists the advantages and disadvantages of each.
Any study with a low response rate can be criticised because it can miss significant differences between the responders and non-responders. At its most extreme all the non-responders could be dead! Strenuous efforts must be made to maximise the numbers who do respond. The use of volunteers is also problematic because they too are unlikely to be representative of the general population. A good way to produce a valid sample would be to randomly select people from the electoral roll and invite them to complete a questionnaire. In this way the response rate is known and non-responders can be identified. However, the electoral roll itself is not an entirely accurate reflection of the general population. A census is another example of a cross sectional study.
Market research organisations often use cross sectional studies (for example, opinion polls). This entails a system of quotas to ensure the sample is representative of the age, sex, and social class structure of the population being studied. However, to be commercially viable they are convenience samples—only people available can be questioned. This technique is insufficiently rigorous to be used for medical research.
How to run a cross sectional study
Formulate the research question(s) and choose the sample population. Then decide what variables of the study population are relevant to the research question. A method for contacting sample subjects must be devised and then implemented. In this way the data are collected and can then be analysed.
The most important advantage of cross sectional studies is that in general they are quick and cheap. As there is no follow up, fewer resources are required to run the study.
Cross sectional studies are the best way to determine prevalence and are useful at identifying associations that can then be more rigorously studied using a cohort study or randomised controlled study.
The most important problem with this type of study is differentiating cause and effect from simple association. For example, a study finding an association between low CD4 counts and HIV infection does not demonstrate whether HIV infection lowers CD4 levels or low CD4 levels predispose to HIV infection. Moreover, male homosexuality is associated with both but causes neither. (Another example of a confounding variable).
Often there are a number of plausible explanations. For example, if a study shows a negative relation between height and age it could be concluded that people lose height as they get older, younger generations are getting taller, or that tall people have a reduced life expectancy when compared with short people. Cross sectional studies do not provide an explanation for their findings.
Rare conditions cannot efficiently be studied using cross sectional studies because even in large samples there may be no one with the disease. In this situation it is better to study a cross sectional sample of patients who already have the disease (a case series). In this way it was found in 1983 that of 1000 patients with AIDS, 727 were homosexual or bisexual men and 236 were intravenous drug abusers. 6 The conclusion that individuals in these two groups had a higher relative risk was inescapable. The natural history of HIV infection was then studied using cohort studies and efficacy of treatments via case-control studies and randomised clinical trials.
An example of a cross sectional study was the prevalence study of skull fractures in children admitted to hospital in Edinburgh from 1983 to 1989. 7 Note that although the study period was seven years it was not a longitudinal or cohort study because information about each subject was recorded at a single point in time.
A questionnaire based cross sectional study explored the relation between A&E attendance and alcohol consumption in elderly persons. 9
A recent example can be found in the BMJ, in which the prevalence of serious eye disease in a London population was evaluated. 10
Cross sectional studies
Cross sectional studies are the best way to determine prevalence
Are relatively quick
Can study multiple outcomes
Do not themselves differentiate between cause and effect or the sequence of events
CASE-CONTROL STUDIES
In contrast with cohort and cross sectional studies, case-control studies are usually retrospective. People with the outcome of interest are matched with a control group of people who do not have it. Retrospectively the researcher determines which individuals were exposed to the agent or treatment, or the prevalence of a variable in each of the study groups. Where the outcome is rare, case-control studies may be the only feasible approach.
As some of the subjects have been deliberately chosen because they have the disease in question case-control studies are much more cost efficient than cohort and cross sectional studies—that is, a higher percentage of cases per study.
Case-control studies determine the relative importance of a predictor variable in relation to the presence or absence of the disease. Case-control studies are retrospective and cannot therefore be used to calculate the relative risk; this requires a cohort study. Case-control studies can, however, be used to calculate odds ratios, which in turn usually approximate to the relative risk, particularly when the outcome is rare.
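The relation between the odds ratio and the relative risk can be checked with a small calculation on a two by two table. The sketch below uses hypothetical counts from an imagined whole population, where both statistics can be computed, to show that the two agree closely only when the outcome is rare. In a real case-control study only the odds ratio is available, because the investigator fixes the ratio of cases to controls.

```python
def odds_ratio(a, b, c, d):
    """a: exposed with outcome, b: exposed without, c: unexposed with, d: unexposed without."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed, using the same cell labels."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical rare outcome: risks of 0.2% (exposed) and 0.1% (unexposed).
rare = (20, 9980, 10, 9990)
# Hypothetical common outcome: risks of 40% (exposed) and 20% (unexposed).
common = (400, 600, 200, 800)

for label, table in (("rare", rare), ("common", common)):
    print(f"{label}: odds ratio {odds_ratio(*table):.2f}, "
          f"relative risk {relative_risk(*table):.2f}")
```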
How to run a case-control study
Decide on the research question to be answered. Formulate a hypothesis and then decide what will be measured and how. Specify the characteristics of the study group and decide how to construct a valid control group. Then compare the “exposure” of the two groups to each variable.
When conditions are uncommon, case-control studies generate a lot of information from relatively few subjects. When there is a long latent period between an exposure and the disease, case-control studies are the only feasible option. Consider the practicalities of a cohort study or cross sectional study in the assessment of new variant CJD and possible aetiologies. With fewer than 300 confirmed cases a cross sectional study would need about 200 000 subjects to include one symptomatic patient. Given a postulated latency of 10 to 30 years a cohort study would require both a vast sample size and take a generation to complete.
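The figure of about 200 000 subjects can be reproduced with simple arithmetic. The sketch below assumes a source population of roughly 60 million, which is an assumption made here for illustration and is not stated in the text, and treats all 300 cases as currently symptomatic.

```python
confirmed_cases = 300        # upper bound taken from the text ("fewer than 300")
population = 60_000_000      # ASSUMPTION: approximate source population, not given in the text

prevalence = confirmed_cases / population
subjects_per_expected_case = population / confirmed_cases


def probability_of_at_least_one_case(sample_size, prevalence):
    """Chance that a random sample of the given size contains one or more cases."""
    return 1 - (1 - prevalence) ** sample_size


p_one = probability_of_at_least_one_case(200_000, prevalence)
print(f"Subjects needed per expected case: {subjects_per_expected_case:,.0f}")
print(f"Chance that 200 000 subjects include at least one case: {p_one:.0%}")
```

Under these assumptions a sample of 200 000 is expected to contain one case on average, yet it would still contain no case at all in more than a third of such samples.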
In case-control studies comparatively few subjects are required so more resources are available for studying each. In consequence a huge number of variables can be considered. This type of study is therefore useful for generating hypotheses that can then be tested using other types of study.
This flexibility of the variables studied comes at the expense of the restricted outcomes studied. The only outcome is the presence or absence of the disease or whatever criterion was chosen to select the cases.
The major problems with case-control studies are the familiar ones of confounding variables (see above) and bias. Bias may take two major forms.
Sampling bias
The patients with the disease may be a biased sample (for example, patients referred to a teaching hospital) or the controls may be biased (for example, volunteers, different ages, sex or socioeconomic group).
Observation and recall bias
As the study assesses predictor variables retrospectively there is great potential for a biased assessment of their presence and significance by the patient or the investigator, or both.
Overcoming sampling bias
Ideally the cases studied should be a random sample of all the patients with the disease. This is not only very difficult but in many instances is impossible because many cases may not have been diagnosed or have been misdiagnosed. For example, many people with non-insulin dependent diabetes will not have sought medical attention and will therefore be undiagnosed. Conversely many psychiatric diseases may be differently labelled in different countries and even by different doctors in the same country. As a result they will be misdiagnosed for the purposes of the study. However, in reality you are often left studying a sample of those patients whom it is possible to recruit. Selecting the controls is often a more difficult problem.
To enable the controls to represent the same population as the cases, one of four techniques may be used.
A convenience sample—sampled in the same way as the cases, for example, attending the same outpatient department. While this is certainly convenient it may reduce the external validity of the study.
Matching. The controls may be a matched or unmatched random sample from the unaffected population. Again the problem of controlling for unknown influences is present, but if the controls are too closely matched they may not be representative of the general population. “Over matching” may cause the true difference to be underestimated.
The advantage of matching is that it allows a smaller sample size for any given effect to be statistically significant.
Using two or more control groups. If the study demonstrates a significant difference between the patients with the outcome of interest and those without, even when the latter have been sampled in a number of different ways (for example, outpatients, inpatients, GP patients) then the conclusion is more robust.
Using a population based sample for both cases and controls. It is possible to take a random sample of all the patients with a particular disease from specific registers. The control group can then be constructed by selecting age and sex matched people randomly selected from the same population as the area covered by the disease register.
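Selecting age and sex matched controls at random from the same population can be sketched as follows. This is an illustrative sketch only: the record layout, the function name `select_matched_controls`, the age tolerance of two years, and the synthetic population are all assumptions made for the example.

```python
import random


def select_matched_controls(cases, population, per_case=1, age_tolerance=2, seed=0):
    """For each case, randomly pick controls of the same sex and similar age.

    Controls are drawn without replacement, so nobody serves as a control twice,
    and people who are themselves cases are never selected.
    """
    rng = random.Random(seed)
    case_ids = {case["id"] for case in cases}
    used = set()
    matches = {}
    for case in cases:
        eligible = [
            person for person in population
            if person["id"] not in case_ids
            and person["id"] not in used
            and person["sex"] == case["sex"]
            and abs(person["age"] - case["age"]) <= age_tolerance
        ]
        chosen = rng.sample(eligible, min(per_case, len(eligible)))
        used.update(person["id"] for person in chosen)
        matches[case["id"]] = chosen
    return matches


# Synthetic population standing in for the area covered by a disease register.
population = [{"id": i, "age": 20 + i % 60, "sex": "F" if i % 2 else "M"}
              for i in range(1000)]
cases = [population[3], population[10], population[57]]
matches = select_matched_controls(cases, population, per_case=2)
for case in cases:
    print(case["id"], [control["id"] for control in matches[case["id"]]])
```

Tightening the tolerance or adding further matching variables shrinks the pool of eligible controls, which is one practical route to the over matching problem described above.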
Overcoming observation and recall bias
Overcoming retrospective recall bias can be achieved by using data recorded, for other purposes, before the outcome had occurred and therefore before the study had started. The success of this strategy is limited by the availability and reliability of the data collected. Another technique is blinding, where neither the subject nor the observer knows whether the subject is a case or a control. Nor are they aware of the study hypothesis. In practice this is often difficult or impossible and only partial blinding is practicable. It is usually possible to blind the subjects and observers to the study hypothesis by asking spurious questions. Observers can also be easily blinded to the case or control status of the patient where the relevant observation is not of the patient themselves but a laboratory test or radiograph.
Case-control studies
Case-control studies are simple to organise
Retrospectively compare two groups
Aim to identify predictors of an outcome
Permit assessment of the influence of predictors on outcome via calculation of an odds ratio
Useful for hypothesis generation
Can only look at one outcome
Bias is a major problem
Blinding cases to their case or control status is usually impracticable as they already know that they have a disease or illness. Similarly observers can hardly be blinded to the presence of physical signs, for example, cyanosis or dyspnoea.
As a result of the problems of matching, bias, and confounding, case-control studies are often flawed. They are, however, useful for generating hypotheses. These hypotheses can then be tested more rigorously by other methods: randomised controlled trials or cohort studies.
Case-control studies are very common. They are particularly useful for studying infrequent events, for example, cot death, survival from out of hospital cardiac arrest, and toxicological emergencies.
A recent example was the study of atrial fibrillation in middle aged men during exercise. 11
USING DATABASES FOR RESEARCH (SECONDARY DATA)
Pre-existing databases provide an excellent and convenient source of data. There are a host of such databases and the increasing archiving of information on computers means that this is an enlarging area for obtaining data. Table 3 lists some common examples of potentially useful databases.
Such databases enable vast numbers of people to be entered into a study prospectively or retrospectively. They can be used to construct a cohort, to produce a sample for a cross sectional study, or to identify people with certain conditions or outcomes and produce a sample for a case-control study. A recent study used census data from 11 countries to look at the relation between social class and mortality in middle aged men. 12
Data of this type are ordinarily collected by people other than the researcher and independently of any specific hypothesis. The opportunity for observer bias is thus diminished. The use of previously collected data is efficient and comparatively inexpensive; moreover, the data are collected in a very standardised way, permitting comparisons over time and between different countries. However, because the data are collected for other purposes they may not be ideally suited to testing the current hypothesis, and they may be incomplete. This may result in sampling bias. For example, the electoral roll depends upon registration by each individual. Many homeless, mentally ill, and chronically sick people will not be registered. Similarly the notification of certain communicable diseases is a statutory responsibility for doctors in the UK: while it is probable that most cases of cholera are reported, it is highly unlikely that most cases of food poisoning are.
Causes and associations
Because observational studies are not experiments (as are randomised controlled trials) it is difficult to control many external variables. In consequence when faced with a clear and significant association between some form of illness or cause of death and some environmental influence a judgement has to be made as to whether this is a causal link or simply an association. Table 4 outlines the points to be considered when making this judgement. 13
None of these judgements can provide indisputable evidence of cause and effect, but taken together they do permit the investigator to answer the fundamental questions “is there any other way to explain the available evidence?” and “is there any other more likely than cause and effect?”
Observational studies can produce high quality information but all such studies can be influenced by known and unknown confounding variables. Appropriate use of observational studies permits investigation of prevalence, incidence, associations, causes, and outcomes. Where there is little evidence on a subject they are cost effective ways of producing and investigating hypotheses before larger and more expensive study designs are embarked upon. In addition they are often the only realistic choice of research methodology, particularly where a randomised controlled trial would be impractical or unethical.
Cohort studies look forwards in time by following up each subject
Subjects are selected before the outcome of interest is observed
They establish the sequence of events
Numerous outcomes can be studied
They are the best way to establish the incidence of a disease
They are a good way to determine causes of diseases
The principal summary statistic of cohort studies is the relative risk
If prospective, they are expensive and often take a long time for sufficient outcome events to occur to produce meaningful results
Cross sectional studies look at each subject at one point in time only
Subjects are selected without regard to the outcome of interest
Less expensive
They are the best way to determine prevalence
The principal summary statistic of cross sectional studies is the odds ratio
Weaker evidence of causality than cohort studies
Inaccurate when studying rare conditions
Case-control studies look back at what has happened to each subject
Subjects are selected specifically on the basis of the outcome of interest
Efficient (small sample sizes)
Produce odds ratios that approximate to relative risks for each variable studied
Prone to sampling bias and retrospective analysis bias
Only one outcome is studied
GLOSSARY OF TERMS
Bias
The inclusion of subjects or methods such that the results obtained are not truly representative of the population from which the sample is drawn.
Blinding
The process by which the researcher and/or the subject is ignorant of which intervention or exposure has occurred.
Cochrane database
An international collaborative project collating peer reviewed prospective randomised clinical trials.
Cohort
A component of a population identified so that one or more characteristics can be studied as it ages through time.
Confounding variable
A variable that is associated with both the exposure and outcome of interest that is not the variable being studied.
Control group
A group of people without the condition of interest, or unexposed to or not treated with the agent of interest.
False positive
A test result that suggests that the subject has a specific disease or condition when in fact the subject does not.
Incidence
A rate, and therefore always related either explicitly or by implication to a time period. With regard to disease it can be defined as the number of new cases that develop during a specified time interval.
Latent period
A period of time between exposure to an agent and the development of symptoms, signs, or other evidence of changes associated with that exposure.
Matching
The process by which each case is matched with one or more controls, which have been deliberately chosen to be as similar as possible to the test subjects in all regards other than the variable being studied.
Observational study
A study in which no intervention is made (in contrast with an experimental study). Such studies provide estimates and examine associations of events in their natural settings without recourse to experimental intervention.
Odds ratio
The odds of an event are the ratio of the probability of the event occurring to the probability of non-occurrence. In a clinical setting the odds ratio is the odds of a condition occurring in the exposed group divided by the odds of it occurring in the non-exposed group.
Prevalence
Not defined by a time interval and therefore not a rate. It may be defined as the number of cases of a disease that exist in a defined population at a specified point in time.
Randomised controlled trial
Subjects are assigned by statistically randomised methods to two or more groups. In doing so it is assumed that all variables other than the proposed intervention are evenly distributed between the groups. In this way bias is minimised.
Relative risk
The ratio of the probability of developing the condition if exposed to a certain variable to the probability of developing it if not exposed.
Response rate
The proportion of subjects who respond to either a treatment or a questionnaire.
Risk factor
A variable associated with a specific disease or outcome.
Validity—internal
The rigour with which a study has been designed and executed—that is, can the conclusion be relied upon?
Validity—external
The usefulness of the findings of a study with respect to other populations.
Variable
A value or quality that can vary between subjects and/or over time.
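The glossary's distinction between incidence (a rate, tied to a time period) and prevalence (a count or proportion at one point in time) can be illustrated numerically. This is a rough sketch under simplifying assumptions: the function names and all numbers are hypothetical, and every disease-free person is assumed to be followed for the full period.

```python
def point_prevalence(existing_cases: int, population: int) -> float:
    """Proportion of a defined population with the disease at one point in time."""
    return existing_cases / population


def incidence_rate(new_cases: int, person_years: float) -> float:
    """New cases per unit of person-time at risk during a specified interval."""
    return new_cases / person_years


# Hypothetical survey: 1000 people, 50 of whom already have the disease today.
prevalence = point_prevalence(existing_cases=50, population=1000)

# Hypothetical follow-up: the 950 disease-free people are each followed for
# 2 years (assumption: no drop-outs), and 19 new cases appear.
rate = incidence_rate(new_cases=19, person_years=950 * 2)

print(f"Point prevalence: {prevalence:.1%}")
print(f"Incidence rate: {rate * 1000:.0f} per 1000 person-years")
```

Prevalence has no time unit, whereas the incidence rate is only meaningful together with its person-time denominator.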
Study design for cohort studies.
Study design for cross sectional studies
Study design for case-control studies.
- Fowkes F, Fulton P. Critical appraisal of published research: introductory guidelines. BMJ 1991;302:1136–40.
- Lerner DJ, Kannel WB. Patterns of coronary heart disease morbidity and mortality in the sexes: a 26 year follow-up of the Framingham population. Am Heart J 1986;111:383–90.
- Doll R, Peto H. Mortality in relation to smoking. 40 years observation on female British doctors. BMJ 1989;208:967–73.
- Alberman ED, Butler NR, Sheridan MD. Visual acuity of a national sample (1958 cohort) at 7 years. Dev Med Child Neurol 1971;13:9–14.
- Davey Smith G, Hart C, Blane D, et al. Adverse socioeconomic conditions in childhood and cause specific mortality: prospective observational study. BMJ 1998;316:1631–5.
- Goyder EC, Goodacre SW, Botha JL, et al. How do individuals with diabetes use the accident and emergency department? J Accid Emerg Med 1997;14:371–4.
- Jaffe HW, Bregman DJ, Selik RM. Acquired immune deficiency in the US: the first 1000 cases. J Inf Dis 1983;148:339–45.
- Johnstone AJ, Zuberi SH, Scobie WH. Skull fractures in children: a population study. J Accid Emerg Med 1996;13:386–9.
- van der Pol V, Rodgers H, Aitken P, et al. Does alcohol contribute to accident and emergency department attendance in elderly people? J Accid Emerg Med 1996;13:258–60.
- Reidy A, Minassian DC, Vafadis G, et al. BMJ 1998;316:1643–7.
- Karjaleinen, Kujala U, Kaprio J, et al. BMJ 1998;316:1784–5.
- Kunst A, Groenhof F, Mackenbach J. BMJ 1998;316:1636–42.
- Hill AB, Hill ID. Bradford Hill's principles of medical statistics. 12th edn. London: Edward Arnold, 1991.
- Annals of Clinical Epidemiolog ...
- Volume 4 (2022) Issue 2
Department of Health Services Research, Faculty of Medicine, University of Tsukuba Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine
Tokyo University of Science, Department of Information and Computer Technology
2022 Volume 4 Issue 2 Pages 33-40
Published: 2022. Available on J-STAGE: April 04, 2022.
Matching is a technique through which patients with and without an outcome of interest (in case-control studies) or patients with and without an exposure of interest (in cohort studies) are sampled from an underlying cohort to have the same or similar distributions of some characteristics. This technique is used to increase the statistical efficiency and cost efficiency of studies. In case-control studies, besides time in risk set sampling, controls are often matched for each case with respect to important confounding factors, such as age and sex, and covariates with a large number of values or levels, such as area of residence (e.g., post code) and clinics/hospitals. In the statistical analysis of matched case-control studies, fixed-effect models such as the Mantel-Haenszel odds ratio estimator and conditional logistic regression model are needed to stratify matched case-control sets and remove selection bias artificially introduced by sampling controls. In cohort studies, exact matching is used to increase study efficiency and remove or reduce confounding effects of matching factors. Propensity score matching is another matching method whereby patients with and without exposure are matched based on estimated propensity scores to receive exposure. If appropriately used, matching can improve study efficiency without introducing bias and could also present results that are more intuitive for clinicians.
Matching is mainly used in observational studies, including case-control and cohort studies. Matching is a technique by which patients with and without an outcome of interest (in case-control studies) or patients with and without an exposure of interest (in cohort studies) are sampled from an underlying cohort to have the same or similar distributions of characteristics such as age and sex.
The main purpose of matching is to increase study efficiency for data collection and subsequent statistical analysis. Matching helps researchers reduce the volume of data for collection without much loss of information (i.e., improving cost efficiency) and obtain more precise estimates than simple random sampling of the same number of patients (i.e., improving statistical efficiency). In addition, in cohort studies, matching can remove or reduce confounding effects of matching factors.
This paper aims to introduce basic principles of matching in case-control and cohort studies, with some recent examples.
Fig. 1 Graphical representation of cumulative incidence sampling (A), case-control sampling (B), and risk set sampling (C) for 10 example patients in a cohort. ● indicates an outcome onset and time at selection as a case. ○ indicates time at selection as a control.
Fig. 2 Graphical representation of a risk set sampling for 10 example patients in a population-based cohort. ● indicates an outcome onset and time at selection as a case. ○ indicates time at selection as a control.

In a study requiring primary data collection, case-control study designs are efficient because only information on cases and selected controls, instead of all people in the underlying cohort, is collected and used for statistical analysis. Especially for rare outcomes, a cohort study recruiting many people to observe a sufficient number of outcomes is not feasible. However, a case-control design would still be feasible, with reduced costs and efforts.
Similar to cohort studies, case-control studies typically require confounder adjustment using stratified analysis or regression modeling. To further improve statistical efficiency in adjusted analyses, case-control studies may match controls on confounders to be adjusted for, i.e., sampling a control(s) with an identical (or nearly identical) value of confounders for each case. When the total number of cases and controls to be sampled is fixed, the adjusted odds ratio estimates are likely to be less variable (i.e., more statistically efficient) in case-control data matched on strong confounders than in unmatched data.
Besides common confounding factors such as age and sex, area of residence (e.g., post code) or clinics/hospitals (which patients are registered to or visit) are sometimes matched between cases and controls. If variables with a large number of values or levels (e.g., over 1,000 post codes or clinics/hospitals) are adjusted for as “surrogate” confounders in the statistical analysis, at least one case and one control in each area (or clinic/hospital) are needed; otherwise, the data are discarded in the fixed-effect models (stratification). Although a case and control may rarely come from the same area (or clinic/hospital) in unmatched case-control sampling, matching can ensure that the pairs (or sets) of cases and controls are derived from the same area (or clinics/hospitals). Consequently, the odds ratio adjusted for these variables can be efficiently estimated.
Sometimes, the prespecified number of controls cannot be found for a case. For example, in a case-control study planning 1:4 matching, fewer than four controls may be available for some cases. However, it is not necessary to exclude these sets when matching factors or matched sets of cases and controls are stratified in the analysis. A mixture of sets with different matching ratios will not result in a biased estimate as long as adequate adjustment for matching factors is made.
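The matching step described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' procedure: the function `match_controls`, the field names (`id`, `age_band`, `sex`), and the example records are all hypothetical, and controls are sampled without replacement within exact strata.

```python
import random
from collections import defaultdict


def match_controls(cases, controls, ratio=4, seed=0):
    """Match up to `ratio` controls to each case on (age_band, sex).

    Cases with fewer available controls keep a smaller matched set rather
    than being excluded, as discussed in the text.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for control in controls:
        pool[(control["age_band"], control["sex"])].append(control["id"])
    for ids in pool.values():
        rng.shuffle(ids)  # random order so selection within a stratum is random

    matched = {}
    for case in cases:
        available = pool[(case["age_band"], case["sex"])]
        take = min(ratio, len(available))
        # pop() removes the control from the pool: sampling without replacement
        matched[case["id"]] = [available.pop() for _ in range(take)]
    return matched


# Hypothetical data: six eligible controls for case c1, only two for case c2.
cases = [
    {"id": "c1", "age_band": "60-69", "sex": "F"},
    {"id": "c2", "age_band": "70-79", "sex": "M"},
]
controls = (
    [{"id": f"f{i}", "age_band": "60-69", "sex": "F"} for i in range(6)]
    + [{"id": f"m{i}", "age_band": "70-79", "sex": "M"} for i in range(2)]
)

matched_sets = match_controls(cases, controls, ratio=4)
for case_id, control_ids in matched_sets.items():
    print(case_id, "->", f"1:{len(control_ids)}", sorted(control_ids))
```

Here case `c2` ends up in a 1:2 set instead of the planned 1:4. The set is still usable provided the analysis stratifies on the matched sets or the matching factors.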
To remove the selection bias artificially introduced by case-control matching, it is necessary to “stratify” data on matching factors in the statistical analysis. One traditional method is the Mantel-Haenszel odds ratio estimator, which stratifies on the matching factors themselves (e.g., subgroups by age group and sex, if controls are matched on these factors) or on matched sets (e.g., each pair of a case and control). The Mantel-Haenszel estimator adjusts for matching factors as fixed effects and estimates a common odds ratio assumed to be constant across strata. It consistently estimates the common odds ratio even when each stratum contains sparse data (e.g., only two patients, one case and one control, in each stratum), provided that the number of strata increases. However, adjusting for confounding factors other than the matching factors by additional stratification within the matching factor strata is infeasible.
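As a concrete illustration of the estimator described above, the sketch below implements the standard Mantel-Haenszel formula, the sum over strata of a·d/n divided by the sum of b·c/n, and applies it to 1:1 matched pairs. The function name and the pair counts are hypothetical and serve only as an example.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio.

    Each stratum is a tuple (a, b, c, d):
      a = exposed cases,    b = unexposed cases,
      c = exposed controls, d = unexposed controls.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined: no informative strata in the denominator")
    return numerator / denominator


# Hypothetical 1:1 matched pairs, each pair forming its own stratum.
pairs = (
    [(1, 0, 0, 1)] * 30    # case exposed, control unexposed (discordant)
    + [(0, 1, 1, 0)] * 10  # case unexposed, control exposed (discordant)
    + [(1, 0, 1, 0)] * 40  # both exposed (concordant, contributes nothing)
    + [(0, 1, 0, 1)] * 20  # neither exposed (concordant, contributes nothing)
)

or_mh = mantel_haenszel_or(pairs)
print(f"Mantel-Haenszel OR = {or_mh:.2f}")
```

With one case and one control per stratum, only discordant pairs contribute, so the estimate reduces to the ratio of the two discordant pair counts (30/10 in this made-up example).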
Finally, time at matching (time from cohort entry, calendar time, or possibly age as time from birth) can be considered one of the “matching factors” in risk set sampling. If the hazard of disease incidence varies with time and the exposure prevalence changes during follow-up, time should be accounted for as a “confounder.” To do so, one can use the Mantel-Haenszel odds ratio estimator or a conditional logistic regression model, which estimates a hazard ratio, assumed constant over time (and across other matching factors, if any), that would be modeled by the Cox proportional hazards model in the underlying cohort.
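Risk set sampling, in which time acts as a matching factor, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: all subjects are assumed to enter the cohort at time 0, ties at the event time are treated as no longer at risk, and the function name, field names, and four example subjects are hypothetical.

```python
import random


def risk_set_sample(subjects, k=1, seed=0):
    """For each case, sample up to k controls from those still at risk at the case's event time.

    `subjects` is a list of dicts with keys 'id', 'exit_time', and 'event'
    (True if the outcome occurred at exit_time, False if censored).
    """
    rng = random.Random(seed)
    matched_sets = []
    cases = sorted((s for s in subjects if s["event"]), key=lambda s: s["exit_time"])
    for case in cases:
        t = case["exit_time"]
        # Still at risk: followed beyond t. This includes people who become
        # cases later, so a future case can serve as a control now.
        at_risk = [s["id"] for s in subjects if s["exit_time"] > t]
        controls = rng.sample(at_risk, min(k, len(at_risk)))
        matched_sets.append({"case": case["id"], "time": t, "controls": controls})
    return matched_sets


# Hypothetical cohort of four people.
cohort = [
    {"id": "A", "exit_time": 2, "event": True},
    {"id": "B", "exit_time": 5, "event": True},
    {"id": "C", "exit_time": 3, "event": False},
    {"id": "D", "exit_time": 10, "event": False},
]

sets = risk_set_sample(cohort, k=2)
for matched in sets:
    print(matched)
```

Each resulting set is a stratum defined by time, which is why the analysis must condition on the matched sets, for example with the Mantel-Haenszel estimator or conditional logistic regression.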
A matched cohort study may also be conducted from a practical viewpoint: it would provide an intuitive presentation of patient characteristics in “comparable” exposure groups matched on important confounding factors such as age, sex, and calendar time. As crude absolute measures (such as risks and rates) during the follow-up period are easily summarized in exposed and unexposed patients, clinicians unfamiliar with statistical analysis can grasp the difference between the two groups in a non-statistical manner.
Fig. 3 Graphical representation of a matched-pair cohort study for 10 example patients in a cohort. Solid lines indicate that people are exposed, dotted lines denote that people are not exposed, and ● indicates the incidence of outcome.

Fig. 4 Graphical representation of a matched-pair cohort study for 10 example patients in a population-based cohort. Solid lines denote that people are exposed, dotted lines denote that people are not exposed, ▼ indicates the timing of the matched-pair cohort inclusion in the exposed group, ▽ indicates the timing of the matched-pair cohort inclusion in the non-exposed group, and ● indicates the incidence of outcome.
Regarding the matching ratio, 1:4 or 1:5 is sometimes chosen in matched-pair cohort studies, whereas 1:1 may be chosen more frequently to prioritize simplicity and intuitiveness. Mixed matching ratios (meaning that, for example, some pairs are matched in a ratio of 1:4, whereas other pairs are matched by a ratio of 1:3, 1:2, or 1:1 between exposed and unexposed people) will not cause bias if matching variables or matched sets are adjusted for in the analysis. In contrast, as such varying matching ratios do not balance the distributions of matching factors in exposed and unexposed people, the unadjusted comparison in the matched cohort still suffers from confounding bias.
Matching with or without replacement remains the choice of researchers, although matching without replacement may be more intuitive for clinicians.
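To make matching without replacement concrete, the sketch below pairs exposed and unexposed patients 1:1 on an already estimated propensity score, using greedy nearest-neighbor matching. It is an assumption-laden illustration rather than a method taken from this paper: the function name, the caliper of 0.05 on the probability scale, and the scores are hypothetical, and estimating the propensity scores is outside its scope.

```python
def greedy_match(exposed, unexposed, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    `exposed` and `unexposed` map patient id -> estimated propensity score.
    A pair is formed only if the score difference is within `caliper`.
    """
    available = dict(unexposed)  # copy so the caller's dict is not modified
    pairs = []
    for e_id, e_score in sorted(exposed.items(), key=lambda item: item[1]):
        if not available:
            break
        # Nearest remaining unexposed patient by absolute score difference.
        u_id = min(available, key=lambda u: abs(available[u] - e_score))
        if abs(available[u_id] - e_score) <= caliper:
            pairs.append((e_id, u_id))
            del available[u_id]  # without replacement: each control is used once
    return pairs


# Hypothetical propensity scores.
exposed_scores = {"e1": 0.30, "e2": 0.62, "e3": 0.90}
unexposed_scores = {"u1": 0.28, "u2": 0.33, "u3": 0.60, "u4": 0.10}

matched_pairs = greedy_match(exposed_scores, unexposed_scores)
print(matched_pairs)
```

In this example `e3` remains unmatched because no unexposed patient falls within the caliper. Matching with replacement would correspond to skipping the `del` step so that a control could be reused.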
We have provided an overview and some recent examples of matching in case-control and cohort studies. Matching in case-control studies can increase study efficiency, including both cost and statistical efficiency. Nevertheless, caution is still warranted, since inappropriate sampling of controls or statistical analysis without stratification would result in a biased estimate. In cohort studies, exact matching can increase efficiency and remove or reduce the confounding effect of matching factors, whereas propensity score matching can be used to balance the distributions of measured confounding factors between exposed and unexposed individuals. If appropriately used, matching can improve study efficiency without introducing bias and can present results that are more intuitive for clinicians.
We would like to thank Dr. Hiroyuki Ohbe of the Department of Clinical Epidemiology and Health Economics, School of Public Health, The University of Tokyo, and Dr. Motohiko Adomi in the Department of Epidemiology, Harvard T.H. Chan School of Public Health, for their critical reading of the manuscript and feedback.
No potential competing interests relevant to this paper are reported.
- 1. Wacholder S. The case-control study as data missing by design: estimating risk differences. Epidemiology 1996;7:144–150.
- 2. Noma H, Tanaka S. Analysis of case-cohort designs with binary outcomes: improving efficiency using whole-cohort auxiliary information. Stat Methods Med Res 2017;26:691–706.
- 3. Schuemie MJ, Ryan PB, Man KKC, Wong ICK, Suchard MA, Hripcsak G. A plea to stop using the case-control design in retrospective database studies. Stat Med 2019;38:4199–4208.
- 4. Schneeweiss S, Suissa S. Discussion of Schuemie et al. “A plea to stop using the case-control design in retrospective database studies”. Stat Med 2019;38:4209–4212.
- 5. Rothman KJ, Lash TL. 6 Epidemiologic study design with validity and efficiency considerations. Modern epidemiology 4th edition. Lippincott Williams & Wilkins, 2021:105–140.
- 6. Marsh JL, Hutton JL, Binks K. Removal of radiation dose response effects: an example of over-matching. BMJ 2002;325:327–330.
- 7. Richardson K, Fox C, Maidment I, Steel N, Loke YK, Arthur A, et al. Anticholinergic drugs and risk of dementia: case-control study. BMJ 2018;361:k1315.
- 8. Lapi F, Azoulay L, Yin H, Nessim SJ, Suissa S. Concurrent use of diuretics, angiotensin converting enzyme inhibitors, and angiotensin receptor blockers with non-steroidal anti-inflammatory drugs and risk of acute kidney injury: nested case-control study. BMJ 2013;346:e8525.
- 9. Woodward M. Epidemiology: study design and data analysis. Chapman & Hall: Boca Raton, 1999:265.
- 10. Hennessy S, Bilker WB, Berlin JA, Strom BL. Factors influencing the optimal control-to-case ratio in matched case-control studies. Am J Epidemiol 1999;149:195–197.
- 11. Wang MH, Shugart YY, Cole SR, Platz EA. A simulation study of control sampling methods for nested case-control studies of genetic and molecular biomarkers and prostate cancer progression. Cancer Epidemiol Biomarkers Prev 2009;18:706–711.
- 12. Pearce N. Analysis of matched case-control studies. BMJ 2016;352:i969.
- 13. Greenland S. Partial and marginal matching in case-control studies. Modern statistical methods in chronic disease epidemiology. Wiley: New York, NY, 1986:35–49.
- 14. Hayashi M, Takamatsu I, Kanno Y, Yoshida T, Abe T, Sato Y, Japanese Calciphylaxis Study Group. A case-control study of calciphylaxis in Japanese end-stage renal disease patients. Nephrol Dial Transplant 2012;27:1580–1584.
- 15. Iwagami M, Taniguchi Y, Jin X, Adomi M, Mori T, Hamada S, et al. Association between recorded medical diagnoses and incidence of long-term care needs certification: a case control study using linked medical and long-term care data in two Japanese cities. Annals Clin Epidemiol 2019;1:56–68.
- 16. Suissa S, Dell’Aniello S. Time-related biases in pharmacoepidemiology. Pharmacoepidemiol Drug Saf 2020;29:1101–1110.
- 17. Thomas LE, Yang S, Wojdyla D, Schaubel DE. Matching with time-dependent treatments: a review and look forward. Stat Med 2020;39:2350–2370.
- 18. Greenland S, Morgenstern H. Matching and efficiency in cohort studies. Am J Epidemiol 1990;131:151–159.
- 19. Shinozaki T, Mansournia MA, Matsuyama Y. On hazard ratio estimators by proportional hazards models in matched-pair cohort studies. Emerg Themes Epidemiol 2017;14:6.
- 20. Sjölander A, Greenland S. Ignoring the matching variables in cohort studies – when is it valid and why? Stat Med 2013;32:4696–4708.
- 21. Sutradhar R, Baxter NN, Austin PC. Terminating observation within matched pairs of subjects in a matched cohort analysis: a Monte Carlo simulation study. Stat Med 2016;35:294–304.
- 22. Shinozaki T, Mansournia MA. Hazard ratio estimators after terminating observation within matched pairs in sibling and propensity score matched designs. Int J Biostat 2019;15.
- 23. Yasunaga H. Introduction to applied statistics—chapter 1 propensity score analysis. Annals Clin Epidemiol 2020;2:33–37.
- 24. Abadie A, Imbens GW. Matching on the estimated propensity score. Econometrica 2016;84:781–807.
- 25. Shinozaki T, Nojima M. Misuse of regression adjustment for additional confounders following insufficient propensity score balancing. Epidemiology 2019;30:541–548.
- 26. Ohbe H, Goto T, Miyamoto Y, Yasunaga H. Risk of cardiovascular events after spouse’s ICU admission. Circulation 2020;142:1691–1693.
- 27. Nagasu H, Yano Y, Kanegae H, Heerspink HJL, Nangaku M, Hirakawa Y, et al. Kidney outcomes associated with SGLT2 inhibitors versus other glucose-lowering drugs in real-world clinical practice: the Japan chronic kidney disease database. Diabetes Care 2021;44:2542–2551.

GFMER Geneva Foundation for Medical Education and Research
Training course in research methodology and research protocol development 2021
Reproductive health
Cohort study
Measures of disease and association
Current and historical cohort studies
Case-control study
Measure of association
Population and hospital-based case-controls studies
Confounding and bias
Advantages and disadvantages of cohort and case-control studies
Cohort studies
Advantages.
- Allow complete information on the subject’s exposure, including quality control of data, and experience thereafter.
- Provide a clear temporal sequence of exposure and disease.
- Give an opportunity to study multiple outcomes related to a specific exposure.
- Permit calculation of incidence rates (absolute risk) as well as relative risk.
- Methodology and results are easily understood by non-epidemiologists.
- Enable the study of relatively rare exposures.
Disadvantages.
- Not suited for the study of rare diseases because a large number of subjects is required.
- Not suited when the time between exposure and disease manifestation is very long, although this can be overcome in historical cohort studies.
- Exposure patterns, for example the composition of oral contraceptives, may change during the course of the study and make the results irrelevant.
- Maintaining high rates of follow-up can be difficult.
- Expensive to carry out because a large number of subjects is usually required.
- Baseline data may be sparse because the large number of subjects does not allow for long interviews.
Case-control studies
Advantages.
- Permit the study of rare diseases.
- Permit the study of diseases with long latency between exposure and manifestation.
- Can be launched and conducted over relatively short time periods.
- Relatively inexpensive as compared to cohort studies.
- Can study multiple potential causes of disease.
Disadvantages.
- Information on exposure and past history is primarily based on interview and may be subject to recall bias.
- Validation of information on exposure is difficult, or incomplete, or even impossible.
- By definition, concerned with one disease only.
- Cannot usually provide information on incidence rates of disease.
- Generally incomplete control of extraneous variables.
- Choice of appropriate control group may be difficult.
- Methodology may be hard to comprehend for non-epidemiologists and correct interpretation of results may be difficult.
Assessment of causality

Observational Studies: Cohort and Case-Control Studies
Jae W. Song
1 Research Fellow, Section of Plastic Surgery, Department of Surgery The University of Michigan Health System; Ann Arbor, MI
Kevin C. Chung
2 Professor of Surgery, Section of Plastic Surgery, Department of Surgery The University of Michigan Health System; Ann Arbor, MI
Observational studies are an important category of study designs. To address some investigative questions in plastic surgery, randomized controlled trials are not always indicated or ethical to conduct. Instead, observational studies may be the next best method to address these types of questions. Well-designed observational studies have been shown to provide results similar to randomized controlled trials, challenging the belief that observational studies are second-rate. Cohort studies and case-control studies are two primary types of observational studies that aid in evaluating associations between diseases and exposures. In this review article, we describe these study designs, methodological issues, and provide examples from the plastic surgery literature.
Because of the innovative nature of the specialty, plastic surgeons are frequently confronted with a spectrum of clinical questions by patients who inquire about “best practices.” It is thus essential that plastic surgeons know how to critically appraise the literature to understand and practice evidence-based medicine (EBM) and also contribute to the effort by carrying out high-quality investigations. 1 Well-designed randomized controlled trials (RCTs) have held the pre-eminent position in the hierarchy of EBM as level I evidence (Table 1). However, RCT methodology, which was first developed for drug trials, can be difficult to conduct for surgical investigations. 3 Instead, well-designed observational studies, recognized as level II or III evidence, can play an important role in deriving evidence for plastic surgery. Results from observational studies are often criticized for being vulnerable to influences by unpredictable confounding factors. However, recent work has challenged this notion, showing comparable results between observational studies and RCTs. 4,5 Observational studies can also complement RCTs in hypothesis generation, establishing questions for future RCTs, and defining clinical conditions.
Table 1. Levels of Evidence Based Medicine. From reference 1.
Observational studies fall under the category of analytic study designs, which are further sub-classified as observational or experimental study designs (Figure 1). The goal of analytic studies is to identify and evaluate causes or risk factors of diseases or health-related events. The differentiating characteristic between observational and experimental study designs is that in the latter, the presence or absence of an intervention defines the groups. By contrast, in an observational study, the investigator does not intervene but simply “observes” and assesses the strength of the relationship between an exposure and a disease variable. 6 The three types of observational studies are cohort studies, case-control studies, and cross-sectional studies (Figure 1). Case-control and cohort studies offer a temporal dimension (i.e. a prospective or retrospective study design), which is a specific advantage when measuring disease occurrence and its association with an exposure. Cross-sectional studies, also known as prevalence studies, examine the data on disease and exposure at one particular time point (Figure 2). 6 Because the temporal relationship between disease occurrence and exposure cannot be established, cross-sectional studies cannot assess cause and effect relationships. In this review, we will primarily discuss cohort and case-control study designs and related methodologic issues.

Figure 1. Analytic Study Designs. Adapted with permission from Joseph Eisenberg, Ph.D.

Figure 2. Temporal Design of Observational Studies: Cross-sectional studies are known as prevalence studies and do not have an inherent temporal dimension. These studies evaluate subjects at one point in time, the present time. By contrast, cohort studies can be either retrospective (Latin-derived prefix “retro,” meaning “back, behind”) or prospective (Greek-derived prefix “pro,” meaning “before, in front of”). Retrospective studies “look back” in time, in contrast to prospective studies, which “look ahead” to examine causal associations. Case-control study designs are also retrospective and assess the history of the subject for the presence or absence of an exposure.
COHORT STUDY
The term “cohort” is derived from the Latin word cohors. Roman legions were composed of ten cohorts. During battle each cohort, or military unit, consisting of a specific number of warriors and commanding centurions, was traceable. The word “cohort” has been adopted into epidemiology to define a set of people followed over a period of time. W.H. Frost, an epidemiologist from the early 1900s, was the first to use the word “cohort” in his 1935 publication assessing age-specific mortality rates and tuberculosis. 7 The modern epidemiological definition of the word now means a “group of people with defined characteristics who are followed up to determine incidence of, or mortality from, some specific disease, all causes of death, or some other outcome.” 7
Study Design
A well-designed cohort study can provide powerful results. In a cohort study, a study population free of the outcome or disease is first identified by the exposure or event of interest and followed in time until the disease or outcome of interest occurs (Figure 3A). Because exposure is identified before the outcome, cohort studies have a temporal framework to assess causality and thus have the potential to provide the strongest scientific evidence. 8 Advantages and disadvantages of a cohort study are listed in Table 2. 2, 9 Cohort studies are particularly advantageous for examining rare exposures because subjects are selected by their exposure status. Additionally, the investigator can examine multiple outcomes simultaneously. Disadvantages include the need for a large sample size and the potentially long follow-up duration, which can make the study costly.
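Because a cohort is followed forward from exposure to outcome, the incidence in each exposure group can be measured directly and compared as a relative risk. The following is a minimal sketch of that calculation; the counts are hypothetical, the function name `relative_risk` is illustrative, and the confidence interval uses the standard normal approximation on the log scale, which is an assumption that suits reasonably large groups.

```python
import math


def relative_risk(exposed_events, exposed_total,
                  unexposed_events, unexposed_total, z=1.96):
    """Return (RR, lower, upper) for a cohort 2x2 table.

    The interval is the usual normal approximation on log(RR);
    z=1.96 corresponds to a 95% confidence level.
    """
    risk_exposed = exposed_events / exposed_total        # incidence in exposed
    risk_unexposed = unexposed_events / unexposed_total  # incidence in unexposed
    rr = risk_exposed / risk_unexposed
    se_log_rr = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / unexposed_events - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical cohort: 30 of 100 exposed and 15 of 100 unexposed
# subjects develop the outcome during follow-up.
rr, ci_low, ci_high = relative_risk(30, 100, 15, 100)
print(f"RR = {rr:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An interval that excludes 1 suggests that the incidence differs between the exposed and unexposed groups, although confounding still has to be considered before inferring causation.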

Cohort and Case-Control Study Designs
Advantages and Disadvantages of the Cohort Study
Cohort studies can be prospective or retrospective (Figure 2). Prospective studies are carried out from the present time into the future. Because prospective studies are designed with specific data collection methods, they have the advantage of being tailored to collect specific exposure data, and the data may be more complete. The disadvantage of a prospective cohort study may be the long follow-up period while waiting for events or diseases to occur. Thus, this study design is inefficient for investigating diseases with long latency periods and is vulnerable to a high loss to follow-up rate. Although prospective cohort studies are invaluable, as exemplified by the landmark Framingham Heart Study, started in 1948 and still ongoing, 10 in the plastic surgery literature this study design is generally seen as inefficient and impractical. Instead, retrospective cohort studies are often preferred because they are quicker and less expensive to conduct.
Retrospective cohort studies, also known as historical cohort studies, are carried out at the present time and look to the past to examine medical events or outcomes. In other words, a cohort of subjects selected based on exposure status is chosen at the present time, and outcome data (i.e., disease status, event status), which were measured in the past, are reconstructed for analysis. The primary disadvantage of this study design is the limited control the investigator has over data collection. The existing data may be incomplete, inaccurate, or inconsistently measured between subjects. 2 However, because of the immediate availability of the data, this study design is comparatively less costly and shorter than prospective cohort studies. For example, Spear and colleagues examined the effect of obesity on complication rates after pedicled TRAM flap reconstruction by retrospectively reviewing 224 pedicled TRAM flaps in 200 patients over a 10-year period. 11 In this example, subjects who underwent the pedicled TRAM flap reconstruction were selected and categorized into cohorts by their exposure status: normal/underweight, overweight, or obese. The outcomes of interest were various flap and donor site complications. The findings revealed that obese patients had a significantly higher incidence of donor site complications, multiple flap complications, and partial flap necrosis than normal or overweight patients. An advantage of the retrospective study design is the immediate access to the data. A disadvantage is the limited control over the data collection because data were gathered retrospectively over 10 years; for example, a limitation reported by the authors is that mastectomy flap necrosis was not uniformly recorded for all subjects. 11
An important distinction lies between cohort studies and case-series. The distinguishing feature between these two types of studies is the presence of a control, or unexposed, group. In contrast to epidemiological cohort studies, case-series are descriptive studies following one small group of subjects. In essence, they are extensions of case reports. Usually the cases are obtained from the authors' experiences, generally involve a small number of patients, and more importantly, lack a control group. 12 There is often confusion in designating studies as “cohort studies” when only one group of subjects is examined. Yet, unless a second comparative group serving as a control is present, these studies are defined as case-series. The next step in strengthening an observation from a case-series is selecting appropriate control groups to conduct a cohort or case-control study, the latter of which is discussed in the following section about case-control studies. 9
Methodological Issues
Selection of subjects in cohort studies.
The hallmark of a cohort study is defining the selected group of subjects by exposure status at the start of the investigation. A critical characteristic of subject selection is that both the exposed and unexposed groups be selected from the same source population (Figure 4). 9 Subjects who are not at risk for developing the outcome should be excluded from the study. The source population is determined by practical considerations, such as sampling. Subjects may be sampled from a hospital, from a community, or from a doctor's individual practice. A subset of these subjects will be eligible for the study.

Levels of Subject Selection. Adapted from Ref 9 .
Attrition Bias (Loss to follow-up)
Because prospective cohort studies may require long follow-up periods, it is important to minimize loss to follow-up. Loss to follow-up is a situation in which the investigator loses contact with the subject, resulting in missing data. If too many subjects are lost to follow-up, the internal validity of the study is reduced. A general rule of thumb requires that the loss to follow-up rate not exceed 20% of the sample. 6 Any systematic differences related to the outcome or to exposure to risk factors must be examined, if possible, by comparing individuals who remain in the study with those who were lost to follow-up or dropped out. It is therefore important to select subjects who can be followed for the entire duration of the cohort study. Methods to minimize loss to follow-up are listed in Table 3.
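The 20% rule of thumb and the comparison between exposure groups can be checked with simple arithmetic. The sketch below is illustrative only: the enrollment and completion counts are hypothetical, and the function names are not taken from any particular package.

```python
def loss_to_follow_up_rate(enrolled, completed):
    """Fraction of enrolled subjects who did not complete follow-up."""
    if enrolled <= 0 or not 0 <= completed <= enrolled:
        raise ValueError("need 0 <= completed <= enrolled and enrolled > 0")
    return (enrolled - completed) / enrolled


def exceeds_rule_of_thumb(enrolled, completed, threshold=0.20):
    """True if attrition is above the 20% rule of thumb."""
    return loss_to_follow_up_rate(enrolled, completed) > threshold


# Hypothetical cohort with 200 subjects enrolled in each exposure group.
exposed_rate = loss_to_follow_up_rate(enrolled=200, completed=170)
unexposed_rate = loss_to_follow_up_rate(enrolled=200, completed=150)
overall_rate = loss_to_follow_up_rate(enrolled=400, completed=320)

print(f"exposed: {exposed_rate:.0%}, unexposed: {unexposed_rate:.0%}, "
      f"overall: {overall_rate:.0%}")
print("overall above 20%?", exceeds_rule_of_thumb(400, 320))
```

In this made-up example the overall rate sits exactly at the threshold, yet attrition differs between the exposed and unexposed groups. That difference is the kind of systematic pattern that should prompt a comparison of those who dropped out with those who stayed.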
Methods to Minimize Loss to Follow-Up
Adapted from REF 2 .
CASE-CONTROL STUDIES
Case-control studies were historically born out of interest in disease etiology. The conceptual basis of the case-control study is similar to taking a history and physical; the diseased patient is questioned and examined, and elements from this history taking are knitted together to reveal characteristics or factors that predisposed the patient to the disease. In fact, the practice of interviewing patients about behaviors and conditions preceding illness dates back to the Hippocratic writings of the 4th century B.C. 7
Reasons of practicality and feasibility inherent in the study design typically dictate whether a cohort study or case-control study is appropriate. The case-control design was first recognized in Janet Lane-Claypon's study of breast cancer in 1926, which found that a low fertility rate raises the risk of breast cancer. 13, 14 In the ensuing decades, case-control study methodology crystallized with the landmark publication linking smoking and lung cancer in the 1950s. 15 Since that time, retrospective case-control studies have become more prominent in the biomedical literature, with more rigorous methodological advances in design, execution, and analysis.
Case-control studies identify subjects by outcome status at the outset of the investigation. Outcomes of interest may be whether the subject has undergone a specific type of surgery, experienced a complication, or been diagnosed with a disease (Figure 3B). Once outcome status is identified and subjects are categorized as cases, controls (subjects without the outcome but from the same source population) are selected. Data about exposure to a risk factor or several risk factors are then collected retrospectively, typically by interview, abstraction from records, or survey. Case-control studies are well suited to investigate rare outcomes or outcomes with a long latency period because subjects are selected from the outset by their outcome status. Thus, in comparison to cohort studies, case-control studies are quick, relatively inexpensive to implement, require comparatively fewer subjects, and allow for multiple exposures or risk factors to be assessed for one outcome (Table 4). 2, 9
Advantages and Disadvantages of the Case-Control Study
An example of a case-control investigation is by Zhang and colleagues, who examined the association of environmental and genetic factors with rare congenital microtia, 16 which has an estimated prevalence of 0.83 to 17.4 in 10,000. 17 They selected 121 congenital microtia cases based on clinical phenotype, and 152 unaffected controls, matched by age and sex in the same hospital and same period. Controls were of Han Chinese origin from Jiangsu, China, the same area from which the cases were selected. This allowed both the controls and cases to have the same genetic background, which is important given the investigated association between genetic factors and congenital microtia. To examine environmental factors, a questionnaire was administered to the mothers of both cases and controls. The authors concluded that adverse maternal health was among the main risk factors for congenital microtia, specifically maternal disease during pregnancy (OR 5.89, 95% CI 2.36-14.72), maternal toxicity exposure during pregnancy (OR 4.76, 95% CI 1.66-13.68), and resident area, such as living near industries associated with air pollution (OR 7.00, 95% CI 2.09-23.47). 16 A case-control study design is most efficient for this investigation, given the rarity of the disease outcome. Because congenital microtia is thought to have multifactorial causes, an additional advantage of the case-control study design in this example is the ability to examine multiple exposures and risk factors.
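Odds ratios with 95% confidence intervals, such as those quoted above, are computed from a 2x2 table of exposure by case status. The sketch below shows one common way to do this, using Woolf's logit method for the interval. The counts are hypothetical and are not taken from the cited study, whose estimates may also have been derived with a different method (for example, a matched or adjusted analysis).

```python
import math


def odds_ratio(exposed_cases, exposed_controls,
               unexposed_cases, unexposed_controls, z=1.96):
    """Return (OR, lower, upper) for an unmatched case-control 2x2 table.

    The interval is Woolf's normal approximation on log(OR);
    all four cells must be non-zero.
    """
    or_estimate = (exposed_cases * unexposed_controls) / (
        exposed_controls * unexposed_cases
    )
    se_log_or = math.sqrt(
        1 / exposed_cases + 1 / exposed_controls
        + 1 / unexposed_cases + 1 / unexposed_controls
    )
    lower = math.exp(math.log(or_estimate) - z * se_log_or)
    upper = math.exp(math.log(or_estimate) + z * se_log_or)
    return or_estimate, lower, upper


# Hypothetical table: 40 of 100 cases and 20 of 100 controls were exposed.
or_estimate, or_low, or_high = odds_ratio(40, 20, 60, 80)
print(f"OR = {or_estimate:.2f}, 95% CI {or_low:.2f} to {or_high:.2f}")
```

The point estimate always lies inside its own interval, which is a quick sanity check when reading reported results.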
Selection of Cases
Sampling in a case-control study design begins with selecting the cases. In a case-control study, it is imperative that the investigator has explicitly defined inclusion and exclusion criteria prior to the selection of cases. For example, if the outcome is having a disease, specific diagnostic criteria, disease subtype, stage of disease, or degree of severity should be defined. Such criteria ensure that all the cases are homogeneous. Second, cases may be selected from a variety of sources, including hospital patients, clinic patients, or community subjects. Many communities maintain registries of patients with certain diseases, and these can serve as a valuable source of cases. However, despite the methodologic convenience of this method, validity issues may arise. For example, if cases are selected from one hospital, identified risk factors may be unique to that single hospital. This methodological choice may weaken the generalizability of the study findings. Another example is choosing cases from the hospital versus the community; cases from the hospital sample will most likely represent a more severe form of the disease than those in the community. 2 Finally, it is also important to select cases that are representative of cases in the target population to strengthen the study's external validity (Figure 4). Potential reasons why cases from the original target population eventually filter through and are available as cases (study participants) for a case-control study are illustrated in Figure 5.

Levels of Case Selection. Adapted from Ref 2 .
Selection of Controls
Selecting the appropriate group of controls can be one of the most demanding aspects of a case-control study. An important principle is that controls should reflect the distribution of exposure in the source population that gave rise to the cases; in other words, both cases and controls should stem from the same source population. The investigator may also consider the control group to be an at-risk population, with the potential to develop the outcome. Because the validity of the study depends upon the comparability of these two groups, cases and controls should otherwise meet the same inclusion criteria in the study.
A case-control study that exemplifies this methodological feature is by Chung and colleagues, who examined maternal cigarette smoking during pregnancy and the risk of newborns developing cleft lip/palate. 18 A salient feature of this study is the use of the 1996 U.S. Natality database, a population database, from which both cases and controls were selected. This database provides a large sample size to assess newborn development of cleft lip/palate (outcome), which has a reported incidence of 1 in 1000 live births, 19 and also enabled the investigators to choose controls (i.e., healthy newborns) that were representative of the general population, strengthening the study's external validity. A significant relationship between maternal cigarette smoking and cleft lip/palate in the newborn was reported in this study (adjusted OR 1.34, 95% CI 1.36-1.76). 18
Matching is a method used to ensure comparability between cases and controls and to reduce variability and systematic differences due to background variables that are not of interest to the investigator. 8 Each case is typically individually paired with a control subject with respect to the background variables. The exposure to the risk factor of interest is then compared between the cases and the controls. This matching strategy is called individual matching. Age, sex, and race are often used to match cases and controls because they are typically strong confounders of disease. 20 Confounders are variables that are associated with the risk factor and may themselves be a cause of the outcome. 8 Table 5 lists several advantages and disadvantages of a matching design.
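Individual matching as described above can be sketched as a simple pairing procedure. The example below is only one possible approach, assuming exact matching on sex and matching on age within a chosen tolerance; the `Subject` class, the greedy nearest-age rule, the 2-year tolerance, and all records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    subject_id: str
    age: int
    sex: str


def match_cases_to_controls(cases, controls, max_age_gap=2):
    """Greedily pair each case with one unused control of the same sex
    whose age is within max_age_gap years (closest age first)."""
    available = list(controls)
    pairs = []
    for case in cases:
        candidates = [
            c for c in available
            if c.sex == case.sex and abs(c.age - case.age) <= max_age_gap
        ]
        if not candidates:
            continue  # no suitable control: this case stays unmatched
        best = min(candidates, key=lambda c: abs(c.age - case.age))
        available.remove(best)  # each control is used at most once
        pairs.append((case, best))
    return pairs


cases = [Subject("case-1", 34, "F"), Subject("case-2", 60, "M")]
controls = [
    Subject("ctrl-1", 35, "F"),
    Subject("ctrl-2", 33, "M"),
    Subject("ctrl-3", 61, "M"),
    Subject("ctrl-4", 34, "F"),
]

pairs = match_cases_to_controls(cases, controls)
for case, control in pairs:
    print(case.subject_id, "matched with", control.subject_id)
```

A greedy rule is easy to follow but does not guarantee the best overall set of pairs, and cases without a suitable control are dropped, which illustrates one practical cost of a matching design.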
Advantages and Disadvantages for Using a Matching Strategy
Multiple Controls
Investigations examining rare outcomes may have a limited number of cases to select from, whereas the source population from which controls can be selected is much larger. In such scenarios, the study may be able to provide more information if multiple controls per case are selected. This method increases the “statistical power” of the investigation by increasing the sample size. The precision of the findings may improve by having up to about three or four controls per case. 21 - 23
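The diminishing return from adding controls can be illustrated numerically. One commonly quoted approximation, which is an assumption here and is not stated in the text above, is that a design with r controls per case retains about r/(r+1) of the statistical efficiency that an unlimited number of controls would give, when the exposure effect is small. The function name below is illustrative.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency of a 1:r design relative to unlimited
    controls, using the r / (r + 1) rule of thumb (assumed here)."""
    if controls_per_case <= 0:
        raise ValueError("controls_per_case must be positive")
    return controls_per_case / (controls_per_case + 1)


# Efficiency for 1 to 6 controls per case.
for r in range(1, 7):
    print(f"{r} control(s) per case: {relative_efficiency(r):.1%}")

# Gain in efficiency from each additional control per case.
gains = [relative_efficiency(r + 1) - relative_efficiency(r)
         for r in range(1, 6)]
```

Under this approximation, moving from one to four controls per case raises the relative efficiency from 50% to 80%, while each control added after that contributes only a few percentage points.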
Bias in Case-Control Studies
Evaluating exposure status can be the Achilles heel of case-control studies. Because information about exposure is typically collected by self-report, by interview, or from recorded information, it is susceptible to recall bias, to interviewer bias, or to the incompleteness or inaccuracy of the records, respectively. These biases decrease the internal validity of the investigation and should be carefully addressed and reduced in the study design. Recall bias occurs when cases and controls differ in how they remember and report past exposures. The common scenario is a subject with disease (case) who unconsciously recalls and reports an exposure with better clarity because of the disease experience. Interviewer bias occurs when the interviewer asks leading questions or has an inconsistent interview approach between cases and controls. A good study design will implement a standardized interview in a non-judgemental atmosphere with well-trained interviewers to reduce interviewer bias. 9
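The effect of recall bias on an odds ratio can be shown with a short calculation. In the sketch below, cases and controls are assumed to have the same true exposure probability, so the true odds ratio is 1, but cases are assumed to report a real exposure more often than controls. All probabilities are hypothetical, and the model ignores false reports of exposure for simplicity.

```python
def expected_odds_ratio(true_exposure_cases, true_exposure_controls,
                        recall_cases, recall_controls):
    """Odds ratio expected from reported exposure, where each recall
    value is the probability that a truly exposed subject reports it."""
    reported_cases = true_exposure_cases * recall_cases
    reported_controls = true_exposure_controls * recall_controls
    odds_cases = reported_cases / (1 - reported_cases)
    odds_controls = reported_controls / (1 - reported_controls)
    return odds_cases / odds_controls


# Same true exposure (40%) in both groups, with perfect recall.
unbiased_or = expected_odds_ratio(0.4, 0.4, 1.0, 1.0)

# Same true exposure, but cases recall 90% of exposures
# and controls recall only 70%.
biased_or = expected_odds_ratio(0.4, 0.4, 0.9, 0.7)

print(f"perfect recall: OR = {unbiased_or:.2f}")
print(f"differential recall: OR = {biased_or:.2f}")
```

Even though the exposure has no effect in this made-up scenario, differential recall alone produces an apparent association, which is why standardized data collection matters.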
The STROBE Statement: The Strengthening the Reporting of Observational Studies in Epidemiology Statement
In 2004, the first meeting of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) group took place in Bristol, UK. 24 The aim of the group was to establish guidelines on reporting observational research to improve the transparency of the methods, thereby facilitating the critical appraisal of a study's findings. A well-designed but poorly reported study is disadvantaged in contributing to the literature because the results and generalizability of the findings may be difficult to assess. Thus a 22-item checklist was generated to enhance the reporting of observational studies across disciplines. 25, 26 This checklist is also located at the following website: www.strobe-statement.org. This statement is applicable to cohort studies, case-control studies, and cross-sectional studies. In fact, 18 of the checklist items are common to all three types of observational studies, and 4 items are specific to each of the 3 study designs. To provide specific guidance to accompany this checklist, an “explanation and elaboration” article was published to help users better appreciate each item on the checklist. 27 Plastic surgery investigators should peruse this checklist prior to designing their study and when they are writing up the report for publication. In fact, some journals now require authors to follow the STROBE Statement. A list of participating journals can be found on this website: http://www.strobe-statement.org./index.php?id=strobe-endorsement .
Due to the limitations in carrying out RCTs in surgical investigations, observational studies are becoming more popular for investigating the relationship between exposures, such as risk factors or surgical interventions, and outcomes, such as disease states or complications. It is important for the plastic surgery community to recognize that well-designed observational studies can provide valid results, so that investigators can both critically appraise and appropriately design observational studies to address important clinical research questions. The investigator planning an observational study can use the STROBE statement as a tool to outline key features of the study and can return to it at the end to enhance transparency in methodology reporting.
Acknowledgments
Supported in part by a Midcareer Investigator Award in Patient-Oriented Research (K24 AR053120) from the National Institute of Arthritis and Musculoskeletal and Skin Diseases (to Dr. Kevin C. Chung).
None of the authors has a financial interest in any of the products, devices, or drugs mentioned in this manuscript.
