Methods
Participants
We included studies that investigated a representative sample of patients aged over 18 years who either had symptoms suggestive of coronary artery disease (CAD), such as chest pain or breathlessness, or were asymptomatic but had risk factors for CAD (e.g. diabetes mellitus or hypertension).
Types of Studies
We included studies that compared the findings of exercise ECG and exercise echocardiography with the presence or absence of CAD as determined by angiography, post-mortem examination, or long-term follow-up for the development of an acute coronary syndrome. We limited inclusion to prospective studies in which the investigators were blinded to the results of coronary angiography, those performing angiography were blinded to the results of the prior stress tests, or both. To evaluate how exercise stress tests should be performed, we also included studies comparing different exercise protocols and strategies. Sufficient data had to be available to construct 2 × 2 tables.
We excluded studies investigating other cardiac conditions, such as acute coronary syndromes, cardiomyopathies, valvular heart disease and cardiac failure, as well as studies that used only pharmacological stressors (e.g. dobutamine, dipyridamole or adenosine) or only imaging modalities such as myocardial perfusion imaging (MPI) without comparison. We also excluded studies in which participants had a previous myocardial infarction (MI) or known CAD.
Search Strategy
We performed an English-language-limited search of MEDION (1966 to July 2009), MEDLINE (1966 to July 2009), EMBASE (1980–2009), CENTRAL (1966 to July 2009) and the Cochrane Library (issue 1, 2009), using the following MeSH terms: "electrocardiography", "exercise test", "echocardiography, stress", "myocardial ischemia", "angina pectoris", "coronary disease", "coronary artery disease", "diagnosis", "diagnostic errors", "diagnostic techniques and procedures", "diagnostic techniques, cardiovascular", "physical examination". We further narrowed the search by applying the "diagnosis" subheading. We also obtained guidelines from a simple search of the TRIP database, and identified additional articles by searching the bibliographies of included articles. After title and abstract screening, records were screened by two independent reviewers (DN and AB), following a pilot on a sample of 20 studies. Discrepancies between the reviewers were resolved by a third independent reviewer (CH).
Data Extraction and Management
Two reviewers (DN and AB) extracted the following data: details of the study population (country, age, number of participants and characteristics), details of the reference standard and index test, and whether the reference standard and index test were interpreted blind. They also extracted the prevalence of CAD and the numbers of true-positive, false-positive, true-negative and false-negative results; where these were not reported, they were calculated from the sensitivity, specificity and other parameters reported in the publication.
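As an illustration of the back-calculation described above, the following is a minimal sketch, assuming a publication reports the total number of participants, the number with CAD on the reference standard, and sensitivity and specificity as proportions. The names `TwoByTwo` and `reconstruct_2x2` are hypothetical and are not part of the review's STATA workflow.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cell counts for one index test against the reference standard."""
    tp: int  # test positive, CAD present
    fp: int  # test positive, CAD absent
    fn: int  # test negative, CAD present
    tn: int  # test negative, CAD absent


def reconstruct_2x2(n_total: int, n_with_cad: int,
                    sensitivity: float, specificity: float) -> TwoByTwo:
    """Back-calculate a 2 x 2 table from reported summary statistics.

    Assumes sensitivity and specificity are given as proportions. Counts are
    rounded to whole patients (Python's round() rounds exact halves to the
    nearest even number), so the result may differ slightly from the
    authors' own table.
    """
    if not 0 <= n_with_cad <= n_total:
        raise ValueError("n_with_cad must lie between 0 and n_total")
    n_without_cad = n_total - n_with_cad
    tp = round(sensitivity * n_with_cad)
    tn = round(specificity * n_without_cad)
    return TwoByTwo(tp=tp, fp=n_without_cad - tn, fn=n_with_cad - tp, tn=tn)


# Hypothetical study: 200 patients, 80 with CAD, sensitivity 0.70, specificity 0.80.
table = reconstruct_2x2(200, 80, 0.70, 0.80)
print(table)
```

If a study reports only the prevalence of CAD, the number with CAD could first be estimated as `round(prevalence * n_total)`; this is an assumption that adds a further rounding step.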
These data were extracted for each test performed within each study. Where a study evaluated multiple tests (e.g. bicycle ECG and treadmill ECG) in the same population, those patients were counted only once in the overall analysis to avoid double-counting. The same approach was used in subgroup analyses.
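The counting rule above can be sketched as follows, assuming each extracted record carries a study identifier and a label for the group of patients tested. `ExerciseTestRecord`, `population_id` and `count_patients` are hypothetical names introduced for illustration only.

```python
from typing import NamedTuple


class ExerciseTestRecord(NamedTuple):
    study_id: str       # hypothetical study identifier
    population_id: str  # hypothetical label for one group of tested patients
    test: str           # e.g. "bicycle ECG" or "treadmill ECG"
    n_patients: int


def count_patients(records: list[ExerciseTestRecord]) -> int:
    """Total patients, counting each (study, population) pair only once."""
    per_population: dict[tuple[str, str], int] = {}
    for r in records:
        key = (r.study_id, r.population_id)
        # Several tests in the same patients describe one population. If the
        # tests report slightly different numbers, keep the largest (an assumption).
        per_population[key] = max(per_population.get(key, 0), r.n_patients)
    return sum(per_population.values())


records = [
    ExerciseTestRecord("study A", "cohort 1", "bicycle ECG", 100),
    ExerciseTestRecord("study A", "cohort 1", "treadmill ECG", 100),
    ExerciseTestRecord("study B", "cohort 1", "treadmill ECG", 50),
]
print(count_patients(records))  # 150, not the naive 250

# Subgroups apply the same rule after filtering, e.g. treadmill ECG only.
treadmill = [r for r in records if r.test == "treadmill ECG"]
print(count_patients(treadmill))
```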
Assessment of Methodological Quality
The Quality Assessment of Diagnostic Accuracy Studies (QUADAS) instrument was used to assess the methodological quality of, and potential bias in, the selected studies. Quality assessment was completed by two independent reviewers (DN and AB). Disagreements were resolved by discussion, involving all researchers where appropriate.
Studies selected for analysis were given an A, B, C or D rating. If insufficient data were reported to be confident that a criterion had been met, it was assessed as not met. Studies with an unrepresentative spectrum of patients or an invalid reference standard were excluded. Studies fulfilling all QUADAS criteria were rated A. Studies without complete verification by the reference standard, or in which the index test was interpreted without blinding to the results of the reference standard, were rated D. Studies without an independent reference standard, with interpretation of the reference standard unblinded to the results of the index test, or with an unduly long interval between the index test and the reference standard were rated C. All other studies were rated B.
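The rating scheme can be expressed as a small decision function. This is a sketch under two stated assumptions: a study with both D-level and C-level flaws receives the worse rating (D), which the text does not state explicitly, and all QUADAS items other than the five named criteria are collapsed into one hypothetical flag, `other_items_met`. A value of `None` stands for "insufficient data reported" and, as described above, counts as not met.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QualityAssessment:
    """QUADAS judgements for one study; None means insufficient data reported."""
    complete_verification: Optional[bool]
    index_test_blind_to_reference: Optional[bool]
    independent_reference_standard: Optional[bool]
    reference_blind_to_index_test: Optional[bool]
    acceptable_interval: Optional[bool]  # no unduly long gap between the tests
    other_items_met: Optional[bool]      # all remaining QUADAS items (hypothetical flag)


def met(criterion: Optional[bool]) -> bool:
    # Unreported criteria (None) are treated as not met.
    return criterion is True


def rate_study(q: QualityAssessment) -> str:
    """Assign the A-D rating; D flaws are checked first so the worst rating wins."""
    d_flaw = not (met(q.complete_verification)
                  and met(q.index_test_blind_to_reference))
    c_flaw = not (met(q.independent_reference_standard)
                  and met(q.reference_blind_to_index_test)
                  and met(q.acceptable_interval))
    if d_flaw:
        return "D"
    if c_flaw:
        return "C"
    if met(q.other_items_met):
        return "A"  # every QUADAS criterion fulfilled
    return "B"


# Reference standard blinding not reported, so the study is rated C.
print(rate_study(QualityAssessment(True, True, True, None, True, True)))
```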
Statistical Analysis and Data Synthesis
Data were extracted by two reviewers (DN and AB). Any errors identified were discussed and corrected, and 2 × 2 tables were reconstructed from information in the study report or information obtained from the study investigators. Whereas sensitivity and specificity describe the proportions of patients with and without CAD, respectively, whom an exercise test classifies correctly, likelihood ratios quantify how much more likely a given test result is in patients with CAD than in patients without CAD. We calculated sensitivity, specificity, likelihood ratios for the presence (positive likelihood ratio) and absence (negative likelihood ratio) of CAD, and pre- and post-test probabilities of the outcome. Although predictive ability has classically been assessed with positive and negative predictive values, likelihood ratios can yield more clinically useful information, because they can be combined with a patient's pretest probability to estimate a post-test probability. Confidence intervals were calculated from the standard error of a proportion using STATA version 9.2.
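The quantities listed above follow directly from a 2 × 2 table. The sketch below reimplements them in Python for illustration (the review itself used STATA); it assumes a Wald interval, p ± 1.96·SE, as the "standard error of a proportion" method, uses the study prevalence as the pretest probability, and applies Bayes' theorem in odds form. Confidence intervals for likelihood ratios are commonly computed on the log scale and are not shown here.

```python
import math


def proportion_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate and Wald 95% CI based on the standard error of a proportion."""
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)
    # Clip to [0, 1]; the Wald interval can otherwise extend past the bounds.
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


def accuracy_summary(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        # LR+ = P(positive | CAD) / P(positive | no CAD); infinite if spec == 1.
        "lr_pos": sens / (1 - spec) if spec < 1 else math.inf,
        # LR- = P(negative | CAD) / P(negative | no CAD); requires spec > 0.
        "lr_neg": (1 - sens) / spec,
        # Study prevalence of CAD, used here as the pretest probability.
        "pretest": (tp + fn) / (tp + fp + fn + tn),
    }


def post_test_probability(pretest: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pretest odds x LR."""
    post_odds = pretest / (1 - pretest) * lr
    if math.isinf(post_odds):
        return 1.0
    return post_odds / (1 + post_odds)


# Hypothetical 2 x 2 table (not data from this review).
s = accuracy_summary(tp=56, fp=24, fn=24, tn=96)
print(s)
print(proportion_ci(56, 80))  # sensitivity with Wald 95% CI
print(post_test_probability(s["pretest"], s["lr_pos"]))
```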
We categorised studies by test type (ECG or echocardiography), exercise modality (treadmill or bicycle), gender, mean age (< 60 years or > 60 years), risk (low, prevalence < 10%; intermediate, prevalence 10%–30%; high, prevalence > 30%) and double-blinding. For each subgroup, we produced hierarchical summary receiver operating characteristic (HSROC) curves using STATA version 9.2. HSROC curves have been validated previously and are used to show how sensitivity and specificity jointly vary across the individual studies within a meta-analysis of diagnostic accuracy.
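A sketch of how studies might be assigned to these subgroups, assuming each study is stored as a dictionary with hypothetical keys (`id`, `modality`, `exercise`, `prevalence`, `mean_age`, `double_blind`). Two boundary choices are assumptions: prevalences of exactly 10% and 30% are treated as intermediate risk, and a mean age of exactly 60 years is left unassigned, because the stated age groups do not cover it. The gender subgroup is omitted for brevity.

```python
from collections import defaultdict
from typing import Optional


def risk_group(prevalence: float) -> str:
    """Risk group from study prevalence of CAD, given as a proportion."""
    if prevalence < 0.10:
        return "low risk"
    if prevalence <= 0.30:  # 10%-30%; treating both limits as inclusive is assumed
        return "intermediate risk"
    return "high risk"


def age_group(mean_age: float) -> Optional[str]:
    """A mean age of exactly 60 falls in neither stated subgroup."""
    if mean_age < 60:
        return "mean age < 60"
    if mean_age > 60:
        return "mean age > 60"
    return None


def assign_subgroups(studies: list[dict]) -> dict[str, list[str]]:
    """Map each subgroup label to the IDs of the studies it contains."""
    groups: dict[str, list[str]] = defaultdict(list)
    for s in studies:
        labels = [s["modality"], s["exercise"], risk_group(s["prevalence"])]
        age = age_group(s["mean_age"])
        if age is not None:
            labels.append(age)
        if s["double_blind"]:
            labels.append("double-blind")
        for label in labels:
            groups[label].append(s["id"])
    return dict(groups)


studies = [  # hypothetical study characteristics
    {"id": "S1", "modality": "ECG", "exercise": "treadmill",
     "prevalence": 0.45, "mean_age": 58, "double_blind": True},
    {"id": "S2", "modality": "echocardiography", "exercise": "bicycle",
     "prevalence": 0.08, "mean_age": 63, "double_blind": False},
]
print(assign_subgroups(studies))
```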
For each study, we report pre- and post-test probabilities of CAD in dumbbell plots; for each meta-analysis, we report post-test probabilities for a range of hypothetical pretest probabilities, as in summary-of-findings tables. Meta-analysis was performed with the bivariate method in STATA version 9.2 when at least four studies were available for a subgroup. Using Microsoft Excel 2007, we constructed a forest plot of the included studies by subgroup, showing likelihood ratios rather than pre- and post-test probabilities of CAD.
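The summary-of-findings step can be sketched as below, assuming pooled positive and negative likelihood ratios are already available from the bivariate model (the values used here are hypothetical, not results of this review). Each row pairs a pretest probability with the post-test probabilities after a positive and a negative result, which are the end points a dumbbell plot connects. The four-study threshold is taken from the text; `can_pool` and `summary_of_findings` are illustrative names.

```python
MIN_STUDIES_FOR_POOLING = 4  # bivariate meta-analysis needs at least four studies


def can_pool(n_studies: int) -> bool:
    return n_studies >= MIN_STUDIES_FOR_POOLING


def to_post_test(pretest: float, lr: float) -> float:
    # Odds form of Bayes' theorem: post-test odds = pretest odds x LR.
    odds = pretest / (1 - pretest) * lr
    return odds / (1 + odds)


def summary_of_findings(lr_pos: float, lr_neg: float,
                        pretest_probs: list[float]) -> list[tuple[float, float, float]]:
    """Rows of (pretest, post-test if positive, post-test if negative)."""
    return [(p, to_post_test(p, lr_pos), to_post_test(p, lr_neg))
            for p in pretest_probs]


# Hypothetical pooled likelihood ratios, not results of this review.
if can_pool(n_studies=5):
    for pre, pos, neg in summary_of_findings(3.5, 0.375, [0.05, 0.20, 0.40]):
        print(f"pretest {pre:.0%}: positive test -> {pos:.0%}, negative test -> {neg:.0%}")
```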