The full text of this issue is available as a PDF document from the Toolkit section on this page.

Abstract

OBJECTIVES

To review how heterogeneity has been examined in systematic reviews of diagnostic test accuracy studies.

DATA SOURCES

Centre for Reviews and Dissemination's Database of Abstracts of Reviews of Effects (DARE).

REVIEW METHODS

Systematic reviews that evaluated a diagnostic or screening test by including studies that compared a test with a reference test were identified from DARE. Reviews for which structured abstracts had been written up to December 2002 were screened for inclusion. Data extraction was undertaken using standardised data extraction forms.

RESULTS

A total of 189 systematic reviews met the inclusion criteria. The median number of included studies was 18; meta-analyses included more studies (median 22) than narrative reviews (median 11). Graphical plots to show the spread of study results were provided in 56% of meta-analyses; in 79% of these they were plots of sensitivity and specificity in receiver operating characteristic (ROC) space. Statistical tests to identify heterogeneity were used in 32% of reviews: 41% of meta-analyses and 9% of reviews using narrative synthesis. The most common were the chi-squared test and Fisher's exact test applied to individual aspects of test performance; in contrast, only 16% of meta-analyses used correlation coefficients to test for a threshold effect. A narrative synthesis was used in 30% of reviews. Of the meta-analyses, 52% carried out statistical pooling alone, 18% conducted only summary receiver operating characteristic (SROC) analyses and 30% used both methods of statistical synthesis. Among those undertaking SROC analyses, the main differences between the models were the weights chosen for the regression, although in 42% of cases the choice of weight, or whether any weighting was used, was not reported. The proportion of reviews using statistical pooling alone declined from 67% in 1995 to 42% in 2001, with a corresponding increase in the use of SROC methods from 33% to 58%. However, two-thirds of those using SROC methods also carried out statistical pooling rather than presenting SROC models alone. Reviews using SROC analyses also tended to present their results as some combination of sensitivity and specificity rather than as alternative, perhaps less clinically meaningful, measures such as diagnostic odds ratios. Three-quarters of meta-analyses attempted to investigate possible sources of variation statistically, using subgroup or regression analysis. The impact of clinical or socio-demographic variables was investigated in 74% of these reviews, and of test- or threshold-related variables in 79%. At least one quality-related variable was investigated in 63% of reviews; within this subset, the most commonly considered variables were the use of blinding, sample size, the reference test used and the avoidance of verification bias.
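To make the synthesis methods surveyed above concrete, the sketch below illustrates, in Python and on invented 2x2 counts, simple unweighted pooling of sensitivity and specificity, a pooled diagnostic odds ratio, a Spearman rank correlation as a check for a threshold effect, and one common SROC formulation (the Moses-Littenberg regression of the difference of logits on their sum). It is an illustrative sketch only, not code from this report; the data, the continuity correction and the unweighted fit are assumptions made for demonstration.

```python
# Illustrative sketch of diagnostic test accuracy synthesis methods.
# The 2x2 counts below are invented for demonstration only.
import numpy as np
from scipy.stats import spearmanr
from scipy.special import logit, expit

# Hypothetical per-study 2x2 counts: TP, FP, FN, TN (one row per primary study)
studies = np.array([
    [45,  8, 10, 90],
    [30,  5, 12, 70],
    [60, 20,  6, 55],
    [25,  3, 15, 80],
    [50, 12,  9, 65],
], dtype=float)
tp, fp, fn, tn = studies.T

# 0.5 continuity correction, commonly applied in case any cell is zero
tp, fp, fn, tn = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5

sens = tp / (tp + fn)          # per-study sensitivity (true positive rate)
spec = tn / (tn + fp)          # per-study specificity
fpr = 1.0 - spec               # false positive rate

# Simple (unweighted) pooling of individual aspects of test performance
print("pooled sensitivity:", sens.mean().round(3))
print("pooled specificity:", spec.mean().round(3))

# Diagnostic odds ratio per study, crudely pooled on the log scale
log_dor = np.log((tp * tn) / (fp * fn))
print("pooled DOR:", np.exp(log_dor.mean()).round(2))

# Threshold effect: a strong negative correlation between sensitivity and
# specificity across studies suggests differing implicit positivity thresholds
rho, p = spearmanr(sens, spec)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")

# Moses-Littenberg SROC: regress D = logit(TPR) - logit(FPR) on
# S = logit(TPR) + logit(FPR), then back-transform to a summary curve
D = logit(sens) - logit(fpr)
S = logit(sens) + logit(fpr)
b, a = np.polyfit(S, D, 1)     # slope b, intercept a (unweighted fit)
grid_fpr = np.linspace(0.01, 0.99, 99)
sroc_tpr = expit(a / (1 - b) + (1 + b) / (1 - b) * logit(grid_fpr))
print("SROC sensitivity at FPR = 0.10:", sroc_tpr[9].round(3))
```

The weighting of the SROC regression (here omitted) is exactly the point on which, as noted above, the reviewed meta-analyses most often differed or were silent.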

CONCLUSIONS

The emphasis on pooling individual aspects of diagnostic test performance, and the under-use of statistical tests and graphical approaches to identify heterogeneity, perhaps reflect uncertainty about the most appropriate methods to use as well as greater familiarity with more traditional indices of test accuracy. This highlights the difficulty and complexity of carrying out such reviews, and it is strongly suggested that meta-analyses be carried out with the involvement of a statistician familiar with the field. Further methodological work on the statistical methods available for combining diagnostic test accuracy studies is needed, as are sufficiently large, prospectively designed primary studies comparing two or more tests for the same target disorder. The use of individual patient data meta-analysis in diagnostic test accuracy reviews should also be explored, to allow heterogeneity to be considered in more detail.
