Continuing my discussion from last week, these are additional questions from Greenhalgh (1) you should consider when you read a paper that discusses diagnostic testing.
6. Was the test shown to be reproducible both within and between observers? This question gets at intra- and inter-observer reliability. No matter what the test, if the same person conducts the same test on two occasions on a patient or subject who has otherwise remained unchanged, they will still get different results in some proportion of tests. This is true of all tests, but we would trust a test with 99% reliability far more than one with 70% reliability. While this may be less of a problem for a diagnostic test whose results we read as numbers (such as blood cholesterol or heart rate), it can be more significant when applied to reading radiology results, for example.
7. What are the features of the test as derived from this validation study? You may have a test which is seen to be reliable, but the test itself could be invalid; that is, its sensitivity and specificity are far too low. If your test has too high a false negative rate, it will mislead clinicians rather than reveal features important and relevant to the patient. This is something that needs balance; for example, if we are looking at a test for color blindness and see that it is, say, 95% sensitive and 80% specific, we might not worry: no one dies from color blindness itself. On the other hand, as Greenhalgh points out, the Guthrie heel prick screening test done on infants to test for congenital hypothyroidism is 99% sensitive but has a positive predictive value of just 6% (meaning it identifies nearly every child with the condition, but at the cost of a very high rate of false positive findings). Here this trade-off may be acceptable because we cannot afford to miss any child with this condition, since untreated it leads to mental handicap. For everyone else, you just need to repeat the test now and again, which is small potatoes in the scheme of things.
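How can a 99%-sensitive test have a positive predictive value of only 6%? Because PPV depends on how rare the condition is. The little sketch below works through the arithmetic with Bayes' theorem; the prevalence (about 1 in 4000 newborns) and the specificity figure are illustrative assumptions of mine, not numbers from Greenhalgh.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from test characteristics and prevalence
    (an application of Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Illustrative numbers (my assumptions): a 99%-sensitive, 99.6%-specific
# screen for a condition affecting roughly 1 in 4000 newborns.
print(round(ppv(0.99, 0.996, 1 / 4000), 3))  # → 0.058, i.e. about 6%
```

Even a very specific test generates far more false positives than true positives when almost everyone screened is healthy, which is exactly the situation the Guthrie test faces.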
8. Were the confidence intervals given for sensitivity, specificity and other features of the test? To refresh your memory, the confidence interval shows the range of results within which the true value is likely to lie. Also recall that the larger the sample size, the narrower the confidence interval, which is good.
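To make the sample-size point concrete, here is a small sketch using the ordinary normal-approximation (Wald) interval for a proportion; the study sizes are hypothetical, and real papers may use more exact methods.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """95% normal-approximation (Wald) confidence interval for a proportion,
    e.g. a sensitivity estimated as successes / n diseased patients detected."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return (p - half_width, p + half_width)

# Hypothetical validation studies: the same 90% estimated sensitivity,
# but very different sample sizes.
print(proportion_ci(45, 50))    # small study -> wide interval
print(proportion_ci(450, 500))  # tenfold larger study -> much narrower interval
```

Both studies report "90% sensitivity", but only the larger one pins the true value down tightly, which is why the intervals matter as much as the point estimates.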
9. Has a sensible “normal range” been derived from these results? If a test provides non-dichotomous results (such as temperature or blood pressure), we have to determine when a result will be considered abnormal. If a patient’s BP is 142/90, would we call that abnormal while regarding 138/90 as normal? Or would we advise the patient that we wish to recheck them after some short period of time? Defining “normal” can be rather difficult.
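One common (and debatable) statistical convention is to define "normal" as the central 95% of values in a healthy reference population, i.e. mean ± 1.96 standard deviations. The sketch below illustrates that convention with made-up systolic readings; it is my assumption for illustration, not a method from Greenhalgh, and it assumes the values are roughly Gaussian.

```python
import statistics

def reference_range(values, z=1.96):
    """A conventional 'normal range': mean +/- 1.96 SD of a healthy
    reference sample (covers ~95% if values are roughly Gaussian)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return (mean - z * sd, mean + z * sd)

# Hypothetical systolic BP readings from a healthy reference sample.
healthy_sbp = [118, 122, 115, 130, 125, 119, 127, 121, 124, 116]
low, high = reference_range(healthy_sbp)
print(f"normal range: {low:.1f} to {high:.1f} mmHg")
```

Note the limitation this makes visible: "statistically unusual" is not the same as "clinically abnormal", which is exactly why defining normal is hard.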
10. Has the test been placed in the context of other potential tests in the diagnostic sequence for the condition? In some cases a single diagnostic test might suffice for us to begin treatment; for example, a blood pressure of 160/100. In other cases there is a sequence of tests we use before we decide to begin treating. We might use McMurray’s test as part of a sequence of tests for determining the presence of a torn meniscus, but not take its findings on their own, since it has low sensitivity and specificity.
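Why shouldn't a weak test stand alone? A likelihood-ratio calculation makes it concrete. The sketch below updates a pretest probability after a test result; all the numbers are hypothetical (the sensitivity/specificity figures are my stand-ins for a McMurray-like manoeuvre), and chaining tests this way assumes the tests are conditionally independent, which often isn't strictly true.

```python
def post_test_probability(pretest_prob, sensitivity, specificity, positive=True):
    """Update disease probability after a test result using likelihood ratios."""
    if positive:
        lr = sensitivity / (1 - specificity)      # LR+
    else:
        lr = (1 - sensitivity) / specificity      # LR-
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)

# Hypothetical: a manoeuvre with modest sensitivity/specificity moves a 30%
# pretest probability only a little, so it shouldn't decide the diagnosis alone.
p1 = post_test_probability(0.30, sensitivity=0.55, specificity=0.75)
# A second, stronger test in the sequence (assumed independent) moves it further.
p2 = post_test_probability(p1, sensitivity=0.90, specificity=0.90)
print(round(p1, 2), round(p2, 2))
```

A single weak positive nudges the probability modestly; it is the accumulated sequence of results that carries you across a treatment threshold.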
These questions can help you determine whether or not you can apply the results of a paper looking at a diagnostic test to your patient.
1. Greenhalgh T. How to read a paper: the basics of evidence-based medicine. London, UK: BMJ Books; 2001:113-116.