Understanding Imaging Studies
Before we can embark on reviewing the current literature on different imaging technologies, it is important to understand the strengths and weaknesses that define them. All portions of a study need to be assessed: the study design, the materials and methods, the breakdown of the results, and the conclusions. Understanding and critically evaluating these components allows the reader to avoid misinformation from erroneous conclusions, to review the literature objectively, and to prevent the same mistakes from recurring in future studies.
In a clinical setting, a diagnosis is usually determined by a set of tests. We ask patients particular questions, perform vitality testing, or take diagnostic images such as radiographs or CBCT. When we perform these tests, we are gathering information and determining the probability of a particular diagnosis given the results the tests provide. This process can be described by Bayes’ Theorem, which relates the current (post-test) probability to the prior probability. Applied to a clinical scenario, it combines the probability of a diagnosis prior to testing with the likelihood ratio, a measure of the test’s performance derived from sensitivity and specificity, which allows the clinician to determine the probability of the correct diagnosis.
Sensitivity and specificity are statistical measures of test performance. Sensitivity, or the true positive rate, measures the proportion of actual positives that are correctly identified and is complementary to the false negative rate. Specificity, or the true negative rate, measures the proportion of actual negatives that are correctly identified and is complementary to the false positive rate. A problem with the likelihood ratio, sensitivity, and specificity is that they can vary among different subject subgroups even when the test threshold remains constant. Test accuracy is misjudged when too narrow a range of diseased patients or too narrow a range of disease-free patients is used.
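As a minimal sketch of the Bayesian reasoning described above, the following Python functions compute sensitivity and specificity from a two-by-two table, derive the likelihood ratios, and convert a pre-test probability into a post-test probability. The function names, the counts, and the 20% pre-test probability are hypothetical values chosen only for illustration; they are not taken from any study.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of diseased cases that the test correctly calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of disease-free cases that the test correctly calls negative."""
    return true_neg / (true_neg + false_pos)


def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    if not (0 < sens <= 1 and 0 < spec < 1):
        raise ValueError("need 0 < sens <= 1 and 0 < spec < 1")
    lr_positive = sens / (1 - spec)   # how much a positive result raises the odds
    lr_negative = (1 - sens) / spec   # how much a negative result lowers the odds
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical two-by-two table: 50 diseased and 100 disease-free sites.
sens = sensitivity(true_pos=45, false_neg=5)
spec = specificity(true_neg=80, false_pos=20)
lr_pos, lr_neg = likelihood_ratios(sens, spec)

# Hypothetical pre-test probability of 20%.
print("Post-test probability after a positive result:",
      round(post_test_probability(0.20, lr_pos), 3))
print("Post-test probability after a negative result:",
      round(post_test_probability(0.20, lr_neg), 3))
```

The odds form is used here because it makes the role of the likelihood ratio explicit: the same test result shifts the probability of disease by a different amount depending on the pre-test probability the clinician starts from.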
Spectrum bias has two elements: the performance of the test changes when it is applied to different groups (healthy vs. diseased), and a bias results from this change in performance.
A spectrum ranging from health through different levels of disease (low, moderate, or severe) has to be included to accurately assess the validity of diagnostic tests. As the clinical spectrum of the study patients changes, sensitivity and specificity change accordingly. Therefore, indices of test accuracy calculated in one patient group cannot be generalized to other groups. Another factor related to spectrum bias that needs to be understood is the prevalence of disease. In principle, a test’s performance is considered independent of the prevalence of disease. Disease status is usually part of a range, and an endpoint is usually chosen to distinguish health from disease. For this reason, disease prevalence in a sound study cannot approach 100%; it should instead represent real-world disease prevalence.
Another bias, diagnostic review bias, can occur when the reference test result is not definitive and the study test results affect the diagnosis. Test interpretation has a subjective component, and an examiner who has prior knowledge of the diagnosis being studied may be influenced by it when interpreting the test. This can happen when the person collecting or reviewing the data is not blinded. Diagnostic review bias goes hand in hand with spectrum bias. If disease prevalence is high in a particular study, reviewers who are familiar with the study design are likely to overdiagnose and therefore overtreat.
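The spectrum effect described above can be illustrated with a short Python sketch. It assumes, purely for illustration, that a test has a different sensitivity at each severity level, and shows that the overall sensitivity a study reports is the case-mix-weighted average of those values. The severity-specific sensitivities and the two case mixes are hypothetical numbers, not data from any study.

```python
def pooled_sensitivity(case_mix, sensitivity_by_severity):
    """Overall sensitivity as the average of severity-specific sensitivities,
    weighted by the share of diseased cases at each severity level."""
    total = sum(case_mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("case-mix proportions must sum to 1")
    return sum(share * sensitivity_by_severity[level]
               for level, share in case_mix.items())


# Hypothetical sensitivities of the same test at each severity level.
sensitivity_by_severity = {"low": 0.50, "moderate": 0.75, "severe": 0.95}

# Hypothetical study enrolling mostly severe disease.
severe_heavy_mix = {"low": 0.10, "moderate": 0.20, "severe": 0.70}

# Hypothetical study enrolling mostly low-severity disease.
mild_heavy_mix = {"low": 0.70, "moderate": 0.20, "severe": 0.10}

print("Severe-heavy study:",
      round(pooled_sensitivity(severe_heavy_mix, sensitivity_by_severity), 3))
print("Mild-heavy study:",
      round(pooled_sensitivity(mild_heavy_mix, sensitivity_by_severity), 3))
```

Under these assumptions the same test, read at the same threshold, appears considerably more sensitive in the study that enrolled mostly severe disease, which is why accuracy figures from one patient group do not transfer directly to another.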
Chang stated that one cannot meaningfully discuss sensitivity or specificity without an explicitly defined, independently derived gold standard. Such standards are usually imperfect and can be subject to both false positive and false negative results. As times change, a particular gold standard can be replaced by another as a result of improved technology. In a study evaluating the efficacy of diagnostic imaging tests, the gold standard needs to be defined independently. The diagnostic tests themselves cannot serve as the gold standard; the standard must instead come from an independent sample. If the diagnostic tests are used as gold standards, the result is overdiagnosis and overtreatment. Currently, a histological or surgical sample would be defined as the gold standard to which the diagnostic tests should be compared.
Proper sample size is of great importance when designing a study. An accurate conclusion can only be drawn from the results when the sample size is appropriate. A small sample will still yield a result, but it may not be large enough to detect differences, which produces false negatives and leaves the study inconclusive.
In diagnostic imaging studies there is a tendency to select experts in the particular field studied to interpret the diagnostic images. For the findings to be applicable in a real-world clinical setting, clinicians with a variety of experience levels need to be included. This makes it possible to establish the ease of use of the technology in question, as well as its clinical applicability. If studies use only oral radiologists as examiners, then the results are applicable in a clinical setting only when an oral radiologist, rather than any other clinician, evaluates the images in a similar way.
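To make the point about sample size concrete, the sketch below computes a 95% Wilson score confidence interval for an observed sensitivity. The Wilson interval is one common choice for a proportion and is used here as an assumption, not as a method prescribed by the studies under review; the counts are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical studies that both observe a sensitivity of 80%,
# one with 10 diseased sites and one with 100.
for detected, diseased in [(8, 10), (80, 100)]:
    low, high = wilson_interval(detected, diseased)
    print(f"{detected}/{diseased} detected: "
          f"95% CI {low:.2f} to {high:.2f}")
```

With only ten diseased sites the interval is wide enough that the observed 80% is compatible with both a poor and an excellent test, whereas the larger sample narrows the plausible range considerably.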
Laboratory studies that do not use live patients, but rather samples that mimic a clinical scenario, must design the mock disease in a manner that imitates real disease. For the purpose of our study, in which cadaver mandibles and maxillae are used, the replicated apical periodontitis for CBCT evaluation needs to resemble what we would encounter in live patients. In many previous studies, lesions were made using burs. As will be explained in later sections, CBCT is a much more accurate technology than conventional radiography. For this reason, bur holes look artificial on CBCT and can be recognized as such by examiners. This limits the clinical applicability of studies that use bur holes.
Prior probability expresses the established belief about how likely a particular event (disease) is before new evidence or information becomes available, and it influences the probability assigned to an outcome. Unfortunately, it is not linked to the real disease prevalence. When independent samples are lacking, however, prior probability ends up being used to determine the results of imaging studies.
Studies that use the same tooth or periapical area as the subject of examination dramatically increase the prior probability and therefore the perceived prevalence of disease in the study. By the same token, in a clinical setting, prior probability can lead to satisfaction of search, which occurs when additional lesions remain undetected after an initial lesion is detected. Satisfaction of search has been used as an explanation for false negative findings or underinterpretation in radiology; it reduces the accuracy of the diagnostic evaluation and therefore affects the results of the test conducted.
All of the above-mentioned concerns are prevalent in past and current imaging studies in dentistry. In the following sections we will discuss different technologies over time and the studies related to them. With this established knowledge, we can better judge the validity of the studies that support existing techniques and technologies, and formulate a methodology for our study that better serves clinical endodontics.