Diagnostic Accuracy of Clinical Tests in Detecting Rotator Cuff Pathology

Purpose: There is minimal information on the predictive value of strength-related clinical tests in detecting rotator cuff (RC) tear size and the tendon reparability of large and massive tears. The purpose of this diagnostic study was to examine the validity of four strength-related clinical signs/tests in relation to RC tear size and reparability. Methods: This was a prospective blinded study of consecutive patients with a full-thickness RC tear who underwent repair. Magnetic resonance imaging (MRI) and arthroscopic surgery were used as the gold standards. Results: Eighty-five patients (50 males [59%]; mean age 65 years, SD 10) completed the study. There were 60 (71%) minor tears (small/medium) and 25 (29%) major tears (large/massive), and 70 (82%) patients achieved a full repair. The Jobe test had a sensitivity of 93% and 88% and a negative likelihood ratio (LR) of 0.16 and 0.27 for tendon reparability and tear size, respectively. The dropping sign, hornblower sign and lift-off test had poor sensitivity (<60%) and high specificity (>98%), with large positive LRs for both tear size detection and tendon reparability. The validity indices in relation to MRI findings were similar to those in relation to surgical findings. Conclusion: A negative Jobe test accurately ruled out the presence of a major tear, significant supraspinatus fatty infiltration and the need for a partial repair. The dropping and hornblower signs and the lift-off test were highly specific; when positive, they confirmed the presence of a major tear, fatty infiltration in the corresponding muscle and difficulty achieving a full repair.

musculoskeletal condition. Many of these costs could be avoided, in particular imaging costs, through early clinical detection and appropriate early management. This highlights the important role of the clinical examination in achieving the initial diagnosis.
Unfortunately, differences in testing positions, diverse criteria for a positive test result and variability in reference standards complicate the interpretation of the measurement properties of clinical tests.
While pain provocation tests have shown poor performance characteristics in confirming pathology due to low specificity, clinical tests or signs that are based on weakness and represent the integrity of specific muscles tend to have better specificity [7][8][9][10][11][12].
The most commonly used strength-related clinical examination tests or signs for pathology in supraspinatus, infraspinatus, teres minor and subscapularis muscles are the Jobe test [13], dropping sign [14], hornblower sign [15] and the lift-off test [16] respectively.
The majority of previous studies of these clinical tests have examined their ability to detect the presence of an RC tear, with minimal research on tear size detection [17][18][19]. To date, we are not aware of any studies that have examined the value of these clinical tests in relation to tendon reparability, which is affected by tear size, fatty infiltration and tendon quality. The concept of partial versus full repair was introduced by Burkhart and colleagues approximately two decades ago [20][21][22]. The goal of a partial repair in patients with large/massive tears is to bring the torn tendon back to the tuberosities without excessive tension, restore the humeral head force couple and fulcrum, and improve overall shoulder kinematics [20][21][22].
The growing body of literature on large and massive tears and partial repairs in recent years [21][24][25][26] warrants examining strength-related clinical tests in relation to tear size and tendon reparability. Establishing the relationship between clinical examination findings and these factors will expedite care pathways, reduce unnecessary health care visits and improve clinical decision-making. The primary objective of this study was therefore to examine the validity of four clinical examination signs/tests in estimating rotator cuff tear size and determining tendon reparability. The secondary objective was to examine the value of these clinical tests in relation to pathological changes in the corresponding muscles (associated tear, atrophy and fatty infiltration). MRI and arthroscopic surgery were used as the gold standards.

Participants
This prospective blinded diagnostic study was conducted at a tertiary shoulder center, where consecutive surgical candidates for rotator cuff repair were examined. Inclusion criteria were pain and functional disability for more than 6 months that had failed non-operative treatment, and a full-thickness rotator cuff tear diagnosed on MRI and later confirmed at surgery.
Exclusion criteria were previous shoulder surgery on the affected side, an active work-related shoulder injury, infection, avascular necrosis or frozen shoulder. Informed consent was obtained from all individual participants included in the study.

Clinical Examination
The clinical examination was conducted 2-3 weeks before surgery by a physical therapist, so that the orthopedic surgeon remained blinded to the clinical test results and the surgical findings were recorded independently.

Jobe Test
This test was conducted at 90° of elevation in the scapular plane with the thumb pointing down. The outcome was documented as negative when the patient reported no pain or pain without weakness, and as positive when weakness was detected (<5/5 on manual muscle testing) with or without pain.

Dropping Sign
The patient was asked to push against the examiner's hand while maintaining the elbow at 90° and the shoulder at 45° of external rotation. The outcome was documented as positive when the forearm dropped back to the neutral position.

Hornblower Sign
This sign involved observing the patient while bringing both hands to the mouth. The outcome was documented as negative when the patient was able to externally rotate the arm in abduction, and as positive when the patient was unable to reach the mouth without abducting the affected arm.

Lift-off Test
The test was conducted with the patient standing. The ability to internally rotate the arm to the lumbar spine (waistline) was examined first. With the dorsum of the hand placed against the mid-lumbar spine, the patient lifted the hand on the affected side away from the back. The outcome was documented as negative when the hand was lifted away from the back; inability to perform this task was considered a positive lift-off test.

Predictors of Test Accuracy
Tear size and tendon reparability were based on surgical findings. The level of fatty infiltration in the rotator cuff muscles and the number of tears extending into the infraspinatus or subscapularis were based on MRI findings. Overall tear size was measured arthroscopically using a calibrated probe and classified by its largest dimension as small (<1 cm), medium (1-3 cm), large (3-5 cm) or massive (>5 cm) [28]. Because sensitivity and specificity require a binary (dichotomous) outcome, small and medium tears were combined as minor tears, and large and massive tears were combined as major tears.
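The size categories and the collapse into minor and major tears can be expressed as a small classification function. This is a sketch only: the names `TearSize`, `classify_tear_size` and `is_major_tear` are hypothetical, and because the published ranges share their endpoints (1-3 cm, 3-5 cm), the handling of values exactly at 1, 3 or 5 cm is an assumption rather than a rule stated by the study.

```python
from enum import Enum


class TearSize(Enum):
    SMALL = "small"      # < 1 cm
    MEDIUM = "medium"    # 1-3 cm
    LARGE = "large"      # 3-5 cm
    MASSIVE = "massive"  # > 5 cm


def classify_tear_size(largest_dimension_cm: float) -> TearSize:
    """Classify a tear by its largest arthroscopic dimension in cm.

    Assumption (not stated in the source): exactly 1 cm counts as medium,
    exactly 3 cm counts as large, and exactly 5 cm still counts as large.
    """
    if largest_dimension_cm < 0:
        raise ValueError("tear dimension cannot be negative")
    if largest_dimension_cm < 1.0:
        return TearSize.SMALL
    if largest_dimension_cm < 3.0:
        return TearSize.MEDIUM
    if largest_dimension_cm <= 5.0:
        return TearSize.LARGE
    return TearSize.MASSIVE


def is_major_tear(size: TearSize) -> bool:
    """Collapse the four categories into the binary outcome used for Se/Sp."""
    return size in (TearSize.LARGE, TearSize.MASSIVE)


# Hypothetical measurements, for illustration only
for cm in (0.5, 2.0, 4.0, 6.0):
    size = classify_tear_size(cm)
    print(cm, size.value, "major" if is_major_tear(size) else "minor")
```

Keeping the four-level category separate from the binary flag preserves the original classification for descriptive statistics while still yielding the dichotomous outcome needed for the 2×2 tables.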

Surgical Findings
In terms of reparability, patients were classified into two categories of full and partial repair. Full repair was either an anatomical repair or a repair to the articular margin with less than 1cm residual defect. Partial repair referred to a residual defect of more than 1cm [24] and was done when a full repair was not feasible.

Magnetic Resonance Imaging Findings
The majority of the MRI studies were performed in-house on a 1.5-T system (General Electric Medical Systems, Milwaukee, Wis.) using a 15 platform and a dedicated GE shoulder surface coil.
However, all MRI examinations included in this study were acquired at 1.5 T, and measurements were made on a PACS workstation using Agfa IMPAX software. All images were interpreted by a senior musculoskeletal-trained radiologist with 21 years of clinical experience. On a subsample of patients, we examined the inter-examiner reliability of the MRI findings between the radiologist and an orthopedic surgeon with shoulder subspecialty training and 10 years of clinical experience who was not involved in the participants' surgery.
Presence of a full thickness tear of the supraspinatus tendon was examined on the sagittal T2 fat-suppressed images. Full thickness tears of the infraspinatus tendon were diagnosed by defining the infraspinatus muscle and musculotendinous junction on the sagittal images and following the tendon laterally to the attachment on the greater tuberosity. Teres minor and subscapularis tendons were evaluated on sagittal and axial proton-density fat-saturated images.
Fatty infiltration, as defined by Goutallier [29], was documented for all muscles on the most lateral T1-weighted sagittal oblique image on which the scapular spine is seen in contact with the scapular body.
Accordingly, stage 0 corresponds to no fat, stage 1 to a muscle containing some fatty streaks, stage 2 to more muscle than fat, stage 3 to as much fat as muscle and stage 4 to more fat than muscle. The imaging results were specific to the muscle related to each clinical test, as described in the original studies [13][14][15][16]. For example, advanced fatty infiltration (stages 3-4 vs. stages 0-2) in the supraspinatus muscle was examined against the Jobe test result, while advanced fatty infiltration in the infraspinatus muscle (stages 3-4 vs. stages 0-2) was examined in relation to the dropping and hornblower signs, as both are affected by the integrity of this muscle. The lift-off test was used to examine the presence of subscapularis pathology.
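The pairing of each test with its corresponding muscle, and the dichotomization of the Goutallier stage, can be sketched as follows. The dictionary and function names are hypothetical, the patient values are invented for illustration, and the sketch models only the fatty-infiltration comparison (not the tear-based reference standards).

```python
# Hypothetical lookup: the muscle whose MRI findings each clinical test is compared against
TEST_TO_MUSCLE = {
    "jobe": "supraspinatus",
    "dropping": "infraspinatus",
    "hornblower": "infraspinatus",
    "lift_off": "subscapularis",
}


def is_advanced_fatty_infiltration(goutallier_stage: int) -> bool:
    """Dichotomize a Goutallier stage: stages 3-4 (True) vs. stages 0-2 (False)."""
    if goutallier_stage not in range(5):
        raise ValueError("Goutallier stage must be an integer from 0 to 4")
    return goutallier_stage >= 3


def reference_positive(test: str, stages_by_muscle: dict[str, int]) -> bool:
    """Reference-standard result for one test: advanced fatty infiltration
    in the muscle paired with that test."""
    muscle = TEST_TO_MUSCLE[test]
    return is_advanced_fatty_infiltration(stages_by_muscle[muscle])


# Hypothetical patient, not study data
patient = {"supraspinatus": 3, "infraspinatus": 1, "subscapularis": 0}
for test in TEST_TO_MUSCLE:
    print(test, reference_positive(test, patient))
```

Because the dropping and hornblower signs share the infraspinatus, the same MRI grade serves as the reference standard for both.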

Statistical Analysis
The sample size calculation was based on estimation of the positive likelihood ratio (LR+). To detect an LR+ of 2 to 5, considered the least acceptable LR+, a minimum sample of 70 to 79 patients was considered necessary [30].
Descriptive statistics were provided for all relevant data. Inter-examiner reliability of the MRI findings was examined using the kappa coefficient and percentage of agreement, and the strength of agreement was interpreted as suggested by Landis [31]. Each clinical test result was classified as a true or false positive or negative against the surgical and imaging findings, and 2×2 tables were constructed to calculate sensitivity (Se), specificity (Sp) and likelihood ratios (LRs). The LRs, which combine sensitivity and specificity, were used to determine whether a test result changed the probability of having a condition.
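The 2×2 calculations and the agreement statistics can be sketched in a few lines of Python. The class and function names are hypothetical and the counts are invented for illustration, not taken from the study. The formulas are the standard ones: Se = TP/(TP+FN), Sp = TN/(TN+FP), LR+ = Se/(1−Sp), LR− = (1−Se)/Sp, and Cohen's kappa = (p_o − p_e)/(1 − p_e).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one clinical test against one reference standard."""
    tp: int  # test positive, reference positive
    fp: int  # test positive, reference negative
    fn: int  # test negative, reference positive
    tn: int  # test negative, reference negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def lr_positive(self) -> float:
        # LR+ = Se / (1 - Sp); infinite when there are no false positives (Sp = 100%)
        false_positive_rate = self.fp / (self.fp + self.tn)
        if false_positive_rate == 0:
            return float("inf")
        return self.sensitivity / false_positive_rate

    @property
    def lr_negative(self) -> float:
        # LR- = (1 - Se) / Sp
        false_negative_rate = self.fn / (self.tp + self.fn)
        if self.specificity == 0:
            return float("inf")
        return false_negative_rate / self.specificity


def cohens_kappa(table: list[list[int]]) -> tuple[float, float]:
    """Return (kappa, percent agreement) for a square rater-A x rater-B count table."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_marginals = [sum(table[i]) / n for i in range(k)]
    col_marginals = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_marginals, col_marginals))
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return kappa, 100 * p_observed


# Illustrative counts only, not the study's data
counts = TwoByTwo(tp=20, fp=5, fn=5, tn=55)
print(f"Se={counts.sensitivity:.2f} Sp={counts.specificity:.2f} "
      f"LR+={counts.lr_positive:.1f} LR-={counts.lr_negative:.2f}")
print(cohens_kappa([[40, 5], [5, 35]]))
```

Returning an infinite LR+ when specificity is 100% mirrors the situation described for the lift-off test, where no false positives occurred; in practice a continuity correction is sometimes applied instead, which is a separate methodological choice.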
A test with a higher LR+ is more useful for ruling in the condition, while a test with a lower LR− is more useful for ruling it out. The guidelines of Jaeschke et al. [32] were used to interpret the LRs, and Fagan's nomogram [33] was used to estimate the approximate post-test probability from the pre-test probability and the LR.
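Fagan's nomogram is a graphical form of Bayes' theorem in odds form (post-test odds = pre-test odds × LR), so the same estimate can be computed exactly. The function below is a minimal, hypothetical helper; the example inputs are invented for illustration.

```python
import math


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form, the relationship Fagan's nomogram draws graphically."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio cannot be negative")
    if math.isinf(likelihood_ratio):
        return 1.0  # an infinite LR+ (specificity of 100%) implies certainty
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical inputs: a 30% pre-test probability and an LR- of 0.2
print(round(post_test_probability(0.30, 0.2), 3))
```

An LR of 1 leaves the probability unchanged, which is why tests with LRs near 1 add little to clinical decision-making.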

Results
Ninety patients consented to participate in the study. Of these patients, two were excluded at the time of surgery due to having a partial thickness rotator cuff tear, and another three patients cancelled their surgery due to personal or other medical reasons.
Data from 85 patients with a full-thickness supraspinatus tear (35 females [41%] and 50 males [59%]; mean age 65 years, SD 10) were therefore used for analysis. Table 1 shows the relationship between surgical findings (tear size and reparability) and each clinical sign/test. Table 2 shows the relationship between imaging findings (fatty infiltration, associated tears) and the clinical signs/tests.

Dropping Sign
The dropping sign [14] had low sensitivity for overall tear size detection, reparability, associated infraspinatus tear and advanced fatty infiltration. However, the sign was highly specific, indicating a low false-positive rate for all of these findings (Tables 1 & 2). For example, an LR+ of 19 for tear size detection changes a pre-test probability of 50% to approximately 96%: if the clinician estimates a 50% chance that a patient has a major tear and then observes a positive dropping sign, that chance rises to approximately 96%, a high probability of a large/massive tear.
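The nomogram reading can be checked with the odds form of Bayes' theorem, which the nomogram represents graphically. Exact arithmetic gives 95%, in line with the approximately 96% read from the nomogram:

```latex
\[
\text{pre-test odds} = \frac{0.50}{1 - 0.50} = 1, \qquad
\text{post-test odds} = 1 \times 19 = 19, \qquad
\text{post-test probability} = \frac{19}{1 + 19} = 0.95
\]
```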

Hornblower Sign
A positive hornblower sign [15]

Lift-off Test
The lift-off test [16] had a perfect specificity of 100% for both tear size detection and reparability and a high specificity of 98% for subscapularis full-thickness tear. Sensitivity was low for all surgical and imaging outcomes (tear size, reparability and subscapularis full-thickness tear; Tables 1 & 2).
These findings indicate that this test is helpful in clinical decision making only when it is positive.

Discussion
The primary objective of this study was to examine the ability of commonly used strength-related clinical tests to predict the reparability and tear size of the RC tendons. The concept of reparability has been gaining importance since its introduction in the early 1990s, as surgeons continue to face challenges in managing large and massive tears in patients with high physical demands [21][23][24][25][26]. To our knowledge, despite the significant number of validity studies and reviews [11][12][17][18][19][34][35][36], the relationship between clinical tests and tear size, and particularly tendon reparability, has not been systematically examined.
The present study provides further evidence on clinicians' ability to differentiate between minor and major tears.
Simple clinical tests and signs can help avoid costly imaging investigations that would not alter management and, most importantly, can help set achievable and realistic post-operative expectations for patients with large and massive tears.

Jobe Test
The results of the present study indicate that a negative Jobe test helps rule out a major tear that may not be fully repairable. However, when positive, this test does not guide clinical management because of its low specificity. These findings are consistent with the high sensitivity and poor specificity of this test reported by other investigators: 86% and 50% by Leroux et al. [19], and 84% and 58% by Hartel et al.
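As an illustration of how a negative Jobe test lowers the probability of a major tear, assume that the pre-test probability equals the proportion of major tears in this sample (25 of 85 patients) and apply the negative LR of 0.27 for tear size reported for the Jobe test in this study. Under these assumptions the probability falls from about 29% to about 10%:

```latex
\[
\text{pre-test odds} = \frac{25/85}{60/85} = \frac{25}{60} \approx 0.417, \qquad
\text{post-test odds} = \frac{25}{60} \times 0.27 = 0.1125, \qquad
\text{post-test probability} = \frac{0.1125}{1 + 0.1125} \approx 0.10
\]
```

In a setting with a lower prevalence of major tears, the same LR− would yield a correspondingly lower post-test probability.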
The Jobe test was initially described in the early 1980s to assess the supraspinatus muscle in isolation [13]. However, there is minimal anatomical basis for this position isolating the supraspinatus muscle: in addition to the supraspinatus, nine other muscles, including the infraspinatus, upper subscapularis, trapezius and serratus anterior, have been shown to be activated during this test [38].

Dropping Sign
The dropping sign was highly specific in predicting a major tear, advanced fatty infiltration, an associated infraspinatus tear and inability to achieve a full repair. Its large positive likelihood ratios shift clinicians' pre-test probability substantially, assisting with diagnosis and overall management. There is very limited published literature on this clinical sign [12][39]. In a study conducted by Walch et al. [39] in late 1998, the CT arthrogram was used as the gold standard at over one year following a cuff tear. The investigators reported sensitivity and specificity of 100% al. [12] reported a low sensitivity and high specificity for the presence of an individual tear (50% and 88%, respectively) or combined tears (44% and 97%) following an acute anterior dislocation. The high specificity of the dropping sign is supported by a recent study that examined external rotation strength in a position similar to that of the dropping sign (45 degrees of external rotation) [40]. The authors reported that infraspinatus weakness was more pronounced at 45 degrees of external rotation than in the neutral position, supporting the high specificity of the dropping sign position.

Hornblower sign
The hornblower sign was also highly specific; when positive, it was associated with larger tears, advanced infraspinatus fatty infiltration, infraspinatus tear and inability to achieve a full repair. There is limited published literature on the accuracy of the hornblower sign. In a retrospective study by Walch et al. [39], in which the CT arthrogram was used at over one year following a cuff tear, sensitivity and specificity values of 100% and 93% were reported for detecting stage 3 and 4 fatty infiltration or complete absence of the teres minor. In their study, all patients with stage 3/4 fatty infiltration in the infraspinatus had a negative hornblower sign.
The authors concluded that the hornblower test was specific to teres minor only and not affected by the infraspinatus pathology [39]. However, the infraspinatus involvement in shoulder elevation and external rotation has been confirmed in the literature [41][42][43][44]. McMahon et al. [42] suggested that infraspinatus activity progressively increased from 0 to 90 degrees and reported that the highest level of infraspinatus muscle activity occurred during shoulder forward flexion at 90 degrees. In a study by Ha et al.
[41], the infraspinatus was highly activated during external rotation with the shoulder flexed to 90 degrees. Similarly, Reinold et al. [44] noted infraspinatus activity at 90 degrees of abduction. Studies of the pathophysiology of rotator cuff tendons have suggested that overhead shoulder injuries in young athletes affect the infraspinatus tendon more often than other rotator cuff tendons, highlighting the activity and strain of this tendon in the external rotation/abduction position [43]. More specifically, in a recent prospective study by Jain et al. [37], the sensitivity and specificity of the hornblower test for infraspinatus tear were 17% and 96%, which is consistent with our results.

The Lift-off Test
The lift-off test was highly specific in predicting a major tear, subscapularis tear and inability to achieve a full repair. The subscapularis muscle is one of the internal rotators of the shoulder joint, and the hand-behind-back position is reported to be superior to internal rotation in the neutral position for testing it. Using electromyographic data [45], moment arms [46] and physiologic cross-sectional areas [47], the contribution of the subscapularis muscle to internal rotation strength has been calculated to be approximately 50% with the arm at the side, increasing to almost 90% with the arm in full internal rotation (the lift-off position). Another study [12] reported sensitivity and specificity values of 39% and 74%, respectively, for the presence of a full-thickness tear in 49 patients following an acute anterior dislocation, using ultrasonography as the gold standard. In a study by Itoi et al. [11] in which the strength of the lift-off test was graded, inability to lift the hand off the back against gravity had a sensitivity of 14% and a specificity of 100% for subscapularis tear. In a study by Leroux et Similarly, Hertel et al. reported a sensitivity of 62% and a specificity of 100% [17]. Naredo et al. [48], who used ultrasonography as the gold standard, reported a sensitivity and specificity of 50% and 96%, respectively, for subscapularis tear. All of these studies report similar results: low sensitivity and high specificity for the lift-off test.

General Principles of Validity Indices of Shoulder Tests
In general, clinical tests that are based on pain are more sensitive and less specific, while clinical tests that are based on weakness and represent the integrity of specific RC muscles tend to have better specificity and lower sensitivity [7][8][9][10][11][12]. This distinction matters because a highly sensitive test produces few false negatives, so a negative result helps rule out pathology, whereas a highly specific test produces few false positives, so a positive result helps rule it in. Absence of pain with pain provocation tests rules out a cluster of pathologies under the umbrella of impingement syndrome, such as inflammation, tendonitis, bursitis and partial- or full-thickness tears. However, the presence of pain does not confirm any specific pathology, which limits the utility of these tests in clinical decision-making, as different shoulder conditions are managed differently. Strength-related clinical tests have different performance characteristics: they are not very sensitive, so they are poor screening tools for ruling out a specific pathology, but once positive, they confirm the presence of pathology. This general principle becomes more evident as the severity of the pathology increases. For example, in the present study we chose large/massive tears as the positive criterion. A number of patients with a major tear presented with a negative sign/test, which led to poor sensitivity for the dropping and hornblower signs and the lift-off test. However, because false positives were rare, specificity and LR+ were high, indicating that in clinical settings similar to ours (specialty orthopedic clinics where the prevalence of a major tear is high), a positive sign or test can confirm pathology and guide management more effectively.

Conclusion
A negative Jobe test accurately ruled out the presence of a major tear, significant supraspinatus fatty infiltration and the need for a partial repair. The dropping and hornblower signs and the lift-off test were highly specific; when positive, they confirmed the presence of a major tear, fatty infiltration in the corresponding muscle and difficulty in achieving a full repair.