Accuracy of Laryngoscopy for Quantitative Vocal Fold
Analysis in Combination with AI, A Cohort Study of Manual
Artefacts
Volume 6 - Issue 3
Mette Pedersen1*, Christian F Larsen2
- 1Medical Centre, Østergade, Copenhagen, Denmark
- 2Copenhagen Business School, Solbjerg Plads, Denmark
Received: April 01, 2021; Published: April 15, 2021
Corresponding author: Mette Pedersen, Medical Centre, Østergade 18, Copenhagen, Denmark
DOI: 10.32474/SJO.2021.06.000237
Abstract
Introduction: A cohort of high-speed videoendoscopies was evaluated for usability in deep learning (AI). The aim of
our study was to find the percentage of our 15,732 high-speed videos that could be used for deep learning. A
screening of the material showed that some videos had artefacts that made them unusable for deep learning.
Material: A random selection was made with the Wolfram Alpha random number generator among 15,732
videos from 7,909 patients. The categories of unusable videos are described, including the rear parts of the vocal folds not
seen, the epiglottis or uvula blocking vision, parts of the vocal folds not seen, no vibration of the vocal folds, persistent
constricted larynx, picture taken from an oblique angle, the front part of the vocal folds not seen, and parts of the
arytenoid region not seen.
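The random selection described above can be illustrated with a short sketch. It is an assumption-laden stand-in, not the authors' procedure: the study used the Wolfram Alpha random number generator, whereas this example uses Python's standard `random` module. The function name `draw_video_sample` and the seed value are hypothetical; the totals (15,732 videos, 100 selected) are taken from the text.

```python
import random

TOTAL_VIDEOS = 15732  # number of high-speed videos reported in the text
SAMPLE_SIZE = 100     # number of randomised films reported in the Method paragraph


def draw_video_sample(total=TOTAL_VIDEOS, k=SAMPLE_SIZE, seed=None):
    """Return k distinct 1-based video numbers drawn uniformly at random.

    Hypothetical helper: the study itself used Wolfram Alpha, not this code.
    """
    rng = random.Random(seed)
    # random.sample draws without replacement, so no video is picked twice.
    return sorted(rng.sample(range(1, total + 1), k))


# The seed is an arbitrary illustrative value that makes the draw repeatable.
sample = draw_video_sample(seed=2021)
print(len(sample), sample[:5])
```

Sampling without replacement ensures that each selected film is assessed only once.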
Method: Assuming the assessments are independent with regard to whether there is a finding, the total number of
assessments with a given finding is binomially distributed. With 100 assessments, an observed incidence of 1, 10 and 25
findings results in estimated 95% confidence intervals of [0%-3%], [4%-16%] and [17%-33%], respectively. The 95%
confidence intervals are calculated as Wald intervals, using the asymptotic normal approximation for the estimated
proportion in the binomial distribution. Assuming the incidence of each of the different findings was
below 25%, the length of the 95% confidence interval with 100 assessments is at most 16 percentage points (33-17); with 200 and 500
assessments, the corresponding lengths are 14 and 8 percentage points, respectively. Based on these calculations, 100 randomised films
were considered sufficient for the analysis.
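The Wald interval named in the Method paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' own code: the function name `wald_interval` is hypothetical, the normal quantile is hard-coded for a two-sided 95% level, and the bounds are clipped to the range 0 to 1, which matches the lower bound of 0% quoted for one finding in 100 assessments.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959963984540054


def wald_interval(findings, n, z=Z_95):
    """Wald confidence interval for a binomial proportion.

    findings: number of assessments with the finding
    n: total number of assessments
    Returns (lower, upper) as proportions, clipped to [0, 1].
    """
    p_hat = findings / n
    # Standard error of the estimated proportion under the normal approximation.
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# The three incidences discussed in the text, each out of 100 assessments.
for findings in (1, 10, 25):
    lower, upper = wald_interval(findings, 100)
    print(f"{findings}/100: lower={lower:.4f}, upper={upper:.4f}")
```

The half-width is proportional to 1/sqrt(n), so the number of assessments must be quadrupled to halve the interval length. The same function can be applied to any observed count, for example the 51 usable videos out of 100.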
Results and Conclusion: The prospective cohort study of high-speed videos covered 12 years, from February
2007 to January 2019, in an otorhinolaryngology medical centre. A total of 15,732 high-speed video
films of the larynx, including the vocal folds, had been consecutively sampled from 7,909 patients (4,000 frames per second, Richard Wolf
Ltd. endocam 5562). Assessment of usable versus unusable videos, with 95% confidence
intervals, showed that only 51% were usable. Notably, oblique-angle pictures (10%) and
insufficient pictures of the front of the vocal folds and arytenoids (14%) were the largest groups among the unusable videos.
These can be improved by the examiner in the future. Various video and deep learning programs are discussed.
Keywords: Manual artifacts; deep learning; vocal fold analysis; quantitative measures
Abbreviations: AI: Artificial Intelligence; OCT: Optical Coherence Tomography