Comparison of Inter-rater Reliability of Human and Computer Prosodic Annotation Using Brazil’s Prosody Model

Okim Kang, David O. Johnson

Abstract


The current study examined whether computer annotations of prosody based on Brazil’s (1997) framework were comparable with human annotations. A series of statistical tests was performed for each prosodic feature: tone unit (two accuracy scores and Pearson’s correlation), prominent syllable (accuracy, F-measure, and Cohen’s kappa), tone choice (accuracy and Fleiss’ kappa), and relative pitch (accuracy, Fleiss’ kappa, and Pearson’s correlation). We considered one population to be the inter-rater reliability scores among the three human coders and the other population to be the inter-rater reliability scores between the computer and the three humans. If the differences between these two populations were significant, the computer and human annotations were considered not comparable; if the differences were not significant, they were considered comparable. The results indicated that the computer and human annotations were comparable for tone choice and not comparable for prominent syllable. For tone unit, two of the t-tests provided evidence that they were comparable and one did not. The relative pitch t-tests showed a significant disparity between the humans’ estimates of relative pitch and the computer’s actual relative pitch calculation.
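As a rough illustration of the reliability statistics named above, the sketch below computes Cohen’s kappa for two annotators’ categorical labels and Welch’s t statistic for comparing two independent samples of reliability scores. The tone-choice labels and scores are hypothetical examples, not data from the study.

```python
import math
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators (Cohen, 1960)."""
    n = len(a)
    # Observed proportion of exact agreements.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement from each annotator's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] / n * cb[k] / n for k in ca)
    return (p_o - p_e) / (1 - p_e)

def welch_t(x, y):
    """Welch's t statistic for two independent samples (unequal variances)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

# Hypothetical tone-choice labels from a human coder and the computer:
human = ["rise", "fall", "fall", "level", "rise", "fall"]
computer = ["rise", "fall", "rise", "level", "rise", "fall"]
print(round(cohens_kappa(human, computer), 3))  # → 0.739

# Hypothetical reliability scores for the two populations described above:
human_human = [0.81, 0.78, 0.84]
human_computer = [0.74, 0.70, 0.77]
print(round(welch_t(human_human, human_computer), 2))
```

A small (non-significant) t statistic here would correspond to the study’s criterion for calling the two annotation sources comparable; in practice the t value would be referred to a t distribution with Welch-adjusted degrees of freedom to obtain a p-value.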



DOI: https://doi.org/10.5430/elr.v4n4p58



English Linguistics Research
ISSN 1927-6028 (Print)   ISSN 1927-6036 (Online)

E-mail: elr@sciedupress.com

Copyright © Sciedu Press
