An Annotated Corpus of Tonal Piano Music from the Long 19th Century


We present a dataset of 264 annotated piano pieces by nine composers, composed in the long 19th century. Annotations adhere to the DCML harmony annotation standard and include Roman numerals, phrase boundaries, and cadence types. The scores are encoded in the XML-based MuseScore 3 format. Annotations are embedded within the MuseScore files. In addition, all harmony information, alongside key features of the encoded measure and note objects, is provided in the form of plaintext TSV-formatted tables for increased interoperability with other datasets and analysis tools. Annotations were collaboratively created and reviewed by a pool of trained music theorists. Collaboration took place asynchronously online via a semi-automated GitHub-based workflow designed for quality assurance, allowing cycles of revisions and reviews until consensus is reached. The full revision history is retained, providing data for further empirical research on inter-annotator agreement and related topics. We also present descriptive statistics about the nine corpora and the dataset as a whole, including comparisons of pitch-class contents, phrase lengths, modulations, and cadence types. We conclude with a discussion of our musicological principles for corpus building and considerations of representability.

Empirical Musicology Review
Fabian C. Moss
Assistant Professor for Digital Music Philology and Music Theory

Fabian C. Moss is an assistant professor for Digital Music Philology and Music Theory at Julius-Maximilians University Würzburg (JMU), Germany.