Background A central finding in music perception is that listeners’ ratings of the stability of probe tones against the background of a given musical context reflect structural properties of that context, leading to high correlations with music-theoretical models of tonal hierarchies. Studies using the probe-tone paradigm commonly rely on either melodic or harmonic contexts and report averages or other summary statistics at an aggregated level. It is, however, possible that theoretical models characterize aggregated responses well at the population level but do not reflect perception at the level of individual participants.
Aims We investigate differences in listeners’ rating behavior and ask whether the close resemblance between probe-tone ratings and music-theoretical descriptions of tonal hierarchies holds for both aggregated and individual ratings.
Method Our study uses the classical probe-tone method, presenting 40 participants with scales and cadences in either the major or the (natural) minor mode. We use a model-comparison approach that compares various Bayesian mixed-effects models informed by the theoretical and empirical literature on tonal hierarchies in music, and by linear combinations thereof (Krumhansl, 1990; Temperley, 2001; Albrecht & Shanahan, 2013; Harasim et al., 2021). We also implemented a baseline model that accounts only for the presence of probe tones in the given context.
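To illustrate the distinction between a theoretical predictor and the baseline predictor, the following is a minimal sketch under stated assumptions. It uses plain Pearson correlations rather than the Bayesian mixed-effects models of the study, the ratings are invented for illustration only, and all names (`KK_MAJOR`, `BASELINE_MAJOR`, `pearson`, `aggregate`, `hypothetical_ratings`) are hypothetical. The profile values are the major-key profile reported by Krumhansl (1990).

```python
from statistics import mean

# Major-key profile (Krumhansl, 1990), pitch classes 0..11 relative to the tonic.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]

# Baseline predictor (assumed coding): 1 if the probe tone occurs in the
# major scale, 0 otherwise.
MAJOR_SCALE = {0, 2, 4, 5, 7, 9, 11}
BASELINE_MAJOR = [1.0 if pc in MAJOR_SCALE else 0.0 for pc in range(12)]


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def aggregate(ratings_by_listener):
    """Average the ratings across listeners, separately for each probe tone."""
    return [mean(column) for column in zip(*ratings_by_listener)]


# Invented ratings for two hypothetical listeners (NOT data from the study).
hypothetical_ratings = [
    [7, 2, 4, 2, 5, 4, 2, 6, 2, 4, 2, 3],
    [5, 3, 5, 1, 4, 5, 3, 5, 3, 3, 1, 4],
]

# Compare both predictors against the aggregated and the individual ratings.
for name, predictor in [("profile", KK_MAJOR), ("baseline", BASELINE_MAJOR)]:
    r_aggregated = pearson(aggregate(hypothetical_ratings), predictor)
    r_individual = [pearson(r, predictor) for r in hypothetical_ratings]
    print(name, round(r_aggregated, 2), [round(r, 2) for r in r_individual])
```

Because the profile itself correlates substantially with scale membership, a high correlation between ratings and a theoretical profile does not by itself show that the profile adds predictive value beyond the baseline, which is why an explicit model comparison is needed.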
Results Our findings indicate a number of asymmetries between the two modes and the two context types: 1) In a major-scale context, all models predict the data well when aggregated across participants. This is, however, not the case for minor scales. 2) Likewise, individual ratings are well predicted by all theoretical models in the major-scale context, but only marginally better than by the baseline model. In the minor-scale context, all except one of the theoretical models carry predictive value for individual ratings. 3) A model comparison reveals weak to no evidence that the theoretical models add much predictive value beyond the baseline model (presence in scalar context). 4) In contrast to scalar contexts, all models predict the aggregated rating data well for cadences in both the major and the minor mode. 5) On the individual level, the music-theoretical models carry substantial additional predictive value compared to the baseline model for responses given in both major and minor cadences. 6) The model comparison shows that, in cadential contexts, all models carry predictive value for individual ratings and that the theoretical models improve prediction beyond the baseline model, which was not the case in scalar contexts.
Conclusions Our study points to differences between context types (scalar vs. cadential), modes (major vs. minor), and levels of data analysis (aggregated vs. individual). It shows the importance of distinguishing between behavioral data on the aggregated and individual levels, in particular in the minor-scale context, but also reveals that, in some cases, music-theoretical models carry substantial predictive value for individual responses.