Data-Driven Music History

Workshop at the International Conference of Students of Systematic Musicology (SysMus)

14 September 2020, 2-3:30 PM

Fabian C. Moss (@fabianmoss)
Digital and Cognitive Musicology Lab
École Polytechnique Fédérale de Lausanne

Who am I?

  • Music and Mathematics Education (University of Cologne, Germany)
  • MA Musicology (University for Music and Dance, Cologne, Germany)
  • started PhD in Musicology at Technical University, Dresden, Germany
  • finished PhD in Digital Humanities at École Polytechnique Fédérale de Lausanne, Switzerland

Who are you?

Selected quotes

"I am interested in the intersection of music theory, music cognition, and linguistics."

"I want to study more methods about dealing with data."

"I am very interested in combining historic and systematic music research."

"I'm interested in learning new methods regarding empirical approaches to music history"

"it is interesting to know more about data processing of symbolic or audio music data"

"I'm particularly interested in the quantitative-empirical methodologies. [...] I'm also interested in the balance between the qualitative and quantitative approach/methodology in research."

"I am not familiar with many data analysis methods and would love to learn more about data-driven music history"

"...learn more about Python and how it can be applied in a research."

Some caveats

  • This is not an introduction to programming in Python (but some examples are given)
  • Title of the workshop is "Data-Driven History of Music"
  • We concentrate on a very specific example, not "Music" in general
  • Focus on Western classical music and a specific representation of pieces

Research Questions

  • General: How can we study historical changes quantitatively?
  • Specific: What can we say about the history of tonality based on a dataset of musical pieces?

A bit of theory

In [31]:
note_names = list("FCGDAEB") # diatonic note names in fifths ordering
note_names
Out[31]:
['F', 'C', 'G', 'D', 'A', 'E', 'B']
In [32]:
accidentals = ["bb", "b", "", "#", "##"] # up to two accidentals is sufficient here
accidentals
Out[32]:
['bb', 'b', '', '#', '##']
In [33]:
lof = [ n + a for a in accidentals for n in note_names ] # lof = "Line of Fifths"
print(lof)
['Fbb', 'Cbb', 'Gbb', 'Dbb', 'Abb', 'Ebb', 'Bbb', 'Fb', 'Cb', 'Gb', 'Db', 'Ab', 'Eb', 'Bb', 'F', 'C', 'G', 'D', 'A', 'E', 'B', 'F#', 'C#', 'G#', 'D#', 'A#', 'E#', 'B#', 'F##', 'C##', 'G##', 'D##', 'A##', 'E##', 'B##']
In [34]:
len(lof) # how long is this line-of-fifths segment?
Out[34]:
35

We call the elements on the line of fifths tonal pitch-classes.
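A minimal sketch of how a tonal pitch-class name maps to a position on the line of fifths. The helper name `lof_position` is hypothetical, and placing C at position 0 is a convention assumed here for illustration.

```python
note_names = list("FCGDAEB")              # diatonic note names in fifths ordering
accidentals = ["bb", "b", "", "#", "##"]  # double flat ... double sharp
lof = [n + a for a in accidentals for n in note_names]

def lof_position(tpc):
    """Position on the line of fifths, with C at 0 (assumed convention)."""
    return lof.index(tpc) - lof.index("C")

print(lof_position("F"), lof_position("C"), lof_position("G"))  # -1 0 1
print(lof_position("A#"), lof_position("Bb"))                   # 10 -2
```

Note that A# and Bb sound the same on a piano but are distinct tonal pitch-classes, twelve fifths apart on the line.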

Data

A (kind of) large corpus: TP3C

Here, we use a dataset that was specifically compiled for this kind of analysis, the Tonal pitch-class counts corpus (TP3C) (Moss, Neuwirth, Rohrmeier, 2020).

  • 2,012 pieces
  • 75 composers
  • spans approx. 600 years of music history
  • does not contain complete pieces but only counts of tonal pitch-classes
In [35]:
import pandas as pd # to work with tabular data

url = "https://raw.githubusercontent.com/DCMLab/TP3C/master/tp3c.tsv"
data = pd.read_table(url)

data.sample(10)
Out[35]:
composer composer_first work_group work_catalogue opus no mov title composition publication source display_year Fbb Cbb Gbb Dbb Abb Ebb Bbb Fb Cb Gb Db Ab Eb Bb F C G D A E B F# C# G# D# A# E# B# F## C## G## D## A## E## B##
1311 Schumann Robert Dichterliebe Op. 48 3 NaN Die Rose, die Lilie, die Taube 1840.0 NaN OSLC 1840.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 52 69 61 40 45 52 48 3 1 0 0 0 0 0 0 0 0 0 0
279 Alkan Charles Valentin Esquisses Op. 63 37 NaN NaN 1861.0 NaN MS 1861.0 0 0 0 0 0 0 0 0 0 2 6 128 131 48 113 144 143 68 6 8 41 34 14 12 4 3 4 4 3 0 0 0 0 0 0
61 Beethoven Ludwig van Piano Sonatas Op. 22 NaN 1.0 Piano Sonata No. 11 1800.0 NaN MS 1800.0 0 0 0 0 0 0 0 2 6 35 111 61 284 657 650 463 339 474 422 217 63 46 42 16 4 5 4 0 0 0 0 0 0 0 0
208 Liszt Franz 12 Transcendental Etudes S. 139 7 NaN Eroica 1851.0 NaN MS 1851.0 0 0 0 0 0 0 4 15 129 148 225 297 376 393 296 225 163 156 130 137 113 88 56 42 45 15 32 6 5 7 0 0 0 0 0
1010 Busnoys Antoine NaN NaN NaN NaN NaN Mon seul et celé souvenir 1480.0 NaN ELVIS 1480.0 0 0 0 0 0 0 0 0 0 0 0 0 12 58 55 29 84 59 55 13 1 6 4 0 0 0 0 0 0 0 0 0 0 0 0
900 Lang Josephine Sechs Lieder Op. 25 3 NaN Barcarole NaN 1860.0 OSLC 1860.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 2 55 99 107 81 59 84 44 8 2 7 0 0 0 0 0 0 0 0 0
317 Beethoven Ludwig van Piano Sonatas Op. 49 2 1.0 Piano Sonata No. 20 NaN 1805.0 MS 1805.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 26 139 273 335 259 144 231 175 46 12 13 4 0 0 0 0 0 0 0 0 0
1120 Reichardt Louise Zwölf Deutsche und Italiänische Romantische Ge... Op. NaN 2 NaN Wenn ich ihn nur habe NaN NaN OSLC 1802.5 0 0 0 0 0 0 0 1 0 10 40 39 21 18 22 16 9 1 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1754 Corelli Arcangelo 12 Trio Sonatas Op. 3 6 1.0 NaN NaN 1689.0 CCARH 1689.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 65 117 104 68 77 81 59 13 0 8 0 0 0 0 0 0 0 0 0 0
1979 Mozart Wolfgang Amadeus Sonaten KV 331 11 1.0 Var. 4 1783.0 NaN CCARH 1783.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 44 108 119 66 18 86 37 4 0 1 1 0 0 0 0 0 0 0
In [107]:
data.display_year.plot(kind="hist", bins=50, figsize=(15,6)); # historical overview

For this workshop we ignore all the metadata about the pieces (titles, composer names etc.) and focus only on their tonal material. Therefore, we don't need all the columns of the table.

In [37]:
tpc_counts = data.loc[:, lof] # select all rows (":") and the lof columns
tpc_counts.sample(20)
Out[37]:
Fbb Cbb Gbb Dbb Abb Ebb Bbb Fb Cb Gb Db Ab Eb Bb F C G D A E B F# C# G# D# A# E# B# F## C## G## D## A## E## B##
1233 0 0 0 0 0 0 0 0 0 0 0 0 14 79 101 30 13 53 34 2 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0
1779 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 32 56 21 33 61 66 37 3 3 13 0 0 0 0 0 0 0 0 0
254 0 0 0 0 0 0 0 0 3 44 50 154 441 541 408 488 657 604 334 66 78 164 71 6 0 0 0 0 0 0 0 0 0 0 0
1704 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 54 67 76 64 55 65 42 4 0 6 0 0 0 0 0 0 0 0 0
1647 0 0 0 0 0 0 0 0 0 6 51 71 61 53 70 99 70 29 16 32 14 4 0 0 0 0 0 0 0 0 0 0 0 0 0
715 0 0 0 0 0 16 0 75 49 105 111 112 145 621 88 61 46 41 20 46 32 13 20 12 2 63 7 2 2 0 0 0 0 0 0
1672 0 0 0 0 0 0 0 0 0 0 0 0 1 0 45 57 36 57 64 73 49 10 0 15 2 0 0 0 0 0 0 0 0 0 0
837 0 0 0 0 0 0 0 0 0 0 0 0 4 14 21 16 15 18 12 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
741 0 0 0 0 0 0 0 13 84 141 138 199 267 296 310 300 306 415 282 308 382 239 138 134 84 77 23 19 15 0 1 0 0 0 0
1595 0 0 0 0 0 0 0 0 4 4 14 8 51 209 252 244 144 163 158 80 22 11 8 13 4 0 0 0 0 0 0 0 0 0 0
513 0 0 0 0 0 0 0 0 0 4 14 37 214 228 85 253 322 271 173 104 47 90 71 14 5 0 11 6 0 0 0 0 0 0 0
1502 0 0 0 0 0 0 0 0 0 0 0 0 0 11 16 7 12 17 17 11 3 0 4 1 0 0 0 0 0 0 0 0 0 0 0
1121 0 0 0 0 0 0 0 0 0 0 16 27 0 20 44 34 27 3 0 18 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1959 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 36 38 49 33 32 28 19 8 8 3 0 0 0 0 0 0 0 0 0
169 0 0 0 0 0 0 0 0 0 0 0 0 0 13 75 107 123 125 118 111 84 27 7 12 0 0 0 0 0 0 0 0 0 0 0
1777 0 0 0 0 0 0 0 0 0 0 1 75 135 81 110 126 181 108 27 8 26 7 0 0 0 0 0 0 0 0 0 0 0 0 0
665 0 0 0 0 0 0 0 0 24 100 28 99 311 289 121 410 395 814 386 179 533 225 121 66 149 53 3 10 0 4 0 0 0 0 0
436 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 124 57 63 171 188 108 66 101 90 43 21 33 18 5 1 2 2 0 0 0
690 0 0 0 0 0 0 0 0 0 0 0 0 20 25 7 25 47 45 27 0 0 14 0 0 0 0 0 0 0 0 0 0 0 0 0
1526 0 0 0 0 0 0 0 0 16 14 11 98 236 496 106 127 182 101 37 8 20 11 20 2 0 0 0 0 0 0 0 0 0 0 0

Let us have an overview of the note counts in these pieces!

If we just looked at the raw counts of the tonal pitch-classes, we could not learn much from them. Arranging them according to a theoretical model (the line of fifths) shows that the notes in a piece usually come from a few adjacent keys (you don't say!).

Random piece

We probably have very long pieces (sonatas) and very short pieces (songs) in the dataset. Since we don't want length (or the absolute number of notes in a piece) to have an effect, we consider tonal pitch-class distributions instead of counts, normalizing each piece to sum to one.

In [38]:
tpc_dists = tpc_counts.div(tpc_counts.sum(axis=1), axis=0)
tpc_dists.sample(20)
Out[38]:
Fbb Cbb Gbb Dbb Abb Ebb Bbb Fb Cb Gb Db Ab Eb Bb F C G D A E B F# C# G# D# A# E# B# F## C## G## D## A## E## B##
1658 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.011364 0.034091 0.034091 0.090909 0.113636 0.090909 0.068182 0.102273 0.056818 0.125000 0.136364 0.045455 0.034091 0.056818 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0
109 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000668 0.016700 0.086172 0.113560 0.111556 0.125585 0.157649 0.156981 0.120240 0.047428 0.022044 0.021376 0.018036 0.002004 0.000000 0.000000 0.000000 0.0 0.0 0.0
353 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.023077 0.117692 0.150769 0.136154 0.137692 0.144615 0.136154 0.104615 0.026154 0.010000 0.013077 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0
137 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.003731 0.016169 0.085821 0.134328 0.125622 0.139303 0.130597 0.141791 0.125622 0.055970 0.019900 0.012438 0.008706 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0
611 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.065393 0.046289 0.072006 0.089640 0.160911 0.231447 0.122704 0.058046 0.071271 0.047024 0.025716 0.005143 0.001470 0.000000 0.002204 0.000000 0.000735 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0
1741 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.003036 0.090081 0.178138 0.176113 0.153846 0.135628 0.122470 0.103239 0.012146 0.018219 0.007085 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0
495 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.022403 0.026477 0.012220 0.075356 0.226069 0.228106 0.107943 0.077393 0.075356 0.109980 0.028513 0.000000 0.010183 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0
248 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.004561 0.102623 0.002281 0.047891 0.139111 0.407070 0.131129 0.026226 0.036488 0.096921 0.005701 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0
1152 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.039175 0.245361 0.158763 0.107216 0.131959 0.173196 0.103093 0.041237 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0
1599 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.016138 0.008646 0.054179 0.134294 0.170605 0.122767 0.088761 0.154467 0.112968 0.064553 0.021326 0.011527 0.014409 0.023055 0.002305 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0
606 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.059524 0.059524 0.061905 0.235714 0.161905 0.147619 0.104762 0.057143 0.054762 0.045238 0.007143 0.004762 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0
731 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.022831 0.091324 0.063927 0.105023 0.082192 0.125571 0.148402 0.136986 0.086758 0.063927 0.027397 0.031963 0.013699 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0
82 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.004723 0.069658 0.269185 0.165289 0.108619 0.077922 0.107438 0.118064 0.036600 0.015348 0.008264 0.017710 0.000000 0.001181 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0
935 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.088496 0.143363 0.176991 0.141593 0.139823 0.162832 0.120354 0.023009 0.000000 0.001770 0.001770 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0
670 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.001664 0.003177 0.004993 0.010743 0.009684 0.032531 0.097140 0.106219 0.079437 0.131185 0.113330 0.165229 0.100620 0.034196 0.041610 0.041459 0.014223 0.007263 0.004388 0.000605 0.000000 0.000303 0.000000 0.000000 0.0 0.0 0.0
564 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.038182 0.098182 0.076364 0.109091 0.121818 0.132727 0.250909 0.083636 0.007273 0.014545 0.052727 0.012727 0.001818 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0
404 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000507 0.001013 0.006079 0.021783 0.069402 0.107396 0.100304 0.110942 0.155015 0.152482 0.112969 0.059777 0.030395 0.039007 0.026342 0.006079 0.000507 0.000000 0.000000 0.000000 0.0 0.0 0.0
341 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.006089 0.012855 0.019621 0.016915 0.092693 0.179296 0.209743 0.098782 0.076455 0.113667 0.098106 0.058187 0.008119 0.004736 0.004736 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0
371 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.028169 0.089924 0.195016 0.173348 0.136511 0.109426 0.140845 0.081257 0.023835 0.014085 0.005417 0.002167 0.000000 0.0 0.0 0.0
53 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000898 0.007181 0.026032 0.073609 0.101436 0.095153 0.147217 0.164273 0.151706 0.114004 0.033214 0.033214 0.038600 0.012567 0.000898 0.0 0.0 0.0
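To see why normalization removes the effect of length, here is a small sketch with two made-up pieces (not taken from TP3C) that use the same notes in the same proportions but differ greatly in length. The names `counts` and `dists` are chosen for illustration only.

```python
import pandas as pd

# two made-up pieces: a long one and a short one with the same proportions
counts = pd.DataFrame({"C": [200, 2], "G": [100, 1], "D": [100, 1]})

# divide every row by its row sum so that each piece sums to one
dists = counts.div(counts.sum(axis=1), axis=0)

print(dists)
print(dists.sum(axis=1).tolist())  # [1.0, 1.0]
```

After normalization both rows are identical, so the two pieces occupy the same point in the space of distributions.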

For further numerical analysis, we extract the data from this table and assign it to a variable X.

In [39]:
# extract values of table to matrix
X = tpc_dists.values

X.shape # shows (#rows, #columns) of X
Out[39]:
(2012, 35)

Now, X is a 2012 $\times$ 35 matrix where the rows represent the pieces and the columns (also called "features" or "dimensions") represent the relative frequency of tonal pitch-classes.

Thinking in 35 dimensions is quite difficult for most people. Without trying to imagine what this would look like, what can we already say about this data?

Since each piece is a point in this 35-D space and pieces are represented as vectors, pieces that have similar tonal pitch-class distributions must be close in this space (whatever this looks like).
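A minimal sketch of what "close in this space" means, using made-up distributions over only three tonal pitch-classes instead of 35. The Euclidean distance is one possible choice of distance, assumed here for illustration.

```python
import numpy as np

# made-up distributions over three tonal pitch-classes (C, G, D)
piece_a = np.array([0.5, 0.3, 0.2])
piece_b = np.array([0.5, 0.2, 0.3])   # similar to piece_a
piece_c = np.array([0.0, 0.1, 0.9])   # very different from piece_a

def distance(p, q):
    """Euclidean distance between two distributions."""
    return np.linalg.norm(p - q)

print(distance(piece_a, piece_b))  # small
print(distance(piece_a, piece_c))  # large
```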

Which groups of pieces cluster together? Maybe pieces by the same composer are similar to each other? Maybe pieces from a similar time? Maybe pieces for the same instruments?

If we find clusters, these would still be in 35-D and thus difficult to interpret. Luckily, there is a range of so-called dimensionality reduction methods that transform the data into lower-dimensional spaces so that we can actually look at them.

A very common dimensionality reduction method is Principal Component Analysis (PCA).

The basic idea of PCA is:

  • find directions in the data along which the variance is maximal
  • these directions have to be orthogonal to each other (the resulting components are uncorrelated)
  • these dimensions are called the principal components
  • each principal component is associated with how much of the data variance it explains
In [40]:
import numpy as np # for numerical computations
from sklearn.decomposition import PCA # for dimensionality reduction

pca = PCA(n_components=35) # initialize PCA with 35 dimensions
pca.fit(X) # fit it to the data
variance = pca.explained_variance_ratio_ # share of variance explained by each component
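The basic idea of PCA can be sketched by hand with a singular value decomposition. This is an illustration on synthetic data (the array `toy` is made up and is not the TP3C matrix), and it is not meant to reproduce every detail of the scikit-learn implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic data: 200 points in 3-D that vary mostly along one direction
direction = np.array([[2.0, 1.0, 0.5]])
toy = rng.normal(size=(200, 1)) @ direction + 0.1 * rng.normal(size=(200, 3))

centered = toy - toy.mean(axis=0)            # PCA works on mean-centered data
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

components = Vt                              # rows are the principal components
explained = S**2 / np.sum(S**2)              # share of variance per component

print(np.round(explained, 3))
print(np.round(components @ components.T, 6))  # close to the identity matrix
```

The product of the components with their transpose is (up to rounding) the identity matrix, which is the orthogonality requirement. The explained shares are sorted from largest to smallest and sum to one.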

Explained variance

In [47]:
variance[:5]
Out[47]:
array([0.41144591, 0.23410347, 0.09063507, 0.07574242, 0.04436989])

The first principal component explains 41.1% of the variance of the data, the second explains 23.4%, and the third 9.1%. Together, this amounts to 73.6%.

Almost three quarters of the variance in the dataset is retained when we reduce the dimensionality from 35 to 3 (8.6% of the original dimensions)! If we reduce the data to two dimensions, we can still explain $\approx$ 65% of the variance.

This is great because it means that we can look at the data in 2 or 3 dimensions without losing too much information.
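The percentages can be checked with a cumulative sum. The values below are copied from the output of `variance[:5]` above. The variable name `cumulative` is chosen for illustration.

```python
import numpy as np

# explained variance ratios of the first five components (copied from the output above)
variance = np.array([0.41144591, 0.23410347, 0.09063507, 0.07574242, 0.04436989])

cumulative = np.cumsum(variance)       # variance explained by the first k components
print(np.round(cumulative[:3], 3))     # [0.411 0.646 0.736]
print(round(3 / 35, 3))                # 0.086, share of the original dimensions
```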

Recovering the line of fifths from data

In [48]:
pca3d = PCA(n_components=3)
pca3d.fit(X)

X_ = pca3d.transform(X)
X_.shape
Out[48]:
(2012, 3)

3D Scatterplot

Each piece in this plot is represented by a point in 3-D space. Remember, however, that this location retains only about 74% of the variance of the full tonal pitch-class distribution. In 35-D space each dimension corresponded to the relative frequency of a tonal pitch-class in a piece.

  • What do these three dimensions signify?
  • How can we interpret them?

Fortunately, we can inspect them individually and try to interpret what we see.

Principal Components

Clearly, looking at two principal components at a time shows that there is some latent structure in the data. How can we understand it better?

One way to see whether the pieces cluster together systematically is to color them according to some criterion.

As always, many different options are available. For the present purpose we will use the simplest summary of a piece: its most frequent note (in statistical terms, the mode of its pitch-class distribution), which we call its tonal center.

This also allows us to map the tonal pitch-classes on the line of fifths to colors.

Line of fifths coloring

In [51]:
tpc_dists["tonal_center"] = tpc_dists.apply(lambda piece: np.argmax(piece[lof].values) - 15, axis=1)
tpc_dists.sample(10)
Out[51]:
Fbb Cbb Gbb Dbb Abb Ebb Bbb Fb Cb Gb Db Ab Eb Bb F C G D A E B F# C# G# D# A# E# B# F## C## G## D## A## E## B## tonal_center
319 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.049020 0.105392 0.193627 0.088235 0.128676 0.155637 0.172794 0.067402 0.000000 0.006127 0.029412 0.003676 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.00000 0.0 0.0 -2
959 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.029730 0.113514 0.129730 0.118919 0.189189 0.205405 0.140541 0.062162 0.000000 0.008108 0.002703 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.00000 0.0 0.0 3
213 0.0 0.0 0.0 0.0 0.0 0.000536 0.014464 0.002321 0.010536 0.087321 0.116250 0.044643 0.088750 0.110714 0.115536 0.097857 0.031429 0.023929 0.072500 0.067143 0.036786 0.014464 0.006250 0.023929 0.009643 0.010357 0.005536 0.001964 0.005000 0.000000 0.002143 0.0 0.00000 0.0 0.0 -5
914 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.004326 0.004851 0.018747 0.012192 0.021893 0.044442 0.029759 0.039984 0.064761 0.072889 0.070661 0.072496 0.093078 0.114709 0.103828 0.062533 0.057682 0.054536 0.031463 0.012585 0.005768 0.004457 0.002360 0.0 0.00000 0.0 0.0 6
127 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.004911 0.012500 0.039732 0.095089 0.156250 0.115625 0.119196 0.145536 0.153125 0.076339 0.022768 0.011607 0.036607 0.009821 0.000446 0.000446 0.000000 0.000000 0.000000 0.0 0.00000 0.0 0.0 1
711 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.005298 0.066225 0.127152 0.152318 0.050331 0.107285 0.182781 0.147020 0.068874 0.014570 0.014570 0.049007 0.010596 0.000000 0.001325 0.000000 0.000000 0.002649 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.00000 0.0 0.0 -1
879 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.135371 0.139738 0.135371 0.165939 0.170306 0.135371 0.117904 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.00000 0.0 0.0 3
1020 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.016349 0.138965 0.070845 0.076294 0.237057 0.209809 0.147139 0.057221 0.002725 0.035422 0.008174 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.00000 0.0 0.0 0
858 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.008929 0.125000 0.178571 0.133929 0.125000 0.178571 0.125000 0.125000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.00000 0.0 0.0 0
755 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000960 0.000960 0.001440 0.006721 0.002880 0.006721 0.016323 0.033125 0.033125 0.033605 0.053289 0.088814 0.149784 0.071051 0.097456 0.157945 0.063370 0.087854 0.018243 0.019203 0.047048 0.009602 0.000000 0.0 0.00048 0.0 0.0 8
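A small sketch of how to read the tonal_center column: the value is the position of the most frequent tonal pitch-class relative to C, which sits at index 15 of `lof`. The distribution below is made up for illustration.

```python
import numpy as np

note_names = list("FCGDAEB")
accidentals = ["bb", "b", "", "#", "##"]
lof = [n + a for a in accidentals for n in note_names]

# made-up distribution: only Bb, F and C occur, with F the most frequent
dist = np.zeros(len(lof))
dist[lof.index("Bb")] = 0.3
dist[lof.index("F")] = 0.5
dist[lof.index("C")] = 0.2

center = int(np.argmax(dist)) - 15     # same offset as above: C is at index 15
print(center, lof[center + 15])        # -1 F
```

Negative values thus lie on the flat side of C and positive values on the sharp side.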

Dimension pairs

Dimension pairs colored Line of Fifths

Historical development of tonality

The line of fifths is an important underlying structure for pitch-class distributions in tonal compositions.

But we have treated all pieces in our dataset as synchronic and have not yet taken their historical location into account.

Remember the tonal pitch-class distribution of an example piece above?

Random piece

Let's assume the pitch-class content of a piece spreads on the line of fifths from F to A$\sharp$.

Line of fifths

This means that its range on the line of fifths is $10 - (-1) = 11$. The piece spans eleven consecutive fifths on the lof, that is, twelve tonal pitch-classes.
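The same arithmetic, written out for this one example with list indices instead of positions relative to C (the difference is the same either way). The variable names are chosen for illustration.

```python
note_names = list("FCGDAEB")
accidentals = ["bb", "b", "", "#", "##"]
lof = [n + a for a in accidentals for n in note_names]

lowest, highest = lof.index("F"), lof.index("A#")  # outermost pitch-classes of the example
segment = lof[lowest:highest + 1]

print(segment)
print(highest - lowest)   # 11 fifths
print(len(segment))       # 12 tonal pitch-classes
```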

We can generalize this calculation and write a function that calculates the range for each piece in the dataset.

In [96]:
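def lof_range(piece):
    """Distance between the outermost occurring pitch-classes on the line of fifths."""
    used = [i for i, v in enumerate(piece) if v != 0] # positions with nonzero counts
    return max(used) - min(used)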
In [114]:
data["lof_range"] = data.loc[:, lof].apply(lof_range, axis=1) # create a new column
data.sample(20)
Out[114]:
composer composer_first work_group work_catalogue opus no mov title composition publication source display_year Fbb Cbb Gbb Dbb Abb Ebb Bbb Fb Cb Gb Db Ab Eb Bb F C G D A E B F# C# G# D# A# E# B# F## C## G## D## A## E## B## lof_range
1022 Victoria TomasLuisde NaN NaN NaN NaN NaN O vos omnes NaN 1585.0 ELVIS 1585.0 0 0 0 0 0 0 0 0 0 9 1 0 8 25 19 25 51 67 35 9 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11
51 Bach Johann Sebastian Wohltemperiertes Klavier II BWV 877 1 NaN NaN 1740.0 NaN MS 1740.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 18 78 120 96 122 132 156 119 47 24 40 26 2 0 0 0 13
941 Mahler Gustav Kindertotenlieder NaN NaN 4 NaN Oft denk' ich sie sind nur ausgegangen NaN 1904.0 OSLC 1904.0 0 0 0 0 0 12 0 5 53 75 66 96 106 245 65 54 51 54 30 12 4 22 1 0 0 0 0 0 0 0 0 0 0 0 0 17
261 Koželuh Leopold Piano Sonata Op. 38 3 2 Allegretto 1793.0 NaN DB 1793.0 0 0 0 0 0 0 0 0 0 4 88 192 69 178 389 392 232 123 103 114 42 14 8 3 2 0 0 0 0 0 0 0 0 0 0 15
1335 Schumann Robert Liederkreis Op. 39 7 NaN Auf einer Burg 1840.0 1842.0 OSLC 1840.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 26 82 51 29 52 102 54 16 0 13 4 0 0 0 0 0 0 0 0 0 0 10
1286 Schubert Franz Winterreise D. 911 89 7 NaN Auf dem Flusse NaN 1827.0 OSLC 1827.0 0 0 0 0 0 0 0 0 0 0 0 0 1 5 0 40 88 44 111 173 287 178 70 84 142 91 15 0 7 3 0 0 0 0 0 17
1714 Grieg Edvard Lyrical Pieces Op. 12 8 NaN NaN 1866.0 1867.0 DCML 1866.0 0 0 0 0 0 0 0 0 4 0 10 27 66 65 41 26 63 29 7 6 4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 13
401 Bach Johann Sebastian Wohltemperiertes Klavier I BWV 868 2 NaN NaN 1722.0 NaN MS 1722.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 21 99 136 135 143 115 135 90 23 8 2 0 0 0 0 0 0 11
857 Rue Pierre de la Ave Sanctissima Maria NaN NaN NaN NaN Cruxifixus 1485.0 NaN ELVIS 1485.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 19 29 22 19 23 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6
2009 Joplin Scott Ragtimes NaN NaN NaN NaN Wall Street Rag NaN 1909.0 CCARH 1909.0 0 0 0 0 0 0 0 0 0 0 6 9 20 72 188 303 151 154 232 173 42 73 16 19 37 0 2 0 0 0 0 0 0 0 0 16
1869 Corelli Arcangelo 12 concerti grossi Op. 6 10 4.0 NaN NaN 1714.0 CCARH 1714.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 79 174 206 167 147 151 141 34 0 12 0 0 0 0 0 0 0 0 0 0 0 9
1520 Alkan Charles Valentin Un Morceau Caractéristique Op. 74 2 NaN NaN 1840.0 NaN MS 1840.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 154 140 38 90 285 474 274 62 78 226 88 1 8 0 0 0 0 0 0 0 12
741 Granados Enrique Goyescas Op. 11 5 NaN El Amor y la muerte (balada) 1909.0 1912.0 MS 1909.0 0 0 0 0 0 0 0 13 84 141 138 199 267 296 310 300 306 415 282 308 382 239 138 134 84 77 23 19 15 0 1 0 0 0 0 23
462 Bach Johann Sebastian Inventions and Sinfonias BWV 773 NaN NaN NaN NaN 1723.0 MS 1723.0 0 0 0 0 0 0 0 0 0 0 0 41 93 80 90 102 92 106 47 2 14 9 0 0 0 0 0 0 0 0 0 0 0 0 0 10
1304 Schumann Robert Dichterliebe Op. 48 12 NaN Am leuchtenden Sommermorgen 1840.0 NaN OSLC 1840.0 0 0 0 0 0 0 0 0 2 12 6 3 27 113 96 49 42 58 37 13 23 11 10 0 1 3 0 0 0 0 0 0 0 0 0 17
1689 Corelli Arcangelo 12 Trio Sonatas Op. 1 8 3.0 Largo NaN 1681.0 CCARH 1681.0 0 0 0 0 0 0 0 0 0 0 0 39 55 44 39 88 90 66 7 0 13 3 0 0 0 0 0 0 0 0 0 0 0 0 0 10
1320 Schumann Robert Frauenliebe und Leben Op. 42 3 NaN Ich kann's nicht fassen, nicht glauben 1840.0 NaN OSLC 1840.0 0 0 0 0 0 0 0 0 3 2 5 49 74 54 62 129 108 57 26 10 16 17 1 0 0 0 0 0 0 0 0 0 0 0 0 14
983 Dufay Guillaume Missa l'homme armé NaN NaN NaN NaN Kyrie 1474.0 NaN ELVIS 1474.0 0 0 0 0 0 0 0 0 0 0 0 0 0 29 35 26 65 54 41 18 8 1 0 0 0 0 0 0 0 0 0 0 0 0 0 8
1525 Alkan Charles Valentin Un Morceau Caractéristique Op. 74 7 NaN NaN 1840.0 NaN MS 1840.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 3 22 66 210 221 76 81 226 144 20 21 46 18 4 0 0 0 0 0 0 14
1427 Scriabin Alexander NaN Op. 9 NaN NaN Nocturne for the left hand 1894.0 NaN MS 1894.0 0 1 0 0 0 0 61 17 33 117 215 248 110 99 151 94 49 83 127 79 48 99 95 99 38 5 11 36 10 0 0 0 0 0 0 27

This allows us now to take the display_year (composition or publication) and lof_range (range on the line of fifths) features to observe historical changes.

Historical scatterplot

We could try to fit a line to this data to see whether there is a trend (kinda obvious here).
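A minimal sketch of such a straight-line fit with `np.polyfit`. The data are synthetic stand-ins for (display_year, lof_range), generated with a known slope, and are not the TP3C values.

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic stand-in for (display_year, lof_range); NOT the TP3C data
years = rng.uniform(1400, 1950, size=300)
ranges = 6 + 0.02 * (years - 1400) + rng.normal(scale=2.0, size=300)

slope, intercept = np.polyfit(years, ranges, deg=1)  # least-squares straight line
print(f"estimated change per century: {100 * slope:.2f} fifths")
```

With the real data one would pass `data["display_year"]` and `data["lof_range"]` instead, after dropping rows with missing years.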

Line Scatter

But actually, this is not the best idea. Why should any historical process be linear? More complex models might make more sense.

A more versatile technique is Locally Weighted Scatterplot Smoothing (LOWESS), which fits a low-degree polynomial locally around each point. With this method, the trend turns out to be non-linear.
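To make the idea of local fitting concrete, here is a simplified sketch: for each point, a straight line is fitted to its nearest neighbours with tricube weights. The function `lowess_sketch` is hypothetical, omits the robustness iterations of full LOWESS, and is run on synthetic data. In practice one would rather use a library implementation such as `lowess` from `statsmodels`.

```python
import numpy as np

def lowess_sketch(x, y, frac=0.2):
    """Locally weighted linear regression with tricube weights (no robustness steps)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    k = max(2, int(np.ceil(frac * len(x))))          # neighbours used per local fit
    fitted = np.empty(len(x))
    for i, x0 in enumerate(x):
        dist = np.abs(x - x0)
        h = max(np.sort(dist)[k - 1], 1e-12)         # bandwidth: distance to k-th nearest point
        w = np.clip(1 - (dist / h) ** 3, 0, 1) ** 3  # tricube weights, zero outside the window
        # weighted least squares for a local line y = a + b * (x - x0)
        A = np.column_stack([np.ones_like(x), x - x0])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]                          # value of the local line at x0
    return fitted

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, size=200))            # synthetic data, not TP3C
y = np.sin(x) + rng.normal(scale=0.3, size=200)
smooth = lowess_sketch(x, y)
```

The parameter `frac` controls how local the fit is: small values follow the data closely, large values approach a single straight line.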

Scatter Lowess

If there is time: some more advanced stuff

Using bootstrap sampling, we obtain an estimate of the local variance of the data and thus of the diversity in the note usage of the musical pieces.
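The resampling idea behind the bootstrap can be sketched as follows. The data are synthetic, and how the resamples were combined with the smoothed curve in the workshop is not shown here, so this only illustrates the principle for a single summary statistic (the mean range).

```python
import numpy as np

rng = np.random.default_rng(3)
# synthetic line-of-fifths ranges for pieces from one time window (not TP3C data)
ranges = rng.integers(5, 20, size=150)

n_boot = 1000
boot_means = np.empty(n_boot)
for b in range(n_boot):
    resample = rng.choice(ranges, size=len(ranges), replace=True)  # draw with replacement
    boot_means[b] = resample.mean()

low, high = np.percentile(boot_means, [2.5, 97.5])  # 95% percentile interval
print(f"mean range: {ranges.mean():.2f}, 95% interval: [{low:.2f}, {high:.2f}]")
```

The width of the interval reflects how much the estimate would vary if we had sampled a different set of pieces.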

Final Result

We can also distinguish three regions in terms of line-of-fifths range: diatonic, chromatic, and enharmonic.

Grouping the data together in these three regions, we see a clear change from diatonic and chromatic to chromatic and enharmonic pieces over the course of history.
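A sketch of such a grouping. The function name `lof_region` is hypothetical, and the two thresholds are assumptions chosen on music-theoretical grounds (a diatonic scale spans 6 fifths, and the 12 chromatic notes without enharmonic duplicates span 11). The thresholds actually used for the figure are not stated here.

```python
def lof_region(lof_range, diatonic_max=6, chromatic_max=11):
    """Label a line-of-fifths range; both thresholds are assumptions for illustration."""
    if lof_range <= diatonic_max:
        return "diatonic"     # at most 7 adjacent pitch-classes, e.g. F to B
    if lof_range <= chromatic_max:
        return "chromatic"    # at most 12 adjacent pitch-classes
    return "enharmonic"       # 13 or more: enharmonically equivalent notes co-occur

print([lof_region(r) for r in (6, 11, 23)])  # ['diatonic', 'chromatic', 'enharmonic']
```

Applied to the table, this could look like `data["lof_range"].apply(lof_region)`, followed by counting the labels per time period.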

Epochs

  • Renaissance: largest diatonic proportion overall but mostly chromatic
  • Baroque: almost completely chromatic
  • Classical: enharmonic proportion increases -> more distant modulations
  • This trend continues through the Romantic era

Summary

1. We have analyzed a very specific aspect of Western classical music.

2. We have used a large(-ish) corpus to answer our research question.

3. We have operationalized musical pieces as vectors that represent distributions of tonal pitch-classes.

4. We have used the dimensionality-reduction technique Principal Component Analysis (PCA) in order to visually inspect the distribution of the data in 2 and 3 dimensions.

5. We have used music-theoretical domain knowledge to find meaningful structure in this space.

6. We have seen that pieces are largely distributed along the line of fifths.

7. We have used Locally Weighted Scatterplot Smoothing (LOWESS) to estimate the trend of this historical process and bootstrap sampling to estimate its variance.

8. We have seen that, historically, composers explore ever larger regions on this line and that the variance also increases.

Conclusion

  1. Data-driven approaches to music analysis offer new ways of studying music history.
  2. One of the largest obstacles is the lack of appropriate data (maybe you could help improve the situation?)
  3. It is difficult to operationalize/formalize musical concepts.
  4. Good news: there is a lot to be done for Master/PhD students!

The end

  • Thank you very much for participating in this workshop
  • I would appreciate it if you would send me some feedback (mail: fabian.moss@epfl.ch; Twitter: @fabianmoss)
  • Please get in touch if you are interested in working on a small project
  • Special thanks to Diana Kayser for the organization and for making everything possible!!!
  • My funding: École Polytechnique Fédérale de Lausanne (EPFL) and Swiss National Science Foundation (SNSF)

Have a great SysMus 2020!

Have fun!