Midi DF is a Python module for converting midi data into DataFrames containing information on each chord in the midi file. DataFrames can be exported as JSON to facilitate use in external applications. Chords are stored as lists of midi pitches. Further reading on "dyads" and "interval classes", as well as the visualization technique that inspired this project, can be found here.
Examples using exported JSON data:

- After Effects (via AE's JavaScript-based Expressions system)
- Processing (p5.js), realtime and interactive
Required libraries are pandas (0.20.3), numpy (1.13.3), and midi (0.2.3).
In short, the pipeline is `midi.read_midifile()` → [`tempo_df()` →] `midi_to_df()` → `condense_df()` → [`merge_tracks()` →] `output_json()` (bracketed steps optional).
```python
import json
import pandas as pd
import numpy as np
import midi
import copy

import midi_df as mdf

kiev = midi.read_midifile("kiev.mid")
kiev_s = mdf.track_summary(kiev)
kiev_s
```
| | Track Name | Channel | Note Events | Instrument ID |
|---|---|---|---|---|
| 0 | The Great Gate of Kiev | 0 | tempo track | {} |
| 1 | Staff | 0 | 2790 | {0} |
| 2 | Staff-1 | 1 | 2032 | {0} |
We can see that Track 0 is the tempo track, and that Tracks 1 & 2 carry note data on channels 0 and 1, both with instrument 0 (Acoustic Grand Piano in the GM standard). Since Track 0 is the default, we don't need to specify it in the parameters for `tempo_df()`.
```python
kiev_tempo = mdf.tempo_df(kiev)
kiev_tempo.head(10)
```
| | Tick | uSec/tick | Time (uSec) | Time (sec) |
|---|---|---|---|---|
| 0 | 0 | 2403.843750 | 0 | 0.000000 |
| 1 | 16128 | 2604.166667 | 38769192 | 38.769192 |
| 2 | 22272 | 2403.843750 | 54769192 | 54.769192 |
| 3 | 35328 | 2314.812500 | 86153776 | 86.153776 |
| 4 | 46950 | 2332.088542 | 113056527 | 113.056527 |
| 5 | 47052 | 2349.619792 | 113294400 | 113.294400 |
| 6 | 47155 | 2367.421875 | 113536411 | 113.536411 |
| 7 | 47257 | 2385.494792 | 113777888 | 113.777888 |
| 8 | 47360 | 2403.843750 | 114023594 | 114.023594 |
| 9 | 47462 | 2422.479167 | 114268786 | 114.268786 |
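Each row marks a tempo change, with the cumulative `Time` columns anchoring it on the absolute timeline. As a rough illustration of how such a map converts ticks to seconds (a hypothetical helper, not part of the module; `midi_to_df()` presumably performs an equivalent lookup internally):

```python
def ticks_to_seconds(tick, tempo_df):
    """Find the last tempo change at or before the tick, then add the
    remaining ticks at that tempo's microseconds-per-tick rate."""
    row = tempo_df[tempo_df["Tick"] <= tick].iloc[-1]
    usec = row["Time (uSec)"] + (tick - row["Tick"]) * row["uSec/tick"]
    return usec / 1e6

ticks_to_seconds(720, kiev_tempo)  # ~1.730767 s, matching tick 720 below
```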
```python
kiev_df1 = mdf.midi_to_df(kiev, 1, kiev_tempo)
kiev_df1.head(15)
```
| | dTicks | Tick | Pitch | On/Off | Playing | Time (s) | Beat |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | NaN | None | [] | 0.000000 | 0.00 |
| 1 | 0 | 0 | NaN | None | [] | 0.000000 | 0.00 |
| 2 | 0 | 0 | NaN | None | [] | 0.000000 | 0.00 |
| 3 | 0 | 0 | NaN | None | [] | 0.000000 | 0.00 |
| 4 | 0 | 0 | NaN | None | [] | 0.000000 | 0.00 |
| 5 | 0 | 0 | 63.0 | on | [63] | 0.000000 | 0.00 |
| 6 | 0 | 0 | 67.0 | on | [63, 67] | 0.000000 | 0.00 |
| 7 | 0 | 0 | 70.0 | on | [63, 67, 70] | 0.000000 | 0.00 |
| 8 | 0 | 0 | 75.0 | on | [63, 67, 70, 75] | 0.000000 | 0.00 |
| 9 | 720 | 720 | 63.0 | off | [67, 70, 75] | 1.730767 | 3.75 |
| 10 | 0 | 720 | 67.0 | off | [70, 75] | 1.730767 | 3.75 |
| 11 | 0 | 720 | 70.0 | off | [75] | 1.730767 | 3.75 |
| 12 | 0 | 720 | 75.0 | off | [] | 1.730767 | 3.75 |
| 13 | 48 | 768 | 65.0 | on | [65] | 1.846152 | 4.00 |
| 14 | 0 | 768 | 70.0 | on | [65, 70] | 1.846152 | 4.00 |
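`Playing` is a running list of the pitches held after each event, and `Beat` is the absolute tick divided by the file's resolution (192 ticks per quarter note here, since tick 768 lands on beat 4.00). A minimal sketch of that bookkeeping, using hypothetical names:

```python
def playing_states(events, resolution=192):
    """Replay (tick, pitch, on_off) events, returning (beat, playing)
    after each one -- an illustration, not midi_df's implementation."""
    playing, states = [], []
    for tick, pitch, on_off in events:
        if on_off == "on":
            playing.append(pitch)
        else:
            playing.remove(pitch)
        states.append((tick / resolution, list(playing)))
    return states

events = [(0, p, "on") for p in (63, 67, 70, 75)] + \
         [(720, p, "off") for p in (63, 67, 70, 75)]
playing_states(events)[3]  # (0.0, [63, 67, 70, 75]) -- the opening chord
```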
```python
kiev_c1 = mdf.condense_df(kiev_df1)
kiev_c1.head()
```
| | Beat | Time (s) | Playing |
|---|---|---|---|
| 8 | 0.00 | 0.000000 | [63, 67, 70, 75] |
| 12 | 3.75 | 1.730767 | [] |
| 16 | 4.00 | 1.846152 | [65, 70, 74, 77] |
| 20 | 7.75 | 3.576919 | [] |
| 24 | 8.00 | 3.692304 | [67, 70, 75, 79] |
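`condense_df()` reduces the event-level DataFrame to one row per chord change; the surviving index values (8, 12, 16, ...) are the last event row at each tick. Under that assumption, a rough equivalent (the real function may also drop meta-event rows or repeated states) is:

```python
def condense(df):
    # Keep only the last event at each tick: once all simultaneous
    # note-ons/offs are applied, 'Playing' holds the complete chord.
    out = df.drop_duplicates(subset="Tick", keep="last")
    return out[["Beat", "Time (s)", "Playing"]]
```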
```python
kiev_c2 = mdf.condense_df(mdf.midi_to_df(kiev, 2, kiev_tempo))
kiev_c2.head()
```
| | Beat | Time (s) | Playing |
|---|---|---|---|
| 8 | 0.00 | 0.000000 | [43, 46, 51, 55] |
| 12 | 3.75 | 1.730767 | [] |
| 16 | 4.00 | 1.846152 | [41, 46, 50, 53] |
| 20 | 7.75 | 3.576919 | [] |
| 24 | 8.00 | 3.692304 | [39, 43, 46, 51] |
```python
kiev_m = mdf.merge_tracks([kiev_c1, kiev_c2], [1, 2])
kiev_m.head()
```
| | Beat | Time (s) | Playing 1 | Playing 2 | Playing |
|---|---|---|---|---|---|
| 0 | 0.00 | 0.000000 | [63, 67, 70, 75] | [43, 46, 51, 55] | [67, 70, 43, 75, 46, 51, 55, 63] |
| 1 | 3.75 | 1.730767 | [] | [] | [] |
| 2 | 4.00 | 1.846152 | [65, 70, 74, 77] | [41, 46, 50, 53] | [65, 70, 41, 74, 77, 46, 50, 53] |
| 3 | 7.75 | 3.576919 | [] | [] | [] |
| 4 | 8.00 | 3.692304 | [67, 70, 75, 79] | [39, 43, 46, 51] | [67, 70, 39, 75, 43, 46, 79, 51] |
The same merge can be produced in a single chained call:

```python
mdf.merge_tracks([mdf.condense_df(mdf.midi_to_df(kiev, t, kiev_tempo))
                  for t in [1, 2]]).head()
```
| | Beat | Time (s) | Playing 1 | Playing 2 | Playing |
|---|---|---|---|---|---|
| 0 | 0.00 | 0.000000 | [63, 67, 70, 75] | [43, 46, 51, 55] | [67, 70, 43, 75, 46, 51, 55, 63] |
| 1 | 3.75 | 1.730767 | [] | [] | [] |
| 2 | 4.00 | 1.846152 | [65, 70, 74, 77] | [41, 46, 50, 53] | [65, 70, 41, 74, 77, 46, 50, 53] |
| 3 | 7.75 | 3.576919 | [] | [] | [] |
| 4 | 8.00 | 3.692304 | [67, 70, 75, 79] | [39, 43, 46, 51] | [67, 70, 39, 75, 43, 46, 79, 51] |
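The merged `Playing` column combines the per-track lists (the ordering suggests a set union). Assuming both condensed frames change on the same beats, a rough equivalent of the merge (the real `merge_tracks()` presumably also handles mismatched rows) is:

```python
merged = kiev_c1.merge(kiev_c2, on=["Beat", "Time (s)"], suffixes=(" 1", " 2"))
merged["Playing"] = [list(set(a) | set(b))
                     for a, b in zip(merged["Playing 1"], merged["Playing 2"])]
```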
```python
mdf.output_json(kiev_m, "kiev.json")
```
The resulting JSON file can be found here.
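For a quick sanity check, the export can also be read back in Python (the exact schema depends on how `output_json()` serializes the DataFrame):

```python
import json

with open("kiev.json") as f:
    data = json.load(f)
```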
print("Maximum chord size:",mdf.max_notes(kiev_m))
print("Dyads in maximum chord:", mdf.max_dyads(kiev_m))
Maximum chord size: 11
Dyads in maximum chord: 55.0
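(A chord of *n* notes contains n(n−1)/2 dyads, one per pair of notes, so the 11-note maximum yields 11 · 10 / 2 = 55.)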
```python
mdf.max_dyad_counts(kiev_m)
```

```
0       14
1        9
2       12
3       15
4       16
5       15
6        4
sum     55
dtype: int64
```
Note that "sum" refers to the maximum of the sums of dyads (not the sum of the maxima), equal to the output of max_dyads()
.
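The row labels 0–6 are interval classes: the distance between two pitches reduced mod 12 and folded, so that, e.g., a perfect fifth (7 semitones) and a perfect fourth (5) both count as class 5. A minimal sketch of counting a single chord's dyads by interval class (illustrative only; per the note above, `max_dyad_counts()` then takes the per-class maximum over all chords):

```python
from itertools import combinations

def interval_class_counts(chord):
    """Count a chord's dyads by interval class 0-6."""
    counts = [0] * 7
    for a, b in combinations(chord, 2):
        d = abs(a - b) % 12
        counts[min(d, 12 - d)] += 1
    return counts

interval_class_counts([63, 67, 70, 75])  # [1, 0, 0, 1, 2, 2, 0]
```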
```python
mdf.track_summary(fake_sample_file, recommend=True)
```

```
Recommended tracks: [1,2,3,4,6]
```
| | Track Name | Channel | Note Events | Instrument ID |
|---|---|---|---|---|
| 0 | A Song | 0 | tempo track | {} |
| 1 | Guitar 1 | 0 | 3234 | {25} |
| 2 | Guitar 2 | 1 | 1847 | {26, 29} |
| 3 | Keyboard - Square Wave | 2 | 376 | {80} |
| 4 | Organ | 3 | 1429 | {18} |
| 5 | Drum Kit | 9 | 2308 | {0} |
| 6 | Flute Solo | 10 | 145 | {73} |
We can see that the author of this midi file has named their tracks by instrument. From here we can decide which set of tracks to use in our DataFrame: if we only want the guitars, we would use Tracks 1 and 2; if we want the whole band, we can simply copy the "Recommended tracks" list. Track 5 is a percussion track (midi channel 9, which the GM standard reserves for drums), so we can assume it carries no useful melodic information.
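From there the pipeline is the same as in the Kiev example; for instance, to build and export a merged DataFrame from the recommended melodic tracks of this hypothetical file (the output filename is illustrative):

```python
tempo = mdf.tempo_df(fake_sample_file)
band = mdf.merge_tracks(
    [mdf.condense_df(mdf.midi_to_df(fake_sample_file, t, tempo))
     for t in [1, 2, 3, 4, 6]],
    [1, 2, 3, 4, 6])
mdf.output_json(band, "a_song.json")
```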