tldr: a ready-to-use pipeline is demucs + Basic Pitch + omnizart drum + omnizart chord + All-in-One (+ Sheet Sage chords). Or buy a RipX license, it's worth it.
Suppose you have a wav/mp3 of Western music and you want to produce a midi.
Well, take RipX DeepRemix and you'll get a certain quality.
Questions:
- Does it use spleeter? Is demucs better?
- How does it draw precise Melodyne-like pitch contours? I see them in a demo of Basic Pitch, but they're lost on midi export?
- How does it estimate midi notes?
- Does it extract midi notes first and then Q-filter a certain part over them for playback? Or how else does it split a part into harmonics/notes in a polyphonic texture?
The rest of this doc focuses on how to build your own RipX from existing tools.
First, split the track into parts (stems). For that, there's Demucs v4 (2022).
Next, you want beats and downbeats (measures) as millisecond timestamps and, ideally, a verse/chorus/bridge form annotation. Since July 2023 you can get all of that from All-in-One (2023) 🤗.
There's also a nice visualizer:
Other:
- WaveBeat
- Beat Transformer (2022)
- omnizart beat - requires midi input, and the "beat" module currently has issues running
- Spotify Track's Audio Analysis API - gives bars/beats, sections, key
Questions:
- does All-in-One improve on madmom significantly?
- is Spotify's API any good?
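Whichever beat tracker you pick, you end up with two lists of timestamps: beats and downbeats. A minimal sketch of turning those into bar/beat positions, assuming both lists are sorted and in seconds (the function name and the `tol` jitter tolerance are made up for illustration, not part of any of the tools above):

```python
from bisect import bisect_right

def beats_to_bar_positions(beats, downbeats, tol=0.02):
    """Label each beat time with (bar_number, beat_in_bar), both 1-based.

    Beats before the first downbeat (a pickup) get (0, None).
    `tol` absorbs small timing differences between the two lists.
    """
    positions = []
    for t in beats:
        # Number of downbeats at or before t (within tolerance) = bar number.
        bar = bisect_right(downbeats, t + tol)
        if bar == 0:
            positions.append((0, None))
            continue
        bar_start = downbeats[bar - 1]
        # Count the beats from the start of this bar up to and including t.
        beat_in_bar = sum(1 for b in beats if bar_start - tol <= b <= t)
        positions.append((bar, beat_in_bar))
    return positions

beats = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
downbeats = [1.0, 3.0]
positions = beats_to_bar_positions(beats, downbeats)
```

Here the first beat is a pickup, the next four fill bar 1, and the last one opens bar 2. The same bar numbers can later be used to place form annotations (verse/chorus) on a measure grid.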
For each of the non-drum parts you want pitch recognition. The general-purpose SOTA is Basic Pitch.
Other:
- Sheet Sage (2022) for melody - although I tested it and the results are far from great
Historically, a lot of effort was put specifically into solo piano recognition. So now we have Onsets and Frames (2018) 🐟
Other:
- Bytedance (2020) (no model - can't run)
- Onsets and Velocities (2023) (no model - can't run)
For drums:
- omnizart drum
- Magenta OaF Drums
Transcribing the whole mix straight away via MT3 gives very bizarre results.
So: get your parts Basic Pitched, get your drums separately, get your downbeat timings, and then use gpt4 mido scripting / midisox / Ableton (XML-hackable!) to merge the parts back together.
Caveat: Basic Pitch produces midi with a resolution of 480 ticks per beat, whereas omnizart outputs 220. Merging them with online tools can lead to errors.
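One way around the resolution mismatch is to rescale the delta times of one file before merging. A minimal sketch in plain Python, assuming you have already pulled the per-message delta ticks out of a track (in mido these are the `time` values of messages in a file, and the resolution is `MidiFile.ticks_per_beat`); the function name is hypothetical:

```python
def rescale_delta_ticks(deltas, src_tpb, dst_tpb):
    """Convert delta times from src_tpb to dst_tpb ticks per beat.

    Rounding is done on absolute positions rather than on each delta,
    so rounding errors do not accumulate over a long track.
    """
    out = []
    abs_src = 0   # running position in source ticks
    prev_dst = 0  # previous position in destination ticks
    for d in deltas:
        abs_src += d
        abs_dst = round(abs_src * dst_tpb / src_tpb)
        out.append(abs_dst - prev_dst)
        prev_dst = abs_dst
    return out

# omnizart-style 220 ticks per beat -> Basic Pitch-style 480
rescaled = rescale_delta_ticks([0, 220, 110, 55], 220, 480)
```

In this example a quarter, an eighth and a sixteenth at 220 ticks per beat become 480, 240 and 120 ticks. After rescaling every track to the same resolution, the tracks can be put into one file with that `ticks_per_beat`.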
- Omnizart chord (2019)
- Sheet Sage (2022) - easy to run, chords are mostly right
- key: Spotify API
Possibly, with "omnizart chord" and any beat tracking you can already build a clone of Chordify and even improve it by adding form annotation.
There's also a paper by Chordify (2014) with the original references.
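The core of such a Chordify clone is snapping chord labels onto a beat grid. A rough sketch, assuming the chord recognizer gives you sorted, non-overlapping `(start_sec, end_sec, label)` segments and the beat tracker gives you beat times in seconds; this segment format and the `"N"` no-chord label are assumptions for illustration, not the documented output of omnizart:

```python
from bisect import bisect_right

def chords_on_beats(chord_segments, beats, no_chord="N"):
    """Return the chord label that is active at each beat time.

    chord_segments: sorted, non-overlapping (start_sec, end_sec, label) tuples.
    Beats that fall outside every segment get `no_chord`.
    """
    starts = [start for start, _, _ in chord_segments]
    labels = []
    for t in beats:
        # Last segment that starts at or before t.
        i = bisect_right(starts, t) - 1
        if i >= 0 and t < chord_segments[i][1]:
            labels.append(chord_segments[i][2])
        else:
            labels.append(no_chord)
    return labels

segments = [(0.0, 2.0, "C:maj"), (2.0, 4.0, "G:maj")]
grid = chords_on_beats(segments, [0.0, 1.0, 2.0, 3.0, 4.5])
```

The result is one label per beat, which is enough to render a chord chart. Grouping those beats into bars and attaching verse/chorus labels is where the form annotation would come in.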
Music transcription as a whole is tough and not solved holistically yet.
- https://paperswithcode.com/task/music-transcription/latest
- TuneFlow is probably a wrapper around demucs and Basic Pitch.
- https://splitter.fm/