How Many Mistakes Are Allowed in Piano Transcribing?

Piano Transcribing

The piano is a polyphonic instrument, and transcribing it means identifying a large number of notes, many of which sound at the same time. The standard approach is to use a neural network to transcribe the notes. Unfortunately, even the most accurate algorithms do not yet match human performance. Amateur musicians can turn to an automatic music transcription (AMT) system to generate music scores, but the results are often far from impressive.


To improve transcription accuracy, a two-stage framework has been proposed that combines deep learning with spectrogram factorization. In the first stage, a convolutional neural network (CNN) detects the onsets of individual notes. These onsets are then passed to a second CNN, which computes the probability that each note is present at a given onset.
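A minimal sketch of how the two stages might fit together, written in Python with NumPy. The CNNs themselves are not shown: `onset_prob` (one onset probability per frame) and `pitch_prob` (one probability per frame for each of the 88 keys) stand in for their outputs, and the function names, the 0.5 thresholds, and the peak-picking rule are illustrative assumptions rather than the framework's actual settings.

```python
import numpy as np

def pick_onsets(onset_prob, threshold=0.5):
    """Stage 1 post-processing (hypothetical): return the frames where the
    onset probability is a local maximum at or above `threshold`."""
    p = np.asarray(onset_prob, dtype=float)
    onsets = []
    for t in range(len(p)):
        left = p[t - 1] if t > 0 else -np.inf
        right = p[t + 1] if t < len(p) - 1 else -np.inf
        # Strict on the left, non-strict on the right: a flat peak is
        # reported once, at its first frame.
        if p[t] >= threshold and p[t] > left and p[t] >= right:
            onsets.append(t)
    return onsets

def verify_notes(onsets, pitch_prob, threshold=0.5, lowest_midi=21):
    """Stage 2 (hypothetical): at each detected onset, keep every key whose
    note-presence probability exceeds `threshold`.
    `pitch_prob` has shape (n_frames, 88); column 0 is MIDI note 21 (A0)."""
    notes = []
    for t in onsets:
        for k in np.flatnonzero(pitch_prob[t] > threshold):
            notes.append((t, lowest_midi + int(k)))  # (frame, midi_pitch)
    return notes
```

For example, `pick_onsets([0.1, 0.9, 0.2, 0.1, 0.7, 0.3])` returns `[1, 4]`; only keys whose probability clears the threshold at those frames survive verification.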

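Spectrogram factorization is another valuable technique, used here to improve the precision of note verification. It models the spectrogram as a sum of note spectral templates, each weighted by an activation that changes over time. A template can be obtained by averaging the spectrogram of a note over a number of time frames. This matters because the same note can have different spectral content depending on how loudly it is played.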
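One way to read the averaging step is as template construction: the spectral template for a note is the average of its spectrogram frames in an isolated-note recording. The sketch below assumes magnitude spectrograms stored as NumPy arrays; the names `note_template` and `build_dictionary`, and keeping one template per dynamic level, are illustrative assumptions motivated by the point about dynamics, not details taken from a specific system.

```python
import numpy as np

def note_template(spec, start, stop):
    """Average frames start..stop-1 of a magnitude spectrogram
    (shape: n_bins x n_frames) into one spectral template with unit sum."""
    template = spec[:, start:stop].mean(axis=1)
    return template / (template.sum() + 1e-12)  # guard against an all-zero slice

def build_dictionary(isolated):
    """Build a template matrix W with one column per (midi_pitch, dynamic) key.

    `isolated` maps keys such as (60, "p") or (60, "f") to the magnitude
    spectrogram of that note played in isolation, so one pitch can have a
    separate template for each dynamic level."""
    keys = sorted(isolated)
    W = np.stack(
        [note_template(isolated[k], 0, isolated[k].shape[1]) for k in keys],
        axis=1,
    )
    return keys, W
```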


Another notable method is Differentiable Dictionary Search (DDS), which combines deep density models with matrix decomposition. Although it is not as good as some other techniques at modeling unseen sources, it is more accurate. DDS is also described as better suited to transcription than a purely neural-network approach, because such networks struggle to generalize to notes they have not seen.

Interestingly, an LSTM-based model transcribes entire notes rather than individual frames. Its post-processing routines, however, are more complex than those of other models.
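Turning frame-level predictions into whole notes is a typical post-processing step. The sketch below is a generic version of it, not the LSTM model's own routine: it assumes a binary piano roll with one row per frame and one column per key (MIDI 21 to 108), and the function name and the `min_frames` rule for discarding very short notes are hypothetical choices.

```python
import numpy as np

def roll_to_notes(roll, min_frames=2, lowest_midi=21):
    """Convert a binary frame-level piano roll (n_frames x 88) into note
    events (midi_pitch, onset_frame, offset_frame), offset exclusive.
    Runs shorter than `min_frames` are dropped as likely false positives."""
    roll = np.asarray(roll, dtype=bool)
    notes = []
    for k in range(roll.shape[1]):
        # Pad with False on both sides so every run has a rising and a falling edge.
        padded = np.concatenate(([False], roll[:, k], [False]))
        edges = np.diff(padded.astype(int))
        starts = np.flatnonzero(edges == 1)   # first active frame of each run
        ends = np.flatnonzero(edges == -1)    # first inactive frame after each run
        for s, e in zip(starts, ends):
            if e - s >= min_frames:
                notes.append((lowest_midi + k, int(s), int(e)))
    # Order by onset, then by pitch, as a score would read.
    return sorted(notes, key=lambda n: (n[1], n[0]))
```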

There are, however, many other approaches to transcribing songs by ear. One is to play along with a metronome set to a steady tempo, which helps beginners with their timing. Another is to slow the song down to 50% speed and try to pick out the top note on the keyboard.

Finally, there is the MIDI Aligned Piano Sounds (MAPS) database. MAPS contains more than 60 hours of audio, which is a great deal of data for piano transcription. The recordings are divided into nine sets, each corresponding to a different piano or recording condition, and each set contains isolated notes, chords, and 30 pieces of music, all aligned with MIDI ground truth.

Using the note templates from the factorization step, the spectrogram of the passage to be transcribed is reconstructed. During this process, each note's onset is estimated from the spectrogram and the most likely pitch at that onset is detected. Minimizing the difference between the original spectrogram and the reconstruction then yields a good estimate of each note's activation.
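The estimation described above can be sketched with non-negative matrix factorization (NMF) in which the templates are held fixed and only the activations are learned. This assumes a magnitude spectrogram `V` (frequency bins by frames) and a template matrix `W` (bins by notes); the multiplicative update below is the standard Lee and Seung rule for the squared Euclidean distance, used as an illustration rather than as the exact objective of the hybrid method. The function name is hypothetical.

```python
import numpy as np

def estimate_activations(V, W, n_iter=500, eps=1e-12, seed=0):
    """Estimate non-negative activations H so that W @ H approximates V,
    keeping the templates W fixed. Lee-Seung multiplicative updates never
    increase the squared Euclidean error ||V - W H||^2 and keep H >= 0."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # positive random start
    WtV = W.T @ V   # constant across iterations because W is fixed
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)
    return H
```

A large value in `H[k, t]` suggests that note `k` is sounding in frame `t`; thresholding `H` gives a rough note-presence decision.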

The LSTM model also has its own onset detection stage. It resembles the CNN onset detector of the two-stage framework, but each has its own benefits: the LSTM's detector relies on spectro-temporal patterns and is reported to detect onsets more effectively than other methods.
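The learned spectro-temporal detector is beyond a short example, but a classical baseline shows what an onset detector computes. This sketch swaps in half-wave-rectified spectral flux (the summed increase in log-compressed magnitude from one frame to the next) with simple peak picking; it is not the LSTM's method, and the function names and the `gamma` and `threshold` values are assumptions.

```python
import numpy as np

def spectral_flux(spec, gamma=100.0):
    """Half-wave-rectified spectral flux of a magnitude spectrogram
    (n_bins x n_frames). Log compression with log1p(gamma * S) keeps quiet
    notes from being swamped by loud ones."""
    S = np.log1p(gamma * np.asarray(spec, dtype=float))
    diff = np.diff(S, axis=1)                  # change between consecutive frames
    flux = np.maximum(diff, 0.0).sum(axis=0)   # keep only increases in energy
    return np.concatenate(([0.0], flux))       # align with frame indices

def detect_onsets(spec, threshold=0.3):
    """Return frames whose normalised flux is a local maximum at or above `threshold`."""
    flux = spectral_flux(spec)
    if flux.max() > 0:
        flux = flux / flux.max()
    return [t for t in range(1, len(flux))
            if flux[t] >= threshold
            and flux[t] > flux[t - 1]
            and (t == len(flux) - 1 or flux[t] >= flux[t + 1])]
```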

Compared with other methods, the LSTM model's note recognition and note verification stages perform well: they reduce false-positive notes, improve the precision of note-level transcription, and save computation time.
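Claims about false positives and note-level precision rest on how detected notes are scored against a reference. A common convention (used in MIREX evaluations) counts an estimated note as correct when its pitch matches a reference note and its onset lies within ±50 ms. The sketch below assumes notes given as `(onset_seconds, midi_pitch)` pairs and ignores offsets; the function name is hypothetical, and its greedy matching is a simplification of the optimal bipartite matching performed by toolkits such as `mir_eval`.

```python
def note_prf(reference, estimated, onset_tol=0.05):
    """Note-level precision, recall and F-measure.

    reference, estimated: lists of (onset_seconds, midi_pitch).
    Each estimate, taken in time order, is matched to the closest still
    unmatched reference note with the same pitch whose onset lies within
    +/- onset_tol seconds."""
    unmatched = list(reference)
    tp = 0
    for onset, pitch in sorted(estimated):
        candidates = [r for r in unmatched
                      if r[1] == pitch and abs(r[0] - onset) <= onset_tol]
        if candidates:
            best = min(candidates, key=lambda r: abs(r[0] - onset))
            unmatched.remove(best)  # each reference note can match only once
            tp += 1
    precision = tp / len(estimated) if estimated else 0.0
    recall = tp / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f
```

Under this scoring, an estimate with the wrong pitch costs precision, and a missed reference note costs recall.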

In a comprehensive comparison of two state-of-the-art methods, the hybrid method based on both deep learning and spectrogram factorization was shown to be superior.
