
Intro:

I'm building a real-time pitch visualizer app for Android using TarsosDSP. The app uses the FFT_YIN algorithm to detect the fundamental frequency and maps it in real time onto a vertical piano-roll-style view. Pitch tracking is quite accurate during sustained notes, but I'm running into a recurring issue:

Problem:

When I release a note, especially abruptly (tested with both voice and a MIDI keyboard), the detected pitch often drops sharply to a very low frequency (e.g., ~50–60 Hz) for a brief moment (60–90 ms), then returns to normal or to silence. These spikes appear as steep drops in the visual trace.

This happens only during release, not during sustained notes.

I've confirmed it's not caused by the graphing code: I log the pitch values, and the low value is returned directly by TarsosDSP.

Background noise has been ruled out: the environment is very quiet, and no pitch is detected when idle.

What it looks like:

Screenshot for 20:63: https://i.sstatic.net/AS3jMQ8J.png

Screenshot for 20:70: https://i.sstatic.net/oT5YK1oA.png (the erratic fall happens here)

Screenshot for 21:55: https://i.sstatic.net/2l5BFCM6.png (after a quick return to the first position, it goes to the next played note)

What I’ve tried:

I’ve tested several TarsosDSP algorithms: YIN, FFT_YIN, FFT_PITCH, and DYNAMIC_WAVELET. All behave similarly; FFT_YIN is the most stable.

Sample rate and buffer size:

22050 Hz with bufferSize = 1024 and overlap = 0 gives better results.

44100 Hz produces more spikes.
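For reference, the frame timing implied by these settings can be worked out directly (with overlap = 0 the hop equals the buffer length N). The last line assumes that the lag search covers at most half the buffer, which appears to be how TarsosDSP's YIN implementations size their difference buffer; treat that part as an assumption rather than a documented limit.

$$
T_{\text{frame}} = \frac{N}{f_s} = \frac{1024}{22050\ \text{Hz}} \approx 46.4\ \text{ms}
\qquad \left(\frac{1024}{44100\ \text{Hz}} \approx 23.2\ \text{ms}\right)
$$

$$
\frac{60\ \text{ms}}{46.4\ \text{ms}} \approx 1.3, \qquad \frac{90\ \text{ms}}{46.4\ \text{ms}} \approx 1.9
$$

$$
f_{\min} \approx \frac{f_s}{N/2} = \frac{22050\ \text{Hz}}{512} \approx 43.1\ \text{Hz}
\qquad \left(\frac{44100\ \text{Hz}}{512} \approx 86.1\ \text{Hz}\right)
$$

So each reported drop spans roughly one to two consecutive analysis frames.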

I verified the pitch-to-Y mapping in the piano roll both visually and mathematically; it's not the cause.

Hypothesis:

I suspect these are false positives caused by:

Resonant harmonics or partials that linger after the note is released.

The pitch estimator interpreting the low-energy tail or noise as a new low-frequency tone.

Other, unknown limitations of the algorithm in TarsosDSP's signal-processing step.
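One way to test the second hypothesis is to gate each estimate on frame loudness and on the estimator's own confidence before drawing it. The sketch below is plain Kotlin so it can be tried outside Android. `PitchGate`, its thresholds (-45 dBFS and 0.85), and the assumption that samples arrive as floats in [-1, 1] are all hypothetical choices, not TarsosDSP defaults.

```kotlin
import kotlin.math.log10
import kotlin.math.sqrt

// Hypothetical gate: trusts a pitch estimate only if the frame is loud enough
// and the estimator's reported probability is high enough.
// Both thresholds are illustrative assumptions, not TarsosDSP defaults.
class PitchGate(
    private val minDbfs: Double = -45.0,
    private val minProbability: Float = 0.85f
) {
    // RMS level of a frame whose samples are assumed to lie in [-1, 1], in dBFS.
    fun rmsDbfs(frame: FloatArray): Double {
        if (frame.isEmpty()) return Double.NEGATIVE_INFINITY
        var sumSquares = 0.0
        for (s in frame) sumSquares += s.toDouble() * s
        val rms = sqrt(sumSquares / frame.size)
        return if (rms > 0.0) 20.0 * log10(rms) else Double.NEGATIVE_INFINITY
    }

    // Returns pitchHz when all checks pass, otherwise -1f ("no pitch").
    fun filter(pitchHz: Float, probability: Float, frame: FloatArray): Float {
        if (pitchHz <= 0f) return -1f                  // estimator found no pitch
        if (probability < minProbability) return -1f   // low confidence
        if (rmsDbfs(frame) < minDbfs) return -1f       // release tail / near silence
        return pitchHz
    }
}
```

Inside the `PitchDetectionHandler` it could be called as `gate.filter(result.pitch, result.probability, audioEvent.floatBuffer)`, using the second lambda parameter that the snippet below ignores. Whether `probability` is meaningful depends on the chosen estimator, so logging it next to the pitch values first seems worthwhile.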

Snippet of code involved:

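    // This logic runs in the setOnClickListener of the main playPauseButton in the UI
    val sampleRate = 22050
    val bufferSize = 1024   // samples per analysis frame
    val overlap = 0

    // AudioRecord takes its buffer size in bytes (2 per 16-bit sample): 1024 bytes
    // holds only 512 samples and may be below the device minimum, which makes
    // AudioRecord initialization fail.
    val minBytes = AudioRecord.getMinBufferSize(
        sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT
    )
    val audioRecord = AudioRecord(
        MediaRecorder.AudioSource.MIC,
        sampleRate,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        maxOf(minBytes, bufferSize * 2)
    )
    val tarsosFormat = TarsosDSPAudioFormat(
        sampleRate.toFloat(),
        16,    // bitsPerSample
        1,     // channels
        true,  // signed
        false  // bigEndian
    )
    val inputStream = AndroidAudioInputStream(audioRecord, tarsosFormat)
    audioRecord.startRecording()  // start capturing before the dispatcher reads (skip if done elsewhere)
    dispatcher = AudioDispatcher(inputStream, bufferSize, overlap)

    val textView = findViewById<TextView>(R.id.textView1)
    val pdh = PitchDetectionHandler { result, _ ->
        // Runs on the "Audio Dispatcher" thread, not on the UI thread.
        val pitchInHz = result.pitch  // -1 when no pitch is detected
        runOnUiThread {
            textView.text = if (pitchInHz > 0) NoteConverter.hzToNote(pitchInHz) else "No note detected"
            pianoRollView.pitchInHz2 = pitchInHz
            pianoRollView.invalidate()  // invalidate() must be called on the UI thread
        }
    }
    val p: AudioProcessor = PitchProcessor(
        PitchProcessor.PitchEstimationAlgorithm.FFT_YIN,
        sampleRate.toFloat(),  // reuse the values above instead of hard-coding 22050f / 1024
        bufferSize,
        pdh
    )
    dispatcher?.addAudioProcessor(p)
    Thread(dispatcher, "Audio Dispatcher").start()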

Temporary workaround:

I'm considering implementing a pitch validation filter that ignores any detected pitch that doesn’t last at least 90 ms. The idea is to reject brief, isolated pitch values that likely represent noise or artifacts. Ideally, though, I would solve the root of the problem by modifying or rebuilding TarsosDSP's pitch-detection and signal-processing algorithms.
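The 90 ms validation filter described above could look like the following minimal sketch. `MinDurationFilter`, the 50-cent tolerance, and holding the last confirmed value while a new one is pending are illustrative assumptions; the frame duration comes from the question's settings (1024 samples at 22050 Hz with overlap = 0, about 46.4 ms per frame).

```kotlin
import kotlin.math.abs
import kotlin.math.log2

// Hypothetical minimum-duration filter. A new value (pitch or "no pitch") is only
// reported after it has persisted for minDurationMs; until then the last
// confirmed value is held. frameMs is the time between successive estimates.
class MinDurationFilter(
    private val frameMs: Double,
    private val minDurationMs: Double = 90.0,
    private val toleranceCents: Double = 50.0   // assumed: half a semitone
) {
    private var candidate = -1f     // first value of the run currently being timed
    private var candidateMs = 0.0   // how long that run has lasted so far
    private var confirmed = -1f     // last value that passed the filter

    // True if both values are unpitched, or both are pitched and within the tolerance.
    private fun sameRun(a: Float, b: Float): Boolean {
        if (a <= 0f || b <= 0f) return a <= 0f && b <= 0f
        return abs(1200.0 * log2(a.toDouble() / b)) <= toleranceCents
    }

    // Feeds one per-frame estimate (values <= 0 mean "no pitch") and returns the filtered value.
    fun process(pitchHz: Float): Float {
        val value = if (pitchHz > 0f) pitchHz else -1f
        if (sameRun(value, candidate)) {
            candidateMs += frameMs
        } else {
            candidate = value
            candidateMs = frameMs
        }
        if (candidateMs >= minDurationMs) confirmed = value
        return confirmed
    }
}
```

With 46.4 ms frames, a 90 ms threshold means two consecutive matching frames, so a drop that spans two frames would still get through; raising the threshold to three frames, or using overlapping frames for finer time resolution, are the obvious knobs. The filter also adds at least one frame of latency to every note onset and release.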

My questions:

Is this behavior expected from FFT_YIN, or from pitch trackers in general, when analyzing silence/release transitions?

Would you recommend filtering by duration, or is there a more robust DSP technique (e.g., a confidence score, RMS gating, or a median filter)?

Is there a better pitch estimation algorithm for this kind of real-time use case?
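For the median-filter option, a running median over the last few frames is a common way to remove short outliers from a pitch track. This minimal sketch is plain Kotlin; `RunningMedian` and the window of 5 frames are assumptions, and unpitched frames are stored as -1f so they take part in the vote like any other value.

```kotlin
// Hypothetical running median for a pitch track: each call adds one per-frame
// estimate and returns the median of the most recent `size` estimates.
// Unpitched frames are stored as -1f, so they take part in the vote too.
class RunningMedian(private val size: Int = 5) {
    init {
        require(size > 0 && size % 2 == 1) { "size must be a positive odd number" }
    }

    private val window = ArrayDeque<Float>(size)

    fun process(pitchHz: Float): Float {
        if (window.size == size) window.removeFirst()   // drop the oldest frame
        window.addLast(if (pitchHz > 0f) pitchHz else -1f)
        val sorted = window.sorted()
        // Middle element; while the window is still filling with an even count,
        // this is the upper of the two middle values.
        return sorted[sorted.size / 2]
    }
}
```

A 5-frame median discards any outlier run of up to two frames, because the middle of five sorted values is then always one of the three remaining frames. At about 46.4 ms per frame that covers drops of 60–90 ms; the cost is a delay of about two frames (around 93 ms) before a genuine change shows up.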

Bonus: I’m happy to provide more screenshots, logs, or even a video if that helps.

  • I'm still incredulous at how much attention YIN gets. There's really nothing useful about YIN. The pYIN alg I have never completely grokked, but the YIN alg I understand, and there's really nothing particularly novel in it except this "cumulative mean normalized difference function", which is useless. Commented Jul 27 at 16:55
  • Hi Robert, thanks for the insight. I already suspected the algorithm was not that precise. So far I have focused mainly on mobile dev, so I am certainly not an audio engineer, but I would like to develop tools that are useful for musicians and audio enthusiasts too. Do you have any alternatives to the YIN or FFT_YIN algorithms for pitch detection in mobile implementations on Android and iOS? I am happy to try out libraries other than TarsosDSP too. Commented Jul 28 at 14:07
  • Probably the best recommendation I can make from my old-school knowledge is defining the autocorrelation out of the Average Squared Difference Function. This is a little more detailed. Commented Jul 28 at 15:11
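One reading of the ASDF suggestion in the last comment: average the squared difference between the frame and a lagged copy of itself, Q[k] = mean((x[n] - x[n+k])^2), and recover an autocorrelation estimate via R[k] ≈ R[0] - Q[k]/2, which follows from expanding the square and assuming the frame is roughly stationary. The minimal Kotlin sketch below picks the lag with the largest R[k] inside a given frequency range. The function names, the range parameters, and the plain peak pick are assumptions, and it has none of the octave-error handling or interpolation a usable tracker would need.

```kotlin
// Average squared difference between the frame and a copy of itself delayed by k samples:
// Q[k] = (1 / (N - k)) * sum_{n=0}^{N-k-1} (x[n] - x[n + k])^2
fun asdf(x: FloatArray, k: Int): Double {
    val n = x.size - k
    require(k > 0 && n > 0) { "lag must be in 1 until x.size" }
    var sum = 0.0
    for (i in 0 until n) {
        val d = (x[i] - x[i + k]).toDouble()
        sum += d * d
    }
    return sum / n
}

// Hypothetical pitch estimate: pick the lag in [fs/maxHz, fs/minHz] that maximizes
// R[k] ≈ R[0] - Q[k] / 2 and convert it to Hz. Returns -1.0 if the lag range is empty.
fun estimatePitchAsdf(x: FloatArray, sampleRate: Double, minHz: Double, maxHz: Double): Double {
    var energy = 0.0
    for (s in x) energy += s.toDouble() * s
    val r0 = energy / x.size                          // R[0]: mean power of the frame
    val minLag = (sampleRate / maxHz).toInt().coerceAtLeast(1)
    val maxLag = (sampleRate / minHz).toInt().coerceAtMost(x.size / 2)
    var bestLag = -1
    var bestR = Double.NEGATIVE_INFINITY
    for (k in minLag..maxLag) {
        val r = r0 - asdf(x, k) / 2.0
        if (r > bestR) {
            bestR = r
            bestLag = k
        }
    }
    return if (bestLag > 0) sampleRate / bestLag else -1.0
}
```

Because R[0] is constant within a frame, maximizing R[k] here is the same as minimizing Q[k]; the conversion mainly matters if R[k] is then normalized (e.g. R[k]/R[0]) and used as a confidence value.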
