Intro:
I'm building a real-time pitch visualizer app for Android using TarsosDSP. The app uses the FFT_YIN algorithm to detect the fundamental frequency and maps it in real time to a vertical piano-roll-style view. Pitch tracking is quite accurate during sustained notes, but I'm experiencing a recurring issue:
Problem:
When I release a note, especially abruptly (tested with voice and MIDI keyboard), the pitch value often drops sharply to a very low frequency (e.g., ~50–60 Hz) for a brief moment (60–90 ms), then returns to normal or silence. These spikes appear as steep drops in the visual trace.
This happens only during release, not during sustained notes.
I've confirmed it's not due to graphing code: I log the pitch values and the low value is directly returned by TarsosDSP.
Background noise has been ruled out: the environment is very quiet, and no pitch is detected when idle.
What it looks like:
Screenshot at 20:63: https://i.sstatic.net/AS3jMQ8J.png
Screenshot at 20:70: https://i.sstatic.net/oT5YK1oA.png (the erratic fall happens here)
Screenshot at 21:55: https://i.sstatic.net/2l5BFCM6.png (jumps to the next played note after a quick return to the first position)
What I’ve tried:
I’ve tested multiple algorithms in TarsosDSP: YIN, FFT_YIN, FFT_PITCH, DYNAMIC_WAVELET. All behave similarly; FFT_YIN is the most stable.
Sample rate and buffer size:
22050 Hz, bufferSize = 1024, overlap = 0 give better results.
44100 Hz causes more spikes.
I verified pitch-to-Y mapping in the piano roll visually and mathematically — it's not the cause.
Hypothesis:
I suspect these are false positives caused by:
Resonant harmonics or partials after note release.
The pitch estimator interpreting low-energy tail/noise as a new low-frequency tone.
Other unknown algorithm limitations during the signal-processing step in Tarsos.
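If the low-energy-tail hypothesis is right, a cheap way to test it would be an energy gate: compute each frame's RMS and discard pitch estimates from frames below a threshold. A minimal sketch (the function names and the threshold value are my own, not TarsosDSP API; in the real handler, TarsosDSP's `AudioEvent` already exposes a per-frame RMS that could replace the manual computation):

```kotlin
import kotlin.math.sqrt

// Root-mean-square energy of one audio frame (samples in [-1, 1]).
fun rms(frame: FloatArray): Double {
    var sum = 0.0
    for (s in frame) sum += s.toDouble() * s
    return sqrt(sum / frame.size)
}

// Returns the pitch unchanged for energetic frames, or -1f (the usual
// "no pitch" sentinel) for frames below the threshold. The 0.01 default
// is a guess and would need tuning per device/microphone.
fun gatedPitch(pitchHz: Float, frame: FloatArray, threshold: Double = 0.01): Float =
    if (rms(frame) >= threshold) pitchHz else -1f
```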
Snippet of code involved:
```kotlin
// This logic is executed via the setOnClickListener of the main playPauseButton in the UI
val sampleRate = 22050
val bufferSize = 1024
val overlap = 0

val audioRecord = AudioRecord(
    MediaRecorder.AudioSource.MIC,
    sampleRate,
    AudioFormat.CHANNEL_IN_MONO,
    AudioFormat.ENCODING_PCM_16BIT,
    bufferSize
)

val tarsosFormat = TarsosDSPAudioFormat(
    sampleRate.toFloat(),
    16,    // bitsPerSample
    1,     // channels
    true,  // signed
    false  // bigEndian
)

val inputStream = AndroidAudioInputStream(audioRecord, tarsosFormat)
dispatcher = AudioDispatcher(inputStream, bufferSize, overlap)

val pdh = PitchDetectionHandler { result, _ ->
    val pitchInHz = result.pitch
    if (pitchInHz > 0) {
        val note = NoteConverter.hzToNote(pitchInHz)
        runOnUiThread { findViewById<TextView>(R.id.textView1).text = note }
    } else {
        runOnUiThread { findViewById<TextView>(R.id.textView1).text = "No note detected" }
    }
    pianoRollView.pitchInHz2 = pitchInHz
    pianoRollView.invalidate()
}

val p: AudioProcessor = PitchProcessor(
    PitchProcessor.PitchEstimationAlgorithm.FFT_YIN,
    22050f,
    1024,
    pdh
)
dispatcher?.addAudioProcessor(p)
Thread(dispatcher, "Audio Dispatcher").start()
```

Temporary workaround:
I'm considering implementing a pitch validation filter that ignores any detected pitch lasting less than ~90 ms, rejecting brief, isolated pitch values that likely represent noise or artifacts. Ideally, though, I'd like to fix the root cause in Tarsos's pitch-detection and signal-processing step rather than patch around it.
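A minimal sketch of the validation filter I have in mind, assuming ~46 ms hops (1024 samples at 22050 Hz, no overlap) and a half-semitone tolerance for "same pitch" (the class name, thresholds, and tolerance are all my own assumptions, not TarsosDSP API):

```kotlin
// Holds back a new pitch candidate until it has persisted for minDurationMs.
// Brief spikes (like the 50–60 Hz release artifacts) never become "stable",
// so the last validated pitch is held through them.
class PitchStabilityFilter(
    private val minDurationMs: Long = 90,
    private val toleranceSemitones: Double = 0.5
) {
    private var candidateHz = -1.0
    private var candidateSinceMs = 0L
    private var stableHz = -1.0

    // Feed each frame's pitch (Hz, or <= 0 for silence) and a timestamp.
    // Returns the last validated pitch, or -1.0 if none yet.
    fun process(pitchHz: Double, nowMs: Long): Double {
        if (pitchHz <= 0) {            // silence: reset everything
            candidateHz = -1.0
            stableHz = -1.0
            return -1.0
        }
        if (candidateHz > 0 && semitoneDistance(pitchHz, candidateHz) <= toleranceSemitones) {
            // Same candidate still sounding: promote it once it has lasted long enough.
            if (nowMs - candidateSinceMs >= minDurationMs) stableHz = pitchHz
        } else {
            // New candidate pitch: start its clock, keep reporting the old stable pitch.
            candidateHz = pitchHz
            candidateSinceMs = nowMs
        }
        return stableHz
    }

    private fun semitoneDistance(a: Double, b: Double) =
        kotlin.math.abs(12.0 * kotlin.math.log2(a / b))
}
```

The trade-off is roughly two frames of extra latency before a genuinely new note appears on the piano roll, which may be acceptable for a visualizer.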
My questions:
Is this behavior expected from FFT_YIN or pitch trackers in general when analyzing silence/release transitions?
Would you recommend filtering by duration as a solution, or is there a more robust DSP technique (e.g., confidence score, RMS, median filter)?
Is there a better pitch estimation algorithm for this kind of real-time use case?
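For reference, the median-filter variant I mention in question 2 could look roughly like this (the window size is a guess; a 5-frame window at ~46 ms hops adds about two frames of display latency):

```kotlin
import java.util.ArrayDeque

// Reports the median of the last `window` pitch estimates, so a single
// outlier frame (e.g. a 50–60 Hz release spike) cannot reach the display.
class MedianPitchFilter(private val window: Int = 5) {
    private val recent = ArrayDeque<Double>()

    fun process(pitchHz: Double): Double {
        recent.addLast(pitchHz)
        if (recent.size > window) recent.removeFirst()
        val sorted = recent.sorted()
        return sorted[sorted.size / 2]
    }
}
```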
Bonus: I’m happy to provide more screenshots, logs or even video if helpful.