feat(audio): Add waveform visualization for PTT voice messages #2345
ffigueroa wants to merge 1 commit into EvolutionAPI:main
Reviewer's Guide

Implements waveform-enabled PTT audio sending for WhatsApp by decoding audio buffers to derive duration and a 64-point `Uint8Array` waveform, wiring these into Baileys message content, adjusting audio bitrate to meet WhatsApp PTT expectations, and introducing a `patch-package` based Baileys patch plus Docker/postinstall wiring to apply it in all environments.

Sequence diagram for sending PTT audio with generated waveform

    sequenceDiagram
        actor Client
        participant EvolutionAPI
        participant BaileysStartupService
        participant FFmpeg_processAudio
        participant audioDecode_duration
        participant audioDecode_waveform
        participant Baileys_sendMessageWithTyping
        participant WhatsApp

        Client->>EvolutionAPI: POST /audioWhatsapp (SendAudioDto)
        EvolutionAPI->>BaileysStartupService: audioWhatsapp(data, file, isIntegration)
        BaileysStartupService->>FFmpeg_processAudio: processAudio(mediaData.audio)
        FFmpeg_processAudio-->>BaileysStartupService: Buffer convert (48k bitrate)

        alt Converted_audio_is_Buffer
            BaileysStartupService->>audioDecode_duration: getAudioDuration(convert)
            audioDecode_duration-->>BaileysStartupService: seconds
            BaileysStartupService->>audioDecode_waveform: getAudioWaveform(convert)
            audioDecode_waveform-->>BaileysStartupService: Uint8Array waveform (64 values)
            BaileysStartupService->>Baileys_sendMessageWithTyping: sendMessageWithTyping(number, messageContent_with_waveform)
        else Raw_or_URL_audio
            BaileysStartupService->>BaileysStartupService: Derive audioBuffer (URL or base64 Buffer)
            alt audioBuffer_is_Buffer
                BaileysStartupService->>audioDecode_duration: getAudioDuration(audioBuffer)
                audioDecode_duration-->>BaileysStartupService: seconds
                BaileysStartupService->>audioDecode_waveform: getAudioWaveform(audioBuffer)
                audioDecode_waveform-->>BaileysStartupService: Uint8Array waveform (64 values)
                BaileysStartupService->>Baileys_sendMessageWithTyping: sendMessageWithTyping(number, audioBuffer_with_waveform)
            else audioBuffer_is_URL
                BaileysStartupService->>Baileys_sendMessageWithTyping: sendMessageWithTyping(number, URL_audio_without_waveform)
            end
        end

        Baileys_sendMessageWithTyping-->>WhatsApp: PTT message with seconds and waveform
        WhatsApp-->>Client: Voice message UI with waveform visualization

Updated class diagram for BaileysStartupService audio waveform support

    classDiagram
        class BaileysStartupService {
            +audioWhatsapp(data SendAudioDto, file any, isIntegration boolean) Promise~any~
            -getAudioDuration(audioBuffer Buffer) Promise~number~
            -getAudioWaveform(audioBuffer Buffer) Promise~Uint8Array~
        }
        class SendAudioDto {
            +number string
            +audio any
            +delay number
        }
        class AudioDecoder {
            +decode(audioBuffer Buffer) AudioData
        }
        class AudioData {
            +duration number
            +getChannelData(channelIndex number) Float32Array
        }
        BaileysStartupService --> SendAudioDto : uses
        BaileysStartupService --> AudioDecoder : uses audioDecode
        AudioDecoder --> AudioData : returns
        BaileysStartupService ..> Uint8Array : generates waveform
        BaileysStartupService ..> Buffer : processes audio buffers
Hey - I've found 2 issues, and left some high-level feedback:
- Both `getAudioDuration` and `getAudioWaveform` call `audioDecode` on the same buffer; consider decoding once and passing the decoded data through to avoid redundant CPU-heavy work for each message.
- In `getAudioWaveform`, `samplesPerWaveform` can become 0 for very short audio, which leads to a division by zero when computing `avg`; adding a guard to ensure a minimum of 1 sample per bucket would make this more robust.
- The new waveform-related logging (`info` with first 10 values, type, etc.) runs on every audio send and may be quite noisy in production; you might want to downgrade some of these to a debug/verbose level or gate them behind a flag.
Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Both `getAudioDuration` and `getAudioWaveform` call `audioDecode` on the same buffer; consider decoding once and passing the decoded data through to avoid redundant CPU-heavy work for each message.
- In `getAudioWaveform`, `samplesPerWaveform` can become 0 for very short audio, which leads to a division by zero when computing `avg`; adding a guard to ensure a minimum of 1 sample per bucket would make this more robust.
- The new waveform-related logging (`info` with first 10 values, type, etc.) runs on every audio send and may be quite noisy in production; you might want to downgrade some of these to a debug/verbose level or gate them behind a flag.

## Individual Comments

### Comment 1
<location> `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts:3061-3070` </location>
<code_context>
+      const audioData = await audioDecode(audioBuffer);
+      const samples = audioData.getChannelData(0); // Get first channel
+      const waveformLength = 64;
+      const samplesPerWaveform = Math.floor(samples.length / waveformLength);
+
+      // First pass: calculate raw averages
+      const rawValues: number[] = [];
+      for (let i = 0; i < waveformLength; i++) {
+        const start = i * samplesPerWaveform;
+        const end = start + samplesPerWaveform;
+        let sum = 0;
+        for (let j = start; j < end && j < samples.length; j++) {
+          sum += Math.abs(samples[j]);
+        }
+        const avg = sum / samplesPerWaveform;
+        rawValues.push(avg);
+      }
</code_context>

<issue_to_address>
**issue (bug_risk):** Guard against very short audio causing division by zero and incorrect indexing when computing the waveform.

When `samples.length < waveformLength`, `samplesPerWaveform` becomes 0, causing a division by zero (`avg = sum / samplesPerWaveform` → `NaN`) and leaving `start`/`end` stuck at 0. Handle this case explicitly (e.g., enforce `samplesPerWaveform >= 1`, reduce `waveformLength` for very short clips, or early‑return a simplified waveform) so short audio doesn’t break the loop or produce invalid values.
</issue_to_address>

### Comment 2
<location> `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts:3045` </location>
<code_context>
+  private async getAudioDuration(audioBuffer: Buffer): Promise<number> {
+    try {
+      this.logger.info('Getting audio duration...');
+      const audioData = await audioDecode(audioBuffer);
+      const duration = Math.ceil(audioData.duration);
+      this.logger.info(`Audio duration: ${duration} seconds`);
</code_context>

<issue_to_address>
**suggestion (performance):** Avoid decoding the same audio buffer twice for duration and waveform to reduce overhead.

`getAudioDuration` and `getAudioWaveform` each call `audioDecode(audioBuffer)`, and `audioWhatsapp` runs them sequentially on the same buffer. Consider decoding once and reusing the decoded data (or adding a helper that does both) to avoid duplicate work and lower CPU usage under load.

Suggested implementation:

```typescript
  private async decodeAudio(audioBuffer: Buffer): Promise<AudioData | null> {
    try {
      this.logger.info('Decoding audio buffer...');
      const audioData = await audioDecode(audioBuffer);
      this.logger.info('Audio buffer decoded successfully');
      return audioData;
    } catch (error) {
      this.logger.warn(`Failed to decode audio buffer: ${error.message}`);
      return null;
    }
  }

  private async getAudioDuration(audioData: AudioData | null): Promise<number> {
    if (!audioData) {
      this.logger.warn('Audio data missing, using default 1 second duration');
      return 1;
    }

    try {
      const duration = Math.ceil(audioData.duration);
      this.logger.info(`Audio duration: ${duration} seconds`);
      return duration;
    } catch (error) {
      this.logger.warn(`Failed to get audio duration: ${error.message}, using default 1 second`);
      return 1;
    }
  }

  private async getAudioWaveform(audioData: AudioData | null): Promise<Uint8Array> {
    try {
      this.logger.info('Generating audio waveform...');
```

1. At the top of `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts`, import the `AudioData` type from the same module that provides `audioDecode`, e.g.:
   - `import audioDecode, { AudioData } from 'audio-decode';`
   Adjust this to match how `audioDecode` is currently imported in your file.
2. Update `getAudioWaveform`’s implementation further down in the file to **stop calling `audioDecode`**. Instead, operate directly on the passed `audioData` parameter. Remove any `audioDecode(audioBuffer)` calls inside this method.
3. Anywhere `getAudioDuration` and `getAudioWaveform` are called (likely in your `audioWhatsapp` flow), change the usage to:
   - Decode once: `const audioData = await this.decodeAudio(audioBuffer);`
   - Then reuse: `const duration = await this.getAudioDuration(audioData);`
   - And: `const waveform = await this.getAudioWaveform(audioData);`
4. Remove any remaining direct calls to `audioDecode(audioBuffer)` in this class that are only used to derive duration or waveform, to ensure the buffer is decoded only once per processing flow.
</issue_to_address>
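To illustrate the decode-once idea from Comment 2 outside the service class, here is a minimal standalone sketch. The names `DecodedAudio`, `Decoder`, `AudioInfo`, and `analyzeAudioOnce` are hypothetical and exist only for this example; the decoder is injected as a parameter so that nothing is assumed about the real `audio-decode` signature, and `Uint8Array` stands in for `Buffer` (which extends it in Node.js). The 1-second fallback mirrors the default used in the suggestion above.

```typescript
// Hypothetical minimal shape of decoded audio, modeled on the AudioData
// class from the Reviewer's Guide diagram (an assumption, not the library type).
interface DecodedAudio {
  duration: number;
  getChannelData(channel: number): Float32Array;
}

// The decoder is injected so this sketch does not depend on any real library.
type Decoder = (buffer: Uint8Array) => Promise<DecodedAudio>;

interface AudioInfo {
  seconds: number;              // duration rounded up to whole seconds
  samples: Float32Array | null; // first channel, or null if decoding failed
}

// Decode the buffer exactly once and derive everything from that one result.
async function analyzeAudioOnce(buffer: Uint8Array, decode: Decoder): Promise<AudioInfo> {
  try {
    const decoded = await decode(buffer); // the only decode call in this flow
    return {
      seconds: Math.ceil(decoded.duration),
      samples: decoded.getChannelData(0),
    };
  } catch {
    // Fall back to a 1-second duration and no samples when decoding fails.
    return { seconds: 1, samples: null };
  }
}
```

A caller would then build both the `seconds` value and the waveform from the returned `AudioInfo`, rather than passing the raw buffer to two separate helpers.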
    const samplesPerWaveform = Math.floor(samples.length / waveformLength);

    // First pass: calculate raw averages
    const rawValues: number[] = [];
    for (let i = 0; i < waveformLength; i++) {
      const start = i * samplesPerWaveform;
      const end = start + samplesPerWaveform;
      let sum = 0;
      for (let j = start; j < end && j < samples.length; j++) {
        sum += Math.abs(samples[j]);
issue (bug_risk): Guard against very short audio causing division by zero and incorrect indexing when computing the waveform.
When `samples.length < waveformLength`, `samplesPerWaveform` becomes 0, causing a division by zero (`avg = sum / samplesPerWaveform` → `NaN`) and leaving `start`/`end` stuck at 0. Handle this case explicitly (e.g., enforce `samplesPerWaveform >= 1`, reduce `waveformLength` for very short clips, or early‑return a simplified waveform) so short audio doesn’t break the loop or produce invalid values.
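One possible shape for the guard, as a sketch rather than the PR's actual fix: clamp the bucket size to at least 1, divide by the number of samples actually read, and leave any buckets beyond the end of a very short clip at 0. The function name `bucketAverages` and the constant `WAVEFORM_LENGTH` are hypothetical; the loop otherwise follows the first-pass averaging in the quoted snippet.

```typescript
const WAVEFORM_LENGTH = 64; // number of waveform points, as in the PR

// First pass only: average absolute amplitude per bucket, safe for short input.
function bucketAverages(
  samples: Float32Array,
  waveformLength: number = WAVEFORM_LENGTH,
): number[] {
  const averages = new Array<number>(waveformLength).fill(0);
  if (samples.length === 0) return averages; // nothing to average

  // Guard: never let the bucket size drop to 0 for clips shorter than the waveform.
  const samplesPerBucket = Math.max(1, Math.floor(samples.length / waveformLength));

  for (let i = 0; i < waveformLength; i++) {
    const start = i * samplesPerBucket;
    if (start >= samples.length) break; // remaining buckets stay at 0
    const end = Math.min(start + samplesPerBucket, samples.length);

    let sum = 0;
    for (let j = start; j < end; j++) {
      sum += Math.abs(samples[j]);
    }
    // Divide by the count actually summed, so the result is never NaN.
    averages[i] = sum / (end - start);
  }
  return averages;
}
```

With a 3-sample clip this yields three non-zero buckets followed by zeros instead of 64 `NaN` values. Padding with zeros is a design choice made for this sketch; shortening the waveform for short clips, as the comment also suggests, would be an equally valid alternative.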
- Add audio-decode library for audio buffer analysis
- Implement getAudioDuration() to extract duration from audio
- Implement getAudioWaveform() to generate 64-value waveform array
- Normalize waveform values to 0-100 range for WhatsApp compatibility
- Change audio bitrate from 128k to 48k per WhatsApp PTT requirements
- Add Baileys patch to prevent waveform overwrite
- Increase Node.js heap size for build to prevent OOM

Fixes EvolutionAPI#1086
4cd0a3b to fac3cff

Closing to reopen with clean commit history
Summary
This PR adds proper waveform visualization for PTT (Push-to-Talk) voice messages sent via the API. Currently, audio messages sent through Evolution API display without the visual waveform in WhatsApp, making them look less authentic compared to messages sent directly from the app.
Changes
- … `audio-decode` library to analyze the audio buffer and generate a 64-value waveform array representing the audio amplitude
- … (`patch-package`)

Technical Details
- `getAudioDuration()`: Extracts duration in seconds from the audio buffer
- `getAudioWaveform()`: Generates a normalized waveform (0-100 range) with 64 sample points
- … `Uint8Array` for Baileys compatibility

Related Issues
Testing
Screenshots
Voice messages now display with proper waveform visualization instead of a flat line.
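For readers who want a concrete picture of the normalization described under Technical Details (values in the 0-100 range, returned as a `Uint8Array`), the following is a rough sketch. It assumes the raw per-bucket averages are scaled relative to the loudest bucket, which the description does not confirm; `normalizeWaveform` is a hypothetical name, not the method in the PR.

```typescript
// Scale raw per-bucket averages into 0-100 and pack them into a Uint8Array.
// Assumption: scaling is relative to the peak bucket; the PR may differ.
function normalizeWaveform(rawValues: number[]): Uint8Array {
  // Find the peak, ignoring NaN/Infinity so one bad bucket cannot poison the scale.
  let peak = 0;
  for (const value of rawValues) {
    if (Number.isFinite(value) && value > peak) peak = value;
  }

  const waveform = new Uint8Array(rawValues.length); // zero-filled by default
  if (peak === 0) return waveform; // silence (or no usable data): flat waveform

  for (let i = 0; i < rawValues.length; i++) {
    const value = Number.isFinite(rawValues[i]) ? Math.max(0, rawValues[i]) : 0;
    waveform[i] = Math.round((value / peak) * 100);
  }
  return waveform;
}
```

Under that assumption, the raw averages `[0, 0.25, 0.5]` map to `[0, 50, 100]`, and an all-zero input stays all zeros rather than dividing by zero.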
Summary by Sourcery
Add waveform-enabled PTT audio sending for WhatsApp and wire up Baileys patching in builds.