
valoricDe/whisper-node

 
 


whisper-node

Node.js bindings for OpenAI's Whisper. Transcription is done locally.

Features

  • Output transcripts to JSON (also .txt .srt .vtt)
  • Optimized for CPU (Including Apple Silicon ARM)
  • Timestamp precision down to a single word

Installation

  1. Add dependency to project

      npm install whisper-node

  2. Download whisper model of choice [OPTIONAL]

      npx whisper-node download

Requirement for Windows: Install the make command from here.

Usage

    import whisper from 'whisper-node';

    const transcript = await whisper("example/sample.wav");

    console.log(transcript); // output: [ {start,end,speech} ]

Output (JSON)

    [
      {
        "start": "00:00:14.310", // time stamp begin
        "end": "00:00:16.480",   // time stamp end
        "speech": "howdy"        // transcription
      }
    ]
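The timestamps in this output are strings, so arithmetic on them needs a small conversion step. The following is a minimal sketch, assuming the "HH:MM:SS.mmm" format shown above; the `TranscriptLine` interface and the helpers `timestampToMs`, `durationMs`, and `joinSpeech` are hypothetical names for illustration and are not part of whisper-node.

```typescript
// Assumed shape of one transcript entry, mirroring the JSON output above.
// This interface is illustrative and not exported by whisper-node.
interface TranscriptLine {
  start: string;  // "HH:MM:SS.mmm"
  end: string;    // "HH:MM:SS.mmm"
  speech: string; // transcribed text
}

// Hypothetical helper: convert "HH:MM:SS.mmm" into whole milliseconds.
function timestampToMs(timestamp: string): number {
  const parts = timestamp.split(":");
  if (parts.length !== 3) {
    throw new Error(`Unexpected timestamp format: ${timestamp}`);
  }
  const hours = Number(parts[0]);
  const minutes = Number(parts[1]);
  const seconds = Number(parts[2]);
  if ([hours, minutes, seconds].some((n) => Number.isNaN(n))) {
    throw new Error(`Unexpected timestamp format: ${timestamp}`);
  }
  // Round to avoid floating-point noise from the fractional seconds.
  return Math.round((hours * 3600 + minutes * 60 + seconds) * 1000);
}

// Hypothetical helper: length of one entry in milliseconds.
function durationMs(line: TranscriptLine): number {
  return timestampToMs(line.end) - timestampToMs(line.start);
}

// Hypothetical helper: join all entries into one plain-text string.
function joinSpeech(lines: TranscriptLine[]): string {
  return lines.map((line) => line.speech).join(" ");
}
```

Joining with a single space is a simplification: with word-level timestamps, punctuation may arrive as separate entries and need different spacing.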

Full Options List

    import whisper from 'whisper-node';

    const filePath = "example/sample.wav"; // required

    const options = {
      modelName: "base.en", // default
      // modelPath: "/custom/path/to/model.bin", // use model in a custom directory (cannot use along with 'modelName')
      whisperOptions: {
        language: 'auto',         // default (use 'auto' for auto detect)
        gen_file_txt: false,      // outputs .txt file
        gen_file_subtitle: false, // outputs .srt file
        gen_file_vtt: false,      // outputs .vtt file
        word_timestamps: true     // timestamp for every word
        // timestamp_size: 0      // cannot use along with word_timestamps:true
      }
    }

    const transcript = await whisper(filePath, options);
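The comments in the options list name two combinations that cannot be used together: `modelName` with `modelPath`, and `timestamp_size` with `word_timestamps: true`. A pre-flight check along these lines could catch them before transcription starts. This is a sketch only: the interfaces below are assumptions reconstructed from the list above rather than the package's own types, and `findOptionConflicts` is a hypothetical helper.

```typescript
// Assumed option shapes, reconstructed from the options list above.
// These are illustrative and not the types shipped by whisper-node.
interface WhisperFlags {
  language?: string;
  gen_file_txt?: boolean;
  gen_file_subtitle?: boolean;
  gen_file_vtt?: boolean;
  word_timestamps?: boolean;
  timestamp_size?: number;
}

interface TranscribeOptions {
  modelName?: string;
  modelPath?: string;
  whisperOptions?: WhisperFlags;
}

// Hypothetical helper: return a description of each documented conflict found.
function findOptionConflicts(options: TranscribeOptions): string[] {
  const problems: string[] = [];

  // modelPath cannot be used along with modelName.
  if (options.modelName !== undefined && options.modelPath !== undefined) {
    problems.push("Set either modelName or modelPath, not both.");
  }

  // timestamp_size cannot be used along with word_timestamps: true.
  const flags = options.whisperOptions;
  if (
    flags !== undefined &&
    flags.word_timestamps === true &&
    flags.timestamp_size !== undefined
  ) {
    problems.push("timestamp_size cannot be combined with word_timestamps: true.");
  }

  return problems;
}
```

An empty array means neither documented conflict was found. The check says nothing about whether the individual values are valid.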

Input File Format

Files must be .wav with a 16 kHz sample rate (16000 Hz).

Example .mp3 file converted with an FFmpeg command:

    ffmpeg -i input.mp3 -ar 16000 output.wav
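To confirm that a file has the expected sample rate before transcribing it, the rate can be read from the WAV header. The sketch below scans the RIFF chunks for the "fmt " chunk and returns its sample-rate field. `readWavSampleRate` is a hypothetical helper, not something whisper-node provides, and it assumes a standard RIFF/WAVE layout.

```typescript
// Hypothetical helper: read the sample rate (in Hz) from RIFF/WAVE bytes.
function readWavSampleRate(bytes: Uint8Array): number {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // Read a 4-character ASCII chunk tag at the given offset.
  const tag = (offset: number): string =>
    String.fromCharCode(
      view.getUint8(offset),
      view.getUint8(offset + 1),
      view.getUint8(offset + 2),
      view.getUint8(offset + 3)
    );

  if (bytes.byteLength < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }

  // Chunks start after the 12-byte RIFF header: 4-byte id, 4-byte size, data.
  let offset = 12;
  while (offset + 8 <= bytes.byteLength) {
    const chunkId = tag(offset);
    const chunkSize = view.getUint32(offset + 4, true); // little-endian

    if (chunkId === "fmt ") {
      if (offset + 16 > bytes.byteLength) {
        throw new Error("Truncated fmt chunk");
      }
      // fmt data: audio format (2 bytes), channels (2 bytes), sample rate (4 bytes).
      return view.getUint32(offset + 12, true);
    }

    // Chunk data is padded to an even number of bytes.
    offset += 8 + chunkSize + (chunkSize % 2);
  }

  throw new Error("No fmt chunk found");
}
```

In Node.js, the result of `fs.readFileSync(path)` can be passed in directly, since `Buffer` is a `Uint8Array` subclass. A return value of 16000 matches the `-ar 16000` setting in the FFmpeg command above.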

Roadmap

  • Support projects not using TypeScript
  • Allow custom directory for storing models
  • Config files as alternative to model download cli
  • Remove path, shelljs and prompt-sync package for browser, react-native expo, and webassembly compatibility
  • fluent-ffmpeg to automatically convert to 16 kHz .wav files as well as support separating audio from video
  • Pyannote diarization for speaker names
  • Implement WhisperX as optional alternative model for diarization and higher precision timestamps (as alternative to C++ version)
  • Add option for viewing detected language as described in Issue 16
  • Include TypeScript types in d.ts file
  • Add support for language option
  • Add support for transcribing audio streams as already implemented in whisper.cpp

Modifying whisper-node

npm run dev - runs nodemon and tsc on '/src/test.ts'

npm run build - runs tsc, outputs to '/dist' and gives sh permission to 'dist/download.js'

About

Node.js bindings for OpenAI's Whisper. (C++ CPU version by ggerganov)
