This project provides tools for downloading YouTube videos, performing speaker diarization, and transcribing the audio. It leverages state-of-the-art models for accurate speaker identification and transcription.
- Download audio from YouTube videos
- Perform speaker diarization to identify different speakers in the audio
- Transcribe the audio to text
- Save detailed protocols of speaker segments and their transcriptions
For transcription, we use the Whisper turbo model from OpenAI. Whisper is a general-purpose speech recognition model that is trained on a large dataset of diverse audio.
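Combining Whisper's transcription with diarization output requires matching each transcribed segment to a speaker. A minimal sketch of one way to do this, assuming Whisper's `transcribe()` segment format (`start`, `end`, `text` keys); the maximum-overlap heuristic and function names are our own illustration, not the project's actual implementation:

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(segments, turns):
    """Label each Whisper segment with the diarization speaker it overlaps most.

    segments: dicts with "start", "end", "text" (Whisper's transcribe() output).
    turns:    (start, end, speaker) tuples from diarization.
    """
    labeled = []
    for seg in segments:
        best = max(
            turns,
            key=lambda t: overlap(seg["start"], seg["end"], t[0], t[1]),
            default=None,
        )
        labeled.append({**seg, "speaker": best[2] if best else "UNKNOWN"})
    return labeled

# The transcription itself would look roughly like this (requires the
# openai-whisper package; not run here):
# import whisper
# model = whisper.load_model("turbo")
# segments = model.transcribe("audio.wav")["segments"]
```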
For downloading audio from YouTube, we use the pytubefix library.
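A sketch of what the download step might look like with pytubefix. The `video_id` helper is our own addition for illustration; `download_audio` assumes pytubefix's `YouTube`/`get_audio_only` API:

```python
from urllib.parse import urlparse, parse_qs

def video_id(url):
    """Extract the YouTube video ID from a watch or youtu.be URL (our own helper)."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    return parse_qs(parsed.query).get("v", [None])[0]

def download_audio(url, output_folder):
    """Download the audio-only stream with pytubefix; returns the file path."""
    from pytubefix import YouTube  # imported lazily so the helper above works standalone
    yt = YouTube(url)
    stream = yt.streams.get_audio_only()
    return stream.download(output_path=output_folder)
```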
For speaker diarization, we use the pyannote.audio library. pyannote.audio is a toolkit for speaker diarization that provides pre-trained models for speaker identification and segmentation.
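A sketch of running a pretrained pyannote.audio pipeline, assuming the 3.x `Pipeline` API and the `pyannote/speaker-diarization-3.1` model; the JSON-friendly output shape in `turns_to_json` is our own choice, not necessarily the format this project writes:

```python
import os

def diarize(wav_path):
    """Run a pretrained pyannote pipeline and return plain (start, end, speaker) tuples."""
    from pyannote.audio import Pipeline  # lazy import; requires a Hugging Face token
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token=os.environ["HUGGING_FACE_TOKEN"],
    )
    diarization = pipeline(wav_path)
    return [(turn.start, turn.end, speaker)
            for turn, _, speaker in diarization.itertracks(yield_label=True)]

def turns_to_json(turns):
    """Serialize diarization turns into JSON-friendly dicts (shape is illustrative)."""
    return [{"start": round(s, 3), "end": round(e, 3), "speaker": spk}
            for s, e, spk in turns]
```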
To install the required dependencies, run:
```
uv sync
```

You need to set the `HUGGING_FACE_TOKEN` environment variable. You can do this by adding the following line to your shell configuration file (e.g., `.bashrc`, `.zshrc`):
```
export HUGGING_FACE_TOKEN=<your_hugging_face_token>
```

To use Celery, you need to have Redis installed and running. You can install Redis using Homebrew:
```
brew install redis
```

Start the Redis server:
```
brew services start redis
```

To monitor Celery tasks, you can use Flower. Install Flower using uv:
```
uv add flower
```

To monitor Celery tasks and run Celery workers, follow these steps:
- Start the Flower server:

  ```
  uv run celery -A diarization.celery_task flower --port=5555
  ```

- Access the Flower dashboard by navigating to http://localhost:5555 in your web browser.

- Start the Celery worker:

  ```
  uv run celery -A diarization.celery_task worker --loglevel=info -P threads
  ```
To convert the audio to WAV format, you need to have ffmpeg installed. You can install ffmpeg using Homebrew:
```
brew install ffmpeg
```

To download and process a single YouTube video, run:
```
uv run diarization.py <YouTube_URL> <output_folder>
```

To process multiple YouTube URLs from a text file (one link per line), run:
```
uv run diarization.py --file <file_with_urls> <output_folder>
```

To run Celery tasks for downloading and transcribing, start the Celery worker:
```
uv run celery -A diarization.celery_task worker --loglevel=info -P threads
```

Then, you can call the task from your code:
```python
from diarization.celery_task import download_and_transcribe

result = download_and_transcribe.delay("<YouTube_URL>", "<output_folder>")
print(result.get())
```

To run the Streamlit app for a user-friendly interface, use the following command:
```
uv run streamlit run streamlit_app.py
```

The output will include:
- The downloaded audio file in WAV format
- A JSON file with diarization results
- A text file with detailed protocols of speaker segments and their transcriptions
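The protocol text file could be produced from speaker-labeled segments along these lines. The `[MM:SS - MM:SS] SPEAKER: text` line format is our own illustration; the project's actual protocol layout may differ:

```python
def format_protocol(labeled_segments):
    """Render speaker-labeled segments as '[MM:SS - MM:SS] SPEAKER: text' lines."""
    def mmss(seconds):
        minutes, secs = divmod(int(seconds), 60)
        return f"{minutes:02d}:{secs:02d}"

    return "\n".join(
        f"[{mmss(seg['start'])} - {mmss(seg['end'])}] {seg['speaker']}: {seg['text'].strip()}"
        for seg in labeled_segments
    )
```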
For example:

```
uv run diarization.py https://www.youtube.com/watch?v=example <output_folder>
```

The diarization results can be used for various applications, including:
- Training large language models (LLMs) with speaker-specific data
- Implementing Retrieval-Augmented Generation (RAG) applications for more accurate and context-aware responses
This project is licensed under the MIT License.