youtube_caption_finder is a Python library for searching YouTube videos by their captions via an external API. It provides an object‐oriented interface for configuring search filters, sorting options, and performing queries to retrieve video information. The library supports lazy loading of paginated results so that additional pages are fetched on demand.
- Overview
- Features
- Installation
- Usage
- API Reference
- Lazy Loading of Results
- Contributing
- License
- Disclaimer
youtube_caption_finder allows you to search for YouTube videos based on caption content. Using an external API (Filmot), the library sends search queries and returns results as structured VideoInfo objects. Filters and sorting options can be configured, and results are loaded lazily—only the requested page is fetched when needed.
- Search by Caption: Query YouTube videos based on caption content.
- Flexible Filters: Configure filters (e.g. video title, views, likes, duration, date range, license type).
- Sorting Options: Sort results by various fields such as view count, upload date, etc.
- Lazy Pagination: Load additional pages on demand using a generator interface (via search_all()).
- Channel Extraction: Extract the canonical channel ID from a channel URL (even from vanity URLs like @examplechannel).
- Command-Line Interface: Use the library from the command line for quick searches.
Note: The library does not implement caption processing functionality itself. Users can implement caption handling on top of the library using other tools.
Clone the repository and install with pip:

    git clone https://github.com/yourusername/youtube_caption_finder.git
    cd youtube_caption_finder
    pip install .

For development purposes, install in editable mode:

    pip install -e .

Below is an example of how to use the library in your code:

    from youtube_caption_finder import (
        YoutubeCaptionFinder,
        Filters,
        SortOption,
        SortField,
        SortOrder,
        LicenseType,
        VideoInfo,
    )

    # Create a client instance
    client = YoutubeCaptionFinder()

    # Configure filters
    filters = Filters()
    filters.set_license(LicenseType.CREATIVE_COMMONS)

    # Configure sorting options
    sort_option = SortOption(SortField.VIEW_COUNT, SortOrder.DESC)

    query = "USA taxes"

    # Retrieve the first page of results
    videos = client.search(query, filters=filters, sort_option=sort_option)
    for video in videos:
        print(video)

    # Lazy iteration over all pages (fetch next results on demand)
    video_generator = client.search_all(query, filters=filters, sort_option=sort_option)
    first_video = next(video_generator)
    print("First video:", first_video)
    second_video = next(video_generator)
    print("Second video:", second_video)

The library provides a CLI. Once installed, you can run:

    youtube_caption_finder --license CREATIVE_COMMONS --sort VIEW_COUNT --order desc "USA taxes"

This command will perform a search with the specified parameters and print out the results.
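To illustrate how the flags in the command above could be parsed, here is a minimal argparse sketch. It is not the library's actual CLI code: the `build_parser` helper, the defaults, and the case-insensitive handling of `--order` are assumptions made for illustration. Only the flag names `--license`, `--sort`, and `--order` and the positional query come from the command shown above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags in the example command (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="youtube_caption_finder")
    parser.add_argument("query", help="text to search for in captions")
    # Assumption: the value is the name of a LicenseType member.
    parser.add_argument("--license", default=None)
    # Assumption: the value is the name of a SortField member.
    parser.add_argument("--sort", default=None)
    # Assumption: the order is case-insensitive and limited to asc/desc.
    parser.add_argument("--order", type=str.lower, choices=["asc", "desc"], default="desc")
    return parser


# Parse the same arguments as the example command, passed as a list.
args = build_parser().parse_args(
    ["--license", "CREATIVE_COMMONS", "--sort", "VIEW_COUNT", "--order", "desc", "USA taxes"]
)
print(args.query, args.license, args.sort, args.order)
```

In a real entry point, the parsed names would then be looked up on the corresponding enums before calling the client.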
The library includes a module to extract a channel's canonical ID from its URL, even for vanity URLs. For example:

    from youtube_caption_finder.channel import Channel

    channel_url = "https://www.youtube.com/@examplechannel"
    channel = Channel(channel_url)
    print("Channel ID:", channel.channel_id)

YoutubeCaptionFinder is the main client class.
- search(query, channel_id=None, filters=None, sort_option=None): Returns a list of VideoInfo objects from the first page of search results.
- search_all(query, channel_id=None, filters=None, sort_option=None): Returns a generator yielding VideoInfo objects across pages. You can use next() to fetch additional results.
- get_filters(query, channel_id=None, filters=None, sort_option=None): Returns a dictionary of available filter options from the search page.
Filters is a dataclass that encapsulates filtering options.
- set_title(title): Set a video title filter.
- set_views(min_views, max_views): Set a views range filter.
- set_likes(min_likes, max_likes): Set a likes range filter.
- set_duration(start_duration, end_duration): Set a duration filter (in seconds).
- set_date_range(start, end): Set a date range filter (using ISO date strings or date objects).
- set_license(license_type): Set the license type using the LicenseType enum.
- to_dict(): Serializes filter settings into a dictionary suitable for URL parameters.
- SortField: Enum defining sorting fields (e.g., UPLOAD_DATE, VIEW_COUNT).
- SortOrder: Enum defining sort order (ASC or DESC).
- SortOption: Class that combines a SortField and SortOrder. Use the to_dict() method to serialize sorting options.
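The relationship between the two enums and the combining class can be sketched as follows. The class names carry a `Sketch` suffix to make clear they are stand-ins; the enum values and the dictionary keys are placeholders, not the parameter names the library or the external API actually uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class SortFieldSketch(Enum):
    # Member names follow the README; the values are placeholders.
    UPLOAD_DATE = "upload_date"
    VIEW_COUNT = "view_count"


class SortOrderSketch(Enum):
    ASC = "asc"
    DESC = "desc"


@dataclass(frozen=True)
class SortOptionSketch:
    """Pairs a sort field with a sort order (illustrative only)."""

    sort_field: SortFieldSketch
    sort_order: SortOrderSketch

    def to_dict(self) -> Dict[str, str]:
        # Hypothetical keys; the real serialization may differ.
        return {"sort_field": self.sort_field.value, "sort_order": self.sort_order.value}


option = SortOptionSketch(SortFieldSketch.VIEW_COUNT, SortOrderSketch.DESC)
print(option.to_dict())
```

Using enums rather than free-form strings means an invalid field or order fails at construction time instead of producing a malformed request.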
VideoInfo is a dataclass representing a YouTube video result. Attributes include video_id, title, channel, views, likes, upload_date, etc.
The library supports lazy loading of search results through the search_all() method in the YoutubeCaptionFinder class. This method returns a generator that:
- Requests a page of results based on an internal page counter.
- Yields individual VideoInfo objects one by one.
- Automatically advances to the next page when the current page is exhausted.

This allows you to process search results on demand without waiting for all pages to load.
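The three steps above can be sketched with a plain Python generator. This is a simplified model rather than the library's code: `iterate_pages`, `fake_fetch`, and the in-memory `FAKE_PAGES` are hypothetical stand-ins for the real page request, and treating an empty page as the end of the results is an assumption.

```python
from itertools import islice
from typing import Callable, Dict, Iterator, List


def iterate_pages(fetch_page: Callable[[int], List[str]], start_page: int = 1) -> Iterator[str]:
    """Yield items one by one, requesting the next page only when needed."""
    page = start_page
    while True:
        items = fetch_page(page)  # one request per page
        if not items:  # assumption: an empty page means there are no more results
            return
        yield from items
        page += 1


# Stand-in for the network call: three fake pages, with every request recorded.
FAKE_PAGES: Dict[int, List[str]] = {1: ["a", "b"], 2: ["c", "d"], 3: ["e"]}
requested_pages: List[int] = []


def fake_fetch(page: int) -> List[str]:
    requested_pages.append(page)
    return FAKE_PAGES.get(page, [])


results = iterate_pages(fake_fetch)
first_three = list(islice(results, 3))
print(first_three)      # ['a', 'b', 'c']
print(requested_pages)  # [1, 2]
```

Taking three items touches only the first two pages; page 3 is not requested until the caller asks for more results.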
Using the library's client:

    from youtube_caption_finder import YoutubeCaptionFinder

    client = YoutubeCaptionFinder()
    results = client.search_all("USA taxes")

    # Get the next result by calling next()
    first_result = next(results)
    print(first_result)

    # Subsequent calls to next() will fetch additional videos (and pages if needed)

Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Follow PEP‑8 and include proper docstrings and tests.
- Submit a pull request with a detailed description of your changes.
This project is licensed under the Apache-2.0 license. See the LICENSE file for details.
This project is intended solely for educational and research purposes. Users are advised that utilizing this project may result in actions that disregard the directives specified in a website's robots.txt file. The robots.txt file is a standard used by websites to communicate with web crawlers and other automated agents about which areas of the site should not be processed or analyzed.
By using this project, you acknowledge and agree that the author is not responsible or liable for any misuse or damage caused by your use of the project. It is your responsibility to ensure that your use of this project complies with all applicable laws and regulations, as well as the terms of service of any websites you interact with. The author explicitly disclaims any liability for actions taken by users that contravene website policies or legal statutes.