youtube_caption_finder

youtube_caption_finder is a Python library for searching YouTube videos by their captions via an external API. It provides an object‐oriented interface for configuring search filters, sorting options, and performing queries to retrieve video information. The library supports lazy loading of paginated results so that additional pages are fetched on demand.

Overview

youtube_caption_finder allows you to search for YouTube videos based on caption content. Using an external API (Filmot), the library sends search queries and returns results as structured VideoInfo objects. Filters and sorting options can be configured, and results are loaded lazily—only the requested page is fetched when needed.

Features

Search by Caption: Query YouTube videos based on caption content.
Flexible Filters: Configure filters (e.g. video title, views, likes, duration, date range, license type).
Sorting Options: Sort results by various fields such as view count, upload date, etc.
Lazy Pagination: Load additional pages on demand using a generator interface (via search_all()).
Channel Extraction: Extract the canonical channel ID from a channel URL (even from vanity URLs like @examplechannel).
Command-Line Interface: Use the library from the command line for quick searches.

Note: The library does not implement caption processing functionality itself. Users can implement caption handling on top of the library using other tools.

Installation

Using pip

Clone the repository and install with pip:

    git clone https://github.com/yourusername/youtube_caption_finder.git
    cd youtube_caption_finder
    pip install .

Development Installation

For development purposes, install in editable mode:

pip install -e .

Usage

Programmatic Usage

Below is an example of how to use the library in your code:

    from youtube_caption_finder import (
        YoutubeCaptionFinder,
        Filters,
        SortOption,
        SortField,
        SortOrder,
        LicenseType,
        VideoInfo,
    )

    # Create a client instance
    client = YoutubeCaptionFinder()

    # Configure filters
    filters = Filters()
    filters.set_license(LicenseType.CREATIVE_COMMONS)

    # Configure sorting options
    sort_option = SortOption(SortField.VIEW_COUNT, SortOrder.DESC)

    query = "USA taxes"

    # Retrieve the first page of results
    videos = client.search(query, filters=filters, sort_option=sort_option)
    for video in videos:
        print(video)

    # Lazy iteration over all pages (fetch next results on demand)
    video_generator = client.search_all(query, filters=filters, sort_option=sort_option)
    first_video = next(video_generator)
    print("First video:", first_video)
    second_video = next(video_generator)
    print("Second video:", second_video)

Command-Line Interface (CLI)

The library provides a CLI. Once installed, you can run:

youtube_caption_finder --license CREATIVE_COMMONS --sort VIEW_COUNT --order desc "USA taxes"

This command will perform a search with the specified parameters and print out the results.
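How those flags could be parsed is sketched below with argparse. This is an illustration only, not the project's actual CLI code: the option names come from the command above, while the `build_parser` helper, the default sort order, and the help texts are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the flags shown in the command above."""
    parser = argparse.ArgumentParser(prog="youtube_caption_finder")
    parser.add_argument("query", help="caption text to search for")
    # No choices are enforced for these two options because the full set of
    # enum members is not listed in this README.
    parser.add_argument("--license", default=None, help="LicenseType member name")
    parser.add_argument("--sort", default=None, help="SortField member name")
    # Defaulting to "desc" is an assumption made for this sketch.
    parser.add_argument("--order", default="desc", choices=["asc", "desc"],
                        help="sort direction")
    return parser


args = build_parser().parse_args(
    ["--license", "CREATIVE_COMMONS", "--sort", "VIEW_COUNT",
     "--order", "desc", "USA taxes"]
)
print(args.query, args.license, args.sort, args.order)
```

A real entry point would then translate `args.license` and `args.sort` into the corresponding enum members before calling `search()`.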

Working with Channels

The library includes a module to extract a channel’s canonical ID from its URL—even for vanity URLs. For example:

    from youtube_caption_finder.channel import Channel

    channel_url = "https://www.youtube.com/@examplechannel"
    channel = Channel(channel_url)
    print("Channel ID:", channel.channel_id)
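This README does not say how `Channel` resolves a vanity URL. One plausible approach is to download the channel page and look for a `/channel/<ID>` URL in its HTML. The sketch below shows only the parsing step on a hand-written HTML string; the `extract_channel_id` helper, the regular expression, and the assumption that IDs are `UC` followed by 22 URL-safe characters are all illustrative and not taken from the library.

```python
import re
from typing import Optional

# Assumption: a channel ID is "UC" plus 22 characters from [0-9A-Za-z_-].
# The negative lookahead rejects longer runs of ID characters.
_CHANNEL_ID_RE = re.compile(
    r"youtube\.com/channel/(UC[0-9A-Za-z_-]{22})(?![0-9A-Za-z_-])"
)


def extract_channel_id(page_html: str) -> Optional[str]:
    """Return the first channel ID found in a /channel/ URL, or None."""
    match = _CHANNEL_ID_RE.search(page_html)
    return match.group(1) if match else None


# Placeholder ID and markup, written by hand for this example.
sample_id = "UC" + "x" * 22
sample_html = (
    '<link rel="canonical" '
    f'href="https://www.youtube.com/channel/{sample_id}">'
)
print(extract_channel_id(sample_html))
```

Keeping the parsing separate from the network request makes the helper easy to test without contacting YouTube.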

API Reference

YoutubeCaptionFinder

The main client class.

search(query, channel_id=None, filters=None, sort_option=None): Returns a list of VideoInfo objects from the first page of search results.
search_all(query, channel_id=None, filters=None, sort_option=None): Returns a generator yielding VideoInfo objects across pages. You can use next() to fetch additional results.
get_filters(query, channel_id=None, filters=None, sort_option=None): Returns a dictionary of available filter options from the search page.

Filters

A dataclass that encapsulates filtering options.

set_title(title): Set a video title filter.
set_views(min_views, max_views): Set a views range filter.
set_likes(min_likes, max_likes): Set a likes range filter.
set_duration(start_duration, end_duration): Set a duration filter (in seconds).
set_date_range(start, end): Set a date range filter (using ISO date strings or date objects).
set_license(license_type): Set the license type using the LicenseType enum.
to_dict(): Serializes filter settings into a dictionary suitable for URL parameters.
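To make the setter and `to_dict()` pattern concrete, here is a minimal stand-in covering two of the filters. It is not the library's `Filters` class: the `FiltersSketch` name, the field names, the URL parameter keys (`title`, `minViews`, `maxViews`), the range check, and the chaining via `return self` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urlencode


@dataclass
class FiltersSketch:
    """Hypothetical stand-in for Filters; names and keys are assumptions."""
    title: Optional[str] = None
    min_views: Optional[int] = None
    max_views: Optional[int] = None

    def set_title(self, title: str) -> "FiltersSketch":
        self.title = title
        return self  # returning self to allow chaining is an assumption

    def set_views(self, min_views: Optional[int] = None,
                  max_views: Optional[int] = None) -> "FiltersSketch":
        if (min_views is not None and max_views is not None
                and min_views > max_views):
            raise ValueError("min_views must not exceed max_views")
        self.min_views, self.max_views = min_views, max_views
        return self

    def to_dict(self) -> Dict[str, str]:
        """Keep only the filters that were set, as string URL parameters."""
        raw = {"title": self.title,
               "minViews": self.min_views,
               "maxViews": self.max_views}
        return {key: str(value) for key, value in raw.items()
                if value is not None}


params = FiltersSketch().set_title("budget").set_views(min_views=1000).to_dict()
print(params)             # {'title': 'budget', 'minViews': '1000'}
print(urlencode(params))  # title=budget&minViews=1000
```

Dropping unset filters in `to_dict()` keeps the resulting query string short and avoids sending empty parameters.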

Sorting Options

SortField: Enum defining sorting fields (e.g., UPLOAD_DATE, VIEW_COUNT).
SortOrder: Enum defining sort order (ASC or DESC).
SortOption: Class that combines a SortField and SortOrder. Use the to_dict() method to serialize sorting options.
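The relationship between the two enums and the combining class can be sketched as follows. The member names `UPLOAD_DATE`, `VIEW_COUNT`, `ASC`, and `DESC` come from the list above; the `*Sketch` class names, the enum values, and the dictionary keys are hypothetical and may differ from what the library sends to the API.

```python
from enum import Enum
from typing import Dict


class SortFieldSketch(Enum):
    UPLOAD_DATE = "uploaddate"  # hypothetical wire value
    VIEW_COUNT = "viewcount"    # hypothetical wire value


class SortOrderSketch(Enum):
    ASC = "asc"
    DESC = "desc"


class SortOptionSketch:
    """Pairs a sort field with a sort order (illustrative only)."""

    def __init__(self, field: SortFieldSketch,
                 order: SortOrderSketch = SortOrderSketch.DESC) -> None:
        self.field = field
        self.order = order

    def to_dict(self) -> Dict[str, str]:
        # The key names are assumptions, not the real API parameters.
        return {"sortField": self.field.value, "sortOrder": self.order.value}


option = SortOptionSketch(SortFieldSketch.VIEW_COUNT, SortOrderSketch.DESC)
print(option.to_dict())  # {'sortField': 'viewcount', 'sortOrder': 'desc'}
```

Using enums rather than raw strings means a misspelled field name fails immediately with an AttributeError instead of producing a silently ignored parameter.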

VideoInfo

A dataclass representing a YouTube video result. Attributes include video_id, title, channel, views, likes, upload_date, etc.
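As an illustration of working with such records, the sketch below defines a stand-in dataclass with the attributes named above and picks the most-viewed entry. The `VideoInfoSketch` name, the field types, the `url` property, and the sample values are assumptions; the real `VideoInfo` may carry more fields or use different types.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VideoInfoSketch:
    """Hypothetical stand-in for VideoInfo; field types are assumptions."""
    video_id: str
    title: str
    channel: str
    views: int
    likes: int
    upload_date: date

    @property
    def url(self) -> str:
        # Standard YouTube watch URL built from the video ID.
        return f"https://www.youtube.com/watch?v={self.video_id}"


# Made-up sample records for demonstration.
videos = [
    VideoInfoSketch("aaaaaaaaaaa", "Tax basics", "Channel A", 1200, 40,
                    date(2023, 1, 5)),
    VideoInfoSketch("bbbbbbbbbbb", "Filing guide", "Channel B", 5300, 210,
                    date(2022, 4, 1)),
]

most_viewed = max(videos, key=lambda video: video.views)
print(most_viewed.title, most_viewed.url)
```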

Lazy Loading of Results

The library supports lazy loading of search results through the search_all() method in the YoutubeCaptionFinder class. This method returns a generator that:

Requests a page of results based on an internal page counter.
Yields individual VideoInfo objects one by one.
Automatically advances to the next page when the current page is exhausted.

This allows you to process search results on demand without waiting for all pages to load.
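The three steps above can be sketched as a small generator. This is not the library's implementation: `iter_pages`, the `fetch_page` callback, the fake backend, and the rule that an empty page ends the iteration are assumptions chosen to keep the example self-contained and free of network calls.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int], List[str]],
               start_page: int = 1) -> Iterator[str]:
    """Yield items one by one, requesting a new page only when needed."""
    page = start_page                # internal page counter
    while True:
        items = fetch_page(page)     # one request per page
        if not items:                # assumption: an empty page means the end
            return
        yield from items             # hand out results individually
        page += 1                    # advance once the page is exhausted


# Fake backend that records which pages were requested.
FAKE_PAGES = {1: ["a", "b"], 2: ["c"]}
calls: List[int] = []


def fake_fetch(page: int) -> List[str]:
    calls.append(page)
    return FAKE_PAGES.get(page, [])


results = iter_pages(fake_fetch)
first = next(results)
print(first, calls)  # a [1]
```

Only page 1 has been requested after the first `next()` call; page 2 is fetched when the consumer asks for a third item.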

Example of On-Demand Loading

    from youtube_caption_finder import YoutubeCaptionFinder

    client = YoutubeCaptionFinder()
    results = client.search_all("USA taxes")

    # Get the next result by calling next()
    first_result = next(results)
    print(first_result)

    # Subsequent calls to next() will fetch additional videos (and pages if needed)
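Because `search_all()` returns a generator, standard iterator tools apply to it. The sketch below uses a stand-in generator (`numbered_results`, a hypothetical name) in place of a real search so that it runs without network access. It shows two common patterns: capping the number of results with `itertools.islice`, and passing a default to `next()` so that an exhausted generator does not raise StopIteration.

```python
from itertools import islice
from typing import Iterator


def numbered_results() -> Iterator[str]:
    """Stand-in for search_all(): an endless generator of fake results."""
    n = 0
    while True:
        n += 1
        yield f"video-{n}"


# Take at most five results; nothing beyond the fifth item is generated.
top_five = list(islice(numbered_results(), 5))
print(top_five)

# next() with a default returns None instead of raising StopIteration.
exhausted = iter([])
print(next(exhausted, None))  # None
```

With a real search, `islice` bounds how many pages get requested, since pages are only fetched as items are consumed.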

Contributing

Contributions are welcome! To contribute:

Fork the repository.
Create a new branch for your feature or bug fix.
Follow PEP‑8 and include proper docstrings and tests.
Submit a pull request with a detailed description of your changes.

License

This project is licensed under the Apache-2.0 license. See the LICENSE file for details.

Disclaimer

This project is intended solely for educational and research purposes. Users are advised that utilizing this project may result in actions that disregard the directives specified in a website's robots.txt file. The robots.txt file is a standard used by websites to communicate with web crawlers and other automated agents about which areas of the site should not be processed or analyzed.

By using this project, you acknowledge and agree that the author is not responsible or liable for any misuse or damage caused by your use of the project. It is your responsibility to ensure that your use of this project complies with all applicable laws and regulations, as well as the terms of service of any websites you interact with. The author explicitly disclaims any liability for actions taken by users that contravene website policies or legal statutes.
