Closed as not planned
Description
Feature Request
Add the ability to filter web search results by including or excluding specific domains/websites in the OpenAI Responses API web_search tool.
Problem Statement
Currently, the OpenAI Responses API web_search tool configuration only supports basic parameters:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "web_search_preview",
            "user_location": {
                "type": "approximate",
                "country": "US",
                "city": "San Francisco",
            },
        }
    ],
    input="Search query here",
)

This limits users who need to:
- Focus searches on specific trusted sources
- Exclude known unreliable or irrelevant domains
- Customize search results for specific use cases (e.g., academic research, official sources only)
Proposed Solution
Add two new optional parameters to the web_search tool configuration, similar to Perplexity AI's implementation:
- include_domains: List of domains to limit search results to
- exclude_domains: List of domains to exclude from search results
Example Implementation
Current OpenAI Implementation:
response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "web_search_preview",
            "user_location": {
                "type": "approximate",
                "country": "US",
            },
        }
    ],
    input="Latest AI research papers",
)

Proposed Enhancement:
response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "web_search_preview",
            "user_location": {
                "type": "approximate",
                "country": "US",
            },
            # New domain filtering parameters
            "include_domains": ["arxiv.org", "openai.com", "nature.com"],
            "exclude_domains": ["medium.com", "reddit.com"],
        }
    ],
    input="Latest AI research papers",
)

Reference: Perplexity AI Implementation
Perplexity's API already supports this functionality:
import requests

response = requests.post(
    "https://api.perplexity.ai/search",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "query": "machine learning research",
        "include_domains": ["arxiv.org", "ieee.org", "acm.org"],
        "exclude_domains": ["blogspot.com", "wordpress.com"],
    },
)

Use Cases
- Academic Research: Include only .edu domains and academic publishers
- Official Information: Focus on government (.gov) and organizational (.org) domains
- Technical Documentation: Include official documentation sites, exclude forums
- News Aggregation: Include trusted news sources, exclude tabloids
- Product Research: Include official vendor sites, exclude affiliate spam
Implementation Considerations
- Both parameters should be optional to maintain backward compatibility
- Support for wildcard patterns (e.g., *.edu, *.gov)
- Consider adding validation for domain format
- Document any limitations on the number of domains that can be filtered
- Ensure the filtering happens at the search API level for efficiency
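To make the considerations above concrete, here is a minimal client-side sketch of how pattern validation and include/exclude matching could behave. It is not part of the OpenAI SDK: `validate_pattern`, `host_matches`, and `filter_urls` are hypothetical names, and the matching rule (a domain also matches its subdomains, and `*.edu` is treated as a suffix match) is an assumption rather than a specified behavior. Filtering inside the search backend, as requested, would still be preferable to post-filtering like this.

```python
import re
from urllib.parse import urlparse

# Hypothetical helpers for illustration only; not an OpenAI API.
# Accepts "arxiv.org" or wildcard forms such as "*.edu"; rejects URLs and spaces.
_DOMAIN_RE = re.compile(r"^(\*\.)?([a-z0-9-]+\.)*[a-z0-9-]+$")


def validate_pattern(pattern: str) -> str:
    """Normalize a domain pattern and reject malformed ones."""
    normalized = pattern.strip().lower()
    if not _DOMAIN_RE.match(normalized):
        raise ValueError(f"invalid domain pattern: {pattern!r}")
    return normalized


def host_matches(host: str, pattern: str) -> bool:
    """True if host equals the pattern's domain or is a subdomain of it."""
    suffix = pattern[2:] if pattern.startswith("*.") else pattern
    host = host.lower()
    return host == suffix or host.endswith("." + suffix)


def filter_urls(urls, include_domains=None, exclude_domains=None):
    """Keep URLs allowed by include_domains and not hit by exclude_domains."""
    include = [validate_pattern(p) for p in include_domains or []]
    exclude = [validate_pattern(p) for p in exclude_domains or []]
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if any(host_matches(host, p) for p in exclude):
            continue  # exclusion wins over inclusion
        if include and not any(host_matches(host, p) for p in include):
            continue  # an include list is present and nothing matched
        kept.append(url)
    return kept
```

Both arguments default to `None`, so calling `filter_urls(urls)` returns every URL unchanged, which mirrors the backward-compatibility point in the list above.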
Benefits
- More precise and relevant search results
- Better control over information sources
- Reduced noise from unreliable sources
- Improved efficiency by filtering early in the search process
- Feature parity with competing APIs like Perplexity
Additional Considerations
The web_search tool could also benefit from:
- A sites parameter to search within specific sites only (similar to Google's site: operator)
- Support for excluding specific URL patterns, not just domains
- Option to prioritize certain domains in results ranking
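As a rough illustration of the URL-pattern and prioritization ideas, the sketch below applies both to a list of result URLs on the client side. The function names `exclude_url_patterns` and `prioritize`, and the choice of shell-style glob patterns over host plus path, are assumptions made for this example, not an existing or proposed API shape.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def exclude_url_patterns(urls, patterns):
    """Drop URLs whose host+path matches any glob, e.g. 'example.com/tag/*'."""
    kept = []
    for url in urls:
        parts = urlparse(url)
        target = (parts.hostname or "") + parts.path
        if not any(fnmatchcase(target, pattern) for pattern in patterns):
            kept.append(url)
    return kept


def prioritize(urls, priority_domains):
    """Stable sort that moves URLs on earlier-listed domains to the front."""
    def rank(url):
        host = urlparse(url).hostname or ""
        for index, domain in enumerate(priority_domains):
            if host == domain or host.endswith("." + domain):
                return index
        return len(priority_domains)  # unlisted domains keep their order, last

    return sorted(urls, key=rank)
```

Because `sorted` is stable, results from domains that are not in `priority_domains` keep the relative order the search engine gave them.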
This enhancement would significantly improve the utility of the web_search tool for developers building applications that require high-quality, domain-specific information retrieval.