Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.
Showing results for 
Search instead for 
Did you mean: 

Help with hybrid search function

the_peterlandis
New Contributor II

I need help writing a filter.  I want to pre-filter a vector index before performing a hybrid search and create this as a function.  Below is a simple example of searching for products for a given customer.  A prefilter is key as this provides authorizations for searching a vector index before applying the top k which reduces the vector space searching as a prefilter before the search.  I am not seeing any filter capability like how you would call the API.
Example of the search API with prefilter

results = index.similarity_search(
   query_text=question,
   query_type = "HYBRID",
   columns=["content", "product", "product_description", "product_id", "purchase_date"],
   filters="{"customer_emai":customer_email},
   num_results=5
)

Below is the SQL Function I need help on

%sql
CREATE OR REPLACE FUNCTION get_customer_products(
search_query string comment 'Search query',
customer_email string comment 'Customer email',
)
RETURNS TABLE (
product STRING,
product_description STRING,
product_id int,
purchase_date_timestamp timestamp
)
COMMENT 'Retrieve the customer products'
LANGUAGE SQL
RETURN (
SELECT
content,
product_id,
purchase_date_timestamp
from
vector_search(
index=> 'gold.my_store.products',
query => search_query,
query_type => 'HYBRID',
filter => '{"customer_email": "' || customer_email || '"}',
num_results => 5
)
)
1 ACCEPTED SOLUTION

Accepted Solutions

KaushalVachhani
Databricks Employee
Databricks Employee

@the_peterlandis, Yes, currently vector_search SQL function doesn't provide pre filter support. However, if you must implement the UC function for this, you can do it something like below using Python code with filters.

 

%sql CREATE OR REPLACE FUNCTION kaushal.kaushal.vector_similarity_search( query_text STRING, filter_id INT, num_results INT ) RETURNS STRING LANGUAGE PYTHON COMMENT "Vector similarity search using authenticated client" ENVIRONMENT ( dependencies = '["databricks-vectorsearch", "databricks-sdk"]', environment_version = 'None' ) AS $$ import json import os # Get credentials from Databricks secrets # You'll need to set these up first def get_secret(scope, key): from databricks.sdk import WorkspaceClient w = WorkspaceClient() return w.secrets.get_secret(scope=scope, key=key).value # Alternative: if dbutils is available in UDF context # token = dbutils.secrets.get(scope="your-scope", key="databricks-token") # host = dbutils.secrets.get(scope="your-scope", key="databricks-host") # Set environment variables for authentication os.environ['DATABRICKS_HOST'] = get_secret("your-scope", "databricks-host") os.environ['DATABRICKS_TOKEN'] = get_secret("your-scope", "databricks-token") from databricks.vector_search.client import VectorSearchClient # Initialize client - should now pick up environment variables client = VectorSearchClient() index = client.get_index( endpoint_name="vector-search-demo-endpoint-kaushal", index_name="kaushal.kaushal.my_text_data_index" ) results = index.similarity_search( query_text=query_text, columns=["id", "content"], filters={"id": [filter_id]}, num_results=num_results ) return json.dumps(results.get('result', {}).get('data_array', [])) $$;


Then run your UC function with SQL, and you should get the expected results. 

 

KaushalVachhani_0-1763803856179.png

View solution in original post

2 REPLIES 2

the_peterlandis
New Contributor II

Based on this documentation, it says, it indicate sql function VECTOR_SEARCH cannot apply pre filter which prefilter is a fundamental capability for vector search.  Just very surprised this is not supported.

KaushalVachhani
Databricks Employee
Databricks Employee

@the_peterlandis, Yes, currently vector_search SQL function doesn't provide pre filter support. However, if you must implement the UC function for this, you can do it something like below using Python code with filters.

 

%sql CREATE OR REPLACE FUNCTION kaushal.kaushal.vector_similarity_search( query_text STRING, filter_id INT, num_results INT ) RETURNS STRING LANGUAGE PYTHON COMMENT "Vector similarity search using authenticated client" ENVIRONMENT ( dependencies = '["databricks-vectorsearch", "databricks-sdk"]', environment_version = 'None' ) AS $$ import json import os # Get credentials from Databricks secrets # You'll need to set these up first def get_secret(scope, key): from databricks.sdk import WorkspaceClient w = WorkspaceClient() return w.secrets.get_secret(scope=scope, key=key).value # Alternative: if dbutils is available in UDF context # token = dbutils.secrets.get(scope="your-scope", key="databricks-token") # host = dbutils.secrets.get(scope="your-scope", key="databricks-host") # Set environment variables for authentication os.environ['DATABRICKS_HOST'] = get_secret("your-scope", "databricks-host") os.environ['DATABRICKS_TOKEN'] = get_secret("your-scope", "databricks-token") from databricks.vector_search.client import VectorSearchClient # Initialize client - should now pick up environment variables client = VectorSearchClient() index = client.get_index( endpoint_name="vector-search-demo-endpoint-kaushal", index_name="kaushal.kaushal.my_text_data_index" ) results = index.similarity_search( query_text=query_text, columns=["id", "content"], filters={"id": [filter_id]}, num_results=num_results ) return json.dumps(results.get('result', {}).get('data_array', [])) $$;


Then run your UC function with SQL, and you should get the expected results. 

 

KaushalVachhani_0-1763803856179.png

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now