Databricks Community

the_peterlandis · a week ago

I need help writing a filter. I want to pre-filter a vector index before performing a hybrid search and create this as a function. Below is a simple example of searching for products for a given customer. A prefilter is key as this provides authorizations for searching a vector index before applying the top k which reduces the vector space searching as a prefilter before the search. I am not seeing any filter capability like how you would call the API.
Example of the search API with prefilter

results = index.similarity_search(
query_text=question,
query_type = "HYBRID",
columns=["content", "product", "product_description", "product_id", "purchase_date"],
filters="{"customer_emai":customer_email},
num_results=5
)

Below is the SQL Function I need help on

%sql

CREATE OR REPLACE FUNCTION get_customer_products(

search_query string comment 'Search query',

customer_email string comment 'Customer email',

)

RETURNS TABLE (

product STRING,
product_description STRING,

product_id int,

purchase_date_timestamp timestamp

)

COMMENT 'Retrieve the customer products'

LANGUAGE SQL

RETURN (

SELECT

content,

product_id,

purchase_date_timestamp

from

vector_search(

index=> 'gold.my_store.products',

query => search_query,

query_type => 'HYBRID',

filter => '{"customer_email": "' || customer_email || '"}',

num_results => 5

)

KaushalVachhani · a week ago

@the_peterlandis, Yes, currently vector_search SQL function doesn't provide pre filter support. However, if you must implement the UC function for this, you can do it something like below using Python code with filters.

%sql CREATE OR REPLACE FUNCTION kaushal.kaushal.vector_similarity_search( query_text STRING, filter_id INT, num_results INT ) RETURNS STRING LANGUAGE PYTHON COMMENT "Vector similarity search using authenticated client" ENVIRONMENT ( dependencies = '["databricks-vectorsearch", "databricks-sdk"]', environment_version = 'None' ) AS $$ import json import os # Get credentials from Databricks secrets # You'll need to set these up first def get_secret(scope, key): from databricks.sdk import WorkspaceClient w = WorkspaceClient() return w.secrets.get_secret(scope=scope, key=key).value # Alternative: if dbutils is available in UDF context # token = dbutils.secrets.get(scope="your-scope", key="databricks-token") # host = dbutils.secrets.get(scope="your-scope", key="databricks-host") # Set environment variables for authentication os.environ['DATABRICKS_HOST'] = get_secret("your-scope", "databricks-host") os.environ['DATABRICKS_TOKEN'] = get_secret("your-scope", "databricks-token") from databricks.vector_search.client import VectorSearchClient # Initialize client - should now pick up environment variables client = VectorSearchClient() index = client.get_index( endpoint_name="vector-search-demo-endpoint-kaushal", index_name="kaushal.kaushal.my_text_data_index" ) results = index.similarity_search( query_text=query_text, columns=["id", "content"], filters={"id": [filter_id]}, num_results=num_results ) return json.dumps(results.get('result', {}).get('data_array', [])) $$;

Then run your UC function with SQL, and you should get the expected results.

View solution in original post

the_peterlandis · a week ago

Based on this documentation, it says, it indicate sql function VECTOR_SEARCH cannot apply pre filter which prefilter is a fundamental capability for vector search. Just very surprised this is not supported.

KaushalVachhani · a week ago

@the_peterlandis, Yes, currently vector_search SQL function doesn't provide pre filter support. However, if you must implement the UC function for this, you can do it something like below using Python code with filters.

%sql CREATE OR REPLACE FUNCTION kaushal.kaushal.vector_similarity_search( query_text STRING, filter_id INT, num_results INT ) RETURNS STRING LANGUAGE PYTHON COMMENT "Vector similarity search using authenticated client" ENVIRONMENT ( dependencies = '["databricks-vectorsearch", "databricks-sdk"]', environment_version = 'None' ) AS $$ import json import os # Get credentials from Databricks secrets # You'll need to set these up first def get_secret(scope, key): from databricks.sdk import WorkspaceClient w = WorkspaceClient() return w.secrets.get_secret(scope=scope, key=key).value # Alternative: if dbutils is available in UDF context # token = dbutils.secrets.get(scope="your-scope", key="databricks-token") # host = dbutils.secrets.get(scope="your-scope", key="databricks-host") # Set environment variables for authentication os.environ['DATABRICKS_HOST'] = get_secret("your-scope", "databricks-host") os.environ['DATABRICKS_TOKEN'] = get_secret("your-scope", "databricks-token") from databricks.vector_search.client import VectorSearchClient # Initialize client - should now pick up environment variables client = VectorSearchClient() index = client.get_index( endpoint_name="vector-search-demo-endpoint-kaushal", index_name="kaushal.kaushal.my_text_data_index" ) results = index.similarity_search( query_text=query_text, columns=["id", "content"], filters={"id": [filter_id]}, num_results=num_results ) return json.dumps(results.get('result', {}).get('data_array', [])) $$;

Then run your UC function with SQL, and you should get the expected results.

Databricks Community

Help with hybrid search function

Join Us as a Local Community Builder!

🌟 Community Pulse: Your Weekly Roundup! November 21 – 27, 2025

Join us for another BrickTalk: Vibe-Coding Databricks Apps in Replit with Augusto!

Celebrating Our First Brickster Champion: Louis Frolio

⭐ Setup Spark with Hadoop Anywhere : A DBR aligned local Spark+HDFS+Hive stack on Docker⭐

Big Book of Data Engineering - Get how-tos, code snippets and real-world examples