
fix(bigquery): escape apostrophes in filter values using standard SQL quoting#38835

Open
Krishnachaitanyakc wants to merge 1 commit into apache:master from Krishnachaitanyakc:fix/bigquery-apostrophe-filter-escaping

Krishnachaitanyakc commented Mar 25, 2026

User description

SUMMARY

Fixes #35857

BigQuery queries fail when dashboard filters on text columns contain values with apostrophes (e.g. O'Brien, Fernando's).

Root cause: When a query is compiled with literal_binds=True, the sqlalchemy-bigquery dialect's process_string_literal function renders string literals with Python's repr(). repr() normally delimits a string with single quotes, but switches to double quotes when the string contains an apostrophe: repr("O'Brien") returns "O'Brien" wrapped in double quotes, whereas repr("Fernando") returns 'Fernando'. In BigQuery SQL, double-quoted tokens are identifiers (like column or table names), not string literals, so the query fails with a syntax error.
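The repr() behavior described above can be checked in plain Python. In this sketch, upstream_style_literal is a hypothetical stand-in for a literal processor that renders values with repr(), as the root-cause analysis describes; it is not the dialect's source code.

```python
def upstream_style_literal(value: str) -> str:
    # Hypothetical stand-in: render a string literal with repr(),
    # which is what the root-cause analysis says the dialect does.
    return repr(value)


# No apostrophe: repr() delimits with single quotes.
print(upstream_style_literal("Fernando"))          # 'Fernando'

# Apostrophe only: repr() switches to double quotes.
print(upstream_style_literal("O'Brien"))           # "O'Brien"

# Both quote characters: repr() keeps single quotes and
# backslash-escapes the apostrophe instead.
print(upstream_style_literal('O\'Brien "Bob"'))    # 'O\'Brien "Bob"'
```

The last case shows that repr() is a Python-specific escaping scheme rather than a SQL quoting function, so the shape of its output shifts with the input.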

Fix: Monkey-patch the BigQuery dialect's colspecs to use a custom TypeDecorator whose literal_processor always produces single-quoted literals with properly doubled internal quotes ('O''Brien'), which is the standard SQL escaping convention that BigQuery expects. This approach follows the same pattern used for the Databricks engine spec (superset/db_engine_specs/databricks.py).

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before: A filter on Fernando's generates WHERE name = "Fernando's" (a double-quoted identifier, which causes a BigQuery syntax error)

After: A filter on Fernando's generates WHERE name = 'Fernando''s' (a properly escaped single-quoted literal)
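The before/after SQL above can be reproduced with a small sketch. Both helpers here (standard_sql_literal and render_filter) are hypothetical names for illustration only; in the PR, the escaping lives inside the custom TypeDecorator's literal_processor, and the SQL is produced by SQLAlchemy's compiler rather than by string formatting.

```python
from typing import Callable


def standard_sql_literal(value: str) -> str:
    # Hypothetical helper: always use single-quote delimiters and
    # double every embedded single quote (standard SQL escaping).
    return "'" + value.replace("'", "''") + "'"


def render_filter(column: str, value: str, render_literal: Callable[[str], str]) -> str:
    # Hypothetical helper: build an equality predicate using the
    # supplied literal renderer.
    return f"WHERE {column} = {render_literal(value)}"


value = "Fernando's"
print(render_filter("name", value, repr))                  # WHERE name = "Fernando's"
print(render_filter("name", value, standard_sql_literal))  # WHERE name = 'Fernando''s'
```

Unlike repr(), the escaped form always uses the same delimiter regardless of which quote characters the value contains, which is what makes its output predictable.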

TESTING INSTRUCTIONS

  1. Set up a BigQuery connection in Superset
  2. Create a dataset with a text column containing values with apostrophes (e.g. names like O'Brien, Fernando's)
  3. Create a chart using this dataset
  4. Add a filter on the text column selecting a value with an apostrophe
  5. Verify the chart renders without errors
  6. Also verify filters without apostrophes continue to work normally

Unit tests are included:

  • test_string_literal_with_apostrophe - verifies apostrophe escaping
  • test_string_literal_without_apostrophe - verifies normal strings unaffected
  • test_string_literal_in_filter_with_apostrophe - verifies IN clause escaping
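The three tests listed above check properties that can be sketched without a BigQuery connection. The version below exercises hypothetical helpers (quote_string_literal, render_in_clause) rather than the PR's actual code, and does not go through SQLAlchemy or the BigQuery dialect, which the real tests may well do.

```python
from typing import Sequence


def quote_string_literal(value: str) -> str:
    # Hypothetical helper: single-quote delimiters, embedded quotes doubled.
    return "'" + value.replace("'", "''") + "'"


def render_in_clause(column: str, values: Sequence[str]) -> str:
    # Hypothetical helper: render an IN predicate from literal values.
    rendered = ", ".join(quote_string_literal(v) for v in values)
    return f"{column} IN ({rendered})"


def test_string_literal_with_apostrophe() -> None:
    assert quote_string_literal("O'Brien") == "'O''Brien'"


def test_string_literal_without_apostrophe() -> None:
    # Values without apostrophes come through unchanged, just quoted.
    assert quote_string_literal("Fernando") == "'Fernando'"


def test_string_literal_in_filter_with_apostrophe() -> None:
    sql = render_in_clause("name", ["O'Brien", "Fernando's", "Smith"])
    assert sql == "name IN ('O''Brien', 'Fernando''s', 'Smith')"
```

These can be collected by pytest or called directly.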

ADDITIONAL INFORMATION

  • Has associated issue: BigQuery errors when filters on text columns have apostrophes in them #35857
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

CodeAnt-AI Description

Escape BigQuery filter values that contain apostrophes

What Changed

  • Filters using names like O'Brien now run in BigQuery instead of failing with a syntax error
  • String values are written with standard single-quote escaping, including values inside multi-select filters
  • Normal text filters without apostrophes still work the same way

Impact

✅ Fewer BigQuery filter errors
✅ Clearer text filtering in dashboards
✅ Reliable filters for names with apostrophes


… quoting

The sqlalchemy-bigquery dialect uses Python's repr() to render string literals when literal_binds=True. repr() switches to double-quote delimiters when the string contains an apostrophe (e.g. repr("O'Brien") produces "O'Brien"). In BigQuery SQL, double-quoted tokens are identifiers, not string literals, so any filter containing an apostrophe causes a syntax error.

This patch monkey-patches the BigQuery dialect's colspecs to use a custom string type whose literal_processor always produces single-quoted literals with properly doubled internal quotes (standard SQL escaping).

Fixes apache#35857
@dosubot dosubot bot added the data:connect:googlebigquery Related to BigQuery label Mar 25, 2026
@codeant-ai-for-open-source codeant-ai-for-open-source bot added the size:L This PR changes 100-499 lines, ignoring generated files label Mar 25, 2026
@codeant-ai-for-open-source

Sequence Diagram

This PR updates BigQuery SQL compilation so that text filter values containing apostrophes are always rendered as standard single-quoted SQL literals. The flow highlights the new dialect patch and how query compilation now produces BigQuery-safe filter SQL.

sequenceDiagram
    participant Superset
    participant BigQueryEngineSpec
    participant BigQueryDialect
    participant SafeStringType
    participant BigQuery
    Superset->>BigQueryEngineSpec: Load BigQuery engine spec
    BigQueryEngineSpec->>BigQueryDialect: Patch string literal handling
    Superset->>BigQueryDialect: Compile query with literal binds
    BigQueryDialect->>SafeStringType: Process text filter value
    SafeStringType-->>BigQueryDialect: Return escaped single-quoted literal
    BigQueryDialect->>BigQuery: Execute query with valid filter literal

Generated by CodeAnt AI

bito-code-review bot left a comment

Code Review Agent Run #2b0fe4

Actionable Suggestions - 1
  • superset/db_engine_specs/bigquery.py - 1
Review Details
  • Files reviewed - 2 · Commit Range: 470b90a..470b90a
    • superset/db_engine_specs/bigquery.py
    • tests/unit_tests/db_engine_specs/test_bigquery.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


This helper always produces a single-quoted literal with properly doubled
internal quotes.
"""
escaped = value.replace("'", "''").replace("%", "%%")

Incorrect % escaping in string literals

The _process_string_literal function incorrectly escapes % characters to %%, but in BigQuery SQL string literals, % does not require escaping. This causes strings containing % to be misrepresented in queries (e.g., '100%' becomes '100%%').

Suggested change (check the AI-generated fix before applying):
- escaped = value.replace("'", "''").replace("%", "%%")
+ escaped = value.replace("'", "''")

Code Review Run #2b0fe4



Labels

  • data:connect:googlebigquery (Related to BigQuery)
  • size:L (This PR changes 100-499 lines, ignoring generated files)
