Skip to content

Conversation

@alexmojaki
Copy link
Contributor

Helps with #1851

This avoids slowly looping in Python, relying more on optimised C methods. For this toy benchmark:

from elasticapm.instrumentation.packages.dbapi2 import tokenize, scan import time sql = f"Hello $q${'Peter Pan $ ' * 1000000}$q$ at Disney World" tokens = tokenize(sql) start = time.time() list(scan(tokens)) end = time.time() print(end - start)

it's about 3-4x faster with this PR, and about 10x faster if there's no inner $.

This is nice, but probably not enough. Both with and without this PR, the code assumes that anything between 2 dollar signs is potentially a dollar quote. But according to https://www.postgresql.org/docs/15/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING:

The tag, if any, of a dollar-quoted string follows the same rules as an unquoted identifier, except that it cannot contain a dollar sign

From further up:

SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_). Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($).

Checking that the dollar quote is valid should avoid a lot of pointless searching for a closing quote, which should resolve the use case in the issue. It also seems important for correctness.

@cla-checker-service
Copy link

cla-checker-service bot commented Sep 27, 2023

💚 CLA has been signed

@github-actions github-actions bot added agent-python community Issues opened by the community triage Issues awaiting triage labels Sep 27, 2023
@alexmojaki
Copy link
Contributor Author

@cla-checker-service recheck?

basepi
basepi previously approved these changes Sep 27, 2023
Copy link
Contributor

@basepi basepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks excellent! Thank you for adding tests as well.

Do you have any interest in tackling the "quote correctness check" piece that is remaining?

@alexmojaki
Copy link
Contributor Author

Do you have any interest in tackling the "quote correctness check" piece that is remaining?

Not really, sorry :)

@basepi
Copy link
Contributor

basepi commented Sep 27, 2023

No problem. I really appreciate this contribution! Big improvement.

@basepi
Copy link
Contributor

basepi commented Sep 27, 2023

@elasticmachine, run elasticsearch-ci/docs

@basepi basepi merged commit 468ecf7 into elastic:main Sep 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-python community Issues opened by the community triage Issues awaiting triage

3 participants