add(grammargen): combined language/blob generation API#12

Merged
odvcencio merged 1 commit into main from danmuji-generate-language-blob-api on Mar 18, 2026

@odvcencio odvcencio commented Mar 18, 2026

What does this PR do?

Adds grammargen.GenerateLanguageAndBlob and grammargen.GenerateLanguageAndBlobWithContext so callers can get both the compiled Language and the serialized blob from one generation pass, without having to depend on the diagnostic report API.

This is intended to unblock downstream callers like danmuji that want to cache the blob but still need the live Language object immediately.
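
A minimal caller-side sketch of that use case, assuming stand-in types: `Language`, `Grammar`, `blobCache`, `loadLanguage`, and the stubbed `generateLanguageAndBlob` below are hypothetical placeholders that only mirror the shape of the new API, not the real gotreesitter or danmuji code.

```go
package main

import (
	"errors"
	"fmt"
)

// Language and Grammar are hypothetical stand-ins for gotreesitter.Language
// and grammargen.Grammar so this sketch compiles on its own.
type Language struct{ Name string }
type Grammar struct{ Name string }

// generateLanguageAndBlob is a stub with the same return shape as the new
// API: one call yields the live Language and its serialized blob.
func generateLanguageAndBlob(g *Grammar) (*Language, []byte, error) {
	if g == nil {
		return nil, nil, errors.New("nil grammar")
	}
	return &Language{Name: g.Name}, []byte("blob:" + g.Name), nil
}

// blobCache is a hypothetical in-memory cache keyed by grammar name.
type blobCache map[string][]byte

// loadLanguage runs one generation pass, stores the blob for later runs,
// and returns the live Language to the caller right away.
func loadLanguage(cache blobCache, g *Grammar) (*Language, error) {
	lang, blob, err := generateLanguageAndBlob(g)
	if err != nil {
		return nil, err
	}
	cache[g.Name] = blob
	return lang, nil
}

func main() {
	cache := blobCache{}
	lang, err := loadLanguage(cache, &Grammar{Name: "demo"})
	if err != nil {
		fmt.Println("generate failed:", err)
		return
	}
	fmt.Println("language:", lang.Name, "cached bytes:", len(cache["demo"]))
}
```

A real caller would presumably persist the blob to disk rather than a map; the point is only that both outputs come from one call.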

Why this approach?

The combined generation path already exists internally via generateWithReportCtx(..., includeLanguage: true, includeBlob: true) and is exposed indirectly through GenerateWithReport. The gap is that callers who only need the operational outputs have to opt into a diagnostics-oriented API.

A small public wrapper keeps the common case direct, preserves the existing GenerateLanguage / GenerateLanguageWithContext shape, and avoids forcing downstream code to couple to report internals.
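
The wrapper pattern described above can be sketched roughly as follows. Everything here is a simplified assumption for illustration: `buildOptions`, `report`, `generateWithReport`, `GenerateBoth`, and `GenerateBothWithContext` are hypothetical names, and the internal function is a stub rather than the real `generateWithReportCtx`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the real Language and Grammar types.
type Language struct{ Name string }
type Grammar struct{ Name string }

// buildOptions mirrors the includeLanguage/includeBlob flags named above.
type buildOptions struct {
	includeLanguage bool
	includeBlob     bool
}

// report stands in for the diagnostics-oriented result type.
type report struct {
	Language    *Language
	Blob        []byte
	Diagnostics []string
}

// generateWithReport is a stub for the internal combined-generation path.
func generateWithReport(ctx context.Context, g *Grammar, opts buildOptions) (*report, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // honor cancellation before doing any work
	}
	if g == nil {
		return nil, errors.New("nil grammar")
	}
	r := &report{}
	if opts.includeLanguage {
		r.Language = &Language{Name: g.Name}
	}
	if opts.includeBlob {
		r.Blob = []byte("blob:" + g.Name)
	}
	return r, nil
}

// GenerateBothWithContext requests both outputs from one pass and hides
// the report type from the caller.
func GenerateBothWithContext(ctx context.Context, g *Grammar) (*Language, []byte, error) {
	r, err := generateWithReport(ctx, g, buildOptions{includeLanguage: true, includeBlob: true})
	if err != nil {
		return nil, nil, err
	}
	return r.Language, r.Blob, nil
}

// GenerateBoth is the context-free convenience variant.
func GenerateBoth(g *Grammar) (*Language, []byte, error) {
	return GenerateBothWithContext(context.Background(), g)
}

func main() {
	lang, blob, err := GenerateBoth(&Grammar{Name: "demo"})
	fmt.Println(lang, len(blob), err)
}
```

The design choice is that callers see only `(*Language, []byte, error)`, so changes to the report type would not ripple into downstream code.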

Correctness

  • All existing tests pass (go test ./...)
  • New tests added for new behavior
  • 206/206 grammars still smoke-parse (go test ./grammars/ -run TestSupportedLanguagesParseSmoke)
  • Correctness snapshots still match (go test ./grammars/ -run TestCorrectness)
  • CGo parity holds for affected languages (go test -tags 'cgo treesitter_c_parity' ./cgo_harness/)

Notes:

  • I have not run the validation matrix on this branch yet.
  • I did not add a dedicated test because this is a thin wrapper over the existing combined-generation path rather than new generation logic.

Performance

  • Ran benchmarks before and after (go test -bench=. -benchmem -count=5)
  • No regressions in ns/op, B/op, or allocs/op beyond noise
  • If this is a hot path change, included benchmark numbers in the PR

Notes:

  • I have not run benchmarks on this branch yet.
  • The purpose of this API is to let downstream callers avoid paying for a second generation pass when they need both outputs.

Maintainability

  • No unnecessary abstractions — code does what it needs to and nothing more
  • No speculative features or "while I'm here" cleanup outside the PR's scope
  • Scanner ports follow existing conventions (MarkEnd before SetResultSymbol, symbol resolution via ExternalSymbols)
  • No new dependencies without justification

Notes:

  • The scanner-port item is effectively N/A here because this PR does not touch scanners.

Self-review

Review your own diff before requesting review from others. Walk through it as if you're seeing it for the first time. Leave comments on your own PR pointing out the notable parts — the tricky bits, the non-obvious decisions, anything a reviewer's eye would naturally land on. Comment on what you think the crux of the solution is and why.

  • Reviewed my own diff end to end
  • Left comments on notable sections and the crux of the solution
  • Called out anything I'm unsure about or want a second opinion on

Notes:

  • The main thing I would want a second opinion on is whether maintainers want a focused API-level test here, even though the implementation is just exposing an already-used internal path.

Due diligence

  • Read the code you're changing before changing it
  • Checked that similar patterns exist elsewhere in the codebase and stayed consistent
  • If touching the parser or lexer: tested with at least 3 diverse grammars
  • If adding a grammar or scanner: verified against upstream C output

Notes:

  • The parser/lexer and grammar/scanner-specific items are N/A for this PR. This change only exposes an existing combined-generation path through a narrower public API.
- Add GenerateLanguageAndBlob and GenerateLanguageAndBlobWithContext for combined language/blob generation in a single pass
- Provides direct access to both compiled Language struct and serialized blob without requiring callers to use diagnostic reports
- Follows existing API pattern with both context-less and context-aware variants for cancellation support
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request enhances the grammargen package by introducing new API functions that streamline the process of generating Tree-sitter languages. The primary goal is to allow callers to efficiently obtain both the compiled Language object and its serialized binary representation in a single generation pass. This change improves performance by avoiding redundant computations and simplifies the API by decoupling common use cases from the diagnostic reporting infrastructure, making it easier for consumers like danmuji to cache blobs while retaining immediate access to the live Language object.

Highlights

  • New API Functions: Introduced grammargen.GenerateLanguageAndBlob and grammargen.GenerateLanguageAndBlobWithContext to allow callers to generate both the compiled Language struct and its serialized blob form in a single pass.
  • Improved Efficiency: These new functions eliminate the need for a second generation pass when both the Language object and its blob are required, optimizing performance for downstream callers.
  • Decoupling from Diagnostics: The new API provides a direct way to obtain operational outputs without coupling callers to the diagnostic report API, simplifying usage for common scenarios.

// GenerateLanguageAndBlob compiles a Grammar into both a Language struct and
// its serialized blob form in a single generation pass.
func GenerateLanguageAndBlob(g *Grammar) (*gotreesitter.Language, []byte, error) {
@odvcencio (Owner, Author) commented:

This zero-context wrapper keeps the public API parallel to GenerateLanguage/GenerateLanguageWithContext, so callers that only need the outputs do not have to switch over to the diagnostics-oriented GenerateWithReport surface.

// GenerateLanguageAndBlobWithContext is like GenerateLanguageAndBlob but
// accepts a context for cancellation.
func GenerateLanguageAndBlobWithContext(ctx context.Context, g *Grammar) (*gotreesitter.Language, []byte, error) {
	report, err := generateWithReportCtx(ctx, g, reportBuildOptions{
@odvcencio (Owner, Author) commented:

This is the crux of the change: reuse the existing combined generation path so the language and blob come from one pass, rather than asking downstream callers to do GenerateLanguage plus a separate blob generation step.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces GenerateLanguageAndBlob and GenerateLanguageAndBlobWithContext to provide a convenient API for generating both a compiled Language and its serialized blob in a single pass. The implementation is a clean, thin wrapper around existing internal functionality. My main feedback is to add a dedicated test case for this new public API to ensure its correctness and long-term maintainability, a point you also raised in the PR description.

Comment on lines +34 to +36
func GenerateLanguageAndBlob(g *Grammar) (*gotreesitter.Language, []byte, error) {
	return GenerateLanguageAndBlobWithContext(context.Background(), g)
}

Severity: medium

While this is a thin wrapper, adding a dedicated test case would be beneficial to verify the public API contract. It would ensure that both the Language object and the []byte blob are generated correctly in one pass and that they are consistent with each other.

You've noted this in the PR description, and I agree that adding a test would improve long-term maintainability. A simple test could:

  1. Call GenerateLanguageAndBlob.
  2. Verify both returned values are non-nil.
  3. Use the returned *gotreesitter.Language to parse a sample input and check for errors.
  4. Decode the returned []byte blob into a new language object.
  5. Use the decoded language to parse the same input and verify the resulting S-expression matches the one from step 3.
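
The five steps above could be sketched along these lines. This is a toy model under stated assumptions, not the real gotreesitter API: `Language`, `generateLanguageAndBlob`, `decodeLanguage`, and `sexpr` are hypothetical stand-ins, JSON stands in for the real blob encoding, and the "parse" just wraps whitespace-separated tokens in nodes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Language is a toy stand-in for the compiled language.
type Language struct {
	Root string `json:"root"`
}

// generateLanguageAndBlob is a stub: it builds the language and serializes
// that same value, so both outputs come from one pass.
func generateLanguageAndBlob(root string) (*Language, []byte, error) {
	lang := &Language{Root: root}
	blob, err := json.Marshal(lang)
	if err != nil {
		return nil, nil, err
	}
	return lang, blob, nil
}

// decodeLanguage is a hypothetical blob loader.
func decodeLanguage(blob []byte) (*Language, error) {
	var lang Language
	if err := json.Unmarshal(blob, &lang); err != nil {
		return nil, err
	}
	return &lang, nil
}

// sexpr is a toy "parse" that renders each input token as a child node.
func sexpr(lang *Language, input string) string {
	var b strings.Builder
	b.WriteString("(" + lang.Root)
	for _, tok := range strings.Fields(input) {
		b.WriteString(" (token " + tok + ")")
	}
	b.WriteString(")")
	return b.String()
}

// checkRoundTrip follows the five review steps in order.
func checkRoundTrip(root, input string) error {
	lang, blob, err := generateLanguageAndBlob(root) // step 1
	if err != nil {
		return err
	}
	if lang == nil || len(blob) == 0 { // step 2
		return fmt.Errorf("missing output: lang=%v blob=%d bytes", lang, len(blob))
	}
	live := sexpr(lang, input)           // step 3
	decoded, err := decodeLanguage(blob) // step 4
	if err != nil {
		return err
	}
	if got := sexpr(decoded, input); got != live { // step 5
		return fmt.Errorf("mismatch: live %q, decoded %q", live, got)
	}
	return nil
}

func main() {
	fmt.Println(checkRoundTrip("source_file", "a b c"))
}
```

In the repository itself this would more naturally be a `testing.T` test that calls the real `GenerateLanguageAndBlob` and whichever blob loader the package provides.
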
@odvcencio odvcencio merged commit ce4e5b5 into main Mar 18, 2026
2 of 4 checks passed