fix: rebuild database when custom file filters are provided #495

Open
MuLeiSY2021 wants to merge 1 commit into AsyncFuncAI:main from
MuLeiSY2021:fix/respect-file-filters-with-cache

@MuLeiSY2021

Summary

  • When custom file exclusion/inclusion parameters are provided via "Refresh Wiki" advanced options, the cached .pkl database is now removed and rebuilt with the new filters applied
  • Previously, the cached database was returned immediately, silently ignoring all filter parameters

Fixes #494

Changes

api/data_pipeline.py, prepare_db_index() method:

  • Added a check for custom filter parameters (excluded_dirs, excluded_files, included_dirs, included_files) before loading the cached database
  • When custom filters are present, the existing .pkl cache file is deleted to force a rebuild with the new filters
  • When no custom filters are provided, behavior is unchanged (cache is used as before)
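
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `has_custom_filters` and `should_use_cache` are hypothetical standalone helpers, and the sketch assumes that a filter set to `None` or an empty list counts as "not provided".

```python
import os
from typing import Optional, Sequence


def has_custom_filters(
    excluded_dirs: Optional[Sequence[str]] = None,
    excluded_files: Optional[Sequence[str]] = None,
    included_dirs: Optional[Sequence[str]] = None,
    included_files: Optional[Sequence[str]] = None,
) -> bool:
    """Return True if any filter list is non-empty (assumption: None and [] mean 'not provided')."""
    return any((excluded_dirs, excluded_files, included_dirs, included_files))


def should_use_cache(db_path: str, **filters) -> bool:
    """Hypothetical helper: report whether the cached database may be reused.

    When custom filters are present, the cache file is deleted first so that
    the caller falls through to a rebuild.
    """
    if has_custom_filters(**filters) and os.path.exists(db_path):
        os.remove(db_path)
    return os.path.exists(db_path)
```

With no filters the existing file is left alone and the function returns True; with any non-empty filter the file is removed and the function returns False.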

Test plan

  • Index a repository without any exclusions
  • Re-index the same repository with README.md in the excluded files list
  • Verify the regenerated wiki does not reference README.md
  • Verify that re-indexing without exclusions still uses the cached database (check logs for "Loading existing database...")

🤖 Generated with Claude Code

When users specify excluded/included files or directories via the "Refresh Wiki" advanced options, the cached .pkl database was returned immediately without applying the filter parameters. This caused file exclusion rules to be silently ignored.

Now, when any custom file filter parameter is provided, the existing database cache is removed and rebuilt with the new filters applied.

Fixes AsyncFuncAI#494

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where custom file filtering options were not correctly applied when refreshing a wiki. It introduces a mechanism to detect the presence of custom filters and, if found, forces a rebuild of the database to ensure that the new exclusion/inclusion rules are properly incorporated, thereby improving the accuracy and relevance of the generated wiki content.

Highlights

  • Custom File Filters: When custom file exclusion/inclusion parameters are provided via "Refresh Wiki" advanced options, the cached .pkl database is now removed and rebuilt with the new filters applied.
  • Previous Behavior: Previously, the cached database was returned immediately, silently ignoring all filter parameters.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request modifies api/data_pipeline.py to introduce logic for handling custom file filters. If custom filters (excluded/included directories or files) are provided, the existing database is removed to force a rebuild, ensuring the new filters are applied. Otherwise, the existing database is loaded as before. A review comment suggests wrapping the os.remove() call in a try-except block to handle potential OSError exceptions, which would improve the application's robustness.

"Existing database contains no usable embeddings. Rebuilding embeddings..."
if has_custom_filters:
    logger.info("Custom file filters provided. Rebuilding database to apply filters...")
    os.remove(self.repo_paths["save_db_file"])

medium

The os.remove() call is not wrapped in a try-except block, which could lead to unhandled exceptions (e.g., PermissionError) and cause the request to fail. It's best practice to handle potential file system errors to make the application more robust.

Suggested change
- os.remove(self.repo_paths["save_db_file"])
+ try:
+     os.remove(self.repo_paths["save_db_file"])
+ except OSError as e:
+     logger.warning(f"Could not remove cached database file: {e}. Proceeding to rebuild.")
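
One possible variation on this suggestion, sketched under assumptions: if the deletion fails, the stale cache file is still on disk, so a caller may want to know whether the file is actually gone before deciding how to proceed. `remove_cache_file` is a hypothetical helper name, not something defined in the PR.

```python
import logging
import os

logger = logging.getLogger(__name__)


def remove_cache_file(path: str) -> bool:
    """Try to delete the cached database; return True if it no longer exists afterwards."""
    try:
        os.remove(path)
    except FileNotFoundError:
        # Nothing to delete: the cache is already absent.
        pass
    except OSError as e:
        # Covers PermissionError and other file system failures.
        logger.warning("Could not remove cached database file: %s", e)
    return not os.path.exists(path)
```

Returning a boolean lets the caller skip loading a cache that could not be removed instead of assuming the rebuild path will be taken.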