
Conversation

@BrianLusina
Owner

@BrianLusina BrianLusina commented Nov 19, 2025

Describe your change:

This adds a stream checker data structure for suffix matches.

The stream checker uses the Trie data structure to find suffixes of the stream that match the words it was initialized with. Because a trie is a prefix tree and matches on prefixes, the words are inserted in reverse so that the trie matches on suffixes instead.

Note that the Trie node itself is unchanged, other than its constructor
now being initialized with a char.
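
To make the reversed-trie idea concrete, here is a minimal sketch of a suffix matcher. It is not the code in this PR: `_Node` and `StreamCheckerSketch` are hypothetical names, the node uses a plain dict, and the only behavior assumed from the description is that words are inserted back to front and the stream is read newest character first.

```python
from collections import deque
from typing import Deque, Dict, List


class _Node:
    """Hypothetical minimal trie node; not the repository's TrieNode."""

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.is_end = False


class StreamCheckerSketch:
    """Illustrative suffix matcher built on a trie of reversed words."""

    def __init__(self, words: List[str]) -> None:
        self.root = _Node()
        self.max_len = 0
        for word in words:
            node = self.root
            # Insert the word back to front so suffixes become prefixes.
            for char in reversed(word):
                node = node.children.setdefault(char, _Node())
            node.is_end = True
            self.max_len = max(self.max_len, len(word))
        # Only the newest max_len characters can take part in a match.
        self.stream: Deque[str] = deque(maxlen=self.max_len)

    def query(self, letter: str) -> bool:
        self.stream.append(letter)
        node = self.root
        # Walk the stream newest-first, following the reversed trie.
        for char in reversed(self.stream):
            node = node.children.get(char)
            if node is None:
                return False
            if node.is_end:
                return True
        return False
```

With the words `["go", "hi"]`, querying `g`, `o`, `i` in turn gives `False`, `True`, `False`: only after `o` arrives does the stream end in a stored word.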

BREAKING CHANGE

The Trie data structure no longer handles search correctly
and will need to be refactored to accommodate the
changes introduced here.

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Documentation change?

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms have a URL in their comments that points to Wikipedia or another similar explanation.
  • If this pull request resolves one or more open issues then the commit message contains Fixes: #{$ISSUE_NO}.

Summary by CodeRabbit

  • New Features
    • Added Stream Checker data structure to process character streams and detect word matches in real-time
    • Added Trie data structure for efficient string searching and prefix matching operations
  • Documentation
    • Added Stream Checker documentation with constraints and behavior specifications
    • Updated documentation directory to include new data structure entries
  • Tests
    • Added comprehensive test suite for Stream Checker with multiple test scenarios
BrianLusina and others added 2 commits November 19, 2025 09:19
@BrianLusina BrianLusina self-assigned this Nov 19, 2025
@BrianLusina BrianLusina added the enhancement, Algorithm, Datastructures, Documentation, Array, Hash Map, Trees, Trie, and Queue labels Nov 19, 2025
@coderabbitai
Contributor

coderabbitai bot commented Nov 19, 2025

Walkthrough

The pull request introduces a Stream Checker data structure and a complete Trie implementation. It adds new files for StreamChecker (which processes character streams and checks suffix matches against a word list using a reversed Trie), implements TrieNode and Trie classes, and updates documentation to reflect these new additions in the codebase.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: DIRECTORY.md | Added entries for Stream Checker under Streams > Datastructures and Trie/Trie Node under Trees > Ternary to index new data structures. |
| Trie Data Structure: datastructures/trees/trie/trie_node.py, datastructures/trees/trie/trie.py, datastructures/trees/trie/__init__.py | Implemented TrieNode with children (DefaultDict) and is_end flag; created Trie class with insert, search (returns words with prefix), and starts_with methods; updated module exports to expose Trie and TrieNode. |
| Stream Checker Implementation: datastructures/streams/stream_checker/__init__.py | Implemented StreamChecker class that processes character streams using a reversed-word Trie, maintaining a deque of recent characters, and returning true when a stream suffix matches any word. |
| Stream Checker Documentation & Tests: datastructures/streams/stream_checker/README.md, datastructures/streams/stream_checker/test_stream_checker.py | Added README documenting StreamChecker behavior with constraints and examples; added three unit tests validating query results across different word and input sequences. |
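
As a rough illustration of the Trie surface summarized above (insert, a prefix search that returns words, and starts_with), the following sketch may help reviewers. The class names `SketchTrieNode` and `SketchTrie` and the `_find` helper are hypothetical, and the real TrieNode constructor in this PR may differ, so treat this as an approximation under those assumptions rather than the merged code.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional


class SketchTrieNode:
    """Hypothetical node: children in a defaultdict plus an end-of-word flag."""

    def __init__(self) -> None:
        self.children: DefaultDict[str, "SketchTrieNode"] = defaultdict(SketchTrieNode)
        self.is_end = False


class SketchTrie:
    def __init__(self) -> None:
        self.root = SketchTrieNode()

    def insert(self, word: str) -> None:
        curr = self.root
        for char in word:
            curr = curr.children[char]  # defaultdict creates missing nodes
        curr.is_end = True

    def _find(self, prefix: str) -> Optional[SketchTrieNode]:
        """Hypothetical helper: return the node at the end of prefix, if any."""
        curr = self.root
        for char in prefix:
            # Use a membership test so a failed lookup does not add nodes.
            if char not in curr.children:
                return None
            curr = curr.children[char]
        return curr

    def search(self, prefix: str) -> List[str]:
        """Return every stored word that starts with prefix (DFS below it)."""
        start = self._find(prefix)
        if start is None:
            return []
        results: List[str] = []

        def dfs(node: SketchTrieNode, path: List[str]) -> None:
            if node.is_end:
                results.append(prefix + "".join(path))
            for char, child in node.children.items():
                path.append(char)
                dfs(child, path)
                path.pop()

        dfs(start, [])
        return results

    def starts_with(self, prefix: str) -> bool:
        return self._find(prefix) is not None
```

For example, after inserting `car`, `cart`, and `dog`, `search("ca")` yields `car` and `cart`, while `starts_with("cat")` is false.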

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant StreamChecker
    participant Trie
    participant Deque
    Client->>StreamChecker: __init__(["go", "hi"])
    StreamChecker->>StreamChecker: reverse each word
    StreamChecker->>Trie: insert("og"), insert("ih")
    Trie->>Trie: build reversed-word trie
    StreamChecker->>Deque: create deque(maxlen=2)
    Client->>StreamChecker: query('g')
    StreamChecker->>Deque: append('g')
    StreamChecker->>Trie: traverse from root using stream buffer
    Trie-->>StreamChecker: no match (not end node)
    StreamChecker-->>Client: false
    Client->>StreamChecker: query('o')
    StreamChecker->>Deque: append('o'), deque=['g','o']
    StreamChecker->>Trie: traverse ['o','g'] in reverse
    Trie-->>StreamChecker: match found (end node)
    StreamChecker-->>Client: true
    Client->>StreamChecker: query('i')
    StreamChecker->>Deque: append('i'), deque=['o','i']
    StreamChecker->>Trie: traverse ['i','o'] in reverse
    Trie-->>StreamChecker: no match
    StreamChecker-->>Client: false

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Trie implementation logic: Verify insert/search/starts_with methods correctly traverse and build/search the trie structure, especially the DFS logic in search().
  • StreamChecker initialization & query logic: Confirm reversed-word insertion, deque management with maxlen constraint, and reverse-stream traversal through Trie are correct.
  • Test coverage: Validate that test cases adequately exercise edge cases (empty streams, single-character words, multiple matches).
  • TrieNode defaultdict usage: Ensure the recursive defaultdict pattern with TrieNode factories works as intended without circular import issues.
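
On the last point, a small sketch of the self-referential defaultdict pattern may be useful. `DemoNode` is a hypothetical class used only for illustration. It shows two things: a class can name itself as the default factory because the name is resolved when `__init__` runs, and indexing a missing key inserts a node, whereas `in` and `.get()` do not.

```python
from collections import defaultdict
from typing import DefaultDict


class DemoNode:
    """Hypothetical node used only to demonstrate the pattern."""

    def __init__(self) -> None:
        # DemoNode is looked up when __init__ runs, after the class exists,
        # so the class can be its own default factory.
        self.children: DefaultDict[str, "DemoNode"] = defaultdict(DemoNode)
        self.is_end = False


root = DemoNode()
root.children["a"].is_end = True  # indexing creates the missing "a" node

# Membership tests and .get() leave the mapping untouched.
has_b = "b" in root.children
maybe_b = root.children.get("b")
keys_after_safe_reads = sorted(root.children)

# Indexing a missing key inserts a new, empty node as a side effect.
_ = root.children["b"]
keys_after_indexing = sorted(root.children)
```

In a lookup path such as search or starts_with, this suggests preferring a membership test or `.get()` so that failed lookups do not grow the trie.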

Poem

🐰 A Trie takes root, reversed words now flow,
Through streams of letters, we finally know—
When "go" meets "og" in a suffix dance,
The Checker returns true at first glance! ✨🌳

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.77% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main changes: adding a stream checker data structure and trie node implementation. |
| Description check | ✅ Passed | The description adequately explains the change, includes necessary context about the Trie reversal approach, and completes most checklist items with documentation of potential breaking changes. |
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/stream-checker


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (4)
DIRECTORY.md (1)

268-270: Fix list indentation to satisfy markdownlint (MD007).

markdownlint-cli2 is flagging the new Streams and Trie bullets for incorrect indentation. Adjusting leading spaces will keep this index consistent and clear.

Suggested diff:

-  * Streams
-    * Stream Checker
-      * [Test Stream Checker](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/streams/stream_checker/test_stream_checker.py)
+* Streams
+  * Stream Checker
+    * [Test Stream Checker](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/streams/stream_checker/test_stream_checker.py)
@@
-    * Trie
-      * [Trie](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/trees/trie/trie.py)
-      * [Trie Node](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/trees/trie/trie_node.py)
+  * Trie
+    * [Trie](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/trees/trie/trie.py)
+    * [Trie Node](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/trees/trie/trie_node.py)

Also applies to: 309-311

datastructures/streams/stream_checker/test_stream_checker.py (1)

1-35: StreamChecker tests cover core behavior; consider one overlapping-words case.

The three tests validate basic true/false behavior and multi-query sequences correctly. As a small enhancement, you might add a case with overlapping longer words (e.g., words = ["cd", "bcd", "abcd"] with stream "a", "b", "c", "d") to guard against regressions in the reversed-trie traversal logic.
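
The suggested overlapping-words case could look roughly like the test below. To keep the snippet self-contained it runs against `BruteForceStreamChecker`, a hypothetical stand-in that checks suffixes directly with `str.endswith`; in the repository the same assertions would presumably target the real `StreamChecker` imported from `datastructures.streams.stream_checker`.

```python
import unittest
from typing import List


class BruteForceStreamChecker:
    """Hypothetical stand-in with the same query() contract, no trie."""

    def __init__(self, words: List[str]) -> None:
        self.words = set(words)
        self.max_len = max((len(w) for w in words), default=0)
        self.stream = ""

    def query(self, letter: str) -> bool:
        # Keep only as many characters as the longest word needs.
        if self.max_len:
            self.stream = (self.stream + letter)[-self.max_len:]
        return any(self.stream.endswith(w) for w in self.words)


class OverlappingWordsTest(unittest.TestCase):
    def test_overlapping_suffixes(self) -> None:
        # "cd", "bcd" and "abcd" all end at the same stream position.
        checker = BruteForceStreamChecker(["cd", "bcd", "abcd"])
        results = [checker.query(c) for c in "abcd"]
        self.assertEqual(results, [False, False, False, True])
```

It can be run with `python -m unittest` from the directory containing the file.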

datastructures/streams/stream_checker/__init__.py (1)

1-73: StreamChecker logic looks correct; clarify docs and consider tiny cleanups.

The reversed-trie + bounded deque implementation is sound and matches the intended suffix-checker semantics, assuming the TrieNode import/annotation issue is fixed as noted in datastructures/trees/trie/trie_node.py.

A few small, non-blocking improvements:

  • Docstring accuracy (lines 42–55):

    • The comment “True if the letter is the end of a word” understates the behavior; the method returns True if any suffix of the current stream forms a word.
    • The complexity text repeats itself as O(L) and O(Lmax); you could simplify to O(min(L, Lmax)) and then note that Lmax is bounded (≤ longest word length), so per-query cost is effectively O(1) in that parameter.
  • Micro-optimizations (optional):

    • for word in self.words[::-1]: could be for word in reversed(self.words): to avoid copying the list.
    • self.words is only used in __build_trie; if you don’t need it later, you could drop the instance attribute and just pass words into __build_trie to reduce stored state.

These are minor; the core implementation is good as-is.
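
For reference, the two standard-library behaviors behind these notes can be checked in isolation. This is a standalone sketch with made-up sample data, not code from the PR: `reversed()` walks a list backwards without building a copy, and a `deque` created with `maxlen` silently drops its oldest items, which is what bounds the per-query work by the longest word length.

```python
from collections import deque

words = ["go", "hi", "stream"]  # sample data, assumed for illustration

# A slice builds a second list; reversed() yields the same items lazily.
copied = words[::-1]
same_order = list(reversed(words)) == copied

# A bounded deque keeps only the newest max_len characters, so a query
# never has to look further back than the longest word.
max_len = max(len(w) for w in words)
buffer = deque(maxlen=max_len)
for ch in "abcdefgh":
    buffer.append(ch)
recent = "".join(buffer)
```

Here `recent` ends up holding only the last six characters of the input, matching the length of the longest sample word.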

datastructures/trees/trie/trie.py (1)

9-20: Consider adding a docstring.

The insertion logic is correct, but adding a docstring would improve API documentation consistency (especially since starts_with has one).

     def insert(self, word: str) -> None:
+        """
+        Inserts a word into the trie.
+
+        Args:
+            word: The word to insert into the trie.
+        """
         curr = self.root
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37b2941 and 4993a24.

⛔ Files ignored due to path filters (3)
  • datastructures/streams/stream_checker/images/examples/stream_checker_example_1.png is excluded by !**/*.png
  • datastructures/streams/stream_checker/images/examples/stream_checker_example_2.png is excluded by !**/*.png
  • datastructures/streams/stream_checker/images/examples/stream_checker_example_3.png is excluded by !**/*.png
📒 Files selected for processing (7)
  • DIRECTORY.md (2 hunks)
  • datastructures/streams/stream_checker/README.md (1 hunks)
  • datastructures/streams/stream_checker/__init__.py (1 hunks)
  • datastructures/streams/stream_checker/test_stream_checker.py (1 hunks)
  • datastructures/trees/trie/__init__.py (1 hunks)
  • datastructures/trees/trie/trie.py (1 hunks)
  • datastructures/trees/trie/trie_node.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
datastructures/trees/trie/__init__.py (2)
datastructures/trees/trie/trie_node.py (1)
  • TrieNode (5-23)
datastructures/trees/trie/trie.py (1)
  • Trie (5-57)
datastructures/streams/stream_checker/__init__.py (1)
datastructures/trees/trie/trie_node.py (1)
  • TrieNode (5-23)
datastructures/streams/stream_checker/test_stream_checker.py (1)
datastructures/streams/stream_checker/__init__.py (2)
  • StreamChecker (6-73)
  • query (42-73)
datastructures/trees/trie/trie.py (1)
datastructures/trees/trie/trie_node.py (1)
  • TrieNode (5-23)
🪛 markdownlint-cli2 (0.18.1)
DIRECTORY.md

268-268: Unordered list indentation. Expected: 0; Actual: 2 (MD007, ul-indent)
269-269: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
270-270: Unordered list indentation. Expected: 4; Actual: 6 (MD007, ul-indent)
309-309: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
310-310: Unordered list indentation. Expected: 4; Actual: 6 (MD007, ul-indent)
311-311: Unordered list indentation. Expected: 4; Actual: 6 (MD007, ul-indent)

🔇 Additional comments (3)
datastructures/trees/trie/__init__.py (1)

1-8: Re-export of Trie and TrieNode looks good.

The explicit imports plus __all__ = ["Trie", "TrieNode"] cleanly define the package surface and support from datastructures.trees.trie import Trie, TrieNode as intended.

datastructures/trees/trie/trie.py (2)

5-7: LGTM!

The Trie initialization is straightforward and correct.


46-57: LGTM!

The starts_with method correctly checks for prefix existence in the Trie, with appropriate documentation.

@BrianLusina BrianLusina merged commit 484b865 into main Nov 21, 2025
5 of 6 checks passed
@BrianLusina BrianLusina deleted the feat/stream-checker branch November 21, 2025 05:22
@coderabbitai coderabbitai bot mentioned this pull request Nov 21, 2025
14 tasks


2 participants