feat(data structures): stream checker and trie node #107

BrianLusina · 2025-11-19T06:25:06Z

Describe your change:

This adds a stream checker data structure for suffix matches.

The stream checker uses a Trie to find suffixes of the stream that match the words it was initialized with. Since a trie is a prefix tree that matches on prefixes, the words are inserted in reverse so that it matches on suffixes instead.

Note that the Trie node is unchanged other than now being initialized
with a char in the constructor.

BREAKING CHANGE

The Trie data structure does not handle the search
correctly anymore and will need to be refactored to cater for the
changes that have been introduced.

Add an algorithm?
Fix a bug or typo in an existing algorithm?
Documentation change?

Checklist:

I have read CONTRIBUTING.md.
This pull request is all my own work -- I have not plagiarized.
I know that pull requests will not be merged if they fail the automated tests.
This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
All new Python files are placed inside an existing directory.
All filenames are in all lowercase characters with no spaces or dashes.
All functions and variable names follow Python naming conventions.
All function parameters and return values are annotated with Python type hints.
All functions have doctests that pass the automated testing.
All new algorithms have a URL in its comments that points to Wikipedia or other similar explanation.
If this pull request resolves one or more open issues then the commit message contains Fixes: #{$ISSUE_NO}.

Summary by CodeRabbit

New Features
- Added Stream Checker data structure to process character streams and detect word matches in real-time
- Added Trie data structure for efficient string searching and prefix matching operations
Documentation
- Added Stream Checker documentation with constraints and behavior specifications
- Updated documentation directory to include new data structure entries
Tests
- Added comprehensive test suite for Stream Checker with multiple test scenarios


coderabbitai · 2025-11-19T06:25:18Z

Walkthrough

The pull request introduces a Stream Checker data structure and a complete Trie implementation. It adds new files for StreamChecker (which processes character streams and checks suffix matches against a word list using a reversed Trie), implements TrieNode and Trie classes, and updates documentation to reflect these new additions in the codebase.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation `DIRECTORY.md` | Added entries for Stream Checker under Streams > Datastructures and Trie/Trie Node under Trees > Ternary to index new data structures. |
| Trie Data Structure `datastructures/trees/trie/trie_node.py`, `datastructures/trees/trie/trie.py`, `datastructures/trees/trie/__init__.py` | Implemented TrieNode with children (DefaultDict) and is_end flag; created Trie class with insert, search (returns words with prefix), and starts_with methods; updated module exports to expose Trie and TrieNode. |
| Stream Checker Implementation `datastructures/streams/stream_checker/__init__.py` | Implemented StreamChecker class that processes character streams using a reversed-word Trie, maintaining a deque of recent characters, and returning true when a stream suffix matches any word. |
| Stream Checker Documentation & Tests `datastructures/streams/stream_checker/README.md`, `datastructures/streams/stream_checker/test_stream_checker.py` | Added README documenting StreamChecker behavior with constraints and examples; added three unit tests validating query results across different word and input sequences. |
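To make the Trie row above concrete, here is a minimal sketch of a trie with `insert`, `search` (returning the stored words that share a prefix), and `starts_with`. It is an illustration written from the summary, not the code merged in this PR: the `_find` helper is hypothetical, and the `char` constructor argument mentioned in the description is left out.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional


class TrieNode:
    """One trie node: child links keyed by character plus an end-of-word flag."""

    def __init__(self) -> None:
        # Missing children are created on first indexed access.
        self.children: DefaultDict[str, "TrieNode"] = defaultdict(TrieNode)
        self.is_end = False


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        curr = self.root
        for char in word:
            curr = curr.children[char]
        curr.is_end = True

    def _find(self, prefix: str) -> Optional[TrieNode]:
        # Hypothetical helper: walk the prefix without creating nodes.
        curr = self.root
        for char in prefix:
            nxt = curr.children.get(char)  # .get does not trigger the factory
            if nxt is None:
                return None
            curr = nxt
        return curr

    def search(self, prefix: str) -> List[str]:
        """Return every stored word that starts with `prefix`."""
        start = self._find(prefix)
        if start is None:
            return []
        results: List[str] = []

        def dfs(node: TrieNode, path: List[str]) -> None:
            if node.is_end:
                results.append(prefix + "".join(path))
            for char, child in node.children.items():
                path.append(char)
                dfs(child, path)
                path.pop()

        dfs(start, [])
        return results

    def starts_with(self, prefix: str) -> bool:
        return self._find(prefix) is not None
```

With "car", "card", "care", and "dog" inserted, `search("car")` yields the three words beginning with "car", and `starts_with("cat")` is false.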

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant StreamChecker
    participant Trie
    participant Deque
    Client->>StreamChecker: __init__(["go", "hi"])
    StreamChecker->>StreamChecker: reverse each word
    StreamChecker->>Trie: insert("og"), insert("ih")
    Trie->>Trie: build reversed-word trie
    StreamChecker->>Deque: create deque(maxlen=2)
    Client->>StreamChecker: query('g')
    StreamChecker->>Deque: append('g')
    StreamChecker->>Trie: traverse from root using stream buffer
    Trie-->>StreamChecker: no match (not end node)
    StreamChecker-->>Client: false
    Client->>StreamChecker: query('o')
    StreamChecker->>Deque: append('o'), deque=['g','o']
    StreamChecker->>Trie: traverse ['o','g'] in reverse
    Trie-->>StreamChecker: match found (end node)
    StreamChecker-->>Client: true
    Client->>StreamChecker: query('i')
    StreamChecker->>Deque: append('i'), deque=['o','i']
    StreamChecker->>Trie: traverse ['i','o'] in reverse
    Trie-->>StreamChecker: no match
    StreamChecker-->>Client: false
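The interaction in the diagram can be approximated with the sketch below. It is an assumption-laden stand-in rather than the merged implementation: it uses nested dicts instead of the PR's `TrieNode`, the `END` marker is hypothetical, and it assumes words and stream characters are lowercase letters so they never collide with that marker.

```python
from collections import deque
from typing import Deque, Dict, List

END = "$"  # hypothetical end-of-word marker, assumed never to appear in the stream


class StreamChecker:
    def __init__(self, words: List[str]) -> None:
        self.root: Dict[str, dict] = {}
        max_len = 0
        for word in words:
            node = self.root
            # Insert each word reversed so a suffix of the stream
            # becomes a prefix walk from the root.
            for char in reversed(word):
                node = node.setdefault(char, {})
            node[END] = {}
            max_len = max(max_len, len(word))
        # Only the last max_len characters can take part in a match.
        self.stream: Deque[str] = deque(maxlen=max_len)

    def query(self, letter: str) -> bool:
        self.stream.append(letter)
        node = self.root
        # Walk the buffer from newest to oldest character.
        for char in reversed(self.stream):
            if char not in node:
                return False
            node = node[char]
            if END in node:
                return True
        return False


checker = StreamChecker(["go", "hi"])
results = [checker.query(c) for c in "goi"]  # [False, True, False]
```

The bounded deque keeps memory proportional to the longest word, and each query stops at the first character that has no child in the reversed trie.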

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Trie implementation logic: Verify insert/search/starts_with methods correctly traverse and build/search the trie structure, especially the DFS logic in search().
StreamChecker initialization & query logic: Confirm reversed-word insertion, deque management with maxlen constraint, and reverse-stream traversal through Trie are correct.
Test coverage: Validate that test cases adequately exercise edge cases (empty streams, single-character words, multiple matches).
TrieNode defaultdict usage: Ensure the recursive defaultdict pattern with TrieNode factories works as intended without circular import issues.
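For the last point, a small sketch of the recursive `defaultdict` pattern may help when reviewing. It assumes `TrieNode` is defined in a single module and uses itself as the default factory, which is one common way to write it and not necessarily how the PR does.

```python
from collections import defaultdict
from typing import DefaultDict


class TrieNode:
    def __init__(self) -> None:
        # The class is its own default factory. The name TrieNode is already
        # bound when __init__ runs, so no extra import is involved.
        self.children: DefaultDict[str, "TrieNode"] = defaultdict(TrieNode)
        self.is_end = False


root = TrieNode()

# Indexing creates the intermediate nodes on demand.
root.children["a"].children["b"].is_end = True

# .get() and `in` do not call the factory, so they are safe for lookups.
missing_before = "z" in root.children
lookup = root.children.get("z")
missing_after = "z" in root.children

# Plain indexing does call the factory and silently adds a node.
_ = root.children["z"]
created_by_index = "z" in root.children
```

The practical consequence is that read-only methods such as a prefix check should use `.get()` or `in` rather than indexing, otherwise every failed lookup grows the trie.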

Poem

🐰 A Trie takes root, reversed words now flow,
Through streams of letters, we finally know—
When "go" meets "og" in a suffix dance,
The Checker returns true at first glance! ✨🌳

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.77% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main changes: adding a stream checker data structure and trie node implementation. |
| Description check | ✅ Passed | The description adequately explains the change, includes necessary context about the Trie reversal approach, and completes most checklist items with documentation of potential breaking changes. |



coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (4)

DIRECTORY.md (1)
268-270: Fix list indentation to satisfy markdownlint (MD007).

markdownlint-cli2 is flagging the new Streams and Trie bullets for incorrect indentation. Adjusting leading spaces will keep this index consistent and clear.

Suggested diff:
-  * Streams
-    * Stream Checker
-      * [Test Stream Checker](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/streams/stream_checker/test_stream_checker.py)
+* Streams
+  * Stream Checker
+    * [Test Stream Checker](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/streams/stream_checker/test_stream_checker.py)
@@
-    * Trie
-      * [Trie](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/trees/trie/trie.py)
-      * [Trie Node](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/trees/trie/trie_node.py)
+  * Trie
+    * [Trie](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/trees/trie/trie.py)
+    * [Trie Node](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/trees/trie/trie_node.py)
Also applies to: 309-311
datastructures/streams/stream_checker/test_stream_checker.py (1)

1-35: StreamChecker tests cover core behavior; consider one overlapping-words case.

The three tests validate basic true/false behavior and multi-query sequences correctly. As a small enhancement, you might add a case with overlapping longer words (e.g., words = ["cd", "bcd", "abcd"] with stream "a", "b", "c", "d") to guard against regressions in the reversed-trie traversal logic.
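A possible shape for that overlapping-words case is sketched below. To stay self-contained it tests a compact dict-based stand-in named `StreamChecker` (an assumption, with a hypothetical `END` marker); in the repository the test would import the real class from `datastructures.streams.stream_checker` instead.

```python
import unittest
from collections import deque
from typing import Dict, List

END = "$"  # hypothetical end-of-word marker for the stand-in


class StreamChecker:
    """Stand-in for the class under test, assuming lowercase-letter input."""

    def __init__(self, words: List[str]) -> None:
        self.root: Dict[str, dict] = {}
        for word in words:
            node = self.root
            for char in reversed(word):
                node = node.setdefault(char, {})
            node[END] = {}
        self.stream = deque(maxlen=max((len(w) for w in words), default=0))

    def query(self, letter: str) -> bool:
        self.stream.append(letter)
        node = self.root
        for char in reversed(self.stream):
            if char not in node:
                return False
            node = node[char]
            if END in node:
                return True
        return False


class OverlappingWordsTest(unittest.TestCase):
    def test_shortest_overlapping_suffix_matches(self) -> None:
        checker = StreamChecker(["cd", "bcd", "abcd"])
        actual = [checker.query(c) for c in "abcd"]
        # Only the final "d" completes a word ("cd" is already a suffix).
        self.assertEqual([False, False, False, True], actual)

    def test_match_after_unrelated_characters(self) -> None:
        checker = StreamChecker(["cd", "bcd", "abcd"])
        for char in "abcd":
            checker.query(char)
        # "x" breaks the suffix; a later "c", "d" pair matches "cd" again.
        actual = [checker.query(c) for c in "xcd"]
        self.assertEqual([False, False, True], actual)
```

Running it with `python -m unittest` against the real class would guard the reversed-trie traversal when several words share a suffix.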

datastructures/streams/stream_checker/__init__.py (1)

1-73: StreamChecker logic looks correct; clarify docs and consider tiny cleanups.

The reversed-trie + bounded deque implementation is sound and matches the intended suffix-checker semantics, assuming the TrieNode import/annotation issue is fixed as noted in datastructures/trees/trie/trie_node.py.

A few small, non-blocking improvements:

Docstring accuracy (lines 42–55):

The comment “True if the letter is the end of a word” understates the behavior; the method returns True if any suffix of the current stream forms a word.

The complexity text repeats itself as O(L) and O(Lmax); you could simplify to O(min(L, Lmax)) and then note that Lmax is bounded (≤ longest word length), so per-query cost is effectively O(1) in that parameter.

Micro-optimizations (optional):

for word in self.words[::-1]: could be for word in reversed(self.words): to avoid copying the list.

self.words is only used in __build_trie; if you don’t need it later, you could drop the instance attribute and just pass words into __build_trie to reduce stored state.

These are minor; the core implementation is good as-is.
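The two suggestions above can be illustrated with a short, standalone snippet. The word list and stream here are made-up sample data, not values from the PR.

```python
from collections import deque

words = ["go", "hi", "stream"]  # sample data for illustration only

# words[::-1] allocates a second list; reversed(words) is a lazy iterator
# over the original list and yields the same items in the same order.
copied = words[::-1]
lazy_items = list(reversed(words))

# A deque bounded by the longest word length keeps only the characters that
# can still take part in a match, so a query never scans more than
# min(L, Lmax) characters however long the stream grows.
longest = max(len(word) for word in words)
buffer = deque(maxlen=longest)
for char in "abcdefghij" * 100:  # 1000-character stream
    buffer.append(char)

buffer_size = len(buffer)      # stays at `longest`
tail = "".join(buffer)         # the last `longest` characters of the stream
```

Since `longest` is fixed once the checker is built, the per-query bound does not depend on how many characters have been streamed.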
datastructures/trees/trie/trie.py (1)
9-20: Consider adding a docstring.

The insertion logic is correct, but adding a docstring would improve API documentation consistency (especially since starts_with has one).
     def insert(self, word: str) -> None:
+        """
+        Inserts a word into the trie.
+
+        Args:
+            word: The word to insert into the trie.
+        """
         curr = self.root

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37b2941 and 4993a24.

⛔ Files ignored due to path filters (3)

datastructures/streams/stream_checker/images/examples/stream_checker_example_1.png is excluded by !**/*.png
datastructures/streams/stream_checker/images/examples/stream_checker_example_2.png is excluded by !**/*.png
datastructures/streams/stream_checker/images/examples/stream_checker_example_3.png is excluded by !**/*.png

📒 Files selected for processing (7)

DIRECTORY.md (2 hunks)
datastructures/streams/stream_checker/README.md (1 hunks)
datastructures/streams/stream_checker/__init__.py (1 hunks)
datastructures/streams/stream_checker/test_stream_checker.py (1 hunks)
datastructures/trees/trie/__init__.py (1 hunks)
datastructures/trees/trie/trie.py (1 hunks)
datastructures/trees/trie/trie_node.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

datastructures/trees/trie/__init__.py (2)

datastructures/trees/trie/trie_node.py (1)

TrieNode (5-23)

datastructures/trees/trie/trie.py (1)

Trie (5-57)

datastructures/streams/stream_checker/__init__.py (1)

datastructures/trees/trie/trie_node.py (1)

TrieNode (5-23)

datastructures/streams/stream_checker/test_stream_checker.py (1)

datastructures/streams/stream_checker/__init__.py (2)

StreamChecker (6-73)

query (42-73)

datastructures/trees/trie/trie.py (1)

datastructures/trees/trie/trie_node.py (1)

TrieNode (5-23)

🪛 markdownlint-cli2 (0.18.1)

DIRECTORY.md

268-268: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)

269-269: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

270-270: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

309-309: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

310-310: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

311-311: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

🔇 Additional comments (3)

datastructures/trees/trie/__init__.py (1)

1-8: Re-export of Trie and TrieNode looks good.

The explicit imports plus __all__ = ["Trie", "TrieNode"] cleanly define the package surface and support from datastructures.trees.trie import Trie, TrieNode as intended.

datastructures/trees/trie/trie.py (2)

5-7: LGTM!

The Trie initialization is straightforward and correct.

46-57: LGTM!

The starts_with method correctly checks for prefix existence in the Trie, with appropriate documentation.

datastructures/streams/stream_checker/README.md

datastructures/trees/trie/trie_node.py

datastructures/trees/trie/trie.py

BrianLusina and others added 2 commits November 19, 2025 09:19

updating DIRECTORY.md

4993a24

BrianLusina self-assigned this Nov 19, 2025

BrianLusina added the enhancement, Algorithm, Datastructures, Documentation, Array, Hash Map, Trees, Trie, and Queue labels Nov 19, 2025

coderabbitai bot reviewed Nov 19, 2025

datastructures/streams/stream_checker/README.md (resolved)

datastructures/trees/trie/trie_node.py (resolved)

datastructures/trees/trie/trie.py (resolved)

BrianLusina merged commit 484b865 into main Nov 21, 2025
5 of 6 checks passed

BrianLusina deleted the feat/stream-checker branch November 21, 2025 05:22

coderabbitai bot mentioned this pull request Nov 21, 2025

feat(strings, trie): is prefix of word #108

Merged

14 tasks

coderabbitai bot mentioned this pull request Dec 1, 2025

Graphs | Vertex & Graph implementations #77

Open

11 tasks
