feat(data structures): stream checker and trie node #107
Merged
Empty file.
`@@ -0,0 +1,30 @@`

```markdown
# Stream of Characters

Design a data structure that processes a stream of characters and, after each character is received, determines whether
a suffix of the characters received so far is a string in a given array of strings, words.

For example, if words = ["cat"] and the stream adds the characters 'd', 'c', 'a', and 't' in sequence, the algorithm
should detect that the suffix "cat" of the stream "dcat" matches the word "cat" from the list.

The goal, then, is to detect whether any word in words appears as a suffix of the stream built so far. To accomplish
this, implement a class StreamChecker:

- **Constructor**: Initializes the object with the list of target words.
- **boolean query(char letter)**: Appends a character to the stream and returns TRUE if any suffix of the stream matches
  a word in the list words.

Constraints:

- 1 ≤ words.length ≤ 1000
- 1 ≤ words[i].length ≤ 200
- words[i] consists of lowercase English letters.
- letter is a lowercase English letter.
- At most 4 * 10^2 calls will be made to query.

Examples:
```

(The three example images under "Examples:" did not render in this view.)
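To pin down what "a suffix of the stream matches a word" means, here is a brute-force reference sketch. The class name `NaiveStreamChecker` is hypothetical and not part of this PR. It keeps the entire stream and re-tests every word on each query, so each call costs roughly O(n + Ltotal), where n is the stream length and Ltotal is the total length of all words, and memory grows without bound. Those costs are what the reversed-trie `StreamChecker` in this PR avoids.

```python
from typing import List


class NaiveStreamChecker:
    """Hypothetical brute-force reference, not part of this PR."""

    def __init__(self, words: List[str]) -> None:
        self.words = words
        self.stream: List[str] = []  # the entire stream is kept, unlike the bounded deque in StreamChecker

    def query(self, letter: str) -> bool:
        self.stream.append(letter)
        text = "".join(self.stream)
        # True if any target word is a suffix of everything received so far
        return any(text.endswith(word) for word in self.words)


checker = NaiveStreamChecker(["cat"])
results = [checker.query(ch) for ch in "dcat"]
print(results)  # [False, False, False, True]
```

A reference like this can also act as an oracle when checking the optimized class against random streams.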
`@@ -0,0 +1,73 @@`

```python
from typing import Deque, List
from collections import deque
from datastructures.trees.trie import TrieNode


class StreamChecker(object):

    def __init__(self, words: List[str]):
        """
        Initializes a StreamChecker instance.

        Constructor Time: O(Ltotal), where Ltotal is the sum of the lengths of all words. This is a one-time cost.

        Parameters:
            words (List[str]): List of words to be checked in the stream.
        """
        self.words = words
        self.trie = TrieNode()
        self.max_len = 0
        self.__build_trie()
        # Only the last max_len characters can ever be part of a match, so deque(maxlen=...) discards older
        # characters automatically and bounds both memory and per-query work.
        self.stream: Deque[str] = deque(maxlen=self.max_len)

    def __build_trie(self):
        # Insert each word with its characters reversed, so a trie walk can start from the newest stream character
        for word in self.words:
            # 1. track max length for deque optimization
            if len(word) > self.max_len:
                self.max_len = len(word)

            current = self.trie
            # 2. insert characters in reverse order (children is a defaultdict, so missing nodes are created)
            for letter in word[::-1]:
                current = current.children[letter]

            # 3. Mark the end of the reversed word
            current.is_end = True

    def query(self, letter: str) -> bool:
        """
        Appends a letter to the stream and checks whether any suffix of the stream is one of the words.

        Query Time: O(Lmax), where Lmax is the length of the longest word (at most 200). The stream history never
        holds more than Lmax characters, so each query visits at most Lmax trie nodes. Because Lmax is bounded by a
        constant, this is effectively O(1) per query in the worst case.

        Parameters:
            letter (str): The next letter in the stream.

        Returns:
            bool: True if some suffix of the stream (ending with this letter) is a word in words, False otherwise.
        """
        self.stream.append(letter)
        current = self.trie

        # Iterate stream in reverse (newest character first)
        for character in reversed(self.stream):
            # Dead end: no word ends with this suffix. Use `in` rather than indexing, because indexing the
            # defaultdict would silently insert an empty node into the trie.
            if character not in current.children:
                return False

            # Traverse to the next node
            current = current.children[character]

            # check for match (success condition)
            if current.is_end:
                return True

        # If loop finishes without a match
        return False
```
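The reversed-trie technique used by `StreamChecker` can be sketched with plain nested dictionaries, which makes it easy to trace on the "dcat" stream from the problem statement. The names `build_reversed_trie` and `make_checker` and the `"$"` end marker are hypothetical; the `"$"` key is only safe because words contain lowercase letters exclusively.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

END = "$"  # hypothetical sentinel key marking the end of a reversed word


def build_reversed_trie(words: List[str]) -> Dict:
    root: Dict = {}
    for word in words:
        node = root
        for ch in reversed(word):  # insert characters last-to-first
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def make_checker(words: List[str]) -> Callable[[str], bool]:
    root = build_reversed_trie(words)
    # Only the last max(len(word)) characters can ever contribute to a match
    history: Deque[str] = deque(maxlen=max(len(w) for w in words))

    def query(letter: str) -> bool:
        history.append(letter)
        node = root
        for ch in reversed(history):  # newest character first
            if ch not in node:
                return False  # no word ends with this suffix
            node = node[ch]
            if END in node:
                return True  # the suffix read so far is a complete word
        return False

    return query


query = make_checker(["cat", "dog"])
print([query(ch) for ch in "dcat"])  # [False, False, False, True]
```

After the fourth query the deque holds only "cat" (its maxlen is 3), and the newest-first walk t → a → c reaches an end marker.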
Binary file added BIN +50.5 KB datastructures/streams/stream_checker/images/examples/stream_checker_example_1.png
Binary file added BIN +45.8 KB datastructures/streams/stream_checker/images/examples/stream_checker_example_2.png
Binary file added BIN +39.6 KB datastructures/streams/stream_checker/images/examples/stream_checker_example_3.png
`datastructures/streams/stream_checker/test_stream_checker.py` (35 additions, 0 deletions)

`@@ -0,0 +1,35 @@`

```python
import unittest
from . import StreamChecker


class StreamCheckerTestCase(unittest.TestCase):
    def test_1(self):
        words = ["go", "hi"]
        stream = StreamChecker(words)
        self.assertFalse(stream.query("h"))
        self.assertTrue(stream.query("i"))
        self.assertFalse(stream.query("g"))
        self.assertTrue(stream.query("o"))
        self.assertFalse(stream.query("x"))
        self.assertFalse(stream.query("y"))

    def test_2(self):
        words = ["no", "yes"]
        stream = StreamChecker(words)
        self.assertFalse(stream.query("y"))
        self.assertFalse(stream.query("e"))
        self.assertTrue(stream.query("s"))
        self.assertFalse(stream.query("n"))
        self.assertTrue(stream.query("o"))

    def test_3(self):
        words = ["a", "aa"]
        stream = StreamChecker(words)
        self.assertTrue(stream.query("a"))
        self.assertTrue(stream.query("a"))
        self.assertTrue(stream.query("a"))
        self.assertFalse(stream.query("b"))


if __name__ == '__main__':
    unittest.main()
```
```diff
@@ -1,65 +1,8 @@
-from collections import defaultdict
-from typing import List
+from datastructures.trees.trie.trie_node import TrieNode
+from datastructures.trees.trie.trie import Trie
 
 
-class TrieNode:
-    def __init__(self, char: str):
-        self.char = char
-        self.children = defaultdict(TrieNode)
-        self.is_end = False
-
-
-class Trie:
-    def __init__(self):
-        self.root = TrieNode("")
-
-    def insert(self, word: str) -> None:
-        curr = self.root
-
-        for char in word:
-            if char in curr.children:
-                curr = curr.children[char]
-
-            else:
-                new_node = TrieNode(char)
-                curr.children[char] = new_node
-                curr = new_node
-
-        curr.is_end = True
-
-    def search(self, word: str) -> List[str]:
-        curr = self.root
-
-        if len(word) == 0:
-            return []
-
-        for char in word:
-            if char in curr.children:
-                curr = curr.children[char]
-            else:
-                return []
-
-        output = []
-
-        def dfs(node: TrieNode, prefix: str) -> None:
-            if node.is_end:
-                output.append((prefix + node.char))
-
-            for child in node.children.values():
-                dfs(child, prefix + node.char)
-
-        dfs(curr, word[:-1])
-        return output
-
-    def starts_with(self, prefix: str) -> bool:
-        """
-        Returns true if the given prefix is a prefix of any word in the trie.
-        """
-        curr = self.root
-
-        for char in prefix:
-            if char not in curr.children:
-                return False
-            curr = curr.children[char]
-
-        return True
+__all__ = [
+    "Trie",
+    "TrieNode"
+]
```
`@@ -0,0 +1,57 @@`

```python
from typing import List
from datastructures.trees.trie.trie_node import TrieNode


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        curr = self.root

        for char in word:
            if char in curr.children:
                curr = curr.children[char]
            else:
                new_node = TrieNode()
                curr.children[char] = new_node
                curr = new_node

        curr.is_end = True

    def search(self, word: str) -> List[str]:
        """
        Returns all words in the trie that start with the given word, or an empty list if there are none.
        """
        curr = self.root

        if len(word) == 0:
            return []

        for char in word:
            if char in curr.children:
                curr = curr.children[char]
            else:
                return []

        output = []

        def dfs(node: TrieNode, path: str) -> None:
            # TrieNode no longer stores its own character, so rebuild words from the child keys (the edges)
            if node.is_end:
                output.append(path)

            for char, child in node.children.items():
                dfs(child, path + char)

        dfs(curr, word)
        return output

    def starts_with(self, prefix: str) -> bool:
        """
        Returns True if the given prefix is a prefix of any word in the trie.
        """
        curr = self.root

        for char in prefix:
            if char not in curr.children:
                return False
            curr = curr.children[char]

        return True
```
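Because `TrieNode` no longer stores its own character, a prefix search has to rebuild each word from the dictionary keys it walks through, and lookups must use `in` so the `defaultdict` does not create nodes as a side effect. The sketch below illustrates both points; `Node`, `insert`, and `words_with_prefix` are hypothetical names that mirror `TrieNode` and `Trie` rather than the repository's code, and unlike `Trie.search` this version returns every word when the prefix is empty.

```python
from collections import defaultdict
from typing import DefaultDict, List


class Node:
    """Hypothetical minimal node: like TrieNode, it stores no character of its own."""

    def __init__(self) -> None:
        self.children: DefaultDict[str, "Node"] = defaultdict(Node)
        self.is_end = False


def insert(root: Node, word: str) -> None:
    node = root
    for ch in word:
        node = node.children[ch]  # indexing the defaultdict creates missing nodes
    node.is_end = True


def words_with_prefix(root: Node, prefix: str) -> List[str]:
    node = root
    for ch in prefix:
        if ch not in node.children:  # membership test never creates nodes
            return []
        node = node.children[ch]

    found: List[str] = []

    def dfs(current: Node, path: str) -> None:
        if current.is_end:
            found.append(path)
        # The character lives on the edge (the dict key), not on the node
        for ch, child in current.children.items():
            dfs(child, path + ch)

    dfs(node, prefix)
    return found


root = Node()
for w in ["car", "cart", "cat", "dog"]:
    insert(root, w)
print(words_with_prefix(root, "car"))  # ['car', 'cart']
print(words_with_prefix(root, "z"))    # []
print("z" in root.children)            # False: the failed lookup did not add a node
```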
`@@ -0,0 +1,23 @@`

```python
from typing import DefaultDict
from collections import defaultdict


class TrieNode:
    def __init__(self):
        """
        Initializes a TrieNode instance.

        A TrieNode holds a dictionary mapping each child's character to the child node, plus a boolean indicating
        whether the node is the end of a word in the Trie. The node does not store its own character; that character
        is the key under which the node is stored in its parent's children.

        Parameters:
            None

        Returns:
            None
        """
        self.children: DefaultDict[str, TrieNode] = defaultdict(TrieNode)
        self.is_end = False

    def __repr__(self):
        return f"TrieNode({self.children.items()}, {self.is_end})"
```