
Index individual words in profile names for search#124

Open
alltheseas wants to merge 3 commits into damus-io:master from alltheseas:word-search-indices

Conversation


@alltheseas alltheseas commented Mar 5, 2026

Summary

  • Profile search now indexes each word within name and display_name, not just the full string
  • Searching "fishcake" now finds "The Fishcake", searching "grinder" finds "The Grinder", etc.
  • Uses isspace() to split on any whitespace, skipping runs to avoid empty tokens
  • Adds migration to rebuild the search index with per-word entries
  • Includes test that verifies searching "jin" finds "Selene Jin"

Closes #123

Problem

ndb_write_profile_search_indices() only wrote a single index entry for the full name. For "The Fishcake" (indexed as "the fishcake"), searching "fishcake" would start at 'f' in the LMDB sorted index and exhaust the 128 result limit before ever reaching 't'.
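The failure can be reproduced with a toy model of the sorted index (plain sorted strings instead of LMDB; `first_key_at_or_after` is invented for illustration and is not nostrdb code):

```c
#include <string.h>

/* Toy stand-in for an MDB_SET_RANGE cursor positioning: return the first
 * key that sorts at or after the query, as a sorted index scan would. */
static const char *first_key_at_or_after(const char **keys, int n,
					 const char *query)
{
	for (int i = 0; i < n; i++)
		if (strcmp(keys[i], query) >= 0)
			return keys[i];
	return (const char *)0;
}
```

With lowercased full-name keys { "alice", "grinder x", "the fishcake" }, a scan for "fishcake" lands on "grinder x", which does not carry the "fishcake" prefix, so the actual match at "the fishcake" is never visited: exactly the situation the per-word entries fix.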

Fix

New helper ndb_write_profile_search_word_indices() writes an index entry for each word position. "The Fishcake (android)" gets entries for:

  1. "the fishcake (android)" (full name, existing behavior)
  2. "fishcake (android)" (2nd word)
  3. "(android)" (3rd word)
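A minimal sketch of that splitting rule (`collect_word_suffixes` is a hypothetical stand-in for illustration, not the actual nostrdb helper, which writes LMDB search keys instead of collecting pointers):

```c
#include <ctype.h>

/* Sketch of the per-word rule: the full name is entry 0, and every
 * position following a run of whitespace yields another entry. Stores
 * pointers into `name`; returns the number of entries written to out[]. */
static int collect_word_suffixes(const char *name, const char *out[], int max)
{
	int n = 0;

	if (n < max)
		out[n++] = name; /* full name, existing behavior */

	for (const char *p = name; *p; p++) {
		/* a word starts where whitespace is followed by
		 * non-whitespace; skipping runs avoids empty tokens */
		if (isspace((unsigned char)*p) && *(p + 1) &&
		    !isspace((unsigned char)*(p + 1)) && n < max)
			out[n++] = p + 1;
	}
	return n;
}
```

For "the fishcake (android)" this yields the three entries listed above, each of which becomes directly reachable by an MDB_SET_RANGE prefix scan.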

Test plan

  • make check passes (includes new test_profile_word_search)
  • Search "jin" returns profile with display_name "Selene Jin"
  • Existing test_profile_search ("jean") still passes

Signed-off-by: alltheseas

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Profile search now matches any word within names and display names, supports camelCase splitting, and mid-name prefix queries for broader matching.
  • Bug Fixes

    • Improved search reliability with dedup tracking across index and fallback paths and a linear fallback scan when index results are insufficient.
  • Tests

    • Added unit tests for word-based, camelCase, prefix, and dedup search behaviors.
  • Chores

    • Migration added to rebuild search indices for per-word matching.

coderabbitai bot commented Mar 5, 2026

📝 Walkthrough

Added per-word and camelCase-aware profile name indexing, a migration to rebuild profile search indices, and a fallback linear search path for cases where index results are insufficient; tests added to validate word, prefix, camelCase, and dedup behaviors.

Changes

  • Profile search core (src/nostrdb.c): Added case-insensitive helpers (ndb_strncasecmp, ndb_strcasestr), per-word and camelCase indexing (ndb_write_profile_search_word_indices), refactored ndb_write_profile_search_indices to accept profile_root, and added migration function ndb_migrate_word_search_indices.
  • Search execution & fallback (src/nostrdb.c): Extended the profile search flow: added ndb_profile_search_contains_fallback and updated ndb_query_plan_execute_profile_search to track dedup across index and fallback paths and handle lowered queries.
  • Migrations (src/nostrdb.c, MIGRATIONS array): Registered a new migration entry to drop and rebuild the profile_search index using per-word/camelCase entries (ndb_migrate_word_search_indices).
  • Tests (test.c): Added unit tests test_profile_word_search, test_profile_fullname_prefix_search, test_profile_camelcase_standalone, and test_profile_search_index_dedup, and integrated them into existing test flows.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant QueryPlanner
    participant IndexDB as "Index DB (profile_search)"
    participant ProfileStore as "Profile Store"
    participant Migrator
    Client->>QueryPlanner: submit search query
    QueryPlanner->>IndexDB: query prefix entries (with dedup tracking)
    IndexDB-->>QueryPlanner: index results (may be partial)
    QueryPlanner->>ProfileStore: fallback linear contains checks for remaining candidates
    ProfileStore-->>QueryPlanner: matched profiles
    alt migration run
        Migrator->>IndexDB: drop old index
        Migrator->>ProfileStore: scan profiles
        ProfileStore->>IndexDB: write per-word & camelCase index entries
        IndexDB-->>Migrator: migration complete
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through names, split words and case,
I nibble "Selene" and chase "Jin" in place.
I plant per-word seeds and rebuild the field,
Now search finds snacks that used to be concealed.
🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 28.57%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title clearly and concisely summarizes the main change: indexing individual words in profile names for search.
  • Linked Issues Check — ✅ Passed: the implementation meets all coding requirements from issue #123: per-word indexing, per-word index entries for word-level matching, migration for existing profiles, and a fallback search mechanism.
  • Out of Scope Changes Check — ✅ Passed: all changes relate directly to issue #123: the case-insensitive utilities support profile search, word-based indexing and migration are core requirements, the fallback search supports the feature, and the new tests validate the functionality.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
test.c (1)

1127-1137: Consider removing redundant strncmp checks.

The condition !strncmp(search.key->search, "jin", 3) is always true within this function since you explicitly called ndb_search_profile(&txn, &search, "jin") on line 1119. This check adds no value and can be simplified.

🔧 Proposed simplification
-if (name && !strncmp(search.key->search, "jin", 3) &&
-    strstr(name, "Jin")) {
+if (name && strstr(name, "Jin")) {
 	found = 1;
 	break;
 }
-if (display_name && !strncmp(search.key->search, "jin", 3) &&
-    strstr(display_name, "Jin")) {
+if (display_name && strstr(display_name, "Jin")) {
 	found = 1;
 	break;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test.c` around lines 1127 - 1137, The strncmp check against "jin" is redundant because this path is already invoked via ndb_search_profile(&txn, &search, "jin"); remove the "!strncmp(search.key->search, "jin", 3)" conditions and simplify both branches to only test for non-null name/display_name and strstr(name, "Jin") / strstr(display_name, "Jin") respectively, then set found = 1 and break; keep existing variables search, name, display_name, and found intact. 
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments in src/nostrdb.c:
  • Around lines 1619-1626: The loop currently treats only a literal ' ' as a delimiter, which misses tabs/newlines and produces bad tokens; update the condition around the pointer p to use isspace((unsigned char)*p) from <ctype.h> and ensure the next byte is the start of a word (e.g., !isspace((unsigned char)*(p+1))) before calling ndb_make_search_key and ndb_write_profile_search_index. Also skip runs of whitespace so you don't emit empty tokens, and consider treating adjacent punctuation (using ispunct) as additional boundaries if needed. Refer to the loop variables name and p, the functions ndb_make_search_key and ndb_write_profile_search_index, and the fields note->pubkey/note->created_at/profile_key/txn/index when making the change.
  • Around lines 1603-1630: ndb_write_profile_search_word_indices currently emits one index entry per word, which can create duplicate non-consecutive hits at query time; deduplicate terms before calling ndb_write_profile_search_index (e.g., collect unique terms for the given name, or maintain a small in-function set of emitted tokens) and only call ndb_write_profile_search_index once per unique term for the profile. Keep the existing call that writes the full name; in the per-word loop, normalize/trim each token and skip ndb_write_profile_search_index if that token was already emitted for this profile (reference symbols: ndb_write_profile_search_word_indices and ndb_write_profile_search_index; ensure compatibility with ndb_query_plan_execute_profile_search).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7f8a7d8d-6fdc-4347-b581-ff3d6f8a2f1b

📥 Commits

Reviewing files that changed from the base of the PR and between 2d459e8 and 999b171.

📒 Files selected for processing (2)
  • src/nostrdb.c
  • test.c
@alltheseas alltheseas force-pushed the word-search-indices branch from 999b171 to 75d18f3 on March 5, 2026 05:19

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
src/nostrdb.c (1)

1603-1635: ⚠️ Potential issue | 🟠 Major

Per-word fanout can still surface duplicate profiles in a single search.

This helper emits multiple search keys per profile, but query-time dedupe still skips only consecutive pubkeys. The same profile can reappear non-adjacently and consume result slots.
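The reviewer's point can be illustrated with a toy model (plain ints stand in for 32-byte pubkeys; `dedup_consecutive` is an invented helper, not nostrdb code):

```c
/* Dedup that only compares each hit against the previous one: adjacent
 * repeats are dropped, but an id that reappears later in the stream
 * survives and consumes a result slot. Returns the deduped count. */
static int dedup_consecutive(const int *hits, int n, int *out)
{
	int m = 0;
	for (int i = 0; i < n; i++)
		if (m == 0 || out[m - 1] != hits[i])
			out[m++] = hits[i];
	return m;
}
```

With hits {A, A, B, A}, this keeps {A, B, A}: the second A is adjacent and dropped, but the third slips through, which is why the eventual fix tracks all previously emitted pubkeys rather than just the last one.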

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nostrdb.c` around lines 1603 - 1635, The function ndb_write_profile_search_word_indices emits multiple search keys for each word in a profile name which can produce duplicate index entries (and non-adjacent duplicates at query time). Modify ndb_write_profile_search_word_indices to avoid emitting the same search term more than once per profile: track previously emitted terms (e.g., keep the last emitted term string or a small in-function set of seen word substrings) and skip calling ndb_make_search_key/ndb_write_profile_search_index when the current term (p+1) equals any previously emitted term for this name; retain the existing logic for writing the full name index and whitespace skipping. Ensure comparisons are done on the term string (case/normalization as used by ndb_make_search_key) and free any temporary storage if you allocate memory. 

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5040c2ef-3cf7-4542-9c3e-2d953206620b

📥 Commits

Reviewing files that changed from the base of the PR and between 999b171 and 75d18f3.

📒 Files selected for processing (2)
  • src/nostrdb.c
  • test.c
🚧 Files skipped from review as they are similar to previous changes (1)
  • test.c
alltheseas and others added 3 commits March 5, 2026 13:25
Profile search only indexed the full name/display_name string, so searching "fishcake" could not find "The Fishcake" because the LMDB cursor would start at 'f' and exhaust the 128 result limit before reaching 'the fishcake' in sort order. Index each word within names so that MDB_SET_RANGE can find non-first words directly. Uses isspace() to handle any whitespace delimiter and skips runs of whitespace to avoid empty tokens. Includes a migration to rebuild the search index. Closes damus-io#123 Signed-off-by: alltheseas Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extend profile search with two improvements: 1. CamelCase boundary splitting: walk the original mixed-case name and at each lowercase-to-uppercase transition, index starting from the uppercase character. "TheGrinder" now also indexes "grinder". 2. Linear substring fallback: when the index phase doesn't fill the result buffer, scan all profiles with case-insensitive substring matching on name/display_name. This catches cases the index misses (e.g. all-caps names, mid-word queries). Includes prefix early-exit in the index loop so irrelevant trailing entries don't block the fallback, and dedup tracking so the fallback doesn't re-emit pubkeys already found by the index phase. Closes damus-io/damus#3661 Signed-off-by: alltheseas Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a profile has multiple search index entries that prefix-match the query (e.g. name "Fish Fisherman" produces entries for "fish fisherman" and "fish" — both matching query "fish"), the index cursor could emit the same pubkey multiple times. This wasted result slots and hid other matching profiles. Check each index hit against the found_pubkeys array before emitting, skipping pubkeys already seen. End-to-end test: insert two profiles ("Fish Fisherman" and "Fishy"), query "fish" with capacity=2, assert both are returned with distinct pubkeys. Without the fix, profile A consumes both slots via duplicate. Closes damus-io#125 Signed-off-by: alltheseas Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
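The camelCase rule described in the second commit can be sketched as follows (`collect_camelcase_suffixes` is an invented stand-in for illustration, not the actual code in src/nostrdb.c, which lowers the suffix when building the search key):

```c
#include <ctype.h>

/* At each lowercase-to-uppercase transition, the suffix starting at the
 * uppercase character gets its own index entry, so "TheGrinder" also
 * indexes "Grinder". Stores pointers into `name`; returns the count. */
static int collect_camelcase_suffixes(const char *name, const char *out[],
				      int max)
{
	int n = 0;
	for (const char *p = name; *p && *(p + 1); p++)
		if (islower((unsigned char)*p) &&
		    isupper((unsigned char)*(p + 1)) && n < max)
			out[n++] = p + 1;
	return n;
}
```

Scanning the original mixed-case string is what makes this work: after lowering, the boundary information would be gone.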
@alltheseas alltheseas force-pushed the word-search-indices branch from 4099f85 to 18ed7ba on March 5, 2026 19:27

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
src/nostrdb.c (1)

5037-5039: ⚠️ Potential issue | 🟠 Major

Dedup tracking is capped at 64 and can reintroduce duplicate profile results.

Once more than 64 unique pubkeys are seen, additional pubkeys are no longer tracked, so later duplicate hits can slip in and consume result slots.

🔧 Suggested fix (replace fixed cap with dynamic capacity)
 static int ndb_profile_search_contains_fallback(
 	struct ndb_txn *txn,
 	const char *search,
 	unsigned char *filter_pubkey,
 	struct ndb_filter *note_filter,
 	struct ndb_query_state *results,
-	unsigned char found_pubkeys[][32],
+	unsigned char (*found_pubkeys)[32],
+	int found_cap,
 	int *num_found)
@@
-	if (*num_found < 64)
+	if (*num_found < found_cap)
 		memcpy(found_pubkeys[(*num_found)++], note->pubkey, 32);
@@
-unsigned char found_pubkeys[64][32];
-int num_found = 0;
+int found_cap;
+unsigned char (*found_pubkeys)[32] = NULL;
+int num_found = 0;
@@
 	if (!(search = ndb_filter_find_search(filter)))
 		return 0;
+
+	found_cap = results->limit > 0 ? (int)results->limit : 256;
+	found_pubkeys = calloc(found_cap, sizeof(*found_pubkeys));
+	if (!found_pubkeys)
+		return 0;
@@
-	if (num_found < 64)
+	if (num_found < found_cap)
 		memcpy(found_pubkeys[num_found++], profile_search.key->id, 32);
@@
 	if (!ndb_profile_search_contains_fallback(txn, search,
 		filter_pubkey, f, results, found_pubkeys,
+		found_cap,
 		&num_found))
 		goto fail;
@@
+	free(found_pubkeys);
 	ndb_filter_destroy(f);
 	return 1;
 fail:
+	free(found_pubkeys);
 	ndb_filter_destroy(f);
 	return 0;
-unsigned char found_pubkeys[64][32]; -int num_found = 0; +int found_cap; +unsigned char (*found_pubkeys)[32] = NULL; +int num_found = 0; @@	if (!(search = ndb_filter_find_search(filter)))	return 0; + +found_cap = results->limit > 0 ? (int)results->limit : 256; +found_pubkeys = calloc(found_cap, sizeof(*found_pubkeys)); +if (!found_pubkeys) +return 0; @@ -if (num_found < 64) +if (num_found < found_cap)	memcpy(found_pubkeys[num_found++], profile_search.key->id, 32); @@ -if (!ndb_profile_search_contains_fallback(txn, search, +if (!ndb_profile_search_contains_fallback(txn, search, filter_pubkey, f, results, found_pubkeys, + found_cap, &num_found))	goto fail; @@ +free(found_pubkeys);	ndb_filter_destroy(f);	return 1; fail: +free(found_pubkeys);	ndb_filter_destroy(f);	return 0;

Also applies to: 5070-5072, 5138-5141

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nostrdb.c` around lines 5037 - 5039, The code currently caps dedup tracking at 64 when pushing into found_pubkeys (checking if *num_found < 64) which allows duplicates once the cap is reached; replace the fixed-cap logic by adding a dynamic capacity (e.g., found_pubkeys_cap) and ensure found_pubkeys is realloc'd to grow (double the capacity) when *num_found >= found_pubkeys_cap before copying the 32-byte pubkey with memcpy and incrementing *num_found; initialize the capacity (e.g., 64) when allocating found_pubkeys and free it as before; apply the same change to the other duplicate spots that use the same fixed 64 check (the other instances of found_pubkeys/*num_found checks). 

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f33b756f-d3f8-4025-91b6-dfc6d543b239

📥 Commits

Reviewing files that changed from the base of the PR and between 75d18f3 and 4099f85.

📒 Files selected for processing (2)
  • src/nostrdb.c
  • test.c
Comment on lines +1127 to +1129
if ((name && strstr(name, "Jin")) ||
(display_name && strstr(display_name, "Jin"))) {
found = 1;


⚠️ Potential issue | 🟡 Minor

Tighten assertions to avoid false-positive matches.

At Line 1127 and Line 1160, the checks are substring-based and may pass for the wrong profile if fixtures gain more similar names. Prefer exact expected identity checks for this regression target.

💡 Suggested assertion tightening
-if ((name && strstr(name, "Jin")) ||
-    (display_name && strstr(display_name, "Jin"))) {
+if ((name && !strcmp(name, "SeleneJin")) ||
+    (display_name && !strcmp(display_name, "Selene Jin"))) {
 	found = 1;
 	break;
 }

-if (name && strstr(name, "Selene")) {
+if (name && !strcmp(name, "SeleneJin")) {
 	found = 1;
 	break;
 }

Also applies to: 1160-1163

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test.c` around lines 1127 - 1129, The substring checks using strstr on variables name and display_name can produce false positives; replace both occurrences (the block setting found = 1 and the similar block at the later occurrence) with exact equality checks against the expected identifier (e.g., strcmp(name, "Jin") == 0 or strcmp(display_name, "Jin") == 0) so the assertion only matches the exact profile, keep the same variables name, display_name and found to locate and update the conditions. 
Comment on lines +2805 to +2823
// wait for ingestion
sleep(1);

// now search for "grinder" — should find "TheGrinder" via camelCase split
assert(ndb_filter_init(f));
assert(ndb_filter_start_field(f, NDB_FILTER_SEARCH));
assert(ndb_filter_add_str_element(f, "grinder"));
ndb_filter_end_field(f);
assert(ndb_filter_start_field(f, NDB_FILTER_KINDS));
assert(ndb_filter_add_int_element(f, 0));
ndb_filter_end_field(f);
ndb_filter_end(f);

assert(ndb_begin_query(ndb, &txn));
assert(ndb_query(&txn, f, 1, &result, 1, &count));
ndb_end_query(&txn);

assert(count == 1);



⚠️ Potential issue | 🟠 Major

Replace fixed sleeps with deterministic readiness checks.

Line 2806 and Line 2864 use sleep(1) for async ingestion, which can be flaky and adds avoidable test latency. Poll query results with a timeout instead.

⏱️ Suggested deterministic wait pattern
-// wait for ingestion
-sleep(1);
+// wait for ingestion deterministically (up to ~2s)
+count = 0;
+for (int attempts = 0; attempts < 200; attempts++) {
+	assert(ndb_begin_query(ndb, &txn));
+	assert(ndb_query(&txn, f, 1, &result, 1, &count));
+	ndb_end_query(&txn);
+	if (count == 1) break;
+	usleep(10000);
+}
+assert(count == 1);

Apply the same approach in test_profile_search_index_dedup (expecting count == 2) instead of sleep(1).

Also applies to: 2864-2884

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test.c` around lines 2805 - 2823, Replace the fixed sleep(1) used before running the queries with a deterministic polling loop that repeatedly runs the same query (using ndb_begin_query/ndb_query/ndb_end_query and the filter built with ndb_filter_init/ndb_filter_start_field/ndb_filter_add_*/ndb_filter_end) until the expected count is observed or a timeout elapses; implement this pattern in both places (the current test expecting count == 1 and test_profile_search_index_dedup expecting count == 2) so the test queries are retried with a short sleep interval up to a max timeout and then fail if the expected count isn’t reached. 

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
test.c (2)

2805-2806: ⚠️ Potential issue | 🟠 Major

Replace fixed sleep(1) waits with deterministic polling.

These waits are timing-sensitive and can still flake under slow CI or add avoidable delay under fast CI. Poll until expected count (or timeout) instead.

Deterministic wait pattern
-// wait for ingestion
-sleep(1);
+// wait for ingestion deterministically
+count = 0;
+for (int attempts = 0; attempts < 200; attempts++) {
+	assert(ndb_begin_query(ndb, &txn));
+	assert(ndb_query(&txn, f, 1, &result, 1, &count));
+	ndb_end_query(&txn);
+	if (count == 1) break;
+	usleep(10000);
+}
+assert(count == 1);

-sleep(1);
+count = 0;
+for (int attempts = 0; attempts < 200; attempts++) {
+	assert(ndb_begin_query(ndb, &txn));
+	assert(ndb_query(&txn, f, 1, results, 2, &count));
+	ndb_end_query(&txn);
+	if (count == 2) break;
+	usleep(10000);
+}
+assert(count == 2);

Also applies to: 2864-2864

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test.c` around lines 2805 - 2806, Replace the fixed sleep(1) calls with a deterministic polling loop: remove sleep(1) (and the duplicate at the other occurrence) and instead poll the actual ingestion condition (e.g., a function or variable that returns the current ingested count or completion state) in a tight loop with a short sleep (e.g., 50–200ms), breaking when the expected count/state is reached or when a hard timeout elapses; use a monotonic timer to measure elapsed time and return/log an error on timeout. Identify the waits by the existing sleep(1) occurrences and update the test to call your check function (e.g., check_ingestion_complete() or read ingested_count) inside the polling loop so the test proceeds deterministically as soon as ingestion finishes. 

1127-1129: ⚠️ Potential issue | 🟡 Minor

Use exact identity checks instead of substring matches in regression assertions.

These checks can pass on unintended profiles when fixtures include similar names. Assert the expected profile identity directly.

Suggested assertion tightening
-if ((name && strstr(name, "Jin")) ||
-    (display_name && strstr(display_name, "Jin"))) {
+if ((name && !strcmp(name, "SeleneJin")) ||
+    (display_name && !strcmp(display_name, "Selene Jin"))) {
 	found = 1;
 	break;
 }

-if (name && strstr(name, "Selene")) {
+if (name && !strcmp(name, "SeleneJin")) {
 	found = 1;
 	break;
 }

Also applies to: 1160-1161

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test.c` around lines 1127 - 1129, The current regression assertion uses substring matching via strstr on variables name and display_name to set found, which can produce false positives; change these checks to exact identity comparisons (e.g., strcmp or direct id equality) against the expected full name or the profile's unique identifier instead of substrings, updating the conditional that sets found (the block referencing name, display_name, and found) and do the same for the other identical check later in the file that uses strstr; ensure you compare the complete expected string or the profile id to guarantee only the intended fixture matches. 

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9466a5ed-878e-4b00-aa3e-b299d01adb77

📥 Commits

Reviewing files that changed from the base of the PR and between 4099f85 and 18ed7ba.

📒 Files selected for processing (2)
  • src/nostrdb.c
  • test.c
Comment on lines +5037 to +5039
if (*num_found < 64)
memcpy(found_pubkeys[(*num_found)++],
note->pubkey, 32);


⚠️ Potential issue | 🟠 Major

Fixed-size dedup cache can reintroduce duplicate profile hits.

found_pubkeys is capped at 64 entries. After Line 5071’s cache fills, later pubkeys are no longer tracked, so duplicates can reappear in both index and fallback phases and consume result slots.

💡 Suggested fix (make dedup capacity dynamic)
 static int ndb_profile_search_contains_fallback(
 	struct ndb_txn *txn,
 	const char *search,
 	unsigned char *filter_pubkey,
 	struct ndb_filter *note_filter,
 	struct ndb_query_state *results,
-	unsigned char found_pubkeys[][32],
-	int *num_found)
+	unsigned char (*found_pubkeys)[32],
+	int *num_found,
+	size_t found_cap)
@@
-	if (*num_found < 64)
+	if ((size_t)*num_found < found_cap)
 		memcpy(found_pubkeys[(*num_found)++], note->pubkey, 32);
@@
-// track pubkeys found by index phase for dedup
-unsigned char found_pubkeys[64][32];
+// track pubkeys found by index phase for dedup
+size_t found_cap;
+unsigned char (*found_pubkeys)[32] = NULL;
 	int num_found = 0;
@@
+	found_cap = (results->limit != 0 && results->limit < 4096)
+		? (size_t)results->limit
+		: 4096;
+	found_pubkeys = calloc(found_cap, sizeof(*found_pubkeys));
+	if (!found_pubkeys)
+		goto fail;
@@
-	if (num_found < 64)
+	if ((size_t)num_found < found_cap)
 		memcpy(found_pubkeys[num_found++], profile_search.key->id, 32);
@@
 	if (!ndb_profile_search_contains_fallback(txn, search,
 		filter_pubkey, f, results, found_pubkeys,
-		&num_found))
+		&num_found,
+		found_cap))
 		goto fail;
 	}
+	free(found_pubkeys);
 	ndb_filter_destroy(f);
 	return 1;
 fail:
+	free(found_pubkeys);
 	ndb_filter_destroy(f);
 	return 0;

Also applies to: 5070-5072, 5139-5141

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nostrdb.c` around lines 5037 - 5039, The fixed-size dedup array (found_pubkeys) and the hard check "*num_found < 64" allow the cache to stop tracking pubkeys and reintroduce duplicates; replace the fixed cap with a dynamic buffer by adding a capacity variable (e.g., found_pubkeys_cap), allocate found_pubkeys initially, and whenever num_found == found_pubkeys_cap grow the buffer (realloc, doubling capacity) before memcpy; update all places that currently use the "*num_found < 64" pattern (the memcpy into found_pubkeys and related checks) so they call the grow logic instead of silently skipping inserts. 