fix(proxy): prevent file descriptor leakage #384
Conversation
📝 Walkthrough

Added client-scoped DB helpers for connection reuse in integration tests and updated tests to use TLS-backed clients.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Handler
    participant Frontend as Frontend Channel
    participant Backend as Backend Channel
    participant ChannelWriter
    participant Writer as IO Writer

    rect rgba(100, 150, 200, 0.5)
        Note over Handler,ChannelWriter: Previous (fire-and-forget)
        Handler->>+ChannelWriter: spawn receive()
        Handler->>Frontend: run client_to_server
        Handler->>Backend: run server_to_client
        Handler->>Handler: try_join! completes
        Handler-->>Client: return
        Note over ChannelWriter: May still be running<br/>No guaranteed orderly shutdown
    end

    rect rgba(150, 200, 100, 0.5)
        Note over Handler,ChannelWriter: New (explicit await & cleanup)
        Handler->>+ChannelWriter: channel_writer_task = spawn receive()
        ChannelWriter->>ChannelWriter: drop sender handle
        Handler->>Frontend: run client_to_server
        Handler->>Backend: run server_to_client
        Handler->>Handler: try_join! completes
        Handler->>Frontend: drop (signal closure)
        Handler->>Backend: drop (signal closure)
        Frontend-->>ChannelWriter: channel closes
        ChannelWriter->>Writer: flush()
        ChannelWriter->>Writer: shutdown()
        ChannelWriter-->>Handler: task completes
        Handler->>Client: return with result
    end
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🧹 Nitpick comments (3)
packages/cipherstash-proxy/src/postgresql/handler.rs (1)
272-278: Consider semantics: a task panic is logged but doesn't affect the return value.

If `channel_writer_task` panics, the error is logged but `handler()` will still return `Ok(())` if the protocol handling succeeded. This is likely intentional (the protocol result takes precedence over cleanup failures), but worth confirming this is the desired behavior. If a panic should propagate as an error, you could change this to:
📝 Optional: Propagate panic as error
```diff
-        if let Err(err) = channel_writer_task.await {
-            error!(
-                client_id,
-                msg = "Channel writer task panicked",
-                error = ?err
-            );
-        }
+        channel_writer_task.await.map_err(|err| {
+            error!(
+                client_id,
+                msg = "Channel writer task panicked",
+                error = ?err
+            );
+            Error::from(ProtocolError::InternalError)
+        })?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/cipherstash-proxy/src/postgresql/handler.rs` around lines 272-278: the current code logs a panic from `channel_writer_task` but still lets `handler()` return `Ok(())`, which may hide cleanup failures. Decide whether panics in `channel_writer_task` should fail the handler: if you want to propagate the panic, change the `channel_writer_task.await` handling in `handler()` to convert the `JoinError` into a handler `Error` and return `Err(...)` (e.g. map the `Err(join_error)` to an appropriate `Error` variant and return early); otherwise make the intent explicit by adding a comment near `channel_writer_task` and the `error!` call indicating that cleanup panics are intentionally non-fatal.

packages/cipherstash-proxy-integration/src/common.rs (1)
3-37: Document the Docker prerequisite alongside the connection-reuse guidance.

Nice addition. Consider adding one short note that these integration helpers assume Docker-backed test services, so local host services aren’t accidentally used in ad-hoc runs.
As per coding guidelines:
Integration tests should use Docker containers for reproducible environments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/cipherstash-proxy-integration/src/common.rs` around lines 3-37: add a short note to the module doc comment alongside the connection-reuse guidance stating that the integration helpers assume Docker-backed test services (so local host services shouldn’t be used for ad-hoc runs); place this sentence near the top where the pattern examples live (near the connect_with_tls, clear_with_client and *_with_client discussion) and reference the project guideline “Integration tests should use Docker containers for reproducible environments.” to make the prerequisite explicit.

packages/cipherstash-proxy-integration/src/select/jsonb_path_query.rs (1)
34-125: Optional: extract repeated test setup into one helper.

Each test repeats connect/clear/insert; pulling that into a tiny setup helper would reduce duplication.

Refactor sketch
```diff
+async fn setup_jsonb(client: &Client) {
+    clear_with_client(client).await;
+    insert_jsonb_with_client(client).await;
+}
+
 #[tokio::test]
 async fn select_jsonb_path_query_number() {
     trace();
     let client = connect_with_tls(PROXY).await;
-
-    clear_with_client(&client).await;
-    insert_jsonb_with_client(&client).await;
+    setup_jsonb(&client).await;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/cipherstash-proxy-integration/src/select/jsonb_path_query.rs` around lines 34-125: tests duplicate the same three setup steps (connect_with_tls(PROXY), clear_with_client, insert_jsonb_with_client) across the select_jsonb_* tests; extract them into a single async helper (e.g., setup_jsonb_test or init_client_with_jsonb) that performs connect_with_tls(PROXY).await, clear_with_client(&client).await, and insert_jsonb_with_client(&client).await and returns the client, then update each test (select_jsonb_path_query_number, select_jsonb_path_query_string, select_jsonb_path_query_value, select_jsonb_path_query_with_unknown, select_jsonb_path_query_with_alias) to call that helper at the start and use the returned client variable instead of repeating the three calls.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: becdf982-ad36-436a-9796-08d4f3cdf867
📒 Files selected for processing (5)
- packages/cipherstash-proxy-integration/src/common.rs
- packages/cipherstash-proxy-integration/src/select/jsonb_array_elements.rs
- packages/cipherstash-proxy-integration/src/select/jsonb_path_query.rs
- packages/cipherstash-proxy/src/connect/channel_writer.rs
- packages/cipherstash-proxy/src/postgresql/handler.rs
The Proxy was not closing connections correctly, which led to a file descriptor leak that manifested as an inability to connect to the Postgres database during a test run. This issue would have been present in production but went unnoticed because:

1. In customer infrastructure, connections to the Proxy would likely be pooled.
2. Compared to customers using the JS SDK, there are not many using the Proxy in production.

Root Cause
==========

The ChannelWriter struct held a sender field for its own channel, creating a circular dependency. The receiver.recv() loop waits for ALL senders to be dropped before returning None, but one of those senders was owned by the ChannelWriter itself. This meant the channel could never close because it was waiting for itself to drop!

The Solution
============

Two changes were needed:

channel_writer.rs:49 - Drop the ChannelWriter's own sender at the start of the receive() method:

```rust
drop(self.sender); // Break the circular dependency
```

handler.rs:157-158 - Explicitly drop frontend/backend after try_join (good practice, though they'll drop when handler returns anyway):

```rust
drop(frontend);
drop(backend);
```

Additionally, the integration tests were creating a new database connection for every operation (clear, insert, query), causing file descriptor exhaustion when running test suites. Each test created 4+ connections, which accumulated faster than the proxy's 60-second timeout could clean them up.

Changes:

1. Added `*_with_client()` variants of all helper functions in common.rs:
   - clear_with_client()
   - insert_with_client(), insert_jsonb_with_client()
   - query_by_with_client(), query_by_params_with_client()
   - Existing simple_query_with_client() now used consistently
2. Updated test files to reuse a single connection per test:
   - select/jsonb_array_elements.rs - reduced from 12 to 3 connections
   - select/jsonb_path_query.rs - reduced from 20 to 5 connections
3. Documented the connection reuse pattern in common.rs with examples
Force-pushed from bc2ac19 to fdc3bd6
Acknowledgment
By submitting this pull request, I confirm that CipherStash can use, modify, copy, and redistribute this contribution, under the terms of CipherStash's choice.
Summary by CodeRabbit

- Bug Fixes
- Tests
- Chores