Skip to content

refactor: GTFS manager shutdown process#768

Open
Jason10293 wants to merge 1 commit intoOneBusAway:mainfrom
Jason10293:protect-db-shutdown
Open

refactor: GTFS manager shutdown process#768
Jason10293 wants to merge 1 commit intoOneBusAway:mainfrom
Jason10293:protect-db-shutdown

Conversation

@Jason10293
Copy link
Contributor

Summary

This branch makes two main changes to address issue #750 :

  • It updates GTFS manager shutdown to be context-aware.
  • updates tests by isolating ones that mutate the test database.

Runtime changes

  • GtfsManager.Shutdown now takes a context.Context, waits for background goroutines to finish, and returns an error if shutdown times out.
  • cmd/api/app.go now passes shutdownCtx into GTFS manager shutdown and logs any shutdown error.
  • Call sites and tests were updated from manager.Shutdown() to manager.Shutdown(context.Background()) or similar.

Test changes

A new isolated test helper was added for REST tests that modify GTFS or application state:

  • createIsolatedTestApiWithClock
  • createIsolatedTestApi

These helpers create a fresh in-memory GTFS manager per test instead of reusing the package-wide shared database.

The test cleanup also now removes SQLite sidecar files:

  • raba-test.db
  • raba-test.db-wal
  • raba-test.db-shm

Why the isolated tests helpers were needed

Before this change, many REST tests used a shared GTFS manager and a shared SQLite test database for the entire package.

That caused state leakage between tests because several of these tests write into the database by creating or modifying:

  • agencies
  • routes
  • stops
  • calendars
  • trips
  • stop times
  • shapes
  • problem reports

Because the database was shared, test results depended on execution order. A test expecting a clean database could fail if an earlier test had inserted rows into the same tables.

Typical failure patterns were:

  • “empty list” problem-report tests seeing reports inserted by earlier tests
  • regression tests colliding on reused IDs like AgencyA, AgencyB, TripB, or service1
  • custom GTFS data created in one test affecting unrelated tests later in the package
  • leftover SQLite -wal and -shm files preserving stale state across interrupted runs
Copy link
Member

@aaronbrethorst aaronbrethorst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jason — the intent here is good: making Shutdown context-aware and adding isolated test helpers are both valuable improvements. The test infrastructure work (:memory: DB for mutating tests, WAL/SHM cleanup, migrating tests to isolated helpers) is well thought out and the right approach for eliminating test state leakage. However, the core Shutdown implementation has structural issues that need to be addressed before merging.


Critical Issues (1 found)

  1. [silent-failure-hunter] DB is closed while background goroutines may still be running. When ctx times out, Shutdown returns the timeout error, and the defer closeDBOnce.Do(...) fires — closing the DB. But the goroutines tracked by wg haven't finished yet (that's what the timeout means). Those goroutines (e.g., pollFeedupdateFeedRealtimeGtfsDB.Queries.GetTrip()) may be actively executing database queries. Closing the DB underneath them causes use-after-close errors or panics. [gtfs_manager.go:389-396]

    Fix: Move the DB close inside the wg.Wait() goroutine so it only runs after all goroutines have exited:

    go func() { manager.wg.Wait() manager.closeDBOnce.Do(func() { /* close DB */ }) close(done) }()

    On timeout, Shutdown returns the error but the DB stays open for the still-running goroutines. The wait goroutine handles cleanup asynchronously — acceptable since the process is exiting.

Important Issues (2 found)

  1. [code-reviewer] Each Shutdown call spawns a new go func() { manager.wg.Wait(); close(done) }() goroutine. If called multiple times (which TestManagerShutdownIdempotent exercises), each call leaks a goroutine blocked on wg.Wait(). Create the wait goroutine once (inside shutdownOnce.Do or a separate sync.Once) and store the done channel on the struct so subsequent calls reuse it. [gtfs_manager.go:398-402]

  2. [code-reviewer] Dead done channel in TestManagerShutdown and TestManagerShutdownWithRealtime. The done channel is created but never closed or written to — the case <-done: branch in the select can never fire. The test passes because errCh receives a value first. Remove the unused done channel and its select branch. [shutdown_test.go:36-50, 82-97]

Suggestions (1 found)

  1. [silent-failure-hunter] The DB close error is logged inside closeDBOnce.Do but never returned to the caller — Shutdown returns nil on the happy path even if GtfsDB.Close() failed. Consider capturing the error in a variable outside the closure so it can be returned. [gtfs_manager.go:389-396]

Strengths

  • The isolated test helper pattern (createIsolatedTestApiWithClock with :memory: DB) is exactly right for tests that mutate state — eliminates cross-test pollution without slowing down read-only tests
  • removeTestDBArtifacts cleaning up WAL/SHM sidecar files prevents stale state across interrupted runs
  • Correct identification of which tests need isolation (problem reports, service date regression, shape handler, etc.)
  • All existing tests pass, lint clean
  • Good use of sync.Once for idempotent shutdown — just needs to be applied more consistently

Recommended Action

  1. Move DB close into the wg.Wait() goroutine (not the defer)
  2. Create the wait goroutine only once, store done channel on the struct
  3. Remove dead done channel from shutdown tests
  4. Re-run make test and make lint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants