refactor: GTFS manager shutdown process#768
refactor: GTFS manager shutdown process#768Jason10293 wants to merge 1 commit intoOneBusAway:mainfrom
Conversation
aaronbrethorst left a comment
There was a problem hiding this comment.
Jason — the intent here is good: making Shutdown context-aware and adding isolated test helpers are both valuable improvements. The test infrastructure work (:memory: DB for mutating tests, WAL/SHM cleanup, migrating tests to isolated helpers) is well thought out and the right approach for eliminating test state leakage. However, the core Shutdown implementation has structural issues that need to be addressed before merging.
Critical Issues (1 found)
-
[silent-failure-hunter] DB is closed while background goroutines may still be running. When
ctxtimes out,Shutdownreturns the timeout error, and thedefer closeDBOnce.Do(...)fires — closing the DB. But the goroutines tracked bywghaven't finished yet (that's what the timeout means). Those goroutines (e.g.,pollFeed→updateFeedRealtime→GtfsDB.Queries.GetTrip()) may be actively executing database queries. Closing the DB underneath them causes use-after-close errors or panics. [gtfs_manager.go:389-396]Fix: Move the DB close inside the
wg.Wait()goroutine so it only runs after all goroutines have exited:go func() { manager.wg.Wait() manager.closeDBOnce.Do(func() { /* close DB */ }) close(done) }()
On timeout, Shutdown returns the error but the DB stays open for the still-running goroutines. The wait goroutine handles cleanup asynchronously — acceptable since the process is exiting.
Important Issues (2 found)
-
[code-reviewer] Each
Shutdowncall spawns a newgo func() { manager.wg.Wait(); close(done) }()goroutine. If called multiple times (whichTestManagerShutdownIdempotentexercises), each call leaks a goroutine blocked onwg.Wait(). Create the wait goroutine once (insideshutdownOnce.Door a separatesync.Once) and store thedonechannel on the struct so subsequent calls reuse it. [gtfs_manager.go:398-402] -
[code-reviewer] Dead
donechannel inTestManagerShutdownandTestManagerShutdownWithRealtime. Thedonechannel is created but never closed or written to — thecase <-done:branch in theselectcan never fire. The test passes becauseerrChreceives a value first. Remove the unuseddonechannel and itsselectbranch. [shutdown_test.go:36-50, 82-97]
Suggestions (1 found)
- [silent-failure-hunter] The DB close error is logged inside
closeDBOnce.Dobut never returned to the caller —Shutdownreturnsnilon the happy path even ifGtfsDB.Close()failed. Consider capturing the error in a variable outside the closure so it can be returned. [gtfs_manager.go:389-396]
Strengths
- The isolated test helper pattern (
createIsolatedTestApiWithClockwith:memory:DB) is exactly right for tests that mutate state — eliminates cross-test pollution without slowing down read-only tests removeTestDBArtifactscleaning up WAL/SHM sidecar files prevents stale state across interrupted runs- Correct identification of which tests need isolation (problem reports, service date regression, shape handler, etc.)
- All existing tests pass, lint clean
- Good use of
sync.Oncefor idempotent shutdown — just needs to be applied more consistently
Recommended Action
- Move DB close into the
wg.Wait()goroutine (not thedefer) - Create the wait goroutine only once, store
donechannel on the struct - Remove dead
donechannel from shutdown tests - Re-run
make testandmake lint
Summary
This branch makes two main changes to address issue #750 :
Runtime changes
GtfsManager.Shutdownnow takes acontext.Context, waits for background goroutines to finish, and returns an error if shutdown times out.cmd/api/app.gonow passesshutdownCtxinto GTFS manager shutdown and logs any shutdown error.manager.Shutdown()tomanager.Shutdown(context.Background())or similar.Test changes
A new isolated test helper was added for REST tests that modify GTFS or application state:
createIsolatedTestApiWithClockcreateIsolatedTestApiThese helpers create a fresh in-memory GTFS manager per test instead of reusing the package-wide shared database.
The test cleanup also now removes SQLite sidecar files:
raba-test.dbraba-test.db-walraba-test.db-shmWhy the isolated tests helpers were needed
Before this change, many REST tests used a shared GTFS manager and a shared SQLite test database for the entire package.
That caused state leakage between tests because several of these tests write into the database by creating or modifying:
Because the database was shared, test results depended on execution order. A test expecting a clean database could fail if an earlier test had inserted rows into the same tables.
Typical failure patterns were:
AgencyA,AgencyB,TripB, orservice1-waland-shmfiles preserving stale state across interrupted runs