CUMULUS-4473: Bulk Operations support Granule Inventory Report and s3GranuleIdInputFile by jennyhliu · Pull Request #4215 · nasa/cumulus

jennyhliu · 2026-01-16T20:49:08Z

Summary of changes

Addresses CUMULUS-4473: Bulk Operations API: Support Granule Inventory Report and User Granule List in S3

Changes

- Updated the Granules Bulk Operations API endpoints to accept a list of granuleIds instead of
  granule objects in the payload.
- Updated the /executions/search-by-granules and /executions/workflows-by-granules endpoints
  to accept granuleIds instead of granule objects in the payload.
- Updated the Granules Bulk Operations API endpoints to:
  - Support granuleInventoryReportName and s3GranuleIdInputFile in the payload.
  - Return consistent output formats across endpoints (previously, some endpoints aggregated errors
    while others returned per-granule errors).
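
As a rough illustration of the three ways a request body might now supply granules, here is a sketch of hypothetical payloads. The field names granules, granuleInventoryReportName, s3GranuleIdInputFile and workflowName appear in this PR; every value below (workflow name, granule IDs, report name, S3 URI) is invented, and the sketch assumes the S3 input is given as an s3:// URI.

```javascript
// Hypothetical request bodies for a bulk granule operation.
// All values are invented for illustration only.

// 1. Explicit list of granule IDs (previously a list of granule objects).
const byGranuleIds = {
  workflowName: 'ExampleWorkflow',
  granules: ['granule-001', 'granule-002'],
};

// 2. Granule IDs taken from an existing Granule Inventory report.
const byInventoryReport = {
  workflowName: 'ExampleWorkflow',
  granuleInventoryReportName: 'example-granule-inventory-report',
};

// 3. Granule IDs read from a user-supplied file in S3 (URI format is an assumption).
const byS3File = {
  workflowName: 'ExampleWorkflow',
  s3GranuleIdInputFile: 's3://example-bucket/path/to/granule-ids.txt',
};
```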

Related PRs:

nasa/cumulus-dashboard#1273
nasa/cumulus-api#387
nasa/cumulus-api#388

PR Checklist

Update CHANGELOG
Unit tests
Ad-hoc testing - Deploy changes and test manually
Integration tests


jennyhliu · 2026-01-20T00:26:42Z

example/spec/parallel/createReconciliationReport/CreateReconciliationReportSpec.js

 });
 });

- describe('Creates \'Granule Inventory\' reports.', () => {


Moved the ‘Granule Inventory reports’ tests after the ORCA report tests, and added a bulk delete test using a granule inventory report.

chris-durbin

Are there any existing clients outside of the dashboard that are likely to break now that the payload takes granule IDs instead of fully populated granule objects?

Does a corresponding dashboard change need to be merged at the same time as this PR?

jennyhliu · 2026-01-22T14:31:55Z

DAACs probably have scripts that call these API endpoints.
The corresponding dashboard change is in nasa/cumulus-dashboard#1273.
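
For client scripts that still build the old payload, a small adapter along these lines might ease the migration. This is only a sketch: toGranuleIdPayload is a hypothetical helper, not part of Cumulus, and it assumes each old entry is an object carrying a granuleId property.

```javascript
// Hypothetical helper: convert a pre-change payload (granule objects)
// into the new shape (a list of granule ID strings).
// Assumes each old entry is an object with a `granuleId` property.
function toGranuleIdPayload(oldPayload) {
  if (!Array.isArray(oldPayload.granules)) {
    // Nothing to convert, e.g. a query-based payload.
    return { ...oldPayload };
  }
  return {
    ...oldPayload,
    // Entries that are already strings are passed through unchanged.
    granules: oldPayload.granules.map((g) => (typeof g === 'string' ? g : g.granuleId)),
  };
}
```

Because string entries pass through unchanged, the helper can be applied to payloads that have already been migrated.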

charleshuang80

I am not done reviewing. I have gone through some of the tests, but going through all of them will take more time. But I wanted at least to share some of the comments I had for the non-test code changes because I know this has taken a while.

packages/api/lambdas/bulk-operation.js

charleshuang80 · 2026-01-21T23:51:47Z

packages/api/endpoints/granules.js

- }
+ const numOfGranules = (payload.query && payload.query.size)
+ || (payload.granules && payload.granules.length);
+ const description = `Bulk run ${payload.workflowName} on ${numOfGranules || ''} granules`;


At first I thought this was odd or missing a condition, because the description can read '' granules when no count is available, but it is consistent with the code it replaces.
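
To make the edge case concrete, the sketch below wraps the expression from the diff in a hypothetical describeBulkRun function (the wrapper name and the sample payloads are not from the PR) and shows what the description looks like when neither query.size nor granules is present.

```javascript
// The expression from the diff, wrapped in a hypothetical helper for illustration.
function describeBulkRun(payload) {
  const numOfGranules = (payload.query && payload.query.size)
    || (payload.granules && payload.granules.length);
  return `Bulk run ${payload.workflowName} on ${numOfGranules || ''} granules`;
}

// Count known from the granules list.
console.log(describeBulkRun({ workflowName: 'W', granules: ['a', 'b'] }));
// prints "Bulk run W on 2 granules"

// No count available (e.g. IDs come from an S3 file): the count is left blank.
console.log(describeBulkRun({ workflowName: 'W', s3GranuleIdInputFile: 's3://example-bucket/ids.txt' }));
// prints "Bulk run W on  granules" (note the double space)
```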

charleshuang80 · 2026-01-23T20:20:43Z

packages/db/src/types/granule.ts

 }
 export interface PostgresGranule extends PostgresGranuleUniqueColumns {
 archived: boolean,
+ collection_cumulus_id: number,


Just making sure I understand: is this no longer a unique column for a granule because of the duplicate granule changes?

Previously, granule_id and collection_cumulus_id together formed the unique key used to retrieve a granule. Now granule_id alone is enough to retrieve a granule, so the type needs to match.

charleshuang80 · 2026-01-23T20:23:53Z

packages/db/src/lib/granule.ts

@@ -132,71 +113,31 @@ export const getUniqueGranuleByGranuleId = async (
 knexOrTransaction: Knex | Knex.Transaction,


Seems like the function description can be updated to reflect the changes in it. Also, if we are no longer that worried about getting a unique granule (I assume because dupe granules makes sure it will be unique), then we might be able to shorten/change the function name? But if changing the name feels out of scope for the ticket, maybe we can write a ticket to do that. (though not sure if it will ever get prioritized)

I went ahead and deleted this function; we can call granulePgModel.get() directly to get the granule. 7443b70

charleshuang80 · 2026-01-23T20:34:21Z

packages/api/lib/granules.js

- const queryGranules = granules || [];
+async function* getGranulesFromS3InBatches({
+ s3Uri,
+ batchSize = 100,


How did you arrive at 100? Wondering if we might want to make this configurable, either for testing purposes to figure out an optimal number, or in case it is a good idea for users to be able to change (though that last thought seems maybe a little complicated with concurrency and such).

batchSize is configurable via the payload. There is no clear optimal value; there is little difference between 100 and 500, as long as sufficient memory is available to process a batch of granules.

charleshuang80 · 2026-01-23T20:37:21Z

packages/api/lib/granules.js

+ }
+ const response = await s3Utils.getObject(awsClients.s3(), parsed);
+
+ const rl = readline.createInterface({


This seems like a good approach. The one thought (for now) that I have with it is, do we need to be worried about a timeout issue with this at all if the file is large? I assume this will run pretty fast, but it is running in a lambda and needs to do/wait for other things, so just wondering if that is something we need to test. Or, if it makes sense to have it go through for now, and then have another ticket to work on testing the limits and then making modifications based on that, I could be on board with that.
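
The batched, line-by-line read discussed in the last two comments can be sketched roughly as follows. This is not the PR's implementation: readLinesInBatches is a hypothetical stand-in for getGranulesFromS3InBatches, it takes any readable stream rather than an S3 URI, and an in-memory stream substitutes for the S3 object body. It assumes the input file holds one granule ID per line.

```javascript
const readline = require('node:readline');
const { Readable } = require('node:stream');

// Yield arrays of up to `batchSize` non-empty, trimmed lines from a readable stream.
// Only one batch is held in memory at a time.
async function* readLinesInBatches(stream, batchSize = 100) {
  const rl = readline.createInterface({ input: stream, crlfDelay: Infinity });
  let batch = [];
  for await (const line of rl) {
    const id = line.trim();
    if (id) {
      batch.push(id);
      if (batch.length >= batchSize) {
        yield batch;
        batch = [];
      }
    }
  }
  // Flush the final, possibly partial batch.
  if (batch.length > 0) yield batch;
}

// Usage: an in-memory stream stands in for the S3 object body (an assumption).
async function main() {
  const body = Readable.from(['g1\ng2\n', 'g3\n\ng4\ng5\n']);
  for await (const batch of readLinesInBatches(body, 2)) {
    console.log(batch); // three batches: g1+g2, g3+g4, g5
  }
}
main();
```

Memory use in this sketch scales with batchSize rather than file size, which fits the observation that 100 and 500 behave similarly. It does not address the Lambda timeout question, which depends on how long each batch takes to process.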

charleshuang80

Finished looking at the tests. One missing integration test case, but otherwise some minor comments and changes on the tests.

charleshuang80 · 2026-01-25T18:34:47Z

packages/api/tests/endpoints/granules/test-bulk-delete.js

 .set('Authorization', `Bearer ${jwtAuthToken}`)
 .send(body)
- .expect(400, /One of granules or query is required/);
+ .expect(400,


the test text should be modified to reflect the expectation change

Updated all related test text 7696690

charleshuang80 · 2026-01-25T18:35:00Z

packages/api/tests/endpoints/granules/test-bulk-delete.js

 .set('Authorization', `Bearer ${jwtAuthToken}`)
 .send(body)
- .expect(400, /no values provided for granules/);
+ .expect(400, /granules is empty and no alternative input source was provided/);


the test text should be modified to reflect the expectation change
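
For context, the check behind the new error message might look something like the sketch below. The function name checkEmptyGranules and the list of alternative sources are assumptions, inferred from the payload fields mentioned in this PR; only the error text is taken from the diff above. The case where granules is absent altogether is not covered, since its new message is not shown here.

```javascript
// Assumed set of alternative input sources; the real API may differ.
const ALTERNATIVE_SOURCES = ['query', 'granuleInventoryReportName', 's3GranuleIdInputFile'];

// Hypothetical validator: reject an empty granules list unless another source is given.
function checkEmptyGranules(payload) {
  const hasAlternative = ALTERNATIVE_SOURCES.some((key) => payload[key] !== undefined);
  const isEmptyList = Array.isArray(payload.granules) && payload.granules.length === 0;
  if (isEmptyList && !hasAlternative) {
    throw new Error('granules is empty and no alternative input source was provided');
  }
}
```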

charleshuang80 · 2026-01-25T19:54:59Z

packages/api/tests/endpoints/granules/test-bulk-granules.js

 .set('Authorization', `Bearer ${jwtAuthToken}`)
 .send(body)
- .expect(400, /One of granules or query is required/);
+ .expect(400,