
CUMULUS-4473: Bulk Operations support Granule Inventory Report and s3GranuleIdInputFile#4215

Merged
jennyhliu merged 26 commits into master from jl/CUMULUS-4473
Jan 27, 2026
Conversation

@jennyhliu
Contributor

@jennyhliu jennyhliu commented Jan 16, 2026

Summary of changes

Addresses CUMULUS-4473: Bulk Operations API: Support Granule Inventory Report and User Granule List in S3

Changes

  • Updated Granules Bulk Operations API endpoints to accept a list of granuleIds instead of
    granule objects in the payload.
  • Updated /executions/search-by-granules and /executions/workflows-by-granules endpoints
    to accept granuleIds instead of granule objects in the payload.
  • Updated Granules Bulk Operations API endpoints to:
    • Support granuleInventoryReportName and s3GranuleIdInputFile in the payload.
    • Return consistent output formats across endpoints (previously, some endpoints aggregated errors
      while others returned per-granule errors)
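As a sketch of the resulting payload, any one of three input sources can be supplied (the field names granules, granuleInventoryReportName, and s3GranuleIdInputFile come from this PR; the TypeScript type and the sample values are illustrative assumptions, not the actual Cumulus schema):

```typescript
// Illustrative shape of the updated Bulk Operations payload (assumption:
// field names mirror the PR; the type itself is hypothetical).
interface BulkOperationPayload {
  workflowName?: string;
  granules?: string[];                 // now a list of granule IDs, not granule objects
  granuleInventoryReportName?: string; // resolve granule IDs from an inventory report
  s3GranuleIdInputFile?: string;       // direct S3 URI to a granule-ID input file
}

// Example request body using the plain granule-ID list:
const payload: BulkOperationPayload = {
  workflowName: 'HelloWorldWorkflow',
  granules: ['granule-001', 'granule-002'],
};
```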

Related PRs:

nasa/cumulus-dashboard#1273
nasa/cumulus-api#387
nasa/cumulus-api#388

PR Checklist

  • Update CHANGELOG
  • Unit tests
  • Ad-hoc testing - Deploy changes and test manually
  • Integration tests

📝 Note:
For most pull requests, please Squash and merge to maintain a clean and readable commit history.

@jennyhliu jennyhliu changed the title CUMULUS-4473: Bulk Operations API: Support Granule Inventory Report and s3GranuleIdInputFile CUMULUS-4473: Bulk Operations support Granule Inventory Report and s3GranuleIdInputFile Jan 16, 2026
@jennyhliu jennyhliu marked this pull request as ready for review January 20, 2026 00:14
});
});

describe('Creates \'Granule Inventory\' reports.', () => {
Contributor Author

Moved the ‘Granule Inventory reports’ tests after the ORCA report tests, and added a bulk delete test using a granule inventory report.

@chris-durbin chris-durbin left a comment

Are there any existing clients outside of the dashboard that will likely break by the change in using granule ids instead of fully populated granule objects?

Do we need a corresponding change to the dashboard code to merge in at the same time as this PR?

@jennyhliu
Contributor Author

Are there any existing clients outside of the dashboard that will likely break by the change in using granule ids instead of fully populated granule objects?

Do we need a corresponding change to the dashboard code to merge in at the same time as this PR?

DAACs probably have scripts that use the API endpoints.
Dashboard PR nasa/cumulus-dashboard#1273

@jennyhliu jennyhliu requested a review from Jkovarik January 22, 2026 16:33
Contributor

@charleshuang80 charleshuang80 left a comment

I'm not done reviewing yet. I have gone through some of the tests, but covering all of them will take more time. I wanted to share my comments on the non-test code changes now, since I know this has taken a while.

}
const numOfGranules = (payload.query && payload.query.size)
|| (payload.granules && payload.granules.length);
const description = `Bulk run ${payload.workflowName} on ${numOfGranules || ''} granules`;
Contributor

At first I thought this was odd or missing a condition because of the `''` granules fallback, but looking at what it replaces, it is consistent.
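The fallback behavior discussed here can be reproduced in isolation (the payload shape below is an assumption based on the diff context):

```typescript
// Self-contained sketch of the description fallback from the diff above;
// the Payload type is an assumption based on the surrounding code.
type Payload = {
  workflowName: string;
  query?: { size?: number };
  granules?: string[];
};

function bulkRunDescription(payload: Payload): string {
  // Prefer the query size, then the granule-list length.
  const numOfGranules = (payload.query && payload.query.size)
    || (payload.granules && payload.granules.length);
  // When neither source yields a count, the count is simply omitted,
  // leaving a doubled space in the description.
  return `Bulk run ${payload.workflowName} on ${numOfGranules || ''} granules`;
}
```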

}
export interface PostgresGranule extends PostgresGranuleUniqueColumns {
archived: boolean,
collection_cumulus_id: number,
Contributor

Just making sure I understand: this is no longer a unique column for a granule because of the duplicate-granule changes?

Contributor Author

Previously, granule_id + collection_cumulus_id together formed the unique key used to retrieve a granule. Now granule_id alone is the unique key, and the type needs to match.
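To make the key change concrete, here is a minimal in-memory sketch (the interface name mirrors the diff; the Map-based lookup is purely illustrative and stands in for the real PostgreSQL access):

```typescript
// Sketch: granule_id alone is now the unique key (previously the pair
// granule_id + collection_cumulus_id was required). The Map lookup below
// is a hypothetical stand-in for the database query.
interface PostgresGranuleUniqueColumns {
  granule_id: string; // now the sole unique key
}

const byGranuleId = new Map<string, { granule_id: string; archived: boolean }>();
byGranuleId.set('g-001', { granule_id: 'g-001', archived: false });

function getGranule(key: PostgresGranuleUniqueColumns) {
  // No collection_cumulus_id needed any more.
  return byGranuleId.get(key.granule_id);
}
```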

@@ -132,71 +113,31 @@ export const getUniqueGranuleByGranuleId = async (
knexOrTransaction: Knex | Knex.Transaction,
Contributor

Seems like the function description can be updated to reflect the changes. Also, if we are no longer worried about getting a unique granule (I assume because the duplicate-granules work guarantees uniqueness), we might be able to shorten or rename the function. If renaming feels out of scope for this ticket, maybe we can write a ticket for it (though I'm not sure it will ever get prioritized).

Contributor Author

I went ahead and deleted this function; we can call granulePgModel.get() directly to get the granule. 7443b70

const queryGranules = granules || [];
async function* getGranulesFromS3InBatches({
s3Uri,
batchSize = 100,
Contributor

How did you arrive at 100? Wondering if we might want to make this configurable, either for testing purposes to figure out an optimal number, or in case it is a good idea for users to be able to change (though that last thought seems maybe a little complicated with concurrency and such).

Contributor Author

batchSize is configurable via the payload. There is no clear optimal value; there is little difference between 100 and 500, as long as sufficient memory is available to process a batch of granules.
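The batching behavior under discussion can be sketched in isolation (the line source is abstracted away from S3 so the logic is self-contained; the real code streams the object and reads it via readline, and the function and parameter names below are assumptions based on the diff):

```typescript
// Minimal sketch of yielding granule IDs in batches, in the spirit of
// getGranulesFromS3InBatches. The line source is any (async) iterable of
// strings; in the real code it comes from an S3 object stream.
async function* batchLines(
  lines: AsyncIterable<string> | Iterable<string>,
  batchSize = 100 // default from the diff; configurable via the payload
): AsyncGenerator<string[]> {
  let batch: string[] = [];
  for await (const line of lines) {
    const granuleId = line.trim();
    if (granuleId.length === 0) continue; // skip blank lines
    batch.push(granuleId);
    if (batch.length >= batchSize) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}
```

Keeping the batching separate from the S3 read also makes the memory profile easy to reason about: only one batch of IDs is held at a time.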

}
const response = await s3Utils.getObject(awsClients.s3(), parsed);

const rl = readline.createInterface({
Contributor

@charleshuang80 charleshuang80 Jan 23, 2026

This seems like a good approach. One thought for now: do we need to worry about a timeout if the file is large? I assume this will run fairly fast, but it runs in a Lambda and has to do and wait for other things, so that may be worth testing. Alternatively, if it makes sense to merge this as-is and open another ticket to test the limits and adjust based on the results, I could be on board with that.

Contributor

@charleshuang80 charleshuang80 left a comment

Finished looking at the tests. One missing integration test case, but otherwise some minor comments and changes on the tests.

.set('Authorization', `Bearer ${jwtAuthToken}`)
.send(body)
.expect(400, /One of granules or query is required/);
.expect(400,
Contributor

the test text should be modified to reflect the expectation change

Contributor Author

Updated all related test text 7696690
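The updated 400 responses exercised in these tests could come from validation along these lines. Only the message "granules is empty and no alternative input source was provided" appears in the diff; the payload type, the function name, and the wording of the other message are hypothetical:

```typescript
// Hypothetical sketch of the payload validation implied by the updated
// test expectations; returns an error message, or undefined if valid.
type BulkPayload = {
  granules?: string[];
  query?: unknown;
  granuleInventoryReportName?: string;
  s3GranuleIdInputFile?: string;
};

function validateBulkPayload(payload: BulkPayload): string | undefined {
  const hasAlternativeSource = Boolean(
    payload.query || payload.granuleInventoryReportName || payload.s3GranuleIdInputFile
  );
  if (payload.granules === undefined && !hasAlternativeSource) {
    // Hypothetical wording for the "nothing provided" case (the actual
    // message is truncated in the diff above).
    return 'One of granules, query, granuleInventoryReportName or s3GranuleIdInputFile is required';
  }
  if (payload.granules !== undefined && payload.granules.length === 0 && !hasAlternativeSource) {
    return 'granules is empty and no alternative input source was provided'; // from the diff
  }
  return undefined; // payload has a usable input source
}
```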

.set('Authorization', `Bearer ${jwtAuthToken}`)
.send(body)
.expect(400, /no values provided for granules/);
.expect(400, /granules is empty and no alternative input source was provided/);
Contributor

the test text should be modified to reflect the expectation change

.set('Authorization', `Bearer ${jwtAuthToken}`)
.send(body)
.expect(400, /One of granules or query is required/);
.expect(400,
Contributor

test text should be modified to reflect the expectation change

.set('Authorization', `Bearer ${jwtAuthToken}`)
.send(body)
.expect(400, /no values provided for granules/);
.expect(400, /granules is empty and no alternative input source was provided/);
Contributor

test text should be modified to reflect the expectation change

.set('Authorization', `Bearer ${jwtAuthToken}`)
.send(body)
.expect(400, /One of granules or query is required/);
.expect(400,
Contributor

test text should be modified to reflect the expectation change

.set('Authorization', `Bearer ${jwtAuthToken}`)
.send(body)
.expect(400, /no values provided for granules/);
.expect(400, /granules is empty and no alternative input source was provided/);
Contributor

test text should be modified to reflect the expectation change

.set('Authorization', `Bearer ${jwtAuthToken}`)
.send(body)
.expect(400, /One of granules or query is required/);
.expect(400,
Contributor

test text should be modified to reflect the expectation change

});

expect(asyncOperation.status).toEqual('TASK_FAILED');
expect(asyncOperation.status).toEqual('SUCCEEDED');
Contributor

I'm confused by this. If you didn't substantively change anything in the test, why does it succeed now when it was failing before? Also, the comments say a published granule will fail to delete unless the override for the bulk delete request is specified; I think that override isn't being set, which is why the task was supposed to fail before.

Contributor Author

Instead of failing the whole bulk delete operation, I updated it to save the error for each granule in the result; this change is consistent with the other bulk operations.
https://github.com/nasa/cumulus/pull/4215/files#diff-439a2112d88fd4de009b6e81b89124e656ada8001e5e7a0e0cadfdbe901166e2R193-R194
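The change described here, recording per-granule errors instead of failing the whole async operation, can be sketched as follows (bulkDelete and deleteGranule are illustrative stand-ins, not the actual Cumulus functions):

```typescript
// Sketch: a failing granule no longer aborts the bulk operation; its error
// is recorded in the per-granule results and processing continues.
type GranuleResult = { granuleId: string; error?: string };

async function bulkDelete(
  granuleIds: string[],
  deleteGranule: (id: string) => Promise<void> // stand-in for the real delete
): Promise<GranuleResult[]> {
  const results: GranuleResult[] = [];
  for (const granuleId of granuleIds) {
    try {
      await deleteGranule(granuleId);
      results.push({ granuleId });
    } catch (error) {
      // Record the failure instead of failing the async operation.
      results.push({ granuleId, error: (error as Error).message });
    }
  }
  return results;
}
```

Under this shape the async operation as a whole reports SUCCEEDED, and callers inspect the results for granules that could not be deleted, which matches the test expectation change above.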

Contributor

Ah, I see now. Thanks for explaining.

} = payload;

// Direct S3 reference
if (s3GranuleIdInputFile) {
Contributor

So you added an integration test that uses the reconciliation Granule Inventory Report to run bulk delete. I think we are missing an integration test that uses an S3 input file as used here.

Contributor Author

The bulk operation with granuleInventoryReportName has an extra step to retrieve the S3 file URI. I believe we covered the s3GranuleIdInputFile option in unit tests, and an additional integration test is not necessary.
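The two input-source paths discussed in this thread can be sketched like this (resolveGranuleIdFileUri and lookupReportUri are hypothetical names; the precedence, direct S3 URI first and report lookup second, follows the discussion):

```typescript
// Sketch of resolving the granule-ID file location: a direct S3 URI is used
// as-is, while granuleInventoryReportName requires one extra lookup to find
// the report's S3 file URI. lookupReportUri is a hypothetical stand-in for
// that extra step.
type InputSource = {
  s3GranuleIdInputFile?: string;
  granuleInventoryReportName?: string;
};

async function resolveGranuleIdFileUri(
  payload: InputSource,
  lookupReportUri: (reportName: string) => Promise<string>
): Promise<string | undefined> {
  if (payload.s3GranuleIdInputFile) {
    return payload.s3GranuleIdInputFile; // direct S3 reference
  }
  if (payload.granuleInventoryReportName) {
    // Extra step: resolve the S3 URI recorded for the report.
    return lookupReportUri(payload.granuleInventoryReportName);
  }
  return undefined;
}
```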

@jennyhliu jennyhliu merged commit 88c3bf4 into master Jan 27, 2026
9 checks passed
@jennyhliu jennyhliu deleted the jl/CUMULUS-4473 branch January 27, 2026 03:33
