CUMULUS-4473: Bulk Operations support Granule Inventory Report and s3GranuleIdInputFile#4215
CUMULUS-4473: Bulk Operations support Granule Inventory Report and s3GranuleIdInputFile#4215
Conversation
…o jl/CUMULUS-4473
| }); | ||
| }); | ||
| | ||
| describe('Creates \'Granule Inventory\' reports.', () => { |
There was a problem hiding this comment.
Moved the ‘Granule Inventory reports’ tests after the ORCA report tests, and added a bulk delete test using a granule inventory report.
chris-durbin left a comment
There was a problem hiding this comment.
Are there any existing clients outside of the dashboard that will likely break by the change in using granule ids instead of fully populated granule objects?
Do we need a corresponding change to the dashboard code to merge in at the same time as this PR?
DAACs probably have scripts using api endpoints. |
charleshuang80 left a comment
There was a problem hiding this comment.
I am not done reviewing. I have gone through some of the tests, but going through all of them will take more time. But I wanted at least to share some of the comments I had for the non-test code changes because I know this has taken a while.
| } | ||
| const numOfGranules = (payload.query && payload.query.size) | ||
| || (payload.granules && payload.granules.length); | ||
| const description = `Bulk run ${payload.workflowName} on ${numOfGranules || ''} granules`; |
There was a problem hiding this comment.
At first I thought this was odd or missing a condition because of the '' granules, but looking at what it is replacing it is consistent.
| } | ||
| export interface PostgresGranule extends PostgresGranuleUniqueColumns { | ||
| archived: boolean, | ||
| collection_cumulus_id: number, |
There was a problem hiding this comment.
just making sure I understand - this is no longer a unique column for a granule because of the duplicate granule changes?
There was a problem hiding this comment.
Previously, granule_id + collection_cumulus_id combined to be unique, and is used to retrieve a granule. Now we only need granule_id as a unique key to get granule, and the type needs to match.
packages/db/src/lib/granule.ts Outdated
| @@ -132,71 +113,31 @@ export const getUniqueGranuleByGranuleId = async ( | |||
| knexOrTransaction: Knex | Knex.Transaction, | |||
There was a problem hiding this comment.
Seems like the function description can be updated to reflect the changes in it. Also, if we are no longer that worried about getting a unique granule (I assume because dupe granules makes sure it will be unique), then we might be able to shorten/change the function name? But if changing the name feels out of scope for the ticket, maybe we can write a ticket to do that. (though not sure if it will ever get prioritized)
There was a problem hiding this comment.
I go ahead delete this function —we can call granulePgModel.get() directly to get the granule. 7443b70
| const queryGranules = granules || []; | ||
| async function* getGranulesFromS3InBatches({ | ||
| s3Uri, | ||
| batchSize = 100, |
There was a problem hiding this comment.
How did you arrive at 100? Wondering if we might want to make this configurable, either for testing purposes to figure out an optimal number, or in case it is a good idea for users to be able to change (though that last thought seems maybe a little complicated with concurrency and such).
There was a problem hiding this comment.
batchSize is configurable via the payload. There is no clear optimal value; there is little difference between 100 and 500, as long as sufficient memory is available to process a batch of granules.
| } | ||
| const response = await s3Utils.getObject(awsClients.s3(), parsed); | ||
| | ||
| const rl = readline.createInterface({ |
There was a problem hiding this comment.
This seems like a good approach. The one thought (for now) that I have with it is, do we need to be worried about a timeout issue with this at all if the file is large? I assume this will run pretty fast, but it is running in a lambda and needs to do/wait for other things, so just wondering if that is something we need to test. Or, if it makes sense to have it go through for now, and then have another ticket to work on testing the limits and then making modifications based on that, I could be on board with that.
charleshuang80 left a comment
There was a problem hiding this comment.
Finished looking at the tests. One missing integration test case, but otherwise some minor comments and changes on the tests.
| .set('Authorization', `Bearer ${jwtAuthToken}`) | ||
| .send(body) | ||
| .expect(400, /One of granules or query is required/); | ||
| .expect(400, |
There was a problem hiding this comment.
the test text should be modified to reflect the expectation change
| .set('Authorization', `Bearer ${jwtAuthToken}`) | ||
| .send(body) | ||
| .expect(400, /no values provided for granules/); | ||
| .expect(400, /granules is empty and no alternative input source was provided/); |
There was a problem hiding this comment.
the test text should be modified to reflect the expectation change
| .set('Authorization', `Bearer ${jwtAuthToken}`) | ||
| .send(body) | ||
| .expect(400, /One of granules or query is required/); | ||
| .expect(400, |
There was a problem hiding this comment.
test text should be modified to reflect the expectation change
| .set('Authorization', `Bearer ${jwtAuthToken}`) | ||
| .send(body) | ||
| .expect(400, /no values provided for granules/); | ||
| .expect(400, /granules is empty and no alternative input source was provided/); |
There was a problem hiding this comment.
test text should be modified to reflect the expectation change
| .set('Authorization', `Bearer ${jwtAuthToken}`) | ||
| .send(body) | ||
| .expect(400, /One of granules or query is required/); | ||
| .expect(400, |
There was a problem hiding this comment.
test text should be modified to reflect the expectation change
| .set('Authorization', `Bearer ${jwtAuthToken}`) | ||
| .send(body) | ||
| .expect(400, /no values provided for granules/); | ||
| .expect(400, /granules is empty and no alternative input source was provided/); |
There was a problem hiding this comment.
test text should be modified to reflect the expectation change
| .set('Authorization', `Bearer ${jwtAuthToken}`) | ||
| .send(body) | ||
| .expect(400, /One of granules or query is required/); | ||
| .expect(400, |
There was a problem hiding this comment.
test text should be modified to reflect the expectation change
| }); | ||
| | ||
| expect(asyncOperation.status).toEqual('TASK_FAILED'); | ||
| expect(asyncOperation.status).toEqual('SUCCEEDED'); |
There was a problem hiding this comment.
I'm confused by this. If you didn't substantively change anything in the test, why does it succeed now when it was failing before? Also the comments say Published granule will fail to delete unless override for bulk delete request is specified, and I think that's not happening and why the task was supposed to fail before.
There was a problem hiding this comment.
Instead of failing the bulk delete operation, I updated to save the error for each granule in the result, this change is consistent with other bulk operations.
https://github.com/nasa/cumulus/pull/4215/files#diff-439a2112d88fd4de009b6e81b89124e656ada8001e5e7a0e0cadfdbe901166e2R193-R194
There was a problem hiding this comment.
Ah, I see now. Thanks for explaining.
| } = payload; | ||
| | ||
| // Direct S3 reference | ||
| if (s3GranuleIdInputFile) { |
There was a problem hiding this comment.
So you added an integration test that uses the reconciliation Granule Inventory Report to run bulk delete. I think we are missing an integration test that uses a s3 input file as being used here.
There was a problem hiding this comment.
The bulk operation with granuleInventoryReportName has an extra step to retrieve the S3 file URI. I believe we covered the s3GranuleIdInputFile option in unit tests, and an additional integration test is not necessary.
Summary: Summary of changes
Addresses CUMULUS-4473: Bulk Operations API: Support Granule Inventory Report and User Granule List in S3
Changes
granule objects in the payload.
/executions/search-by-granulesand/executions/workflows-by-granulesendpointsto accept granuleIds instead of granule objects in the payload.
granuleInventoryReportNameands3GranuleIdInputFilein the payload.while others returned per-granule errors)
Related PRs:
nasa/cumulus-dashboard#1273
nasa/cumulus-api#387
nasa/cumulus-api#388
PR Checklist
📝 Note:
For most pull requests, please Squash and merge to maintain a clean and readable commit history.