While working on refactoring the HERMES workflow to use the new data model, I've thought about the workflow, especially the Postprocess and Deposit steps.
I have four issues with them:
- Before the first deposit, the project will most likely not have a DOI yet; one is only generated (e.g. by Zenodo) during the Deposit step. But it currently can't be written into any files before depositing. (See also Write prereserved DOIs into files before depositing them to Zenodo #131.) The same applies to, e.g., a link to the PyPI page (when publishing to PyPI) and possibly other useful information.
- Each Deposit plugin only returns JSON data meant for its own Postprocess plugins. As a result, every combination of Deposit plugin and metadata source (that should be updated) in the project needs its own Postprocess plugin.
- In the Postprocess step, only the new information from the Deposit step can be added to the metadata sources, even though we have a user-curated dataset that we could use. (See also Unite Harvest- and Post-Process-Plugins #443.)
- Maybe less relevant, but it would be nice to be able to publish to multiple platforms.
In #443 I suggested uniting the Harvest and Postprocess plugins (i.e. each Harvest plugin can implement support for writing metadata back into the source it harvested). This would solve issue 3, but issues 1 and 2 remain.
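To make the #443 idea concrete, a unified plugin could look roughly like the sketch below. All class and method names here are illustrative assumptions, not the actual HERMES plugin API: a harvester optionally declares write-back support, so the same plugin that reads a source can later update it.

```python
# Hypothetical sketch of a unified Harvest/Postprocess plugin base.
# Names (HarvestPlugin, can_update, update_source) are assumptions,
# not the real HERMES interface.
from abc import ABC, abstractmethod


class HarvestPlugin(ABC):
    @abstractmethod
    def harvest(self) -> dict:
        """Read metadata from the source (e.g. CITATION.cff, pyproject.toml)."""

    def can_update(self) -> bool:
        """Whether this plugin can also write metadata back into its source."""
        return False

    def update_source(self, metadata: dict) -> None:
        """Write updated metadata (e.g. a fresh DOI) back into the source."""
        raise NotImplementedError


class CitationCffPlugin(HarvestPlugin):
    """Toy in-memory stand-in for a CITATION.cff harvester with write-back."""

    def __init__(self):
        self.data = {"title": "demo"}

    def harvest(self) -> dict:
        return dict(self.data)

    def can_update(self) -> bool:
        return True

    def update_source(self, metadata: dict) -> None:
        self.data.update(metadata)
```

With this shape, the Postprocess step no longer needs one plugin per (Deposit plugin, metadata source) pair; it can simply call `update_source` on every harvester that supports it.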
I've given this some thought and, as mentioned in #131, came to the conclusion that issue 1 can't be fixed within our current HERMES workflow.
My suggestion is to change the workflow to follow this diagram:

(source: suggested_HERMES_workflow.drawio)
That means another step, "Predeposit", would be added that allows all Deposit plugins to create their initial deposits and fetch the data to include in the files before actually publishing/uploading anything. (Importantly, this should be done sequentially, in an order defined by the user: the first plugin will return a DOI, and the others should reuse it instead of getting additional DOIs assigned to the same project.)
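The sequential ordering above could be sketched as follows. Everything here is hypothetical (the `predeposit` method name, the plugin classes, and the placeholder DOI/URL values are mine, not from HERMES): each plugin sees the metadata accumulated so far, so a later deposit target can notice an existing DOI and skip minting its own.

```python
# Sketch of a sequential Predeposit step (all names are illustrative).
def run_predeposit(plugins, metadata):
    # Run in the user-defined order: the first plugin reserves the DOI,
    # later plugins see it in `metadata` and must not mint another one.
    for plugin in plugins:
        metadata.update(plugin.predeposit(metadata))
    return metadata


class ZenodoPredeposit:
    """Toy stand-in for a Zenodo deposit target."""

    def predeposit(self, metadata):
        # Reuse an already-reserved DOI; otherwise pretend to reserve one.
        return {"doi": metadata.get("doi", "10.0000/placeholder")}


class PyPIPredeposit:
    """Toy stand-in for a PyPI deposit target."""

    def predeposit(self, metadata):
        return {"pypi_url": "https://pypi.org/project/example/"}
```

Running `run_predeposit([ZenodoPredeposit(), PyPIPredeposit()], {...})` leaves one DOI plus the PyPI link in the shared metadata, ready to be written into the files.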
In this step, each Predeposit plugin would add all valuable information to the curated metadataset (from the Curate step). (I consider this small enough to be done directly by the plugins, i.e. without requiring another curation round.)
The resulting updated metadataset can then be used in the Postprocess step by the "inverse Harvest plugins" to update the metadata sources in the project.
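Under this proposal the Postprocess step reduces to a single generic loop, sketched below with hypothetical names (the `can_update`/`update_source` interface is my assumption, not the real HERMES API): hand the curated, Predeposit-enriched metadataset to every harvest plugin that supports write-back.

```python
# Sketch of the proposed Postprocess step (names are illustrative).
class CffWriter:
    """Toy inverse-Harvest plugin that records what it would write back."""

    def __init__(self):
        self.written = None

    def can_update(self):
        return True

    def update_source(self, metadata):
        self.written = dict(metadata)


def postprocess(plugins, curated_metadata):
    # Push the curated metadataset into every source that supports write-back.
    updated = []
    for plugin in plugins:
        if plugin.can_update():
            plugin.update_source(curated_metadata)
            updated.append(plugin)
    return updated
```

This is what solves issue 2: the loop is independent of which Deposit targets produced the new fields.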
After that, only actually publishing the files/metadata/etc. is left, which would be done much as it is now, but with support for multiple targets (and without requiring a strict sequential order).
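Since the final publish step no longer has ordering constraints, the deposit targets could even be finalized concurrently. A minimal sketch, assuming a hypothetical `publish` method on each target:

```python
# Sketch: publishing to multiple targets without a strict order
# (target classes and the `publish` method are illustrative).
from concurrent.futures import ThreadPoolExecutor


class DummyTarget:
    """Toy deposit target that pretends to finalize a publication."""

    def __init__(self, name):
        self.name = name

    def publish(self, metadata):
        return f"{self.name}: published {metadata['doi']}"


def publish_all(targets, metadata):
    # Order between targets no longer matters, so run them in parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: t.publish(metadata), targets))
```

Sequential execution would of course also work; the point is only that, unlike Predeposit, nothing here depends on which target runs first.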
It would be great to have some feedback on how reasonable/ good this solution actually is.