
(How) Should the HERMES workflow be adapted to enable better Deposition and Postprocessing? #474

@notactuallyfinn

Description

While working on refactoring the HERMES workflow to use the new data model, I've thought about the workflow, especially the Postprocess and Deposit steps.
I have four issues with them:

  1. Before the first deposit, the project will most likely not have a DOI yet, so one is generated by e.g. Zenodo in the Deposit step. Currently, however, it cannot be included in any files immediately (see also: Write prereserved DOIs into files before depositing them to Zenodo #131). The same applies to e.g. a link to the PyPI page (when publishing to PyPI), and possibly to even more useful information.
  2. The Deposit plugins only have to return JSON data for their own Postprocess plugins. As a result, every combination of Deposit plugin and metadata source (that should be updated) in a project needs its own Postprocess plugin.
  3. In the Postprocess step, only the new information from the Deposit step can be added to the metadata sources, even though we have a user-curated dataset that we could use (see also: Unite Harvest- and Post-Process-Plugins #443).
  4. Maybe not as relevant, but it would be nice to be able to publish to multiple platforms.
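
To make issue 1 more concrete, here is a minimal sketch of the "reserve DOI first, upload second" idea. `prereserve_doi` is a stand-in for a platform call such as Zenodo's DOI prereservation, and all names and values are dummies for illustration, not actual HERMES code:

```python
# Sketch of issue 1: a DOI reserved by the target platform is written
# into the project's metadata *before* the files are actually uploaded.

def prereserve_doi() -> str:
    """Hypothetical call that creates an (unpublished) deposition and
    returns the DOI the platform has reserved for it."""
    return "10.5281/zenodo.1234567"  # dummy value for illustration

def inject_doi(citation_metadata: dict, doi: str) -> dict:
    """Write the prereserved DOI into a CITATION.cff-like structure."""
    updated = dict(citation_metadata)
    updated["doi"] = doi
    return updated

metadata = {"title": "My Research Software", "version": "1.0.0"}
doi = prereserve_doi()
metadata = inject_doi(metadata, doi)
# Only now would the DOI-carrying files be uploaded and published.
```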

In #443 I suggested uniting the Harvest and Postprocess plugins (i.e. each Harvest plugin can implement support for writing metadata back into the source it harvested). This would solve 3, but 1 and 2 remain problems.
I've given this some thought and came to the conclusion that, as mentioned in #131, 1 cannot be fixed with our current HERMES workflow.
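
As a rough sketch of what such a unified plugin could look like (all class and method names here are made up for illustration, not the actual HERMES interfaces):

```python
from abc import ABC, abstractmethod

class HarvestPlugin(ABC):
    """Sketch of the unified plugin idea from #443: every Harvest plugin
    reads metadata from one source and may optionally support writing
    updated metadata back into that same source."""

    @abstractmethod
    def harvest(self) -> dict:
        """Read metadata from the source (e.g. a CITATION.cff file)."""

    def write_back(self, metadata: dict) -> None:
        """Optional inverse operation; by default not supported."""
        raise NotImplementedError(f"{type(self).__name__} is read-only")

class DictHarvester(HarvestPlugin):
    """Toy source backed by an in-memory dict (stands in for a file)."""

    def __init__(self, source: dict):
        self.source = source

    def harvest(self) -> dict:
        return dict(self.source)

    def write_back(self, metadata: dict) -> None:
        self.source.update(metadata)

source = {"title": "My Research Software"}
plugin = DictHarvester(source)
plugin.write_back({"doi": "10.5281/zenodo.1234567"})
```

A plugin that cannot sensibly support write-back (e.g. one harvesting from a remote registry) would simply not override `write_back`.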
My suggestion is to change the workflow to follow the illustration in this diagram:
[workflow diagram] (source: suggested_HERMES_workflow.drawio)

That means another step, "Predeposit", would be added that allows all Deposit plugins to create their initial deposits / fetch data to include in the files before actually publishing or uploading anything. (Importantly, this should be done sequentially and in an order defined by the user, because the first plugin will return a DOI and the others should use that one instead of getting additional DOIs assigned to the same project.)
In this step, each Predeposit plugin would add all valuable information to the curated metadata set (from the Curate step). (I consider this little enough to be done directly by the plugins, i.e. as not needing another curation round.)
The resulting updated metadata set can then be used in the Postprocess step by the "inverse Harvest plugins" to update the metadata sources in the project.
After that, only actually publishing the files/metadata/etc. is left, which would be done much like it is now, but should support multiple targets (and does not have to happen in a strict sequential order).
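
Putting the pieces together, the user-defined ordering of the Predeposit step could be sketched like this (the plugin names and the `setdefault`-based DOI handling are purely illustrative assumptions, not an actual HERMES API):

```python
def run_predeposit(plugins, curated):
    """Run Predeposit plugins strictly in the user-defined order: the
    first plugin that reserves a DOI writes it into the curated metadata
    set, and later plugins reuse it instead of reserving their own."""
    for plugin in plugins:
        plugin(curated)
    return curated

def zenodo_predeposit(curated):
    # First target: reserves a DOI only if there is none yet (dummy value).
    curated.setdefault("doi", "10.5281/zenodo.1234567")

def pypi_predeposit(curated):
    # Later target: must not overwrite the existing DOI, only add its own
    # information (here a dummy PyPI project URL).
    curated.setdefault("doi", "10.9999/other.doi")  # no effect here
    curated["pypi_url"] = "https://pypi.org/project/my-software/"

curated = {"title": "My Research Software"}
run_predeposit([zenodo_predeposit, pypi_predeposit], curated)
# After Predeposit, the Postprocess step would write `curated` back into
# the project's metadata files; only then would each target publish.
```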

It would be great to have some feedback on how reasonable/ good this solution actually is.

Labels: architecture (Describes some architectural decisions that need to be made), enhancement (New feature or request), meeting-discussion (Issues that should be discussed at the next project meeting), question (Further information is requested)
