We have an issue we can't figure out and would really appreciate some ideas on where to start debugging it. The workflows are Nintex, but I don't think the issue is Nintex-related... yet.
The workflow setup:
- Two list workflows: "UpdateReport" and "EmailReport"
- "UpdateReport" auto-starts when an item is modified
- If the item field "Email?" is set to yes, "UpdateReport" will trigger "EmailReport"
- The trigger to "EmailReport" is the very last step within "UpdateReport", so basically "UpdateReport" should start "EmailReport" and then finish
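To make the intended flow concrete, here is a rough plain-Python model of the setup above. It is only a sketch of the logic, not Nintex code: the function names, the `log` list, and the item dictionary with an `ID` key are stand-ins I made up for illustration, and the real workflow start may well be asynchronous rather than a direct call.

```python
def email_report(item, log):
    """Stand-in for the "EmailReport" workflow (hypothetical model)."""
    log.append(("EmailReport", item["ID"], "Completed"))


def update_report(item, log):
    """Stand-in for the "UpdateReport" workflow (hypothetical model)."""
    # ... the report update steps would run here ...
    # Very last step: start "EmailReport" only if "Email?" is set to yes.
    if item.get("Email?") == "Yes":
        email_report(item, log)
    log.append(("UpdateReport", item["ID"], "Completed"))


def on_item_modified(item, log):
    """Models the auto-start of "UpdateReport" when an item is modified."""
    update_report(item, log)
```

In this model "UpdateReport" always reaches "Completed", which is the behaviour we expect and do not always get.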
Problem:
"UpdateReport" starts "EmailReport" when required, and "EmailReport" actually completes successfully, yet the log on "UpdateReport" says it failed to start "EmailReport". This leaves "UpdateReport" for that item stalled as "In Progress", so any subsequent modification of the same item by a user fails to trigger these two workflows.
Tests done & what we know:
- We are sure the logic of the two workflows is correct
- I made another scheduled workflow that modifies items on that list to trigger the two workflows in question (so it should simulate users modifying the items?!)
- ZERO failures after testing ~2500 times on a cloned test site and ~500 times on the real site
- Yet when real users use the site, "UpdateReport" stalls as described in 10-30% of cases
- From a brief look at the SharePoint server log, workflow threads are being aborted around the times when "UpdateReport" stalled
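To check the last point more systematically, this is roughly how the log could be filtered against the known stall times. It is a minimal Python sketch and nothing in it is SharePoint-specific: it assumes a tab-separated log with the timestamp in the first field in `MM/DD/YYYY HH:MM:SS.ff` form, and the function names, the `abort` keyword, and the 60-second window are placeholder choices of mine that would need adjusting.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout of the first log column; adjust to the real log.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S.%f"


def parse_timestamp(line):
    """Return the timestamp in the first tab-separated field, or None."""
    first_field = line.split("\t", 1)[0].strip().rstrip("*")
    try:
        return datetime.strptime(first_field, TIMESTAMP_FORMAT)
    except ValueError:
        return None  # header or continuation line without a usable timestamp


def find_abort_entries(lines, stall_times, window_seconds=60, keyword="abort"):
    """Return log lines mentioning `keyword` within the window of any stall."""
    window = timedelta(seconds=window_seconds)
    matches = []
    for line in lines:
        if keyword.lower() not in line.lower():
            continue
        stamp = parse_timestamp(line)
        if stamp is None:
            continue
        if any(abs(stamp - stall) <= window for stall in stall_times):
            matches.append(line.rstrip("\n"))
    return matches
```

Passing an open log file as `lines` and the times at which "UpdateReport" instances stalled as `stall_times` returns only the abort entries near a stall, which should show whether they come from one process or thread.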
What else can I check or look into?