is it worth discussing rolling back orchestrations? #1828
Replies: 2 comments 8 replies
-
It's an interesting idea, and certainly possible given the event-sourcing nature of Durable Functions. In fact, we have a narrower implementation of this pattern for rolling back orchestrations that have failed due to some external issue. More info on that here: Rewind instances.

As you say, we could theoretically generalize this to support non-failed orchestrations as well. The challenge, however, is figuring out what the API should be for deciding how far back to rewind. Would an API that gives you a list of timestamps to rewind to be easy enough to use (I'm guessing not)? What about an API that gives you a list of tasks and lets users choose where in that list to rewind to (I have worries about this as well)? Another option would be to add a "checkpoint" feature to the Durable Functions programming model and allow users to rewind to specific checkpoints. That feels like it might offer the best usability, since it would be the easiest to understand.

However, the part that gets really tricky is what to do with durable timers. These timers are scheduled for specific points in time, so they are not really something you can "rewind" (if a timer already expired before the rewind, it will be forced to expire immediately after the rewind). That's part of the reason why we haven't spent more time on the rewind scenario in general. I'm open to hearing if folks have thoughts or opinions on how to deal with this.
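To make the checkpoint idea and the timer problem concrete, here is a minimal sketch in plain Python. It does not use the Durable Functions SDK; the `Event` type, `rewind_to_checkpoint`, and `remaining_delay` are hypothetical names invented for illustration, and the history is just an in-memory list standing in for an event-sourced log.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical history record; not a Durable Functions type."""
    kind: str                        # "task_completed", "checkpoint", or "timer_created"
    name: str
    fire_at: Optional[float] = None  # absolute fire time in seconds, timers only


def rewind_to_checkpoint(history: List[Event], checkpoint: str) -> List[Event]:
    """Return the history prefix up to and including the latest matching checkpoint."""
    for index in range(len(history) - 1, -1, -1):
        event = history[index]
        if event.kind == "checkpoint" and event.name == checkpoint:
            return history[: index + 1]
    raise ValueError(f"unknown checkpoint: {checkpoint!r}")


def remaining_delay(timer: Event, now: float) -> float:
    """Seconds left before a timer with an absolute deadline fires, as seen at `now`."""
    if timer.fire_at is None:
        raise ValueError("not a timer event")
    # A deadline that is already in the past cannot be "un-expired":
    # the best the runtime can do is fire it immediately.
    return max(0.0, timer.fire_at - now)


history = [
    Event("task_completed", "reserve_flight"),
    Event("checkpoint", "after_booking"),
    Event("timer_created", "wait_for_approval", fire_at=100.0),
    Event("task_completed", "charge_card"),
]

# Everything recorded after the checkpoint is discarded and would be re-executed.
rewound = rewind_to_checkpoint(history, "after_booking")

# On re-execution the orchestrator would recreate the same timer with the same deadline.
approval_timer = history[2]
```

If the rewind happens at time 250, `remaining_delay(approval_timer, now=250.0)` is zero, so the recreated timer fires at once, which is the behavior described above. Using relative durations instead would avoid that, but it would change what the timer means, so this sketch does not pick a side.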
-
There are some versions of this story that I think could make sense. For example, the saga pattern is quite popular for workflows. The idea is that one first defines a "compensation" for every component of a workflow. For example, for an activity that makes a flight reservation, one can define a compensation that cancels the reservation. The framework can then take care of calling the compensations in reverse order when unrolling an orchestration, and this can be done recursively.

As usual, the devil is in the details. For example, it is not clear what should be done if a compensation fails and the workflow gets stuck in a half-rolled-back state. Still, I think it may be worth considering some solution for this. Entities are also interesting in this context because it is always possible to roll back the entity state when needed. Perhaps experimenting with something like this, but implemented as a library, would be a good starting point, so we can iterate on the design a bit.
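A rough sketch of the saga idea in plain Python, again without the Durable Functions SDK. `run_saga`, `CompensationFailed`, and the step functions are hypothetical names for illustration. Surfacing a failed compensation as an exception is only one possible answer to the "half-rolled-back" question, not a recommendation.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); all three are illustrative, not a real API.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


class CompensationFailed(Exception):
    """Raised when at least one compensation could not be applied."""

    def __init__(self, stuck: List[str]):
        super().__init__(f"compensation failed for: {', '.join(stuck)}")
        self.stuck = stuck


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse order.

    Returns True if every step succeeded, False if the saga was fully rolled back.
    """
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, compensate))
        except Exception:
            stuck: List[str] = []
            # Undo the most recent work first.
            for done_name, undo in reversed(completed):
                try:
                    undo()
                except Exception:
                    # Keep going so the remaining compensations still get a chance.
                    stuck.append(done_name)
            if stuck:
                raise CompensationFailed(stuck)
            return False
    return True


log: List[str] = []


def charge_card() -> None:
    raise RuntimeError("card declined")


steps: List[Step] = [
    ("flight", lambda: log.append("flight reserved"), lambda: log.append("flight cancelled")),
    ("hotel", lambda: log.append("hotel reserved"), lambda: log.append("hotel cancelled")),
    ("payment", charge_card, lambda: log.append("payment refunded")),
]

succeeded = run_saga(steps)
```

Here the payment step fails, so the hotel and then the flight are cancelled, and the payment compensation never runs because that step never completed. In a real orchestration each action and compensation would presumably be an activity call, and a stuck compensation would need retries or manual intervention rather than just an exception.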
-
Assuming a suitably idempotent orchestration and associated activities, what's to stop somebody from reaching a point in the orchestration, deciding they want to roll it back to some earlier point, and begin as new?
Perhaps the orchestration gets out of sync with some external feature (like the source branch, for instance), and you want a circuit breaker that treats some subset of the Durable Functions replay history as discarded.
Not trying to rewrite the math on a distributed transaction manager here, but is this worth discussing?