How A Workflow Runs
Understand workflow instances, step execution, statuses, retries, timeouts, and livemode isolation.
A workflow is a stored definition until something triggers it. When a trigger fires (an event or a schedule), the Workflow API creates an instance: a single execution of the workflow with its own input, status, and step history. The instance runs the steps in source order. Each step finishes successfully, fails, or times out. The instance ends in one of five terminal statuses: completed, failed, canceled, terminated, or timed_out.
Lifecycle
```mermaid
stateDiagram-v2
    [*] --> running
    running --> completed
    running --> failed
    running --> canceled
    running --> terminated
    running --> timed_out
    completed --> [*]
    failed --> [*]
    canceled --> [*]
    terminated --> [*]
    timed_out --> [*]
```
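The lifecycle above can be expressed as a small transition table. This is only an illustrative sketch: the names `TRANSITIONS` and `can_transition` are hypothetical and are not part of the Workflow API.

```python
# Hypothetical client-side model of the instance lifecycle shown above.
# Status names come from this page; everything else is illustrative.
TRANSITIONS = {
    "running": {"completed", "failed", "canceled", "terminated", "timed_out"},
    # Terminal statuses have no outgoing transitions.
    "completed": set(),
    "failed": set(),
    "canceled": set(),
    "terminated": set(),
    "timed_out": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the lifecycle allows moving from current to target."""
    if current not in TRANSITIONS:
        raise ValueError(f"unknown instance status: {current!r}")
    return target in TRANSITIONS[current]
```

A check like this can be useful in tests or dashboards that replay instance history and want to flag impossible sequences, such as a completed instance returning to running.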
Instance statuses
| Status | Meaning |
|---|---|
| running | The instance is in progress. A step is executing, or the workflow is paused on a wait_event step. |
| completed | All steps finished successfully. The instance output contains every step's output. |
| failed | A step returned a non-retryable error and the workflow did not recover. |
| canceled | The instance was stopped before completion through an explicit cancellation. |
| terminated | The instance was stopped non-gracefully (administrative or platform action). |
| timed_out | A step exceeded its time budget and the workflow surfaced the timeout as the terminal state. |
There is no waiting status on instances. A workflow paused on a wait_event step stays in running. To find waiting work, query the wait_state records (see Observing Workflow Runs).
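Because a paused instance still reports running, a client that waits for an instance to finish can only watch for a terminal status. The sketch below shows one way to do that. It assumes a caller-supplied `fetch_status` callable that returns the instance's current status string. That callable, the polling interval, and the poll limit are all hypothetical and are not defined by the Workflow API.

```python
import time
from typing import Callable

# Terminal statuses as listed in the table above.
TERMINAL_STATUSES = frozenset(
    {"completed", "failed", "canceled", "terminated", "timed_out"}
)


def wait_for_terminal(
    fetch_status: Callable[[], str],
    poll_seconds: float = 5.0,   # assumed interval, not an API value
    max_polls: int = 120,        # assumed limit, not an API value
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the instance reaches a terminal status and return it."""
    for attempt in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if status != "running":
            # Instances have no other status, including no "waiting".
            raise ValueError(f"unexpected instance status: {status!r}")
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError("instance still running after max_polls checks")
```

Note that an instance paused on wait_event can stay in running for days, so a caller should choose the poll limit with the step's timeout in mind.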
Step statuses
Each step has its own status, tracked separately from the instance status.
| Status | Meaning |
|---|---|
| pending | The step is queued but has not started yet. |
| running | The step is actively executing. |
| completed | The step finished successfully. |
| failed | The step returned a non-retryable error. |
| canceled | The step was stopped through an explicit cancellation. |
| terminated | The step was stopped non-gracefully. |
| timed_out | The step exceeded its time budget. |
Retries and timeouts
The Workflow API automatically retries transient failures with exponential backoff for up to a few minutes per step. The retry policy depends on the step type. Long-running operations (send_money, wait_event) get a longer budget than fast-running operations.
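To illustrate how exponential backoff fits inside a per-step time budget, here is a minimal sketch. The base delay, growth factor, and budget values are assumptions chosen for the example, since the actual retry policy of each step type is not published on this page. The function name `backoff_delays` is hypothetical.

```python
def backoff_delays(
    base_seconds: float = 1.0,      # assumed first delay
    factor: float = 2.0,            # assumed growth factor
    budget_seconds: float = 180.0,  # assumed per-step retry budget
) -> list[float]:
    """Return the retry delays that fit within the total budget."""
    if base_seconds <= 0 or factor < 1:
        raise ValueError("base_seconds must be > 0 and factor must be >= 1")
    delays = []
    elapsed = 0.0
    delay = base_seconds
    # Stop scheduling retries once the next delay would exceed the budget.
    while elapsed + delay <= budget_seconds:
        delays.append(delay)
        elapsed += delay
        delay *= factor
    return delays
```

With these example values, a larger budget yields more retry attempts, which mirrors the idea that long-running step types get a longer budget than fast ones.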
A step is not retried when:
- The upstream PayMongo API returned a 4xx error (validation, not-found, conflict, client error).
- The step body fails validation at definition time.
- The instance is canceled or terminated.
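The rules above can be sketched as a single decision function. This is an illustration only: `should_retry` is a hypothetical helper, and treating every non-4xx failure (for example a 5xx response or a network error with no status code) as transient is an assumption, not documented behavior.

```python
from typing import Optional


def should_retry(http_status: Optional[int], instance_status: str = "running") -> bool:
    """Decide whether a failed step attempt is eligible for another retry.

    http_status is the upstream response code, or None when no response
    was received (assumed here to be a transient failure).
    """
    # A canceled or terminated instance never retries its steps.
    if instance_status in {"canceled", "terminated"}:
        return False
    # 4xx responses (validation, not-found, conflict) are not retried.
    if http_status is not None and 400 <= http_status < 500:
        return False
    # Assumption: everything else counts as transient.
    return True
```

Definition-time validation failures are not modeled here, because a step body that fails validation is rejected before any attempt runs.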
Each step type can also surface its own timeout. wait_event carries an explicit timeout parameter (default 24 hours, capped at 7 days). send_money waits for an asynchronous transfer callback; if no callback arrives within the provider's window, the workflow polls the Transfers API for the final status.
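The wait_event limits can be modeled with a small helper. The sketch assumes that a requested timeout above the cap is clamped to 7 days. The page does not say whether the API clamps or rejects such values, so treat that behavior, along with the name `effective_wait_timeout`, as hypothetical.

```python
from datetime import timedelta
from typing import Optional

DEFAULT_WAIT_TIMEOUT = timedelta(hours=24)  # default stated on this page
MAX_WAIT_TIMEOUT = timedelta(days=7)        # cap stated on this page


def effective_wait_timeout(requested: Optional[timedelta] = None) -> timedelta:
    """Return the timeout a wait_event step would use under these assumptions."""
    if requested is None:
        return DEFAULT_WAIT_TIMEOUT
    if requested <= timedelta(0):
        raise ValueError("timeout must be positive")
    # Assumption: values above the cap are clamped rather than rejected.
    return min(requested, MAX_WAIT_TIMEOUT)
```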
Livemode isolation
Test-mode and live-mode resources are isolated. A test-mode key cannot read, modify, or trigger a live-mode workflow. Listing endpoints (GET /v1/workflows, GET /v1/instances, GET /v1/triggers) only return resources in the same livemode as the request. Cross-livemode reads on a specific resource ID return 404 Not Found rather than 403, so the existence of a resource is not leaked across the boundary.
See Authentication for how livemode is determined from the secret key prefix.
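The isolation rule can be pictured with an in-memory model. This is a sketch of the behavior described above and not the service's implementation: the `Workflow`, `NotFound`, and `WorkflowStore` names are hypothetical, and the request's livemode is passed in directly rather than derived from a key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    id: str
    livemode: bool


class NotFound(Exception):
    """Stands in for a 404 Not Found response."""
    status_code = 404


class WorkflowStore:
    def __init__(self, workflows):
        self._by_id = {w.id: w for w in workflows}

    def list(self, livemode: bool):
        # Listing only returns resources in the request's livemode.
        return [w for w in self._by_id.values() if w.livemode == livemode]

    def get(self, workflow_id: str, livemode: bool) -> Workflow:
        workflow = self._by_id.get(workflow_id)
        # A cross-livemode read looks identical to a missing resource,
        # so the caller cannot tell whether the ID exists.
        if workflow is None or workflow.livemode != livemode:
            raise NotFound(workflow_id)
        return workflow
```

Raising the same error for "missing" and "wrong livemode" is what keeps a test-mode caller from probing for live-mode resource IDs.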