How A Workflow Runs

Understand workflow instances, step execution, statuses, retries, timeouts, and livemode isolation.


A workflow is only a stored definition until something triggers it. When a trigger fires (event or schedule), the Workflow API creates an instance: a single execution of the workflow with its own input, status, and step history. The instance runs the steps in source order. Each step finishes successfully, fails, or times out. The instance ends in one of five terminal statuses: completed, failed, canceled, terminated, or timed_out.

Lifecycle

```mermaid
stateDiagram-v2
    [*] --> running
    running --> completed
    running --> failed
    running --> canceled
    running --> terminated
    running --> timed_out
    completed --> [*]
    failed --> [*]
    canceled --> [*]
    terminated --> [*]
    timed_out --> [*]
```

Instance statuses

| Status | Meaning |
| --- | --- |
| running | The instance is in progress. A step is executing, or the workflow is paused on a wait_event step. |
| completed | All steps finished successfully. The instance output contains every step's output. |
| failed | A step returned a non-retryable error and the workflow did not recover. |
| canceled | The instance was stopped before completion through an explicit cancellation. |
| terminated | The instance was stopped non-gracefully (administrative or platform action). |
| timed_out | A step exceeded its time budget and the workflow surfaced the timeout as the terminal state. |

There is no waiting status on instances. A workflow paused on a wait_event step stays in running. To find waiting work, query the wait_state records (see Observing Workflow Runs).
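A client that polls an instance has to tell the one in-progress status from the five terminal ones. The following is a minimal sketch of that check; the names `InstanceStatus`, `TERMINAL_STATUSES`, and `is_terminal` are illustrative and not part of the Workflow API.

```python
from enum import Enum


class InstanceStatus(str, Enum):
    """Instance statuses as listed in the table above."""

    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"
    TERMINATED = "terminated"
    TIMED_OUT = "timed_out"


# Every status except "running" is terminal, matching the lifecycle diagram.
TERMINAL_STATUSES = frozenset(InstanceStatus) - {InstanceStatus.RUNNING}


def is_terminal(status: str) -> bool:
    """Return True once an instance can no longer change state.

    Raises ValueError for an unknown status such as "waiting",
    which does not exist on instances.
    """
    return InstanceStatus(status) in TERMINAL_STATUSES
```

Because a workflow paused on wait_event still reports running, a polling loop built on this check keeps polling through the wait rather than stopping early.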

Step statuses

Each step has its own status, tracked separately from the instance status.

| Status | Meaning |
| --- | --- |
| pending | The step is queued but has not started yet. |
| running | The step is actively executing. |
| completed | The step finished successfully. |
| failed | The step returned a non-retryable error. |
| canceled | The step was stopped through an explicit cancellation. |
| terminated | The step was stopped non-gracefully. |
| timed_out | The step exceeded its time budget. |
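Since steps run in source order, the first step that is not completed shows where an instance currently is, or where it stopped. The sketch below assumes a hypothetical step-history shape (a list of dicts with `name` and `status` keys); the real response format may differ.

```python
from typing import Optional


def first_unfinished_step(steps: list[dict]) -> Optional[dict]:
    """Return the first step whose status is not "completed".

    Returns None when every step completed. The dict keys used here
    ("name", "status") are assumptions made for illustration.
    """
    for step in steps:
        if step["status"] != "completed":
            return step
    return None


# Hypothetical step history for an instance that failed on its second step.
history = [
    {"name": "charge", "status": "completed"},
    {"name": "notify", "status": "failed"},
    {"name": "archive", "status": "pending"},
]
stopped_at = first_unfinished_step(history)
```

Here `stopped_at` is the `notify` step, and the later `archive` step is still pending because it never started.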

Retries and timeouts

The Workflow API automatically retries transient failures with exponential backoff for up to a few minutes per step. The retry policy depends on the step type. Long-running operations (send_money, wait_event) get a longer budget than fast-running operations.
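To make the shape of exponential backoff under a time budget concrete, here is a minimal sketch. The base delay, multiplier, and budget are not documented here, so the defaults below (1 second, factor 2, 180 seconds) are hypothetical values chosen only for illustration.

```python
def backoff_delays(
    base: float = 1.0, factor: float = 2.0, budget: float = 180.0
) -> list[float]:
    """Return the retry delays that fit inside a total time budget.

    Each delay is `factor` times the previous one. Retrying stops when
    the next delay would push the total waiting time past `budget`.
    All default values are assumptions, not documented API behavior.
    """
    delays: list[float] = []
    elapsed = 0.0
    delay = base
    while elapsed + delay <= budget:
        delays.append(delay)
        elapsed += delay
        delay *= factor
    return delays
```

With these assumed defaults the schedule is 1, 2, 4, 8, 16, 32, and 64 seconds (127 seconds in total); a step type with a larger budget would simply get more attempts before giving up.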

A step is not retried when:

  • The upstream PayMongo API returned a 4xx client error (for example validation, not-found, or conflict).
  • The step body fails validation at definition time.
  • The instance is canceled or terminated.
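The run-time rules above can be sketched as a small decision function. The function name and parameters are hypothetical, and treating every other failure (5xx responses, or a network error with no HTTP status) as transient is an assumption. Definition-time validation is left out because an invalid step body never reaches execution.

```python
from typing import Optional


def should_retry(http_status: Optional[int], instance_status: str = "running") -> bool:
    """Decide whether a failed step attempt is worth retrying.

    http_status is the upstream response code, or None when the call
    failed without a response (assumed here to be a transient error).
    """
    # A canceled or terminated instance never retries its steps.
    if instance_status in ("canceled", "terminated"):
        return False
    # 4xx responses are client errors and are not retried.
    if http_status is not None and 400 <= http_status < 500:
        return False
    # Assumption: anything else counts as a transient failure.
    return True
```

For example, `should_retry(503)` is true under these assumptions, while `should_retry(404)` and `should_retry(503, "canceled")` are both false.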

Each step type can also surface its own timeout. wait_event carries an explicit timeout parameter (default 24 hours, capped at 7 days). send_money waits for an asynchronous transfer callback; if no callback arrives within the provider's window, the workflow polls the Transfers API for the final status.
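The wait_event default and cap can be illustrated with a short helper. This is a sketch under one loud assumption: it clamps an oversized value to the 7-day cap, whereas the API might instead reject such a value at definition time. The function and constant names are hypothetical.

```python
from datetime import timedelta
from typing import Optional

DEFAULT_WAIT_TIMEOUT = timedelta(hours=24)  # documented default
MAX_WAIT_TIMEOUT = timedelta(days=7)        # documented cap


def effective_wait_timeout(requested: Optional[timedelta] = None) -> timedelta:
    """Resolve the timeout a wait_event step would run with.

    Assumption: values above the cap are clamped rather than rejected.
    """
    if requested is None:
        return DEFAULT_WAIT_TIMEOUT
    if requested <= timedelta(0):
        raise ValueError("timeout must be positive")
    return min(requested, MAX_WAIT_TIMEOUT)
```

Under this assumption, omitting the parameter yields 24 hours, a 3-day request is kept as is, and a 10-day request resolves to 7 days.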

Livemode isolation

Test-mode and live-mode resources are isolated. A test-mode key cannot read, modify, or trigger a live-mode workflow. Listing endpoints (GET /v1/workflows, GET /v1/instances, GET /v1/triggers) only return resources in the same livemode as the request. Cross-livemode reads on a specific resource ID return 404 Not Found rather than 403, so the existence of a resource is not leaked across the boundary.
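The isolation rule can be modeled with an in-memory store. This is an illustrative sketch, not the service's implementation: `Workflow`, `NotFound`, `get_workflow`, and `list_workflows` are hypothetical names, and the request's livemode is passed in directly as a boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    id: str
    livemode: bool


class NotFound(Exception):
    """Stands in for an HTTP 404 Not Found response."""

    status_code = 404


def get_workflow(
    store: dict[str, Workflow], workflow_id: str, request_livemode: bool
) -> Workflow:
    """Fetch one workflow, hiding resources from the other livemode."""
    workflow = store.get(workflow_id)
    # A missing ID and a cross-livemode ID raise the same error, so the
    # caller cannot learn whether the resource exists on the other side.
    if workflow is None or workflow.livemode != request_livemode:
        raise NotFound(workflow_id)
    return workflow


def list_workflows(store: dict[str, Workflow], request_livemode: bool) -> list[Workflow]:
    """List only the workflows in the same livemode as the request."""
    return [w for w in store.values() if w.livemode == request_livemode]
```

Raising the same error for both cases is the design point: a 403 on a cross-livemode ID would confirm that the ID exists.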

See Authentication for how livemode is determined from the secret key prefix.