Configuring job-level timeouts
By default, Dagster jobs have no automatic timeout. A stuck or runaway run can execute indefinitely until it's terminated manually. Job-level timeouts let you set a maximum runtime per job so the system can terminate runs that exceed it.
This is especially useful for:
- Scheduled jobs that should not overlap with subsequent ticks
- Jobs that depend on external systems prone to hangs
- Long-running jobs where you want to bound the worst case
Configuration
Use the dagster/max_runtime tag on the job (or run) to set the timeout in seconds. The mechanism is the same in Dagster Open Source and Dagster+, but how you enable it differs slightly between the two.
Dagster Open Source
- Add the dagster/max_runtime tag (in seconds) to your job definition.
- Enable run monitoring in your dagster.yaml:

      run_monitoring:
        enabled: true

- Restart your Dagster instance so the daemon picks up the configuration change.
Dagster+
- Add the dagster/max_runtime tag (in seconds) to your job definition. Tags can also be applied at run launch time from the Launchpad or via the Dagster+ deployment settings.
No agent or instance restart is required.
Choosing a timeout value
Pick a timeout based on observed runtime, with headroom for normal variance:
- Review a few weeks of completion times for the job.
- Set the timeout above the typical p99 runtime, not just the average.
- Re-evaluate after meaningful changes to data volume, dependencies, or upstream systems.
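The guidance above can be turned into a small calculation. The following is a sketch only: the function name, the nearest-rank percentile method, and the 1.25 headroom factor are assumptions chosen for illustration, not values recommended by Dagster.

```python
import math


def suggest_timeout_seconds(durations, percentile=99.0, headroom=1.25):
    """Suggest a max runtime from observed run durations (in seconds).

    Uses the nearest-rank percentile, then multiplies by a headroom
    factor (an assumed 25% by default) and rounds up to a whole second.
    """
    if not durations:
        raise ValueError("need at least one observed duration")
    ordered = sorted(durations)
    # Nearest rank: the smallest observation with at least `percentile`
    # percent of all observations at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    observed = ordered[rank - 1]
    return math.ceil(observed * headroom)


# Hypothetical history: 98 runs of 10 minutes plus two slow outliers.
history = [600] * 98 + [900, 1200]
timeout = suggest_timeout_seconds(history)
```

With this hypothetical history the mean runtime is 609 seconds, while the nearest-rank p99 is 900 seconds, so a timeout derived from the average would cut off runs that are slow but still normal. The resulting integer can be used as the value of the dagster/max_runtime tag.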