Sync jobs and snapshots

Sync jobs are durable units of work that move a dependency (or catalog broadcast) through the spec pipeline. Snapshots are the artifacts jobs produce when parsing succeeds. Together they explain why monitoring is sometimes delayed, retried, or paused.

In the app, open a dependency and use Sync history / Recent syncs (the label varies) to see each run, its current stage, and any error message. No API access is required.

Sync job stages (org dependencies)

Org-owned dependencies advance through checkpoints:

fetch — download the OpenAPI document (respecting timeouts and SSRF defenses).
parse — interpret JSON/YAML into an OpenAPI model.
normalize — canonicalize representation for stable diffs.
snapshot — persist a new spec_snapshots row linked to the dependency.
diff — compare against the prior snapshot.
persist — write new change events idempotently.
alert — evaluate alert preferences and enqueue channel deliveries.
done — terminal success.
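Because the stages form a fixed order, "what runs next" depends only on the last checkpoint a job reached. The following is a minimal sketch under that assumption. The `Stage` enum and `next_stage` helper are hypothetical names for illustration, not the product's internal API.

```python
from enum import Enum
from typing import List, Optional


class Stage(str, Enum):
    """Hypothetical model of the documented org-dependency checkpoints, in pipeline order."""

    FETCH = "fetch"
    PARSE = "parse"
    NORMALIZE = "normalize"
    SNAPSHOT = "snapshot"
    DIFF = "diff"
    PERSIST = "persist"
    ALERT = "alert"
    DONE = "done"


# Iterating an Enum yields members in definition order, so this list is the pipeline order.
PIPELINE: List[Stage] = list(Stage)


def next_stage(last_completed: Optional[Stage]) -> Stage:
    """Return the stage to run next, given the last checkpoint a job reached."""
    if last_completed is None:
        return Stage.FETCH  # fresh job: start at the beginning
    if last_completed is Stage.DONE:
        return Stage.DONE  # terminal success: nothing left to run
    return PIPELINE[PIPELINE.index(last_completed) + 1]
```

For example, `next_stage(Stage.NORMALIZE)` returns `Stage.SNAPSHOT`.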

Provider jobs use an analogous sequence; catalog pipelines include a notify/fan-out stage to subscribers.

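If a worker crashes after normalize completes but before snapshot runs, the next worker resumes from the last completed checkpoint (normalize) and starts at snapshot. Completed stages are not rerun and change events are written idempotently, so half-written state does not produce duplicate alerts.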
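The crash-and-resume behavior can be illustrated with a small in-memory simulation. This is a sketch only: `JobStore`, `run_job`, and the handler functions are hypothetical, and a real deployment would keep the checkpoint and change events in durable storage rather than in memory.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical stage names mirroring the documented order.
STAGES = ["fetch", "parse", "normalize", "snapshot", "diff", "persist", "alert", "done"]


class JobStore:
    """In-memory stand-in for durable job state (a real system would use a database)."""

    def __init__(self) -> None:
        self.checkpoint: Optional[str] = None  # last stage that fully completed
        self.change_events: Set[Tuple[str, str]] = set()  # keyed, so re-inserts are no-ops
        self.alerts: List[str] = []


Handler = Callable[[JobStore], None]


def run_job(store: JobStore, handlers: Dict[str, Handler], crash_before: Optional[str] = None) -> None:
    """Run every stage after the stored checkpoint; optionally simulate a worker crash."""
    start = 0 if store.checkpoint is None else STAGES.index(store.checkpoint) + 1
    for stage in STAGES[start:]:
        if stage == crash_before:
            raise RuntimeError(f"worker died before {stage}")
        handler = handlers.get(stage)
        if handler is not None:
            handler(store)
        store.checkpoint = stage  # recorded only after the stage finishes


def persist(store: JobStore) -> None:
    # Idempotent write: the same (dependency, change) key is stored at most once.
    store.change_events.add(("dep-1", "removed GET /pets"))


def alert(store: JobStore) -> None:
    store.alerts.append("dep-1: 1 change")


# Demo: the first worker dies after normalize; a second worker resumes at snapshot.
store = JobStore()
handlers: Dict[str, Handler] = {"persist": persist, "alert": alert}
try:
    run_job(store, handlers, crash_before="snapshot")
except RuntimeError:
    pass
run_job(store, handlers)
```

After the demo, `store.checkpoint` is `"done"` and exactly one alert was recorded, because the crashed run never reached alert and the resumed run started after normalize.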

Retry and cancel

If your UI exposes Retry / Cancel on a failed or stuck job, use those first.

Developers: GET /api/dependencies/:id/sync-jobs and GET /api/sync-jobs/:id list and inspect jobs; POST .../retry and POST .../cancel mirror the UI actions. See Core resources.
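A minimal client sketch for the two fully specified GET endpoints, using only the Python standard library. The base URL, the Bearer authorization scheme, and the `status` / `"failed"` response fields are assumptions for illustration; the retry and cancel paths are abbreviated in this page, so they are not built here.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List


def _url(base_url: str, path: str) -> str:
    return base_url.rstrip("/") + path


def list_sync_jobs_request(base_url: str, dependency_id: str, token: str) -> urllib.request.Request:
    """GET /api/dependencies/:id/sync-jobs (Bearer auth is an assumption)."""
    path = f"/api/dependencies/{urllib.parse.quote(dependency_id, safe='')}/sync-jobs"
    return urllib.request.Request(_url(base_url, path), method="GET",
                                  headers={"Authorization": f"Bearer {token}"})


def get_sync_job_request(base_url: str, job_id: str, token: str) -> urllib.request.Request:
    """GET /api/sync-jobs/:id (Bearer auth is an assumption)."""
    path = f"/api/sync-jobs/{urllib.parse.quote(job_id, safe='')}"
    return urllib.request.Request(_url(base_url, path), method="GET",
                                  headers={"Authorization": f"Bearer {token}"})


def failed_jobs(body: str) -> List[Dict[str, Any]]:
    """Pick failed jobs from a JSON array; the `status` field and "failed" value are assumed."""
    return [job for job in json.loads(body) if job.get("status") == "failed"]
```

To send a request, pass it to `urllib.request.urlopen(req)` and read the response body.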

Heartbeats and staleness

Workers heartbeat in-flight jobs. If a worker dies without closing a job, orchestration reclaims the stale claim so another worker can continue. You may briefly see a job “stuck” in a stage before it is reclaimed; the delay is usually tens of seconds to a few minutes, depending on deployment tuning.
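Stale-claim reclamation comes down to comparing each claim's last heartbeat with a staleness threshold. This sketch assumes a hypothetical `Claim` record and a 60-second default threshold; the real threshold depends on deployment tuning, as noted above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Claim:
    """Hypothetical claim record: which worker holds a job and when it last heartbeated."""

    job_id: str
    worker_id: Optional[str]
    last_heartbeat: float  # seconds since the epoch


def reclaim_stale(claims: List[Claim], now: float, stale_after: float = 60.0) -> List[str]:
    """Release claims whose last heartbeat is older than stale_after; return their job IDs."""
    reclaimed = []
    for claim in claims:
        if claim.worker_id is not None and now - claim.last_heartbeat > stale_after:
            claim.worker_id = None  # another worker may now claim and resume the job
            reclaimed.append(claim.job_id)
    return reclaimed
```

A job whose worker is still heartbeating inside the threshold is left alone, which is why a briefly "stuck" job usually clears on its own.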

Snapshots

Each successful snapshot stage creates a spec_snapshots row that is:

Immutable — historical audits compare snapshot IDs, not mutable URLs.
Linked to the dependency (or provider surface) that produced it.
Inspectable in the UI on the dependency’s Snapshots / History views (wording varies).
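Immutability can be modeled with a frozen record keyed by snapshot ID. In this sketch `SpecSnapshot` and `make_snapshot` are hypothetical, and storing a SHA-256 of the normalized body is an illustrative assumption rather than a documented column.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecSnapshot:
    """Hypothetical shape of a spec snapshot row; fields cannot be reassigned after creation."""

    snapshot_id: str
    dependency_id: str
    content_sha256: str  # fingerprint of the normalized spec body (assumed field)


def make_snapshot(snapshot_id: str, dependency_id: str, normalized_spec: bytes) -> SpecSnapshot:
    """Build an immutable snapshot record linked to the dependency that produced it."""
    return SpecSnapshot(snapshot_id, dependency_id, hashlib.sha256(normalized_spec).hexdigest())
```

Assigning to a field of a `SpecSnapshot` raises `dataclasses.FrozenInstanceError`, so an audit that holds a snapshot ID always refers to the same content.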

Developers: GET /api/dependencies/:id/snapshots and GET .../snapshots/:snapId.

From a snapshot in the UI you can usually open its operations (resolved methods and paths) and sometimes the raw spec body for debugging.
Developers: the same data is available via .../snapshots/:snapId/operations and .../raw. Raw responses can be large; don't log them verbatim.
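One way to follow the "don't log verbatim" advice is to log a size and a short fingerprint instead of the body. This is a sketch; `summarize_raw_spec` is a hypothetical helper, and the 12-character SHA-256 prefix is an arbitrary choice.

```python
import hashlib


def summarize_raw_spec(body: bytes) -> str:
    """Describe a raw spec body for logs without writing its contents verbatim."""
    digest = hashlib.sha256(body).hexdigest()[:12]  # short fingerprint for correlating log lines
    return f"raw spec: {len(body)} bytes, sha256={digest}"
```

The fingerprint lets you tell whether two log entries refer to the same body without copying potentially large or sensitive spec content into logs.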

Baseline vs current

Diffing always compares the previous snapshot with the new one. The first successful sync only establishes the baseline; no change events are emitted until a second successful sync finds differences.
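The baseline rule can be sketched as a diff that returns nothing when there is no previous snapshot. This simplified version compares only sets of (method, path) operations; the real diff compares full normalized specs, and `diff_operations` is a hypothetical name.

```python
from typing import List, Optional, Set, Tuple

Operation = Tuple[str, str]  # (HTTP method, path)


def diff_operations(previous: Optional[Set[Operation]], current: Set[Operation]) -> List[str]:
    """Return change events; with no previous snapshot, the current one is only the baseline."""
    if previous is None:
        return []  # first successful sync: establish baseline, emit nothing
    events = [f"added {m} {p}" for m, p in sorted(current - previous)]
    events += [f"removed {m} {p}" for m, p in sorted(previous - current)]
    return events
```

For example, diffing `{("GET", "/pets")}` against `{("GET", "/pets"), ("POST", "/pets")}` yields `["added POST /pets"]`.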

Health circuit breaker (dependencies)

Repeated failures update dependency health:

Status	Typical meaning
`healthy`	Recent successes.
`degraded`	Multiple consecutive failures; still trying.
`unhealthy`	More consecutive failures; operator notifications may have been sent.
`paused`	Automatic scheduling stops enqueueing jobs; a manual sync still runs and, if it succeeds, resets the failure counters.

This protects shared worker pools from hammering a permanently bad URL.
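The table can be read as a counter-driven state machine. This sketch assumes hypothetical thresholds (3, 6, and 10 consecutive failures); the actual values are not stated here and may differ per deployment. `Health` and its methods are illustrative names.

```python
from dataclasses import dataclass

# Hypothetical thresholds for consecutive failures; real values are deployment-specific.
DEGRADED_AFTER = 3
UNHEALTHY_AFTER = 6
PAUSE_AFTER = 10


@dataclass
class Health:
    consecutive_failures: int = 0
    status: str = "healthy"

    def record(self, success: bool) -> None:
        """Update health after a sync attempt (scheduled or manual)."""
        if success:
            self.consecutive_failures = 0  # any success resets the counter
            self.status = "healthy"
            return
        self.consecutive_failures += 1
        n = self.consecutive_failures
        if n >= PAUSE_AFTER:
            self.status = "paused"
        elif n >= UNHEALTHY_AFTER:
            self.status = "unhealthy"
        elif n >= DEGRADED_AFTER:
            self.status = "degraded"

    def should_schedule(self) -> bool:
        """Automatic scheduling skips paused dependencies; manual syncs bypass this check."""
        return self.status != "paused"
```

A successful manual sync calls `record(True)`, which clears the counter and lets automatic scheduling resume.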