Priority Scheduler Reference

Precise contract for the priority-aware admission scheduler: the request header clients send, the response codes the gateway returns, and every configuration knob with its exact name and default. For how it works conceptually, see Priority Scheduling.

The scheduler is disabled by default. When off, the gateway uses its legacy concurrency-limit admission path.


Request header: x-smg-priority

Clients request a priority class with the x-smg-priority request header.

Property | Behavior
Header name | x-smg-priority
Values | system, interactive, default, bulk
Case | Case-insensitive (Bulk, INTERACTIVE, SyStEm all parse). Surrounding whitespace is trimmed.
Missing header | Treated as default.
Unknown value | Any unrecognized value (including the empty string) falls back to default without an error, so admission never fails because of a typo in this header. Counted under smg_scheduler_unknown_priority_value_total.
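The parsing rules above fit in a few lines. Below is a minimal Python sketch. The function name and the (class, was_unknown) return shape are hypothetical. The sketch also assumes that a missing header is not counted as unknown, because the table only says unrecognized values are counted.

```python
from typing import Optional, Tuple

# The four class names accepted by x-smg-priority (from the table above).
KNOWN_CLASSES = ("system", "interactive", "default", "bulk")

def parse_priority(header_value: Optional[str]) -> Tuple[str, bool]:
    """Return (priority_class, was_unknown) for a raw x-smg-priority value.

    Hypothetical helper: was_unknown marks values that would be counted
    under smg_scheduler_unknown_priority_value_total.
    """
    if header_value is None:
        # Missing header: treated as default; assumed not to count as unknown.
        return "default", False
    normalized = header_value.strip().lower()  # trim whitespace, ignore case
    if normalized in KNOWN_CLASSES:
        return normalized, False
    # Anything else, including "", falls back to default instead of failing.
    return "default", True
```

For example, "  Bulk " parses as bulk. Both "" and "urgent" fall back to default and are flagged as unknown.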

Tenant clamp

The header chooses a class; the tenant's configured maximum class caps it. Classes are ordered system > interactive > default > bulk, and the effective class is the lower of the two:

effective = min(requested_class, tenant_max_class)
  • The clamp only ever moves a request down. The header can never promote a request above the tenant's ceiling.
  • A tenant whose max_class is default that sends x-smg-priority: system is admitted as default.
  • A clamp (effective class below requested) is counted under smg_scheduler_clamp_total.
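The clamp amounts to picking the lower-ranked of two classes. The following minimal Python sketch assumes only the ordering stated above (system > interactive > default > bulk). The numeric ranks and the function name are illustrative; they are not the gateway's internals.

```python
from typing import Tuple

# Illustrative ranks; only the relative order matters.
RANK = {"bulk": 0, "default": 1, "interactive": 2, "system": 3}

def clamp(requested: str, tenant_max: str) -> Tuple[str, bool]:
    """Return (effective_class, was_clamped).

    effective = min(requested, tenant_max) under the class ordering,
    so the result can only move a request down, never up.
    """
    effective = min(requested, tenant_max, key=lambda c: RANK[c])
    # A clamp happened only if the ceiling pulled the request below what it asked for.
    return effective, effective != requested
```

With this sketch, clamp("system", "default") yields ("default", True), matching the tenant example above. clamp("bulk", "interactive") leaves the request at bulk with no clamp recorded.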

A tenant's max_class comes from the per-tenant policy in the YAML config, or from the gateway-wide default (--priority-scheduler-default-max-class) for tenants not listed. See Tenant policy.


Response codes

The scheduler surfaces admission and preemption outcomes as HTTP status codes. Each rejection also carries the gateway's standard JSON error body and X-SMG-Error-Code header.

Status | Condition | X-SMG-Error-Code | Extra headers
503 Service Unavailable | Preempted: admitted, then cancelled before its first byte to make room for a higher-priority request | scheduler_preempted | X-SMG-Preempted: true, Retry-After: 1
429 Too Many Requests | Queue full: the request's per-class queue is already at its configured queue_size | scheduler_queue_full | (none)
408 Request Timeout | Queue timeout: the request waited longer than its class's queue_timeout_secs | scheduler_queue_timeout | (none)
499 Client Closed Request | Client gone: the client disconnected before admission completed (nginx convention; the disconnected client never actually reads this response) | scheduler_client_cancelled | (none)
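A client can branch on the status code combined with X-SMG-Error-Code. Below is a hypothetical client-side helper. Its retry policy is an assumption, not gateway behavior: it retries preempted and queue-full requests, uses a 2-second backoff for 429, and does not retry 408 automatically.

```python
from typing import Mapping, Optional

def retry_delay(status: int, headers: Mapping[str, str]) -> Optional[float]:
    """Seconds to wait before retrying, or None if this sketch would not retry."""
    # HTTP header names are case-insensitive; normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    code = h.get("x-smg-error-code")
    if status == 503 and code == "scheduler_preempted":
        # Preempted before the first byte: honor Retry-After (documented as 1).
        # Retry-After may also be an HTTP date; fall back to 1 second then.
        try:
            return float(h.get("retry-after", "1"))
        except ValueError:
            return 1.0
    if status == 429 and code == "scheduler_queue_full":
        return 2.0  # assumed backoff; no Retry-After is documented for this case
    # 408 (queue timeout) already waited a full queue_timeout_secs, so this sketch
    # leaves it to the caller; 499 never reaches a client that has disconnected.
    return None
```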

Enabling the scheduler

The scheduler is controlled by CLI flags (also settable in the config file). Per-class tuning and per-tenant policy live in a separate optional YAML file.

smg \
  --worker-urls http://w1:8000 http://w2:8000 \
  --priority-scheduler-enabled \
  --priority-scheduler-default-max-class interactive \
  --priority-scheduler-config /etc/smg/priority.yaml

CLI flags

Flag | Default | Description
--priority-scheduler-enabled | false | Master switch. When unset, the legacy concurrency-limit middleware stays wired and no scheduler is constructed.
--priority-scheduler-default-max-class | default | Maximum class for tenants not listed in the YAML (one of system, interactive, default, bulk). Parsed with the same rules as the header, so an unknown value falls back to default.
--priority-scheduler-config | unset | Path to the optional priority-scheduler YAML (per-class overrides and per-tenant policy). If absent, built-in defaults and an empty tenant policy map are used.
--priority-scheduler-tenant-metric-top-n | 32 | Intended cap on per-tenant metric label cardinality. Not yet enforced: the value is stored, but no top-N bucketing is applied today, and per-tenant counters currently use the raw tenant key as the label value.

YAML configuration

The file referenced by --priority-scheduler-config has two top-level maps, both optional. An empty or absent file means "use built-in defaults for every class, no per-tenant overrides."

# Per-class tuning. Any class you omit keeps its built-in default.
classes:
  interactive:
    reserved_floor: 128       # always at least 128 slots
    reserved_per_slot: 0.25   # or 25% of capacity, whichever is larger
    queue_size: 256
    queue_timeout_secs: 30
    starvation_threshold_secs: 5
    can_preempt: true
  bulk:
    reserved_floor: 0
    queue_size: 1024
    queue_timeout_secs: 300
    starvation_threshold_secs: 120
    can_preempt: false

# Per-tenant priority ceiling. Tenants not listed use
# --priority-scheduler-default-max-class.
tenant_policies:
  "auth:acme":
    max_class: interactive
  "auth:internal-cron":
    max_class: system

Class keys and max_class values are lowercase: system, interactive, default, bulk. Unlike the lenient request header, an unknown class name in the YAML is a parse error. A parse error triggers the fail-safe fallback to legacy admission described under Validation.
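The difference in strictness can be sketched as a check over an already-parsed config, such as the mapping a YAML library returns for the file above. This is a hypothetical helper. It assumes exact, case-sensitive key matching and treats a tenant entry without max_class as an error; this page specifies neither detail.

```python
from typing import Any, Mapping, Optional

KNOWN_CLASSES = frozenset({"system", "interactive", "default", "bulk"})

def check_priority_config(doc: Optional[Mapping[str, Any]]) -> None:
    """Raise ValueError on an unknown class key or max_class value.

    Unlike header parsing, nothing is lowercased or defaulted here.
    An empty or absent file (None) is valid: built-in defaults apply.
    """
    doc = doc or {}
    # Keys under classes must be exact class names.
    for name in doc.get("classes") or {}:
        if name not in KNOWN_CLASSES:
            raise ValueError(f"unknown class key in classes: {name!r}")
    # Every tenant policy must name a known max_class (missing is rejected by assumption).
    for tenant, policy in (doc.get("tenant_policies") or {}).items():
        max_class = (policy or {}).get("max_class")
        if max_class not in KNOWN_CLASSES:
            raise ValueError(f"tenant {tenant!r}: unknown max_class {max_class!r}")
```

For example, a header value of Interactive is accepted, but the same spelling as a YAML class key raises ValueError here.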

Per-class knobs

Each entry under classes accepts the following fields.

Field | Type | Meaning
reserved_floor | integer (slots) | Minimum slots reserved for this class: the value the effective reservation never drops below. A higher class's unused reservation is held back from lower classes; a class's own reservation never reduces its own headroom.
reserved_per_slot | float (≥ 0.0) | Share of live capacity reserved for this class. The effective reservation is the larger of the floor and this share: effective = max(reserved_floor, ceil(reserved_per_slot × capacity)), recomputed as capacity changes so the reservation tracks the fleet. A value of 0.0 (the built-in value for system and bulk) means purely absolute, i.e. just the floor. Must be finite and ≥ 0. At startup, if the floors and shares together exceed capacity, the scheduler fails safe to legacy admission; at runtime, a capacity dip is absorbed by clamping the lowest classes first.
queue_size | integer | Per-class queue depth limit. A request that arrives when its class's queue is full is rejected with 429.
queue_timeout_secs | integer (seconds) | How long a queued request may wait before it is rejected with 408. Must be > 0.
starvation_threshold_secs | integer (seconds) | Head-of-queue age beyond which the dispatcher promotes a waiter out of normal priority order (and lets it use a reserved but unused slot) to avoid starvation. Must be > 0.
can_preempt | boolean | Whether admissions in this class may preempt an in-flight request of a lower class that has not yet emitted its first byte.
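The following is a minimal sketch of the reservation formula for a single class. The function name is hypothetical. Only the formula and the finite, non-negative check come from the table above.

```python
import math

def effective_reservation(reserved_floor: int, reserved_per_slot: float, capacity: int) -> int:
    """Slots reserved for one class at the given live capacity."""
    # Validation rule from the table: the share must be finite and non-negative.
    if not math.isfinite(reserved_per_slot) or reserved_per_slot < 0:
        raise ValueError("reserved_per_slot must be finite and >= 0")
    # The larger of the absolute floor and the capacity-proportional share.
    return max(reserved_floor, math.ceil(reserved_per_slot * capacity))

# Interactive's built-in values (floor 128, share 0.25) at a few fleet sizes.
# prints one pair per line: 256 128, 512 128, 1000 250, 2000 500
for cap in (256, 512, 1000, 2000):
    print(cap, effective_reservation(128, 0.25, cap))
```

With interactive's values, the share overtakes the 128-slot floor only once capacity exceeds 512 slots (0.25 × 512 = 128).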

Built-in defaults

These apply to any class with no YAML override.

Class | reserved_floor | reserved_per_slot | queue_size | queue_timeout_secs | starvation_threshold_secs | can_preempt
system | 32 | 0.0 | 64 | 30 | 5 | true
interactive | 128 | 0.25 | 256 | 30 | 5 | true
default | 0 | 0.10 | 512 | 60 | 30 | false
bulk | 0 | 0.0 | 1024 | 300 | 120 | false

Higher classes fail fast (short queues, short timeouts) and reserve capacity. System keeps a fixed floor of 32 slots, because control-plane traffic is low-volume regardless of fleet size. Interactive reserves the larger of 128 slots and 25% of capacity, so its share grows with the fleet. Lower classes wait patiently (deep queues, long timeouts): default reserves only a 10% share with no floor, and bulk reserves nothing.

Validation

At startup the scheduler validates:

  • queue_timeout_secs > 0 for every class.
  • starvation_threshold_secs > 0 for every class.
  • The sum of all reserved values must not exceed the live backend capacity. On a capacity shrink that would otherwise break this invariant, the scheduler scales reservations down proportionally rather than locking itself out.

Any validation failure triggers the fail-safe fallback to legacy admission.
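The startup checks can be sketched as a single function that returns False wherever the scheduler would fall back to legacy admission. This is a hypothetical illustration. It uses the built-in defaults and the reservation formula from this page, and it does not model the runtime handling of a capacity shrink.

```python
import math
from typing import Dict, NamedTuple

class ClassConfig(NamedTuple):
    reserved_floor: int
    reserved_per_slot: float
    queue_size: int
    queue_timeout_secs: int
    starvation_threshold_secs: int
    can_preempt: bool

# Built-in defaults, copied from the table above.
BUILTIN: Dict[str, ClassConfig] = {
    "system": ClassConfig(32, 0.0, 64, 30, 5, True),
    "interactive": ClassConfig(128, 0.25, 256, 30, 5, True),
    "default": ClassConfig(0, 0.10, 512, 60, 30, False),
    "bulk": ClassConfig(0, 0.0, 1024, 300, 120, False),
}

def validate(classes: Dict[str, ClassConfig], capacity: int) -> bool:
    """True if the scheduler may start; False means fall back to legacy admission."""
    total_reserved = 0
    for cfg in classes.values():
        # Per-class timing checks.
        if cfg.queue_timeout_secs <= 0 or cfg.starvation_threshold_secs <= 0:
            return False
        # The share must be finite and non-negative.
        if not math.isfinite(cfg.reserved_per_slot) or cfg.reserved_per_slot < 0:
            return False
        # Effective reservation: max(floor, ceil(share * capacity)).
        total_reserved += max(cfg.reserved_floor,
                              math.ceil(cfg.reserved_per_slot * capacity))
    # All reservations together must fit in live capacity.
    return total_reserved <= capacity
```

At 1000 slots, the defaults reserve 32 + 250 + 100 + 0 = 382 slots and pass. Suppose the real validator applies the formula exactly as written. Then the built-in defaults need at least 178 slots, because 32 + 128 + ceil(0.10 × 178) = 178. Below that, this sketch returns False.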


Tenant policy

A tenant's priority ceiling is resolved per request:

  1. If the tenant key appears in tenant_policies, its max_class is used.
  2. Otherwise the gateway-wide --priority-scheduler-default-max-class applies.

The resolved max_class is the upper bound for the tenant clamp. Tenant keys are the same keys the gateway uses elsewhere for tenancy (for example auth:acme).

Field | Type | Meaning
max_class | one of system, interactive, default, bulk | Highest class this tenant may be admitted under. A request's effective class is min(requested_class, max_class).
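The two resolution steps and the clamp combine as shown in the hypothetical sketch below. The function names are illustrative. The policy map mirrors the YAML example above, and default_max_class stands in for --priority-scheduler-default-max-class.

```python
from typing import Dict

# Illustrative ranks for the ordering system > interactive > default > bulk.
RANK = {"bulk": 0, "default": 1, "interactive": 2, "system": 3}

# Mirrors the tenant_policies example in the YAML configuration section.
TENANT_POLICIES: Dict[str, Dict[str, str]] = {
    "auth:acme": {"max_class": "interactive"},
    "auth:internal-cron": {"max_class": "system"},
}

def resolve_max_class(tenant: str, policies: Dict[str, Dict[str, str]],
                      default_max_class: str) -> str:
    policy = policies.get(tenant)
    if policy is not None:
        return policy["max_class"]  # step 1: the tenant is listed in tenant_policies
    return default_max_class        # step 2: gateway-wide default ceiling

def effective_class(tenant: str, requested: str,
                    policies: Dict[str, Dict[str, str]], default_max_class: str) -> str:
    ceiling = resolve_max_class(tenant, policies, default_max_class)
    # min under the class ordering: the ceiling can only pull a request down.
    return min(requested, ceiling, key=lambda c: RANK[c])
```

The example invocation earlier sets --priority-scheduler-default-max-class interactive. With that setting, auth:acme asking for system is admitted as interactive, while auth:internal-cron keeps system.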

Metrics

The scheduler exposes these Prometheus metrics (see the Metrics Reference for the full catalog):

Metric | Type | Key labels | Use
smg_scheduler_admit_total | Counter | class, outcome | Admission outcomes (admitted, rejected_queue_full, rejected_queue_timeout, preempted, client_cancelled).
smg_scheduler_queue_wait_seconds | Histogram | class | Time spent queued before admission, timeout, or cancellation.
smg_scheduler_preemption_total | Counter | victim_class, by_class | Successful preemptions. Authoritative preemption count.
smg_scheduler_clamp_total | Counter | tenant, requested_class, effective_class | Requests clamped below the class they asked for.
smg_scheduler_unknown_priority_value_total | Counter | tenant | Requests with an unrecognized x-smg-priority value.
smg_scheduler_starvation_promotion_total | Counter | class | Waiters admitted via the starvation override.
smg_scheduler_inflight | Gauge | class | Current in-flight requests per class.
smg_scheduler_queue_depth | Gauge | class | Current queued waiters per class.
smg_scheduler_queue_size_limit | Gauge | class | Configured queue limit (queue_size) per class.
smg_scheduler_utilization | Gauge | (none) | Total in-flight requests divided by backend capacity.
smg_scheduler_class_capacity_pressure | Gauge | class | Normalized 0.0–1.0 pressure (the worse of queue pressure and slot pressure).
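Two of the gauges are ratios derived from other quantities. The hypothetical sketch below shows how they relate. It assumes queue pressure is queue_depth divided by queue_size_limit. Slot pressure is taken as an input because this page does not define it.

```python
from typing import Mapping

def utilization(inflight_by_class: Mapping[str, int], capacity: int) -> float:
    """Like smg_scheduler_utilization: total in-flight divided by backend capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return sum(inflight_by_class.values()) / capacity

def class_capacity_pressure(queue_depth: int, queue_size_limit: int,
                            slot_pressure: float) -> float:
    """Worse of queue pressure and slot pressure, clamped to 0.0-1.0.

    Queue pressure as depth / limit is an assumption; slot_pressure is
    supplied by the caller because its definition is not given here.
    """
    if queue_size_limit <= 0:
        raise ValueError("queue_size_limit must be positive")
    queue_pressure = queue_depth / queue_size_limit
    return min(1.0, max(0.0, queue_pressure, slot_pressure))
```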

See also

  • Configuration Reference: all gateway CLI flags and configuration options.
  • Metrics Reference: full catalog of Prometheus metrics.