Priority Scheduler Reference¶

Precise contract for the priority-aware admission scheduler: the request header clients send, the response codes the gateway returns, and every configuration knob with its exact name and default. For how it works conceptually, see Priority Scheduling.

The scheduler is disabled by default. When off, the gateway uses its legacy concurrency-limit admission path.

Request header: `x-smg-priority`¶

Clients request a priority class with the x-smg-priority request header.

Property	Behavior
Header name	`x-smg-priority`
Values	`system`, `interactive`, `default`, `bulk`
Case	Case-insensitive (`Bulk`, `INTERACTIVE`, `SyStEm` all parse). Surrounding whitespace is trimmed.
Missing header	Treated as `default`.
Unknown value	Any unrecognized value (including the empty string) silently degrades to `default` — admission never fails because of a typo in this header. Counted under `smg_scheduler_unknown_priority_value_total`.
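The parsing rules in the table can be sketched as a small function. This is an illustrative sketch, not the gateway's code: `parse_priority` and its return shape are hypothetical names, and it assumes a missing header is not counted as an unknown value, which the table does not state either way.

```python
CLASSES = ("system", "interactive", "default", "bulk")


def parse_priority(header_value):
    """Return (class_name, was_unknown) for a raw x-smg-priority value.

    header_value is None when the client sent no header at all.
    """
    if header_value is None:
        # Missing header: treated as default (assumed not counted as unknown).
        return "default", False
    normalized = header_value.strip().lower()  # trim whitespace, ignore case
    if normalized in CLASSES:
        return normalized, False
    # Typos and the empty string degrade silently instead of failing admission.
    return "default", True
```

A caller would increment `smg_scheduler_unknown_priority_value_total` when the second element is true.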

Tenant clamp¶

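The header chooses a class; the tenant's configured maximum class caps it. Classes are ordered system > interactive > default > bulk, so min below means the lower-priority of the two. The effective class is: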

effective = min(requested_class, tenant_max_class)

The clamp only ever moves a request down. The header can never promote a request above the tenant's ceiling.
A tenant whose max_class is default that sends x-smg-priority: system is admitted as default.
A clamp (effective class below requested) is counted under smg_scheduler_clamp_total.

A tenant's max_class comes from the per-tenant policy in the YAML config, or from the gateway-wide default (--priority-scheduler-default-max-class) for tenants not listed. See Tenant policy.
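A minimal sketch of the clamp, assuming the class ordering system > interactive > default > bulk. The `RANK` table and the `clamp` function are hypothetical names used only for illustration.

```python
# Rank 0 is the highest class; a larger rank means lower priority.
RANK = {"system": 0, "interactive": 1, "default": 2, "bulk": 3}


def clamp(requested, tenant_max):
    """Return (effective_class, was_clamped) for a request."""
    # min(requested, tenant_max) in priority terms is the class with the
    # larger rank, so the result can only stay put or move down.
    effective = requested if RANK[requested] >= RANK[tenant_max] else tenant_max
    return effective, effective != requested
```

For example, `clamp("system", "default")` yields `("default", True)`, while a tenant asking for `bulk` under an `interactive` ceiling keeps `bulk` and is not counted as clamped.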

Response codes¶

The scheduler surfaces admission and preemption outcomes as HTTP status codes. Each rejection also carries the gateway's standard JSON error body and X-SMG-Error-Code header.

Status	Condition	`X-SMG-Error-Code`	Extra headers
503 Service Unavailable	Preempted — admitted, then cancelled before its first byte to make room for a higher-priority request	`scheduler_preempted`	`X-SMG-Preempted: true`, `Retry-After: 1`
429 Too Many Requests	Queue full — the request's per-class queue is at its configured depth	`scheduler_queue_full`	—
408 Request Timeout	Queue timeout — the request waited longer than its class's `queue_timeout`	`scheduler_queue_timeout`	—
499 Client Closed Request	Client gone — the client disconnected before admission completed (nginx convention; never actually read)	`scheduler_client_cancelled`	—
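On the client side, the status code and `X-SMG-Error-Code` together identify a scheduler rejection. The sketch below shows one way a client might tell them apart. The action names (`retry_after`, `backoff`, `not_scheduler`) and the choice to back off on 429 and 408 are assumptions for illustration; the gateway only defines the codes and headers.

```python
def classify_rejection(status, headers):
    """Map a response to a (hypothetical) client action and optional delay."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    code = h.get("x-smg-error-code")

    if status == 503 and code == "scheduler_preempted":
        # Preempted before the first byte: honor Retry-After (seconds).
        return "retry_after", float(h.get("retry-after", "1"))
    if status == 429 and code == "scheduler_queue_full":
        return "backoff", None
    if status == 408 and code == "scheduler_queue_timeout":
        return "backoff", None
    # Anything else did not come from the scheduler's admission path.
    return "not_scheduler", None
```

Checking the error code as well as the status avoids mistaking an unrelated 503 or 429 from another layer for a scheduler decision.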

Enabling the scheduler¶

The scheduler is controlled by CLI flags (also settable in the config file). Per-class tuning and per-tenant policy live in a separate optional YAML file.

smg \
  --worker-urls http://w1:8000 http://w2:8000 \
  --priority-scheduler-enabled \
  --priority-scheduler-default-max-class interactive \
  --priority-scheduler-config /etc/smg/priority.yaml

CLI flags¶

Flag	Default	Description
`--priority-scheduler-enabled`	`false`	Master switch. When unset, the legacy concurrency-limit middleware stays wired and no scheduler is constructed.
`--priority-scheduler-default-max-class`	`default`	Maximum class for tenants not listed in the YAML (`system` | `interactive` | `default` | `bulk`). Parsed with the same rules as the header: an unknown value falls back to `default`.
`--priority-scheduler-config`	unset	Path to the optional priority-scheduler YAML (per-class overrides + per-tenant policy). Absent → built-in defaults and an empty tenant policy map.
`--priority-scheduler-tenant-metric-top-n`	`32`	Intended cap on per-tenant metric label cardinality. Not yet enforced — the value is stored but no top-N bucketing is applied today; per-tenant counters currently intern the raw tenant.

YAML configuration¶

The file referenced by --priority-scheduler-config has two top-level maps, both optional. An empty or absent file means "use built-in defaults for every class, no per-tenant overrides."

# Per-class tuning. Any class you omit keeps its built-in default.
classes:
  interactive:
    reserved_floor: 128       # always at least 128 slots
    reserved_per_slot: 0.25   # ...and 25% of capacity once the fleet is large
    queue_size: 256
    queue_timeout_secs: 30
    starvation_threshold_secs: 5
    can_preempt: true
  bulk:
    reserved_floor: 0
    queue_size: 1024
    queue_timeout_secs: 300
    starvation_threshold_secs: 120
    can_preempt: false

# Per-tenant priority ceiling. Tenants not listed use
# --priority-scheduler-default-max-class.
tenant_policies:
  "auth:acme":
    max_class: interactive
  "auth:internal-cron":
    max_class: system

Class keys and max_class values are lowercase: system, interactive, default, bulk. Unlike the lenient request header, an unknown class name in the YAML is a parse error, which triggers the fail-safe fallback to legacy admission (see Validation).

Per-class knobs¶

Each entry under classes accepts the following fields. All are per-class.

Field	Type	Meaning
`reserved_floor`	integer (slots)	Minimum slots reserved for this class — the value the effective reservation never drops below. A higher class's unused reservation is held back from lower classes; a class's own reservation never reduces its own headroom.
`reserved_per_slot`	float (≥ 0.0)	Share of live capacity reserved for this class. It combines with the floor by taking the larger of the two, not their sum: `effective = max(reserved_floor, ceil(reserved_per_slot × capacity))`, recomputed as capacity changes so the reservation tracks the fleet. `0.0` (the default) means purely absolute (just the floor). Must be finite and ≥ 0. At startup, if the effective reservations summed across classes exceed capacity, the scheduler fails safe to legacy admission; at runtime a capacity dip is absorbed by clamping the lowest classes first.
`queue_size`	integer	Per-class queue depth limit. A request that arrives when the queue is full is rejected with 429.
`queue_timeout_secs`	integer (seconds)	How long a queued request waits before it is rejected with 408. Must be `> 0`.
`starvation_threshold_secs`	integer (seconds)	Head-of-queue age past which the dispatcher promotes a waiter out of normal priority order (and lets it use a reserved-but-unused slot) to avoid starvation. Must be `> 0`.
`can_preempt`	boolean	Whether admissions in this class may preempt a lower-class in-flight request that has not yet emitted its first byte.
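The reservation formula from the `reserved_per_slot` row can be worked through directly. This is a sketch of the arithmetic only; `effective_reservation` is a hypothetical helper name, not a gateway API.

```python
import math


def effective_reservation(reserved_floor, reserved_per_slot, capacity):
    """Slots reserved for one class at the given live capacity."""
    if not math.isfinite(reserved_per_slot) or reserved_per_slot < 0:
        raise ValueError("reserved_per_slot must be finite and >= 0")
    # The larger of the absolute floor and the capacity share wins.
    return max(reserved_floor, math.ceil(reserved_per_slot * capacity))
```

With a floor of 128 and a share of 0.25, the floor wins up to a capacity of 512 slots (`effective_reservation(128, 0.25, 400)` is 128); beyond that the share wins (`effective_reservation(128, 0.25, 1000)` is 250).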

Built-in defaults¶

These apply to any class with no YAML override.

Class	`reserved_floor`	`reserved_per_slot`	`queue_size`	`queue_timeout_secs`	`starvation_threshold_secs`	`can_preempt`
`system`	32	0.0	64	30	5	`true`
`interactive`	128	0.25	256	30	5	`true`
`default`	0	0.10	512	60	30	`false`
`bulk`	0	0.0	1024	300	120	`false`

Higher classes fail fast (short queues, short timeouts) and reserve capacity: interactive and default reserve a share that grows with the fleet, while system keeps a fixed floor (control-plane traffic is low-volume regardless of fleet size). Lower classes wait patiently (deep queues, long timeouts): default has no floor and reserves only its 10% share, and bulk reserves nothing.

Validation¶

At startup the scheduler validates:

queue_timeout_secs > 0 for every class (otherwise validation fails).
starvation_threshold_secs > 0 for every class.
The sum of all effective reservations must not exceed the live backend capacity. On a runtime capacity shrink that would otherwise break this invariant, the scheduler clamps reservations starting with the lowest classes rather than locking itself out.

Any validation failure triggers the fail-safe fallback to legacy admission.
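The startup checks can be sketched against the built-in defaults. This is a hedged illustration of the rules listed above, not the scheduler's implementation: `ClassConfig`, `DEFAULTS`, and `validate` are hypothetical names, and the real validator may check more than is shown here.

```python
import math
from dataclasses import dataclass


@dataclass
class ClassConfig:
    reserved_floor: int
    reserved_per_slot: float
    queue_size: int
    queue_timeout_secs: int
    starvation_threshold_secs: int
    can_preempt: bool


# Values copied from the built-in defaults table.
DEFAULTS = {
    "system": ClassConfig(32, 0.0, 64, 30, 5, True),
    "interactive": ClassConfig(128, 0.25, 256, 30, 5, True),
    "default": ClassConfig(0, 0.10, 512, 60, 30, False),
    "bulk": ClassConfig(0, 0.0, 1024, 300, 120, False),
}


def validate(classes, capacity):
    """Return a list of problems; an empty list means the config passes."""
    problems = []
    total_reserved = 0
    for name, cfg in classes.items():
        if cfg.queue_timeout_secs <= 0:
            problems.append(f"{name}: queue_timeout_secs must be > 0")
        if cfg.starvation_threshold_secs <= 0:
            problems.append(f"{name}: starvation_threshold_secs must be > 0")
        if not math.isfinite(cfg.reserved_per_slot) or cfg.reserved_per_slot < 0:
            problems.append(f"{name}: reserved_per_slot must be finite and >= 0")
            continue  # cannot compute a reservation from an invalid share
        total_reserved += max(
            cfg.reserved_floor, math.ceil(cfg.reserved_per_slot * capacity)
        )
    if total_reserved > capacity:
        problems.append(
            f"reservations ({total_reserved}) exceed capacity ({capacity})"
        )
    return problems
```

Under these assumptions the defaults pass at a capacity of 1000 slots but not at 100, where the interactive floor of 128 alone exceeds the fleet. A non-empty result corresponds to the fail-safe fallback to legacy admission.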

Tenant policy¶

A tenant's priority ceiling is resolved per request:

If the tenant key appears in tenant_policies, its max_class is used.
Otherwise the gateway-wide --priority-scheduler-default-max-class applies.

The resolved max_class is the upper bound for the tenant clamp. Tenant keys are the same keys the gateway uses elsewhere for tenancy (for example auth:acme).

Field	Type	Meaning
`max_class`	`system` | `interactive` | `default` | `bulk`	Highest class this tenant may be admitted under. A request's effective class is `min(header_class, max_class)`.
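The per-request lookup can be sketched as follows, assuming the YAML has already been parsed into a plain dictionary shaped like the `tenant_policies` map. The function name `resolve_max_class` is hypothetical.

```python
def resolve_max_class(tenant_key, tenant_policies, default_max_class="default"):
    """Return the priority ceiling for one tenant.

    tenant_policies mirrors the YAML map, e.g.
    {"auth:acme": {"max_class": "interactive"}}.
    default_max_class stands in for --priority-scheduler-default-max-class.
    """
    policy = tenant_policies.get(tenant_key)
    if policy is not None:
        return policy["max_class"]  # listed tenant: its own ceiling applies
    return default_max_class        # unlisted tenant: gateway-wide default


policies = {
    "auth:acme": {"max_class": "interactive"},
    "auth:internal-cron": {"max_class": "system"},
}
```

With this map, `resolve_max_class("auth:acme", policies)` gives `interactive`, and an unlisted key such as `auth:unknown` falls back to the gateway-wide default.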

Metrics¶

The scheduler exposes these Prometheus metrics (see the Metrics Reference for the full catalog):

Metric	Type	Key labels	Use
`smg_scheduler_admit_total`	Counter	`class`, `outcome`	Admission outcomes (`admitted`, `rejected_queue_full`, `rejected_queue_timeout`, `preempted`, `client_cancelled`).
`smg_scheduler_queue_wait_seconds`	Histogram	`class`	Time spent queued before admission, timeout, or cancel.
`smg_scheduler_preemption_total`	Counter	`victim_class`, `by_class`	Successful preemptions. Authoritative preemption count.
`smg_scheduler_clamp_total`	Counter	`tenant`, `requested_class`, `effective_class`	Requests clamped below the class they asked for.
`smg_scheduler_unknown_priority_value_total`	Counter	`tenant`	Requests with an unrecognized `x-smg-priority` value.
`smg_scheduler_starvation_promotion_total`	Counter	`class`	Waiters admitted via the starvation override.
`smg_scheduler_inflight`	Gauge	`class`	Current in-flight requests per class.
`smg_scheduler_queue_depth`	Gauge	`class`	Current queued waiters per class.
`smg_scheduler_queue_size_limit`	Gauge	`class`	Configured queue limit per class.
`smg_scheduler_utilization`	Gauge	—	Total in-flight divided by backend capacity.
`smg_scheduler_class_capacity_pressure`	Gauge	`class`	Normalized 0.0–1.0 pressure (worse of queue and slot pressure).

See also¶

Configuration Reference¶

Metrics Reference¶
