Priority Scheduler Reference¶
Precise contract for the priority-aware admission scheduler: the request header clients send, the response codes the gateway returns, and every configuration knob with its exact name and default. For how it works conceptually, see Priority Scheduling.
The scheduler is disabled by default. When off, the gateway uses its legacy concurrency-limit admission path.
Request header: x-smg-priority¶
Clients request a priority class with the x-smg-priority request header.
| Property | Behavior |
|---|---|
| Header name | x-smg-priority |
| Values | system, interactive, default, bulk |
| Case | Case-insensitive (Bulk, INTERACTIVE, SyStEm all parse). Surrounding whitespace is trimmed. |
| Missing header | Treated as default. |
| Unknown value | Any unrecognized value (including the empty string) silently degrades to default — admission never fails because of a typo in this header. Counted under smg_scheduler_unknown_priority_value_total. |
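The parsing rules in the table above can be sketched as a small function. This is an illustrative model only, not the gateway's code; the name `parse_priority` and the `KNOWN_CLASSES` constant are hypothetical.

```python
from typing import Optional

# Class names as listed in the table above.
KNOWN_CLASSES = ("system", "interactive", "default", "bulk")


def parse_priority(header_value: Optional[str]) -> str:
    """Map an x-smg-priority header value to a class name (hypothetical helper)."""
    if header_value is None:
        # Missing header is treated as "default".
        return "default"
    # Trim surrounding whitespace and compare case-insensitively.
    value = header_value.strip().lower()
    # Any unrecognized value, including the empty string, degrades to "default".
    return value if value in KNOWN_CLASSES else "default"
```

A caller that also wanted to mirror smg_scheduler_unknown_priority_value_total would count the case where a header is present but not recognized.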
Tenant clamp¶
The header chooses a class; the tenant's configured maximum class caps it. The effective class is:
effective = min(requested_class, tenant_max_class)

- The clamp only ever moves a request down. The header can never promote a request above the tenant's ceiling.
- A tenant whose max_class is default that sends x-smg-priority: system is admitted as default.
- A clamp (effective class below requested) is counted under smg_scheduler_clamp_total.
A tenant's max_class comes from the per-tenant policy in the YAML config, or from the gateway-wide default (--priority-scheduler-default-max-class) for tenants not listed. See Tenant policy.
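As a rough sketch of the clamp, the classes can be modeled as an ordered list. The ordering used here (system highest, then interactive, default, bulk) is assumed from the order in which this page lists the classes, and `CLASS_ORDER` and `clamp` are hypothetical names.

```python
# Assumed priority order, highest first (taken from the listing order on this page).
CLASS_ORDER = ["system", "interactive", "default", "bulk"]


def clamp(requested_class: str, tenant_max_class: str) -> str:
    """Return the lower-priority of the two classes.

    The document's min() is over priority; with a highest-first list,
    the lower priority is the larger index.
    """
    index = max(CLASS_ORDER.index(requested_class),
                CLASS_ORDER.index(tenant_max_class))
    return CLASS_ORDER[index]
```

With this model, `clamp("system", "default")` yields `"default"`, matching the example above, while `clamp("bulk", "interactive")` stays `"bulk"` because the clamp never promotes a request.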
Response codes¶
The scheduler surfaces admission and preemption outcomes as HTTP status codes. Each rejection also carries the gateway's standard JSON error body and X-SMG-Error-Code header.
| Status | Condition | X-SMG-Error-Code | Extra headers |
|---|---|---|---|
| 503 Service Unavailable | Preempted: admitted, then cancelled before its first byte to make room for a higher-priority request | scheduler_preempted | X-SMG-Preempted: true, Retry-After: 1 |
| 429 Too Many Requests | Queue full: the request's per-class queue is at its configured depth | scheduler_queue_full | none |
| 408 Request Timeout | Queue timeout: the request waited longer than its class's queue_timeout_secs | scheduler_queue_timeout | none |
| 499 Client Closed Request | Client gone: the client disconnected before admission completed (nginx convention; the client is no longer connected, so the response is never actually read) | scheduler_client_cancelled | none |
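A client might branch on these responses roughly as follows. The status codes, error codes, and headers come from the table above; the action names and the retry policy itself are assumptions for illustration, not behavior the gateway prescribes, and `next_action` is a hypothetical helper.

```python
from typing import Mapping, Optional, Tuple


def next_action(status: int, headers: Mapping[str, str]) -> Tuple[str, Optional[float]]:
    """Suggest (action, delay_seconds) for a scheduler rejection (illustrative policy)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    code = h.get("x-smg-error-code", "")

    if status == 503 and code == "scheduler_preempted":
        # Preempted before the first byte: retry after the advertised delay.
        try:
            delay = float(h.get("retry-after", "1"))
        except ValueError:
            delay = 1.0  # Retry-After was not a plain number of seconds
        return ("retry", delay)
    if status == 429 and code == "scheduler_queue_full":
        # The per-class queue is full: back off before trying again (assumed policy).
        return ("back_off", None)
    if status == 408 and code == "scheduler_queue_timeout":
        # The request waited out its class's queue timeout (assumed policy: give up).
        return ("give_up", None)
    # Not a scheduler rejection: leave it to the caller's normal handling.
    return ("pass_through", None)
```

The 499 case is omitted because the client has already disconnected by the time it is recorded.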
Enabling the scheduler¶
The scheduler is controlled by CLI flags (also settable in the config file). Per-class tuning and per-tenant policy live in a separate optional YAML file.
```shell
smg \
  --worker-urls http://w1:8000 http://w2:8000 \
  --priority-scheduler-enabled \
  --priority-scheduler-default-max-class interactive \
  --priority-scheduler-config /etc/smg/priority.yaml
```

CLI flags¶
| Flag | Default | Description |
|---|---|---|
| --priority-scheduler-enabled | false | Master switch. When unset, the legacy concurrency-limit middleware stays wired and no scheduler is constructed. |
| --priority-scheduler-default-max-class | default | Maximum class for tenants not listed in the YAML (system, interactive, default, or bulk). Parsed with the same rules as the header: an unknown value falls back to default. |
| --priority-scheduler-config | unset | Path to the optional priority-scheduler YAML (per-class overrides + per-tenant policy). When absent, the scheduler uses built-in defaults and an empty tenant policy map. |
| --priority-scheduler-tenant-metric-top-n | 32 | Intended cap on per-tenant metric label cardinality. Not yet enforced: the value is stored but no top-N bucketing is applied today; per-tenant counters currently intern the raw tenant. |
YAML configuration¶
The file referenced by --priority-scheduler-config has two top-level maps, both optional. An empty or absent file means "use built-in defaults for every class, no per-tenant overrides."
```yaml
# Per-class tuning. Any class you omit keeps its built-in default.
classes:
  interactive:
    reserved_floor: 128        # always at least 128 slots
    reserved_per_slot: 0.25    # ...and 25% of capacity once the fleet is large
    queue_size: 256
    queue_timeout_secs: 30
    starvation_threshold_secs: 5
    can_preempt: true
  bulk:
    reserved_floor: 0
    queue_size: 1024
    queue_timeout_secs: 300
    starvation_threshold_secs: 120
    can_preempt: false

# Per-tenant priority ceiling. Tenants not listed use
# --priority-scheduler-default-max-class.
tenant_policies:
  "auth:acme":
    max_class: interactive
  "auth:internal-cron":
    max_class: system
```

Class keys and max_class values are lowercase: system, interactive, default, bulk. An unknown class name in the YAML is a parse error (which triggers the fail-safe fallback to legacy admission), unlike the lenient request header.
Per-class knobs¶
Each entry under classes accepts the following fields. All are per-class.
| Field | Type | Meaning |
|---|---|---|
| reserved_floor | integer (slots) | Minimum slots reserved for this class: the value the effective reservation never drops below. A higher class's unused reservation is held back from lower classes; a class's own reservation never reduces its own headroom. |
| reserved_per_slot | float (0.0–1.0+) | Share of live capacity reserved for this class, combined with the floor as effective = max(reserved_floor, ceil(reserved_per_slot × capacity)) and recomputed as capacity changes so the reservation tracks the fleet. 0.0 (the default) means purely absolute (just the floor). Must be finite and ≥ 0. At startup, if the floors + shares exceed capacity the scheduler fails safe to legacy admission; at runtime a capacity dip is absorbed by clamping the lowest classes first. |
| queue_size | integer | Per-class queue depth limit. A request that arrives when the queue is full is rejected with 429. |
| queue_timeout_secs | integer (seconds) | How long a queued request waits before it is rejected with 408. Must be > 0. |
| starvation_threshold_secs | integer (seconds) | Head-of-queue age past which the dispatcher promotes a waiter out of normal priority order (and lets it use a reserved-but-unused slot) to avoid starvation. Must be > 0. |
| can_preempt | boolean | Whether admissions in this class may preempt a lower-class in-flight request that has not yet emitted its first byte. |
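The reservation formula for reserved_floor and reserved_per_slot can be worked through with a short sketch. The function name `effective_reservation` is hypothetical; the formula and the "finite and ≥ 0" constraint are the ones stated in the table.

```python
import math


def effective_reservation(reserved_floor: int,
                          reserved_per_slot: float,
                          capacity: int) -> int:
    """effective = max(reserved_floor, ceil(reserved_per_slot * capacity))."""
    if not math.isfinite(reserved_per_slot) or reserved_per_slot < 0:
        raise ValueError("reserved_per_slot must be finite and >= 0")
    return max(reserved_floor, math.ceil(reserved_per_slot * capacity))


# With a floor of 128 and a share of 0.25:
small_fleet = effective_reservation(128, 0.25, 400)    # share gives 100, floor wins
large_fleet = effective_reservation(128, 0.25, 1000)   # share gives 250, share wins
```

On the small fleet the floor dominates (128 slots); once capacity grows past 512 slots, the 25% share exceeds the floor and the reservation scales with the fleet.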
Built-in defaults¶
These apply to any class with no YAML override.
| Class | reserved_floor | reserved_per_slot | queue_size | queue_timeout_secs | starvation_threshold_secs | can_preempt |
|---|---|---|---|---|---|---|
| system | 32 | 0.0 | 64 | 30 | 5 | true |
| interactive | 128 | 0.25 | 256 | 30 | 5 | true |
| default | 0 | 0.10 | 512 | 60 | 30 | false |
| bulk | 0 | 0.0 | 1024 | 300 | 120 | false |
Higher classes fail fast (short queues, short timeouts) and reserve capacity: interactive and default reserve a share that grows with the fleet, while system keeps a fixed floor (control-plane traffic is low-volume regardless of fleet size). Lower classes wait patiently (deep queues, long timeouts) and have no reserved floor.
Validation¶
At startup the scheduler validates:
- queue_timeout_secs > 0 for every class (else startup fails for that class).
- starvation_threshold_secs > 0 for every class.
- The sum of all reserved values must not exceed the live backend capacity. On a capacity shrink that would otherwise break this invariant, the scheduler scales reservations down proportionally rather than locking itself out.
Any validation failure triggers the fail-safe fallback to legacy admission.
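The startup checks can be approximated as below. This is a sketch under stated assumptions: it reads "the sum of all reserved values" as the sum of each class's effective reservation, max(reserved_floor, ceil(reserved_per_slot × capacity)), and the names `BUILT_IN_DEFAULTS` and `validate` are hypothetical. The numbers are the built-in defaults listed on this page.

```python
import math
from typing import Dict, List

# Built-in defaults from this page (only the fields the checks need).
BUILT_IN_DEFAULTS: Dict[str, Dict[str, float]] = {
    "system":      {"reserved_floor": 32,  "reserved_per_slot": 0.0,
                    "queue_timeout_secs": 30,  "starvation_threshold_secs": 5},
    "interactive": {"reserved_floor": 128, "reserved_per_slot": 0.25,
                    "queue_timeout_secs": 30,  "starvation_threshold_secs": 5},
    "default":     {"reserved_floor": 0,   "reserved_per_slot": 0.10,
                    "queue_timeout_secs": 60,  "starvation_threshold_secs": 30},
    "bulk":        {"reserved_floor": 0,   "reserved_per_slot": 0.0,
                    "queue_timeout_secs": 300, "starvation_threshold_secs": 120},
}


def validate(classes: Dict[str, Dict[str, float]], capacity: int) -> List[str]:
    """Return a list of validation errors; an empty list means the config passes."""
    errors: List[str] = []
    total_reserved = 0
    for name, cfg in classes.items():
        if cfg["queue_timeout_secs"] <= 0:
            errors.append(f"{name}: queue_timeout_secs must be > 0")
        if cfg["starvation_threshold_secs"] <= 0:
            errors.append(f"{name}: starvation_threshold_secs must be > 0")
        # Assumed interpretation: sum the effective reservation of every class.
        total_reserved += max(int(cfg["reserved_floor"]),
                              math.ceil(cfg["reserved_per_slot"] * capacity))
    if total_reserved > capacity:
        errors.append(f"reservations ({total_reserved}) exceed capacity ({capacity})")
    return errors
```

Under this reading the built-in defaults pass at a capacity of 1000 slots but fail at 100, because the system and interactive floors alone (32 + 128) already exceed 100.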
Tenant policy¶
A tenant's priority ceiling is resolved per request:
- If the tenant key appears in tenant_policies, its max_class is used.
- Otherwise the gateway-wide --priority-scheduler-default-max-class applies.
The resolved max_class is the upper bound for the tenant clamp. Tenant keys are the same keys the gateway uses elsewhere for tenancy (for example auth:acme).
| Field | Type | Meaning |
|---|---|---|
| max_class | system, interactive, default, or bulk | Highest class this tenant may be admitted under. A request's effective class is min(header_class, max_class). |
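The per-request resolution amounts to a map lookup with a fallback. A minimal sketch, assuming the parsed tenant_policies map is available as a plain dictionary; `resolve_max_class` is a hypothetical name, and the sample policies mirror the YAML example on this page.

```python
from typing import Dict


def resolve_max_class(tenant_key: str,
                      tenant_policies: Dict[str, Dict[str, str]],
                      default_max_class: str = "default") -> str:
    """Pick the tenant's ceiling: its own policy if listed, else the gateway-wide default."""
    policy = tenant_policies.get(tenant_key)
    if policy is not None:
        return policy["max_class"]
    # Tenant not listed: fall back to --priority-scheduler-default-max-class.
    return default_max_class


policies = {
    "auth:acme": {"max_class": "interactive"},
    "auth:internal-cron": {"max_class": "system"},
}
```

Here `resolve_max_class("auth:acme", policies)` returns `"interactive"`, while an unlisted key such as `"auth:other"` falls back to the default passed in.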
Metrics¶
The scheduler exposes these Prometheus metrics (see the Metrics Reference for the full catalog):
| Metric | Type | Key labels | Use |
|---|---|---|---|
| smg_scheduler_admit_total | Counter | class, outcome | Admission outcomes (admitted, rejected_queue_full, rejected_queue_timeout, preempted, client_cancelled). |
| smg_scheduler_queue_wait_seconds | Histogram | class | Time spent queued before admission, timeout, or cancel. |
| smg_scheduler_preemption_total | Counter | victim_class, by_class | Successful preemptions. Authoritative preemption count. |
| smg_scheduler_clamp_total | Counter | tenant, requested_class, effective_class | Requests clamped below the class they asked for. |
| smg_scheduler_unknown_priority_value_total | Counter | tenant | Requests with an unrecognized x-smg-priority value. |
| smg_scheduler_starvation_promotion_total | Counter | class | Waiters admitted via the starvation override. |
| smg_scheduler_inflight | Gauge | class | Current in-flight requests per class. |
| smg_scheduler_queue_depth | Gauge | class | Current queued waiters per class. |
| smg_scheduler_queue_size_limit | Gauge | class | Configured queue limit per class. |
| smg_scheduler_utilization | Gauge | none | Total in-flight divided by backend capacity. |
| smg_scheduler_class_capacity_pressure | Gauge | class | Normalized 0.0–1.0 pressure (worse of queue and slot pressure). |
See also¶
- Configuration Reference: All gateway CLI flags and configuration options.
- Metrics Reference: Full catalog of Prometheus metrics.