Configuration Reference
Complete configuration reference for tuning SMG behavior.
Configuration Methods
SMG can be configured through:
Command-line arguments (highest priority)
Environment variables
Default values (lowest priority)
Worker Configuration
Host
Network interface to bind to.
Option: --host
Environment: -
Default: 0.0.0.0
Value | Description
127.0.0.1 | Localhost only
0.0.0.0 | All IPv4 interfaces
:: | All IPv6 interfaces
::1 | IPv6 localhost
Port
Port for the main API server.
Option: --port
Environment: -
Default: 30000
Worker URLs
List of worker URLs to route requests to.
Option: --worker-urls
Environment: -
Default: Empty
Format: Space-separated URLs
Examples:
--worker-urls http://worker1:8000 http://worker2:8000
--worker-urls http://[::1]:8000 http://192.168.1.1:8000
--worker-urls grpc://worker1:50051
Routing Policy Configuration
Load Balancing Policy
Controls how requests are distributed across workers.
Option: --policy
Environment: -
Default: cache_aware
Values: random, round_robin, cache_aware, power_of_two, prefix_hash, consistent_hashing, bucket, manual
Policy Comparison:
Policy | Use Case | KV Cache Reuse | Load Balance
random | Simple deployments | Poor | Fair
round_robin | Uniform workloads | Poor | Good
power_of_two | Variable workloads | Poor | Excellent
cache_aware | LLM inference | Excellent | Good
prefix_hash | Consistent routing by prefix | Good | Good
consistent_hashing | Session affinity via hash ring | Good | Good
bucket | Load balancing with bucket boundaries | Poor | Excellent
manual | Sticky sessions with LRU eviction | Good | Manual
Recommendation: Use cache_aware for LLM workloads to maximize KV cache hit rates.
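The consistent_hashing row describes session affinity via a hash ring. Below is a minimal sketch of a generic ring with virtual nodes. It assumes each request carries a routing key, for example a session ID. The key, the virtual-node count, and the SHA-256 hash are illustrative choices, not SMG's documented behavior.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    # 64-bit hash that is stable across processes (Python's hash() is salted per run).
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative, not SMG's code)."""

    def __init__(self, workers, vnodes=100):
        self.vnodes = vnodes      # hypothetical: ring points per worker
        self.points = []          # sorted list of (hash, worker)
        for worker in workers:
            self.add(worker)

    def add(self, worker):
        for i in range(self.vnodes):
            bisect.insort(self.points, (stable_hash(f"{worker}#{i}"), worker))

    def remove(self, worker):
        self.points = [p for p in self.points if p[1] != worker]

    def route(self, key):
        # First ring point at or after the key's hash, wrapping past the end.
        idx = bisect.bisect_left(self.points, (stable_hash(key), ""))
        return self.points[idx % len(self.points)][1]


ring = HashRing(["http://worker1:8000", "http://worker2:8000", "http://worker3:8000"])
print(ring.route("session-42"))  # the same key always maps to the same worker
```

Removing a worker deletes only that worker's ring points. Keys owned by the other workers therefore keep their assignment, which is what makes a ring suitable for session affinity.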
Cache-Aware Policy Options
Option | Description | Default
--cache-threshold | Cache threshold (0.0-1.0) for cache-aware routing | 0.3
--balance-abs-threshold | Absolute threshold for load balancing trigger | 64
--balance-rel-threshold | Relative threshold for load balancing trigger | 1.5
--eviction-interval | Interval in seconds between cache eviction operations | 120
--max-tree-size | Maximum size of the approximation tree | 67108864
--block-size | KV cache block size for event-driven cache-aware routing | 16
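The sketch below shows one way these thresholds could interact. It assumes the cache-aware policy behaves like the upstream SGLang router. If the load spread exceeds both --balance-abs-threshold and --balance-rel-threshold, the request goes to the least-loaded worker. Otherwise it goes to the worker with the longest prefix match when the match ratio exceeds --cache-threshold, and to the worker holding the least cached text when it does not. For simplicity, the sketch compares characters rather than tokens and keeps a plain list per worker instead of an approximate radix tree. It does not model eviction (--eviction-interval, --max-tree-size).

```python
import os
from dataclasses import dataclass, field


@dataclass
class Worker:
    url: str
    load: int = 0                                 # assumed load signal: in-flight requests
    history: list = field(default_factory=list)   # stand-in for the approximate radix tree

    def match_len(self, text):
        # Longest prefix `text` shares with anything previously routed to this worker.
        return max((len(os.path.commonprefix([seen, text])) for seen in self.history), default=0)

    def tree_size(self):
        return sum(len(seen) for seen in self.history)


def choose(workers, text, cache_threshold=0.3, balance_abs=64, balance_rel=1.5):
    loads = [w.load for w in workers]
    max_load, min_load = max(loads), min(loads)
    if max_load - min_load > balance_abs and max_load > balance_rel * min_load:
        target = min(workers, key=lambda w: w.load)             # imbalanced: shortest queue
    else:
        best = max(workers, key=lambda w: w.match_len(text))
        if best.match_len(text) / max(len(text), 1) > cache_threshold:
            target = best                                        # strong prefix match: reuse cache
        else:
            target = min(workers, key=lambda w: w.tree_size())  # most spare cache capacity
    target.history.append(text)
    return target
```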
Prefix Hash Policy Options
Option | Description | Default
--prefix-token-count | Number of prefix tokens to use for hashing | 256
--prefix-hash-load-factor | Load factor threshold for rebalancing | 1.25
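This sketch makes two assumptions about prefix_hash:

1. It picks a preferred worker from a hash of the first --prefix-token-count token IDs.
2. It falls back to the least-loaded worker when the preferred one exceeds --prefix-hash-load-factor times the average load.

Modulo placement keeps the example short. This reference does not specify how SMG actually maps hashes to workers or measures load.

```python
import hashlib


def prefix_hash_route(tokens, workers, loads, prefix_token_count=256, load_factor=1.25):
    """Pick a worker by hashing the first N token IDs (illustrative sketch).

    tokens:  list of token IDs for the request
    workers: list of worker URLs
    loads:   dict of worker URL -> in-flight requests (assumed load signal)
    """
    prefix = tokens[:prefix_token_count]
    digest = hashlib.sha256(",".join(map(str, prefix)).encode()).digest()
    preferred = workers[int.from_bytes(digest[:8], "big") % len(workers)]
    avg = sum(loads[w] for w in workers) / len(workers)
    # Rebalance only when the preferred worker is notably hotter than average.
    if avg > 0 and loads[preferred] > load_factor * avg:
        return min(workers, key=lambda w: loads[w])
    return preferred
```

Requests that share their first 256 tokens, such as a long common system prompt, land on the same worker until that worker becomes overloaded.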
Manual Policy Options
Option | Description | Default
--max-idle-secs | Maximum idle time in seconds before a routing key is evicted | 14400 (4 hours)
--assignment-mode | Mode for assigning new routing keys to workers | random
Assignment Modes:
random - Assign to a random worker
min_load - Assign to the worker with the fewest active requests
min_group - Assign to the worker with the fewest routing keys
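Below is a minimal sketch of a sticky routing-key table with the three assignment modes and idle eviction. It assumes --max-idle-secs is measured from a key's last use. The ManualRouter class and its routing keys are hypothetical. Callers are expected to keep the active counts that min_load reads up to date.

```python
import random
import time
from collections import OrderedDict


class ManualRouter:
    """Sticky routing-key -> worker map with idle eviction (illustrative sketch)."""

    def __init__(self, workers, mode="random", max_idle_secs=14400, clock=time.monotonic):
        self.workers = list(workers)
        self.mode = mode
        self.max_idle = max_idle_secs
        self.clock = clock
        self.assignments = OrderedDict()            # key -> (worker, last_used), oldest first
        self.active = {w: 0 for w in self.workers}  # in-flight requests, updated by the caller

    def _evict_idle(self, now):
        # Keys are kept in last-used order, so stop at the first key that is still fresh.
        while self.assignments:
            key, (_, last) = next(iter(self.assignments.items()))
            if now - last <= self.max_idle:
                break
            del self.assignments[key]

    def _assign(self):
        if self.mode == "min_load":
            return min(self.workers, key=lambda w: self.active[w])
        if self.mode == "min_group":
            counts = {w: 0 for w in self.workers}
            for worker, _ in self.assignments.values():
                counts[worker] += 1
            return min(self.workers, key=lambda w: counts[w])
        return random.choice(self.workers)

    def route(self, key):
        now = self.clock()
        self._evict_idle(now)
        if key in self.assignments:
            worker = self.assignments.pop(key)[0]
        else:
            worker = self._assign()
        self.assignments[key] = (worker, now)  # re-insert as most recently used
        return worker
```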
Advanced Routing Options
Option | Description | Default
--dp-aware | Enable data-parallelism-aware scheduling | false
--enable-igw | Enable IGW (Inference Gateway) mode for multi-model support | false
--dp-minimum-tokens-scheduler | Enable the minimum-tokens scheduler for data parallel groups | false
--load-monitor-interval | Interval in seconds between load monitor checks for power_of_two routing | 10
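Below is the classic "power of two choices" technique: sample two workers and keep the less loaded one. The sketch assumes power_of_two works this way. It also assumes per-worker loads come from a snapshot refreshed every --load-monitor-interval seconds rather than being queried on each request. The loads dictionary stands in for that snapshot.

```python
import random


def power_of_two(workers, loads, rng=random):
    """Sample two distinct workers and return the less loaded one (illustrative sketch)."""
    if len(workers) == 1:
        return workers[0]
    a, b = rng.sample(workers, 2)
    return a if loads[a] <= loads[b] else b
```

Comparing just two random candidates avoids scanning every worker per request while still steering traffic away from the most loaded workers.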
PD Disaggregation Configuration
Prefill-Decode disaggregated mode separates prefill and decode operations across different workers.
Enable PD Mode
Option: --pd-disaggregation
Environment: -
Default: false
Prefill Servers
Option: --prefill
Format: URL [BOOTSTRAP_PORT]
Multiple: Yes (specify multiple times)
Examples:
--prefill http://prefill1:30001 9001 \
--prefill http://prefill2:30002 9002 \
--prefill http://prefill3:30003 none
Decode Servers
Option: --decode
Format: URL
Multiple: Yes (specify multiple times)
Example:
--decode http://decode1:30003 \
--decode http://decode2:30004
PD-Specific Policies
Option | Description | Default
--prefill-policy | Policy for prefill nodes | Uses main --policy
--decode-policy | Policy for decode nodes | Uses main --policy
Worker Startup Configuration
Option | Description | Default
--worker-startup-timeout-secs | Timeout in seconds for worker startup and registration | 1800 (30 min)
--worker-startup-check-interval | Interval in seconds between worker startup checks | 30
Service Discovery (Kubernetes)
Enable Service Discovery
Option: --service-discovery
Environment: -
Default: false
Note: Enabling service discovery automatically enables IGW mode.
Label Selector
Option: --selector
Format: key=value (space-separated for multiple)
Example:
--selector app=sglang-worker tier=inference
Namespace
Option: --service-discovery-namespace
Environment: -
Default: All namespaces
Worker Port
Option: --service-discovery-port
Environment: -
Default: 80
PD Service Discovery Selectors
Option | Description
--prefill-selector | Label selector for prefill server pods
--decode-selector | Label selector for decode server pods
HA Mesh Router Discovery
Option | Description
--router-selector | Label selector for router pod discovery in HA mesh mode (format: key=value)
Per-Worker Model ID Override
Option | Description
--model-id-from | Override each worker's model_id from pod metadata. Accepted values: namespace, label:<key>, or annotation:<key>.
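This sketch resolves the three accepted --model-id-from forms against pod metadata shaped like Kubernetes ObjectMeta. The function name is hypothetical. Falling back to a default when the label or annotation is missing is also an assumption, not documented behavior.

```python
def resolve_model_id(spec, metadata, default=None):
    """Resolve a worker's model_id from pod metadata per a --model-id-from value.

    metadata: {"namespace": str, "labels": {...}, "annotations": {...}}
    """
    if spec == "namespace":
        return metadata.get("namespace", default)
    kind, sep, key = spec.partition(":")  # split on the first colon only
    if sep and key and kind == "label":
        return metadata.get("labels", {}).get(key, default)
    if sep and key and kind == "annotation":
        return metadata.get("annotations", {}).get(key, default)
    raise ValueError(f"unsupported --model-id-from value: {spec!r}")
```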
Tokenizer Configuration
Model Path
Option: --model-path
Environment: -
Default: None
Description: HuggingFace model ID or local path for loading the tokenizer
Tokenizer Path
Option: --tokenizer-path
Environment: -
Default: None
Description: Explicit tokenizer path (overrides the tokenizer from --model-path)
Chat Template
Option: --chat-template
Environment: -
Default: None
Description: Path to chat template file
Disable Tokenizer Autoload
Option: --disable-tokenizer-autoload
Environment: -
Default: false
Description: Disable automatic tokenizer loading at startup and during worker registration. Useful when tokenizers are loaded on demand via the API.
Tokenizer Cache (L0 - Exact Match)
Option | Description | Default
--tokenizer-cache-enable-l0 | Enable L0 exact-match cache | false
--tokenizer-cache-l0-max-entries | Maximum entries in L0 cache | 10000
Tokenizer Cache (L1 - Prefix Matching)
Option | Description | Default
--tokenizer-cache-enable-l1 | Enable L1 prefix-matching cache | false
--tokenizer-cache-l1-max-memory | Maximum memory for L1 cache (bytes) | 52428800 (50MB)
Parser Configuration
Reasoning Parser
Option: --reasoning-parser
Environment: -
Default: None
Values: deepseek-r1, qwen3, etc.
Description: Parser for reasoning models with thinking tokens
Tool Call Parser
Option: --tool-call-parser
Environment: -
Default: None
Values: json, qwen, etc.
Description: Parser for tool-call/function-calling interactions
MCP Configuration
MCP Config Path
Option: --mcp-config-path
Environment: -
Default: None
Description: Path to MCP (Model Context Protocol) server configuration file
Backend Configuration
Backend Runtime
Option: --backend
Environment: -
Default: None (auto-detected)
Values: sglang, vllm, trtllm, openai, anthropic, gemini
History Backend
Option: --history-backend
Environment: -
Default: memory
Values: memory, none, oracle, postgres, redis
Storage Configuration
Oracle Database
Option | Environment | Description
--oracle-wallet-path | ATP_WALLET_PATH | Path to Oracle ATP wallet directory
--oracle-tns-alias | ATP_TNS_ALIAS | Oracle TNS alias from tnsnames.ora
--oracle-dsn | ATP_DSN | Oracle connection descriptor/DSN
--oracle-user | ATP_USER | Oracle database username
--oracle-password | ATP_PASSWORD | Oracle database password
--oracle-external-auth | ATP_EXTERNAL_AUTH | Enable Oracle external authentication (default: false)
--oracle-pool-min | ATP_POOL_MIN | Minimum connection pool size (default: 1)
--oracle-pool-max | ATP_POOL_MAX | Maximum connection pool size (default: 16)
--oracle-pool-timeout-secs | ATP_POOL_TIMEOUT_SECS | Pool timeout in seconds (default: 30)
PostgreSQL Database
Option | Environment | Description | Default
--postgres-db-url | POSTGRES_DB_URL | PostgreSQL connection URL | -
--postgres-pool-max-size | POSTGRES_POOL_MAX | Maximum pool size | 16
Redis Database
Option | Environment | Description | Default
--redis-url | REDIS_URL | Redis connection URL | -
--redis-pool-max-size | REDIS_POOL_MAX | Maximum pool size | 16
--redis-retention-days | REDIS_RETENTION_DAYS | Data retention in days (-1 for persistent) | 30
WASM Configuration
Enable WebAssembly
Option: --enable-wasm
Environment: -
Default: false
Description: Enable WebAssembly support
Storage Hook WASM Component
Option: --storage-hook-wasm-path
Environment: -
Default: None
Description: Path to a WASM component implementing storage hooks. When set, all storage backends are wrapped with hook-based interceptors.
Schema Config File
Option: --schema-config
Environment: -
Default: None
Description: Path to a YAML schema config file for storage table/column remapping.
WebRTC Configuration
Option | Description | Default
--webrtc-bind-addr | Bind address for WebRTC UDP sockets (the client-facing ICE candidate IP). Set to 127.0.0.1 for local development on a single machine. | 0.0.0.0 (auto-detect via routing table)
--webrtc-stun-server | STUN server for ICE candidate gathering (host:port). Point this at your own STUN server in enterprise deployments that block outbound traffic to external STUN servers. | stun.l.google.com:19302
Mesh Server Configuration
High-availability mesh networking for multi-router coordination.
Option | Description | Default
--enable-mesh | Enable the mesh server for HA multi-router coordination. Requires at least two SMG instances. | false
--mesh-server-name | Name for this mesh node. If not set, a random name is generated (e.g., Mesh_a1b2). | Auto-generated
--mesh-host | Bind address for the mesh server. | 0.0.0.0
--mesh-advertise-host | Routable address advertised to other mesh peers. Required when --mesh-host is an unspecified bind address such as 0.0.0.0. | Value of --mesh-host
--mesh-port | Port for the mesh server. | 39527
--mesh-peer-urls | Peer mesh node addresses to join (format: host:port). Used for initial cluster formation. | (none)
Example:
smg \
--enable-mesh \
--mesh-server-name router-1 \
--mesh-advertise-host 192.168.1.10 \
--mesh-port 39527 \
--mesh-peer-urls 192.168.1.11:39527
Request Handling Configuration
Request Timeout
Option: --request-timeout-secs
Environment: -
Default: 1800 (30 minutes)
Description: Maximum time for request processing
Shutdown Grace Period
Option: --shutdown-grace-period-secs
Environment: -
Default: 180 (3 minutes)
Description: Time to wait for in-flight requests during shutdown
Maximum Payload Size
Option: --max-payload-size
Environment: -
Default: 536870912 (512MB)
Description: Maximum request payload size in bytes
CORS Configuration
Option: --cors-allowed-origins
Environment: -
Default: Empty
Format: Space-separated URLs
Example:
--cors-allowed-origins http://localhost:3000 https://example.com
Request ID Headers
Option: --request-id-headers
Environment: -
Default: None (uses common defaults)
Description: Custom HTTP headers to check for request IDs
Example:
--request-id-headers x-request-id x-trace-id x-correlation-id
Storage Context Headers
Option: --storage-context-headers
Environment: -
Default: Empty
Format: Space-separated header=context_key entries
Description: Maps request headers into the storage hook request context
Example:
--storage-context-headers x-tenant-id=tenant_id x-user-id=user_id
This lets storage hooks read values such as tenant_id and user_id from the request context without hard-coding specific headers in the gateway.
Only map headers that a trusted upstream injects or sanitizes. Otherwise, clients can spoof storage hook request context values by sending those headers themselves.
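This sketch shows how header=context_key entries could be parsed and applied. It assumes header names are matched case-insensitively, as HTTP requires, and that absent headers are simply skipped. Both function names are hypothetical.

```python
def parse_context_mappings(entries):
    """Parse --storage-context-headers values ("header=context_key") into a dict."""
    mappings = {}
    for entry in entries:
        header, sep, key = entry.partition("=")
        if not sep or not header or not key:
            raise ValueError(f"expected header=context_key, got {entry!r}")
        mappings[header.strip().lower()] = key.strip()  # HTTP header names are case-insensitive
    return mappings


def build_context(request_headers, mappings):
    """Copy mapped header values into a storage hook context dict (illustrative sketch)."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    return {key: lowered[header] for header, key in mappings.items() if header in lowered}
```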
Rate Limiting Configuration
Concurrent Request Limit
Option: --max-concurrent-requests
Environment: -
Default: -1 (unlimited)
Range: -1 or 1+
Sizing Guide:
max_concurrent_requests = num_workers * requests_per_worker_capacity
Worker GPU Memory | Suggested Concurrent Requests per Worker
16GB | 4-8
40GB | 8-16
80GB | 16-32
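The helper below applies the sizing formula to the GPU-memory table. The dictionary transcribes that table, and max_concurrent_range is a hypothetical name. The result is only the suggested range, not a measured capacity.

```python
# Suggested concurrent requests per worker, keyed by worker GPU memory in GB (from the table).
SUGGESTED_PER_WORKER = {16: (4, 8), 40: (8, 16), 80: (16, 32)}


def max_concurrent_range(num_workers, gpu_memory_gb):
    """Return the (low, high) suggestion for --max-concurrent-requests."""
    low, high = SUGGESTED_PER_WORKER[gpu_memory_gb]
    return num_workers * low, num_workers * high
```

For example, four workers on 80GB GPUs give a suggested --max-concurrent-requests between 64 and 128.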
Queue Configuration
Option | Description | Default
--queue-size | Maximum requests waiting when the rate limit is reached | 100
--queue-timeout-secs | Maximum time in seconds a request can wait in the queue | 60
Token Bucket Rate Limiting
Option: --rate-limit-tokens-per-second
Environment: -
Default: Same as --max-concurrent-requests
Description: Token bucket refill rate (tokens per second)
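Below is a textbook token bucket. It assumes the bucket's capacity (burst size) equals --max-concurrent-requests, since this reference specifies only the refill rate. The injectable clock lets the behavior be checked deterministically.

```python
import time


class TokenBucket:
    """Classic token bucket (illustrative; the capacity choice is an assumption)."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.rate = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.last = clock()

    def try_acquire(self, n=1):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```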
Retry Configuration
Retry Options
Option | Description | Default
--retry-max-retries | Maximum retry attempts | 5
--retry-initial-backoff-ms | Initial backoff delay (ms) | 50
--retry-max-backoff-ms | Maximum backoff delay (ms) | 30000
--retry-backoff-multiplier | Exponential backoff multiplier | 1.5
--retry-jitter-factor | Jitter factor (0.0-1.0) | 0.2
--disable-retries | Disable automatic retries | false
Backoff Formula:
delay = min(initial_backoff * multiplier^attempt, max_backoff) * (1 + random(0, jitter_factor))
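The function below transcribes the backoff formula directly. It assumes attempt is 0 for the first retry and that random(0, jitter_factor) is a uniform draw. Defaults mirror the retry table.

```python
import random


def backoff_ms(attempt, initial_ms=50, multiplier=1.5, max_ms=30000, jitter=0.2, rng=random):
    """Delay in ms before retry `attempt` (0-based), following the formula above."""
    base = min(initial_ms * multiplier ** attempt, max_ms)
    return base * (1 + rng.uniform(0, jitter))
```

With the defaults and no jitter, the first five delays are 50, 75, 112.5, 168.75, and 253.125 ms. Jitter lengthens each delay by up to 20%, and the 30000 ms cap only applies far beyond the default five retries.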
Circuit Breaker Configuration
Option | Description | Default
--cb-failure-threshold | Failures before the circuit opens | 10
--cb-success-threshold | Successes needed to close the circuit from the half-open state | 3
--cb-timeout-duration-secs | Time in seconds before attempting recovery | 60
--cb-window-duration-secs | Sliding window in seconds for tracking failures | 120
--disable-circuit-breaker | Disable circuit breaker | false
Circuit Breaker States:
Closed: Normal operation; failures are tracked
Open: All requests fail fast; the circuit has tripped
Half-Open: Trial requests test whether the service has recovered
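Below is a minimal state machine using the option names above. It rests on three assumptions:

1. Failures are counted within a sliding window of --cb-window-duration-secs.
2. The circuit moves to half-open after --cb-timeout-duration-secs.
3. Any failure while half-open re-opens the circuit.

This reference does not state the third rule.

```python
import time
from collections import deque


class CircuitBreaker:
    """Closed / Open / Half-Open breaker (illustrative sketch, not SMG's code)."""

    def __init__(self, failure_threshold=10, success_threshold=3,
                 timeout_secs=60, window_secs=120, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout = timeout_secs
        self.window = window_secs
        self.clock = clock
        self.state = "closed"
        self.failures = deque()   # timestamps of failures inside the window
        self.successes = 0        # consecutive successes while half-open
        self.opened_at = 0.0

    def allow_request(self):
        if self.state == "open" and self.clock() - self.opened_at >= self.timeout:
            self.state, self.successes = "half_open", 0  # start probing for recovery
        return self.state != "open"

    def record_success(self):
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "closed"
                self.failures.clear()

    def record_failure(self):
        now = self.clock()
        if self.state == "half_open":
            self._trip(now)  # assumption: any probe failure re-opens the circuit
            return
        self.failures.append(now)
        while self.failures and now - self.failures[0] > self.window:
            self.failures.popleft()  # drop failures outside the sliding window
        if len(self.failures) >= self.failure_threshold:
            self._trip(now)

    def _trip(self, now):
        self.state, self.opened_at = "open", now
        self.failures.clear()
```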
Health Check Configuration
Option | Description | Default
--health-failure-threshold | Failures before marking a worker unhealthy | 3
--health-success-threshold | Successes before marking a worker healthy | 2
--health-check-timeout-secs | Timeout in seconds for health check requests | 5
--health-check-interval-secs | Interval in seconds between health checks | 60
--health-check-endpoint | Health check endpoint path | /health
--disable-health-check | Disable all health checks | false
--remove-unhealthy-workers | Remove workers from the registry when health checks mark them unhealthy. Useful for ephemeral worker pools where failed workers should be deregistered. | false
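This sketch shows how the two health thresholds could gate state changes. It assumes they count consecutive probe results, which this reference does not specify. It also assumes a probe exceeding --health-check-timeout-secs is recorded as a failure. With --remove-unhealthy-workers, the healthy-to-unhealthy transition is the point where a worker would be deregistered.

```python
class HealthTracker:
    """Consecutive-result health state for one worker (illustrative sketch)."""

    def __init__(self, failure_threshold=3, success_threshold=2):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.healthy = True
        self.consecutive_failures = 0
        self.consecutive_successes = 0

    def observe(self, ok):
        """Record one probe of the health endpoint and return the current health."""
        if ok:
            self.consecutive_successes += 1
            self.consecutive_failures = 0
            if not self.healthy and self.consecutive_successes >= self.success_threshold:
                self.healthy = True
        else:
            self.consecutive_failures += 1
            self.consecutive_successes = 0
            if self.healthy and self.consecutive_failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy
```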
Prometheus Metrics Configuration
Metrics Server
Option | Description | Default
--prometheus-port | Port for the Prometheus metrics endpoint | 29000
--prometheus-host | Host for the Prometheus metrics server | 0.0.0.0
--prometheus-duration-buckets | Custom histogram buckets | Default buckets
Example:
--prometheus-duration-buckets 0.001 0.005 0.01 0.025 0.05 0.1 0.25 0.5 1.0 2.5 5.0 10.0
OpenTelemetry Configuration
Enable Tracing
Option: --enable-trace
Environment: -
Default: false
OTLP Endpoint
Option: --otlp-traces-endpoint
Environment: -
Default: localhost:4317
Format: host:port
Example:
smg --enable-trace --otlp-traces-endpoint jaeger:4317
TLS/mTLS Security Configuration
Server TLS
For HTTPS on the gateway:
Option | Description
--tls-cert-path | Path to server certificate (PEM format)
--tls-key-path | Path to server private key (PEM format)
Client mTLS
For secure communication to workers (Python bindings):
Option | Description
--client-cert-path | Path to client certificate
--client-key-path | Path to client private key
--ca-cert-paths | Path(s) to CA certificate(s)
Control Plane Authentication
API Key (Worker Authorization)
Option: --api-key
Environment: -
Default: None
Description: API key for worker authorization (useful with dp-aware scheduling)
Control Plane API Keys
Option: --control-plane-api-keys
Environment: CONTROL_PLANE_API_KEYS
Format: id:name:role:key
Multiple: Yes
Example:
--control-plane-api-keys 'key1:Admin:admin:secret123' 'key2:ReadOnly:user:secret456'
JWT/OIDC Authentication
Option | Environment | Description
--jwt-issuer | JWT_ISSUER | OIDC issuer URL
--jwt-audience | JWT_AUDIENCE | Expected audience claim
--jwt-jwks-uri | JWT_JWKS_URI | Explicit JWKS URI (auto-discovered if not set)
--jwt-role-claim | - | JWT claim containing the role (default: roles)
--jwt-role-mapping | - | Role mapping from IdP role to gateway role
JWT Role Mapping Example:
--jwt-role-mapping 'Gateway.Admin=admin' 'Gateway.User=user'
Audit Logging
Option: --disable-audit-logging
Environment: -
Default: false (audit logging enabled)
Logging Configuration
Log Level
Option: --log-level
Environment: RUST_LOG
Default: info
Values: debug, info, warn, error
Per-Module Logging:
RUST_LOG=smg=debug,hyper=warn smg ...
Log Directory
Option: --log-dir
Environment: -
Default: None (console only)
Description: Directory to store log files
JSON Logs
Option: --log-json
Environment: -
Default: false
Description: Output logs as structured JSON instead of the default human-readable text.
Configuration Examples
Minimal Configuration
smg --worker-urls http://localhost:8000
High-Throughput Configuration
smg \
--worker-urls http://w1:8000 http://w2:8000 http://w3:8000 http://w4:8000 \
--policy cache_aware \
--max-concurrent-requests 200 \
--queue-size 400 \
--queue-timeout-secs 60 \
--retry-max-retries 3
Low-Latency Configuration
smg \
--worker-urls http://w1:8000 http://w2:8000 \
--policy power_of_two \
--max-concurrent-requests 50 \
--queue-size 25 \
--queue-timeout-secs 5 \
--health-check-interval-secs 5 \
--request-timeout-secs 30
PD Disaggregated Mode
smg \
--pd-disaggregation \
--prefill http://prefill1:30001 9001 \
--prefill http://prefill2:30002 9002 \
--decode http://decode1:30003 \
--decode http://decode2:30004 \
--prefill-policy cache_aware \
--decode-policy round_robin
Kubernetes Service Discovery
smg \
--service-discovery \
--selector app=sglang-worker \
--service-discovery-namespace inference \
--service-discovery-port 8000 \
--policy cache_aware
High-Availability Mesh
# Router 1 (192.168.1.10)
smg \
--enable-mesh \
--mesh-server-name router-1 \
--mesh-advertise-host 192.168.1.10 \
--mesh-port 39527 \
--mesh-peer-urls 192.168.1.11:39527 \
--worker-urls http://worker1:8000
# Router 2 (192.168.1.11)
smg \
--enable-mesh \
--mesh-server-name router-2 \
--mesh-advertise-host 192.168.1.11 \
--mesh-port 39527 \
--mesh-peer-urls 192.168.1.10:39527 \
--worker-urls http://worker2:8000
Secure Production Configuration
smg \
--service-discovery \
--selector app=sglang-worker \
--service-discovery-namespace inference \
--policy cache_aware \
--max-concurrent-requests 100 \
--tls-cert-path /etc/certs/server.crt \
--tls-key-path /etc/certs/server.key \
--jwt-issuer https://login.microsoftonline.com/tenant/v2.0 \
--jwt-audience api://smg-gateway \
--jwt-role-mapping 'Gateway.Admin=admin' 'Gateway.User=user' \
--enable-trace \
--otlp-traces-endpoint jaeger:4317 \
--host 0.0.0.0 \
--port 443
With Tokenizer and Parsers
smg \
--worker-urls http://localhost:8000 \
--model-path meta-llama/Llama-3-8B-Instruct \
--tokenizer-cache-enable-l0 \
--tokenizer-cache-l0-max-entries 50000 \
--reasoning-parser deepseek-r1 \
--tool-call-parser json
With Database Backend
# PostgreSQL history backend
smg \
--worker-urls http://localhost:8000 \
--history-backend postgres \
--postgres-db-url "postgres://user:pass@localhost:5432/smg" \
--postgres-pool-max-size 32
# Redis history backend
smg \
--worker-urls http://localhost:8000 \
--history-backend redis \
--redis-url "redis://localhost:6379" \
--redis-pool-max-size 32 \
--redis-retention-days 7
Environment Variable Reference
Environment Variable | CLI Option | Description
RUST_LOG | --log-level | Log level
ATP_WALLET_PATH | --oracle-wallet-path | Oracle wallet path
ATP_TNS_ALIAS | --oracle-tns-alias | Oracle TNS alias
ATP_DSN | --oracle-dsn | Oracle DSN
ATP_USER | --oracle-user | Oracle username
ATP_PASSWORD | --oracle-password | Oracle password
ATP_EXTERNAL_AUTH | --oracle-external-auth | Enable Oracle external authentication
ATP_POOL_MIN | --oracle-pool-min | Oracle min pool size
ATP_POOL_MAX | --oracle-pool-max | Oracle max pool size
ATP_POOL_TIMEOUT_SECS | --oracle-pool-timeout-secs | Oracle pool timeout
POSTGRES_DB_URL | --postgres-db-url | PostgreSQL URL
POSTGRES_POOL_MAX | --postgres-pool-max-size | PostgreSQL max pool
REDIS_URL | --redis-url | Redis URL
REDIS_POOL_MAX | --redis-pool-max-size | Redis max pool
REDIS_RETENTION_DAYS | --redis-retention-days | Redis retention
JWT_ISSUER | --jwt-issuer | JWT issuer URL
JWT_AUDIENCE | --jwt-audience | JWT audience
JWT_JWKS_URI | --jwt-jwks-uri | JWKS URI
CONTROL_PLANE_API_KEYS | --control-plane-api-keys | Control plane API keys