The High-performance inference gateway for production LLM deployments

Route, balance, and orchestrate traffic across your LLM fleet with enterprise-grade reliability.

%

Reduction in Inference Latency

< ms

Routing Overhead

+

Metrics Available

%

Open AI Compatible

SMG sits between your applications and LLM workers, providing a unified control and data plane for managing inference at scale. Whether you're running a single model or orchestrating hundreds of workers across multiple clusters, SMG gives you the tools to do it reliably.

Gateway Layer

Gateway
  • Rate Limiter
  • OIDC Auth
  • WebAssembly
  • Metrics
  • OpenTelemetry Support
  • Multi-Tenant

Router Layer

Router
  • Tokenization
  • Chat History
  • Function Calls
  • Reasoning
  • MCP Execution
  • Load Balancing
  • Circuit Breaker

One APIAny BackendEnterprise Reliability

SMG sits between your applications and LLM workers, providing a unified control and data plane for managing inference at scale. Whether you're running a single model or orchestrating hundreds of workers across multiple clusters, SMG gives you the tools to do it reliably.

Start here to understand what SMG does and get it running in minutes.

New to SMG

Understand SMG's architecture, routing strategies, and reliability features.

Learn the Concepts

Continue onboarding with monitoring, logging, and TLS guides.

Ops Setup

Complete reference for the OpenAI-compatible API and SMG extensions.

API Reference