The High-performance inference gateway for production LLM deployments

Route, balance, and orchestrate traffic across your LLM fleet with enterprise-grade reliability.

Get Started View on GitHub

TIFT REDUCTION

Reduction in Inference Latency

ROUTING LATENCY

< ms

Routing Overhead

METRICS

Metrics Available

OPEN AI COMPATIBLE

Open AI Compatible

WORKS WITH

WHY SHEPHERD MODEL GATEWAY?

SMG sits between your applications and LLM workers, providing a unified control and data plane for managing inference at scale. Whether you're running a single model or orchestrating hundreds of workers across multiple clusters, SMG gives you the tools to do it reliably.

HOW IT WORKS

Gateway Layer

Gateway

Rate Limiter
OIDC Auth
WebAssembly
Metrics
OpenTelemetry Support
Multi-Tenant

Router Layer

Router

Tokenization
Chat History
Function Calls
Reasoning
MCP Execution
Load Balancing
Circuit Breaker

One APIAny BackendEnterprise Reliability

CHOOSE YOUR PATH

Start here to understand what SMG does and get it running in minutes.

New to SMG

Understand SMG's architecture, routing strategies, and reliability features.

Learn the Concepts

Continue onboarding with monitoring, logging, and TLS guides.

Ops Setup

Complete reference for the OpenAI-compatible API and SMG extensions.

API Reference