Managed LLM Relay for
Your AI Workers

Route inference requests to your GPU workers through a secure, authenticated relay. Always-on, with built-in request queueing, streaming, and cancellation.

Get Started →

Why ModelRelay?

Secure WebSocket Relay

Workers connect via authenticated WebSocket from anywhere — behind NAT, in the cloud, or on bare metal. No inbound ports required.

OpenAI-Compatible API

Drop-in replacement for any OpenAI client. Point your SDK at ModelRelay and your requests are routed to your own workers.
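As a sketch of what "drop-in" means here: any OpenAI-compatible client sends a chat completion payload like the one below. The relay URL and model name are placeholders for illustration, not real ModelRelay endpoints.

```python
import json

# Placeholder relay URL -- substitute the URL of your own ModelRelay instance.
RELAY_BASE_URL = "https://my-relay.example.com/v1"

# An OpenAI-compatible chat completion request body. Any OpenAI SDK pointed
# at RELAY_BASE_URL sends a payload of this shape; the relay forwards it to
# one of your connected workers.
payload = {
    "model": "my-local-model",  # the model your worker serves (placeholder name)
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,             # stream tokens back through the relay
}

request_body = json.dumps(payload)
print(request_body)
```

With the official Python SDK, switching to the relay is a single line of configuration, e.g. `client = OpenAI(base_url=RELAY_BASE_URL, api_key="...")`; the rest of your code stays unchanged.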

Managed Infrastructure

Hosted on reliable Kubernetes infrastructure. No ops burden — we handle uptime, TLS, scaling, and monitoring so you can focus on models.

Simple Pricing

One plan. Everything included.

$20
per month
  • One managed relay instance
  • Unlimited workers
  • Unlimited requests
  • WebSocket streaming
  • Request queueing & cancellation
  • TLS & authentication included
Get Started →