<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://aigateway.envoyproxy.io/blog</id>
    <title>Envoy AI Gateway Blog</title>
    <updated>2025-12-08T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://aigateway.envoyproxy.io/blog"/>
    <subtitle>Envoy AI Gateway Blog</subtitle>
    <icon>https://aigateway.envoyproxy.io/img/favicon.ico</icon>
    <entry>
        <title type="html"><![CDATA[The Reality and Performance of MCP Traffic Routing with Envoy AI Gateway]]></title>
        <id>https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway</id>
        <link href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway"/>
        <updated>2025-12-08T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[In this article, we explore how Envoy AI Gateway handles stateful MCP sessions, keeps performance competitive, and stays aligned with the broader Envoy ecosystem.]]></summary>
        <content type="html"><![CDATA[<p><img decoding="async" loading="lazy" src="https://aigateway.envoyproxy.io/assets/images/mcp-routing-feature-d7db3e6242e2cc1000f3c076af1930e0.png" width="1200" height="628" class="img_ev3q"></p>
<p>Envoy AI Gateway (AIGW) provides a production-ready bridge between AI agents and their tools by handling Model Context Protocol (MCP) traffic. As teams adopt MCP, questions about scale, performance, and architecture naturally arise.</p>
<p>This post addresses those questions by first clearing up common misunderstandings, then diving into the actual architecture, the design choices we made (and why), and how you can test and evaluate whether it is the right solution for your system.</p>
<p><strong>This post will give you the context you need to:</strong></p>
<ul>
<li class="">Evaluate MCP routing in Envoy AI Gateway with realistic expectations</li>
<li class="">Understand the design decisions we made, why we made them, and how they affect you</li>
<li class="">Learn how to configure and tune MCP routing in Envoy AI Gateway to meet your needs</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="part-1-common-misconceptions"><strong>Part 1: Common Misconceptions</strong><a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#part-1-common-misconceptions" class="hash-link" aria-label="Direct link to part-1-common-misconceptions" title="Direct link to part-1-common-misconceptions" translate="no">​</a></h2>
<p>Before discussing how it works, let’s address how people <em>think</em> it works. There are a few frequent misconceptions about how Envoy AI Gateway handles MCP traffic.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="misconception-1-ai-gateways-mcp-implementation-is-slow"><strong>Misconception 1:</strong> "AI Gateway's MCP implementation is slow."<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#misconception-1-ai-gateways-mcp-implementation-is-slow" class="hash-link" aria-label="Direct link to misconception-1-ai-gateways-mcp-implementation-is-slow" title="Direct link to misconception-1-ai-gateways-mcp-implementation-is-slow" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="reality">Reality<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#reality" class="hash-link" aria-label="Direct link to Reality" title="Direct link to Reality" translate="no">​</a></h4>
<p>The AIGW MCP implementation offers performance comparable to other cloud-native solutions while providing full access to all Envoy traffic-handling features.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-nuance">The Nuance<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#the-nuance" class="hash-link" aria-label="Direct link to The Nuance" title="Direct link to The Nuance" translate="no">​</a></h4>
<p>The default configuration settings balance performance, functionality, and security. If you want to further reduce latency based on your use case, you can easily adjust the configuration to prioritize raw speed in internal or low-latency environments.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="misconception-2-envoy-ai-gateway-is-a-separate-project-that-ignores-core-envoy"><strong>Misconception 2:</strong> "Envoy AI Gateway is a separate project that ignores core Envoy."<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#misconception-2-envoy-ai-gateway-is-a-separate-project-that-ignores-core-envoy" class="hash-link" aria-label="Direct link to misconception-2-envoy-ai-gateway-is-a-separate-project-that-ignores-core-envoy" title="Direct link to misconception-2-envoy-ai-gateway-is-a-separate-project-that-ignores-core-envoy" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="reality-1">Reality<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#reality-1" class="hash-link" aria-label="Direct link to Reality" title="Direct link to Reality" translate="no">​</a></h4>
<p>Envoy AI Gateway is deeply integrated into the Envoy Ecosystem. It extends the Envoy Gateway control plane and leverages Envoy Proxy data-plane extensions.</p>
<p>It is not a fork or a side project running in isolation; it is designed to feed proven implementations back into Envoy core, accelerating the adoption of emerging patterns like MCP that require faster iteration than the core release cycle typically allows.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="misconception-3-envoy-is-not-able-to-handle-ai-traffic"><strong>Misconception 3:</strong> “Envoy is not able to handle AI traffic.”<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#misconception-3-envoy-is-not-able-to-handle-ai-traffic" class="hash-link" aria-label="Direct link to misconception-3-envoy-is-not-able-to-handle-ai-traffic" title="Direct link to misconception-3-envoy-is-not-able-to-handle-ai-traffic" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="reality-2">Reality<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#reality-2" class="hash-link" aria-label="Direct link to Reality" title="Direct link to Reality" translate="no">​</a></h4>
<p>The Envoy proxy's architecture is ideally suited for managing AI traffic, including use cases like MCP. Both LLM and MCP traffic ultimately rely on HTTP. As a cloud-native, battle-tested HTTP proxy utilized at scale for nearly a decade, Envoy is a proven solution.</p>
<p>Furthermore, the <a href="https://github.com/search?q=repo%3Aenvoyproxy%2Fenvoy+mcp&amp;type=pullrequests&amp;s=created&amp;o=desc" target="_blank" rel="noopener noreferrer" class="">Envoy community is actively working</a> toward making the core Envoy proxy fully MCP-compliant. This future compatibility means Envoy AI Gateway users will be able to leverage the standard Envoy proxy for their data plane needs.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-nuance-1">The Nuance<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#the-nuance-1" class="hash-link" aria-label="Direct link to The Nuance" title="Direct link to The Nuance" translate="no">​</a></h4>
<p>Currently, Envoy AI Gateway leverages Envoy Proxy extensions to fill functionality gaps and enable MCP routing via Envoy Proxy. This is possible thanks to Envoy Proxy's extensible architecture, which lets us respond rapidly to the traffic-handling challenges that the new era of AI presents.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="part-2-the-actual-architecture"><strong>Part 2: The Actual Architecture</strong><a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#part-2-the-actual-architecture" class="hash-link" aria-label="Direct link to part-2-the-actual-architecture" title="Direct link to part-2-the-actual-architecture" translate="no">​</a></h2>
<p>The core challenge of MCP is that it is currently a <strong>stateful protocol</strong>. A session ID must be reused across calls so the server can maintain context. When you place a gateway in the middle, a single client session might need to route to multiple upstream MCP servers (e.g., GitHub, Jira, local files), each maintaining its own independent session.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-stateless-mcp-routing-design">The "Stateless" MCP Routing Design<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#the-stateless-mcp-routing-design" class="hash-link" aria-label="Direct link to The &quot;Stateless&quot; MCP Routing Design" title="Direct link to The &quot;Stateless&quot; MCP Routing Design" translate="no">​</a></h3>
<p>Instead of maintaining persistent session mappings in a centralized session store, Envoy AI Gateway uses a <strong>token-encoding architecture</strong>.</p>
<ol>
<li class=""><strong>Initialization:</strong> When an agent initializes an MCP session, the gateway establishes upstream sessions with the backend servers (e.g., GitHub, Jira).</li>
<li class=""><strong>Secure session encoding:</strong> The gateway builds a compact description of these upstream sessions and wraps it into a safe, self-contained client session ID. This ensures the session mapping is tamper-proof and that the client cannot see the internal topology.</li>
<li class=""><strong>Routing:</strong> The client receives this session ID. On subsequent requests, the client returns it. Any gateway replica can decode it and immediately route traffic to the correct upstream servers.</li>
</ol>
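<p>To make the flow above concrete, here is a minimal, hypothetical Python sketch of the token-encoding idea. It is <em>not</em> the gateway's actual code: Envoy AI Gateway encrypts the payload so clients cannot see the internal topology, whereas this standard-library sketch only signs it with HMAC to make it tamper-evident. The secret key and session IDs are placeholders.</p>

```python
import base64
import hashlib
import hmac
import json

SECRET = b"gateway-shared-secret"  # hypothetical key shared by all gateway replicas

def encode_session(upstream_sessions: dict) -> str:
    """Pack upstream session IDs into one self-contained, tamper-evident token."""
    payload = json.dumps(upstream_sessions, sort_keys=True).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + payload).decode()

def decode_session(token: str) -> dict:
    """Any replica holding SECRET can verify and unpack the token statelessly."""
    raw = base64.urlsafe_b64decode(token.encode())
    tag, payload = raw[:32], raw[32:]
    if not hmac.compare_digest(tag, hmac.new(SECRET, payload, hashlib.sha256).digest()):
        raise ValueError("tampered session token")
    return json.loads(payload)

# Initialization: the gateway opens upstream sessions, then returns one client session ID.
client_session_id = encode_session({"github": "sess-abc", "jira": "sess-xyz"})

# Routing: any replica decodes the ID and routes to the right upstream sessions.
assert decode_session(client_session_id)["github"] == "sess-abc"
```

<p>Because the token carries the whole upstream-session mapping and every replica shares the key, no replica needs a session store to route correctly.</p>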
<p><img decoding="async" loading="lazy" src="https://aigateway.envoyproxy.io/assets/images/mcp-routing-47d4c56aeb8b3335a74aba6a88e104be.png" width="1042" height="327" class="img_ev3q"></p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>Tune performance to your needs</div><div class="admonitionContent_BuS1"><p>This design allows Envoy AI Gateway users to further configure and tune performance to meet their needs.</p></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="understanding-and-tuning-performance">Understanding and Tuning Performance<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#understanding-and-tuning-performance" class="hash-link" aria-label="Direct link to Understanding and Tuning Performance" title="Direct link to Understanding and Tuning Performance" translate="no">​</a></h3>
<p>The performance overhead users may observe stems from the key‑derivation function (KDF) used during session encryption, not from MCP routing itself. Envoy AI Gateway encrypts the session ID to prevent leaking internal details to MCP clients.</p>
<p>The default settings favor stronger key derivation over raw speed, leading to two regimes:</p>
<ul>
<li class=""><strong>Default encryption settings (e.g., 100k KDF iterations):</strong> Adds on the order of <strong>tens of milliseconds</strong> of overhead per new session.</li>
<li class=""><strong>Tuned encryption settings (e.g., ~100 iterations):</strong> Overhead drops to around <strong>1–2 ms per session,</strong> comparable to other MCP gateways.</li>
</ul>
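<p>The effect of the iteration count is easy to reproduce. The following standard-library Python sketch is illustrative only; it is not the gateway's actual KDF code, and the salt and input material are placeholders. It times a PBKDF2 derivation at the two regimes described above.</p>

```python
import hashlib
import time

def derive_key(session_secret: bytes, iterations: int) -> bytes:
    # PBKDF2-HMAC-SHA256; the iteration count trades CPU cost for brute-force resistance.
    return hashlib.pbkdf2_hmac("sha256", session_secret, b"fixed-salt", iterations, dklen=32)

def time_kdf(iterations: int) -> float:
    start = time.perf_counter()
    derive_key(b"session-material", iterations)
    return time.perf_counter() - start

slow = time_kdf(100_000)  # default-like setting: tens of milliseconds on typical hardware
fast = time_kdf(100)      # tuned setting: a tiny fraction of a millisecond of KDF work
print(f"100k iterations: {slow * 1000:.1f} ms, 100 iterations: {fast * 1000:.3f} ms")
```

<p>The per-session overhead scales roughly linearly with the iteration count, which is why dropping from 100k to around 100 iterations moves a new session from tens of milliseconds into the low single digits.</p>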
<p>The following graph illustrates proxy performance across different session encryption values; you can run the benchmarks on your own hardware to explore what works best for you.</p>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>Benchmark Setup</div><div class="admonitionContent_BuS1"><p>These benchmarks were run on a MacBook Pro 17,1 (M1) laptop with 8 cores to demonstrate the impact of session encryption on overall performance.</p></div></div>
<p>The benchmarks measure the time taken to call a simple “echo” tool directly vs. calling it via Envoy AI Gateway with different session encryption configurations.</p>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>Analysis of Proxy Performance with Session Encryption</div><div class="admonitionContent_BuS1"><p><em>Average execution time in milliseconds (lower is better)</em>
<img decoding="async" loading="lazy" src="https://aigateway.envoyproxy.io/assets/images/mcp-routing-benchmark-chart-d70e7a84a7322e6e7198ba658ca6ed45.png" width="2934" height="1652" class="img_ev3q"></p></div></div>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>You can run these benchmarks yourself to explore how configuring encryption settings meets your performance needs.</p></div></div>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="part-3-how-we-arrived-at-this-design-and-why"><strong>Part 3: How We Arrived at This Design (and Why)</strong><a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#part-3-how-we-arrived-at-this-design-and-why" class="hash-link" aria-label="Direct link to part-3-how-we-arrived-at-this-design-and-why" title="Direct link to part-3-how-we-arrived-at-this-design-and-why" translate="no">​</a></h2>
<p>When designing this, we faced a binary architectural choice: <strong>Keep state in the gateway</strong> or <strong>Encode state in the client</strong>.</p>
<p><img decoding="async" loading="lazy" src="https://aigateway.envoyproxy.io/assets/images/mcp-routing-options-406b5cd9d826a5b96ed4de737bb943b1.png" width="3998" height="1692" class="img_ev3q"></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-alternative-we-rejected-centralized-state">The Alternative We Rejected: Centralized State<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#the-alternative-we-rejected-centralized-state" class="hash-link" aria-label="Direct link to The Alternative We Rejected: Centralized State" title="Direct link to The Alternative We Rejected: Centralized State" translate="no">​</a></h3>
<p>One approach (Option One) would have been to return a UUID to the client and store the mapping (<code>UUID -&gt; [Upstream_Session_1, Upstream_Session_2]</code>) in a shared store like Redis.</p>
<ul>
<li class=""><strong>The Problem:</strong>
<ul>
<li class=""><strong>More components to manage:</strong> This introduces an additional component and dependency. To scale the gateway, you must also scale and manage a highly available session store. If that store fails, all traffic stops.</li>
<li class=""><strong>High Availability Architecture Complexity:</strong> To make the solution multi-region and highly available, you must ensure that the data store backing the gateway is replicated across regions.</li>
</ul>
</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-path-we-chose-encoded-state">The Path We Chose: Encoded State<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#the-path-we-chose-encoded-state" class="hash-link" aria-label="Direct link to The Path We Chose: Encoded State" title="Direct link to The Path We Chose: Encoded State" translate="no">​</a></h3>
<p>We chose the second option—encoding the session information into the client session ID—for three specific reasons:</p>
<ol>
<li class=""><strong>Simple Horizontal Scaling:</strong> Because the session information travels with the request, you can spin up new gateway replicas instantly. Any replica can handle any request without needing to sync with a database.</li>
<li class=""><strong>Operational Simplicity:</strong> This eliminates the need to provision, secure, and operate an entire additional component.</li>
<li class=""><strong>Failure Isolation:</strong> There is no shared session store that can become a single point of failure; a replica outage does not stop traffic for clients served by other replicas.</li>
</ol>
<p>Encoding session information into the client token is the <strong>current</strong> design and will evolve to meet the needs of AI traffic routing. Any evolution will follow the principles of focusing on real‑world production use cases, maintaining consistent configuration, and aligning with Envoy core.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="addressing-the-trade-offs">Addressing the Trade-offs<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#addressing-the-trade-offs" class="hash-link" aria-label="Direct link to Addressing the Trade-offs" title="Direct link to Addressing the Trade-offs" translate="no">​</a></h4>
<p>This design choice creates a deliberate trade-off: we accept a negligible amount of compute overhead in exchange for operational simplicity to address current use cases and traffic routing needs.</p>
<p>Instead of paying the cost of managing, scaling, and querying a database for every request, the gateway simply spends a few CPU cycles processing the token. This is an efficient exchange that keeps the architecture clean and stateless.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="part-4-evaluation--is-it-right-for-you"><strong>Part 4: Evaluation — Is it Right for You?</strong><a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#part-4-evaluation--is-it-right-for-you" class="hash-link" aria-label="Direct link to part-4-evaluation--is-it-right-for-you" title="Direct link to part-4-evaluation--is-it-right-for-you" translate="no">​</a></h2>
<p>The best way to decide if this architecture meets your needs is to test it against your specific performance and security requirements.</p>
<ol>
<li class=""><strong>Configure an MCP route in Envoy AI Gateway:</strong> Point your agent at AIGW and <strong>experiment with encryption tuning.</strong></li>
<li class=""><strong>Run the benchmark harness</strong> that you can <a href="https://github.com/envoyproxy/ai-gateway/blob/main/tests/bench/bench_test.go" target="_blank" rel="noopener noreferrer" class="">find here</a> in the Envoy AI Gateway repo.</li>
</ol>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Clone the repo and build the latest CLI</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">git clone git@github.com:envoyproxy/ai-gateway.git</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">cd ai-gateway</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">make build.aigw</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Run the benchmarks</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">go test -timeout=15m -run='^$' -bench=. -count=10 ./tests/bench/...</span><br></span></code></pre></div></div>
<ol start="3">
<li class=""><strong>Share your experience:</strong> Open an issue or join the community channels.</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="part-5-the-envoy-ecosystem-alignment"><strong>Part 5: The Envoy Ecosystem Alignment</strong><a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#part-5-the-envoy-ecosystem-alignment" class="hash-link" aria-label="Direct link to part-5-the-envoy-ecosystem-alignment" title="Direct link to part-5-the-envoy-ecosystem-alignment" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="part-of-the-battle-tested-envoy-ecosystem">Part of the Battle-Tested Envoy Ecosystem<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#part-of-the-battle-tested-envoy-ecosystem" class="hash-link" aria-label="Direct link to Part of the Battle-Tested Envoy Ecosystem" title="Direct link to Part of the Battle-Tested Envoy Ecosystem" translate="no">​</a></h3>
<p>Envoy AI Gateway is part of the Envoy ecosystem, which offers key benefits:</p>
<ul>
<li class=""><strong>Battle‑tested data plane:</strong> Routing traffic through AIGW is built on Envoy Proxy’s decade of experience as a high‑performance, production‑grade data plane.</li>
<li class=""><strong>Ongoing core MCP work:</strong> AIGW can quickly adapt to new AI patterns and feed proven implementations back into Envoy core. AIGW leverages the proven Envoy Proxy extension mechanisms to rapidly address the traffic-handling needs of AI system builders.</li>
<li class=""><strong>Shared investments and community:</strong> Improvements to observability, security, and performance in Envoy generally benefit AI workloads.</li>
</ul>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="summary-and-conclusion">Summary and Conclusion<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#summary-and-conclusion" class="hash-link" aria-label="Direct link to Summary and Conclusion" title="Direct link to Summary and Conclusion" translate="no">​</a></h2>
<p>Envoy AI Gateway (AIGW) addresses the challenges of routing Model Context Protocol (MCP) traffic, specifically the need to handle long-lived, stateful sessions at scale.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="key-design-decisions">Key Design Decisions<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#key-design-decisions" class="hash-link" aria-label="Direct link to Key Design Decisions" title="Direct link to Key Design Decisions" translate="no">​</a></h3>
<table><thead><tr><th style="text-align:left">Feature</th><th style="text-align:left">Design Choice in AIGW</th><th style="text-align:left">Benefit</th></tr></thead><tbody><tr><td style="text-align:left"><strong>Session State Management</strong></td><td style="text-align:left">Encode session information into an encrypted client session ID (stateless gateway).</td><td style="text-align:left">Simple horizontal scaling, failure isolation, and operational simplicity (no shared session store required).</td></tr><tr><td style="text-align:left"><strong>Latency</strong></td><td style="text-align:left">Conservative, security-heavy default encryption settings introduce latency during encryption and decryption.</td><td style="text-align:left">High integrity and defense-in-depth, configurable to tune speed vs. security (tens of milliseconds by default; tunable to 1–2 ms).</td></tr><tr><td style="text-align:left"><strong>Ecosystem Alignment</strong></td><td style="text-align:left">Integrated with the battle-tested Envoy Proxy data plane, and investing in enhancing Envoy Proxy to support MCP and AI traffic.</td><td style="text-align:left">Reliability, high performance, and rapid adaptation to new AI/MCP patterns, allowing adopters of the AIGW control plane to benefit from Envoy ecosystem advancements.</td></tr><tr><td style="text-align:left"><strong>Best Practice</strong></td><td style="text-align:left">Encourage agent context, appropriate routing patterns, and tool grouping.</td><td style="text-align:left">Keeps session tokens compact, enables granular security policy enforcement, and improves LLM performance.</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://aigateway.envoyproxy.io/blog/mcp-in-envoy-ai-gateway#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>Envoy AI Gateway is a production-ready solution for managing stateful MCP traffic, prioritizing horizontal scalability and operational simplicity through its session-encoding design. While the default cryptographic settings offer strong integrity and add slight latency, this overhead is explicitly configurable to meet various performance benchmarks.</p>
<p>As part of the Envoy ecosystem, AIGW is positioned to evolve rapidly, incorporating features like optional session persistence and advanced policy based on MCP metadata, ensuring it remains the reliable control point for connecting LLM agents to their tools. Teams are encouraged to test performance with the provided benchmark harness and share real-world usage feedback.</p>]]></content>
        <author>
            <name>Ignasi Barrera</name>
            <uri>https://nacx.dev</uri>
        </author>
        <author>
            <name>Erica Hughberg</name>
            <uri>https://github.com/missBerg</uri>
        </author>
        <category label="Features" term="Features"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Announcing Model Context Protocol Support in Envoy AI Gateway]]></title>
        <id>https://aigateway.envoyproxy.io/blog/mcp-implementation</id>
        <link href="https://aigateway.envoyproxy.io/blog/mcp-implementation"/>
        <updated>2025-10-02T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Envoy AI Gateway now supports Model Context Protocol (MCP), bringing enterprise-grade security, routing, and observability to AI agent tool integrations with full spec compliance, OAuth authentication, and zero-friction deployment.]]></summary>
        <content type="html"><![CDATA[<p><img decoding="async" loading="lazy" alt="Hero feature image of title." src="https://aigateway.envoyproxy.io/assets/images/mcp-blog-hero-515b7436853af8caf3727f1810e6476b.png" width="1920" height="1080" class="img_ev3q"></p>
<p>We’re excited to announce that the next release of Envoy AI Gateway will introduce first-class support for <a href="https://modelcontextprotocol.io/" target="_blank" rel="noopener noreferrer" class="">Model Context Protocol</a> (MCP), cementing Envoy AI Gateway (EAIGW) as the universal gateway for modern production AI workloads.</p>
<p>Envoy AI Gateway started in close collaboration with <a href="https://www.bloomberg.com/" target="_blank" rel="noopener noreferrer" class="">Bloomberg</a> and <a href="https://tetrate.io/" target="_blank" rel="noopener noreferrer" class="">Tetrate</a> to meet production-scale AI workload demands, combining real-world expertise and innovation from some of the industry’s largest adopters. Built upon the battle-tested <a href="https://www.envoyproxy.io/" target="_blank" rel="noopener noreferrer" class="">Envoy Proxy</a> data plane as the AI extension of Envoy Gateway, it is trusted for critical workloads by thousands of enterprises worldwide. EAIGW already provides unified LLM access, cost and quota enforcement, credential management, intelligent routing, resiliency, and robust observability for mission-critical AI traffic.</p>
<p>With the addition of MCP, we have brought these features to the communication between Agents and external tools, making EAIGW even more versatile for enterprise-scale AI deployments. For a deeper look at the collaborative story and technical vision, see the <a href="https://tetrate.io/blog/tetrate-bloomberg-collaborating-on-envoy-ai-gateway" target="_blank" rel="noopener noreferrer" class="">Bloomberg partnership announcement</a>, their <a href="https://www.bloomberg.com/company/press/tetrate-and-bloomberg-release-open-source-envoy-ai-gateway-built-on-cncfs-envoy-gateway-project/" target="_blank" rel="noopener noreferrer" class="">official release coverage</a>, and previous <a class="" href="https://aigateway.envoyproxy.io/blog/01-release-announcement">project announcements</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-mcp-matters-for-ai-gateways">Why MCP Matters for AI Gateways<a href="https://aigateway.envoyproxy.io/blog/mcp-implementation#why-mcp-matters-for-ai-gateways" class="hash-link" aria-label="Direct link to Why MCP Matters for AI Gateways" title="Direct link to Why MCP Matters for AI Gateways" translate="no">​</a></h2>
<p>MCP is rapidly becoming the industry’s open standard for enabling AI agents to securely and flexibly connect to external tools and data sources. As the AI ecosystem shifts from monolithic models to agentic architectures, <strong>building robust, policy-driven, and observable pathways between AI and the rest of the enterprise stack has never been more critical</strong>. Integrating MCP directly into Envoy AI Gateway means:</p>
<ul>
<li class=""><strong>Seamless interoperability</strong> between AI agents, tools, and context providers, whether they’re third-party cloud LLMs or internal enterprise services.</li>
<li class=""><strong>Consistent security and governance</strong>: The gateway can now apply fine-grained authentication, authorization, and observability over tool invocations and data access flowing through MCP.</li>
<li class=""><strong>Accelerated development</strong>: With MCP supported natively, teams can adopt the latest agent-based AI flows <strong>on their existing Envoy infrastructure without custom or glue code.</strong></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="key-features-in-the-first-implementation">Key Features in the First Implementation<a href="https://aigateway.envoyproxy.io/blog/mcp-implementation#key-features-in-the-first-implementation" class="hash-link" aria-label="Direct link to Key Features in the First Implementation" title="Direct link to Key Features in the First Implementation" translate="no">​</a></h2>
<p>The initial implementation targets reliable support for the latest version of the spec, covering the full spectrum of features rather than focusing only on tool calling:</p>
<table><thead><tr><th><strong>Feature</strong></th><th><strong>Description</strong></th></tr></thead><tbody><tr><td><strong>Streamable HTTP Transport</strong></td><td>Full support for MCP’s streamable HTTP transport, aligning with the <a href="https://modelcontextprotocol.io/specification/2025-06-18" target="_blank" rel="noopener noreferrer" class="">June 2025 MCP spec</a>.<br>Efficient handling of stateful sessions and multi-part JSON-RPC messaging over persistent HTTP connections.</td></tr><tr><td><strong>OAuth Authorization</strong></td><td>Native enforcement of <a href="https://modelcontextprotocol.io/specification/2025-06-18/basic/authorization" target="_blank" rel="noopener noreferrer" class="">OAuth authentication flows</a> for AI agents and services bridging via MCP, ensuring secure tool usage at scale.<br>Backwards compatibility with the <a href="https://modelcontextprotocol.io/specification/2025-03-26/basic/authorization" target="_blank" rel="noopener noreferrer" class="">previous version of the authorization spec</a> to maximize compatibility with existing agents.</td></tr><tr><td><strong>MCP Server multiplexing, Tool Routing, and Filtering</strong></td><td>Route tool calls and notifications to the right MCP backends, aggregating and filtering available tools based on gateway policy.<br>Dynamically aggregate, merge, and filter messages and streaming notifications from multiple MCP servers, so agents receive a unified, policy-governed interface to all available services.</td></tr><tr><td><strong>Upstream Authentication</strong></td><td>Built-in upstream authentication primitives to securely connect to external MCP servers, with support for credential injection and validation using existing Envoy Gateway patterns.</td></tr><tr><td><strong>Full MCP Spec Coverage</strong></td><td>Complete <a href="https://modelcontextprotocol.io/specification/2025-06-18" target="_blank" rel="noopener noreferrer" class="">June 2025 MCP spec</a> compliance, including support 
for tool calls, notifications, prompts, resources, and bi-directional server-to-client requests.<br>Robust session and stream management, including reconnection logic (e.g., Last-Event-ID for SSE), ensuring resilience and correctness for long-lived agent conversations.</td></tr><tr><td><strong>Zero Friction Developer and Production UX</strong></td><td>All features are supported in standalone mode, allowing developers to start the Envoy AI Gateway locally on their machine with a single command and start leveraging all MCP features. These configurations can be used as-is in production environments, as there is full compatibility between local standalone mode and Kubernetes.</td></tr><tr><td><strong>Tested in Real Life</strong></td><td>The implementation has full protocol test coverage and is tested with real-world providers like GitHub and agents like <a href="https://block.github.io/goose/" target="_blank" rel="noopener noreferrer" class="">Goose</a>, validating end-to-end functionality in the existing ecosystem.</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="under-the-hood">Under the Hood<a href="https://aigateway.envoyproxy.io/blog/mcp-implementation#under-the-hood" class="hash-link" aria-label="Direct link to Under the Hood" title="Direct link to Under the Hood" translate="no">​</a></h2>
<p>Adding MCP support meant more than just passing bytes through the stack. Our approach <strong>leverages Envoy’s architecture and existing features</strong>, introducing a lightweight MCP Proxy that handles session management, multiplexes streams, and bridges the gap between the stateful JSON-RPC protocol and the broader Envoy extension mechanisms.</p>
<p>Key design decisions included:</p>
<ul>
<li class=""><strong>Minimal Architectural Complexity</strong>: The implementation adds no additional components or complexity to Envoy AI Gateway’s existing architecture</li>
<li class=""><strong>Fully leverages Envoy's networking stack</strong>: The MCP proxy harnesses Envoy’s proven networking stack for connection management, load balancing, circuit breaking, rate-limiting, observability, etc.</li>
<li class=""><strong>Decoupled Iteration Velocity</strong>: The MCP Proxy is implemented as a lightweight Go server to keep pace with the rapidly evolving MCP specification, while still relying on Envoy for networking primitives.</li>
</ul>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p>You can find more details about the architecture and design decisions in the design document of the <a href="https://github.com/envoyproxy/ai-gateway/pull/1260" target="_blank" rel="noopener noreferrer" class="">MCP contribution pull request</a>.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="getting-started">Getting Started<a href="https://aigateway.envoyproxy.io/blog/mcp-implementation#getting-started" class="hash-link" aria-label="Direct link to Getting Started" title="Direct link to Getting Started" translate="no">​</a></h2>
<p>One of the easiest ways to get started with the MCP features in Envoy AI Gateway is to run it on your machine in standalone mode, with no friction. In standalone mode, you can start the MCP Gateway with the same MCP servers configuration file that agents like Claude Code already use.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="use-your-existing-mcp-servers-file">Use your existing MCP servers file<a href="https://aigateway.envoyproxy.io/blog/mcp-implementation#use-your-existing-mcp-servers-file" class="hash-link" aria-label="Direct link to Use your existing MCP servers file" title="Direct link to Use your existing MCP servers file" translate="no">​</a></h3>
<p>In the following example, we’ll start Envoy AI Gateway to proxy the GitHub and Context7 MCP servers. First, define the servers in the <code>mcp-servers.json</code> file:</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"mcpServers"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"context7"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"type"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"http"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"url"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"https://mcp.context7.com/mcp"</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"github"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"type"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"http"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"url"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"https://api.githubcopilot.com/mcp/readonly"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"headers"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"Authorization"</span><span class="token operator" 
style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Bearer ${GITHUB_ACCESS_TOKEN}"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre></div></div>
<p>And then start Envoy AI Gateway:</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>Refer to the <a class="" href="https://aigateway.envoyproxy.io/docs/cli/aigwinstall">CLI installation instructions</a> if you haven’t installed the CLI yet.</p></div></div>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">$ aigw run --mcp-config mcp-servers.json</span><br></span></code></pre></div></div>
<p>This will start the Envoy AI Gateway locally and start serving the MCP servers at <a href="http://localhost:1975/mcp" target="_blank" rel="noopener noreferrer" class="">http://localhost:1975/mcp</a>.<br>
<strong>You can point your preferred agent (Claude, Goose, etc) to this URL as a Streamable HTTP MCP server.</strong></p>
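<p>Before wiring up an agent, you can verify the gateway is serving MCP by sending the <code>initialize</code> handshake yourself. The sketch below builds the JSON-RPC <code>initialize</code> request defined by the June 2025 MCP spec; the client name and version are placeholders, and the <code>curl</code> invocation in the comment is one way to send it to your local gateway.</p>

```python
import json


def build_initialize_request(request_id: int = 1) -> dict:
    """Build the JSON-RPC `initialize` request that starts an MCP session."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-06-18",
            "capabilities": {},
            # Placeholder client identity; real agents send their own here.
            "clientInfo": {"name": "example-client", "version": "0.0.1"},
        },
    }


payload = json.dumps(build_initialize_request())
print(payload)
# Send it to the local gateway, for example:
#   curl -X POST http://localhost:1975/mcp \
#     -H 'Content-Type: application/json' \
#     -H 'Accept: application/json, text/event-stream' \
#     -d "$PAYLOAD"
```

<p>A successful response returns the server’s protocol version and capabilities, confirming the gateway is reachable and speaking MCP.</p>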
<p>Some features, such as Tool Filtering, can also be used with the existing servers file. To do so, just add the list of tools to expose for each server (by default, all tools are exposed), as in the following example:</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"mcpServers"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"context7"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"type"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"http"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"url"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"https://mcp.context7.com/mcp"</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"github"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"type"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"http"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"url"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"https://api.githubcopilot.com/mcp/readonly"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"headers"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"Authorization"</span><span class="token operator" 
style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Bearer ${GITHUB_ACCESS_TOKEN}"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"tools"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"issue_read"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"list_issues"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre></div></div>
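<p>Conceptually, the gateway aggregates the tool lists advertised by every configured server and drops anything outside a server’s <code>tools</code> allowlist. The actual logic lives in the Go MCP proxy; the Python sketch below only illustrates the filtering semantics, and the <code>server/tool</code> naming is a hypothetical convention for disambiguation, not necessarily what the gateway exposes.</p>

```python
def filter_tools(server_tools, allowlists):
    """Return the unified tool list an agent would see.

    server_tools: mapping of server name -> tools advertised by that server.
    allowlists:   mapping of server name -> allowed tool names; a missing
                  entry means "expose all tools" (the default).
    """
    exposed = []
    for server, tools in server_tools.items():
        allowed = allowlists.get(server)
        for tool in tools:
            if allowed is None or tool in allowed:
                # Illustrative "server/tool" naming to disambiguate servers.
                exposed.append(f"{server}/{tool}")
    return exposed


# Hypothetical catalogs; tool names are for illustration only.
catalog = {
    "context7": ["resolve-library-id", "get-library-docs"],
    "github": ["issue_read", "list_issues", "create_gist"],
}
visible = filter_tools(catalog, {"github": {"issue_read", "list_issues"}})
print(visible)
```

<p>With the allowlist above, <code>create_gist</code> never reaches the agent, while both Context7 tools remain visible because no filter was configured for that server.</p>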
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="using-the-new-mcproute-api">Using the new MCPRoute API<a href="https://aigateway.envoyproxy.io/blog/mcp-implementation#using-the-new-mcproute-api" class="hash-link" aria-label="Direct link to Using the new MCPRoute API" title="Direct link to Using the new MCPRoute API" translate="no">​</a></h3>
<p>You can also use the new <code>MCPRoute</code> API, which allows more fine-grained configuration and works both in standalone mode and in Kubernetes.</p>
<p>The following example defines a complete <code>MCPRoute</code> that shows how simple it is to configure:</p>
<ul>
<li class="">MCP Server multiplexing.</li>
<li class="">MCP Authentication using OAuth.</li>
<li class="">Tool filtering.</li>
<li class="">MCP Server upstream authentication.</li>
</ul>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> aigateway.envoyproxy.io/v1alpha1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> MCPRoute</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> mcp</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">route</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">namespace</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">parentRefs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> aigw</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">run</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Gateway</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">group</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gateway.networking.k8s.io</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">backendRefs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> context7</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" 
style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Backend</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">group</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gateway.envoyproxy.io</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">path</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"/mcp"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> github</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Backend</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">group</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gateway.envoyproxy.io</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">path</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" 
style="color:#e3116c">"/mcp/readonly"</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic"># Use the readonly endpoint</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token comment" style="color:#999988;font-style:italic"># Only expose certain tools</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">toolSelector</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">includeRegex</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> .</span><span class="token important">*pull_requests?.*</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> .</span><span class="token important">*issues?.*</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token comment" style="color:#999988;font-style:italic"># Configure upstream authentication</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">securityPolicy</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">apiKey</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">secretRef</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> github</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">access</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">token</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># Configure the gateway to enforce authentication using OAuth.</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">securityPolicy</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">oauth</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" 
style="color:#00a4db">issuer</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"https://auth-server.example.com"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">protectedResourceMetadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># The URL here must match the URL of the Envoy AI Gateway</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">resource</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"http://localhost:1975/mcp"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">scopesSupported</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"profile"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token string" 
style="color:#e3116c">"email"</span><br></span></code></pre></div></div>
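<p>The <code>toolSelector.includeRegex</code> list in the route above keeps only tools whose names match at least one pattern. The sketch below illustrates that selection with Python’s <code>re</code> module, assuming whole-name matching; the gateway’s exact matching semantics (it is implemented in Go) may differ, so treat this as an approximation of the behavior, not the implementation.</p>

```python
import re


def select_tools(tool_names, include_regex):
    """Keep tool names matching at least one includeRegex pattern."""
    patterns = [re.compile(p) for p in include_regex]
    return [t for t in tool_names if any(p.fullmatch(t) for p in patterns)]


# Hypothetical GitHub tool names, for illustration only.
github_tools = ["issue_read", "list_issues", "get_pull_request", "create_gist"]
selected = select_tools(github_tools, [r".*pull_requests?.*", r".*issues?.*"])
print(selected)
```

<p>Here <code>create_gist</code> is filtered out because it matches neither pattern, while the issue- and pull-request-related tools pass through.</p>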
<p>This file can be used in your Kubernetes cluster and in standalone mode. You can quickly try it out with:</p>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">$ aigw run mcp-route.yaml</span><br></span></code></pre></div></div>
<p>And point your local agents again to <a href="http://localhost:1975/mcp" target="_blank" rel="noopener noreferrer" class="">http://localhost:1975/mcp</a>.<br>
Once you’re happy with the configuration, you can apply it as-is on your Kubernetes cluster and production environments.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="whats-next">What’s Next?<a href="https://aigateway.envoyproxy.io/blog/mcp-implementation#whats-next" class="hash-link" aria-label="Direct link to What’s Next?" title="Direct link to What’s Next?" translate="no">​</a></h2>
<p>This is just the beginning. As MCP and agentic architectures advance, we’ll continue to evolve and upstream features, ensuring Envoy AI Gateway remains the <strong>universal,</strong> <strong>most reliable, policy-driven, and interoperable AI gateway available</strong> for modern production AI workloads.</p>
<p>We are proud to contribute the implementation of the MCP protocol, and look forward to continuing to enhance Envoy AI Gateway together with the community and to bringing more use cases and features to the project.</p>
<p>We’d love your feedback as you start exploring MCP in your GenAI or agent infrastructure. Join our community meetings or file an issue to help shape the roadmap!</p>]]></content>
        <author>
            <name>Ignasi Barrera</name>
            <uri>https://nacx.dev</uri>
        </author>
        <author>
            <name>Takeshi Yoneda</name>
            <uri>https://github.com/mathetake</uri>
        </author>
        <category label="News" term="News"/>
        <category label="Features" term="Features"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Enhancing AI Gateway Observability - OpenTelemetry Tracing Arrives in Envoy AI Gateway]]></title>
        <id>https://aigateway.envoyproxy.io/blog/openinference-for-ai-observability</id>
        <link href="https://aigateway.envoyproxy.io/blog/openinference-for-ai-observability"/>
        <updated>2025-08-25T00:00:00.000Z</updated>
<summary type="html"><![CDATA[Envoy AI Gateway v0.3 introduces OpenTelemetry tracing with OpenInference conventions, providing complete visibility into LLM application behavior beyond metrics. Understand not just what happened, but why.]]></summary>
        <content type="html"><![CDATA[<p><img decoding="async" loading="lazy" alt="Hero feature image of title." src="https://aigateway.envoyproxy.io/assets/images/openinference-feature-2df5b0cd53f76aae58d4303589a968da.png" width="1920" height="1080" class="img_ev3q"></p>
<p>Aggregated metrics like latency, error rates, and throughput on their own won't reveal <em>why</em> a system's output was wrong, slow, or expensive.</p>
<p>The <strong>v0.3 release</strong> of Envoy AI Gateway brings comprehensive <strong>OpenTelemetry tracing support</strong> with <strong>OpenInference semantic conventions</strong>, extending the existing metrics foundation to provide complete visibility into LLM application behavior.</p>
<p>This enables you to improve the quality and safety of your AI-integrated applications: by understanding the full context of a request's journey, your LLM traces can inform application improvements and guardrail needs.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-observability-challenges-in-ai-applications">The Observability Challenges in AI Applications<a href="https://aigateway.envoyproxy.io/blog/openinference-for-ai-observability#the-observability-challenges-in-ai-applications" class="hash-link" aria-label="Direct link to The Observability Challenges in AI Applications" title="Direct link to The Observability Challenges in AI Applications" translate="no">​</a></h2>
<p>Traditional observability looks at request speed, request volume, and error rates. These work well for simple stateless HTTP services. But they are not enough for AI applications.</p>
<p><strong>AI applications present unique challenges:</strong></p>
<ul>
<li class="">LLM requests have complicated costs based on token usage</li>
<li class="">Responses can vary and may stream out tokens slowly</li>
<li class="">Semantic failures occur when the AI fails to understand or produce the correct answer</li>
<li class="">These issues don't show up as HTTP errors</li>
</ul>
<p>Envoy AI Gateway already addresses some of these challenges: it collects a robust set of metrics through OpenTelemetry, tracking token use, request times, and provider performance data.</p>
<p>The <strong>v0.1.3 update</strong> added GenAI-specific metrics. These include time-to-first-token, available via Prometheus endpoints.</p>
<p>However, these metrics alone can't tell you the full story or the root cause of an issue; this is where <strong>OpenTelemetry tracing with OpenInference conventions</strong> comes in.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="specifications-made-for-ai-openinference-semantic-conventions">Specifications Made for AI: OpenInference Semantic Conventions<a href="https://aigateway.envoyproxy.io/blog/openinference-for-ai-observability#specifications-made-for-ai-openinference-semantic-conventions" class="hash-link" aria-label="Direct link to Specifications Made for AI: OpenInference Semantic Conventions" title="Direct link to Specifications Made for AI: OpenInference Semantic Conventions" translate="no">​</a></h2>
<p>To give you the clearest possible insight into your traffic, Envoy AI Gateway stays close to standards. Instead of creating custom trace formats, it uses <strong><a href="https://github.com/Arize-ai/openinference/blob/main/spec/semantic_conventions.md" target="_blank" rel="noopener noreferrer" class="">OpenInference</a>,</strong> a widely adopted, OpenTelemetry-compatible standard for AI applications supported by many frameworks, including <strong><a href="https://docs.beeai.dev/observability/custom-agent-traceability" target="_blank" rel="noopener noreferrer" class="">BeeAI</a></strong> and <strong><a href="https://huggingface.co/docs/smolagents/v1.21.1/en/tutorials/inspect_runs#inspecting-runs-with-opentelemetry" target="_blank" rel="noopener noreferrer" class="">HuggingFace SmolAgents</a>.</strong></p>
<p>OpenInference defines conventions for recording how large language model calls behave: the prompt, model settings, tokens used, and the response. Key moments, such as the time to first token during streaming, are recorded as span events, which correspond to the metrics discussed earlier.</p>
<p>Because this approach emits standard OpenTelemetry spans, it works well with common tracing systems, which typically correlate traces rather than logs. For example, you can set up Envoy AI Gateway to <strong><a href="https://www.jaegertracing.io/" target="_blank" rel="noopener noreferrer" class="">work with Jaeger</a>,</strong> much as LLM evaluation systems like Arize Phoenix consume OpenInference directly.</p>
<p><strong>Redaction controls</strong> are available from the start. They help you manage your trace data and balance it with your evaluation needs.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="how-it-all-fits-together-envoy-ai-gateway-opentelemetry-tracing-architecture">How it All Fits Together: Envoy AI Gateway OpenTelemetry Tracing Architecture<a href="https://aigateway.envoyproxy.io/blog/openinference-for-ai-observability#how-it-all-fits-together-envoy-ai-gateway-opentelemetry-tracing-architecture" class="hash-link" aria-label="Direct link to How it All Fits Together: Envoy AI Gateway OpenTelemetry Tracing Architecture" title="Direct link to How it All Fits Together: Envoy AI Gateway OpenTelemetry Tracing Architecture" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="Diagram showing how it all fits together" src="https://aigateway.envoyproxy.io/assets/images/Envoy%20AI%20gateway%20+%20Phoenix.drawio-b450b71f1161e1f81b22532a1f18ad3d.png" width="2462" height="1542" class="img_ev3q"></p>
<p>Here's an example of a simple trace that includes both application and gateway spans, shown in Arize Phoenix:</p>
<p><img decoding="async" loading="lazy" alt="Screenshot of Phoenix UI" src="https://aigateway.envoyproxy.io/assets/images/phoenix-072c54623b615f214c28eb3a4ffad22e.webp" width="2938" height="1660" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="capture-and-evaluate-your-traffic-llm-evaluation">Capture and Evaluate your Traffic: LLM Evaluation<a href="https://aigateway.envoyproxy.io/blog/openinference-for-ai-observability#capture-and-evaluate-your-traffic-llm-evaluation" class="hash-link" aria-label="Direct link to Capture and Evaluate your Traffic: LLM Evaluation" title="Direct link to Capture and Evaluate your Traffic: LLM Evaluation" translate="no">​</a></h2>
<p>Tracing data isn't only for in-the-moment troubleshooting; this data is your key to optimizing your AI system.</p>
<p><strong>OpenInference traces provide the structured data foundation necessary for comprehensive LLM evaluation:</strong></p>
<ul>
<li class="">Capture the whole interaction context, including prompts, model parameters, and outputs</li>
<li class="">Leverage evaluation frameworks to identify patterns over time</li>
<li class="">Find optimization opportunities for performance, accuracy, and/or cost</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="evaluation-patterns">Evaluation Patterns<a href="https://aigateway.envoyproxy.io/blog/openinference-for-ai-observability#evaluation-patterns" class="hash-link" aria-label="Direct link to Evaluation Patterns" title="Direct link to Evaluation Patterns" translate="no">​</a></h3>
<p>To analyze the trace data, you can leverage an <strong><a href="https://arize.com/llm-as-a-judge/" target="_blank" rel="noopener noreferrer" class="">LLM-as-a-Judge</a></strong> evaluation pattern using your production trace data.</p>
<p>Since your traces are in OpenInference format, your easiest option is to leverage a solution that can consume them. Platforms like <strong>Arize Phoenix</strong> consume OpenInference traces directly, enabling easy analysis of inference traffic captured through Envoy AI Gateway.</p>
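<p>As a rough illustration of the pattern (the prompt wording and scoring rubric here are examples of our own, not a prescribed format), an LLM-as-a-Judge evaluation boils down to turning each captured request/response pair from your traces into a grading prompt for a second model:</p>

```python
# Build a judge prompt from one captured trace's input/output pair.
# Sending this prompt to any OpenAI-compatible judge model (for example,
# one routed through the gateway itself) yields a score per interaction.
def judge_prompt(user_input: str, model_output: str) -> str:
    return (
        "You are an impartial evaluator. Rate the RESPONSE to the REQUEST "
        "for correctness and helpfulness on a 1-5 scale.\n\n"
        f"REQUEST:\n{user_input}\n\n"
        f"RESPONSE:\n{model_output}\n\n"
        "Answer as: SCORE: <1-5> REASON: <one sentence>"
    )

prompt = judge_prompt("Summarize our refund policy.", "Refunds are accepted within 30 days.")
```

<p>Aggregating these scores over time is how patterns and regressions in production traffic surface.</p>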
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="privacy-controls">Privacy Controls<a href="https://aigateway.envoyproxy.io/blog/openinference-for-ai-observability#privacy-controls" class="hash-link" aria-label="Direct link to Privacy Controls" title="Direct link to Privacy Controls" translate="no">​</a></h3>
<p>When capturing tracing data, you want to ensure you keep private data private. The gateway includes <strong>configurable privacy controls</strong> for sensitive environments:</p>
<ul>
<li class="">Selectively redact content from spans</li>
<li class="">Limit data captured in multimodal interactions</li>
<li class="">Apply custom filtering based on organizational requirements</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="telemetry-via-the-gateway-zero-application-change-integration">Telemetry via the Gateway: Zero-Application-Change Integration<a href="https://aigateway.envoyproxy.io/blog/openinference-for-ai-observability#telemetry-via-the-gateway-zero-application-change-integration" class="hash-link" aria-label="Direct link to Telemetry via the Gateway: Zero-Application-Change Integration" title="Direct link to Telemetry via the Gateway: Zero-Application-Change Integration" translate="no">​</a></h2>
<p><strong>Gateway-level tracing requires no code changes in your applications.</strong> As traffic is routed via Envoy AI Gateway, the OpenInference-compliant traces are automatically created for all requests to the OpenAI-compatible API. This happens regardless of whether the calling applications include OpenTelemetry instrumentation.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="automatic-trace-propagation">Automatic Trace Propagation<a href="https://aigateway.envoyproxy.io/blog/openinference-for-ai-observability#automatic-trace-propagation" class="hash-link" aria-label="Direct link to Automatic Trace Propagation" title="Direct link to Automatic Trace Propagation" translate="no">​</a></h3>
<p>If your calling applications are already instrumented with OpenTelemetry, the client spans will automatically join the same distributed trace as the gateway spans, providing <strong>end-to-end visibility</strong>. This trace propagation offers a complete view of the AI interactions.</p>
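<p>Under the hood, this propagation relies on the W3C Trace Context <code>traceparent</code> header that OpenTelemetry SDKs attach to outgoing requests. A minimal sketch of what travels on the wire, independent of any particular SDK:</p>

```python
import re
import secrets

def make_traceparent() -> str:
    """Build a W3C traceparent header: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    return f"00-{trace_id}-{span_id}-01"  # trailing 01 marks the trace as sampled

# An instrumented client sends this header with its LLM request; the
# gateway reads it and parents its own spans under the same trace id.
header = make_traceparent()
assert re.fullmatch(r"00-[0-9a-f]{32}-[0-9a-f]{16}-01", header)
```

<p>When the header is absent, the gateway simply starts a new trace of its own, which is why uninstrumented clients still get full gateway spans.</p>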
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="seamless-integration">Seamless Integration<a href="https://aigateway.envoyproxy.io/blog/openinference-for-ai-observability#seamless-integration" class="hash-link" aria-label="Direct link to Seamless Integration" title="Direct link to Seamless Integration" translate="no">​</a></h3>
<p>Because it integrates with your current tools, you and your team can begin using this new feature without having to learn and adopt new tooling or instrumentation patterns.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="follow-the-development-lifecycle-deployment-flexibility">Follow the Development Lifecycle: Deployment Flexibility<a href="https://aigateway.envoyproxy.io/blog/openinference-for-ai-observability#follow-the-development-lifecycle-deployment-flexibility" class="hash-link" aria-label="Direct link to Follow the Development Lifecycle: Deployment Flexibility" title="Direct link to Follow the Development Lifecycle: Deployment Flexibility" translate="no">​</a></h2>
<p>As you move from local development through dev, test, and staging to production, you can capture and trace traffic in the same way.</p>
<p><strong>The tracing capability works consistently across deployment modes:</strong></p>
<ul>
<li class=""><strong>Local development:</strong> Standalone CLI tool</li>
<li class=""><strong>Production:</strong> Kubernetes Gateway</li>
<li class=""><strong>All environments:</strong> Full tracing support for OpenAI-compatible requests, including streaming responses and multimodal inputs</li>
</ul>
<p>This deployment consistency simplifies your observability integration throughout your development lifecycle. Teams can establish observability patterns during local development that integrate seamlessly into production environments without architectural changes.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="looking-ahead-ai-applications-evolve-fast-and-infrastructure-and-observability-with-it">Looking Ahead: AI Applications Evolve Fast, and Infrastructure and Observability with it<a href="https://aigateway.envoyproxy.io/blog/openinference-for-ai-observability#looking-ahead-ai-applications-evolve-fast-and-infrastructure-and-observability-with-it" class="hash-link" aria-label="Direct link to Looking Ahead: AI Applications Evolve Fast, and Infrastructure and Observability with it" title="Direct link to Looking Ahead: AI Applications Evolve Fast, and Infrastructure and Observability with it" translate="no">​</a></h2>
<p>The new tracing capability is available with the <strong>v0.3 Envoy AI Gateway release</strong>. For complete configuration details and integration examples, see the <strong><a class="" href="https://aigateway.envoyproxy.io/docs/capabilities/observability/tracing">tracing documentation</a></strong> to get started.</p>
<p>As AI infrastructure continues to evolve, comprehensive observability becomes essential for managing operational complexity and ensuring application quality. <strong>OpenTelemetry tracing with OpenInference conventions</strong> provides the foundation teams need to build reliable, observable AI systems.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="get-involved">Get Involved<a href="https://aigateway.envoyproxy.io/blog/openinference-for-ai-observability#get-involved" class="hash-link" aria-label="Direct link to Get Involved" title="Direct link to Get Involved" translate="no">​</a></h3>
<p>Want to get involved in building Envoy AI Gateway and further improve the observability capabilities?</p>
<ul>
<li class=""><strong>Raise an issue</strong> on our GitHub repository</li>
<li class=""><strong>Join us on Slack</strong> in the <code>#envoy-ai-gateway</code> channel</li>
<li class=""><strong>Join our weekly community meetings</strong></li>
</ul>]]></content>
        <author>
            <name>Erica Hughberg</name>
            <uri>https://github.com/missBerg</uri>
        </author>
        <author>
            <name>Adrian Cole</name>
            <uri>https://github.com/codefromthecrypt</uri>
        </author>
        <category label="Features" term="Features"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Announcing the Envoy AI Gateway v0.3 Release]]></title>
        <id>https://aigateway.envoyproxy.io/blog/v0.3-release-announcement</id>
        <link href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement"/>
        <updated>2025-08-22T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Envoy AI Gateway v0.3 brings intelligent inference routing with Endpoint Picker, production-ready Google Vertex AI support, and enterprise observability with OpenInference tracing—transforming AI infrastructure from static to intelligent.]]></summary>
        <content type="html"><![CDATA[<p><img decoding="async" loading="lazy" alt="Envoy AI Gateway v0.3.0 Release" src="https://aigateway.envoyproxy.io/assets/images/v0.3-hero-44d6cb88ddd80007a24e56cdfd564823.png" width="1920" height="1080" class="img_ev3q"></p>
<p>The <strong>Envoy AI Gateway v0.3</strong> release introduces <strong>intelligent inference routing</strong> through Endpoint Picker (EPP) integration, expands our provider ecosystem with <strong>Google Vertex AI Production Support</strong> as well as a <strong>Native Anthropic API</strong>, and delivers <strong>Enterprise-Grade Observability</strong> with OpenInference tracing.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-big-shifts-in-v03"><strong>The Big Shifts in v0.3</strong><a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#the-big-shifts-in-v03" class="hash-link" aria-label="Direct link to the-big-shifts-in-v03" title="Direct link to the-big-shifts-in-v03" translate="no">​</a></h2>
<p>Envoy AI Gateway v0.3 isn't just another feature release; it's a <strong>fundamental shift</strong> toward intelligent, production-ready AI infrastructure. This release addresses three critical challenges that have been holding back AI adoption in enterprise environments:</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-from-static-to-intelligent-routing">1. From Static to Intelligent Routing<a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#1-from-static-to-intelligent-routing" class="hash-link" aria-label="Direct link to 1. From Static to Intelligent Routing" title="Direct link to 1. From Static to Intelligent Routing" translate="no">​</a></h3>
<p>Traditional load balancers treat AI inference endpoints like web servers, but AI workloads are fundamentally different. With <strong>Endpoint Picker integration</strong>, Envoy AI Gateway now makes intelligent routing decisions based on real-time AI-specific metrics like KV-cache usage, queue depth, and LoRA adapter information.</p>
<p><strong>What this means for you:</strong></p>
<table><thead><tr><th>Benefit</th><th>Description</th></tr></thead><tbody><tr><td><strong>Latency reduction</strong></td><td>Optimal endpoint selection based on real-time AI metrics</td></tr><tr><td><strong>Automatic resource optimization</strong></td><td>Intelligent resource allocation across your inference infrastructure</td></tr><tr><td><strong>Zero manual intervention</strong></td><td>Automated endpoint management without operational overhead</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-expanded-provider-ecosystem">2. Expanded Provider Ecosystem<a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#2-expanded-provider-ecosystem" class="hash-link" aria-label="Direct link to 2. Expanded Provider Ecosystem" title="Direct link to 2. Expanded Provider Ecosystem" translate="no">​</a></h3>
<p>We've moved beyond experimental integrations to deliver <strong>production-grade support</strong> for the AI providers that matter most to enterprises.</p>
<p><strong>Google Vertex AI</strong> is now supported with complete streaming capabilities for Gemini models. <strong>Anthropic on Vertex AI</strong> moves from experimental to production-ready with multi-tool support and configurable API versions.</p>
<p><strong>What this means for you:</strong></p>
<table><thead><tr><th>Benefit</th><th>Description</th></tr></thead><tbody><tr><td><strong>Unified OpenAI-compatible API</strong></td><td>Single interface across Google, Anthropic, AWS, and more providers</td></tr><tr><td><strong>Enterprise-grade reliability</strong></td><td>Production-ready stability for mission-critical AI workloads</td></tr><tr><td><strong>Provider flexibility</strong></td><td>Switch between providers without architectural changes or vendor lock-in</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-enterprise-observability-for-ai">3. Enterprise Observability for AI<a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#3-enterprise-observability-for-ai" class="hash-link" aria-label="Direct link to 3. Enterprise Observability for AI" title="Direct link to 3. Enterprise Observability for AI" translate="no">​</a></h3>
<p>AI workloads require specialized observability that traditional monitoring tools can't provide. v0.3 delivers comprehensive AI-specific monitoring across four key areas.</p>
<p><strong>What this means for you:</strong></p>
<table><thead><tr><th>Observability Feature</th><th>Description</th></tr></thead><tbody><tr><td><strong>OpenInference tracing</strong></td><td>Complete request lifecycle visibility and evaluation system compatibility</td></tr><tr><td><strong>Configurable metrics labels</strong></td><td>Granular monitoring based on request headers for custom filtering</td></tr><tr><td><strong>Embeddings metrics support</strong></td><td>Comprehensive token usage tracking for accurate cost attribution</td></tr><tr><td><strong>Enhanced GenAI metrics</strong></td><td>Improved accuracy with OpenTelemetry semantic conventions</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="notable-new-features-in-v03"><strong>Notable New Features in v0.3</strong><a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#notable-new-features-in-v03" class="hash-link" aria-label="Direct link to notable-new-features-in-v03" title="Direct link to notable-new-features-in-v03" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="endpoint-picker-provider-the-future-of-ai-load-balancing">Endpoint Picker Provider: The Future of AI Load Balancing<a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#endpoint-picker-provider-the-future-of-ai-load-balancing" class="hash-link" aria-label="Direct link to Endpoint Picker Provider: The Future of AI Load Balancing" title="Direct link to Endpoint Picker Provider: The Future of AI Load Balancing" translate="no">​</a></h3>
<p>A highlight of v0.3 is our integration with the <a href="https://gateway-api-inference-extension.sigs.k8s.io/" target="_blank" rel="noopener noreferrer" class="">Gateway API Inference Extension</a>, which allows intelligent endpoint selection that understands AI workloads.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># AIGatewayRoute with InferencePool</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> aigateway.envoyproxy.io/v1alpha1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> AIGatewayRoute</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> intelligent</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">routing</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">rules</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">matches</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">headers</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> x</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">ai</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">eg</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              </span><span class="token key atrule" style="color:#00a4db">value</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> meta</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama/Llama</span><span class="token punctuation" 
style="color:#393A34">-</span><span class="token plain">3.1</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">8B</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">Instruct</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">backendRefs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">group</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> inference.networking.x</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">k8s.io</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferencePool</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> vllm</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">pool</span><br></span></code></pre></div></div>
<p>This isn't just about load balancing; it's about <strong>intelligent infrastructure</strong> that adapts to your AI workloads in real-time.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="google-vertex-ai-enterprise-ai-at-scale">Google Vertex AI: Enterprise AI at Scale<a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#google-vertex-ai-enterprise-ai-at-scale" class="hash-link" aria-label="Direct link to Google Vertex AI: Enterprise AI at Scale" title="Direct link to Google Vertex AI: Enterprise AI at Scale" translate="no">​</a></h3>
<p>Google Vertex AI support moves to production-ready status with:</p>
<ul>
<li class=""><strong>GCP Vertex AI Authentication</strong> with Service Account Key or Workload Identity Federation.</li>
<li class=""><strong>Complete Gemini Support</strong> with OpenAI API compatibility for function calls, multimodal inputs, reasoning, and streaming.</li>
<li class=""><strong>Complete Anthropic on Vertex AI Support</strong> with OpenAI API compatibility for function calls, multimodal inputs, extended thinking, and streaming.</li>
<li class=""><strong>Native Anthropic API</strong> via GCP Vertex AI to unlock use cases like Claude Code.</li>
<li class=""><strong>Enterprise-grade reliability</strong> for mission-critical deployments.</li>
</ul>
<p>This brings the power of Google's AI platform into your unified AI infrastructure, managed through a single, consistent API.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="comprehensive-ai-observability">Comprehensive AI Observability<a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#comprehensive-ai-observability" class="hash-link" aria-label="Direct link to Comprehensive AI Observability" title="Direct link to Comprehensive AI Observability" translate="no">​</a></h3>
<p>Traditional observability tools fall short when monitoring AI workloads. v0.3 delivers four significant observability enhancements:</p>
<table><thead><tr><th>Enhancement</th><th>Feature</th><th>Benefit</th></tr></thead><tbody><tr><td><strong>OpenInference Tracing Integration</strong></td><td>Complete LLM request tracing with timing and token information</td><td>Deep visibility into AI request lifecycle</td></tr><tr><td><strong>OpenInference Tracing Integration</strong></td><td>Evaluation system compatibility with tools like Arize Phoenix</td><td>Seamless integration with AI evaluation workflows</td></tr><tr><td><strong>OpenInference Tracing Integration</strong></td><td>Full chat completion request/response data capture</td><td>Complete audit trail for debugging and analysis</td></tr><tr><td><strong>Configurable Metrics Labels</strong></td><td>Custom labeling based on HTTP request headers</td><td>Flexible monitoring and alerting setup</td></tr><tr><td><strong>Configurable Metrics Labels</strong></td><td>Granular monitoring by user ID, API version, or application context</td><td>Enhanced filtering and segmentation</td></tr><tr><td><strong>Configurable Metrics Labels</strong></td><td>Enhanced filtering and alerting capabilities</td><td>More targeted monitoring and alerts</td></tr><tr><td><strong>Embeddings Metrics Support</strong></td><td>Comprehensive token usage tracking for both chat and embeddings APIs</td><td>Better cost control and usage insights</td></tr><tr><td><strong>Embeddings Metrics Support</strong></td><td>Accurate cost attribution across different operation types</td><td>Precise cost allocation and budgeting</td></tr><tr><td><strong>Embeddings Metrics Support</strong></td><td>OpenTelemetry semantic conventions compliance</td><td>Standardized observability integration</td></tr><tr><td><strong>Enhanced GenAI Metrics</strong></td><td>Improved error handling and attribute mapping</td><td>More reliable performance monitoring</td></tr><tr><td><strong>Enhanced GenAI Metrics</strong></td><td>More accurate token latency measurements</td><td>Better performance analysis 
data</td></tr><tr><td><strong>Enhanced GenAI Metrics</strong></td><td>Better performance analysis data</td><td>Improved optimization insights</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="model-name-virtualization-abstraction-for-flexibility">Model Name Virtualization: Abstraction for Flexibility<a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#model-name-virtualization-abstraction-for-flexibility" class="hash-link" aria-label="Direct link to Model Name Virtualization: Abstraction for Flexibility" title="Direct link to Model Name Virtualization: Abstraction for Flexibility" translate="no">​</a></h3>
<p>The new <code>modelNameOverride</code> field enables powerful model abstraction:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">backendRefs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> openai</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">backend</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">modelNameOverride</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"gpt-4"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> anthropic</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">backend</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">modelNameOverride</span><span 
class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"claude-3"</span><br></span></code></pre></div></div>
<p>By abstracting away the model name, application developers can use standardized model names while the gateway handles provider-specific routing. This is useful, for example, for A/B testing, gradual migrations, guarding against provider lock-in, and multi-provider strategies.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="unified-llm-and-non-llm-apis">Unified LLM and non-LLM APIs<a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#unified-llm-and-non-llm-apis" class="hash-link" aria-label="Direct link to Unified LLM and non-LLM APIs" title="Direct link to Unified LLM and non-LLM APIs" translate="no">​</a></h2>
<p>This release enhances Gateway resource management by allowing both standard HTTPRoute and AIGatewayRoute resources to be attached to the same Gateway object.</p>
<p>This provides a unified routing configuration that supports both AI and non-AI traffic within a single gateway infrastructure, simplifying deployment and management.</p>
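<p>As a sketch of what this unified setup can look like (resource names here are illustrative, not from a real deployment), a standard HTTPRoute for non-AI traffic and an AIGatewayRoute for LLM traffic both attach to the same Gateway:</p>
<pre><code class="language-yaml"># Hypothetical non-AI route attached to a shared Gateway
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web-traffic
spec:
  parentRefs:
    - name: shared-gateway   # same Gateway as the AI route below
  rules:
    - backendRefs:
        - name: web-backend
          port: 8080
---
# AI route attached to the same Gateway
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llm-traffic
spec:
  targetRefs:
    - name: shared-gateway
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: gpt-4
      backendRefs:
        - name: openai-backend
</code></pre>
<p>Both routes are served by one gateway deployment, so AI and non-AI traffic share the same entry point, TLS configuration, and operational tooling.</p>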
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="community-impact-and-momentum"><strong>Community Impact and Momentum</strong><a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#community-impact-and-momentum" class="hash-link" aria-label="Direct link to community-impact-and-momentum" title="Direct link to community-impact-and-momentum" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="growing-community">Growing Community<a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#growing-community" class="hash-link" aria-label="Direct link to Growing Community" title="Direct link to Growing Community" translate="no">​</a></h3>
<p>The v0.3 release represents the collaborative effort of our <strong>rapidly expanding community</strong>:</p>
<ul>
<li class=""><strong>Contributors from Tetrate, Bloomberg, Tencent, Google, and Nutanix</strong></li>
<li class=""><strong>Independent developers</strong> driving innovation</li>
<li class=""><strong>Enterprise adopters</strong> providing real-world feedback</li>
</ul>
<p>This diversity of perspectives has shaped v0.3 into a release that serves both <strong>bleeding-edge innovators</strong> and <strong>enterprise production needs</strong>.</p>
<p><a href="https://www.star-history.com/#envoyproxy/ai-gateway&amp;Date" target="_blank" rel="noopener noreferrer" class=""><img decoding="async" loading="lazy" src="https://api.star-history.com/svg?repos=envoyproxy/ai-gateway&amp;type=Date" alt="Star History Chart" class="img_ev3q"></a></p>
<p><a href="https://github.com/envoyproxy/ai-gateway" target="_blank" rel="noopener noreferrer" class="">Visit the project on GitHub</a> and star the repo to show your support.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="standards-leadership">Standards Leadership<a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#standards-leadership" class="hash-link" aria-label="Direct link to Standards Leadership" title="Direct link to Standards Leadership" translate="no">​</a></h3>
<p>Our integration with the <strong>Gateway API Inference Extension</strong> demonstrates our commitment to <strong>open standards</strong> and <strong>vendor-neutral solutions</strong>. By building on proven Gateway API patterns, we're ensuring that Envoy AI Gateway remains interoperable and future-proof.</p>
<p>Enabling tracing through <strong>OpenInference Tracing Integration</strong> further demonstrates our community's commitment to industry standards, collaboration, and ecosystem integration.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-release-enables">What This Release Enables<a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#what-this-release-enables" class="hash-link" aria-label="Direct link to What This Release Enables" title="Direct link to What This Release Enables" translate="no">​</a></h2>
<table><thead><tr><th>Benefit</th><th>Impact</th></tr></thead><tbody><tr><td>Simplified model deployment with intelligent routing</td><td>Faster development cycles</td></tr><tr><td>Performance optimization through real-time metrics</td><td>Better model performance</td></tr><tr><td>Cost control with token-based rate limiting</td><td>More predictable operating costs</td></tr><tr><td>Multi-model support in a single infrastructure</td><td>Reduced complexity and maintenance</td></tr><tr><td>Unified AI infrastructure supporting diverse workloads</td><td>Scalable, future-proof architecture</td></tr><tr><td>Standards-based architecture for long-term sustainability</td><td>Vendor-neutral, interoperable solutions</td></tr><tr><td>Vendor flexibility without architectural changes</td><td>Reduced lock-in risk</td></tr><tr><td>Enterprise observability for production confidence</td><td>Production-ready monitoring</td></tr><tr><td>Reduced operational complexity through automation</td><td>Lower operational overhead</td></tr><tr><td>Improved reliability with intelligent failover</td><td>Higher system reliability</td></tr><tr><td>Better resource utilization across infrastructure</td><td>Optimized infrastructure costs</td></tr><tr><td>Streamlined monitoring with AI-specific telemetry</td><td>Simplified troubleshooting</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="get-involved-join-the-ai-infrastructure-revolution">Get Involved: Join the AI Infrastructure Revolution<a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#get-involved-join-the-ai-infrastructure-revolution" class="hash-link" aria-label="Direct link to Get Involved: Join the AI Infrastructure Revolution" title="Direct link to Get Involved: Join the AI Infrastructure Revolution" translate="no">​</a></h2>
<p>The future of AI infrastructure is <strong>open, collaborative, and community-driven</strong>. Here's how you can be part of it:</p>
<table><thead><tr><th>Action</th><th>Resource</th><th>Description</th></tr></thead><tbody><tr><td><strong>🚀 Try v0.3 Today</strong></td><td><a href="https://github.com/envoyproxy/ai-gateway/releases/tag/v0.3.0" target="_blank" rel="noopener noreferrer" class="">Download the release</a></td><td>Get the latest release and start exploring</td></tr><tr><td></td><td><a href="https://aigateway.envoyproxy.io/docs/getting-started/" target="_blank" rel="noopener noreferrer" class="">Follow our getting started guide</a></td><td>Step-by-step setup instructions</td></tr><tr><td></td><td><a href="https://github.com/envoyproxy/ai-gateway/tree/main/examples" target="_blank" rel="noopener noreferrer" class="">Explore the examples</a></td><td>Real-world configuration examples</td></tr><tr><td><strong>💬 Join the Community</strong></td><td><a href="https://docs.google.com/document/d/10e1sfsF-3G3Du5nBHGmLjXw5GVMqqCvFDqp_O65B0_w/edit?tab=t.0#heading=h.6nxfjwmrm5g6" target="_blank" rel="noopener noreferrer" class="">Weekly Community Meetings</a></td><td>Add to your calendar</td></tr><tr><td></td><td><a href="https://envoyproxy.slack.com/channels/envoy-ai-gateway" target="_blank" rel="noopener noreferrer" class="">Slack Channel #envoy-ai-gateway</a></td><td>Join the conversation on Envoy Slack</td></tr><tr><td></td><td><a href="https://github.com/envoyproxy/ai-gateway/discussions" target="_blank" rel="noopener noreferrer" class="">GitHub Discussions</a></td><td>Share experiences and ask questions</td></tr><tr><td><strong>🛠️ Contribute to the Future</strong></td><td><a href="https://github.com/envoyproxy/ai-gateway/issues/new?template=bug_report.md" target="_blank" rel="noopener noreferrer" class="">Report Issues</a></td><td>Help us improve by reporting bugs</td></tr><tr><td></td><td><a href="https://github.com/envoyproxy/ai-gateway/issues/new?template=feature_request.md" target="_blank" rel="noopener noreferrer" class="">Request Features</a></td><td>Tell us what you need for future 
releases</td></tr><tr><td></td><td><a href="https://github.com/envoyproxy/ai-gateway/blob/main/CONTRIBUTING.md" target="_blank" rel="noopener noreferrer" class="">Submit Code</a></td><td>Contribute to the next release</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="acknowledgments-the-power-of-open-source">Acknowledgments: The Power of Open Source<a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#acknowledgments-the-power-of-open-source" class="hash-link" aria-label="Direct link to Acknowledgments: The Power of Open Source" title="Direct link to Acknowledgments: The Power of Open Source" translate="no">​</a></h2>
<p>v0.3 wouldn't exist without our incredible community. Special recognition goes to:</p>
<ul>
<li class=""><strong>Enterprise contributors</strong> who provided production feedback and requirements</li>
<li class=""><strong>Open source maintainers</strong> from the Gateway API and CNCF communities</li>
<li class=""><strong>Individual developers</strong> who contributed code, documentation, and ideas</li>
<li class=""><strong>Early adopters</strong> who tested pre-releases and reported issues</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="get-started-today">Get Started Today<a href="https://aigateway.envoyproxy.io/blog/v0.3-release-announcement#get-started-today" class="hash-link" aria-label="Direct link to Get Started Today" title="Direct link to Get Started Today" translate="no">​</a></h2>
<p>Ready to experience the future of AI infrastructure?</p>
<p><strong>Get started with Envoy AI Gateway v0.3</strong> and see how intelligent inference routing, expanded provider support, and enterprise observability can transform your AI deployments.</p>
<p>The future of AI infrastructure is <strong>open, intelligent, and community-driven</strong>. Join us in building it.</p>
<p>🚀 <a href="https://aigateway.envoyproxy.io/docs/getting-started/" target="_blank" rel="noopener noreferrer" class=""><strong>Get Started with v0.3 →</strong></a></p>
<hr>
<p><em>Envoy AI Gateway v0.3 is available now. For detailed release notes, API changes, and upgrade guidance, visit our <a href="https://aigateway.envoyproxy.io/release-notes/v0.3" target="_blank" rel="noopener noreferrer" class="">release notes page</a>.</em></p>]]></content>
        <author>
            <name>Erica Hughberg</name>
            <uri>https://github.com/missBerg</uri>
        </author>
        <author>
            <name>Xunzhuo (Bit) Liu</name>
            <uri>https://github.com/Xunzhuo</uri>
        </author>
        <category label="News" term="News"/>
        <category label="Releases" term="Releases"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Envoy AI Gateway Introduces Endpoint Picker Support]]></title>
        <id>https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing</id>
        <link href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing"/>
        <updated>2025-07-30T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Envoy AI Gateway introduces Endpoint Picker Provider support for intelligent inference routing based on real-time AI metrics like KV-cache usage and queue depth—moving beyond traditional load balancing to optimize AI workload performance.]]></summary>
        <content type="html"><![CDATA[<p><img decoding="async" loading="lazy" alt="Reference Architecture for Envoy AI Gateway" src="https://aigateway.envoyproxy.io/assets/images/epp-blog-feature-69f00db96205813b56a1cdf4163fe2cb.png" width="1920" height="1080" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p><strong>Envoy AI Gateway now supports Endpoint Picker Provider (EPP) integration as per the</strong> <a href="https://gateway-api-inference-extension.sigs.k8s.io/" target="_blank" rel="noopener noreferrer" class="">Gateway API Inference Extension</a>.</p>
<p>This feature enables dynamic routing for AI inference workloads through intelligent endpoint selection based on real-time metrics, including KV-cache usage, queued requests, and LoRA adapter information.</p>
<p>When running AI inference at scale, this means your system can automatically select the optimal inference endpoint for each request, thereby optimizing resource utilization.</p>
<p><img decoding="async" loading="lazy" alt="An overview of Endpoint Picker together with Envoy AI Gateway" src="https://aigateway.envoyproxy.io/assets/images/epp-blog-overview-968581c2cd4664b84a19d2050ee33d60.png" width="1121" height="619" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-problem-traditional-load-balancing-falls-short-for-ai-workloads">The Problem: Traditional Load Balancing Falls Short for AI Workloads<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#the-problem-traditional-load-balancing-falls-short-for-ai-workloads" class="hash-link" aria-label="Direct link to The Problem: Traditional Load Balancing Falls Short for AI Workloads" title="Direct link to The Problem: Traditional Load Balancing Falls Short for AI Workloads" translate="no">​</a></h2>
<p>LLM inference workloads have a different set of characteristics from traditional API servers, so informing the load balancer of the optimal upstream target for a request requires a new set of signals on which to base those decisions.</p>
<p>Some of the unique characteristics of AI inference workloads:</p>
<ul>
<li class=""><strong>Variable processing times</strong> based on model complexity and input size</li>
<li class=""><strong>Different resource requirements</strong> for different models and configurations</li>
<li class=""><strong>Real-time performance metrics</strong> that change constantly (KV-cache usage, queue depth, etc.)</li>
<li class=""><strong>Specialized hardware requirements</strong> (GPUs, TPUs, different model variants)</li>
</ul>
<p>Without taking these into account when routing, you’ll end up with:</p>
<ul>
<li class="">Overloaded endpoints while others sit idle</li>
<li class="">Increased latency due to poor endpoint selection</li>
<li class="">Resource waste from suboptimal load distribution</li>
<li class="">Manual intervention required for optimal performance</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-solution-endpoint-picker-provider-integration">The Solution: Endpoint Picker Provider Integration<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#the-solution-endpoint-picker-provider-integration" class="hash-link" aria-label="Direct link to The Solution: Endpoint Picker Provider Integration" title="Direct link to The Solution: Endpoint Picker Provider Integration" translate="no">​</a></h2>
<p>Envoy AI Gateway's new EPP support addresses these challenges by giving users the option to integrate an Endpoint Picker. The endpoint picker takes these signals into account and tells the gateway which upstream target is best placed to serve the request.</p>
<p><img decoding="async" loading="lazy" alt="An overview of Endpoint Picker together with Envoy AI Gateway" src="https://aigateway.envoyproxy.io/assets/images/epp-blog-step-by-step-d2146e4b4593923781b4ce5d1c850e0c.png" width="3483" height="2916" class="img_ev3q"></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-intelligent-endpoint-selection">1. Intelligent Endpoint Selection<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#1-intelligent-endpoint-selection" class="hash-link" aria-label="Direct link to 1. Intelligent Endpoint Selection" title="Direct link to 1. Intelligent Endpoint Selection" translate="no">​</a></h3>
<p>The Endpoint Picker (EPP) automatically routes requests to the most suitable backend by analyzing real-time metrics from your inference endpoints, such as:</p>
<ul>
<li class="">Current KV-cache usage</li>
<li class="">Number of queued inference requests</li>
<li class="">LoRA adapter information</li>
<li class="">Endpoint health and performance metrics</li>
</ul>
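<p>To give a concrete flavor of how an EPP is wired in, the sketch below follows the Gateway API Inference Extension's v1alpha2 schema; the field names (<code>selector</code>, <code>targetPortNumber</code>, <code>extensionRef</code>) and all resource names are illustrative and may differ in the extension version you deploy. The InferencePool selects the model-serving pods and references the Endpoint Picker service that makes the routing decision:</p>
<pre><code class="language-yaml"># Sketch of an InferencePool wired to an Endpoint Picker
# (v1alpha2-style fields; check your installed CRD version)
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-llama3-8b-instruct
  namespace: default
spec:
  selector:
    app: vllm-llama3-8b-instruct   # pods serving this model
  targetPortNumber: 8000           # port the model server listens on
  extensionRef:
    name: vllm-llama3-8b-instruct-epp   # the Endpoint Picker (EPP) service
</code></pre>
<p>The EPP service named in <code>extensionRef</code> scrapes the pool's pods for the metrics above and returns the chosen endpoint to the gateway per request.</p>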
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-dynamic-load-balancing">2. Dynamic Load Balancing<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#2-dynamic-load-balancing" class="hash-link" aria-label="Direct link to 2. Dynamic Load Balancing" title="Direct link to 2. Dynamic Load Balancing" translate="no">​</a></h3>
<p>Unlike static load balancers, the EPP continuously receives endpoint status, which affects routing decisions in real time. This dynamic decision-making ensures optimal resource utilization across your entire inference infrastructure.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-automatic-failover">3. Automatic Failover<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#3-automatic-failover" class="hash-link" aria-label="Direct link to 3. Automatic Failover" title="Direct link to 3. Automatic Failover" translate="no">​</a></h3>
<p>When an endpoint becomes unavailable or performance degrades, the EPP automatically routes traffic to healthy alternatives, ensuring high availability for your AI services.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-extensible-architecture">4. Extensible Architecture<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#4-extensible-architecture" class="hash-link" aria-label="Direct link to 4. Extensible Architecture" title="Direct link to 4. Extensible Architecture" translate="no">​</a></h3>
<p>The EPP architecture supports custom endpoint picker providers, allowing you to implement domain-specific routing logic tailored to your unique requirements.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="how-it-works-two-integration-approaches">How It Works: Two Integration Approaches<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#how-it-works-two-integration-approaches" class="hash-link" aria-label="Direct link to How It Works: Two Integration Approaches" title="Direct link to How It Works: Two Integration Approaches" translate="no">​</a></h2>
<p>Envoy AI Gateway supports EPP integration through two powerful approaches:</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="httproute--inferencepool">HTTPRoute + InferencePool<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#httproute--inferencepool" class="hash-link" aria-label="Direct link to HTTPRoute + InferencePool" title="Direct link to HTTPRoute + InferencePool" translate="no">​</a></h3>
<p>For simpler inference workloads, you can use the standard Gateway API HTTPRoute with an InferencePool backend.</p>
<p><img decoding="async" loading="lazy" alt="Showing the relationship between HTTPRoute and InferencePool Kubernetes Resources" src="https://aigateway.envoyproxy.io/assets/images/epp-blog-http-route-2b2a33fc73ddba7583f3f9e023a8ee24.png" width="2628" height="1932" class="img_ev3q"></p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gateway.networking.k8s.io/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> HTTPRoute</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> inference</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">pool</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">with</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">httproute</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">namespace</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> default</span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">parentRefs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">group</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gateway.networking.k8s.io</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Gateway</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> inference</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">pool</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">with</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">httproute</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">namespace</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  
</span><span class="token key atrule" style="color:#00a4db">rules</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">backendRefs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">group</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> inference.networking.k8s.io</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferencePool</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> vllm</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">8b</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">instruct</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">namespace</span><span class="token punctuation" style="color:#393A34">:</span><span class="token 
plain"> default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">weight</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">matches</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">path</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token key atrule" style="color:#00a4db">type</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> PathPrefix</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token key atrule" style="color:#00a4db">value</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> /</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">timeouts</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">request</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"> 60s</span><br></span></code></pre></div></div>
<p>This approach provides:</p>
<ul>
<li class="">Basic intelligent routing</li>
<li class="">Automatic load balancing</li>
<li class="">Simple configuration</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="aigatewayroute--inferencepool">AIGatewayRoute + InferencePool<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#aigatewayroute--inferencepool" class="hash-link" aria-label="Direct link to AIGatewayRoute + InferencePool" title="Direct link to AIGatewayRoute + InferencePool" translate="no">​</a></h3>
<p>For advanced AI-specific features, use Envoy AI Gateway's custom AIGatewayRoute.</p>
<p><img decoding="async" loading="lazy" alt="Showing the relationship between AIGatewayRoute and InferencePool Kubernetes Resources" src="https://aigateway.envoyproxy.io/assets/images/epp-blog-aigwroute-ee122e0b5632a473bccfcfbec5c80aec.png" width="3645" height="2005" class="img_ev3q"></p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> aigateway.envoyproxy.io/v1alpha1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> AIGatewayRoute</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> inference</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">pool</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">with</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">aigwroute</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">namespace</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> default</span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">targetRefs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> inference</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">pool</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">with</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">aigwroute</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Gateway</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">group</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gateway.networking.k8s.io</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">rules</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span 
class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">matches</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">headers</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">type</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Exact</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> x</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">ai</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">eg</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              </span><span class="token key atrule" style="color:#00a4db">value</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> meta</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama/Llama</span><span class="token punctuation" style="color:#393A34">-</span><span class="token 
plain">3.1</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">8B</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">Instruct</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">backendRefs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">group</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> inference.networking.k8s.io</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferencePool</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> vllm</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">8b</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">instruct</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">matches</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">headers</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">type</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Exact</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> x</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">ai</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">eg</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              </span><span class="token key atrule" style="color:#00a4db">value</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> mistral</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">latest</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">backendRefs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">group</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> inference.networking.k8s.io</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferencePool</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> mistral</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">matches</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">headers</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">type</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Exact</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">              </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> x</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">ai</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">eg</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              </span><span class="token key atrule" style="color:#00a4db">value</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> some</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">cool</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">hosted</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">backendRefs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> envoy</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">ai</span><span class="token punctuation" style="color:#393A34">-</span><span 
class="token plain">gateway</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">basic</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">testupstream</span><br></span></code></pre></div></div>
<p>This enhanced approach adds:</p>
<ul>
<li class=""><strong>Multi-model routing</strong> based on the <code>modelName</code> field in the request body</li>
<li class=""><strong>Token-based rate limiting</strong> for cost control with self-hosted models</li>
<li class=""><strong>Advanced LLM observability</strong> across all routed models</li>
</ul>
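<p>As a rough sketch of the token-based rate limiting piece, the gateway can emit per-request token counts as dynamic metadata, which an Envoy Gateway <code>BackendTrafficPolicy</code> then counts against a global limit. The policy name, gateway name, and limit values below are placeholders, and the exact field names should be checked against the current Envoy AI Gateway usage-based rate limiting docs:</p>
<pre><code class="language-yaml"># In the AIGatewayRoute spec: record input-token usage under a metadata key
llmRequestCosts:
  - metadataKey: llm_input_token
    type: InputToken
---
# Placeholder policy: cap input tokens per client per hour
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: llm-token-limit
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: my-gateway
  rateLimit:
    type: Global
    global:
      rules:
        - limit:
            requests: 100000   # interpreted as tokens via the cost below
            unit: Hour
          cost:
            response:
              from: Metadata
              metadata:
                namespace: io.envoy.ai_gateway
                key: llm_input_token
</code></pre>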
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="real-world-benefits">Real-World Benefits<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#real-world-benefits" class="hash-link" aria-label="Direct link to Real-World Benefits" title="Direct link to Real-World Benefits" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="for-aiml-engineers">For AI/ML Engineers<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#for-aiml-engineers" class="hash-link" aria-label="Direct link to For AI/ML Engineers" title="Direct link to For AI/ML Engineers" translate="no">​</a></h3>
<ul>
<li class=""><strong>Lower latency</strong>: Requests automatically routed to optimal endpoints</li>
<li class=""><strong>Better throughput</strong>: Intelligent distribution prevents bottlenecks</li>
<li class=""><strong>Cost optimization</strong>: Efficient resource usage reduces infrastructure costs</li>
<li class=""><strong>Enhanced observability</strong>: Real-time metrics and performance insights</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="for-platform-teams">For Platform Teams<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#for-platform-teams" class="hash-link" aria-label="Direct link to For Platform Teams" title="Direct link to For Platform Teams" translate="no">​</a></h3>
<ul>
<li class=""><strong>Standards compliance</strong>: Built on the Gateway API Inference Extension</li>
<li class=""><strong>Vendor flexibility</strong>: Support for multiple EPP implementations</li>
<li class=""><strong>Future-proof architecture</strong>: Extensible design for evolving requirements</li>
<li class=""><strong>Kubernetes-native</strong>: Seamless integration with existing infrastructure</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="for-devops-teams">For DevOps Teams<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#for-devops-teams" class="hash-link" aria-label="Direct link to For DevOps Teams" title="Direct link to For DevOps Teams" translate="no">​</a></h3>
<ul>
<li class=""><strong>Reduced operational overhead</strong>: No more manual endpoint management</li>
<li class=""><strong>Improved reliability</strong>: Automatic failover and health monitoring</li>
<li class=""><strong>Better resource utilization</strong>: Dynamic load balancing maximizes efficiency</li>
<li class=""><strong>Simplified scaling</strong>: Add new endpoints without configuration changes</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="getting-started">Getting Started<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#getting-started" class="hash-link" aria-label="Direct link to Getting Started" title="Direct link to Getting Started" translate="no">​</a></h2>
<p>Setting up EPP support in Envoy AI Gateway is straightforward:</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-install-inferencepool-crds">1. Install InferencePool CRDs<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#1-install-inferencepool-crds" class="hash-link" aria-label="Direct link to 1. Install InferencePool CRDs" title="Direct link to 1. Install InferencePool CRDs" translate="no">​</a></h3>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Install Gateway API Inference Extension</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v0.5.1/manifests.yaml</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-deploy-your-inference-backends">2. Deploy Your Inference Backends<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#2-deploy-your-inference-backends" class="hash-link" aria-label="Direct link to 2. Deploy Your Inference Backends" title="Direct link to 2. Deploy Your Inference Backends" translate="no">​</a></h3>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Deploy sample vLLM backend</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/v0.5.1/config/manifests/vllm/sim-deployment.yaml</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-configure-inferenceobjective-and-inferencepool">3. Configure InferenceObjective and InferencePool<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#3-configure-inferenceobjective-and-inferencepool" class="hash-link" aria-label="Direct link to 3. Configure InferenceObjective and InferencePool" title="Direct link to 3. Configure InferenceObjective and InferencePool" translate="no">​</a></h3>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> inference.networking.k8s.io/v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferenceObjective</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> base</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">modelName</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> meta</span><span class="token punctuation" 
style="color:#393A34">-</span><span class="token plain">llama/Llama</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">3.1</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">8B</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">Instruct</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">criticality</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Critical</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">poolRef</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> vllm</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">8b</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">instruct</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">---</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> inference.networking.k8s.io/v1</span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferencePool</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> vllm</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">8b</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">instruct</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">targetPortNumber</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">8000</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">selector</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key 
atrule" style="color:#00a4db">app</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> vllm</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">8b</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">instruct</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">extensionRef</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> vllm</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">8b</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">instruct</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">epp</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-create-your-route">4. Create Your Route<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#4-create-your-route" class="hash-link" aria-label="Direct link to 4. Create Your Route" title="Direct link to 4. Create Your Route" translate="no">​</a></h3>
<p>Choose between HTTPRoute and AIGatewayRoute based on your needs, and you're ready to go!</p>
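<p>For reference, the plain HTTPRoute variant can look roughly like this. The route and gateway names are placeholders; the <code>InferencePool</code> reference matches the pool configured in step 3:</p>
<pre><code class="language-yaml">apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: inference-route   # placeholder name
  namespace: default
spec:
  parentRefs:
    - name: my-gateway    # placeholder: your Gateway
  rules:
    - backendRefs:
        - group: inference.networking.k8s.io
          kind: InferencePool
          name: vllm-llama3-8b-instruct
</code></pre>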
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="whats-next">What's Next<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next" translate="no">​</a></h2>
<p>The introduction of EPP support represents a significant milestone in Envoy AI Gateway's evolution. This feature enables intelligent routing for AI workloads, making it easier than ever to deploy and manage production AI inference services.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="upcoming-enhancements">Upcoming Enhancements<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#upcoming-enhancements" class="hash-link" aria-label="Direct link to Upcoming Enhancements" title="Direct link to Upcoming Enhancements" translate="no">​</a></h3>
<ul>
<li class=""><strong>Upstream Conformance Test</strong>: Integrate with <strong>Gateway API Inference Extension</strong> Conformance Tests.</li>
<li class=""><strong>Fallback Support</strong>: Endpoint picker fallback based on the Envoy HostOverride LbPolicy, keeping requests flowing even when the Endpoint Picker is unavailable.</li>
<li class=""><strong>Internal Managed EPP</strong>: An Envoy AI Gateway-managed Endpoint Picker that simplifies the management of your AI inference endpoints.</li>
<li class=""><strong>Enhance End-to-End (E2E) Testing</strong>: Add more E2E tests for InferencePool support.</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="community-contributions">Community Contributions<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#community-contributions" class="hash-link" aria-label="Direct link to Community Contributions" title="Direct link to Community Contributions" translate="no">​</a></h3>
<p>We're excited to see how the community leverages this capability. Whether you're building custom EPP implementations, contributing to the Gateway API Inference Extension, or sharing your deployment experiences, we'd love to hear from you.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>Endpoint Picker Provider support enables Envoy AI Gateway to serve not only as an Egress AI Gateway but also as an Ingress AI Gateway for AI inference workloads. By automatically selecting optimal endpoints based on real-time metrics, this feature improves performance and maximizes resource utilization for hosted inference systems.</p>
<p>Whether you're running a small AI service or a large-scale inference platform, EPP support provides the intelligent routing capabilities you need to deliver reliable, high-performance AI services to your users.</p>
<p>Ready to get started with Envoy AI Gateway? Check out our <a href="https://aigateway.envoyproxy.io/docs/capabilities/inference/" target="_blank" rel="noopener noreferrer" class="">documentation</a> for guides and examples, and join our <a href="https://github.com/envoyproxy/ai-gateway/discussions" target="_blank" rel="noopener noreferrer" class="">community discussions</a> to share your experiences, raise issues, request features, and learn from others.</p>
<hr>
<p><em>Envoy AI Gateway continues to evolve as the premier solution for routing and managing AI workloads. Stay tuned for more exciting features and capabilities as we work to simplify AI deployment and improve its reliability and efficiency.</em></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="resources">Resources<a href="https://aigateway.envoyproxy.io/blog/endpoint-picker-for-inference-routing#resources" class="hash-link" aria-label="Direct link to Resources" title="Direct link to Resources" translate="no">​</a></h2>
<ul>
<li class=""><a class="" href="https://aigateway.envoyproxy.io/docs/capabilities/inference/">Inference Optimization Documentation</a></li>
<li class=""><a href="https://gateway-api-inference-extension.sigs.k8s.io/" target="_blank" rel="noopener noreferrer" class="">Gateway API Inference Extension</a></li>
<li class=""><a class="" href="https://aigateway.envoyproxy.io/docs/capabilities/inference/aigatewayroute-inferencepool/">AIGatewayRoute + InferencePool Guide</a></li>
<li class=""><a class="" href="https://aigateway.envoyproxy.io/docs/capabilities/inference/httproute-inferencepool/">HTTPRoute + InferencePool Guide</a></li>
<li class=""><a href="https://github.com/envoyproxy/ai-gateway" target="_blank" rel="noopener noreferrer" class="">GitHub Repository</a></li>
</ul>]]></content>
        <author>
            <name>Erica Hughberg</name>
            <uri>https://github.com/missBerg</uri>
        </author>
        <author>
            <name>Xunzhuo (Bit) Liu</name>
            <uri>https://github.com/Xunzhuo</uri>
        </author>
        <category label="News" term="News"/>
        <category label="Features" term="Features"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[A Reference Architecture for Adopters of Envoy AI Gateway]]></title>
        <id>https://aigateway.envoyproxy.io/blog/envoy-ai-gateway-reference-architecture</id>
        <link href="https://aigateway.envoyproxy.io/blog/envoy-ai-gateway-reference-architecture"/>
        <updated>2025-07-15T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Explore how to build a scalable GenAI platform using a two-tier architecture with Envoy AI Gateway. Centralize access to external and self-hosted LLMs through a unified API while managing credentials, costs, and governance.]]></summary>
        <content type="html"><![CDATA[<p><img decoding="async" loading="lazy" alt="Reference Architecture for Envoy AI Gateway" src="https://aigateway.envoyproxy.io/assets/images/ref-arch-blog-hero-0fc3e3c70a1020aeb7fc6df45f87f776.png" width="1920" height="1080" class="img_ev3q"></p>
<h1>Building a Scalable, Flexible, Cloud-Native GenAI Platform with Open Source Solutions</h1>
<p>AI workloads are complex, and unmanaged complexity kills velocity. Your architecture is the key to mastering it.</p>
<p>As generative AI (GenAI) becomes foundational to modern software products, developers face a chaotic new reality, juggling different APIs from various providers while also attempting to deploy self-hosted open-source models. This leads to credential sprawl, inconsistent security policies, runaway costs, and an infrastructure that is difficult to scale and govern.</p>
<p><strong>Your architecture doesn’t have to be this complex.</strong></p>
<p>Platform engineering teams need a secure, scalable way to serve both internal and external LLMs to their users. That’s where Envoy AI Gateway comes in.</p>
<p>This reference architecture outlines how to build a flexible GenAI platform using the open source solutions Envoy AI Gateway, KServe, and complementary tools. Whether you're self-hosting models or integrating with hosted model providers such as OpenAI and Anthropic, this architecture enables a unified, governable interface for LLM traffic.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="core-architecture-two-tier-gateway-design">Core Architecture: Two-Tier Gateway Design<a href="https://aigateway.envoyproxy.io/blog/envoy-ai-gateway-reference-architecture#core-architecture-two-tier-gateway-design" class="hash-link" aria-label="Direct link to Core Architecture: Two-Tier Gateway Design" title="Direct link to Core Architecture: Two-Tier Gateway Design" translate="no">​</a></h2>
<p>The foundation of this platform is a Two-Tier Gateway Architecture:</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="tier-one-gateway">Tier One Gateway<a href="https://aigateway.envoyproxy.io/blog/envoy-ai-gateway-reference-architecture#tier-one-gateway" class="hash-link" aria-label="Direct link to Tier One Gateway" title="Direct link to Tier One Gateway" translate="no">​</a></h3>
<p>Deployed in a centralized Gateway Cluster, this tier serves as the main entry point for client application API traffic.</p>
<p><strong>Its Job:</strong> To route traffic to external LLM providers (e.g., OpenAI, Anthropic, Bedrock, Vertex) or to the appropriate internal gateway to access an internal model-serving cluster.</p>
<p><strong>Why It Matters:</strong> This gateway provides a unified API for all application developers. They don't need to know or care if a model is hosted by a third party or in-house. It centralizes coarse-grained policies, such as authentication, top-level routing, and global rate limiting, simplifying the developer experience and providing a single control point for platform-wide governance.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="tier-two-gateway">Tier Two Gateway<a href="https://aigateway.envoyproxy.io/blog/envoy-ai-gateway-reference-architecture#tier-two-gateway" class="hash-link" aria-label="Direct link to Tier Two Gateway" title="Direct link to Tier Two Gateway" translate="no">​</a></h3>
<p>Deployed as part of a self-hosted model-serving cluster.</p>
<p><strong>Its Job:</strong> To handle internal traffic routing, load balancing, and policy enforcement specific to self-hosted models running on platforms like KServe.</p>
<p><strong>Why It Matters:</strong> This empowers platform teams. They can manage model versions, conduct releases, and apply specific security rules for self-hosted models without needing to make changes to the primary, customer-facing gateway. This separation ensures that internal operational changes don't impact external clients.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="design-benefits">Design Benefits<a href="https://aigateway.envoyproxy.io/blog/envoy-ai-gateway-reference-architecture#design-benefits" class="hash-link" aria-label="Direct link to Design Benefits" title="Direct link to Design Benefits" translate="no">​</a></h3>
<p>This design cleanly separates external access from internal implementation, giving teams the autonomy they need to move fast without breaking things, and provides:</p>
<ul>
<li class="">Centralized credential management</li>
<li class="">Unified API access</li>
<li class="">Cost tracking and traffic governance</li>
</ul>
<p><img decoding="async" loading="lazy" src="https://aigateway.envoyproxy.io/assets/images/aigw-ref.drawio-bff3f4209ccc3b0bbd7b57e98dd3d092.png" width="1398" height="1182" class="img_ev3q"></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="routing-and-traffic-management">Routing and Traffic Management<a href="https://aigateway.envoyproxy.io/blog/envoy-ai-gateway-reference-architecture#routing-and-traffic-management" class="hash-link" aria-label="Direct link to Routing and Traffic Management" title="Direct link to Routing and Traffic Management" translate="no">​</a></h3>
<p>Envoy AI Gateway provides a consistent client interface and abstracts the complexity of accessing diverse GenAI backends. It supports:</p>
<table><thead><tr><th>Feature</th><th>Problem</th><th>Solution</th></tr></thead><tbody><tr><td><strong>Upstream Authentication with Credential Injection</strong></td><td>Application developers must manage, store, and rotate API keys for multiple external LLM providers. This is a security risk and an operational burden that slows down development.</td><td>The gateway injects the correct credentials per provider. Developers can make requests to the gateway using a single, internal authentication token, and the gateway handles attaching the appropriate third-party API key. <strong>This decouples applications from external secrets, dramatically improving security and simplifying code.</strong></td></tr><tr><td><strong>Token-Based Rate Limiting and Cost Optimization</strong></td><td>A single buggy script or a new feature can lead to unexpected spikes in usage, resulting in huge bills from LLM providers or overwhelming your self-hosted infrastructure.</td><td>The gateway acts as an intelligent financial and operational circuit breaker. You can enforce policies based on token usage, requests, or estimated cost. <strong>This prevents abuse and ensures that you stay within budget, providing critical protection for the business.</strong></td></tr><tr><td><strong>Observability Hooks for Usage Patterns and Latency</strong></td><td>When you use multiple model providers, it's nearly impossible to get a clear picture of usage. Which teams are spending the most? Which models have the highest latency? How many tokens is our new feature consuming?</td><td>By routing all traffic through a single point, the gateway provides a unified source of truth and built-in support for metrics, logs, and traces tailored for GenAI workloads. <strong>This enables you to accurately track costs, identify performance bottlenecks, and understand usage patterns across your entire GenAI stack.</strong></td></tr></tbody></table>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>Clients simply hit a single endpoint and let the Gateway handle routing to the appropriate backend—self-hosted or third-party.</p></div></div>
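The single-endpoint idea in the tip above can be sketched in a few lines. This is an illustrative sketch only: the gateway URL is hypothetical and `build_request` is not an Envoy AI Gateway API; it simply shows that with a unified OpenAI-style interface, only the `model` field changes between a third-party and a self-hosted backend.

```python
import json

# Hypothetical endpoint; the real path depends on your gateway deployment.
GATEWAY_URL = "https://gateway.example.com/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    """Build one OpenAI-style chat payload; only `model` selects the backend."""
    return {
        "url": GATEWAY_URL,  # same endpoint for every provider
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

# Self-hosted and third-party models share the same client code path;
# the gateway decides where each request is routed.
external = build_request("gpt-4o", "Hello")
internal = build_request("llama-3-8b-self-hosted", "Hello")
```

The client never holds provider credentials or provider-specific URLs; routing and credential injection happen in the gateway.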
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="self-hosted-model-serving-with-kserve">Self-Hosted Model Serving with KServe<a href="https://aigateway.envoyproxy.io/blog/envoy-ai-gateway-reference-architecture#self-hosted-model-serving-with-kserve" class="hash-link" aria-label="Direct link to Self-Hosted Model Serving with KServe" title="Direct link to Self-Hosted Model Serving with KServe" translate="no">​</a></h2>
<p>While many organizations start with external providers, self-hosting models offers advantages in cost, privacy, and customization. If you are self-hosting models, KServe is a powerful addition.</p>
<p>For many data scientists and ML engineers, turning a trained model into a production-ready API is a major hurdle that requires deep expertise in Kubernetes, networking, and infrastructure. KServe bridges this gap and eliminates the complexity. KServe is a model serving platform that automates the heavy lifting, allowing an engineer to simply provide a configuration file for their model while KServe builds the scalable, resilient API endpoint.</p>
<p><strong>A few highlights of what KServe provides:</strong></p>
<ul>
<li class="">Autoscaling (including token-based autoscaling for LLMs and scale-to-zero for GPUs)</li>
<li class="">Multi-node inference (via vLLM)</li>
<li class="">Support for OpenAI-compatible APIs and advanced runtime integrations (vLLM)</li>
<li class="">Built-in support for model and prompt caching</li>
</ul>
<p>Check out the <a href="https://kserve.github.io/website/latest/" target="_blank" rel="noopener noreferrer" class="">KServe Documentation</a> to learn more about all the capabilities.</p>
<p>Use tools like Backstage to simplify model deployment and namespace management for teams.</p>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p>Self-hosting is optional. Many adopters will initially rely on external providers and add internal hosting as requirements evolve.</p></div></div>
<p><img decoding="async" loading="lazy" src="https://aigateway.envoyproxy.io/assets/images/kserve-ref-fe5d5a1149c6f6b253a91a899975b7a7.png" width="3176" height="1786" class="img_ev3q"></p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="observability-control-and-optimization-for-production-readiness">Observability, Control, and Optimization for Production Readiness<a href="https://aigateway.envoyproxy.io/blog/envoy-ai-gateway-reference-architecture#observability-control-and-optimization-for-production-readiness" class="hash-link" aria-label="Direct link to Observability, Control, and Optimization for Production Readiness" title="Direct link to Observability, Control, and Optimization for Production Readiness" translate="no">​</a></h2>
<p>In a production GenAI platform environment, observability, control, and optimization become must-haves. Envoy AI Gateway and KServe offer integrations that cater to these needs.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="observability">Observability<a href="https://aigateway.envoyproxy.io/blog/envoy-ai-gateway-reference-architecture#observability" class="hash-link" aria-label="Direct link to Observability" title="Direct link to Observability" translate="no">​</a></h3>
<p>Visibility into your system enables you to identify bottlenecks, manage costs effectively, and ensure model reliability.</p>
<ul>
<li class=""><strong>Unified Metrics and Tracing:</strong>
<ul>
<li class="">Envoy AI Gateway integrates seamlessly with <strong>OpenTelemetry</strong>, adhering to the GenAI Semantic Conventions. This delivers a unified view of requests, latency, token usage, and errors across all external and internal models.</li>
<li class="">Leverage specialized telemetry, like <strong>OpenLLMetry</strong>, for granular insights into LLM-specific metrics, ensuring you capture essential details like prompt and completion lengths, token throughput, and model-specific performance.</li>
</ul>
</li>
<li class=""><strong>Centralized Logging:</strong>
<ul>
<li class="">Centralize logging across providers through Envoy AI Gateway, facilitating easier debugging, auditing, and compliance.</li>
</ul>
</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="control">Control<a href="https://aigateway.envoyproxy.io/blog/envoy-ai-gateway-reference-architecture#control" class="hash-link" aria-label="Direct link to Control" title="Direct link to Control" translate="no">​</a></h3>
<p>Enforcing policies and controls ensures your platform remains secure, stable, and cost-effective:</p>
<ul>
<li class=""><strong>Policy Enforcement and Guardrails:</strong>
<ul>
<li class="">Set usage-based guardrails directly in Envoy AI Gateway to prevent cost overruns, enforce compliance, and safeguard against prompt misuse or model hallucination.</li>
<li class="">Implement safety checks and output validation rules that enable your team to control quality and compliance centrally, rather than embedding these checks individually within applications.</li>
</ul>
</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="optimization">Optimization<a href="https://aigateway.envoyproxy.io/blog/envoy-ai-gateway-reference-architecture#optimization" class="hash-link" aria-label="Direct link to Optimization" title="Direct link to Optimization" translate="no">​</a></h3>
<p>Improving the efficiency and responsiveness of your models enhances the user experience and lowers operating costs:</p>
<ul>
<li class=""><strong>Caching Strategies:</strong>
<ul>
<li class="">KServe's <strong>model caching</strong> further optimizes inference, significantly lowering response times and improving model utilization.</li>
</ul>
</li>
<li class=""><strong>Disaggregated Serving:</strong>
<ul>
<li class="">Optimize hardware resources by leveraging KServe’s support for <strong>disaggregated inference</strong>, which enables the separate management of compute-intensive inference stages and memory-intensive operations, thereby maximizing hardware efficiency and reducing costs.</li>
</ul>
</li>
</ul>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p>Be sure to review the documentation for each project to learn more about preparing your setup for production.</p></div></div>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="pluggable-and-flexible">Pluggable and Flexible<a href="https://aigateway.envoyproxy.io/blog/envoy-ai-gateway-reference-architecture#pluggable-and-flexible" class="hash-link" aria-label="Direct link to Pluggable and Flexible" title="Direct link to Pluggable and Flexible" translate="no">​</a></h2>
<p>This is not a rigid, all-or-nothing platform. It’s a set of foundational components you can adopt and extend to fit your unique environment. This architecture works whether you're fully committed to Kubernetes or using a hybrid cloud.</p>
<p><strong>You can:</strong></p>
<ul>
<li class="">Start with externally hosted LLMs or self-hosted inference</li>
<li class="">Use Envoy AI Gateway with any compatible provider via a unified API</li>
<li class="">Add your own authorization logic and/or custom functionality via Envoy’s extension filters</li>
<li class="">Deploy your Gateways in different clusters and cloud providers, giving you flexibility of choice in hosting environments</li>
</ul>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="summary">Summary<a href="https://aigateway.envoyproxy.io/blog/envoy-ai-gateway-reference-architecture#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<p>The Envoy AI Gateway reference architecture gives platform teams a guide to:</p>
<ul>
<li class="">Centralize access to GenAI models with a unified API</li>
<li class="">Enforce consistent traffic and security policies across model providers</li>
<li class="">Support both internal and external LLM usage without refactoring client applications</li>
<li class="">Scale safely and cost-effectively without reinventing core infrastructure</li>
</ul>
<p>Envoy AI Gateway sits at the heart of this design as an intelligent, extensible control point for all your GenAI traffic.</p>]]></content>
        <author>
            <name>Erica Hughberg</name>
            <uri>https://github.com/missBerg</uri>
        </author>
        <author>
            <name>Alexa Griffith</name>
            <uri>https://github.com/alexagriffith</uri>
        </author>
        <category label="Reference" term="Reference"/>
        <category label="Adopters" term="Adopters"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Announcing the first Envoy AI Gateway Release – A Community Milestone!]]></title>
        <id>https://aigateway.envoyproxy.io/blog/01-release-announcement</id>
        <link href="https://aigateway.envoyproxy.io/blog/01-release-announcement"/>
        <updated>2025-02-25T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[On February 25, 2025, Envoy AI Gateway v0.1 was released: the first AI gateway built on CNCF Envoy Gateway, delivering unified LLM API access, upstream authorization, and token-based rate limiting to simplify enterprise AI adoption.]]></summary>
        <content type="html"><![CDATA[<p><img decoding="async" loading="lazy" alt="Announcing the first Envoy AI Gateway Release" src="https://aigateway.envoyproxy.io/assets/images/0.1-release-image-2f1b083f2c0de6576ea917e25150bd95.png" width="1280" height="640" class="img_ev3q"></p>
<p>Today, we're excited to announce the <strong>0.1 release</strong> of the <strong>Envoy AI Gateway,</strong> the first AI gateway built on <strong>CNCF's <a href="https://gateway.envoyproxy.io/" target="_blank" rel="noopener noreferrer" class="">Envoy Gateway</a></strong> and backed by a thriving, growing community.</p>
<p>The journey to the <strong>Envoy AI Gateway</strong> started with a simple but powerful vision: make it easier for enterprises to integrate and scale AI in their applications.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="where-we-are-now">Where We Are Now<a href="https://aigateway.envoyproxy.io/blog/01-release-announcement#where-we-are-now" class="hash-link" aria-label="Direct link to Where We Are Now" title="Direct link to Where We Are Now" translate="no">​</a></h2>
<p>The <strong>Envoy AI Gateway</strong> is now available on GitHub and ready for developers to deploy and explore. It enables enterprises to integrate AI services through a unified API while managing authorization, cost control, and scalability with built-in features:</p>
<ul>
<li class="">✅ <strong>Unified API</strong> for seamless integration with multiple LLM providers (starting with <strong>AWS Bedrock</strong> and <strong>OpenAI</strong>).</li>
<li class="">✅ <strong>Upstream Authorization</strong> to simplify authentication across AI providers.</li>
<li class="">✅ <strong>Usage Rate Limiting</strong> based on word tokens to control costs and ensure operational efficiency.</li>
</ul>
<p>With this release, we're making AI adoption in cloud-native environments more straightforward and accessible to organizations of all sizes.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-power-of-community">The Power of Community<a href="https://aigateway.envoyproxy.io/blog/01-release-announcement#the-power-of-community" class="hash-link" aria-label="Direct link to The Power of Community" title="Direct link to The Power of Community" translate="no">​</a></h2>
<p>This milestone wouldn't be possible without the incredible contributions and participation from across the industry.</p>
<p><strong>A shout out to our community members from</strong> Tetrate, Bloomberg, WSO2, RedHat, Google, and our independent contributors who have joined discussions, provided feedback, and helped shape the roadmap.</p>
<p>🐱 Even <strong>the cat Mellow</strong> has attended community meetings, proving that AI isn't just for humans!</p>
<p>The momentum behind <strong>Envoy AI Gateway</strong> speaks to the need for an open, <strong>collaborative</strong> approach to GenAI infrastructure. Our contributors' and early adopters' excitement and expertise drive <strong>innovation forward.</strong></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="where-were-headed">Where We're Headed<a href="https://aigateway.envoyproxy.io/blog/01-release-announcement#where-were-headed" class="hash-link" aria-label="Direct link to Where We're Headed" title="Direct link to Where We're Headed" translate="no">​</a></h2>
<p>We're just getting started! Here's a sneak peek at what's next:</p>
<ul>
<li class=""><strong>Google Gemini 2.0 integration</strong> out-of-the-box.</li>
<li class=""><strong>Provider and Model Fallback Logic</strong> to ensure continued service availability.</li>
<li class=""><strong>Prompt Templating</strong> for consistent AI interactions.</li>
<li class=""><strong>Semantic Caching</strong> to optimize response efficiency and reduce costs.</li>
</ul>
<p>The <strong>roadmap is community-driven</strong>, and we'd love for <strong>more contributors</strong> to help shape the future of AI infrastructure!</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="join-the-movement">Join the Movement<a href="https://aigateway.envoyproxy.io/blog/01-release-announcement#join-the-movement" class="hash-link" aria-label="Direct link to Join the Movement" title="Direct link to Join the Movement" translate="no">​</a></h2>
<p>Want to be part of the journey? Here's how you can get involved:</p>
<ul>
<li class=""><strong>Download &amp; Try Envoy AI Gateway</strong> → <a href="https://github.com/envoyproxy/ai-gateway/releases/tag/v0.1.5" target="_blank" rel="noopener noreferrer" class="">GitHub Repo</a></li>
<li class=""><strong>Join the Conversation</strong> → <a href="https://docs.google.com/document/d/10e1sfsF-3G3Du5nBHGmLjXw5GVMqqCvFDqp_O65B0_w/edit?tab=t.0#heading=h.6nxfjwmrm5g6" target="_blank" rel="noopener noreferrer" class="">Attend our community meetings</a> (Mellow might show up too!)</li>
<li class=""><strong>Join us on Slack</strong> → <a href="https://communityinviter.com/apps/envoyproxy/envoy?email=test" target="_blank" rel="noopener noreferrer" class="">Register for Envoy Slack</a> and join #envoy-ai-gateway</li>
<li class=""><strong>Contribute</strong> → Raise issues, suggest improvements, and submit PRs on GitHub</li>
<li class=""><strong>Meet Us In Person</strong> → <a href="https://www.eventbrite.com/e/hands-on-workshop-deploy-configure-and-useenvoy-ai-gateway-tickets-1255461000649?aff=oddtdtcreator" target="_blank" rel="noopener noreferrer" class="">Register for the Envoy AI Gateway Workshop in London on March 31st</a></li>
</ul>
<p>The future of GenAI infrastructure <strong>is open and collaborative,</strong> and we're excited to work with you to build it!</p>
<p>🚀 <strong>Onward to 1.0!</strong></p>]]></content>
        <author>
            <name>Erica Hughberg</name>
            <uri>https://github.com/missBerg</uri>
        </author>
        <author>
            <name>Dan Sun</name>
            <uri>https://github.com/yuzisun</uri>
        </author>
        <author>
            <name>Takeshi Yoneda</name>
            <uri>https://github.com/mathetake</uri>
        </author>
        <author>
            <name>Aaron Choo</name>
            <uri>https://github.com/aabchoo</uri>
        </author>
        <author>
            <name>Yao Weng</name>
            <uri>https://github.com/wengyao04</uri>
        </author>
        <category label="News" term="News"/>
        <category label="Releases" term="Releases"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[End User Keynote at KubeCon 2024]]></title>
        <id>https://aigateway.envoyproxy.io/blog/kubecon-end-user-keynote-2024</id>
        <link href="https://aigateway.envoyproxy.io/blog/kubecon-end-user-keynote-2024"/>
        <updated>2024-11-14T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Watch the KubeCon 2024 End User Keynote featuring Envoy AI Gateway—learn how Bloomberg and Tetrate are centralizing enterprise AI workflows with unified model access, usage limiting, and upstream authorization at scale.]]></summary>
        <content type="html"><![CDATA[<p>At KubeCon North America 2024, Alexa Griffith had the opportunity to present the End User Keynote on <strong>Centralizing &amp; Simplifying Enterprise AI Workflows with Envoy AI Gateway</strong>.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/do1viOk8nok?si=a1nn7sbummXaYPXl" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin"></iframe>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="about-the-presentation">About the presentation<a href="https://aigateway.envoyproxy.io/blog/kubecon-end-user-keynote-2024#about-the-presentation" class="hash-link" aria-label="Direct link to About the presentation" title="Direct link to About the presentation" translate="no">​</a></h2>
<p>As Generative AI reshapes the industry, the demands on AI platforms have rapidly evolved. Organizations now require centralized infrastructure to manage and optimize access to self-trained, open source, and commercial AI models at scale.</p>
<p>In this talk, we introduce the Envoy AI Gateway, a collaborative open source effort led by engineers from <a href="https://www.bloomberg.com/company/what-we-do/engineering-cto/" target="_blank" rel="noopener noreferrer" class="">Bloomberg</a> and <a href="https://tetrate.io/" target="_blank" rel="noopener noreferrer" class="">Tetrate</a>.</p>
<p>Learn how the Envoy AI Gateway, which is built atop Envoy Gateway and Envoy Proxy, provides a unified, scalable solution for model access, usage limiting, and upstream authorization.</p>]]></content>
        <author>
            <name>Erica Hughberg</name>
            <uri>https://github.com/missBerg</uri>
        </author>
        <author>
            <name>Alexa Griffith</name>
            <uri>https://github.com/alexagriffith</uri>
        </author>
        <category label="News" term="News"/>
        <category label="Presentations" term="Presentations"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Introducing Envoy AI Gateway]]></title>
        <id>https://aigateway.envoyproxy.io/blog/introducing-envoy-ai-gateway</id>
        <link href="https://aigateway.envoyproxy.io/blog/introducing-envoy-ai-gateway"/>
        <updated>2024-10-18T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Open collaboration to bring AI Gateway features to the Envoy community]]></summary>
        <content type="html"><![CDATA[<p><strong>The industry is embracing Generative AI functionality, and we need to evolve how we handle traffic on an industry-wide scale. Keeping AI traffic handling features exclusive to enterprise licenses is counterproductive to the industry’s needs. This approach limits incentives to a single commercial entity and its customers. Even single-company open-source initiatives do not promote open multi-company collaboration.</strong></p>
<p>A shared challenge like this presents an opportunity for open collaboration to build the necessary features. We believe bringing together different use cases and requirements through open collaboration will lead to better solutions and accelerate innovation. The industry will benefit from diverse expertise and experiences by openly collaborating on software across companies and industries.</p>
<p>That is why Tetrate and Bloomberg have started an open collaboration to bring critical features to this new era of GenAI integration: collaborating openly in the Envoy community to bring AI traffic handling features to Envoy, via Envoy Gateway and Envoy Proxy.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-we-need-ai-traffic-handling-features">Why we need AI traffic handling features<a href="https://aigateway.envoyproxy.io/blog/introducing-envoy-ai-gateway#why-we-need-ai-traffic-handling-features" class="hash-link" aria-label="Direct link to Why we need AI traffic handling features" title="Direct link to Why we need AI traffic handling features" translate="no">​</a></h2>
<p>What makes traffic to LLM models different from traditional API traffic?</p>
<p>On the surface it appears similar. Traffic comes from a client app that is making an API request, and this request has to get to the provider that hosts the LLM model.</p>
<p>However, it is different. Managing LLM traffic from multiple apps, to multiple LLM providers, introduces new and different challenges where traditional API Gateway features fall short.</p>
<p>For example, traditional rate limiting based on the number of requests doesn’t work for controlling usage of LLM providers, as they’re computationally complex services. To measure usage, LLM providers tokenize the words in the request and response messages and count the number of tokens used. This count gives a good approximation of the computational complexity and cost of serving the request.</p>
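The counting idea described above can be sketched as a small budget tracker. This is an illustrative sketch only: the whitespace split is a crude stand-in for a real model-specific tokenizer, and `TokenBudget` is a hypothetical name, not an Envoy AI Gateway API.

```python
# Illustrative sketch only: these names are not Envoy AI Gateway APIs.

def count_tokens(text: str) -> int:
    # Crude whitespace approximation; real providers use model-specific
    # tokenizers (e.g. BPE), which produce different counts.
    return len(text.split())

class TokenBudget:
    """Charges each request by token count instead of counting requests."""

    def __init__(self, limit: int):
        self.limit = limit  # total tokens a client may consume
        self.used = 0

    def allow(self, prompt: str, completion: str = "") -> bool:
        cost = count_tokens(prompt) + count_tokens(completion)
        if self.used + cost > self.limit:
            return False  # over budget: reject before reaching the provider
        self.used += cost
        return True
```

A request-count limiter would treat a 5-token prompt and a 5,000-token prompt identically; charging by tokens tracks the actual cost of serving each request.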
<p>Beyond controlling usage of LLMs, there are many more challenges relating to ease of integration and high-availability architectures. It’s no longer enough to optimize for quality of service alone; adopters must consider usage costs in real time. As adopters of GenAI look for gateway solutions to handle these challenges, they often find the necessary features locked behind enterprise licenses.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-key-mvp-features">Three key MVP features<a href="https://aigateway.envoyproxy.io/blog/introducing-envoy-ai-gateway#three-key-mvp-features" class="hash-link" aria-label="Direct link to Three key MVP features" title="Direct link to Three key MVP features" translate="no">​</a></h2>
<p>Now, let’s look at how handling AI traffic poses new challenges for Gateways. We discussed several features with our collaborators at Bloomberg and together settled on three key features for the MVP:</p>
<ul>
<li class=""><strong>Usage Limiting</strong> – to control LLM usage based on word tokens</li>
<li class=""><strong>Unified API</strong> – to simplify client integration with multiple LLM providers</li>
<li class=""><strong>Upstream Authorization</strong> – to configure authorization for multiple upstream LLM providers</li>
</ul>
<p>What other features are you looking for? Get in touch with us to share your use case and help define the future of Envoy AI Gateway.</p>
<p>We are really excited about these features being part of Envoy. They will benefit those integrating with LLM providers and, ultimately, also Gateway users for general API request traffic.</p>
<p>When it comes to AI Gateway features, we have chosen to collaborate and build within the CNCF Envoy project because we believe multi-company, open-source projects benefit the entire industry by enabling innovation without creating single vendor risk.</p>]]></content>
        <author>
            <name>Erica Hughberg</name>
            <uri>https://github.com/missBerg</uri>
        </author>
        <category label="News" term="News"/>
    </entry>
</feed>