Blog | KServe

Announcing KServe v0.17 - Production-Ready LLM Serving with LLMInferenceService

March 13, 2026 · 14 min read

Co-Founder, KServe

Published on March 13, 2026

We are excited to announce the release of KServe v0.17, a landmark release that brings LLMInferenceService to production readiness with a GenAI-first architecture built on the llm-d framework. This release introduces KV-cache aware intelligent routing, disaggregated prefill-decode, distributed inference with tensor/data/expert parallelism, Envoy AI Gateway integration with token-based rate limiting, and a completely restructured modular Helm chart architecture.

Best of Both Worlds: Cloud-Native AI Inference at Scale using KServe and llm-d

March 5, 2026 · 8 min read

Yuan Tang

Project Lead, KServe; Senior Principal Software Engineer, Red Hat

Ran Pollak

Manager, AI Catalyst at Red Hat

Enterprises today seek to integrate generative AI (GenAI) capabilities into their applications. However, scaling large AI models introduces complexity: managing high-volume traffic from large language models (LLMs), optimizing inference performance, maintaining predictable latency, and controlling infrastructure costs.

Platform engineering leaders require more than just model deployment capabilities. They need a robust, Kubernetes-native infrastructure that supports:

Efficient GPU utilization
Intelligent request routing
Distributed inference patterns
Cost-aware autoscaling
Production-grade governance

This article demonstrates how two open-source solutions, KServe and llm-d, can be combined to address these challenges.

We explore the role of each solution, illustrate their integration architecture, and provide practical guidance for AI platform teams, with a deeper focus on KServe's LLMInferenceService, available since KServe v0.16.

Announcing KServe v0.15 - Advancing Generative AI Model Serving

May 27, 2025 · 7 min read

Alexa Griffith

Software Engineer @ Bloomberg

Dan Sun

Co-Founder, KServe

Yuan Tang

Project Lead, KServe; Senior Principal Software Engineer, Red Hat

Johnu George

Reviewer, KServe

Lize Cai

Approver, KServe

Published on May 27, 2025

We are thrilled to announce the release of KServe v0.15, marking a significant leap forward in serving both predictive and generative AI models. This release introduces enhanced support for generative AI workloads, including advanced features for serving large language models (LLMs), improved model and KV caching mechanisms, and integration with Envoy AI Gateway.

Announcing KServe v0.14

December 13, 2024 · 7 min read

Edgar Hernández

KServe Maintainer

Dan Sun

Co-Founder, KServe

Published on December 23, 2024

We are excited to announce KServe v0.14. In this release we are introducing a new Python client designed for KServe, and a new model cache feature; we are promoting OCI storage for models as a stable feature; and we added support for deploying models directly from Hugging Face.

From Serverless Predictive Inference to Generative Inference - Introducing KServe v0.13

May 15, 2024 · 5 min read

Alexa Griffith

Software Engineer @ Bloomberg

Dan Sun

Co-Founder, KServe

Yuan Tang

Project Lead, KServe; Senior Principal Software Engineer, Red Hat

Published on May 15, 2024

We are excited to unveil KServe v0.13, marking a significant leap forward in evolving cloud native model serving to meet the demands of Generative AI inference. This release is highlighted by three pivotal updates: enhanced Hugging Face runtime, robust vLLM backend support for Generative Models, and the integration of OpenAI protocol standards.

Announcing KServe v0.11

October 8, 2023 · 7 min read

Dan Sun

Co-Founder, KServe

Published on October 8, 2023

We are excited to announce the release of KServe 0.11. In this release we introduced Large Language Model (LLM) runtimes and made enhancements to the KServe control plane, Open Inference Protocol support in the Python SDK, and dependency management. For ModelMesh, we added PVC, HPA, and payload logging features to ensure feature parity with KServe.

Announcing KServe v0.10.0

February 5, 2023 · 7 min read

Dan Sun

Co-Founder, KServe

Published on February 5, 2023

We are excited to announce the KServe 0.10 release. In this release we have enabled more KServe networking options, improved KServe telemetry for supported serving runtimes, and increased support coverage for the Open (aka v2) Inference Protocol for both standard and ModelMesh InferenceService.

Announcing KServe v0.9.0

July 21, 2022 · 6 min read

Dan Sun

Co-Founder, KServe

Published on July 21, 2022

Today, we are pleased to announce the v0.9.0 release of KServe! KServe has now fully onboarded to the LF AI & Data Foundation as an Incubation Project! 🎉

In this release we are excited to introduce the new InferenceGraph feature, which the community has long requested. Continuing the effort from the last release to unify the InferenceService API for deploying models on KServe and ModelMesh, ModelMesh is now fully compatible with the KServe InferenceService API!

Announcing KServe v0.8

February 18, 2022 · 6 min read

Co-Founder, KServe

KServe Contributor

KServe Contributor

Reviewer, KServe

Published on February 18, 2022

Today, we are pleased to announce the v0.8.0 release of KServe! While the last release was focused on the transition of KFServing to KServe, this release was focused on unifying the InferenceService API for deploying models on KServe and ModelMesh.

Note: For current users of KFServing/KServe, please take a few minutes to answer this short survey and provide your feedback!

Announcing KServe v0.7 - Smooth Transition from KFServing to KServe

October 11, 2021 · 4 min read

Co-Founder, KServe

KServe Contributor

KServe Contributor

KServe Contributor

Published on October 11, 2021

KFServing is now KServe, and the KServe 0.7 release is available. The release also ensures a smooth migration experience for users moving from KFServing to KServe.