Elastic's journey to build Elastic Cloud Serverless
How do you take a stateful, performance-critical system like Elasticsearch and make it serverless?
At Elastic, we reimagined everything — from storage to orchestration — to build a truly serverless platform that customers can trust.
Elastic Cloud Serverless is a fully managed, cloud-native platform designed to bring the power of the Elastic Stack to developers without the operational burden. In this blog post, we will walk you through why we built it, how we approached the architecture, and what we learned along the way.
Optimizing object store efficiency

While the shift to object storage delivered operational and durability benefits, it introduced a new challenge: object store API costs. Writes to Elasticsearch — particularly translog updates and refreshes — translate directly into object store API calls, which can scale up quickly and unpredictably, especially under high-ingestion or high-refresh workloads.
To address this, we implemented a per-node translog buffering mechanism that coalesces writes before flushing to the object store, significantly reducing write amplification. We also decoupled refreshes from object store writes, instead sending refreshed segments directly to search nodes while deferring object store persistence. This architectural refinement reduced refresh-related object store API calls by two orders of magnitude, with no compromise to data durability. For more details, please refer to this blog post.
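The coalescing idea can be sketched as follows. This is a toy illustration, not Elasticsearch's actual translog code; the `TranslogBuffer` name, the `object_store.put` interface, and the flush thresholds are all assumptions:

```python
import time


class TranslogBuffer:
    """Per-node buffer that coalesces translog writes before flushing to the
    object store, so many small writes become one object store PUT.
    Hypothetical sketch; names do not match Elasticsearch internals."""

    def __init__(self, object_store, max_bytes=1_000_000, max_age_s=0.2):
        self.object_store = object_store  # must expose put(blob: bytes)
        self.max_bytes = max_bytes        # flush once this many bytes are buffered...
        self.max_age_s = max_age_s        # ...or once the oldest entry is this old
        self._entries = []
        self._size = 0
        self._oldest = None

    def append(self, operation: bytes):
        # Buffer the operation instead of issuing one PUT per write.
        if self._oldest is None:
            self._oldest = time.monotonic()
        self._entries.append(operation)
        self._size += len(operation)
        if self._size >= self.max_bytes or time.monotonic() - self._oldest >= self.max_age_s:
            self.flush()

    def flush(self):
        if not self._entries:
            return
        # A single object store call covers all buffered operations.
        self.object_store.put(b"".join(self._entries))
        self._entries, self._size, self._oldest = [], 0, None
```

With a 1 MB threshold, a thousand small writes that would otherwise each cost an API call collapse into a handful of PUTs, which is where the reduction in write amplification comes from.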
The Unified layer is the operator-facing management layer, providing Kubernetes CRDs that service owners use to manage their Kubernetes clusters. Service owners can define parameters such as the CSP, region, and cluster type (explained in the next section). The Unified layer enriches operators' requests and forwards them to the Management layer.
The Management layer acts as a proxy between the Unified layer and CSP APIs, transforming requests from the Unified layer to CSP resource requests and reporting the status back to the Unified layer.
In our current setup, we maintain two management Kubernetes clusters for each CSP within every environment. This dual-cluster approach serves two key purposes. Firstly, it lets us address scalability limits we may hit with Crossplane. Secondly, and more importantly, it lets us use one of the clusters as a canary environment: changes roll out in phases, starting with a smaller, controlled subset of each environment, minimizing risk.
The Workload layer contains all the Kubernetes workload clusters running the applications that users interact with (Elasticsearch, Kibana, MIS, etc.).
The Control Plane is the user-facing management layer. We provide UIs and APIs for users to manage their Elastic Cloud Serverless projects. This is where users can create new projects, control who has access to their projects, and get an overview of their projects.
The Data Plane is the infrastructure layer that powers the Elastic Cloud Serverless projects and that users interact with when they want to use their projects.
A fundamental design decision we faced was how the global control plane should communicate with Kubernetes clusters in the data plane. We explored two models:
Push Model: The control plane proactively pushes configurations to regional Kubernetes clusters.
Pull Model: Regional Kubernetes clusters periodically fetch configurations from the control plane.
After evaluating both approaches, we adopted the Push Model due to its simplicity, unidirectional data flow, and ability to operate Kubernetes clusters independently from the control plane during failures. This model allowed us to maintain straightforward scheduling logic while reducing operational overhead and failure recovery complexities.
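The push model's two key properties — unidirectional data flow, and clusters that keep running on their last-known configuration if the control plane is unavailable — can be sketched roughly as below. The `ControlPlane` and `RegionalCluster` names and their APIs are hypothetical, not taken from Elastic's codebase:

```python
class RegionalCluster:
    """A regional Kubernetes cluster that applies pushed configurations.
    If a push never arrives, it simply keeps serving its last config."""

    def __init__(self, name):
        self.name = name
        self.config = None  # last successfully pushed configuration

    def apply(self, config):
        # In reality this would reconcile Kubernetes resources; here we
        # just record the desired state.
        self.config = config


class ControlPlane:
    """Push-model control plane: it owns the desired state and pushes it
    out. Clusters never call back, keeping the data flow unidirectional."""

    def __init__(self, clusters):
        self.clusters = clusters
        self.desired = {}  # project_id -> project configuration

    def set_project_config(self, project_id, config):
        self.desired[project_id] = config
        for cluster in self.clusters:
            try:
                cluster.apply(dict(self.desired))
            except ConnectionError:
                # A failed push leaves the cluster on its last-known config;
                # the control plane would retry later (not shown).
                pass
```

Under a pull model, by contrast, each cluster would poll the control plane on a timer, which doubles the moving parts and couples cluster health to control plane availability.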
This layered, intelligent scaling strategy ensures performance and efficiency across diverse workloads — and it’s a big step toward a truly serverless platform.
Elastic Cloud Serverless introduces nuanced autoscaling capabilities tailored for the search tier — leveraging inputs such as boosted data windows, search power settings, and search load metrics (including thread pool load and queue load). These signals work together to define baseline configurations and trigger dynamic scaling decisions based on customer search usage patterns. For a deeper dive into search tier autoscaling, read this blog post. To learn more about how indexing tier autoscaling works, check out this blog post.
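To make the idea concrete, here is a toy illustration of how load signals might combine into a scaling decision. The thresholds, signal names, and the doubling/halving policy are assumptions for the sketch, not Elastic's actual autoscaling algorithm:

```python
def desired_search_nodes(current_nodes, thread_pool_utilization, queue_depth,
                         min_nodes=2, max_nodes=32,
                         scale_up_threshold=0.8, scale_down_threshold=0.3,
                         queue_limit=100):
    """Toy scaling policy combining thread pool load and queue load.

    - Scale out when the thread pool is saturated OR the search queue
      backs up past a limit (either signal alone is enough).
    - Scale in only when BOTH signals indicate idleness, to avoid
      flapping under bursty traffic.
    """
    if thread_pool_utilization > scale_up_threshold or queue_depth > queue_limit:
        return min(current_nodes * 2, max_nodes)   # aggressive under pressure
    if thread_pool_utilization < scale_down_threshold and queue_depth == 0:
        return max(current_nodes // 2, min_nodes)  # conservative when idle
    return current_nodes                           # otherwise hold steady
```

The asymmetry (scale out on either signal, scale in only on both) is a common pattern for keeping latency low while avoiding oscillation.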
Once usage records are deposited in object storage, the billing pipeline picks up the data and turns it into quantities of ECU (Elastic Consumption Units, our currency-agnostic billing unit) that we bill for. The basic process looks like this:
A transform process consumes the metered usage records from object storage and turns them into records that can actually be billed. This involves unit conversion (the metered application may measure storage in bytes, but we may bill in GB), filtering out usage sources that we don't bill for, mapping each record to a specific product (parsing metadata in the usage record to tie the usage to a solution-specific product with a unique price), and sending the result to an Elasticsearch cluster that is queried by our billing engine. The purpose of this transform stage is to provide a centralized place for the logic that converts generic metered usage records into product-specific quantities ready to be priced. This keeps that specialized logic out of the metered applications and the billing engine, which we want to keep simple and product-agnostic.
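As a rough sketch, the transform stage might look like the following. The field names, product catalog, and set of billable sources are all hypothetical, chosen only to illustrate the conversion, filtering, and product-mapping steps:

```python
# Illustrative catalog: (usage source, solution) -> uniquely priced product.
PRODUCT_CATALOG = {
    ("storage", "elasticsearch"): "es-storage-gb",
}

# Illustrative allowlist of usage sources we actually bill for.
BILLABLE_SOURCES = {"storage", "ingest", "search"}


def transform(record):
    """Turn a raw metered usage record into a billable record,
    or None if the usage source isn't billed. Field names are hypothetical."""
    if record["source"] not in BILLABLE_SOURCES:
        return None                            # filter out non-billed usage

    quantity_gb = record["bytes"] / 1024 ** 3  # unit conversion: bytes -> GB

    # Map metadata to a solution-specific product with a unique price.
    product = PRODUCT_CATALOG[(record["source"], record["metadata"]["solution"])]

    return {
        "product": product,
        "quantity": quantity_gb,
        "timestamp": record["timestamp"],      # when the usage occurred
    }
```

The output records would then be indexed into the Elasticsearch cluster that the billing engine queries.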
The billing engine then rates these billable usage records, which now contain an identifier that maps to a product in our prices database. At a minimum, this process entails summing the usage over a given period and multiplying the quantity by the product's price to compute the ECUs. In some cases, it must additionally segment the usage into tiers based on cumulative usage throughout the month and map these to individually priced product tiers. In order to tolerate delays in the upstream process without missing records, usage is billed at the time it arrives in the billable usage datastore, but it’s priced according to when it occurred (to ensure we don't apply the wrong price for usage that arrived "late"). This provides a "self-healing" capability to our billing process.
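A simplified sketch of the two rating behaviors described above — tiered pricing over cumulative usage, and pricing by when usage occurred rather than when it arrived. Tier boundaries, prices, and function names are illustrative assumptions:

```python
from bisect import bisect_right


def rate_tiered(quantity, tiers):
    """Split a cumulative usage quantity across price tiers and return
    total ECUs. `tiers` is a list of (upper_bound, price_per_unit) sorted
    by upper bound; the last bound may be float("inf")."""
    ecus = 0.0
    consumed = 0.0
    for upper_bound, price in tiers:
        in_tier = min(quantity, upper_bound) - consumed
        if in_tier <= 0:
            break
        ecus += in_tier * price
        consumed += in_tier
    return ecus


def price_at(price_history, occurred_at):
    """Look up the price in effect when the usage occurred, so records
    that arrive late are still rated at the correct historical price.
    `price_history` is a sorted list of (effective_from, price)."""
    idx = bisect_right([t for t, _ in price_history], occurred_at) - 1
    return price_history[idx][1]
```

Rating late-arriving usage with `price_at` on its occurrence timestamp is what gives the pipeline its "self-healing" property: a delayed record is billed in the current period but never at the wrong price.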
Finally, once the ECUs are computed, we assess any add-on costs (such as for support) and then feed this into the billing calculations, which ultimately result in an invoice (sent by us or one of our cloud marketplace partners). This final part of the process is not new or unique to Serverless and is handled by the same systems that bill our Hosted product.