Posts by Elastic (old posts, page 4)
Elastic Stack 8.18.2 released
Elastic and AWS collaborate to bring GenAI to DevOps, security, and search
Today, we are happy to celebrate Elastic and AWS committing to a five-year strategic collaboration agreement (SCA). Our collaboration underscores the efforts of Elastic and AWS to provide you with increased speed and greater flexibility as you adopt generative AI technology.
How the MOD can achieve decision superiority against cyber threats
Military leaders are well-acquainted with the expansion of conventional warfare into digital battlefields. The recent attack and breach of a UK Ministry of Defence (MoD) supplier exposed data of 270,000 service personnel,1 representing not an isolated incident but a pattern in an escalating cyber conflict. When the threat is sophisticated nation-state actors who want to penetrate military networks and could remain undetected for months or longer, it signals a need for change in the nature of defence priorities.
With UK military networks enduring over 90,000 cyber attacks in a two-year span,2 the question becomes not whether attacks will occur, or how sophisticated they will be, but how we can, on the worst day, quickly identify and neutralise them. And do this while managing tightening defence budgets and cybersecurity talent shortages.
Defence operations require both comprehensive visibility and rapid response capabilities. A unified security approach addresses these challenges by consolidating multiple functions — threat detection, orchestration, endpoint security, and cloud protection — into a single platform. This integration not only provides essential situational awareness but can halve operational costs. Defense teams managing complex infrastructures benefit from interoperability that connects disparate systems without disruption. This enables secure bridging between legacy databases and NATO partner networks while maintaining workflow continuity.
A unified, AI-enhanced security platform empowers defence teams with a decision advantage, offering robust capabilities for protecting and integrating sensitive military data across complex environments:
- Retrieval augmented generation (RAG) combines search with text generation. First it finds relevant information from proprietary data, and then it uses this to create accurate, informed responses via generative AI. It’s a secure approach that offers defence-specific security insights without the resource-intensive process of retraining custom large language models (LLMs) on continuously changing internal data. (A sketch of the retrieval step follows this list.)
- AI-empowered attack detection can reduce investigation processes from days to minutes. Rather than drowning teams in alerts, AI-driven analysis distills numerous notifications into actionable intelligence with a single click — providing immediate clarity during potential breach scenarios.
- AI Assistant can work as a force multiplier for SOC teams, helping users to write previously complex queries. This lowers the barrier to entry for new analysts, who become productive faster, and allows skilled personnel to focus on strategic response rather than routine administration — addressing the persistent shortage of specialised security personnel in the defence sector.
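To make the retrieval step concrete, here is a minimal sketch of the kind of query that feeds a RAG workflow in Elasticsearch. The index name defence-docs, the content field, and the query text are hypothetical, and it assumes a recent Elasticsearch version where content is mapped as semantic_text; the top hits are then passed to the LLM as grounding context rather than retraining the model.

// Hypothetical retrieval step for RAG: fetch the most relevant passages,
// then place them in the LLM prompt as context.
GET defence-docs/_search
{
  "size": 3,
  "query": {
    "semantic": {
      "field": "content",
      "query": "recent lateral movement indicators on supplier networks"
    }
  }
}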
Elastic’s Search AI Platform can integrate with legacy military systems (even those dating back to the 1980s)3 and operate across all security classification levels. Our platform eliminates the need for costly infrastructure replacements by acting as a bridge that enables communication and interoperability between systems by ingesting, normalising, and analysing all data. This approach preserves your existing infrastructure while enhancing it with new and emerging technologies. Additionally, our security and search capabilities provide a unified view across networks and their components, allowing teams to focus on meaningful threats rather than piecing together information from isolated sources.
These kinds of capabilities help defence organisations achieve enhanced security and improved efficiency, as well as reduce costs. By streamlining data management across previously siloed systems, Defence can see remarkable financial benefits, whilst ensuring compliance with UK and NATO standards. It’s intelligence sharing without duplicating infrastructure.
The result is decision superiority with information that is actionable and relevant.
Join Mission Advantage: Strategic Conversations with Defence Leaders, a virtual series on AI, cyber resilience, data, and decision-making in defence. Gain insights from top industry leaders on turning challenges into opportunities.
Explore additional resources:
Behind the scenes of Elastic Security’s generative AI features
Secure data is superior data: A security-first approach to the DoD Data Strategy
Sources:
1. Security Daily Review, “UK’s MOD Data Breached: China Hacked Ministry of Defence, UK Armed Forces’ Personal Data Exposed,” 2024.
2. The Independent, “Military to fast-track recruitment of ‘cyber warriors’ as online threat grows,” 2025.
3. PublicTechnology.net, “MoD’s arsenal-management hampered by ageing IT and data siloes, report finds,” 2023.
The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.
In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.
Elastic, Elasticsearch, and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.
Cyber threats explained: How to safeguard your enterprise
Cyber threats (also known as cybersecurity threats) are events, actions, or circumstances that have the potential to negatively impact an individual or an organization by taking advantage of security vulnerabilities. Cyber threats can affect the confidentiality, integrity, or availability of data, systems, operations, or people’s digital presence.
Cybersecurity threats are constantly evolving, with the rapid adoption of artificial intelligence (AI) further exacerbating their scale and sophistication. Cybersecurity awareness is critical for preventing these threats from turning into full-blown cyber attacks. When security teams are knowledgeable about the different types of cyber threats, they can prevent, detect, and respond more holistically and effectively.
Overview of common cybersecurity threats
Nation states, terrorist groups, criminal organizations, or individual hackers can all be perpetrators of cyber threats. Cybersecurity threats can be:
- External (i.e., malicious attack) or internal (i.e., insider threat)
- Intentional (i.e., hacking) or accidental (i.e., sensitive data sharing)
In the past, security teams were concerned with simple viruses that infiltrated a computer, causing minor damage. However, today’s world grows ever more interconnected, resulting in widespread implications for cybersecurity threats. Sophisticated attacks like malware, ransomware, and more can grind the operations of multinational enterprises and even entire countries to a halt. Security teams are now tasked with finding vulnerabilities and protecting much larger attack surfaces that encompass distributed systems, Internet of Things (IoT), mobile devices, and other vectors.
Malicious actors design some cyber attacks to steal data, sensitive information, or secrets for financial gain, while others are designed to cause reputational harm for political or personal gains.
While AI lowers the barrier for junior security analysts to investigate and respond to attacks, it does the same for threat actors. With AI, less skilled hackers and cybercriminals can carry out effective and sophisticated attacks at scale, making even more organizations and individuals around the world less safe.
How to implement business observability
It sounds simple: You define metrics for success, you track them, and if they fail, you fix them. For decades, this was how businesses monitored their systems. However, a reactive monitoring approach, which alerts businesses about failures only after the issue has already impacted operations, became insufficient as digital architectures grew more complex.
Traditional monitoring can help detect issues, but it often lacks the depth needed to understand an environment, its dependencies, and the broader business impact of system performance. To address these challenges, monitoring has evolved into observability, offering deeper insights and proactive problem-solving.
Observability is a comprehensive method for businesses to explore and analyze their systems in real time. Modern observability provides a single pane of glass, uncovering the root causes of problems and predicting potential disruptions before they happen. As a business, getting actionable insights from your data requires the ability to see it holistically. Enter: business observability.
While business observability is quickly becoming indispensable to modern business practices, implementation and maintenance can be tricky. Key challenges include:
How to ensure data quality
Maintaining data observability is contingent on continuous improvement and adaptability in data management processes. Consider these best practices:
- Regularly update monitoring systems. Business processes and technologies are constantly evolving. Ensure that your monitoring tools are regularly updated to keep up with the changes and continuously provide real-time, relevant data for your observability practices.
- Ensure data quality. Poor data quality leads to incorrect insights. Consider implementing data validation techniques and automated anomaly detection (see the sketch after this list).
- Conduct regular audits. While automation is key to handling massive datasets, periodic audits help identify gaps and improve data reliability.
- Adapt to changing business needs. When market trends and customer behaviors evolve, your business needs do, too. Your observability strategies should adapt to these changes.
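As a sketch of what automated anomaly detection can look like in practice, the following uses the Elastic Stack's anomaly detection API. The job name, bucket span, and field names (order.value, @timestamp) are illustrative assumptions, not prescriptions; swap in the metrics that matter to your business.

// Hypothetical job that flags unusual swings in average order value;
// field names are placeholders for your own data.
PUT _ml/anomaly_detectors/order-value-anomalies
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "mean",
        "field_name": "order.value",
        "detector_description": "mean order value"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}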
SRE essentials: What to expect in site reliability engineering
Over the past 20 years, most leading businesses have adopted cloud computing and distributed systems to develop their applications. An unintended consequence: Traditional IT operations (ITOps) often struggle to handle the complexities of increased workloads and cloud technologies.
As distributed systems scale, keeping operations and development separate ultimately leads to stagnation. Developers might want to push out new applications or updates, while the operations team, already overwhelmed with keeping tabs on the existing infrastructure, might push back on any risks to the infrastructure.
Site reliability engineering (SRE) is a discipline that offers a more nuanced approach by combining software engineering principles with operational practices that ensure service reliability and optimal performance at scale. The people in this role are site reliability engineers (SREs), who simplify and automate tasks that the operations team would otherwise perform manually. Less time spent on tedious, repetitive work opens the door for innovation and business growth.
Site reliability engineering has become an essential component of a modern organization. The benefits include saying goodbye to reactive problem-solving and hello to predictable performance, proactive system design, improved scalability, minimized service disruptions, and new opportunities for improvement.
Want to know more about the SRE role and the world of site reliability engineering? Let’s start with the basics.
Key practices in site reliability engineering
When running services, SRE teams focus on key everyday activities such as monitoring and observability, incident management, capacity planning, and change management.
Enhancing workflow efficiency with Elasticsearch and Red Hat OpenShift AI
We’re excited to share that Elastic and Red Hat have partnered to create validated patterns that integrate Elasticsearch’s generative AI (GenAI) and vector search capabilities with Red Hat OpenShift AI. This integration can run on accelerated hardware on-prem or in IBM Cloud to power retrieval augmented generation (RAG) solutions.
[Image: Red Hat Validated Patterns]
This boosts the quality, accuracy, and efficiency of responses generated by GenAI applications. Red Hat OpenShift AI provides the enterprise-grade container orchestration and DevSecOps capabilities needed to operationalize the AI workloads at scale. In this pattern, Elastic is run on OpenShift using Elastic Cloud on Kubernetes (ECK), which includes an operator to simplify deploying and maintaining Elastic clusters.
Elastic at Microsoft Build 2025 — Developers, developers, developers!
At Elastic, we love developers! We’re excited to engage with the developer community at Microsoft Build 2025 as a top-tier sponsor. We’re kicking off the event with a variety of activities, announcements, and technical content to share our latest innovations.
Read on to learn more about what we’ve been working on with our Microsoft counterparts and how you can harness the power of AI to transform your data into actionable insights.
How to benchmark Elasticsearch performance with ingest pipelines and your own logs
When setting up an Elasticsearch cluster, one of the most common use cases is ingesting and searching through logs. This blog post focuses on building a benchmark that tells you how well your cluster will handle your workload, and it gives you a reproducible environment for testing changes. Do you want to change the mapping of something, drop some fields, or alter the ingest pipeline? Are you curious how far you can push your dataset, how many documents per second you can handle, or what the disk usage looks like? Further down, you can even run alerts on top and figure out how that impacts your overall cluster.
Every workload looks different, and log messages vary widely. Somebody collecting firewall logs might have a nice ratio between allow and deny rules and nearly no VPN connection logs, while someone else might have mostly VPN connection logs. In the grand scheme of things, one can simplify and generalize and say that every log source is different.
Using the custom log track
Today we are focusing on the custom log track, which is useful if you do not want to use any of the prebaked tracks we offer, such as the security or logs tracks.
We will need to perform the following tasks, and we’ll walk you through them:
- Reindex a subset of the data with the required fields
- Pull out data from an index/data stream
- Put that onto the disk
- Pull out metadata information as needed (ingest pipelines, etc.)
What is mandatory for a custom log track? When an ingest pipeline is involved that modifies data, we need to ensure that we have an original field containing all the data before the extraction happens. The most commonly used is the event.original field.
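If your documents do not carry such a field yet, one way to add it is a set processor with copy_from, which is standard Elasticsearch ingest functionality; the pipeline name here is just an illustration.

// Copy the raw message into event.original before any destructive parsing
PUT _ingest/pipeline/keep-original
{
  "processors": [
    {
      "set": {
        "field": "event.original",
        "copy_from": "message",
        "ignore_empty_value": true
      }
    }
  ]
}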
Let’s get started. In this case, we will be using the Kibana Web Log sample data. To follow this blog, you need Rally version 2.12 or later (earlier versions used a different folder and file structure).
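If you are not sure which Rally version you have installed, a quick check is:

esrally --version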
"_source": { "agent": "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1", "bytes": 5166, "clientip": "33.16.170.252", "extension": "zip", "geo": { "srcdest": "US:PH", "src": "US", "dest": "PH", "coordinates": { "lat": 33.6324825, "lon": -83.84955806 } }, "host": "artifacts.elastic.co", "index": "kibana_sample_data_logs", "ip": "33.16.170.252", "machine": { "ram": 2147483648, "os": "win xp" }, "memory": null, "message": "33.16.170.252 - - [2018-08-03T09:27:38.140Z] \"GET /kibana/kibana-6.3.2-windows-x86_64.zip HTTP/1.1\" 200 5166 \"-\" \"Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1\"", "phpmemory": null, "referer": "http://nytimes.com/success/sunita-suni-williams", "request": "/kibana/kibana-6.3.2-windows-x86_64.zip", "response": 200, "tags": [ "success", "security" ], "@timestamp": "2024-12-27T09:27:38.140Z", "url": "https://artifacts.elastic.co/downloads/kibana/kibana-6.3.2-windows-x86_64.zip", "utc_time": "2024-12-27T09:27:38.140Z", "event": { "dataset": "sample_web_logs" }, "bytes_gauge": 5166, "bytes_counter": 17071806 }This means that we want to keep just the message field in the Rally track. We will create an ingest pipeline with a remove processor that does a keep operation. We just drop the _id that is set, because we want to duplicate the data. We only have ~14,000 documents in the original dataset, and we want to benchmark the impact of the ingest pipeline and various processors. We can do that only if we have enough data. Duplicating the data though means that we cannot conclude the disk usage, since the compression can be quite high due to the similarity of the messages.
PUT _ingest/pipeline/rally-drop-fields
{
  "processors": [
    {
      "remove": {
        "field": "_id"
      }
    },
    {
      "remove": {
        "keep": [
          "message",
          "@timestamp"
        ]
      }
    }
  ]
}

This will keep just the message and timestamp fields, since these are the ones that contain the most information.
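Before reindexing everything, it is worth sanity-checking the pipeline with the simulate API (a standard endpoint); the sample document below is illustrative. In the response, only message and @timestamp should survive.

POST _ingest/pipeline/rally-drop-fields/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "example web server log line",
        "@timestamp": "2024-12-27T09:27:38.140Z",
        "bytes": 5166,
        "extension": "zip"
      }
    }
  ]
}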
The next step is to create an index template that defines the custom-track as a data stream. This will ensure that we have a template and the correct mapping is applied as well. We will leverage the built-in ecs@mappings template that makes sure to map all ECS fields. If you are using anything that is not ECS, I would recommend you specifically map out the fields and how you want them mapped. Rally will copy all of that and make it part of the track.
PUT _index_template/custom-track
{
  "data_stream": {
    "allow_custom_routing": false,
    "hidden": false
  },
  "index_patterns": [
    "custom-track"
  ],
  "composed_of": [
    "ecs@mappings"
  ]
}

The next step is to reindex the data, and we will execute this command a couple of times.
POST _reindex
{
  "source": {
    "index": "kibana_sample_data_logs*"
  },
  "dest": {
    "index": "custom-track",
    "pipeline": "rally-drop-fields",
    "op_type": "create"
  }
}
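Each reindex call blocks until it completes. If you would rather not wait, a standard option is to run it asynchronously and poll the resulting task; the task ID placeholder below comes from the response.

// Returns a task ID immediately instead of blocking until completion
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "kibana_sample_data_logs*"
  },
  "dest": {
    "index": "custom-track",
    "pipeline": "rally-drop-fields",
    "op_type": "create"
  }
}

// Poll the task status using the ID from the response
GET _tasks/<task_id>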
Now we can do a simple:

GET custom-track/_count

And we will get an answer that tells us how many documents there are in this index, as well as how many shards there are. We have roughly 13 million documents in there.
{ "count": 13004376, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 } }That should be enough for a proper test around the impact of the ingest pipeline. There are still a couple of things that we need to watch out for. We multiplied original documents a couple of times, which means that when we look into the shard size and disk usage we will get a better compression than with more diverse data. Therefore the shard size and disk usage might not be representative of your real data.
esrally create-track --data-streams "custom-track" --track "webserver" --target-hosts=https://es:port --client-options="verify_certs:false,basic_auth_user:'username',basic_auth_password:'password'"

We create a track called webserver, and we load the data from the custom-track data stream. This creates a track with a single challenge and the following console output.
    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Connected to Elasticsearch cluster version [8.17.0] flavor [default]
Extracting documents for index [.ds-custom-track-2024....        1000/1000 docs [100.0% done]
Extracting documents for index [.ds-custom-tra...    13004376/13004376 docs [100.0% done]
[INFO] Track webserver has been created. Run it with: esrally --track-path=/home/philippkahr/tracks/webserver

----------------------------------
[INFO] SUCCESS (took 1146 seconds)
----------------------------------

Now we have a track! That’s amazing! By default, Rally creates a folder called tracks in the home directory of the executing user, with a subfolder called webserver because that is what we named it.
The track contains two folders: operations and challenges. For this blog post, we ignore the operations folder. Within the challenges folder there is a default.json file, which contains the challenge description and by default looks like the snippet below. If you want to know more about the different actions here, you can check out the first blog, which explains this in detail.
We need to adjust a couple of things in the track.json. In the indices object, we want to rename the name to custom-track-rally, and in the corpora object, set the target-index to custom-track-rally as well. We are now using a normal index and not a data stream; otherwise, we would need to use the data stream configuration.
This is the track.json.
{% import "rally.helpers" as rally with context %}
{
  "version": 2,
  "description": "Tracker-generated track for webserver",
  "indices": [
    {
      "name": "custom-track-rally",
      "body": ".ds-custom-track-2024.12.23-000001.json"
    }
  ],
  "corpora": [
    {
      "name": "custom-track-rally",
      "documents": [
        {
          "target-index": "custom-track-rally",
          "source-file": ".ds-custom-track-2024.12.23-000001-documents.json.bz2",
          "document-count": 13004376,
          "compressed-bytes": 213486562,
          "uncompressed-bytes": 3204126156
        }
      ]
    }
  ],
  "operations": [
    {{ rally.collect(parts="operations/*.json") }}
  ],
  "challenges": [
    {{ rally.collect(parts="challenges/*.json") }}
  ]
}

This is the default.json and there are no changes needed.
{ "name": "my-challenge", "description": "My new challenge", "default": true, "schedule": [ { "operation": "delete-index" }, { "operation": { "operation-type": "create-index", "settings": {{index_settings | default({}) | tojson}} } }, { "operation": { "operation-type": "cluster-health", "index": "custom-track-rally", "request-params": { "wait_for_status": "{{cluster_health | default('green')}}", "wait_for_no_relocating_shards": "true" }, "retry-until-success": true } }, { "operation": { "operation-type": "bulk", "bulk-size": {{bulk_size | default(5000)}}, "ingest-percentage": {{ingest_percentage | default(100)}} }, "clients": {{bulk_indexing_clients | default(8)}} } ] }As we can see, the first step is to delete the index, and it will always be the first step. Now we want to create two challenges. The first one is to just index the documents as fast as we can, and the second one is where we write the ingest pipeline.
How can you run this track now? For the out-of-the-box tracks provided by Elastic, it is enough to just pass --track, because Rally knows where to find all of the data. This is not true for a custom track; for that, we specify the --track-path parameter instead. The full command looks like this. At this point, we should execute it just to make sure that the track works and indexes the data. The --challenge parameter is only needed when you want to run a challenge other than the default; we will create the challenge with the ingest pipeline further down. For now we can omit that parameter because, as we saw in the default.json, there is a flag called default: true.
esrally race --user-tags='{"benchmark_id":"custom-1"}' --track-path=~/tracks/webserver --kill-running-processes --target-hosts=https://10.164.15.204:9200 --pipeline=benchmark-only --client-options="verify_certs:false,basic_auth_user:'username',basic_auth_password:'password'" --track-params='{"bulk_indexing_clients":20,"number_of_shards":1,"number_of_replicas":1}'

Now that we have confirmation that the track works, we can copy the entire default.json file and rename the copy to index-pipeline.json. The default challenge created is called my-challenge and has a flag called default: true. We now need to set that flag to false and set the name: ingest-pipeline. The name is important, as this is the value for the --challenge parameter.
{ "name": "ingest-pipeline", "description": "My ingest pipeline challenge", "default": false, "schedule": [ { "operation": "delete-index" },....Now the schedule array contains the same steps: deleting index, creating index, bulk request. We need one additional step, and that is to add the ingest pipeline.
{ "name": "index-pipeline", "schedule": [ { "operation": "delete-index" }, { "operation": { "operation-type": "create-index", "settings": {{index_settings | default({}) | tojson}} } }, { "operation": { "operation-type": "put-pipeline", "id": "custom-track-pipeline", "body": { "processors": [ { "dissect": { "field": "message", "pattern": "%{source.ip} %{} [%{@timestamp}] \"%{http.request.method} %{url.path} %{http.version}\" %{http.request.status_code} %{http.request.bytes} \"-\" \"%{user_agent}" } }, { "user_agent": { "field": "user_agent" } }, { "geoip": { "field": "source.ip", "target_field": "source.geo" } } ] } } }, { "operation": { "operation-type": "cluster-health", "index": "custom-track-rally", "request-params": { "wait_for_status": "{{cluster_health | default('green')}}", "wait_for_no_relocating_shards": "true" }, "retry-until-success": true } }, { "operation": { "operation-type": "bulk", "pipeline": "custom-track-pipeline", "bulk-size": {{bulk_size | default(5000)}}, "ingest-percentage": {{ingest_percentage | default(100)}} }, "clients": {{bulk_indexing_clients | default(8)}} } ] }Not a lot has changed — we added a new object that puts the ingest pipeline and we added the pipeline name to the bulk operation in the bottom. This ensures that the pipeline is always the same version as in Rally.
We can run the same command as above; we just add --challenge=ingest-pipeline to it.
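For completeness, a run of the new challenge might look like the following. It reuses the earlier command with the challenge added, and benchmark_id is just an arbitrary tag (here custom-2) to keep the two runs apart in your results.

esrally race --user-tags='{"benchmark_id":"custom-2"}' --track-path=~/tracks/webserver --challenge=ingest-pipeline --kill-running-processes --target-hosts=https://10.164.15.204:9200 --pipeline=benchmark-only --client-options="verify_certs:false,basic_auth_user:'username',basic_auth_password:'password'" --track-params='{"bulk_indexing_clients":20,"number_of_shards":1,"number_of_replicas":1}'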
Before racing the new challenge, you can check what the pipeline does to a sample document with the simulate API:

POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "66.154.51.14 - - [2018-09-14T10:41:52.659Z] \"GET /styles/app.css HTTP/1.1\" 200 6901 \"-\" \"Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24\"",
        "@timestamp": "2025-02-07T10:41:52.659Z"
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "dissect": {
          "field": "message",
          "pattern": """%{source.ip} %{} [%{@timestamp}] "%{http.request.method} %{url.path} %{http.version}" %{http.request.status_code} %{http.request.bytes} "-" "%{user_agent}"""
        }
      },
      {
        "user_agent": {
          "field": "user_agent"
        }
      },
      {
        "geoip": {
          "field": "source.ip",
          "target_field": "source.geo"
        }
      }
    ]
  }
}

Let’s go through it quickly. We want to extract a couple of things and put them into the respective Elastic Common Schema (ECS) fields. Additionally, we want to parse out the user_agent string. We are not using any date processor inside the pipeline, since the date is presented as ISO8601 and therefore automatically parsed by the mapping. One more thing we are doing is the geoip lookup to enrich the data with geolocation information.
[Image: dashboard]
Read this next: A step-by-step guide to creating custom ES Rally tracks.