🦄 Making great presentations more accessible. This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Autonomous agents powered by streaming data and Retrieval Augmented Generation
In this video, Prashant Agrawal and Rahul Sonawane demonstrate how intelligent autonomous agents combine real-time streaming data with Retrieval Augmented Generation to enable context-aware decision making for IT ops and network operations. They address the challenge of analysts drowning in alerts and logs during incidents, explaining how 70% of anomalies go undetected until damage occurs. The session covers key technologies including Amazon MSK, Apache Flink, Amazon OpenSearch Service, and Amazon Bedrock with its agent capabilities. A live demo shows how agents analyze simulated DDoS attacks by correlating streaming logs with knowledge bases containing runbooks and security standards, generating actionable recommendations with severity assessments and time windows. The architecture integrates EventBridge, MSK, Flink processing, Lambda functions, Bedrock Agents with Claude 3.7 Sonnet, and OpenSearch vector databases. They also demonstrate the multi-agent framework approach for developers preferring code-based solutions. Applications extend beyond IT ops to manufacturing predictive maintenance, automotive services, healthcare monitoring, and travel booking, with agents maintaining contextual memory for personalized recommendations.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: The Challenge of Real-Time Anomaly Detection in IT Operations
All right. Good morning, everyone. Welcome to this session. My name is Prashant Agrawal. I'm a Principal OpenSearch Serverless Solution Architect, and I'm joined by Rahul Sonawane, who is a Principal Solutions Architect focusing on GenAI and ML. We are super excited to be here and talk about how you can use intelligent autonomous agents that combine real-time streaming data with Retrieval Augmented Generation to enable dynamic, context-aware decision making. This can help you deliver real-time, real-world value in use cases such as IT ops, FinOps, NetOps, and healthcare systems.
To set the expectations for this session, as you know, this is a silent session, so we won’t be having any live Q&A, but we would be available after the session. If you have any questions, we would be happy to answer them. To get started, here is a quick agenda. We’ll start with a customer challenge which is faced by IT ops and network ops, and that is just an example to get started with. Then we will discuss some of the key technologies to use for building this kind of solution. Rahul is going to cover the overall architecture which we have used for making this kind of system, followed by a live demo. I hope the demo works well because we are planning to do a live one. If not, we might switch to a recorded one. Then we will wrap it up with some of the key takeaways and a call to action.
So let's jump into the challenges you all might be facing. Say you are an analyst on call, and suddenly at midnight, 2 a.m., you start getting pager alerts about unusual outbound traffic coming from, say, the US East 1 region. The system is reporting a sudden 600% spike in traffic, with an anomaly detected at very high severity. You pause for a second and wonder why you're getting these alerts at this time. It's the middle of the night, right? And where is this 600% traffic spike coming from? You get up, grab your laptop, and within a few seconds you start looking through those monitors. But your monitors and charts are filled with data, and you are seeing spikes across those charts.
The problem is that there are many alerts stacked one on top of another. Log entries are streaming in from various sources faster than you could ever read them manually. Each line looks the same, but somewhere in those lines of errors there is one particular line that is the actual anomaly in the data. Your system is telling you what the error is, but it's not giving you the context, because you are scrolling through 30,000 lines to find the one line that has the issue. Hidden in that data set is a single line that could be causing that spike in the anomaly detection.
If you look at past incidents and various studies, one missed anomaly can cost millions of dollars in downtime. It could also be a false alarm that you want to isolate from the rest of the alerts. I suspect there are many people in this room who have run into this. Raise your hands if you have seen an incident in the past where you got a lot of alerts but were not able to find the actual issue. Okay, a lot of you are raising your hands, which means you have all had these kinds of situations, and we have all had that kind of night. That's the reality of networking and IT ops, because our networks are not static.
They are distributed, with many connected devices such as laptops, camera systems, printers, monitors, mobile phones, and so on. Events are generated from all of these devices and sources; they could be coming from your IoT devices or from different API systems. Over the last few years, data volume has exploded. It used to be a few thousand lines per minute; now it's millions of events arriving within a few minutes or even a few seconds. Our networks are not quiet anymore. They are moving data and speaking in millions of events per second.
But the problem is that the threats you are facing evolve faster than human eyes or dashboards can keep up with. If I look at some of the Gartner studies and other research, 70% of such anomalies go undetected until after the damage is done.
So we have seen how the network has evolved over time: it is no longer a static map, and that creates problems for real-time systems.
The Human Cost: From Analysts to End Users
We talked about the networks. Now let's zoom into the human side, where behind each dashboard there is an analyst looking through dozens of dashboards and hundreds of alerts, most of which could be meaningless and some even redundant, because the data is exploding at a very high scale. The real problem here is not the volume, it's the context. You have log data scattered across multiple devices and systems, as I mentioned on the previous slide, and it's missing the historical and contextual linkage.
You might have logs in one system, metrics in another system, and traces in a third. So you need a system that can correlate all of these incidents and events, which helps you see what happened in the past so you can make a judgment about what might be happening today. Manually triaging those errors could take hours or days of an analyst's time, but the irony is that every organization already has this knowledge in some form. You might have Confluence or wiki pages with the SOPs or the runbooks for issues you have seen in the past, but those answers live in PDFs that could be a 500-page manual or playbook.
When a real anomaly happens, you cannot go back and dig through those PDFs to find the exact issue. So in the middle of an incident, the knowledge that matters most is effectively locked in a vault you have to open in the middle of the night. You have the answer, but you can't find it. That's why we are thinking about building a system that provides context for your real-time data by linking it to a knowledge base, one that can surface the anomalies that happened in the past and what is likely coming next.
If we talk about who feels the pain, I mentioned the analyst, but if we look at the broader ecosystem, the on-call analyst is not the only person who feels it. There is a full hierarchy, flowing from the CISO to the CTO, each with a different view of the system. The CISO, sitting at the top, feels the pain around visibility and trust. They might walk into a board meeting and be asked what could have been done to avoid that kind of situation. That's the pain the CISO is feeling.
Moving top to bottom, we have the security analyst. As I was saying earlier, the analyst's screen is flooded with alerts and dashboards, and they may run into alert overload or alert fatigue, which creates the risk of oversight: missing the one line that has the issue. Then we have the network engineer, who is trying to track down data across different sources. Your logs are buried in one system, metrics in another, and traces in yet another, so they are trying to correlate that data to find the reason for the anomalies.
The third one is the ops manager, whose view is not very different, but who is trying to reduce the mean time to resolution for any such errors, and most of the time ops managers are reactive, not proactive. So we want to build a system that is proactive in its monitoring, so it can tell you what the error is and what the resolution could be. In the next layer we have the CTO, who is focused on innovation, building product, and working with the development team. There is a developer who wants to ship code faster, but who is buried in investigating the issue that happened around midnight or the next morning. In that way, developer productivity gets impacted.
Then we have the development manager, who keeps missing deadlines for delivering features. Instead of working on innovation and building new features, the team spends a lot of time troubleshooting issues in their system.
Those are the organizational pains, but even before these people see them, the end user or customer experiences them. Customers might see high latency, or they might not be able to access the system at all whenever downtime happens. They face outages or degraded performance in their application, which can also cost those customers millions of dollars.
Envisioning Intelligent Networks: Autonomous Agents as the Solution
So now imagine a network that could really think. That's where we are heading with autonomous agents. Instead of a dashboard demanding your attention, your system could actually reason just like your best analyst would. Instead of digging through playbooks, the incident history and the tribal knowledge from past experiences could come alive and respond in minutes.
Now picture the same scenario from the beginning. It's 2 a.m. and the spike happens, but this time, instead of showing you hundreds of alerts, your system sends you a single message: an outbound spike is coming from US East 1, resembling the DDoS pattern we saw in the previous quarter, and the resolution is to isolate that particular network and move the traffic to US West 2 so that your system keeps working. That could be the suggested action from the agents. So how does this actually work under the hood?
It's not magic. It's about marrying the data you are getting in real time from various devices with enrichment: identifying the key metrics or the patterns in those logs that could be causing the anomaly. Then you have autonomous agents watching your streaming data, and on the other side you have a knowledge base holding the history of past incidents. Using that knowledge base, you get suggestions for predictive maintenance, or the alerts you need to act on at the right moment. That's how autonomous agents come into the picture.
If I map this back to the network analyst from the beginning, who was looking across different charts and monitors to find the issue, that work has now been converted into agents watching for suspicious traffic. But you have all been hearing about agentic AI in almost every session and keynote here, and you might have a question: can the agent replace the human?
I would just like to highlight Matt's keynote yesterday, if you were there. He was talking about how agents are not going to replace humans; they are going to multiply them. A task that used to take 24 hours or a week could get reduced to 4 or 6 hours, or even a day, so you get more time to work on the product and build innovation rather than fixing problems.
This is not a sci-fi dream. You can think of it as an agent that never sleeps and is always attentive. Whenever an issue happens, it responds with its best judgment and the knowledge it has in its database. It's like having another human in the loop, one that speaks human language.
Now, instead of the analyst digging through the dashboard, you can simply ask the agent: show me the top three anomalies from the last 10 minutes, and the agent responds with an intelligent summary of that data after looking through tens of thousands of similar events. These agents complement you by reducing the noise in the system and doing the heavy lifting of correlation across various systems, so humans can focus on the decision making.
High-Level Architecture Overview
So let's look at the architecture. At the top you have the different devices streaming data. On the streaming side, you have streaming services that receive the data and forward it for analytical processing. In this particular demo architecture, we are not using the streaming data for analytical processing itself; we are only forwarding it to the next layer to find the anomalies. Then you have the knowledge base where you store your previous experiences, such as notebooks, runbooks, and SOPs. And then you have the large language models, which are queried by the agents to find the anomalies by looking at the knowledge base. This is a high-level architecture; Rahul will cover a more detailed one as we go forward, along with the demo. So, having covered the high-level architecture and the problem we have, what are the key technologies we used to build this kind of solution?
Key Technology Components: Streaming and Processing Layers
The first one is the streaming layer. We saw that devices generate events every second, so you need a service that can capture, store, and stream data reliably to the next layer. That's where a streaming service comes into the picture. We have two options: Amazon MSK, which is Managed Streaming for Apache Kafka, and Amazon Kinesis Data Streams. If you're looking for a managed service, you can go with either of them, and with Kafka you also have the option of self-managed Kafka for receiving data at the streaming layer.
Both of these services provide a mechanism to stream data durably and to scale with demand. They can handle high-velocity data at whatever rate you plan to send, and if needed, they can buffer the data until the next layer is ready to receive it. Lastly, the data sent through these streaming services is ordered, guaranteeing ordered delivery of events, which is crucial for deterministic reasoning by the agents.
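As a rough illustration of this layer, here is a minimal sketch of a producer publishing log events to a Kafka topic with the kafka-python client. The broker endpoint, topic name, and event fields are hypothetical, and an MSK Serverless cluster would additionally require IAM-based SASL/TLS authentication, which is omitted here.

```python
import json
from kafka import KafkaProducer

# Hypothetical broker endpoint; replace with your MSK bootstrap servers.
producer = KafkaProducer(
    bootstrap_servers=["<msk-bootstrap-endpoint>:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # serialize events as JSON
)

# Example network log event (fields are illustrative only).
event = {
    "timestamp": "2025-12-02T02:00:13Z",
    "src_ip": "10.0.1.7",
    "dst_ip": "10.0.2.15",
    "bytes_sent": 1_250_000,
    "status_code": 503,
}

producer.send("network-logs", value=event)  # ordered delivery within a partition
producer.flush()
```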
Once the data is flowing from the streaming services, you need a processing layer that can enrich or transform it. Here we have Apache Flink, which can process continuous streams of data in motion by aggregating metrics, correlating anomalies, and producing the real-time features that AI agents can use for reasoning. Think of each log line being processed: Flink can compute rolling averages of, for example, the errors you're getting, the network latency, or the response codes, and you might also want to capture the frequency of failed login attempts if a DDoS attack is hitting your network.
It can also chain together events from various sources. As I mentioned with MSK and Kinesis Data Streams, Flink can receive data from both of these streaming solutions, and the agent can then use those streams asynchronously. Lastly, it has built-in state management, so it remembers recent activity by maintaining context, and from the data it receives it can generate prompts for an agent, such as a summary of the top three anomalies in a particular region within a particular time frame, so you get the overall context about your data in a certain time window.
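To make this concrete, here is a minimal PyFlink sketch of the kind of windowed enrichment described above: reading JSON log events from a Kafka topic and computing per-source traffic volume and error counts over one-minute tumbling windows. The schema, topic name, and endpoint are assumptions for illustration, not the exact job used in the demo, and the Kafka SQL connector JAR must be on the classpath.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: raw log events arriving on the Kafka (MSK) topic.
t_env.execute_sql("""
    CREATE TABLE network_logs (
        src_ip STRING,
        dst_ip STRING,
        bytes_sent BIGINT,
        status_code INT,
        proctime AS PROCTIME()
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'network-logs',
        'properties.bootstrap.servers' = '<msk-bootstrap-endpoint>:9092',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

# One-minute tumbling window per source IP: traffic volume and error count,
# the kind of enriched features the downstream agent reasons over.
features = t_env.sql_query("""
    SELECT
        src_ip,
        TUMBLE_END(proctime, INTERVAL '1' MINUTE) AS window_end,
        SUM(bytes_sent) AS total_bytes,
        SUM(CASE WHEN status_code >= 500 THEN 1 ELSE 0 END) AS error_count
    FROM network_logs
    GROUP BY src_ip, TUMBLE(proctime, INTERVAL '1' MINUTE)
""")
# `features` could then be written to a second Kafka topic consumed by the Lambda/agent layer.
```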
Storage and AI Foundation: Amazon OpenSearch Service and Amazon Bedrock
So now we have the streaming and the processing; next we need the storage layer. That's where Amazon OpenSearch Service comes into the picture. OpenSearch Service is a managed service that simplifies AI-powered search and vector database operations with durability, and gives you a secure, cost-effective offering. On the operational side, it's a managed service you deploy in the AWS Cloud, and you have both options: the managed service as well as the serverless option.
In terms of performance, OpenSearch has improved tremendously compared to versions from one or two years back; we have seen up to a six-times performance improvement. When we talk about the agent's knowledge base and the vector database, you can use OpenSearch to get single-digit-millisecond latencies. OpenSearch also has different tiering strategies that can help you reduce cost: data you're not actively querying can be pushed to the warm or cold tier, while recent data stays in the hot tier. Lastly, OpenSearch integrates with other AWS services and with open source tooling as well.
One of those integrations is with Amazon Bedrock, where OpenSearch serves as the knowledge base for Bedrock. Bedrock provides a set of capabilities for building and scaling generative AI applications with the highest level of privacy and security. Looking at the top left, one of the reasons so many customers use Bedrock is the ability to select from different foundation models using a few simple APIs. And you're not bound to using a foundation model as-is.
You can customize the foundation model with your own data so that it gives you relevant output without hallucinating. All of this comes at the best price performance because Bedrock is serverless, so you don't have to manage or maintain any infrastructure. Once you have your data and have settled on your models, Bedrock gives you the capability to use Retrieval Augmented Generation, acting as the knowledge base for different RAG use cases, so the output stays grounded. Beyond foundation models, Bedrock has other capabilities, which Rahul will talk about, focusing on AgentCore and Bedrock Agents. Lastly, it operates under strict compliance and security standards so that your data is not exposed to security breaches.
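As a sketch of how an application can query such a Bedrock knowledge base for RAG, the snippet below uses the boto3 bedrock-agent-runtime client's retrieve_and_generate call. The knowledge base ID and model ARN are placeholders, not values from the demo.

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

# Ask a question that is answered from the runbook / security-standard documents
# indexed in the knowledge base (backed by an OpenSearch vector index).
response = agent_runtime.retrieve_and_generate(
    input={"text": "What is the recommended mitigation for a UDP fragmentation DDoS attack?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "<knowledge-base-id>",      # placeholder
            "modelArn": "<claude-3-7-sonnet-model-arn>",   # placeholder
        },
    },
)

print(response["output"]["text"])  # grounded answer; source citations are also in the response
```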
As I mentioned earlier, OpenSearch is the recommended vector database when you are building a generative AI application with Bedrock. People use OpenSearch because you get one-click semantic embedding for your text data, and it even supports multimodal data such as images and video. You can scale OpenSearch to billions of vectors, and, as announced yesterday, even to trillion-scale vector embeddings with very high dimensions of up to 16,000. It supports different algorithms like HNSW and IVF, and it has different quantization options if you're looking to build a cost-optimized vector database. Lastly, it's not only a vector database: you can do hybrid search by combining lexical and vector search in the same solution.
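For reference, here is a minimal opensearch-py sketch of creating a k-NN index sized for 1,024-dimensional embeddings (to match Titan Text Embeddings V2) and running an approximate nearest-neighbour query. The index name, field names, endpoint, and authentication details are illustrative assumptions.

```python
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "<opensearch-endpoint>", "port": 443}],  # placeholder endpoint; add auth as needed
    use_ssl=True,
)

# k-NN index for runbook chunks: a 1,024-dimension vector field plus the original text.
client.indices.create(
    index="runbook-vectors",
    body={
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": 1024,
                    "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
                },
                "text": {"type": "text"},
            }
        },
    },
)

# Approximate nearest-neighbour search; in practice the query vector comes from the same
# embedding model used at ingestion time (a zero vector is used here only as a placeholder).
query_embedding = [0.0] * 1024
results = client.search(
    index="runbook-vectors",
    body={"size": 3, "query": {"knn": {"embedding": {"vector": query_embedding, "k": 3}}}},
)
```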
Understanding AI Agents: The ReAct Framework and Agentic Loop
With that, I have covered the challenges and the key technologies that we have used to build this kind of streaming application. Next, I will hand it over to Rahul, who is going to talk about how AI agents act as the bridge between the streaming data and the knowledge base.
Thank you, Prashant, for that. Before I start diving deeper into the autonomous side of it, mainly the agent side, can I see a raise of hands: how many of you have already tried writing some agents yourself, maybe in a proof of concept or a development environment? I'd say quite a few hands, thank you. Now, can you raise your hand again if you have integrated that agent with real-time streaming data to do analysis on it? Okay, very few. Thank you. This session focuses on how we can integrate an agent with a real-time dataset, analyze a particular event, and derive the corresponding context awareness to understand what action needs to be taken.
Before I start, let me quickly give the definition of an agent. At the end of the day, an agent is just a piece of software that works with artificial intelligence systems, predominantly making use of large language models, to understand a problem statement, create a corresponding plan to execute, and, at the end of it, optionally execute those steps on behalf of a human. That's where the real power of large language models with respect to agents comes in.
Now, to make that happen, imagine you receive a problem statement. What do you do? First, you try to understand it. Say I received a log file: I need to understand what is in it. Then I work out what the corresponding IP addresses are, whether they belong to one of my EC2 machines or to some of my other applications, which might be running on surveillance applications as well. Once I have that information, I try to find out the corresponding impact, the blast radius associated with it. Now let's put that thought process inside an agent. The core of the agent is understanding the goal, the problem statement, along with an understanding of which tools are available; in our case, maybe calling an API to get more contextual information, or going to my knowledge base for more detail about my runbooks and security standards, combining it all together, and then deciding on the actions it can take to either eliminate the issue or at least notify someone of the right next action. That's the power of the agent in a nutshell, and that's how multiple components come together to work as the brain of the system.
Now when I talk about the brain of the system, that’s where the agent comes with a very significant change in the paradigm which is called ReAct, which is nothing but reasoning and action. As you can guess, reasoning is nothing but using large language models’ power to understand the problem statement, get the corresponding context, and decide which different tools are available and which one to use at that point in time.
The action is the part where the agent actually invokes the tool to get more information. Now in the real world, this cannot be a one-step process. As a human, I go through log files, then I go through runbooks, then I go through security logs in multiple different places. That means I’m looping myself multiple times, getting the inputs from my previous steps and deciding and orchestrating my next steps. That is also repeatable.
So, combined together, the agentic loop is not just capable of reasoning and deciding on an action. In addition, it is capable of orchestrating that action repeatedly until it reaches a common, final conclusion. Now let's see that as a process flow in the diagram. First of all, there is your prompt or query, which in this case might be a security log, a problem statement, or some other information you're giving it, and along with that you provide the system prompt, describing what you expect the agent to do. That's mainly what it is.
The agent takes that information, combining it together, and does the LLM call, which is nothing but calling a large language model. Now the large language model behaves as the brain of the system, which takes all this information together to decide which different tools to invoke because at the same time it understands which different tools are available because the agent is providing all that information to the large language model. Once it decides what tools to invoke, it sends that information back to the agent.
The agent combines that information together and invokes that particular tool. That’s where the execution step comes in the picture. The execution step can be debugging your logs, going through a knowledge base to understand corresponding context, getting some more information, multiple different steps involved, and this loop generates a result. That result may not be the final result yet because this might be just a step, the first step before I reach the final conclusion. This information can continuously repeat. At the end of it, when the model gives a green signal that yes, this is a good final result, the agent combines it together and sends the result out. So that’s what the agent loop in a nutshell is.
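To ground this loop in code, here is a minimal sketch of a reason-act loop using the Bedrock Converse API with tool use. The tool name, its schema, and the run_tool dispatcher are hypothetical stand-ins for whatever tools your agent exposes; this illustrates the loop, not the demo's exact implementation.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# One hypothetical tool the model may choose to call.
tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "search_runbooks",
            "description": "Look up mitigation guidance for a given attack pattern in the runbook knowledge base.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            }},
        }
    }]
}

messages = [{"role": "user", "content": [{"text": "Outbound traffic from 10.0.1.7 spiked 600% in 90 seconds. What should I do?"}]}]

def run_tool(name, tool_input):
    """Hypothetical dispatcher: execute the requested tool and return its result as text."""
    return "Runbook: treat sustained fragmented UDP spikes as a possible DDoS; isolate the source IP."

# Agentic loop: reason (model call) -> act (tool call) -> feed the result back -> repeat until done.
while True:
    response = bedrock.converse(
        modelId="<claude-3-7-sonnet-model-id>",  # placeholder model ID
        messages=messages,
        toolConfig=tool_config,
    )
    message = response["output"]["message"]
    messages.append(message)
    if response["stopReason"] != "tool_use":
        break  # the model has produced its final answer
    for block in message["content"]:
        if "toolUse" in block:
            tool_use = block["toolUse"]
            result = run_tool(tool_use["name"], tool_use["input"])
            messages.append({
                "role": "user",
                "content": [{"toolResult": {
                    "toolUseId": tool_use["toolUseId"],
                    "content": [{"text": result}],
                }}],
            })

print(messages[-1]["content"][0]["text"])  # final assistant answer
```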
Building with Bedrock: Agents, Multi-Agent Framework, and Model Context Protocol
Now let's dive deeper, especially since we're talking about Bedrock; I'm going to look at one of its important features. When we start talking about writing agents, we have multiple options. At AWS, we like to give you options, and I'm going to talk about a couple of them. The first one is Bedrock Agents. This is one of the key features of Bedrock: fully managed agent creation. It is capable of doing its own orchestration, and you have an easy option to host it as well. That's the key feature we'll be exploring today.
There are a couple of features within the agent that we will dive deeper into. The first is calling APIs; that's where the action group comes into the picture. The action group describes the tools my agent can invoke and makes the agent aware of them. Prashant talked about the runbooks and the previous SOPs; all of that information combined creates the knowledge base. Those are the two key features we'll be talking about today, but there is a wide variety of other features available.
Now let's change gears a bit. I've talked about Bedrock Agents, which in my mind is more of a low-code environment. But as a developer, you might say, hey, I know my Python, just give me a framework to work with. That's where the multi-agent framework comes into the picture; this is option number two, if you decide to write the agent yourself. The multi-agent framework is an open SDK. Until yesterday it supported Python only, and as of this morning, as you may have heard, it supports TypeScript as well. Here your persona is more of a developer persona, writing a few lines of code to build the complete agent functionality. So we have two options: we'll explore the first one in detail, and for the second one I'll show snippets as well.
Now let's dive deeper into how the agent calls a tool, because a tool can perform a wide variety of actions. It may go to a database to fetch information, go to my knowledge base to retrieve context, or call an API to perform some action step. Each of these has its own nature of execution, so there needs to be a common mechanism through which an agent can understand the tools and hand over responsibility to them for execution. That's where the Model Context Protocol came into the picture. It was created by Anthropic and is now widely used by many agents and tools around the world.
So let’s dive further deeper into what we’ve done with the agentic loop and where the MCP plays a role. Let’s look at that part.
As usual, first of all, the query comes in where the agent absorbs the query. It gets the corresponding information as well as which different tools are available. It passes along all that information to the large language model. The large language model, as you’ve seen previously, is capable of doing the reasoning. It understands the problem statement, it understands the context, and now it’s creating a plan to execute different agents and different tools. That’s where the invocation of tools comes into the picture.
Now when the invocation of tools comes into the picture, that’s where the MCP protocol comes into the picture. So your MCP client is making a call to the MCP server, which is going to invoke a tool. That tool can be different API calls, or the tool can be accessing the knowledge base for getting further information. That knowledge base can be based on your runbooks, but the whole idea here is giving the option for the large language model to discover the tools. And once discovery happens, the action part of it is taken over by a common protocol. That’s where the MCP comes into the picture.
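As an illustration of the tool side of this picture, here is a minimal MCP server sketch using the FastMCP helper from the official Python MCP SDK, exposing two hypothetical tools of the kind described above. The tool names and their bodies are placeholders, not the demo's actual tools.

```python
from mcp.server.fastmcp import FastMCP

# Hypothetical MCP server exposing tools an agent could discover and invoke.
mcp = FastMCP("network-security-tools")

@mcp.tool()
def get_flow_metrics(src_ip: str, minutes: int = 10) -> dict:
    """Return recent traffic metrics for a source IP (placeholder implementation)."""
    return {"src_ip": src_ip, "total_bytes": 0, "error_count": 0}

@mcp.tool()
def search_runbooks(query: str) -> str:
    """Look up mitigation guidance in the runbook knowledge base (placeholder implementation)."""
    return "No matching runbook entry found."

if __name__ == "__main__":
    # Serve the tools over MCP so any MCP client (the agent side) can discover and call them.
    mcp.run()
```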
In this loop, as you can see, results come back to the large language model, which decides whether the result is a good final answer or not. If it is not, it can repeat this step again and again. That's where the MCP protocol comes in, coordinating the communication between the agent and the different tools. At the end, when the LLM decides the results look good, the final result comes back out. The fragmentation attack you'll see shortly is the scenario we have mimicked today.
Okay, let's put things together. So far we have talked about RAG, which here is essentially a retrieval mechanism backed by OpenSearch where we store all the information. We have talked about the agentic side, which is capable of gathering information, creating plans, and executing them. Now let's move one step forward: what if we combine them? That means that whenever a problem comes in, the first thing the LLM does is break it into smaller, manageable steps.
For each step, it supplements the problem with retrieved knowledge, which is RAG. Based on that knowledge and the problem statement, it can combine them to build more context awareness, producing more personalized or more specific results for tool selection. Once tool execution happens, orchestrated by the large language model, it can generate results that are more effective and dynamic as well. That's where agentic RAG comes into the picture, and we will dive deeper into agentic RAG as part of the demo today.
Alternatively, as I said, things change a bit for a developer, and that's where AgentCore comes into the picture. Whenever I write an agent using a framework, I then have to deploy it, and AgentCore provides several building blocks that give me a serverless environment to host my agents quickly. Runtime is an environment where I can host the agent. Identity is another building block that connects back to my identity provider to make sure that when the agent is called on my behalf, it does not step outside my permissions and follows the same authentication.
It is also capable of integrating with different tools through Gateway, to discover and call the tools; that's where the MCP protocol plays a vital role as well. Additionally, it can keep track of the current context of the issue, as well as the previous contexts of similar problems that may have occurred; that's where Memory comes into the picture.
There are a couple of additional tools, Browser and Code Interpreter. If you're interested, there are several sessions on AgentCore this week; I encourage you to attend some of them to dive deeper. That being said, let me start diving into the demo architecture we have built today.
Live Demo: Real-Time DDoS Attack Detection and Response System
In the first part, we are mimicking a scenario where we have multiple systems or machines sitting either in the cloud or on premises. All the logs come together and are transferred through a streaming solution; Amazon MSK is the one we are using. They are initially analyzed by an Apache Flink application to extract the key attributes.
Once that information is available, we pass it along through another MSK topic to integrate with Lambda. That Lambda is integrated with a Bedrock agent, so whenever a record comes in, it invokes the agent. The agent already has a couple of tools, exposed through Lambda, as well as an integration with a large language model; we are using the Anthropic Claude 3.7 Sonnet model for our purposes.
At the same time, for context awareness, we have created a knowledge base inside Bedrock. For that knowledge base we are using the Titan Text Embeddings V2 model, and the knowledge base is stored inside OpenSearch as vector embeddings.
At the end, as the action, we are just sending a notification at this moment, but you can absolutely extend this by adding more tools that actually take the action on behalf of the user as well.
That being said, one small note: I can't build a system that keeps attacking a real system, so we have simulated it. I simulate an attack using EventBridge, which generates an attack every two minutes against one of my EC2 machines, and the machine sends the logs accordingly. My demo focuses on this particular architecture, so let me switch to the live demo now. Again, please be patient with me; it's a live demo, so things might not go in my favor, but let's try.
All right, so first of all, I’m gonna go a little bit backwards. We call it working backwards. I’m going to go from the right to left this time, talking first of all about what we have set up in Bedrock. We’re starting from the top. First I’m gonna start explaining about Bedrock Knowledge Bases, how we have set up that first.
This is my Bedrock console, where I have created a Knowledge Base. Again, Knowledge Bases is one of the options available in Bedrock, and I intentionally kept the tabs open so everything is quickly available. Knowledge Bases is a managed capability, so as a user you can pick and choose the model. I'm using the Titan Text Embeddings V2 model as the embedding model, I'm saving the data into OpenSearch Serverless, and the number of vector dimensions is 1,024. So that's my vector dimensionality.
What I'm saving there are my data sources, information that was already available to me. I have two of them: one is the AWS Network Firewall Developer Guide, which comes from AWS, and the other is a NIST network protocol standard document from publicly available information. You can easily enhance this by adding your company's own security standards, protocols, or network knowledge base as well.
If you're interested, these are the two documents I have embedded. They are multimodal, with text, images, and multiple tables, and all of that information is captured. Similarly, the NIST document carries all of its information. These two documents provide my additional context awareness.
Coming back to the agent, which in the diagram is the heart or the brain of the system, I'm using Bedrock Agents. I also have a Strands agent, which I'll show towards the end. In the Bedrock agent I created a couple of versions, so I'm going to focus on version two.
The first part of creating an agent is choosing which model you'll be using; I'm using the Anthropic Claude 3.7 Sonnet model. The other important piece is the instruction I give to my agent, which is called the system prompt. In this case I'm telling it: you are a security analyst; analyze the key information you receive and determine whether it constitutes an attack; if it does, categorize it as severity one, two, or three; if it is severity two or three, generate an email and send out a notification as soon as possible; at the same time, provide information on how to overcome the attack and, if possible, how much time there is to act on it. If you look at it, it's an actual natural-language prompt I have written; no coding is required in this case.
Now another part which I talked about is the tools availability. One tool which I have, again, I will talk about the tool in a minute, but I will talk about Knowledge Base integration first. The Knowledge Base which we have created is also available for my agent to work with, which behaves like a tool for me. Now let’s look at the tool which is called the Security Tool. I’ve written that Security Tool in a Lambda.
The Security Tool is a very simple Lambda function. It builds a simple prompt, uses Bedrock to analyze the logs, and, if it is severity two or higher, generates an email and sends a notification. Those are the two parts I've created there.
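As a rough sketch of what such a security-analysis Lambda might look like, the handler below calls Bedrock to classify severity and publishes a notification through Amazon SNS. The topic ARN, model ID, prompt wording, and severity check are illustrative assumptions, not the demo's exact code.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")
sns = boto3.client("sns")

TOPIC_ARN = "<sns-topic-arn>"  # placeholder notification topic

def lambda_handler(event, context):
    log_summary = json.dumps(event.get("logSummary", {}))

    # Ask the model to classify severity; the prompt wording is illustrative only.
    response = bedrock.converse(
        modelId="<claude-3-7-sonnet-model-id>",  # placeholder model ID
        messages=[{
            "role": "user",
            "content": [{"text": f"Analyze these network log metrics and assign a severity (1-3): {log_summary}"}],
        }],
    )
    analysis = response["output"]["message"]["content"][0]["text"]

    # Notify on-call if the model judged the event severe enough (simple string check for the sketch).
    if "severity 2" in analysis.lower() or "severity 3" in analysis.lower():
        sns.publish(TopicArn=TOPIC_ARN, Subject="Possible DDoS attack detected", Message=analysis)

    return {"statusCode": 200, "body": analysis}
```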
Once those parts are in place, let's look at the other part of the system, which at the end of the day is about sending the email. I'm going to dive deeper into the second Lambda, which I call the invocation Lambda; it acts as a bridge between my internal data set coming through the streaming layer and my Bedrock agent.
Whenever the data comes in, it is capable of calling the Bedrock agent. In that Lambda function, I’m just going to close this and show you that this Lambda is a very simple one. It has the agent ID and corresponding alias names, which both came from my previous Bedrock agent. If you go to the agent details, you’ll get the corresponding information related to my alias names and the agent ID as well. I’m using the ID and my version 2, so the alias name of version 2. Those are the two pieces of information it is passing along. While it’s passing, it’s formulating a message and sending that information to the agent to act upon it. That’s the whole purpose of it.
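For reference, here is a minimal sketch of what an invocation Lambda like this might look like, calling invoke_agent on the bedrock-agent-runtime client with the agent ID and alias ID. The placeholder IDs, the MSK record decoding, and the prompt text are assumptions for illustration.

```python
import base64
import json
import uuid
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

AGENT_ID = "<bedrock-agent-id>"      # placeholder: from the Bedrock agent details page
AGENT_ALIAS_ID = "<agent-alias-id>"  # placeholder: alias pointing at version 2

def lambda_handler(event, context):
    # An MSK-triggered Lambda receives base64-encoded Kafka record values grouped by topic-partition.
    for records in event.get("records", {}).values():
        for record in records:
            payload = json.loads(base64.b64decode(record["value"]))
            response = agent_runtime.invoke_agent(
                agentId=AGENT_ID,
                agentAliasId=AGENT_ALIAS_ID,
                sessionId=str(uuid.uuid4()),
                inputText=f"Analyze this traffic summary and assess the threat: {json.dumps(payload)}",
            )
            # The agent streams its answer back as chunks in the completion event stream.
            answer = "".join(
                part["chunk"]["bytes"].decode("utf-8")
                for part in response["completion"]
                if "chunk" in part
            )
            print(answer)
```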
It's a very simple Lambda that takes the agent ID and the corresponding alias and passes along the information extracted by the upstream Apache Flink application. Now let's go one step back and look at the ingestion part, where Amazon MSK and the Apache Flink application come into the picture. Here I have a simple MSK Serverless cluster, and I'm constantly sending messages to a topic to create a simulated attack, which is being captured here. This is straightforward streaming data ingestion using MSK, with Apache Flink analyzing the stream and extracting the key information.
This is the application I have hosted, and over the last few minutes I think I've sent out almost nine million records, around 1.84 gigabytes of data. If you want to dive deeper, you can click on it and get detailed information just like any other Flink application. I'm generating attacks constantly. To generate that attack, let me go back and talk about it. As I said, I've set up EventBridge to send an attack to one of my EC2 machines every two minutes. So let's look at that part.
For generating the attack, I created an EventBridge rule that fires every two minutes and launches a fragmentation attack that behaves like a DDoS attack: every two minutes, some system sends a flood of traffic at my EC2 machine. This is the EC2 machine being attacked at the moment. For our analysis, I've also displayed that attack; every two minutes it fires, and I'm exporting that traffic as well.
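As a sketch, a scheduled EventBridge rule like this could be created with boto3 as shown below; the rule name and the target ARN of whatever launches the simulated traffic burst are hypothetical placeholders.

```python
import boto3

events = boto3.client("events")

# Scheduled rule that fires every two minutes to drive the simulated attack.
events.put_rule(
    Name="simulated-ddos-attack",           # placeholder rule name
    ScheduleExpression="rate(2 minutes)",
    State="ENABLED",
)

# Point the rule at whatever generates the fragmented traffic burst (assumed here to be a Lambda).
events.put_targets(
    Rule="simulated-ddos-attack",
    Targets=[{"Id": "attack-generator", "Arn": "<attack-generator-lambda-arn>"}],
)
```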
Overall, the system we've built works like this: EventBridge sends an attack to the EC2 machine every two minutes; the EC2 machine sends that information through MSK, which is integrated with Apache Flink to extract the key attributes; the key attributes are then passed to my Bedrock agent, which does the reasoning and uses the knowledge base to identify the attack mechanism. Since I started the attack already, at the end of it the system sends me an email. These emails are coming in constantly, but I have intentionally picked out a couple of them for us to dive deeper into. That's the outcome of the attack.
When my large language model extracted the information, it identified the start and end times of the attack, the IP addresses involved, the corresponding traffic size and percentage, and some other key attributes. The second part, the risk assessment, came from my knowledge base: it used the AWS Network Firewall Developer Guide as well as the NIST document and combined them with the available metrics. Towards the end, it came up with the recommended actions: yes, there is an attack coming from an IP address ending in 1.7, and it suggests I treat it as a threat and block traffic from that particular IP address. Since I'm hitting only one target, the destination IP address is also provided.
Combining it all, it identified this as a critical attack, so it came up with a projected timeline: you have to act within four hours because this is a critical attack, driven mainly by the attack duration and the traffic percentage. I'll show you another one, which is a medium-severity attack. If you see the difference, this is one that my LLM decided is not highly critical but medium severity; it generated similar information but provided a time window of twenty-four hours. Overall, what's happening is that my agent is using my runbooks together with the current log information to generate recommendations as well as time estimates for when an attack might become more critical.
Those time windows differ based on the specific attacks. Sometimes the system will not provide a time window and will simply indicate that action is needed as soon as possible, while other times it may provide a specific time window. That being said, I’ve now covered option number one. Now let me change gears. I’m going to talk about option two. This approach is more suitable if you come from a development background, and that’s where Strands comes into the picture.
The complete agent we saw a few minutes ago, where we used a Bedrock agent with its system prompt and tool integration, I rewrote using Strands. I'm using Strands as my agentic framework, combined with AgentCore: AgentCore Runtime to host my agent and AgentCore Gateway to make the tools available to it, again using the MCP protocol for tool communication. If you look at the code, first I'm importing the Strands libraries and creating the model I'm using, which is the Claude 3.7 Sonnet model. Then the complete system prompt describes which tools are available, how to assess severity, and how to generate a severity rating based on the input.
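As a rough sketch of that developer-oriented option, the snippet below defines a comparable agent with the Strands Agents SDK and a Bedrock model. The model ID, system prompt text, and the empty tool list are assumptions; the demo additionally wires in tools through AgentCore Gateway over MCP, which is omitted here.

```python
from strands import Agent
from strands.models import BedrockModel

# Same family of model as the Bedrock Agents version of the demo (placeholder model ID).
model = BedrockModel(model_id="<claude-3-7-sonnet-model-id>")

SYSTEM_PROMPT = (
    "You are a network security analyst. Analyze the traffic metrics you receive, "
    "decide whether they indicate an attack, classify the severity (1-3), and recommend "
    "mitigation steps along with a time window for action."
)

# Tools are left empty here; in the demo they are exposed via AgentCore Gateway using MCP.
agent = Agent(model=model, system_prompt=SYSTEM_PROMPT, tools=[])

# Example invocation with a summarized log record (illustrative input).
result = agent("Source 10.0.1.7 sent 1.2 GB of fragmented UDP traffic in 90 seconds, a 600% spike.")
print(result)
```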
At the end of it, it talks about the gateway, which is essentially the integration of available tools. Agent Core gateway ensures that different tools which I have already defined are available for my agent to work with. When you talk about gateway, the important part is authentication as well. That’s where my client ID and client secret come into the picture to make sure the agent is communicating securely and it’s the right communication where the agent has the authority to call the particular tools as well.
Further down, we have the URL where the server is hosted, which gives the agent the ability to communicate with the tool over the protocol, and then tool discovery, which lists the supported tools. Another key aspect to observe here is the tool description, because the LLM uses the tool description to understand what the tool does. That's why, when you define a tool, it's quite important that the description is clear, so that whenever the agent decides to call one tool versus another, it has clear information to go on.
Once these tools are available, the next part is just the invocation of the agent. That’s what this part demonstrates. Overall, this is more of a developer-friendly environment where I’m writing using Python. If you want to use the alternative, which is Bedrock agent, you don’t have to write a lot of code and it’s more towards configuration. That was part of our demo. Again, these demos are available for you. We have already created a workshop and Prashant will talk about further details on how you can access those as well. Thank you, and I’ll hand it over back to you.
Beyond IT Ops: Cross-Industry Applications and Use Cases
All right, thanks Rahul for showing the excellent demo. You saw how real-time attacks are being simulated and then y