“Customers look at Microsoft’s Copilot and think, ‘Oh great, Clippy 2.0!’” – Marc Benioff, CEO of Salesforce
“Copilot? Think of it as Clippy after a decade at the gym.” – Satya Nadella, CEO of Microsoft
If you don’t know what Clippy is (or better yet, never encountered it), consider yourself lucky. Introduced in Microsoft Office in 1996, Clippy was the infamously annoying digital paperclip that offered unsolicited advice to users, and instantly became the world’s most hated virtual assistant. And why are we referencing an oft-reviled 1996 virtual assistant that was eventually turned off in 2007? Because history never repeats itself, but it does often rhyme.
With all the buzz around agentic AI (and Gartner’s prediction that over 40% of agent-based AI initiatives will be abandoned by 2027), we asked ourselves: What does it take for AI agents to be deployed in production environments at large enterprises? What does it take for agentic products and copilots to actually be loved and used by employees (unlike the much-maligned Clippy)?
To this end, we surveyed over 30 of the top agentic AI startup founders in Europe and interviewed 40+ practitioners to not only build a view around the State of Agentic AI, but also create a playbook based on the common practices we’ve seen successful agentic startups deploy. We also include raw, unfiltered commentary from the survey. To give you a flavour of some of the things we’ve learned:
- The biggest challenges founders encounter when deploying AI agents in production environments are not actually technical; instead, they are:
  - Workflow integration and the human-agent interface (60% of startups)
  - Employee resistance and other non-technical factors (50%)
  - Data privacy and security (50%)
- As a result, the most successful deployment strategies involve a “Think Small” approach, starting with low-risk yet medium-impact, easy-to-verify tasks that quickly demonstrate clear ROI. Even better if it’s automating a task the human user hates, and it’s pitched as a co-pilot that augments (rather than replaces) humans.
- A significant 62% of agentic AI startups are already tapping into Line of Business or core spend budgets, proving the technology is moving beyond the experimental phase.
- Although pricing strategies are continuously evolving, Hybrid and Per Task are most commonly used (23% each). The “Holy Grail” of Outcome-based pricing is currently used by only 3%: different customers value different outcomes, those outcomes are hard to attribute, measure and monitor, and that makes pricing unpredictable.
- As the ecosystem is in such nascent stages, most (52%) startups are building their agentic infrastructure fully or predominantly in-house.
- Startups are focusing on reliability, with over 90% reporting at least 70% accuracy in their solutions. While healthcare startups reported the highest levels of accuracy (unsurprising), medium levels of accuracy are acceptable for simpler, low-risk use cases with easily verifiable output, when the high volume of automation offsets the lower accuracy, or when the AI enables an entirely new, previously impossible capability.
Based on our findings and interviews with enterprise practitioners, we’ve outlined observations on successful agentic deployment drivers, covering everything from strategic roll-out of use cases to (what we call) the 3Es framework (Education, Entertainment and Expectation management). Are you an agentic AI startup looking to overcome the various challenges around enterprise deployments? Jump straight to our observations here.
If you’re a founder building in this space, please reach out to Advika, Sevi or Mina – we’d love to chat.
What’s an AI agent, and why do we need them?
“What’s in a name? That which we call an AI Agent
By any other name would be just as hypey”
– Not said by William Shakespeare
A variety of definitions of AI agents are thrown about, but for the purposes of our discussion, we describe their key attributes:
- Goal orientation: AI agents are assigned specific tasks or objectives, and their actions are aligned with achieving those goals.
- Reasoning: Agents create plans to achieve the aforementioned goals and incorporate the ever-changing real-world context in their planning; they break down their main goal or complex problems into smaller, manageable tasks and think about the next best steps.
- Autonomy: AI agents act independently without needing constant inputs/instructions from humans; they make decisions and take actions (via tool calling) based on the changing world around them. Given the nascence of GenAI-powered agents and assorted reliability issues around them (and enterprise practitioners’ caution around deploying fully autonomous systems), our definition of agents does not require full autonomy. As a result, co-pilots are included in our definition (so long as they meet the other criteria we’ve listed out, such as goal orientation, reasoning, and actions via tool use).
- Persistence: Agents have memory, or are able to remember their prior experiences and maintain focus on long-term goals across sessions. This is also known as state management.
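To make these attributes concrete, here’s a minimal sketch of a single-agent loop. It’s illustrative only: the class, tool names and canned plan are our own assumptions, not any particular framework’s API.

```python
# Illustrative sketch of the four attributes above: goal orientation,
# reasoning (planning), autonomy (tool calling) and persistence (memory).
from dataclasses import dataclass, field


@dataclass
class Agent:
    goal: str                                    # goal orientation: a specific objective
    tools: dict                                  # callables the agent can invoke autonomously
    memory: list = field(default_factory=list)   # persistence: state kept across steps

    def plan(self, context: str) -> list[str]:
        """Reasoning: break the goal into smaller steps given the current context."""
        # A real agent would call an LLM here; we return a canned plan for illustration.
        return [f"research: {self.goal}", f"draft: {self.goal}", "review"]

    def act(self, step: str) -> str:
        """Autonomy: pick and call a tool without waiting for human input."""
        tool = self.tools.get(step.split(":")[0], lambda s: f"(no tool for '{s}')")
        return tool(step)

    def run(self, context: str = "") -> list:
        for step in self.plan(context):
            self.memory.append((step, self.act(step)))  # remember prior experience
        return self.memory


# Usage: a toy agent with two stand-in "tools"
agent = Agent(
    goal="summarise Q3 support tickets",
    tools={"research": lambda s: "pulled 120 tickets", "draft": lambda s: "wrote summary"},
)
print(agent.run())
```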
Agentic AI mirrors humans across three C’s: Cache (Memory) — like our memory, it recalls past events with vector DBs or state; Command (Muscles) — like muscles, it acts on the world through tools, plugins, or MCP functions; Connect (Mouth) — like the mouth choosing whom to talk to, it picks AI partners at runtime. Together, these give AI human-like autonomy: remembering, acting, and speaking.
Alex Polyakov, Co-founder and CTO at Adversa
AI agents differ from basic LLM chatbots in that state management and tool calling are harder engineering challenges, making their deployments much trickier. An evolution of this is multi-agent systems (MAS), where agents can have shared memory, overarching goals, and coordination amongst themselves. These MAS involve individual agents with specialised capabilities (or distinct sub-components of a broader goal) working together to solve complex problems, even across organisational boundaries.
Because MAS distribute cognitive load across multiple agents (each optimised for specific sub-tasks), they have demonstrated superior performance on complex, open-ended problems compared to single-agent approaches: they improve efficiency, reduce costs, and offer better fault tolerance and flexibility.
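As a toy illustration of that shared-memory coordination (the agent names and blackboard below are hypothetical; a real MAS would add orchestration, messaging and error handling), two specialised agents might hand work off via common state:

```python
# Two specialised agents coordinating through shared state (a simple "blackboard").
# Purely illustrative; production multi-agent systems use dedicated orchestration.

shared_memory: dict[str, str] = {}

def research_agent(task: str) -> None:
    """Specialised agent: gathers findings and writes them to shared memory."""
    shared_memory["findings"] = f"3 candidate suppliers found for '{task}'"

def summary_agent() -> str:
    """Specialised agent: reads the shared state and produces the final output."""
    return f"Report: {shared_memory.get('findings', 'no findings yet')}"

research_agent("industrial sensors")
print(summary_agent())  # -> Report: 3 candidate suppliers found for 'industrial sensors'
```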
But why use AI agents at all? Why not RPA (Robotic Process Automation) or other traditional forms of automation? That’s because AI agents are better for complex, dynamic, and unstructured tasks that require cognitive ability, reasoning, and adaptability. Unlike RPA which follows rigid, pre-defined rules, AI agents can reason toward a goal, make dynamic decisions on the fly, and learn or improve over time – this allows them to handle edge cases and changes in the environment without breaking.
Previous automation solutions for Air and Ocean Freight spot rate pricing are brittle and rarely break 50% automation. We can hit 90%+ with new agentic capabilities by shifting away from fixed linear processes and enabling the agent to process and retrieve more unstructured data needed to make decisions.
Dan Bailey, Co-founder and CEO at Nexcade
Rules-based automation is fast but narrow—it can only act on what it already knows. The toughest problems in security sit in the grey areas where intent isn’t obvious and signals are incomplete. Agentic AI takes this further, applying analyst-level reasoning to the edge cases rules can’t resolve. It brings together the scale of automation and the nuance of human judgement, giving defenders the upper hand against attacks that thrive in ambiguity.
Rahul Powar, Founder and CEO at Red Sift
What does enterprise adoption of Agentic AI look like?
Certain surveys, such as KPMG’s AI 3Q 2025 quarterly pulse, note that AI agent deployment has nearly quadrupled, with 42% of organisations now having deployed “at least some agents,” up from 11% two quarters ago. While this may sound promising, we think “at least some agents deployed” is a poor measure of the true picture of adoption. Our conversations with practitioners suggest that, yes, most large enterprises are deploying AI agents in production environments, but these deployments are typically fairly small. They’re also mostly concentrated in (relatively) more mature areas such as Customer Support, Sales and Marketing, Cybersecurity and Tech (e.g. AI coding agents).
We think it would be more useful to think of adoption through the following lenses:
- How many teams and employees are actually using Agentic AI in their day-to-day work: A May 2025 PwC survey noted that for most respondents (68%), half or fewer of their employees interact with agents in their everyday work. That said, our conversations with practitioners suggest that employees are using personal accounts where enterprises are not adopting the tech, triggering a “Shadow AI” problem where compliance issues run rampant.
- The extent to which employees are using AI agents for their “potentially automatable” workflows (for very few of their workflows, some, or most): We stress the “potentially automatable” point because it may not be desirable for EVERY workflow to be automated, and agentic AI may not necessarily be the best automation technique for that specific task.
While collecting data on actual workflows automated vs potentially automatable workflows is challenging, KPMG’s observation (from the same survey) on “How have AI agents been received by employees?” is a relatively useful proxy: only 10% of respondents indicated “significant adoption”, where employees have enthusiastically adopted AI agents and are fully integrating them into workflows, while 45% pointed to “slight adoption”, where employees are beginning to accept and integrate AI agents into their work (the remainder are getting mixed responses).
- The degree of autonomy given to the AI agent for each workflow (whether it can execute only some tasks within a given workflow, or it can drive the entire workflow end-to-end): Our conversations with enterprise practitioners suggest that they are taking a conservative approach. Even if agentic AI solutions can theoretically be run reliably at 80% levels of autonomy, most practitioners will veer towards greater levels of human-in-the-loop and run the solution at 50% levels of autonomy.
Survey Findings
We surveyed 30 European agentic AI startup founders, and interviewed 40+ enterprise practitioners and founders to ascertain:
- Levels of accuracy and autonomy that their Agentic AI solutions are operating at
- Pricing strategies most commonly used by founders
- Budgets that agentic startups are able to tap into (just innovation budgets, or the core Line of Business budgets)
- Challenges startup founders typically encounter when deploying agentic AI solutions at large enterprises
- Agentic infrastructure that founders have built internally, and third party tools they have used
Autonomy and Accuracy
Autonomy and Accuracy are linked dimensions – after all, you only automate to the extent to which you can get reliable and accurate outputs from AI agents. In an ideal agentic world, we would have extremely high levels of both accuracy and autonomy. By accuracy, we mean the % of agent-executed tasks that result in a successful or accepted outcome (where each task’s outcome is scored from 0, completely overridden by a human, to 10, fully accepted without changes).
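As one way this definition could be operationalised (the scores and the acceptance threshold of 7+ below are our own assumptions for illustration), accuracy is simply the share of tasks whose outcome was accepted:

```python
# Hypothetical human review scores on the 0-10 scale described above;
# we assume (for illustration) that a score of 7 or higher counts as "accepted".
review_scores = [10, 9, 4, 8, 10, 6, 9, 7, 10, 3]

accepted = sum(1 for score in review_scores if score >= 7)
accuracy = accepted / len(review_scores)
print(f"Accuracy: {accuracy:.0%}")  # -> Accuracy: 70%
```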
While we’re not there yet, we’ve found that currently >90% of Agentic AI startups have at least 70% accuracy, though only 66% of Agentic AI startups operate at 70%+ autonomy. Unsurprisingly, the acceptable levels of accuracy vary by industry and use case – e.g. 80% average accuracy for financial services, 90% accuracy for healthcare and so on. The more interesting question is: under what circumstances is a medium level of accuracy acceptable?
Given the interplay between accuracy and autonomy, we’ve identified three configurations that startups mainly sit in:
Medium Accuracy, High Autonomy: A medium level of accuracy (60-70%) is acceptable if the use case is:
- low risk and results in an output that is easy for a human to verify and modify; and
- such that the lower accuracy is more than offset by a very high level of automation: for a time-consuming task with overwhelming volumes, you take the higher level of automation so you can move through massive volumes and focus only on the edge cases the agent can’t handle; or
- an entirely new capability, which was previously impossible, so the trade-off is that you would rather be able to perform a certain activity at 70% accuracy than not be able to do it at all.
High Accuracy, Low Autonomy: This category predominantly comprises agentic healthcare startups, where the typical levels of accuracy and autonomy were 90% and 40% respectively – and these were for much more high-stakes use cases (e.g. research for clinical trials, mental health care) where accuracy is of paramount importance. As one founder noted (regarding their agentic AI solution’s >85% accuracy):
“This accuracy level is not sufficient to remove human oversight and achieve full autonomy, especially in the sensitive context of clinical trials where regulatory standards are stringent.”
High Accuracy, High Autonomy: The majority of startups in this category operate at 80-90% accuracy and autonomy levels, and are typically focused on financial services use cases (e.g. compliance) as well as relatively more mature areas of AI deployment, such as customer support, cybersecurity, and research. In these cases, we’ve observed that founders are increasingly marrying probabilistic LLMs with more deterministic AI methods to enhance accuracy and, consequently, further autonomy.
Our clients are doing hardware engineering. Their goal is to get a 100% working blueprint of whatever product they’re trying to manufacture or assemble. It’s a hard science, with a binary result: either it’s working on the production line, or it’s not. AI agents working in this context need to strive for this perfection – or at least help humans get closer or faster to this result. That inherently conflicts with the probabilistic nature of some of the tech we need (esp. LLMs), which is why we need to balance it with other more deterministic AI/ML methods.
Matthias Berahya-Lazarus, CEO and Co-founder of Cognyx
Here’s a visual summary of the three Accuracy/Autonomy configurations that agentic startups mainly sit in:
As we increasingly deploy agents on multi-step problems or introduce multi-agent systems, the bar for accuracy is only going to increase – what happens when you chain a 90% accurate agent with another 90% accurate agent, and so on, with errors compounding at each step? You get cascade failure, a phenomenon we explore in our upcoming research reports (along with how knowledge graphs and neurosymbolic AI are the way forward) – so stay tuned!
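A quick back-of-the-envelope calculation shows how fast this compounds (assuming, for simplicity, that each step in the chain succeeds independently with 90% probability):

```python
# End-to-end accuracy of a chain of steps, each independently 90% accurate.
step_accuracy = 0.9
for steps in (1, 3, 5, 10):
    print(f"{steps:>2} chained steps -> {step_accuracy ** steps:.0%} end-to-end")
#  1 chained steps -> 90%
#  3 chained steps -> 73%
#  5 chained steps -> 59%
# 10 chained steps -> 35%
```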
Pricing
Given the agentic AI ecosystem is in early stages, most founders we spoke to see their pricing strategies as something to be evolved over time, a position we believe makes sense. For instance, a “per user” pricing makes sense at lower levels of autonomy (because a co-pilot needs a human user alongside) but at higher levels of autonomy, where an agent could perform most of an employee’s tasks (plus unlock new capabilities), a “per agent” pricing with an outcomes bonus may make more sense.
As SaaS license and API usage-based pricing are well understood, we’re focusing here on other pricing strategies and their implications:
Outcome based
Outcome-based pricing is often touted as the Holy Grail of AI Monetisation, because it lets customers pay only when a specific, pre-defined business result is achieved. A great example of this is Intercom, which charges customers $0.99 for every successful conversation resolution achieved autonomously by its Fin AI Agent, ensuring they only pay when the agent delivers. This way, price is aligned with the business value delivered, the customer’s risk is lower, and because it’s connected to a tangible output, it’s easier for customers to understand than a token-based calculation (which doesn’t feel intuitive).
However, in practice, outcome-based pricing is difficult to achieve for a wide variety of reasons. Firstly, you’ll need to agree on what outcomes the customer values (and different customers may value different outcomes, so you could potentially end up with loads of customised contracts). Secondly, you’ll need to figure out attribution – for instance, with a Sales Co-pilot, it would be hard to attribute how much of a new customer win is driven by the AI agent vs the human sales rep. Linked with that is the problem of how to measure the outcome – and all of this makes the calculation more complex. Finally, it can be unpredictable because it is challenging to forecast certain outcomes (e.g. % of cost savings) in advance – you are uncertain not only of the magnitude of the outcome but also of its timing (it could be deferred). Here’s what a founder had to say:
“But the problem is ultimately it’s very difficult to agree on what those outcomes are. It’s very difficult to agree on tracking that, and it’s very hard to do at scale. You can’t really do that self serve because it’s so gamified – people are incentivised not to report outcomes to you.”
It’s much easier to work with outcome-based pricing when:
- the desired outcomes are well-defined and similar across your customers;
- the agent operates the entire workflow or task end-to-end, so it’s easier to attribute; and
- the outcomes are simple to measure and monitor in real time (Intercom’s outcomes are binary – either the agent resolves the conversation or it doesn’t, and that feedback is received quickly).
Consequently, we expect to see more hybrid models than pure outcome-based, where a per-agent pricing model is augmented with outcome bonuses rather than charging only for outcomes.
Per user
From a budget allocation perspective, this is easier for customers to understand, and it also makes sense for co-pilots where a human user is necessarily required alongside your product. The disadvantage of this pricing model is that it doesn’t distinguish between power users and casual users of your agentic AI solution, leading to casual users subsidising the lower- or negative-margin power users. However, if the price point of your co-pilot product is high enough to cover even the costs of supporting power users, it’s a good starting point. As one founder noted:
“We’re fortunate to be in an industry [financial services] where price anchoring is quite high; if you have premium product you can charge a better price. While usage is very high, usage would need to be rather absurd to sufficiently eat into the margins.”
Also, if your agentic solution is highly successful in automating away a great many tasks, it would end up reducing the number of seats needed in the first place – so per-user pricing is unsuitable for highly automated solutions. That said, most founders we spoke to intend to evolve their pricing to a hybrid model, particularly as they enable greater autonomy.
Per agent
This is an intuitive pricing model when you’re automating the vast majority of tasks a particular employee carries out; that way you’re replacing a human and it comes out of the headcount budget. It’s also predictable and easy for customers to understand. However, we observed an interesting dimension in how founders running this pricing model are positioning it: rather than pitching their product as a replacement for a human employee (or focusing on the tasks that an employee currently performs), they are focusing on the net new capabilities the AI agent unlocks that a human employee couldn’t, which allows them to charge more premium prices.
Per task
This is intuitively easy to understand, because it directly connects usage with the cost (so customers pay only for what they use). This is especially helpful in cases where it’s challenging to predict the frequency and volumes of the tasks to be performed. Because it’s linked with tasks performed, it also helps startups tap into the services budget.
Hybrid
We increasingly see founders opting for a hybrid strategy, which typically involves some sort of base fee, and variable pricing on top, with tiers and overages. Or it could be charging per agent plus an outcome based bonus. Or it could be charging per agent plus metered dedicated tools (so it’s a bit like a human employee asking for SaaS tools to perform their work). As you can see, there are a variety of ways to implement a hybrid pricing model.
It’s good because it’s much more flexible, and protects margins by capping usage (so startups can control costs and reduce the risk of unprofitable customers). However, it can quickly get complex, and helping customers predict consumption is key – whether it’s by having a pre-installation analysis for existing volumes of work that could be automated, setting usage reminders and hard usage limits, or credit rollovers, depending on the implementation of the hybrid model.
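As a purely illustrative example of how such a hybrid model might be invoiced (the tier structure and all figures below are invented, not drawn from any startup we surveyed), consider a base fee with a bundle of included tasks, per-task overage and an outcome bonus:

```python
# Hypothetical hybrid pricing: base platform fee + included task bundle +
# per-task overage + per-outcome bonus. All figures are invented for illustration.
BASE_FEE = 2_000          # monthly platform fee
INCLUDED_TASKS = 5_000    # tasks covered by the base fee
OVERAGE_PER_TASK = 0.30   # price per task beyond the bundle
OUTCOME_BONUS = 5.00      # bonus per successfully delivered outcome

def monthly_invoice(tasks_run: int, outcomes_delivered: int) -> float:
    overage = max(0, tasks_run - INCLUDED_TASKS) * OVERAGE_PER_TASK
    return BASE_FEE + overage + outcomes_delivered * OUTCOME_BONUS

print(monthly_invoice(tasks_run=7_200, outcomes_delivered=180))
# -> 2000 + (2200 * 0.30) + (180 * 5.00) = 3560.0
```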
Agentic AI increasingly a part of line of business budgets
We asked founders: “Which enterprise budgets are you currently tapping into?” and were heartened to see that agentic AI startups are predominantly selling into Line of Business or core spend budgets. It goes to show that we are moving past a pure experimental stage (which is where innovation budgets come in) and that AI agents are making a difference to real business use cases and core activities. It’s an excellent way of tracking the mainstreaming of AI agents – even though deployments are currently “broad” rather than “deep”, it’s undeniably positive.
Our findings were corroborated by other enterprise-oriented surveys as well:
- On average, CFOs report dedicating 25% of their current, total AI budget on AI agents. (Salesforce, August 2025 survey of 261 global CFOs)
- 88% of executives say their companies plan to up their AI-related budgets this year due to agentic AI. Over a quarter of them plan increases of 26% or more. (PwC, May 2025 survey of 300 senior executives)
- Organisations are redirecting their AI investments toward core functions, which now command 64% of AI budgets compared to 36% for noncore activities. This reallocation suggests a growing sophistication: a recognition that AI delivers its most compelling value when applied to central business operations rather than peripheral processes. (IBM, June 2025 survey of 2,900 executives globally)
The challenges with agentic deployments
We asked founders in our survey: “What are the biggest issues you have encountered when deploying AI Agents for your customers? Please rank them in order of magnitude (e.g. Rank 1 assigned to the biggest issue)”
The Top 3 ranked issues were illuminating: we’ve frequently heard that integrating with legacy tech stacks and dealing with data quality issues are painful. These issues haven’t gone away; they’ve merely been eclipsed by other major problems. Namely:
- Difficulties in integrating AI agents into existing customer/company workflows, and the human-agent interface (60% of respondents)
- Employee resistance and non-technical factors (50% of respondents)
- Data privacy and security (50% of respondents)
Workflow integration and the human-agent interface
By this we are referring to both the conceptual aspects (e.g. How should my processes, workflows or even role evolve to accommodate AI? Where and how can AI agents help me?) and practical aspects (e.g. what does the UI look like?).
Conceptually, it takes end-users some time to adapt to this new paradigm. First, it’s about accepting and realising that processes need to change; second, it’s about figuring out how they need to change. And it’s not just for the end-users to work that out, but also for the team that makes the buying decisions for agentic AI solutions.
Practically, startups are focused on making sure their agents are deployed within the context the user needs and also show up in other UIs (e.g. ServiceNow, Slack) in workflows across systems. Basically, meeting users wherever they are, to make the process of adopting agents as frictionless as possible. It’s also about making sure that the workflows and outputs are customised to the human user. As one founder observed:
“A lot of companies will want very specific workflows – which makes sense – but supporting multiple unique instances is still quite difficult as some users will want it in very specific formats e.g. specific excel output – supporting that ‘last mile’ UI is probably the biggest headache.”
Employee resistance and non-technical factors
We observed an interesting pattern in the survey results, where startups whose agents operate at higher levels of autonomy (9/10 or higher) were more likely to report employee resistance as a bigger issue. Those operating in heavily regulated industries and domains (healthcare, compliance), which therefore require high accuracy, also noted that customers were sceptical of agentic solutions. Our findings around autonomy, accuracy and their effect on employee resistance are simply expressions of a single problem: trust issues.
These trust issues invariably have other manifestations. Our conversations with enterprise practitioners suggested that human-AI collaborations weren’t always working well: either humans over-relied on the AI, accepting faulty responses, or they under-relied on it and double-checked everything it did, reducing efficiency. This phenomenon was also observed in an MIT study which suggests that human-AI collaboration often underperforms compared to AI or humans working alone. Reasons for this include communication barriers, trust issues, ethical concerns, and a lack of effective coordination between humans and AI systems. As one founder noted:
“They [human users] often think AI is ‘magic’, and don’t fully grasp its advantages and downsides. Failing to understand how AI works can sometimes lead to frustration and confusion. There is also a certain reluctance to drop old processes and taking the plunge fully with AI.”
Another major non-technical factor that founders pointed to was that customers often lacked a coherent AI and data strategy, leading to a plethora of use cases and test pilots but no cohesive plan for AI adoption at scale. In fairness to the customers, another founder highlighted:
“AI proliferation creates selling friction. Every incumbent provider promises AI enabled point solutions now, which are often initially attractive to customers as its covered by committed budget. But this results in a fragmented AI strategy and very often fails to bring the latest innovation; not all AI is equal.”
Integration with legacy tech stacks
This isn’t a new problem; we’ve always had these issues with enterprise software. But here’s a fun fact for you – 42% of enterprises need access to eight or more data sources to deploy AI agents successfully. It’s not as much fun when you’re working through it all: legacy tech stacks don’t always have an API, documentation is lacking, customers rely on a variety of super-walled archaic applications that keep the company knowledge blocked, so data is siloed and distributed… and the list goes on. We aren’t sharing any quotes from founders here because they’ve mostly said similar things, which speaks to the universality of a painful experience that doesn’t need any further elucidation.
Observability, monitoring and evaluation
In our previous research on Responsible AI, we covered why it’s so hard to ensure that AI systems function as intended, and to interpret what the AI model did and why. It was hard enough to interpret the behaviour of a single LLM-powered agent, but this complexity is compounded when multiple agents interact asynchronously and dynamically with each other. Each agent may have its own memory, task objective and reasoning path, so tracing the chain of events leading to a final decision or failure is difficult. You can also have cascading errors in a multi-agent system, where agents end up reinforcing each other’s bad decisions. And all of these would be difficult to detect without ongoing monitoring and robust eval mechanisms. By ensuring that the AI agents are working as intended, observability, monitoring and evals give customers the confidence to launch them with their end users. It’s also about traceability. As a founder highlighted:
“The challenge is to find a rationale for the AI agent’s output that is understood and verifiable by humans, so as to increase trust and actually free up time.”
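To make traceability concrete, here’s a minimal sketch of step-level tracing for an agent pipeline (the decorator, agent names and trace fields are our own illustrative assumptions; production systems would use dedicated observability and eval tooling):

```python
# Records which agent ran, with what input and output, so a human can
# reconstruct the chain of events behind a final decision. Illustrative only.
import json, time, uuid
from functools import wraps

TRACE = []  # in a real system this would be written to a trace store

def traced(agent_name: str):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            span = {"span_id": str(uuid.uuid4()), "agent": agent_name,
                    "input": repr(args), "started_at": time.time()}
            result = fn(*args, **kwargs)
            span["output"] = repr(result)
            TRACE.append(span)
            return result
        return wrapper
    return decorator

@traced("triage_agent")
def triage(ticket: str) -> str:
    return "billing" if "invoice" in ticket else "technical"

@traced("resolution_agent")
def resolve(category: str) -> str:
    return f"routed to {category} queue"

resolve(triage("Customer asks about a duplicate invoice"))
print(json.dumps(TRACE, indent=2))
```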
Here’s an interesting aside: as part of our survey, we asked founders “What would you like to learn through our research into Agentic AI? What would be most helpful or useful to you?” and a surprising number of questions were around observability and evals. Stay tuned, we’re covering it in our upcoming research.
Data privacy and security
There are the actual issues, and then there are the perceived issues. In terms of actual issues, founders talked about everything from doing considerable engineering (with several complete restarts) to work around large financial services customers’ requirements on the data they can or can’t send to LLMs, to getting ISO 27001 certifications to overcome issues with MedTech clients. Yet even if the data is protected, there are perceived issues leading to resistance or slower rollout of agentic solutions. To illustrate with some founder observations:
“Data and privacy are not so much as a blocker as a major source of slowing us down.”
“Data privacy is not a problem per se, but on occasion we have experienced resistance from senior leadership because of concerns around privacy and security.”
Data quality, data infrastructure issues
Much like with integration issues, data quality and data infrastructure issues are not new. Almost all the founders in our survey talked about having to do a lot of data clean-up to get to reliable agentic workflows. However, the problem is exacerbated because agentic AI aims to tackle actual tasks performed, and clients typically have poor or outdated documentation of processes. Not to mention that there is lots of embedded knowledge of processes that sits within users’ heads.
Infrastructure costs
Sam Altman noted in his blog that “The cost to use a given level of AI falls about 10x every 12 months, and lower prices lead to much more use.” Even as the price per token (for a given level of AI) has decreased, the newer cutting-edge reasoning models are more expensive, and the number of tokens consumed has skyrocketed. Epoch AI found that average output length for reasoning models has grown at 5x per year (vs 2.2x for non-reasoning models), and reasoning models exhibit longer response lengths overall – 8x more tokens on average compared to non-reasoning models. Even a simple query may use about 5,000 reasoning tokens internally to return only a 100-token response. Token bloat is a real problem, and the quest for quality (and consistent) model outputs exacerbates the issue, as a founder called out:
“Model consistency is a challenge and has implications for infrastructure costs. Infrastructure costs are a balancing act as it limits the tiers we can make agentic flows available. We have found we need a lot of context and multi pass/reasoning models for most real tasks to get at the required reliability in 2025 which could become significant enough to impact margin.”
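To put the token bloat in perspective, here’s a back-of-the-envelope cost calculation (the per-million-token prices below are placeholders, not any specific provider’s rates; reasoning tokens are assumed to be billed as output tokens):

```python
# Hypothetical prices, for illustration only.
PRICE_PER_M_INPUT = 3.00    # $ per million input tokens
PRICE_PER_M_OUTPUT = 15.00  # $ per million output tokens (incl. hidden reasoning tokens)

def query_cost(input_tokens: int, reasoning_tokens: int, response_tokens: int) -> float:
    output_tokens = reasoning_tokens + response_tokens
    return (input_tokens * PRICE_PER_M_INPUT + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

# A "simple" query: 500 tokens of prompt/context, 5,000 hidden reasoning tokens,
# and a 100-token visible answer.
print(f"${query_cost(500, 5_000, 100):.4f} per query")
# -> $0.0780, and ~98% of the billed output tokens are reasoning, not the answer
```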
Agentic infrastructure predominantly built in-house
We asked founders: “Which third-party AI agent infrastructure solutions do you work with when building, deploying, monitoring the agents? Eg. solutions for memory, tool calling, agentic frameworks, browser infrastructure, agentic payments etc.”
Based on their responses, we found that 52% of the founders surveyed built their agentic AI infrastructure in-house (either predominantly or fully). We attribute this primarily to the nascence of the agentic ecosystem.
Here are some excerpts from our founder survey:
None, all built in-house. External tools haven’t provided us with the flexibility that we need.
We have built a lot of things internally to ensure adaptability and less dependencies – our customers are averse to long lists of dependencies, and from a strategic POV it made sense for us to develop this, however things like Azure Foundry, Browser-use and Phoenix have made it through. We’re constantly evaluating other options. For example, MCP is a strong contender, but in our industry it isn’t being properly adopted.
We manage AI model hosting, memory, and tool integration internally, using a controlled, modular framework tailored to compliance and fraud detection workflows.
We have built our own systems for memory, browser infrastructure and tooling. Having the full stack control on the system is important for giving the agent a predictable, shared language to the way it interacts with the world.
In terms of the third-party tools most frequently cited, ChatGPT and Claude models were most often mentioned along with the Google Agent Development Kit, while LangChain (unsurprisingly) came up as the most popular framework. Other tools that received shout-outs include: frameworks and orchestration platforms (Pydantic, Temporal, Inngest, Pipecat); monitoring, observability and evaluations (Langfuse, Langtrace, Coval); agentic browsers (Browserbase, Browser Use, Strawberry) and vector databases (Qdrant).
Observations
Based on our 40+ interviews with enterprise practitioners and startup founders, we outline the common approaches taken by startups to successfully deploy AI agents in enterprises.
Strategic rollout of use cases
The most successful deployment strategies we’ve seen started with use cases that:
- were simple and specific, with clear value drivers, and low risk yet medium impact;
- weren’t majorly disruptive to existing workflows;
- preferably automated a task that the human user dislikes (or that was previously outsourced);
- produced output that a human could easily and quickly verify for accuracy or suitability; and
- demonstrated clear ROI quickly.
Given the current levels of technological development, AI Agents work best when narrowly applied to very specific tasks and operating under a specific context. For instance, we’ve seen this in healthcare with revenue cycle management processes (claim and denial management) that health systems were already outsourcing to third-party providers.
The land-and-expand strategy for AI agents is very different to traditional SaaS. Given enterprises are increasingly under pressure from the C-Suite to incorporate AI into their work, there are plenty of opportunities for startups to “land”, but it’s much harder to “expand” – and even when customers want to expand, it takes much longer because it’s a use-case-by-use-case rollout. Much like the iconic Volkswagen ad, sometimes it’s better to “Think Small” and build trust first, rather than attempt too many use cases (and excessively complex use cases) right off the bat.
Hand-holding and more hand-holding
Successful enterprise deployments of agentic AI require significant levels of hand-holding and education. This is primarily because enterprises often aren’t fully clear on the best use cases to apply agentic AI to, the opportunities and limitations of the technology, how best to use the tools, how to redesign workflows… and, more critically, how to evaluate and buy agentic AI products.
Whenever I talk about product strategy, I always talk about having “zero feet” between us and the customer. If you don’t understand what your customers are doing and what their pain points are, you’re really not going to build a helpful solution.
Hanah-Marie Darley, Co-founder and Chief AI Officer at Geordie AI
Workshops and consultative GTM: Pre-installation analysis and workshops at the very outset are critical for setting and managing expectations, on everything from identifying areas where agents can or can’t be helpful, to providing clarity upfront on expected usage and pricing. For instance, Health Force (AI agents that automate daily administrative tasks at hospitals) does a free AI Readiness Assessment and helps hospitals identify the workflows where AI agents would be most beneficial. Similarly, Runwhen (AI agents for developer experience) performs a pre-installation analysis on existing alerts or chats and measures which could be automated via Runwhen. Using a consultative GTM approach also gives the enterprise comfort around the degree of customisability of third-party solutions (every organisation has some workflows unique to them, and incorporating their specific needs is key to driving adoption).
Forward Deployed Engineers (FDE) driving adoption forward: A Forward Deployed Engineer (FDE) is a software engineer who works directly with customers, often embedded within their teams, to solve complex, real-world problems – so it’s a hybrid role where an FDE is a software developer, a consultant and a product manager, all rolled into one.
Most of the agentic startups we spoke to have found Palantir-style forward deployment useful when selling to enterprise/mid-market clients whose complex data is fragmented across different data sources. But there are other forms of complexity as well, such as product complexity and process complexity, that necessitate a deeper partnership with customers at the very outset to ensure the agentic solutions are achieving the desired outcomes. The more complex the data integration, the product and the business processes, the greater the need for an FDE to help drive the best outcomes for clients.
The human-agent interface and the three E’s (education, entertainment and expectation management): As we observed earlier, our survey suggests that 60% of agentic AI startups struggle with workflow integration and the human-agent interface. Startups such as Strawberry (AI agents on browsers) are focused on building out multiple dimensions of that, such as: (a) moving beyond a merely chatbot-style interface; (b) having the AI agents themselves educate customers on what they can or can’t do, plus give suggestions on how to better use the product whilst managing expectations; and (c) making the AI agents fun or engaging to work with. For our part, we were vastly amused by Strawberry’s agents such as LinkedIn Linus or Competition Camille or Data Extraction Denise (as you can see, we have an ardent appreciation for alliteration).
The biggest thing is expectation management. If you give people a browser and you say, oh, it can just do anything on the web, then people will write queries like ‘get all the products from Amazon and build a table with prices’ and expect that to work, when that would need hundreds of thousands of dollars and professional web scrapers. But people will also underestimate what is possible, so they will write very simple prompts or very vague prompts, and then be disappointed with the results.
Charles Maddock, Co-Founder and CEO at Strawberry
Besides educating customers (in an engaging way) on how best to use agents and managing their expectations, founders are also focused on enabling human users to educate the agents, so users can guide the agent’s behaviour to reflect changing priorities and workloads, as well as capture the users’ unique style of working. Users need to enjoy working with the agent enough to evangelise it (clearly, no Clippy!).
Positioning
A common question we’ve received from agentic AI founders is how to position their products when everyone’s marketing sounds the same. Many solutions claim to use agentic AI but over-promise and under-deliver, leading to buyer fatigue and scepticism – creating a challenging environment for truly high-quality agentic AI solutions to cut through the noise. Taking a consultative, collaborative and problem-focused approach that demonstrates real value is critical (as described above), but so are the various dimensions of positioning (which we discuss below). We fully acknowledge that positioning is mostly a function of current perceptions and levels of technological development; as these systems see more mainstream acceptance and agents achieve higher levels of autonomy reliably, no doubt the positioning strategies will evolve as well.
To mention AI or not to mention AI, that is the question: We’ve observed an interesting dichotomy in positioning strategies. In verticals like Healthcare, founders are actively downplaying the use of agentic AI in their solutions. As two founders in Healthcare observed:
“You know what’s weird? If you use the words ‘agent’ or ‘AI’ it actually backlashes more than it benefits. The moment you put AI out to clients, it’s like, ‘oh, here goes a bunch of fluff again.’”
“We position more as a mental healthcare company than an agent company to our customers.”
However, in verticals such as Financial Services, founders are prominently featuring their “agentic AI” proposition, given the AI-forward positioning resonates with users and buyers. The good news is that in most verticals (outside of healthcare), the “agentic AI” positioning resonates well (provided it meets all the criteria we outlined in the section on “strategic rollout of use cases”).
Levels of autonomy: Most founders we spoke to have opted for a co-pilot approach to selling, even if their solutions were capable of higher levels of autonomy. This was mainly done to build trust with the customer. For instance, Juna AI (whose agents optimise complex manufacturing processes in heavy industries) started with a co-pilot approach where the agents give recommendations to the customer on how to optimally run the systems, and the customer still has the option to choose whether or not they implement it. While the idea is to eventually get to higher levels of autonomy (the solution is certainly capable of it), it’s baby steps for now.
Most practitioners we spoke to feel like they’re on a learning journey, and would much prefer the co-pilot approach to a fully autonomous one (though again, this depends on 3 factors: the criticality/impact of the task being automated, how easy it is to audit the mistakes the AI may potentially make and catch them before they do any harm, and whether it unlocks an entirely new capability, e.g. being able to perform a task a human was never able to do before). In all cases, being able to easily review the AI agent’s outputs was critical.
Augmentation, not replacement: Tied to the previous point on lower levels of autonomy, startups that have positioned themselves as “augmenting” rather than replacing existing employees or legacy tech stacks have found it easier to gain a foothold in large enterprises. It’s even better if they’re pitching a net new capability that wasn’t previously possible. From a tech standpoint, rip-and-replace is difficult for customers who have complex downstream workflows built on top of their existing ERPs like SAP, and startups (such as askLio in the procurement space) are focused on working with existing technologies to get to faster deployments. From an employee standpoint, we’re not yet at a level where most AI agents are sufficiently reliable, or capable of automating enough end-to-end workflows, for enterprises to contemplate a true FTE replacement. And even if both those things were true (tying back to our earlier point around levels of autonomy), enterprise practitioners are more cautious with highly autonomous deployments.
Articulation of value proposition and ROI: We can analyse the issue in two ways: (1) where the value proposition is well understood, so it is relatively easier to articulate the ROI; or (2) where AI agents have unlocked entirely new capabilities (so it’s hard to compare with existing solutions), making the ROI harder to characterise.
Let’s take the first case, where it is easier to understand the use case and articulate the ROI because it’s an established workflow. Here, it’s usually about pitching time and cost savings and/or revenue uplift. For instance, Covecta (AI agents for financial services) talks about 70% time saving on tasks such as drafting detailed credit applications, while Biorce (clinical AI platform that speeds up drug development) talks about ROI both in terms of labour cost savings as well as faster time-to-market (Biorce’s calculation being that one hour spent on its platform saves 720 human hours), with the faster time-to-market itself creating revenue acceleration opportunities. Credit applications and drug discovery are still well understood; but what of entirely new developments such as Generative UI?
That brings us to the second case. Startups such as Architect provide AI agents that build, personalise, and optimise your web pages for every visitor – something we would call “Generative UI”, because the visual presentation, content and visitor experience of the website change on the fly depending on who the viewer is. Given the novelty of the solution, it may be challenging to pitch the product, but Architect overcomes this by positioning the product as complementary to ads systems/platforms (like Google AdWords) and measures success through improvement in conversion (emphasising the utility, not just the novelty).
Given we backed Synthesia (AI video platform that generates photorealistic performances of avatars) back in 2019, we’ve seen firsthand how startups with highly novel technologies get widespread adoption through emphasising utility over novelty. We don’t expect the agentic AI wave (for net new use cases) to be any different.
Getting to the desirable end state
Today’s AI agents are still (for the most part) reactive, because they are triggered in response to human prompts or explicit user instruction to act. However, in the future we expect to see more ambient agents and proactive agents that initiate tasks by themselves, and can reason more effectively around edge cases so that task execution is robust even under uncertainty. This means that agents need to be adaptable without becoming unreliable, and they need to learn continuously as well as retain those memories over long periods of time (much like how a human colleague learns about your organisation). Today, they operate in more constrained and controlled environments within organisations, but we see agents eventually interacting with “open” environments –