Mott MacDonald’s new enterprise agent, EMMA — Every Mott MacDonald Answer — shows how a global engineering consultancy can convert dispersed institutional knowledge into an actionable, auditable, and secure assistant by building on Azure’s emerging agent platform and Microsoft 365 ecosystem. The project targeted a clear, business-critical problem: experts and best practices sitting in siloed project folders and personal OneDrive sites made it expensive and slow for teams to find authoritative guidance, so the firm built an agent that surfaces curated, governed answers while preserving privacy, compliance, and human oversight. EMMA’s architecture centers on Microsoft trust primitives (SharePoint/OneDrive access, Microsoft Graph delegated authorization), Azure AI Foundry for agent runtime and model management, and safety tooling including red‑teaming and content filters to reduce hallucination and unsafe outputs.
Background
Mott MacDonald is a global, employee‑owned engineering and consultancy business operating across hundreds of markets. Like many professional services firms, its differentiator is deep technical knowledge — design standards, lessons learned, templates, and subject‑matter expertise distributed across teams and document stores. That distribution creates two problems: first, time lost to searching and re‑creating institutional knowledge; second, inconsistent advice that can reduce technical quality or create compliance risks. The EMMA initiative set out to do three things:
- Democratize access to best practices and precedents so junior staff can act with higher confidence.
- Speed delivery by reducing research time and surfacing curated, company‑approved answers.
- Maintain client trust and regulatory compliance by ensuring that only approved data and policies inform agent responses.
Because Mott MacDonald already relied on Microsoft 365 (SharePoint and OneDrive) as the daily content layer, integrating an agent into that surface and enforcing delegated access through Microsoft Graph was a natural fit. Building EMMA on Azure also enabled regional data residency and enterprise security controls, which were non‑negotiable for an international consultancy.
What EMMA is and how it works
High‑level design
EMMA is an enterprise assistant that blends retrieval‑augmented generation (RAG) with governed tool access and telemetry. Conceptually, the stack breaks down into three layers:
- Knowledge and access layer — SharePoint sites and OneDrive files, indexed and scoped by access controls; retrieval uses Azure AI Search and Graph API to preserve delegated authorization.
- Reasoning and orchestration layer — Azure AI Foundry and the Azure OpenAI Responses API operate as the agent brain, interpreting queries, selecting model endpoints, and invoking tools or data connectors.
- Governance, safety and observability — policy controls, content safety filters, analytics stored in Azure Database for PostgreSQL, and adversarial testing (AI red‑teaming) exercises to validate behavior under misuse. A high‑level flow sketch follows this list.
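To make the layering concrete, here is a minimal control‑flow sketch of how a single query might pass through the three layers. The helper names (`retrieve_evidence`, `generate_grounded_answer`, `passes_content_safety`, `log_interaction`) are illustrative placeholders, not part of any published EMMA interface.

```python
# Illustrative control flow only -- the helper functions are hypothetical
# stand-ins for the knowledge, reasoning, and governance layers described above.

def answer_query(user_token: str, question: str) -> dict:
    # Knowledge and access layer: retrieval is scoped by the caller's own
    # delegated permissions, so users only see content they could open themselves.
    evidence = retrieve_evidence(question, user_token)

    # Reasoning and orchestration layer: the model is asked to ground its answer
    # in the retrieved snippets and to cite them.
    draft = generate_grounded_answer(question, evidence)

    # Governance and safety layer: filter the draft and log the interaction
    # before anything is returned to the user.
    if not passes_content_safety(draft):
        return {"answer": "This request cannot be answered.", "citations": []}
    log_interaction(question, draft, evidence)
    return {"answer": draft, "citations": [doc["url"] for doc in evidence]}
```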
Key platform choices and why they matter
- Microsoft Graph API for delegated access: By using Graph, EMMA reads organizational content under the same on‑behalf‑of user model employees already use for SharePoint and OneDrive. That approach preserves least‑privilege semantics and makes authorization auditable (a minimal token‑exchange sketch follows this list).
- Azure AI Foundry + Responses API: Foundry provides a managed catalog and runtime for models and agent orchestration; the Responses API consolidates chat/assistant primitives and file upload, speeding development by removing plumbing that would otherwise be reimplemented. These choices let engineering teams focus on prompt engineering, grounding strategies, and tool integration rather than building a custom LLM runtime.
- AI Red Teaming and Content Safety: Before production rollout, Mott MacDonald ran red‑teaming simulations, and it uses Azure AI Content Safety as a continuous control to block unsafe outputs. These steps are practical defenses against prompt injection and adversarial misuse in an environment where answers could influence engineering decisions.
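As an illustration of the delegated‑access point above, the sketch below shows the standard OAuth 2.0 on‑behalf‑of exchange using MSAL for Python, which is the common way a backend service obtains a Microsoft Graph token scoped to the calling user. The client ID, secret, scope, and search request are placeholder assumptions; the case study does not publish EMMA’s actual token flow.

```python
# Minimal on-behalf-of (OBO) sketch with MSAL for Python: exchange the user's
# incoming API token for a Microsoft Graph token carrying that user's permissions.
import msal
import requests

app = msal.ConfidentialClientApplication(
    client_id="<app-client-id>",             # placeholder values
    client_credential="<app-client-secret>",
    authority="https://login.microsoftonline.com/<tenant-id>",
)

def graph_search_as_user(user_assertion: str, query: str) -> dict:
    # The OBO grant keeps least-privilege semantics: the resulting token can only
    # read what this specific user is allowed to read.
    result = app.acquire_token_on_behalf_of(
        user_assertion=user_assertion,
        scopes=["https://graph.microsoft.com/Sites.Read.All"],
    )
    response = requests.post(
        "https://graph.microsoft.com/v1.0/search/query",
        headers={"Authorization": f"Bearer {result['access_token']}"},
        json={"requests": [{"entityTypes": ["driveItem"],
                            "query": {"queryString": query}}]},
    )
    return response.json()
```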
Why Mott MacDonald’s approach is noteworthy
1) Architecting for governed knowledge, not just convenience
EMMA isn’t a consumer chatbot; it’s a governed knowledge service. That distinction shapes the design: the agent only retrieves and synthesizes information from curated corporate sources, and user results are scoped by existing tenancy and access rules. That design reduces the risk that the assistant will surface stale, proprietary, or personally sensitive material, because retrieval respects SharePoint/OneDrive scoping and Graph’s delegated access model. It’s an important operational difference between a convenience‑oriented LLM integration and an enterprise‑grade knowledge assistant.
2) Taking platform risk off the table through “known good” building blocks
By adopting Azure AI Foundry and Microsoft 365 primitives, Mott MacDonald traded custom infrastructure work for platform features that include enterprise SLAs, identity integration, and built‑in evaluation tooling. This reduces time‑to‑value and preserves standardized audit and logging patterns that compliance teams require. For organizations already committed to Microsoft 365, this path minimizes integration friction and lets teams focus on content curation and model evaluation rather than low‑level security plumbing.
3) Embedding evaluation and monitoring as a first‑class concern
Mott MacDonald stores analytics (user interactions, query patterns, failure modes) in Azure Database for PostgreSQL and uses red‑teaming to identify vulnerabilities before deployment. This operational emphasis is the difference between a deployed assistant and a maintainable corporate service: continuously tracking hallucination rates, evidence‑retrieval quality, and human override frequency is what determines whether an agent delivers safe, reliable value.
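A minimal sketch of this kind of interaction logging, assuming a psycopg2 connection to Azure Database for PostgreSQL and a hypothetical agent_interactions table (the case study does not publish EMMA’s schema):

```python
# Hypothetical analytics write: record each interaction so hallucination rates,
# evidence quality, and human overrides can be tracked over time.
import json
import psycopg2

def log_interaction(conn, user_id, query, answer, citations, human_override):
    # conn: an open psycopg2 connection; the table and columns are illustrative.
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO agent_interactions
                (user_id, query, answer, citations, human_override, created_at)
            VALUES (%s, %s, %s, %s, %s, now())
            """,
            (user_id, query, answer, json.dumps(citations), human_override),
        )
```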
Technical anatomy — an engineer’s breakdown
Data ingestion and retrieval
- Content sources: SharePoint site collections, project repositories, policy documents, and personal OneDrive files.
- Indexing: Azure AI Search is used to create retrieval indexes; metadata and access flags map back to Graph tokens so the agent only surfaces content the querying user can legally access.
- Vectorization and RAG: Documents are embedded into vector stores for semantic matching; the retrieval step returns evidence snippets and links that the model must cite or reference in its answer. This grounding mitigates hallucination by tying answers to documentary evidence (a minimal retrieval sketch follows this list).
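The sketch below shows one plausible shape for that retrieval step using the azure-search-documents SDK: hybrid keyword‑plus‑vector search with an access filter. The index name, field names (contentVector, group_ids, content, source_url), and filter logic are illustrative assumptions, not details from the case study.

```python
# Hybrid retrieval sketch: combine keyword and vector search, and restrict
# results to documents whose ACL groups overlap the calling user's groups.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",  # placeholder
    index_name="knowledge-index",                             # placeholder
    credential=AzureKeyCredential("<api-key>"),
)

def retrieve_evidence(question: str, question_embedding: list[float], user_groups: list[str]):
    # OData filter so only documents visible to this user's groups are returned.
    group_filter = " or ".join(
        f"group_ids/any(g: g eq '{g}')" for g in user_groups
    ) if user_groups else None

    results = search_client.search(
        search_text=question,
        vector_queries=[VectorizedQuery(vector=question_embedding,
                                        k_nearest_neighbors=5,
                                        fields="contentVector")],
        filter=group_filter,
        top=5,
    )
    # Return snippets plus source URLs so the model can cite its evidence.
    return [{"snippet": r["content"], "url": r["source_url"]} for r in results]
```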
Orchestration and reasoning
- The Responses API interprets user queries and calls orchestrators in Azure AI Foundry that can (see the sketch after this list):
- Route to different model endpoints depending on cost/latency/accuracy.
- Invoke tools (e.g., document summarizers, calculators, or custom company APIs) via OpenAPI connectors.
- Persist conversational threads to BYO storage for auditing and long‑running tasks.
- Copilot Studio or similar low‑code tooling can be layered on top so domain teams can author agent prompts and flows while pro‑code teams manage production deployment.
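As a concrete illustration of the orchestration step referenced above, the sketch below calls the Responses API through the OpenAI Python SDK against an Azure OpenAI endpoint and applies a naive cost/latency routing rule. The deployment names, API version, and routing heuristic are assumptions rather than EMMA’s published configuration.

```python
# Responses API sketch via the OpenAI Python SDK pointed at an Azure OpenAI resource.
# Deployment names, API version, and the routing rule are illustrative.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com",  # placeholder
    api_key="<api-key>",                                    # placeholder
    api_version="2025-03-01-preview",                       # any Responses-capable version
)

def answer(question: str, evidence: list[dict]) -> str:
    # Simple cost/latency routing: short queries go to a smaller deployment.
    deployment = "gpt-4o-mini" if len(question) < 200 else "gpt-4o"
    context = "\n\n".join(f"[{i+1}] {doc['snippet']} ({doc['url']})"
                          for i, doc in enumerate(evidence))
    response = client.responses.create(
        model=deployment,
        instructions=("Answer only from the numbered evidence below and cite the "
                      "matching reference numbers. Say 'not found' if the evidence "
                      "is insufficient."),
        input=f"Evidence:\n{context}\n\nQuestion: {question}",
    )
    return response.output_text
```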
Safety, evaluation, and operations
- AI Red Teaming: adversarial tests simulate prompt injection, data exfiltration attempts, and bad‑actor prompts to validate safe behavior; Mott MacDonald ran these tests before rollout.
- Content Safety: automated filters block unsafe content continuously, reducing exposure to disallowed outputs.
- Observability: OpenTelemetry traces and detailed action logs give auditors and incident responders the ability to reconstruct agent decisions. Metrics are stored and analyzed in a PostgreSQL instance to track adoption and risks (a minimal safety‑check and tracing sketch follows this list).
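One way the safety and observability controls above could fit together in code: the sketch below wraps an Azure AI Content Safety check in an OpenTelemetry span so the decision is traceable. The endpoint, severity threshold, and tracer name are illustrative assumptions.

```python
# Content-safety gate plus tracing sketch: block high-severity output categories
# and record the decision on a span so auditors can reconstruct it later.
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential
from opentelemetry import trace

tracer = trace.get_tracer("emma.safety")  # illustrative tracer name
safety_client = ContentSafetyClient(
    endpoint="https://<contentsafety-resource>.cognitiveservices.azure.com",  # placeholder
    credential=AzureKeyCredential("<api-key>"),
)

def passes_content_safety(text: str, max_severity: int = 2) -> bool:
    with tracer.start_as_current_span("content_safety_check") as span:
        result = safety_client.analyze_text(AnalyzeTextOptions(text=text))
        # Take the worst severity across the analyzed harm categories.
        worst = max((c.severity or 0) for c in result.categories_analysis)
        span.set_attribute("content_safety.max_severity", worst)
        return worst <= max_severity
```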
Strengths and practical benefits
- Faster, more consistent project delivery: EMMA reduces the “search tax” for engineering teams by delivering curated answers, templates, and precedent checks directly in conversational form. This improves throughput and reduces rework.
- Improved quality control: A governed answers system enforces that responses align with company technical standards and sustainability policies — especially important for regulated project domains.
- Enterprise security posture: By enforcing delegated Graph access, BYO thread storage, and private networking where needed, the design preserves corporate boundaries and supports data residency requirements.
- Accelerated time‑to‑market: Using Responses API and Foundry’s built‑in primitives avoids re‑architecting chat semantics and file handling, letting teams concentrate on domain logic.
- Operational observability and audit trails: Longitudinal analytics and traces mean leaders can measure adoption, accuracy, and human intervention rates — critical for continuous improvement and compliance reporting.
Risks, gaps and open questions
While EMMA demonstrates a thoughtful enterprise approach, several risk areas remain for any organization deploying agentic systems at scale.
1) Model hallucination and decision criticality
Large language models can produce plausible but incorrect assertions. In engineering contexts, an incorrect recommendation could impact safety, cost, or regulatory compliance. EMMA’s grounding and human approval gates reduce but do not eliminate that risk. Continuous evaluation, strict human‑in‑the‑loop policies for high‑impact actions, and measurable quality gates (e.g., required evidence citations, human edit rates) are essential.
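A minimal sketch of such a quality gate; the rule and statuses are illustrative assumptions, not Mott MacDonald’s published policy:

```python
# Hypothetical quality gate: high-impact answers must carry documentary evidence,
# otherwise they are routed to a human reviewer instead of being returned directly.
def apply_quality_gate(answer: str, citations: list[str], high_impact: bool) -> dict:
    if high_impact and not citations:
        return {"status": "needs_human_review",
                "reason": "prescriptive advice without documentary evidence"}
    if not citations:
        return {"status": "return_with_caveat",
                "reason": "no supporting reference retrieved"}
    return {"status": "return", "answer": answer, "citations": citations}
```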
2) Data exposure and misconfiguration
Agents that bridge multiple systems can increase the attack surface. If MCP connectors, token scopes, or Graph permissions are misconfigured, an agent may access or leak restricted data. Best practice includes ephemeral credentials for tool endpoints, just‑in‑time escalation, and tenant‑level data classification enforced by Purview‑like tooling. Enterprises should validate the precise enforcement points for BYO storage and egress controls before enabling any write‑back features.
3) Agent sprawl and lifecycle management
Once agents are easy to create, “agent sprawl” — hundreds of small, lightly governed assistants — becomes an operational burden. Organizations need a lifecycle plan: registration, role binding, cost center attribution, versioning, and retirement. Entra Agent IDs and an Agent System of Record are emerging architectural patterns, but operational discipline is the decisive factor.
4) Vendor lock‑in and platform maturity
Using Foundry, Copilot Studio, and Microsoft‑centric connectors offers speed but also increases dependency on the Microsoft ecosystem. For enterprises that require multi‑cloud resilience or prefer open stacks, this trade‑off must be weighed. Additionally, many agent platform features are still in preview; APIs and capabilities can change during the GA transition, so procurement should account for churn.
5) Cost and model economics
Multi‑agent workflows and multimodal models can be expensive when repeated across large user bases. Effective cost control requires hybrid model routing (use smaller models for routine queries, pro models for complex reasoning), caching, and token budget guardrails. Failure to instrument cost visibility leads to surprise spending.
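A minimal sketch of hybrid routing with a per‑user token budget guardrail; the model names, budget figure, and heuristic are placeholder assumptions:

```python
# Illustrative cost guardrail: route routine queries to a cheaper model and stop
# before a per-user daily token budget is exceeded. All numbers are placeholders.
DAILY_TOKEN_BUDGET = 200_000
_usage: dict[str, int] = {}

def pick_model(needs_deep_reasoning: bool) -> str:
    # Routine lookups go to a small model; complex reasoning gets the larger one.
    return "large-reasoning-model" if needs_deep_reasoning else "small-routine-model"

def charge_tokens(user_id: str, tokens: int) -> bool:
    used = _usage.get(user_id, 0) + tokens
    _usage[user_id] = used
    # Returning False signals the caller to degrade gracefully (e.g. cached answer).
    return used <= DAILY_TOKEN_BUDGET
```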
Practical rollout playbook — recommended phased approach
- Start with a tightly scoped pilot
- Pick a high‑value domain (e.g., a repeatable technical standard or a design‑guideline library) with measurable KPIs.
- Limit agent permissions and disable write‑back actions.
- Curate and classify knowledge
- Inventory critical documents, tag with Purview‑style metadata, and create canonical sources to minimize contradictory inputs.
- Design retrieval and grounding
- Implement RAG with evidence citation and require the model to return the document reference for any prescriptive advice (a minimal grounding‑prompt sketch follows this playbook).
- Harden identity and tool access
- Use Microsoft Graph delegated tokens, apply least privilege, and require ephemeral credentials for external connectors.
- Adversarial testing and red‑teaming
- Conduct prompt injection and misuse simulations. Use red‑teaming outputs to add prompt shields and additional guardrail checks.
- Measure rollout metrics
- Track human override rates, hallucination incidents, time‑saved metrics, and adoption trends stored in an observability backend.
- Scale only with lifecycle governance
- Register agents in an Agent System of Record, assign owners, cost centers, and SLOs, and decommission stale agents regularly.
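To illustrate the “design retrieval and grounding” step, the sketch below assembles a grounding prompt that obliges the model to cite a numbered source for any prescriptive statement and to refuse rather than guess. The wording and field names are illustrative assumptions, not a published EMMA prompt.

```python
# Illustrative grounding prompt: every prescriptive sentence must point back to a
# numbered source, and the model is told to refuse rather than guess.
def build_grounded_prompt(question: str, evidence: list[dict]) -> str:
    sources = "\n".join(f"[{i+1}] {doc['title']} -- {doc['url']}\n{doc['snippet']}"
                        for i, doc in enumerate(evidence))
    return (
        "You are an internal engineering knowledge assistant.\n"
        "Rules:\n"
        "1. Use only the numbered sources below.\n"
        "2. End every prescriptive recommendation with its source number, e.g. [2].\n"
        "3. If the sources do not answer the question, reply 'No approved guidance found.'\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```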
EMMA in industry context — what Mott MacDonald’s case tells us
Mott MacDonald’s EMMA project is emblematic of a broader enterprise shift: organizations are moving from disconnected copilots and point solutions to centrally governed enterprise agents that combine retrieval grounding, identity‑first access, and observability. Microsoft’s Azure AI Foundry (and the related agent runtime and Copilot Studio patterns) provide the primitives many organizations need — model catalogs, agent orchestration, MCP/A2A protocols, and identity integration — but success hinges on disciplined governance and operational design beyond the tech stack. The practical lesson is straightforward: platform features accelerate delivery, but operational rigor determines safety and sustained business value.
Final assessment — strengths, cautions and where to watch next
Mott MacDonald’s EMMA is a model example of responsible, enterprise‑grade agent adoption because it:
- Prioritizes curated knowledge and delegated access over open web scraping.
- Embeds safety engineering (red‑teaming and content safety) and observability early.
- Leverages platform features to accelerate delivery without reinventing core identity and telemetry controls.
Caveats remain:
- Agents are not a drop‑in replacement for professional judgement, especially in engineering disciplines where a single incorrect recommendation can have outsized consequences.
- Platform preview features and evolving standards mean integration and API stability must be managed with careful change control.
- Cost, agent sprawl, and precise legal/regulatory obligations (regional AI regulation or sectoral rules) require continuous governance investment.
Where to watch next:
- Standards and interoperability — adoption and maturity of Model Context Protocol (MCP) and Agent‑to‑Agent (A2A) protocols will determine how portable agent investments become.
- Operational tooling — enhancements in Copilot Studio, Foundry observability, and Entra Agent management will reduce the friction of scaling safely.
- Regulatory frameworks — how regional laws (data residency, AI auditing requirements) influence on‑premises vs. hosted design patterns will reshape architecture decisions for global consultancies.
EMMA is not just a technology project — it is an organizational program for making knowledge actionable while preserving the governance and trust that high‑stakes engineering work demands. For enterprises building their own assistants, the Mott MacDonald playbook is a practical blueprint: prioritize curated knowledge, embed safety and observability from day one, and treat agents as managed, auditable assets rather than ephemeral chat experiments.
Source: Microsoft, “Driving technical excellence at Mott MacDonald by building an enterprise agent on Azure,” Microsoft Customer Stories.