Artificial Intelligence (AI), data analytics, and high-performance computing (HPC) are transforming industries such as healthcare, finance, and manufacturing. These workloads rely on distributed systems managing massive datasets with high reliability. As computational demand grows, so does the need for end-to-end data protection.
Traditional security addresses Data at Rest (DAR) and Data in Motion (DIM) through encryption and secure protocols. Yet Data in Use (DIU), data actively processed in memory, remains the weakest link.

Fig. 1: The three stages of data protection: DAR, DIM, and DIU.
Forecasts by Forbes and Gartner project sustained double-digit growth in AI and HPC investment through 2027 [1], emphasizing AI security and trusted execution as key enablers. As workloads expand across hybrid and multi-tenant environments, protection must extend beyond storage and network layers.
Confidential Computing addresses this by safeguarding data during computation. It employs Trusted Execution Environments (TEEs), hardware-protected enclaves that isolate code and data from untrusted software, including operating systems and hypervisors. Together with memory encryption, they ensure sensitive data remains secure throughout its lifecycle.
However, flawed TEE implementations can introduce vulnerabilities. Attacks such as TEE.fail [2] and Battering RAM [3] show that side-channel or bus-level exploits can extract secrets during execution. This highlights the need for TEEs to minimize off-die data exposure and define verifiable hardware boundaries.
This article examines how Confidential Computing principles are applied in heterogeneous architectures to secure AI workloads, covering architectural trends, DIU protection mechanisms, and implications for chip and system designers.
Hardware architecture trends for AI and HPC
Modern compute systems are increasingly heterogeneous. While Central Processing Units (CPUs) remain the general-purpose backbone, AI and HPC workloads rely on specialized accelerators such as Graphics Processing Units (GPUs), Neural Processing Units (NPUs), Data Processing Units (DPUs), and Domain-Specific Accelerators (DSAs) to deliver massive parallelism and energy efficiency. This integration introduces complex data flows across CPU cores, high-bandwidth device memory, and peer-to-peer fabrics [4].

Fig. 2: Example of heterogeneous computing.
Performance-driven designs minimize CPU involvement in data movement, using methodologies like:
- Unified Memory Architecture (UMA): Shared memory between CPUs and XPUs/accelerators to reduce redundant transfers [5].
- Computational Storage: Processing near storage devices to alleviate network data motion [6].
- GPUDirect Remote Direct Memory Access (RDMA): Direct GPU-to-device communication, bypassing CPU and system memory for lower latency [7].
While these optimizations boost throughput, they expand the attack surface. Unified memory migrations must preserve confidentiality; GPUDirect transfers require authentication and encryption; and storage controllers performing compute become part of the trust perimeter. In short, the performance-optimized data path must also be the secure data path.
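The "secure path equals performance path" idea can be made concrete with a toy model. The sketch below (plain Python, with hypothetical hop names like "pcie" and "hbm"; not a real driver stack) traces where plaintext exists along two data paths: one that decrypts on the CPU before the transfer, and one that keeps ciphertext on the bus and decrypts only at the accelerator.

```python
# Illustrative sketch: model where plaintext is visible along two data paths.
# Hop names ("system_memory", "cpu", "pcie", "hbm") are toy labels.

def cpu_side_decrypt_path():
    """CPU decrypts before the transfer: plaintext crosses the bus."""
    return [
        ("system_memory", "ciphertext"),
        ("cpu", "plaintext"),        # decrypted too early
        ("pcie", "plaintext"),       # observable on the interconnect
        ("hbm", "plaintext"),
    ]

def device_side_decrypt_path():
    """Ciphertext travels end to end; only the accelerator sees plaintext."""
    return [
        ("system_memory", "ciphertext"),
        ("cpu", "ciphertext"),
        ("pcie", "ciphertext"),      # bus observer sees only ciphertext
        ("hbm", "plaintext"),        # decrypted inside the TEE boundary
    ]

def exposed_hops(path):
    """Hops outside the accelerator where plaintext is visible."""
    return [hop for hop, form in path if form == "plaintext" and hop != "hbm"]

print(exposed_hops(cpu_side_decrypt_path()))     # ['cpu', 'pcie']
print(exposed_hops(device_side_decrypt_path()))  # []
```

The first path leaks plaintext at two observable hops; the second leaks none, which is exactly why the performance-optimized path must also be the secure one.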
Data-in-Use protection through Confidential Computing
Conventional encryption protects DAR and DIM but leaves data exposed during computation, and AI workloads often keep training data and model parameters in plaintext in memory.
Confidential Computing addresses this with TEEs, which provide [8], [9]:
- Hardware-based isolation: The TEE keeps sensitive code and data within a secure boundary.
- Attestation: Verifies TEE integrity before execution.
- Transparent encryption: Protects memory and interconnect traffic outside the enclave.
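The attestation step can be sketched in a few lines. Below is a hedged, simplified model: a verifier checks a signed measurement of the enclave code against an expected "golden" value before releasing secrets. Real TEEs use asymmetric signatures rooted in hardware; here an HMAC with a toy shared key stands in for both, purely for illustration.

```python
# Simplified attestation flow: measure code, sign the measurement,
# verify it against a known-good value before trusting the enclave.
import hashlib
import hmac

DEVICE_KEY = b"per-device secret provisioned at manufacture"  # toy value

def measure(enclave_code: bytes) -> bytes:
    """Hash the code that will run inside the enclave."""
    return hashlib.sha256(enclave_code).digest()

def quote(enclave_code: bytes):
    """Enclave side: produce a signed measurement (a 'quote')."""
    m = measure(enclave_code)
    return m, hmac.new(DEVICE_KEY, m, hashlib.sha256).digest()

def verify(golden: bytes, m: bytes, sig: bytes) -> bool:
    """Verifier side: check the signature, then compare to the golden value."""
    expected = hmac.new(DEVICE_KEY, m, hashlib.sha256).digest()
    return hmac.compare_digest(expected, sig) and hmac.compare_digest(m, golden)

code = b"trusted inference kernel v1"
golden = measure(code)

m, sig = quote(code)
assert verify(golden, m, sig)            # attested: safe to release keys

m2, sig2 = quote(b"tampered kernel")
assert not verify(golden, m2, sig2)      # tampered code: refuse
```

The design point is that secrets are released only after verification succeeds, turning trust into a checkable precondition rather than an assumption.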
Early TEEs (Intel SGX, AMD SEV-SNP, Arm CCA [10], [11]) focused on CPUs, but as AI workloads moved to GPUs and NPUs, equivalent protections became essential.
Confidential Computing in heterogeneous systems
AI workloads in regulated industries must meet compliance frameworks such as GDPR, HIPAA, and PCI-DSS, which require protecting data during processing, not just storage or transport [12].
Modern Confidential Computing therefore extends TEEs from CPUs to XPUs/accelerators. A CPU enclave establishes trust with an XPU/accelerator through remote attestation and encrypted command channels. Data is decrypted only inside XPU/accelerator memory, processed on-die, and immediately re-encrypted when leaving the XPU/accelerator.
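The encrypted command channel in this flow can be sketched as follows. This is a hedged illustration: a hash-counter keystream stands in for the AES-GCM used on real links [15], and the session key is assumed to come from a successful attestation. The point is the flow, not the cipher: encrypt on the CPU, move only ciphertext over the bus, decrypt inside the accelerator's trusted boundary.

```python
# Toy encrypted CPU-to-accelerator command channel. The hash-counter
# keystream is an illustrative stand-in for AES-GCM.
import hashlib

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Derive n keystream bytes from key and nonce via a hash counter."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))

session_key = b"derived after successful attestation"  # toy value
nonce = b"cmd-0001"                                    # unique per command

command = b"LAUNCH kernel=matmul tiles=128"
wire = xor(command, keystream(session_key, nonce, len(command)))
assert wire != command              # a bus observer sees only ciphertext

# Inside the accelerator's TEE boundary:
recovered = xor(wire, keystream(session_key, nonce, len(wire)))
assert recovered == command
```

A real implementation would also authenticate each message (as AES-GCM does) so a bus-level attacker cannot tamper with commands undetected.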
Heterogeneous Isolated Execution (HIX) [13] and Confidential GPU Computing for Arm CCA [14] extended enclave principles to XPUs/accelerators via modified interconnects and drivers. Commercial designs now incorporate these ideas. NVIDIA’s Hopper and Blackwell architectures, for instance, support secure boot, attestation, device certificates, and AES-GCM–protected CPU–GPU links [15].
The threat model mirrors that of CPU TEEs: adversaries may control drivers, OSs, or hypervisors and observe physical buses like PCIe. Attackers target DMA buffers and residual data. Although probing HBM directly is impractical, interconnects and DMA paths must be assumed observable and protected [15].
Key considerations for securing AI workloads
Securing AI workloads involves more than encryption. It requires balancing throughput, energy efficiency, and compliance while mitigating real-world attack vectors. The key considerations include:
- CPU-Centric Bottlenecks: If decryption occurs in the CPU before XPU/accelerator use, plaintext appears too early, breaking DIU protection and introducing latency [14].
- Multi-Tenant Isolation: Shared XPU/accelerators can expose artifacts from prior workloads if memory isn’t fully cleared. Per-context encryption mitigates this risk more efficiently than full scrubbing [15].
- Scalability of Security Architecture: As systems adopt chiplets and disaggregated memory, fixed-function encryption units must scale with bandwidth and context count without degrading performance [16].
- Side-Channel Leakage: Timing, power, and bus variations can reveal neural network inputs or parameters. Ignoring physical side channels effectively leaves them open [17].
Recommendations for next-generation architectures
Security must be a fundamental architectural property, not an add-on. The following principles guide data-in-use protection for heterogeneous AI systems:
Decrypt where you compute: Decrypt only within the XPU/accelerator’s trusted boundary and re-encrypt immediately afterward. This ensures the secure path equals the performance path.
Encrypt device memory per execution context: Treat XPU/accelerator memory as confidential. Per-tenant or per-context keys render residual data meaningless once the session ends, avoiding performance penalties of global scrubbing.
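The per-context principle can be illustrated with a short sketch. This is a hedged model, not hardware: key derivation and the XOR-keystream cipher below are stand-ins for a hardware inline-encryption engine, and the context identifiers are hypothetical. The property it demonstrates is the one the text claims: once a context's key is destroyed, residual ciphertext in device memory is unrecoverable, with no scrub required.

```python
# Per-context device-memory encryption, modeled in software.
import hashlib
import hmac

def context_key(root_key: bytes, context_id: str) -> bytes:
    """Derive a per-context key from a device root key (HKDF-like)."""
    return hmac.new(root_key, context_id.encode(), hashlib.sha256).digest()

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Symmetric toy cipher: XOR with a hash-counter keystream."""
    ks = b""
    counter = 0
    while len(ks) < len(data):
        ks += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, ks))

root = b"device root key held in on-die fuses"        # toy value
key_a = context_key(root, "tenant-A/vm-42")
key_b = context_key(root, "tenant-B/vm-77")

secret = b"model weights for tenant A"
device_memory = xor_cipher(key_a, secret)             # always written encrypted

# Tenant B's context cannot recover A's residual data with its own key:
assert xor_cipher(key_b, device_memory) != secret
# Only A's key decrypts it; destroying key_a at context teardown
# renders the residual bytes meaningless:
assert xor_cipher(key_a, device_memory) == secret
```

Because every write lands encrypted under the context key, teardown reduces to erasing one key instead of scrubbing gigabytes of HBM.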
Implement context-aware key management: Associate encryption keys with specific execution contexts rather than static memory regions, maintaining isolation aligned with VM or process identity.
Modularize the security engine: Inline encryption units should be modular and decoupled from memory controllers, enabling independent scaling of cryptographic throughput and algorithms as threat models evolve.
Secure interconnects and DMA paths: High-speed fabrics like PCIe and Compute Express Link (CXL) must fall within the TEE boundary. Use link-level integrity and encryption—e.g., PCIe Integrity and Data Encryption (IDE)—and bind DMA mappings to attested sessions.
Mitigate side channels systematically: Employ constant-time execution, randomized scheduling, and isolation of shared telemetry. Assume observability and mitigate leakage at design time.
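One concrete instance of constant-time execution is comparing secrets. The sketch below (standard-library Python) contrasts a naive byte-by-byte check, whose early exit leaks via timing how many leading bytes an attacker guessed correctly, with `hmac.compare_digest`, whose running time does not depend on where the inputs differ.

```python
# Timing-channel illustration: early-exit comparison vs constant-time.
import hmac

def naive_equal(a: bytes, b: bytes) -> bool:
    """Byte-by-byte check: returns on the FIRST mismatch (timing leak)."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False   # exit position correlates with match length
    return True

stored_tag = b"\x13\x37" * 16

# Both agree functionally...
assert naive_equal(stored_tag, stored_tag)
assert hmac.compare_digest(stored_tag, stored_tag)
assert not naive_equal(stored_tag, b"\x00" * 32)
assert not hmac.compare_digest(stored_tag, b"\x00" * 32)
# ...but only compare_digest's timing is independent of the mismatch position,
# which is why it (or a hardware equivalent) belongs on every secret comparison.
```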
Strengthen TEE implementation boundaries: Lessons from TEE.fail and Battering RAM underscore the need for strict physical isolation of sensitive state. Critical TEE data and keys should remain in on-die SRAM, inaccessible to external buses. Combined with inline encryption, this mirrors a hardware Root of Trust model.
Conclusion
The central principle is simple yet decisive: decrypt as close to the compute engine as possible. Applied consistently, this aligns high-performance AI design methodologies, multi-level memory hierarchies, and interconnects with the confidentiality demands of modern data ecosystems.
Per-context memory encryption prevents residual data leakage; attestation transforms trust into verifiable proof; link-level encryption unifies performance and protection; and modular cryptographic engines enable future evolution.
As threats evolve—through new side channels, fabrics, and chip packaging—the response must be proactive architectural implementation, not reactive patching. Compute elements must prove integrity; memory must enforce isolation by default; and interconnects must assume observation.
When implemented according to these principles, Confidential Computing turns trust from a software convention into a hardware guarantee, allowing AI systems to achieve both high performance and verifiable security—demonstrating that throughput and trust can advance together rather than compete.
References
- K. Haan, “22 Top AI Statistics And Trends,” Forbes Advisor, Oct 2024. [Online].
- B. Toulas, “TEE.Fail attack breaks confidential computing on Intel, AMD, NVIDIA CPUs,” Bleeping Computer, Oct 2025. [Online]. Available: https://www.bleepingcomputer.com/news/security/teefail-attack-breaks-confidential-computing-on-intel-amd-nvidia-cpus/
- R. Lakshmanan, “New $50 Battering RAM Attack Breaks Intel and AMD Cloud Security Protections,” The Hacker News, Sep 2025. [Online]. Available: https://thehackernews.com/2025/10/50-battering-ram-attack-breaks-intel.html
- A. Krishnakumar, U. Ogras, R. Marculescu, M. Kishinevsky and T. Mudge, “Domain-Specific Architectures: Research Problems and Promising Approaches,” ACM Transactions on Embedded Computing Systems, 2023.
- R. Landaverde, T. Zhang, A. K. Coskun and M. Herbordt, “An investigation of Unified Memory Access performance in CUDA,” in 2014 IEEE High Performance Extreme Computing Conference (HPEC), 2014.
- Storage Networking Industry Association (SNIA), “What Is Computational Storage,” [Online]. Available: https://www.snia.org/education/what-is-computational-storage
- NVIDIA Corporation, “GPUDirect RDMA 13.0 documentation,” [Online]. Available: https://docs.nvidia.com/cuda/gpudirect-rdma/
- Wikipedia, “Confidential Computing,” [Online]. Available: https://en.wikipedia.org/wiki/Confidential_computing
- Confidential Computing Consortium, “Introduction to Confidential Computing: A Year-Long Exploration,” Feb 2024. [Online]. Available: https://confidentialcomputing.io/2024/02/27/introduction-to-confidential-computing-a-year-long-exploration-2/
- L. Zhou, “Confidential computing solution case studies (Intel SGX, AMD SEV-SNP and ARM CCA comparison),” Medium, May 2022. [Online]. Available: https://medium.com/@zlhk100/confidential-computing-solution-case-studies-intel-sgx-amd-sev-snp-and-arm-cca-comparison-b29d53401f77
- C. de Dinechin, “Confidential computing platform-specific details,” Red Hat, Jun 2023. [Online]. Available: https://www.redhat.com/en/blog/confidential-computing-platform-specific-details
- V. M. Kancherla, “Regulatory Challenges in AI-Powered Cloud Automation: Balancing Innovation and Compliance,” International Journal of AI, Big Data, Computational and Management Studies, 2024.
- I. Jang, A. Tang, T. Kim, S. Sethumadhavan and J. Huh, “Heterogeneous Isolated Execution for Commodity GPUs,” in ASPLOS ’19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems.
- NVIDIA Corporation, “NVIDIA Secure AI with Blackwell and Hopper GPUs,” 2025. [Online]. Available: https://docs.nvidia.com/nvidia-secure-ai-with-blackwell-and-hopper-gpus-whitepaper.pdf
- “Multi-tenant Computing Security Challenges and Solutions,” in Journal of Hardware and Systems Security, 2024.
- S. Sadr and R. Lin, “Securing the New Frontier: Chiplets & Hardware Security Challenges,” Universal Chiplet Interconnect Express, 2024. [Online]. Available: https://www.uciexpress.org/post/securing-the-new-frontier-chiplets-hardware-security-challenges
- S. Tizpaz-Niari, P. Černý, S. Sankaranarayanan and A. Trivedi, “Quantitative estimation of side-channel leaks with neural networks,” International Journal on Software Tools for Technology Transfer, 2021.
- Q. Wang and D. Oswald, “Confidential Computing on Heterogeneous CPU-GPU Systems: Survey and Future Directions,” Aug 2024. [Online]. Available: https://arxiv.org/abs/2408.11601