The ESP Private Agents platform aims to ease the development of ESP32-based AI voice assistants with on-device processing

Espressif has just introduced the ESP Private Agents platform, designed to help developers build local, private, and customizable AI assistants that run on-device on ESP32 hardware. The platform can also support hybrid AI workloads with a mix of on-device and cloud processing.

The ESP Private Agents platform offers a unified framework that allows developers to build applications combining speech, vision, automation, and agent-based interactions. Examples include a multi-lingual, on-device voice agent (aka smart speaker) and task-oriented agents that automate workflows.

The solution is built on AWS cloud services using AWS Fargate as a primary application platform and Amazon Bedrock Foundation Models as backend LLM systems. It not only works with ESP32-powered devices with speaker and microphon…

High-level architecture of ESP Private Agents

Espressif released a web-based demo, which you can use as a text-based chatbot or as a voice assistant leveraging the speaker and microphone on your computer. The company said that for production use cases, customers can deploy the solution to their own AWS account. I gave it a try in Firefox on Ubuntu 24.04. After logging in through ESP RainMaker, I was able to use the chatbot just fine.

When I clicked on the microphone to switch audio on, the assistant repeated the answers in voice form, but it was unable to hear me despite detecting my microphone. I could click on the microphone, talk, and press the stop button to send the audio, but nothing happened. The microphone is working on my computer, so maybe ESP Private Agents doesn’t play nice with Firefox.

Nevertheless, here’s a much more interesting demo using the EchoEar hardware as a multi-lingual AI voice assistant speaking in English, Hindi (TBC), German, and Spanish as different speakers take turns.

The announcement on Espressif’s developer blog explains in more detail the steps required to create your own AI agent and related hardware. Here’s a summary.

Creating an AI agent:

LLM Selection – Choose from a range of supported Amazon Bedrock Foundation Models, each with its own performance, cost, and behavior.
System Prompt – It defines the agent’s behavior and establishes its persona, such as a voice controller, storyteller, or customer support assistant.
Tools – These are pluggable actions that an agent can invoke to perform specific tasks, for example, ESP RainMaker control, Volume Control, and Emotion Detection. Two types of tools are available:

Remote Tools compatible with the Model Context Protocol (MCP)
Local Tools executed directly on the client, such as the IoT device itself or a companion mobile application. One example would be turning a light on or adjusting a cooling fan speed.
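To make the three building blocks more concrete, here is a minimal Python sketch of how an agent definition (model, system prompt, tools) and the remote/local tool split could be modeled. This is not Espressif's API: the class names, the `dispatch` function, the tool names, and the placeholder model ID are all assumptions made up for illustration, and the remote branch only returns a description of the call instead of contacting a real MCP server.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class Tool:
    """Hypothetical tool description; not taken from Espressif's code."""
    name: str
    kind: str  # "remote" (MCP-compatible server) or "local" (runs on the client)
    handler: Optional[Callable[..., Any]] = None  # only used by local tools


@dataclass
class AgentDefinition:
    """Hypothetical container for the three parts of an agent."""
    model_id: str       # placeholder, not a real Bedrock model ID
    system_prompt: str  # persona and behavior
    tools: Dict[str, Tool] = field(default_factory=dict)

    def add_tool(self, tool: Tool) -> None:
        self.tools[tool.name] = tool


def set_light(on: bool) -> dict:
    # Local tool: on real hardware this would drive a GPIO or a light driver.
    return {"light": "on" if on else "off"}


def set_fan_speed(percent: int) -> dict:
    # Local tool: clamp the requested speed to the 0-100 % range.
    return {"fan_speed": max(0, min(100, percent))}


def dispatch(agent: AgentDefinition, tool_name: str, arguments: dict) -> dict:
    """Route a tool call requested by the LLM to the right place."""
    tool = agent.tools.get(tool_name)
    if tool is None:
        return {"status": "error", "reason": f"unknown tool: {tool_name}"}
    if tool.kind == "local" and tool.handler is not None:
        # Local tools execute directly on the client (device or mobile app).
        return {"status": "ok", "where": "client", "result": tool.handler(**arguments)}
    # Remote tools would be forwarded to an MCP server; here we only describe the call.
    return {"status": "forwarded", "where": "mcp-server",
            "tool": tool_name, "arguments": arguments}


agent = AgentDefinition(
    model_id="example-model-id",
    system_prompt="You are a voice controller for a smart home.",
)
agent.add_tool(Tool("set_light", "local", set_light))
agent.add_tool(Tool("set_fan_speed", "local", set_fan_speed))
agent.add_tool(Tool("rainmaker_control", "remote"))

if __name__ == "__main__":
    print(dispatch(agent, "set_light", {"on": True}))
    print(dispatch(agent, "set_fan_speed", {"percent": 150}))
    print(dispatch(agent, "rainmaker_control", {"device": "lamp", "power": False}))
```

The point of the split is that the same agent can mix actions that need cloud-side services with actions that only make sense on the device itself, such as switching a light or changing a fan speed.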

Once an agent is defined, you can test it directly from the web dashboard. When you are satisfied with the results, you can move on to real hardware using one of the three supported development kits (EchoEar, ESP32-S3-Box, or M5Stack CoreS3) and perform the following steps:

Program the firmware – The solution will generate source code and binary for the firmware, which you can flash from your web browser. Two types of firmware are available for now: Generic Assistant or voice-controlled Matter controller with Thread support. More details can be found on GitHub.
Provision the device using the ESP RainMaker Home app
Configure a new agent on the device (optional) – This step uses a QR code to change the default agent running on the device.
Interact with the device using voice

Visit agents.espressif.com to get started.

Jean-Luc Aufranc

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager and starting to write daily news and reviews full time later in 2011.

