Using CAN FD for Remote Hardware Debugging of Cortex-M Devices

Today’s projects and systems are getting more and more complex. Many systems include multiple MCUs connected by a field bus or network such as CAN. A modern car, for example, can have up to 70 CAN nodes. Such large, connected systems are a challenge to debug.

Traditional hardware debugging requires a hardware debug probe connected to the target device with a dedicated SWD/JTAG debug cable. This needs dedicated pins on the target device plus physical access to the device itself. In many cases, this is not possible in the final product. The hardware debug probes, cables, pins and high-speed signals are costly. Worse, they can introduce new problems and are prone to interference.

If there is a field bus like CAN connecting all the MCUs, why not use it for hardware debugging? Hardware debugging here means programming the flash memory, halting the MCU, inspecting memory and registers, and stepping through the code.

Cortex-M Hardware Debugging over CAN

Yes, we can! With the help of a little-known hardware feature on ARM Cortex-M devices, we can use the ARM DebugMonitor interrupt to control and debug the target system, just as we would over a JTAG/SWD connection. Instead, we use the CAN bus :-).

Outline

Hardware debugging usually means using a physical hardware debug probe with a dedicated debug protocol and a debug connector. The host usually connects to the debug probe via USB and CMSIS-DAP. The debug probe requires dedicated target pins for JTAG or SWD (Serial Wire Debug).

The approach presented here uses the CAN bus for hardware debugging. The bus carries both the normal system communication and the debug protocol messages.

This is very useful if:

normal debug port and pins are not available or disabled.
remote system or devices are nodes on a field bus like CAN or any other serial bus.
there is no space available for a physical debug probe.
we have production units where there is no hardware debug connector.
external noise or interference affects the signals of a normal debug probe connection.

The proposed solution uses a special ARM Cortex-M feature: the ARM DebugMonitor hardware. Using a special gateway, the CMSIS-DAP debug protocol gets translated between USB and the CAN network. Normal tools like gdb or VS Code are used on the host. On the target systems, a framework with the necessary hooks and code gets added, allowing hardware debugging. During debugging, the framework keeps the CAN communication going.

In my opinion, the DebugMonitor feature of ARM Cortex-M is one of the most underrated features ARM has implemented. The DebugMonitor is available on all ARM Cortex-M cores except the M0/M0+.

The framework with sources, examples and documentation is available on GitLab (see links at the end of the article).

Credits and Kudos to Simon Kathriner. He implemented the framework with the example applications. His work was part of his Master of Science in Engineering thesis. He completed this at the Lucerne University of Applied Sciences and Arts. Check out his documentation and presentation for the Embedded Computing Conference 2025.

If you want to see how this works, have a look:

We used the NXP LPC55S16-EVK in this project, as it is inexpensive and readily available. But the concept worked as well on a smaller custom PCB with the LPC55S16 MCU on it:

Custom PCB with CAN CMSIS-DAP Gateway using NXP LPC55S16 MCU.

Architecture

The system has three major parts:

System Architecture (Source: Simon Kathriner)

Host: Linux/Windows/Mac host machine for development. It uses normal development tools (CMake, git, gcc/gdb with command line tools or an IDE, e.g. VS Code). For debugging, any CMSIS-DAP capable tool can be used, for example gdb, Visual Studio Code, NXP LinkServer or pyOCD.
CAN Debug Gateway: Interface between the host (USB) and the field bus. For the research project, the LPC55S16-EVK plus a custom LPC55S16 PCB have been used. The gateway presents itself as a CMSIS-DAP debug probe to the host and passes the CMSIS-DAP messages between USB and the CAN bus. The gateway can handle multiple CAN nodes and performs the address mapping. The host supports multiple gateways with multiple debug sessions in parallel.
Target: One or more devices on the CAN bus or CAN nodes which then get debugged. The node is linked with a small debug agent which implements a virtual CMSIS-DAP debug probe. The debug agent includes standard debugging features (read/write memory and hardware registers, stopping the target, stepping). The agent library or framework includes an optional flash programming module.
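To make the gateway's job more concrete, here is a minimal sketch of how one CMSIS-DAP packet could be wrapped into a single CAN FD frame addressed to a target node. The frame struct, the function names and the ID base 0x700 are assumptions for illustration only and are not taken from the framework. What is standard is the CAN FD length coding: payloads above 8 bytes can only be 12, 16, 20, 24, 32, 48 or 64 bytes long, so a shorter packet gets padded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CANFD_MAX_PAYLOAD 64u
#define DEBUG_CAN_ID_BASE 0x700u /* assumption: ID range reserved for debug traffic */

/* Hypothetical frame representation, not the framework's actual type. */
typedef struct {
    uint32_t id;                      /* 11-bit standard identifier */
    uint8_t  dlc;                     /* CAN FD data length code (0..15) */
    uint8_t  data[CANFD_MAX_PAYLOAD]; /* payload, zero padded */
} canfd_frame_t;

/* Smallest CAN FD DLC that can carry 'len' bytes, or 0xFF if len > 64. */
uint8_t canfd_len_to_dlc(size_t len)
{
    static const uint8_t sizes[] = { 12, 16, 20, 24, 32, 48, 64 };
    if (len <= 8u) {
        return (uint8_t)len;
    }
    for (size_t i = 0; i < sizeof sizes; i++) {
        if (len <= sizes[i]) {
            return (uint8_t)(9u + i);
        }
    }
    return 0xFFu;
}

/* Payload length in bytes for a given DLC (0 for invalid codes). */
size_t canfd_dlc_to_len(uint8_t dlc)
{
    static const uint8_t sizes[16] = { 0, 1, 2, 3, 4, 5, 6, 7, 8,
                                       12, 16, 20, 24, 32, 48, 64 };
    return (dlc < 16u) ? sizes[dlc] : 0u;
}

/* Wrap one CMSIS-DAP packet into a frame for 'node'.
 * Returns 0 on success, -1 if the packet is too long or the node is out of range. */
int gateway_wrap_dap_packet(uint8_t node, const uint8_t *packet, size_t len,
                            canfd_frame_t *out)
{
    uint8_t dlc = canfd_len_to_dlc(len);
    if (dlc == 0xFFu || node > 0x7Fu) {
        return -1;
    }
    memset(out, 0, sizeof *out);        /* padding bytes stay zero */
    out->id  = DEBUG_CAN_ID_BASE + node; /* simple node-to-ID mapping */
    out->dlc = dlc;
    memcpy(out->data, packet, len);
    return 0;
}
```

Because of the padding, the receiver cannot recover the exact packet length from the DLC alone. This sketch assumes that the command parser on the other side determines the length from the command itself.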

Hardware Setup

The example uses the NXP LPC55S16-EVK with the NXP FRDM-MCXN947 board.

The LPC55S16-EVK uses a DB9 connector for the CAN signals on J19:

CAN Connector on LPC55S16-EVK with CANH and CANL (Source: Schematics/NXP)

On the FRDM-MCXN947, the CAN signals are on a pin header J10:

CAN Connector on FRDM-MCXN947 (Source: Schematics/NXP)

The picture below shows the FRDM-MCXN947 as target on the left and the LPC55S16-EVK as gateway on the right. This is just a simple example with one gateway and one CAN node.

Hardware Setup for Hardware Debugging over CAN

Framework

The target application is extended with a debugging framework. Without flash programming support, it adds 8 KByte of flash and 350 bytes of RAM. The flash driver is only needed if the debugger shall program the flash. With flash programming included, the framework needs 11 KByte of flash and 11 KByte of RAM. The two numbers are the same because during programming the framework plus the CAN driver/stack have to run from RAM.

CAN Debug Framework (Source: Simon Kathriner)

The application's CAN driver/stack initializes the framework. From the CAN interrupt, CMSIS-DAP messages for the debug framework are passed to the Debug Agent. The framework uses the special Debug Monitor interrupt, which is implemented on all ARM Cortex-M cores except the Cortex-M0/M0+.

Virtual Debug Probe

With the framework, the CMSIS-DAP protocol and commands are sent via the CAN bus. A Virtual Debug Probe is the part of the framework running on the target. It translates the Debug Port (DP) and Access Port (AP) commands into accesses to the registers and memory locations of the target system.

Normal debugging is controlled by the ARM DHCSR (Debug Halting Control and Status Register). With the DebugMonitor, it is controlled by the DEMCR (Debug Exception and Monitor Control Register). The virtual debug probe maps events and settings from the DHCSR to the DEMCR:

Mapping DHCSR and DEMCR

With the help of the CMSIS-DAP commands, the debugger can halt the target or trigger debugging events. With the virtual debug probe, this results in DebugMonitor exceptions firing on the target.
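As an illustration of such a mapping, the sketch below translates a debugger write to the DHCSR into DEMCR bits, and a DEMCR state back into the status the debugger expects to read. The bit positions are those of the ARM architecture. The mapping rules and function names are an assumption, derived from how the handler code in this article uses MON_REQ and MON_STEP. The framework's actual mapping may differ in details.

```c
#include <stdint.h>

/* DHCSR bits (halting debug), as defined by the ARM architecture */
#define DHCSR_DBGKEY      0xA05F0000u /* key required in bits [31:16] on writes */
#define DHCSR_DBGKEY_MASK 0xFFFF0000u
#define DHCSR_C_DEBUGEN   (1u << 0)
#define DHCSR_C_HALT      (1u << 1)
#define DHCSR_C_STEP      (1u << 2)
#define DHCSR_S_HALT      (1u << 17)

/* DEMCR bits (monitor debug), as defined by the ARM architecture */
#define DEMCR_MON_EN      (1u << 16)
#define DEMCR_MON_PEND    (1u << 17)
#define DEMCR_MON_STEP    (1u << 18)
#define DEMCR_MON_REQ     (1u << 19)

/* Returns the new DEMCR value after the debugger "wrote" dhcsr.
 * Assumed mapping: halt -> pend the monitor exception, step -> MON_STEP,
 * resume -> clear MON_REQ so the monitor handler leaves its wait loop. */
uint32_t demcr_after_dhcsr_write(uint32_t demcr, uint32_t dhcsr)
{
    if ((dhcsr & DHCSR_DBGKEY_MASK) != DHCSR_DBGKEY) {
        return demcr; /* writes without the key are ignored */
    }
    if ((dhcsr & DHCSR_C_DEBUGEN) == 0u) {
        return demcr & ~(DEMCR_MON_EN | DEMCR_MON_PEND |
                         DEMCR_MON_STEP | DEMCR_MON_REQ);
    }
    demcr |= DEMCR_MON_EN;
    if (dhcsr & DHCSR_C_HALT) {
        if ((demcr & DEMCR_MON_REQ) == 0u) {
            demcr |= DEMCR_MON_PEND; /* not halted yet: raise the exception */
        }
        return demcr;
    }
    if (dhcsr & DHCSR_C_STEP) {
        demcr |= DEMCR_MON_STEP;  /* execute one instruction, then re-enter */
    } else {
        demcr &= ~DEMCR_MON_STEP; /* plain resume */
    }
    demcr &= ~DEMCR_MON_REQ;      /* release the handler's wait loop */
    return demcr;
}

/* Returns the DHCSR value the virtual probe reports for a DEMCR state. */
uint32_t dhcsr_read_from_demcr(uint32_t demcr)
{
    uint32_t value = 0u;
    if (demcr & DEMCR_MON_EN)   { value |= DHCSR_C_DEBUGEN; }
    if (demcr & DEMCR_MON_REQ)  { value |= DHCSR_C_HALT | DHCSR_S_HALT; }
    if (demcr & DEMCR_MON_STEP) { value |= DHCSR_C_STEP; }
    return value;
}
```

On the real target, the resulting value would be written to CoreDebug->DEMCR. Keeping the mapping as a pure function makes it easy to check on the host.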

Debug Monitor Exceptions

Whenever a debug monitor exception is raised, the core does a context switch. With the debugger, we do not want to see the context of the DebugMon exception. We want to see the state and call stack of the application under debug. For this, the DebugMonitor interrupt handler analyzes the current state. It sets the correct context to show the current stack and frame context for the debug session:

__attribute__((naked)) void DebugMon_Handler(void)
{
    // Save general-purpose registers R4 to R11, which were not pushed to the stack by the hardware.
    __ASM volatile("STM %0, {r4-r11}" : : "r"(&applicationContext.r4) : "memory");

    /**
     * Update the condition flags with a bitwise AND to check
     * whether bit 2 of the link register is set.
     * At this point, LR holds the exception return value (EXC_RETURN).
     */
    __ASM volatile("tst lr, #4");

    /**
     * Bit 2 marks which of the two stack pointers was active before the debug event:
     * MSP (Main Stack Pointer) or PSP (Process Stack Pointer).
     * If EQ (bit 2 == 0), move the MSP to R0, otherwise the PSP.
     */
    __ASM volatile("ite eq        \n"
                   "mrseq r0, msp \n"
                   "mrsne r0, psp \n");

    // Branch to the C function enterDebugMode(), with the stack frame pointer in R0
    __ASM volatile("b enterDebugMode");
}

The crucial point is that the target remains in the DebugMonitor exception as long as the debugger assumes the target is halted. Certain interrupts still fire in this state (see later). The DebugMonitor handler achieves this by handling the various debug monitor events. The code below handles halting the target, hitting a breakpoint and stepping through the code:

void enterDebugMode(contextStateFrame_t* frame)
{
    // Copy context state
    memcpy(&applicationContext, frame, sizeof(contextStateFrame_t));

    // Use current SP value to calculate where it was before the interrupt
    applicationContext.sp = (uint32_t)(frame + 1);

    // Unless the handler clears MON_STEP, returning from the handler performs the next debug monitor step.
    CoreDebug->DEMCR &= ~(CoreDebug_DEMCR_MON_STEP_Msk);

    // Set monitor request flag in case the exception was triggered by a breakpoint (or similar)
    CoreDebug->DEMCR |= CoreDebug_DEMCR_MON_REQ_Msk;

    // Make sure that the target stays in ISR until it gets resumed
    while (CoreDebug->DEMCR & CoreDebug_DEMCR_MON_REQ_Msk)
        ;
    ...
}
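The handler code above relies on the layout of the exception stack frame. As a hedged illustration, the sketch below shows the basic eight-word frame that the Cortex-M hardware pushes on exception entry, and how the stack pointer value from before the exception can be computed. The type and function names are assumptions and not the framework's. Unlike the simple `frame + 1` calculation, the sketch also accounts for the alignment padding word (flagged in bit 9 of the stacked xPSR) and for the larger frame used when floating-point context is stacked (bit 4 of EXC_RETURN cleared). Additional state stacked with security extensions is not covered.

```c
#include <stdint.h>

/* Basic frame pushed by hardware on exception entry (lowest address first). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} basic_frame_t;

#define EXC_RETURN_SPSEL (1u << 2) /* 1: PSP was in use, 0: MSP */
#define EXC_RETURN_FTYPE (1u << 4) /* 1: basic frame, 0: extended (FP) frame */
#define XPSR_STACK_ALIGN (1u << 9) /* 1: a padding word was inserted */

/* Non-zero if the interrupted code was running on the process stack. */
int frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_SPSEL) != 0u;
}

/* Stack pointer value of the interrupted code, given the address of the
 * stacked frame, the stacked xPSR and the EXC_RETURN value from LR. */
uint32_t sp_before_exception(uint32_t frame_addr, uint32_t stacked_xpsr,
                             uint32_t exc_return)
{
    /* 8 words for the basic frame, 26 words with S0-S15, FPSCR and a reserved word */
    uint32_t frame_size = (exc_return & EXC_RETURN_FTYPE) ? 32u : 104u;
    uint32_t sp = frame_addr + frame_size;
    if (stacked_xpsr & XPSR_STACK_ALIGN) {
        sp += 4u; /* undo the 8-byte alignment padding */
    }
    return sp;
}
```

The addresses are passed as plain integers so the calculation can be checked on a host machine. On the target, `frame_addr` would be the MSP or PSP value selected in the handler.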

Flash Programming

Flash programming is essential for a debug session, as it loads a new binary onto the target for debugging. Typically it works as follows:

The debugger uses commands or operations like ‘Erase’, ‘WritePage’ or ‘Verify’.
The debugger writes some code (‘Flash programming applet’) into the RAM. It needs to be in RAM because usually the flash memory is not available during FLASH programming.
The programming applet communicates with the debugger to get the data and programs the memory.
After that, the target resumes.

Flash programming is very vendor and device specific. NXP LinkServer flash programming uses a ‘mailbox’ system. The flash programming driver has been ported and adapted. For example, a flash programming driver usually disables all interrupts. In our case this is not an option, as we still need the CAN interface. Additionally, the CAN CMSIS-DAP messages need to use the mailbox mechanism.
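The Erase/WritePage/Verify flow described above can be sketched with a simple mailbox structure. Note that this is a purely hypothetical layout for illustration and not the LinkServer mailbox format. The flash is simulated with a RAM array so the logic can run on a host, and all names and sizes are assumptions.

```c
#include <stdint.h>
#include <string.h>

#define FLASH_PAGE_SIZE  512u /* assumption: example page size */
#define FLASH_PAGE_COUNT 8u   /* assumption: tiny simulated device */

typedef enum { FLASH_CMD_ERASE, FLASH_CMD_WRITE_PAGE, FLASH_CMD_VERIFY } flash_cmd_t;

typedef enum {
    FLASH_OK = 0,
    FLASH_ERR_RANGE,
    FLASH_ERR_NOT_ERASED,
    FLASH_ERR_MISMATCH
} flash_status_t;

/* Hypothetical mailbox: the debugger fills it, the applet executes it. */
typedef struct {
    flash_cmd_t cmd;
    uint32_t    page;                  /* page index to operate on */
    uint8_t     data[FLASH_PAGE_SIZE]; /* payload for write and verify */
} flash_mailbox_t;

/* Simulated flash; a real driver would call the vendor's flash routines. */
static uint8_t simulated_flash[FLASH_PAGE_COUNT][FLASH_PAGE_SIZE];

/* Execute one mailbox command and return its status. */
flash_status_t flash_applet_execute(const flash_mailbox_t *mb)
{
    if (mb->page >= FLASH_PAGE_COUNT) {
        return FLASH_ERR_RANGE;
    }
    uint8_t *page = simulated_flash[mb->page];
    switch (mb->cmd) {
    case FLASH_CMD_ERASE:
        memset(page, 0xFF, FLASH_PAGE_SIZE); /* erased flash reads as 0xFF */
        return FLASH_OK;
    case FLASH_CMD_WRITE_PAGE:
        for (uint32_t i = 0; i < FLASH_PAGE_SIZE; i++) {
            if (page[i] != 0xFFu) {
                return FLASH_ERR_NOT_ERASED; /* must erase before programming */
            }
        }
        memcpy(page, mb->data, FLASH_PAGE_SIZE);
        return FLASH_OK;
    case FLASH_CMD_VERIFY:
        return (memcmp(page, mb->data, FLASH_PAGE_SIZE) == 0)
                   ? FLASH_OK : FLASH_ERR_MISMATCH;
    }
    return FLASH_ERR_RANGE; /* unknown command */
}
```

In the setup described in this article, the applet cannot simply disable interrupts while it works, because the mailbox contents arrive through the CAN ISR.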

Interrupts

There are two important interrupts in the target system: the communication (CAN) interrupt and the DebugMonitor interrupt.

It is very important to understand that debugging happens within the DebugMonitor ISR (Interrupt Service Routine), and that critical interrupts like the one for CAN communication still need to fire.

For this, the CAN ISR needs to have a higher urgency than the DebugMonitor ISR, as shown below:

Interrupt Priorities (Bare-Metal) (Source: Simon Kathriner)

Interrupts with a lower priority than the DebugMonitor ISR are blocked during debugging, just as they would be in a normal debug session. Conversely, all interrupts with a higher urgency than the DebugMonitor interrupt are not blocked by the DebugMonitor and continue to run.

In an RTOS environment, for example with FreeRTOS, the MaxSyscall priority comes into play. See the ARM Cortex-M Interrupts and FreeRTOS: Part 3 series for details.

If the CAN ISR uses RTOS functionality, then the ISR has to be at or below the MaxSyscall priority, as shown below:

Possible Interrupt Priorities with RTOS (Source: Simon Kathriner)

However, in that case it is not possible to step through RTOS critical section code, because the RTOS (FreeRTOS in this case) masks all interrupts at or below the MaxSyscall priority (configMAX_SYSCALL_INTERRUPT_PRIORITY in the FreeRTOS configuration).

So if the developer wants to step through the RTOS critical section code:

Make sure that the application CAN interrupt does not use RTOS functions.
For the DebugMonitor, assign an ISR urgency higher (numerically lower) than MaxSysCall.
For the CAN ISR priority, assign an ISR urgency higher (numerically lower) than the DebugMonitor ISR.

With this, the framework allows debugging and stepping through the RTOS critical sections.
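The priority rules from this section can be written down as a few checks. The sketch below uses logical Cortex-M priority numbers, where a lower number means a higher urgency. The struct and function names are assumptions for illustration, and the example values are not taken from the framework.

```c
#include <stdbool.h>
#include <stdint.h>

/* Logical (unshifted) interrupt priorities: lower number = higher urgency. */
typedef struct {
    uint8_t can_isr;       /* priority of the CAN interrupt */
    uint8_t debug_monitor; /* priority of the DebugMonitor exception */
    uint8_t max_syscall;   /* logical value behind configMAX_SYSCALL_INTERRUPT_PRIORITY */
} irq_priorities_t;

/* The CAN ISR must preempt the DebugMonitor ISR, otherwise the debug
 * communication stops while the target is "halted". */
bool can_stays_alive_while_halted(const irq_priorities_t *p)
{
    return p->can_isr < p->debug_monitor;
}

/* Only ISRs at or below the MaxSyscall priority may call RTOS API functions. */
bool can_isr_may_call_rtos(const irq_priorities_t *p)
{
    return p->can_isr >= p->max_syscall;
}

/* Critical sections mask everything at or below MaxSyscall, so the
 * DebugMonitor has to be more urgent than that to fire inside them. */
bool can_step_through_critical_sections(const irq_priorities_t *p)
{
    return can_stays_alive_while_halted(p) && (p->debug_monitor < p->max_syscall);
}
```

On a real target, the chosen values would typically be applied with the CMSIS-Core function `NVIC_SetPriority()`, for example for `DebugMonitor_IRQn` and the device-specific CAN interrupt number. As the checks show, a configuration cannot both let the CAN ISR call RTOS functions and allow stepping through critical sections.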

Configuration

The framework is configured with two header files:

config/DAP_config.h for the CMSIS-DAP part:

Timing settings
Packet size
Capabilities of the probe (JTAG, SWD, SWO, UART, ..)
Probe name, serial number and target device information

config/DebugAgentConfig.h configures the target debug agent:

default CAN ID(s) of the target(s)
CAN interface settings, for example speed

Have a look at the settings and comments in the above files on GitLab for details.

Integrating the Framework into the Application

The framework itself is a git repository. That repository can be added as a sub-repository to the application (or CAN node) project. It is built with the debug-agent.cmake file. See the examples on GitHub for how this can be done.

The application CMakeLists.txt includes that file:

include of framework cmake file

That way the framework gets compiled and added to the application.

Framework initialization

The CAN driver initialization routine has to be extended to call the debug framework initialization inside CAN_Init():

DebugAgent_Init(&enableInterrupts, NULL);

The first argument is a pointer to a function which enables the interrupts. The second argument is an optional logger function pointer and can be NULL.

Memory Map

If the DebugMonitor and framework shall support flash programming, the framework has to run from RAM. Linker scripts are provided to place the framework in RAM. The example below is for the MCXN947 MCU.

It relocates the vector table to RAM. All the agent code plus the FLASH programming driver are placed in RAM too:

Framework in RAM (MCXN947) (Source: Simon Kathriner)

💡 It would be possible to reduce the RAM usage by placing only the necessary parts into RAM.

Note that placing the code in RAM is only needed if the agent includes target system FLASH programming. Otherwise the framework code can be placed in normal FLASH memory and needs less RAM.

Performance

From a user perspective, it works like debugging locally. You can see this from the video at the beginning of this article. You can use typical command line tools for programming and debugging, or even GUI tools like Visual Studio Code.

To measure the performance, debugging with the on-board probe (O) and with the debug monitor agent (A) has been compared. For this, the number of CMSIS-DAP/CAN packets (max size 64 bytes) has been counted. Additionally, the time for typical actions has been measured:

Performance Comparison (Source: Simon Kathriner)

The agent uses more packets and needs more time. The added time is noticeable, but debugging does not feel slow compared to normal operation.

Flash programming has been compared the same way:

Flash programming performance (Source: Simon Kathriner)

The picture is similar here: over the CAN network, more packets are needed, and programming takes about twice the time of the on-board debug method.

USB and CMSIS-DAP State Polling

The traffic on the CAN bus has been analyzed. CMSIS-DAP very much assumes a USB connection, not a more limited connection like CAN. What we observed is that the host constantly polls the target via CMSIS-DAP about its state (running, halted, ..). With a normal debug probe, these status queries occur between the host and the hardware debug probe over USB and do not affect the target system at all. In our case they affect both the shared CAN bus and the target CPU: they cause about 9% CPU load on the target agent and about 30% CAN bus load.

Clearly, the status messages should be reduced. Currently, there is no way on the host or CMSIS-DAP side to limit the frequency or amount of messages. One solution discussed is to filter out some of the frequent status messages on the CAN gateway. This would greatly reduce the traffic and the number of status queries over the CAN bus.
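One way such a gateway-side filter could look is sketched below: the gateway remembers the last request it forwarded together with its response, and answers an identical request from that cache if it arrives again within a short interval. This is only an idea sketch under assumptions. The names, the interval handling and the byte-wise comparison are hypothetical, and this filter is not part of the framework.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DAP_PACKET_MAX 64u

/* Last forwarded request and its response (hypothetical gateway state). */
typedef struct {
    uint8_t  request[DAP_PACKET_MAX];
    size_t   request_len;
    uint8_t  response[DAP_PACKET_MAX];
    size_t   response_len;
    uint32_t timestamp_ms;    /* when the response was received */
    uint32_t min_interval_ms; /* identical polls are forwarded at most this often */
    bool     valid;
} poll_cache_t;

void poll_cache_init(poll_cache_t *c, uint32_t min_interval_ms)
{
    memset(c, 0, sizeof *c);
    c->min_interval_ms = min_interval_ms;
}

/* Returns true and fills resp/resp_len if the request repeats the last
 * forwarded one and the cached response is still fresh. */
bool poll_cache_lookup(const poll_cache_t *c, const uint8_t *req, size_t len,
                       uint32_t now_ms, uint8_t *resp, size_t *resp_len)
{
    if (!c->valid || len != c->request_len) {
        return false;
    }
    if ((uint32_t)(now_ms - c->timestamp_ms) >= c->min_interval_ms) {
        return false; /* too old: forward to the target again */
    }
    if (memcmp(req, c->request, len) != 0) {
        return false;
    }
    memcpy(resp, c->response, c->response_len);
    *resp_len = c->response_len;
    return true;
}

/* Call after every request that was actually forwarded over CAN. */
void poll_cache_store(poll_cache_t *c, const uint8_t *req, size_t len,
                      const uint8_t *resp, size_t resp_len, uint32_t now_ms)
{
    if (len > DAP_PACKET_MAX || resp_len > DAP_PACKET_MAX) {
        c->valid = false;
        return;
    }
    memcpy(c->request, req, len);
    c->request_len = len;
    memcpy(c->response, resp, resp_len);
    c->response_len = resp_len;
    c->timestamp_ms = now_ms;
    c->valid = true;
}
```

Because every forwarded request replaces the cache entry, a resume or write command automatically invalidates a cached status. The trade-off is that any identical request repeated within the interval gets a possibly stale answer, so a real filter would probably restrict caching to known status queries.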

Pros and Cons

This section compares the pros and cons of the proposed CAN DebugMonitor framework with ‘normal’ on-board hardware debugging:

Pros

can debug target remotely, including flash programming
no debug header/connector or pins needed on target
shared communication interface (no dedicated debug interface needed)
can use standard debugging tools (gdb, VS Code, …) on the host
no expensive debug probe is required: an inexpensive eval board with CAN and USB can be used
reasonable debugging speed, even for flash programming
framework generic for all Cortex-M (except M0/M0+)

Cons

needs bandwidth on the communication channel
requires extra FLASH and RAM on target system (less RAM required if flash programming is not needed)
DebugMonitor and CAN interrupts are still running and can’t be debugged
flash programming depending on MCU device and vendor
ARM Cortex-M0/M0+ does not have the needed DebugMonitor hardware

Summary

The ARM Debug Monitor feature enabled us to develop a framework and examples for hardware debugging over CAN. We can use normal tools like gdb and VS Code. With a CMSIS-DAP gateway, we perform hardware debugging on remote target nodes, including flash programming.

We do have several ideas and proposals for future work:

reduce the number of CMSIS-DAP messages from the gateway to the CAN bus
add flash programming support for more devices
add more and different target communication interfaces: UART, Ethernet, USB, …
reduce memory footprint of the framework
Zephyr RTOS support and example

So what do you think? Is this approach something you would consider for your projects? Have you ever used the ARM Cortex-M DebugMonitor feature? Post a comment and let us know!

Happy Hardware Debugging 🙂