pico2-swd-riscv
A stateful SWD protocol implementation for debugging RP2350 RISC-V cores (Hazard3) from any Raspberry Pi Pico2 (target) using GPIOs on another Pico (probe).
0. VIBE CODE WARNING
About 80% of the code is vibe coded; the readme is almost completely generated. I spent many nights with the oscilloscope and the docs and made a working prototype that was able to do SBA, read/write regs, abstract commands and progbuf; the rest was done with claude code. The test suite is quite comprehensive and I use the core of the library in my own projects, but, as they say, "hic sunt dracones". I also read the readme and the code and didn't notice anything wrong (and removed the wrong/unclear parts).
This project was my case study of vibe coding a more complicated project that I don't understand 100% and for which there is no obvious existing code that can be "used". It started as ~1000 loc that I had written and knew very well, after reading the rp2350, arm swd and riscv debug docs, capturing data with an oscilloscope and openocd, then decoding it and analyzing the wakeup sequence and then the read/write commands. After I got it working I gave it to claude to make it into a library that I can use in other projects, and then I slowly built it up.
After about 3-4k lines of code I completely lost track of what was going on, and I wouldn't consider this code that I have written, but adding more and more tests felt "nice", or at least reassuring.
There was some gaslighting, particularly when it misunderstood dap_read_mem32, thinking it reads from RAM rather than going through the MEM-AP TAR/DRW/RDBUFF protocol, which led to an incredible amount of nonsense.
Overall I would say it was a horrible experience. Even though it took 10 hours to write close to 10000 lines of code, I don't consider this my project, and I have no sense of accomplishment or growth.
In contrast, using AI to read all the docs (which are thousands of pages) and write helpful scripts to decode the oscilloscope data, create packed C structs from the docs, etc., was very nice, and I did feel good after. The moment I read the first register, and then when I was able to read memory via SBA, I felt amazing.
The main issue is taste. When I write code I feel if it's good or bad as I am writing it; I know if it's wrong. But using claude code I get desensitized very quickly and I just can't tell: it "reads" OK, but I don't know how it feels. In this case it happened when the code grew about 4x, from 1k to 4k lines. And worst of all, my mental model of the code is completely gone, and with it my ownership.
The tokens have no reason or purpose, which makes reading the code ridiculously difficult, as each token can be complete nonsense. When reading human code the symbols have a purpose; someone thought "I will put this in a variable, later I will check its status.", so I pretend I am them, and think: why would they have written this? Shortly after, I understand, as they are human and I am human. But the AI symbols have no reason, and worst of all, they all look deceptively correct, so I have to think 10 times harder about whether it is wrong. With any human code (including your own) it is quite easy to gauge how much you can trust it, and it is quite consistent; with AI code, one function can be much better than what you would've written, and the code 2 lines below can be cargo-culted gunk that looks incredibly good, but is structurally wrong.
In the end I would say I have gained a good understanding of the wires, timings, and the lower level ap/dp mechanics, sba and progbuf, but I regret not writing the whole thing myself, even if it would've taken 10x the time.
I fucking hate this.
And I cannot help but feel disgust and shame. Is this what programming is now? I really hope this is some intermediate stage and it changes for the better. The problem is I don't know what "better" is: it seems for some people it is not writing the code, for others it is not modeling the problem, and for a third group it is not having to think. For me, I am not sure. I do want to make things, and many times I don't want to know something, I just want to use it. E.g. the rp2350 usb host controller: the way you have to re-arm interrupts and the way the epx register is shared is super annoying, for good reasons probably, but I just want to use it to make my CBI driver.
I guess the question is what is the thing I want to make, because you can go way up the stack, from the USB chip registers to CBI to UFI to FAT16 to the OS of the old school computer I am making, but why stop? make the schematics, the pcbs, the cad files, maybe automatically send it to the factory? and then just ship it to me? but why stop? make my webshop, start selling, make a community, ads, marketing, generate some unboxing videos, maybe some viral memes? process the orders directly to the factory, on demand, if there is an issue, it is ready with customer support.
What do I do in the meanwhile? Sit on the beach? I hate the beach.
Where does it stop?
1. ARCHITECTURAL OVERVIEW
This library implements a complete three-layer abstraction for Serial Wire Debug protocol communication with RP2350's RISC-V Debug Module, modeled after the Debug Access Port specification and informed by ARM Debug Interface Architecture Specification v5.2.
┌────────────────────────────────────────┐
│           Application Layer            │
│              (User Code)               │
└───────────────────┬────────────────────┘
                    │
┌───────────────────▼────────────────────┐
│     Debug Module Layer (rp2350.c)      │
│  - RISC-V Debug Specification v0.13    │
│  - Hart control via DMCONTROL          │
│  - Abstract commands for GPR access    │
│  - System Bus Access (non-intrusive)   │
│  - PROGBUF execution for CSR access    │
└───────────────────┬────────────────────┘
                    │
┌───────────────────▼────────────────────┐
│    Debug Access Port Layer (dap.c)     │
│  - DP/AP register transactions         │
│  - RP2350-specific DP_SELECT encoding  │
│  - Bank selection caching              │
│  - Memory-mapped debug register access │
└───────────────────┬────────────────────┘
                    │
┌───────────────────▼────────────────────┐
│    Serial Wire Debug Layer (swd*.c)    │
│  - 2-wire bidirectional protocol       │
│  - PIO state machine bit-banging       │
│  - Request/ACK/Data phase handling     │
│  - Parity computation and verification │
│  - Line reset and dormant sequences    │
└────────────────────────────────────────┘
The separation of concerns follows classical protocol stack design: each layer exposes a well-defined interface and maintains independent state, with lower layers unaware of higher-layer semantics.
2. RISC-V DEBUG ARCHITECTURE: A FORMAL MODEL
Before examining the protocol implementation, we must establish the theoretical foundations of RISC-V external debugging. This section develops the debug architecture from first principles, following the RISC-V External Debug Support Specification v0.13.
2.A The Hart State Machine
A RISC-V hart (hardware thread) exists in one of three abstract states:
        ┌─────────────┐
        │   RUNNING   │
        │  (Normal)   │
        └──────┬──────┘
               │
     halt_request, ebreak,
   trigger_fire, step_complete
               │
               ▼
        ┌─────────────┐
        │   HALTED    │
        │   (Debug)   │
        └──────┬──────┘
               │
         resume_request
               │
               ▼
        ┌─────────────┐
        │  RESUMING   │
        │ (Transient) │
        └──────┬──────┘
               │
               ▼
        ┌─────────────┐
        │   RUNNING   │
        └─────────────┘
State 1: RUNNING - The hart executes instructions from main memory. PC advances according to program flow. All architectural state (GPRs, CSRs, memory) is accessible to the executing program.
State 2: HALTED - The hart has entered debug mode. No instructions from main memory execute. The hart is "parked" in a special debug ROM or implicit debug loop within the Debug Module. Debug-specific CSRs (DPC, DCSR, DSCRATCH) become accessible.
State 3: RESUMING - A transient state where the hart has received a resume request but has not yet returned to normal execution. This state exists to model the asynchronous nature of resume operations.
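As a rough illustration of how these three states can be told apart from the debugger side, the sketch below classifies a raw DMSTATUS value. It assumes the Debug Spec v0.13 bit positions (allhalted = bit 9, allrunning = bit 11, allresumeack = bit 17); the enum and the function name `classify_hart` are hypothetical and not part of this library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
typedef enum { HART_RUNNING, HART_HALTED, HART_RESUMING, HART_UNKNOWN } hart_state_t;

#define DMSTATUS_ALLHALTED    (1u << 9)   /* v0.13 bit position */
#define DMSTATUS_ALLRUNNING   (1u << 11)
#define DMSTATUS_ALLRESUMEACK (1u << 17)

/* resume_pending: the debugger has written resumereq and not yet seen the ack. */
static hart_state_t classify_hart(uint32_t dmstatus, bool resume_pending) {
    if (resume_pending && !(dmstatus & DMSTATUS_ALLRESUMEACK))
        return HART_RESUMING;              /* request sent, hart has not acknowledged */
    if (dmstatus & DMSTATUS_ALLHALTED)
        return HART_HALTED;
    if (dmstatus & DMSTATUS_ALLRUNNING)
        return HART_RUNNING;
    return HART_UNKNOWN;                   /* e.g. unavailable or nonexistent hart */
}
```

RESUMING is not visible in DMSTATUS alone, which is why the sketch takes the pending-resume flag from the caller.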
2.B The Debug Module: An Independent Controller
The Debug Module (DM) is a hardware block separate from the hart itself. It acts as a "shadow controller" that can:
- Observe hart state without halting (DMSTATUS register)
- Command hart transitions (halt, resume, reset via DMCONTROL)
- Access hart registers when halted (abstract commands)
- Access system memory independently of hart state (System Bus Access)
The DM is itself controlled by an external debugger via a Debug Transport Module (DTM). In our case, the DTM is the SWD interface.
┌──────────────────────────────────────────────────┐
│          External Debugger (Host CPU)            │
└────────────────────┬─────────────────────────────┘
                     │
                     ▼ SWD Protocol
┌──────────────────────────────────────────────────┐
│          Debug Transport Module (DTM)            │
│  - Exposes DM registers as memory-mapped space   │
└────────────────────┬─────────────────────────────┘
                     │
                     ▼ Internal Bus
┌──────────────────────────────────────────────────┐
│               Debug Module (DM)                  │
│  ┌──────────────┐        ┌──────────────┐        │
│  │  Abstract    │        │  System Bus  │        │
│  │  Command     │        │  Master      │        │
│  │  Engine      │        │              │        │
│  └──────┬───────┘        └──────┬───────┘        │
│         │                       │                │
└─────────┼───────────────────────┼────────────────┘
          │                       │
          ▼                       ▼
   ┌─────────────┐         ┌──────────────┐
   │   Hart 0    │         │  System Bus  │
   │  (Hazard3)  │         │              │
   ├─────────────┤         └──────────────┘
   │   Hart 1    │
   │  (Hazard3)  │
   └─────────────┘
2.C Debug Mode: A Privileged Exception Context
When a hart enters debug mode, it is not simply "stopped." Rather, it enters a special execution context analogous to an exception handler:
- PC is saved to DPC (Debug Program Counter, CSR 0x7b1)
- Privilege level is elevated to M-mode (Machine mode, highest privilege)
- DCSR.cause records the entry reason (halt request, ebreak, trigger, etc.)
- Hart begins executing from the debug exception vector (typically in debug ROM)
The debug exception vector contains a tight polling loop that repeatedly checks for commands from the Debug Module. This loop is architecturally invisible to the debugger; we simply observe the hart as "halted."
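To make the entry reason concrete, here is a small sketch that decodes DCSR.cause from a raw DCSR value. It assumes the v0.13 layout, where cause sits in bits [8:6]; the function name `dcsr_cause_name` is made up for this example and does not exist in the library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* DCSR.cause is assumed to occupy bits [8:6] (Debug Spec v0.13). */
static const char *dcsr_cause_name(uint32_t dcsr) {
    switch ((dcsr >> 6) & 0x7u) {
    case 1:  return "ebreak";        /* ebreak instruction executed */
    case 2:  return "trigger";       /* trigger module fired */
    case 3:  return "haltreq";       /* debugger requested a halt */
    case 4:  return "step";          /* single step completed */
    case 5:  return "resethaltreq";  /* halted out of reset */
    default: return "reserved";
    }
}
```

For example, a DCSR value of 0x400000c3 (xdebugver = 4, cause = 3, prv = 3) decodes to "haltreq".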
2.D Abstract Commands: The Debug Module API
The Program Buffer (PROGBUF) is a small instruction memory (up to 16 entries) within the Debug Module. When abstract commands cannot accomplish a task (e.g., accessing debug-only CSRs), the debugger can:
- Write RISC-V instructions to PROGBUF
- Issue an abstract command with the postexec bit set
- The hart executes PROGBUF instructions while still in debug mode
- The final ebreak instruction returns control to the Debug Module
This is not a "code injection" attack: the hart never leaves debug mode. It's analogous to a debugger writing instructions into a trap handler's stack frame.
2.F System Bus Access: A Parallel Execution Path
SBA provides a second path to memory that bypasses the hart entirely:
         Debugger Commands
                 │
                 ▼
          ┌─────────────┐
          │     DM      │
          └──┬───────┬──┘
             │       │
    ┌────────┘       └───────┐
    │                        │
    ▼ Abstract Cmd           ▼ SBA
┌──────┐                ┌─────────┐
│ Hart │───────────────▶│ Memory  │
└──────┘ Hart Accesses  └─────────┘
The hart and SBA compete for memory bus bandwidth. The hart's view of memory may differ from SBA's view due to:
- Cache: Hart caches writes; SBA sees stale memory
- MMU/PMP: Hart accesses are translated/protected; SBA bypasses
- Atomicity: Hart's atomic operations (LR/SC) are invisible to SBA
This is not a bug; it's a fundamental architectural trade-off. SBA provides speed and non-intrusiveness at the cost of coherency guarantees.
2.G The Debugging Contract
RISC-V debugging rests on several invariants:
Invariant 1: Debug Mode is Atomic While in debug mode, the hart executes no instructions from main memory. The debugger has exclusive control.
Invariant 2: Architectural Transparency Entering and exiting debug mode does not change architected state (except DPC/DCSR). The program cannot detect it was halted (modulo real-time constraints).
Invariant 3: Debug Privilege Debug mode executes at maximum privilege (M-mode). All memory is accessible, all CSRs are readable.
Invariant 4: No Interrupts in Debug Interrupts are masked while in debug mode. The debugger must explicitly re-enable them.
These invariants enable reproducible debugging: halting twice at the same PC should show identical state.
3. THE SERIAL WIRE DEBUG PROTOCOL
Serial Wire Debug (SWD) is a 2-wire replacement for JTAG's 5-wire interface, developed by ARM. The protocol operates over two signals:
- SWCLK: Clock signal driven by the debugger (host)
- SWDIO: Bidirectional data signal with turnaround phases
3.A Protocol Packet Structure
Each SWD transaction consists of three phases:
Request Phase (8 bits, host drives SWDIO):
Bit 0: Start (always 1)
Bit 1: APnDP (0=DP access, 1=AP access)
Bit 2: RnW (0=Write, 1=Read)
Bit 3-4: A[3:2] (register address bits)
Bit 5: Parity (even parity of bits 1-4)
Bit 6: Stop (always 0)
Bit 7: Park (always 1)
Acknowledge Phase (3 bits, target drives SWDIO):
OK (001): Transaction accepted
WAIT (010): Target requests retry
FAULT (100): Error condition
Data Phase (33 bits, direction depends on RnW):
Bits 0-31: Data word
Bit 32: Parity bit
Turnaround cycles (host releases SWDIO, target can drive) occur between the request and acknowledge phases and whenever the data phase changes the drive direction.
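The acknowledge and data phases can be checked with a few lines of C. The sketch below is an illustration only: the names `swd_ack_t`, `swd_data_parity` and `swd_read_data_ok` are hypothetical, and it assumes the ACK bits have already been assembled LSB-first into a 3-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ACK values as read LSB-first from the wire. */
typedef enum { ACK_OK = 0x1, ACK_WAIT = 0x2, ACK_FAULT = 0x4 } swd_ack_t;

/* Even parity over 32 data bits: returns 1 when the word has an odd
 * number of 1 bits, so that data plus parity always has an even count. */
static uint8_t swd_data_parity(uint32_t word) {
    word ^= word >> 16;
    word ^= word >> 8;
    word ^= word >> 4;
    word ^= word >> 2;
    word ^= word >> 1;
    return (uint8_t)(word & 1u);
}

/* A read is usable only if the target said OK and the parity bit matches. */
static bool swd_read_data_ok(uint8_t ack, uint32_t data, uint8_t parity_bit) {
    return ack == ACK_OK && swd_data_parity(data) == (parity_bit & 1u);
}
```

For the RP2350 IDCODE value 0x4c013477 (13 one bits) the parity bit is 1.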
Our implementation of the packet construction is in swd_protocol.c:97-113:
static uint8_t make_swd_request(bool APnDP, bool RnW, uint8_t addr) {
    uint8_t a2 = (addr >> 2) & 1;
    uint8_t a3 = (addr >> 3) & 1;
    uint8_t parity = (APnDP + RnW + a2 + a3) & 1;
    uint8_t request = 0;
    request |= (1 << 0);       // Start bit
    request |= (APnDP << 1);   // AP/DP select
    request |= (RnW << 2);     // Read/Write
    request |= (a2 << 3);      // Address bit 2
    request |= (a3 << 4);      // Address bit 3
    request |= (parity << 5);  // Parity
    request |= (0 << 6);       // Stop bit
    request |= (1 << 7);       // Park bit
    return request;
}
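As a cross-check, the same bit layout can be evaluated for a few common transactions. The function below is a compact standalone restatement written for this example (the name `swd_request_byte` is not from the library); the expected bytes in the comments follow directly from the layout in 3.A.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standalone restatement of the request layout:
 * bit0 start=1, bit1 APnDP, bit2 RnW, bits3-4 A[3:2], bit5 parity, bit6 stop=0, bit7 park=1. */
static uint8_t swd_request_byte(bool apndp, bool rnw, uint8_t addr) {
    uint8_t a2 = (addr >> 2) & 1u;
    uint8_t a3 = (addr >> 3) & 1u;
    uint8_t parity = (uint8_t)((apndp + rnw + a2 + a3) & 1u);
    return (uint8_t)(0x81u | (apndp << 1) | (rnw << 2) | (a2 << 3) | (a3 << 4) | (parity << 5));
}

/* Worked values:
 *   DP read  0x0 (IDCODE)    -> 0xA5
 *   DP write 0x4 (CTRL_STAT) -> 0xA9
 *   DP write 0x8 (SELECT)    -> 0xB1
 *   DP read  0xC (RDBUFF)    -> 0xBD
 *   AP read  0xC (DRW)       -> 0x9F
 */
```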
3.B PIO-Based Physical Layer
Unlike software bit-banging (which suffers from timing jitter and CPU overhead), this implementation uses RP2040/RP2350's Programmable I/O (PIO) blocks for deterministic timing.
The PIO program (swd.pio) implements a command-based interface where each FIFO entry encodes either a command or data payload. Command format:
Bits 0-7: Bit count - 1
Bit 8: Direction (0=input, 1=output)
Bits 9-13: Target instruction address
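A helper that packs this command word might look like the sketch below. It only mirrors the three fields listed above; the name `pio_cmd` and its parameters are hypothetical, and the real code in swd_protocol.c may build the word differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack a FIFO command word (hypothetical helper):
 *   bits 0-7  : bit count - 1
 *   bit  8    : direction (0 = input, 1 = output)
 *   bits 9-13 : target instruction address in the PIO program */
static uint32_t pio_cmd(uint32_t bit_count, bool output, uint32_t target_addr) {
    assert(bit_count >= 1 && bit_count <= 256);  /* count - 1 must fit in 8 bits */
    assert(target_addr < 32);                    /* PIO programs have 32 slots */
    return (bit_count - 1u) | ((output ? 1u : 0u) << 8) | (target_addr << 9);
}
```

For instance, an 8-bit output command targeting instruction address 5 packs to 0xB07.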
The state machine operates at 4 cycles per clock period, providing precise SWCLK generation independent of system clock frequency. See swd.pio:45-68 for the complete implementation.
Clock divider calculation (swd_protocol.c:313-330) accounts for this 4-cycle period:
uint32_t clk_sys_khz = clock_get_hz(clk_sys) / 1000;
uint32_t divider = (((clk_sys_khz + freq_khz - 1) / freq_khz) + 3) / 4;
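A worked example helps show the rounding behaviour of this formula. The sketch below restates the two lines as a pure function and adds a helper for the resulting SWCLK; both function names are made up for illustration, and the input frequencies are just sample values.

```c
#include <assert.h>
#include <stdint.h>

/* Same integer arithmetic as above: ceil(clk_sys / freq), then ceil(... / 4). */
static uint32_t swd_clock_divider(uint32_t clk_sys_khz, uint32_t freq_khz) {
    assert(freq_khz > 0);
    uint32_t cycles = (clk_sys_khz + freq_khz - 1) / freq_khz;
    return (cycles + 3) / 4;
}

/* Resulting SWCLK in kHz, given 4 state-machine cycles per SWCLK period. */
static uint32_t swd_actual_khz(uint32_t clk_sys_khz, uint32_t divider) {
    assert(divider > 0);
    return clk_sys_khz / (divider * 4);
}
```

With a 150 MHz system clock and a 1000 kHz request, the divider is 38 and SWCLK comes out at about 986 kHz; rounding up means the actual clock never exceeds the requested one.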
3.C The Dormant State and Protocol Selection
ARM Debug Interface Architecture v6 introduces a Dormant State to enable coexistence of multiple debug protocols (JTAG and SWD) on the same pins. At power-up, RP2350's SW-DP enters the Dormant state, requiring explicit activation before SWD operations can proceed.
The dormant state solves a fundamental problem: JTAG uses 5 signals (TMS, TCK, TDI, TDO, TRST), while SWD uses 2 (SWCLK, SWDIO). When both protocols share physical pins, the debug port must determine which protocol the debugger intends to use. The solution is to require a protocol-specific "unlock" sequence that:
- Cannot be generated accidentally by non-debug traffic on the pins
- Is sufficiently long to avoid false positives (128 bits)
- Uniquely identifies the target protocol (JTAG vs SWD)
3.C.1 The State Transition Model
The SW-DP implements a finite state machine with three protocol modes:
Power-On → [Default State] → Dormant
                                │
                   ┌────────────┴────────────┐
                   │                         │
            JTAG Activation           SWD Activation
         Sequence (0x33bbbbba)        Sequence (0x1a)
                   │                         │
                   ▼                         ▼
              ┌──────────┐              ┌──────────┐
              │   JTAG   │              │   SWD    │
              │  Active  │              │  Active  │
              └────┬─────┘              └────┬─────┘
                   │                         │
            JTAG-to-Dormant           SWD-to-Dormant
               Sequence                  Sequence
                   │                         │
                   └────────────┬────────────┘
                                ▼
                           ┌──────────┐
                           │ Dormant  │
                           └──────────┘
Once activated, the debug port remains in the selected protocol mode until:
- A transition-to-dormant sequence is sent
- Power is cycled
- The external reset (RUN) pin is asserted
3.C.2 The Selection Alert Sequence
Before sending a protocol-specific activation code, ARM requires transmission of a 128-bit Selection Alert Sequence. This sequence serves as a "wake-up call" that:
- Synchronizes the target's bit-stream parser
- Ensures the target is listening for an activation sequence
- Provides sufficient entropy to avoid accidental activation
The Selection Alert Sequence is a fixed 128-bit pattern defined in the ADI v6 specification:
0x19bc0ea2_e3ddafe9_86852d95_6209f392 (transmitted LSB-first)
This constant was chosen for its Hamming distance properties: it is unlikely to occur in normal signal traffic or be generated by crosstalk, glitches, or other non-debug activity.
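Because the sequence is transmitted LSB-first, the byte array sent on the wire is the 128-bit constant in little-endian byte order. The sketch below derives those bytes from the four 32-bit words; the names are illustrative only, and the output matches the Selection Alert bytes used in the activation array in 3.C.3.

```c
#include <assert.h>
#include <stdint.h>

/* The 128-bit constant, most significant word first. */
static const uint32_t selection_alert_words[4] = {
    0x19bc0ea2u, 0xe3ddafe9u, 0x86852d95u, 0x6209f392u
};

/* Fill out[] with the bytes in transmission order (least significant byte first). */
static void selection_alert_bytes(uint8_t out[16]) {
    for (int i = 0; i < 16; i++) {
        uint32_t word = selection_alert_words[3 - i / 4];  /* start at the low word */
        out[i] = (uint8_t)(word >> (8 * (i % 4)));          /* low byte of each word first */
    }
}
```

The first bytes produced are 0x92, 0xf3, 0x09, 0x62 and the last is 0x19.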
3.C.3 Implementation: Robust Activation Strategy
Our implementation (swd_protocol.c:357-382) uses a defensive activation strategy that ensures reliable connection regardless of the SW-DP's initial state:
// Phase 1: Exit any prior protocol mode
static const uint8_t seq_jtag_to_dormant[] = {
    0xff,0xff,0xff,0xff,0xff,0xff,0xff, 0xbc,0xe3
};
// Phase 2: Activate SWD mode
static const uint8_t seq_dormant_to_swd[] = {
    0xff,                                      // Line reset (8 ones)
    0x92,0xf3,0x09,0x62,0x95,0x2d,0x85,0x86,   // Selection Alert
    0xe9,0xaf,0xdd,0xe3,0xa2,0x0e,0xbc,0x19,   // (128 bits)
    0xa0,0xf1,0xff,                            // 4 low bits, SWD Activation Code (0x1a), then ones
    0xff,0xff,0xff,0xff,0xff,0xff,0xff, 0xff,  // Line reset (>50 ones)
    0x00                                       // Idle low
};
Why this two-phase approach?
The problem is that we don't know the SW-DP's current state:
- Fresh power-up: SW-DP is in Dormant mode (default)
- Prior debug session: SW-DP may be in SWD or JTAG mode
- Failed connection attempt: SW-DP may be in an undefined state
If the SW-DP is already in SWD or JTAG mode, sending the Selection Alert Sequence will be interpreted as data transactions, potentially putting the DP into an error state. Our solution:
Phase 1: Force transition to Dormant
Send the JTAG-to-Dormant sequence (56 ones followed by 0xbc, 0xe3). This sequence:
- If in JTAG mode: transitions to Dormant
- If in SWD mode: interpreted as line reset + invalid transactions (harmless)
- If already Dormant: has no effect (dormant state ignores invalid input)
The sequence consists of:
- 56 clock cycles high (JTAG TMS=1 → Test-Logic-Reset state)
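- 0xE3BC (bytes 0xbc, 0xe3, sent LSB-first): the 16-bit to-Dormant select pattern. ADIv5.2 lists 0xE3BC as the SWD-to-Dormant value and 0x33BBBBBA as the JTAG-to-Dormant value, so despite the array name this phase most likely acts on a DP left in SWD mode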
Phase 2: Activate SWD from Dormant
Now that we're guaranteed to be in Dormant mode (or already in SWD mode where line reset is idempotent), we send:
- Line reset (8 ones): Clears any pending SWD transactions
- Selection Alert Sequence (128 bits): Wakes dormant state machine
- SWD Activation Code (0x1a, 8 bits): Selects SWD protocol
- Line reset (>50 ones): Enters SWD Reset state, clearing sticky errors
- Idle cycles: Ensures clean transition
The SWD Activation Code 0x1a decodes as:
Bits[7:0] = 0x1a = 0b00011010
This specific bit pattern was chosen to be distinct from valid JTAG TMS sequences, ensuring protocol disambiguation.
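In the byte array from 3.C.3 the activation code is not byte-aligned: the bytes 0xa0, 0xf1, 0xff carry four low bits, then 0x1a, then ones. The sketch below extracts a bit field from an LSB-first byte stream to show this; `stream_bits` is a hypothetical helper written for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Read `count` bits starting at bit index `first_bit` from a byte stream
 * that is transmitted LSB-first; the first bit read becomes bit 0 of the result. */
static uint32_t stream_bits(const uint8_t *bytes, unsigned first_bit, unsigned count) {
    assert(count <= 32);
    uint32_t value = 0;
    for (unsigned i = 0; i < count; i++) {
        unsigned bit = first_bit + i;
        uint32_t b = (uint32_t)(bytes[bit / 8] >> (bit % 8)) & 1u;
        value |= b << i;
    }
    return value;
}
```

Applied to {0xa0, 0xf1, 0xff}: bits 0-3 are 0 (the four low cycles), bits 4-11 are 0x1a (the activation code), and bits 12-23 are all ones (the start of the line reset).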
3.C.4 Why Not Use the RP2350 Datasheet Sequence?
The RP2350 datasheet (Section 3.5.1) describes a simpler connection sequence:
1. At least 8 × SWCLK cycles with SWDIO high
2. The 128-bit Selection Alert sequence
3. Four SWCLK cycles with SWDIO low
4. SWD activation code: 0x1a, LSB first
5. At least 50 × SWCLK cycles with SWDIO high (line reset)
6. A DPIDR read to exit the Reset state
This sequence assumes the SW-DP is in Dormant mode at power-up. However, in real-world scenarios:
- The target may have been previously debugged (SW-DP in SWD mode)
- A debugger crash may have left the SW-DP in an error state
- Multi-drop SWD configurations may require explicit state reset
Our JTAG→Dormant→SWD sequence works regardless of the SW-DP's initial state. The cost is negligible (approximately 100 extra clock cycles, or ~100 µs at 1 MHz SWCLK), while the benefit is a reliable connection without manual power-cycling.
3.C.5 Post-Activation Verification
After activation, we immediately read DP_IDCODE (swd_protocol.c:386-397):
uint32_t idcode = 0;
err = swd_read_dp_raw(target, DP_IDCODE, &idcode);
if (err != SWD_OK) {
    swd_set_error(target, err, "Failed to read IDCODE");
    return err;
}
if ((idcode & 0x0fffffff) == 0) {
    swd_set_error(target, SWD_ERROR_PROTOCOL, "Invalid IDCODE: 0x%08x", idcode);
    return SWD_ERROR_PROTOCOL;
}
A successful IDCODE read confirms:
- SWD protocol is active
- The SW-DP is responding to transactions
- The SWCLK frequency is within tolerance
- SWDIO signal integrity is sufficient
For RP2350, the IDCODE is 0x4c013477.
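The IDCODE value can be split into its fields for a more specific sanity check. The sketch below assumes the DPIDR layout from the ARM Debug Interface specification (REVISION [31:28], PARTNO [27:20], MIN [16], VERSION [15:12], DESIGNER [11:1]); the struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t  revision;  /* bits [31:28] */
    uint8_t  partno;    /* bits [27:20] */
    bool     min;       /* bit 16: minimal DP implementation */
    uint8_t  version;   /* bits [15:12]: DP architecture version */
    uint16_t designer;  /* bits [11:1]: JEP106 designer code */
} dpidr_fields_t;

static dpidr_fields_t dpidr_decode(uint32_t idcode) {
    dpidr_fields_t f;
    f.revision = (uint8_t)((idcode >> 28) & 0xFu);
    f.partno   = (uint8_t)((idcode >> 20) & 0xFFu);
    f.min      = ((idcode >> 16) & 1u) != 0;
    f.version  = (uint8_t)((idcode >> 12) & 0xFu);
    f.designer = (uint16_t)((idcode >> 1) & 0x7FFu);
    return f;
}
```

For 0x4c013477 this gives revision 4, part number 0xc0, MIN set, version 3, and designer 0x23b (ARM's JEP106 code).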
This defensive activation strategy, while not strictly necessary for fresh power-up scenarios, ensures our library works reliably across the full range of real-world debug connection scenarios, a critical property for a reusable debug library.
4. DEBUG ACCESS PORT ARCHITECTURE
The Debug Access Port (DAP) provides memory-mapped access to debug resources through two register banks:
4.A Debug Port Registers
The Debug Port (DP) manages power domains and AP selection:
- DP_IDCODE (0x0): Designer and part number identification
- DP_CTRL_STAT (0x4): Power control and status flags
- DP_SELECT (0x8): AP and register bank selection
- DP_RDBUFF (0xC): Read buffer for pipelined AP reads
4.B Access Port Registers
Access Ports (AP) provide interfaces to debug resources. RP2350 implements multiple APs:
- AP 0x0: ROM Table
- AP 0x2: ARM Core 0 AHB-AP
- AP 0x4: ARM Core 1 AHB-AP
- AP 0x8: RP2350-specific AP
- AP 0xA: RISC-V APB-AP (target of this library)
Each AP has standardized registers:
- AP_CSW (0x00): Control/Status Word
- AP_TAR (0x04): Transfer Address Register
- AP_DRW (0x0C): Data Read/Write Register
- AP_IDR (0xFC): Identification Register
4.C RP2350-Specific DP_SELECT Encoding
Standard ARM DP_SELECT format uses bits[31:24] for APSEL and bits[7:4] for APBANKSEL. RP2350 implements a non-standard encoding (dap.c:18-22):
uint32_t make_dp_select_rp2350(uint8_t apsel, uint8_t bank, bool ctrlsel) {
    // [15:12] = APSEL, [11:8] = 0xD, [7:4] = bank, [0] = ctrlsel
    return ((apsel & 0xF) << 12) | (0xD << 8) | ((bank & 0xF) << 4) | (ctrlsel ? 1 : 0);
}
The magic constant 0xD in bits[11:8] is undocumented but required for correct AP selection.
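When reading logic-analyzer captures it is handy to go the other way and unpack a DP_SELECT value. The sketch below is the inverse of the encoding described above, under the same field assumptions; `dp_select_unpack` and its struct are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t apsel;    /* bits [15:12] */
    uint8_t magic;    /* bits [11:8], expected to be 0xD */
    uint8_t bank;     /* bits [7:4] */
    bool    ctrlsel;  /* bit 0 */
} dp_select_fields_t;

static dp_select_fields_t dp_select_unpack(uint32_t select) {
    dp_select_fields_t f;
    f.apsel   = (uint8_t)((select >> 12) & 0xFu);
    f.magic   = (uint8_t)((select >> 8) & 0xFu);
    f.bank    = (uint8_t)((select >> 4) & 0xFu);
    f.ctrlsel = (select & 1u) != 0;
    return f;
}
```

For the RISC-V AP (0xA), bank 0, ctrlsel set, the encoding yields 0x0000AD01; bank 1 gives 0x0000AD11.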
4.D Bank Selection Caching
AP registers are accessed through a banking mechanism where DP_SELECT must be written before each AP access. To minimize SWD transactions, the library maintains a cache of the current bank selection (dap.c:28-55):
static swd_error_t select_ap_bank(swd_target_t *target, uint8_t apsel, uint8_t bank) {
    if (target->dap.current_apsel == apsel &&
        target->dap.current_bank == bank &&
        target->dap.ctrlsel == true) {
        return SWD_OK; // Already selected
    }
    // Write DP_SELECT...
    target->dap.current_apsel = apsel;
    target->dap.current_bank = bank;
    // ...
}
This caching reduces transaction count by approximately 50% in typical debug sessions.
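The effect of the cache can be seen with a small host-side model that counts DP_SELECT writes instead of performing them. This is a sketch with hypothetical names (`select_cache_t`, `select_ap_bank_cached`, `select_cache_invalidate`), not the library's code; it also shows why the cache should be invalidated after anything that resets the DP, such as a line reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  current_apsel;
    uint8_t  current_bank;
    bool     valid;          /* false until the first DP_SELECT write */
    unsigned select_writes;  /* counts simulated DP_SELECT transactions */
} select_cache_t;

static void select_ap_bank_cached(select_cache_t *c, uint8_t apsel, uint8_t bank) {
    assert(c != NULL);
    if (c->valid && c->current_apsel == apsel && c->current_bank == bank)
        return;              /* already selected: no SWD traffic */
    c->select_writes++;      /* stands in for the real DP_SELECT write */
    c->current_apsel = apsel;
    c->current_bank = bank;
    c->valid = true;
}

/* Call after a line reset or reconnect, when the target's SELECT is unknown. */
static void select_cache_invalidate(select_cache_t *c) {
    assert(c != NULL);
    c->valid = false;
}
```

Five consecutive accesses that touch bank 0 three times and bank 1 twice cost two DP_SELECT writes instead of five.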
5. DEBUG DOMAIN POWER SEQUENCING
Before any debug operations can proceed, the Debug Power Domain (DPD) and System Power Domain (SPD) must be powered up. This is not a physical power operation but rather clock and reset domain enabling.
The power-up sequence (dap.c:61-110) follows the ARM Debug Interface specification:
- Clear sticky errors: Write 0 to DP_CTRL_STAT
- Request power-up: Set CDBGPWRUPREQ (bit 28) and CSYSPWRUPREQ (bit 30)
- Poll acknowledgment: Wait for CDBGPWRUPACK (bit 29) and CSYSPWRUPACK (bit 31)
uint32_t ctrl_stat = (1 << 28) | (1 << 30);
swd_write_dp_raw(target, DP_CTRL_STAT, ctrl_stat);
for (int i = 0; i < 10; i++) {
    swd_read_dp_raw(target, DP_CTRL_STAT, &status);
    bool cdbgpwrupack = (status >> 29) & 1;
    bool csyspwrupack = (status >> 31) & 1;
    if (cdbgpwrupack && csyspwrupack) {
        return SWD_OK;
    }
    sleep_ms(20);
}
Failure to complete this sequence results in all subsequent debug operations returning WAIT responses indefinitely.
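The request and acknowledge bits can be given names so the polling condition reads as a single predicate. The sketch below uses the bit positions listed above; the macro and function names are illustrative and may not match the library's own definitions. Unsigned constants are used so that shifting into bit 31 is well defined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CTRL_STAT_CDBGPWRUPREQ (1u << 28)
#define CTRL_STAT_CDBGPWRUPACK (1u << 29)
#define CTRL_STAT_CSYSPWRUPREQ (1u << 30)
#define CTRL_STAT_CSYSPWRUPACK (1u << 31)

/* Value to write to DP_CTRL_STAT to request both power domains. */
static uint32_t dap_power_request(void) {
    return CTRL_STAT_CDBGPWRUPREQ | CTRL_STAT_CSYSPWRUPREQ;
}

/* True once both the debug and system domains have acknowledged. */
static bool dap_power_acked(uint32_t ctrl_stat) {
    const uint32_t acks = CTRL_STAT_CDBGPWRUPACK | CTRL_STAT_CSYSPWRUPACK;
    return (ctrl_stat & acks) == acks;
}
```

A fully powered-up DP typically reads back with all four bits set (0xF0000000 in the top nibble); a value of 0x50000000 means the requests are latched but neither domain has acknowledged yet.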
6. RP2350 DEBUG MODULE INITIALIZATION
After DAP power-up, the RP2350-specific Debug Module must be initialized through an undocumented activation handshake. This sequence was reverse-engineered from OpenOCD's RP2350 support with an oscilloscope and patience.
The activation sequence (rp2350.c:106-194) consists of:
6.A AP Selection and CSW Configuration
uint32_t sel_bank0 = make_dp_select_rp2350(AP_RISCV, 0, true);
dap_write_dp(target, DP_SELECT, sel_bank0);
uint32_t csw = 0xA2000002; // 32-bit access, auto-increment disabled
dap_write_ap(target, AP_RISCV, AP_CSW, csw);
6.B Bank 1 Activation Handshake
The Debug Module registers are normally accessed through Bank 0, but activation requires Bank 1:
uint32_t sel_bank1 = make_dp_select_rp2350(AP_RISCV, 1, true);
dap_write_dp(target, DP_SELECT, sel_bank1);
// Three-phase handshake
dap_write_ap(target, AP_RISCV, AP_CSW, 0x00000000); // Reset
dap_read_dp(target, DP_RDBUFF);
sleep_ms(50);
dap_write_ap(target, AP_RISCV, AP_CSW, 0x00000001); // Activate
dap_read_dp(target, DP_RDBUFF);
sleep_ms(50);
dap_write_ap(target, AP_RISCV, AP_CSW, 0x07FFFFC1); // Configure
dap_read_dp(target, DP_RDBUFF);
sleep_ms(50);
The expected status response is 0x04010001.
7. RISC-V DEBUG MODULE INTERFACE
The RISC-V Debug Module implements the RISC-V External Debug Support specification v0.13. Debug Module registers are memory-mapped at base address 0x40 (each register's byte offset is its DM register number × 4).
7.A Debug Module Registers
Key registers (rp2350.c:17-29):
#define DM_DMCONTROL (0x10 * 4) // Hart control
#define DM_DMSTATUS (0x11 * 4) // Hart status
#define DM_ABSTRACTCS (0x16 * 4) // Abstract command status
#define DM_COMMAND (0x17 * 4) // Abstract command execution
#define DM_DATA0 (0x04 * 4) // Data transfer register
#define DM_PROGBUF0 (0x20 * 4) // Program buffer word 0
#define DM_PROGBUF1 (0x21 * 4) // Program buffer word 1
#define DM_SBCS (0x38 * 4) // System Bus Access Control
#define DM_SBADDRESS0 (0x39 * 4) // SBA Address
#define DM_SBDATA0 (0x3C * 4) // SBA Data
7.B Hart Control via DMCONTROL
Hart (hardware thread) execution is controlled through DMCONTROL register fields:
- dmactive (bit 0): Debug Module active (must be 1)
- haltreq (bit 31): Request hart halt
- resumereq (bit 30): Request hart resume
Halt sequence (rp2350.c:205-240):
uint32_t dmcontrol = (1u << 31) | (1u << 0); // haltreq | dmactive (unsigned: shifting a signed 1 into bit 31 is undefined)
dap_write_mem32(target, DM_DMCONTROL, dmcontrol);
// Poll DMSTATUS.allhalted (bit 9)
for (int i = 0; i < 10; i++) {
    swd_result_t result = dap_read_mem32(target, DM_DMSTATUS);
    bool allhalted = (result.value >> 9) & 1;
    if (allhalted) {
        target->rp2350.hart_halted = true;
        return SWD_OK;
    }
    sleep_ms(10);
}
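RP2350 has two Hazard3 harts, so a halt or resume request normally also selects which hart it applies to. The sketch below builds DMCONTROL values with the v0.13 hartsello field (bits [25:16]); this is an assumption about how hart selection would be done, and the names are hypothetical rather than taken from rp2350.c.

```c
#include <assert.h>
#include <stdint.h>

#define DMCONTROL_DMACTIVE  (1u << 0)
#define DMCONTROL_RESUMEREQ (1u << 30)
#define DMCONTROL_HALTREQ   (1u << 31)

/* Combine a request bit with dmactive and the hart index (hartsello, bits [25:16]). */
static uint32_t dmcontrol_for_hart(uint32_t hart_id, uint32_t request_bits) {
    assert(hart_id < 1024);  /* hartsello is 10 bits wide */
    return request_bits | DMCONTROL_DMACTIVE | (hart_id << 16);
}
```

Halting hart 0 writes 0x80000001, halting hart 1 writes 0x80010001, and resuming hart 1 writes 0x40010001.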
7.C Abstract Commands for Register Access
Abstract commands provide a high-level interface to the state of a halted hart. The COMMAND register format for GPR access:
Bits 0-15: regno (0x1000 + reg_num for GPRs)
Bit 16: write (1=write, 0=read)
Bit 17: transfer (1=execute transfer)
Bits 20-22: aarsize (2=32-bit access)
GPR read implementation (rp2350.c:333-389):
uint32_t command = 0;
command |= (0x1000 + reg_num) << 0; // regno
command |= (1 << 17); // transfer
command |= (2 << 20); // aarsize=32-bit
dap_write_mem32(target, DM_COMMAND, command);
wait_abstract_command(target); // Poll ABSTRACTCS.busy
result = dap_read_mem32(target, DM_DATA0);
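The command word and the status check can be factored into small pure helpers. The sketch below follows the field layout listed above and assumes the v0.13 ABSTRACTCS layout (busy = bit 12, cmderr = bits [10:8]); the function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build an Access Register command (cmdtype = 0) for a 32-bit transfer. */
static uint32_t abstract_reg_cmd(uint16_t regno, bool write, bool postexec) {
    uint32_t command = regno;          /* bits 0-15: regno (0x1000 + n for GPR xn) */
    if (write)    command |= 1u << 16; /* bit 16: write (DATA0 -> register) */
    command |= 1u << 17;               /* bit 17: transfer */
    if (postexec) command |= 1u << 18; /* bit 18: run PROGBUF afterwards */
    command |= 2u << 20;               /* bits 20-22: aarsize = 32-bit */
    return command;
}

static bool abstractcs_busy(uint32_t abstractcs) {
    return ((abstractcs >> 12) & 1u) != 0;
}

/* Non-zero means the last command failed; it stays set until cleared by the debugger. */
static uint32_t abstractcs_cmderr(uint32_t abstractcs) {
    return (abstractcs >> 8) & 0x7u;
}
```

Reading s0 (x8) gives the command word 0x00221008, and writing it gives 0x00231008. Checking cmderr after busy clears catches cases such as issuing the command while the hart is still running.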
7.D Program Buffer Execution Model
The Program Buffer (PROGBUF) is a small instruction memory within the Debug Module (the specification allows up to 16 entries) that enables execution of arbitrary RISC-V code in the debug context. Understanding its operation requires examining the execution model, register preservation semantics, and synchronization mechanisms.
7.D.1 The Dual-Context Execution Model
A RISC-V hart operates in one of two contexts:
Normal Context: The hart executes from main memory, PC advances sequentially, and all architectural state is visible to the program.
Debug Context: Upon entering debug mode (via halt request, ebreak, or trigger), the hart:
- Saves PC to DPC (Debug Program Counter, CSR 0x7b1)
- Enters a special execution mode where PROGBUF instructions execute
- Maintains all GPRs and CSRs in their pre-halt state
- Cannot access main memory without explicit instructions
The Debug Module provides a "scratch pad" where debugger-supplied instructions execute with full access to hart state, but without disturbing that state beyond explicit modifications.
7.D.2 PROGBUF Entry Layout
RP2350's Debug Module provides 2 program buffer entries (PROGBUF0 and PROGBUF1), though the specification allows up to 16. Each entry holds one 32-bit RISC-V instruction:
#define DM_PROGBUF0 (0x20 * 4) // First instruction
#define DM_PROGBUF1 (0x21 * 4) // Second instruction (typically ebreak)
The execution model assumes the final instruction is ebreak (0x00100073), which returns control to the Debug Module and makes the hart available for further debug operations.
7.D.3 The Abstract Command Postexec Mechanism
Abstract commands can trigger PROGBUF execution through the postexec bit (bit 18 of the COMMAND register). This creates a transactional execution model:
┌────────────────────────────────────────────┐
│ 1. Debugger writes PROGBUF instructions    │
├────────────────────────────────────────────┤
│ 2. Debugger writes DATA0 (optional)        │
├────────────────────────────────────────────┤
│ 3. Abstract command with postexec=1        │
│    - Register transfer (if transfer=1;     │
│      direction set by the write bit)       │
│    - Executes PROGBUF[0]..PROGBUF[N]       │
│    - ebreak returns control to the DM      │
└────────────────────────────────────────────┘
This mechanism eliminates race conditions: the data transfer and program execution form an atomic operation from the debugger's perspective. Note that the transfer happens once, before the program buffer runs; reading a result back requires a separate abstract command.
7.D.4 Case Study: Reading Debug CSR (DPC)
The Debug Program Counter (DPC, CSR 0x7b1) cannot be accessed via abstract commands: it exists only in debug context and abstract commands target normal context registers. Reading DPC requires PROGBUF execution (rp2350.c:804-833):
Phase 1: Preserve scratch register
swd_result_t saved_s0 = rp2350_read_reg(target, hart_id, 8); // x8 = s0
The RISC-V ABI designates s0 (x8) as a callee-saved register, so the interrupted program may be relying on its value. We must preserve it because our PROGBUF code will clobber it.
Phase 2: Write PROGBUF instructions
dap_write_mem32(target, DM_PROGBUF0, 0x7b102473); // csrr s0, dpc
dap_write_mem32(target, DM_PROGBUF1, 0x00100073); // ebreak
The instruction csrr s0, dpc (CSR Read) has the encoding:
 31       20 19   15 14  12 11    7 6      0
┌───────────┬───────┬──────┬───────┬────────┐
│   0x7b1   │ 0x00  │ 0x2  │ 0x08  │  0x73  │
│ CSR addr  │  rs1  │funct3│  rd   │ opcode │
│    DPC    │  x0   │CSRRS │  s0   │ SYSTEM │
└───────────┴───────┴──────┴───────┴────────┘
- funct3=0x2 (CSRRS): CSR Read and Set. Since rs1=x0, no bits are set (read-only operation).
- CSR 0x7b1: DPC is defined in RISC-V Debug Spec v0.13, section 4.8.2
Phase 3: Execute with postexec
uint32_t command = (1 << 18); // postexec=1, transfer=0
dap_write_mem32(target, DM_COMMAND, command);
wait_abstract_command(target); // Poll ABSTRACTCS.busy
The hart now executes:
- csrr s0, dpc → DPC value loaded into s0
- ebreak → Return to Debug Module, s0 contains DPC
Phase 4: Extract result via abstract command
result = rp2350_read_reg(target, hart_id, 8); // Read s0 (now contains DPC)
Phase 5: Restore architectural state
rp2350_write_reg(target, hart_id, 8, saved_s0.value); // Restore s0
This five-phase sequence is invisible to the hart's normal execution: when resumed, all registers appear unchanged.
7.D.5 Writing Debug CSRs: The Inverse Operation
Writing DPC uses the inverse data flow (rp2350.c:879-909):
// Phase 1: Transfer new PC value to s0
err = rp2350_write_reg(target, hart_id, 8, new_pc_value);
// Phase 2: Write PROGBUF to copy s0 → DPC
dap_write_mem32(target, DM_PROGBUF0, 0x7b141073); // csrw dpc, s0
dap_write_mem32(target, DM_PROGBUF1, 0x00100073); // ebreak
// Phase 3: Execute
uint32_t command = (1 << 18); // postexec=1
dap_write_mem32(target, DM_COMMAND, command);
wait_abstract_command(target);
The instruction csrw dpc, s0 (CSR Write) has encoding 0x7b141073:
 31      20 19   15 14  12 11    7 6      0
┌──────────┬───────┬──────┬───────┬────────┐
│  0x7b1   │ 0x08  │ 0x1  │ 0x00  │  0x73  │
│ CSR addr │  rs1  │funct3│  rd   │ opcode │
│   DPC    │  s0   │CSRRW │  x0   │ SYSTEM │
└──────────┴───────┴──────┴───────┴────────┘
funct3=0x1 (CSRRW): CSR Read and Write. The old CSR value is discarded (rd=x0), and s0's value is written to DPC.
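Going the other way, a small decoder makes it easy to sanity-check a hand-assembled PROGBUF constant against the field tables. This is an illustrative sketch; `rv_csr_fields_t` and `rv_decode_csr` are hypothetical names.

```c
#include <stdint.h>

/* Fields of an I-type CSR instruction, as laid out in the tables above. */
typedef struct {
    uint16_t csr;     /* bits 31:20 */
    uint8_t  rs1;     /* bits 19:15 */
    uint8_t  funct3;  /* bits 14:12 */
    uint8_t  rd;      /* bits 11:7  */
    uint8_t  opcode;  /* bits 6:0   */
} rv_csr_fields_t;

static rv_csr_fields_t rv_decode_csr(uint32_t insn)
{
    rv_csr_fields_t f;
    f.csr    = (uint16_t)((insn >> 20) & 0xFFFu);
    f.rs1    = (uint8_t)((insn >> 15) & 0x1Fu);
    f.funct3 = (uint8_t)((insn >> 12) & 0x7u);
    f.rd     = (uint8_t)((insn >> 7) & 0x1Fu);
    f.opcode = (uint8_t)(insn & 0x7Fu);
    return f;
}
```

Decoding 0x7b141073 gives csr=0x7b1, rs1=8 (s0), funct3=1 (CSRRW), rd=0 (x0), matching the table for csrw dpc, s0.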
7.D.6 PROGBUF Execution Constraints
The PROGBUF execution environment imposes several constraints:
1. Memory Access Limitation: PROGBUF instructions execute in debug mode, where memory access depends on Debug Module configuration. Standard loads/stores may fault.
2. Instruction Count: With only 2 entries, complex operations require multiple PROGBUF sequences. Each sequence incurs the cost of abstract command execution (~100 µs typical).
3. No Branching: PROGBUF is linear. Conditional execution requires host-side logic to decide which PROGBUF sequence to execute.
4. Register Pressure: Only one scratch register (s0) is conventionally used. More complex operations require additional saves/restores.
5. Ebreak Requirement: The final instruction must be ebreak. Omitting it causes the hart to hang in debug mode.
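Two of these constraints (the entry count and the trailing ebreak) can be checked on the host before anything is written to the target. The check below is a sketch under the assumptions stated in the list; `progbuf_sequence_ok` is a hypothetical helper, not part of the library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PROGBUF_SIZE 2u           /* entries, as stated in the list above */
#define RV_EBREAK    0x00100073u  /* same constant written to PROGBUF1 */

/* Accept a sequence only if it fits in PROGBUF and ends with ebreak. */
static bool progbuf_sequence_ok(const uint32_t *insns, size_t count)
{
    if (insns == NULL || count == 0 || count > PROGBUF_SIZE) {
        return false;
    }
    return insns[count - 1] == RV_EBREAK;
}
```

Running this check before each PROGBUF upload turns a hung hart into an early host-side error.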
This execution model provides a "remote procedure call" mechanism where the host supplies short instruction sequences that execute atomically on the hart, providing a window into debug-only architectural state.
8. SYSTEM BUS ACCESS: NON-INTRUSIVE MEMORY OPERATIONS
System Bus Access (SBA) represents a fundamental departure from the traditional halt-based debugging model. Where classical debugging requires stopping the hart, transferring data through GPRs, and resuming, SBA provides a "back door" to the memory subsystem that operates concurrently with hart execution.
8.A The SBA Architecture
The Debug Module contains a bus master that can initiate memory transactions on the system bus independently of the harts. This master has the following characteristics:
- Separate Bus Master: SBA transactions do not consume hart resources or execution time
- Concurrent Operation: Memory reads/writes occur while harts execute normally
- Cache Coherency Dependency: SBA bypasses hart caches; coherency is NOT guaranteed
- Bus Arbitration: SBA competes with harts for bus bandwidth
The SBA interface consists of three memory-mapped registers in the Debug Module:
#define DM_SBCS (0x38 * 4) // System Bus Access Control and Status
#define DM_SBADDRESS0 (0x39 * 4) // System Bus Address (32-bit)
#define DM_SBDATA0 (0x3C * 4) // System Bus Data (32-bit)
8.B SBCS: Control and Status Word
The SBCS register (offset 0x38) contains configuration and status fields defined in RISC-V Debug Spec v0.13.2, section 3.12.18:
Bits   Field            Access        Description
31:29  sbversion        (read-only)   SBA version
28:23  (reserved)                     0
22     sbbusyerror      (W1C)         Access attempted while a previous access was still in progress
21     sbbusy           (read-only)   Bus master is busy
20     sbreadonaddr     (read-write)  Auto-read on SBADDRESS0 write
19:17  sbaccess         (read-write)  Access width: 0=8-bit, 1=16-bit, 2=32-bit
16     sbautoincrement  (read-write)  Auto-increment address after access
15     sbreadondata     (read-write)  Auto-read on SBDATA0 read
14:12  sberror          (W1C)         Error status (0=none, 1=timeout, 2=bad addr, 3=alignment, 4=size, 7=other)
11:5   sbasize          (read-only)   Address width in bits (32 for RP2350)
4:0    sbaccess128..8   (read-only)   Supported access widths, one bit each (128, 64, 32, 16, 8)
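A decoder for the most frequently used fields keeps the shift-and-mask arithmetic in one place. The following is a sketch based on the bit positions listed above; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the SBCS fields listed above. */
typedef struct {
    uint8_t sbversion;
    bool    sbbusyerror;
    bool    sbbusy;
    bool    sbreadonaddr;
    uint8_t sbaccess;
    bool    sbautoincrement;
    bool    sbreadondata;
    uint8_t sberror;
    uint8_t sbasize;
} sbcs_fields_t;

static sbcs_fields_t sbcs_decode(uint32_t v)
{
    sbcs_fields_t f;
    f.sbversion       = (uint8_t)((v >> 29) & 0x7u);
    f.sbbusyerror     = ((v >> 22) & 1u) != 0;
    f.sbbusy          = ((v >> 21) & 1u) != 0;
    f.sbreadonaddr    = ((v >> 20) & 1u) != 0;
    f.sbaccess        = (uint8_t)((v >> 17) & 0x7u);
    f.sbautoincrement = ((v >> 16) & 1u) != 0;
    f.sbreadondata    = ((v >> 15) & 1u) != 0;
    f.sberror         = (uint8_t)((v >> 12) & 0x7u);
    f.sbasize         = (uint8_t)((v >> 5) & 0x7Fu);
    return f;
}
```

For example, the value 0x20140400 decodes to sbversion=1, sbreadonaddr=1, sbaccess=2 and sbasize=32. That value is constructed for illustration and is not a captured register dump.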
8.C SBA Initialization: Capability Discovery
The SBA subsystem initialization (rp2350.c:958-992) follows a capability discovery pattern:
Phase 1: Read SBCS to detect supported features
swd_result_t result = dap_read_mem32(target, DM_SBCS);
Phase 2: Verify SBA capability
// Check sbasize field (bits [11:5]) to verify SBA is present
uint32_t sbasize = (result.value >> 5) & 0x7F;
if (sbasize == 0) {
    return SWD_ERROR_INVALID_STATE; // SBA not available
}
The sbasize field indicates the system bus address width (32 bits for RP2350). RP2350 supports 8-bit, 16-bit, and 32-bit access widths. We configure for 32-bit:
Phase 3: Configure access mode
uint32_t sbcs = 0;
sbcs |= (2 << 17); // sbaccess = 2 (32-bit)
sbcs |= (1 << 20); // sbreadonaddr = 1 (auto-read trigger)
dap_write_mem32(target, DM_SBCS, sbcs);
The sbreadonaddr flag is critical: it converts the address write into an atomic read-trigger operation.
8.D The Auto-Read Mechanism
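Without sbreadonaddr, writing SBADDRESS0 does not start a bus access. The only other read trigger defined in Debug Spec v0.13.2 is sbreadondata, so a single-word read requires three transactions:
1. Write address to SBADDRESS0
2. Read SBDATA0 with sbreadondata=1 (starts the bus read; the returned value is stale)
3. Read data from SBDATA0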
With sbreadonaddr=1, the middle step is eliminated:
1. Write address to SBADDRESS0 → Triggers bus read automatically
2. Read data from SBDATA0 → Data is ready
Implementation (rp2350.c:1013-1020):
dap_write_mem32(target, DM_SBADDRESS0, addr); // Write triggers read
result = dap_read_mem32(target, DM_SBDATA0); // Data is already valid
The Debug Module's state machine looks like:
IDLE → [SBADDRESS0 written] → BUSY → [bus read completes] → DATA_READY
                               ↓
                        [bus timeout] → SBERROR=1
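The state machine can be exercised on the host with a tiny simulated Debug Module. The sketch below is not the library's code: `fake_dm_t`, `fake_read`, `fake_write` and `sba_read32` are all hypothetical, and the fake only models sbreadonaddr, sbbusy and two sberror cases. It shows a reader that polls sbbusy and checks sberror before trusting SBDATA0.

```c
#include <stdint.h>

#define DM_SBCS        (0x38 * 4)
#define DM_SBADDRESS0  (0x39 * 4)
#define DM_SBDATA0     (0x3C * 4)
#define SBCS_SBBUSY        (1u << 21)
#define SBCS_SBREADONADDR  (1u << 20)
#define SBCS_SBERROR_MASK  (7u << 12)
#define SBCS_SBERROR(v)    (((v) >> 12) & 0x7u)

/* Simulated Debug Module (an assumption, for host-side testing only). */
typedef struct {
    uint32_t sbcs, sbdata0;
    unsigned busy_polls;  /* SBCS reads that still report sbbusy */
    uint32_t mem[4];      /* fake system memory at addresses 0x0..0xF */
} fake_dm_t;

static void fake_write(fake_dm_t *dm, uint32_t reg, uint32_t value)
{
    if (reg == DM_SBCS) {
        /* sberror is W1C: keep error bits not written as 1, take the rest. */
        dm->sbcs = (dm->sbcs & SBCS_SBERROR_MASK & ~value) |
                   (value & ~SBCS_SBERROR_MASK);
    } else if (reg == DM_SBADDRESS0 && (dm->sbcs & SBCS_SBREADONADDR)) {
        if (value & 3u) {
            dm->sbcs |= 3u << 12;             /* bad alignment */
        } else if (value >= 16u) {
            dm->sbcs |= 2u << 12;             /* bad address */
        } else {
            dm->sbdata0 = dm->mem[value / 4];
            dm->busy_polls = 2;               /* IDLE -> BUSY */
        }
    }
}

static uint32_t fake_read(fake_dm_t *dm, uint32_t reg)
{
    if (reg == DM_SBDATA0) return dm->sbdata0;
    if (reg != DM_SBCS) return 0;
    if (dm->busy_polls == 0) return dm->sbcs;
    dm->busy_polls--;
    return dm->sbcs | SBCS_SBBUSY;
}

/* Returns 0 on success, the sberror code (1..7) on a bus error, -1 on timeout. */
static int sba_read32(fake_dm_t *dm, uint32_t addr, uint32_t *out)
{
    fake_write(dm, DM_SBADDRESS0, addr);      /* write triggers the bus read */
    for (int i = 0; i < 100; i++) {
        uint32_t sbcs = fake_read(dm, DM_SBCS);
        if (sbcs & SBCS_SBBUSY) continue;     /* BUSY: poll again */
        if (SBCS_SBERROR(sbcs)) return (int)SBCS_SBERROR(sbcs);
        *out = fake_read(dm, DM_SBDATA0);     /* DATA_READY */
        return 0;
    }
    return -1;
}
```

On real hardware, the `fake_read`/`fake_write` calls would be replaced by the probe's memory-access routines. Whether an explicit sbbusy poll is needed in practice depends on how fast the bus read completes relative to the SWD transaction.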
8.E SBA Write Transactions
Memory writes use SBDATA0 as the trigger register:
dap_write_mem32(target, DM_SBADDRESS0, addr); // Set address
dap_write_mem32(target, DM_SBDATA0, value); // Write triggers bus write
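The write to SBDATA0 initiates the system bus write transaction. The debugger should poll SBCS.sbbusy to detect completion and then check sberror; sbbusyerror is set if a new access is started while the previous one is still in progress (though in practice, pipelined writes are often used).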
8.G SBA Error Handling
The SBCS.sberror field reports transaction failures:
0: No error
1: Timeout (bus did not respond)
2: Bad address (unmapped region)
3: Bad alignment (misaligned access)
4: Bad size (unsupported width)
7: Other error
Errors are sticky and must be explicitly cleared by writing 1s to the SBCS.sberror bits (W1C = Write-1-to-Clear). While sberror is non-zero, the Debug Module does not start new system bus accesses.
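As a sketch of the clearing step, the helper below computes a value to write back to SBCS that preserves the read-write configuration bits and sets every W1C error bit. The names are hypothetical, and the choice to also clear sbbusyerror is an assumption of this example.

```c
#include <stdint.h>

#define SBCS_SBERROR_SHIFT 12
#define SBCS_SBERROR_MASK  (0x7u << SBCS_SBERROR_SHIFT)
#define SBCS_SBBUSYERROR   (1u << 22)

/* Human-readable name for the sberror codes listed above. */
static const char *sberror_name(uint32_t sbcs)
{
    switch ((sbcs & SBCS_SBERROR_MASK) >> SBCS_SBERROR_SHIFT) {
    case 0:  return "none";
    case 1:  return "timeout";
    case 2:  return "bad address";
    case 3:  return "bad alignment";
    case 4:  return "bad size";
    case 7:  return "other";
    default: return "reserved";
    }
}

/* Value to write to SBCS: keep sbreadonaddr, sbaccess, sbautoincrement and
 * sbreadondata as they are, and write 1 to every W1C error bit. */
static uint32_t sbcs_clear_errors_value(uint32_t sbcs)
{
    const uint32_t rw_config = (1u << 20) | (0x7u << 17) | (1u << 16) | (1u << 15);
    return (sbcs & rw_config) | SBCS_SBERROR_MASK | SBCS_SBBUSYERROR;
}
```

A typical sequence would read SBCS, log `sberror_name(sbcs)`, then write `sbcs_clear_errors_value(sbcs)` back before retrying the access.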
9. STATE MANAGEMENT AND CACHING
The library maintains comprehensive state tracking to avoid redundant SWD transactions:
9.A Connection State
typedef struct {
    bool connected;
    uint32_t idcode;
    bool resource_registered;
    // ...
} swd_target_t;
9.B DAP State Caching
typedef struct {
    uint8_t current_apsel;
    uint8_t current_bank;
    bool ctrlsel;
    uint32_t select_cache;
    bool powered;
    uint retry_count;
} dap_state_t;
9.C Per-Hart State Tracking
RP2350 contains two RISC-V harts (hardware threads) that execute independently. The library maintains per-hart state to avoid redundant operations and enable concurrent debugging:
typedef struct {
    bool halt_state_known;   // false after resume, true after halt/read status
    bool halted;             // true if hart is currently halted
    // Register cache
    bool cache_valid;        // true if cached values are current
    uint32_t cached_pc;
    uint32_t cached_gprs[32];
    uint64_t cache_timestamp; // For LRU if needed
} hart_state_t;
The top-level RP2350 state maintains an array of hart states:
#define RP2350_NUM_HARTS 2
typedef struct {
    bool initialized;
    bool sba_initialized;
    // Per-hart state
    hart_state_t harts[RP2350_NUM_HARTS];
    // Shared cache configuration
    bool cache_enabled;
} rp2350_state_t;
9.C.1 Halt State Tracking
The halt_state_known flag implements a three-state model:
- Unknown (halt_state_known=false): Hart state is uncertain (after resume or initialization)
- Known Halted (halt_state_known=true, halted=true): Hart is confirmed halted
- Known Running (halt_state_known=true, halted=false): Hart is confirmed running
This prevents expensive DMSTATUS polls when the state is known. State transitions:
                 ┌─────────────┐
                 │   Unknown   │
                 └──────┬──────┘
                        │
          ┌─────────────┼─────────────┐
          │                           │
   halt_request()              read_dmstatus()
          │                           │
          ▼                           ▼
    ┌────────────┐             ┌──────────────┐
    │   Halted   │             │   Running    │
    └─────┬──────┘             └──────┬───────┘
          │                           │
          │         resume()          │
          └─────────────┬─────────────┘
                        │
                        ▼
                   ┌─────────┐
                   │ Unknown │ (state invalidated)
                   └─────────┘
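The transitions in the diagram can be expressed as a handful of small functions. This is a minimal sketch of the three-state model using only the two flags described above; the type and function names are hypothetical and do not mirror the library's internals.

```c
#include <stdbool.h>

/* The two flags from hart_state_t that encode the three-state model. */
typedef struct {
    bool halt_state_known;
    bool halted;
} hart_halt_state_t;

/* A halt request completed: the hart is known to be halted. */
static void on_halt_confirmed(hart_halt_state_t *h)
{
    h->halt_state_known = true;
    h->halted = true;
}

/* DMSTATUS was read: record whatever it reported. */
static void on_dmstatus_read(hart_halt_state_t *h, bool reported_halted)
{
    h->halt_state_known = true;
    h->halted = reported_halted;
}

/* Resume invalidates the state: the hart may hit a breakpoint at any time. */
static void on_resume(hart_halt_state_t *h)
{
    h->halt_state_known = false;
    h->halted = false;
}

/* Only the Unknown state requires an SWD round trip to DMSTATUS. */
static bool needs_dmstatus_poll(const hart_halt_state_t *h)
{
    return !h->halt_state_known;
}
```

An operation that requires a halted hart would first call `needs_dmstatus_poll()` and read DMSTATUS only when it returns true.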
9.C.2 Register Caching
When cache_enabled=true, the library caches register values after reads. This optimization benefits:
- Repeated reads of the same register (e.g., polling loop variables)
- Bulk register dumps where rp2350_read_all_regs() populates the cache
- Reduced SWD traffic (each register read requires ~6 SWD transactions)
Cache invalidation occurs on:
- Hart resume (execution changes registers)
- Register write (specific register invalidated)
- Hart halt request (conservative invalidation)
The cache is per-hart, allowing concurrent debugging of both harts without interference.
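The invalidation rules can be illustrated with a per-register validity bitmask. Note that this is an assumption made for the sketch: the hart_state_t shown earlier has a single cache_valid flag, and the library's actual bookkeeping may differ. All names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-hart GPR cache with one validity bit per register. */
typedef struct {
    uint32_t valid_mask;  /* bit n set: gprs[n] holds a current value */
    uint32_t gprs[32];
} gpr_cache_t;

/* Returns true on a hit and stores the cached value in *out. */
static bool gpr_cache_lookup(const gpr_cache_t *c, unsigned reg, uint32_t *out)
{
    if (reg >= 32 || !(c->valid_mask & (1u << reg))) {
        return false;  /* miss: caller must read the register over SWD */
    }
    *out = c->gprs[reg];
    return true;
}

/* Record a value that was just read from the target. */
static void gpr_cache_store(gpr_cache_t *c, unsigned reg, uint32_t value)
{
    if (reg >= 32) return;
    c->gprs[reg] = value;
    c->valid_mask |= 1u << reg;
}

/* Register write: only that register becomes stale. */
static void gpr_cache_invalidate_reg(gpr_cache_t *c, unsigned reg)
{
    if (reg < 32) c->valid_mask &= ~(1u << reg);
}

/* Hart resume or halt request: everything becomes stale. */
static void gpr_cache_invalidate_all(gpr_cache_t *c)
{
    c->valid_mask = 0;
}
```

Keeping one such structure per hart gives the isolation described above: invalidating hart 0's cache leaves hart 1's entries untouched.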
10. RESOURCE MANAGEMENT
PIO resources are scarce: RP2040 provides 2 PIO blocks and RP2350 provides 3, with 4 state machines each. The library implements a global resource tracker for multi-target support.
10.A Global Resource Tracking
typedef struct {
    swd_target_t *pio0_sm_owners[4];
    swd_target_t *pio1_sm_owners[4];
    uint active_count;
} resource_tracker_t;
extern resource_tracker_t g_resources;
10.B Automatic Allocation
When SWD_PIO_AUTO or SWD_SM_AUTO is specified in configuration, the library scans for free resources (swd.c:105-125):
swd_error_t allocate_pio_sm(PIO *pio, uint *sm) {
    for (uint i = 0; i < 4; i++) {
        if (g_resources.pio0_sm_owners[i] == NULL) {
            *pio = pio0;
            *sm = i;
            return SWD_OK;
        }
    }
    // Try PIO1...
}
Up to 8 simultaneous target connections are supported (limited by hardware resources).
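The scan-for-free-slot pattern can be tried on a host machine with a simplified tracker. The sketch below replaces the Pico SDK `PIO` type with plain indices and uses hypothetical names (`sm_tracker_t`, `sm_allocate`, `sm_release`); it illustrates the allocation order and the 8-slot limit, not the library's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_PIO_BLOCKS 2  /* the tracker above covers PIO0 and PIO1 */
#define SM_PER_PIO     4

typedef struct {
    const void *owners[NUM_PIO_BLOCKS][SM_PER_PIO];  /* NULL = free */
    unsigned active_count;
} sm_tracker_t;

/* Claim the first free state machine, scanning PIO0 before PIO1. */
static bool sm_allocate(sm_tracker_t *t, const void *owner,
                        unsigned *pio, unsigned *sm)
{
    for (unsigned p = 0; p < NUM_PIO_BLOCKS; p++) {
        for (unsigned s = 0; s < SM_PER_PIO; s++) {
            if (t->owners[p][s] == NULL) {
                t->owners[p][s] = owner;
                t->active_count++;
                *pio = p;
                *sm = s;
                return true;
            }
        }
    }
    return false;  /* all 8 state machines are in use */
}

/* Return a state machine to the pool. */
static void sm_release(sm_tracker_t *t, unsigned pio, unsigned sm)
{
    if (pio < NUM_PIO_BLOCKS && sm < SM_PER_PIO && t->owners[pio][sm] != NULL) {
        t->owners[pio][sm] = NULL;
        t->active_count--;
    }
}
```

Eight allocations succeed and the ninth fails, which is where the limit of 8 simultaneous targets comes from.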
11. ERROR HANDLING AND RECOVERY
The library provides comprehensive error reporting through enumerated error codes and detailed message strings.
11.A Error Code Taxonomy
typedef enum {
    SWD_OK = 0,
    SWD_ERROR_TIMEOUT,        // Transaction timeout
    SWD_ERROR_FAULT,          // Target FAULT response
    SWD_ERROR_PROTOCOL,       // Malformed packet
    SWD_ERROR_PARITY,         // Parity check failure
    SWD_ERROR_WAIT,           // WAIT response retry exhausted
    SWD_ERROR_NOT_CONNECTED,  // No active connection
    SWD_ERROR_NOT_HALTED,     // Operation requires halted hart
    SWD_ERROR_ALREADY_HALTED, // Hart already halted (informational)
    // ...
} swd_error_t;
11.B Error Detail Buffer
Each target maintains a 128-byte error detail buffer for formatted diagnostic messages (swd.c:67-84):
void swd_set_error(swd_target_t *target, swd_error_t error,
                   const char *detail, ...) {
    target->last_error = error;
    va_list args;
    va_start(args, detail);
    vsnprintf(target->error_detail, sizeof(target->error_detail),
              detail, args);
    va_end(args);
}
11.C ACK Response Mapping
SWD protocol ACK responses are mapped to error codes (swd.c:91-99):
swd_error_t swd_ack_to_error(uint8_t ack) {
    switch (ack) {
        case 0x1: return SWD_OK;          // OK
        case 0x2: return SWD_ERROR_WAIT;  // WAIT
        case 0x4: return SWD_ERROR_FAULT; // FAULT
        default:  return SWD_ERROR_PROTOCOL;
    }
}
11.D Retry Mechanism
WAIT responses trigger automatic retry with a fixed 100 µs delay between attempts (swd_protocol.c:197-208):
for (uint retry = 0; retry < target->dap.retry_count; retry++) {
    err = swd_io_raw(target, request, value, false);
    if (err != SWD_ERROR_WAIT) break;
    sleep_us(100);
}
Default retry count is 5, configurable via swd_config_t.
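The retry behaviour can be reproduced on a host machine by swapping the SWD transfer and the delay for stubs. Everything in the sketch below (`demo_error_t`, `retry_on_wait`, `stub_xfer`, `stub_delay`) is hypothetical scaffolding for illustration; only the loop shape follows the excerpt above.

```c
#include <stdint.h>

typedef enum { DEMO_OK = 0, DEMO_ERROR_WAIT, DEMO_ERROR_FAULT } demo_error_t;

typedef demo_error_t (*demo_xfer_fn)(void *ctx);
typedef void (*demo_delay_fn)(uint32_t us);

/* Repeat the transfer while the target answers WAIT, up to retry_count times. */
static demo_error_t retry_on_wait(demo_xfer_fn xfer, void *ctx,
                                  unsigned retry_count,
                                  demo_delay_fn delay_us, unsigned *attempts)
{
    demo_error_t err = DEMO_ERROR_WAIT;  /* result if retry_count is 0 */
    *attempts = 0;
    for (unsigned retry = 0; retry < retry_count; retry++) {
        err = xfer(ctx);
        (*attempts)++;
        if (err != DEMO_ERROR_WAIT) break;  /* OK or a hard error: stop */
        delay_us(100);                      /* fixed pause before retrying */
    }
    return err;
}

/* Stub transport: answers WAIT until *remaining reaches zero, then OK. */
static demo_error_t stub_xfer(void *ctx)
{
    unsigned *remaining = (unsigned *)ctx;
    if (*remaining == 0) return DEMO_OK;
    (*remaining)--;
    return DEMO_ERROR_WAIT;
}

/* Stub delay: a host-side test does not need to sleep. */
static void stub_delay(uint32_t us)
{
    (void)us;
}
```

With a retry count of 5, a target that answers WAIT twice succeeds on the third attempt, while a target that keeps answering WAIT makes the call return the WAIT error after five attempts.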
12. API USAGE
12.A Target Creation and Connection
swd_config_t config = swd_config_default();
config.pin_swclk = 2;
config.pin_swdio = 3;
config.freq_khz = 1000;
config.enable_caching = true;
swd_target_t *target = swd_target_create(&config);
swd_connect(targ