Overview
This project demonstrates how to build a FreeRTOS application for the RPU (Real-time Processing Unit) on the KR260 that controls LEDs in real-time, with four different communication methods from the APU (Application Processing Unit). This is a follow-up to the Vivado LED Control via PYNQ on KR260 project, which showed how to control LEDs directly from Linux using PYNQ.
This project showcases the full heterogeneous computing capabilities of the KR260 by combining:
- APU: Linux for high-level control and system management
- RPU: FreeRTOS for real-time, deterministic LED control
- PL: Custom AXI GPIO hardware interface
The system supports four distinct communication modes, each demonstrating different aspects of APU-RPU interaction:
- Independent Mode: RPU manages LED modes autonomously using a 10-second timer
- Legacy Shared Memory Mode: APU controls LEDs via simple memory-mapped interface
- IPI with Acknowledgment Mode: APU uses Inter-Processor Interrupts with reliable message delivery
- Kernel Module Mode: APU kernel driver provides sysfs interface for system integration
Problem Statement
When developing real-time applications on the KR260, you need to:
- Separate concerns: High-level control (APU) vs. real-time tasks (RPU)
- Ensure determinism: Real-time LED control requires predictable timing
- Enable communication: APU must reliably send commands to RPU
- Support multiple interfaces: Different use cases require different communication methods
- Handle cache coherency: Shared memory between processors requires careful management
Traditional approaches have limitations:
- Direct PL control from APU: Not deterministic, subject to Linux scheduling delays
- Simple polling: Inefficient and unreliable
- No acknowledgment: Cannot verify command delivery
- Cache issues: Shared memory can have stale data without proper coherency
This project solves these challenges by:
- Using FreeRTOS on RPU for deterministic real-time control
- Implementing multiple communication methods
- Handling cache coherency properly
- Providing acknowledgment mechanisms for reliable communication
- Supporting both user-space and kernel-space interfaces
System Architecture
The complete system consists of three processing domains working together:
Hardware Architecture
The PL design (from the previous project) provides:
- Zynq UltraScale+ PS: Processing System with APU and RPU
- AXI SmartConnect: Connects PS to PL peripherals
- AXI GPIO IP: 2-bit output for LED control
- LED Connections:
  - LED0 (DS8) - Pin E8 (LVCMOS18)
  - LED1 (DS7) - Pin F8 (LVCMOS18)
Software Architecture
RPU FreeRTOS Application
The RPU firmware implements a multi-task FreeRTOS application:
Tasks:
- Tx Task: Generates LED patterns based on current mode (SLOW/FAST/RANDOM)
- Rx Task: Writes LED values to AXI GPIO hardware
- Timer Callback: Rotates modes every 10 seconds (when not overridden)
- IPI Handler: Receives commands from APU via interrupt
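The pattern logic of the Tx task can be sketched as plain C that runs on a host machine. This is only an illustration, not the project's firmware: the names `blink_mode_t`, `mode_interval_ms`, and `next_led_value` are hypothetical, the intervals are the ones listed for Mode 1 later in this article, and both the "toggle both LEDs together" pattern and the small pseudo-random generator are assumptions.

```c
#include <stdint.h>

/* Hypothetical mode identifiers; the numeric values follow the
 * 0 = SLOW, 1 = FAST, 2 = RANDOM commands accepted by the APU tools. */
typedef enum { MODE_SLOW = 0, MODE_FAST = 1, MODE_RANDOM = 2 } blink_mode_t;

#define LED_MASK 0x3u /* the AXI GPIO is 2 bits wide: LED0 and LED1 */

/* Delay between LED updates for each mode, in milliseconds. */
uint32_t mode_interval_ms(blink_mode_t mode)
{
    return (mode == MODE_SLOW) ? 1000u : 200u;
}

/* Compute the next 2-bit LED value from the previous one.
 * SLOW and FAST invert both LEDs (assumed pattern); RANDOM draws from a
 * linear congruential generator whose state is owned by the caller. */
uint32_t next_led_value(blink_mode_t mode, uint32_t prev, uint32_t *rng_state)
{
    if (mode == MODE_RANDOM) {
        *rng_state = (*rng_state * 1664525u) + 1013904223u;
        return (*rng_state >> 16) & LED_MASK;
    }
    return (~prev) & LED_MASK;
}
```

In a FreeRTOS task, the value returned by `next_led_value` would be what the Tx task places on the queue, and `mode_interval_ms` would feed the task delay.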
APU Applications
The APU provides multiple interfaces for controlling the RPU:
- Legacy Shared Memory App (apu_app): Simple memory-mapped control
- IPI Application (ipi_app): IPI-based communication with acknowledgment
- Kernel Module (rpu_ipi.ko): Sysfs interface for system integration
- Firmware Loader (fw_loader): Utility for loading PL and RPU firmware
Design Components
Communication Protocols
Mode 1: Independent Mode
- RPU operates autonomously
- Timer rotates modes every 10 seconds
- No APU interaction required
Mode 2: Legacy Shared Memory
- APU writes mode to 0x40000000
- RPU polls during timer callback
- Simple but no acknowledgment
Mode 3: IPI with Acknowledgment
- APU writes command to 0xFF990000 + 0x00
- APU triggers IPI interrupt
- RPU reads command, updates mode
- RPU writes ACK to 0xFF990000 + 0x04
- APU polls for acknowledgment
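The APU side of this handshake can be sketched as two small functions that operate on already-mapped pointers. This is a minimal sketch under stated assumptions, not the code of ipi_app: `post_command` and `wait_for_ack` are hypothetical names, `ACK_MAGIC` is taken from the sample output shown in Step 11 and the real acknowledgment encoding may differ, and the poll budget is a simple loop count rather than a timed wait.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_OFFSET    0x00u       /* command word, per the protocol above */
#define ACK_OFFSET    0x04u       /* acknowledgment word, per the protocol above */
#define IPI_MASK_RPU0 0x100u      /* trigger mask reported by ipi_app for RPU0 */
#define ACK_MAGIC     0xDEADBEF0u /* ASSUMPTION: value seen in the sample output */

/* Publish a command and raise the interrupt.
 * shared   : base of the mapped shared-memory window
 * ipi_trig : mapped IPI trigger register */
void post_command(volatile uint32_t *shared, volatile uint32_t *ipi_trig,
                  uint32_t mode)
{
    shared[ACK_OFFSET / sizeof(uint32_t)] = 0u;   /* drop any stale ACK */
    shared[CMD_OFFSET / sizeof(uint32_t)] = mode; /* write the command */
    __sync_synchronize();      /* command must be visible before the trigger */
    *ipi_trig = IPI_MASK_RPU0; /* trigger the IPI */
}

/* Poll the ACK word up to max_polls times; true if the RPU acknowledged. */
bool wait_for_ack(const volatile uint32_t *shared, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (shared[ACK_OFFSET / sizeof(uint32_t)] == ACK_MAGIC) {
            return true;
        }
    }
    return false;
}
```

Clearing the ACK word before writing the command means a leftover acknowledgment from an earlier command cannot be mistaken for a fresh one.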
Mode 4: Kernel Module
- Same as Mode 3, but via kernel driver
- Sysfs interface: /sys/kernel/rpu_ipi/write
- Status interface: /sys/kernel/rpu_ipi/status
- Handles cache coherency automatically
Prerequisites
Before starting this project, you should have:
- Completed the previous project: Vivado LED Control via PYNQ on KR260
  - PL bitstream (gpio_led.bit and gpio_led.xsa)
  - Hardware description file (gpio_led.hwh)
  - Working PYNQ environment
- Development Environment:
  - Xilinx Vitis Unified IDE (2025.2 or later)
  - Vivado Design Suite (for PL modifications if needed)
  - Cross-compilation toolchain for ARM64
  - Linux development machine (Ubuntu recommended)
- KR260 Setup:
  - KR260 board with network boot configured
  - Access to Jupyter notebook server
  - SSH access to the board
  - Root privileges for firmware loading
- Knowledge:
  - Basic understanding of FreeRTOS
  - Familiarity with embedded C programming
  - Understanding of memory-mapped I/O
  - Basic Linux kernel module development (for Mode 4)
Previous Project Setup
This project builds upon the Vivado LED Control project. Ensure you have:
- Generated gpio_led.xsa file from Vivado
- PL bitstream (gpio_led.bit) ready for loading
- Hardware description file (gpio_led.hwh) for reference
Step-by-Step Guide
Step 1: Create Vitis Workspace
- Launch Vitis Unified IDE
  - Open Vitis from your Xilinx installation
  - Create a new workspace (e.g., ~/Documents/gpio_led/RPU)
- Set Workspace Location
  - Choose a dedicated directory for your project
  - Click "Launch" to open the workspace
Step 2: Create Platform Project
- Create New Platform Project
  - Welcome Page → Embedded Development → New Platform Component
  - Project name: platform
  - Click "Next"
- Import Hardware Specification
  - Select "Create from hardware specification (XSA)"
  - Browse to your gpio_led.xsa file (from previous project)
  - Click "Next"
- Configure Platform
  - Platform name: platform
  - Processor: psu_cortexr5_0 (RPU)
  - OS: freertos10_xilinx
  - Click "Finish"
- Build Platform
  - Right-click platform project → "Build Project"
  - Wait for platform build to complete (5-10 minutes)
Step 3: Create Application Project
- Create New Application Project
  - Welcome Page → Embedded Development → New Application Component
  - Set the "Component Name" to gpio_app
  - Click "Next"
- Select Platform
  - Choose the platform created in Step 2
  - Click "Next"
- Application Project Settings (Domain)
  - Application project name: gpio_app
  - Target processor: psu_cortexr5_0
  - Domain: freertos_psu_cortexr5_0
  - Click "Next"
- Select Template
  - Choose "Empty Application"
  - Click "Finish"
Step 4: Add Source Files
- Create Source File
  - Right-click gpio_app/src → New → Source File
  - File name: main.c
  - Click "Finish"
- Copy Application Code
  - Copy the FreeRTOS application code from [RPU/gpio_app/src/main.c](https://github.com/wstanislaus/Xilinx_KR260_Projects/blob/main/gpio_led/RPU/gpio_app/src/main.c)
  - Paste into the new main.c file in Vitis
- Verify Includes
  - Ensure all Xilinx headers are available
  - Check that FreeRTOS includes are correct
Step 5: Configure Build Settings
- Open Project Properties
  - Right-click gpio_app → Properties
  - Navigate to C/C++ Build → Settings
- Compiler Flags
  - Ensure optimization is set appropriately (-O2 recommended)
  - Add any required preprocessor definitions:
    - LEGACY_MODE=1 (for Mode 2 support)
    - IPI_MODE=1 (for Mode 3 and 4 support)
- Linker Settings
  - Verify linker script is correct
  - Check memory regions match your platform
Step 6: Build Application
- Build Project
  - Right-click gpio_app → "Build Project"
  - Wait for compilation to complete
- Verify Output
  - Check for gpio_app.elf in the Debug or Release folder
  - Review build log for any warnings or errors
Step 7: Prepare Deployment File
- Copy Firmware to NFS Share

    # On your NFS server
    cp gpio_app.elf /nfsroot/lib/firmware/
    cp gpio_led.bit /nfsroot/lib/firmware/
    chmod 644 /nfsroot/lib/firmware/gpio_app.elf
    chmod 644 /nfsroot/lib/firmware/gpio_led.bit

- Build APU Applications

    cd APU/apu_app
    # Make sure you have installed the Yocto Linux SDK toolchain and
    # sourced its environment setup file, for example:
    source /tools/kr260_yocto_toolchain/environment-setup-cortexa72-cortexa53-oe-linux
    make
    # This creates: apu_app, ipi_app, fw_loader

- Copy APU Applications

    cp apu_app ipi_app fw_loader /nfsroot/usr/local/bin/
    chmod +x /nfsroot/usr/local/bin/*
Step 8: Load Firmware on KR260
- SSH to KR260

    ssh xilinx@172.20.1.1

- Load PL Bitstream and RPU Firmware

    sudo ./fw_loader gpio_led.bit gpio_app.elf

  The fw_loader utility will:
  - Convert .bit to .bin format if needed
  - Load PL bitstream via FPGA manager
  - Stop RPU if running
  - Load RPU firmware
  - Start RPU
- Verify RPU is Running

    cat /sys/class/remoteproc/remoteproc0/state
    # Should show: running
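The same check can be done from a program, for example before an application starts sending commands. The sketch below reads a remoteproc state file and compares it with "running". The function name `rpu_is_running` is hypothetical, and the path is passed in as a parameter so the function can be tried against an ordinary file on a development machine.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if the state file at `path` contains the word "running".
 * On the KR260 the path would be /sys/class/remoteproc/remoteproc0/state. */
bool rpu_is_running(const char *path)
{
    char state[32] = {0};
    FILE *f = fopen(path, "r");

    if (f == NULL) {
        return false; /* file missing or not readable */
    }
    if (fgets(state, sizeof state, f) == NULL) {
        fclose(f);
        return false; /* empty file */
    }
    fclose(f);

    state[strcspn(state, "\r\n")] = '\0'; /* strip the trailing newline */
    return strcmp(state, "running") == 0;
}
```

A caller might use it as `if (!rpu_is_running("/sys/class/remoteproc/remoteproc0/state")) { /* reload firmware */ }`.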
Step 9: Test Mode 1 - Independent Mode
Mode 1 demonstrates the RPU operating autonomously without APU control.
- Observe LED Behavior
  - LEDs will automatically cycle through modes:
    - 0-10 seconds: SLOW mode (1s toggle)
    - 10-20 seconds: FAST mode (200ms toggle)
    - 20-30 seconds: RANDOM mode (200ms random)
    - 30+ seconds: Repeats cycle
- Monitor RPU Serial Output (optional)
  - Connect to RPU UART to see debug messages:
    - "Timer: Switching to FAST mode"
    - "Timer: Switching to RANDOM mode"
    - "Timer: Switching to SLOW mode"
Expected Behavior:
- LEDs blink autonomously
- Mode changes every 10 seconds
- No APU interaction required
Step 10: Test Mode 2 - Legacy Shared Memory
Mode 2 uses simple memory-mapped communication without interrupts.
- Run Legacy Application

    sudo ./apu_app 1    # Set to FAST mode

- Observe LED Behavior
  - LEDs should switch to FAST mode the next time the RPU timer callback checks shared memory
  - Mode persists until changed or timer overrides
- Test Different Modes

    sudo ./apu_app 0    # SLOW mode
    sudo ./apu_app 2    # RANDOM mode
    sudo ./apu_app 3    # Release control (timer resumes)
Expected Behavior:
- Mode change takes effect at the next timer callback after the command is sent
- Mode persists until next command or timer override
- No acknowledgment (fire-and-forget)
Limitations:
- RPU only checks shared memory during timer callback (every 10s)
- No immediate response guarantee
- No confirmation of command delivery
Step 11: Test Mode 3 - IPI with Acknowledgment
Mode 3 uses Inter-Processor Interrupts for immediate, reliable communication.
- Run IPI Application

    sudo ./ipi_app 1    # Set to FAST mode

- Observe Output

    Written mode 1 to shared memory at 0xff990000
    Triggering IPI to RPU0 (Mask 0x100)...
    Waiting for RPU acknowledgment...
    RPU acknowledged! Mode 1 processed successfully.
    --- Status ---
    Shared Mem CMD: 1
    Shared Mem ACK: 0xdeadbef0
    APU IPI OBS (0xFF300004): 0x0 -> Ch1 (RPU0) IDLE

- Observe LED Behavior
  - LEDs should immediately switch to FAST mode
  - Change is instant (not waiting for timer)
- Test Different Modes

    sudo ./ipi_app 0    # SLOW mode
    sudo ./ipi_app 2    # RANDOM mode
    sudo ./ipi_app 3    # Release control
Expected Behavior:
- Immediate mode change (interrupt-driven)
- Acknowledgment received within milliseconds
- Reliable message delivery
- APU override active (timer paused)
Advantages over Mode 2:
- Immediate response (interrupt-driven)
- Acknowledgment confirms delivery
- Cache coherency handled properly
- APU override prevents timer interference
Step 12: Build and Load Kernel Module (Mode 4)
Mode 4 provides a kernel-space interface for system integration.
- Build Kernel Module

    cd APU/kernel_module
    # Ensure you have kernel headers for your KR260 kernel
    make
    # This creates: rpu_ipi.ko

- Copy Module to KR260

    cp rpu_ipi.ko /nfsroot/lib/modules/

- Load Module on KR260

    sudo insmod rpu_ipi.ko

- Verify Module Loaded

    lsmod | grep rpu_ipi
    dmesg | tail
    # Should show: "rpu_ipi: Module loaded successfully"

- Check Sysfs Interface

    ls -l /sys/kernel/rpu_ipi/
    # Should show: write, status
Step 13: Test Mode 4 - Kernel Module Interface
- Send Mode Command

    echo 1 > /sys/kernel/rpu_ipi/write

- Check Status

    cat /sys/kernel/rpu_ipi/status
    # Should show: 1,ACK

- Test Different Modes

    echo 0 > /sys/kernel/rpu_ipi/write    # SLOW
    cat /sys/kernel/rpu_ipi/status
    echo 2 > /sys/kernel/rpu_ipi/write    # RANDOM
    cat /sys/kernel/rpu_ipi/status
    echo 3 > /sys/kernel/rpu_ipi/write    # Release
    cat /sys/kernel/rpu_ipi/status

- Observe LED Behavior
  - LEDs respond immediately to commands
  - Status shows acknowledgment
Expected Behavior:
- Same functionality as Mode 3
- Sysfs interface for system integration
- Can be used by shell scripts, systemd services, etc.
- Proper kernel-space cache handling
Advantages:
- System-level integration
- Thread-safe access
- Proper cache coherency (kernel handles it)
- Can be used by other kernel modules
- Persistent interface (survives across applications)
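Because the interface is a pair of text files, a user-space program can drive it with ordinary file I/O. The following is a rough sketch rather than part of the project: `write_mode` and `parse_status` are hypothetical helpers, the paths are passed in as parameters, and the only status format assumed is the "mode,ACK" line shown in Step 13. Any tag other than "ACK" is treated as "not acknowledged", which is a guess about the driver's behavior.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Write a mode number to the sysfs "write" attribute at `path`. */
bool write_mode(const char *path, int mode)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        return false;
    }
    bool ok = fprintf(f, "%d\n", mode) > 0;
    /* fclose flushes the buffer, so its result matters too */
    return (fclose(f) == 0) && ok;
}

/* Parse a status line such as "1,ACK".
 * Returns false if the line does not match "<int>,<tag>". */
bool parse_status(const char *line, int *mode, bool *acked)
{
    char tag[16] = {0};
    if (sscanf(line, "%d,%15s", mode, tag) != 2) {
        return false;
    }
    *acked = (strcmp(tag, "ACK") == 0);
    return true;
}
```

On the board, `write_mode("/sys/kernel/rpu_ipi/write", 1)` followed by reading one line from /sys/kernel/rpu_ipi/status and passing it to `parse_status` would mirror the echo and cat commands above.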
Understanding the Code
RPU FreeRTOS Application Structure
Task Architecture
The RPU application uses a producer-consumer pattern with FreeRTOS queues:
    // Tx Task: Generates LED patterns
    static void prvTxTask(void *pvParameters) {
        // Generates patterns based on current_blink_mode
        // Sends to queue
    }

    // Rx Task: Writes to hardware
    static void prvRxTask(void *pvParameters) {
        // Receives from queue
        // Writes to AXI GPIO at 0x80000000
    }
Timer Callback
The timer callback manages mode rotation when APU override is not active:
    static void vTimerCallback(TimerHandle_t pxTimer) {
        if (!apu_override_active) {
            // Rotate: SLOW -> FAST -> RANDOM -> SLOW
        }
    }
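The rotation itself is a small state machine, which can be written as a pure function and checked on a host. The sketch below is illustrative only: `blink_mode_t` and `rotate_mode` are hypothetical names, and the numeric mode values are assumed to match the 0/1/2 commands used by the APU tools.

```c
#include <stdbool.h>

/* Hypothetical mode identifiers in rotation order. */
typedef enum { MODE_SLOW = 0, MODE_FAST = 1, MODE_RANDOM = 2, MODE_COUNT = 3 } blink_mode_t;

/* Return the mode the timer callback should switch to.
 * While the APU override is active the current mode is kept. */
blink_mode_t rotate_mode(blink_mode_t current, bool apu_override_active)
{
    if (apu_override_active) {
        return current;
    }
    return (blink_mode_t)(((int)current + 1) % MODE_COUNT);
}
```

Keeping the decision in a function without side effects means the callback body reduces to one assignment, such as `current_blink_mode = rotate_mode(current_blink_mode, apu_override_active);`.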
IPI Interrupt Handler
The IPI handler follows the OpenAMP/libmetal pattern:
    static void IPI_Handler(void *CallbackRef) {
        // 1. Read ISR to check interrupt source
        // 2. Clear interrupt
        // 3. Invalidate cache for shared memory
        // 4. Read command from shared memory
        // 5. Update mode and set override flag
        // 6. Write acknowledgment
        // 7. Flush cache
    }
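The seven steps can be modeled on a host with plain memory standing in for the IPI status register and the shared mailbox. This is a sketch under several assumptions, not the firmware's handler: `ipi_model_t`, `led_state_t`, and `ipi_handler_model` are hypothetical names, the APU source bit `0x1` and the `ACK_MAGIC` value (taken from the sample ipi_app output) should be checked against the real code, and unknown commands are simply left unacknowledged here.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPI_SRC_APU 0x1u        /* ASSUMPTION: ISR bit set when the APU triggers */
#define ACK_MAGIC   0xDEADBEF0u /* ASSUMPTION: value from the sample output */
#define CMD_RELEASE 3u          /* command 3 releases APU control */

/* Plain-memory stand-ins for the hardware the handler touches. */
typedef struct {
    uint32_t isr; /* IPI interrupt status register */
    uint32_t cmd; /* shared memory + 0x00 */
    uint32_t ack; /* shared memory + 0x04 */
} ipi_model_t;

typedef struct {
    uint32_t mode;     /* 0 = SLOW, 1 = FAST, 2 = RANDOM */
    bool apu_override; /* true pauses the timer rotation */
} led_state_t;

/* Returns true if a valid command was processed and acknowledged. */
bool ipi_handler_model(ipi_model_t *hw, led_state_t *st)
{
    if ((hw->isr & IPI_SRC_APU) == 0u) {
        return false;            /* 1. no APU bit set: treat as spurious */
    }
    hw->isr &= ~IPI_SRC_APU;     /* 2. clear the interrupt */
    /* 3. on target: invalidate the cache for the mailbox here */
    uint32_t cmd = hw->cmd;      /* 4. read the command */
    if (cmd < CMD_RELEASE) {     /* 5. update mode and override flag */
        st->mode = cmd;
        st->apu_override = true;
    } else if (cmd == CMD_RELEASE) {
        st->apu_override = false;
    } else {
        return false;            /* unknown command: no acknowledgment */
    }
    hw->ack = ACK_MAGIC;         /* 6. write the acknowledgment */
    /* 7. on target: flush the cache for the ACK word here */
    return true;
}
```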
APU Applications
Legacy Shared Memory ( apu_app )
Simple memory-mapped write:
    // Map shared memory
    void *mapped = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0x40000000);

    // Write mode
    *(volatile uint32_t *)mapped = mode;
IPI Application ( ipi_app )
Full IPI communication with acknowledgment:
    // 1. Write command to shared memory
    *shared_cmd = mode;
    __sync_synchronize();  // Memory barrier

    // 2. Trigger IPI
    *ipi_trig = MASK_CH1_RPU0;

    // 3. Poll for acknowledgment
    while (timeout) {
        ack = *shared_ack;
        if (ack contains magic value) break;
    }
Kernel Module ( rpu_ipi.c )
Kernel-space implementation with proper cache handling:
    // Non-cached memory mapping
    shared_mem_base = ioremap_prot(addr, size, pgprot_noncached(...));

    // Send message with proper barriers
    wmb();  // Write memory barrier
    iowrite32(mask, ipi_base + TRIG_OFFSET);
    wmb();
Memory Protection Unit (MPU) Configuration
The RPU must configure the MPU to access different memory regions:
    // AXI GPIO (PL) - Strongly-ordered, shared
    Xil_SetTlbAttributes(0x80000000, STRONG_ORDERD_SHARED | PRIV_RW_USER_RW);

    // Shared Memory - Normal, shared, non-cacheable
    Xil_SetTlbAttributes(0xFF990000, NORM_SHARED_NCACHE | PRIV_RW_USER_RW);

    // IPI Registers - Strongly-ordered, shared
    Xil_SetTlbAttributes(0xFF310000, STRONG_ORDERD_SHARED | PRIV_RW_USER_RW);
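One constraint worth keeping in mind when choosing these regions: the Cortex-R5 MPU only accepts regions whose size is a power of two of at least 32 bytes, with the base address aligned to that size. The host-side check below illustrates the rule. The function name `mpu_region_valid` is hypothetical, and the article does not state which region sizes the firmware uses, so any sizes passed to it are for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Check whether (base, size) is a legal Cortex-R5 MPU region:
 * size is a power of two between 32 bytes and 4 GB, and the base
 * address is a multiple of the size. */
bool mpu_region_valid(uint32_t base, uint64_t size)
{
    if (size < 32u || size > (1ull << 32)) {
        return false; /* outside the supported size range */
    }
    if ((size & (size - 1u)) != 0u) {
        return false; /* not a power of two */
    }
    return (base & (uint32_t)(size - 1u)) == 0u; /* base aligned to size */
}
```

For example, a 4 KB region at 0xFF990000 passes the check, while a 1 MB region at the same base does not, because 0xFF990000 is not 1 MB aligned.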
Cache Coherency
Shared memory between APU and RPU requires careful cache management:
RPU Side:
    // Before reading: Invalidate cache
    Xil_DCacheInvalidateRange(SHARED_MEM_ADDR, 32);
    cmd = Xil_In32(SHARED_MEM_ADDR);

    // After writing: Flush cache
    Xil_Out32(SHARED_MEM_ADDR + ACK_OFFSET, ack);
    Xil_DCacheFlushRange(SHARED_MEM_ADDR + ACK_OFFSET, 4);
APU Side (User-space):
    // Use memory barriers
    __sync_synchronize();
APU Side (Kernel):
    // Use non-cached mappings
    ioremap_prot(addr, size, pgprot_noncached(...));

    // Use memory barriers
    wmb(); rmb();
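Cache maintenance works on whole cache lines, which on the Cortex-R5 are 32 bytes long. That is why the RPU code above invalidates 32 bytes rather than 4, and it also means the command word and the ACK word, 4 bytes apart, share one line. The helper below sketches how a byte range expands to the lines it touches. `cache_align_range` is a hypothetical name and the code runs on a host, so it performs no cache operation itself.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u /* Cortex-R5 L1 cache line size in bytes */

/* Expand [addr, addr + len) to the smallest range made of whole
 * cache lines and return it through start / aligned_len. */
void cache_align_range(uintptr_t addr, size_t len,
                       uintptr_t *start, size_t *aligned_len)
{
    const uintptr_t mask = (uintptr_t)CACHE_LINE - 1u;
    uintptr_t end = (addr + len + mask) & ~mask; /* round the end up */

    *start = addr & ~mask;                       /* round the start down */
    *aligned_len = (size_t)(end - *start);
}
```

A 4-byte flush at 0xFF990004 therefore covers the line starting at 0xFF990000, including the command word at offset 0x00.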
Design Considerations
Communication Method Selection
Real-Time Considerations
RPU Advantages:
- Deterministic task scheduling (FreeRTOS)
- No Linux scheduling delays
- Predictable interrupt latency
- Direct hardware access
Timing Characteristics:
- Tx Task: Generates patterns at mode-specific intervals
- Rx Task: Higher priority ensures immediate hardware update
- IPI Handler: Interrupt-driven, <100μs response time
- Timer Callback: 10-second intervals for mode rotation
Memory Map Design
Address Selection:
- 0x80000000: LPD (Low Power Domain) address space for GPIO
- 0xFF990000: OCM (On-Chip Memory) or DDR for shared memory
- 0xFF300000/0xFF310000: IPI register space (PS address range)
Why These Addresses?
- LPD suitable for low-bandwidth peripherals (GPIO)
- OCM provides low-latency shared memory
- IPI registers in PS address space for direct access
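Collecting these addresses in one place keeps the RPU and APU code in agreement. The fragment below is one possible layout, not the project's actual header: the macro names and `reg_addr` are hypothetical, the assignment of 0xFF300000 to the APU channel and 0xFF310000 to the RPU0 channel is inferred from the ipi_app output and the MPU setup, and the trigger offset of 0x00 is an assumption to verify against the device's register reference.

```c
#include <stdint.h>

/* Base addresses stated in this article. */
#define AXI_GPIO_BASE   0x80000000u /* LED GPIO in the PL */
#define SHARED_MEM_BASE 0xFF990000u /* command / ACK mailbox */
#define APU_IPI_BASE    0xFF300000u /* IPI channel used by the APU (inferred) */
#define RPU0_IPI_BASE   0xFF310000u /* IPI channel used by RPU0 (inferred) */

/* Offsets: CMD, ACK and OBS appear in the article; TRIG is an ASSUMPTION. */
#define SHM_CMD_OFFSET  0x00u
#define SHM_ACK_OFFSET  0x04u
#define IPI_TRIG_OFFSET 0x00u
#define IPI_OBS_OFFSET  0x04u

/* Absolute address of a register from its block base and offset. */
uint32_t reg_addr(uint32_t base, uint32_t offset)
{
    return base + offset;
}
```

With these definitions, `reg_addr(APU_IPI_BASE, IPI_OBS_OFFSET)` gives 0xFF300004, the observation register address printed by ipi_app.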
Interrupt Configuration
IPI Interrupt Setup:
- GIC SPI 33 → Interrupt ID 65
- Level-sensitive, high-triggered
- Properly enabled in both IPI controller and GIC
- Spurious interrupt handling
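The mapping from SPI 33 to interrupt ID 65 follows from the GIC numbering scheme, where IDs 0 to 31 are reserved for software-generated and private interrupts and shared peripheral interrupts start at ID 32. A tiny helper makes the arithmetic explicit; the names `GIC_SPI_BASE` and `gic_id_from_spi` are hypothetical.

```c
#include <stdint.h>

#define GIC_SPI_BASE 32u /* first interrupt ID used by shared peripheral interrupts */

/* Convert an SPI number to the interrupt ID used when
 * registering a handler with the interrupt controller. */
uint32_t gic_id_from_spi(uint32_t spi)
{
    return spi + GIC_SPI_BASE;
}
```

Here `gic_id_from_spi(33)` evaluates to 65, the ID listed above for the IPI interrupt.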
Interrupt Safety:
- Check ISR before processing (avoid spurious interrupts)
- Clear interrupt after reading
- Use memory barriers for shared memory access
FreeRTOS Configuration
Key Settings:
- Static allocation for deterministic memory usage
- Minimal stack sizes (configurable)
- Software timers for mode management
- Interrupt-driven IPI handling
Task Priorities:
- Rx Task: Higher priority (tskIDLE_PRIORITY + 1)
- Tx Task: Lower priority (tskIDLE_PRIORITY)
- Ensures LED updates are not delayed
Next Steps
This project provides a foundation for advanced heterogeneous computing:
Enhanced Features
- Multiple RPU Cores: Extend to RPU1 for dual-core operation
- Custom IP Integration: Add custom AXI IP cores to the PL design
- Interrupt-Driven APU: APU receives interrupts from RPU
- DMA Support: High-bandwidth data transfer between APU and RPU
- Real-Time Sensors: Integrate sensor data processing on RPU
Production Considerations
- Error Handling: Add robust error recovery mechanisms
- Watchdog Timers: Monitor RPU health
- Logging: Structured logging for debugging
- Configuration: Runtime configuration without recompilation
- Security: Secure boot and encrypted communication
Integration Examples
- ROS 2 Integration: Use RPU for real-time control, APU for high-level planning
- Industrial Control: RPU handles time-critical loops, APU manages HMI
- Robotics: RPU for motor control, APU for vision processing
- Edge AI: RPU for sensor fusion, APU for neural network inference
Resources
Summary
This project demonstrated:
✅ Building FreeRTOS applications for the RPU using Vitis Unified IDE
✅ Creating real-time LED control with deterministic timing
✅ Implementing four different APU-RPU communication methods
✅ Handling cache coherency for shared memory
✅ Using Inter-Processor Interrupts for reliable communication
✅ Building Linux kernel modules for system integration
✅ Understanding heterogeneous computing on the KR260
The combination of FreeRTOS on RPU, Linux on APU, and custom PL hardware creates a powerful platform for real-time embedded systems. This workflow enables:
- Separation of Concerns: High-level control (APU) vs. real-time tasks (RPU)
- Deterministic Performance: FreeRTOS ensures predictable timing
- Flexible Communication: Multiple methods for different use cases
- System Integration: Kernel modules for production deployment
This project showcases the full potential of the KR260’s heterogeneous architecture, making it an ideal platform for robotics, industrial automation, and real-time embedded applications.