STM32N6 NPU Deployment — Politecnico di Milano  1.0
Documentation for Neural Network Deployment on STM32N6 NPU - Politecnico di Milano 2024-2025
3.4 — STM32CubeIDE & C Firmware

Compile, Sign & Flash

The final step of the pipeline. STM32CubeIDE compiles the C firmware project, STM32SigningTool signs the binary for secure boot, and STM32CubeProgrammer flashes it to the board via ST-Link. All three tools are invoked automatically by common_deploy.py.

STM32N6570-DK_GettingStarted_PoseEstimation
Flash @ 0x70100000 (firmware) + 0x70380000 (weights)

What is STM32CubeIDE?

STM32CubeIDE is the official Eclipse-based IDE from STMicroelectronics for STM32 development. In our pipeline it plays a narrow role: it is used purely as a command-line build tool, not as an interactive IDE. common_deploy.py invokes its headless build mode to compile the C project, then passes the resulting binary on to signing and flashing, without any manual interaction.

What makes this step interesting is that it is more than a compilation. The STM32N6 uses a secure boot architecture: the binary must be signed with STM32SigningTool before the boot chain will run it, and the weights are flashed separately, at a different memory address from the firmware. Understanding this sequence explains the boot switch procedure and why the firmware and the weights are flashed with separate commands.

The .conf file — how common_deploy.py finds and drives CubeIDE

common_deploy.py does not hardcode any project paths. Instead, it reads stmaic_STM32N6570-DK.conf, a JSON configuration file that describes the entire build and flash procedure for this specific board. Here is an abridged excerpt of the file from our project:

{
  "description": "STM32N6570-DK Getting Started Pose Estimation",
  "builder": "stm32_cube_ide",
  "env": {
    "cproject_name": "STM32N6570-DK_GettingStarted_PoseEstimation",
    "project_folder": "${app_src_root}/Application/STM32N6570-DK/STM32CubeIDE",
    "network_src_root": "${ProjectFolder}/Model/STM32N6570-DK",
    "stm32_ai_lib_folder": "${ProjectFolder}/Middlewares/AI_Runtime"
  },
  "templates": [
    [ "", "${network_src_root}/network.c", "copy" ],
    [ "", "${network_src_root}/network_ecblobs.h", "copy" ],
    [ "", "${network_src_root}/network_atonbuf.xSPI2.raw", "copy" ],
    [ "", "${stm32_ai_lib_folder}/Lib/GCC/ARMCortexM55", "copy-dir" ],
    [ "", "${app_src_root}/.../Inc/app_config.h", "copy" ]
  ]
}
Key insight (the templates section): before calling CubeIDE, common_deploy.py uses the templates list to copy the generated files into the CubeIDE project. This is how network.c, network_ecblobs.h and network_atonbuf.xSPI2.raw (generated by ST Edge AI Core), app_config.h (generated by gen_h_file.py) and the precompiled AI runtime libraries reach the CubeIDE project. The project in the repo is a template: it compiles correctly only after these files have been injected.
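The template-copy step can be sketched in Python as follows. This is a minimal illustration, not the code of common_deploy.py: the helper names (expand, plan_templates, apply_templates) are hypothetical, the file is assumed to be plain JSON without comments, the first field of each templates entry (always "" here) is ignored because this excerpt does not show what it means, and every file is copied flat into a single destination folder. The ${...} placeholders happen to match Python's string.Template syntax, and expanding repeatedly resolves nested references such as ${network_src_root} → ${ProjectFolder} → an absolute path. Variables that the env block uses but does not define (app_src_root, ProjectFolder) must be supplied by the caller.

```python
import json
import shutil
from pathlib import Path
from string import Template


def expand(value: str, variables: dict, max_passes: int = 10) -> str:
    """Replace ${name} placeholders, repeating so nested references resolve.

    Unknown placeholders are left untouched (safe_substitute); a reference
    cycle raises ValueError after max_passes rounds.
    """
    for _ in range(max_passes):
        expanded = Template(value).safe_substitute(variables)
        if expanded == value:
            return expanded
        value = expanded
    raise ValueError(f"placeholder cycle while expanding {value!r}")


def plan_templates(conf: dict, base_vars: dict) -> list:
    """Return (expanded source path, mode) pairs for every templates entry."""
    # Caller-supplied variables (e.g. app_src_root, ProjectFolder) plus the env block.
    variables = {**base_vars, **conf.get("env", {})}
    # Assumption: entries are [dest?, source, mode]; the first field is ignored.
    return [(expand(src, variables), mode) for _dest, src, mode in conf["templates"]]


def apply_templates(plan: list, dest_dir: Path) -> None:
    """Copy each planned file or folder into dest_dir (flat layout, an assumption)."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    for src, mode in plan:
        src_path = Path(src)
        target = dest_dir / src_path.name
        if mode == "copy":
            shutil.copy2(src_path, target)
        elif mode == "copy-dir":
            shutil.copytree(src_path, target, dirs_exist_ok=True)
        else:
            raise ValueError(f"unknown template mode {mode!r}")
```

Usage: load the file with json.load, call plan_templates(conf, {"app_src_root": ..., "ProjectFolder": ...}) to inspect what would be copied, then apply_templates(plan, Path(...)) to perform the copy. Separating planning from copying makes it easy to check the resolved paths before touching the project.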

C project structure

The CubeIDE project is named STM32N6570-DK_GettingStarted_PoseEstimation and lives in Application/STM32N6570-DK/STM32CubeIDE/. Here is its complete structure:

STM32N6570-DK_GettingStarted_PoseEstimation/
├── Application/
│   ├── Inc/
│   │   ├── app_config.h              ← injected by common_deploy.py
│   │   ├── app_camerapipeline.h
│   │   ├── display_spe.h
│   │   ├── display_mpe.h
│   │   ├── display_keypoints_13.h
│   │   ├── display_keypoints_17.h
│   │   ├── crop_img.h
│   │   └── main.h
│   └── Src/
│       ├── main.c                    ← inference loop + DCMIPP init
│       ├── app_camerapipeline.c      ← MIPI + DCMIPP dual-pipe
│       ├── display_spe.c             ← heatmap decoder + skeleton draw
│       ├── display_mpe.c             ← YOLOv8 multi-pose display
│       └── crop_img.c
├── Model/STM32N6570-DK/
│   ├── network.c                     ← injected: epoch schedule
│   ├── network_ecblobs.h             ← injected: NPU blobs
│   └── network_atonbuf.xSPI2.raw     ← injected: raw weight binary
├── Middlewares/AI_Runtime/
│   ├── Lib/GCC/ARMCortexM55/         (ll_aton precompiled libs)
│   ├── Inc/                          (AI runtime headers)
│   └── Npu/ll_aton/                  (NPU low-level driver)
├── Drivers/                          (HAL + BSP)
├── STM32N657xx.ld                    (linker script)
└── startup_stm32n657xx.s             (startup assembly)

Build & flash sequence — what common_deploy.py does step by step

After ST Edge AI Core finishes, common_deploy.py reads the .conf file and executes four operations in sequence. Each step is a command that can be re-run manually if needed:

Step 1: Inject generated files into the CubeIDE project
# From the templates list in .conf:
cp network.c → Model/STM32N6570-DK/
cp network_ecblobs.h → Model/STM32N6570-DK/
cp network_atonbuf.xSPI2.raw → Model/STM32N6570-DK/
cp app_config.h → Application/STM32N6570-DK/Inc/
Step 2: Build the firmware (headless CubeIDE)
STM32CubeIDE --launcher.suppressErrors -nosplash \
  -application org.eclipse.cdt.managedbuilder.core.headlessbuild \
  -build STM32N6570-DK_GettingStarted_PoseEstimation/Debug
# Output: Debug/STM32N6570-DK_GettingStarted_PoseEstimation.bin
Step 3: Sign the binary (STM32N6 secure boot requirement)
STM32SigningTool -s \
  -bin Debug/STM32N6570-DK_GettingStarted_PoseEstimation.bin \
  -nk -t ssbl -hv 2.3 \
  -o Debug/STM32N6570-DK_GettingStarted_PoseEstimation_signed.bin
# The STM32N6 FSBL verifies this signature at boot
Step 4: Flash the FSBL, then the firmware and the weights at two separate addresses
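# Note: programming the external OctoFlash usually also requires the board's
# external loader, passed to the programmer CLI with -el <loader>.stldr.
# Flash FSBL (first-stage bootloader):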
STM32CubeProgrammer -c port=swd mode=HOTPLUG -hardRst \
  -w Binary/ai_fsbl.hex

# Flash signed firmware @ 0x70100000:
STM32CubeProgrammer -c port=swd mode=HOTPLUG -hardRst \
  -w Debug/*_signed.bin 0x70100000

# Flash weight binary @ 0x70380000 (OctoFlash):
STM32CubeProgrammer -c port=swd mode=HOTPLUG -hardRst \
  -w Model/STM32N6570-DK/network_atonbuf.xSPI2.bin 0x70380000
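Steps 2 to 4 can also be driven from Python with subprocess, which is convenient when re-running a single stage by hand. The sketch below is an illustration under stated assumptions, not common_deploy.py itself: the executable names are copied from the commands above and may need to be replaced by the actual CLI binaries and full paths of your installation, Binary/ai_fsbl.hex is taken as relative to the working directory, and the function names (deploy_commands, run_all) are hypothetical. Because subprocess does not go through a shell, the *_signed.bin glob is replaced by the explicit signed file name.

```python
import subprocess
from pathlib import Path

PROJECT = "STM32N6570-DK_GettingStarted_PoseEstimation"
FIRMWARE_ADDR = 0x70100000  # signed application
WEIGHTS_ADDR = 0x70380000   # raw NPU weights in OctoFlash


def deploy_commands(build_dir: Path, weights_bin: Path) -> list:
    """Return the build, sign and flash command lines as argv lists."""
    raw_bin = build_dir / f"{PROJECT}.bin"
    signed_bin = build_dir / f"{PROJECT}_signed.bin"
    programmer = ["STM32CubeProgrammer", "-c", "port=swd", "mode=HOTPLUG", "-hardRst"]
    return [
        # Step 2: headless CubeIDE build
        ["STM32CubeIDE", "--launcher.suppressErrors", "-nosplash",
         "-application", "org.eclipse.cdt.managedbuilder.core.headlessbuild",
         "-build", f"{PROJECT}/Debug"],
        # Step 3: sign the binary for secure boot
        ["STM32SigningTool", "-s", "-bin", str(raw_bin), "-nk", "-t", "ssbl",
         "-hv", "2.3", "-o", str(signed_bin)],
        # Step 4: FSBL (a .hex file carries its own addresses), firmware, weights
        programmer + ["-w", "Binary/ai_fsbl.hex"],
        programmer + ["-w", str(signed_bin), hex(FIRMWARE_ADDR)],
        programmer + ["-w", str(weights_bin), hex(WEIGHTS_ADDR)],
    ]


def run_all(commands: list, dry_run: bool = True) -> None:
    """Print every command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step
```

Usage: run_all(deploy_commands(Path("Debug"), Path("Model/STM32N6570-DK/network_atonbuf.xSPI2.bin"))) only prints the plan; pass dry_run=False to execute it. Keeping dry-run as the default avoids flashing the board by accident while experimenting.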
Why two flash addresses?

The firmware (0x70100000) and the model weights (0x70380000) are flashed separately because they have completely different update lifecycles.

0x70100000 — Firmware
The compiled C code: main loop, camera pipeline, display, postprocessor. Changes rarely — only when you modify the application logic. Must be signed before flashing (FSBL verifies signature at boot).
0x70380000 — Model weights
The raw INT8 weight binary (network_atonbuf.xSPI2.bin), 2.924 MB. Changes every time you deploy a different model. Not signed — read-only data, no execution.
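The split also fixes a size budget. The firmware region runs from 0x70100000 up to the start of the weights at 0x70380000, so the signed firmware can occupy at most 0x280000 bytes (2,621,440 bytes, exactly 2.5 MiB) before it would overwrite the weights. A minimal Python check of that arithmetic follows; the function names are hypothetical, and it only encodes the two addresses above, saying nothing about the total OctoFlash capacity.

```python
FIRMWARE_ADDR = 0x70100000
WEIGHTS_ADDR = 0x70380000


def firmware_budget() -> int:
    """Bytes available between the firmware base and the weights base."""
    return WEIGHTS_ADDR - FIRMWARE_ADDR


def check_firmware_fits(signed_size: int) -> None:
    """Raise if a signed image of signed_size bytes would overlap the weights."""
    if signed_size > firmware_budget():
        end = FIRMWARE_ADDR + signed_size
        raise ValueError(
            f"firmware ends at {end:#010x}, past the weights base {WEIGHTS_ADDR:#010x}"
        )


print(f"{firmware_budget():#x} bytes = {firmware_budget() / 2**20} MiB")
# prints: 0x280000 bytes = 2.5 MiB
```

In practice, pass the size of the _signed.bin file (for example via os.path.getsize), since the signed file is what gets written at 0x70100000.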

Boot switch procedure

The STM32N6570-DK has two boot switches that control where the chip looks for code at power-on. Getting these wrong is the most common reason deployment appears to succeed but the board does nothing.

BOOT switches RIGHT: flashing mode
Set this before running python stm32ai_main.py operation_mode=deployment. The board connects via ST-Link and accepts the flash commands.
BOOT0 = 1, BOOT1 = 1
BOOT switches LEFT: run mode
Set this after deployment completes, then power-cycle the board. The FSBL is loaded from OctoFlash, verifies the firmware signature and starts the firmware; inference begins.
BOOT0 = 0, BOOT1 = 0

The C firmware — your handwritten files (all in Part 2)

These are the files that make up the real-time application running on the board. Apart from app_config.h, which gen_h_file.py generates and common_deploy.py injects, they are handwritten: not generated, not injected. Every function is documented with call graphs in Part 2.

main.c
System init • DCMIPP setup • inference loop • calls LL_ATON_RT_Main() to trigger NPU epoch execution
app_config.h
Generated by gen_h_file.py • injected by common_deploy.py • defines NN_HEIGHT=192, KEYPOINTS_NB=13, CONF_THRESHOLD=0.4
app_camerapipeline.c
MIPI CSI-2 init • DCMIPP dual-pipe config • display pipe (continuous → PSRAM) + NN pipe (snapshot → npuRAM4)
display_spe.c
Reads float32 heatmaps from npuRAM5 • argmax decode • draws 13-keypoint skeleton on LCD foreground layer
display_mpe.c
Multi-pose display for YOLOv8n • bounding boxes + NMS results • 17-keypoint COCO skeleton rendering
display_keypoints_13.h / display_keypoints_17.h
Skeleton connectivity tables — which keypoints to connect with lines for 13-keypoint (MoveNet) and 17-keypoint (COCO/YOLOv8) models
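The argmax decode performed by display_spe.c, combined with a connectivity table, can be illustrated in a few lines. This Python sketch is not a port of the C code: it assumes the heatmaps arrive as one flat float buffer in channel-last (H × W × K) order, returns coordinates normalised to the centre of the winning cell, and the function names (decode_heatmaps, visible_segments) are hypothetical. The edge list passed to visible_segments stands in for the real table in display_keypoints_13.h, and the 0.4 default mirrors CONF_THRESHOLD from app_config.h.

```python
from typing import NamedTuple, Sequence


class Keypoint(NamedTuple):
    x: float      # column of the peak, normalised to 0..1 (cell centre)
    y: float      # row of the peak, normalised to 0..1 (cell centre)
    score: float  # heatmap value at the peak


def decode_heatmaps(heatmaps: Sequence[float], height: int, width: int,
                    num_keypoints: int) -> list:
    """Argmax-decode a flat channel-last (H x W x K) heatmap buffer."""
    if len(heatmaps) != height * width * num_keypoints:
        raise ValueError("buffer length does not match height * width * num_keypoints")
    keypoints = []
    for k in range(num_keypoints):
        best_pixel, best_value = 0, float("-inf")
        for pixel in range(height * width):
            value = heatmaps[pixel * num_keypoints + k]  # channel-last indexing
            if value > best_value:
                best_pixel, best_value = pixel, value
        row, col = divmod(best_pixel, width)
        keypoints.append(Keypoint((col + 0.5) / width, (row + 0.5) / height, best_value))
    return keypoints


def visible_segments(keypoints: Sequence[Keypoint], edges, threshold: float = 0.4) -> list:
    """Keep only skeleton edges whose two endpoints both reach the threshold."""
    return [(a, b) for a, b in edges
            if keypoints[a].score >= threshold and keypoints[b].score >= threshold]
```

Drawing only the edges returned by visible_segments is what keeps low-confidence joints from producing stray lines on the LCD overlay.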

The complete pipeline — from YAML to running board

# Everything triggered by one command:
python stm32ai_main.py operation_mode=deployment

1. parse_config.py → validate user_config.yaml
2. gen_h_file.py → read TFLite tensors → write app_config.h
3. stedgeai generate → model → C code + epoch blobs + weights binary
4. external_memory_mgt.py → patch linker script
5. copy templates → inject network.c + ecblobs + app_config.h into CubeIDE project
6. STM32CubeIDE headless → compile → .bin
7. STM32SigningTool → sign → _signed.bin
8. STM32CubeProgrammer → flash FSBL + firmware @ 0x70100000
9. STM32CubeProgrammer → flash weights @ 0x70380000

# Toggle boot switches LEFT → power cycle →
✓ Board running MoveNet Lightning at 22 ms / 94.7% NPU offload
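Step 2 of the pipeline, gen_h_file.py writing app_config.h, boils down to emitting C preprocessor defines. The sketch below is hypothetical: render_app_config is not a function of gen_h_file.py, the header-guard name is an assumption, and it only formats the three values quoted earlier instead of reading them from the TFLite model tensors as the real script does.

```python
def render_app_config(defines: dict) -> str:
    """Render a C header with one #define per entry; floats get an 'f' suffix."""
    lines = ["#ifndef APP_CONFIG_H", "#define APP_CONFIG_H", ""]
    for name, value in defines.items():
        text = f"{value}f" if isinstance(value, float) else str(value)
        lines.append(f"#define {name} ({text})")
    lines += ["", "#endif /* APP_CONFIG_H */", ""]
    return "\n".join(lines)


header = render_app_config({"NN_HEIGHT": 192, "KEYPOINTS_NB": 13, "CONF_THRESHOLD": 0.4})
print(header)
```

The f suffix keeps CONF_THRESHOLD a single-precision constant in the C code, which avoids implicit double arithmetic on the Cortex-M55.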