STM32N6 NPU Deployment — Politecnico di Milano, 2024-2025 (v1.0)
Chapter 4 — Deployment Workflow

From Model File to Running Board

A step-by-step walkthrough of the complete deployment pipeline — what files go in, what comes out, what each tool does, and what can go wrong. This chapter connects Chapter 3 (the tools) to Part 2 (the code).

5 steps · 1 command · 3 models deployed

The complete file flow

Before diving into each step, here is the big picture: what files enter each step and what files come out. Every arrow in this diagram corresponds to a file on disk that you can inspect.

Inputs:
  • .tflite / .onnx — the quantized model
  • user_config.yaml — your configuration
  • stmaic_*.conf — board config
  • *.mpool — memory pool config

Step 1 — parse_config.py validates user_config.yaml; fail fast if any field is wrong, before any computation starts.
Step 2 — gen_h_file.py reads the model tensors; output: app_config.h (NN_HEIGHT, KEYPOINTS_NB, POSTPROCESS_TYPE, ...).
Step 3 — ST Edge AI Core: graph optimization → epoch assignment → memory allocation → C generation; output: network.c + network_ecblobs.h + network_atonbuf.xSPI2.raw + report.
Step 4 — CubeIDE compile → SigningTool → CubeProgrammer flash; output: _signed.bin @ 0x70100000 and weights @ 0x70380000.
Board running — toggle boot switches LEFT → power cycle.
Before you start — prerequisites:
  • ST Edge AI Core v2.1.0 installed at /opt/ST/STEdgeAI/2.1/
  • STM32CubeIDE 1.18.1 installed at /home/.../stm32cubeide_1.18.1_2/
  • Both repos cloned as siblings: stm32ai-modelzoo/ and stm32ai-modelzoo-services/
  • Python venv activated: source st_zoo/bin/activate
  • Board connected via USB (ST-Link), boot switches set to RIGHT
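
The checklist above can be verified mechanically before a run. The sketch below (not part of the ST model zoo, purely a convenience) reports which required paths are missing; the paths are the ones listed above, with the two repos assumed to be siblings of the current working directory.

```python
# Hypothetical pre-flight check (not part of the ST model zoo): report any
# required path from the checklist above that is missing on this machine.
from pathlib import Path

def missing_prereqs(paths):
    """Return the subset of required paths that do not exist on disk."""
    return [p for p in paths if not Path(p).expanduser().exists()]

REQUIRED = [
    "/opt/ST/STEdgeAI/2.1",        # ST Edge AI Core v2.1.0 install prefix
    "stm32ai-modelzoo",            # the two repos, cloned as siblings
    "stm32ai-modelzoo-services",
    "st_zoo/bin/activate",         # the Python venv to activate
]

if __name__ == "__main__":
    for p in missing_prereqs(REQUIRED):
        print(f"MISSING: {p}")
```

Running it before step 1 catches a missing install or a mislocated repo in seconds rather than mid-pipeline.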
Step 1 — Choose your model and configure user_config.yaml

The first decision is which model to deploy. If it comes from the ST Model Zoo, it is already quantized — set model_path directly to the .tflite file and operation_mode: deployment. If it is an external model (YOLOv8, TinyBERT), set operation_mode: chain_qd to quantize and deploy in one pass.

Pre-quantized (Zoo model):
  model_path: .../movenet_192_int8.tflite
  operation_mode: deployment

Float model (external):
  model_path: .../yolov8n.pt
  operation_mode: chain_qd

Also set: tools.stedgeai.path_to_stedgeai, tools.path_to_cubeIDE, and deployment.hardware_setup.board: STM32N6570-DK. See the annotated user_config.yaml in Section 3.2 for the full example.
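
Putting those fields together, a minimal user_config.yaml sketch could look like the following. The section nesting (general:, tools:, deployment:) and the exact stedgeai binary location are assumptions here; the annotated example in Section 3.2 is authoritative, and the elided paths (...) must be filled in for your machine.

```yaml
operation_mode: deployment            # or chain_qd for a float external model

general:                              # section name is an assumption; see Section 3.2
  model_path: .../movenet_192_int8.tflite   # fill in the real path

tools:
  stedgeai:
    version: 2.1.0
    path_to_stedgeai: /opt/ST/STEdgeAI/2.1/...   # the stedgeai binary itself
  path_to_cubeIDE: .../stm32cubeide            # the binary, not its folder

deployment:
  hardware_setup:
    board: STM32N6570-DK
```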

Step 2 — Launch the pipeline (one command)

cd stm32ai-modelzoo-services/pose_estimation
python stm32ai_main.py operation_mode=deployment

stm32ai_main.py immediately calls parse_config.py to validate every field in user_config.yaml. If anything is wrong — a missing path, an unsupported board, an invalid quantization type — the pipeline stops here with a clear error message pointing to the exact field. This is the fail-fast design: no computation starts until the configuration is verified.
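
The fail-fast behaviour described above can be sketched like this. It is an illustrative stand-in, not the real parse_config.py, and the SUPPORTED_* sets are assumptions covering only what this chapter mentions.

```python
# Illustrative fail-fast validation (NOT the actual parse_config.py):
# every check runs before any expensive work starts, and the first bad
# field aborts with a message naming that exact field.
from pathlib import Path

SUPPORTED_MODES = {"deployment", "chain_qd"}   # modes used in this chapter
SUPPORTED_BOARDS = {"STM32N6570-DK"}           # board this chapter targets

class ConfigError(ValueError):
    """Raised for the first invalid field found in the configuration."""

def validate(cfg: dict) -> None:
    if cfg.get("operation_mode") not in SUPPORTED_MODES:
        raise ConfigError(f"operation_mode: expected one of {sorted(SUPPORTED_MODES)}")
    if not Path(cfg.get("model_path", "")).is_file():
        raise ConfigError(f"model_path: file not found: {cfg.get('model_path')!r}")
    if cfg.get("board") not in SUPPORTED_BOARDS:
        raise ConfigError(f"board: unsupported board {cfg.get('board')!r}")
```

Because every field is checked up front, a typo costs seconds rather than a multi-minute build that fails at the flashing stage.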

Common configuration errors at this step
  • model_path does not exist → check the path relative to the pose_estimation/ folder
  • tools.stedgeai.version does not match installed binary → run stedgeai --version to check
  • path_to_cubeIDE points to a directory, not the executable → must point to the stm32cubeide binary directly
  • keypoints: 13 with a 17-keypoint model → mismatch causes wrong skeleton rendering
Step 3 — ST Edge AI Core converts the model to C

ST Edge AI Core is called automatically by common_deploy.py. It takes 30–90 seconds depending on model size. The output folder experiments_outputs/YYYY_MM_DD_HH_MM_SS/ contains everything you need to inspect the result:

Output file                    What to check
network_generate_report.txt    Total epochs, EC vs SW count, memory usage per SRAM bank
C_header/app_config.h          Verify NN_HEIGHT, NN_WIDTH, KEYPOINTS_NB match your model
stm32ai_main.log               Check for warnings about unsupported operations or memory overflow
generated/network.c            5,882 lines — only check if the build fails in step 4
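
The app_config.h check in the table can be automated with a small helper. This is hypothetical glue (not part of the pipeline); it only handles simple one-line `#define NAME VALUE` entries, which is what the macros named above look like.

```python
# Hypothetical helper: extract the #define values named in the table above
# from C_header/app_config.h so they can be compared with the model.
import re

def read_defines(header_text, names):
    """Return {name: value-string} for simple `#define NAME VALUE` lines."""
    found = {}
    for m in re.finditer(r"^\s*#define\s+(\w+)\s+(\S+)", header_text, re.MULTILINE):
        if m.group(1) in names:
            found[m.group(1)] = m.group(2)
    return found

# e.g. read_defines(open("C_header/app_config.h").read(),
#                   {"NN_HEIGHT", "NN_WIDTH", "KEYPOINTS_NB"})
```

A 17-keypoint model with KEYPOINTS_NB anything other than 17 is exactly the mismatch that produces a wrong skeleton in step 5.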
Common errors at this step
  • Model not INT8: stedgeai exits with "model must be quantized" → use chain_qd to quantize first
  • Memory overflow: activations exceed npuRAM capacity → reduce input resolution or use a smaller model variant
  • Unsupported op: a layer is not supported even as SW epoch → check stedgeai release notes for op support list
  • Permission denied on ll_aton: some generated files are root-owned → run sudo chmod -R 755 experiments_outputs/
Step 4 — CubeIDE compiles, SigningTool signs, CubeProgrammer flashes

common_deploy.py first copies the generated files into the CubeIDE project (via the templates list in the .conf file), then runs three tools in sequence — STM32CubeIDE for the headless build, STM32SigningTool for the signed binary, and STM32CubeProgrammer twice (once for the firmware at 0x70100000, once for the weights at 0x70380000). The compilation takes 2–5 minutes on first build; subsequent builds are faster due to incremental compilation.

# 1. Headless build
STM32CubeIDE --headlessbuild -build STM32N6570-DK_GettingStarted_PoseEstimation/Debug

# 2. Sign for secure boot
STM32SigningTool -s -bin Debug/*.bin -nk -t ssbl -hv 2.3 -o Debug/*_signed.bin

# 3a. Flash firmware @ 0x70100000
STM32CubeProgrammer -c port=swd -hardRst -w Debug/*_signed.bin 0x70100000

# 3b. Flash weights @ 0x70380000
STM32CubeProgrammer -c port=swd -hardRst -w network_atonbuf.xSPI2.bin 0x70380000
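
The two flash addresses above imply a hard ceiling on the signed firmware image: it must fit below the weights base, or flashing the weights would overwrite its tail. The actual reserved size in ST's partition map may be smaller; this is only the arithmetic the two addresses imply.

```python
# Firmware budget implied by the two flash addresses used above.
FIRMWARE_BASE = 0x7010_0000   # _signed.bin is flashed here
WEIGHTS_BASE  = 0x7038_0000   # network_atonbuf.xSPI2.bin is flashed here

budget = WEIGHTS_BASE - FIRMWARE_BASE
print(f"firmware budget: {budget:#x} bytes = {budget / 1024 / 1024} MiB")
# → firmware budget: 0x280000 bytes = 2.5 MiB
```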
Common errors at this step
  • Build fails on network.c: version mismatch between stedgeai compiler and ll_aton runtime library → regenerate with the correct stedgeai version
  • STM32CubeProgrammer: no device found: board not in flashing mode → boot switches must be RIGHT before running the Python script
  • Signing fails: STM32SigningTool path not set in system PATH → add /opt/ST/STEdgeAI/2.1/Utilities/linux/ to PATH
  • Flash succeeds but board does nothing: boot switches still RIGHT after flash → toggle to LEFT then power-cycle
Step 5 — Validate on the board

After flashing completes: toggle boot switches to LEFT, power-cycle the board (unplug and replug USB), and wait 3–5 seconds. The LCD should show the camera preview with the skeleton overlay.

✓ Deployment successful if you see:
  • Live camera preview on LCD
  • Skeleton overlay on detected person
  • Real-time update (<50 ms latency)
  • Welcome screen visible on first boot
✗ Something is wrong if you see:
  • Black screen → boot switches still RIGHT
  • Camera but no skeleton → wrong model_type in config
  • Corrupted display → wrong keypoints count
  • Frozen image → inference crashed, check log

On the board, firmware execution starts in main.c: HAL init → DCMIPP start → inference loop → LL_ATON_RT_Main() (NPU) → display_spe.c (decode + draw). See Part 2 for the full annotated call chain.

Deployment summary — our three models

We deployed three models of increasing architectural complexity. The table shows the key differences in the deployment configuration and the resulting performance.

Model               Format         op_mode     model_type    Epochs                NPU %   Latency
MoveNet Lightning   .tflite INT8   deployment  heatmaps_spe  75 (71 EC + 4 SW)     94.7%   22 ms
YOLOv8n-pose        .tflite INT8   chain_qd    yolo_mpe      149 (131 EC + 18 SW)  87.9%   32 ms
TinyBERT            .onnx INT8     chain_qd    n/a           270 (174 EC + 96 SW)  64.4%   >100 ms
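
The NPU % column is consistent with the fraction of epochs mapped to hardware, i.e. EC / total from the Epochs column. A quick check with the table's own numbers:

```python
# NPU % in the table above equals EC epochs / total epochs, to one decimal.
models = {
    "MoveNet Lightning": (71, 75),     # (EC epochs, total epochs)
    "YOLOv8n-pose":      (131, 149),
    "TinyBERT":          (174, 270),
}
for name, (ec, total) in models.items():
    print(f"{name}: {100 * ec / total:.1f}% of epochs run on the NPU")
```

This also explains the latency ordering: the more SW epochs fall back to the CPU, the further the model drifts from real time.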
← Chapter 3 — Toolchain · Next: Chapter 5 — Case Studies →