STM32N6 NPU Deployment — Politecnico di Milano
Version 1.0
Documentation for Neural Network Deployment on the STM32N6 NPU — Politecnico di Milano, 2024-2025
A step-by-step walkthrough of the complete deployment pipeline — what files go in, what comes out, what each tool does, and what can go wrong. This chapter connects Chapter 3 (the tools) to Part 2 (the code).
Before diving into each step, here is the big picture: what files enter each step and what files come out. Every arrow in this diagram corresponds to a file on disk that you can inspect.
Prerequisites (paths as installed on the reference machine):
- ST Edge AI Core: /opt/ST/STEdgeAI/2.1/
- STM32CubeIDE: /home/.../stm32cubeide_1.18.1_2/
- Model Zoo repositories: stm32ai-modelzoo/ and stm32ai-modelzoo-services/
- Python virtual environment: source st_zoo/bin/activate
The first decision is which model to deploy. If it comes from the ST Model Zoo, it is already quantized — set model_path directly to the .tflite file and operation_mode: deployment. If it is an external model (YOLOv8, TinyBERT), set operation_mode: chain_qd to quantize and deploy in one pass. Also set tools.stedgeai.path_to_stedgeai, tools.path_to_cubeIDE, and deployment.hardware_setup.board: STM32N6570-DK. See the annotated user_config.yaml in Section 3.2 for the full example.
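A minimal sketch of these fields (the exact nesting and the example model file name are assumptions; see Section 3.2 for the authoritative annotated file):

```yaml
general:
  model_path: ./models/movenet_lightning_int8.tflite   # hypothetical file name
operation_mode: deployment            # use chain_qd for an external float model
tools:
  stedgeai:
    path_to_stedgeai: /opt/ST/STEdgeAI/2.1/Utilities/linux/stedgeai
  path_to_cubeIDE: /home/.../stm32cubeide_1.18.1_2/stm32cubeide   # the binary itself, not its folder
deployment:
  hardware_setup:
    board: STM32N6570-DK
```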
stm32ai_main.py immediately calls parse_config.py to validate every field in user_config.yaml. If anything is wrong — a missing path, an unsupported board, an invalid quantization type — the pipeline stops here with a clear error message pointing to the exact field. This is the fail-fast design: no computation starts until the configuration is verified.
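The fail-fast idea can be illustrated with a minimal sketch. This is not the real parse_config.py; the field names come from the configuration described above, while the error strings and the sets of supported values are assumptions:

```python
# Minimal fail-fast validation sketch, not the actual Model Zoo validator.
from pathlib import Path

SUPPORTED_OP_MODES = {"deployment", "chain_qd"}   # assumption: only the modes used in this chapter
SUPPORTED_BOARDS = {"STM32N6570-DK"}

def validate(cfg: dict) -> None:
    """Stop at the first invalid field and name it exactly in the error."""
    model_path = Path(cfg["general"]["model_path"])
    if not model_path.exists():
        raise ValueError(f"general.model_path does not exist: {model_path}")
    if cfg["operation_mode"] not in SUPPORTED_OP_MODES:
        raise ValueError(f"operation_mode must be one of {sorted(SUPPORTED_OP_MODES)}")
    board = cfg["deployment"]["hardware_setup"]["board"]
    if board not in SUPPORTED_BOARDS:
        raise ValueError(f"deployment.hardware_setup.board is unsupported: {board}")
```

The point is the ordering: no model is loaded and no tool is launched before every field has been checked, so a typo costs seconds rather than minutes.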
Common validation errors:
- model_path does not exist → check the path relative to the pose_estimation/ folder
- tools.stedgeai.version does not match the installed binary → run stedgeai --version to check
- path_to_cubeIDE points to a directory, not the executable → it must point to the stm32cubeide binary directly
- keypoints: 13 with a 17-keypoint model → the mismatch causes wrong skeleton rendering
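The tools.stedgeai.version check can be sketched as a small helper; the exact format of the stedgeai --version output is an assumption here:

```python
# Hypothetical helper: compare the version pinned in user_config.yaml against
# what `stedgeai --version` prints (the CLI's output format is assumed).
import re

def version_matches(cli_output: str, pinned: str) -> bool:
    """Accept if the pinned version prefixes the version reported by the CLI."""
    m = re.search(r"(\d+\.\d+(?:\.\d+)?)", cli_output)
    return bool(m) and m.group(1).startswith(str(pinned))
```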
ST Edge AI Core is called automatically by common_deploy.py and takes 30–90 seconds depending on model size. The output folder experiments_outputs/YYYY_MM_DD_HH_MM_SS/ contains everything you need to inspect the result:
| Output file | What to check |
|---|---|
| network_generate_report.txt | Total epochs, EC vs SW count, memory usage per SRAM bank |
| C_header/app_config.h | Verify NN_HEIGHT, NN_WIDTH, KEYPOINTS_NB match your model |
| stm32ai_main.log | Check for warnings about unsupported operations or memory overflow |
| generated/network.c | 5,882 lines — only check if the build fails in step 4 |
Common errors at this step:
- a float (unquantized) model is rejected → use chain_qd to quantize first
- permission denied on the output folder → sudo chmod -R 755 experiments_outputs/
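Locating the most recent run folder and checking for the artifacts listed in the table can be scripted; a small sketch (the file names come from the table, everything else is illustrative):

```python
# Quick sanity check of the latest experiments_outputs run folder.
from pathlib import Path

def latest_run(root: str = "experiments_outputs") -> Path:
    """Run folders are timestamped YYYY_MM_DD_HH_MM_SS, so lexical sort is chronological."""
    runs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"no runs under {root}/")
    return runs[-1]

EXPECTED = ["network_generate_report.txt", "stm32ai_main.log"]

def missing_outputs(run: Path) -> list:
    """Return the expected artifacts that are absent from this run."""
    return [name for name in EXPECTED if not (run / name).exists()]
```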
common_deploy.py first copies the generated files into the CubeIDE project (via the templates list in the .conf file), then runs three tools in sequence: STM32CubeIDE for the headless build, STM32SigningTool for the signed binary, and STM32CubeProgrammer twice (once for the firmware at 0x70100000, once for the weights at 0x70380000). The first build takes 2–5 minutes; subsequent builds are faster thanks to incremental compilation.
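The two flash passes can be sketched in dry-run form. The two addresses come from the text above; the STM32CubeProgrammer CLI name and flags shown are assumptions about a typical SWD setup, and the build and signing steps are omitted:

```python
# Dry-run sketch of the two STM32CubeProgrammer invocations common_deploy.py issues.
FW_ADDR = 0x70100000       # signed application firmware
WEIGHTS_ADDR = 0x70380000  # network weights blob

def flash_commands(fw_bin: str, weights_bin: str) -> list:
    """One STM32_Programmer_CLI invocation per region, firmware first."""
    return [
        ["STM32_Programmer_CLI", "-c", "port=SWD", "-w", fw_bin, hex(FW_ADDR)],
        ["STM32_Programmer_CLI", "-c", "port=SWD", "-w", weights_bin, hex(WEIGHTS_ADDR)],
    ]
```

Keeping the two regions separate is what makes a weights-only update possible without rebuilding the firmware.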
If a tool is not found during this step, add /opt/ST/STEdgeAI/2.1/Utilities/linux/ to PATH.

After flashing completes, toggle the boot switches to LEFT, power-cycle the board (unplug and replug USB), and wait 3–5 seconds. The LCD should show the camera preview with the skeleton overlay.
The C firmware execution starts in main.c: HAL init → DCMIPP start → inference loop → LL_ATON_RT_Main() (NPU) → display_spe.c (decode + draw). See Part 2 for the full annotated call chain.
We deployed three models of increasing architectural complexity. The table shows the key differences in the deployment configuration and the resulting performance.
| Model | Format | op_mode | model_type | Epochs | NPU % | Latency |
|---|---|---|---|---|---|---|
| MoveNet Lightning | .tflite INT8 | deployment | heatmaps_spe | 75 (71 EC + 4 SW) | 94.7% | 22 ms |
| YOLOv8n-pose | .tflite INT8 | chain_qd | yolo_mpe | 149 (131 EC + 18 SW) | 87.9% | 32 ms |
| TinyBERT | .onnx INT8 | chain_qd | — | 270 (174 EC + 96 SW) | 64.4% | >100 ms |
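The NPU % column is consistent with the epoch counts: it appears to be the share of epochs mapped to the NPU's epoch controller (EC) rather than falling back to software (SW). A one-line check (this interpretation is an inference from the numbers in the table, not a statement from the tools):

```python
def npu_share(ec: int, sw: int) -> float:
    """Percentage of epochs executed on the NPU (EC) out of all epochs."""
    return round(100 * ec / (ec + sw), 1)
```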