STM32N6 NPU Deployment — Politecnico di Milano  1.0
Documentation for Neural Network Deployment on STM32N6 NPU - Politecnico di Milano 2024-2025
Part 2 — Code Reference

Annotated Source Code

23 source files — 13 Python and 10 C — documented with @brief, @param, @return, inline comments, and auto-generated call graphs. This page shows the execution flow of each layer so you can navigate directly to the file you need.

13 Python files
10 C / H files
Call graphs via Graphviz
How to navigate Part 2:
  • Follow the Python timeline below to understand the host-side deployment pipeline step by step
  • Follow the C firmware timeline to understand what runs on the board after flashing
  • Click any "View source →" button to open the fully annotated file with all function signatures and inline comments
  • On any function page, scroll down to find the call graph (what it calls) and caller graph (what calls it) — generated automatically by Graphviz
  • Use Module Groups in the left menu to browse by layer (Firmware / PythonPipeline)

🐍 Python Pipeline

Runs on the host PC. Triggered by python stm32ai_main.py operation_mode=deployment. The execution flows top to bottom — each file calls the next.

stm32ai_main.py Entry point
View source →

Decorated with @hydra.main. Reads user_config.yaml, dispatches to the correct operation mode (deployment, chain_qd, training, ...). The only file the user ever calls directly.
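The dispatch pattern described above can be sketched in a few lines. This is an illustration only — function names other than main(), deploy() and the operation_mode key are placeholders, and the real entry point wraps main() with @hydra.main so cfg arrives as a Hydra/OmegaConf object rather than a plain dict:

```python
# Hedged sketch of the stm32ai_main.py dispatch (placeholder names; the real
# main() is decorated with @hydra.main and receives an OmegaConf config).

def deploy(cfg):      # stands in for deployment/deploy.py:deploy()
    return f"deploying {cfg['model_path']}"

def quantize(cfg):    # stands in for src/quantization/quantize.py
    return f"quantizing {cfg['model_path']}"

MODE_TABLE = {
    "deployment": deploy,
    "chain_qd": lambda cfg: (quantize(cfg), deploy(cfg)),  # quantize, then deploy
}

def main(cfg):
    mode = cfg["operation_mode"]
    if mode not in MODE_TABLE:
        raise ValueError(f"unknown operation_mode: {mode}")
    return MODE_TABLE[mode](cfg)
```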

main(cfg) → get_config(cfg) → deploy(cfg)
calls immediately ↓
src/utils/parse_config.py YAML validation
View source →

Validates every section of user_config.yaml using Hydra + OmegaConf. Checks legal/required fields, sets defaults for missing optional ones, raises clear errors pointing to the exact invalid field. Returns a DefaultMunch config object used as a global reference by all downstream functions.
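The validation style is easy to picture with a minimal stand-in. The real get_config() works on Hydra/OmegaConf objects and returns a DefaultMunch; this sketch uses plain dicts, and the specific required/default fields shown are illustrative:

```python
# Minimal sketch of the parse_config.py approach: verify required fields,
# fill defaults for missing optional ones, and point errors at the exact
# offending key. Field names here are examples, not the full schema.

REQUIRED = {"deployment": ["board", "c_project_path"]}
DEFAULTS = {"preprocessing": {"rescaling": {"scale": 1 / 255.0, "offset": 0}}}

def get_config(cfg):
    for section, keys in REQUIRED.items():
        if section not in cfg:
            raise ValueError(f"missing section '{section}' in user_config.yaml")
        for key in keys:
            if key not in cfg[section]:
                raise ValueError(f"missing field '{section}.{key}'")
    for section, defaults in DEFAULTS.items():
        cfg.setdefault(section, defaults)   # defaults for optional sections
    return cfg
```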

get_config(cfg) → _parse_dataset_section() → _parse_preprocessing_section() → _parse_deployment_section()
calls ↓
deployment/deploy.py Deployment orchestrator
View source →

Orchestrates the deployment steps: generates app_config.h, selects the board .conf file, and calls stm32ai_deploy_stm32n6() in common_deploy.py, which invokes ST Edge AI Core and STM32CubeIDE in sequence.

deploy(cfg) → gen_h_user_file_n6() → stm32ai_deploy_stm32n6()
forks ↓
src/utils/gen_h_file.py View →
Python → C bridge

Reads tensor shapes from the quantized TFLite model using the TFLite interpreter. Writes app_config.h with NN_HEIGHT, NN_WIDTH, KEYPOINTS_NB, POSTPROCESS_TYPE, and confidence thresholds.
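The bridge step boils down to string-templating a C header from the model's metadata. In the real file the shapes come from tf.lite.Interpreter's get_input_details()/get_output_details(); this sketch takes them as arguments so it stays dependency-free, and the exact macro set is abbreviated:

```python
# Hedged sketch of gen_h_file.py's output stage: render app_config.h from
# model-derived constants (shape extraction via the TFLite interpreter omitted).

def render_app_config(nn_height, nn_width, keypoints_nb, conf_threshold):
    lines = [
        "/* Auto-generated -- do not edit by hand */",
        f"#define NN_HEIGHT {nn_height}",
        f"#define NN_WIDTH {nn_width}",
        f"#define AI_POSE_PP_POSE_KEYPOINTS_NB {keypoints_nb}",
        "#define POSTPROCESS_TYPE POSTPROCESS_SPE_MOVENET_UF",
        f"#define AI_POSE_PP_CONF_THRESHOLD ({conf_threshold}f)",
    ]
    return "\n".join(lines) + "\n"

header = render_app_config(192, 192, 13, 0.4)
```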

common/common_deploy.py View →
stedgeai + CubeIDE invoker

Calls ST Edge AI Core CLI, copies generated files into the CubeIDE project via the .conf templates list, then invokes STM32CubeIDE headless build, SigningTool, and CubeProgrammer flash.
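The tool chain can be pictured as a sequence of external commands. Treat the argument lists below as illustrative only — exact flags depend on your ST Edge AI Core and STM32CubeIDE versions — and note that the commands are built, not executed, in this sketch (the real code runs each step, e.g. with subprocess.run):

```python
# Hedged sketch of the invocations chained by common_deploy.py.
# Flag names are assumptions for illustration; check your installed versions.

def build_commands(model_path, project_path):
    return [
        # 1. ST Edge AI Core: compile the model into NPU epochs + C glue
        ["stedgeai", "generate", "--model", model_path, "--target", "stm32n6"],
        # 2. Headless STM32CubeIDE build of the C project (Eclipse headless app)
        ["stm32cubeide", "-nosplash",
         "-application", "org.eclipse.cdt.managedbuilder.core.headlessbuild",
         "-build", project_path],
    ]

cmds = build_commands("movenet_int8.tflite", "stm32ai_application")
```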

calls ↓
common/external_memory_mgt.py Linker script patcher
View source →

Patches the C source files and linker script to place the weight binary (network_atonbuf.xSPI2.raw) at the correct OctoFlash address (0x70380000). Without this patch, the firmware would look for weights at the wrong memory address and crash at boot.
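The patch itself is a targeted text substitution. The snippet below is a hedged illustration — the linker-symbol name and snippet layout are invented for the example; only the target address 0x70380000 and the idea of rewriting the .ld file come from the pipeline:

```python
# Sketch of the external_memory_mgt.py technique: rewrite the weight placement
# address in the linker script text. Symbol name is hypothetical.

import re

LD_SNIPPET = """\
__NETWORK_WEIGHTS_ADDR__ = 0x00000000;   /* placeholder patched at deploy time */
"""

def patch_weights_addr(ld_text, addr=0x70380000):
    return re.sub(
        r"(__NETWORK_WEIGHTS_ADDR__\s*=\s*)0x[0-9A-Fa-f]+",
        rf"\g<1>0x{addr:08X}",
        ld_text,
    )

patched = patch_weights_addr(LD_SNIPPET)
```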

patches STM32N657xx.ld → weights at 0x70380000
only if operation_mode=chain_qd ↓
src/quantization/quantize.py INT8 PTQ
View source →

Post-Training Quantization: converts float32 model to INT8. Routes to TFLite Converter (for .h5 models like YOLOv8) or ONNX quantizer (for .onnx models like TinyBERT). The inner _representative_data_gen() feeds calibration samples to determine scale and zero-point per layer.
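What the calibration pass computes can be shown with the standard affine INT8 math. This is an illustration of the principle, not the project's code — the real pipeline delegates it to the TFLite Converter or ONNX quantizer, which do this per layer:

```python
# Illustrative per-tensor affine INT8 calibration: derive scale and zero-point
# from the min/max seen over representative samples (what the calibration data
# fed by _representative_data_gen() ultimately determines).

def calibrate_int8(samples):
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    lo, hi = min(lo, 0.0), max(hi, 0.0)    # the range must contain real 0.0
    scale = (hi - lo) / 255.0              # spread range over 256 INT8 bins
    zero_point = round(-128 - lo / scale)  # integer that represents real 0.0
    return scale, zero_point

def quantize_value(x, scale, zero_point):
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))          # clamp to the INT8 range

scale, zp = calibrate_int8([[0.0, 0.5], [0.2, 1.0]])
```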

quantize(cfg, ds) → .tflite INT8 or .onnx INT8
supporting modules ↓
src/preprocessing/
Dataset loading, rescaling, single-image preprocessing for evaluation.
src/postprocessing/
Heatmap decoder (MoveNet), NMS (YOLOv8), hand/head landmark decoders.
postprocess.py
src/evaluation/
OKS metric (MoveNet), mAP@[0.5:0.95] (YOLOv8). Logs to MLflow.
src/utils/
Model management and ST Edge AI runner interface.
models_mgt.py
src/training/ & src/models/ & src/data_augmentation/
Not used in this project — training pipeline for fine-tuning or from-scratch training. Fully implemented and documented.
train.py • callbacks.py • loss.py • heatmaps_train_model.py • st_movenet_lightning_heatmaps.py • data_augmentation.py • pose_random_affine.py
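As an example of what the evaluation layer computes, the OKS metric used for MoveNet follows the standard COCO definition: a Gaussian falloff of each predicted keypoint's distance from ground truth, scaled by object area and a per-keypoint constant, averaged over labelled keypoints. A minimal stand-in (not the project's implementation):

```python
# Object Keypoint Similarity (COCO definition), sketched for illustration.
# pred/gt: (x, y) pairs; visible: per-keypoint labelled flags;
# area: object scale s^2; k: per-keypoint falloff constants.

import math

def oks(pred, gt, visible, area, k):
    num, den = 0.0, 0
    for (px, py), (gx, gy), v, ki in zip(pred, gt, visible, k):
        if not v:
            continue                      # only labelled keypoints count
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        num += math.exp(-d2 / (2 * area * ki ** 2))
        den += 1
    return num / den if den else 0.0
```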

⚡ C Firmware — On-Board Execution

Runs on the STM32N6570-DK board after flashing. Execution starts at reset and loops continuously — capturing a frame, running inference, rendering the result, repeat.

Src/main.c Entry point + inference loop
View source →

HAL and clock initialisation, DCMIPP start, LCD setup. Then enters the main loop: waits for a camera frame snapshot, calls LL_ATON_RT_Main() to trigger the NPU epoch controller, then calls the postprocessor to decode results and render the skeleton on the LCD.

HAL_Init() → SystemClock_Config() → MX_DCMIPP_Init() → loop { LL_ATON_RT_Main() → postprocess }
calls at startup ↓
Inc/app_config.h Generated — Python → C
View source →

The only generated file included directly by the handwritten C firmware. Contains all model-specific constants: NN_HEIGHT=192, NN_WIDTH=192, AI_POSE_PP_POSE_KEYPOINTS_NB=13, POSTPROCESS_TYPE=POSTPROCESS_SPE_MOVENET_UF, AI_POSE_PP_CONF_THRESHOLD=0.4. Every other C file reads from this header — change the model, regenerate this file, recompile.

included by ↓
Src/app_camerapipeline.c DCMIPP dual-pipe
View source →

Configures the MIPI CSI-2 interface and the DCMIPP hardware block. Sets up two simultaneous output pipes: the display pipe continuously writes full-resolution frames to the PSRAM LCD framebuffer, while the NN pipe delivers a cropped, resized snapshot to npuRAM4 when triggered. Both pipes run in hardware — zero CPU involvement for memory transfers.

display pipe: camera → PSRAM (0x90000000) [continuous]
NN pipe: camera → npuRAM4 (0x34270000) [snapshot]
per frame: snapshot ready ↓
LL_ATON_RT_Main() NPU runtime — auto-generated

Not a file you wrote — part of the ll_aton runtime library injected by common_deploy.py. Executes all 75 epochs in sequence: loads the EC blob for each of the 71 NPU epochs into hardware registers, calls ll_sw_* functions for the 4 SW epochs. Input: npuRAM4 (192×192×3 camera frame). Output: npuRAM5 (48×48×13 heatmaps, float32).

npuRAM4 [0x34270000] → NPU (71 EC + 4 SW) → npuRAM5 [0x342E0000]
heatmaps ready in npuRAM5 ↓
Src/display_spe.c View →
MoveNet decoder

Reads float32 heatmaps from npuRAM5. Finds argmax per channel (48×48) to get keypoint coordinates. Filters by confidence threshold from app_config.h. Draws skeleton lines on LCD foreground layer using connectivity table from display_keypoints_13.h.
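The same decoding logic, re-expressed in Python for clarity (on the board it is a C loop in display_spe.c). The HWC memory layout and the normalised-coordinate convention here are assumptions made for the sketch; grid size, channel count, and threshold follow the documented values:

```python
# Per-channel argmax over the 48x48 heatmap grid, thresholded by confidence.
# heatmaps: flat list of grid*grid*n_kpts floats, channel-interleaved (HWC).

def decode_heatmaps(heatmaps, grid=48, n_kpts=13, conf_threshold=0.4):
    keypoints = []
    for k in range(n_kpts):
        best_val, best_idx = -1.0, 0
        for cell in range(grid * grid):
            v = heatmaps[cell * n_kpts + k]
            if v > best_val:
                best_val, best_idx = v, cell
        y, x = divmod(best_idx, grid)
        if best_val >= conf_threshold:
            keypoints.append((x / grid, y / grid, best_val))  # normalised coords
        else:
            keypoints.append(None)        # below threshold: keypoint not drawn
    return keypoints
```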

Src/display_mpe.c View →
YOLOv8 decoder

Reads YOLOv8 detection output (bounding boxes + 17 keypoints per person). Applies NMS filtering by confidence and IoU thresholds. Draws bounding boxes and 17-keypoint COCO skeletons on LCD for all detected persons simultaneously.
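The NMS filtering step, sketched in Python for clarity (the on-board version is C in display_mpe.c). Detections are simplified to (x1, y1, x2, y2, score) tuples with keypoints omitted; threshold defaults mirror the documented confidence value:

```python
# Greedy non-maximum suppression over person detections: drop low-confidence
# boxes, then keep each remaining box only if it does not overlap an
# already-kept, higher-scoring box beyond the IoU threshold.

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(dets, conf_threshold=0.4, iou_threshold=0.5):
    dets = sorted((d for d in dets if d[4] >= conf_threshold),
                  key=lambda d: d[4], reverse=True)
    kept = []
    for d in dets:
        if all(iou(d, k) < iou_threshold for k in kept):
            kept.append(d)
    return kept
```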

supporting headers ↓
display_keypoints_13.h
13-keypoint skeleton connectivity for MoveNet
View →
display_keypoints_17.h
17-keypoint COCO skeleton for YOLOv8
View →
crop_img.c / .h
Image crop utilities for NN pipe preprocessing
View →
main.h
Global typedefs and peripheral handles
View →
app_camerapipeline.h
DCMIPP init and snapshot trigger API
View →
utils.h
Shared utility macros and helper functions
View →

Quick access — all files

← Chapter 6 — Results & Analysis Part 3 — Module Groups →