
3.2 — ModelZoo Services

The Python Pipeline

ModelZoo Services is the Python framework that automates every step of the ML pipeline — from training to deployment — through a single YAML configuration file and a single command. This page documents every Python file in the repository, with direct links to the annotated source in Part 2.

🔗 github.com/STMicroelectronics/stm32ai-modelzoo-services

What is ModelZoo Services — and how does it connect to the Model Zoo?

In the previous section we saw the ST Model Zoo: a repository of pre-trained model files (.tflite, .h5, .onnx). The Zoo gives you the model. But having a model file is not enough — you still need to quantize it, convert it to C code, compile it, and flash it onto the board. ModelZoo Services is the tool that does all of this.

The two repositories are designed to work together. ST recommends cloning them as siblings in the same parent folder, because the example user_config.yaml files in Services reference model paths relative to the Zoo repository. In our case, this meant:

parent_folder/
├── stm32ai-modelzoo/               # model files live here
│   └── pose_estimation/movenet/ST_pretrainedmodel_public_dataset/
│         └── st_movenet_lightning_heatmaps_192_int8.tflite
└── stm32ai-modelzoo-services/      # scripts live here
    └── pose_estimation/
        ├── stm32ai_main.py
        └── user_config.yaml                    # model_path points into Zoo above

The connection point between the two repos is a single line in user_config.yaml:

# In user_config.yaml:
general:
  model_path: ../../stm32ai-modelzoo/pose_estimation/movenet/ST_pretrainedmodel_public_dataset/st_movenet_lightning_heatmaps_192_int8.tflite

From that moment on, Services takes full control: it reads the model, validates the configuration, invokes ST Edge AI Core to convert it to C, calls STM32CubeIDE to compile, and flashes the board via ST-Link. The entire flow happens in a single Python process triggered by one command.

[Flow diagram — ST Model Zoo (pre-trained .tflite / .onnx / .h5 models) → model_path in user_config.yaml → operation_mode=deployment → ModelZoo Services (parse_config → deploy, gen_h_file → common_deploy, stedgeai → CubeIDE, ST-Link → board) → STM32N6570 running firmware with real-time inference. The Zoo provides the model; Services automates everything else.]

Why this architecture? — Hydra + OmegaConf

Services is built on two libraries that are worth understanding: Hydra and OmegaConf. Hydra is a framework by Meta AI for configuring complex applications via YAML files. OmegaConf provides the underlying configuration object with type safety and dot-notation access. Together they allow ST to build a pipeline where (a minimal sketch of the pattern follows this list):

Single config file

user_config.yaml is the only file you touch. Every parameter — model path, quantization granularity, board type, tool paths — lives there. No Python editing required.

Validated before execution

parse_config.py validates every field before any computation starts. If a required parameter is missing or invalid, you get a clear error message pointing to the exact field — not a cryptic Python traceback.

Chain modes

Setting operation_mode: chain_qd runs quantization followed by deployment in one command. No manual file passing between steps.

MLflow logging

Every metric — OKS, mAP, inference time — is automatically logged to MLflow for experiment tracking. The output directory also contains a .log file with the full run history.
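
What this pattern looks like in code: a minimal sketch of the @hydra.main entry point — config_path and config_name here are assumptions for a user_config.yaml sitting next to the script, not the exact stm32ai_main.py source.

# Minimal sketch of the Hydra entry-point pattern (illustrative, not the
# real stm32ai_main.py); assumes user_config.yaml sits next to the script.
import hydra
from omegaconf import DictConfig

@hydra.main(version_base=None, config_path=".", config_name="user_config")
def main(cfg: DictConfig) -> None:
    # OmegaConf gives dot-notation access to every YAML field, and any
    # field can be overridden from the CLI:
    #   python stm32ai_main.py operation_mode=deployment
    print(cfg.general.model_path)
    print(cfg.operation_mode)

if __name__ == "__main__":
    main()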

The single entry point

The entire pipeline is launched with one command:

# Step 1 — set up the environment (once)
cd stm32ai-modelzoo-services
python -m venv st_zoo && source st_zoo/bin/activate
pip install -r requirements.txt

# Step 2 — configure user_config.yaml, then run
cd pose_estimation
python stm32ai_main.py operation_mode=deployment

The real user_config.yaml — our actual configuration

This is the exact user_config.yaml we used for the YOLOv8n-pose deployment. Every field is annotated inline. This is the only file you need to edit to reproduce our results or deploy a different model.

general:
  model_path: .../quantizedtransformer.tflite        # INT8 quantized model
  model_type: yolo_mpe                               # selects YOLO multi-pose postprocessor
  num_threads_tflite: 8

operation_mode: deployment                           # skip training/eval — go straight to board

dataset:
  keypoints: 17                                      # COCO 17-keypoint skeleton for YOLOv8

preprocessing:
  rescaling: { scale: 1/255., offset: 0 }            # pixel → [0,1] normalisation
  resizing: { aspect_ratio: fit, interpolation: nearest }
  color_mode: rgb

postprocessing:                                      # YOLO NMS parameters
  kpts_conf_thresh: 0.15                             # min keypoint confidence to display
  confidence_thresh: 0.001                           # very low — detect as many boxes as possible
  NMS_thresh: 0.1                                    # IoU for NMS suppression
  max_detection_boxes: 100

tools:
  stedgeai:
    version: 2.1.0                                   # must match installed binary
    path_to_stedgeai: /opt/ST/STEdgeAI/2.1/Utilities/linux/stedgeai
  path_to_cubeIDE: /home/.../stm32cubeide

deployment:
  c_project_path: ./application_code/pose_estimation/STM32N6/
  IDE: GCC
  hardware_setup:
    board: STM32N6570-DK                             # selects stmaic_STM32N6570-DK.conf

Field | Value | What it controls
operation_mode | deployment | Skips training and evaluation — goes straight to C generation and flash
general.model_type | yolo_mpe | Selects the YOLO multi-pose postprocessor in Python eval and app_config.h
preprocessing.rescaling | scale=1/255, offset=0 | Used by apply_rescaling() for quantization calibration samples
postprocessing.NMS_thresh | 0.1 | Low IoU threshold — aggressive NMS suppression of duplicate detections
tools.stedgeai.version | 2.1.0 | Must match the installed stedgeai binary — a mismatch causes a build failure
deployment.board | STM32N6570-DK | Selects stmaic_STM32N6570-DK.conf — the build/flash config for this board
hydra.run.dir | experiments_outputs/${now:...} | Every run gets a timestamped output folder — old results are never overwritten

Operation modes

Mode | Steps executed | Board needed? | Used in this project?
deployment | Generate C + compile + flash | Yes | MoveNet
chain_qd | Quantize → Deploy | Yes | YOLOv8n, TinyBERT
quantization | Convert FP32 → INT8 only | — | —
evaluation | Measure OKS / mAP on test set | — | —
benchmarking | Measure latency on target | — | —
training | Fine-tune or train from scratch | — | —
chain_tqeb | Train → Quantize → Evaluate → Benchmark | — | —
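
Any of these modes is selected on the command line via Hydra's override syntax — same entry point, different argument. For example, the chain_qd run used for YOLOv8n:

# quantize then deploy in one run
python stm32ai_main.py operation_mode=chain_qd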

Complete file tree — pose_estimation/

pose_estimation/
├── stm32ai_main.py          ← entry point
├── user_config.yaml         ← the only file you edit
├── deployment/
│   └── deploy.py
└── src/
    ├── utils/
    │   ├── gen_h_file.py                 ← Python→C bridge
    │   ├── parse_config.py
    │   ├── models_mgt.py
    │   └── connections.py
    ├── quantization/
    │   └── quantize.py
    ├── preprocessing/
    │   ├── preprocess.py
    │   └── data_loader.py
    ├── postprocessing/
    │   └── postprocess.py
    ├── evaluation/
    │   ├── evaluate.py
    │   └── metrics.py
    ├── training/
    │   ├── train.py
    │   ├── callbacks.py
    │   ├── loss.py
    │   └── heatmaps_train_model.py
    ├── models/
    │   ├── st_movenet_lightning_heatmaps.py
    │   └── custom.py
    ├── data_augmentation/
    │   ├── data_augmentation.py
    │   ├── pose_random_affine.py
    │   ├── pose_random_misc.py
    │   ├── pose_random_utils.py
    │   └── swap_list_dict.py
    └── prediction/
        └── predict.py

File-by-file — Deployment path

Files are listed in the order they execute during operation_mode=deployment.

stm32ai_main.py Entry point View in Part 2 →

Decorated with @hydra.main, it is the top-level entry point. Reads user_config.yaml, calls get_config() to validate all parameters, then dispatches to deploy(), quantize(), train(), or evaluate() based on operation_mode.

Key: main(cfg) → get_config() → deploy(cfg)
src/utils/parse_config.py YAML validation View in Part 2 →

Validates every section of user_config.yaml via Hydra + OmegaConf. Each section has its own parsing function that checks for legal/required/illegal fields and sets defaults for missing optional ones. Raises clear errors pointing to the exact field if validation fails. Returns a DefaultMunch config object used as a global reference throughout the pipeline.

Key: get_config() • _parse_dataset_section() • _parse_preprocessing_section() • _parse_postprocessing_section() • _parse_data_augmentation_section()
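
The shape of those per-section checks can be sketched as follows — a hedged illustration; the helper name and field lists are assumptions, only get_config() and the DefaultMunch return type come from the actual code.

# Hedged sketch of the per-section validation pattern; _check_section and
# its field lists are illustrative, not the exact parse_config.py internals.
from munch import DefaultMunch
from omegaconf import DictConfig, OmegaConf

def _check_section(section: dict, name: str, required: tuple, legal: tuple):
    missing = [f for f in required if f not in section]
    if missing:
        raise ValueError(f"[{name}] missing required field(s): {missing}")
    unknown = [f for f in section if f not in legal]
    if unknown:
        raise ValueError(f"[{name}] unknown field(s): {unknown}")

def get_config(cfg: DictConfig):
    d = OmegaConf.to_container(cfg, resolve=True)
    _check_section(d.get("general", {}), "general",
                   required=("model_path",),
                   legal=("model_path", "model_type", "num_threads_tflite"))
    # ... one such call per section ...
    return DefaultMunch.fromDict(d)  # dot-notation config used pipeline-wide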
deployment/deploy.py Orchestrator View in Part 2 →

Called by stm32ai_main.py for deployment and chain_qd modes. Orchestrates the four deployment steps in order: (1) calls gen_h_user_file_n6() to generate app_config.h, (2) selects the correct .conf board file for the STM32N6570-DK, (3) invokes stm32ai_deploy_stm32n6() from common_deploy.py, which calls ST Edge AI Core to generate the C model, and (4) drives STM32CubeIDE to compile and flash.

Key: deploy(cfg) → gen_h_file.py → common_deploy.py
src/utils/gen_h_file.py Python → C bridge View in Part 2 →

The most important bridge in the codebase. Generates app_config.h — the C header that tells the firmware everything about the model: input width/height, number of keypoints, postprocessing type, confidence thresholds. It loads the quantized .tflite with the TFLite interpreter and reads tensor shapes directly, so no manual configuration is ever needed. The output is the file that the C firmware compiles against.

Key: gen_h_user_file_n6(config, quantized_model_path) → writes app_config.h
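
The shape-reading step is plain TFLite interpreter usage — a minimal sketch, with a placeholder model path and illustrative macro names:

# Minimal sketch: read the input tensor shape from a quantized .tflite,
# the mechanism gen_h_user_file_n6() relies on (names here are assumed).
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()
shape = interpreter.get_input_details()[0]["shape"]   # e.g. [1, 192, 192, 3]
print(f"#define NN_HEIGHT {shape[1]}")  # lines like these end up in app_config.h
print(f"#define NN_WIDTH  {shape[2]}")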
src/quantization/quantize.py INT8 PTQ View in Part 2 →

Routes quantization to one of two paths based on the input format. For .onnx models (TinyBERT): calls quantize_onnx(). For .h5 / Keras models (YOLOv8n): uses the TFLite Converter with a _representative_data_gen() that feeds calibration samples from the quantization dataset split — this is what drives the scale and zero-point estimation for every layer. Supports per_tensor and per_channel granularity, and optionally applies graph optimisation before quantisation.

Key: quantize(cfg, ds) → .tflite path • _tflite_ptq_quantizer() • _representative_data_gen()
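
The Keras path can be sketched with standard TFLite converter calls — the file names and the random calibration tensors below are placeholders for the real quantization split:

# Sketch of the TFLite post-training INT8 quantization path described above;
# model/file names and random data are placeholders, not the pipeline's own.
import numpy as np
import tensorflow as tf

def _representative_data_gen():
    for _ in range(100):
        # real pipeline: calibration images from the 'quantization' split
        yield [np.random.rand(1, 192, 192, 3).astype(np.float32)]

model = tf.keras.models.load_model("yolov8n_pose.h5")      # placeholder path
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = _representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8                   # full-integer I/O
converter.inference_output_type = tf.int8
open("quantized_model.tflite", "wb").write(converter.convert())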
src/preprocessing/preprocess.py + data_loader.py

preprocess.py provides three functions: preprocess(cfg) loads all four dataset splits (train/val/quant/test) via data_loader.py and returns tf.data.Dataset objects; apply_rescaling(dataset, scale, offset) applies pixel normalisation (e.g. pixel / 127.5 - 1 for MobileNet-style models); preprocess_input(image, input_details) handles single-image preprocessing for prediction mode, including uint8/int8/float32 quantization of the input tensor. data_loader.py handles COCO JSON parsing, image loading, and keypoint coordinate normalisation to [0, 1].
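
A sketch of the rescaling step, with the signature taken from the prose above:

# Sketch of apply_rescaling(); the lambda layout assumes (image, label) pairs.
import tensorflow as tf

def apply_rescaling(dataset: tf.data.Dataset, scale: float, offset: float):
    # scale=1/255, offset=0 maps uint8 pixels to [0, 1];
    # scale=1/127.5, offset=-1 maps to [-1, 1] for MobileNet-style models
    return dataset.map(lambda image, label: (image * scale + offset, label))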

src/postprocessing/postprocess.py Heatmap decoder / NMS View in Part 2 →

The Python-side postprocessor — mirrors what the C firmware does on the board but in TensorFlow for evaluation purposes. Contains five decoders: heatmaps_spe_postprocess(tensor) decodes MoveNet output by finding the argmax on each (48×48) heatmap channel and converting to normalised (x, y, conf); yolo_mpe_postprocess() applies padded NMS via tf.image.non_max_suppression() per image; plus hand and head landmark decoders for other use cases.
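
The argmax decode can be sketched in a few lines of TensorFlow — an illustration of the mechanism, not the exact heatmaps_spe_postprocess() source:

# Sketch of decoding (batch, H, W, K) heatmaps to normalised (x, y, conf).
import tensorflow as tf

def decode_heatmaps(heatmaps: tf.Tensor) -> tf.Tensor:
    _, h, w, k = heatmaps.shape
    flat = tf.reshape(heatmaps, (-1, h * w, k))
    idx = tf.argmax(flat, axis=1)                 # flat argmax per channel
    conf = tf.reduce_max(flat, axis=1)            # peak value = confidence
    y = tf.cast(idx // w, tf.float32) / h         # normalise to [0, 1]
    x = tf.cast(idx % w, tf.float32) / w
    return tf.stack([x, y, conf], axis=-1)        # (batch, K, 3)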

src/evaluation/evaluate.py + metrics.py

evaluate.py runs inference on the evaluation dataset using a Keras .h5, TFLite interpreter, or ONNX Runtime session, then computes accuracy and logs to MLflow. Supports three execution targets: host (your laptop CPU), stedgeai_host, and stedgeai_n6 (direct evaluation on the board via ST Edge AI runner). metrics.py implements the full COCO keypoint evaluation protocol: single_pose_oks() for MoveNet using per-keypoint COCO standard deviations (nose has lower tolerance than ankle); multi_pose_oks_mAP() for YOLOv8 across 10 IoU thresholds [0.5:0.05:0.95]; and compute_ap() for the precision-recall AUC via 101-point COCO interpolation.
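
The OKS computation follows the standard COCO formula — a hedged NumPy sketch; argument names are illustrative, not metrics.py's exact signature:

# Sketch of single-pose OKS per the COCO keypoint protocol (illustrative).
import numpy as np

def single_pose_oks(pred, gt, vis, area, kappa):
    # pred, gt: (K, 2) keypoints; vis: (K,) visibility; area: person scale;
    # kappa: (K,) per-keypoint COCO constants (nose stricter than ankle)
    d2 = np.sum((pred - gt) ** 2, axis=-1)
    e = np.exp(-d2 / (2.0 * area * kappa ** 2))
    mask = vis > 0                        # occluded joints do not count
    return e[mask].sum() / max(mask.sum(), 1)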

Training pipeline — not used in this project, but fully available

Since we used pre-trained models from the Zoo, the training pipeline was never executed. However, it is fully implemented and production-quality — ready for anyone who wants to fine-tune or train a pose estimation model from scratch. Here is what each module does.

src/training/train.py Training loop

Orchestrates the full Keras training loop. Loads the model via load_model_for_training(), applies frozen layers and dropout from the config, wraps it in a HMTrainingModel custom training class, then calls model.fit() with the configured callbacks. After training, saves both best_model.h5 (best validation OKS) and last_model.h5, then runs evaluate() on the test set automatically.

src/training/callbacks.py Keras callbacks

Builds the Keras callback list for model.fit(). Always includes: ModelCheckpoint (saves best weights by val_loss or val_oks), a second ModelCheckpoint for last weights, LRTensorBoard (logs learning rate), and CSVLogger (writes per-epoch metrics to CSV). Optional callbacks from the YAML (ReduceLROnPlateau, EarlyStopping, custom LR schedulers) are dynamically instantiated via eval() from the config string.

src/training/loss.py Heatmap loss

Implements spe_loss() — the single pose estimation loss function. Supports three output types: heatmaps (converts GT keypoints to one-hot heatmaps via _reg_to_heatmaps() then computes MSE), reg (direct coordinate regression MSE), and reg_heatmaps (converts predicted heatmaps to coordinates via _heatmaps_to_reg() then computes coordinate MSE). All loss variants mask out invisible keypoints (visibility flag = 0) so occluded joints do not contribute to the gradient.
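
The visibility masking for the reg variant can be sketched as follows — the tensor layout is an assumption, not the exact loss.py implementation:

# Sketch of a visibility-masked coordinate MSE (illustrative layout).
import tensorflow as tf

def masked_coordinate_mse(y_true, y_pred):
    # y_true: (batch, K, 3) = (x, y, visibility); y_pred: (batch, K, 2)
    vis = tf.cast(y_true[..., 2] > 0, tf.float32)                 # (batch, K)
    sq = tf.reduce_sum(tf.square(y_true[..., :2] - y_pred), axis=-1)
    return tf.reduce_sum(sq * vis) / tf.maximum(tf.reduce_sum(vis), 1.0)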

src/models/st_movenet_lightning_heatmaps.py Model architecture

Defines the ST variant of MoveNet Lightning as a Keras functional model. Uses a MobileNetV2 backbone (pretrained on ImageNet) with three feature-pyramid connections at blocks 2, 5, and 9. The decoder applies alternating DepthwiseConv2D + BatchNorm + UpSampling2D + residual Add() layers — a lightweight FPN that upsamples the backbone's 6×6 output back to 48×48 heatmaps. Output: (batch, 48, 48, nb_keypoints) with sigmoid activations.
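
One decoder stage can be pictured as a small Keras block — an illustrative sketch in the style described above, not ST's exact layer graph; it assumes the skip tensor already matches the upsampled shape:

# Illustrative decoder stage: depthwise conv + BN + upsample + residual add.
from tensorflow.keras import layers

def decoder_block(x, skip):
    x = layers.DepthwiseConv2D(3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.UpSampling2D(2)(x)           # double the spatial resolution
    return layers.Add()([x, skip])          # feature-pyramid merge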

src/data_augmentation/ (5 files) Data augmentation

data_augmentation.py is the dispatcher: it reads the list of augmentation functions from the YAML config and applies them in order to each training batch. It handles images and keypoint labels simultaneously — if you flip an image horizontally, the keypoint coordinates must be mirrored too, and left/right limbs must be swapped (e.g. the left knee becomes the right knee). This is what the pose-specific modules below handle; a sketch follows the list.

  • pose_random_affine.py — flip, translation, rotation, shear, zoom (all with joint coordinate updates)
  • pose_random_misc.py — blur, Gaussian noise, random crop (with bounding box and keypoint clipping)
  • pose_random_utils.py — shared geometric utilities
  • swap_list_dict.py — defines which keypoints swap on horizontal flip (e.g. left shoulder ↔ right shoulder)
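
Here is the promised sketch of a pose-aware horizontal flip — illustrative, not the modules' exact code; it assumes x-coordinates normalised to [0, 1] and the standard COCO keypoint order:

# Sketch of a keypoint-consistent horizontal flip; assumes kpts = (K, 3)
# rows of (x, y, visibility) with x normalised to [0, 1].
import tensorflow as tf

# COCO-order left/right swap indices (left shoulder ↔ right shoulder, ...)
SWAP = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]

def flip_horizontal(image, kpts):
    image = tf.image.flip_left_right(image)
    x = 1.0 - kpts[:, 0:1]                       # mirror the x coordinate
    kpts = tf.concat([x, kpts[:, 1:]], axis=1)
    return image, tf.gather(kpts, SWAP)          # left knee becomes right knee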

Deployment call chain — what happens when you press Enter

# python stm32ai_main.py operation_mode=deployment

stm32ai_main.py → parse_config.py          # validate user_config.yaml — fail fast if wrong
stm32ai_main.py → deploy.py                # orchestrate the deployment
deploy.py → gen_h_file.py                  # read TFLite tensors → write app_config.h
deploy.py → common_deploy.py               # call stedgeai CLI (model → C arrays)
common_deploy.py → external_memory_mgt.py  # patch linker script for OctoFlash
common_deploy.py → STM32CubeIDE            # compile C project
STM32CubeIDE → ST-Link → board             # flash firmware → inference starts