STM32N6 NPU Deployment — Politecnico di Milano
1.0
Documentation for Neural Network Deployment on STM32N6 NPU - Politecnico di Milano 2024-2025
ModelZoo Services is the Python framework that automates every step of the ML pipeline — from training to deployment — through a single YAML configuration file and a single command. This page documents every Python file in the repository, with direct links to the annotated source in Part 2.
🔗 github.com/STMicroelectronics/stm32ai-modelzoo-services
In the previous section we saw the ST Model Zoo: a repository of
pre-trained model files (.tflite, .h5, .onnx).
The Zoo gives you the model. But having a model file is not enough — you still
need to quantize it, convert it to C code, compile it, and flash it onto the board.
ModelZoo Services is the tool that does all of this.
The two repositories are designed to work together.
ST recommends cloning them as siblings in the same parent folder, because the
example user_config.yaml files in Services reference model paths
relative to the Zoo repository. In our case, this meant keeping
stm32ai-modelzoo and stm32ai-modelzoo-services side by side under the same parent directory.
The connection point between the two repos is a single line in
user_config.yaml:
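That line is the general.model_path field, pointed at a pre-trained model inside the sibling Zoo checkout. A minimal illustration of how Services reads it through OmegaConf (the path below is a placeholder, not our actual configuration):

```python
# Hypothetical illustration only: the real user_config.yaml sets general.model_path
# to a .h5/.tflite/.onnx file inside the sibling stm32ai-modelzoo checkout.
from omegaconf import OmegaConf

cfg = OmegaConf.create("""
general:
  model_path: ../stm32ai-modelzoo/<path to the pretrained model>   # placeholder
""")
print(cfg.general.model_path)
```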
From that moment on, Services takes full control: it reads the model, validates the configuration, invokes ST Edge AI Core to convert it to C, calls STM32CubeIDE to compile, and flashes the board via ST-Link. The entire flow happens in a single Python process triggered by one command.
Services is built on two libraries that are worth understanding: Hydra and OmegaConf. Hydra is a framework by Meta AI for configuring complex applications via YAML files. OmegaConf provides the underlying configuration object with type safety and dot-notation access. Together they allow ST to build a pipeline where (a minimal sketch of this wiring follows the list):

- user_config.yaml is the only file you touch. Every parameter — model path, quantization granularity, board type, tool paths — lives there. No Python editing required.
- parse_config.py validates every field before any computation starts. If a required parameter is missing or invalid, you get a clear error message pointing to the exact field — not a cryptic Python traceback.
- Setting operation_mode: chain_qd runs quantization followed by deployment in one command. No manual file passing between steps.
- Every metric — OKS, mAP, inference time — is automatically logged to MLflow for experiment tracking. The output directory also contains a .log file with the full run history.
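As promised above, here is a minimal sketch of that wiring. It is not the ST source; the decorator arguments and the assumption that user_config.yaml sits next to the script are ours.

```python
# Minimal Hydra + OmegaConf sketch (not the ST implementation).
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path=".", config_name="user_config")
def main(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))   # the fully resolved configuration, one object
    print(cfg.operation_mode)       # dot-notation access provided by OmegaConf

if __name__ == "__main__":
    main()
```

Hydra also creates the run directory configured under hydra.run.dir, which is where the timestamped experiments_outputs folders come from.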
The entire pipeline is launched with one command:
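With Hydra's defaults this is typically just python stm32ai_main.py, run from the use-case folder that contains user_config.yaml; a different configuration file can be selected with Hydra's --config-name flag.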
This is the exact user_config.yaml we used for the YOLOv8n-pose deployment.
Every field is annotated inline. This is the only file you need to edit
to reproduce our results or deploy a different model.
| Field | Value | What it controls |
|---|---|---|
| operation_mode | deployment | Skips training and evaluation — goes straight to C generation and flash |
| general.model_type | yolo_mpe | Selects YOLO multi-pose postprocessor in Python eval and app_config.h |
| preprocessing.rescaling | scale=1/255, offset=0 | Used by apply_rescaling() for quantization calibration samples |
| postprocessing.NMS_thresh | 0.1 | Low IoU threshold — aggressive NMS suppression of duplicate detections |
| tools.stedgeai.version | 2.1.0 | Must match the installed stedgeai binary — mismatch causes build failure |
| deployment.board | STM32N6570-DK | Selects stmaic_STM32N6570-DK.conf — the build/flash config for this board |
| hydra.run.dir | experiments_outputs/${now:...} | Every run gets a timestamped output folder — old results are never overwritten |
| Mode | Steps executed | Board needed? | Used in this project? |
|---|---|---|---|
| deployment | Generate C + compile + flash | ✓ | Yes — MoveNet |
| chain_qd | Quantize → Deploy | ✓ | Yes — YOLOv8n, TinyBERT |
| quantization | Convert FP32 → INT8 only | ✗ | — |
| evaluation | Measure OKS / mAP on test set | ✗ | — |
| benchmarking | Measure latency on target | ✓ | — |
| training | Fine-tune or train from scratch | ✗ | — |
| chain_tqeb | Train → Quantize → Evaluate → Benchmark | ✓ | — |
Files are listed in the order in which they execute during operation_mode=deployment.
stm32ai_main.py is the top-level entry point, decorated with @hydra.main.
It reads user_config.yaml, calls get_config() to
validate all parameters, then dispatches to deploy(),
quantize(), train(), or evaluate()
based on operation_mode.
main(cfg) → get_config() → deploy(cfg)
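A reduced sketch of that dispatch, with stub functions standing in for the real modules (the structure is inferred from the description above, not copied from the ST source):

```python
# Sketch only: the real deploy()/quantize()/train()/evaluate() live in their own modules.
from omegaconf import OmegaConf

def quantize(cfg): print("quantizing", cfg.general.model_path)
def deploy(cfg):   print("deploying", cfg.general.model_path, "to", cfg.deployment.board)

def dispatch(cfg):
    mode = cfg.operation_mode
    if mode == "chain_qd":            # quantize, then deploy, in a single run
        quantize(cfg)
        deploy(cfg)
    elif mode == "deployment":
        deploy(cfg)
    elif mode == "quantization":
        quantize(cfg)
    else:
        raise ValueError(f"unsupported operation_mode: {mode}")

dispatch(OmegaConf.create({"operation_mode": "deployment",
                           "general": {"model_path": "model.tflite"},
                           "deployment": {"board": "STM32N6570-DK"}}))
```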
parse_config.py validates every section of user_config.yaml via Hydra + OmegaConf.
Each section has its own parsing function that checks for legal/required/illegal
fields and sets defaults for missing optional ones. It raises clear errors pointing
to the exact field if validation fails, and returns a DefaultMunch
config object used as a global reference throughout the pipeline.
get_config() •
_parse_dataset_section() •
_parse_preprocessing_section() •
_parse_postprocessing_section() •
_parse_data_augmentation_section()
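A conceptual sketch of what one of those section parsers does (the field names confidence_thresh and max_detection_boxes are illustrative assumptions; only NMS_thresh appears in our configuration table):

```python
# Conceptual sketch, not the ST code: check legal/required fields, set defaults,
# and return a DefaultMunch object that supports dot-notation access.
from munch import DefaultMunch

def _parse_postprocessing_section(section: dict) -> dict:
    legal    = {"NMS_thresh", "confidence_thresh", "max_detection_boxes"}
    required = {"NMS_thresh", "confidence_thresh"}

    unknown = set(section) - legal
    if unknown:
        raise ValueError(f"Unknown field(s) in 'postprocessing': {sorted(unknown)}")
    missing = required - set(section)
    if missing:
        raise ValueError(f"Missing required field(s) in 'postprocessing': {sorted(missing)}")

    section.setdefault("max_detection_boxes", 100)   # default for an optional field
    return section

cfg = DefaultMunch.fromDict(
    {"postprocessing": _parse_postprocessing_section({"NMS_thresh": 0.1,
                                                      "confidence_thresh": 0.5})})
print(cfg.postprocessing.NMS_thresh)   # 0.1
```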
Called by stm32ai_main.py for the deployment and chain_qd modes,
this module orchestrates the deployment steps in order:
(1) it calls gen_h_user_file_n6() to generate app_config.h,
(2) it selects the correct .conf board file for the STM32N6570-DK, and
(3) it invokes stm32ai_deploy_stm32n6() from common_deploy.py,
which calls ST Edge AI Core and STM32CubeIDE in sequence.
deploy(cfg) →
gen_h_file.py → common_deploy.py
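The shape of that orchestration, with hypothetical stub functions standing in for gen_h_user_file_n6() and stm32ai_deploy_stm32n6() (a sketch of the sequence, not the ST implementation):

```python
# Assumed step sequence; the stub bodies only illustrate what each real call produces.
def generate_header(cfg):          # gen_h_file.py: write app_config.h from the .tflite
    print("writing app_config.h for", cfg["model_path"])

def select_board_conf(cfg):        # pick the stmaic_<board>.conf build description
    return f"stmaic_{cfg['board']}.conf"

def build_and_flash(cfg, conf):    # common_deploy.py: ST Edge AI Core, CubeIDE, ST-Link
    print("building with", conf, "and flashing over ST-Link")

def deploy(cfg):
    generate_header(cfg)
    build_and_flash(cfg, select_board_conf(cfg))

deploy({"model_path": "yolov8n_pose_int8.tflite",   # hypothetical file name
        "board": "STM32N6570-DK"})
```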
The most important bridge in the codebase. Generates app_config.h —
the C header that tells the firmware everything about the model:
input width/height, number of keypoints, postprocessing type, confidence thresholds.
It loads the quantized .tflite with the TFLite interpreter and
reads tensor shapes directly, so no manual configuration is ever needed.
The output is the file that the C firmware compiles against.
gen_h_user_file_n6(config, quantized_model_path)
→ writes app_config.h
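The core trick is easy to reproduce with the TFLite Python API. A sketch of the shape probing (the macro names printed at the end are illustrative, not the real app_config.h contents):

```python
# Sketch: read the input tensor geometry straight from the quantized model.
import tensorflow as tf

def probe_input_shape(tflite_path: str):
    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    interpreter.allocate_tensors()
    details = interpreter.get_input_details()[0]
    _, height, width, channels = details["shape"]        # NHWC input tensor
    return int(height), int(width), int(channels), details["dtype"]

h, w, c, dtype = probe_input_shape("quantized_model.tflite")   # placeholder path
print(f"#define NN_WIDTH  {w}")    # illustrative macro names only
print(f"#define NN_HEIGHT {h}")
```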
Routes quantization to one of two paths based on the input format.
For .onnx models (TinyBERT): calls quantize_onnx().
For .h5 / Keras models (YOLOv8n): uses the TFLite Converter with
a _representative_data_gen() that feeds calibration samples
from the quantization dataset split — this is what drives the scale and
zero-point estimation for every layer.
Supports per_tensor and per_channel granularity,
and optionally applies graph optimization before quantization.
quantize(cfg, ds) → .tflite path •
_tflite_ptq_quantizer() • _representative_data_gen()
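The Keras path is standard TensorFlow post-training quantization. A minimal, self-contained sketch, with a toy model and random calibration data in place of the real quantization split (the input type shown is an assumption):

```python
# Minimal PTQ sketch: the representative dataset drives scale / zero-point estimation.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.Input(shape=(192, 192, 3)),
                             tf.keras.layers.Conv2D(8, 3),
                             tf.keras.layers.GlobalAveragePooling2D()])

def _representative_data_gen():
    for _ in range(100):                                             # calibration samples
        yield [np.random.rand(1, 192, 192, 3).astype(np.float32)]    # replace with the quant split

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = _representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8      # assumption: matches the on-board camera pipeline

open("quantized_model.tflite", "wb").write(converter.convert())
```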
preprocess.py provides three functions:
preprocess(cfg) loads all four dataset splits (train/val/quant/test)
via data_loader.py and returns tf.data.Dataset objects;
apply_rescaling(dataset, scale, offset) applies pixel normalisation
(e.g. pixel / 127.5 - 1 for MobileNet-style models);
preprocess_input(image, input_details) handles single-image
preprocessing for prediction mode, including uint8/int8/float32 quantization
of the input tensor.
data_loader.py handles COCO JSON parsing, image loading, and
keypoint coordinate normalisation to [0, 1].
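A sketch of the rescaling step with the signature described above (the dataset construction is a toy example):

```python
# apply_rescaling as described above: images are rescaled, labels pass through untouched.
import tensorflow as tf

def apply_rescaling(dataset: tf.data.Dataset, scale: float, offset: float) -> tf.data.Dataset:
    return dataset.map(lambda image, label: (image * scale + offset, label))

toy = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform((4, 192, 192, 3), maxval=255.0), tf.zeros((4, 17, 3))))
rescaled = apply_rescaling(toy, scale=1 / 255.0, offset=0.0)   # the YOLOv8n-pose setting
# A MobileNet-style model would use scale=1/127.5, offset=-1 instead.
```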
The Python-side postprocessor — mirrors what the C firmware does on the board
but in TensorFlow for evaluation purposes. Contains five decoders:
heatmaps_spe_postprocess(tensor) decodes MoveNet output by finding
the argmax on each (48×48) heatmap channel and converting to
normalised (x, y, conf);
yolo_mpe_postprocess() applies padded NMS via
tf.image.non_max_suppression() per image;
plus hand and head landmark decoders for other use cases.
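The heatmap decoder is essentially a per-channel argmax. A sketch of the idea for a single image, with an assumed tensor layout:

```python
# Sketch of heatmaps_spe_postprocess-style decoding: one argmax per keypoint channel.
import tensorflow as tf

def decode_heatmaps(heatmaps: tf.Tensor) -> tf.Tensor:
    """heatmaps: (48, 48, nb_keypoints) sigmoid outputs -> (nb_keypoints, 3) as (x, y, conf)."""
    h, w, k = heatmaps.shape
    flat = tf.reshape(heatmaps, (h * w, k))                 # one column per keypoint
    idx  = tf.argmax(flat, axis=0)                          # flat index of each peak
    conf = tf.reduce_max(flat, axis=0)
    y = tf.cast(idx // w, tf.float32) / float(h)            # normalised to [0, 1]
    x = tf.cast(idx %  w, tf.float32) / float(w)
    return tf.stack([x, y, conf], axis=-1)

print(decode_heatmaps(tf.random.uniform((48, 48, 17))).shape)   # (17, 3)
```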
evaluate.py runs inference on the evaluation dataset using a
Keras .h5, TFLite interpreter, or ONNX Runtime session,
then computes accuracy and logs to MLflow. Supports three execution targets:
host (your laptop CPU), stedgeai_host, and
stedgeai_n6 (direct evaluation on the board via ST Edge AI runner).
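For the host target, inference with a quantized model also has to quantize the input tensor on the fly. A sketch of that loop (an assumed helper, not the exact evaluate.py code):

```python
# Host-side TFLite inference sketch; ONNX Runtime and on-target execution are not shown.
import numpy as np
import tensorflow as tf

def run_tflite(tflite_path: str, image: np.ndarray) -> np.ndarray:
    """image: float32 NHWC batch, already rescaled as in preprocessing."""
    itp = tf.lite.Interpreter(model_path=tflite_path)
    itp.allocate_tensors()
    inp, out = itp.get_input_details()[0], itp.get_output_details()[0]
    if inp["dtype"] in (np.int8, np.uint8):                 # quantize the input if the model is INT8
        scale, zero_point = inp["quantization"]
        image = (image / scale + zero_point).astype(inp["dtype"])
    itp.set_tensor(inp["index"], image)
    itp.invoke()
    return itp.get_tensor(out["index"])
```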
metrics.py implements the full COCO keypoint evaluation protocol:
single_pose_oks() for MoveNet using per-keypoint COCO standard
deviations (nose has lower tolerance than ankle);
multi_pose_oks_mAP() for YOLOv8 across 10 OKS thresholds
[0.5:0.05:0.95]; and compute_ap() for the precision-recall AUC
via 101-point COCO interpolation.
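For reference, the OKS computation itself is compact. A worked sketch of single-pose OKS (the sigmas are the published COCO per-keypoint values; the example call uses made-up numbers):

```python
# OKS = mean over visible keypoints of exp(-d^2 / (2 * area * (2*sigma)^2))
import numpy as np

COCO_SIGMAS = np.array([.026, .025, .025, .035, .035, .079, .079, .072, .072,
                        .062, .062, .107, .107, .087, .087, .089, .089])

def single_pose_oks(gt_xy, pred_xy, visibility, area):
    d2 = np.sum((gt_xy - pred_xy) ** 2, axis=-1)           # squared distance per keypoint
    k2 = (2.0 * COCO_SIGMAS) ** 2                          # nose is strictest, ankle most tolerant
    oks_per_kpt = np.exp(-d2 / (2.0 * area * k2 + 1e-9))
    visible = visibility > 0
    return float(oks_per_kpt[visible].mean()) if visible.any() else 0.0

gt = np.random.rand(17, 2)
print(single_pose_oks(gt, gt + 0.01, visibility=np.ones(17), area=0.3))
```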
Since we used pre-trained models from the Zoo, the training pipeline was never executed. However, it is fully implemented and production-quality — ready for anyone who wants to fine-tune or train a pose estimation model from scratch. Here is what each module does.
Orchestrates the full Keras training loop. Loads the model via
load_model_for_training(), applies frozen layers and dropout
from the config, wraps it in a HMTrainingModel custom training
class, then calls model.fit() with the configured callbacks.
After training, saves both best_model.h5 (best validation OKS)
and last_model.h5, then runs evaluate() on the
test set automatically.
Builds the Keras callback list for model.fit().
Always includes: ModelCheckpoint (saves best weights by
val_loss or val_oks), a second
ModelCheckpoint for last weights, LRTensorBoard
(logs learning rate), and CSVLogger (writes per-epoch metrics to CSV).
Optional callbacks from the YAML (ReduceLROnPlateau,
EarlyStopping, custom LR schedulers) are dynamically
instantiated via eval() from the config string.
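A minimal sketch of that callback assembly (the function name and the metrics.csv filename are illustrative; LRTensorBoard and the eval()-based instantiation of YAML-defined extras are omitted):

```python
# Reduced callback list; the real builder also honours the YAML-defined optional callbacks.
import tensorflow as tf

def build_callbacks(output_dir: str):
    return [
        tf.keras.callbacks.ModelCheckpoint(f"{output_dir}/best_model.h5",
                                           monitor="val_loss", save_best_only=True),
        tf.keras.callbacks.ModelCheckpoint(f"{output_dir}/last_model.h5",
                                           save_best_only=False),
        tf.keras.callbacks.CSVLogger(f"{output_dir}/metrics.csv"),             # per-epoch metrics
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", patience=5),  # optional, from YAML
    ]
```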
Implements spe_loss() — the single pose estimation loss function.
Supports three output types: heatmaps (converts GT keypoints to
one-hot heatmaps via _reg_to_heatmaps() then computes MSE),
reg (direct coordinate regression MSE), and
reg_heatmaps (converts predicted heatmaps to coordinates via
_heatmaps_to_reg() then computes coordinate MSE).
All loss variants mask out invisible keypoints (visibility flag = 0) so
occluded joints do not contribute to the gradient.
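A sketch of the visibility masking for the reg (coordinate regression) variant, with an assumed tensor layout:

```python
# Masked coordinate MSE: keypoints with visibility == 0 contribute nothing to the gradient.
import tensorflow as tf

def masked_coordinate_mse(y_true, y_pred):
    """y_true: (batch, K, 3) as (x, y, visibility); y_pred: (batch, K, 2)."""
    mask   = tf.cast(y_true[..., 2:3] > 0, tf.float32)
    sq_err = tf.square(y_true[..., :2] - y_pred) * mask
    return tf.reduce_sum(sq_err) / (2.0 * tf.reduce_sum(mask) + 1e-9)

y_true = tf.constant([[[0.5, 0.5, 1.0], [0.2, 0.3, 0.0]]])   # second keypoint occluded
y_pred = tf.constant([[[0.6, 0.5], [0.9, 0.9]]])
print(masked_coordinate_mse(y_true, y_pred).numpy())          # only the visible joint counts
```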
Defines the ST variant of MoveNet Lightning as a Keras functional model.
Uses a MobileNetV2 backbone (pretrained on ImageNet)
with three feature pyramid connections at blocks 2, 5, and 9.
The decoder applies alternating DepthwiseConv2D +
BatchNorm + UpSampling2D + residual
Add() layers — a lightweight FPN that upsamples from
the backbone's 6×6 output back to 48×48 heatmaps.
Output: (batch, 48, 48, nb_keypoints) with sigmoid activations.
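A heavily reduced sketch of that architecture (the filter counts and the single pyramid tap are simplifications; the real model uses three pyramid connections, residual Add() layers and ImageNet weights):

```python
# Toy MoveNet-style decoder: 192x192 input -> 6x6 backbone features -> 48x48 heatmaps.
import tensorflow as tf
L = tf.keras.layers

def tiny_movenet_like(nb_keypoints: int = 17) -> tf.keras.Model:
    backbone = tf.keras.applications.MobileNetV2(input_shape=(192, 192, 3),
                                                 include_top=False, weights=None)
    x = backbone.output                              # (6, 6, 1280)
    for filters in (96, 64, 48):                     # illustrative filter counts
        x = L.UpSampling2D()(x)                      # 12x12 -> 24x24 -> 48x48
        x = L.DepthwiseConv2D(3, padding="same")(x)
        x = L.BatchNormalization()(x)
        x = L.Conv2D(filters, 1, activation="relu")(x)
    heatmaps = L.Conv2D(nb_keypoints, 1, activation="sigmoid")(x)
    return tf.keras.Model(backbone.input, heatmaps)

print(tiny_movenet_like().output_shape)              # (None, 48, 48, 17)
```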
data_augmentation.py is the dispatcher: it reads the list of
augmentation functions from the YAML config and applies them in order to each
training batch. It handles both images and keypoint labels
simultaneously — if you flip an image horizontally, the keypoint coordinates
must be mirrored too, and left/right limbs must be swapped (e.g. left knee
becomes right knee). This is what the pose-specific modules listed below handle; a minimal sketch of the flip-and-swap logic follows the list.
- pose_random_affine.py — flip, translation, rotation, shear, zoom (all with joint coordinate updates)
- pose_random_misc.py — blur, Gaussian noise, random crop (with bounding box and keypoint clipping)
- pose_random_utils.py — shared geometric utilities
- swap_list_dict.py — defines which keypoints swap on horizontal flip (e.g. left shoulder ↔ right shoulder)
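The promised sketch of the flip-and-swap logic (the index table stands in for swap_list_dict.py and assumes the standard 17-keypoint COCO order; coordinates are assumed normalised to [0, 1]):

```python
# Horizontal flip: mirror the image, mirror x, and swap left/right keypoints.
import tensorflow as tf

# COCO order: 0 nose, 1/2 eyes, 3/4 ears, 5/6 shoulders, 7/8 elbows,
# 9/10 wrists, 11/12 hips, 13/14 knees, 15/16 ankles
FLIP_ORDER = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]

def horizontal_flip(image, keypoints):
    """image: (H, W, 3); keypoints: (17, 3) as (x, y, visibility)."""
    image = tf.image.flip_left_right(image)
    x, y, v = tf.unstack(keypoints, axis=-1)
    keypoints = tf.stack([1.0 - x, y, v], axis=-1)    # mirror the x coordinate
    return image, tf.gather(keypoints, FLIP_ORDER)    # left knee becomes right knee, etc.

img, kpts = horizontal_flip(tf.zeros((192, 192, 3)), tf.random.uniform((17, 3)))
```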