STM32N6 NPU Deployment — Politecnico di Milano
1.0
Documentation for Neural Network Deployment on STM32N6 NPU - Politecnico di Milano 2024-2025
ModelZoo Services is the Python framework that automates every step of the ML pipeline — from training to deployment — through a single YAML configuration file and a single command. This page documents every Python file in the repository, with direct links to the annotated source in Part 2.
🔗 github.com/STMicroelectronics/stm32ai-modelzoo-services
In the previous section we saw the ST Model Zoo: a repository of
pre-trained model files (.tflite, .h5, .onnx).
The Zoo gives you the model. But having a model file is not enough — you still
need to quantize it, convert it to C code, compile it, and flash it onto the board.
ModelZoo Services is the tool that does all of this.
The two repositories are designed to work together.
ST recommends cloning them as siblings in the same parent folder, because the
example user_config.yaml files in Services reference model paths
relative to the Zoo repository. In our case, this meant keeping
stm32ai-modelzoo and stm32ai-modelzoo-services side by side under the same parent directory.
The connection point between the two repos is a single line in
user_config.yaml:
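That line is the general.model_path field, pointed at a pre-trained model inside the sibling Zoo checkout. A minimal illustration of how Services reads it through OmegaConf (the path below is a placeholder, not our actual configuration):

```python
# Hypothetical illustration only: the real user_config.yaml sets general.model_path
# to a .h5/.tflite/.onnx file inside the sibling stm32ai-modelzoo checkout.
from omegaconf import OmegaConf

cfg = OmegaConf.create("""
general:
  model_path: ../stm32ai-modelzoo/<path to the pretrained model>   # placeholder
""")
print(cfg.general.model_path)
```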
From that moment on, Services takes full control: it reads the model, validates the configuration, invokes ST Edge AI Core to convert it to C, calls STM32CubeIDE to compile, and flashes the board via ST-Link. The entire flow happens in a single Python process triggered by one command.
Services is built on two libraries that are worth understanding: Hydra and OmegaConf. Hydra is a framework by Meta AI for configuring complex applications via YAML files. OmegaConf provides the underlying configuration object with type safety and dot-notation access. Together they allow ST to build a pipeline where (a minimal sketch of this wiring follows the list):

- user_config.yaml is the only file you touch. Every parameter — model path, quantization granularity, board type, tool paths — lives there. No Python editing required.
- parse_config.py validates every field before any computation starts. If a required parameter is missing or invalid, you get a clear error message pointing to the exact field — not a cryptic Python traceback.
- Setting operation_mode: chain_qd runs quantization followed by deployment in one command. No manual file passing between steps.
- Every metric — OKS, mAP, inference time — is automatically logged to MLflow for experiment tracking. The output directory also contains a .log file with the full run history.
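As promised above, here is a minimal sketch of that wiring. It is not the ST source; the decorator arguments and the assumption that user_config.yaml sits next to the script are ours.

```python
# Minimal Hydra + OmegaConf sketch (not the ST implementation).
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path=".", config_name="user_config")
def main(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))   # the fully resolved configuration, one object
    print(cfg.operation_mode)       # dot-notation access provided by OmegaConf

if __name__ == "__main__":
    main()
```

Hydra also creates the run directory configured under hydra.run.dir, which is where the timestamped experiments_outputs folders come from.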
The entire pipeline is launched with one command:
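With Hydra's defaults this is typically just python stm32ai_main.py, run from the use-case folder that contains user_config.yaml; a different configuration file can be selected with Hydra's --config-name flag.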
This is the exact user_config.yaml we used for the YOLOv8n-pose deployment.
Every field is annotated inline. This is the only file you need to edit
to reproduce our results or deploy a different model.
| Field | Value | What it controls |
|---|---|---|
| operation_mode | deployment | Skips training and evaluation — goes straight to C generation and flash |
| general.model_type | yolo_mpe | Selects YOLO multi-pose postprocessor in Python eval and app_config.h |
| preprocessing.rescaling | scale=1/255, offset=0 | Used by apply_rescaling() for quantization calibration samples |
| postprocessing.NMS_thresh | 0.1 | Low IoU threshold — aggressive NMS suppression of duplicate detections |
| tools.stedgeai.version | 2.1.0 | Must match the installed stedgeai binary — mismatch causes build failure |
| deployment.board | STM32N6570-DK | Selects stmaic_STM32N6570-DK.conf — the build/flash config for this board |
| hydra.run.dir | experiments_outputs/${now:...} | Every run gets a timestamped output folder — old results are never overwritten |
| Mode | Steps executed | Board needed? | Used in this project? |
|---|---|---|---|
| deployment | Generate C + compile + flash | ✓ | Yes — MoveNet |
| chain_qd | Quantize → Deploy | ✓ | Yes — YOLOv8n, TinyBERT |
| quantization | Convert FP32 → INT8 only | ✗ | — |
| evaluation | Measure OKS / mAP on test set | ✗ | — |
| benchmarking | Measure latency on target | ✓ | — |
| training | Fine-tune or train from scratch | ✗ | — |
| chain_tqeb | Train → Quantize → Evaluate → Benchmark | ✓ | — |
Files are listed in the order in which they execute during operation_mode=deployment.
stm32ai_main.py is the top-level entry point, decorated with @hydra.main.
It reads user_config.yaml, calls get_config() to
validate all parameters, then dispatches to deploy(),
quantize(), train(), or evaluate()
based on operation_mode.
main(cfg) → get_config() → deploy(cfg)
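A reduced sketch of that dispatch, with stub functions standing in for the real modules (the structure is inferred from the description above, not copied from the ST source):

```python
# Sketch only: the real deploy()/quantize()/train()/evaluate() live in their own modules.
from omegaconf import OmegaConf

def quantize(cfg): print("quantizing", cfg.general.model_path)
def deploy(cfg):   print("deploying", cfg.general.model_path, "to", cfg.deployment.board)

def dispatch(cfg):
    mode = cfg.operation_mode
    if mode == "chain_qd":            # quantize, then deploy, in a single run
        quantize(cfg)
        deploy(cfg)
    elif mode == "deployment":
        deploy(cfg)
    elif mode == "quantization":
        quantize(cfg)
    else:
        raise ValueError(f"unsupported operation_mode: {mode}")

dispatch(OmegaConf.create({"operation_mode": "deployment",
                           "general": {"model_path": "model.tflite"},
                           "deployment": {"board": "STM32N6570-DK"}}))
```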
parse_config.py validates every section of user_config.yaml via Hydra + OmegaConf.
Each section has its own parsing function that checks for legal/required/illegal
fields and sets defaults for missing optional ones. It raises clear errors pointing
to the exact field if validation fails, and returns a DefaultMunch
config object used as a global reference throughout the pipeline.
get_config() •
_parse_dataset_section() •
_parse_preprocessing_section() •
_parse_postprocessing_section() •
_parse_data_augmentation_section()
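A conceptual sketch of what one of those section parsers does (the field names confidence_thresh and max_detection_boxes are illustrative assumptions; only NMS_thresh appears in our configuration table):

```python
# Conceptual sketch, not the ST code: check legal/required fields, set defaults,
# and return a DefaultMunch object that supports dot-notation access.
from munch import DefaultMunch

def _parse_postprocessing_section(section: dict) -> dict:
    legal    = {"NMS_thresh", "confidence_thresh", "max_detection_boxes"}
    required = {"NMS_thresh", "confidence_thresh"}

    unknown = set(section) - legal
    if unknown:
        raise ValueError(f"Unknown field(s) in 'postprocessing': {sorted(unknown)}")
    missing = required - set(section)
    if missing:
        raise ValueError(f"Missing required field(s) in 'postprocessing': {sorted(missing)}")

    section.setdefault("max_detection_boxes", 100)   # default for an optional field
    return section

cfg = DefaultMunch.fromDict(
    {"postprocessing": _parse_postprocessing_section({"NMS_thresh": 0.1,
                                                      "confidence_thresh": 0.5})})
print(cfg.postprocessing.NMS_thresh)   # 0.1
```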
Called by stm32ai_main.py for the deployment and chain_qd modes,
this module orchestrates the deployment steps in order:
(1) it calls gen_h_user_file_n6() to generate app_config.h,
(2) it selects the correct .conf board file for the STM32N6570-DK, and
(3) it invokes stm32ai_deploy_stm32n6() from common_deploy.py,
which calls ST Edge AI Core and STM32CubeIDE in sequence.
deploy(cfg) →
gen_h_file.py → common_deploy.py
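The shape of that orchestration, with hypothetical stub functions standing in for gen_h_user_file_n6() and stm32ai_deploy_stm32n6() (a sketch of the sequence, not the ST implementation):

```python
# Assumed step sequence; the stub bodies only illustrate what each real call produces.
def generate_header(cfg):          # gen_h_file.py: write app_config.h from the .tflite
    print("writing app_config.h for", cfg["model_path"])

def select_board_conf(cfg):        # pick the stmaic_<board>.conf build description
    return f"stmaic_{cfg['board']}.conf"

def build_and_flash(cfg, conf):    # common_deploy.py: ST Edge AI Core, CubeIDE, ST-Link
    print("building with", conf, "and flashing over ST-Link")

def deploy(cfg):
    generate_header(cfg)
    build_and_flash(cfg, select_board_conf(cfg))

deploy({"model_path": "yolov8n_pose_int8.tflite",   # hypothetical file name
        "board": "STM32N6570-DK"})
```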
The most important bridge in the codebase. Generates app_config.h —
the C header that tells the firmware everything about the model:
input width/height, number of keypoints, postprocessing type, confidence thresholds.
It loads the quantized .tflite with the TFLite interpreter and
reads tensor shapes directly, so no manual configuration is ever needed.
The output is the file that the C firmware compiles against.
gen_h_user_file_n6(config, quantized_model_path)
→ writes app_config.h
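The core trick is easy to reproduce with the TFLite Python API. A sketch of the shape probing (the macro names printed at the end are illustrative, not the real app_config.h contents):

```python
# Sketch: read the input tensor geometry straight from the quantized model.
import tensorflow as tf

def probe_input_shape(tflite_path: str):
    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    interpreter.allocate_tensors()
    details = interpreter.get_input_details()[0]
    _, height, width, channels = details["shape"]        # NHWC input tensor
    return int(height), int(width), int(channels), details["dtype"]

h, w, c, dtype = probe_input_shape("quantized_model.tflite")   # placeholder path
print(f"#define NN_WIDTH  {w}")    # illustrative macro names only
print(f"#define NN_HEIGHT {h}")
```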
Routes quantization to one of two paths based on the input format.
For .onnx models (TinyBERT): calls quantize_onnx().
For .h5 / Keras models (YOLOv8n): uses the TFLite Converter with
a _representative_data_gen() that feeds calibration samples
from the quantization dataset split — this is what drives the scale and
zero-point estimation for every layer.
Supports per_tensor and per_channel granularity,
and optionally applies graph optimization before quantization.
quantize(cfg, ds) → .tflite path •
_tflite_ptq_quantizer() • _representative_data_gen()
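The Keras path is standard TensorFlow post-training quantization. A minimal, self-contained sketch, with a toy model and random calibration data in place of the real quantization split (the input type shown is an assumption):

```python
# Minimal PTQ sketch: the representative dataset drives scale / zero-point estimation.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.Input(shape=(192, 192, 3)),
                             tf.keras.layers.Conv2D(8, 3),
                             tf.keras.layers.GlobalAveragePooling2D()])

def _representative_data_gen():
    for _ in range(100):                                             # calibration samples
        yield [np.random.rand(1, 192, 192, 3).astype(np.float32)]    # replace with the quant split

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = _representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8      # assumption: matches the on-board camera pipeline

open("quantized_model.tflite", "wb").write(converter.convert())
```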
preprocess.py provides three functions:
preprocess(cfg) loads all four dataset splits (train/val/quant/test)
via data_loader.py and returns tf.data.Dataset objects;
apply_rescaling(dataset, scale, offset) applies pixel normalisation
(e.g. pixel / 127.5 - 1 for MobileNet-style models);
preprocess_input(image, input_details) handles single-image
preprocessing for prediction mode, including uint8/int8/float32 quantization
of the input tensor.
data_loader.py handles COCO JSON parsing, image loading, and
keypoint coordinate normalisation to [0, 1].
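A sketch of the rescaling step with the signature described above (the dataset construction is a toy example):

```python
# apply_rescaling as described above: images are rescaled, labels pass through untouched.
import tensorflow as tf

def apply_rescaling(dataset: tf.data.Dataset, scale: float, offset: float) -> tf.data.Dataset:
    return dataset.map(lambda image, label: (image * scale + offset, label))

toy = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform((4, 192, 192, 3), maxval=255.0), tf.zeros((4, 17, 3))))
rescaled = apply_rescaling(toy, scale=1 / 255.0, offset=0.0)   # the YOLOv8n-pose setting
# A MobileNet-style model would use scale=1/127.5, offset=-1 instead.
```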
The Python-side postprocessor — mirrors what the C firmware does on the board
but in TensorFlow for evaluation purposes. Contains five decoders:
heatmaps_spe_postprocess(tensor) decodes MoveNet output by finding
the argmax on each (48×48) heatmap channel and converting to
normalised (x, y, conf);
yolo_mpe_postprocess() applies padded NMS via
tf.image.non_max_suppression() per image;
plus hand and head landmark decoders for other use cases.
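The heatmap decoder is essentially a per-channel argmax. A sketch of the idea for a single image, with an assumed tensor layout:

```python
# Sketch of heatmaps_spe_postprocess-style decoding: one argmax per keypoint channel.
import tensorflow as tf

def decode_heatmaps(heatmaps: tf.Tensor) -> tf.Tensor:
    """heatmaps: (48, 48, nb_keypoints) sigmoid outputs -> (nb_keypoints, 3) as (x, y, conf)."""
    h, w, k = heatmaps.shape
    flat = tf.reshape(heatmaps, (h * w, k))                 # one column per keypoint
    idx  = tf.argmax(flat, axis=0)                          # flat index of each peak
    conf = tf.reduce_max(flat, axis=0)
    y = tf.cast(idx // w, tf.float32) / float(h)            # normalised to [0, 1]
    x = tf.cast(idx %  w, tf.float32) / float(w)
    return tf.stack([x, y, conf], axis=-1)

print(decode_heatmaps(tf.random.uniform((48, 48, 17))).shape)   # (17, 3)
```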
evaluate.py runs inference on the evaluation dataset using a
Keras .h5, TFLite interpreter, or ONNX Runtime session,
then computes accuracy and logs to MLflow. Supports three execution targets:
host (your laptop CPU), stedgeai_host, and
stedgeai_n6 (direct evaluation on the board via ST Edge AI runner).
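For the host target, inference with a quantized model also has to quantize the input tensor on the fly. A sketch of that loop (an assumed helper, not the exact evaluate.py code):

```python
# Host-side TFLite inference sketch; ONNX Runtime and on-target execution are not shown.
import numpy as np
import tensorflow as tf

def run_tflite(tflite_path: str, image: np.ndarray) -> np.ndarray:
    """image: float32 NHWC batch, already rescaled as in preprocessing."""
    itp = tf.lite.Interpreter(model_path=tflite_path)
    itp.allocate_tensors()
    inp, out = itp.get_input_details()[0], itp.get_output_details()[0]
    if inp["dtype"] in (np.int8, np.uint8):                 # quantize the input if the model is INT8
        scale, zero_point = inp["quantization"]
        image = (image / scale + zero_point).astype(inp["dtype"])
    itp.set_tensor(inp["index"], image)
    itp.invoke()
    return itp.get_tensor(out["index"])
```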
metrics.py implements the full COCO keypoint evaluation protocol:
single_pose_oks() for MoveNet using per-keypoint COCO standard
deviations (nose has lower tolerance than ankle);
multi_pose_oks_mAP() for YOLOv8 across 10 OKS thresholds
[0.5:0.05:0.95]; and compute_ap() for the precision-recall AUC
via 101-point COCO interpolation.
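For reference, the OKS computation itself is compact. A worked sketch of single-pose OKS (the sigmas are the published COCO per-keypoint values; the example call uses made-up numbers):

```python
# OKS = mean over visible keypoints of exp(-d^2 / (2 * area * (2*sigma)^2))
import numpy as np

COCO_SIGMAS = np.array([.026, .025, .025, .035, .035, .079, .079, .072, .072,
                        .062, .062, .107, .107, .087, .087, .089, .089])

def single_pose_oks(gt_xy, pred_xy, visibility, area):
    d2 = np.sum((gt_xy - pred_xy) ** 2, axis=-1)           # squared distance per keypoint
    k2 = (2.0 * COCO_SIGMAS) ** 2                          # nose is strictest, ankle most tolerant
    oks_per_kpt = np.exp(-d2 / (2.0 * area * k2 + 1e-9))
    visible = visibility > 0
    return float(oks_per_kpt[visible].mean()) if visible.any() else 0.0

gt = np.random.rand(17, 2)
print(single_pose_oks(gt, gt + 0.01, visibility=np.ones(17), area=0.3))
```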
Since we used pre-trained models from the Zoo, the training pipeline was never executed. However, it is fully implemented and production-quality — ready for anyone who wants to fine-tune or train a pose estimation model from scratch. Here is what each module does.
Orchestrates the full Keras training loop. Loads the model via
load_model_for_training(), applies frozen layers and dropout
from the config, wraps it in a HMTrainingModel custom training
class, then calls model.fit() with the configured callbacks.
After training, saves both best_model.h5 (best validation OKS)
and last_model.h5, then runs evaluate() on the
test set automatically.
Builds the Keras callback list for model.fit().
Always includes: ModelCheckpoint (saves best weights by
val_loss or val_oks), a second
ModelCheckpoint for last weights, LRTensorBoard
(logs learning rate), and CSVLogger (writes per-epoch metrics to CSV).
Optional callbacks from the YAML (ReduceLROnPlateau,
EarlyStopping, custom LR schedulers) are dynamically
instantiated via eval() from the config string.
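A minimal sketch of that callback assembly (the function name and the metrics.csv filename are illustrative; LRTensorBoard and the eval()-based instantiation of YAML-defined extras are omitted):

```python
# Reduced callback list; the real builder also honours the YAML-defined optional callbacks.
import tensorflow as tf

def build_callbacks(output_dir: str):
    return [
        tf.keras.callbacks.ModelCheckpoint(f"{output_dir}/best_model.h5",
                                           monitor="val_loss", save_best_only=True),
        tf.keras.callbacks.ModelCheckpoint(f"{output_dir}/last_model.h5",
                                           save_best_only=False),
        tf.keras.callbacks.CSVLogger(f"{output_dir}/metrics.csv"),             # per-epoch metrics
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", patience=5),  # optional, from YAML
    ]
```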
Implements spe_loss() — the single pose estimation loss function.
Supports three output types: heatmaps (converts GT keypoints to
one-hot heatmaps via _reg_to_heatmaps() then computes MSE),
reg (direct coordinate regression MSE), and
reg_heatmaps (converts predicted heatmaps to coordinates via
_heatmaps_to_reg() then computes coordinate MSE).
All loss variants mask out invisible keypoints (visibility flag = 0) so
occluded joints do not contribute to the gradient.
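A sketch of the visibility masking for the reg (coordinate regression) variant, with an assumed tensor layout:

```python
# Masked coordinate MSE: keypoints with visibility == 0 contribute nothing to the gradient.
import tensorflow as tf

def masked_coordinate_mse(y_true, y_pred):
    """y_true: (batch, K, 3) as (x, y, visibility); y_pred: (batch, K, 2)."""
    mask   = tf.cast(y_true[..., 2:3] > 0, tf.float32)
    sq_err = tf.square(y_true[..., :2] - y_pred) * mask
    return tf.reduce_sum(sq_err) / (2.0 * tf.reduce_sum(mask) + 1e-9)

y_true = tf.constant([[[0.5, 0.5, 1.0], [0.2, 0.3, 0.0]]])   # second keypoint occluded
y_pred = tf.constant([[[0.6, 0.5], [0.9, 0.9]]])
print(masked_coordinate_mse(y_true, y_pred).numpy())          # only the visible joint counts
```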
Defines the ST variant of MoveNet Lightning as a Keras functional model.
Uses a MobileNetV2 backbone (pretrained on ImageNet)
with three feature pyramid connections at blocks 2, 5, and 9.
The decoder applies alternating DepthwiseConv2D +
BatchNorm + UpSampling2D + residual
Add() layers — a lightweight FPN that upsamples from
the backbone's 6×6 output back to 48×48 heatmaps.
Output: (batch, 48, 48, nb_keypoints) with sigmoid activations.
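A heavily reduced sketch of that architecture (the filter counts and the single pyramid tap are simplifications; the real model uses three pyramid connections, residual Add() layers and ImageNet weights):

```python
# Toy MoveNet-style decoder: 192x192 input -> 6x6 backbone features -> 48x48 heatmaps.
import tensorflow as tf
L = tf.keras.layers

def tiny_movenet_like(nb_keypoints: int = 17) -> tf.keras.Model:
    backbone = tf.keras.applications.MobileNetV2(input_shape=(192, 192, 3),
                                                 include_top=False, weights=None)
    x = backbone.output                              # (6, 6, 1280)
    for filters in (96, 64, 48):                     # illustrative filter counts
        x = L.UpSampling2D()(x)                      # 12x12 -> 24x24 -> 48x48
        x = L.DepthwiseConv2D(3, padding="same")(x)
        x = L.BatchNormalization()(x)
        x = L.Conv2D(filters, 1, activation="relu")(x)
    heatmaps = L.Conv2D(nb_keypoints, 1, activation="sigmoid")(x)
    return tf.keras.Model(backbone.input, heatmaps)

print(tiny_movenet_like().output_shape)              # (None, 48, 48, 17)
```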
data_augmentation.py is the dispatcher: it reads the list of
augmentation functions from the YAML config and applies them in order to each
training batch. It handles both images and keypoint labels
simultaneously — if you flip an image horizontally, the keypoint coordinates
must be mirrored too, and left/right limbs must be swapped (e.g. left knee
becomes right knee). This is what the pose-specific modules listed below handle; a minimal sketch of the flip-and-swap logic follows the list.
- pose_random_affine.py — flip, translation, rotation, shear, zoom (all with joint coordinate updates)
- pose_random_misc.py — blur, Gaussian noise, random crop (with bounding box and keypoint clipping)
- pose_random_utils.py — shared geometric utilities
- swap_list_dict.py — defines which keypoints swap on horizontal flip (e.g. left shoulder ↔ right shoulder)
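The promised sketch of the flip-and-swap logic (the index table stands in for swap_list_dict.py and assumes the standard 17-keypoint COCO order; coordinates are assumed normalised to [0, 1]):

```python
# Horizontal flip: mirror the image, mirror x, and swap left/right keypoints.
import tensorflow as tf

# COCO order: 0 nose, 1/2 eyes, 3/4 ears, 5/6 shoulders, 7/8 elbows,
# 9/10 wrists, 11/12 hips, 13/14 knees, 15/16 ankles
FLIP_ORDER = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]

def horizontal_flip(image, keypoints):
    """image: (H, W, 3); keypoints: (17, 3) as (x, y, visibility)."""
    image = tf.image.flip_left_right(image)
    x, y, v = tf.unstack(keypoints, axis=-1)
    keypoints = tf.stack([1.0 - x, y, v], axis=-1)    # mirror the x coordinate
    return image, tf.gather(keypoints, FLIP_ORDER)    # left knee becomes right knee, etc.

img, kpts = horizontal_flip(tf.zeros((192, 192, 3)), tf.random.uniform((17, 3)))
```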