STM32N6 NPU Deployment — Politecnico di Milano
Version 1.0
Documentation for neural network deployment on the STM32N6 NPU, academic year 2024-2025.
23 source files — 13 Python and 10 C — documented with
@brief, @param, @return,
inline comments, and auto-generated call graphs.
This page shows the execution flow of each layer so you can navigate
directly to the file you need.
Runs on the host PC. Triggered by
python stm32ai_main.py operation_mode=deployment.
Execution flows top to bottom; each file calls the next.
Decorated with @hydra.main. Reads
user_config.yaml, dispatches to the correct
operation mode (deployment, chain_qd,
training, ...). The only file the user ever calls directly.
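The dispatch pattern can be sketched as a minimal stand-in, without Hydra itself (the real entry point is decorated with @hydra.main and reads user_config.yaml); the handler names here are illustrative, only the mode names come from this page:

```python
# Hypothetical handlers standing in for the real per-mode functions.
def deploy(cfg):   return f"deploying {cfg['model']}"
def train(cfg):    return f"training {cfg['model']}"
def chain_qd(cfg): return f"quantize+deploy {cfg['model']}"

MODES = {"deployment": deploy, "training": train, "chain_qd": chain_qd}

def main(cfg):
    # Dispatch on operation_mode, as stm32ai_main.py does.
    mode = cfg["operation_mode"]
    if mode not in MODES:
        raise ValueError(f"unknown operation_mode: {mode!r}")
    return MODES[mode](cfg)
```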
Validates every section of user_config.yaml using
Hydra + OmegaConf: checks that all fields are legal and all required
ones present, sets defaults for missing optional ones, and raises clear
errors pointing to the exact invalid field. Returns a
DefaultMunch config object used as a global reference
by all downstream functions.
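A minimal sketch of that validation pattern, with illustrative field names (the real parser returns a DefaultMunch; a tiny attribute-dict stands in for it here):

```python
class AttrDict(dict):
    """Stand-in for DefaultMunch: dict with attribute access."""
    __getattr__ = dict.__getitem__

# Hypothetical schema for one config section.
REQUIRED = {"model_path"}
DEFAULTS = {"confidence_threshold": 0.4}

def parse_section(raw: dict) -> AttrDict:
    missing = REQUIRED - raw.keys()
    if missing:
        raise ValueError(f"user_config.yaml: missing required field(s): {sorted(missing)}")
    unknown = raw.keys() - REQUIRED - DEFAULTS.keys()
    if unknown:
        raise ValueError(f"user_config.yaml: unknown field(s): {sorted(unknown)}")
    # Fill defaults for missing optional fields, keep user values.
    return AttrDict({**DEFAULTS, **raw})
```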
Orchestrates the deployment steps: generates app_config.h,
selects the board .conf file, calls
stm32ai_deploy_stm32n6() in common_deploy.py
which invokes ST Edge AI Core and STM32CubeIDE in sequence.
Reads tensor shapes from the quantized TFLite model using the
TFLite interpreter. Writes app_config.h with
NN_HEIGHT, NN_WIDTH, KEYPOINTS_NB, POSTPROCESS_TYPE,
and confidence thresholds.
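The header-generation step can be sketched as plain string templating. In the real pipeline the shapes come from the TFLite interpreter's input/output details; here they are passed in directly, with the values quoted elsewhere on this page:

```python
def render_app_config(height, width, keypoints, pp_type, conf_thr):
    """Render an app_config.h-style header from model metadata.

    Sketch only: the macro list mirrors the constants described in
    this documentation, not the full generated file.
    """
    lines = [
        "/* Auto-generated -- do not edit by hand. */",
        f"#define NN_HEIGHT {height}",
        f"#define NN_WIDTH {width}",
        f"#define AI_POSE_PP_POSE_KEYPOINTS_NB {keypoints}",
        f"#define POSTPROCESS_TYPE {pp_type}",
        f"#define AI_POSE_PP_CONF_THRESHOLD ({conf_thr}f)",
    ]
    return "\n".join(lines) + "\n"
```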
Calls ST Edge AI Core CLI, copies generated files into the CubeIDE
project via the .conf templates list, then invokes
STM32CubeIDE headless build, SigningTool, and CubeProgrammer flash.
Patches the C source files and linker script to place the weight binary
(network_atonbuf.xSPI2.raw) at the correct OctoFlash address
(0x70380000). Without this patch, the firmware would look for
weights at the wrong memory address and crash at boot.
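The patch itself is a text substitution on the linker script. A sketch, assuming a hypothetical WEIGHTS memory region (the region name and regex are illustrative; only the target address 0x70380000 comes from this page):

```python
import re

WEIGHT_ADDR = 0x70380000  # OctoFlash address of network_atonbuf.xSPI2.raw

def patch_linker_script(text: str) -> str:
    """Rewrite the ORIGIN of a hypothetical WEIGHTS memory region."""
    return re.sub(
        r"(WEIGHTS\s*\(r\)\s*:\s*ORIGIN\s*=\s*)0x[0-9A-Fa-f]+",
        lambda m: m.group(1) + hex(WEIGHT_ADDR),
        text,
    )
```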
Post-Training Quantization: converts the float32 model to INT8.
Routes to the TFLite Converter (for .h5 models such as YOLOv8)
or the ONNX quantizer (for .onnx models such as TinyBERT).
The inner _representative_data_gen() feeds calibration
samples to determine the scale and zero-point of each layer.
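The scale/zero-point derivation can be shown with the standard affine INT8 formula, computed from the min/max observed over the calibration samples (a sketch of the math, not the converter's internals):

```python
def quant_params(t_min, t_max, qmin=-128, qmax=127):
    """Per-tensor INT8 affine quantization parameters from observed range."""
    t_min, t_max = min(t_min, 0.0), max(t_max, 0.0)  # range must cover 0
    scale = (t_max - t_min) / (qmax - qmin)
    zero_point = round(qmin - t_min / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float value to its INT8 code, clamped to [qmin, qmax]."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))

def representative_data_gen(samples):
    # The TFLite converter consumes a generator yielding one
    # list-wrapped calibration sample at a time.
    for s in samples:
        yield [s]
```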
Runs on the STM32N6570-DK board after flashing. Execution starts at reset and loops continuously — capturing a frame, running inference, rendering the result, repeat.
HAL and clock initialisation, DCMIPP start, LCD setup.
Then enters the main loop: waits for a camera frame snapshot,
calls LL_ATON_RT_Main() to trigger the NPU epoch
controller, then calls the postprocessor to decode results and
render the skeleton on the LCD.
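The control flow of that loop, written as a Python stand-in for the C code (function names are placeholders mirroring the description above; the real loop runs forever):

```python
def main_loop(get_frame, run_npu, postprocess, render, frames):
    """Capture -> infer -> decode -> render, repeated `frames` times."""
    results = []
    for _ in range(frames):
        frame = get_frame()           # DCMIPP snapshot into npuRAM4
        run_npu(frame)                # LL_ATON_RT_Main() epoch controller
        kpts = postprocess()          # decode heatmaps from npuRAM5
        results.append(render(kpts))  # draw the skeleton on the LCD
    return results
```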
The only generated file included directly by the handwritten C firmware.
Contains all model-specific constants: NN_HEIGHT=192,
NN_WIDTH=192, AI_POSE_PP_POSE_KEYPOINTS_NB=13,
POSTPROCESS_TYPE=POSTPROCESS_SPE_MOVENET_UF,
AI_POSE_PP_CONF_THRESHOLD=0.4.
Every other C file reads from this header — change the model,
regenerate this file, recompile.
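Assembled from the constants quoted above, the generated header looks roughly like this (include guard and ordering illustrative):

```c
/* app_config.h -- auto-generated; regenerate after changing the model. */
#ifndef APP_CONFIG_H
#define APP_CONFIG_H

#define NN_HEIGHT 192
#define NN_WIDTH  192
#define AI_POSE_PP_POSE_KEYPOINTS_NB 13
#define POSTPROCESS_TYPE POSTPROCESS_SPE_MOVENET_UF
#define AI_POSE_PP_CONF_THRESHOLD (0.4f)

#endif /* APP_CONFIG_H */
```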
Configures the MIPI CSI-2 interface and the DCMIPP hardware block. Sets up two simultaneous output pipes: the display pipe continuously writes full-resolution frames to the PSRAM LCD framebuffer, while the NN pipe delivers a cropped, resized snapshot to npuRAM4 when triggered. Both pipes run in hardware — zero CPU involvement for memory transfers.
Not a file you wrote — part of the ll_aton runtime library injected
by common_deploy.py. Executes all 75 epochs in sequence:
loads the EC blob for each of the 71 NPU epochs into hardware registers,
calls ll_sw_* functions for the 4 SW epochs.
Input: npuRAM4 (192×192×3 camera frame).
Output: npuRAM5 (48×48×13 heatmaps, float32).
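The scheduling logic reduces to a dispatch over the epoch list. A Python stand-in (field and callback names are illustrative; the 71 NPU + 4 SW split comes from this page):

```python
def run_epochs(epochs, load_blob, run_sw):
    """Run each epoch in sequence: NPU blobs to hardware, SW epochs on CPU."""
    for ep in epochs:
        if ep["kind"] == "npu":
            load_blob(ep["blob"])  # EC blob -> hardware registers
        else:
            run_sw(ep["fn"])       # ll_sw_* software fallback
```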
Reads float32 heatmaps from npuRAM5.
Finds argmax per channel (48×48) to get keypoint coordinates.
Filters by confidence threshold from app_config.h.
Draws skeleton lines on LCD foreground layer using
connectivity table from display_keypoints_13.h.
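The decoding step above can be sketched in Python: a per-channel argmax over the H×W grid, then the confidence filter (the flat [y][x][channel] layout is an assumption for illustration; the threshold 0.4 is the documented default):

```python
def decode_heatmaps(heatmaps, h, w, n_kpts, conf_thr=0.4):
    """Per-channel argmax over an h*w grid of float scores.

    heatmaps: flat list laid out [y][x][channel].
    Returns one (x, y, confidence) per keypoint, or None if below threshold.
    """
    kpts = []
    for c in range(n_kpts):
        best, bx, by = -1.0, 0, 0
        for y in range(h):
            for x in range(w):
                v = heatmaps[(y * w + x) * n_kpts + c]
                if v > best:
                    best, bx, by = v, x, y
        kpts.append((bx, by, best) if best >= conf_thr else None)
    return kpts
```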
Reads YOLOv8 detection output (bounding boxes + 17 keypoints per person). Applies NMS filtering by confidence and IoU thresholds. Draws bounding boxes and 17-keypoint COCO skeletons on LCD for all detected persons simultaneously.
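The NMS filtering can be sketched as greedy suppression by IoU after a confidence cut (box format and threshold values are illustrative; per-person keypoints are omitted for brevity):

```python
def _area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(dets, conf_thr=0.5, iou_thr=0.5):
    """Greedy NMS over (box, score) pairs, highest score first."""
    dets = sorted((d for d in dets if d[1] >= conf_thr),
                  key=lambda d: d[1], reverse=True)
    keep = []
    for box, score in dets:
        if all(iou(box, k[0]) < iou_thr for k in keep):
            keep.append((box, score))
    return keep
```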