STM32N6 NPU Deployment — Politecnico di Milano (version 1.0)
Documentation for Neural Network Deployment on STM32N6 NPU — Politecnico di Milano, 2024-2025

Functions

| Returns | Function |
| --- | --- |
| None | _keep_internal_weights(str path_network_data_params) |
| None | _dispatch_weights(str internalFlashSizeFlash_KB, str kernelFlash_KB, str applicationSizeFlash_KB, str path_network_c_info, str path_network_data_params) |
| None | stm32ai_deploy(bool target=False, str stlink_serial_number=None, str stm32ai_version=None, str c_project_path=None, str output_dir=None, str stm32ai_output=None, str optimization=None, str path_to_stm32ai=None, str path_to_cube_ide=None, list additional_files=None, str stmaic_conf_filename='stmaic_c_project.conf', int verbosity=None, bool debug=False, str model_path=None, str get_model_name_output=None, str stm32ai_ide=None, str stm32ai_serie=None, list credentials=None, bool on_cloud=False, bool check_large_model=False, cfg=None, Dict custom_objects=None) |
| None | stm32ai_deploy_stm32n6(bool target=False, str stlink_serial_number=None, str stm32ai_version=None, str c_project_path=None, str output_dir=None, str stm32ai_output=None, str optimization=None, str path_to_stm32ai=None, str path_to_cube_ide=None, list additional_files=None, str stmaic_conf_filename='stmaic_c_project.conf', int verbosity=None, bool debug=False, str model_path=None, str get_model_name_output=None, str stm32ai_ide=None, str stm32ai_serie=None, list credentials=None, bool on_cloud=False, bool check_large_model=False, str build_conf=None, cfg=None, Dict custom_objects=None, str input_data_type='', str output_data_type='', str inputs_ch_position='', str outputs_ch_position='') |
| None | stm32ai_deploy_mpu(bool target=False, str board_ip_address=None, str board_deploy=None, List class_names=None, str c_project_path=None, int verbosity=None, bool debug=False, str model_path=None, cfg=None) |
_dispatch_weights() [private]
@brief Splits model weights between internal and external Flash memory.
@details
When a model's weights are too large to fit entirely in the MCU's internal
Flash, this function distributes them between internal and external Flash
(e.g., OctoFlash on STM32N6570-DK) by annotating each weight array in the
generated C source with the appropriate GCC section attribute.
**Algorithm:**
1. Reads `network_c_info.json` — the ST Edge AI Core memory report — to
obtain the list of all weight arrays with their sizes.
2. Filters to keep only read-only memory pools (`"rights": "ACC_READ"`),
which correspond to model weights stored in Flash.
3. Sorts weight arrays from largest to smallest (greedy bin-packing strategy).
4. Iterates through the sorted list and greedily assigns each weight to
internal Flash if space remains, otherwise to external Flash.
5. Injects GCC section attributes into `network_data_params.c` accordingly:
- `AI_INTERNAL_FLASH __attribute__((section(".InternalFlashSection")))`
- `AI_EXTERNAL_FLASH __attribute__((section(".ExternalFlashSection")))`
**Memory budget calculation:**
@code
freeInternalFlash = internalFlashSize - kernelFlash - applicationFlash
@endcode
The kernel Flash (ST AI runtime library) and application code are subtracted
from the total internal Flash to compute the space available for weights.
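The budget computation and the greedy largest-first assignment described above can be sketched as follows. This is an illustrative Python sketch, not the actual implementation: the helper names `parse_kb` and `dispatch` and the `(name, size_kb)` weight representation are assumptions for the example.

```python
def parse_kb(size_str: str) -> int:
    """Parse a size string such as '2048KB' into an integer KB count."""
    return int(size_str.upper().replace("KB", "").strip())

def dispatch(weights, internal_kb, kernel_kb, app_kb):
    """Greedy largest-first split of weight arrays between internal and
    external Flash. `weights` is a list of (name, size_kb) pairs."""
    free_kb = parse_kb(internal_kb) - parse_kb(kernel_kb) - parse_kb(app_kb)
    internal, external = [], []
    # Sorting largest-first improves packing of the internal pool.
    for name, size_kb in sorted(weights, key=lambda w: w[1], reverse=True):
        if size_kb <= free_kb:
            internal.append(name)
            free_kb -= size_kb
        else:
            external.append(name)
    return internal, external
```

For example, with a 2048 KB internal Flash, a 256 KB kernel, and a 512 KB application, `dispatch([("conv1", 900), ("fc", 600), ("conv2", 300)], "2048KB", "256KB", "512KB")` keeps `conv1` and `conv2` internal (900 + 300 ≤ 1280) and pushes `fc` external.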
@param internalFlashSizeFlash_KB Total internal Flash size in KB (e.g., "2048KB")
@param kernelFlash_KB ST AI runtime library size in KB (e.g., "256KB")
@param applicationSizeFlash_KB Application firmware size in KB (e.g., "512KB")
@param path_network_c_info Path to `network_c_info.json` generated by
ST Edge AI Core (contains memory pool details)
@param path_network_data_params Path to `network_data_params.c` to be annotated
@return None
@note Uses a **greedy largest-first strategy** for bin packing — not optimal
but fast and effective for the typical weight distribution of embedded models.
@see _keep_internal_weights() for the simpler case where all weights fit internally.
Definition at line 125 of file common_deploy.py.
Referenced by stm32ai_deploy().
_keep_internal_weights() [private]
@brief Tags all model weight arrays for placement in internal Flash memory.
@details
This function reads the generated C file `network_data_params.c` produced by
ST Edge AI Core and injects a GCC section attribute before every weight array
declaration, forcing the linker to place all weights in the MCU's internal Flash.
The injected attribute is:
@code{.c}
AI_INTERNAL_FLASH __attribute__((section(".InternalFlashSection")))
@endcode
This function is called when the model weights fit entirely within the internal
Flash of the target board (no weight splitting required). It is the simpler
alternative to _dispatch_weights().
**How it works:**
1. Opens `network_data_params.c` for reading
2. Scans line by line for the `#include "network_data_params.h"` directive
and injects the macro definition immediately before it
3. For each line containing a weight array declaration (matched by regex
`const ai_uXX name[size]`), prepends the `AI_INTERNAL_FLASH` attribute
4. Writes the modified content to a temporary file, then atomically replaces
the original using `os.replace()`
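The write-then-rename pass can be sketched as below. The regex, the macro body, and the helper name `tag_internal` are illustrative assumptions; the real code in common_deploy.py may differ in detail.

```python
import os
import re
import tempfile

# Illustrative pattern for declarations such as `const ai_u8 w[1024] = {...};`
WEIGHT_DECL = re.compile(r"^const\s+ai_u\d+\s+\w+\[\d+\]")
# Assumed macro body, matching the attribute quoted above.
MACRO_DEF = ('#define AI_INTERNAL_FLASH '
             '__attribute__((section(".InternalFlashSection")))\n')

def tag_internal(path: str) -> None:
    """Prefix every weight declaration with AI_INTERNAL_FLASH, then
    atomically replace the original file (write-then-rename pattern)."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as tmp, open(path) as src:
        for line in src:
            if '#include "network_data_params.h"' in line:
                tmp.write(MACRO_DEF)              # macro goes before the include
            if WEIGHT_DECL.match(line):
                line = "AI_INTERNAL_FLASH " + line  # tag the declaration
            tmp.write(line)
    os.replace(tmp_path, path)                    # atomic replace, no partial file
```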
@param path_network_data_params Absolute path to the generated
`network_data_params.c` file inside
the ST Edge AI Core output directory.
@return None
@note The file is modified **in place** using a write-then-rename pattern
to avoid partial writes in case of failure.
@see _dispatch_weights() for the alternative function used when weights
must be split between internal and external Flash.
Definition at line 64 of file common_deploy.py.
Referenced by stm32ai_deploy().
stm32ai_deploy()

None common_deploy.stm32ai_deploy(
        bool target = False,
        str stlink_serial_number = None,
        str stm32ai_version = None,
        str c_project_path = None,
        str output_dir = None,
        str stm32ai_output = None,
        str optimization = None,
        str path_to_stm32ai = None,
        str path_to_cube_ide = None,
        list additional_files = None,
        str stmaic_conf_filename = 'stmaic_c_project.conf',
        int verbosity = None,
        bool debug = False,
        str model_path = None,
        str get_model_name_output = None,
        str stm32ai_ide = None,
        str stm32ai_serie = None,
        list credentials = None,
        bool on_cloud = False,
        bool check_large_model = False,
        cfg = None,
        Dict custom_objects = None)
@brief Generic deployment function for STM32 MCU targets (H7, U5, etc.).
@details
This function orchestrates the complete deployment pipeline for standard
STM32 MCU boards (excluding STM32N6, which uses stm32ai_deploy_stm32n6()).
**Pipeline steps:**
1. **Session creation** — loads the model into an STMAi session workspace
2. **Board configuration** — reads the `.conf` file specifying memory pools,
linker scripts, and build system paths
3. **Model compilation** — runs ST Edge AI Core (offline or cloud) to:
- Convert the quantized model to optimized C arrays (`network.c`, `network_data_params.c`)
- Generate the AI runtime library (`Lib/`, `Inc/`)
- Optionally split weights between internal/external Flash for large models
4. **Firmware build and flash** — invokes STM32CubeIDE in headless mode to
compile the C project and flash the binary via ST-Link
**Large model handling (`check_large_model=True`):**
When enabled, the function first benchmarks the model to measure its ROM and RAM
requirements, then compares them against the board's available memory pools.
If weights overflow internal Flash, `_dispatch_weights()` is called to split them.
If activations overflow AXIRAM, `update_activation_c_code()` redistributes
activation buffers across AXIRAM and SDRAM.
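The memory-analysis branch can be illustrated with a small decision helper. Everything here (`plan_memory`, its parameters, and the KB units) is hypothetical; the real code calls _dispatch_weights() and update_activation_c_code() directly rather than returning a plan.

```python
def plan_memory(rom_kb, flash_budget_kb, ext_flash_kb,
                act_kb, axiram_kb, sdram_kb):
    """Illustrative decision logic for check_large_model=True.
    Returns which adjustment passes would run, or raises ValueError
    when the model cannot fit in any available memory."""
    plan = {"split_weights": False, "move_activations": False}
    if rom_kb > flash_budget_kb:
        if rom_kb > flash_budget_kb + ext_flash_kb:
            raise ValueError("model weights exceed all available Flash")
        plan["split_weights"] = True          # -> _dispatch_weights()
    if act_kb > axiram_kb:
        if act_kb > axiram_kb + sdram_kb:
            raise ValueError("activations exceed all available RAM")
        plan["move_activations"] = True       # -> update_activation_c_code()
    return plan
```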
**Cloud vs. local execution:**
- `on_cloud=True`: uses the STM32Cube.AI Developer Cloud API for compilation
- `on_cloud=False`: uses the local `stedgeai` executable (used in this project)
@param target Unused legacy parameter (kept for API compatibility).
@param stlink_serial_number ST-Link serial number for multi-board setups.
Leave empty if only one board is connected.
@param stm32ai_version Version string of ST Edge AI Core (e.g., "2.1.0").
@param c_project_path Absolute path to the STM32CubeIDE C project root.
@param output_dir Directory for all deployment outputs (logs, generated files).
@param stm32ai_output Directory where ST Edge AI Core writes generated C files.
@param optimization Compilation optimization level: "balanced", "latency", "ram".
@param path_to_stm32ai Absolute path to the `stedgeai` executable.
@param path_to_cube_ide Absolute path to the `stm32cubeide` executable.
@param additional_files Extra files to copy into the C project before building.
@param stmaic_conf_filename Board configuration file name (e.g., "stmaic_STM32N6570-DK.conf").
@param verbosity Logging verbosity (None=silent, 1=info, 2=debug).
@param debug Enable debug logging for the STMAi driver.
@param model_path Absolute path to the quantized model (.tflite or .onnx).
@param get_model_name_output Model name string used for Cloud API identification.
@param stm32ai_ide IDE/compiler identifier (must be "gcc" for GCC toolchain).
@param stm32ai_serie STM32 series string (e.g., "STM32H7", "STM32U5").
@param credentials Pre-obtained cloud credentials from cloud_connect().
@param on_cloud If True, use STM32Cube.AI Developer Cloud for compilation.
@param check_large_model If True, perform memory analysis before compilation
and split weights/activations if needed.
@param cfg Hydra DictConfig for preprocessing parameters
(used by update_activation_c_code).
@param custom_objects Custom Keras objects for model loading (if applicable).
@return None
@throws ValueError If the model is too large to fit in any available memory.
@note **Not used in this project.** For STM32N6 deployment, use
stm32ai_deploy_stm32n6() which adds Neural-ART NPU support.
Definition at line 232 of file common_deploy.py.
References _dispatch_weights(), _keep_internal_weights(), and external_memory_mgt.update_activation_c_code().
stm32ai_deploy_mpu()

None common_deploy.stm32ai_deploy_mpu(
        bool target = False,
        str board_ip_address = None,
        str board_deploy = None,
        List class_names = None,
        str c_project_path = None,
        int verbosity = None,
        bool debug = False,
        str model_path = None,
        cfg = None)
@brief Deploy an AI model to an STM32MP MPU board over SSH/SCP.
@details
This function handles deployment on STM32MP-series Microprocessor Units (MPUs),
which run Linux and use a fundamentally different deployment mechanism than MCUs:
instead of flashing firmware via ST-Link, it transfers application files over the
network using SSH and SCP.
**Deployment mechanism:**
Unlike MCU deployment (which replaces the entire firmware binary), MPU deployment:
1. Verifies board reachability via ICMP ping
2. Creates the deployment directory on the target via SSH
3. Copies application code, resources, and the model file via SCP
4. Copies board-specific shell scripts (STM32MP1/*.sh or STM32MP2/*.sh)
5. Launches the application remotely via SSH
**Supported boards:**
- STM32MP257F-EV1 (STM32MP2 series)
- STM32MP157F-DK2 (STM32MP1 series)
- STM32MP135F-DK (STM32MP1 series)
**File transfer structure:**
@code
c_project_path/
├── Application/ → Copied to board_deploy/Application/
│ └── launch_*.sh → Main launch script
├── Resources/ → Copied to board_deploy/Resources/
│ └── class_names.txt → Generated from class_names parameter
└── STM32MP1/*.sh → Board-specific scripts (MP1) or STM32MP2/*.sh (MP2)
@endcode
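The deployment steps above could be reproduced roughly as follows. `deploy_mpu` here is an illustrative sketch that shells out to the system `ping`/`ssh`/`scp` binaries; it is not the project's actual implementation, and the `run` parameter is added only to make the sketch testable.

```python
import subprocess

# Host key checking disabled for lab convenience, as noted above.
SSH_OPTS = ["-o", "StrictHostKeyChecking=no"]

def deploy_mpu(ip, board_deploy, c_project_path, model_path,
               user="root", run=subprocess.run):
    """Illustrative SSH/SCP deployment sequence for an STM32MP board.
    Returns True on success, False on the first failing step."""
    host = f"{user}@{ip}"
    steps = [
        # 1. reachability check (single ICMP ping, POSIX syntax)
        ["ping", "-c", "1", ip],
        # 2. create the deployment directory tree on the target
        ["ssh", *SSH_OPTS, host, f"mkdir -p {board_deploy}/Resources"],
        # 3. copy the application tree and the model file
        ["scp", *SSH_OPTS, "-r", f"{c_project_path}/Application",
         f"{host}:{board_deploy}/"],
        ["scp", *SSH_OPTS, model_path, f"{host}:{board_deploy}/Resources/"],
    ]
    for cmd in steps:
        if run(cmd).returncode != 0:
            return False  # errors reported as False, never raised
    return True
```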
@param target Unused legacy parameter.
@param board_ip_address IP address of the target MPU board (e.g., "192.168.1.100").
The board must be on the same network as the host PC.
@param board_deploy Deployment directory path on the target board's filesystem.
@param class_names List of class name strings OR path to a .txt file containing
class names (one per line). Used for inference labeling.
@param c_project_path Path to the C project containing Application/ and Resources/.
@param verbosity Logging verbosity level.
@param debug Enable debug logging.
@param model_path Path to the AI model file to deploy (.tflite, .onnx, or .nb).
@param cfg Hydra DictConfig object (currently unused in MPU path).
@return True if deployment succeeded, False on any error.
@throws None — errors are caught and logged, returning False instead.
@note SSH host key checking is disabled (`StrictHostKeyChecking no`) for
convenience in lab/development environments. Do not use in production.
@note This function is **not used in this project** (which targets the STM32N6
MCU, not an MPU). It is documented here for completeness.
Definition at line 739 of file common_deploy.py.
References stm32ai_main.str.
Referenced by deploy.deploy_mpu().
stm32ai_deploy_stm32n6()

None common_deploy.stm32ai_deploy_stm32n6(
        bool target = False,
        str stlink_serial_number = None,
        str stm32ai_version = None,
        str c_project_path = None,
        str output_dir = None,
        str stm32ai_output = None,
        str optimization = None,
        str path_to_stm32ai = None,
        str path_to_cube_ide = None,
        list additional_files = None,
        str stmaic_conf_filename = 'stmaic_c_project.conf',
        int verbosity = None,
        bool debug = False,
        str model_path = None,
        str get_model_name_output = None,
        str stm32ai_ide = None,
        str stm32ai_serie = None,
        list credentials = None,
        bool on_cloud = False,
        bool check_large_model = False,
        str build_conf = None,
        cfg = None,
        Dict custom_objects = None,
        str input_data_type = '',
        str output_data_type = '',
        str inputs_ch_position = '',
        str outputs_ch_position = '')
@brief STM32N6-specific deployment function with Neural-ART NPU support.
@details
This is the primary deployment function used in this project for all three
case studies (MoveNet Lightning, YOLOv8n-pose, TinyBERT).
It differs from stm32ai_deploy() in several critical ways specific to
the STM32N6 hardware and its Neural-ART NPU:
**Key differences from generic stm32ai_deploy():**
1. **Neural-ART path** — the compile options include a `st_neural_art` parameter
pointing to the NPU Add-on configuration (`neuralart_user_path`). This enables
ST Edge AI Core to generate NPU-optimized code with the 4CA convolution
accelerator configuration:
@code
neural_art_path = profile + "@" + neuralart_user_path
opt = STMAiCompileOptions(st_neural_art=neural_art_path, ...)
@endcode
2. **Data type and channel format** — the STM32N6 camera pipeline delivers
images as uint8 in channel-last (NHWC) format. These are specified explicitly:
- `input_data_type='uint8'`
- `inputs_ch_position='chlast'`
The NPU processes activations in channel-last format internally, matching
the TFLite model's native format.
3. **Build configuration** — supports specifying a build configuration name
(`build_conf`, e.g., "Release") for the STM32CubeIDE project.
4. **Generated header files** — copies both `ai_model_config.h` AND
`app_config.h` into the C project (the generic function only copies the former).
5. **No weight splitting** — the STM32N6 board has 128 MB of OctoFlash, so weight
overflow is never a concern for the models used in this project.
**Compilation flow for STM32N6 (offline, used in this project):**
@code{.sh}
# Equivalent CLI command executed internally by _stmaic_local_call():
stedgeai generate \
--model movenet_lightning_int8.tflite \
--target stm32n6 \
--st-neural-art default@/path/to/neuralart_options.json \
--input-data-type uint8 \
--inputs-ch-position chlast \
--output /output/generated/
@endcode
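For illustration, the same command line can be assembled programmatically. `build_generate_cmd` is a hypothetical helper for this example, not part of the STMAi driver, which builds its options via STMAiCompileOptions instead.

```python
import subprocess

def build_generate_cmd(stedgeai, model, neural_art_path, output_dir):
    """Assemble the `stedgeai generate` argv shown above."""
    return [
        stedgeai, "generate",
        "--model", model,
        "--target", "stm32n6",
        "--st-neural-art", neural_art_path,  # e.g. "default@/path/to/neuralart_options.json"
        "--input-data-type", "uint8",        # camera pipeline delivers uint8
        "--inputs-ch-position", "chlast",    # NHWC, the TFLite native layout
        "--output", output_dir,
    ]

# Usage (would invoke the real tool if `stedgeai` is on PATH):
# cmd = build_generate_cmd("stedgeai", "movenet_lightning_int8.tflite",
#                          "default@neuralart_options.json", "generated/")
# subprocess.run(cmd, check=True)
```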
@param target Unused legacy parameter.
@param stlink_serial_number ST-Link serial number (empty if single board).
@param stm32ai_version ST Edge AI Core version string.
@param c_project_path Path to the STM32CubeIDE C project root.
@param output_dir Output directory for deployment artifacts.
@param stm32ai_output Directory for ST Edge AI Core generated files.
@param optimization Optimization strategy: "balanced" (used in project).
@param path_to_stm32ai Path to the stedgeai executable.
@param path_to_cube_ide Path to the stm32cubeide executable.
@param additional_files Extra files to copy into the C project.
@param stmaic_conf_filename Board .conf file (e.g., "stmaic_STM32N6570-DK.conf").
@param verbosity Logging verbosity level.
@param debug Enable debug mode for STMAi driver.
@param model_path Path to the quantized INT8 model (.tflite or .onnx).
@param get_model_name_output Model name for Cloud API identification.
@param stm32ai_ide IDE string — must be "gcc" for STM32N6.
@param stm32ai_serie Series string — must be "STM32N6".
@param credentials Cloud credentials (unused if on_cloud=False).
@param on_cloud Use Developer Cloud for compilation (False in project).
@param check_large_model Enable pre-compilation memory analysis.
@param build_conf STM32CubeIDE build configuration name (e.g., "Release").
@param cfg Hydra DictConfig object (for activation code updates).
@param custom_objects Custom Keras objects for model loading.
@param input_data_type NPU input data type — **must be 'uint8'** for camera pipeline.
@param output_data_type NPU output data type — empty means auto-detect.
@param inputs_ch_position Input channel format — **must be 'chlast'** (NHWC for TFLite).
@param outputs_ch_position Output channel format — empty means auto-detect.
@return None
@note This function is called by deploy.py in the pose_estimation module,
which reads all parameters from user_config.yaml via the Hydra cfg object.
@see deploy.py for the caller that reads parameters from user_config.yaml.
@see stm32ai_deploy() for the generic MCU version without NPU support.
Definition at line 505 of file common_deploy.py.
Referenced by deploy.deploy().