STM32N6 NPU Deployment — Politecnico di Milano 1.0
Documentation for Neural Network Deployment on STM32N6 NPU - Politecnico di Milano 2024-2025
common_deploy Namespace Reference

Functions

None _keep_internal_weights (str path_network_data_params)
 
None _dispatch_weights (str internalFlashSizeFlash_KB, str kernelFlash_KB, str applicationSizeFlash_KB, str path_network_c_info, str path_network_data_params)
 
None stm32ai_deploy (bool target=False, str stlink_serial_number=None, str stm32ai_version=None, str c_project_path=None, str output_dir=None, str stm32ai_output=None, str optimization=None, str path_to_stm32ai=None, str path_to_cube_ide=None, list additional_files=None, str stmaic_conf_filename='stmaic_c_project.conf', int verbosity=None, bool debug=False, str model_path=None, str get_model_name_output=None, str stm32ai_ide=None, str stm32ai_serie=None, list credentials=None, bool on_cloud=False, bool check_large_model=False, cfg=None, Dict custom_objects=None)
 
None stm32ai_deploy_stm32n6 (bool target=False, str stlink_serial_number=None, str stm32ai_version=None, str c_project_path=None, str output_dir=None, str stm32ai_output=None, str optimization=None, str path_to_stm32ai=None, str path_to_cube_ide=None, list additional_files=None, str stmaic_conf_filename='stmaic_c_project.conf', int verbosity=None, bool debug=False, str model_path=None, str get_model_name_output=None, str stm32ai_ide=None, str stm32ai_serie=None, list credentials=None, bool on_cloud=False, bool check_large_model=False, str build_conf=None, cfg=None, Dict custom_objects=None, str input_data_type='', str output_data_type='', str inputs_ch_position='', str outputs_ch_position='')
 
None stm32ai_deploy_mpu (bool target=False, str board_ip_address=None, str board_deploy=None, List class_names=None, str c_project_path=None, int verbosity=None, bool debug=False, str model_path=None, cfg=None)
 

Function Documentation

◆ _dispatch_weights()

None common_deploy._dispatch_weights ( str  internalFlashSizeFlash_KB,
str  kernelFlash_KB,
str  applicationSizeFlash_KB,
str  path_network_c_info,
str  path_network_data_params 
)
private
@brief Splits model weights between internal and external Flash memory.

@details
When a model's weights are too large to fit entirely in the MCU's internal
Flash, this function distributes them between internal and external Flash
(e.g., OctoFlash on STM32N6570-DK) by annotating each weight array in the
generated C source with the appropriate GCC section attribute.

**Algorithm:**
1. Reads `network_c_info.json` — the ST Edge AI Core memory report — to
   obtain the list of all weight arrays with their sizes.
2. Filters to keep only read-only memory pools (`"rights": "ACC_READ"`),
   which correspond to model weights stored in Flash.
3. Sorts weight arrays from largest to smallest (greedy bin-packing strategy).
4. Iterates through the sorted list and greedily assigns each weight to
   internal Flash if space remains, otherwise to external Flash.
5. Injects GCC section attributes into `network_data_params.c` accordingly:
   - `AI_INTERNAL_FLASH __attribute__((section(".InternalFlashSection")))`
   - `AI_EXTERNAL_FLASH __attribute__((section(".ExternalFlashSection")))`

**Memory budget calculation:**
@code
freeInternalFlash = internalFlashSize - kernelFlash - applicationFlash
@endcode
The kernel Flash (ST AI runtime library) and application code are subtracted
from the total internal Flash to compute the space available for weights.

@param internalFlashSizeFlash_KB  Total internal Flash size in KB (e.g., "2048KB")
@param kernelFlash_KB             ST AI runtime library size in KB (e.g., "256KB")
@param applicationSizeFlash_KB    Application firmware size in KB (e.g., "512KB")
@param path_network_c_info        Path to `network_c_info.json` generated by
                                  ST Edge AI Core (contains memory pool details)
@param path_network_data_params   Path to `network_data_params.c` to be annotated

@return None

@note Uses a **greedy largest-first strategy** for bin packing — not optimal
      but fast and effective for the typical weight distribution of embedded models.

@see _keep_internal_weights() for the simpler case where all weights fit internally.

Definition at line 125 of file common_deploy.py.

Referenced by stm32ai_deploy().


◆ _keep_internal_weights()

None common_deploy._keep_internal_weights ( str  path_network_data_params)
private
@brief Tags all model weight arrays for placement in internal Flash memory.

@details
This function reads the generated C file `network_data_params.c` produced by
ST Edge AI Core and injects a GCC section attribute before every weight array
declaration, forcing the linker to place all weights in the MCU's internal Flash.

The injected attribute is:
@code{.c}
AI_INTERNAL_FLASH __attribute__((section(".InternalFlashSection")))
@endcode

This function is called when the model weights fit entirely within the internal
Flash of the target board (no weight splitting required). It is the simpler
alternative to _dispatch_weights().

**How it works:**
1. Opens `network_data_params.c` for reading
2. Scans line by line for the `#include "network_data_params.h"` directive
   and injects the macro definition immediately before it
3. For each line containing a weight array declaration (matched by regex
   `const ai_uXX name[size]`), prepends the `AI_INTERNAL_FLASH` attribute
4. Writes the modified content to a temporary file, then atomically replaces
   the original using `os.replace()`
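The declaration tagging and the write-then-rename pattern can be sketched as follows (a simplified illustration: the regex and the injection of the macro definition before the `#include` are abbreviated here, and exact details may differ from the module's code):

```python
import os
import re
import tempfile

ATTR = 'AI_INTERNAL_FLASH __attribute__((section(".InternalFlashSection")))\n'
# Simplified pattern for weight array declarations such as:
#   const ai_u8 weights_array[1024] = { ... };
DECL_RE = re.compile(r"^const\s+ai_u\d+\s+\w+\[\d+\]")

def tag_weights(path: str) -> None:
    """Prepend the section attribute to every weight declaration, in place."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    with os.fdopen(fd, "w") as out, open(path) as src:
        for line in src:
            if DECL_RE.match(line):
                out.write(ATTR)
            out.write(line)
    # Atomic replacement: readers never observe a partially written file.
    os.replace(tmp_path, path)
```

Writing to a temporary file in the same directory and then calling `os.replace()` guarantees the rename stays on one filesystem, which is what makes the replacement atomic.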

@param path_network_data_params  Absolute path to the generated
                                 `network_data_params.c` file inside
                                 the ST Edge AI Core output directory.

@return None

@note The file is modified **in place** using a write-then-rename pattern
      to avoid partial writes in case of failure.

@see _dispatch_weights() for the alternative function used when weights
     must be split between internal and external Flash.

Definition at line 64 of file common_deploy.py.

Referenced by stm32ai_deploy().


◆ stm32ai_deploy()

None common_deploy.stm32ai_deploy ( bool   target = False,
str   stlink_serial_number = None,
str   stm32ai_version = None,
str   c_project_path = None,
str   output_dir = None,
str   stm32ai_output = None,
str   optimization = None,
str   path_to_stm32ai = None,
str   path_to_cube_ide = None,
list   additional_files = None,
str   stmaic_conf_filename = 'stmaic_c_project.conf',
int   verbosity = None,
bool   debug = False,
str   model_path = None,
str   get_model_name_output = None,
str   stm32ai_ide = None,
str   stm32ai_serie = None,
list   credentials = None,
bool   on_cloud = False,
bool   check_large_model = False,
  cfg = None,
Dict   custom_objects = None 
)
@brief Generic deployment function for STM32 MCU targets (H7, U5, etc.).

@details
This function orchestrates the complete deployment pipeline for standard
STM32 MCU boards (excluding STM32N6, which uses stm32ai_deploy_stm32n6()).

**Pipeline steps:**
1. **Session creation** — loads the model into an STMAi session workspace
2. **Board configuration** — reads the `.conf` file specifying memory pools,
   linker scripts, and build system paths
3. **Model compilation** — runs ST Edge AI Core (offline or cloud) to:
   - Convert the quantized model to optimized C arrays (`network.c`, `network_data_params.c`)
   - Generate the AI runtime library (`Lib/`, `Inc/`)
   - Optionally split weights between internal/external Flash for large models
4. **Firmware build and flash** — invokes STM32CubeIDE in headless mode to
   compile the C project and flash the binary via ST-Link

**Large model handling (`check_large_model=True`):**
When enabled, the function first benchmarks the model to measure its ROM and RAM
requirements, then compares them against the board's available memory pools.
If weights overflow internal Flash, `_dispatch_weights()` is called to split them.
If activations overflow AXIRAM, `update_activation_c_code()` redistributes
activation buffers across AXIRAM and SDRAM.
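The decision logic for `check_large_model=True` can be summarized in a hedged sketch (the helper names `_dispatch_weights`, `_keep_internal_weights`, and `update_activation_c_code` are from this module; the comparison structure and the KB figures are illustrative, not the actual benchmarking code):

```python
def handle_large_model(weights_kb, flash_budget_kb, activations_kb, axiram_kb):
    """Return which memory-management helpers the pipeline would invoke."""
    actions = []
    if weights_kb > flash_budget_kb:
        actions.append("_dispatch_weights")        # split internal/external Flash
    else:
        actions.append("_keep_internal_weights")   # everything fits internally
    if activations_kb > axiram_kb:
        actions.append("update_activation_c_code")  # spill activations to SDRAM
    return actions
```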

**Cloud vs. local execution:**
- `on_cloud=True`: uses the STM32Cube.AI Developer Cloud API for compilation
- `on_cloud=False`: uses the local `stedgeai` executable (used in this project)

@param target               Unused legacy parameter (kept for API compatibility).
@param stlink_serial_number ST-Link serial number for multi-board setups.
                            Leave empty if only one board is connected.
@param stm32ai_version      Version string of ST Edge AI Core (e.g., "2.1.0").
@param c_project_path       Absolute path to the STM32CubeIDE C project root.
@param output_dir           Directory for all deployment outputs (logs, generated files).
@param stm32ai_output       Directory where ST Edge AI Core writes generated C files.
@param optimization         Compilation optimization level: "balanced", "latency", "ram".
@param path_to_stm32ai      Absolute path to the `stedgeai` executable.
@param path_to_cube_ide     Absolute path to the `stm32cubeide` executable.
@param additional_files     Extra files to copy into the C project before building.
@param stmaic_conf_filename Board configuration file name (e.g., "stmaic_STM32N6570-DK.conf").
@param verbosity            Logging verbosity (None=silent, 1=info, 2=debug).
@param debug                Enable debug logging for the STMAi driver.
@param model_path           Absolute path to the quantized model (.tflite or .onnx).
@param get_model_name_output  Model name string used for Cloud API identification.
@param stm32ai_ide          IDE/compiler identifier (must be "gcc" for GCC toolchain).
@param stm32ai_serie        STM32 series string (e.g., "STM32H7", "STM32U5").
@param credentials          Pre-obtained cloud credentials from cloud_connect().
@param on_cloud             If True, use STM32Cube.AI Developer Cloud for compilation.
@param check_large_model    If True, perform memory analysis before compilation
                            and split weights/activations if needed.
@param cfg                  Hydra DictConfig for preprocessing parameters
                            (used by update_activation_c_code).
@param custom_objects       Custom Keras objects for model loading (if applicable).

@return None

@throws ValueError  If the model is too large to fit in any available memory.

@note **Not used in this project.** For STM32N6 deployment, use
      stm32ai_deploy_stm32n6() which adds Neural-ART NPU support.

Definition at line 232 of file common_deploy.py.

References _dispatch_weights(), _keep_internal_weights(), and external_memory_mgt.update_activation_c_code().


◆ stm32ai_deploy_mpu()

None common_deploy.stm32ai_deploy_mpu ( bool   target = False,
str   board_ip_address = None,
str   board_deploy = None,
List   class_names = None,
str   c_project_path = None,
int   verbosity = None,
bool   debug = False,
str   model_path = None,
  cfg = None 
)
@brief Deploy an AI model to an STM32MP MPU board over SSH/SCP.

@details
This function handles deployment on STM32MP-series Microprocessor Units (MPUs),
which run Linux and use a fundamentally different deployment mechanism from MCUs:
instead of flashing firmware via ST-Link, it transfers application files over the
network using SSH and SCP.

**Deployment mechanism:**
Unlike MCU deployment (which replaces the entire firmware binary), MPU deployment:
1. Verifies board reachability via ICMP ping
2. Creates the deployment directory on the target via SSH
3. Copies application code, resources, and the model file via SCP
4. Copies board-specific shell scripts (STM32MP1/*.sh or STM32MP2/*.sh)
5. Launches the application remotely via SSH
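The five steps above can be sketched with stdlib `subprocess` calls (a minimal illustration assuming passwordless SSH as `root`; the host address, directory paths, and the `launch_app.sh` script name are placeholders, and the note below about disabled host key checking applies here too):

```python
import subprocess

SSH_OPTS = ["-o", "StrictHostKeyChecking=no"]

def deploy_mpu(ip: str, board_dir: str, local_app: str, model: str) -> bool:
    """Transfer application files and launch them on an STM32MP board."""
    try:
        # 1. Reachability check via ICMP ping (Linux-style flags, 1 s timeout).
        subprocess.run(["ping", "-c", "1", "-W", "1", ip],
                       check=True, capture_output=True)
        # 2. Create the deployment directory on the target.
        subprocess.run(["ssh", *SSH_OPTS, f"root@{ip}", f"mkdir -p {board_dir}"],
                       check=True)
        # 3-4. Copy application code and the model file.
        subprocess.run(["scp", *SSH_OPTS, "-r", local_app, model,
                        f"root@{ip}:{board_dir}/"], check=True)
        # 5. Launch the application remotely (placeholder script name).
        subprocess.run(["ssh", *SSH_OPTS, f"root@{ip}",
                        f"cd {board_dir} && sh launch_app.sh"], check=True)
        return True
    except (subprocess.CalledProcessError, OSError):
        return False  # errors are swallowed and reported as False, as documented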

**Supported boards:**
- STM32MP257F-EV1 (STM32MP2 series)
- STM32MP157F-DK2 (STM32MP1 series)
- STM32MP135F-DK  (STM32MP1 series)

**File transfer structure:**
@code
c_project_path/
├── Application/     → Copied to board_deploy/Application/
│   └── launch_*.sh  → Main launch script
├── Resources/       → Copied to board_deploy/Resources/
│   └── class_names.txt  → Generated from class_names parameter
└── STM32MP1/*.sh    → Board-specific scripts (MP1) or STM32MP2/*.sh (MP2)
@endcode

@param target           Unused legacy parameter.
@param board_ip_address IP address of the target MPU board (e.g., "192.168.1.100").
                        The board must be on the same network as the host PC.
@param board_deploy     Deployment directory path on the target board's filesystem.
@param class_names      List of class name strings OR path to a .txt file containing
                        class names (one per line). Used for inference labeling.
@param c_project_path   Path to the C project containing Application/ and Resources/.
@param verbosity        Logging verbosity level.
@param debug            Enable debug logging.
@param model_path       Path to the AI model file to deploy (.tflite, .onnx, or .nb).
@param cfg              Hydra DictConfig object (currently unused in MPU path).

@return True if deployment succeeded, False on any error.

@throws None — errors are caught and logged, returning False instead.

@note SSH host key checking is disabled (`StrictHostKeyChecking no`) for
      convenience in lab/development environments. Do not use in production.

@note This function is **not used in this project** (which targets the STM32N6
      MCU, not an MPU). It is documented here for completeness.

Definition at line 739 of file common_deploy.py.

References stm32ai_main.str.

Referenced by deploy.deploy_mpu().

Here is the caller graph for this function:

◆ stm32ai_deploy_stm32n6()

None common_deploy.stm32ai_deploy_stm32n6 ( bool   target = False,
str   stlink_serial_number = None,
str   stm32ai_version = None,
str   c_project_path = None,
str   output_dir = None,
str   stm32ai_output = None,
str   optimization = None,
str   path_to_stm32ai = None,
str   path_to_cube_ide = None,
list   additional_files = None,
str   stmaic_conf_filename = 'stmaic_c_project.conf',
int   verbosity = None,
bool   debug = False,
str   model_path = None,
str   get_model_name_output = None,
str   stm32ai_ide = None,
str   stm32ai_serie = None,
list   credentials = None,
bool   on_cloud = False,
bool   check_large_model = False,
str   build_conf = None,
  cfg = None,
Dict   custom_objects = None,
str   input_data_type = '',
str   output_data_type = '',
str   inputs_ch_position = '',
str   outputs_ch_position = '' 
)
@brief STM32N6-specific deployment function with Neural-ART NPU support.

@details
This is the primary deployment function used in this project for all three
case studies (MoveNet Lightning, YOLOv8n-pose, TinyBERT).

It differs from stm32ai_deploy() in several critical ways specific to
the STM32N6 hardware and its Neural-ART NPU:

**Key differences from generic stm32ai_deploy():**

1. **Neural-ART path** — the compile options include a `st_neural_art` parameter
   pointing to the NPU Add-on configuration (`neuralart_user_path`). This enables
   ST Edge AI Core to generate NPU-optimized code with the 4CA convolution
   accelerator configuration:
   @code
   neural_art_path = profile + "@" + neuralart_user_path
   opt = STMAiCompileOptions(st_neural_art=neural_art_path, ...)
   @endcode

2. **Data type and channel format** — the STM32N6 camera pipeline delivers
   images as uint8 in channel-last (NHWC) format. These are specified explicitly:
   - `input_data_type='uint8'`
   - `inputs_ch_position='chlast'`
   The NPU processes activations in channel-last format internally, matching
   the TFLite model's native format.

3. **Build configuration** — supports specifying a build configuration name
   (`build_conf`, e.g., "Release") for the STM32CubeIDE project.

4. **Generated header files** — copies both `ai_model_config.h` AND
   `app_config.h` into the C project (the generic function only copies the former).

5. **No weight splitting** — STM32N6 has 128 MB octoFlash, so weight overflow
   is never a concern for the models used in this project.

**Compilation flow for STM32N6 (offline, used in this project):**
@code{.sh}
# Equivalent CLI command executed internally by _stmaic_local_call():
stedgeai generate \
    --model movenet_lightning_int8.tflite \
    --target stm32n6 \
    --st-neural-art default@/path/to/neuralart_options.json \
    --input-data-type uint8 \
    --inputs-ch-position chlast \
    --output /output/generated/
@endcode

@param target               Unused legacy parameter.
@param stlink_serial_number ST-Link serial number (empty if single board).
@param stm32ai_version      ST Edge AI Core version string.
@param c_project_path       Path to the STM32CubeIDE C project root.
@param output_dir           Output directory for deployment artifacts.
@param stm32ai_output       Directory for ST Edge AI Core generated files.
@param optimization         Optimization strategy: "balanced" (used in project).
@param path_to_stm32ai      Path to the stedgeai executable.
@param path_to_cube_ide     Path to the stm32cubeide executable.
@param additional_files     Extra files to copy into the C project.
@param stmaic_conf_filename Board .conf file (e.g., "stmaic_STM32N6570-DK.conf").
@param verbosity            Logging verbosity level.
@param debug                Enable debug mode for STMAi driver.
@param model_path           Path to the quantized INT8 model (.tflite or .onnx).
@param get_model_name_output  Model name for Cloud API identification.
@param stm32ai_ide          IDE string — must be "gcc" for STM32N6.
@param stm32ai_serie        Series string — must be "STM32N6".
@param credentials          Cloud credentials (unused if on_cloud=False).
@param on_cloud             Use Developer Cloud for compilation (False in project).
@param check_large_model    Enable pre-compilation memory analysis.
@param build_conf           STM32CubeIDE build configuration name (e.g., "Release").
@param cfg                  Hydra DictConfig object (for activation code updates).
@param custom_objects       Custom Keras objects for model loading.
@param input_data_type      NPU input data type — **must be 'uint8'** for camera pipeline.
@param output_data_type     NPU output data type — empty means auto-detect.
@param inputs_ch_position   Input channel format — **must be 'chlast'** (NHWC for TFLite).
@param outputs_ch_position  Output channel format — empty means auto-detect.

@return None

@note This function is called by deploy.py in the pose_estimation module,
      which reads all parameters from user_config.yaml via the Hydra cfg object.

@see deploy.py for the caller that reads parameters from user_config.yaml.
@see stm32ai_deploy() for the generic MCU version without NPU support.

Definition at line 505 of file common_deploy.py.

Referenced by deploy.deploy().

Here is the caller graph for this function: