STM32N6 NPU Deployment — Politecnico di Milano
1.0
Documentation for Neural Network Deployment on STM32N6 NPU - Politecnico di Milano 2024-2025
Now that we understand the hardware platform — the STM32N6570-DK, its Neural-ART NPU, and its memory hierarchy — it is time to look at the software side. This chapter walks through every tool involved in the deployment pipeline: what it does, how it fits together with the others, and where to find the detailed code explanations in Part 2 and Part 3.
Before diving into the tools, there is one idea that ties everything together: the division of labor between Python and C. Every step required to deploy a neural-network firmware on the board falls on one side or the other of this divide.
For each tool we will look at what it contains (code, models, configuration files), how it is used in practice, and — crucially — we will link directly to the detailed code explanations in Part 2 — Code Reference and Part 3 — Module Groups, where every function is documented line by line.
This chapter is structured in three parts, mirroring the division of labor described above:
The complete deployment pipeline is composed of four distinct software components, each responsible for a specific phase. They always execute in this exact order:
1. ST Model Zoo — the source of the pre-trained model
2. ModelZoo Services — the Python framework that drives the pipeline
3. ST Edge AI Core — the Python-to-C converter
4. STM32CubeIDE — compilation and flashing of the C firmware
Notice the dashed vertical line in the diagram above. Everything to the left runs on your laptop in Python. Everything to the right produces and runs C code on the board. ST Edge AI Core sits exactly at the boundary: it takes a Python model as input and produces C source files as output.
Each component has its own dedicated page. For each one we examine: the repository structure, the key files, how it is configured and invoked, and direct links to the annotated source code in Part 2.
ST Model Zoo: the public repository of pre-trained, validated models for STM32 deployment. It covers object detection, pose estimation, image classification and more. This is where MoveNet Lightning was sourced.
ModelZoo Services: the Python framework that automates every step of the pipeline — training, quantization, evaluation, benchmarking, and deployment. A single YAML file controls everything. All Python source files documented in Part 2 belong here.
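As an illustration of what such a YAML file looks like, here is a minimal sketch. The section and key names below are illustrative, not the exact Model Zoo schema — consult the ST Model Zoo documentation for the real configuration reference.

```yaml
# Hypothetical user_config.yaml sketch — section and key names are
# illustrative; check the ST Model Zoo docs for the actual schema.
general:
  model_path: ./models/movenet_lightning_int8.tflite

operation_mode: deployment   # other modes: training, quantization, benchmarking, ...

deployment:
  c_project_path: ./application_code/pose_estimation
  board: STM32N6570-DK
```

A single file like this is what lets one `python` invocation carry a model all the way from evaluation to a flashed board.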
ST Edge AI Core: the converter tool. It takes a quantized .tflite or .onnx model, produces optimised C arrays, assigns layers to NPU or CPU epochs, and generates a detailed profiling report. This is the Python-to-C bridge.
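In practice the conversion step is driven from Python by shelling out to the `stedgeai` command-line tool. The sketch below shows the general shape of that call; the flag names follow the ST Edge AI Core CLI, but you should verify them against the `--help` output of your installed version.

```python
import subprocess

def generate_c_model(model_path: str, output_dir: str) -> list[str]:
    """Assemble the stedgeai command line that converts a quantized model
    into C source files. Flag names follow the ST Edge AI Core CLI but
    should be checked against the installed version's --help output."""
    return [
        "stedgeai", "generate",
        "--model", model_path,    # quantized .tflite or .onnx model
        "--target", "stm32n6",    # enable Neural-ART NPU code generation
        "--output", output_dir,   # where the generated C files land
    ]

cmd = generate_c_model("movenet_lightning_int8.tflite", "generated")
# The Python framework would run this with subprocess.run(cmd, check=True)
print(" ".join(cmd))
```

Keeping command construction separate from execution makes it easy to log and inspect exactly what the pipeline asks the converter to do.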
STM32CubeIDE: the IDE, invoked automatically by the Python deployment script to compile the C project and flash it to the board via ST-LINK. Here we also examine the C firmware files that implement the real-time inference loop on the STM32N6570-DK.
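The hand-off from Python to the IDE typically goes through STM32CubeIDE's headless build mode, which reuses the standard Eclipse CDT headless-build options. The sketch below is illustrative: the executable name varies by platform (e.g. `stm32cubeide` vs `stm32cubeidec`), and the paths are hypothetical.

```python
import subprocess

def build_firmware(workspace: str, project_path: str) -> list[str]:
    """Assemble a headless STM32CubeIDE build command. The options are the
    standard Eclipse CDT headless-build flags; executable name and paths
    here are illustrative, not taken from the actual deployment script."""
    return [
        "stm32cubeide",
        "-nosplash",
        "-application", "org.eclipse.cdt.managedbuilder.core.headlessbuild",
        "-data", workspace,          # Eclipse workspace directory
        "-import", project_path,     # C project to import into the workspace
        "-cleanBuild", "all",        # rebuild every build configuration
    ]

cmd = build_firmware("/tmp/cube_ws", "./pose_estimation_project")
# The deployment script would execute this with subprocess.run(cmd, check=True)
print(" ".join(cmd))
```

After a successful build, the same script can flash the resulting binary through the on-board ST-LINK, closing the loop from YAML file to running firmware.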
Throughout the subpages of this chapter, every time we mention a Python function or a C file, we link directly to its annotated documentation in Part 2. If you want to go deeper on any specific file right now:
3.1 — ST Model Zoo
3.2 — ModelZoo Services
3.3 — ST Edge AI Core
3.4 — STM32CubeIDE & C Firmware