Multidisciplinary Project — A.Y. 2024–2025
Neural Network Deployment
on STM32N6 NPU
A study on deploying CNN and Transformer architectures on the
STM32N6570-DK board and its Neural-ART NPU — exploring
real-time pose estimation, INT8 quantization, and the limits
of embedded AI at the edge.
MoveNet • 94.7% NPU • 18ms
YOLOv8n • 87.9% NPU • 32ms
TinyBERT • 64.4% NPU • >100ms
STM32N6570-DK • 600 GOPS • 800 MHz
Authors
Giacomo Colosio
Sebastiano Colosio
Patrizio Acquadro
Tito Nicola Drugman
What is this repository?
This is the code documentation for our Multidisciplinary Project at
Politecnico di Milano (A.Y. 2024–2025), supervised by
Prof. Cristina Silvano and Dr. Marco Ronzani.
We deployed three neural network models of increasing architectural complexity
on the STM32N6570-DK development board, which features the
Neural-ART NPU — a 600 GOPS hardware accelerator designed for embedded AI.
Our goal was to understand how well a CNN-centric NPU handles both
convolutional models (MoveNet, YOLOv8n) and
Transformer-based models (TinyBERT), and to document the
complete deployment pipeline from a quantized model file to real-time
inference running on the board.
This documentation is designed to be read as a guide: it combines
conceptual explanations, step-by-step deployment instructions, and
fully annotated source code — so that anyone can reproduce our
results or build on them.
How this documentation is organised
The documentation is divided into three parts, accessible from the navigation
menu on the left or from the cards below.
Part 1 — Narrative Explanations
Six chapters that walk you through the theory, the hardware,
the tools, and our findings — no code required.
Start here if you want to understand the project from scratch.
Part 2 — Code Reference
23 fully annotated source files — 13 Python and 10 C/H —
with function signatures, parameter descriptions, call graphs,
and inline comments explaining every non-obvious decision.
Part 3 — Module Groups
Files grouped by layer: Firmware (all .c/.h)
and PythonPipeline (all .py).
Useful when you want to see a specific layer in isolation.
Part 1 — Narrative Explanations contains six chapters.
Introduction explains the context: why Edge AI, what an NPU is,
how CNNs and Transformers differ at the hardware level, and why INT8
quantization matters for embedded deployment.
Hardware describes the STM32N6570-DK board and its Neural-ART NPU
in detail — the 14 functional units, the SRAM banks, the memory map,
and the boot switch procedure.
Toolchain walks through the four software tools (Model Zoo,
ModelZoo Services, ST Edge AI Core, STM32CubeIDE) and how to set them up
from scratch.
Deployment Workflow describes the five-step pipeline from model
selection to on-board validation.
Case Studies presents the three deployments with full profiling
data: epoch counts, NPU offload rates, memory usage per SRAM bank, and
measured inference latency.
Results & Analysis compares the three models head-to-head
and discusses what the numbers tell us about the NPU’s design.
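The INT8 quantization discussed in the Introduction chapter can be illustrated with a minimal sketch. This is not the project's code: it shows the standard affine (scale + zero-point) scheme used by most INT8 toolchains, which trades 4-byte floats for 1-byte integers at the cost of a small rounding error.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Affine INT8 quantization: map the float range of x onto [-128, 127]."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-128 - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float values from the INT8 codes."""
    return (q.astype(np.float32) - zero_point) * scale

# A float32 tensor costs 4 bytes per value; its INT8 version costs 1 byte,
# and the round-trip error stays within one quantization step.
x = np.linspace(-1.0, 1.0, 9, dtype=np.float32)
q, s, zp = quantize_int8(x)
err = float(np.abs(dequantize_int8(q, s, zp) - x).max())
```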
Part 2 — Code Reference contains all 23 source files
documented with Doxygen. Every function has a @brief description,
@param and @return annotations, and inline comments
explaining every non-trivial step. For C functions, Doxygen automatically
generates call graphs (the functions a given function calls) and
caller graphs (the functions that call it) via Graphviz,
making the execution flow immediately visible without reading the code.
Part 3 — Module Groups provides a thematic entry point:
the Firmware group collects all C/H files that run on the board,
and the PythonPipeline group collects all Python files that run
on the host PC. This is useful when you want to understand a single layer
without navigating the full file list.
Recommended reading path
If you are approaching this project for the first time, we suggest following
this order. Each step builds on the previous one, so that by the time you
reach the code you already have the mental model to understand it.
Introduction & Hardware
Part 1 — Ch. 1–2
Understand why MoveNet reaches 94.7% NPU offload while TinyBERT
only reaches 64.4% — the answer is in the NPU architecture.
Toolchain & Deployment Workflow
Part 1 — Ch. 3–4
Learn what each tool does and in what order.
After this, every line in the Python pipeline will make sense.
Python pipeline files
Part 2 — Namespaces
Follow the thread:
stm32ai_main.py → deploy.py → gen_h_file.py → common_deploy.py.
This is the chain that takes a YAML file and produces a flashed board.
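The pattern behind that chain, a configuration file selecting an ordered list of pipeline steps, can be sketched as follows. The keys ("operation_mode", "model", "board") and the step functions are hypothetical stand-ins, not the project's actual schema; see stm32ai_main.py for the real entry point, which parses the YAML itself.

```python
# Illustrative sketch of a config-driven deployment dispatch.
# Step functions and config keys are hypothetical; in the real pipeline
# the dict below would come from parsing the user's YAML file.

def quantize(cfg):      return f"quantized {cfg['model']}"
def generate_code(cfg): return f"generated C for {cfg['model']}"
def flash(cfg):         return f"flashed {cfg['board']}"

PIPELINES = {
    "deployment": [quantize, generate_code, flash],
    "benchmark":  [quantize, generate_code],
}

def run(cfg: dict) -> list:
    """Run, in order, each step of the pipeline selected by operation_mode."""
    return [step(cfg) for step in PIPELINES[cfg["operation_mode"]]]

log = run({"operation_mode": "deployment",
           "model": "movenet_int8.tflite",
           "board": "STM32N6570-DK"})
```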
C firmware files
Part 2 — Files
Start with app_config.h (what was configured),
then main.c (the inference loop),
then app_camerapipeline.c and display_spe.c.
Case Studies & Results
Part 1 — Ch. 5–6
Connect the code to the real numbers:
full profiling tables, NPU epoch breakdowns,
and a cross-model comparison that explains our key finding.
Call graphs — deep dive on any function
Part 2 — any function page
For any function that is not immediately clear, open its page in
Part 2 and scroll to the call graph — it shows exactly
where it is called from and what it calls, without reading
a single line of code.
Supervised by:
Prof. Cristina Silvano and Dr. Marco Ronzani —
Advanced Computer Architectures (088949),
Dipartimento di Elettronica, Informazione e Bioingegneria,
Politecnico di Milano.