<alp/inference.h> — NPU Dispatcher
A single API for running TFLite Micro models on the right silicon-specific NPU back-end.
Supported back-ends
| Back-end | Kconfig option | Available on |
|---|---|---|
| Arm Ethos-U55 | CONFIG_ALP_TFLM_ETHOS_U55 | Every E1M-AEN SKU (E3 / E4 / E5 / E6 / E7 / E8) — two instances per SoC |
| Arm Ethos-U85 | CONFIG_ALP_TFLM_ETHOS_U85 | E1M-AEN401 / AEN601 / AEN801 (Alif E4 / E6 / E8 only) — one instance per SoC, Transformer-capable, generative-AI forward path |
| Arm Ethos-U65 | CONFIG_ALP_TFLM_ETHOS_U65 | E1M-N93 family (NXP i.MX 93) |
| Renesas DRP-AI3 | CONFIG_ALP_TFLM_DRP_AI | E1M-X V2N family |
| DEEPX DX-M1 | ALP_SDK_INFERENCE_DEEPX_DX | E1M-X V2N-M1 family (non-TFLM runtime — DXNN) |
| CPU | (fallback, always available) | Any target — reference kernels |
The auto selector in board.yaml picks the highest-priority back-end available on the active SoM, in the order U85 → U65 → U55 → DRP-AI → CPU. On U85-capable SKUs the loader also links the U55 driver shim as a secondary back-end, so legacy U55-compiled models still run.
The U55 / U85 split is driven by the loader's requires_cap matcher, which reads the capabilities block of each SoM preset: pick the SKU, and the right CONFIG_* option follows automatically.
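Because the selected back-end's CONFIG_* symbol is also visible to the preprocessor, an application can check at build time that an NPU path was actually enabled. A minimal sketch, assuming the usual Kconfig behaviour of defining enabled boolean options, and using the ALP_SDK_INFERENCE_DEEPX_DX symbol exactly as listed in the table (without a CONFIG_ prefix):

```c
/* Sketch: warn at build time if no NPU back-end is enabled.
 * The CPU fallback has no Kconfig symbol, so hitting this #warning means
 * inference will run on the reference kernels. */
#if !defined(CONFIG_ALP_TFLM_ETHOS_U85) && \
    !defined(CONFIG_ALP_TFLM_ETHOS_U65) && \
    !defined(CONFIG_ALP_TFLM_ETHOS_U55) && \
    !defined(CONFIG_ALP_TFLM_DRP_AI) && \
    !defined(ALP_SDK_INFERENCE_DEEPX_DX)
#warning "No NPU back-end enabled; inference will use the CPU reference kernels"
#endif
```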
Header
#include <alp/inference.h>
Quick example
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#include <alp/inference.h>

extern const uint8_t my_model_tflite[];
extern const size_t my_model_tflite_len;

alp_inference_t *infer = alp_inference_open(&(alp_inference_config_t){
    .backend         = ALP_INFERENCE_BACKEND_AUTO,
    .model_bytes     = my_model_tflite,
    .model_len       = my_model_tflite_len,
    .tensor_arena_kb = 256,
});
if (infer == NULL) {
    int err = alp_last_error();
    return err;
}

// Get input tensor metadata and copy the application's frame (my_image) into it
alp_tensor_t input;
alp_inference_get_input(infer, 0, &input);
memcpy(input.data, my_image, input.byte_size);

// Run
alp_inference_invoke(infer);

// Read output
alp_tensor_t output;
alp_inference_get_output(infer, 0, &output);
// process output.data ...

alp_inference_close(infer);
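The quick example leaves output handling as a comment because it depends entirely on the model. For a simple classifier it usually reduces to an argmax over the output tensor, sketched below; the int8 element type and the assumption that byte_size equals the number of classes are properties of this hypothetical model, not guarantees made by the dispatcher.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <alp/inference.h>

/* Sketch: argmax over a 1-D int8 classification output.
 * Assumes the model emits one int8 score per class. */
static size_t classify(alp_inference_t *infer)
{
    alp_tensor_t output;
    alp_inference_get_output(infer, 0, &output);

    const int8_t *scores = (const int8_t *)output.data;
    size_t num_classes = output.byte_size;   /* one byte per int8 score */
    size_t best = 0;
    for (size_t i = 1; i < num_classes; i++) {
        if (scores[i] > scores[best]) {
            best = i;
        }
    }
    printf("predicted class %u (score %d)\n", (unsigned)best, scores[best]);
    return best;
}
```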
board.yaml
inference:
  backend: auto              # auto | cpu | ethos_u | drpai | deepx_dx
  tensor_arena_kb: 256
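How the board.yaml values interact with values passed to alp_inference_open is not spelled out here, but the same fields exist in alp_inference_config_t, so a back-end can also be pinned programmatically. The sketch below forces the CPU reference kernels, which is handy for sanity-checking NPU output; ALP_INFERENCE_BACKEND_CPU is a guessed constant name mirroring the cpu value above (only ALP_INFERENCE_BACKEND_AUTO appears in this section).

```c
#include <stddef.h>
#include <stdint.h>
#include <alp/inference.h>

extern const uint8_t my_model_tflite[];
extern const size_t my_model_tflite_len;

/* Sketch: open the same model on the CPU reference kernels, e.g. to compare
 * its output against the NPU path. ALP_INFERENCE_BACKEND_CPU is an assumed
 * constant name; check the back-end enum in <alp/inference.h>. */
static alp_inference_t *open_cpu_reference(void)
{
    return alp_inference_open(&(alp_inference_config_t){
        .backend         = ALP_INFERENCE_BACKEND_CPU,  /* assumed constant */
        .model_bytes     = my_model_tflite,
        .model_len       = my_model_tflite_len,
        .tensor_arena_kb = 256,
    });
}
```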
Model formats
- TFLite Micro — the universal input format. Vela-compiled for Ethos-U backends.
- DRP-AI binary — Renesas DRP-AI3 path consumes models converted via the DRP-AI Translator toolchain.
- DXNN — the DEEPX path consumes .dxnn files compiled from ONNX by the DEEPX compiler.
The dispatcher abstracts the format so application code only sees TFLite Micro. The build system invokes the right compiler for the active backend.
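For the TFLite Micro path, the my_model_tflite[] / my_model_tflite_len symbols used in the quick example are typically that build step's output embedded as a byte array. A hand-written equivalent is sketched below; the 16-byte alignment is a common precaution for NPU/DMA access and an assumption here rather than a documented SDK requirement, and the header bytes are purely illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: a Vela-compiled model embedded as a C array, roughly what the
 * build system generates from the .tflite file. */
__attribute__((aligned(16)))        /* assumed alignment precaution */
const uint8_t my_model_tflite[] = {
    0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33,  /* illustrative: "TFL3" flatbuffer identifier */
    /* ... remaining bytes of the compiled model ... */
};
const size_t my_model_tflite_len = sizeof(my_model_tflite);
```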
Tensor arena
tensor_arena_kb reserves working memory for the interpreter. Right-size it against the model's arena_size reported by Vela (or the equivalent tool for the active back-end). If the arena is too small, alp_inference_open returns NULL and alp_last_error() reports ALP_ERR_OUT_OF_RANGE; if it is larger than needed, the excess RAM is simply wasted.
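When the Vela-reported arena size is not to hand, one pragmatic approach is to retry the open with a progressively larger arena until it stops failing with ALP_ERR_OUT_OF_RANGE. A minimal sketch using only the calls shown in this section; the doubling strategy and the 1024 KB ceiling are arbitrary choices for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <alp/inference.h>

extern const uint8_t my_model_tflite[];
extern const size_t my_model_tflite_len;

/* Sketch: grow the tensor arena until alp_inference_open succeeds or an
 * arbitrary ceiling is reached. Prefer sizing from the reported arena_size
 * when it is available. */
static alp_inference_t *open_with_arena_retry(void)
{
    for (size_t arena_kb = 128; arena_kb <= 1024; arena_kb *= 2) {
        alp_inference_t *infer = alp_inference_open(&(alp_inference_config_t){
            .backend         = ALP_INFERENCE_BACKEND_AUTO,
            .model_bytes     = my_model_tflite,
            .model_len       = my_model_tflite_len,
            .tensor_arena_kb = arena_kb,
        });
        if (infer != NULL) {
            return infer;
        }
        if (alp_last_error() != ALP_ERR_OUT_OF_RANGE) {
            break;  /* a different failure; a bigger arena will not help */
        }
    }
    return NULL;
}
```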
Off-device training
The SDK is inference-only. Train your model offline in TensorFlow or PyTorch, export to TFLite (or ONNX for the DEEPX path), then deploy through <alp/inference.h>.
Reference applications
| App | Stack |
|---|---|
| examples/aen/edgeai-vision-aen | camera → ISP → Ethos-U inference → OLED overlay |
| examples/iot-connected-camera | camera → DRP-AI inference → MQTT publish |