<alp/inference.h> — NPU Dispatcher

A single API for running TFLite Micro models on the right silicon-specific NPU back-end.

Supported back-ends

| Back-end | TFLite Micro Kconfig | Available on |
| --- | --- | --- |
| Arm Ethos-U55 | CONFIG_ALP_TFLM_ETHOS_U55 | Every E1M-AEN SKU (E3 / E4 / E5 / E6 / E7 / E8) — two instances per SoC |
| Arm Ethos-U85 | CONFIG_ALP_TFLM_ETHOS_U85 | E1M-AEN401 / AEN601 / AEN801 (Alif E4 / E6 / E8 only) — one instance per SoC, Transformer-capable, generative-AI forward path |
| Arm Ethos-U65 | CONFIG_ALP_TFLM_ETHOS_U65 | E1M-N93 family (NXP i.MX 93) |
| Renesas DRP-AI3 | CONFIG_ALP_TFLM_DRP_AI | E1M-X V2N family |
| DEEPX DX-M1 | ALP_SDK_INFERENCE_DEEPX_DX | E1M-X V2N-M1 family (non-TFLM runtime — DXNN) |
| CPU | (fallback, always available) | Any target — reference kernels |

The auto selector in board.yaml picks the highest-priority match for the active SoM. Order: U85 → U65 → U55 → DRP-AI → CPU. On U85-capable SKUs the loader also links the U55 driver shim as a secondary so legacy U55-compiled models still run.

The U55 / U85 split is driven by the loader's `requires_cap` matcher, which reads each SoM preset's `capabilities:` block: pick the SKU and the right CONFIG_* symbol follows automatically.
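The priority order above can be sketched as a plain capability check. This is a minimal standalone illustration of the selection logic, not the SDK's actual loader code; the `soc_caps_t` flags and enum names are hypothetical:

```c
/* Hypothetical capability flags mirroring a SoM preset's `capabilities:` block.
 * Names are illustrative, not the SDK's actual symbols. */
typedef struct {
    int has_ethos_u85;
    int has_ethos_u65;
    int has_ethos_u55;
    int has_drp_ai;
} soc_caps_t;

typedef enum {
    BACKEND_CPU,        /* fallback, always available */
    BACKEND_DRP_AI,
    BACKEND_ETHOS_U55,
    BACKEND_ETHOS_U65,
    BACKEND_ETHOS_U85,
} backend_t;

/* Resolve `backend: auto` using the documented priority:
 * U85 -> U65 -> U55 -> DRP-AI -> CPU. */
static backend_t pick_backend(const soc_caps_t *caps)
{
    if (caps->has_ethos_u85) return BACKEND_ETHOS_U85;
    if (caps->has_ethos_u65) return BACKEND_ETHOS_U65;
    if (caps->has_ethos_u55) return BACKEND_ETHOS_U55;
    if (caps->has_drp_ai)    return BACKEND_DRP_AI;
    return BACKEND_CPU;
}
```

On a U85-capable SKU (which also carries U55 instances) this resolves to U85, matching the note above that the U55 driver shim is only linked as a secondary.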

```c
#include <alp/inference.h>
```

Quick example

```c
#include <string.h>  // memcpy

extern const uint8_t my_model_tflite[];
extern const size_t  my_model_tflite_len;

alp_inference_t *infer = alp_inference_open(&(alp_inference_config_t){
    .backend         = ALP_INFERENCE_BACKEND_AUTO,
    .model_bytes     = my_model_tflite,
    .model_len       = my_model_tflite_len,
    .tensor_arena_kb = 256,
});
if (infer == NULL) {
    int err = alp_last_error();
    return err;
}

// Get input tensor metadata and copy the input frame in
alp_tensor_t input;
alp_inference_get_input(infer, 0, &input);
memcpy(input.data, my_image, input.byte_size);

// Run
alp_inference_invoke(infer);

// Read output
alp_tensor_t output;
alp_inference_get_output(infer, 0, &output);
// process output.data ...

alp_inference_close(infer);
```

board.yaml

```yaml
inference:
  backend: auto          # auto | cpu | ethos_u | drpai | deepx_dx
  tensor_arena_kb: 256
```
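To pin a backend instead of relying on auto selection (useful when benchmarking one NPU path against another), set one of the values listed in the comment above. A sketch; the arena value here is purely illustrative:

```yaml
inference:
  backend: drpai         # force the DRP-AI3 path on an E1M-X V2N board
  tensor_arena_kb: 512   # illustrative; size against your model's reported arena
```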

Model formats

- TFLite Micro — the universal input format. Vela-compiled for Ethos-U backends.
- DRP-AI binary — the Renesas DRP-AI3 path consumes models converted via the DRP-AI Translator toolchain.
- DXNN — the DEEPX path consumes .dxnn files compiled from ONNX by the DEEPX compiler.

The dispatcher abstracts the format so application code only sees TFLite Micro. The build system invokes the right compiler for the active backend.

Tensor arena

tensor_arena_kb reserves working memory for the interpreter. Right-size this against the model's arena_size reported by Vela or the equivalent. Too small and alp_inference_open returns NULL with ALP_ERR_OUT_OF_RANGE; too large just wastes RAM.

Off-device training

The SDK is inference-only. Train your model offline in TensorFlow or PyTorch, export to TFLite (or ONNX for the DEEPX path), then deploy through <alp/inference.h>.

Reference applications

| App | Stack |
| --- | --- |
| examples/aen/edgeai-vision-aen | camera → ISP → Ethos-U inference → OLED overlay |
| examples/iot-connected-camera | camera → DRP-AI inference → MQTT publish |
