<alp/inference.h> — NPU Dispatcher

A single API for running TFLite Micro models on the right silicon-specific NPU back-end.

Supported back-ends

| Back-end | TFLite Micro Kconfig | Available on |
| --- | --- | --- |
| Arm Ethos-U55 | CONFIG_ALP_TFLM_ETHOS_U55 | Every E1M-AEN SKU (E3 / E4 / E5 / E6 / E7 / E8) — two instances per SoC |
| Arm Ethos-U85 | CONFIG_ALP_TFLM_ETHOS_U85 | E1M-AEN401 / AEN601 / AEN801 (Alif E4 / E6 / E8 only) — one instance per SoC, Transformer-capable, generative-AI forward path |
| Arm Ethos-U65 | CONFIG_ALP_TFLM_ETHOS_U65 | E1M-N93 family (NXP i.MX 93) |
| Renesas DRP-AI3 | CONFIG_ALP_TFLM_DRP_AI | E1M-X V2N family |
| DEEPX DX-M1 | ALP_SDK_INFERENCE_DEEPX_DX | E1M-X V2N-M1 family (non-TFLM runtime — DXNN) |
| CPU | (fallback, always available) | Any target — reference kernels |

The auto selector in board.yaml picks the highest-priority match for the active SoM. Order: U85 → U65 → U55 → DRP-AI → CPU. On U85-capable SKUs the loader also links the U55 driver shim as a secondary so legacy U55-compiled models still run.

The U55 / U85 split is driven by the loader's `requires_cap` matcher, which reads each SoM preset's `capabilities:` block: pick the SKU and the right CONFIG_* symbol follows automatically.
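The priority order above can be sketched as a plain capability check. This is a minimal standalone illustration of the selection logic, not the SDK's actual loader code; the `soc_caps_t` flags and enum names are hypothetical:

```c
/* Hypothetical capability flags mirroring a SoM preset's `capabilities:` block.
 * Names are illustrative, not the SDK's actual symbols. */
typedef struct {
    int has_ethos_u85;
    int has_ethos_u65;
    int has_ethos_u55;
    int has_drp_ai;
} soc_caps_t;

typedef enum {
    BACKEND_CPU,        /* fallback, always available */
    BACKEND_DRP_AI,
    BACKEND_ETHOS_U55,
    BACKEND_ETHOS_U65,
    BACKEND_ETHOS_U85,
} backend_t;

/* Resolve `backend: auto` using the documented priority:
 * U85 -> U65 -> U55 -> DRP-AI -> CPU. */
static backend_t pick_backend(const soc_caps_t *caps)
{
    if (caps->has_ethos_u85) return BACKEND_ETHOS_U85;
    if (caps->has_ethos_u65) return BACKEND_ETHOS_U65;
    if (caps->has_ethos_u55) return BACKEND_ETHOS_U55;
    if (caps->has_drp_ai)    return BACKEND_DRP_AI;
    return BACKEND_CPU;
}
```

On a U85-capable SKU (which also carries U55 instances) this resolves to U85, matching the note above that the U55 driver shim is only linked as a secondary.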

```c
#include <alp/inference.h>
```

Quick example

```c
#include <string.h>  // memcpy

extern const uint8_t my_model_tflite[];
extern const size_t  my_model_tflite_len;

alp_inference_t *infer = alp_inference_open(&(alp_inference_config_t){
    .backend         = ALP_INFERENCE_BACKEND_AUTO,
    .model_bytes     = my_model_tflite,
    .model_len       = my_model_tflite_len,
    .tensor_arena_kb = 256,
});
if (infer == NULL) {
    int err = alp_last_error();
    return err;
}

// Get input tensor metadata and copy the input frame in
alp_tensor_t input;
alp_inference_get_input(infer, 0, &input);
memcpy(input.data, my_image, input.byte_size);

// Run
alp_inference_invoke(infer);

// Read output
alp_tensor_t output;
alp_inference_get_output(infer, 0, &output);
// process output.data ...

alp_inference_close(infer);
```

board.yaml

```yaml
inference:
  backend: auto          # auto | cpu | ethos_u | drpai | deepx_dx
  tensor_arena_kb: 256
```
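To pin a backend instead of relying on auto selection (useful when benchmarking one NPU path against another), set one of the values listed in the comment above. A sketch; the arena value here is purely illustrative:

```yaml
inference:
  backend: drpai         # force the DRP-AI3 path on an E1M-X V2N board
  tensor_arena_kb: 512   # illustrative; size against your model's reported arena
```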

Model formats

- TFLite Micro — the universal input format. Vela-compiled for Ethos-U backends.
- DRP-AI binary — the Renesas DRP-AI3 path consumes models converted via the DRP-AI Translator toolchain.
- DXNN — the DEEPX path consumes .dxnn files compiled from ONNX by the DEEPX compiler.

The dispatcher abstracts the format so application code only sees TFLite Micro. The build system invokes the right compiler for the active backend.

Tensor arena

tensor_arena_kb reserves working memory for the interpreter. Right-size this against the model's arena_size reported by Vela or the equivalent. Too small and alp_inference_open returns NULL with ALP_ERR_OUT_OF_RANGE; too large just wastes RAM.

Off-device training

The SDK is inference-only. Train your model offline in TensorFlow or PyTorch, export to TFLite (or ONNX for the DEEPX path), then deploy through <alp/inference.h>.

Reference applications

| App | Stack |
| --- | --- |
| examples/aen/edgeai-vision-aen | camera → ISP → Ethos-U inference → OLED overlay |
| examples/iot-connected-camera | camera → DRP-AI inference → MQTT publish |
