This document provides an overview of the available Machine Learning (ML) accelerator backends within this project. It details their purpose, configuration, and specific custom properties that can be used to tailor their behavior.
## Vivante Backend

- Vendor: VeriSilicon
- Description: This backend utilizes VeriSilicon's Neural Processing Unit (NPU) for ML inference. It requires a model file (typically `.nb`) and a corresponding JSON file or shared object (`.so`) file generated by Vivante's toolchain. These files contain the compiled network and the necessary runtime functions.
- Source File: `src/hal-backend-ml-vivante.cc`
There are two options for providing the model:

- `model.nb` + `model.json`
- `model.nb` + `model.so`
- Model Files (`prop->model_files`):
  - `model_files[0]`: Path to the Vivante model file (e.g., `my_model.nb`). This file contains the neural network graph and weights.
  - (Optional) `model_files[1]`: Path to the Vivante shared object file (e.g., `vnn_my_model.so`). This library provides functions such as `vnn_CreateNeuralNetwork` and `vnn_ReleaseNeuralNetwork` (see the loading sketch below).

Note that if the shared object file is not provided, the backend attempts to load the model using the JSON file, whose path must be specified in `custom_properties`.
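For the `model.nb` + `model.so` option, the shared object is loaded dynamically and the generated `vnn_*` entry points are resolved from it. Below is a minimal sketch of that flow; the function names come from this document, but the signature of `vnn_CreateNeuralNetwork` is an assumption based on code conventionally emitted by Vivante's toolchain (the real return type is a graph handle such as `vsi_nn_graph_t *`), not a confirmed detail of this backend.

```cpp
#include <cstdio>
#include <dlfcn.h>

/* Opaque handle: the generated code actually returns a vsi_nn_graph_t *
 * (ovxlib); treated as void * here to keep the sketch self-contained. */
typedef void *vnn_graph_h;
typedef vnn_graph_h (*vnn_create_fn) (const char *model_path);
typedef void (*vnn_release_fn) (vnn_graph_h graph);

int
main ()
{
  void *handle = dlopen ("/path/to/vnn_my_model.so", RTLD_NOW);
  if (!handle) {
    std::fprintf (stderr, "dlopen failed: %s\n", dlerror ());
    return 1;
  }

  /* Resolve the entry points named in this document. */
  vnn_create_fn create = (vnn_create_fn) dlsym (handle, "vnn_CreateNeuralNetwork");
  vnn_release_fn release = (vnn_release_fn) dlsym (handle, "vnn_ReleaseNeuralNetwork");
  if (!create || !release) {
    std::fprintf (stderr, "missing vnn_* symbol: %s\n", dlerror ());
    dlclose (handle);
    return 1;
  }

  vnn_graph_h graph = create ("/path/to/my_model.nb");
  /* ... run inference through the backend ... */
  release (graph);
  dlclose (handle);
  return 0;
}
```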
- `json`:
  - Description: Path to the JSON file for the given model file.
  - Key: `json`
  - Value: Path to the JSON file.
  - Example: `json:/path/to/my_model.json`
A sample JSON file:

```json
// yolo-v8m.json
{
  "input_tensors": [
    {
      "id": 2,
      "dim_num": 4,
      "size": [416, 416, 3, 1],
      "dtype": {
        "vx_type": "VSI_NN_TYPE_UINT8",
        "qnt_type": "VSI_NN_QNT_TYPE_AFFINE_ASYMMETRIC",
        "zero_point": 0,
        "scale": 0.00390625
      }
    }
  ],
  "output_tensors": [
    {
      "id": 0,
      "size": [3549, 4, 1],
      "dim_num": 3,
      "dtype": {
        "scale": 2.325678825378418,
        "zero_point": 12,
        "qnt_type": "VSI_NN_QNT_TYPE_AFFINE_ASYMMETRIC",
        "vx_type": "VSI_NN_TYPE_UINT8"
      }
    },
    {
      "id": 1,
      "dim_num": 3,
      "size": [3549, 80, 1],
      "dtype": {
        "scale": 0.0038632985670119524,
        "zero_point": 0,
        "qnt_type": "VSI_NN_QNT_TYPE_AFFINE_ASYMMETRIC",
        "vx_type": "VSI_NN_TYPE_UINT8"
      }
    }
  ]
}
```
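As an illustration of how such a description can be consumed, here is a short sketch that reads the file above and prints the tensor metadata. The use of nlohmann/json is an arbitrary choice for this example, not necessarily what the backend itself uses.

```cpp
#include <fstream>
#include <iostream>

#include <nlohmann/json.hpp>

int
main ()
{
  std::ifstream file ("yolo-v8m.json");
  /* ignore_comments=true tolerates the leading "// yolo-v8m.json" line;
   * a strictly valid JSON file would omit it. */
  const nlohmann::json desc
      = nlohmann::json::parse (file, /*cb=*/nullptr,
                               /*allow_exceptions=*/true,
                               /*ignore_comments=*/true);

  for (const auto &t : desc["input_tensors"])
    std::cout << "input  id=" << t["id"]
              << " vx_type=" << t["dtype"]["vx_type"]
              << " scale=" << t["dtype"]["scale"]
              << " zero_point=" << t["dtype"]["zero_point"] << '\n';

  for (const auto &t : desc["output_tensors"])
    std::cout << "output id=" << t["id"]
              << " vx_type=" << t["dtype"]["vx_type"] << '\n';
  return 0;
}
```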
## SNPE Backend

- Vendor: Qualcomm
- Description: This backend leverages Qualcomm's Snapdragon Neural Processing Engine (SNPE) for executing ML models. It typically uses a `.dlc` (Deep Learning Container) file as its model format.
- Source File: `src/hal-backend-ml-snpe.cc`
- Model Files (`prop->model_files`):
  - `model_files[0]`: Path to the SNPE model file (e.g., `my_model.dlc`).
Custom properties for the SNPE backend are provided as a single string, with individual `key:value` pairs separated by commas.

Format: `key1:value1,key2:value2,key3:value3a;value3b`
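Here is a sketch of how such a string can be tokenized (an illustration of the format, not the backend's actual parser): pairs are split on commas, and each pair is split at the first colon so that values, such as tensor names, may themselves contain colons; multi-valued entries are then further split on semicolons.

```cpp
#include <iostream>
#include <map>
#include <sstream>
#include <string>

static std::map<std::string, std::string>
parse_custom_properties (const std::string &props)
{
  std::map<std::string, std::string> out;
  std::stringstream ss (props);
  std::string pair;
  while (std::getline (ss, pair, ',')) {
    /* Split at the FIRST colon only; the rest of the pair is the value. */
    const std::size_t colon = pair.find (':');
    if (colon == std::string::npos)
      continue; /* malformed pair; a real backend might report an error */
    out[pair.substr (0, colon)] = pair.substr (colon + 1);
  }
  return out;
}

int
main ()
{
  const auto props = parse_custom_properties (
      "Runtime:DSP,OutputTensor:detection_scores:0;detection_classes:0");
  for (const auto &kv : props)
    std::cout << kv.first << " => " << kv.second << '\n';
  /* Multi-valued entries (e.g. OutputTensor) can then be split on ';'. */
  return 0;
}
```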
- `Runtime`:
  - Description: Specifies the preferred SNPE runtime target for model execution (see the mapping sketch below).
  - Key: `Runtime`
  - Value:
    - `CPU`: Use the CPU runtime.
    - `GPU`: Use the GPU runtime.
    - `DSP`: Use the DSP runtime.
    - `NPU` or `AIP`: Use the NPU/AIP runtime (specifically maps to `SNPE_RUNTIME_AIP_FIXED8_TF`).
  - Example: `Runtime:DSP`
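The mapping from the `Runtime` string to SNPE's runtime enum might look like the sketch below. Only the `SNPE_RUNTIME_AIP_FIXED8_TF` name is taken from this document; the header path and the remaining enum names follow SNPE's public C API naming convention and should be checked against the SDK version in use.

```cpp
#include <stdexcept>
#include <string>

#include "DlSystem/DlEnums.h" /* Snpe_Runtime_t, from the SNPE SDK */

/* Hedged sketch: enum names other than SNPE_RUNTIME_AIP_FIXED8_TF are
 * assumptions based on SNPE's C API naming convention. */
static Snpe_Runtime_t
runtime_from_string (const std::string &name)
{
  if (name == "CPU")
    return SNPE_RUNTIME_CPU;
  if (name == "GPU")
    return SNPE_RUNTIME_GPU;
  if (name == "DSP")
    return SNPE_RUNTIME_DSP;
  if (name == "NPU" || name == "AIP")
    return SNPE_RUNTIME_AIP_FIXED8_TF;
  throw std::invalid_argument ("unknown Runtime value: " + name);
}
```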
- `OutputTensor`:
  - Description: Specifies the names of the output tensors the application wishes to retrieve. If not provided, the backend uses all default output tensors defined in the model.
  - Key: `OutputTensor`
  - Value: A semicolon-separated list of output tensor names. Tensor names themselves can include colons.
  - Example: `OutputTensor:detection_scores:0;detection_classes:0;raw_outputs/box_encodings`
- `OutputType`:
  - Description: Specifies the desired data types for the output tensors. The order of types in the list should correspond to the order of output tensors (either the default order or the order specified by the `OutputTensor` property).
  - Key: `OutputType`
  - Value: A semicolon-separated list of data types.
    - `FLOAT32`: Output tensor data type will be 32-bit float.
    - `TF8`: Output tensor data type will be 8-bit quantized (typically `uint8_t`; see the dequantization sketch below).
  - Example: `OutputType:FLOAT32;TF8` (assuming two output tensors, the first as FLOAT32, the second as TF8)
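For a `TF8` output, the raw bytes are affine-quantized; recovering float values uses the standard formula `real = scale * (q - zero_point)`, with `scale` and `zero_point` taken from the model's quantization parameters (compare the `dtype` blocks in the Vivante JSON example above). A minimal sketch:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

/* real = scale * (q - zero_point); scale and zero_point come from the
 * model's quantization parameters. */
std::vector<float>
dequantize_tf8 (const uint8_t *q, std::size_t count, float scale, int zero_point)
{
  std::vector<float> out (count);
  for (std::size_t i = 0; i < count; ++i)
    out[i] = scale * (static_cast<int> (q[i]) - zero_point);
  return out;
}
```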
- `InputType`:
  - Description: Specifies the data types for the input tensors. The order of types in the list should correspond to the order of input tensors as defined in the model.
  - Key: `InputType`
  - Value: A semicolon-separated list of data types.
    - `FLOAT32`: Input tensor data type is 32-bit float.
    - `TF8`: Input tensor data type is 8-bit quantized (typically `uint8_t`; efficient for raw RGB data). Using `TF8` usually implies that the model has built-in quantization parameters, and the backend will expect quantized input data. If the model expects float input but `TF8` is specified, it may lead to errors.
  - Example: `InputType:TF8` (assuming a single input tensor of TF8 type)
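The benefit of `TF8` input for raw RGB data is that 8-bit camera frames can be handed over without conversion. If a float buffer must be quantized first, the inverse of the dequantization above applies: `q = round(x / scale) + zero_point`. A small sketch, assuming asymmetric-affine parameters taken from the model:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

/* q = round(x / scale) + zero_point, clamped to the uint8 range. */
std::vector<uint8_t>
quantize_to_tf8 (const float *x, std::size_t count, float scale, int zero_point)
{
  std::vector<uint8_t> q (count);
  for (std::size_t i = 0; i < count; ++i) {
    const int v = static_cast<int> (std::lround (x[i] / scale)) + zero_point;
    q[i] = static_cast<uint8_t> (std::clamp (v, 0, 255));
  }
  return q;
}
```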
"Runtime:DSP,OutputTensor:my_output_tensor1;my_output_tensor2,OutputType:FLOAT32;FLOAT32,InputType:TF8"