In this tutorial, we will learn how to use KotlinDL to run inference with ONNX models on the Desktop and Android platforms.
KotlinDL ONNX provides a set of pre-trained models through the ONNXModels API.
You can find the list of models here.
In this section, we will use the MoveNet model for human pose estimation.
val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.PoseDetection.MoveNetMultiPoseLighting.pretrainedModel(modelHub)
val image = ImageConverter.toBufferedImage(File("path/to/image"))
val detectedPoses = model.inferAndCloseUsing(CPU()) {
    model.detectPoses(image = image, confidence = 0.05f)
}
There are many models that are not included in the ONNXModels API described above.
In this case, you can use the low-level API to load custom models from a file.
In this section, we will load and run inference with the SSDMobilenetV1 model using the OnnxInferenceModel API.
val model = OnnxInferenceModel("path/to/model.onnx")
val preprocessing = pipeline<BufferedImage>()
    .resize {
        outputHeight = 300
        outputWidth = 300
    }
    .convert { colorMode = ColorMode.RGB }
    .toFloatArray { }
val image = ImageConverter.toBufferedImage(File("path/to/image"))
val detections = model.inferAndCloseUsing(CPU()) {
    val (inputData, shape) = preprocessing.apply(image)
    it.predictRaw(inputData) { output ->
        val boxes = output.get2DFloatArray("outputBoxesName")
        val classIndices = output.getFloatArray("outputClassesName")
        val probabilities = output.getFloatArray("outputScoresName")
        val numberOfFoundObjects = boxes.size
        val foundObjects = mutableListOf<DetectedObject>()
        for (i in 0 until numberOfFoundObjects) {
            val detectedObject = DetectedObject(
                // left, top, right, bottom
                xMin = boxes[i][0],
                yMin = boxes[i][1],
                xMax = boxes[i][2],
                yMax = boxes[i][3],
                probability = probabilities[i],
                label = Coco.V2017.labels()[classIndices[i].toInt()]
            )
            foundObjects.add(detectedObject)
        }
        foundObjects
    }
}
The inference on Android is almost identical to the Desktop JVM counterpart.
Slight differences appear due to the different image representations supported on each platform.
On Android, the primary input data type is Bitmap, the common image representation on the Android platform.
Another difference is that model files need to be downloaded separately.
In this section, the single-pose detection model will be used. Note that the input data type is Bitmap instead of BufferedImage.
val modelHub = ONNXModelHub(context) // Android context is required to access the application resources
val model = ONNXModels.PoseDetection.MoveNetSinglePoseLighting.pretrainedModel(modelHub)
val bitmap = BitmapFactory.decodeStream(imageResource)
val detectedPose = model.inferAndCloseUsing(CPU()) {
    model.detectPose(image = bitmap, confidence = 0.05f)
}
KotlinDL expects the model files to be located in the application resources. You can download the required models manually or use a Gradle plugin that downloads them automatically before the build.
To use the Gradle plugin, ensure that the google and gradlePluginPortal repositories are listed in the settings.gradle file:
pluginManagement {
    repositories {
        google()
        gradlePluginPortal()
    }
}
Then apply the plugin in the build script:
plugins {
id "org.jetbrains.kotlinx.kotlin-deeplearning-gradle-plugin" version "[KOTLIN-DL-VERSION]"
}
Configure the plugin in the downloadKotlinDLModels section:
downloadKotlinDLModels {
    models = ["MoveNetSinglePoseLighting"] // list of model type names to download
    sourceSet = "main"                     // optional name of the target source set ("main" by default)
    overwrite = false                      // optional parameter to overwrite existing files ("true" by default)
}
The plugin creates a task named downloadKotlinDLModels, which is executed automatically before the project is built, or can be run manually if needed (for example, with ./gradlew downloadKotlinDLModels).
In this section, we will use the same model as in the corresponding Desktop JVM section. Note that the model instance is created from the byte representation of the model file loaded from the application resources. You could also load the model from external storage or from the network.
val modelBytes = resources.openRawResource(modelResource).readBytes()
val model = OnnxInferenceModel(modelBytes)
val preprocessing = pipeline<Bitmap>()
    .resize {
        outputHeight = 300
        outputWidth = 300
    }
    .toFloatArray { layout = TensorLayout.NHWC }
val bitmap = BitmapFactory.decodeStream(imageResource)
val detections = model.inferAndCloseUsing(CPU()) {
    val (inputData, shape) = preprocessing.apply(bitmap)
    it.predictRaw(inputData) { output ->
        val boxes = output.get2DFloatArray("outputBoxesName")
        val classIndices = output.getFloatArray("outputClassesName")
        val probabilities = output.getFloatArray("outputScoresName")
        val numberOfFoundObjects = boxes.size
        val foundObjects = mutableListOf<DetectedObject>()
        for (i in 0 until numberOfFoundObjects) {
            val detectedObject = DetectedObject(
                // left, top, right, bottom
                xMin = boxes[i][0],
                yMin = boxes[i][1],
                xMax = boxes[i][2],
                yMax = boxes[i][3],
                probability = probabilities[i],
                label = Coco.V2017.labels()[classIndices[i].toInt()]
            )
            foundObjects.add(detectedObject)
        }
        foundObjects
    }
}
For more information about the KotlinDL ONNX API, please refer to the Documentation and examples. Please also check out the Sample Android App for more details.
KotlinDL currently supports the following execution providers (EPs):
- CPU (default)
- CUDA (for devices with a GPU and CUDA support)
- NNAPI (for Android devices with API 27+)
CUDA must be configured on your machine to use the CUDA EP. Please also check how to configure dependencies for execution on a GPU in the README.md.
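As a reference point, the sketch below (using the Gradle Kotlin DSL) shows the kind of dependency setup this usually involves; the exact artifact coordinates and the [ONNXRUNTIME-VERSION] placeholder are assumptions, so verify them against the README.md.
// build.gradle.kts (sketch): dependencies assumed to be required for the CUDA EP.
// Verify the artifact names and versions against the KotlinDL README.md.
dependencies {
    implementation("org.jetbrains.kotlinx:kotlin-deeplearning-onnx:[KOTLIN-DL-VERSION]")
    // ONNX Runtime build with CUDA support instead of the default CPU-only runtime
    implementation("com.microsoft.onnxruntime:onnxruntime_gpu:[ONNXRUNTIME-VERSION]")
}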
There are a few options for specifying the EP to use. The models loaded using the ONNXModelHub API are instantiated with the default CPU EP.
val modelHub = ONNXModelHub(...)
val model = ONNXModels.PoseDetection.MoveNetMultiPoseLighting.pretrainedModel(modelHub) // default CPU EP is used
You can also specify the EP explicitly using the following syntax:
val model = modelHub.loadModel(ONNXModels.CV.EfficientNet4Lite, NNAPI())
Please note that when using the low-level OnnxInferenceModel API, you need to specify the EP explicitly.
You can do this using the inferUsing and inferAndCloseUsing functions.
These functions explicitly declare the EPs to be used for inference in their scope.
Although both functions have the same goal of explicitly initializing the model with the given execution providers, their behavior differs slightly:
inferAndCloseUsing has the semantics of Kotlin's 'use' scope function, i.e., it closes the model at the end of the block,
while inferUsing is designed for repeated use and has the semantics of Kotlin's 'run' scope function.
val model = OnnxInferenceModel(...)
model.inferAndCloseUsing(CPU()) {
    val result = it.predictRaw(image) { output -> ... }
}
Usage of inferAndCloseUsing for one-time inference with the CPU execution provider
val model = ONNXModels.PoseDetection.MoveNetMultiPoseLighting.pretrainedModel(...)
model.inferUsing(CUDA()) { poseDetectionModel ->
    for (image in images) {
        val result = poseDetectionModel.detectPoses(image)
        ...
    }
}
model.close()
Usage of inferUsing for recurring inference with the CUDA execution provider
Another option is to use the initializeWith function to configure EPs for the model instance.
val model = OnnxInferenceModel(...)
model.initializeWith(NNAPI())
Loading and initialization of the model with the NNAPI execution provider
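After initializeWith, subsequent predictions run with the configured provider until the model is closed manually. The minimal sketch below assumes the preprocessing pipeline and bitmap from the earlier Android example and uses a hypothetical output tensor name.
// Sketch only: "outputName" is a hypothetical output tensor name,
// and preprocessing/bitmap are assumed to be defined as in the example above.
val (inputData, shape) = preprocessing.apply(bitmap)
val scores = model.predictRaw(inputData) { output ->
    output.getFloatArray("outputName")
}
model.close() // initializeWith does not close the model automatically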