#Triton #Basic

Triton Inference Server

Triton Inference Server provides a cloud and edge inferencing solution optimized for both CPUs and GPUs. Triton supports the HTTP/REST and GRPC protocols, which allow remote clients to request inferencing for any model being managed by the server.

Install Triton Docker Image

Pull the image using the following command.

docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3

Where <xx.yy> is the version of Triton that you want to pull.

Run Triton

Use the following command to run Triton with the example model repository you created.

docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/full/path/to/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver --model-repository=/models

Verify Triton Is Running Correctly

From the host system, use curl to access the HTTP endpoint that indicates server status.

curl -v localhost:8000/v2/health/ready
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
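The same readiness check can be scripted. Below is a minimal sketch using only Python's standard library; it assumes Triton is listening on localhost:8000, as started above.

```python
import urllib.request
import urllib.error

def is_server_ready(host="localhost", port=8000, timeout=5):
    """Return True if Triton's v2 readiness endpoint answers 200 OK."""
    url = f"http://{host}:{port}/v2/health/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Server not reachable or not ready yet
        return False

if __name__ == "__main__":
    print("ready" if is_server_ready() else "not ready")
```

A readiness probe like this is handy in startup scripts that must wait for the server before sending requests.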

Sometimes this endpoint returns a 400 Bad Request. That typically happens with older Triton releases that use the v1 API; in that case you can check the server status as follows:

curl localhost:8000/api/status

Getting The Client Examples

Use docker pull to get the client libraries and examples image from NGC.

docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk

Running The Image Classification Example

The model repository folder structure should look like this:

tlt-models/
└── flydentity20/
    ├── config.pbtxt
    └── 1/
        └── model.plan
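For reference, the layout above can be scaffolded programmatically. This is an illustrative sketch; the model name flydentity20 and the file names are taken from the example, and the files are created empty (in practice config.pbtxt holds the model configuration and model.plan is the serialized TensorRT engine).

```python
from pathlib import Path

def make_model_repo(root):
    """Create the Triton model-repository skeleton:
    <root>/flydentity20/config.pbtxt and <root>/flydentity20/1/model.plan."""
    model_dir = Path(root) / "flydentity20"
    version_dir = model_dir / "1"          # each numeric subdirectory is one model version
    version_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").touch()   # placeholder for the model configuration
    (version_dir / "model.plan").touch()   # placeholder for the TensorRT engine
    return model_dir
```

Note that config.pbtxt sits next to the version directory "1", not above it; Triton loads the engine from the numbered version folder.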

To send a request to the TensorRT engine, use an image from the /workspace/images directory. In this case we ask for the top 3 classifications.

/workspace/install/bin/image_client -m flydentity20 -c 3 -s INCEPTION /workspace/images/mug.jpg
Request 0, batch size 1
Image '/workspace/images/mug.jpg':
    15.346230 (504) = COFFEE MUG
    13.224326 (968) = CUP
    10.422965 (505) = COFFEEPOT


You may get an error like "Unable to get model file metadata".

This type of error is usually caused by a version mismatch.

For example, if the model file was generated with TensorRT 7.0.0 and CUDA 10.0.0, the Triton Inference Server image must use those exact same versions, or else you might end up with the error above.
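One way to catch such mismatches early is to compare the version strings before deploying. The helper below is a hypothetical illustration, not part of Triton; the version numbers are the ones from the example above.

```python
def versions_match(build_version, runtime_version):
    """Compare two dotted version strings component by component."""
    return build_version.split(".") == runtime_version.split(".")

# The engine was built with TensorRT 7.0.0, so the server image must ship 7.0.0:
versions_match("7.0.0", "7.0.0")   # matching versions: safe to deploy
versions_match("7.0.0", "7.1.3")   # mismatch: rebuild the engine or change the image
```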

We hope this explains how to use Triton Inference Server on Ubuntu. Please feel free to contact support@clofus.com with any queries. Thanks!