#Triton #Basic

Triton Inference Server

Triton Inference Server provides a cloud and edge inferencing solution optimized for both CPUs and GPUs. Triton supports the HTTP/REST and GRPC protocols, which allow remote clients to request inferencing for any model being managed by the server.

Install Triton Docker Image

Pull the image using the following command.

docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3

Where <xx.yy> is the version of Triton that you want to pull.

Run Triton

Use the following command to run Triton with the example model repository you created.

docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/full/path/to/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver --model-repository=/models

Verify Triton Is Running Correctly

From the host system, use curl to access the HTTP endpoint that indicates server status.

curl -v localhost:8000/v2/health/ready
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
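The same readiness check can be scripted. Below is a minimal sketch using only Python's standard library; it assumes Triton is listening on localhost:8000, as started above.

```python
import urllib.request
import urllib.error

def is_server_ready(host="localhost", port=8000, timeout=5):
    """Return True if Triton's v2 readiness endpoint answers 200 OK."""
    url = f"http://{host}:{port}/v2/health/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Server not reachable or not ready yet
        return False

if __name__ == "__main__":
    print("ready" if is_server_ready() else "not ready")
```

A readiness probe like this is handy in startup scripts that must wait for the server before sending requests.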

Sometimes this endpoint returns a 400 Bad Request. That typically happens with older Triton releases that use the v1 API; in that case you can check the server status as follows:

curl localhost:8000/api/status

Getting The Client Examples

Use docker pull to get the client libraries and examples image from NGC.

docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk

Running The Image Classification Example

The model repository folder structure should look like this:

tlt-models/
└── flydentity20/
    ├── config.pbtxt
    └── 1/
        └── model.plan
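For reference, the layout above can be scaffolded programmatically. This is an illustrative sketch; the model name flydentity20 and the file names are taken from the example, and the files are created empty (in practice config.pbtxt holds the model configuration and model.plan is the serialized TensorRT engine).

```python
from pathlib import Path

def make_model_repo(root):
    """Create the Triton model-repository skeleton:
    <root>/flydentity20/config.pbtxt and <root>/flydentity20/1/model.plan."""
    model_dir = Path(root) / "flydentity20"
    version_dir = model_dir / "1"          # each numeric subdirectory is one model version
    version_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").touch()   # placeholder for the model configuration
    (version_dir / "model.plan").touch()   # placeholder for the TensorRT engine
    return model_dir
```

Note that config.pbtxt sits next to the version directory "1", not above it; Triton loads the engine from the numbered version folder.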

To send a request to the TensorRT engine, use an image from the /workspace/images directory. In this case we ask for the top 3 classifications.

/workspace/install/bin/image_client -m flydentity20 -c 3 -s INCEPTION /workspace/images/mug.jpg
Request 0, batch size 1
Image '/workspace/images/mug.jpg':
    15.346230 (504) = COFFEE MUG
    13.224326 (968) = CUP
    10.422965 (505) = COFFEEPOT


You may get an error like "Unable to get model file metadata".

This type of error is usually caused by a version mismatch.

For example, if the model file was generated with TensorRT 7.0.0 and CUDA 10.0.0, the Triton Inference Server image must use those exact same versions, or else you might end up with the error above.
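One way to catch such mismatches early is to compare the version strings before deploying. The helper below is a hypothetical illustration, not part of Triton; the version numbers are the ones from the example above.

```python
def versions_match(build_version, runtime_version):
    """Compare two dotted version strings component by component."""
    return build_version.split(".") == runtime_version.split(".")

# The engine was built with TensorRT 7.0.0, so the server image must ship 7.0.0:
versions_match("7.0.0", "7.0.0")   # matching versions: safe to deploy
versions_match("7.0.0", "7.1.3")   # mismatch: rebuild the engine or change the image
```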

We hope this explains how to use Triton Inference Server on Ubuntu. Please feel free to contact support@clofus.com with any queries. Thanks!