Artificial Intelligence (AI) is the next big topic in the human world. Everyone from start-ups to large firms is now ready to make million-dollar investments in the field of AI to grow their businesses. Many of us think of AI as something out of a sci-fi movie, but AI machines can do some pretty cool stuff and make some seemingly impossible things possible, a few of which humankind would never be able to do on its own. In today’s fast-moving world, technology is not only about creating hardware and software tools for real-world purposes; it is much more advanced. The best example of that is AI, the new change. From self-driving cars, which are already on the road in many developed countries, to kids playing online chess championships, AI has already shown that it can outperform human minds at specific tasks with its new, real-time, high-tech tools.

AI is the science and engineering of making intelligent machines.
-John McCarthy, Father of AI

Don’t know how to code, but want to explore AI? We’ve got you covered!

Ever wondered which is the world’s best machine? The human brain is the most intelligent machine that has ever existed.

The speed at which the human brain processes a piece of information is beyond imagination. From a layman’s point of view, the main goal of AI is to create a system that functions in an intelligent and independent manner, making human life easier.

Computer scientists set out to model the working pattern of the human brain, and that became the beginning of AI. To understand this better, let’s look at an example. When we meet someone for the first time, we don’t know them. They introduce themselves by name, and we remember it: our brain sees the person and makes a note of their features, say their height, complexion, and so on. The next time the brain encounters those parameters (the features of the individual), it recalls the label (the name of the person) it assigned the last time it came across similar features. This is, in simple terms, how AI works at the ground level.


What’s transfer learning?

Transfer learning is simply a technique where a model developed for one task is later used as the starting point for a different task. The knowledge that a neural network gains from one task is applied to another. It is usually better to start something with prior knowledge or experience than to begin from scratch. For example, if we train a neural network model to recognize a bottle, the same knowledge could later be applied to show Amazon customers the bottle of their choice!

Already learnt to ride a motorbike ⮫ Now it’s time to practice riding a car

Already learnt to play classic piano ⮫ Now it’s time to practice jazz piano

Already know the math and statistics ⮫ Now it’s time to learn machine learning

The Transfer Learning Toolkit (TLT) is a Python-based AI toolkit for creating highly optimized and accurate AI-related applications using transfer learning commands and pre-trained models.

Many people think that AI cannot be accessed by just anyone. The TLT has broken this myth and made it clear that, through TLT, **AI can be used by everyone!** That includes young researchers, data scientists, student developers, system developers, and engineers in different fields of study.

Why should one consider transfer learning as an option in developing an AI model?

Machines are very good at pattern recognition because they can use large amounts of data across many dimensions. They can see things in more dimensions than humans can, and they are expected to learn from these patterns. Once learning is complete, they are expected to make predictions. TLT draws on one of the many fields that make it possible for computers to learn without being explicitly programmed.

Understanding better

In the image shown below, the first model is trained to predict whether the image is a cat or not. If a future project needs to predict the color of the cat, the previously trained model, which predicts whether the given image is a cat, is used as part of the new model. Compared with starting from scratch, this saves a lot of time.

Training the previously used model on its original data is called pre-training, and training the model for our actual project is called fine-tuning. Here we have taken the knowledge obtained in pre-training, i.e., training the model to predict whether it’s a cat, and applied it in fine-tuning, i.e., training the model to predict the color of the cat. Another example: when we train a model for speech recognition, the knowledge obtained through this training can be used to train a model for a voice assistant system. Nowadays a lot of households have Siri or Alexa. The model trained for speech recognition is used to train the model that wakes up Siri when the customer says, “Hello Siri, play this music for me.”
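The split between pre-training and fine-tuning can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: `PRETRAINED_W` is a made-up stand-in for weights learned on an earlier task, and the tiny dataset is hypothetical. The feature extractor stays frozen and only the small new head is trained.

```python
import math

# Hypothetical "pre-trained" weights: stand-ins for knowledge learned on task A.
PRETRAINED_W = [[1.0, -1.0], [-1.0, 1.0]]

def extract_features(x):
    """Frozen feature extractor: a linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(samples, labels, epochs=200, lr=0.5):
    """Train only the new head (logistic regression) on top of the frozen features."""
    head_w, head_b = [0.0, 0.0], 0.0
    feats = [extract_features(x) for x in samples]
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(w * fi for w, fi in zip(head_w, f)) + head_b)
            err = p - y  # gradient of the log loss with respect to the logit
            head_w = [w - lr * err * fi for w, fi in zip(head_w, f)]
            head_b -= lr * err
    return head_w, head_b

def predict(x, head_w, head_b):
    f = extract_features(x)
    z = sum(w * fi for w, fi in zip(head_w, f)) + head_b
    return 1 if sigmoid(z) >= 0.5 else 0

# Task B: a small labeled dataset (label 1 when the first input is larger).
samples = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
labels = [1, 1, 0, 0]
head_w, head_b = fine_tune(samples, labels)
predictions = [predict(x, head_w, head_b) for x in samples]
print(predictions)
```

Because the extractor is reused as-is, only three numbers (two head weights and a bias) are learned here, which is why so few labeled samples are enough.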

Why should one use the transfer learning toolkit?

Transfer learning is an ideal option when there is a lot of data for the task you are transferring knowledge from and less data for the problem you are transferring to. To build an accurate AI application, one needs a large dataset, especially when creating a model from scratch. Collecting and preparing a large dataset and assigning a label to every image is costly, time-consuming, and requires considerable domain expertise. Transfer learning saves the user a lot of time while keeping accuracy high. To make faster and more accurate AI training possible, NVIDIA has released highly accurate, purpose-built, pre-trained models with the NVIDIA Transfer Learning Toolkit (TLT) 2.0.

Things you should know before getting started with TLT

What are AI models?

What is the TLT Docker container?

What is NVIDIA DeepStream SDK?

What is NVIDIA TensorRT?

What is ResNet?

How to implement?

1. Download the required TLT Docker container and its AI models from NVIDIA NGC for your specific application
2. NGC has both pruned and unpruned models
3. Train and validate with your personalized dataset 
4. Export the trained model 
5. Deploy on the edge using [NVIDIA DeepStream SDK](https://developer.nvidia.com/deepstream-sdk) and [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt). 

Implementation Example

Main Objective

Here we will build a model that recognizes images and categorizes them as either a cat or a dog. The dataset comes from a popular AI example, the Dogs vs. Cats Challenge.

STEP 1: Set Environment

Here we set up the environment, which simply tells our system which variables we will be dealing with. The IPython magic %set_env defines the value of an environment variable used in the project. To display all environment variables and their respective values, run %env with no arguments.

  %set_env KEY=NGt1OWFtMTBqc2VyaHAzMzc3NDQwdGgyZWo6YmZiMjEwNTAtYWMzNi00YmZjLThkNjAtNDg3NGI1NDZlZjMw
%set_env `YOUR PATH`=/workspace/tlt-experiments/CAT_DOG_PREDICTION/cat_dog_prediction/aitraining/tlt
!ls $`YOUR PATH`
#LAYERS = 18/150/101
%set_env LAYERS=18
%set_env EPOCH=50
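Outside a notebook, the same setup can be done from plain Python with `os.environ`. The sketch below is a rough equivalent under stated assumptions: `EXPERIMENT_DIR` is a hypothetical variable name standing in for the path placeholder, and the key and path values are dummies to replace with your own.

```python
import os

# Hypothetical values; substitute your own NGC key and experiment path.
os.environ["KEY"] = "<your NGC API key>"
os.environ["EXPERIMENT_DIR"] = "/workspace/tlt-experiments/my_project"
os.environ["LAYERS"] = "18"   # environment values are always strings
os.environ["EPOCH"] = "50"

def weights_path():
    """Build the checkpoint path that the later steps refer to."""
    filename = "ssd_resnet{}_epoch_{}.tlt".format(
        os.environ["LAYERS"], os.environ["EPOCH"]
    )
    return os.path.join(
        os.environ["EXPERIMENT_DIR"],
        "build", "experiment_dir_unpruned", "weights", filename,
    )

print(weights_path())
```

Keeping these values in environment variables means every later command picks up the same settings, so changing the number of layers or epochs is a one-line edit.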


STEP 2: Convert Dataset to TFRecords

For object detection applications, TLT expects the data to be in KITTI file format in order to train and evaluate. For DetectNet_v2, SSD, DSSD, YOLOv3, and FasterRCNN, the data is then converted to the TFRecords format for training. TFRecords help iterate faster through the data. The command below prints the spec file that drives this conversion.

!cat $`YOUR PATH`/spec/convertkittitotfrecords.txt
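To get a feel for the KITTI label format mentioned above, here is a minimal parsing sketch. It assumes the standard 15-field KITTI layout (class, truncation, occlusion, alpha, four bounding-box values, three dimensions, three location values, rotation); the sample line and the `KittiLabel` name are hypothetical, not taken from the dataset used in this post.

```python
from collections import namedtuple

KittiLabel = namedtuple(
    "KittiLabel",
    "cls truncated occluded alpha bbox dimensions location rotation_y",
)

def parse_kitti_line(line):
    """Parse one KITTI label line (15 whitespace-separated fields)."""
    parts = line.split()
    if len(parts) < 15:
        raise ValueError("expected at least 15 fields, got %d" % len(parts))
    values = [float(v) for v in parts[1:15]]
    return KittiLabel(
        cls=parts[0],
        truncated=values[0],
        occluded=int(values[1]),
        alpha=values[2],
        bbox=tuple(values[3:7]),         # xmin, ymin, xmax, ymax in pixels
        dimensions=tuple(values[7:10]),  # 3D size, typically zero for 2D detection
        location=tuple(values[10:13]),   # 3D position, typically zero for 2D detection
        rotation_y=values[13],
    )

# Hypothetical label for a cat occupying a box in the image.
sample = "cat 0.00 0 0.00 48.0 30.0 310.0 255.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
label = parse_kitti_line(sample)
print(label.cls, label.bbox)
```

For 2D detection only the class name and the four bounding-box values matter; the remaining fields are usually filled with zeros.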

The rm command (“remove”) is used for deleting files and directories on a Unix or Linux system.

Files stored by an earlier run become redundant. This command simply removes them, so when you run the commands from the first step again, you get a fresh start.

In tlt-dataset-convert, -o specifies the output path where the converted TFRecords are written.

!rm -rf tfrecords
!tlt-dataset-convert -d $`YOUR PATH`/spec/convertkittitotfrecords.txt \
-o $`YOUR PATH`/tfrecords/

STEP 3: Train Model

The training arguments are given in the ssd_train spec file. From the labeled images, the model learns features such as color, body size, eye color, and the shape of the cat or dog. Every time training runs, the model gets better at predicting.

A higher learning rate generally shortens training time, although a rate that is too high can keep the model from converging.

The num_epochs value can be increased gradually for better accuracy. After a certain number of epochs the accuracy saturates, i.e., it reaches the maximum level the model can achieve.
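The effect of the learning rate can be seen on a toy problem. The sketch below is not TLT code; it runs plain gradient descent on the hypothetical loss (w - 3)^2 and counts how many steps each learning rate needs to get close to the minimum.

```python
def steps_to_converge(lr, target=3.0, tol=1e-3, max_steps=1000):
    """Run gradient descent on (w - target)**2 and count steps until within tol."""
    w = 0.0
    for step in range(1, max_steps + 1):
        grad = 2.0 * (w - target)  # derivative of (w - target)**2
        w -= lr * grad
        if abs(w - target) < tol:
            return step
    return None  # did not converge within max_steps

for lr in (0.01, 0.1, 0.4, 1.1):
    print("learning rate", lr, "-> steps:", steps_to_converge(lr))
```

Larger rates reach the minimum in fewer steps, up to a point: with a rate that is too large, every update overshoots further than the last one and the run never settles, which the function reports as `None`.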

Change the parameters in the spec file as needed. The command below prints the spec file so you can review it before training your model.

!cat $`YOUR PATH`/spec/ssd_train_resnet_kitti.txt

OpenCV-Python is a library of Python bindings specially designed to solve problems in computer vision.

cv2.imread() is a method that loads an image from the specified file. If the image cannot be read, for example because the file is missing, the permissions are wrong, or the format is unsupported or invalid, the method returns None in Python (an empty matrix in the C++ API).

The image should be in the working directory or a full path of the image should be given.

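import cv2
im = cv2.imread('/workspace/tlt-experiments/CAT_DOG_PREDICTION/cat_dog_prediction/aitraining/tlt/kitti/images/20200623_132251.jpg')
# cv2.imread() returns None instead of raising when the file cannot be read
assert im is not None, 'image could not be read'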

If num_epochs is set to 100 in the ssd_train spec, then EPOCH should be set to 100 in the command below.

%set_env EPOCH=100

!tlt-train ssd -e $`YOUR PATH`/spec/ssd_train_resnet_kitti.txt \
    -r $`YOUR PATH`/build/experiment_dir_unpruned/ \
    -k $KEY \
    -m $`YOUR PATH`/pretrained/pretrained_resnet${LAYERS}/tlt_pretrained_object_detection_vresnet${LAYERS}/resnet_${LAYERS}.hdf5 \
    --gpus 1

Pruning aims to reduce the number of parameters in the model, which for TLT-focused applications makes the model many times faster.

Unpruned models tend to be big. A model is created according to the implemented algorithm, and when pruning is applied, a step in the process looks for nodes that do not affect overall performance to any large degree. Those nodes are then removed.

Pruning is also carried out to make the model easier to understand and to reduce the risk of overfitting, that is, of a model that classifies the training data perfectly but generalizes poorly.

Run the command below to remove unnecessary layers from your neural network model, reducing its size by getting rid of redundant layers that do not affect overall performance.

STEP 4: Prune Model

!tlt-prune -m $`YOUR PATH`/build/experiment_dir_unpruned/weights/ssd_resnet${LAYERS}_epoch_$EPOCH.tlt \
           -o $`YOUR PATH`/build/experiment_dir_pruned/ssd_resnet${LAYERS}_pruned.tlt \
           -eq intersection \
           -pth 0.1 \
           -k $KEY
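The idea behind a pruning threshold such as `-pth 0.1` can be sketched with a toy example. This is an assumption-laden illustration, not the algorithm tlt-prune actually implements: it scores each filter by its L2 norm, normalizes by the largest norm in the layer, and drops filters that fall below the threshold. The `layer` values are made up.

```python
import math

def prune_filters(filters, threshold=0.1):
    """Keep filters whose L2 norm, normalized by the largest norm, is >= threshold."""
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    largest = max(norms)
    if largest == 0.0:
        return []  # every filter is all zeros, nothing worth keeping
    return [f for f, n in zip(filters, norms) if n / largest >= threshold]

# Hypothetical layer with four filters of three weights each.
layer = [
    [0.90, -0.40, 0.30],
    [0.01, 0.02, -0.01],   # tiny weights: contributes little
    [0.50, 0.50, -0.20],
    [0.00, 0.03, 0.02],    # tiny weights: contributes little
]
pruned = prune_filters(layer, threshold=0.1)
print(len(layer), "filters ->", len(pruned), "filters")
```

Raising the threshold removes more filters and shrinks the model further, at the cost of a larger accuracy drop that retraining then has to recover.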

After pruning, accuracy tends to decrease, so you need to retrain the pruned model to recover it.

Update the parameter values in ssd_retrain to match those in ssd_train, then run the command below to retrain your pruned model.

STEP 5: Retrain Pruned Model

# Retraining using the pruned model as pretrained weights 
!tlt-train ssd --gpus 1 \
               -e $`YOUR PATH`/spec/ssd_retrain_resnet_kitti.txt \
               -r $`YOUR PATH`/build/experiment_dir_retrain \
               -m $`YOUR PATH`/build/experiment_dir_pruned/ssd_resnet${LAYERS}_pruned.tlt \
               -k $KEY

!ls $`YOUR PATH`/build/experiment_dir_retrain/weights

# Now check the evaluation stats in the csv file and pick the model with highest eval accuracy.
!cat $`YOUR PATH`/build/experiment_dir_retrain/ssd_training_log_resnet${LAYERS}.csv
%set_env EPOCH=100
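Picking the best epoch from the training log by eye gets tedious, so here is a small sketch that does it with the standard `csv` module. The column names `epoch` and `mAP` and the log contents are hypothetical; check the header of your own CSV file and adjust the `metric` argument accordingly.

```python
import csv
import io

def best_epoch(csv_text, metric="mAP"):
    """Return (epoch, value) for the row with the highest metric.

    Rows where the metric is missing or not a number are skipped.
    """
    best = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(metric) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # metric not recorded for this epoch
        if best is None or value > best[1]:
            best = (int(row["epoch"]), value)
    return best

# Hypothetical log contents; the real file's column names may differ.
log = """epoch,loss,mAP
10,2.31,0.52
20,1.87,0.64
30,1.75,
40,1.69,0.71
50,1.66,0.70
"""
print(best_epoch(log))
```

The epoch returned here is the value to put in EPOCH before running the evaluation, inference, and export steps.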

This command is run so that the developer can see how much accuracy the model has reached.

STEP 6: Evaluate Model

!tlt-evaluate ssd -e $`YOUR PATH`/spec/ssd_retrain_resnet_kitti.txt \
                  -m $`YOUR PATH`/build/experiment_dir_retrain/weights/ssd_resnet${LAYERS}_epoch_$EPOCH.tlt \
                  -k $KEY

Running the commands below lists the sample images that the model will make predictions on.

STEP 7: Sample Inferences

!ls $`YOUR PATH`/test_samples 
!cd $`YOUR PATH`/test_samples 

To load an image, we simply import the Image module from Pillow and call Image.open(), passing the image filename.

!pip install Pillow

The ls command is a command-line utility used to list the contents of the directory or directories passed to it as arguments.

Here the models are trained for visual perception, which is achieved by applying computer vision, a technology that perceives objects through images or videos. The particular object of interest in each image is annotated, or labeled. Once the images are labeled, they are used as training data for the algorithms that train the specific model we are interested in.

Running the commands below labels the test images.

# Running inference for detection on n images
!tlt-infer ssd -i $`YOUR PATH`/test_samples \
               -o $`YOUR PATH`/ssd_infer_images \
               -e $`YOUR PATH`/spec/ssd_retrain_resnet_kitti.txt \
               -m $`YOUR PATH`/build/experiment_dir_retrain/weights/ssd_resnet${LAYERS}_epoch_$EPOCH.tlt \
               -l $`YOUR PATH`/ssd_infer_labels \
               -k $KEY

# List the generated label files and print one of them
!ls $`YOUR PATH`/ssd_infer_labels
!cat $`YOUR PATH`/ssd_infer_labels/100_01.txt

Bounding boxes are drawn on the images to make the predictions easier to understand and compare.

Running the code below displays the output images in a grid with the specified number of columns.

# Simple grid visualizer
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']
print(os.environ['YOUR PATH'])
def visualize_images(image_dir, num_cols=4, num_images=10):
    # Resolve image_dir relative to the experiment root
    output_path = os.path.join(os.environ['YOUR PATH'], image_dir)
    print("output_path ==>", output_path)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr two-dimensional even for a single row or column
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)
    a = [os.path.join(output_path, image) for image in os.listdir(output_path)
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)
# Visualizing the sample images.
OUTPUT_PATH = 'ssd_infer_images' # relative path from $`YOUR PATH`.
COLS = 3 # number of columns in the visualizer grid.
IMAGES = 9 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

Finally, if you are satisfied with the results after training, i.e., with the accuracy of the predictions, you can export the trained model by running the command below.

STEP 8: Export Trained Model

# tlt-export will fail if .etlt already exists. So we clear the export folder before tlt-export
!rm -rf $`YOUR PATH`/export
!mkdir -p $`YOUR PATH`/export
# Export in FP16 mode. Change --data_type to fp32 for FP32 mode
!tlt-export ssd -m $`YOUR PATH`/build/experiment_dir_retrain/weights/ssd_resnet${LAYERS}_epoch_$EPOCH.tlt \
                -k $KEY \
                -o $`YOUR PATH`/export/ssd_resnet${LAYERS}_epoch_$EPOCH.etlt \
                -e $`YOUR PATH`/spec/ssd_retrain_resnet_kitti.txt \
                --batch_size 16 \
                --data_type fp16
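The choice between FP16 and FP32 is a trade-off between size and precision. As a rough illustration that is independent of TLT, the sketch below uses Python's `struct` module to round a single hypothetical weight to half and single precision and compares the rounding errors.

```python
import struct

def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def to_fp32(value):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

weight = 0.1  # hypothetical model weight
err16 = abs(to_fp16(weight) - weight)
err32 = abs(to_fp32(weight) - weight)
print("fp16 error:", err16)
print("fp32 error:", err32)
print("bytes per value:", struct.calcsize("<e"), "vs", struct.calcsize("<f"))
```

Half precision stores each value in two bytes instead of four, at the cost of a larger rounding error per weight, which is usually acceptable for inference.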

TensorRT (TRT) is designed to perform various transformations and optimizations on the neural network graph. First, layers whose outputs are unused are eliminated in order to avoid unnecessary computation. Next, wherever possible, the convolution, bias, and ReLU layers are fused to form a single layer.

These are the reasons for converting the final model to a TRT engine.

By running the command below, one can convert the model to a TRT engine.

STEP 9: Convert to TRT Engine

# Convert to TensorRT engine (FP16)
!tlt-converter -k $KEY \
               -d 3,224,224 \
               -o NMS \
               -e $`YOUR PATH`/export/trt.engine \
               -m 16 \
               -t fp16 \
               -i nchw \
               $`YOUR PATH`/export/ssd_resnet${LAYERS}_epoch_$EPOCH.etlt
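The layer fusion described earlier, where convolution, bias, and ReLU become a single layer, can be illustrated with a one-dimensional toy example. This sketch is not how TensorRT is implemented; it only shows that computing the three operations in one pass gives the same result as three separate passes while avoiding the intermediate lists. The signal, kernel, and bias values are made up.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation, the operation deep learning layers call convolution."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def unfused(signal, kernel, bias):
    """Three separate passes: convolution, then bias add, then ReLU."""
    conv_out = conv1d(signal, kernel)
    biased = [v + bias for v in conv_out]
    return [max(0.0, v) for v in biased]

def fused(signal, kernel, bias):
    """One pass that computes convolution, bias, and ReLU per output element."""
    k = len(kernel)
    return [max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)) + bias)
            for i in range(len(signal) - k + 1)]

signal = [1.0, -2.0, 3.0, 0.5, -1.0]
kernel = [0.5, -1.0, 0.25]
bias = -0.5
print(unfused(signal, kernel, bias))
print(fused(signal, kernel, bias))
```

On a GPU the benefit comes from launching one kernel instead of three and from not writing the intermediate results back to memory between steps.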

Finally, by running the commands below, one can verify the model and deploy it for real-time applications.

STEP 10: Verify & Deploy Model

!tlt-infer ssd -m $`YOUR PATH`/export/trt.engine \
               -e $`YOUR PATH`/spec/ssd_retrain_resnet_kitti.txt \
               -i $`YOUR PATH`/test_samples \
               -o $`YOUR PATH`/ssd_infer_images \
               -t 0.4
!rm -rf ../deploy && mkdir ../deploy
!cp export/trt.engine ../deploy && cp spec/ssd_retrain_resnet_kitti.txt ../deploy && cp -a test_samples ../deploy

# Write a small predict script into the deploy folder
!echo "#!/bin/bash" > ../deploy/predict
!echo "rm -rf ssd_infer_*;tlt-infer ssd -m trt.engine -e ssd_retrain_resnet_kitti.txt -i test_samples -o ssd_infer_images -l ssd_infer_labels -t 0.4" >> ../deploy/predict


This blog explained why one should choose the Transfer Learning Toolkit to begin building an AI application. After running the sequence of commands above, one can use the resulting model on real-world problems, making this world a better place to live!
