
The Modern Rules Of Artificial Intelligence


Artificial Intelligence is the new electricity!

– Mukesh

Artificial Intelligence (AI) is the next big topic in the human world. From start-ups to large firms, companies are now ready to make million-dollar investments in AI to grow their businesses. Many of us still picture AI as something out of a sci-fi movie, but AI machines can already do some pretty cool stuff and make the seemingly impossible possible, some of it beyond what humankind will ever be able to do. In today's fast-moving world, technology is no longer just about creating hardware and software tools for real-world purposes; it has advanced far beyond that, and the best example of this change is AI. From self-driving cars, already on the road in many developed countries, to kids playing online chess championships, AI has proved that it can outperform human minds with its high-tech, real-time tools.

AI is the science and engineering of making intelligent machines.
-John McCarthy, Father of AI

Don’t know how to code, but want to explore AI? We’ve got you covered!

  • Ever wondered which is the world’s best machine? The human brain is the most intelligent machine that has ever existed.
  • The human brain processes information at a speed that is beyond imagination. From a layman’s point of view, the main goal of AI is to create a system that functions in an intelligent and independent manner, making human life easier.
  • Computer scientists thought of interpreting the working pattern of the human brain, and that became the beginning of AI. To understand this better, consider an example: when we meet someone for the first time, we don’t know them; they introduce themselves with their name, and we remember it, because our brain sees the person and makes a note of that person’s features, say their height, complexion, and so on. The next time the brain encounters those parameters (the features of the individual), it recalls the label (the person’s name) it assigned the last time it came across similar features. At the ground level, this is simply how AI works.
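At a very small scale, this "remember the features, recall the label" behaviour can be sketched as a nearest-neighbour lookup. This is a toy illustration only, not how production AI systems are built; the names and numbers are made up:

```python
# A toy sketch of "remember features, recall the label":
# a 1-nearest-neighbour lookup over stored feature vectors.

def nearest_label(memory, features):
    """Return the label whose stored features are closest to `features`."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(memory, key=lambda label: dist(memory[label], features))

# "Meeting" two people: height (cm) and complexion (0-1 scale) as features.
memory = {"Alice": (170, 0.3), "Bob": (185, 0.8)}

print(nearest_label(memory, (172, 0.35)))  # prints Alice
```

Seeing similar features again, the "brain" recalls the label it stored the last time, which is exactly the intuition described above.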


  • What’s transfer learning?

Transfer learning is simply a technique where a model developed for one task is later used as the starting point for a different task. The knowledge gained by a neural network on one task is applied to another task. It is always better to start with prior knowledge or experience instead of beginning from scratch. For example, if we train a neural network model to recognize a bottle, the same knowledge could later be applied to showing Amazon customers the bottle of their choice!

  • Already learned to ride a motorbike ⮫ Now it’s time to practice riding a car
  • Already learned to play classic piano ⮫ Now it’s time to practice jazz piano
  • Already know the math and statistics ⮫ Now it’s time to learn machine learning
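The analogies above can be sketched in code. Here `pretrained_backbone` and `new_head` are hypothetical stand-ins, not TLT APIs: the backbone represents knowledge carried over from a large source task, and only the small head is new for the target task:

```python
# Minimal sketch of the transfer-learning idea (hypothetical functions):
# a "backbone" learned on one task is reused as the start point for another.

def pretrained_backbone(image):
    """Stand-in for features learned on a large source task (e.g. bottles)."""
    # Crude hand-made features: mean brightness and contrast range.
    return [sum(image) / len(image), max(image) - min(image)]

def new_head(features, threshold=0.5):
    """Small task-specific classifier trained on the new, smaller dataset."""
    return "bottle" if features[1] > threshold else "not a bottle"

def transfer_model(image):
    # Knowledge transfer: the backbone is reused as-is; only the head is new.
    return new_head(pretrained_backbone(image))

print(transfer_model([0.1, 0.9, 0.2]))  # prints bottle
```

Only the head would need training data for the new task, which is the whole point of transfer learning.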

The NVIDIA Transfer Learning Toolkit (TLT) is a Python-based AI toolkit for creating highly optimized and accurate AI applications using transfer-learning commands and pre-trained models.

Many people think that AI is something that cannot be accessed by everyone. TLT has broken this myth and made it clear that **AI can be used by everyone**: young researchers, data scientists, student developers, system developers, and engineers in different fields of study.

Traditional ML vs. Transfer Learning

  • Why should one consider transfer learning as an option in developing an AI model?

Machines are very good at pattern recognition because they use surplus data and a large number of dimensions. They can see things in more dimensions than humans can, and they are expected to learn from these patterns. Once the learning process is fully complete, they are expected to make predictions. Transfer learning is one of many techniques that make it possible for computers to learn without being explicitly programmed.

  • Understanding better

In the image shown below, the first model is trained to predict whether an image is a cat or not. If, in the future, we have a project in which we must predict the color of a cat, the previously trained model that predicts whether a given image is a cat is reused as part of the new model. Instead of starting from scratch, this saves a lot of time.


The process of training the earlier model is called pre-training, and the process of training the model for our actual project is called fine-tuning. Here we have taken the knowledge obtained in pre-training, i.e., training the model to predict whether the image is a cat, and applied that knowledge in fine-tuning, i.e., training the model to predict the color of the cat. Another example that gives a better idea: when we train a model for speech recognition, the knowledge obtained through that training can be reused to build a voice-assistant system. Nowadays a lot of households have Siri or Alexa; the model trained for speech recognition is used to train the model that wakes Siri when the customer says, “Hey Siri, play this music for me.”
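The pre-training/fine-tuning split can be illustrated on a deliberately tiny toy model (plain Python, nothing to do with TLT): the weight learned on a source task becomes the starting point for a related target task, which then needs only a few extra steps:

```python
# Sketch of pre-training vs fine-tuning on toy one-weight linear models.
# Pre-training: learn w on task A; fine-tuning: reuse w as the starting
# point for a related task B and take only a few more gradient steps.

def train(w, data, lr=0.1, steps=50):
    """Plain gradient descent on squared error for y ≈ w * x."""
    for _ in range(steps):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return w

task_a = [(1.0, 2.0), (2.0, 4.0)]   # source task: target weight ≈ 2.0
task_b = [(1.0, 2.2), (2.0, 4.4)]   # related target task: weight ≈ 2.2

w_pretrained = train(0.0, task_a)                    # pre-training from scratch
w_finetuned = train(w_pretrained, task_b, steps=10)  # fine-tuning: few steps

print(round(w_finetuned, 1))  # prints 2.2
```

Starting fine-tuning from `w_pretrained` instead of zero is what lets the second task converge in far fewer steps.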

  • Why should one use the Transfer Learning Toolkit?

Transfer learning becomes an ideal option when there is a lot of data for the task you are transferring knowledge from and less data for the task you are transferring to. Building an accurate AI application needs a large dataset, especially when creating one from scratch. Gathering and preparing a large dataset and labeling every image is costly, time-consuming, and requires deep domain expertise. Transfer learning saves the user a lot of time while keeping accuracy high. To make fast and accurate AI training possible, NVIDIA has released highly accurate, purpose-built, pre-trained models with the NVIDIA Transfer Learning Toolkit (TLT) 2.0.

Things you should know before getting started with TLT

How to implement?

     1. Download the required TLT Docker container and its AI models from NVIDIA NGC for your specific application 
     2. NGC has both pruned and unpruned models 
     3. Train and validate with your personalized dataset 
     4. Export the trained model 
     5. Deploy on the edge using [NVIDIA DeepStream SDK](https://developer.nvidia.com/deepstream-sdk) and [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt). 

Implementation Example

Main Objective

  • Here we will build a model that successfully recognizes and categorizes images as either a cat or a dog. The dataset we will be using is taken from a popular AI example, the Dogs vs. Cats Challenge.


The four main sections are:

  • Training the model
  • Preprocessing
  • Pruning the model
  • Export and deploy the model

STEP 1 : Log in to NVIDIA NGC

STEP 2 : Generate Key

  • Go to NGC
  • Go to the TLT container in the catalog section
  • Sign in and get access to the PULL feature in the specified repository.

STEP 3 : Create a TLT experiment folder on your PC

STEP 4 : Set up your Docker environment

  • Run docker login nvcr.io and enter your username and password; the password is the API key you generated.
  • Run docker pull nvcr.io/nvidia/tlt-streamanalytics:<tag> to pull the TLT container, where <tag> is the container tag listed on the NGC page.

STEP 5 : Invoke the Jupyter notebook

  • Run jupyter notebook --ip 0.0.0.0 --allow-root to get started with the Jupyter notebook.

STEP 6 : Downloading models

  • Run this command to download the models you have selected from the NGC model registry: ngc registry model download-version <ORG/model_name:version> -d <path_to_download_dir>

STEP 7 : Set Environment

  • Here we simply set up the environment, telling the system which variables we will be dealing with. %set_env defines the value of the environment variables used in the project. If %set_env is given no arguments, it displays all environment variables and their respective values. If only VAR is specified, it sets an environment variable of that name to an empty (null) value.

  • Run the below code snippet, with your respective directories, in order to set the required environment variables.

When you run the below commands, the generated key is loaded and all your folders are mounted.

%set_env KEY=<INSERT YOUR KEY>
%set_env <YOUR PATH>=/workspace/tlt-experiments/CAT_DOG_PREDICTION/cat_dog_prediction/aitraining/tlt
!ls <YOUR PATH>
# LAYERS = 19/178/121
%set_env no_layer = 12
%set_env no_epoch = 50

STEP 8 : KITTI dataset

Download the KITTI dataset and add it to the data folder, which should be created inside the TLT experiment folder on your PC. Once the images are downloaded and unzipped, you will find two folders, training and testing. The training folder contains the images and labels, and the testing folder contains the testing images.


  • Object detection applications in TLT expect the data to be in the KITTI file format for training and evaluation. For DetectNet_v2, SSD, DSSD, YOLOv3, and FasterRCNN with ResNet, the data is converted to the TFRecords format for training; TFRecords help iterate faster through the data. The below command shows the conversion spec file:
!cat <YOUR PATH>/spec/convertkittitotfrecords.txt

The sample output is given below, and the parameters are explained as follows.

kitti_config {
  root_directory_path: "/workspace/tlt-experiments/CAT_DOG_PREDICTION/cat_dog_prediction/aitraining/tlt/kitti"
  image_dir_name: "images"
  label_dir_name: "labels"
  image_extension: ".jpg"
  partition_mode: "random"
  num_partitions: 2
  val_split: 14
  num_shards: 10
}
image_directory_path: "/workspace/tlt-experiments/CAT_DOG_PREDICTION/cat_dog_prediction/aitraining/tlt/kitti"
  • root_directory_path - Path to the dataset root directory.
  • image_dir_name: “images” - Relative path from root_directory_path to the folder where the images are stored.
  • label_dir_name: “labels” - Relative path from root_directory_path to the folder where the labels of the images are stored.
  • image_extension: “.jpg” - The extension of your image set, e.g. .png, .jpg, or .jpeg.
  • partition_mode: “random”
    • When there are many images, they are split into N folds. There are two split methods: one is random and the other partitions the data sequentially.
    • In the random method, the dataset is divided into train and val splits. For this, we need to set the val_split value.
    • In the sequence-wise partition method, the dataset is divided according to the available number of sequences. Here the num_partitions value must be specified.
  • val_split - The percentage of data to be set aside for validation.
  • num_shards - The number of shards per fold.
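For intuition, the random partition with val_split: 14 behaves roughly like the sketch below. The TLT converter does this internally; the filenames here are made up for illustration:

```python
# Sketch of partition_mode "random" with a val_split percentage:
# shuffle the dataset and reserve val_split percent for validation.
import random

def random_split(filenames, val_split_percent, seed=42):
    """Return (train, val) lists with val_split_percent held out for validation."""
    files = list(filenames)
    random.Random(seed).shuffle(files)       # deterministic shuffle for the demo
    n_val = int(len(files) * val_split_percent / 100)
    return files[n_val:], files[:n_val]

images = [f"img_{i:04d}.jpg" for i in range(100)]
train, val = random_split(images, val_split_percent=14)  # matches val_split: 14
print(len(train), len(val))  # prints 86 14
```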
STEP 9 : Convert to TFRecords

  • By running the below code you simply create the required images in TFRecords format, saved in the respective location.
  • -d – The path to the detection dataset specification file for exporting TFRecords.
  • -o – The output file name.

!rm -rf tfrecords
!tlt-dataset-convert -d <dataset_export_specification> -o <output_file_name>

STEP 10 : Train Model

  • In the ssd_train file, the arguments are given. For example, parameters capturing color, body size, eye color, and the shape of the cat and the dog will be given. Every time the commands are run, your model is trained to predict better.
  • The higher the learning rate, the shorter the training time.
  • num_epochs can be increased gradually for better accuracy. After a particular number of epochs, the accuracy level saturates, i.e., that is the maximum possible accuracy level.
  • Simply change the parameters according to your needs and run the below command to view the training spec file.
!cat $`YOUR PATH`/spec/ssd_train_resnet_kitti.txt

The sample output is given below and you can change any parameter in the particular file destination.

  • A spec file is necessary for training and evaluation in the case of SSD.
random_seed: 42
ssd_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5, 3.0, 1.0/3.0]"
  scales: "[0.05, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85]"
  two_boxes_for_ar1: true
  clip_boxes: false
  loss_loc_weight: 0.8
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "resnet"
  nlayers: 18
  freeze_bn: false
  freeze_blocks: 0
}
  • aspect_ratios_global is a 1-D array of the aspect ratios used by each prediction layer.

  • Example: “[1.0, 2.0, 0.5, 3.0, 1.0/3.0]”

  • two_boxes_for_ar1: true - This parameter applies when the aspect ratio is 1. If true, two boxes are created: one with the scale for this layer and one with the scale for the next layer.

  • loss_loc_weight: 0.8 - This determines how much the localization regression loss contributes to the final loss.

  • Final loss = classification_loss + loss_loc_weight * loc_loss.

  • arch: “resnet” - A string specifying which feature-extraction architecture to use; resnet10 and resnet18 are supported.

  • freeze_bn: false - This specifies whether to freeze all of the batch-normalization layers during the training process.

  • freeze_blocks: 0 - The weights of the layers in the specified blocks are frozen during training.
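As a quick numeric check of the final-loss formula above (a plain-Python illustration, not TLT code):

```python
# final_loss = classification_loss + loss_loc_weight * loc_loss,
# with loss_loc_weight: 0.8 taken from the spec file above.

def final_loss(classification_loss, loc_loss, loss_loc_weight=0.8):
    return classification_loss + loss_loc_weight * loc_loss

print(final_loss(1.5, 2.0))  # 1.5 + 0.8 * 2.0 = 3.1
```

A larger loss_loc_weight therefore pushes training to care more about box placement relative to classification.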

training_config {
  batch_size_per_gpu: 16   # batch size per GPU
  num_epochs: 100          # number of epochs for training
  enable_qat: false
  # Only soft_start_annealing_schedule with these nested parameters is supported.
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 9e-9
      max_learning_rate: 4e-4
      soft_start: 0.15
      annealing: 0.6
    }
  }
  regularizer {
    type: L1
    weight: 3e-5
  }
}
  • min_learning_rate: The minimum learning rate allowed during the entire process.
  • max_learning_rate: The maximum learning rate allowed during the entire process.
  • soft_start: The time to have elapsed before warm-up completes (expressed as a fraction of progress, between 0 and 1).
  • annealing: The time at which to start annealing the learning rate (also a fraction between 0 and 1).
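The resulting learning-rate curve can be sketched as follows. The exact interpolation TLT uses is not spelled out here, so this assumes an exponential ramp between the minimum and maximum rates, which matches the parameter descriptions above:

```python
# Assumed shape of soft_start_annealing_schedule: exponential ramp from
# min_lr to max_lr until `soft_start`, hold at max_lr, then anneal back
# down from `annealing` onwards. `progress` is training progress in [0, 1].
import math

def soft_start_annealing_lr(progress, min_lr=9e-9, max_lr=4e-4,
                            soft_start=0.15, annealing=0.6):
    log_min, log_max = math.log(min_lr), math.log(max_lr)
    if progress < soft_start:                       # warm-up: ramp up
        t = progress / soft_start
        return math.exp(log_min + t * (log_max - log_min))
    if progress < annealing:                        # plateau at max_lr
        return max_lr
    t = (progress - annealing) / (1 - annealing)    # anneal back down
    return math.exp(log_max + t * (log_min - log_max))

for p in (0.0, 0.15, 0.5, 1.0):
    print(p, soft_start_annealing_lr(p))
```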
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  batch_size: 16
  matching_iou_threshold: 0.5
}
  • validation_period_during_training - The interval, in epochs, at which validation is run during training.

  • average_precision_mode - The Average Precision (AP) calculation mode, either SAMPLE or INTEGRATE. SAMPLE is used as the VOC metric for VOC 2009 and earlier; INTEGRATE is used for VOC 2010 and later.

  • matching_iou_threshold - The minimum IoU between a predicted box and a ground-truth box for the pair to be considered a match.
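IoU itself is straightforward to compute for axis-aligned boxes; a small sketch, with boxes given as (x1, y1, x2, y2):

```python
# Intersection over union (IoU), the overlap measure used by
# matching_iou_threshold above.

def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (zero width/height if the boxes don't intersect).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7 → ≈ 0.143
```

With matching_iou_threshold: 0.5, the two boxes in this example would not count as a match.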

nms_config {
  confidence_threshold: 0.01
  clustering_iou_threshold: 0.6
  top_k: 200
}

  • confidence_threshold - Boxes with a confidence score lower than confidence_threshold are discarded before applying NMS.

  • clustering_iou_threshold - Boxes that overlap an already-selected box by more than this IoU threshold are suppressed during NMS.

  • top_k - The number of outputs produced after the NMS keras layer. If k is greater than the number of valid boxes, the remainder are padded with boxes whose confidence score is 0.
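Putting the three parameters together, greedy NMS can be sketched in plain Python. This is an illustration of the idea, not TLT's implementation; boxes are (x1, y1, x2, y2, score):

```python
# Greedy non-maximum suppression driven by confidence_threshold,
# clustering_iou_threshold, and top_k.

def nms(boxes, confidence_threshold=0.01, iou_threshold=0.6, top_k=200):
    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    # 1. Drop low-confidence boxes; 2. sort by score; 3. greedily keep boxes
    # that don't overlap an already-kept box by more than iou_threshold.
    candidates = sorted((b for b in boxes if b[4] >= confidence_threshold),
                        key=lambda b: b[4], reverse=True)
    kept = []
    for box in candidates:
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept[:top_k]

detections = [(0, 0, 10, 10, 0.9), (1, 1, 10, 10, 0.8), (20, 20, 30, 30, 0.7)]
print(len(nms(detections)))  # the second box overlaps the first → prints 2
```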

augmentation_config {
  preprocessing {
    output_image_width: 178    # width of the augmentation output
    output_image_height: 178   # height of the augmentation output
    output_image_channel: 4    # channel depth of the augmentation output
    crop_right: 178
    crop_bottom: 178
    min_bbox_width: 1.0        # minimum bounding-box width considered during training
    min_bbox_height: 1.0       # minimum bounding-box height considered during training
  }
  spatial_augmentation {
    hflip_probability: 0.5     # probability of flipping an input image horizontally
    vflip_probability: 0.5     # probability of flipping an input image vertically
    zoom_min: 0.98             # minimum zoom scale applied to the input image
    zoom_max: 3                # maximum zoom scale applied to the input image
    translate_max_x: 7.0       # maximum translation along the x axis
    translate_max_y: 7.0       # maximum translation along the y axis
    rotate_rad_max: 1.38
  }
  color_augmentation {
    color_shift_stddev: 0.5            # standard deviation for the color shift
    hue_rotation_max: 180.0            # maximum rotation angle for the hue rotation matrix
    saturation_shift_max: 0.200000003  # maximum shift that changes the saturation
    contrast_scale_max: 0.1000000015   # maximum slope by which the contrast is rotated around the center
    contrast_center: 0.5               # center around which the contrast is rotated
  }
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tlt-experiments/CAT_DOG_PREDICTION/cat_dog_prediction/aitraining/tlt/tfrecords/*"
    image_directory_path: "/workspace/tlt-experiments/CAT_DOG_PREDICTION/cat_dog_prediction/aitraining/tlt/kitti"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "CAT"
    value: "CAT"
  }
  target_class_mapping {
    key: "DOG"
    value: "DOG"
  }
}
  • OpenCV-Python is a library of Python bindings designed to solve computer-vision problems. cv2.imread() is a method that loads an image from the specified file. If the image cannot be read (because the file is missing, permissions are improper, or the format is unsupported or invalid), the method returns an empty matrix.

  • The image should be in the working directory, or a full path to the image should be given.

import cv2
im = cv2.imread('/workspace/tlt-experiments/CAT_DOG_PREDICTION/cat_dog_prediction/aitraining/tlt/kitti/images/20200623_132251.jpg')
print(im.shape)

The sample output (height, width, channels) is given below

(150, 150, 3)
  • In ssd_train, if num_epochs is set to 100, then no_epoch should be set to 100 in the below command.

By running the below command you can see the loss and the accuracy after every epoch.

%set_env no_epoch = 100

!tlt-train ssd -e <experiment specification> \
               -r <result directory> \
               -k <encoding key> \
               -m <pre-trained model> \
               --gpus <number of GPUs>

Arguments explanation:

  • -e – experiment specification: The path to the spec file used for training and evaluation.

  • -r – result directory: The path to the location where you want the output saved.

  • -k – key: The key you generated, used to load or save your .tlt model.

  • -m – pre-trained model: The path to the pre-trained model.

  • --gpus – number of GPUs: The number of GPUs to use for training; the default value is 1.


STEP 12 : Prune Model

  • Pruning aims to reduce the number of parameters, making the application-focused TLT model many times faster.

  • Simply run the below command to remove unnecessary layers in your neural network model, reducing its size and getting rid of redundant layers that do not affect overall performance.

  • Unpruned models tend to be big. Once a model has been trained, the pruning step looks for nodes that do not affect overall performance at a large scale; those nodes are then removed.

  • Pruning is also carried out to minimize the risk of overfitting, i.e., of the model classifying the training data too perfectly.
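tlt-prune works at the level of layers and channels, but the underlying idea of magnitude-based pruning can be shown on a simple weight list (a conceptual sketch only, with made-up numbers):

```python
# Conceptual sketch of magnitude-based pruning: weights whose magnitude
# falls below a threshold contribute little and are removed (zeroed out).

def prune_weights(weights, threshold=0.1):
    """Zero out weights whose magnitude is below threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

weights = [0.52, -0.03, 0.41, 0.007, -0.65, 0.09]
pruned = prune_weights(weights)
print(pruned)  # prints [0.52, 0.0, 0.41, 0.0, -0.65, 0.0]
```

The large, influential weights survive while the near-zero ones are dropped, which is why accuracy usually dips only slightly and can be recovered by retraining.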

!tlt-prune -m <model file> \
           -o <output path> \
           -k <your key>

STEP 13 : Retrain Pruned Model

  • After pruning, accuracy tends to decrease, so you need to retrain your pruned model to regain maximum accuracy.

  • By simply updating the parameter values in ssd_retrain (the same way as in ssd_train) and running the below command, you can retrain your pruned model.

# Retraining using the pruned model as pretrained weights 
!tlt-train ssd --gpus 1 \
               -e <experiment_spec_file> \
               -r <result directory> \
               -m <pruned model file> \
               -k <YOUR KEY>
!ls $`YOUR PATH`/build/experiment_dir_retrain/weights

# Now check the evaluation stats in the csv file and pick the model with the highest eval accuracy.
!cat $`YOUR PATH`/build/experiment_dir_retrain/ssd_training_log_resnet${LAYERS}.csv
%set_env no_epoch = 100

STEP 14 : Evaluate Model

  • This command is simply run so that the developer can see how much accuracy their model has reached.
!tlt-evaluate ssd -e <experiment_spec_file> \
                  -m <model file> \
                  -k <YOUR KEY>

STEP 15 : Sample Inferences

  • By running the below command, one can test the sample images on which predictions are to be made.
!ls $`YOUR PATH`/test_samples 
!cd $`YOUR PATH`/test_samples 
  • To load an image, we simply import the Image module from Pillow and call Image.open(), passing the image filename.
!pip install Pillow
  • The ls command is a command-line utility used to list the contents of the directory or directories provided to it.

  • Here the models are trained with visual perception, achieved through a technology called computer vision, which perceives objects only through images or videos. The object of interest in each image is annotated or labeled; once labeled, the images are used as training data for the algorithms that train the specific models of our interest.

  • By simply running the below commands, you can inspect the labels generated for the test images.

!ls $`YOUR PATH`/ssd_infer_labels 
!cat $`YOUR PATH`/ssd_infer_labels/CAT.txt

# Running inference for detection on n images
!tlt-infer ssd -i <image> \
               -o <output_file> \
               -e <experiment_spec_file> \
               -m <model_file> \
               -k <YOUR KEY>
  • For better understanding and easier spotting of predictions, bounding boxes are drawn.
  • By simply running the below commands, a grid with the specified width and height is drawn.
# Simple grid visualizer
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']
print(os.environ['YOUR PATH'])
def visualize_images(image_dir, num_cols=4, num_images=10):
    output_path = os.path.join(os.environ['YOUR PATH'], image_dir)
    print("output_path ==>", output_path)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30])
    a = [os.path.join(output_path, image) for image in os.listdir(output_path) 
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img) 
# Visualizing the sample images.
OUTPUT_PATH = 'ssd_infer_images' # relative path from $`YOUR PATH`.
COLS = 3 # number of columns in the visualizer grid.
IMAGES = 9 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)



STEP 16: Export Trained Model

  • Now finally, after training, if you are satisfied with the results, i.e., the accuracy of the predictions, you can export the trained model by running the below command.
# tlt-export will fail if the .etlt file already exists, so we clear the export folder before running tlt-export
!rm -rf $`YOUR PATH`/export
!mkdir -p $`YOUR PATH`/export
# Export in FP16 mode. Change --data_type to fp32 for FP32 mode
!tlt-export ssd -m <PATH TO MODEL FILE> \
                -k <YOUR KEY> \
                -o <PATH TO OUTPUT FILE> \
                -e <EXPERIMENT_SPEC_FILE> \
                --batch_size 16 \
                --data_type fp16

STEP 17 : Convert to TRT Engine

  • TensorRT (TRT) is mainly designed to perform various necessary transformations and optimizations on the desired neural-network graph. First, layers with unused outputs are eliminated to avoid unnecessary computation. Next, wherever possible, the convolution, bias, and ReLU layers are fused into a single layer.

  • These are the reasons why one needs to convert the final model to a TRT engine.

  • By simply running the below command, one can have their desired model converted to a TRT engine.

# Convert to TensorRT engine (FP16)
!tlt-converter -k <YOUR KEY> \
               -d 3,224,224 \
               -o NMS \
               -e <EXPERIMENT_SPEC_FILE> \
               -i <IMAGE>

STEP 18 : Verify & Deploy Model

  • Finally, by running the below command, one can verify and deploy their model for a real-time application.
!tlt-infer ssd -m <MODEL FILE> \
               -e <EXPERIMENT_SPEC_FILE> \
               -i <IMAGE> \
               -o <OUTPUT_FILE>

!rm -rf ../deploy && mkdir ../deploy

!cp export/trt.engine ../deploy && cp spec/ssd_retrain_resnet_kitti.txt ../deploy && cp -a test_samples ../deploy

!echo "#!/bin/bash" > "../deploy/predict"
!echo "rm -rf ssd_infer_*; tlt-infer ssd -m trt.engine -e ssd_retrain_resnet_kitti.txt -i test_samples -o ssd_infer_images -l ssd_infer_labels -t 0.4" >> "../deploy/predict"


This blog explained why one should choose the Transfer Learning Toolkit to begin with their AI application. After running the above sequence of commands, one can use their model for real-world problems, making this world a better place to live!