Introduction and Analysis of Artificial Intelligence, Machine Learning and Deep Learning

Many integrators and system builders face the same challenge: algorithms that automate complex decision-making are difficult and time-consuming to write.

To meet this challenge, FLIR advises making full use of deep learning by combining open source libraries, NVIDIA hardware and FLIR cameras. This paper describes how to collect image data with a FLIR machine vision camera, train a neural network and deploy it on an embedded system.

What is deep learning?

Deep learning is a form of machine learning that uses neural networks with many "deep" layers between the input nodes and the output nodes. Once the network has been trained on large data sets, the resulting model can make accurate predictions from input data. In a deep learning neural network, the output of each layer is fed forward to the input of the next layer. The model is optimized iteratively by changing the weights of the connections between layers. In each cycle, feedback on the model's prediction accuracy guides the changes to the connection weights.
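The feed-forward and weight-update cycle described above can be illustrated with a minimal sketch. This is not how a production framework works internally; it is a toy network with one hidden unit and one output unit, and the sample data, starting weights, learning rate and cycle count are all made-up values chosen only for illustration.

```python
import math

def forward(x, w1, w2):
    """Feed the input forward: the hidden layer's output is the next layer's input."""
    h = math.tanh(w1 * x)  # hidden layer
    y = w2 * h             # output layer
    return h, y

def mean_squared_error(samples, w1, w2):
    """Average squared prediction error over all samples."""
    return sum((forward(x, w1, w2)[1] - t) ** 2 for x, t in samples) / len(samples)

def train(samples, w1, w2, lr=0.1, cycles=500):
    """Repeatedly adjust both weights using feedback from the prediction error."""
    for _ in range(cycles):
        for x, target in samples:
            h, y = forward(x, w1, w2)
            err = y - target                       # feedback on prediction accuracy
            grad_w2 = err * h                      # gradient of 0.5 * err^2 w.r.t. w2
            grad_w1 = err * w2 * (1 - h * h) * x   # chain rule back through tanh
            w2 -= lr * grad_w2
            w1 -= lr * grad_w1
    return w1, w2

# Hypothetical training data: (input, desired output) pairs.
samples = [(-1.0, -0.8), (-0.5, -0.45), (0.5, 0.45), (1.0, 0.8)]

before = mean_squared_error(samples, 0.5, 0.5)
w1, w2 = train(samples, 0.5, 0.5)
after = mean_squared_error(samples, w1, w2)
print(f"error before training: {before:.4f}, after training: {after:.4f}")
```

Real deep networks apply the same idea across many layers and millions of weights, which is why the frameworks and GPU hardware discussed later matter.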

Figure 1: Neural network with "deep" hidden layers between input and output

Figure 2: Changes in the relative weights of inputs

Deep learning is transforming traffic systems by automating processes that are too complex for traditional vision applications. Thanks to easy-to-use frameworks, affordable graphics processing unit (GPU) acceleration hardware and cloud computing platforms, deep learning is now accessible to everyone.

Why is deep learning popular now?

GPU-accelerated hardware: higher efficiency and lower cost

GPU architectures use a large number of processors to perform a set of coordinated computations in parallel. This "massively parallel" design is very well suited to deep learning systems. Through NVIDIA's continued research and development, the performance and efficiency of GPU-accelerated computing platforms have improved greatly while their cost has fallen sharply. The technology is available across a wide range of hardware, from compact embedded systems based on the Jetson TX1 and TX2, to PC GPUs such as the GTX 1080, to dedicated AI platforms such as the NVIDIA DGX-1 and Drive PX 2.

Figure 3: NVIDIA TX1 (left), TX2 (middle) and Drive PX 2 (right) hardware accelerate deep learning effectively thanks to their parallel computing architecture

Accessible deep learning frameworks

In addition to easy-to-use frameworks, a large number of tutorials and online courses are available to help people get started with deep learning. Users can quickly build and train their own deep neural networks (DNNs) with Google's TensorFlow and with the C wrappers for the open source Caffe, Torch and Theano libraries. The general-purpose TensorFlow is a good starting point, while Caffe's GPU optimization makes it well suited to deployment on the Jetson TX1 and TX2. The NVIDIA CUDA Deep Neural Network library (cuDNN) gives developers highly optimized implementations of common deep learning functions, further simplifying development on these platforms.

Deep learning for transportation systems

Multiple applications

Although media attention focuses mainly on driverless vehicles, deep learning has many other transportation applications. It suits small-scale systems for tasks such as detecting pedestrians and emergency vehicles for traffic signal control, parking lot management, high-occupancy lane enforcement, and high-accuracy vehicle and license plate recognition. It also suits large-scale systems that manage intercity traffic flow.

Deep learning systems can be trained continuously to cope with changing conditions. HERE is working to deploy a mapping system supported by deep learning in driverless vehicles. This technology will generate continuously updated maps with a resolution of 10-20 cm. Through deep learning, HERE's maps will include the precise locations of fixed objects (such as signs) and temporary driving hazards (such as construction work).

Lower price and shorter preparation time

With separate off-the-shelf cameras and embedded platforms, transportation system designers can flexibly tailor a system to their project. Keeping the cameras and the processing hardware separate makes the upgrade path of each component simple and independent of the other. Compared with dedicated smart cameras, this ecosystem costs less and has a shorter preparation time.

How to implement the system

Training data acquisition

Designers must train the deep learning model before deployment. High quality training data is needed to achieve accurate results. High performance cameras provide the best possible training images for systems that make decisions based on visual input.

On-camera image processing can simplify the data normalization required before training. Camera features such as precise control of the automatic algorithms, sharpening, pixel format conversion, lens shading correction, and FLIR's advanced color transformation presets and color correction matrix help optimize the images. FLIR's strict quality control during manufacturing minimizes performance differences between cameras, further reducing the need for normalization before training.
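As an illustration of the kind of normalization meant here, the sketch below standardizes pixel intensities to zero mean and unit variance, one common choice. The function name and the two sample "cameras" are hypothetical, and real pipelines usually operate on full image arrays rather than short lists.

```python
from statistics import mean, pstdev

def standardize(pixels):
    """Rescale a flat list of pixel intensities to zero mean and unit variance."""
    mu = mean(pixels)
    sigma = pstdev(pixels)
    if sigma == 0:
        # A perfectly uniform image carries no contrast; map it to all zeros.
        return [0.0 for _ in pixels]
    return [(p - mu) / sigma for p in pixels]

# Hypothetical data: the same scene seen by two cameras whose responses
# differ only by a gain (x2) and an offset (+5).
camera_a = [10.0, 20.0, 30.0, 40.0]
camera_b = [2.0 * p + 5.0 for p in camera_a]

normalized_a = standardize(camera_a)
normalized_b = standardize(camera_b)
print(normalized_a)
print(normalized_b)
```

After standardization the two lists match, which shows why consistent camera behavior reduces the work needed at this stage: the less the raw images differ between units, the less correction the training pipeline has to apply.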

For applications that capture images of moving vehicles, a global shutter sensor reads all pixels at the same time, preventing the distortion caused by object motion during readout. Many of FLIR's machine vision cameras use Sony Pregius global shutter CMOS sensors. With a dynamic range of 72 dB and a read noise of less than 3 e-, they can capture detail in bright and shaded areas at the same time and provide excellent low-light performance.
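To put the 72 dB figure in perspective, the sketch below converts it to a linear ratio and to photographic stops. It assumes the usual image sensor convention that dynamic range in dB is 20 * log10(largest signal / noise floor). The implied full-well figure uses the 3 e- upper bound on read noise purely as an illustration; it is not a specification of any particular sensor.

```python
import math

def db_to_ratio(db):
    """Convert dynamic range in dB to a linear signal ratio (20*log10 convention)."""
    return 10 ** (db / 20)

def ratio_to_stops(ratio):
    """Express a linear ratio as a number of doublings (stops)."""
    return math.log2(ratio)

dynamic_range_db = 72.0    # figure quoted in the text
read_noise_e = 3.0         # upper bound quoted in the text, in electrons

ratio = db_to_ratio(dynamic_range_db)
stops = ratio_to_stops(ratio)
implied_signal_e = ratio * read_noise_e  # illustrative only, assumes noise floor = read noise

print(f"linear ratio: about {ratio:.0f}:1")
print(f"stops: about {stops:.1f}")
print(f"implied maximum signal at 3 e- noise: about {implied_signal_e:.0f} e-")
```

A ratio of roughly 4000:1 between the brightest and darkest distinguishable signal is what allows detail in sunlit and shaded parts of the same scene to be captured in a single exposure.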

Low-light applications such as indoor parking lot management can take advantage of the pixel structure of back-illuminated (BSI) Sony Exmor R and STARVIS sensors. These devices prioritize quantum efficiency over readout speed, which results in small, economical sensors with better low-light performance.

Training on dedicated hardware

After collecting enough training data, you can train your model. To speed up this process, you can use a PC with one or more CUDA-enabled GPUs or dedicated AI training hardware such as the NVIDIA DGX-1. Computing platforms dedicated to deep learning are also available.

Relative performance (based on training time)

NVIDIA DGX-1 trains 75 times faster

The CPU baseline is a dual-socket Intel Xeon E5-2697 v3. The 170 TF figure refers to half precision (FP16).

Deploying to embedded systems

Once the deep learning model has been trained, it can be deployed in the field. Powerful, compact GPU-accelerated embedded platforms enable applications where space and power requirements rule out a traditional PC, and where limited Internet connectivity forces computing to be done at the edge. These systems are based on the ARM processor architecture and usually run a Linux-based operating system. For information on how to use FLIR's FlyCapture SDK on ARM devices under Linux, see the link in the original article.

Many traffic applications rely on systems with multiple cameras. With FLIR machine vision cameras, system designers can trigger multiple cameras accurately and flexibly through GPIO or software. The IEEE 1588 Precision Time Protocol (PTP) synchronizes the camera clocks to a common time base or a GPS time signal without user supervision. The MTBF of a multi-camera system decreases with each additional camera, so highly reliable cameras play a key role in building a robust system. FLIR machine vision cameras are designed and tested to ensure 24/7 reliability, minimizing downtime and maintenance.
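The effect of camera count on MTBF can be sketched with a standard reliability model. The example assumes that the system fails as soon as any one camera fails, that cameras fail independently, and that each has a constant failure rate, so the failure rates simply add. The 100,000-hour camera MTBF is a made-up figure for illustration, not a FLIR specification.

```python
def system_mtbf(camera_mtbfs_hours):
    """MTBF of a system that fails when any one camera fails.

    Assumes independent cameras with constant failure rates, so the
    system failure rate is the sum of the individual failure rates.
    """
    total_failure_rate = sum(1.0 / mtbf for mtbf in camera_mtbfs_hours)
    return 1.0 / total_failure_rate

HYPOTHETICAL_CAMERA_MTBF = 100_000.0  # hours, illustrative value only

for count in (1, 2, 4, 8):
    mtbf = system_mtbf([HYPOTHETICAL_CAMERA_MTBF] * count)
    print(f"{count} camera(s): system MTBF = {mtbf:.0f} hours")
```

Under these assumptions, a system of n identical cameras has 1/n of the MTBF of a single camera, which is why per-camera reliability matters more as the camera count grows.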

NVIDIA Drive PX 2 with FLIR Grasshopper3 USB3 camera

NVIDIA Drive PX 2 is an open automotive AI platform with two built-in Pascal GPU cores. With eight TFLOPS, the Drive PX 2 has the computing power of 150 MacBook Pros. It is designed to support deep learning applications for driverless vehicle guidance. In addition to USB 3.1 Gen 1 and GigE Vision cameras, it also has inputs for cameras that use the automotive GMSL camera interface.

