Technical report: Computer vision techniques and their application to solving Vietnam's transportation problems


<span class="text_page_counter">Page 1</span><div class="page_container" data-page="1">

<b>HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY SCHOOL OF ELECTRONIC AND ELECTRICAL ENGINEERING </b>

<b>TECHNICAL REPORT </b>

<b>Topic: </b>

<b>COMPUTER VISION TECHNIQUES AND THEIR APPLICATION TO SOLVING VIETNAM’S TRANSPORTATION PROBLEMS </b>

<b>Instructor: </b> Dr. Nguyễn Tiến Hòa

<b>Student: </b> Thân Đức Trí

<b>Student ID: </b> 20203891

<b>Course: </b> Technical writing & Presentation

<b>Major: </b> Smart embedded systems and IoT

<b>HANOI, December 2022 </b>

</div><span class="text_page_counter">Page 2</span><div class="page_container" data-page="2">

2.1 The object detection model (Example Ambulance) ... 3

2.2 The instance segmentation model ... 6

2.3 Analysis of use cases ... 9

<b>Chapter 3. CONCLUSION ... 10 </b>

<b>REFERENCES ... 11 </b>

</div><span class="text_page_counter">Page 3</span><div class="page_container" data-page="3">

<b>LIST OF ACRONYMS </b>

</div><span class="text_page_counter">Page 4</span><div class="page_container" data-page="4">

<b>LIST OF FIGURES</b>

Figure 1 Representation of Faster RCNN ... 3

Figure 2(a) Training loss vs number of iterations ... 4

Figure 3 Recognition of ambulance in traffic congestion ... 5

Figure 4 Representation of Mask RCNN ... 6

Figure 5 Specified configurations ... 7

Figure 6 Identification of ambulance in traffic ... 8

</div><span class="text_page_counter">Page 5</span><div class="page_container" data-page="5">


<b>LIST OF TABLES </b>

Table 2 Tabular representation of accuracy ... 8

</div><span class="text_page_counter">Page 6</span><div class="page_container" data-page="6">

<b>ABSTRACT </b>

Computer vision technology has significantly impacted the field of intelligent transportation systems. Its applications range from traffic monitoring systems to self-driving cars and often involve basic or advanced image or video analytics. This report presents the use of object detection and instance segmentation for emergency vehicle detection, which is essential for any intelligent transportation system, including Vietnam's. Specifically, this detection can be integrated into autonomous vehicles and traffic signal controllers to prioritize emergency vehicles. The implemented architectures, Faster RCNN for object detection and Mask RCNN for instance segmentation, are evaluated in terms of accuracy and suitability for detecting emergency vehicles in chaotic traffic conditions. The pros and cons of object detection versus instance segmentation for emergency vehicle detection are also compared.

</div><span class="text_page_counter">Page 7</span><div class="page_container" data-page="7">

Computer vision is a field of artificial intelligence that involves the development of algorithms and systems that can interpret visual data from the world around us. This includes the ability to recognize and classify objects, understand the relationships between objects, and interpret the context and meaning of the scene being observed.

One area where computer vision techniques have been applied is in solving traffic problems. For example, traffic cameras equipped with computer vision algorithms can be used to detect and classify vehicles, monitor traffic flow, and identify potential hazards or incidents. This information can be used to improve traffic management and safety, as well as to optimize the use of transportation infrastructure. Other applications of computer vision in traffic include the development of autonomous vehicles, which rely on computer vision to navigate roads and avoid collisions, and the use of computer vision to analyze traffic patterns and optimize the routing of vehicles.

Traffic in Vietnam is often congested, making it difficult for ambulances to quickly reach their destinations. This is especially problematic as there are many motorbikes and cars on the roads. In order to find a solution to this issue, my friends and I have been researching various scientific approaches, including the use of computer vision.

</div><span class="text_page_counter">Page 8</span><div class="page_container" data-page="8">

<b>CHAPTER 2. BODY </b>

Two computer vision techniques are used for emergency vehicle recognition: object detection and instance segmentation. Each relies on a purpose-built convolutional neural network (CNN [1]): Faster RCNN for object detection and Mask RCNN for instance segmentation.

These CNNs were first trained, iteration after iteration, to distinguish the features of emergency vehicles in photos. The trained models were then tested extensively for detection accuracy on a separate, unseen dataset. The outcomes were classified as true positives, false positives, true negatives, and false negatives.
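The four outcome classes above are usually condensed into a few summary figures. The following is a minimal sketch of that arithmetic, not the evaluation code used in this report; the class name `ConfusionCounts`, the function `summarize`, and the example counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts of test outcomes for the 'emergency vehicle' class."""
    tp: int  # emergency vehicle present and detected
    fp: int  # detection raised where no emergency vehicle exists
    tn: int  # no emergency vehicle and no detection
    fn: int  # emergency vehicle present but missed


def summarize(c: ConfusionCounts) -> dict:
    """Return accuracy, precision, and recall; 0.0 when a denominator is zero."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts, used only to show the arithmetic (not results from this report).
example = summarize(ConfusionCounts(tp=45, fp=5, tn=40, fn=10))
print(example)
```

For emergency vehicles, recall is arguably the figure to watch most closely, since a missed ambulance (false negative) is costlier than a spurious detection.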

<b>2.1 THE OBJECT DETECTION MODEL (EXAMPLE AMBULANCE) </b>

Object detection involves identifying and locating a particular object in an image by drawing a bounding box around it. This is typically accomplished using convolutional neural networks that have been pre-trained on large datasets for image classification, such as ResNet, Visual Geometry Group (VGG) Net, and Inception Net. These networks are modified to be fully convolutional so that they can handle inputs of various dimensions, and are then combined with object detection networks such as Faster RCNN, Single Shot Detectors, and Region-based Fully Convolutional Networks. An example is shown in Figure 1, which depicts a Faster RCNN [2] with a VGG Net base network.

Transfer learning, which involves using pre-trained networks to minimize the number of computations and images needed for a custom dataset, is commonly employed in object detection.

<small>Figure 1 Representation of Faster RCNN </small>

</div><span class="text_page_counter">Page 9</span><div class="page_container" data-page="9">

Object detection is the process of locating a specific object in an image by constructing a bounding box around it. The backbone for object detection is a conventional convolutional neural network for image classification (ResNet, VGG Net, Inception Net, and so on). Transfer learning is used, which means that these base networks are pre-trained on large existing datasets to reduce the number of images needed for training. This helps the model cope with traffic congestion, disorderly movement, and non-homogeneity in the images. The model also took into account the shapes and sizes of the emergency vehicles.

The object detection model was trained on the TensorFlow [3] deep learning platform for 10,200 iterations to reduce the training loss. The resulting training and validation loss values were 0.0086 and 0.0029, respectively (Figure 2).

<small>Figure 2(a) Training loss vs number of iterations</small>

</div><span class="text_page_counter">Page 10</span><div class="page_container" data-page="10">

<small>Figure 2(b) Validation loss vs number of iterations </small>

Results:

The object detection algorithm recognizes the ambulance even in the middle of traffic congestion (Figure 3). The output is a bounding box labeled with the detection confidence as a percentage.

<small>Figure 3 Recognition of ambulance in traffic congestion</small>
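A detector of this kind typically returns several candidate boxes per image, each with a label and a confidence score, and only the confident ones are drawn. The sketch below illustrates that post-processing step under stated assumptions: the `Detection` tuple, the `keep_confident` and `caption` helpers, the 0.5 threshold, and the example boxes are hypothetical and are not taken from the model described in this report.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    label: str
    score: float                      # confidence in [0, 1]
    box: Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max) in pixels


def keep_confident(detections: List[Detection], min_score: float = 0.5) -> List[Detection]:
    """Drop low-confidence detections and sort the rest, most confident first."""
    kept = [d for d in detections if d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)


def caption(d: Detection) -> str:
    """Text drawn next to the bounding box, e.g. 'ambulance: 97%'."""
    return f"{d.label}: {d.score * 100:.0f}%"


# Hypothetical raw output for one frame (assumed values, for illustration only).
raw = [
    Detection("ambulance", 0.31, (400, 90, 450, 130)),
    Detection("ambulance", 0.97, (120, 80, 260, 210)),
]
for det in keep_confident(raw):
    print(caption(det), det.box)
```

Raising `min_score` reduces false alarms at the cost of missing partially occluded vehicles, so the threshold is a tuning decision rather than a fixed value.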

</div><span class="text_page_counter">Page 11</span><div class="page_container" data-page="11">

<b>2.2 THE INSTANCE SEGMENTATION MODEL </b>

Instance segmentation is a computer vision technique that detects a particular object and outlines its boundaries at the pixel level. It is often accomplished using a flexible framework called the Mask Region-based Convolutional Neural Network (Mask RCNN). Mask RCNN is a highly effective deep neural network that can identify objects in an image, video, or real-time feed by enclosing them in a bounding box and, at the same time, creating a segmentation mask for each instance detected in the feed. Its strength is that it performs object detection (classification and localization) and instance segmentation simultaneously.
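The difference between a pixel-level mask and a bounding box can be made concrete with a toy example. The sketch below is only an illustration, not part of Mask RCNN itself: the helper names `mask_to_box` and `mask_area` and the small `toy_mask` grid are hypothetical.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # rows of 0/1 values; 1 marks a pixel that belongs to the object


def mask_to_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Tightest (x_min, y_min, x_max, y_max) around the 1-pixels, or None if the mask is empty."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def mask_area(mask: Mask) -> int:
    """Number of pixels assigned to the object."""
    return sum(1 for row in mask for v in row if v)


# Hypothetical 4x5 mask of a single instance (assumed values, for illustration only).
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
print(mask_to_box(toy_mask), mask_area(toy_mask))
```

Here the box spans a 3 by 3 pixel region while the mask marks only six of those pixels, which is the extra shape information that instance segmentation provides over object detection.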

The Mask RCNN network (shown in Figure 4) works in two main stages. The first stage determines whether an object is present in a specific region of the input image, known as the Region of Interest (RoI). Based on the results of the first stage, the second stage predicts the class probability and outputs a bounding box, assessed by Intersection over Union (IoU), together with a binary mask around the object. Both stages are connected to the backbone.

The network has three components: the Feature Pyramid Network (FPN), the Region Proposal Network (RPN), and the backbone network. The FPN can follow a top-down or bottom-up architecture and serves as a universal feature extractor; this implementation uses the bottom-up approach. The RPN is a lightweight network that scans the FPN features bottom-up and proposes regions of the image where the object is likely to be present. It then distinguishes regions by fitting multiple bounding boxes according to certain IoU values. The backbone is a multi-layered neural network that generates feature maps of the input feed. In this case, ResNet50 is used because it is not a very deep architecture, and fine-tuning it helps the model achieve higher accuracy with less training time.

<small>Figure 4 Representation of Mask RCNN</small>
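The IoU values that the RPN relies on are the Intersection over Union of two boxes: the area of their overlap divided by the area of their union. The sketch below shows that calculation and one way proposals could be labeled with it. The function names `iou` and `label_proposal` are hypothetical, and the 0.7 and 0.3 thresholds are assumptions for illustration, not values stated in this report.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes, in [0, 1]."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposal(proposal: Box, truth: Box, pos: float = 0.7, neg: float = 0.3) -> str:
    """Label a proposed region against a ground-truth box (thresholds are assumed)."""
    score = iou(proposal, truth)
    if score >= pos:
        return "positive"
    if score < neg:
        return "negative"
    return "ignored"


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))            # overlap 1, union 7
print(label_proposal((0, 0, 10, 10), (0, 0, 10, 5)))
```

Proposals that fall between the two thresholds are neither clear matches nor clear background, which is why this sketch sets them aside instead of forcing a label.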

</div>