
MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
---------------------------------------

Hoang Van Nam

DIFFICULT SITUATIONS RECOGNITION SYSTEM FOR
VISUALLY-IMPAIRED AID USING A MOBILE KINECT

MASTER THESIS OF SCIENCE
COMPUTER SCIENCE
2014B

Ha Noi – 2016


MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
---------------------------------------

Hoang Van Nam

DIFFICULT SITUATIONS RECOGNITION SYSTEM FOR
VISUALLY-IMPAIRED AID USING A MOBILE KINECT

Department: COMPUTER SCIENCE

MASTER THESIS OF SCIENCE

SUPERVISOR: Dr. Le Thi Lan

Ha Noi – 2016


SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness

CONFIRMATION OF MASTER THESIS REVISION
Full name of the thesis author: …………………………………........……………..
Thesis title: ………………………………………….....……………...............….
Major: ……………………………...…………………........................…..........
Student ID: ………………………………….. …………………....................................…...
The author, the scientific supervisor and the thesis examination committee confirm that the author has revised and supplemented the thesis according to the minutes of the committee meeting dated ….........................………… with the following contents:
……………………………………………………………………………………………………..…………
……………………………………………………………………………………………………..…………
……………………………………………………………………………………………………..…………

Day ….. Month ….. Year …..

Thesis author

Supervisor

CHAIR OF THE COMMITTEE


Declaration of Authorship
I, Hoang Van Nam, declare that this thesis titled, 'Difficult situations recognition for visually-impaired aid using a mobile Kinect' and the work presented in it are my own. I confirm that:

- This work was done wholly or mainly while in candidature for a research degree at this University.
- Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
- Where I have consulted the published work of others, this is always clearly attributed.
- Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
- I have acknowledged all main sources of help.
- Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed:

Date:




HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Abstract
International Research Institute MICA
Computer Vision Department
Master of Science
Difficult situations recognition for visually-impaired aid using a mobile Kinect
by Hoang Van Nam

By 2014, according to figures from several organizations, there were more than one million people in Vietnam living with sight loss, about 1.3% of the population. Although this has a big impact on daily living, especially on the ability to move, read and communicate with others, only a small percentage of blind or visually impaired people live with an assistive device or animal such as a guide dog. Motivated by the significant changes in technology that have taken place in the last decade, especially the introduction of various types of sensors as well as the development of the field of computer vision, I present in this thesis a difficult situations recognition system for visually-impaired aid using a mobile Kinect. The system is based on data captured from the Kinect and uses computer vision techniques to detect obstacles. In the current prototype, I focus only on detecting obstacles in indoor environments such as public buildings, and two types of obstacle are considered: general obstacles in the moving path, and staircases, which pose a great danger to visually impaired people. 3D imaging techniques, including plane segmentation and 3D point clustering, are used to detect general obstacles, while a mixed strategy combining depth and color images is used to detect staircases based on their stair edges and structure. The system is reliable, with a detection rate of about 82.9% and a processing time of 493 ms per frame.



Acknowledgements
I am so honored to be here for the second time, in one of the finest universities in Vietnam, to write these grateful words to the people who have been supporting and guiding me from the very first moment when I was a university student until now, when I am writing my master thesis.
I am grateful to my supervisor, Dr. Le Thi Lan, whose expertise, understanding, generous guidance and support made it possible for me to work on a topic that was of great interest to me. It was a pleasure to work with her.
Special thanks to Dr. Tran Thi Thanh Hai, Dr. Vu Hai and Dr. Nguyen Thi Thuy (VNUA) and all of the members of the Computer Vision Department, MICA Institute, for their sharp comments and guidance, which helped me a lot in learning how to study and do research in the right way, and also for the valuable advice and encouragement that they gave me during my thesis.
I would like to express my gratitude to Prof. Veelaert Peter, Dr. Luong Quang Hiep and Mr. Michiel Vlaminck at Ghent University, Belgium, for their support. It has been a great honor to cooperate and work with them.
Finally, I would especially like to thank my family and friends for the continuous love and support they have given me throughout my life, helping me pass through all the frustration, struggle and confusion. Thanks for everything that helped me get to this day.
Hanoi, 19/02/2016
Hoang Van Nam



Contents

Declaration of Authorship
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Abbreviations

1 Introduction
  1.1 Motivation
  1.2 Definition
      1.2.1 Assistive systems for visually impaired people
      1.2.2 Difficult situations
      1.2.3 Mobile Kinect
      1.2.4 Environment Context
  1.3 Difficult Situations Recognition System
  1.4 Thesis Contributions

2 Related Works
  2.1 Assistive systems for visually impaired people
  2.2 RGB-D based assistive systems for visually impaired people
  2.3 Stair Detection

3 Obstacle Detection
  3.1 Overview
  3.2 Data Acquisition
  3.3 Point Cloud Registration
  3.4 Plane Segmentation
  3.5 Ground & Wall Plane Detection
  3.6 Obstacle Detection
  3.7 Stair Detection
      3.7.1 Stair definition
      3.7.2 Color-based stair detection
      3.7.3 Depth-based stair detection
      3.7.4 Result fusion
  3.8 Obstacle information representation

4 Experiments
  4.1 Dataset
  4.2 Difficult situation recognition evaluation
      4.2.1 Obstacle detection evaluation
      4.2.2 Stair detection evaluation

5 Conclusions and Future Works
  5.1 Conclusions
  5.2 Future Works

6 Publications

Bibliography


List of Figures

1.1  A Comprehensive Assistive Technology (CAT) Model provided by [12]
1.2  A model for activities attribute and mobility provided by [12]
1.3  Distribution of frequencies of head-level accidents for blind people [18]
1.4  Distribution of frequencies of tripping resulting in a fall [18]
1.5  A typical example of a depth image: (A) raw depth image, (B) depth image visualized by a jet color map, where the colorbar shows the real distance for each color value, (C) reconstructed 3D scene
1.6  A stereo image pair taken from the OpenCV library and the calculated depth image: (A) left image, (B) right image, (C) depth image (disparity map)
1.7  Some existing stereo cameras. From left to right: Kodak stereo camera, View-Master Personal stereo camera, ZED, Duo 3D Sensor
1.8  Time-of-flight systems from [3]
1.9  Some ToF cameras. From left to right: DepthSense, Fotonic, Microsoft Kinect v2
1.10 Structured light cameras. From left to right: PrimeSense, Microsoft Kinect v1
1.11 Structured light systems from [3]
1.12 Figure from [16]: (A) raw IR image with pattern, (B) depth image
1.13 Figure from [16]: (A) errors for structured light cameras, (B) quantization errors at different distances of a door: 1m, 3m, 5m
1.14 Prototype of the system using a mobile Kinect: (A) Kinect with battery and belt, (B) backpack with laptop, (C) mobile Kinect mounted on the human body
1.15 Two different environments that I tested with: (A) our office building, (B) Nguyen Dinh Chieu secondary school
1.16 Prototype of our obstacle detection and warning system
2.1  Robot-Assisted Navigation from [17]: (A) RFID tag, (B) robot, (C) navigation
2.2  NXT Robot System from [6]: (A) the system's block diagram, (B) NXT robot
2.3  Mobile robot from [22] [21]
2.4  BrainPort vision substitution device [32]
2.5  Obstacle detection process from [30]
2.6  Stair detection from [26]: (A) input image, (B)(C) frequency as an output of a Gabor filter, (D) stair detection result
2.7  A near approach for stair detection in [13]: (A) input image with detected stair region, (B) texture energy, (C) input image with detected lines as stair candidates, (D) optical flow maps; in this image there is a significant change along the line at the edge of the stair
2.8  Example of segmentation and classification in [24]
2.9  Stair modeling (left) and features in each plane [24]
2.10 Stair detection algorithm proposed in [29]: (A) detected lines in the edge image (using color information), (B) depth profiles along each line (red line: pedestrian crosswalk, blue: down stair, green: up stair)
3.1  Obstacle Detection Flowchart
3.2  Kinect mounted on body
3.3  Coordinate Transformation Process
3.4  Kinect Coordinate
3.5  Point cloud rotation using the normal vector of the ground plane (white arrow): left: before rotating, right: after rotating
3.6  Normal vector estimation algorithms [15]: (a) the normal vector of the center point can be calculated by a cross product of two vectors of four neighbor points (red), (b) normal vector estimation in a scene
3.7  Plane segmentation result using the algorithm proposed in [15]; each plane is represented by a distinctive color
3.8  Detected ground and wall planes (ground: blue, wall: red)
3.9  Human segmentation data by Microsoft Kinect SDK: (a) color image, (b) human mask
3.10 Detected obstacles: (a) color image, (b) detected obstacles
3.11 Model of stair
3.12 Coordinate transformation models from [7]
3.13 Projective chirping: (a) a real-world object that generates a projection with "chirping" ("periodicity-in-perspective"), (b) center raster of image, (c) best-fit projective chirp
3.14 A pinhole camera model with stair
3.15 A vertical Gabor filter kernel
3.16 Gabor filter applied on a color image: (a) original, (b) filtered image
3.17 Thresholding the grayscale image: (a) original, (b) thresholded image
3.18 Example of thinning an image using morphological operations
3.19 Thresholding the grayscale image: (a) original, (b) thresholded image
3.20 Six points voting for a line make an intersection in Hough space; this intersection has higher intensity than neighboring pixels
3.21 Hough space: (a) line in the original space, (b) three curves voting for this line in Hough space
3.22 Hough space on a stair image: (a) original image, (b) Hough space
3.23 Chirp pattern detection: (a) Hough space, (b) original image with detected chirp pattern
3.24 Point cloud of stair: (a) original color image, (b) point cloud data created from the color and depth images
3.25 Detected steps
3.26 Detected planes
3.27 Detected stair on point cloud
3.28 Obstacle position quantization for sending warning messages to visually impaired people
4.1  Depth image encoding: (A) original, (B) visualized image, (C) encoded image
4.2  Detection time of each step in our proposed method
4.3  Example stair images for evaluation: (A) positive sample from MICA dataset, (B) negative sample from MICA dataset, (C) positive sample from MONASH dataset, (D) negative sample from MONASH dataset
4.4  Detected stair in Tian's based method (A-F) and detected stair in my proposed method (G-I): (A) color image, (B) depth image, (C) edges, (D) line segments, (E) detected concurrent lines, (F) depth values on detected lines, (G) detected stair, with blue lines being false stair edges and green lines being stair edges, (H) edge image, (I) detected peaks in the Hough map corresponding to the lines in (G)
4.5  Missed detection in Tian's based method because of missing depth on the stair (A-F) and detected stair in my proposed method (G-I)
4.6  Missed detection in Tian's based method because of missing depth on the stair (A-F) and detected stair in my proposed method (G-I)


List of Tables

2.1 Comparison between assistive robot and wearable device
4.1 Database specifications
4.2 Pixel level evaluation result (TP, FP, FN: million pixels)
4.3 Object level evaluation result (TP, FP, FN: objects)
4.4 Stair dataset for evaluation
4.5 Stair detection results of the proposed method on different datasets
4.6 Comparison of the proposed method and the method of Tian et al. [29] on MICA dataset


Abbreviations

PCL     Point Cloud Library
CAT     Comprehensive Assistive Technology
TDU     Tongue Display Unit
IR      Infrared
OpenCV  Open Computer Vision
RGB     Red Green Blue
RGB-D   Red Green Blue and Depth
ToF     Time of Flight


Chapter 1

Introduction

1.1 Motivation

According to the official statistics of the National Eye Hospital in 2002, Vietnam had about 900,000 blind people, including about 400,000 who were totally blind. By 2014, according to figures from several organizations, the number of blind people in Vietnam was about 1.2 to 1.4 million, still a large number in comparison with other countries. Worldwide, the visually impaired population is estimated to number in excess of 285 million according to an investigation by the World Health Organization (August 2014). About 90% of them live in developing countries with low-income settings. Visual impairment has a big impact on their daily living: they cannot read documents, and their ability to move and to communicate with other people is compromised because information is received primarily through vision. All of the above has made blindness a public health problem all over the world.

Nowadays, with the significant developments in technology, many assistive devices have been released to help visually impaired people in daily life. But although many researchers and companies are concerned with making better and cheaper devices to improve the comfort of visually impaired people, research in this field still leaves many unsolved issues, and in general those devices still cannot replace traditional methods such as the white cane or the guide dog.
Motivated by the significant changes in technology that have taken place in the last decade, especially the introduction of various types of sensors as well as the development of the field of computer vision, my thesis aims to build a prototype system to help visually impaired people avoid obstacles in the environment using a Kinect sensor.




With the Kinect, the benefit is that we can make a reliable system by using depth and color information to detect obstacles at an affordable price. In my thesis, due to the lack of time, I focus only on indoor environments, more specifically on public buildings such as apartments or offices, in order to detect general objects encountered along the moving path as well as stairs, which may cause danger to visually impaired people.

My thesis is organized as follows:
First, I shall give some definitions in the context of my work and the contributions of this thesis.
In chapter 2, I shall briefly review other works related to my system, such as existing assistive devices and obstacle detection algorithms/systems, with their advantages and disadvantages.
In chapter 3, a framework for obstacle detection will be developed, and I shall present the details of each module as well as the entire system, analyzing and assessing them.
In the next chapter, I shall give some experimental results of my system, including how the dataset was prepared, how the evaluation was made and the final results.
In the final chapter, I end this work by giving some conclusions and future works to make the system more complete and effective.

1.2 Definition

1.2.1 Assistive systems for visually impaired people

According to [12], assistive systems for visually impaired people can be understood as equipment, devices or systems which can be used to overcome the gap between what a disabled person wants to do and what society allows them to do. In short, such a system must be able to help visually impaired people to do the things that normal people can do. Such a system can be modeled by the Comprehensive Assistive Technology (CAT) Model as shown in Fig 1.1. The top level of this model consists of four components that can be used to define all assistive technology systems:

Context (in which the assistive technology will be used).
Person (what kind of user can use this system).
Activities (what activities the assistive system can help the visually impaired people with; this can be seen more clearly in Fig 1.2).



Assistive Technology (the technology that will be used to make the system).
Most of the existing systems aim at solving one specific aspect of each branch in the model: they work in a bounded, defined context, with certain types of users, to help them in specific daily-life activities. In the framework of my master thesis, to simplify the system, I focus only on certain aspects of this model, which I will explain in detail in the next sections. In short, I applied my system in the local settings of context, in a small public building such as an office or a department, and the users are the visually impaired students at the Nguyen Dinh Chieu Secondary School, to help them avoid obstacles in their moving path.
Figure 1.1: A Comprehensive Assistive Technology (CAT) Model provided by [12]

1.2.2 Difficult situations

Fig 1.2 shows detailed information about the activities branch of the CAT model (see Fig 1.1). As shown in the figure, there are many services that can be provided by assistive systems for visually impaired people, such as mobility, daily living, cognitive activities, education and employment, recreational activities, and communication and access to information. But most existing works focus on the mobility component of the activities model because of its important role in visually impaired people's daily life.


Figure 1.2: A model for activities attribute and mobility provided by [12]

According to the survey of R. Manduchi [18] in 2011 with 300 respondents who were legally blind or blind, half of the respondents said that they had a head-level accident at least once a week, and about 30% of respondents fell down at least once a month (see Fig 1.3 and Fig 1.4). Therefore, helping visually impaired people in the moving process is always a topic of interest for researchers, social organizations and companies. In fact, many products have been released, some with particular success, like the systems proposed in [11], [10], [1] and [4].

Figure 1.3: Distribution of frequencies of head-level accidents for blind people [18]



Figure 1.4: Distribution of frequencies of tripping resulting in a fall [18]

In the context of my thesis, I aim to develop a system which can detect the obstacles in visually impaired people's moving path, which are the main cause of the accidents mentioned above. The scenario in this project is that visually impaired people want to move along the hallway inside a public building, so they need to avoid obstacles, including moving or static objects, and to go up/down the stairs. An obstacle in my case can be defined as an object lying on the ground or in front of the visually impaired person that could harm him/her while moving if encountered. Although the obstacle's class is very important for visually impaired people, to distinguish which is more dangerous and which is not, in my work I just try to detect obstacles in the scene without naming them (making a classification). And within the framework of this thesis, I also focus on detecting another special object that often appears in buildings and is very dangerous for visually impaired people: the stair. Moreover, the proposed system will only give a warning to the blind people using the Tongue Display Unit (TDU), which was already developed by Thanh-Huong Nguyen in 2013 [23]. In brief, my proposed system aims to solve two aspects of the mobility component of the activities model (see Fig. 1.2): obstacle avoidance, and movement on ramps, slopes, stairs & hills; with the second aspect, the current system stops at the level of giving a warning about the distance of stairs to visually impaired people in order to assist them in going up/down stairs.

1.2.3 Mobile Kinect

1. Introduction
To assist visually impaired persons in those difficult situations, in my thesis I propose using a Kinect sensor to capture information about the environment in order to detect obstacles if they appear. There are many advantages of using the Kinect in this system since it is a popular RGB-D camera at a cheap price. But first, I will give some brief information about depth cameras, of which the Kinect is a typical example.




A depth camera is actually a sensor which has the capacity to provide depth information (a depth image or depth map). A depth map is an image that contains information relating to the distance of the surfaces of scene objects from a viewpoint, as for example in Fig. 1.5. The intensity value of each pixel in a depth map represents the distance from a point on the object to the camera. Therefore, 3D information of the scene can be reconstructed using the depth image (as shown in Fig. 1.5-C). An additional benefit of the depth image is that it is not affected by lighting conditions.

Figure 1.5: A typical example of a depth image: (A) raw depth image, (B) depth image visualized by a jet color map, where the colorbar shows the real distance for each color value, (C) reconstructed 3D scene

In recent years, with the development of technology, especially in the sensor fabrication industry, many cameras capable of capturing depth information have been placed on the market. Those devices can be separated into several groups by the technology used: stereo cameras (ZED, for example), Time-of-Flight (ToF) cameras like the ZCam, structured light cameras like the Kinect, and long-range 3D cameras. Each device has its own advantages and disadvantages and is only suitable for a particular use case.
2. Stereo Camera
The stereo camera is a kind of camera that has been used in robotics since its early days. Taking the idea of human binocular vision, it contains two or more cameras with precisely known relative offsets. Depth information can be calculated by matching similar points in the overlapped region between the images; hence, the 3D distance to the matching points can be determined using triangulation, as illustrated in Fig 1.6. However, the cameras used in this case are still color cameras. As a result, they are still affected by changing lighting conditions. On the other hand, since the depth image is calculated by matching algorithms, it works very poorly when the scene is texture-less, for example images of walls or buildings. There are many stereo cameras available on the market due to the ease of making them, such as the Kodak stereo camera, View-Master Personal stereo camera, ZED, Duo 3D Sensor, as illustrated in Fig 1.7.

Figure 1.6: A stereo image pair taken from the OpenCV library and the calculated depth image: (A) left image, (B) right image, (C) depth image (disparity map)

Figure 1.7: Some existing stereo cameras. From left to right: Kodak stereo camera, View-Master Personal stereo camera, ZED, Duo 3D Sensor
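As a sketch of the matching step just described, the snippet below computes a disparity map with OpenCV's block matcher. It assumes a rectified stereo pair; the file names are hypothetical. With focal length f and baseline B, depth then follows from triangulation as Z = f * B / disparity.

```python
import cv2

# Minimal block-matching sketch: disparity is found by matching points between
# the two views, the same principle as the OpenCV example pair in Fig 1.6.
# "left.png" / "right.png" are hypothetical names for a rectified pair.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right)   # 16-bit fixed point; unreliable on texture-less areas

# Normalize for visualization: larger disparity means a closer object.
# With focal length f and baseline B, metric depth is Z = f * B / disparity.
vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("disparity.png", vis)
```

The failure mode mentioned above is visible directly in such a sketch: on a blank wall the matcher finds no distinctive points, so the disparity map is full of holes.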

3. Time of Flight (ToF) camera
Time-of-Flight (ToF) cameras use the same principle as laser radar, except that instead of transmitting a single beam, short pulses of infrared (IR) light are sent. The camera measures the return time from pixels across its field of view, and the distance is obtained by comparing the phase of the modulated return pulses with those emitted by the laser (Fig 1.8). But ToF cameras also suffer from limitations similar to those of time-of-flight sensors in general, including ambiguity of measurements, multiple reflections, sensitivity to material reflectance and background lighting, and they do not operate well outdoors in strong sunlight. Some of the popular ToF cameras are: DepthSense, Fotonic, Microsoft Kinect v2 (see Fig 1.9).
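To illustrate the phase comparison, a common continuous-wave ToF model (an assumption here; the thesis does not spell out the formula) recovers the distance d from the measured phase shift Δφ at modulation frequency f as d = c·Δφ / (4π·f). This also explains the measurement ambiguity mentioned above: phases wrap around beyond the unambiguous range c / (2f).

```python
import math

C = 299_792_458.0  # speed of light (m/s)

def tof_distance(phase_shift_rad, mod_freq_hz=30e6):
    """Distance from phase shift in a continuous-wave ToF camera.

    The light travels to the object and back, so the round trip is
    2*d = c * (phase / (2*pi)) / f, giving d = c * phase / (4*pi*f).
    Distances beyond c / (2*f) wrap around (measurement ambiguity).
    """
    return C * phase_shift_rad / (4.0 * math.pi * mod_freq_hz)

# A phase shift of pi/2 at 30 MHz corresponds to about 1.25 m:
print(tof_distance(math.pi / 2))   # ~1.249
print(C / (2 * 30e6))              # unambiguous range: ~5.0 m
```

The 30 MHz modulation frequency is only an illustrative value; real sensors use one or several frequencies chosen to trade range against precision.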

4. Structured light camera
The structured light camera is another approach to measuring depth information by using "structured light", which is a pattern of light such as an array of lines. The scene is viewed at an angle, as illustrated in Fig 1.11. If the pattern is projected onto a flat wall, the camera will see straight lines, but if the scene is more complex then it will see a more complex profile. By analyzing these profiles across
