Applying speech recognition technology to build software that supports English pronunciation practice on mobile devices: Scientific research project


MINISTRY OF EDUCATION AND TRAINING

<b>BÀ RỊA - VŨNG TÀU UNIVERSITY </b>

<b>UNIVERSITY-LEVEL SCIENCE AND TECHNOLOGY PROJECT </b>

<b>Applying speech recognition technology to build software that supports English pronunciation practice on mobile devices</b>

<b>Principal investigator: Dr. Phan Ngọc Hoàng </b>

</div>
<span class='text_page_counter'>(2)</span><div class='page_container' data-page=2>

<b>Project title:</b> Applying speech recognition technology to build software that
supports English pronunciation practice on mobile devices

<b>Principal investigator:</b> Dr. Phan Ngọc Hoàng, Vice Dean, Faculty of Information
Technology – Electrical – Electronic Engineering

<b>Key participating staff: </b>

Dr. Phan Ngọc Hoàng, Vice Dean, Faculty of Information Technology – Electrical – Electronic Engineering
Dr. Bùi Thị Thu Trang, Deputy Head of the Information Technology program, Faculty of
Information Technology – Electrical – Electronic Engineering

<b>Main content: </b>

The research team aims to create a genuinely suitable solution that helps learners,
in particular students and lecturers of Bà Rịa - Vũng Tàu University and learners in
the wider community, overcome the difficulties of practicing English pronunciation.

Given the rapid progress of speech recognition technology and the convenience of
mobile devices, the team's solution applies speech recognition to software that
supports English pronunciation practice on mobile devices. The ultimate goal is a
mobile application that can support learners of English.

+ The research team has completed a mobile application that supports pronunciation
practice using speech recognition technology.

+ The application is built on the iOS platform and integrates the speech recognition
technology currently used in Apple's intelligent virtual assistant Siri.

</div>
<span class='text_page_counter'>(3)</span><div class='page_container' data-page=3>

+ The research results were published in one paper in a publication indexed in
ISI/SCOPUS: Lecture Notes of the Institute for Computer Sciences, Social Informatics
and Telecommunications Engineering, Vol. 298, pp. 157-166, Springer, 2019
(SCOPUS – Q4), ISSN 1867-8211.

<b>Research period: from 11/2018 to 11/2019 </b>

<b>Office of Science, Technology and International Cooperation </b> <b>Dean of Faculty/</b>

</div>
<span class='text_page_counter'>(4)</span><div class='page_container' data-page=4>

<b>TABLE OF CONTENTS </b>

1. PROBLEM STATEMENT ... 5

2. PROPOSED SOLUTION ... 8

3. IMPLEMENTING THE SOLUTION ... 10

3.1. Database design ... 10

3.1.1. Lesson ... 10

3.1.2. Pronunciation ... 13

3.1.3. Practice ... 14

3.1.4. Word (English words used for practice) ... 15

3.1.5. Building the database with Core Data ... 15

3.2. Software design on the iOS platform ... 16

3.2.1. Viewing the list of lessons ... 17

3.2.2. Viewing pronunciation guidance ... 20

3.2.3. Viewing the list of practice exercises ... 21

3.2.4. Choosing the practice mode ... 22

3.2.5. Practicing single words ... 23

3.2.6. Summarizing practice results ... 26

3.2.7. Resetting practice ... 27

4. RESULTS ACHIEVED ... 29

</div>
<span class='text_page_counter'>(5)</span><div class='page_container' data-page=5>

<b>1. PROBLEM STATEMENT </b>

In an era of integration and globalization, English is regarded as the most widely
used language in the world. Nearly 60 countries use English as their main language,
and nearly 100 more use it as a second language alongside the mother tongue. Foreign
languages are therefore an important key in the current period of integration and
globalization.

In this context, human relationships as well as cooperation and investment in every
field, from business, trade, transport, technology, media and tourism to
opportunities for study and work, have expanded to all countries of the world.
English is an effective tool and plays an important role in the success of many
individuals and businesses.

In English, as in any other language, pronunciation is one of the basic skills; it
is foundational and decisive for people who are beginning to learn the language.
Pronunciation affects the learning of all the other skills, such as vocabulary,
listening, speaking, reading and writing.

Accurate pronunciation makes a speaker easier to understand. Listeners can often
still follow a speaker whose pronunciation is not quite accurate, but sometimes they
have to make a great effort to work out what the speaker means.

Accurate pronunciation also means that the speaker knows how words should sound,
which is very useful for listening comprehension. It helps the speaker understand
videos, radio broadcasts and conversations more easily. Conversely, a learner who
mispronounces a word will very likely fail to recognize that same word when others
say it.

Learners of English have many self-study methods and effective tools for practicing
accurate pronunciation. For example, a learner can use the classic method of
pronouncing words in front of a mirror to observe the movements of the lips and
mouth as precisely as possible.

</div>
<span class='text_page_counter'>(6)</span><div class='page_container' data-page=6>

software applications that support learning English pronunciation currently all aim
at these common functions. Specifically, an application displays how a word is
pronounced and lets the learner listen to a sample pronunciation; the learner then
records their own pronunciation and compares it with the sample themselves.
Alternatively, the learner hears or sees a word and types the word or its phonetic
transcription, and the software judges whether the answer is right or wrong.

<b>Figure 1. Examples of pronunciation practice applications on mobile devices </b>

</div>
<span class='text_page_counter'>(7)</span><div class='page_container' data-page=7></div>
<span class='text_page_counter'>(8)</span><div class='page_container' data-page=8>

<b>2. PROPOSED SOLUTION </b>

The research team aims to create a genuinely suitable solution that helps learners,
in particular students and lecturers of Bà Rịa - Vũng Tàu University and learners in
the wider community, overcome the pronunciation practice difficulties described
above.

1. Use speech recognition technology so that learners can check their own English
pronunciation and adjust it accordingly.

2. Provide learners with the standard functions of a tool for English pronunciation
practice, namely lists of practice words grouped by sound, together with the
phonetic transcription and a sample pronunciation of each word.

3. Support learners in practicing English pronunciation anytime, anywhere, and
completely free of charge.

To achieve these goals, the development team researches and builds a mobile
application that supports English pronunciation practice using speech recognition
technology, based on the following work:

+ Studying methods, materials and content related to English pronunciation practice
in order to select what is suitable for the software.

+ Studying existing speech recognition technologies and how suitable they are for
integration into the software.

+ Studying interface design and the relevant programming languages for building the
software;

</div>
<span class='text_page_counter'>(9)</span><div class='page_container' data-page=9>

+ Converting pronunciation practice content from the source materials into an
information system.

+ Allowing users to view the list of lessons, one for each English sound, and select
a lesson.

+ Based on the selected sound, allowing users to review how the sound is pronounced.

+ Based on the selected sound, allowing users to view the list of corresponding
practice exercises and select one to practice.

+ Allowing users to choose between practicing only the unfinished words and
practicing all words in an exercise.

+ For each practice word:

- viewing the phonetic transcription of the word;

- listening to a sample pronunciation by a native English speaker;

- checking whether the word was pronounced correctly, using speech recognition
technology.

+ Based on the pronunciation results for the words in an exercise, the software
automatically aggregates them and shows the user the overall pronunciation result
for the exercise.

+ Based on the results of the exercises, the software automatically aggregates them
and shows the user the overall pronunciation result for the lesson of each sound.

+ Allowing users to reset the results of an exercise in order to start it again from
the beginning.

</div>
<span class='text_page_counter'>(10)</span><div class='page_container' data-page=10>

<b>3. IMPLEMENTING THE SOLUTION </b>

<b>3.1. Database design </b>

The main task of database design is to convert the information and materials related
to English pronunciation practice into a database that serves the application.

<b>3.1.1. Lesson </b>

To pronounce a word correctly, we must rely on its phonetic transcription rather
than on its spelling. In the example in Figure 3, both words are spelled
<b>wind</b>, yet they are pronounced completely differently. The first, a noun, is
pronounced /wɪnd/; the second, a verb, is pronounced /waɪnd/.

<b>Figure 3. Example of the importance of pronouncing from the phonetic transcription </b>

Therefore, to pronounce a word accurately, we need to rely on its phonetic
transcription. To read English phonetic transcriptions, we use the International
Phonetic Alphabet (IPA) chart for English.
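The point about spelling versus transcription can be illustrated with a small lookup keyed by spelling and part of speech. This is only a sketch: the names `TRANSCRIPTIONS` and `transcription` are hypothetical and introduced here for illustration, and the two entries are the ones from the wind example above.

```python
# Hypothetical mini-dictionary: the same spelling maps to different
# transcriptions depending on the part of speech.
TRANSCRIPTIONS = {
    ("wind", "noun"): "/wɪnd/",
    ("wind", "verb"): "/waɪnd/",
}


def transcription(word: str, part_of_speech: str) -> str:
    """Return the IPA transcription of a word used as a given part of speech."""
    return TRANSCRIPTIONS[(word.lower(), part_of_speech)]


print(transcription("wind", "noun"))  # /wɪnd/
print(transcription("wind", "verb"))  # /waɪnd/
```

The spelling alone is not a sufficient key, which is why the lessons are organized around sounds rather than letters.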

</div>
<span class='text_page_counter'>(11)</span><div class='page_container' data-page=11>

<b>Figure 4. The 44 sounds in the English IPA chart </b>

The vowel sounds and some examples of each are listed below:

/iː/ – as in sea /siː/, green /ɡriːn/
/ɪ/ – as in kid /kɪd/, bid /bɪd/, village /ˈvɪlɪdʒ/
/ʊ/ – as in good /ɡʊd/, put /pʊt/
/uː/ – as in goose /ɡuːs/, blue /bluː/
/e/ – as in dress /dres/, bed /bed/
/ə/ – as in banana /bəˈnɑːnə/, teacher /ˈtiːtʃə(r)/
/ɜː/ – as in burn /bɜːn/, birthday /ˈbɜːθdeɪ/
/ɔː/ – as in ball /bɔːl/, law /lɔː/
/æ/ – as in trap /træp/, bad /bæd/
/ʌ/ – as in come /kʌm/, love /lʌv/
/ɑː/ – as in start /stɑːt/, father /ˈfɑːðə(r)/
/ɒ/ – as in hot /hɒt/, box /bɒks/

</div>
<span class='text_page_counter'>(12)</span><div class='page_container' data-page=12>

/ʊə/ – as in poor /pʊə(r)/, jury /ˈdʒʊəri/
/ɔɪ/ – as in choice /tʃɔɪs/, boy /bɔɪ/
/əʊ/ – as in goat /ɡəʊt/, show /ʃəʊ/
/eə/ – as in square /skweə(r)/, fair /feə(r)/
/aɪ/ – as in price /praɪs/, try /traɪ/
/aʊ/ – as in mouth /maʊθ/, cow /kaʊ/

The consonant sounds and some examples of each are listed below:

/p/ – as in pen /pen/, copy /ˈkɒpi/
/b/ – as in back /bæk/, job /dʒɒb/
/t/ – as in tea /tiː/, tight /taɪt/
/d/ – as in day /deɪ/, ladder /ˈlædə(r)/
/tʃ/ – as in church /tʃɜːtʃ/, match /mætʃ/
/dʒ/ – as in age /eɪdʒ/, gym /dʒɪm/
/k/ – as in key /kiː/, school /skuːl/
/ɡ/ – as in get /ɡet/, ghost /ɡəʊst/
/f/ – as in fat /fæt/, coffee /ˈkɒfi/
/v/ – as in view /vjuː/, move /muːv/
/θ/ – as in thin /θɪn/, path /pɑːθ/
/ð/ – as in this /ðɪs/, other /ˈʌðə(r)/
/s/ – as in soon /suːn/, sister /ˈsɪstə(r)/
/z/ – as in zero /ˈzɪərəʊ/, buzz /bʌz/
/ʃ/ – as in ship /ʃɪp/, sure /ʃɔː(r)/
/ʒ/ – as in pleasure /ˈpleʒə(r)/, vision /ˈvɪʒn/
/m/ – as in more /mɔː(r)/, room /ruːm/
/n/ – as in nice /naɪs/, sun /sʌn/
/ŋ/ – as in ring /rɪŋ/, long /lɒŋ/

</div>
<span class='text_page_counter'>(13)</span><div class='page_container' data-page=13>

/r/ – as in right /raɪt/, sorry /ˈsɒri/
/w/ – as in wet /wet/, win /wɪn/
/j/ – as in yes /jes/, use /juːz/

With this information, the sounds in the English IPA chart can be represented by a
database table named LESSON, as described in Table 1:

<b>Table 1. The LESSON table in the database </b>

<b>LESSON </b>          <b>Description </b>
<b>PK </b> lessonId     Lesson identifier
name                   Lesson name
sound                  Sound covered by the lesson
description            Lesson description
photo                  Thumbnail image of the lesson
completion             Completion level of the lesson

<b>3.1.2. Pronunciation </b>

Each English sound is pronounced in a specific way, and several elements contribute
to forming a sound, such as the lips, tongue, mouth and length of breath. The
pronunciation guidance for a sound can be divided into the following items.

+ How the sounds are made: guidance on how the sound is produced.

+ Where the sounds are made: guidance on where the sound is produced and how to
coordinate the lips, mouth, tongue, teeth and so on to produce it.

+ Voicing and length: whether the sound is long or short and whether the vocal cords
are used to produce it.

</div>
<span class='text_page_counter'>(14)</span><div class='page_container' data-page=14>

+ Spelling: which English letters correspond to the sound.

+ The tongue: guidance on positioning the tongue correctly to produce the sound.

+ The lips and mouth: guidance on positioning the lips and mouth correctly to
produce the sound.

From these elements it follows that one sound (Lesson) has many pronunciation guides
(Pronunciation). The guides can be represented by a database table as follows
(Table 2):

<b>Table 2. The PRONUNCIATION table in the database </b>

<b>PRONUNCIATION </b>          <b>Description </b>
<b>PK </b> pronunciationId     Pronunciation guide identifier
title                         Title of the pronunciation guide
description                   Content of the pronunciation guide
lessonId                      English sound (lesson) the guide belongs to

<b>3.1.3. Practice </b>

Each sound, that is, each lesson, has several pronunciation practice exercises
depending on the position or special characteristics of the sound within a word.
Practice exercises are usually divided into the following types:

+ Exercises with words in which the target sound is at the beginning of the word;
+ Exercises with words in which the target sound is in the middle of the word;
+ Exercises with words in which the target sound is at the end of the word;
+ Exercises with words in which the target sound comes before or after one or more
other vowels;

</div>
<span class='text_page_counter'>(15)</span><div class='page_container' data-page=15>

Given these types of exercise, each sound or lesson has several different
pronunciation practice exercises. A practice exercise can be represented by a
database table as follows (Table 3):

<b>Table 3. The PRACTICE table in the database </b>

<b>PRACTICE </b>          <b>Description </b>
<b>PK </b> practiceId     Practice exercise identifier
name                     Name of the practice exercise
description              Description of the practice exercise
completion               Completion level of the practice exercise
lessonId                 English sound (lesson) the exercise belongs to

<b>3.1.4. Word (English words used for practice) </b>

Each practice exercise contains several English words that fit the content of the
exercise. Each practice word can be represented by a database table as follows
(Table 5):

<b>Table 5. The WORD table in the database </b>

<b>WORD </b>          <b>Description </b>
<b>PK </b> wordId     Identifier of the practice word
text                 The word itself
pronunciation        Phonetic transcription of the word
isCompleted          Whether the word has been pronounced correctly
practiceId           Practice exercise the word belongs to
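The four tables form a simple hierarchy: a lesson has many pronunciation guides and many practice exercises, and each exercise has many words. The sketch below mirrors the table fields as Python dataclasses purely for illustration; it is not the application's Core Data model. Nested lists stand in for the `lessonId` and `practiceId` foreign keys, the snake_case names are adaptations, the `photo` field is assumed to hold an image file name, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    word_id: int
    text: str
    pronunciation: str          # phonetic transcription
    is_completed: bool = False  # True once the word was pronounced correctly


@dataclass
class Practice:
    practice_id: int
    name: str
    description: str = ""
    completion: float = 0.0     # percentage, 0 to 100
    words: List[Word] = field(default_factory=list)


@dataclass
class Pronunciation:
    pronunciation_id: int
    title: str
    description: str


@dataclass
class Lesson:
    lesson_id: int
    name: str
    sound: str
    description: str = ""
    photo: str = ""             # assumption: an image file name
    completion: float = 0.0
    pronunciations: List[Pronunciation] = field(default_factory=list)
    practices: List[Practice] = field(default_factory=list)


# Made-up sample data for the consonant /p/.
lesson_p = Lesson(
    lesson_id=1,
    name="Consonant /p/",
    sound="/p/",
    pronunciations=[Pronunciation(1, "Spelling", "placeholder text")],
    practices=[
        Practice(1, "/p/ at the beginning of a word",
                 words=[Word(1, "pen", "/pen/")]),
    ],
)
print(len(lesson_p.practices[0].words))  # 1
```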

</div>
<span class='text_page_counter'>(16)</span><div class='page_container' data-page=16>

<b>3.1.5. Building the database with Core Data </b>

The pronunciation lessons in the application have been carefully selected and do not
change, so this solution uses Core Data on the iOS platform. Core Data makes it easy
to persist the data without administering a database directly, as shown in
Figure 6(a). Core Data also tracks changes and can roll them back individually, in
groups, or all at once, which makes it easy to support undo and redo in the
application, as shown in Figure 6(b).

<b>3.2. Software design on the iOS platform </b>

The main task of this part is to design and build an iOS application that integrates
the speech recognition technology used in Apple's intelligent virtual assistant
Siri. The application supports pronunciation practice on mobile devices with the
following main functions:

</div>
<span class='text_page_counter'>(17)</span><div class='page_container' data-page=17>

+ Allowing users to view the list of lessons, one for each English sound, and select
a lesson.

+ Based on the selected sound, allowing users to review how the sound is pronounced.

+ Based on the selected sound, allowing users to view the list of corresponding
practice exercises and select one to practice.

+ Allowing users to choose between practicing only the unfinished words and
practicing all words in an exercise.

+ For each practice word:

- viewing the phonetic transcription of the word;

- listening to a sample pronunciation by a native English speaker;

- checking whether the word was pronounced correctly, using speech recognition
technology.

+ Based on the results of the exercises, the software automatically aggregates them
and shows the user the overall pronunciation result for the lesson of each sound.

+ Allowing users to reset the results of an exercise in order to start it again from
the beginning.

+ Allowing users to reset the results of the lesson for a sound in order to start
the lesson again from the beginning.

</div>
<span class='text_page_counter'>(18)</span><div class='page_container' data-page=18>

<b>3.2.1. Viewing the list of lessons </b>

The English IPA chart contains 44 sounds, shown in Figure 2: 20 vowel sounds and 24
consonant sounds. When the user opens the application, the screen must let the user
select the lesson corresponding to each sound.

Before selecting a sound to practice, the user can choose the variety of English to
practice, British English or American English, as shown in Figure 8.
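One plausible way to carry this choice through the application is to map the menu entry to a standard BCP 47 language tag, which a speech recognizer can then be configured with. The report does not say how the choice is stored, so the mapping and the names `LANGUAGE_TAGS` and `language_tag` below are assumptions made for illustration.

```python
# Hypothetical mapping from the menu choice to a BCP 47 language tag.
LANGUAGE_TAGS = {
    "British English": "en-GB",
    "American English": "en-US",
}


def language_tag(choice: str) -> str:
    """Return the language tag for the selected variety of English."""
    return LANGUAGE_TAGS[choice]


print(language_tag("British English"))   # en-GB
print(language_tag("American English"))  # en-US
```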

</div>
<span class='text_page_counter'>(19)</span><div class='page_container' data-page=19>

the application lets the user select the lesson corresponding to each of the 44
English sounds.

<b>Figure 9. The application lets users select a lesson from the list of the 44 English sounds </b>

</div>
<span class='text_page_counter'>(20)</span><div class='page_container' data-page=20>

<b>3.2.2. Viewing pronunciation guidance </b>

After the user selects a lesson, the application uses the English sound of that
lesson to take the user automatically to the screen for viewing pronunciation
guidance. Each English sound has its own pronunciation guidance, and several
elements contribute to forming a sound, such as the lips, tongue, mouth and length
of breath. Depending on how familiar they are with the sound, learners can read the
guidance carefully or skip it.

For example, Figure 10(a) shows the pronunciation guidance that the application
provides for the consonant /p/. For /p/, the guidance covers how the sound is made,
where the sound is made, voicing and length, strength, and spelling.

<b>Figure 10. Pronunciation guidance for (a) the consonant /p/ and (b) the vowel /ɔː/ </b>

</div>
<span class='text_page_counter'>(21)</span><div class='page_container' data-page=21>

In addition, the user can tap the Video button to watch a video that explains how to
pronounce the sound, as shown in Figure 11. After watching the video, the user can
return to the AI English 1 application to continue practicing.

<b>Figure 11. Watching a pronunciation guidance video </b>

<b>3.2.3. Viewing the list of practice exercises </b>

Once confident that they fully understand the pronunciation guidance, the user can
switch to practice by selecting the Practice tab. The application then displays the
list of practice exercises for the selected sound.

Depending on the selected sound, the exercises may cover the sound at the beginning
of a word, in the middle of a word, at the end of a word, or following or preceding
other sounds that need attention.

The screen shows basic information about each exercise: its name, a short
description, and the learner's completion level for it (on first use, every
completion level is set to 0%).

For example, in Figure 12(a) the consonant /p/ has 4 practice exercises, such as:

+ an exercise with words in which the consonant /p/ is at the beginning of the word;

</div>
<span class='text_page_counter'>(22)</span><div class='page_container' data-page=22>

+ an exercise with words in which the consonant /s/ comes immediately before /p/;
+ an exercise with words in which the consonant /p/ is at the end of the word.

(a) (b)

<b>Figure 12. The application displays the list of practice exercises for the consonant /p/ </b>

In Figure 12(b), by contrast, the consonant /t/ has different types of exercise, 5
in total:

+ an exercise with words in which the consonant /t/ is at the beginning of the word;
+ an exercise with words in which the consonant /s/ comes immediately before /t/;
+ an exercise with words in which the consonant /t/ is in the middle of the word;
+ an exercise with words in which the consonant /t/ is at the end of the word;
+ an exercise with simple past forms ending in ‘ed’ where the ending is pronounced
as the consonant /t/.

<b>3.2.4. Choosing the practice mode </b>

</div>
<span class='text_page_counter'>(23)</span><div class='page_container' data-page=23>

In addition, the screen lets the learner choose one of the following two practice
modes:

+ Practice unfinished words: this mode is for learners who have already worked on
the exercise but, for whatever reason, have not yet completed some of its words and
want to finish it.

+ Practice all words in the exercise: this mode is for learners who are starting the
exercise for the first time or who want to practice all of its words again.

The practice mode selection screen for an exercise is shown in Figure 13.

<b>Figure 13. The application lets the learner choose a suitable practice mode </b>
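The two practice modes amount to a filter on the completion flag of each word. A minimal sketch, assuming each word carries a boolean flag like the `isCompleted` field of the WORD table; the function name `select_words` and the sample words are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    is_completed: bool = False


def select_words(words: List[Word], only_unfinished: bool) -> List[Word]:
    """Return the words to practice for the chosen mode."""
    if only_unfinished:
        return [w for w in words if not w.is_completed]
    return list(words)


# Sample data: "pen" has already been pronounced correctly.
words = [Word("pen", True), Word("copy"), Word("put")]
print([w.text for w in select_words(words, only_unfinished=True)])   # ['copy', 'put']
print([w.text for w in select_words(words, only_unfinished=False)])  # ['pen', 'copy', 'put']
```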

<b>3.2.5. Practicing single words </b>

</div>
<span class='text_page_counter'>(24)</span><div class='page_container' data-page=24>

<b>Figure 14. The single-word practice screen </b>

For each word, the application displays the word, lets the learner listen to a
sample pronunciation by a native English speaker (by tapping the speaker icon), and
shows the phonetic transcription (directly below the word).

The application also lets the user practice pronouncing the word and use the speech
recognition tool (the Start answer function) to determine whether they pronounced it
correctly. The result is returned immediately, and the learner can try again if the
pronunciation was not correct (Figure 15).

</div>
<span class='text_page_counter'>(25)</span><div class='page_container' data-page=25>

The application also lets the learner temporarily skip a word (the Skip function).
When the learner has worked through all the words in the list, or ends the exercise
early (the Finish function), the application automatically switches to the result
screen for the exercise.
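The flow of a practice session (answer, retry, Skip, Finish) can be sketched as a loop over the word list driven by a stream of learner events. Everything here is illustrative: the event tuples, the function name `run_session` and the sample input are assumptions, and the text "heard" by the recognizer is supplied directly instead of coming from real speech recognition.

```python
from typing import Iterable, List, Optional, Tuple

# ("say", heard_text), ("skip", None) or ("finish", None)
Event = Tuple[str, Optional[str]]


def run_session(words: List[str], events: Iterable[Event]) -> List[str]:
    """Return the words pronounced correctly before the session ended."""
    completed: List[str] = []
    stream = iter(events)
    for word in words:
        for kind, heard in stream:
            if kind == "finish":          # learner ends the exercise early
                return completed
            if kind == "skip":            # leave this word for later
                break
            if kind == "say" and heard is not None and heard.strip().lower() == word.lower():
                completed.append(word)    # correct: move on to the next word
                break
            # incorrect attempt: stay on the same word so the learner can retry
        else:
            return completed              # no more events: the session is over
    return completed


events = [("say", "pan"), ("say", "pen"), ("skip", None), ("say", "put")]
print(run_session(["pen", "copy", "put"], events))  # ['pen', 'put']
```

In this sample run the learner needs two attempts for "pen", skips "copy" and pronounces "put" correctly, so two of the three words count as completed.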

Pronunciation of each individual word is evaluated by integrating into the
application Apple's speech recognition technology, which is used in the intelligent
virtual assistant Siri. The way the application evaluates the learner's
pronunciation of a single word is outlined in Figure 16.

<b>Figure 16. Evaluating the learner's pronunciation using speech recognition technology </b>

For a single word, the learner first pronounces the word being practiced, and the
mobile device records what the learner says through the microphone. Speech
recognition is then applied to recognize the word the user pronounced. The first
recognition result returned is compared with the word being practiced, and the
evaluation of the learner's pronunciation is returned (Figure 17).

<b>Figure 17. Flow for evaluating the learner's pronunciation of a single word using speech recognition technology </b>
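The comparison step of this flow can be sketched as follows. The recognizer itself is not modeled: `hypotheses` stands for the list of candidate transcriptions a speech recognizer returns, best first. The normalization (trimming whitespace and punctuation, ignoring case) is an assumption added so that the comparison is not sensitive to formatting, and both function names are hypothetical.

```python
from typing import List


def normalize(text: str) -> str:
    """Lower-case the text and strip surrounding whitespace and punctuation."""
    return text.strip().strip(".,!?").lower()


def is_pronounced_correctly(target: str, hypotheses: List[str]) -> bool:
    """Compare the first recognition result with the word being practiced."""
    if not hypotheses:          # nothing was recognized
        return False
    return normalize(hypotheses[0]) == normalize(target)


print(is_pronounced_correctly("ship", ["Ship", "sheep"]))  # True
print(is_pronounced_correctly("ship", ["sheep", "ship"]))  # False
```

Only the first hypothesis counts, as in the description above: in the second call the recognizer's best guess is "sheep", so the attempt is judged incorrect even though "ship" appears further down the list.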

</div>
<span class='text_page_counter'>(26)</span><div class='page_container' data-page=26>

<b>3.2.6. Summarizing practice results </b>

After the learner has finished practicing all the words in an exercise, the
application automatically calculates the learner's completion level and displays
information about the exercise. Figure 18 shows the result summary screen for an
exercise, which includes the exercise name, the number of words completed, and the
completion level as a percentage.

<b>Figure 18. The application automatically summarizes practice results per exercise </b>
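The completion level of an exercise follows directly from the number of completed words. The sketch below also aggregates exercises into a lesson-level figure; the report does not state how that aggregation is done, so taking the mean of the exercise percentages is an assumption, as are the function names.

```python
from typing import List


def practice_completion(word_done: List[bool]) -> float:
    """Percentage of words in one exercise that were pronounced correctly."""
    if not word_done:
        return 0.0
    return 100.0 * sum(word_done) / len(word_done)


def lesson_completion(practice_percentages: List[float]) -> float:
    """Assumed rule: a lesson's completion is the mean of its exercises."""
    if not practice_percentages:
        return 0.0
    return sum(practice_percentages) / len(practice_percentages)


p1 = practice_completion([True, True, False, True])  # 3 of 4 words completed
p2 = practice_completion([False, False])             # nothing completed yet
print(p1, p2, lesson_completion([p1, p2]))           # 75.0 0.0 37.5
```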

</div>
<span class='text_page_counter'>(27)</span><div class='page_container' data-page=27>

(a) (b)

<b>Figure 19. The learner's practice results: (a) per exercise; (b) per lesson </b>

<b>3.2.7. Resetting practice </b>

After finishing practice, a user who wants to practice again can use the reset
functions (Reset/Reset All) in the Edit section.

</div>
<span class='text_page_counter'>(28)</span><div class='page_container' data-page=28>

<b>Figure 20. Resetting practice exercises </b>

In the lesson list, when the learner selects some lessons for editing and uses the
Reset function, the application resets the completion level of those lessons to 0%.
If the learner chooses Reset All, the application resets the completion level of all
lessons to 0%. An example of the lesson reset screen is shown in Figure 21.
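The difference between Reset and Reset All can be sketched on a dictionary that maps each lesson to its completion level. The dictionary layout, the function names and the sample values are assumptions made for illustration.

```python
from typing import Dict, Iterable


def reset(completion: Dict[str, float], selected: Iterable[str]) -> None:
    """Reset: set the completion of the selected lessons back to 0%."""
    for name in selected:
        if name in completion:
            completion[name] = 0.0


def reset_all(completion: Dict[str, float]) -> None:
    """Reset All: set the completion of every lesson back to 0%."""
    for name in completion:
        completion[name] = 0.0


progress = {"/p/": 75.0, "/t/": 40.0, "/s/": 100.0}
reset(progress, ["/p/"])
print(progress)  # {'/p/': 0.0, '/t/': 40.0, '/s/': 100.0}
reset_all(progress)
print(progress)  # {'/p/': 0.0, '/t/': 0.0, '/s/': 0.0}
```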

</div>
<span class='text_page_counter'>(29)</span><div class='page_container' data-page=29>

<b>4. RESULTS ACHIEVED </b>

The authors have completed a mobile application that supports pronunciation practice
using speech recognition technology, with the following main functions:

+ Converting pronunciation practice content from the source materials into an
information system.

+ Allowing users to view the list of lessons, one for each English sound, and select
a lesson.

+ Based on the selected sound, allowing users to review how the sound is pronounced.

+ Based on the selected sound, allowing users to view the list of corresponding
practice exercises and select one to practice.

+ Allowing users to choose between practicing only the unfinished words and
practicing all words in an exercise.

+ For each practice word:

- viewing the phonetic transcription of the word;

- listening to a sample pronunciation by a native English speaker;

- checking whether the word was pronounced correctly, using speech recognition
technology.

+ Based on the results of the exercises, the software automatically aggregates them
and shows the user the overall pronunciation result for the lesson of each sound.

+ Allowing users to reset the results of an exercise in order to start it again from
the beginning.

</div>
<span class='text_page_counter'>(30)</span><div class='page_container' data-page=30>

+ The team's application was highly rated and won second prize in the Bà Rịa - Vũng
Tàu provincial Science and Technology Innovation Contest, 2018-2019.

The contest entry is a mobile application that supports English pronunciation
practice and integrates the speech recognition technology used in Apple's
intelligent virtual assistant Siri. What sets it apart from other English
pronunciation practice applications is that users benefit from speech recognition,
which allows them to find out for themselves whether their pronunciation is right or
wrong.

By integrating speech recognition, the entry can evaluate the user's English
pronunciation for each individual word and give aggregate evaluations per exercise
and per sound. This helps users identify their weak points and adjust accordingly to
improve their English pronunciation.

</div>
<span class='text_page_counter'>(31)</span><div class='page_container' data-page=31>

<b>Technical effectiveness </b>

The entry integrates speech recognition technology to create a tool that supports
English pronunciation practice on mobile devices for all kinds of learners. It is an
effective technical aid in the classroom that can improve the English pronunciation
of school and university students and other learners. It thereby helps bring modern
technology of the Fourth Industrial Revolution into everyday use, particularly in
education and training.

The entry lets learners practice English pronunciation free of charge, at any time
and in any place, on a mobile device. This helps learners overcome limits on the
time and money available for attending pronunciation courses.

For an individual learner, attending an English course costs at least 3 million VND
per course, and the course runs only for a limited period. If the solution were used
by all 5,000 students of Bà Rịa - Vũng Tàu University, the savings would be about
15 billion VND (5,000 × 3 million VND). If it were further applied to all of the
roughly 30,000 upper secondary school students in Bà Rịa - Vũng Tàu province, the
additional savings would be about 90 billion VND (30,000 × 3 million VND).

</div>
<span class='text_page_counter'>(32)</span><div class='page_container' data-page=32>

The entry gives learners in the community access to advanced technology in everyday
life, specifically speech recognition integrated on mobile devices to support
English pronunciation practice. This can help the community improve its quality of
life and its English proficiency, and so contributes to the community's ability to
integrate in the period of integration and globalization.

<b>Deployment level </b>

</div>
<span class='text_page_counter'>(33)</span><div class='page_container' data-page=33>

1. Juang B. H., Rabiner L. R. (2015) Automatic speech recognition–a
brief history of the technology development [Online]. Available:

2. Benesty J., Sondhi M. M., Huang Y., Springer Handbook of Speech
Processing, Springer Science & Business Media, 2008.

3. Jelinek F. (2015) Pioneering Speech Recognition [Online]. Available:

4. Huang X., Baker J., R. Reddy, A Historical Perspective of Speech
Recognition, Communications of the ACM, vol. 57, no. 1, pp. 94-103, 2014.

5. Hanazawa T., Hinton G., Shikano K., Lang K. J., “Phoneme

recognition using time-delay neural networks,” IEEE Transactions on Acoustics,
Speech, and Signal Processing, vol. 37, no. 3, pp. 328-339, 1989.

6. Wu J., Chan C., Isolated Word Recognition by Neural Network Models
with Cross-Correlation Coefficients for Speech Dynamics, IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 15, no. 11, pp. 1174-1185, 1993.

7. Zahorian S. A., Zimmer A. M., Meng F., Vowel Classification for
Computer based Visual Feedback for Speech Training for the Hearing Impaired,
ICSLP, 2002.

8. Hu H., Zahorian S. A., Dimensionality Reduction Methods for HMM
Phonetic Recognition, ICASSP, 2010.

9. Sak H., Senior A., Rao K., Beaufays F., Schalkwyk J., Google voice
search: faster and more accurate, Wayback Machine, 2016.

10. Fernandez S., Graves A., Hinton G., Sequence labelling in structured
domains with hierarchical recurrent neural networks, Proceedings of IJCAI, 2007.

11. Graves A., Mohamed A., Schmidhuber J., Speech recognition with
deep recurrent neural networks, ICASSP, 2013.

</div>
<span class='text_page_counter'>(34)</span><div class='page_container' data-page=34>

13. Yu D., Deng L., Dahl G., Roles of Pre-Training and Fine-Tuning in
Context-Dependent DBN-HMMs for Real-World Speech Recognition, NIPS
Workshop on Deep Learning and Unsupervised Feature Learning, 2010.

14. Dahl G. E., Yu D., Deng L., Acero A., Context-Dependent Pre-Trained
Deep Neural Networks for Large-Vocabulary Speech Recognition, IEEE
Transactions on Audio, Speech, and Signal Processing, vol. 20, no. 1, pp. 30-42,
2012.

15. Deng L., Li J., Huang J., Yao K., Yu D., Seide F., Recent Advances in
Deep Learning for Speech Research at Microsoft, ICASSP, 2013.

16. Jurafsky D., James H. M., Speech and Language Processing: An
Introduction to Natural Language Processing, Computational Linguistics, and
Speech Recognition, Stanford University, 2018.

17. Graves A., Towards End-to-End Speech Recognition with Recurrent
Neural Networks, ICML, 2014.

18. Yannis M. A., Brendan S., Shimon W. N., Nando de Freitas, LipNet:
End-to-End Sentence-level Lipreading, Cornell University, 2016.

19. Brendan S., Yannis A., Hoffman M. W. and others, Large-Scale Visual
Speech Recognition, Cornell University, 2018.

20. National Center for Technology Innovation (2010) Speech Recognition
for Learning [Online]. Available:

21. Follensbee B., McCloskey-Dale S., Speech recognition in schools: An
update from the field, Technology and Persons with Disabilities Conference, 2018.

22. Forgrave K. E., Assistive Technology: Empowering Students with
Disabilities, The Clearing House, vol. 7, no. 3, pp. 122-126, 2002.

</div>
<span class='text_page_counter'>(35)</span><div class='page_container' data-page=35></div>
<span class='text_page_counter'>(36)</span><div class='page_container' data-page=36>

Lecture Notes of the Institute

for Computer Sciences, Social Informatics

and Telecommunications Engineering

298

Editorial Board Members

Ozgur Akan

Middle East Technical University, Ankara, Turkey
Paolo Bellavista

University of Bologna, Bologna, Italy
Jiannong Cao

Hong Kong Polytechnic University, Hong Kong, China
Geoffrey Coulson

Lancaster University, Lancaster, UK
Falko Dressler

University of Erlangen, Erlangen, Germany
Domenico Ferrari

Università Cattolica Piacenza, Piacenza, Italy
Mario Gerla

UCLA, Los Angeles, USA
Hisashi Kobayashi

Princeton University, Princeton, USA
Sergio Palazzo

University of Catania, Catania, Italy
Sartaj Sahni

University of Florida, Gainesville, USA
Xuemin (Sherman) Shen

University of Waterloo, Waterloo, Canada
Mircea Stan

University of Virginia, Charlottesville, USA
Xiaohua Jia

City University of Hong Kong, Kowloon, Hong Kong
Albert Y. Zomaya

</div>
<span class='text_page_counter'>(37)</span><div class='page_container' data-page=37></div>
<span class='text_page_counter'>(38)</span><div class='page_container' data-page=38>

Phan Cong Vinh

•

Abdur Rakib (Eds.)

Context-Aware Systems

and Applications,

and Nature of Computation

and Communication

</div>
<span class='text_page_counter'>(39)</span><div class='page_container' data-page=39>

Editors

Phan Cong Vinh

Nguyen Tat Thanh University
Ho Chi Minh City, Vietnam

Abdur Rakib

The University of the West of England
Bristol, UK

ISSN 1867-8211 ISSN 1867-822X (electronic)
Lecture Notes of the Institute for Computer Sciences, Social Informatics
and Telecommunications Engineering

ISBN 978-3-030-34364-4 ISBN 978-3-030-34365-1 (eBook)

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2019

</div>
<span class='text_page_counter'>(40)</span><div class='page_container' data-page=40>

Preface

The 8th EAI International Conference on Context-Aware Systems and Applications
(ICCASA 2019) and the 5th EAI International Conference on Nature of Computation
and Communication (ICTCC 2019) are international scientific events for research in
the field of smart computing and communication. These two conferences were jointly
held during November 28–29, 2019, in My Tho City, Vietnam. The aim, for both
conferences, is to provide an internationally respected forum for scientific research in
the technologies and applications of smart computing and communication. These
conferences provide an excellent opportunity for researchers to discuss modern
approaches and techniques for smart computing systems and their applications. The
proceedings of ICCASA 2019 and ICTCC 2019 are published by Springer in the
Lecture Notes of the Institute for Computer Sciences, Social Informatics and
Telecommunications Engineering series (LNICST; indexed by DBLP, EI, Google
Scholar, Scopus, Thomson ISI).

For this eighth edition of ICCASA and fifth edition of ICTCC, repeating the success
of the previous years, the Program Committee received submissions from 12 countries
and each paper was reviewed by at least three expert reviewers. We chose 20 papers
after intensive discussions held among the Program Committee members. We
appreciate the excellent reviews and lively discussions of the Program Committee
members and external reviewers in the review process. This year we had three
prominent invited speakers, Prof. Herwig Unger from Fern Universität in Hagen,
Germany, Prof. Phayung Meesad from King Mongkut’s University of Technology
North Bangkok (KMUTNB) in Thailand, and Prof. Waralak V. Siricharoen from
Silpakorn University in Thailand.

ICCASA 2019 and ICTCC 2019 were jointly organized by The European Alliance
for Innovation (EAI), Tien Giang University (TGU), and Nguyen Tat Thanh University
(NTTU). These conferences could not have been organized without the strong support
of the staff members of these three organizations. We would especially like to thank
Prof. Imrich Chlamtac (University of Trento), Lukas Skolek (EAI), and Martin
Karbovanec (EAI) for their great help in organizing the conferences. We also
appreciate the gentle guidance and help from Prof. Nguyen Manh Hung, Chairman and
Rector of NTTU, and Prof. Vo Ngoc Ha, Rector of TGU.

November 2019 Phan Cong Vinh

</div>
<span class='text_page_counter'>(41)</span><div class='page_container' data-page=41>

Organization

Steering Committee

Imrich Chlamtac (Chair) University of Trento, Italy

Phan Cong Vinh Nguyen Tat Thanh University, Vietnam
Thanos Vasilakos Kuwait University, Kuwait

Organizing Committee

Honorary General Chairs

Vo Ngoc Ha Tien Giang University, Vietnam

Nguyen Manh Hung Nguyen Tat Thanh University, Vietnam

General Chair

Phan Cong Vinh Nguyen Tat Thanh University, Vietnam

Program Chairs

Abdur Rakib The University of the West of England, UK

Vangalur Alagar Concordia University, Canada

Publications Chair

Phan Cong Vinh Nguyen Tat Thanh University, Vietnam

Publicity and Social Media Chair

Cao Nguyen Thi Tien Giang University, Vietnam

Workshop Chair

Nguyen Ngoc Long Tien Giang University, Vietnam

Sponsorship and Exhibits Chair

Bach Long Giang Nguyen Tat Thanh University, Vietnam

Local Chair

Duong Van Hieu Tien Giang University, Vietnam

Web Chair

</div>
<span class='text_page_counter'>(42)</span><div class='page_container' data-page=42>

Technical Program Committee

Chernyi Sergei Admiral Makarov State University of Maritime

and Inland Shipping, Russia

Chien-Chih Yu National ChengChi University, Taiwan

David Sundaram The University of Auckland, New Zealand
Duong Van Hieu Tien Giang University, Vietnam

François Siewe De Montfort University, UK

Gabrielle Peko The University of Auckland, New Zealand
Giacomo Cabri University of Modena and Reggio Emilia, Italy
Hafiz Mahfooz Ul Haque University of Lahore, Pakistan

Huynh Trung Hieu Industrial University of Ho Chi Minh City, Vietnam
Huynh Xuan Hiep Can Tho University, Vietnam

Ijaz Uddin The University of Nottingham, UK

Iqbal Sarker Swinburne University of Technology, Australia

Issam Damaj The American University of Kuwait, Kuwait

Krishna Asawa Jaypee Institute of Information Technology, India

Kurt Geihs University of Kassel, Germany

Le Hong Anh University of Mining and Geology, Vietnam

Le Nguyen Quoc Khanh Nanyang Technological University, Singapore

Manisha Chawla Google, India

Muhammad Athar Javed
Sethi

University of Engineering and Technology
(UET) Peshawar, Pakistan

Nguyen Duc Cuong Ho Chi Minh City University of Foreign Languages–
Information Technology, Vietnam

Nguyen Hoang Thuan Can Tho University of Technology, Vietnam
Nguyen Manh Duc University of Ulsan, South Korea

Nguyen Thanh Binh Ho Chi Minh City University of Technology, Vietnam
Ondrej Krejcar University of Hradec Kralove, Czech Republic
Pham Quoc Cuong Ho Chi Minh City University of Technology, Vietnam
Prashant Vats Fairfield Institute of Management & Technology

in Delhi, India

Rana Mukherji The ICFAI University Jaipur, India

Tran Huu Tam University of Kassel, Germany

Tran Vinh Phuoc Ho Chi Minh City Open University, Vietnam
Vijayakumar Ponnusamy SRM IST, India

Waralak V. Siricharoen Silpakorn University, Thailand

Zhu Huibiao East China Normal University, China

</div>
<span class='text_page_counter'>(43)</span><div class='page_container' data-page=43>

Contents

ICCASA 2019

Declarative Approach to Model Checking for Context-Aware Applications. . . 3
Ammar Alsaig, Vangalur Alagar, and Nematollaah Shiri

Planquarium: A Context-Aware Rule-Based Indoor Kitchen Garden . . . 11
Rahat Khan, Altaf Uddin, Ijaz Uddin, Rashid Naseem,

and Arshad Ahmad

Text to Code: Pseudo Code Generation . . . 20
Altaf U. Din and Awais Adnan

Context-Aware Mobility Based on π-Calculus in Internet of Thing:

A Survey . . . 38
Vu Tuan Anh, Pham Quoc Cuong, and Phan Cong Vinh

High-Throughput Machine Learning Approaches for Network Attacks

Detection on FPGA . . . 47
Duc-Minh Ngo, Binh Tran-Thanh, Truong Dang, Tuan Tran,

Tran Ngoc Thinh, and Cuong Pham-Quoc

IoT-Based Air-Pollution Hazard Maps Systems for Ho Chi Minh City. . . 61

Phuc-Anh Nguyen, Tan-Ri Le, Phuc-Loc Nguyen,

and Cuong Pham-Quoc

Integrating Retinal Variables into Graph Visualizing Multivariate Data

to Increase Visual Features . . . 74
Hong Thi Nguyen, Lieu Thi Le, Cam Thi Ngoc Huynh,

Thuan Thi My Pham, Anh Thi Van Tran, Dang Van Pham,
and Phuoc Vinh Tran

An Approach of Taxonomy of Multidimensional Cubes Representing

Visually Multivariable Data . . . 90
Hong Thi Nguyen, Truong Xuan Le, Phuoc Vinh Tran,

and Dang Van Pham

A System and Model of Visual Data Analytics Related to Junior High

</div>
<span class='text_page_counter'>(44)</span><div class='page_container' data-page=44>

CDNN Model for Insect Classification Based on Deep Neural

Network Approach . . . 127
Hiep Xuan Huynh, Duy Bao Lam, Tu Van Ho, Diem Thi Le,

and Ly Minh Le

Predicting of Flooding in the Mekong Delta Using Satellite Images. . . 143
Hiep Xuan Huynh, Tran Tu Thi Loi, Toan Phung Huynh, Son Van Tran,

Thu Ngoc Thi Nguyen, and Simona Niculescu

Development English Pronunciation Practicing System Based

on Speech Recognition . . . 157
Ngoc Hoang Phan, Thi Thu Trang Bui, and V. G. Spitsyn

Document Classification by Using Hybrid Deep Learning Approach . . . 167
Bui Thanh Hung

A FCA-Based Concept Clustering Recommender System . . . 178
G. Chemmalar Selvi, G. G. Lakshmi Priya, and Rose Bindu Joseph

Hedge Algebra Approach for Semantics-Based Algorithm to Improve

Result of Time Series Forecasting . . . 188
Loc Vuminh, Dung Vuhoang, Dung Quachanh, and Yen Phamthe

ICTCC 2019

Post-quantum Commutative Encryption Algorithm . . . 205
Dmitriy N. Moldovyan, Alexandr A. Moldovyan, Han Ngoc Phieu,

and Minh Hieu Nguyen

Toward Aggregating Fuzzy Graphs a Model Theory Approach . . . 215
Nguyen Van Han, Nguyen Cong Hao, and Phan Cong Vinh

An Android Business Card Reader Based on Google Vision: Design

and Evaluation . . . 223
Nguyen Hoang Thuan, Dinh Thanh Nhan, Lam Thanh Toan,

Nguyen Xuan Ha Giang, and Quoc Bao Truong

Predicted Concentration TSS (Total Suspended Solids) Pollution
for Water Quality at the Time: A Case Study of Tan Hiep Station

in Dong Nai River . . . 237
Cong Nhut Nguyen

Applying Geostatistics to Predict Dissolvent Oxygen (DO) in Water on the

Rivers in Ho Chi Minh City . . . 247

</div>
<span class='text_page_counter'>(45)</span><div class='page_container' data-page=45>

<b>Development English Pronunciation Practicing System </b>
<b>Based on Speech Recognition </b>

Phan Ngoc Hoang*, Bui Thi Thu Trang* and Spitsyn V.G.**

*Ba Ria-Vung Tau University, 80, Truong Cong Dinh, Vung Tau, Ba Ria-Vung Tau,
Vietnam

** National Research Tomsk Polytechnic University, 30, Lenin Avenue, Tomsk, Russia
{hoangpn285,trangbt.084}@gmail.com

{hoangpn,trangbtt}@bvu.edu.vn
{spvg}@tpu.ru

<b>Abstract.</b> This research is motivated by the need to apply speech recognition
technology to language teaching. Speech recognition is one of the most important
tasks in the fields of signal processing and pattern recognition. It allows computers
to understand human speech, and it plays a very important role in people’s lives.
The technology can help people in a variety of ways, such as controlling smart homes
and devices, using robots to conduct job interviews, and converting audio into text.
However, there are not many applications of speech recognition technology in
education, especially in English teaching. The main aim of the research is to propose
an algorithm in which speech recognition technology is used for English language
teaching. The objects of research are speech recognition technologies and frameworks
and the system of English spoken sounds. Research results: the authors propose an
algorithm based on a speech recognition framework for English pronunciation
learning. The proposed algorithm can be applied to other speech recognition
frameworks and to different languages. The authors also demonstrate how to use the
proposed algorithm to develop an English pronunciation practicing system as an iOS
mobile app. The system allows language learners to practice English pronunciation
anywhere and anytime without any purchase.

<b>Keywords:</b> Speech recognition, English pronunciation, Hidden Markov Models,
neural networks, mobile application.

<b>1 Introduction </b>

<b>1.1 Speech recognition technology </b>

</div>
<span class='text_page_counter'>(46)</span><div class='page_container' data-page=46>

In the 1970s, the key technologies of speech recognition were pattern recognition
models, spectral representation using LPC methods, speaker-independent recognizers
using pattern clustering methods, and dynamic programming methods for connected
word recognition. During this time, it became possible to recognize medium
vocabularies (100-1000 words) using simple template-based and pattern recognition
methods [1].

In the 1980s, speech recognition technology started to tackle large vocabularies
(from 1000 to an unlimited number of words) using statistical methods and neural
networks for handling language structures. The important technologies of this period
were the Hidden Markov Model (HMM) and the stochastic language model [1]. HMMs made
it possible to combine different knowledge sources, such as acoustics, language, and
syntax, in a unified probabilistic model.

In the 1990s, the key technologies of speech recognition were stochastic language
understanding methods, statistical learning of acoustic and language models, the
finite state transducer framework and the FSM library. During this period, speech
recognition technology made it possible to build large vocabulary systems using
unconstrained language models and constrained task syntax models for continuous
speech recognition and understanding [1].

In recent years, speech recognition technology has been able to handle very large
vocabulary systems based on full semantic models, integrated with text-to-speech
(TTS) synthesis systems and multi-modal inputs. The key technologies of this period
are highly natural concatenative speech synthesis systems and machine learning to
improve both speech understanding and speech dialogs [1].

<b>1.2 Key speech recognition methods </b>

<b>Dynamic time warping (DTW) </b>

Dynamic time warping (DTW) is an approach that was historically used for speech
recognition. It was used to recognize vocabularies of about 200 words [2]. DTW
divides speech into short frames (e.g., 10 ms segments) and processes each frame
as a single unit. In the DTW era, speaker independence remained an unsolved
problem. DTW was applied to automatic speech recognition to cope with different
speaking speeds: it finds an optimal match between two given sequences (e.g.,
time series) under certain restrictions.
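
The matching step described above can be illustrated with a minimal sketch. It assumes one-dimensional numeric sequences and an absolute-difference frame cost; real recognizers compare multi-dimensional feature vectors, and the function name `dtw_distance` is only illustrative.

```python
def dtw_distance(a, b):
    """Return the minimal DTW alignment cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_cost = abs(a[i - 1] - b[j - 1])
            # A frame may be matched, or either sequence may be stretched.
            cost[i][j] = frame_cost + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


slow = [1, 1, 2, 2, 3, 3]  # the same pattern spoken at half speed
fast = [1, 2, 3]
print(dtw_distance(slow, fast))  # 0.0
```

The slow and fast versions of the same pattern align with zero cost, which is how DTW copes with different speaking speeds.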

<b>Hidden Markov Models (HMM) </b>

DTW has been displaced by the more successful approach based on Hidden Markov
Models. HMMs are statistical models that output a sequence of symbols or quantities.
They are used in speech recognition because a speech signal can be viewed as a
piecewise stationary or short-time stationary signal: on a short time scale
(e.g., 10 milliseconds), speech can be approximated as a stationary process.
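
As an illustration of how an HMM is used for decoding, the sketch below applies the Viterbi algorithm to a toy two-state model. The states, observation labels and all probabilities are hypothetical values chosen for the example; they are not taken from the paper or from any real recognizer.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: each short frame is observed as "quiet" or "loud".
states = ("silence", "speech")
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
emit_p = {
    "silence": {"quiet": 0.9, "loud": 0.1},
    "speech": {"quiet": 0.2, "loud": 0.8},
}
frames = ["quiet", "loud", "loud", "quiet"]
print(viterbi(frames, states, start_p, trans_p, emit_p))
# ['silence', 'speech', 'speech', 'silence']
```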

</div>
<span class='text_page_counter'>(47)</span><div class='page_container' data-page=47>

<b>Neural networks </b>

Neural networks have been used in speech recognition to solve many problems such
as phoneme classification, isolated word recognition, audiovisual speech recognition,
audiovisual speaker recognition and speaker adaptation [5, 6].

Compared with HMMs, neural networks make fewer explicit assumptions about the
statistical properties of features. Neural networks allow discriminative training in
a natural and efficient manner, so they are effective at classifying short-time units
such as individual phonemes and isolated words [7]. However, because of their limited
ability to model temporal dependencies, neural networks have not been used
successfully for continuous speech recognition.

To solve this problem, neural networks are used to pre-process the speech signal
(e.g., feature transformation or dimensionality reduction), and an HMM then
recognizes speech based on the features received from the neural networks [8]. More
recently, Recurrent Neural Networks (RNNs) have shown improved performance in speech
recognition [9–11].

Like shallow neural networks, Deep Neural Networks (DNNs) can be used to model
complex non-linear relationships. DNN architectures generate compositional models,
so DNNs have a huge learning capacity and great potential for modeling complex
patterns of speech data [12]. In 2010, DNNs with large output layers based on
context-dependent HMM states constructed by decision trees were successfully
applied to large vocabulary speech recognition [13–15].

<b>End-to-end automatic speech recognition </b>

Traditional HMM-based approaches require separate components and training for
the pronunciation, acoustic and language models. A typical n-gram language model,
required by all HMM-based systems, often takes several gigabytes of memory, which
makes it impractical to deploy on mobile devices [16]. Because of that, modern
commercial ASR systems from Google and Apple are deployed in the cloud. Since 2014,
however, end-to-end ASR models have jointly learned all the components of the speech
recognizer, which simplifies the training and deployment process.

Connectionist Temporal Classification (CTC) based systems were the first end-to-end
ASR systems; they were introduced by Alex Graves of Google DeepMind and Navdeep
Jaitly of the University of Toronto in 2014 [17]. In 2016, the University of Oxford
presented LipNet, which uses spatiotemporal convolutions coupled with an RNN-CTC
architecture. It was the first end-to-end sentence-level lip reading model, and it
surpassed human-level performance on a restricted-grammar dataset [18]. In 2018,
Google DeepMind presented a large-scale CNN-RNN-CTC architecture; according to the
reported results, this system performed 6 times better than human experts [19].
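
To make the CTC idea more concrete, the sketch below shows the collapsing rule that maps a per-frame label sequence to an output string: repeated labels are merged, then blank symbols are removed. The blank symbol `_` and the function name `ctc_collapse` are assumptions made for this illustration; a real system would first pick a label for each frame from the network outputs.

```python
from itertools import groupby

BLANK = "_"  # hypothetical symbol standing for the CTC blank label


def ctc_collapse(frame_labels):
    """Merge consecutive repeated labels, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != BLANK)


print(ctc_collapse("hh_e_ll_lo__"))  # hello
```

The blank between the two runs of `l` is what allows a genuine double letter to survive the merging step.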

<b>1.3 Speech recognition applications </b>

</div>
<span class='text_page_counter'>(48)</span><div class='page_container' data-page=48>

In education, speech recognition technology can help students who are blind or have
very low vision: they can operate a computer with voice commands instead of looking
at the screen and keyboard [20]. Students who are physically disabled or whose
injuries make writing, typing or working difficult can also benefit from this
technology. They can use speech-to-text programs to do their homework or school
assignments [21]. Speech recognition technology can also help students become better
writers. They can improve the fluidity of their writing by using speech-to-text
programs: when they dictate to the computer, they do not have to worry about
spelling, punctuation, and other mechanics of writing [21]. In addition, speech
recognition technology can be useful for language learning. It can teach people
proper pronunciation and help them develop their speaking skills [22].

Nowadays, most people have their own mobile devices and can use them anywhere,
anytime. Most mobile apps and devices run on two main operating systems: iOS
and Android. These operating systems are equipped with advanced speech recognition
technology developed by Apple or Google. Many mobile apps use these speech
recognition technologies for playing games, controlling devices, making phone calls,
sending text messages, etc.

There are also many software applications for practicing English pronunciation on
mobile devices. With these tools, learners can record what they say and compare it
with the sample pronunciation of native speakers to correct errors. The applications
typically display the pronunciation of a word and let learners listen to a sample
pronunciation; the learners then record their own pronunciation and compare it with
the sample themselves. However, these applications do not integrate a speech
recognition feature to test the learner's pronunciation.

Because of that, building a mobile app that uses speech recognition technologies for
language pronunciation learning is both timely and promising. In this paper we
present an algorithm that uses speech recognition technology to help people determine
whether they pronounce an English sound properly. The proposed algorithm is used to
build a mobile app based on speech recognition technology. This algorithm is tested

<b>2 Proposed algorithm </b>

In this paper, we propose an algorithm based on a speech recognition framework for
English pronunciation learning. The framework used to test the proposed algorithm in
this paper is Apple speech recognition technology [23]. We also demonstrate how to
use the proposed algorithm to develop an English pronunciation practicing system as
an iOS mobile app. The proposed algorithm can be applied to other speech recognition
frameworks (e.g., Google speech recognition) and to different languages.

</div>
<span class='text_page_counter'>(49)</span><div class='page_container' data-page=49>

<b>2.1 </b> <b>Apple speech recognition technology </b>

The Apple speech recognition framework recognizes spoken words in recorded or live
audio. It can be used to transcribe audio content to text, to recognize verbal
commands, etc. The framework is fast and works in near real time. It is also
accurate and can interpret over 50 languages and dialects [23]. The speech
recognition process using Apple technology is presented in Fig. 1.

<b>Fig. 1. Process of speech recognition task on speech recognition framework. </b>

The Audio Input is the audio source from which transcription should occur. The audio
can be read from a recorded audio file or captured live, for example from the
device’s microphone. The audio input is then sent to the Recognizer, which is used
to check the availability of the speech recognition service and to initiate the
speech recognition process. At the end, the process gives the partial or final
results of speech recognition [23].

<b>2.2 One-word pronunciation assessment </b>

Based on this speech recognition framework, we propose an algorithm to assess the
language learner’s pronunciation. The process of pronunciation assessment for one
word is presented in Fig. 2.

</div>
<span class='text_page_counter'>(50)</span><div class='page_container' data-page=50>

First, the language learner pronounces a word chosen for pronunciation practice.
The learner’s pronunciation is then handled by the speech recognition framework,
which gives the recognition result. After that, the recognition result is compared
with the target word to determine whether the learner pronounced the target word
correctly (Fig. 3).

<b>Fig. 3. Learner’s pronunciation assessment for one word </b>
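
The comparison step can be sketched as follows. The paper does not specify how the recognition result is compared with the target word, so this sketch assumes an exact match after simple normalization (case and punctuation are ignored). The function names are illustrative, and the recognized text is assumed to come from whatever speech recognition framework is in use.

```python
def normalize(text):
    """Lowercase the text and keep only letters and apostrophes."""
    return "".join(ch for ch in text.lower() if ch.isalpha() or ch == "'")


def is_pronounced_correctly(recognized_text, target_word):
    """Assumed rule: the recognizer's output must equal the target word."""
    return normalize(recognized_text) == normalize(target_word)


print(is_pronounced_correctly("Pen.", "pen"))  # True
print(is_pronounced_correctly("pan", "pen"))   # False
```

A stricter or more lenient rule (for example, accepting the target word anywhere among several recognition alternatives) would fit the same interface.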

<b>2.3 One sound pronunciation assessment </b>

To assess the pronunciation of one sound, we need to assess the pronunciation of a
list of words that contain the target sound. The process of pronunciation assessment
for one sound is presented in Fig. 4.

</div>
<span class='text_page_counter'>(51)</span><div class='page_container' data-page=51>

First, the language learner pronounces one word from the list of words that contain
the sound being practiced. The learner’s pronunciation is then handled by the
recognition process, and the recognition result is processed by the pronunciation
assessment. The language learner repeats these steps for the other words until all
words in the list have been pronounced. Based on the pronunciation results for the
words in the list, we can calculate the learner's sound pronunciation fluency by the
following formula:

Sound pronunciation fluency = Total number of correctly pronounced words / Total
number of words in the list
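
The formula can be written as a short function. This is a minimal sketch: the example word list and its per-word results are hypothetical, and the dictionary format for the results is an assumption of the illustration.

```python
def sound_fluency(results):
    """Fraction of correctly pronounced words.

    `results` maps each practiced word to True if it was pronounced correctly.
    """
    if not results:
        raise ValueError("the word list must not be empty")
    return sum(results.values()) / len(results)


# Hypothetical results for a list of words containing the sound p.
results = {"pen": True, "pie": True, "cap": False, "happy": True}
print(sound_fluency(results))  # 0.75
```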

<b>2.4 English pronunciation practicing system </b>

The English language contains 44 sounds divided into three main groups: vowels
(12 sounds), diphthongs (8 sounds) and consonants (24 sounds). The vowel sounds
consist of two sub-groups: long sounds and short sounds. The consonant sounds consist
of three sub-groups: voiced consonants, voiceless consonants and other consonants.
The phonemic chart of 44 English spoken sounds is presented in Table 1.

Based on the phonemic chart of spoken English sounds and the proposed algorithm for
word and sound pronunciation assessment, we developed an iOS app as an English
pronunciation practicing system. The main aim of this system is to let language
learners know whether they pronounce English sounds correctly. Based on the results
provided by the system, language learners can make proper adjustments to improve
their English pronunciation. The app also allows language learners to practice
pronunciation freely, anywhere and anytime.

<b>Table 1. Phonemic chart of English sounds </b>

<b>Vowels, short sounds: </b> ɪ e æ ʌ ʊ ə ɒ
<b>Vowels, long sounds: </b> i: ɜ: u: ɔ: ɑ:
<b>Diphthongs: </b> eɪ ɔɪ aɪ eə ɪə ʊə əʊ aʊ
<b>Consonants, voiceless: </b> p f θ t s ʃ ʧ k
<b>Consonants, voiced: </b> b v ð d z ʒ ʤ g
<b>Consonants, other: </b> m n ŋ h w l r j
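
One possible way to store the chart in an app is a nested dictionary keyed by group and sub-group. The structure and names below are assumptions made for illustration; only the sounds themselves come from Table 1.

```python
PHONEMIC_CHART = {
    "vowels": {
        "short": ["ɪ", "e", "æ", "ʌ", "ʊ", "ə", "ɒ"],
        "long": ["i:", "ɜ:", "u:", "ɔ:", "ɑ:"],
    },
    "diphthongs": {
        "all": ["eɪ", "ɔɪ", "aɪ", "eə", "ɪə", "ʊə", "əʊ", "aʊ"],
    },
    "consonants": {
        "voiceless": ["p", "f", "θ", "t", "s", "ʃ", "ʧ", "k"],
        "voiced": ["b", "v", "ð", "d", "z", "ʒ", "ʤ", "g"],
        "other": ["m", "n", "ŋ", "h", "w", "l", "r", "j"],
    },
}


def count_sounds(chart):
    """Return the number of sounds in each main group."""
    return {
        group: sum(len(sounds) for sounds in subgroups.values())
        for group, subgroups in chart.items()
    }


counts = count_sounds(PHONEMIC_CHART)
print(counts)                # {'vowels': 12, 'diphthongs': 8, 'consonants': 24}
print(sum(counts.values()))  # 44
```

The counts agree with the 12 vowels, 8 diphthongs and 24 consonants stated in the text.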

</div>
<span class='text_page_counter'>(52)</span><div class='page_container' data-page=52>

<b>Fig. 5. </b>English pronunciation practicing system: a) list of lessons, b) examples of exercise types
for the sound p.

The language learners must practice all words in the exercise list, and the system
then automatically gives the recognition and pronunciation results for each word
(Fig. 6). After that, the system calculates the pronunciation fluency for each sound
and shows the results to the language learners (Fig. 7).

<b>Fig. 6. Example of one practice: a) practice overview and mode, b) practice answer mode, c) </b>

</div>
<span class='text_page_counter'>(53)</span><div class='page_container' data-page=53>

<b>Fig. 7. </b>Example of pronunciation assessment: a) pronunciation result for one practice,
b) pronunciation result for practices, c) pronunciation result for sound.

<b>3 Conclusion </b>

In this paper, we proposed an algorithm based on a speech recognition framework for
English pronunciation learning. The proposed algorithm can be applied to other
speech recognition frameworks (e.g., Google speech recognition) and to different
languages. We also demonstrated how to use the proposed algorithm to develop an
English pronunciation practicing system as an iOS mobile app.

This system allows language learners to determine whether they pronounce English
sounds correctly. Based on these results, the language learners can make proper
adjustments to improve their English pronunciation. The system also allows language
learners to practice English pronunciation anywhere and anytime without any
purchase, which they cannot do in the classroom.

<b>References </b>

1. Juang B. H., Rabiner L. R. (2015) Automatic speech recognition–a brief history of the
technology development [Online]. Available:

</div>
<span class='text_page_counter'>(54)</span><div class='page_container' data-page=54>

3. Jelinek F. (2015) Pioneering Speech Recognition [Online]. Available:

4. Huang X., Baker J., R. Reddy, A Historical Perspective of Speech Recognition,
Communications of the ACM, vol. 57, no. 1, pp. 94-103, 2014.

5. Hanazawa T., Hinton G., Shikano K., Lang K. J., “Phoneme recognition using time-delay
neural networks,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37,
no. 3, pp. 328-339, 1989.

6. Wu J., Chan C., Isolated Word Recognition by Neural Network Models with
Cross-Correlation Coefficients for Speech Dynamics, IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 15, no. 11, pp. 1174-1185, 1993.

7. Zahorian S. A., Zimmer A. M., Meng F., Vowel Classification for Computer based Visual
Feedback for Speech Training for the Hearing Impaired, ICSLP, 2002.

8. Hu H., Zahorian S. A., Dimensionality Reduction Methods for HMM Phonetic Recognition,
ICASSP, 2010.

9. Sak H., Senior A., Rao K., Beaufays F., Schalkwyk J., Google voice search: faster and more
accurate, Wayback Machine, 2016.

10. Fernandez S., Graves A., Hinton G., Sequence labelling in structured domains with
hierarchical recurrent neural networks, Proceedings of IJCAI, 2007.

11. Graves A., Mohamed A., Schmidhuber J., Speech recognition with deep recurrent neural
networks, ICASSP, 2013.

12. Deng L., Yu D., Deep Learning: Methods and Applications, Foundations and Trends in
Signal Processing, vol. 7, no. 3, pp. 197-387, 2014.

13. Yu D., Deng L., Dahl G., Roles of Pre-Training and Fine-Tuning in Context-Dependent
DBN-HMMs for Real-World Speech Recognition, NIPS Workshop on Deep Learning and
Unsupervised Feature Learning, 2010.

14. Dahl G. E., Yu D., Deng L., Acero A., Context-Dependent Pre-Trained Deep Neural
Networks for Large-Vocabulary Speech Recognition, IEEE Transactions on Audio, Speech,
and Signal Processing, vol. 20, no. 1, pp. 30-42, 2012.

15. Deng L., Li J., Huang J., Yao K., Yu D., Seide F., Recent Advances in Deep Learning for
Speech Research at Microsoft, ICASSP, 2013.

16. Jurafsky D., James H. M., Speech and Language Processing: An Introduction to Natural
Language Processing, Computational Linguistics, and Speech Recognition, Stanford
University, 2018.

17. Graves A., Towards End-to-End Speech Recognition with Recurrent Neural Networks,
ICML, 2014.

18. Yannis M. A., Brendan S., Shimon W. N., Nando de Freitas, LipNet: End-to-End
Sentence-level Lipreading, Cornell University, 2016.

19. Brendan S., Yannis A., Hoffman M. W. and others, Large-Scale Visual Speech Recognition,
Cornell University, 2018.

20. National Center for Technology Innovation (2010) Speech Recognition for Learning
[Online]. Available:

21. Follensbee B., McCloskey-Dale S., Speech recognition in schools: An update from the field,
Technology and Persons with Disabilities Conference, 2018.

22. Forgrave K. E., Assistive Technology: Empowering Students with Disabilities, The Clearing
House, vol. 7, no. 3, pp. 122-126, 2002.

</div>

