Integration of Speech Recognition-based Caption Editing System with Presentation Software

Integration of Speech Recognition-based
Caption Editing System with
Presentation Software
Students: Bùi Văn Chung
Nguyễn Quốc Uy
Contents
1. Introduction
2. Preliminary Survey and Investigation
3. Problems and Apparatus
4. Results
5. Summary
1. Introduction
1.1 Background
- Recently, an increasing amount of e-Learning material, including audio and presentation slides, is being provided through the Internet or through private networks referred to as intranets.
- Many hearing-impaired people and senior citizens require captioning to understand such content.
- The paper introduces the “IBM Caption Editing System with Presentation Integration” (hereafter CESPI), an extension of the IBM Caption Editing System (hereafter CES). CESPI includes all the functions of CES and extends them with presentation integration functions.
- CES encapsulates the speech recognition engine for transcribing audio into text (CES Recorder) and provides various editing features for error correction (CES Master and CES Client).
- As shown in Figure 1, CESPI integrates presentation software in various ways for both the CES Recorder and the CES Master system.
Fig. 2. Sample output of CESPI. The presentation slide image is on the left-hand side, the video image is at the upper right, and the caption is at the lower right.
- We also showed how the caption editing steps can be improved using three major concepts: “complete audio synchronization”, “completely automatic audio control”, and “status marking”.
- In CES, the output phrases from the voice recognition engine (the candidate caption lines) are laid out vertically as individual lines along with their timestamps. “Complete audio synchronization” means that the keyboard focus always matches the audio replay position.
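The link between timestamps and focus can be sketched as follows. This is only an illustrative sketch, not the CES implementation: the `CaptionLine` type, the `line_at` and `position_of` helpers, and the sample data are hypothetical names introduced here, and each line is assumed to carry only its start time in seconds.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class CaptionLine:
    start: float  # assumed: time in seconds where this phrase begins in the audio
    text: str     # candidate caption text from the recognizer


def line_at(lines: List[CaptionLine], position: float) -> int:
    """Return the index of the caption line that covers an audio position."""
    starts = [line.start for line in lines]
    # bisect_right counts the lines that start at or before `position`;
    # the last of those is the line currently being replayed.
    return max(bisect_right(starts, position) - 1, 0)


def position_of(lines: List[CaptionLine], index: int) -> float:
    """Return the audio position to seek to when the focus moves to a line."""
    return lines[index].start


# Hypothetical sample data for illustration only.
lines = [
    CaptionLine(0.0, "welcome to the lecture"),
    CaptionLine(2.4, "today we cover captioning"),
    CaptionLine(5.1, "let us begin"),
]
focused = line_at(lines, 3.0)          # audio position -> focused line
seek_to = position_of(lines, focused)  # focused line -> audio position
```

Keeping both directions of the mapping (position to line, and line to position) is what lets the focus and the replay position stay matched whichever one changes first.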
- The second concept, “completely automatic audio control”, means that the audio is controlled fully automatically by the system. Users are not required to “replay” and “stop” the audio manually (usually a huge number of times). As the editing begins, the focus is set on the initial series of words, and the audio associated with that portion is replayed automatically.
- The last concept is “status marking”. The unverified lines are automatically distinguished from the corrected lines. As shown in Figure 3, each caption line in CES includes a button that is used to mark the status of that line.
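A minimal sketch of how automatic audio control and status marking could fit together is given below. It is not the CES code: `Line`, `EditingSession`, and the `played` log (which stands in for a real audio player) are hypothetical names, and moving to the next line on confirmation is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Line:
    start: float
    end: float
    text: str
    verified: bool = False  # status mark: False until the editor confirms the line


@dataclass
class EditingSession:
    lines: List[Line]
    focus: int = 0
    # Stand-in for an audio player: records each (start, end) segment replayed.
    played: List[Tuple[float, float]] = field(default_factory=list)

    def _replay(self) -> None:
        line = self.lines[self.focus]
        self.played.append((line.start, line.end))

    def begin(self) -> None:
        """Start editing: focus the first line and replay its audio."""
        self.focus = 0
        self._replay()

    def confirm(self, corrected_text: str) -> None:
        """Store the correction, mark the line verified, then move on and replay."""
        line = self.lines[self.focus]
        line.text = corrected_text
        line.verified = True
        if self.focus + 1 < len(self.lines):
            self.focus += 1
            self._replay()

    def unverified(self) -> List[int]:
        """Indices of the lines that still need checking."""
        return [i for i, line in enumerate(self.lines) if not line.verified]
```

In this sketch the editor never issues a replay or stop command; every replay is a side effect of a focus change, which is the point of the second concept.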
Fig. 3. Sample image of CES.
Fig. 4. The caption editing task using CES. All audio processing is automatic, and the user merely needs to focus on making the necessary corrections.
- Presentation software provides many useful features for easily creating effective e-Learning content in the following two steps.
1. Prepare the presentation file by combining text, pictures, visual layout, and any other provided features.
2. Give the oral presentation using the slide show feature of the presentation software. At the same time, record video with any video camera and/or the audio of the oral presentation.
2. Preliminary Survey and Investigation

- As shown in Table 1, 66.3% of respondents rated the multimedia composite as either “Strongly Agree” or “Agree”, irrespective of age group. So we concluded that a multimedia composite is very useful for better understanding in e-Learning.
3. Problems and Apparatus
- Based on the preliminary survey and investigation, we investigated the available caption editing tools that generate captions from audio, and identified three major problems between CES and presentation software: “Content Layout Definitions”, “Editing Focus Linkage”, and “Exporting to Speaker Notes”.
- To address these problems, we extended our Caption Editing System (CES) to integrate it with Microsoft PowerPoint, creating our new Caption Editing System with Presentation Integration (CESPI). The architecture in terms of code interfaces is shown in Figure 5.
Fig. 5. The base platform is Microsoft Windows 2000/XP. The user interface of CESPI is built with Visual Basic V6.0. IBM ViaVoice engine control is implemented in Microsoft Visual C++ 6.0. The interface between ViaVoice and CESPI is the Speech Manager API (SMAPI) V7.0, and the interface between CESPI and Microsoft PowerPoint is Visual Basic for Applications (VBA) V6.0.
Fig. 7. The Change Content Layout dialog is on the left-hand side, and the Select Layout Video + PPT + Caption dialog, with the focus, is on the right-hand side.
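As an illustration of what a content layout definition has to guarantee, the sketch below builds the arrangement described for Fig. 2 (slide on the left, video at the upper right, caption at the lower right) and checks it for overlap and blank space. The `Rect` type, the function names, and the pixel sizes are all hypothetical; the slides do not describe how CESPI stores or computes layouts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def overlaps(self, other: "Rect") -> bool:
        """True if the two rectangles share any interior area."""
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


def video_ppt_caption(width: int, height: int,
                      right_w: int, video_h: int) -> Dict[str, Rect]:
    """Slide on the left; video above the caption in a right-hand column."""
    return {
        "ppt": Rect(0, 0, width - right_w, height),
        "video": Rect(width - right_w, 0, right_w, video_h),
        "caption": Rect(width - right_w, video_h, right_w, height - video_h),
    }


def has_overlap(layout: Dict[str, Rect]) -> bool:
    """True if any two regions of the layout overlap."""
    regions = list(layout.values())
    return any(a.overlaps(b)
               for i, a in enumerate(regions) for b in regions[i + 1:])


def blank_area(layout: Dict[str, Rect], width: int, height: int) -> int:
    """Canvas area not covered by any region (valid when nothing overlaps)."""
    return width * height - sum(r.w * r.h for r in layout.values())


# Assumed canvas and column sizes, chosen only for the example.
layout = video_ppt_caption(1024, 768, right_w=320, video_h=240)
```

Deriving all three regions from two parameters means the regions tile the canvas by construction, so neither overlap nor blank space can be introduced by hand-positioning.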
3.1 Editing Focus Linkage
Fig. 8.
3.2 Speaker Notes Export
Fig. 9. The master caption is exported into the speaker notes portion of the presentation. The speaker notes can be referenced to the client caption.
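One way to picture the export is to group the timestamped caption lines under the slide that was showing when each line was spoken. The sketch below does only that grouping; it is an assumption-laden illustration, not CESPI's method. `notes_by_slide`, the slide start times, and the sample captions are hypothetical, and writing the text into PowerPoint (which CESPI does through VBA) is deliberately left out.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def notes_by_slide(slide_starts: List[float],
                   captions: List[Tuple[float, str]]) -> Dict[int, str]:
    """Group caption lines under the 1-based slide showing at their timestamp.

    slide_starts must be sorted: slide_starts[i] is the time in seconds at
    which slide i + 1 appeared during the recorded presentation.
    """
    grouped: Dict[int, List[str]] = {
        n: [] for n in range(1, len(slide_starts) + 1)
    }
    for start, text in captions:
        # Number of slides that had already appeared at `start`;
        # anything earlier than the first slide is attached to slide 1.
        slide = max(bisect_right(slide_starts, start), 1)
        grouped[slide].append(text)
    return {n: " ".join(texts) for n, texts in grouped.items()}


# Hypothetical timing data for illustration only.
notes = notes_by_slide(
    [0.0, 30.0, 75.0],
    [(1.0, "Welcome."), (12.0, "Agenda."),
     (31.5, "First topic."), (80.0, "Summary.")],
)
```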
4. Results
An experiment was performed to measure the editing time under the following conditions.
1. Each editor used CES and CESPI on approximately 30 minutes of content each.
2. Because editing is known to get faster as an editor gets used to a tool, 5 editors who already had enough experience with CES and CESPI were chosen, to eliminate any inconsistencies due to the learning curve effect (Barloff 1971).
3. Each editor was assigned different portions of the content for CES and CESPI, so that memory of the previous content would have no effect.
4. The task consisted of correcting all the speech recognition errors, laying out the multimedia composite without overlap or excessive blank space, and exporting the speaker notes to the appropriate page.
As shown in Table 3, CESPI provided a 37.6% improvement in total editing time.
Fig. 10. Of the improvement in editing time shown in Table 2, Content Layout Definition accounted for 50.3%, Editing Focus Linkage for 31.1%, and Speaker Notes Export for 18.6%.
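The two sets of percentages can be combined to estimate how much of the total editing time each feature saved. The short calculation below assumes that the 37.6% figure is measured against the total editing time with CES and that the three shares in Fig. 10 partition that improvement; the slides do not state either point explicitly, so the results are indicative only.

```python
# Figures quoted in the slides.
TOTAL_IMPROVEMENT = 37.6  # percent of total editing time (Table 3)
SHARES = {                # percent of the improvement (Fig. 10)
    "Content Layout Definition": 50.3,
    "Editing Focus Linkage": 31.1,
    "Speaker Notes Export": 18.6,
}


def contributions(total: float, shares: dict) -> dict:
    """Split an overall improvement into percentage points per feature."""
    return {name: round(total * share / 100, 1)
            for name, share in shares.items()}


result = contributions(TOTAL_IMPROVEMENT, SHARES)
for name, points in result.items():
    print(f"{name}: {points} percentage points")
```

Under those assumptions, Content Layout Definition accounts for about 18.9 percentage points of the saving, Editing Focus Linkage for about 11.7, and Speaker Notes Export for about 7.0.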
5. Summary
- The three major problems between CES and presentation software were identified as “Content Layout Definitions”, “Editing Focus Linkage”, and “Exporting to Speaker Notes”. This paper has shown how CESPI solves each of these problems. The experiment showed a 37.6% efficiency improvement compared with the previous method. Among the three items, “Content Layout Definition” accounted for the most improvement in time, followed by “Editing Focus Linkage”, with “Speaker Notes Export” last.
- Currently, CESPI supports only Microsoft PowerPoint as the presentation software. Future work will be to support other presentation software.