
iSSE: An Intelligent Engine for User-Oriented Web Service Selection


Acknowledgements

First of all, I feel deeply indebted to my supervisors Associate Professor Pung
Hung Keng and Dr. Ng See-Kiong, without whom the completion of this thesis
could not have been possible. I would like to take this opportunity to express my
deepest appreciation and sincere gratitude to them for their inspiring guidance,
advice and kind patience.
I am grateful to my teammates in the M-Comm group of the NSS (Network
Systems & Services) lab for their innovative thinking and work during the
project. I am also grateful to Haman Lee, Zhang Zhuo and all my colleagues in
the Knowledge Discovery Department of the Institute for Infocomm Research
(I2R) for their valuable instruction and generous assistance, which have been a
great source of help. I am grateful to He Jun, Zhou Lifeng, Peng Bin and Gu Tao
in the NSS lab, who have always encouraged, supported and helped me during
my postgraduate study.
I gratefully acknowledge the financial support from I2R, the Agency for
Science, Technology and Research and National University of Singapore for the
duration of this project. Without it, I would not have been able to undertake my
further study on this project.
Finally, I want to express my deep appreciation to my boyfriend for his constant
caring and support throughout my life. There are many others who have assisted
me in various ways during this project. I gratefully acknowledge their help.



Table of Contents

Acknowledgements
Table of Contents
List of Tables
List of Figures
Summary

Chapter 1 Introduction
  1.1 Research Problem
  1.2 Proposed Approach
  1.3 Contributions
  1.4 Thesis Overview

Chapter 2 Background and Related Work
  2.1 Web Service Architecture and Standards
    2.1.1 Web Service Architecture
    2.1.2 Web Service Standards
  2.2 Web Service Selection
    2.2.1 UDDI for Service Selection
    2.2.2 Service Selection in B2C
  2.3 Related Work
    2.3.1 Semantic Annotations to Web Service
    2.3.2 Semantic Annotations to UDDI
    2.3.3 Semantic Annotations to WSDL
    2.3.4 Semantic Annotations to Ontology
    2.3.5 User-Oriented Web Service Selection
  2.4 Our Approach - A Preview

Chapter 3 Service Selection in Mobile Commerce
  3.1 The M-Comm System
    3.1.1 M-Comm User Scenarios
  3.2 Service Selection in M-Comm
    3.2.1 Service Selection System Architecture
    3.2.2 Discussions

Chapter 4 Intelligent Web Service Selection Engine (iSSE)
  4.1 Service Description by Textual Information
    4.1.1 Text-based Web Service Representation
    4.1.2 Representing Web Services by Words
  4.2 Domain Identification Using SVM
    4.2.1 Motivation
    4.2.2 Domain Modeling
    4.2.3 Domain Identification
  4.3 Service Selection
    4.3.1 Ontology Concept Annotation
    4.3.2 Target Service Template Construction
  4.4 Conclusion

Chapter 5 Evaluation
  5.1 System Implementation
  5.2 Data
  5.3 Evaluation
    5.3.1 Selection accuracy
    5.3.2 Time latency
  5.4 Conclusion

Chapter 6 Conclusions

Appendix A: UDDI Data Structure
  A.1 Data Structures in UDDI
  A.2 An Example
Appendix B. Sample OWL for Postal Code Service
Appendix C. The M-Comm System
  C.1 System Architecture
  C.2 System Development Model and Methods
Appendix D. M-Comm Service Discovery Sub-system Data Flow

Reference


List of Tables
Table 4.1 Presidential Election Results 2004 - Service Presentation
Table 4.2 Two Web Service Presented in iSSE
Table 4.3 Sample Features in Postal Code Service
Table 4.4 Mapping Question Words to Noun
Table 5.1 System Environment
Table 5.2 Domain Categories and number of services in each domain
Table 5.3 OWL design for Postal Code Service
Table 5.4 Precision and Recall in Service Selection Process of iSSE


List of Figures
Fig 2.1. Service-Oriented Web Services Architecture
Fig 2.2. Web Service Technology Stack
Fig 3.1. M-Comm System for Shopping - General Structure
Fig 3.2. Components of M-Comm Service Discovery Engine
Fig 4.1. Web Service Discovery in M-Comm System
Fig 4.2. iSSE System Process and Components
Fig 4.3. Distribution of # of Words and Occurrence
Fig 5.1. Distribution of Words and Occurrences from the Testing Data
Fig 5.2. Precision Comparison between iSSE and SalCentral.com
Fig 5.3. Recall Comparison between iSSE and SalCentral.com
Fig A.1. Five Data Structures defined in UDDI Data Structure Specification 2.03
Fig A.2. BusinessEntity data Structure
Fig A.3. BusinessEntity Example
Fig A.4. BusinessService Data Structure Specification
Fig A.5. BusinessService Example
Fig A.6. BindingTemplate structure specification
Fig A.7. BindingTemplate Example
Fig A.8. tModel Data Structure Specification
Fig A.9. tModel Example
Fig A.10. CategoryBag Data Structure Specification
Fig A.11. CategoryBag Example
Fig A.12. Example of UDDI data structure
Figure C.1. System architecture diagram
Figure C.2. Technology vs. System architecture
Figure C.3. Deployment View of M-Comm System
Fig D.1. Level 0: DFD Diagram for Service Discovery Sub-system
Fig D.2. Level 1 DFD Diagram for Service Invoking
Fig D.3. Level 1 DFD Diagram for Query Process


Summary


The World Wide Web, once just a simple repository of hyper-linked web pages, is
now rapidly evolving into a provider of services. Although web services were
originally designed for B2B scenarios, as the popularity of the web increases,
there is a growing need to adapt web services for B2C scenarios. Current web
service standards require the user to be conversant with the high technicalities
associated with web services in order to select the relevant web services to apply.
However, B2C scenarios are inherently dynamic and heterogeneous, so user-oriented
processes are necessary in order for web services to be useful.

In this thesis, we investigate the challenging problem of providing an
intelligent engine for user-oriented web service selection in a B2C environment.
Our main contribution in this thesis is a novel light-weight semantic approach that
utilizes machine learning and information retrieval techniques to identify web
services that are relevant to a user’s free-text query. Our approach treats web
services as textual web pages from which their semantic content can be extracted
using natural language text processing methods. We employ a Support Vector
Machine to identify the domain of the web services requested by a user’s query,
and we then map the semantic content of the user’s free text query to the semantic
content of the domain’s web services’ ontology concepts to select a list of
relevant web services in the identified domain for the user. We have implemented



our intelligent web service selection method as part of the service discovery
engine in the M-Comm system. Using actual web services from xMethods, we
show that the iSSE (Intelligent Service Selection Engine) can accurately select
relevant web services for different domains.




Chapter 1 Introduction

To begin, we provide the motivation of our thesis work in this introductory
chapter. We also give an overview of our proposed approach and our research
contributions in this work.

1.1 Research Problem

The World Wide Web—once just a simple repository of hyperlinked web pages—
is now evolving into a provider of services, or more specifically, “web
services”.[1] Web services are automated resources that can be accessed via the
Internet; they have been hailed as the next wave of Internet-based applications
that will dramatically change the use of the Internet. An increasing number of web
services have now been published to several service registries. Web users can use
these published web services to perform many everyday information retrieval
tasks and access the information providers’ technology platform directly. For
example, we can automate our access of catalog data at the online bookstore
Amazon.com, create and populate an Amazon online shopping cart (a book sale
platform), and even initiate the electronic checkout process by using the web
services provided by Amazon (AWS, or Amazon.com Web Service) [4]. The



emergence of web services is an increasingly evident phenomenon that cannot be
ignored: even the leading online search engine Google has recently provided a Web
Services Kit for programmers to query for web services [5].


Web services offer internet users the possibility to interact with and compose the
source of the information directly. However, this availability comes at a price. In
order to decide which web service to use, a user has to scan through lists of
service descriptions to interpret the information so as to map the various available
services according to his needs. If the task at hand happens to be a complex one
(that is, one that cannot be accomplished by simply enlisting an individual web
service), the user will have to find ways to combine multiple web services in
order to accomplish his overall task. Because of this, web services are currently
used mainly in B2B (Business-to-Business) scenarios, where a technically
complex contract about how to use the services can be assumed between the
participants in the web transactions involved. For web services to be useful in a
B2C (Business-to-Customer) scenario, where the customer cannot be expected to
be conversant with the technicalities currently associated with web services,
an efficient and precise service discovery and selection mechanism for web
services is necessary.
Web service selection is the process of automatically selecting one web service or a
list of services according to a user’s requirements. So far, research in the area of web
services has mainly focused on B2B web services with well-defined interfaces and
terminologies that define a shared meaning for machine-to-machine interaction [60].



However, the web service selection task in a B2C scenario is far more complicated
than in a B2B scenario, in the following ways:
o Multiple-domain environment: Web services in a B2C scenario are
heterogeneous in terms of service domains. For instance, the 400 web services
registered in xMethods.com¹ are related to 24 service domains, including
SMS, postal code, medical information and address validation. Unlike in
B2B scenarios, it is impossible to infer a uniform presentation of the
services. Domain selection is therefore the first step of B2C web service
selection.
o Lack of a common language between service providers and human users: In
a B2B scenario, a common language, in the form of taxonomies and
specifications, is introduced to establish a common understanding between
service providers and service requesters. In a B2C scenario, human users
cannot be expected to go through the service catalogs to gain an
understanding of the technicalities associated with web services.

This thesis therefore focuses on the research problem of how to intelligently
select web services based on a user’s requirements in a B2C environment.

¹ XMethods.com is a web service repository from XMethods Inc. It was founded in 2000 by Tony
Hong and James Hong to provide services that facilitate the development, deployment, and usage
of web services and web service networks.



1.2 Proposed Approach

We present a novel automated web service selection method that utilizes machine
learning and information retrieval techniques to identify relevant web services
based on a user’s service query in free-text format. Our approach is to treat web
services as textual web pages that are presented as service descriptions in WSDL
or other service description formats. In this way, the domain knowledge of a web
service category can be represented by a textual collection of the web services
belonging to it. The resulting text-based representation of the domain knowledge
also introduces an overlap between the representation of the services and the
representation of users’ free-text queries, which can then be used to guide the
computer’s selection of relevant web services.
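
As a minimal sketch of this idea, the following Python fragment turns a service description and a user query into bags of words and reports the words they share. The service text, the query, the stop list and the function name `bag_of_words` are all hypothetical and chosen only for illustration; the actual representation used by iSSE is described in Chapter 4.

```python
import re
from collections import Counter

# Assumed stop list, for illustration only.
STOPWORDS = {"a", "the", "is", "of", "for", "my", "what", "up"}

def bag_of_words(text):
    """Return word counts for free text or identifier-like service text."""
    # Split camelCase identifiers such as "getPostalCode" into separate words.
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    words = re.findall(r"[A-Za-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

# Hypothetical text gathered from a service description (names plus documentation).
SERVICE_TEXT = "PostalCodeService getPostalCode Looks up the postal code for a street address"
QUERY_TEXT = "What is the postal code of my address?"

service_bag = bag_of_words(SERVICE_TEXT)
query_bag = bag_of_words(QUERY_TEXT)

# The overlap between the two representations is what can guide selection.
shared_words = set(service_bag) & set(query_bag)
print(sorted(shared_words))  # ['address', 'code', 'postal']
```

Because both sides end up in the same word space, no shared taxonomy between provider and user is needed for the comparison.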

Our proposed approach can be separated into two main phases: build-time and
run-time. During build-time, we construct domain models that can be used to
classify a user query into specific service domains. To address the manual
bottleneck of domain modeling, we apply machine learning to the training
services to automatically derive a support vector machine (SVM) model that
distinguishes the different domains of interest. We also employ
WordNet [6] to mine the semantic meanings of the concept labels in the service
ontologies.



During run-time, a user inputs a free-text query to request web
services. This user query must first be mapped to a known service category or
domain. By applying the SVM from build-time to the meaningful words
extracted from the query string by PoS (Part of Speech) tagging, we identify the
relevant domain in which to select the appropriate web services based on the user
query. Next, we partially fill in the web service template for the chosen domain
using specific values from the user query. The partially filled service template is
then used to find the best matches among the web services published under the chosen
domain. These web services are then selected by the system and presented to the
user. In this way, our system provides intelligent service selection for web service
applications in a B2C environment.
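
The build-time and run-time phases of domain identification can be sketched as follows. This is an illustrative toy under stated assumptions, not the thesis implementation: the training texts, domain names and function names are hypothetical, a stop list stands in for PoS tagging, and the linear SVMs are trained one-vs-rest with a simple Pegasos-style subgradient method instead of a full SVM package.

```python
import re

# Assumed stop list; it stands in for the PoS-based word extraction.
STOPWORDS = {"a", "an", "the", "to", "for", "of", "i", "want", "my", "is", "what"}

# Hypothetical training descriptions, one (domain, text) pair per service.
TRAINING = [
    ("sms", "send sms text message to mobile phone"),
    ("postal", "lookup postal code for street address"),
    ("sms", "deliver short message to phone number"),
    ("postal", "validate zip code of city address"),
]

def tokens(text):
    """Lower-case words of the text, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def train_linear_svm(samples, positive, lam=0.1, epochs=20):
    """Build-time: hinge-loss subgradient descent for one domain (no bias term)."""
    weights, step = {}, 0
    for _ in range(epochs):
        for domain, text in samples:
            step += 1
            eta = 1.0 / (lam * step)              # decaying learning rate
            y = 1.0 if domain == positive else -1.0
            x = tokens(text)
            margin = y * sum(weights.get(tok, 0.0) for tok in x)
            decay = 1.0 - 1.0 / step              # regularization shrinkage
            for tok in weights:
                weights[tok] *= decay
            if margin < 1.0:                      # sample violates the margin
                for tok in x:
                    weights[tok] = weights.get(tok, 0.0) + eta * y
    return weights

def identify_domain(models, query):
    """Run-time: score the query against every domain model, keep the best."""
    x = tokens(query)
    scores = {d: sum(w.get(tok, 0.0) for tok in x) for d, w in models.items()}
    return max(scores, key=scores.get)

domains = sorted({domain for domain, _ in TRAINING})
models = {d: train_linear_svm(TRAINING, d) for d in domains}
print(identify_domain(models, "I want to send a text message to my phone"))  # sms
```

Once the domain is known, selection can be restricted to the services published under it, which is the role of the template-matching step described above.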

1.3 Contributions

We list the main contributions of our intelligent web service selection approach
below:
1. Service Selection by Textual Content: Our approach provides a novel
process for intelligently searching within the textual content of web
services. It allows us to represent the content (capability) of a service by
the textual description from its service provider, which at the same time
maps directly to free-text user queries in B2C scenarios.



2. Light-weight Semantic Approach: A key limitation of the so-called
semantic web approaches to web service selection (see Chapter 2) is that
they rely heavily on the availability (as well as the quality and
completeness) of the domain knowledge modeled in manually
constructed ontologies. By devising a process to automatically generate
annotations of the domain ontologies, our approach is applicable even
when the domain ontology is not perfectly defined.


3. Applicability in B2C Environment: In a B2C environment, user queries
are in free text and cannot be expected to conform to rigid standard
structures such as UDDI (Universal Description, Discovery and
Integration). Our approach can be applied in such a flexible environment
as it relies on neither pre-determined UDDI categories nor manually
constructed ontological models for service selection.

1.4 Thesis Overview

The rest of this thesis is organized as follows. In Chapter 2, we provide the
background on web services and related work from the W3C and the Semantic Web
research area. Since our automated web service selection work is a component of
the M-Comm project, we describe the overall M-Comm architecture in Chapter 3.
Then, in Chapter 4, we provide a detailed description of our approach for
automatic web service selection. In Chapter 5, we present the evaluation results
of our system. Finally, in Chapter 6, we conclude the thesis with a discussion of
possible further work.



Chapter 2 Background and Related Work

As mentioned in the Introduction, the Web is evolving into a provider of services
rather than solely a conventional repository for hyperlinked text and images.
Web users can now use the Internet not only to retrieve web pages of stored
information but also to enlist useful web services, such as a stock-quote service,
a flight information service and various e-commerce and B2B (business-to-business)
applications, to serve their operational needs. Take the following
scenario as an example of a possible use of web services. A secretary wants to
book a ticket from Paris to London for her manager. She opens Soogle (a
fictitious name for a future web service search engine, analogous to
Google today) and requests the cheapest ticket from Paris to London
departing tonight. The underlying intelligent system searches the Internet and
discovers some services that provide air ticket information and some that
provide online ticket booking. The system finds the best-matching services and
then composes them. The orchestrated services are then seamlessly
invoked. With only a simple free-text query from the secretary, a ticket of the
appropriate class is now waiting for her manager, who can claim it at the
airport on departure. From this example, it is clear that web services should be
discovered, composed, and invoked automatically. Doing so requires the
establishment of a common architecture and standards to enable seamless
interoperability between the underlying software languages, platforms and
protocols.

2.1 Web Service Architecture and Standards

A web service is a software system identified by a URL, whose public
interfaces and bindings are defined and described, currently using XML. Its
definition should be open so that it can be discovered by other software systems,
which then interact with the web service in the manner prescribed by its definition,
using XML-based messages conveyed by Internet protocols [7, 8, 9]. SOAP (Simple Object
Access Protocol), WSDL (Web Services Description Language) and UDDI
(Universal Description, Discovery and Integration) are the three current key standards
in the web service architecture. They define the web service protocols for
messaging, service description, and service registry and selection, respectively.

2.1.1 Web Service Architecture
The web service architecture is a Service-Oriented Architecture (SOA),
which is also the structure that Grid computing is now based on. Three key roles,
Service Registry, Service Provider and Service Requester, are defined in
SOA, as shown in Fig 2.1:



1) Service Provider. A service provider is the owner and the implementer of a
published web service. It must provide the descriptions of its services as well
as a platform where its clients can access the service.
2) Service Requester. A service requester is a person or an organization that
wishes to make use of a provider's web service. It uses a requester agent to
exchange messages with the service provider's provider agent. [9]
3) Service Registry. A service registry mediates the interaction between
service requesters and providers. It provides the necessary APIs and storage
for registering and browsing services. An entry in the service registry contains
descriptions of the necessary business context of the provided service, such as
the service provider’s name, contact information, and so on.


Fig 2.1. Service-Oriented Web Services Architecture



Also shown in Figure 2.1 are the three corresponding actions between the
three roles of the web service architecture:
1) Publish. Service providers publish their service descriptions in the service
registry. UDDI is the current standard registry protocol for web services, and
WSDL is the standard service description language. We provide more
detailed descriptions of UDDI and WSDL in the next section.
2) Find. Service requesters find services by browsing the service registry using
the APIs provided by the service registry.
3) Invoke. After a service requester identifies a relevant web service published
in the registry, it invokes the service by using the information found in the
service description.

Messaging for the three actions is carried out with SOAP, an
XML-based message exchange protocol; the underlying network protocol
can be HTTP, SMTP, and so on.
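
The interplay of the three roles and three actions can be sketched with a small in-memory model. This is only an illustration under simplifying assumptions: the class and field names are hypothetical, a Python callable stands in for a network endpoint, and no SOAP messages are actually exchanged.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ServiceDescription:
    """What a provider publishes: business context plus how to reach the service."""
    name: str
    provider: str
    category: str
    endpoint: Callable[..., str]  # stands in for a network address (assumption)

class ServiceRegistry:
    """Stores published descriptions and lets requesters browse them."""

    def __init__(self) -> None:
        self._entries: List[ServiceDescription] = []

    def publish(self, description: ServiceDescription) -> None:
        self._entries.append(description)

    def find(self, category: str) -> List[ServiceDescription]:
        return [entry for entry in self._entries if entry.category == category]

def invoke(description: ServiceDescription, *args: str) -> str:
    """The requester calls the provider directly, using the found description."""
    return description.endpoint(*args)

# Publish: a hypothetical provider registers a stock-quote service.
registry = ServiceRegistry()
registry.publish(ServiceDescription(
    name="StockQuote",
    provider="ExampleCorp",
    category="finance",
    endpoint=lambda symbol: f"quote for {symbol}",
))

# Find, then invoke.
matches = registry.find("finance")
result = invoke(matches[0], "ABC")
print(result)  # quote for ABC
```

Note that the registry only takes part in publish and find; the invocation goes from requester to provider without passing through it.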

2.1.2 Web Service Standards
To ensure interoperability, there are published standards for performing the
actions in the web service architecture and the operations involved in business
interactions. The standard web service stack is shown in Figure 2.2. The various
web service standards fall mainly into two groups of layers in the web
service technology stack:



Fig 2.2. Web Service Technology Stack
1) Core Layers. Core layers are the lower-level layers in the stack. They include
TCP/IP as the network protocol, HTTP and SMTP as the transport protocols, and
SOAP, which is based on XML, as the protocol for information exchange.
2) Emerging Layers. Emerging layers are the upper layers in the web service stack,
including WSDL as the web service description language, UDDI as the
service registry and discovery protocol, and other standards such as the Web Service
Flow Language (WSFL) [10] for business interactions.
In the following, we describe SOAP from the Core Layers and each of the
standards in the Emerging Layers in further detail:


Simple Object Access Protocol (SOAP): SOAP provides a lightweight
mechanism for exchanging structured information between peers in a
decentralized, distributed environment using XML. SOAP provides
encoding mechanisms for encoding data within modules, which allows it
to be used in a large variety of systems ranging from messaging systems to
RPC [11]. A SOAP message consists of three parts:
o Envelope. The SOAP envelope defines an overall framework for
representing the SOAP content, and it contains the other two parts
of the SOAP message.
o Header. The SOAP header is a collection of zero or more SOAP
header blocks, each of which might be targeted at any SOAP
receiver within the SOAP message path.
o Body. The SOAP body is a collection of zero or more element
information items targeted at an ultimate SOAP receiver in the
SOAP message path.
For further details about SOAP, please refer to [12, 13, 14].


Web Services Description Language (WSDL): WSDL is an XML-based
interface language. It defines a web service at a high level of abstraction,
including the set of operations, the messages and the data types
implemented by the service. The operation styles (one-way,
request-response, solicit-response and notification) are also defined in
WSDL.





Universal Description, Discovery and Integration (UDDI): UDDI
provides the storage and APIs (a registry API and a discovery API) with which
service providers advertise services and service requesters browse them.
Currently, WSDL is the description language most closely associated with UDDI for
service descriptions. [15]



Web Service Flow Language (WSFL): WSFL, a language layered on
top of WSDL, is an XML format for the description of web service
compositions. A WSFL definition contains two models: a flow model
and a global model. The flow model defines the execution sequence of the
functionality provided by the composed web services. The global model
specifies how the composed services interact with each other. [10]
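
To make the SOAP and WSDL descriptions above more concrete, the sketch below builds a SOAP 1.1 envelope with a header and a body, and classifies the operations of a small WSDL 1.1 `portType` into the four styles by the order of their `input` and `output` children. The sample WSDL fragment, the operation names and the function names are hypothetical; a real WSDL document would also declare messages, bindings and a service element.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"           # WSDL 1.1 namespace

def build_envelope(header_blocks, body_items):
    """Wrap header blocks and body items (Elements) in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    header.extend(header_blocks)   # zero or more header blocks
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.extend(body_items)        # zero or more body items
    return envelope

# Style is determined by which of input/output appear, and in what order.
STYLE_BY_ORDER = {
    ("input",): "one-way",
    ("input", "output"): "request-response",
    ("output", "input"): "solicit-response",
    ("output",): "notification",
}

def operation_styles(wsdl_text):
    """Map each portType operation name to its WSDL 1.1 operation style."""
    root = ET.fromstring(wsdl_text.strip())
    styles = {}
    for port_type in root.iter(f"{{{WSDL_NS}}}portType"):
        for op in port_type.findall(f"{{{WSDL_NS}}}operation"):
            order = tuple(
                child.tag.split("}")[1]
                for child in op
                if child.tag in (f"{{{WSDL_NS}}}input", f"{{{WSDL_NS}}}output")
            )
            styles[op.get("name")] = STYLE_BY_ORDER.get(order, "unknown")
    return styles

# Hypothetical, heavily abbreviated WSDL fragment (message attributes omitted).
SAMPLE_WSDL = f"""
<definitions xmlns="{WSDL_NS}" name="StockQuote">
  <portType name="StockQuotePortType">
    <operation name="GetQuote"><input/><output/></operation>
    <operation name="LogRequest"><input/></operation>
    <operation name="PriceAlert"><output/></operation>
  </portType>
</definitions>
"""

request = ET.Element("GetQuote")
ET.SubElement(request, "symbol").text = "ABC"
envelope = build_envelope([], [request])
styles = operation_styles(SAMPLE_WSDL)
print(styles)
```

The envelope here has an empty header, which is valid since the header holds zero or more blocks; the request element would normally live in the service's own namespace.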


These standards in SOA allow the web service actions publish and invoke to
be carried out on a common standardized web service platform between the
participants. The remaining, and most important, action in SOA, which concerns both
Service Requesters and Service Providers, is find: how can we help Service
Requesters locate the relevant services from the Service Providers based on the
Requesters’ specific requirements and the Providers’ specific specifications? We
discuss this “Web Service Selection problem” in the next section.



2.2 Web Service Selection

Web services were originally designed with B2B (Business-to-Business)
interactions in mind. For example, the standard for web service registry and
selection, UDDI, which is based on catalog-based service registry and discovery,
is applicable for B2B scenarios as the business transactions are carried out by
applications that can be programmed to understand the various UDDI categories.
However, with the ongoing evolution of web service technology, web services
are expanding into the area of B2C (Business-to-Customer) and eventually
peer-to-peer interactions. Indeed, B2C and End User (personal agent) Web
Services have been described as one of the most important usage scenarios of web
services. This new development calls for a new variety of web search engines:
user-oriented web service selection engines. Future search engines will not only
search web page content by keywords but they will also be expected to provide
web searches according to web site capabilities.
Traditional search engines such as Yahoo! and Google.com currently work
as follows: a user inputs query keywords and the search engine searches the web
pages stored in its database to find the web sites that contain some or all of the
user query keywords. The new user-oriented web service selection engines will
have to parse a user’s free-text requirements for some specific task,
translate them into appropriate service requirement specifications that can be
understood by the web service selection engines, and then return (or even invoke)
a list of relevant candidate web services for the user. Some efforts have been
made to adapt the current B2B-focused web service standards for user-oriented
service selection [58, 59, 60]. In the next section, we discuss the pros
and cons of using the current web service protocol, UDDI, for service selection
tasks.

2.2.1 UDDI for Service Selection
In the Web Services SOA (Service-Oriented Architecture) shown in Figure 2.1,
UDDI takes the central role as the standard protocol for the service registry and
provides the mechanism for browsing through and discovering registered services.
Appendix A describes the UDDI data structures in further detail.
Categorization is the key feature with which a UDDI-compliant registry can
provide a service selection platform for the user. However, as mentioned earlier,
the categorization-based service selection functionality of UDDI is more
applicable to a B2B (Business-to-Business) scenario, where corporate e-commerce
applications can be programmed to understand the pre-defined
structures in UDDI in order to exploit the web services that are available in the
registry. In other words, users can only use UDDI to find services already
known to them (i.e. they must first know exactly which category their desired web
service falls under in order to search for it). In a more dynamic environment such
as B2C (Business-to-Customer), such rigidly pre-categorized service selection
functionality provided under UDDI is inadequate. When a web service provider
registers under a specific category along with UDDI registration data, it can only


