Oracle® Ultra Search
User’s Guide
10g Release 1 (10.1)
Part No. B10731-02
June 2004
Copyright © 2002, 2004, Oracle. All rights reserved.
Primary Author: Michele Cyran
Contributors: Sandeepan Banerjee, Stefan Buchta, Chung-Ho Chen, Will Chin, Jack Chung, Ray Hachem,
Cindy Hsin, Hassan Karraby, Yasuhiro Matsuda, Colin McGregor, Valarie Moore, Visar Nimani, Steve Yang,
David Zhang
The Programs (which include both the software and documentation) contain proprietary information; they
are provided under a license agreement containing restrictions on use and disclosure and are also protected
by copyright, patent, and other intellectual and industrial property laws. Reverse engineering, disassembly,
or decompilation of the Programs, except to the extent required to obtain interoperability with other
independently created software or as specified by law, is prohibited.
The information contained in this document is subject to change without notice. If you find any problems in
the documentation, please report them to us in writing. This document is not warranted to be error-free.
Except as may be expressly permitted in your license agreement for these Programs, no part of these
Programs may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any
purpose.
If the Programs are delivered to the United States Government or anyone licensing or using the Programs
on behalf of the United States Government, the following notice is applicable:
U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data
delivered to U.S. Government customers are "commercial computer software" or "commercial technical data"
pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As
such, use, duplication, disclosure, modification, and adaptation of the Programs, including documentation
and technical data, shall be subject to the licensing restrictions set forth in the applicable Oracle license
agreement, and, to the extent applicable, the additional rights set forth in FAR 52.227-19, Commercial
Computer Software Restricted Rights (June 1987). Oracle Corporation, 500 Oracle Parkway, Redwood City,
CA 94065
The Programs are not intended for use in any nuclear, aviation, mass transit, medical, or other inherently
dangerous applications. It shall be the licensee's responsibility to take all appropriate fail-safe, backup,
redundancy and other measures to ensure the safe use of such applications if the Programs are used for such
purposes, and we disclaim liability for any damages caused by such use of the Programs.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks
of their respective owners.
The Programs may provide links to Web sites and access to content, products, and services from third
parties. Oracle is not responsible for the availability of, or any content provided on, third-party Web sites.
You bear all risks associated with the use of such content. If you choose to purchase any products or services
from a third party, the relationship is directly between you and the third party. Oracle is not responsible for:
(a) the quality of third-party products or services; or (b) fulfilling any of the terms of the agreement with the
third party, including delivery of products or services and warranty obligations related to purchased
products or services. Oracle is not responsible for any loss or damage of any sort that you may incur from
dealing with any third party.
Contents
Send Us Your Comments xiii
Preface xv
Audience xv
Documentation Accessibility xv
Structure xvi
Related Documentation xvii
Conventions xvii
What's New in Oracle Ultra Search? xxi
Ultra Search Release Information xxiv
1 Introduction to Oracle Ultra Search
Overview of Oracle Ultra Search 1-1
Ultra Search Components 1-1
Ultra Search Crawler 1-2
Ultra Search Backend 1-2
Ultra Search Administration Tool 1-2
Ultra Search APIs and Sample Applications 1-2
Ultra Search Features 1-3
Instance Snapshot Support 1-3
Document and Search Attributes 1-3
Metadata Loader 1-4
Extensible Crawler and Crawler Agents 1-4
Robots Exclusions 1-4
Data Harvesting Mode 1-4
URL Rewrite 1-5
Query API 1-5
Secure Search 1-5
Dependency on Oracle XML DB 1-6
Sample Query Applications 1-7
Document Relevancy Boosting 1-7
Query Syntax Expansion 1-7
Display URL Support 1-7
Federated Search 1-8
Single Sign-On Authentication 1-8
Integration with Oracle Internet Directory 1-8
Ultra Search Administration Groups in Oracle Internet Directory 1-8
Authorization of the Administration Privileges 1-9
Integration with Oracle Application Server 1-9
Sample Search Portlet 1-9
Ultra Search System Configuration 1-10
2 Getting Started with Oracle Ultra Search
Overview 2-1
Installation 2-2
Using the Oracle Universal Installer 2-2
Accessing the Ultra Search Administration Application 2-2
Setting up the Sample Query Application 2-3
Setting up the Ultra Appliance Demo 2-3
Crawl and Index Ultra Appliance’s Intranet Documents 2-4
Crawl and Index Ultra Appliance’s Database Documents 2-7
Issuing a Query 2-8
3 Installing and Configuring Ultra Search
Ultra Search Requirements 3-1
Hardware Requirements 3-1
Software Requirements 3-2
Installing the Ultra Search Backend 3-2
Database Release 3-2
Oracle Application Server Release 3-2
Installing As Part of Oracle Application Server Metadata Repository Creation 3-2
Installing Into an Existing Database 3-3
Post-Installation Tasks for the Ultra Search Backend 3-4
Enabling Ultra Search to Process Binary Files 3-4
Configure the Oracle Database for Ultra Search 3-4
Configure a Secure Ultra Search Installation 3-5
Backend Reconfiguration After a Database Character Set Change 3-7
Configuring the Default Ultra Search Instance 3-7
Installing the Ultra Search Middle Tier on Web Server Hosts 3-8
Web Applications Concepts 3-8
Browser Requirements 3-9
Installing the Middle Tier with the Oracle Database Release 3-9
Installing the Middle Tier with the Oracle Application Server Release 3-10
Configuring the Middle Tier with Oracle HTTP Server and OC4J 3-10
Configuring the Administration Tool with Single Sign-On Server 3-13
Deploying the Ultra Search EAR File on a Third Party Middle Tier 3-14
Editing the data-sources.xml File 3-16
Editing the ultrasearch.properties File 3-17
Starting the Web Server 3-18
Testing the Ultra Search Administration Tool 3-18
Testing the Ultra Search Sample Query Applications 3-18
Installing the Backend on Remote Crawler Hosts 3-19
Installing the Backend on Remote Crawler Hosts 3-19
Configuring the Remote Crawler 3-20
Unregistering a Remote Crawler 3-21
Configuring Ultra Search in a Hosted Environment 3-22
Preconfiguration Tasks for a Hosted Environment 3-22
Configuring Ultra Search in the Subscriber Context 3-22
4 Post-Installation Information
Changing Ultra Search Schema Passwords 4-1
Configuring the Oracle Server for Ultra Search 4-1
Step 1: Tune the Oracle Database 4-2
Step 2: Create and Assign the Temporary Tablespace to the CTXSYS User 4-3
Step 3: Create a Large Tablespace for Each Ultra Search Instance User 4-3
Step 4: Create and Configure New Users for Ultra Search Instances 4-4
Step 5: Alter the Index Preferences 4-5
Configuring Ultra Search for SSL 4-5
Managing Stoplists 4-6
Default Ultra Search Stoplist 4-6
Modifying Instance Stoplists 4-6
Modifying Instance Stoplists Before Initial Crawling 4-6
Modifying Instance Stoplists After Initial Crawling 4-7
Upgrading Ultra Search 4-7
Pre-Upgrade Steps 4-8
Upgrading Ultra Search Shipped with Oracle Database 4-8
Upgrading Ultra Search Shipped with Oracle Application Server 4-8
Upgrading Ultra Search Shipped with Oracle Collaboration Suite 4-8
Upgrading Ultra Search to Oracle Collaboration Suite Release 1 4-9
Upgrade from Ultra Search 1.0.3 to 9.0.3 4-9
Upgrade from Ultra Search 9.0.2 to 9.0.3 4-11
Upgrade from Ultra Search 9.2 to 9.0.3 4-11
Post-Upgrade Configuration Steps 4-11
Post-Upgrade Example in Non-RAC Environment 4-12
Post-Upgrade Example in RAC Environment 4-12
Configuring the Query Application 4-12
Step 1: Edit the data-sources.xml File 4-12
Step 2: Deploy Multiple Query Applications Against Multiple Instances 4-13
5 Security in Oracle Ultra Search
About Ultra Search Security 5-1
Ultra Search Security Model 5-1
Ultra Search with Secure Socket Layer and HTTPS 5-2
Classes of Users and Their Privileges 5-2
Ultra Search Default Users 5-3
Ultra Search Admin Privilege Model in the Hosted Environment 5-3
Admin Privilege Model 5-4
Resources Protected by Ultra Search 5-5
Authorization and Access Enforcement 5-6
How Ultra Search Leverages Security Services 5-6
How Ultra Search Leverages the Identity Management Infrastructure 5-6
Ultra Search Extensibility and Security 5-6
Configuring a Security Framework for Ultra Search 5-7
Configuring Security Framework Options for Ultra Search 5-7
Configuring Oracle Identity Management Options for Ultra Search 5-7
Configuring Ultra Search Security 5-7
6 Understanding the Oracle Ultra Search Crawler and Data Sources
Overview of the Ultra Search Crawler 6-1
Crawler Settings 6-1
Crawler Data Sources 6-2
Using Crawler Agents 6-2
Synchronizing Data Sources 6-2
Display URL and Access URL 6-2
Document Attributes 6-3
Crawling Process for the Schedule 6-3
Queuing and Caching Documents 6-3
Indexing Documents 6-5
Data Synchronization 6-6
Web Crawling Boundary Control 6-6
URL Boundary Rule 6-6
robots.txt Protocol and robots Metatag 6-7
Crawling Depth 6-7
URL Rewriter 6-8
URL Redirection and Boundary Rule Enforcement 6-8
Ultra Search Remote Crawler 6-8
Ultra Search Crawler Status Codes 6-8
7 Understanding the Ultra Search Administration Tool
Ultra Search Administration Tool 7-1
Setting Crawler Parameters 7-2
Setting Query Options 7-2
Attributes 7-2
Data Groups 7-2
Online Help in Different Languages 7-2
Logging On to Ultra Search 7-3
Logging On and Managing Instances as SSO Users 7-4
Logging On to Ultra Search 7-4
Granting Privileges to SSO Users 7-4
Instances Page 7-5
Creating an Instance 7-5
Creating a Regular Instance 7-5
Creating a Snapshot Instance 7-6
Selecting an Instance 7-8
Deleting an Instance 7-8
Editing an Instance 7-8
Instance Mode 7-8
Schema Password 7-8
Crawler Page 7-9
Configure the Settings 7-9
Remote Crawler Profiles 7-12
Crawler Statistics 7-12
Summary of Crawler Activity 7-13
Detailed Crawler Statistics 7-13
Crawler Progress 7-13
Problematic URLs 7-13
Web Access Page 7-13
Proxies 7-13
Authentication 7-13
HTTP Authentication 7-13
HTML Forms 7-14
Attributes Page 7-14
Search Attributes 7-14
Mappings 7-15
Sources Page 7-15
Web Sources 7-16
Creating Web Sources 7-16
Table Sources 7-18
Creating Table Sources 7-18
Editing Table Sources 7-19
Table Sources Comprised of More Than One Table 7-19
Limitations With Database Links 7-19
Email Sources 7-20
Creating Email Sources 7-20
File Sources 7-21
Creating File Sources 7-21
Oracle Sources 7-21
Oracle Portal Sources 7-22
Federated Sources 7-22
User-Defined Sources 7-24
Creating User-Defined Data Source Types 7-24
Creating User-Defined Sources 7-24
Schedules Page 7-25
Data Synchronization 7-25
Creating Synchronization Schedules 7-25
Updating Schedules 7-25
Editing Synchronization Schedules 7-26
Launching Synchronization Schedules 7-27
Synchronization Status and Crawler Progress 7-28
Index Optimization 7-28
Queries Page 7-29
Data Groups 7-29
URL Submission 7-29
Relevancy Boosting 7-30
Query Statistics 7-30
Configuration 7-31
Users Page 7-32
Preferences 7-32
Super-Users 7-32
Privileges 7-32
Globalization Page 7-33
Search Attribute Name 7-33
LOV Display Name 7-34
Data Group Name 7-34
8 Ultra Search Developer's Guide and API Reference
Overview of Ultra Search APIs 8-1
Ultra Search Query API 8-2
Customizing the Query Syntax Expansion 8-3
Default Query Syntax Expansion Implementation 8-3
End User Query Syntax 8-3
Scoring Classes 8-4
Expansion Rules 8-5
Examples of Applying the Rules 8-5
Customizing the Rules 8-6
Ultra Search Query Tag Library 8-7
Query Tag Descriptions 8-8
<instance> Tag: Connecting to the Ultra Search Instance 8-8
<iterAttributes> Tag: Show All Search Attributes 8-9
<iterGroups> Tag: Show All Search Groups 8-10
<iterLanguages> Tag: Show All Search Languages 8-10
<iterLOV> Tag: Show All Values Defined for a Search Attribute 8-11
Formulating the Query 8-11
<getResult> Tag: Perform Search 8-11
<fetchAttribute> Tag: Metadata Selection 8-12
<showHitCount> Tag: Show Estimated Hit Count 8-13
<iterResult> Tag: Render the Results 8-13
<showAttributeValue> Tag: Render a Document Attribute 8-13
Ultra Search Crawler Agent API 8-14
Crawler Agent Overview 8-14
Standard Agent 8-15
Smart Agent 8-15
Document Attributes and Properties 8-15
Library Path and Java Class Path 8-16
Crawler Agent Functionality 8-16
Data Source Type Registration 8-16
Data Source Registration 8-17
Data Source Attribute Registration 8-18
User-Implemented Crawler Agent 8-18
Interaction Between the Crawler and the Crawler Agent 8-18
Crawler Agent APIs and Classes 8-18
Sample Agent Files 8-19
Setting up the Sample Crawler Agent 8-19
Compiling and Building the Agent Jar File 8-19
Creating a Data Source Type 8-19
Defining Data Source Parameters 8-20
Defining a Data Source of this Type 8-20
Ultra Search Java Email API 8-21
JavaMail Implementation 8-21
Java Email API 8-21
Sample Mailing List Browser Application Files 8-22
Setting up the Sample Mailing List Browser Application 8-22
Ultra Search URL Rewriter API 8-22
URL Link Filtering 8-23
URL Link Rewriting 8-23
Creating and Using a URL Rewriter 8-24
Ultra Search Document Service API 8-25
APIs and Classes 8-26
Interface DocumentService 8-26
Agent Registration Client Interface 8-27
Example of Setting Up the Sample Document Service Agent 8-28
Ultra Search Sample Query Applications 8-28
Sample Query Applications 8-29
JavaServer Page Concepts 8-30
9 Tuning and Performance
Tuning the Web Crawling Process 9-1
Web Crawling Strategy 9-1
Monitoring the Crawling Process 9-1
URL Looping 9-2
Tuning Query Performance 9-2
Using the Remote Crawler 9-4
Understanding the Launcher 9-4
RMI-Based Remote Crawling 9-5
JDBC-Based Remote Crawling 9-5
Security With Remote Crawlers 9-6
Scalability and Load Balancing 9-6
Installation and Configuration Sequence 9-6
Ultra Search on Real Application Clusters 9-9
Configuring Storage Access 9-9
Remote Crawler File Cache 9-10
Logging on to the Oracle Instance 9-11
Query Search Application for Real Application Clusters 9-11
Java Crawler 9-11
Choosing a JDBC Driver 9-11
Ultra Search Failover in a RAC Environment 9-12
Table Data Source Synchronization 9-12
Synchronizing Crawling of Oracle Databases 9-12
Create Log Table 9-13
Create Log Triggers 9-13
Synchronizing Crawling of Non-Oracle Databases 9-14
10 Administration PL/SQL APIs
Instance-Related APIs 10-3
CREATE_INSTANCE 10-3
DROP_INSTANCE 10-4
GRANT_ADMIN 10-5
REVOKE_ADMIN 10-6
SET_INSTANCE 10-7
Schedule-Related APIs 10-8
CREATE_SCHEDULE 10-8
DROP_SCHEDULE 10-9
INTERVAL 10-10
SET_SCHEDULE 10-11
UPDATE_SCHEDULE 10-12
Crawler Configuration APIs 10-13
IS_ADMIN_READONLY 10-13
SET_ADMIN_READONLY 10-14
UPDATE_CRAWLER_CONFIG 10-15
A Loading Metadata into Ultra Search
Launching the Loading Tool A-1
Loading Documents and Relevance Scores A-2
The Input XML File A-2
Example of the Document Relevance Boosting XML File A-2
Loading Search Attribute LOVs and LOV Display Names A-3
The LOV XML File A-3
Example of the LOV XML File A-3
XML Schema for Document Relevance Boosting A-4
XML Schema for LOVs and LOV Display Names A-4
B Altering the Crawler Java Classpath
Reasons for Altering the Crawler Java Classpath B-1
Difference Between the Crawler Classpath and the Remote Crawler Classpath B-1
Altering the Crawler Java Classpath on the Ultra Search Server Host B-1
Altering the Crawler Java Classpath on a Remote Crawler Host B-2
C Ultra Search Views
OUS_INSTANCES C-1
OUS_SCHEDULES C-1
OUS_DEFAULT_CRAWLER_SETTINGS C-2
OUS_CRAWLER_SETTINGS C-2
D URL Crawler Status Codes
Index
Send Us Your Comments
Oracle Ultra Search User’s Guide 10g Release 1 (10.1)
Part No. B10731-02
Oracle welcomes your comments and suggestions on the quality and usefulness of this
publication. Your input is an important part of the information used for revision.
■ Did you find any errors?
■ Is the information clearly presented?
■ Do you need more information? If so, where?
■ Are the examples correct? Do you need more examples?
■ What features did you like most about this manual?
If you find any errors or have any other suggestions for improvement, please indicate
the title and part number of the documentation and the chapter, section, and page
number (if available). You can send comments to us in the following ways:
■ Electronic mail:
■ FAX: (650) 506-7227 Attn: Server Technologies Documentation Manager
■ Postal service:
Oracle Corporation
Server Technologies Documentation
500 Oracle Parkway, Mailstop 4op11
Redwood Shores, CA 94065
USA
If you would like a reply, please give your name, address, telephone number, and
electronic mail address (optional).
If you have problems with the software, please contact your local Oracle Support
Services.
Preface
This Preface contains these topics:
■ Audience
■ Documentation Accessibility
■ Structure
■ Related Documentation
■ Conventions
Audience
Oracle Ultra Search User’s Guide is intended for database administrators and application
developers who perform the following tasks:
■ Install and configure Ultra Search
■ Administer Ultra Search instances
■ Develop Ultra Search applications
To use this document, you should have experience with the Oracle database
management system, SQL, SQL*Plus, and PL/SQL.
Documentation Accessibility
Our goal is to make Oracle products, services, and supporting documentation
accessible, with good usability, to the disabled community. To that end, our
documentation includes features that make information available to users of assistive
technology. This documentation is available in HTML format, and contains markup to
facilitate access by the disabled community. Standards will continue to evolve over
time, and Oracle is actively engaged with other market-leading technology vendors to
address technical obstacles so that our documentation can be accessible to all of our
customers. For additional information, visit the Oracle Accessibility Program Web site.
Accessibility of Code Examples in Documentation
JAWS, a Windows screen reader, may not always correctly read the code examples in
this document. The conventions for writing code require that closing braces should
appear on an otherwise empty line; however, JAWS may not always read a line of text
that consists solely of a bracket or brace.
Accessibility of Links to External Web Sites in Documentation
This documentation may contain links to Web sites of other companies or
organizations that Oracle does not own or control. Oracle neither evaluates nor makes
any representations regarding the accessibility of these Web sites.
Structure
This document contains:
"What's New in Oracle Ultra Search?"
This section describes new features and provides pointers to additional information.
Chapter 1, "Introduction to Oracle Ultra Search"
This chapter provides an overview of Ultra Search and describes the system
configuration.
Chapter 2, "Getting Started with Oracle Ultra Search"
This chapter provides an example scenario that shows installation and use of Ultra
Search.
Chapter 3, "Installing and Configuring Ultra Search"
This chapter describes how to install and configure Ultra Search.
Chapter 4, "Post-Installation Information"
This chapter provides post-installation information, such as how to configure the
Oracle Database server for Ultra Search and how to manage stoplists. It also describes
how to upgrade to the most recent Ultra Search release.
Chapter 5, "Security in Oracle Ultra Search"
This chapter describes the architecture and configuration of security for Ultra Search.
Chapter 6, "Understanding the Oracle Ultra Search Crawler and Data Sources"
This chapter explains how the crawler works. It also describes crawler settings, data
sources, document attributes, data synchronization, and the remote crawler.
Chapter 7, "Understanding the Ultra Search Administration Tool"
This chapter describes how to use the Ultra Search administration tool to configure
and schedule the Ultra Search crawler.
Chapter 8, "Ultra Search Developer's Guide and API Reference"
This chapter explains the following Ultra Search APIs: query API, crawler agent API,
email API, URL rewriter API, and the document service API. It also provides related
API information, such as details about the sample query applications, the query tag
library, and query syntax expansion customization.
Chapter 9, "Tuning and Performance"
This chapter describes various ways to tune Ultra Search and improve performance.
These include tuning the Web crawling process, tuning query performance, using the
remote crawler, using Ultra Search on Real Application Clusters, and table data source
synchronization.
Chapter 10, "Administration PL/SQL APIs"
This chapter details some of Ultra Search's PL/SQL APIs for administration, including
those for crawler configuration, crawler scheduling, and instance administration.
Appendix A, "Loading Metadata into Ultra Search"
This appendix describes the command-line tool for loading metadata into an Ultra
Search database.
Appendix B, "Altering the Crawler Java Classpath"
This appendix explains why and how to alter the crawler Java classpath.
Appendix C, "Ultra Search Views"
This appendix shows the various views available with Ultra Search.
Appendix D, "URL Crawler Status Codes"
This appendix lists the codes that the crawler uses to indicate the result of the crawled
URL.
Related Documentation
For more information, see these Oracle resources:
■ Oracle Database Concepts
■ Oracle Database Administrator's Guide
■ Oracle Database Performance Tuning Guide
■ Oracle Enterprise Manager Concepts
Many books in the documentation set use the sample schemas of the seed database,
which is installed by default when you install Oracle Database. Refer to Oracle
Database Sample Schemas for information on how these schemas were created and how
you can use them yourself.
Printed documentation is available for sale in the Oracle Store.
To download free release notes, installation documentation, white papers, or other
collateral, please visit the Oracle Technology Network (OTN). You must register online
before using OTN; registration is free.
If you already have a user name and password for OTN, then you can go directly to
the documentation section of the OTN Web site.
A database documentation search engine is also available online.
Conventions
This section describes the conventions used in the text and code examples of this
documentation set. It describes:
■ Conventions in Text
■ Conventions in Code Examples
■ Conventions for Windows Operating Systems
Conventions in Text
We use various conventions in text to help you more quickly identify special terms.
The following table describes those conventions and provides examples of their use.

Convention: Bold
Meaning: Bold typeface indicates terms that are defined in the text or terms that appear in a
glossary, or both.
Example: When you specify this clause, you create an index-organized table.

Convention: Italics
Meaning: Italic typeface indicates book titles or emphasis.
Examples:
Oracle Database Concepts
Ensure that the recovery catalog and target database do not reside on the same disk.

Convention: UPPERCASE monospace (fixed-width) font
Meaning: Uppercase monospace typeface indicates elements supplied by the system. Such
elements include parameters, privileges, datatypes, RMAN keywords, SQL keywords, SQL*Plus or
utility commands, packages and methods, as well as system-supplied column names, database
objects and structures, usernames, and roles.
Examples:
You can specify this clause only for a NUMBER column.
You can back up the database by using the BACKUP command.
Query the TABLE_NAME column in the USER_TABLES data dictionary view.
Use the DBMS_STATS.GENERATE_STATS procedure.

Convention: lowercase monospace (fixed-width) font
Meaning: Lowercase monospace typeface indicates executable programs, filenames, directory
names, and sample user-supplied elements. Such elements include computer and database names,
net service names and connect identifiers, user-supplied database objects and structures,
column names, packages and classes, usernames and roles, program units, and parameter values.
Note: Some programmatic elements use a mixture of UPPERCASE and lowercase. Enter these
elements as shown.
Examples:
Enter sqlplus to start SQL*Plus.
The password is specified in the orapwd file.
Back up the datafiles and control files in the /disk1/oracle/dbs directory.
The department_id, department_name, and location_id columns are in the hr.departments table.
Set the QUERY_REWRITE_ENABLED initialization parameter to true.
Connect as oe user.
The JRepUtil class implements these methods.

Convention: lowercase italic monospace (fixed-width) font
Meaning: Lowercase italic monospace font represents placeholders or variables.
Examples:
You can specify the parallel_clause.
Run old_release.SQL where old_release refers to the release you installed prior to upgrading.

Conventions in Code Examples
Code examples illustrate SQL, PL/SQL, SQL*Plus, or other command-line statements.
They are displayed in a monospace (fixed-width) font and separated from normal text
as shown in this example:
SELECT username FROM dba_users WHERE username = 'MIGRATE';
The following table describes typographic conventions used in code examples and
provides examples of their use.

Convention: [ ]
Meaning: Anything enclosed in brackets is optional.
Example: DECIMAL (digits [ , precision ])

Convention: { }
Meaning: Braces are used for grouping items.
Example: {ENABLE | DISABLE}

Convention: |
Meaning: A vertical bar represents a choice of two options.
Examples:
{ENABLE | DISABLE}
[COMPRESS | NOCOMPRESS]

Convention: ...
Meaning: Ellipsis points mean repetition in syntax descriptions. In addition, ellipsis points
can mean an omission in code examples or text.
Examples:
CREATE TABLE ... AS subquery;
SELECT col1, col2, ... , coln FROM employees;

Convention: Other symbols
Meaning: You must use symbols other than brackets ([ ]), braces ({ }), vertical bars (|), and
ellipsis points (...) exactly as shown.
Examples:
acctbal NUMBER(11,2);
acct CONSTANT NUMBER(4) := 3;

Convention: Italics
Meaning: Italicized text indicates placeholders or variables for which you must supply
particular values.
Examples:
CONNECT SYSTEM/system_password
DB_NAME = database_name

Convention: UPPERCASE
Meaning: Uppercase typeface indicates elements supplied by the system. We show these terms in
uppercase in order to distinguish them from terms you define. Unless terms appear in brackets,
enter them in the order and with the spelling shown. Because these terms are not case
sensitive, you can use them in either UPPERCASE or lowercase.
Examples:
SELECT last_name, employee_id FROM employees;
SELECT * FROM USER_TABLES;
DROP TABLE hr.employees;

Convention: lowercase
Meaning: Lowercase typeface indicates user-defined programmatic elements, such as names of
tables, columns, or files.
Note: Some programmatic elements use a mixture of UPPERCASE and lowercase. Enter these
elements as shown.
Examples:
SELECT last_name, employee_id FROM employees;
sqlplus hr/hr
CREATE USER mjones IDENTIFIED BY ty3MU9;

Conventions for Windows Operating Systems
The following table describes conventions for Windows operating systems and
provides examples of their use.

Convention: Choose Start > menu item
Meaning: How to start a program.
Example: To start the Database Configuration Assistant, choose Start > Programs > Oracle -
HOME_NAME > Configuration and Migration Tools > Database Configuration Assistant.

Convention: File and directory names
Meaning: File and directory names are not case sensitive. The following special characters
are not allowed: left angle bracket (<), right angle bracket (>), colon (:), double quotation
marks ("), slash (/), pipe (|), and dash (-). The special character backslash (\) is treated
as an element separator, even when it appears in quotes. If the filename begins with \\, then
Windows assumes it uses the Universal Naming Convention.
Example: c:\winnt"\"system32 is the same as C:\WINNT\SYSTEM32

Convention: C:\>
Meaning: Represents the Windows command prompt of the current hard disk drive. The escape
character in a command prompt is the caret (^). Your prompt reflects the subdirectory in which
you are working. Referred to as the command prompt in this manual.
Example: C:\oracle\oradata>

Convention: Special characters
Meaning: The backslash (\) special character is sometimes required as an escape character for
the double quotation mark (") special character at the Windows command prompt. Parentheses and
the single quotation mark (') do not require an escape character. Refer to your Windows
operating system documentation for more information on escape and special characters.
Example: C:\>exp HR/HR TABLES=employees QUERY=\"WHERE job_id='SA_REP' and salary<8000\"

Convention: HOME_NAME
Meaning: Represents the Oracle home name. The home name can be up to 16 alphanumeric
characters. The only special character allowed in the home name is the underscore.
Example: C:\> net start OracleHOME_NAMETNSListener

Convention: ORACLE_HOME and ORACLE_BASE
Meaning: In releases prior to Oracle8i release 8.1.3, when you installed Oracle components,
all subdirectories were located under a top level ORACLE_HOME directory. The default for
Windows NT was C:\orant.
This release complies with Optimal Flexible Architecture (OFA) guidelines. Not all
subdirectories are under a top level ORACLE_HOME directory. There is a top level directory
called ORACLE_BASE that by default is C:\oracle\product\10.1.0. If you install the latest
Oracle release on a computer with no other Oracle software installed, then the default setting
for the first Oracle home directory is C:\oracle\product\10.1.0\db_n, where n is the latest
Oracle home number. The Oracle home directory is located directly under ORACLE_BASE.
All directory path examples in this guide follow OFA conventions.
Refer to Oracle Database Installation Guide for 32-Bit Windows for additional information
about OFA compliances and for information about installing Oracle products in non-OFA
compliant directories.
Example: Go to the ORACLE_BASE\ORACLE_HOME\rdbms\admin directory.
What's New in Oracle Ultra Search?
This section describes Ultra Search new features, with pointers to additional
information. It also explains the Ultra Search release history.
Secure Crawling
Ultra Search provides secure crawling with the following types of authentication:
Digest Authentication: Ultra Search supports HTTP digest authentication, and the
Ultra Search crawler can authenticate itself to Web servers that employ the HTTP digest
authentication scheme. The scheme is based on a simple challenge-response paradigm, but
unlike basic authentication the password is not sent in clear text; a cryptographic
digest computed from it is sent instead.
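The challenge-response exchange can be illustrated with a short Python sketch that computes the response value a client returns for a digest challenge, following the MD5 scheme with qop=auth defined in RFC 2617. This is not Ultra Search code: the function names are hypothetical, and the sample values in the usage note are the ones from the RFC's own example.

```python
import hashlib


def _md5_hex(text: str) -> str:
    """Return the lowercase hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the 'response' field of an HTTP digest Authorization header.

    The password itself never travels over the wire; only this digest does.
    """
    ha1 = _md5_hex(f"{username}:{realm}:{password}")   # secret part
    ha2 = _md5_hex(f"{method}:{uri}")                  # request part
    # The server nonce and client nonce tie the digest to one challenge.
    return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


if __name__ == "__main__":
    print(digest_response(
        "Mufasa", "testrealm@host.com", "Circle Of Life",
        "GET", "/dir/index.html",
        "dcd98b7102dd2f0e8b11d0f600bfb0c093", "00000001", "0a4f113b"))
```

Because the server issues a fresh nonce with each challenge, a captured response cannot simply be replayed against a later challenge.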
HTML Form Authentication: HTML form-based authentication is the most commonly
used authentication scheme on the Web. Ultra Search lets you register HTML forms
that you want the Ultra Search crawler to fill out automatically during Web crawling.
HTML form authentication requires that HTTP cookie functionality be enabled, which
is the default.
Indexing Control of Dynamically Generated Web Pages
The crawler can be configured not to index Web pages that are dynamically generated
(for example, pages whose URL contains a question mark).
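As a minimal sketch of the question-mark heuristic mentioned above, the following Python functions flag URLs that carry a query part and filter them out of a list. The function names and the index_dynamic parameter are hypothetical and do not correspond to an actual Ultra Search setting; the crawler's real rule may consider more than the question mark.

```python
from urllib.parse import urldefrag


def is_dynamic_url(url: str) -> bool:
    """Treat a URL as dynamically generated if it contains a question mark.

    The fragment (the part after '#') is removed first, because it is never
    sent to the server and so cannot select dynamic content.
    """
    base, _fragment = urldefrag(url)
    return "?" in base


def indexable_urls(urls, index_dynamic=False):
    """Return the URLs that would be indexed under the chosen policy."""
    return [u for u in urls if index_dynamic or not is_dynamic_url(u)]


if __name__ == "__main__":
    sample = [
        "http://www.example.com/index.html",
        "http://www.example.com/search?q=oracle",
    ]
    print(indexable_urls(sample))
```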
HTTPS
Ultra Search now supports HTTPS (HTTP over SSL), so the Ultra Search crawler can
crawl HTTPS URLs.
Secure Searching
Ultra Search now supports secure searches. Secure searches return only documents
that the search user is allowed to view.
Each indexed document can be protected by an access control list (ACL). During
searches, the ACL is evaluated. If the user performing the search has permission to
read the protected document, then the document is returned by the query API.
Otherwise, it is not returned.
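The ACL check described above can be pictured with a small in-memory Python sketch. It is only an illustration: the class and function names are hypothetical, the ACL is reduced to a set of user and group names with read permission, and a document without an ACL is assumed to be readable by everyone.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class IndexedDocument:
    url: str
    # Users or groups allowed to read the document.
    # None models a document that is not protected by an ACL (an assumption).
    readers: Optional[FrozenSet[str]] = None


def can_read(doc: IndexedDocument, principals: FrozenSet[str]) -> bool:
    """Evaluate the ACL: the user needs at least one matching principal."""
    if doc.readers is None:
        return True
    return not doc.readers.isdisjoint(principals)


def secure_results(hits: Iterable[IndexedDocument],
                   principals: Iterable[str]) -> List[IndexedDocument]:
    """Keep only the hits that the searching user is allowed to view."""
    who = frozenset(principals)
    return [doc for doc in hits if can_read(doc, who)]


if __name__ == "__main__":
    hits = [
        IndexedDocument("http://intranet.example.com/public.html"),
        IndexedDocument("http://intranet.example.com/hr.html",
                        frozenset({"hr_staff"})),
    ]
    for doc in secure_results(hits, ["guest"]):
        print(doc.url)
```

Filtering at query time means that a user never learns of a protected document's existence from the result list.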
Ultra Search stores ACLs in the Oracle XML DB repository. Ultra Search also uses
Oracle XML DB functionality to evaluate ACLs.
See Also: "Creating Web Sources" on page 7-16 for information about secure crawling and
about indexing control of dynamically generated Web pages
See Also: "Ultra Search with Secure Socket Layer and HTTPS" on page 5-2 and
"Creating Web Sources" on page 7-16 for information about HTTPS support
Remote Crawler JDBC Caching Support
It is now possible to use the remote crawler without mounting the remote cache
directory to the server machine. Instead, the cache files are sent over the crawler's
JDBC connection to the server cache directory.
Manual Launch Scheduling
A schedule can be created with no scheduled launch time, so that it can only be started
on demand.
Crawler Log File Versioning
For each data source, the crawler preserves the three most recent log files, so that a
recrawl does not overwrite the log file from the previous crawl.
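A retention rule of this kind can be sketched as follows. The sketch assumes that log file names sort chronologically (for example, because they embed a sequence number); the function name and file names are hypothetical, and the actual crawler log naming is not shown here.

```python
def prune_logs(log_names, keep=3):
    """Split log names into (kept, removed), keeping the most recent ones.

    Assumes the names sort chronologically, for example because they
    embed a timestamp or a sequence number.
    """
    ordered = sorted(log_names)
    cutoff = max(len(ordered) - keep, 0)
    return ordered[cutoff:], ordered[:cutoff]
```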
New PL/SQL Administration APIs
Ultra Search now includes APIs for various administration tasks, such as crawler,
schedule, and instance administration.
Integration with Oracle Internet Directory
Oracle Internet Directory is Oracle's native LDAP v3-compliant directory service, built
as an application on top of the Oracle Database. Ultra Search integrates with Oracle
Internet Directory in the following areas:
■ Ultra Search administration groups and group membership are stored in Oracle
Internet Directory.
■ Users are authenticated through the single sign-on (SSO) server and Oracle
Internet Directory.
■ Oracle Internet Directory performs authorization on Ultra Search users'
administration privileges.
Cookie Support
Cookies preserve context between HTTP requests. For example, a server can send a
cookie so that it can recognize a user who has already logged on and not require that
user to log on again. Cookie support is enabled by default.
See Also: "Secure Search" on page 1-5
See Also: "JDBC-Based Remote Crawling" on page 9-5 and
"Remote Crawler Profiles" on page 7-12
See Also: "Data Synchronization" on page 7-25
See Also: "Crawler Logging" on page 11
See Also: Chapter 10, "Administration PL/SQL APIs"
See Also: "Integration with Oracle Internet Directory" on page 1-8
Crawler Cache Deletion Control
During crawling, documents are stored in the cache directory. Every time the preset
size is reached, crawling stops and indexing starts. In previous releases, the cache file
was always deleted when indexing was done. You can now specify not to delete the
cache file when indexing is done. This option applies to all data sources. The default is
to delete the cache file after indexing.
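The alternation between crawling and indexing, and the effect of the cache deletion option, can be sketched with a small simulation. This is an illustration under simplifying assumptions (documents are represented only by their sizes, and the option is a Boolean flag); the names are hypothetical and this is not the crawler's implementation.

```python
def crawl_and_index(doc_sizes, cache_limit, delete_cache_after_indexing=True):
    """Simulate alternating crawl and index phases.

    doc_sizes: sizes of the documents fetched, in crawl order.
    Returns (batches, retained): the batches handed to the indexer and
    the cached documents still on disk at the end.
    """
    batches, retained = [], []
    cache, cache_size = [], 0

    def flush():
        nonlocal cache, cache_size
        batches.append(cache)
        if not delete_cache_after_indexing:
            retained.extend(cache)
        cache, cache_size = [], 0

    for size in doc_sizes:
        cache.append(size)
        cache_size += size
        if cache_size >= cache_limit:  # preset size reached: stop and index
            flush()
    if cache:  # index whatever remains when crawling ends
        flush()
    return batches, retained
```

Keeping the cache uses more disk space, but leaves the fetched documents available for inspection after indexing.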
URL Boundary Rules Include Port Number Inclusion or Exclusion
You can set URL boundary rules to refine the crawling space. You can now include or
exclude Web sites with a specific port. For example, you can include www.oracle.com
but not www.oracle.com:8080. By default, all ports are crawled.
Hostname Prefix Allowed in Web Data Source URL Boundary Specification
In previous releases, you could specify only suffix inclusion rules. For example, you
could crawl only hosts whose names end with "oracle.com". You can now also specify
prefix rules. For example, you can crawl "oracle.com" but not "stores.oracle.com".
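The following sketch shows one way that suffix, prefix, and port rules could combine to decide whether a URL falls inside the crawling space. The rule representation (a list of included suffixes, a list of excluded host prefixes, and a list of excluded host and port pairs) is an assumption made for illustration and is not the format used by the Ultra Search administration tool.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def in_crawl_space(url, include_suffixes, exclude_prefixes=(), exclude_ports=()):
    """Return True if the URL satisfies the (hypothetical) boundary rules."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)

    # Suffix inclusion: the host must end with one of the allowed domains.
    if not any(host == s or host.endswith("." + s) for s in include_suffixes):
        return False
    # Prefix exclusion: skip hosts that start with an excluded prefix.
    if any(host.startswith(p) for p in exclude_prefixes):
        return False
    # Port exclusion: skip a specific host and port combination.
    return (host, port) not in exclude_ports
```

With include_suffixes=["oracle.com"], exclude_prefixes=["stores."], and exclude_ports=[("www.oracle.com", 8080)], this sketch accepts http://www.oracle.com/ and rejects both http://stores.oracle.com/ and http://www.oracle.com:8080/.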
Default Ultra Search Instance and Schema
Ultra Search automatically creates a default Ultra Search instance based on the default
Ultra Search test user, so you can test Ultra Search functionality on the default
instance after installation.
Monitoring Ultra Search Components with Oracle Enterprise Manager
You can use Enterprise Manager's Grid Control to monitor Ultra Search components.
Using Grid Control, you can set up notification rules that send e-mail notifications
automatically whenever a schedule's status reaches a specified severity state. For more
information on using Grid Control to monitor Ultra Search components, see the
Oracle Enterprise Manager Concepts guide.
Crawler Recrawl Policy
You can update the recrawl policy to process documents that have changed or to
process all documents.
In previous releases, "process all documents" did not help when the crawling scope
had been narrowed. For example, if crawling depth was reduced from seven to five,
the PDF mimetype was deleted, or a host inclusion rule was removed, then you had to
remove the affected documents manually in a SQL*Plus session.
With this release, all crawled URLs are subject to crawler setting enforcement, not just
newly crawled URLs.
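The idea of enforcing current settings against every previously crawled URL can be sketched as follows. The CrawledDoc record and the two settings shown (maximum depth and allowed MIME types) are simplified, hypothetical stand-ins for the crawler's actual bookkeeping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawledDoc:
    url: str
    depth: int
    mime_type: str


def enforce_settings(crawled, max_depth, allowed_mime_types):
    """Split previously crawled documents into (keep, drop) lists.

    Every document is checked against the current settings, not only
    the documents discovered during the latest crawl.
    """
    keep, drop = [], []
    for doc in crawled:
        in_scope = doc.depth <= max_depth and doc.mime_type in allowed_mime_types
        (keep if in_scope else drop).append(doc)
    return keep, drop
```

For example, after the depth is reduced from seven to five and PDF is removed from the allowed types, a document at depth six and any PDF document both land in the drop list.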
See Also: "Crawler Page" on page 7-9
See Also: "Creating Web Sources" on page 7-16
See Also: "Configuring the Default Ultra Search Instance" on
page 3-7
See Also: "Editing Synchronization Schedules" on page 7-26
Federated Search
Traditionally, Ultra Search used centralized search to gather data on a regular basis and
update one index that cataloged all searchable data. This provided fast searching, but
it required the data source to be crawlable before it could be searched. Ultra
Search now also provides federated search, which lets a single search run against
multiple indexes. Each index can be maintained separately. Because the data source is
queried at search time, search results are always current. User credentials can be passed
to the data source and authenticated by the data source itself. Queries can be
processed efficiently using the data's native format.
To use federated search, you must deploy an Ultra Search search adapter, or searchlet,
and create an Oracle Database source. A searchlet is a Java module deployed in the
middle tier (inside OC4J) that searches the data in an enterprise information system on
behalf of a user. When a user's query is delegated to the searchlet, the searchlet runs
the query on behalf of the user. Every searchlet is a JCA 1.0 compliant resource
adapter.
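The general shape of a federated query, in which one search is delegated to several independent sources and the results are merged, can be sketched as follows. This is a conceptual illustration in Python, not the searchlet interface: the sources are modeled as plain callables, and the sketch assumes that relevance scores from different sources are comparable.

```python
def federated_search(query, sources, limit=10):
    """Run one query against several independent sources and merge the hits.

    sources: callables that take a query string and return a list of
    (score, title) tuples. Comparable scores across sources are assumed.
    """
    hits = []
    for search in sources:
        hits.extend(search(query))  # each source is queried at search time
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:limit]
```

Because each source answers the query itself, the merged list reflects the current state of every source rather than the state at the last crawl.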
See Also: "Federated Sources" on page 7-22
Ultra Search Release Information
Ultra Search is released with the Oracle Database, Oracle Application Server, and
Oracle Collaboration Suite. Because these products have used different release numbers
in the past, the Ultra Search release numbers are somewhat confusing.
■ Oracle Ultra Search release 9.0.4 is part of Oracle Application Server release 10g (9.0.4).
■ Oracle Ultra Search release 9.0.3 is part of the Oracle Collaboration Suite release
9.0.3.
■ Oracle Ultra Search release 9.2 is part of Oracle9i release 9.2. Oracle Ultra Search
release 1.0.3 was part of Oracle9i release 1 (9.0.1).
■ Oracle Ultra Search release 9.0.2 is part of Oracle9iAS release 2 (9.0.2).
1
Introduction to Oracle Ultra Search
This chapter contains the following topics:
■ Overview of Oracle Ultra Search
■ Ultra Search Components
■ Ultra Search Features
■ Ultra Search System Configuration
Overview of Oracle Ultra Search
Ultra Search is built on the Oracle Database and Oracle Text technology. It provides
uniform search-and-locate capabilities over multiple repositories: Oracle databases,
other ODBC-compliant databases, IMAP mail servers, HTML documents served by
a Web server, files on disk, and more.
Ultra Search uses a 'crawler' to collect documents. You can schedule the crawler to suit
the Web sites that you want to search. The documents stay in their own repositories,
and the crawled information is used to build an index that stays within your firewall
in a designated Oracle database. Ultra Search also provides APIs for building content
management solutions.
In addition, Ultra Search offers the following:
■ A complete text query language for text search inside the database
■ Full integration with the Oracle Database and the SQL query language
■ Advanced features like concept searching and theme analysis
■ Attribute mapping to facilitate attribute search across disparate repositories
■ Indexing of all popular file formats (150+)
■ Full globalization, including support for Chinese, Japanese and Korean (CJK), and
Unicode
Ultra Search Components
Ultra Search is made up of the following components:
■ Ultra Search Crawler
■ Ultra Search Backend
■ Ultra Search Administration Tool
■ Ultra Search APIs and Sample Applications