Solr 1.4 Enterprise Search Server - P8

B
batchSize 78
bf parameter 117
Blacklight Online Public Access Catalog.
See Blacklight OPAC, Ruby On Rails
integrations
Blacklight OPAC, Ruby On Rails
integrations
about 263
data, indexing 263-267
Boolean operators
AND 100
AND operator, combining with OR
operator 101
AND or && operator 101
NOT 100
NOT operator 101
OR 100
OR or || operator 101
bool element 92
boost functions
boosting 137, 138
r_event_date_earliest eld 138
boosting 70, 107
boost queries
boosting 134-137
bq parameter(s) 134
bucketFirstLetter 148
buildOnCommit 174
buildOnCommit, spellchecker option 174


buildOnOptimize, spellchecker option 174
C
caches
tuning 281
CapitalizationFilterFactory lter 63
CCK 252
Chainsaw
URL 204
characterEncoding, FileBasedSpellChecker
option 175
CharFilterFactory 62
CI 128
classname 173
CM 197
CMS 250
Co-ordination Factor.
See coord
collapse.facet, eld collapsing 192
collapse.eld, eld collapsing 192
collapse.info.doc, eld collapsing 193
collapse.maxdocs, eld collapsing 193
collapse.threshold, eld collapsing 193
collapse.type, eld collapsing 192
combined index 32
CommonsHttpSolrServer 235
complex systems, tuning
about 271
CPU usage 272
memory usage 272
scale deep 273

scale high 273
scale wide 273
system changes 272
components
about 111, 159
solrcong.xml 159
compressed, eld option 41
conguration les, Solr
<requestHandler> tag 25
solrcong.xml le 25
standard request handler 26
Conguration Management.
See CM
ConsoleHandler 204
Content Construction Kit 252
Content Management System.
See CMS
Continuous Integration. See CI
coord 112
copyField directive
about 46
uses 46
CoreDescriptor classes 231
core, managing 209, 210
count, Stats component 189
CPU usage 272
cron 289
CSV, sending to Solr
about 72
conguration options 73, 74

curl
using, to interact with Solr 66, 68
This material is copyright and is licensed for the sole use by William Anderson on 26th August 2009
4310 E Conway Dr. NW, , Atlanta, , 30327
D
data, indexing
stream.body parameter 67
stream.le parameter 67
stream.url parameter 67
through HTTP POST 67
ways 67
database
and Lucene search index, differences 9, 10
DataImportHandler.
See DIH
dataSource attribute 78
date element 93
date facet, parameters
facet.date 151
facet.date.end 151
facet.date.gap 151
facet.date.hardend 151
facet.date.other 152
facet.date.start 151
dates, Faceting 146
debugQuery, diagnostic parameter
about 98
explainOther 98
defaults 111

defaultSearchField, schema.xml settings 47
defType, query parameter 95
defType parameter 128
deleteById() 232
deleteByQuery() 232
denormalizing
one to many associated data 36, 37
one to one associated data 36
deployment process, Solr 197, 198
df, query parameter 95
diagnostic query parameters
debugQuery 98
echoHandler 98
echoParams 98
indent 98
dictionary
about 169
building, from source 176, 177
DIH
about 74, 236
capabilities 74
dataSource attribute 78
development console 76, 77
documents, entities 78
entity 78
getting started 75
mb-dih-artists-jdbc.xml le 75, 76
query attribute 78
reference document, URL 74
Solr, registering with 75

solrcong.xml 75
DIH, development console
DataSources, JdbcDataSource type 77, 78
DIH control form 77
documents, entities 79
fields 79
importing with 80
DIH, transformers
dateTimeFormat attributes 79
splitBy attributes 79
template attributes 79
DIH elds
column attribute 79
name attribute 79
directory structure, Solr
build 13
client 13
dist 13
example 14
example/etc 14
example/multicore 14
example/solr 14
example/webapps 14
lib 14
site 14
src 14
src/java 14
src/scripts 14
src/solrj 14

src/test 14
src/webapp 14
Disjunction-Max.
See dismax
DisjunctionMaxQuery
about 130
boosts, conguring 131
queried elds, conguring 131
dismax 113
dismax handler. See Dismax Solr request
handler
dismax query handler 131
dismax request handler 128
Dismax Solr request handler
about 128
automatic phrase boosting 132, 133
boost functions, boosting 137, 138
boost queries, boosting 134-137
debugQuery option used 129
default search 140, 141
DisjunctionMaxQuery 130
features, over standard handler 129
limited query syntax 131
min-should-match 138
mm query parameter 138
phrase slop, conguring 134
distanceMeasure, spellchecker option 174

distributed search 32
div(x,y), mathematical primitives 121
doc element 93
docText eld data 233
document
deleting 70
documentCache 281
Domain Specic Language.
See DSL
double element 92
DoubleMetaphone, phonetic encoding
algorithms 58
DoubleMetaphoneFilterFactory analysis
filter, options
inject 59
maxCodeLength 59
Drupal, options
Apache Solr Search integration module 251
Solr, hosted by Acquia 252
DSL 269
dynamic elds
* fallback 46
about 45
E
echoHandler, diagnostic parameter 98
echoParams 152
echoParams, diagnostic parameter 98
EdgeNGram analyzer 61
EdgeNGramFilterFactory 61
EdgeNGramTokenizerFactory 61

Elasticfox 276
Embedded-Solr 65
embedded Solr
legacy Lucene, upgrading from 237
using for rich clients 237
using in in-process streaming 236, 237
EmbeddedSolrServer class 224
encoder attribute 59
EnglishPorterFilterFactory, stemming 54
Entity tags 279
ETag 279
ETL 78
eval() function 238
existence (and non-existence) queries 107
explicit mapping 56
Extract Transform and Load.
See ETL
extraParams entry 242
F
facet 146
facet.date 151, 286
examples 151
facet.date.end 151
facet.date.gap 151
facet.date.hardend 151
facet.date.other 152
facet.date.start 151
facet.eld 147
facet.limit 147
facet.method 148

facet.mincount 147
facet.missing 148
facet.missing parameter 143
facet.offset 147
facet.prex 148, 156
facet.query 286
facet.query parameter 152, 153
facet.sort 147
facet_counts 143
faceted navigation 7, 141, 145, 153
faceted search 149, 220, 221
faceting
about 141
alphabetic range bucketing (A-C, D-F, and
so on) 148, 149
date facet parameters 151, 152
dates 146, 149, 150
example 142, 143
facet.eld 147
facet.limit 147
facet.method 148
facet.mincount 147
facet.missing 148
facet.missing parameter 143
facet.offset 147
facet.prex 148
facet.sort 147

facet_counts 143
facet prexing (term suggest) 156-158
eld, requisites 146
eld values (text) 146
lters, excluding 153-155
Local Params 155
on arbitrary parameters 152, 153
queries 146
release types, example 142, 143
schema changes, MusicBrainz example 144,
145
text 147
types 146
faceting, dates
about 149
examples 150
Facet prexing 156
Familiarity
URL 204
FastLRUCache 280
fetchSize 78
field, attributes
default (optional) 42
name 42
required (optional) 42
type 42
field, IndexBasedSpellChecker option 174
field collapsing, search components
about 191, 192
collapse.facet 192

collapse.eld 192
collapse.info.count 193
collapse.info.doc 193
collapse.maxdocs 193
collapse.threshold 193
collapse.type 192
conguring 192, 193
SOLR-236 191
eld denitons, schema.xml le
attributes 42
copyField, using 46
copyField directive, using 46
default (optional) 42
dynamic elds 45
name 42
required (optional) 42
schema.xml, settings 47
sorting 44
sorting, limitations 44, 45
type 42
field length.
See fieldNorm
field list. See fl
fieldNorm 112
field options, schema.xml file
compressed 41
indexed 41
multiValued 41
omitNorms (advanced) 41
positionIncrementGap (advanced) 42

sortMissingFirst 41
sortMissingLast 41
stored 41
termVectors (advanced) 41
eld qualier 102, 103
eld references, function queries 120
eldType, spellchecker option 174
eld types, schema.xml le
<elds/> tag 40
<types/> tag 40
class attribute 40
eld values (text), Faceting 146
le, spellchecker 172
FileBasedSpellChecker options
characterEncoding 175
sourceLocation 175
FileHandler logging 204
filterCache 280
filter element 50
filtering 108, 109
filters, Faceting
excluding 153, 155
first-components 111
fl 220
fl, output related parameter 96
float element 92
fq, query parameter 95

function argument
limitations 120
function queries
_val_ pseudo-eld hack 117
about 117
bf parameter 117
Daydreaming search example 119
example 118
field references 120
function references 120
incorporating, to searches 117
t_trm_lookups 118
function query, tips 128
function references
mathematical primitives 121
function references, function queries 120
G
q, query parameter 95
q.op, query parameter 95
generic XML data structure
about 92
appends 111
arr, XML element 92
bool element 92
components 111
date element 93
defaults 111
double element 92
first-components 111
float element 92

int element 92
invariants 111
last-components 111
long element 92
lst, XML element 92
str element 92
Git
URL 11
H
Hadoop 225
HathiTrust 273
Heritrix
using, to download artist pages 226, 227
highlighted eld list.
See hl.
highlighting component, search
components
about 161
conguring 163
example 161, 163
hl 164
hl. 164
hl.fragsize 164
hl.highlightMultiTerm 164
hl.mergeContiguous 165
hl.requireFieldMatch 164
hl.snippets 164
hl.usePhraseHighlighter 164
hl.alternateField 165
hl.formatter 165
hl.fragmenter 165
hl.maxAnalyzedChars 165
parameters 164
hl, highlighting component 164
hl. 161
hl., highlighting component 164
hl.fragsize, highlighting component 164
hl.highlightMultiTerm, highlighting
component 164
hl.increment, regex fragmenter 166
hl.mergeContiguous, highlighting
component 165
hl.regex.maxAnalyzedChars, regex
fragmenter 166
hl.regex.pattern, regex fragmenter 166
hl.regex.slop, regex fragmenter 166
hl.requireFieldMatch, highlighting
component 164
hl.snippets, highlighting component 164
hl.usePhraseHighlighter, highlighting
component 164
hl.alternateField, highlighting component
165
hl.formatter, highlighting component
about 165
hl.simple.pre and hl.simple.post 165
hl.fragmenter, highlighting component 165
hl.maxAlternateFieldLength, highlighting
component 165
hl.maxAnalyzedChars, highlighting
component 165
home directory, Solr
bin 15
conf 15
conf/schema.xml 15
conf/solrcong.xml 15
conf/xslt 15
data 15
lib 15
HTML, indexing in Solr 227
HTMLStripStandardTokenizerFactory 52
HTMLStripStandardTokenizerFactory
tokenizer 227
HTMLStripWhitespaceTokenizerFactory 52
HTTP caching 277-279
HTTP server request access logs, logging
about 201, 202
log directory, creating 201
Tailing 202
I
IDF 33
idf 112
ID eld 44
indent, diagnostic parameter 98
index 31
index-time
and query-time, boosting 113

versus query-time 57
index-time boosting 70
IndexBasedSpellChecker options
field 174
sourceLocation 174
thresholdTokenFrequency 175
index data
document access, controlling 221
securing 220
indexed, eld option 41
indexed, schema design 282
indexes
sharding 295
indexing strategies
about 283
factors, committing 285
factors, optimizing 285
unique document checking, disabling 285
Index Searchers 280
Information Retrieval. See IR
int element 92
InternetArchive 226
invariants 111
Inverse Document Frequency. See IDF
inverse reciprocals 125
IR 8
ISOLatin1AccentFilterFactory filter 62
issue tracker, Solr 27

J
J2SE
with JConsole 212
JARmageddon 205
jarowinkler, spellchecker 172
java.util.logging package 203
Java class names
abbreviated 40
org.apache.solr.schema.BoolField 40
Java Development Kit (JDK)
URL 11
JavaDoc tags 234
Java Management Extensions. See JMX
Java Naming and Directory Interface. See JNDI
Java replication
versus script 289
JavaScript Object Notation. See JSON
Java Server Pages. See JSPs
JConsole GUI
about 212
URL 212
JDK [1.4] logging 203
JDK logging 203
Jetty
startup integration 205

web.xml, customizing 218
jetty.xml 201
JIRB tool 215
JMX
about 212
access, controlling 220
information extracting, JRuby used 215
Solr, starting with 212-215
Jmx4r 217
JMX Console 212
JNDI 16, 200
JNDI name 200
jQuery 240
jQuery Autocomplete widget 241, 242
JRuby
using, to extract JMX information 215
JRuby Interactive Browser tool. See JIRB
tool
JSON 238
JSONP 242
JSON with Padding. See JSONP
JSPs 17
JUL 203
JVM
conguration 277

K
KeepWordFilterFactory lter 62
KeywordTokenizerFactory 52
KStem, stemming 55
L
last-components 111
LengthFilterFactory 145
LengthFilterFactory lter 62
LetterTokenizerFactory 52
limited query syntax 131
disabling 132
linear(x,m,c), miscellaneous math 122
Local Params 155
LocalSolr component 194
log(x), mathematical primitives 121
Log4j
conguring, URL 205
logging to 204
Log4j JAR le
URL 204
logarithms 123, 124
Logback
URL 204
logging
about 201
HTTP server request access logs 201, 202
levels, managing at runtime 205, 206
Solr application logging 203
types 201
logging.properties le 204

long element 92
LowerCaseFilterFactory lter 62
LRUCache 280
lst, XML element 92
Lucene
about 8
DisjunctionMaxQuery 130
features 8
scoring 112
Lucene’s query syntax
URL 44
LUCENE-1435 45
Lucene search index
and database, differences 9, 10
Lucene syntax
query expression 100
query syntax 99
sub-expressions 101
M
mailing lists, Solr
URL 26
Managed Bean. See MBeans
mandatory clause, expression query 100
map() function 243
map(x,min,max,target), miscellaneous math
121
master server
indexing into 292
mathematical primitives, function

references
abs(x) 121
div(x,y) 121
log(x) 121
pow(x,y) 121
product(x,y,z,...) 121
sqrt(x) 121
sum(x,y,z,...) 121
Maven 228
max(x,c), miscellaneous math 121
max, Stats component 189
maxGramSize 60
maxScore 93
maxWarmingSearchers 284
mb-dih-artists-jdbc.xml le 75, 76
mb_attributes.txt
content 145
MBeans 212
mean, Stats component 189
member_id eld 36
memory usage 272
Metaphone, phonetic encoding algorithms
58
min, Stats component 189
min-should-match
about 138
basic rules 139

multiple rules 139
rules 139
rules, choosing 140
minGramSize 60
miscellaneous math, function references
linear(x,m,c) 122
map(x,min,max,target) 121
max(x,c) 121
recip(x,m,a,c) 122
scale(x,minTarget,maxTarget) 121
missing, Stats component 189
MLT, search components
as dedicated request handler 182
as request handler, with external input
document 183
as Solr component 182
conguration parameters 183
mlt 183
mlt.boost 186
mlt.count 183
mlt. 185
mlt.maxntp 186
mlt.maxqt 186
mlt.maxwl 185
mlt.mindf 185
mlt.mintf 185
mlt.minwl 185
mlt.qf 185
parameters 185, 186
parameters, specic to MLT request handler

184
results, example 186, 188
specic parameters 183
using, ways 182
mlt.boost 186
mlt. 185
mlt.maxntp 186
mlt.maxqt 186
mlt.maxwl 185
mlt.mindf 185
mlt.mintf 185
mlt.minwl 185
mlt.qf 185
mm query parameter 138
mm specication formats
as examples 139
more-like-this search component. See MLT,
search components
more like this plugin 9
multi-word synonyms 56
multicore
need for 210, 211
multiple indices 32
multiple Solr servers
documents, assigning to shards 296
indexes, sharding 295
master server, indexing into 292
replication, conguring 291
script versus Java replication 289

searches, distributing 291
search queries, distributing across slaves
293, 294
shards, searching across 297, 298
slaves, conguring 292, 293
starting 290, 291
multiValued, eld option 41
multiValued eld 221
MusicBrainz.org 30, 31
N
n-gramming costs
Edge n-gramming costs 62
tokenizer based n-gramming costs 62
N-gramming costs, substring indexing
a_name eld 61
a_name eld + a_ngram eld 61
minGramSize 62
name 173
name attribute 143
name eld 33
newSearch query 284
NOT operator 100, 101
numFound 93
Nutch 225
Nutch + Web Archive eXtensions. See NutchWAX

NutchWAX 225
O
OLTP 78
omitNorms (advanced), eld option 41
omitNorms, schema design 282
omitTermFreqAndPositions, schema design
282
Online Transaction Processing systems. See OLTP
optional clause, expression query 100
ord() function 120, 122
ord(eldReference) 122
ord/rord 122
ord and rord, function references
ord(eldReference) 122
rord(eldReference) 122
OR operator 100
OR or || operator 101
output related parameters, query parameters
 96
sort 96
version 98
wt 97
outputUnigrams controls 288
P
parse
parameter 243
parse() function 244
partial indexing. See substring indexing
PatternReplaceFilterFactory filter 63
PatternTokenizerFactory 53
pf, tips 134
pf parameter 133
phoneme 58
phonetic encoding algorithms
DoubleMetaphone 58
encoder attribute 59
Metaphone 58
RenedSoundex 58
Soundex 58
PhoneticFilterFactory lter 59
phonetic sounds-like
about 58
phonetic encoding algorithms 58
phrase queries 103
phrase search performance
improving 287
shingling, solution 287, 288
phrase slop
conguring 134
Plain Old Java Objects. S
ee POJOs
POJOs
indexing 234
PorterStemFilterFactory, stemming 54
positionIncrementGap (advanced), eld
option 42
pow(x,y), mathematical primitives 121

product(x,y,z,...), mathematical primitives
121
prohibited clause, expression query 100
PRONOM Unique Identier. S
ee PUID
public searches
securing 219, 220
PUID 31
Q
q parameter
processing 175
qt, miscellaneous parameter 95
QTime 93
queries, Faceting 146
query-time
and index-time, boosting 113
versus index-time 57
query-time boosting 70
query attribute 78
query converter 175
query elevation, search components
about 166
cong-le 167, 168
conguration parameters 167
conguring 167
elevateArtists.xml 168
forceElevation 168

queryFieldType 168
query expression, clauses
mandatory clause 100
optional clause 100
prohibited clause 100
query parameters
about 95
defType 95
df 95
diagnostic 98
fq 95
output related parameters 96
q 95
q.op 95
qt 95
result paging 96
rows 96
start 96
query parser plugin 128
QueryResponse object 235
queryResultCache 280
query spell checker
indexed content based 8, 9
query syntax
about 99
boosting 107
documents, matching 99
existence (and non-existence) queries 107
eld qualier 102, 103
fuzzy queries 105

phrase queries 103
query expression, clauses 100
special characters 108
sub-expressions 101
term proximity 103
wildcard queries 103, 104
R
r_a_name 42
r_attributes 144
r_event_date_earliest eld 138
r_name_facetLetter 148
r_ofcial 144
r_type 144
range queries
[ and ] brackets 106
{ and } brackets 106
about 105, 106
date math 106, 107
readOnly 77
recip(x,m,a,c), miscellaneous math 122
reciprocals and rord, with dates 126, 127
RecordItem 234
RenedSoundex, phonetic encoding
algorithms 58
regex fragmenter, options
hl.increment 166
hl.regex.pattern 166
hl.regex.slop 166
hl.regex.maxAnalyzedChars 166
release’s artist’s name. See r_a_name
remote streaming
about 68, 221
disabling 69
enabling 69
remote streaming feature 224
RemoveDuplicatesTokenFilterFactory lter
62
renderResult() method 247
replication
and sharding, combining 298-300
conguring 291
requestHandler 207
request handler
about 110
conguration, creating 110
conguring 110
result() function 243, 244
right eld type/analysis, using 109
rOfcial 144
rord() 122
rord(eldReference) 122
rows parameter 96, 242
rsolr
versus solr-ruby 269
Ruby On Rails integrations
acts_as_solr 254-259
acts_as_solr plugin 253
Blacklight OPAC 263
Convention over Conguration 253

display, customizing 267
fields display, customizing 268, 269
solr-ruby versus rsolr 269
solr_data 257
S
scale() function
example 123
inverse reciprocals, using 124, 125
logarithms, using 123, 124
reciprocals and rord with dates, using
126, 127
scale(x,minTarget,maxTarget),
miscellaneous math 121
scale deep 298
scale high 276
scale wide 289
schema, Solr
<copyField> tag 25
<fields> tag 25
<types> tag 25
primary key 25
text, eld name 25
schema.xml, settings
defaultSearchField 47
solrcong.xml 47
solrQueryParser 47
uniqueKey 47

schema.xml le
<elds/> tag 40
<types/> tag 40
eld denitions 42, 43
eld options 40
eld types 40
sample 45
schema design
about 34
compressed eld option 282
data, denormalizing 36
entities returned from search, determining
35
inclusion of elds used in search results,
omitting 38, 39
indexed 282
omitNorms 282
omitTermFreqAndPositions 282
one to many associated data, denormalizing
36, 37
one to one associated data, denormalizing
36
Solr powered search, determining 35
stored 282
score boosting. See boosting
scoring
about 112
co-ordination factor (coord) 112
factors 112

eld length (eldNorm) 112
Inverse Document Frequency (idf) 112
query-time and index-time, boosting 113
term frequency (tf) 112
troubleshooting 113, 114
script
versus Java replication 289
search, distributing across slaves
about 291
master server, indexing into 292
slaves, conguring 292, 293
search components
about 161
field collapsing 191, 192
highlighting component 161
MLT (more-like-this) 182
query elevation 166
spellcheck 169
Stats component 189
terms component 194
termVector component 194
search engine 161, 223, 237, 266, 272
searcher.num_docs attribute 216
SearchHandler
per search interface 207
search handler 128
searching 89, 90
server access
limiting 217, 219
Servlet container

and Solr, differences 199
installing in 199
solr.home property, dening 199
sharding
and replication, combining 298-300
documents, assigning 296
indexes 295, 296
searching across 297, 298
ShingleFilterFactory 288
shingling 133, 127, 287
Simple Java interface. See SolrJ
Simple Logging Facade for Java package.
See SLF4J package
single combined index
issues 34
schema.xml snippet, sample 32
using, issues 33
single Solr server
optimizing 276
single Solr server, optimizing
faceting performance, enhancing 286
HTTP caching 277-279
indexing strategies 283, 284
JVM conguration 277
phrase search performance, improving 287
schema design considerations 282

Solr caching 280, 281
term vectors, using 286, 287
tuning caches 281
slaves
conguring 292
search queries, distributing across slaves
293, 294
SLF4j 20
SLF4J package 203
SnowballPorterFilterFactory, stemming 54
Solr
about 7, 10
and Servlet container, differences 199
building 13
communicating with 65
complex systems, tuning 271, 272
conguration les 25, 26
cores, managing 209, 210
CSV, sending to 72
deploying 17
deployment process 197, 198
directory structure 13
disjunction-max query handler 9
Faceting 141
features 8, 9
filtering 108, 109
function query, incorporating to searches
117
generic XML data structure 92
home directory 15

interacting with, curl used 66, 68
issue tracker 27
local le accessing, example 68
logging 201
mailing list 26
ofcial site, URL 11
powered artists building, autocomplete
widget with jQuery used 240, 241,
242
powered artists building, autocomplete
widget with JSONP used 243
prerequisites 11
query parameters 95
query syntax 99
remote streaming 68, 69
request handlers 110
resources 26
running 17-19
sample data, loading 20, 21
schema 25
search request handler 128
securing 217
simple query, running 22-24
solr.solr.home, searching for 16
sorting 109
spell check plugin 9
starting 15, 16
starting, with JMX 212-215
statistics page 24
system changes 272

testing 13
tools 58
XML, sending to 69, 70
XML response format 93
Solr’s DIH DataImportHandler contrib
add-on 66
Solr’s Wiki 26
Solr, accessing from PHP applications
about 247, 248
Drupal, options 250
solr-php-client 248-250
Solr, communicating with
convenient client API 65
data formats 66
data streamed remotely 66
Direct HTTP 65
Solr’s filesystem 66
Solr, data formats
rich documents 66
Solr-binary 66
Solr-XML 66
Solr, examples
structure 223
summary 224
Solr, lters
CapitalizationFilterFactory 63
CharFilterFactory 62

ISOLatin1AccentFilterFactory 62
KeepWordFilterFactory 62
LengthFilterFactory 62
LowerCaseFilterFactory 62
PatternReplaceFilterFactory 63
RemoveDuplicatesTokenFilterFactory 62
StandardFilterFactory 62
write your own 63
Solr, integrating
JavaScript used 238, 239
Solr, prerequisites
Apache ant 11
Java Development Kit (JDK) 11
Subversion or Git 11
Solr, securing
document access, controlling 221
index data, securing 220
JMX access, controlling 220
server access, limiting 217, 219, 220
SOLR-236 191
solr-balancer 294
Solr-binary 66
solr-php-client
a_member_name array 249
about 248, 249, 250
Apache_Solr_Service, conguration 249
solr-ruby
versus rsolr 269
Solr-XML 66
solr.body feature 68

solr.home property
dening 199
JNDI (Java Naming and Directory Interface)
200
solr.war le 200
solr.setParser(new XMLResponseParser())
235
solr.solr.home
searching for 16
solr.TextField 48
Solr 1.3 11
Solr 1.4 11
Solr admin
Assistance area 20
example 19
Make a Query text box 20
navigation menu 19
Solr application logging, logging 203
Jetty, startup integration 205
Log4j, logging to 204
logging output, conguring 203
log levels, managing at runtime 205, 206
solrbook-packtpub 273
Solr caching
autowarmCount 281
class 281
conguring 281
documentCache 281
lterCache 280
queryResultCache 280

size 281
Solr cell
binary content, extracting 81, 82
documents, indexing with 81
karaoke lyrics, extracting 83-85
richer documents, indexing 85-87
Solr, conguring 83
Solr cores
cores, managing 209, 210
multicore, need for 210, 211
solr.xml, conguring 208, 209
solrcong.xml
<requestHandler /> elements 159
about 75
solrcong.xml, schema.xml settings 47
Solr DIH Wiki page
URL 79
SolrDocumentList object 235
SolrDocument object 235
Solr home 16
SolrIndexSearch Mbean 214
SolrJ
about 65, 224
client API 230-233
CommonsHttpSolrServer 224
embedded Solr, need for 235, 236
EmbeddedSolrServer class 224

Heritrix using, to download artist pages
226, 227
HTML, indexing 227-230
HTMLStripStandardTokenizerFactory
tokenizer 227
POJOs, indexing 234, 235
stream.le parameter 224
Solr JIRA
URL 12
SolrJS
about 245, 246
addWidget() method 247
project homepage, URL 245
SolrJS Manager object 247
URL 220
Solrmarc 236
SolrQuery object 235
solrQueryParser, schema.xml settings 47
Solr resources
about 26
issue tracker 27
mailing lists 26
Solr’s Wiki 26
Solr search components
LocalSolr component 194
terms component 194
termVector component 194
sort, output related parameter 97
sorting
about 44, 109

limitations 44
string type 45
title_sort type 45
sortMissingFirst, eld option 41
sortMissingLast, eld option 41
Soundex, phonetic encoding algorithms 58
sourceLocation, FileBasedSpellChecker
option 175
sourceLocation, IndexBasedSpellChecker
option 174
spellcheck 177
spellcheck, search components
a_spell, spellchecker 172
a_spellPhrase, spellchecker 172
about 169
alternative approach 180, 182
classname 173
dictionary, building from source 176
file, spellchecker 172
FileBasedSpellChecker options 175
IndexBasedSpellChecker options 174
indexed content 169
jarowinkler, spellchecker 172
misspelled query, example 178, 180
name 173
q parameter, processing 175
requests, issuing 177, 178
schema conguration 169-171
solrcong.xml, conguration in 171, 172
Solr conguring, ways 169

spellcheck.q parameter, processing 176
spellchecker, index and le based 173
spellcheckers (dictionaries), conguring
173
spellcheckIndexDir 173
text le of words 169
spellcheck.collate 178
spellcheck.count 177
spellcheck.dictionary 177
spellcheck.extendedResults 178
spellcheck.onlyMorePopular 178
spellcheck.q 177
spellcheck.q parameter
processing 176
spellchecker, index and le based
accuracy 174
buildOnCommit 174
buildOnOptimize 174
classname 173
distanceMeasure 174
fieldType 174
name 173
spellcheckIndexDir 173
spellcheckIndexDir 173
spell check plugin 9
Splunk 205
sqrt(x), mathematical primitives 121
Squid
URL 279
standard component list 160

StandardFilterFactory lter 62
StandardTokenizerFactory 52
start 93
startEmbeddedSolr() 234
start parameter 96
stats, Stats component 189
stats.facet, Stats component 190
stats.eld, Stats component 189
Stats component, search components
about 189
conguring 189
count 189
max 189
mean 189
min 189
missing 189
statistics, for track durations 190
stats 189
stats.facet 190
stats.eld 189
stddev 189
sum 189
sumOfSquares 189
status 93
stddev, Stats component 189
stemming
about 54

EnglishPorterFilterFactory 54
implementations 54
KStem 55
PorterStemFilterFactory 54
SnowballPorterFilterFactory 54
StopFilterFactory 186
used, for stop words ltering 57
stop words
ltering, StopFilterFactory used 57
stored, eld option 41
stored, schema design 282
stream.body parameter 67
stream.le parameter 67, 224
stream.url parameter 67
StreamingUpdateSolrServer 284
str element 92
string type 45
sub-expressions
about 101
prohibited clause, limitations 102
substring indexing
about 60
analyzer conguration, n-grams used 60
EdgeNGramFilterFactory 61
EdgeNGramTokenizerFactory 61
n-gramming costs 61
NGramFilterFactory, conguring with min-
GramSize of 2 60
NGramFilterFactory, conguring with min-
GramSize of 5 60

Subversion
URL 11
sum(x,y,z,...), mathematical primitives 121
sum, Stats component 189
sumOfSquares, Stats component 189
synonyms
=> 56
about 55
ignoreCase, setting true 56
index-time versus query-time 57
WordNet, thesaurus 55
T
t_duration 152
t_shingle 288
t_trm_lookups 118
Tailing 202
term-suggest 141, 156
term frequency. See tf
term proximity 103
terms component 194
termVector component 194
termVectors 186
term vectors 286, 287
termVectors (advanced), eld option 41
text analysis
about 47
experimenting with 50, 51
highlight matches 51
index box 51

multi-word synonyms 56
n-gram 60
n-gramming costs 61, 62
partial indexing 60
phonetic sounds-like 58
query box 51
stemming 54, 55
stop words 58
substring indexing 60
synonyms 55
term text 51
text eld type 50
text eld type denition, conguration 48
text eld type denition, conguring 49
tokenizer 52
verbose output 51
WordDelimiter analyzer 53
WordDelimiterFilterFactory 53
WordDelimiterFilterFactory 54
text field type 50
tf 112
threaded_test.rb script 283, 284
thresholdTokenFrequency,
IndexBasedSpellChecker option 175
title_sort type 45
tokenizer
about 50

HTMLStripStandardTokenizerFactory 52
HTMLStripWhitespaceTokenizerFactory 52
KeywordTokenizerFactory 52
LetterTokenizerFactory 52
PatternTokenizerFactory 53
StandardTokenizerFactory 52
WhitespaceTokenizerFactory 52
Tomcat 199
TPS 272
track_PUID eld 33
Transactions Per Second. See TPS
U
uniqueKey, schema.xml settings 47
uniqueKey eld 232, 233
V
version, output related parameter 98
Vigilog
URL 204
W
WAR 199
web.xml
customizing, in Jetty 218
Web application archive. See WAR
WebTrends 202
WhitespaceTokenizerFactory 52
wildcard queries
about 103, 104
fuzzy queries 105

WordDelimiterFilterFactory 51
WordDelimiterFilterFactory,
tokenizer action 50
WordDelimiter analyzer
splitting, ways 53, 54
tokenizing, ways 53, 54
WordDelimiterFilterFactory 53
WordNet thesaurus 55
write your own filter 63
wt, output related parameter 97
X
XML, sending to Solr
about 69, 70
changes, committing 71
commit and optimize 71
documents, deleting 70
rollback command 71
uncommitted changes, withdrawing 71
XML response format
<lst name="responseHeader"> 93
<result name="response" numFound="1002272" start="0" maxScore="1.0"> 93
about 93
maxScore 93
numFound 93
QTime 93
start 93
status 93
URL, parsing 94

Y
y, argument 120
Z
zip format 292
Thank you for buying
Solr 1.4 Enterprise Search Server
Packt Open Source Project Royalties
When we sell a book written on an Open Source project, we pay a royalty directly to that
project. Therefore by purchasing Solr 1.4 Enterprise Search Server, Packt will have given some
of the money received to the Apache Solr project.
In the long term, we see ourselves and you—customers and readers of our books—as part of
the Open Source ecosystem, providing sustainable revenue for the projects we publish on.
Our aim at Packt is to establish publishing royalties as an essential part of the service and
support a business model that sustains Open Source.
If you're working with an Open Source project that you would like us to publish on, and
subsequently pay royalties to, please get in touch with us.
Writing for Packt
We welcome all inquiries from people who are interested in authoring. Book proposals
should be sent to If your book idea is still at an early stage and you
would like to discuss it first before writing a formal book proposal, contact us; one of our
commissioning editors will get in touch with you.
We're not just looking for published authors; if you have strong technical skills but no writing
experience, our experienced editors can help you develop a writing career, or simply get some
additional reward for your expertise.
About Packt Publishing
Packt, pronounced 'packed', published its first book "Mastering phpMyAdmin for Effective
MySQL Management" in April 2004 and subsequently continued to specialize in publishing
highly focused books on specific technologies and solutions.

Our books and publications share the experiences of your fellow IT professionals in adapting
and customizing today's systems, applications, and frameworks. Our solution-based books
give you the knowledge and power to customize the software and technologies you're using
to get the job done. Packt books are more specific and less general than the IT books you have
seen in the past. Our unique business model allows us to bring you more focused information,
giving you more of what you need to know, and less of what you don't.
Packt is a modern, yet unique publishing company, which focuses on producing quality,
cutting-edge books for communities of developers, administrators, and newbies alike. For
more information, please visit our website:
www.PacktPub.com.
JasperReports for Java
Developers
ISBN: 1-904811-90-6 Paperback: 344 pages
Create, Design, Format, and Export Reports with the
world's most popular Java reporting library
1. Get started with JasperReports, and develop the
skills to get the most from it
2. Create, design, format, and export reports
3. Generate report data from a wide range of
datasources
4. Integrate Jasper Reports with Spring,
Hibernate, Java Server Faces, or Struts
JBoss Portal Server Development
ISBN: 978-1-847194-10-7 Paperback: 276 pages
Create dynamic, feature-rich, and robust enterprise
portal applications
1. Complete guide with examples for building
enterprise portal applications using the
free, open-source standards-based JBoss
portal server
2. Quickly build portal applications such as B2B
web sites or corporate intranets
3. Practical approach to understanding concepts
such as personalization, single sign-on,
integration with web technologies, and
content management
Please check www.PacktPub.com for information on our titles
