Nick Dimiduk
Amandeep Khurana
FOREWORD BY
Michael Stack
MANNING
HBase in Action
NICK DIMIDUK
AMANDEEP KHURANA
TECHNICAL EDITOR
MARK HENRY RYAN
MANNING
Shelter Island
For online information and ordering of this and other Manning books, please visit
www.manning.com. The publisher offers discounts on this book when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Email:
©2013 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by means electronic, mechanical, photocopying, or otherwise, without prior written
permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in the book, and Manning
Publications was aware of a trademark claim, the designations have been printed in initial caps
or all caps.
Recognizing the importance of preserving what has been written, it is Manning’s policy to have
the books we publish printed on acid-free paper, and we exert our best efforts to that end.
Recognizing also our responsibility to conserve the resources of our planet, Manning books are
printed on paper that is at least 15 percent recycled and processed without the use of elemental
chlorine.
Development editors: Renae Gregoire, Susanna Kline
Technical editor: Mark Henry Ryan
Technical proofreaders: Jerry Kuch, Kristine Kuch
Copyeditor: Tiffany Taylor
Proofreaders: Elizabeth Martin, Alyson Brener
Typesetter: Gordan Salinovic
Cover designer: Marija Tudor
ISBN 9781617290527
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – MAL – 17 16 15 14 13 12
brief contents
PART 1 HBASE FUNDAMENTALS ....................................................1
1 ■ Introducing HBase 3
2 ■ Getting started 21
3 ■ Distributed HBase, HDFS, and MapReduce 51
PART 2 ADVANCED CONCEPTS ......................................................83
4 ■ HBase table design 85
5 ■ Extending HBase with coprocessors 126
6 ■ Alternative HBase clients 143
PART 3 EXAMPLE APPLICATIONS .................................................179
7 ■ HBase by example: OpenTSDB 181
8 ■ Scaling GIS on HBase 203
PART 4 OPERATIONALIZING HBASE ............................................237
9 ■ Deploying HBase 239
10 ■ Operations 264
TableMapper 69
TableMapReduceUtil 69
TableOutputFormat 71
TableReducer 70
TaskTracker 61
telemetry 11–12
temporary directory 259
Text 55
TextOutputFormat 70
Thrift
API 156
client library 157–158
gateway deployment 157
interacting with tables 159–162
scripting shell from 156–162
service, launching 159
throughput
MapReduce. See MapReduce
parallel execution 53–55
serial execution 53
throughput vs. latency 52
time series 184–185
aggregation 201
data management 185
data, recording 185
hot spot 185
metadata auto-complete 199–201
reading 201–202
Time Series Database. See TSDB
Time To Live. See TTL
timestamp 19
TimestampFilter 123
tombstone record 30
top (Linux tool) 269
Trend Micro, HBase use 12
truncate command 292
TSDB 182
TSDB.addPoint() 198
TsdbQuery 201
TsdbQuery.createAndSetFilter() 201
INDEX
TsdbQuery.findSpans() 201
TsdbQuery.getScanner() 201
TsdbQuery.run() 201
TTL 101
disabling 115
tuning. See performance, tuning
TwitBase 22–48
U
uberhbck 293
UI, port 259
UID
creating 194
name auto-completion 199–201
UniqueId.getOrCreateId() 194
UniqueId.getSuggestScanner() 200
UniqueId.suggest() 200
UNIX, scripting shell from 144–147
upgrading 285–286
URL shorteners 14
user model, serving 14
user-interaction data, capturing 11
V
value 33
ValueFilter 46, 123
VerifyReplication 303
version 33
version information 32
versioning, doesn’t map to relational database 113
vm.swappiness 261
W
WAL 26, 28, 129
disabling 28
See also HLog
See also MemStore
WALObserver 129
web search, canonical problem 9
well-known text. See WKT
WhileMatchFilter 47
Whirr 249–250
within query
client side 228–231
filter 231–234
WKT 224
workload, tuning. See performance, tuning
WritableByteArrayComparable 121
write buffer 258
write path 26–28
write pattern, defining 89
write-ahead log. See WAL
Y
Yahoo! Cloud Serving Benchmark (YCSB) 275–276
Yahoo!, Hadoop development 8
Z
Z-order curve 216
zk_dump 288–289, 311
ZooKeeper 66–68, 311–312
hardware 246
quorum 257
quorum address 24
root znode 259
status 288–289
timeout 259
zookeeper.session.timeout 259
zookeeper.znode.parent 259
DATABASE
HBase IN ACTION
SEE INSERT
Dimiduk Khurana
HBase is a NoSQL storage system designed for fast, random
access to large volumes of data. It runs on commodity
hardware and scales smoothly from modest datasets to
billions of rows and millions of columns.
HBase in Action is an experience-driven guide that shows you
how to design, build, and run applications using HBase. First,
it introduces you to the fundamentals of handling big data.
Then, you’ll explore HBase with the help of real applications
and code samples and with just enough theory to back up the
practical techniques. You’ll take advantage of the MapReduce
processing framework and benefit from seeing HBase best
practices in action.
What’s Inside
● When and how to use HBase
● Practical examples
● Design patterns for scalable data systems
● Deployment, integration, design
Written for developers and architects familiar with data storage and processing. No prior knowledge of HBase, Hadoop, or
MapReduce is required.
Nick Dimiduk is a Data Architect with experience in social media
analytics, digital marketing, and GIS. Amandeep Khurana is a
Solutions Architect focused on building HBase-driven solutions.
To download their free eBook in PDF, ePub, and Kindle formats, owners
of this book should visit manning.com/HBaseinAction
“Timely, practical ... explains in plain language how to use HBase.”
—From the Foreword by Michael Stack, Chair of the Apache HBase Project Management Committee
“A difficult topic lucidly explained.”
—John Griffin, coauthor of Hibernate Search in Action
“Amusing tongue-in-cheek style that doesn’t detract from the substance.”
—Charles Pyle, APS Healthcare
“Learn how to think the HBase way.”
—Gianluca Righetto, Menttis
MANNING
$39.99 / Can $41.99 [INCLUDING eBOOK]