Joe Celko's SQL for Smarties - Advanced SQL Programming, P3


CONTENTS



25 Arrays in SQL

25.1 Arrays via Named Columns
25.2 Arrays via Subscript Columns
25.3 Matrix Operations in SQL
25.3.1 Matrix Equality
25.3.2 Matrix Addition
25.3.3 Matrix Multiplication
25.3.4 Other Matrix Operations
25.4 Flattening a Table into an Array
25.5 Comparing Arrays in Table Format

26 Set Operations

26.1 UNION and UNION ALL
26.1.1 Order of Execution
26.1.2 Mixed UNION and UNION ALL Operators
26.1.3 UNION of Columns from the Same Table
26.2 INTERSECT and EXCEPT
26.2.1 INTERSECT and EXCEPT without NULLs and Duplicates
26.2.2 INTERSECT and EXCEPT with NULLs and Duplicates
26.3 A Note on ALL and SELECT DISTINCT
26.4 Equality and Proper Subsets

27 Subsets

27.1 Every nth Item in a Table
27.2 Picking Random Rows from a Table
27.3 The CONTAINS Operators
27.3.1 Proper Subset Operators
27.3.2 Table Equality
27.4 Picking a Representative Subset

28 Trees and Hierarchies in SQL

28.1 Adjacency List Model
28.1.1 Complex Constraints
28.1.2 Procedural Traversal for Queries
28.1.3 Altering the Table
28.2 The Path Enumeration Model
28.2.1 Finding Subtrees and Nodes
28.2.2 Finding Levels and Subordinates
28.2.3 Deleting Nodes and Subtrees
28.2.4 Integrity Constraints
28.3 Nested Set Model of Hierarchies
28.3.1 The Counting Property
28.3.2 The Containment Property
28.3.3 Subordinates
28.3.4 Hierarchical Aggregations
28.3.5 Deleting Nodes and Subtrees
28.3.6 Converting Adjacency List to Nested Set Model
28.4 Other Models for Trees and Hierarchies

29 Temporal Queries

29.1 Temporal Math
29.2 Personal Calendars
29.3 Time Series
29.3.1 Gaps in a Time Series
29.3.2 Continuous Time Periods
29.3.3 Missing Times in Contiguous Events
29.3.4 Locating Dates
29.3.5 Temporal Starting and Ending Points
29.3.6 Average Wait Times
29.4 Julian Dates
29.5 Date and Time Extraction Functions
29.6 Other Temporal Functions
29.7 Weeks
29.7.1 Sorting by Weekday Names
29.8 Modeling Time in Tables
29.8.1 Using Duration Pairs
29.9 Calendar Auxiliary Table
29.10 Problems with the Year 2000
29.10.1 The Zeros
29.10.2 Leap Year
29.10.3 The Millennium
29.10.4 Weird Dates in Legacy Data
29.10.5 The Aftermath

30 Graphs in SQL

30.1 Basic Graph Characteristics
30.1.1 All Nodes in the Graph
30.1.2 Path Endpoints
30.1.3 Reachable Nodes
30.1.4 Edges
30.1.5 Indegree and Outdegree
30.1.6 Source, Sink, Isolated, and Internal Nodes
30.2 Paths in a Graph
30.2.1 Length of Paths
30.2.2 Shortest Path
30.2.3 Paths by Iteration
30.2.4 Listing the Paths
30.3 Acyclic Graphs as Nested Sets
30.4 Paths with CTE
30.4.1 Nonacyclic Graphs
30.5 Adjacency Matrix Model
30.6 Points inside Polygons

31 OLAP in SQL

31.1 Star Schema
31.2 OLAP Functionality
31.2.1 RANK and DENSE_RANK
31.2.2 Row Numbering
31.2.3 GROUPING Operators
31.2.4 The Window Clause
31.2.5 OLAP Examples of SQL
31.2.6 Enterprise-Wide Dimensional Layer
31.3 A Bit of History


32 Transactions and Concurrency Control

32.1 Sessions
32.2 Transactions and ACID
32.2.1 Atomicity
32.2.2 Consistency
32.2.3 Isolation
32.2.4 Durability
32.3 Concurrency Control
32.3.1 The Five Phenomena
32.3.2 The Isolation Levels
32.3.3 CURSOR STABILITY Isolation Level
32.4 Pessimistic Concurrency Control
32.5 SNAPSHOT Isolation: Optimistic Concurrency
32.6 Logical Concurrency Control
32.7 Deadlock and Livelocks

33 Optimizing SQL

33.1 Access Methods
33.1.1 Sequential Access
33.1.2 Indexed Access
33.1.3 Hashed Indexes
33.1.4 Bit Vector Indexes
33.2 Expressions and Unnested Queries
33.2.1 Use Simple Expressions
33.2.2 String Expressions
33.3 Give Extra Join Information in Queries
33.4 Index Tables Carefully
33.5 Watch the IN Predicate
33.6 Avoid UNIONs
33.7 Prefer Joins over Nested Queries
33.8 Avoid Expressions on Indexed Columns
33.9 Avoid Sorting
33.10 Avoid CROSS JOINs
33.11 Learn to Use Indexes Carefully
33.12 Order Indexes Carefully
33.13 Know Your Optimizer
33.14 Recompile Static SQL after Schema Changes
33.15 Temporary Tables Are Sometimes Handy
33.16 Update Statistics

References

General References
Logic
Mathematical Techniques
Random Numbers
Scales and Measurements
Missing Values
Regular Expressions
Graph Theory
Introductory SQL Books
Optimizing Queries
Temporal Data and the Year 2000 Problem
SQL Programming Techniques
Classics
Forum
Updatable Views
Theory, Normalization, and Advanced Database Topics
Books on SQL-92 and SQL-99
Standards and Related Groups
Web Sites Related to SQL
Statistics
Temporal Databases
New Citations

Index
About the Author

Introduction to the Third Edition

THIS BOOK, LIKE THE first and second editions before it, is for the
working SQL programmer who wants to pick up some advanced
programming tips and techniques. It assumes that the reader is an
SQL programmer with a year or more of experience. It is not an
introductory book, so let’s not have any gripes in the Amazon.com
reviews about that, as we did with the prior editions.
The first edition was published ten years ago and became a minor
classic among working SQL programmers. I have seen copies of this
book on the desks of real programmers in real programming shops
almost everywhere I have been. The true compliment is the Post-it®
notes sticking out of the top. People really use it often enough to put
stickies in it! Wow!

1.1 What Changed in Ten Years

Hierarchical and network databases still run vital legacy systems in
major corporations. SQL people do not like to admit that Fortune 500
companies have more data in IMS files than in SQL tables. But SQL
people can live with that, because we have all the new applications and
all the important smaller databases.

Object and object-relational databases found niche markets, but
never caught on with the mainstream. But OO programming is firmly in
place, so object-oriented people can live with that.
XML has become the popular data tool du jour as of this writing in
2005. Technically, XML is syntax for describing and moving data from
one platform to another, but its support tools allow searching and
reformatting. It seems to be lasting longer and finding more users than
DIF, EDI, and other earlier attempts at a “Data Esperanto” did in the
past. An SQL/XML subcommittee in INCITS H2 (the current name of
the original ANSI X3H2 Database Standards Committee) is making sure
they can work together.
Data warehousing is no longer an exotic luxury reserved for major
corporations. Thanks to the declining prices of hardware and software,
medium-sized companies now use the technology. Writing OLAP
queries is different from writing OLTP queries, and OLAP probably
needs its own Smarties book now.
Small “pseudo-SQL” products have appeared in the open source
arena. Languages such as MySQL are very different in syntax and
semantics from Standard SQL, often being little more than a file system
interface with borrowed reserved words. However, their small footprint
and low cost have made them popular with Web developers.
At the same time, full scale, serious SQL databases have become open
source.
Firebird has most ANSI SQL-92
features, and it runs on Linux, Microsoft Windows, and a variety of
UNIX platforms. Firebird offers optimistic concurrency and language
support for stored procedures and triggers. It has been used in
production systems (under a variety of names) since 1981, and became
open source in 2000. Firebird is the open source version of Borland
Software Corporation’s (née Inprise Corporation) InterBase product.
CA-Ingres became open source in 2004, and Computer Associates
offered one million dollars in prize money to anyone who would develop
software that would convert existing database code to Ingres. Ingres is
one of the best database products ever written, but was a commercial
failure due to poor marketing.
Postgres is the open-source descendant of the original Ingres project
at UC-Berkeley. It has commercial support from Pervasive Software,
which also has a proprietary SQL product that evolved from their Btrieve
products.
The SQL standard has changed over time, but not always for the best.
Parts of the standard have become more relational and set-oriented,
while other parts have added things that clearly are procedural, deal with
nonrelational data and are based on file system models. To quote David
McGoveran, “A committee never met a feature it did not like.” In this
case, he seems to be quite right.
But strangely enough, even with all the turmoil, the ANSI/ISO
Standard SQL-92 is still the common subset that will port across various
SQL products to do useful work. In fact, the U.S. Government described
the SQL-99 standard as “a standard in progress” and required SQL-92
conformance for federal contracts.
The reason for the loyalty to SQL-92 is simple. The FIPS-127
conformance test suite was in place during the development of SQL-92,
so all the vendors could move in the same direction. Unfortunately, the
Clinton administration canceled the program, and conformity began to
drift. Michael M. Gorman, the President of Whitemarsh Information
Systems Corporation and secretary of INCITS H2 for more than 20
years, has a great essay on this and other political aspects of SQL’s
history at www.wiscorp.com; it is worth reading.

1.2 What Is New in This Edition

Ten years ago, in the first edition, I tried to stick to the SQL-89 standard
and to use only the SQL-92 features that are already used in most
implementations. Five years ago, in the second edition, I wrote that it
would be years before any vendor had a full implementation of SQL-92,
but all products were moving toward that goal. This is still true today, as
I write the third edition, but now we are much closer to universal
implementations of intermediate and full SQL-92. I now feel brave
enough to use some of the SQL-99 features found in current products,
while doing most of the work in SQL-92.
In the second edition, I dropped some of the theory from the book
and moved it to Joe Celko’s Data and Databases: Concepts in Practice
(ISBN 1-55860-432-4). I find no reason to add it back into this edition.
I have moved and greatly expanded techniques for trees and
hierarchies into a separate book (Joe Celko’s Trees and Hierarchies in SQL
for Smarties, ISBN 1-55860-920-2) because there was enough material to
justify it. I have included a short mention of some techniques here, but
not at the detailed level offered in the other book.
I put programming tips for newbies into a separate book (Joe Celko’s
SQL Programming Style, ISBN 1-12088-797-5).
This book is an advanced programmer’s book, and I assume that the
reader is writing real SQL, not some dialect or his native programming
language in a thin disguise. I also assume that he or she can translate
Standard SQL into their local dialect without much effort.
I have tried to provide comments with the solutions to explain why
they work. I hope this will help the reader see underlying principles that
can be used in other situations.
A lot of people have contributed material, either directly or via
newsgroups, and I cannot thank all of them. But I made a real effort to
put names in the text next to the code. In case I missed anyone, I got
material or ideas from Aaron Bertrand, Alejandro Mesa, Anith Sen,
Craig Mullins, Daniel A. Morgan, David Portas, David Cressey, Dawn
M. Wolthuis, Don Burleson, Erland Sommarskog, Itzik Ben-Gan, John
Gilson, Knut Stolze, Louis Davidson, Michael L. Gonzales of HandsOn-BI
LLC, Dan Guzman, Hugo Kornelis, Richard Romley, Serge Rielau,
Steve Kass, Tom Moreau, Troels Arvin, and probably a dozen others I
am forgetting.


1.3 Corrections and Additions

Please send any corrections, additions, suggestions, improvements, or
alternative solutions to me or to the publisher.
Morgan-Kaufmann Publishers
500 Sansome Street, Suite 400
San Francisco, CA 94111-3211

CHAPTER 1

Database Design

THIS CHAPTER DISCUSSES THE DDL (Data Definition Language), which
is used to create a database schema. It is related to the next chapter
on the theory of database normalization. Most bad queries start with
a bad schema. To get data out of the bad schema, you have to write
convoluted code, and you are never sure if it did what it was meant
to do.
One of the major advantages of databases, relational and otherwise,
was that the data could be shared among programs so that an
enterprise could use one trusted source for information. Once the data
was separated from the programs, we could build tools to maintain,
back up, and validate the data in one place, without worrying about
hundreds or even thousands of application programs possibly working
against each other.
SQL has spawned a whole branch of data modeling tools devoted
to designing its schemas and tables. Most of these tools use a graphic
or text description of the rules and the constraints on the data to
produce a schema declaration statement that can be used directly in a
particular SQL product. It is often assumed that a CASE tool will
automatically prevent you from creating a bad design. This is simply
not true.
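
To make the phrase "schema declaration statement" concrete, here is a minimal sketch of the kind of DDL such a tool might emit, with the rules on the data written as constraints. The Personnel table, its column names, and the salary rule are hypothetical illustrations, not taken from the text. Python's built-in sqlite3 module is used only as a convenient harness to run the statement; SQLite's type handling differs from Standard SQL, so treat this as an approximation.

```python
import sqlite3

# Hypothetical schema: table name, columns, and rules are illustrative assumptions.
DDL = """
CREATE TABLE Personnel
(emp_id INTEGER NOT NULL PRIMARY KEY,
 emp_name VARCHAR(35) NOT NULL,
 salary_amt DECIMAL(10,2) NOT NULL
   CHECK (salary_amt >= 0.00));
"""

def try_insert(conn, row):
    """Return True if the row satisfies the declared constraints, else False."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Personnel VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(DDL)

results = [
    try_insert(conn, (1, "Alice", 50000.00)),  # valid row
    try_insert(conn, (2, "Bob", -1.00)),       # violates the CHECK constraint
    try_insert(conn, (1, "Carol", 42000.00)),  # duplicate primary key
]
row_count = conn.execute("SELECT COUNT(*) FROM Personnel").fetchone()[0]
print(results, row_count)
```

The constraints reject only the rows that break the rules someone thought to declare. Nothing in the statement, or in a tool that generates it, can tell whether those were the right rules or the right columns in the first place.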
Bad schema design leads to weird queries that are trying to work
around the flaws. These flaws can include picking the wrong data
