Joe Celko's SQL for Smarties - Advanced SQL Programming P25

212 CHAPTER 8: TABLE OPERATIONS

the searched deletion uses a WHERE clause like the search condition in a
SELECT statement.



8.1.1 The DELETE FROM Clause

The syntax for a searched deletion statement is:

<delete statement: searched> ::=
    DELETE FROM <table name>
    [WHERE <search condition>]

The DELETE FROM clause simply gives the name of the updatable table or
view to be changed. Notice that no correlation name is allowed in the
DELETE FROM clause. The SQL model for an alias table name is that the
engine effectively creates a new table with that new name and populates
it with rows identical to the base table or updatable view from which
it was built. If you had a correlation name, you would be deleting from
this system-created temporary table, and it would vanish at the end of
the statement. The base table would never have been touched.

For this discussion, we will assume the user doing the deletion has
applicable DELETE privileges for the table. The positioned deletion
removes the row in the base table that is the source of the current
cursor row. The syntax is:

<delete statement: positioned> ::=
    DELETE FROM <table name>
    WHERE CURRENT OF <cursor name>

Cursors in SQL are generally more expensive than nonprocedural code
and, despite the existence of the Standard, they vary widely in current
implementations. If you have a properly designed table with a key, you
should be able to avoid them in a DELETE FROM statement.


8.1.2 The WHERE Clause

The most important thing to remember about the WHERE clause is that it
is optional. If there is no WHERE clause, all rows in the table are
deleted. The table structure still exists, but there are no rows.

Most, but not all, interactive SQL tools will give the user a warning
when he or she is about to do this and ask for confirmation. Unless you
want to clear out the table, immediately do a ROLLBACK to restore it; if
you COMMIT or have set the tool to automatically commit the work, then

8.1 DELETE FROM Statement 213

the data is pretty much gone. The DBA will have to do something to save
you. And don’t feel bad about doing it at least once while you are
learning SQL.
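The effect of a missing WHERE clause, and of the ROLLBACK that undoes it, can be sketched with Python's standard sqlite3 module. This is only an illustration under stated assumptions: SQLite stands in for whatever product you use, and the Personnel columns and the two sample rows are made up for the demonstration.

```python
import sqlite3

# In-memory SQLite database; table layout and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Personnel (emp_name TEXT PRIMARY KEY, iq INTEGER)")
conn.executemany(
    "INSERT INTO Personnel VALUES (?, ?)",
    [("Able", 101), ("Baker", 105)],
)
conn.commit()  # make the starting rows permanent

# No WHERE clause: every row in the table qualifies for deletion.
conn.execute("DELETE FROM Personnel")
rows_after_delete = conn.execute(
    "SELECT COUNT(*) FROM Personnel").fetchone()[0]

# The deletion has not been committed yet, so ROLLBACK restores the rows.
conn.rollback()
rows_after_rollback = conn.execute(
    "SELECT COUNT(*) FROM Personnel").fetchone()[0]

print(rows_after_delete, rows_after_rollback)
```

Had the deletion been followed by a commit instead, the rollback would have had nothing left to undo.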
Because we wish to remove a subset of rows all at once, we cannot
simply scan the table one row at a time and remove each qualifying row
as it is encountered. The way most SQL implementations do a deletion is
with two passes on the table. The first pass marks all of the candidate
rows that meet the WHERE clause condition. This is also when most
products check to see if the deletion will violate any constraints. The
most common violations involve trying to remove a value that is
referenced by a foreign key (“Hey, we still have orders for those pink
lawn flamingoes; you cannot drop them from inventory yet!”). But other
constraints, such as CHECK() constraints and those declared in CREATE
ASSERTION statements, can also cause a ROLLBACK.




After the subset is validated, the second pass removes it, either
immediately or by marking the rows so that a housekeeping routine can
later reclaim the storage space. Then any further housekeeping, such as
updating indexes, is done last.
The important point is that while the rows are being marked, the entire
table is still available for the WHERE condition to use. In many if not
most cases, this two-pass method does not make any difference in the
results. The WHERE clause is usually a fairly simple predicate that
references constants or relationships among the columns of a row. For
example, we could clear out some Personnel with this deletion:

DELETE FROM Personnel
 WHERE iq <= 100; -- constant in simple predicate

or:

DELETE FROM Personnel
 WHERE hat_size = iq; -- uses columns in the same row

A good optimizer could recognize that these predicates do not
depend on the table as a whole, and would use a single scan for them.

The two passes make a difference when the table references itself. Let’s
fire employees with IQs that are below average for their departments.

DELETE FROM Personnel
 WHERE iq < (SELECT AVG(P1.iq)
               FROM Personnel AS P1 -- must have correlation name
              WHERE Personnel.dept_nbr = P1.dept_nbr);

We have the following data:

Personnel
emp_nbr    dept_nbr  iq
========================
'Able'     'Acct'    101
'Baker'    'Acct'    105
'Charles'  'Acct'    106
'Henry'    'Mkt'     101
'Celko'    'Mkt'     170
'Popkin'   'HR'      120


If this were done one row at a time, we would first go to Accounting
and find the average IQ, (101 + 105 + 106)/3.0 = 104, and fire Able.
Then we would move sequentially down the table, and again find the
average IQ, (105 + 106)/2.0 = 105.5 and fire Baker. Only Charles would
escape the downsizing.
Now sort the table a little differently, so that the rows are visited in
reverse alphabetic order. We first read Charles’s IQ and compute the
average for Accounting (101 + 105 + 106)/3.0 = 104, and retain Charles.
Then we would move sequentially down the table, with the average IQ
unchanged, so we also retain Baker. Able, however, is downsized when
that row comes up.
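The difference between the two-pass model and a row-at-a-time deletion can be checked with a small simulation. The sketch below is plain Python rather than SQL, so it does not depend on how any particular engine behaves; the function names are hypothetical, and the rows are the sample data shown above.

```python
from statistics import mean

# Sample rows from the text: (name, department, iq), in table order.
PERSONNEL = [
    ("Able", "Acct", 101), ("Baker", "Acct", 105), ("Charles", "Acct", 106),
    ("Henry", "Mkt", 101), ("Celko", "Mkt", 170), ("Popkin", "HR", 120),
]

def dept_avg(rows, dept):
    """Average IQ of the rows currently present for one department."""
    return mean(iq for _, d, iq in rows if d == dept)

def two_pass_delete(rows):
    """Pass 1 marks qualifying rows against the intact table; pass 2 removes them."""
    marked = {name for name, dept, iq in rows if iq < dept_avg(rows, dept)}
    return [row for row in rows if row[0] not in marked]

def sequential_delete(rows, visit_order):
    """Remove each qualifying row as soon as it is visited (order dependent)."""
    remaining = list(rows)
    for row in visit_order:
        _, dept, iq = row
        if iq < dept_avg(remaining, dept):
            remaining.remove(row)
    return remaining

def names(rows):
    return sorted(row[0] for row in rows)

set_result = names(two_pass_delete(PERSONNEL))
forward_result = names(sequential_delete(PERSONNEL, PERSONNEL))
reverse_result = names(sequential_delete(PERSONNEL, list(reversed(PERSONNEL))))

print(set_result)
print(forward_result)
print(reverse_result)
```

Only the two-pass version gives an answer that does not depend on the order in which the rows happen to be visited.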
It might be worth noting that early versions of DB2 would delete rows
in the sequential order in which they appear in physical storage. Sybase’s
SQL Anywhere (née WATCOM SQL) has an optional ORDER BY clause that sorts
the table, then does a sequential deletion on the table. This feature
can be used to force a sequential deletion in cases where order does not
matter, thus optimizing the statement by saving a second pass over the
table. But it also can give the desired results in situations where you
would otherwise have to use a cursor and a host language.

Anders Altberg, Johannes Becher, and I tested different versions of a
DELETE statement whose goal was to remove all but one row of a group.
The column dup_cnt is a count of the duplicates of that row in the
original table. The three statements tested were:

D1:
DELETE FROM Test
 WHERE EXISTS (SELECT *
                 FROM Test AS T1
                WHERE T1.dup_id = Test.dup_id
                  AND T1.dup_cnt < Test.dup_cnt);

D2:
DELETE FROM Test
 WHERE dup_cnt > (SELECT MIN(T1.dup_cnt)
                    FROM Test AS T1
                   WHERE T1.dup_id = Test.dup_id);

D3:
BEGIN ATOMIC
INSERT INTO WorkingTable(dup_id, min_dup_cnt)
SELECT dup_id, MIN(dup_cnt)
  FROM Test
 GROUP BY dup_id;
DELETE FROM Test
 WHERE dup_cnt > (SELECT min_dup_cnt
                    FROM WorkingTable
                   WHERE WorkingTable.dup_id = Test.dup_id);
END;

Their execution times in one SQL desktop product were:

D1 3.20 seconds
D2 31.22 seconds
D3 0.17 seconds


Without seeing the execution plans, I would guess that statement D1
went to an index for the EXISTS() test and returned TRUE on the first
item it found. On the other hand, D2 scanned each subset in the
partitioning of Test by dup_id to find the MIN() over and over. Finally,
the D3 version simply does a JOIN on simple scalar columns. With full
SQL-92, you could write D3 as:

D3-2:
DELETE FROM Test
 WHERE dup_cnt >
       (SELECT min_dup_cnt
          FROM (SELECT dup_id, MIN(dup_cnt)
                  FROM Test
                 GROUP BY dup_id) AS WorkingTable(dup_id, min_dup_cnt)
         WHERE WorkingTable.dup_id = Test.dup_id);

Having said all of this, the fastest way to remove redundant duplicates
is most often with a CURSOR that does a full table scan.
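For readers who want to try the working-table version (D3), here is a minimal sketch using Python's sqlite3 module. It makes no claim about the timings above. The sample rows are invented, and treating dup_cnt as a number that distinguishes the copies within one dup_id group is an assumption made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Test (dup_id INTEGER NOT NULL, dup_cnt INTEGER NOT NULL);
INSERT INTO Test VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1);
CREATE TABLE WorkingTable
(dup_id INTEGER PRIMARY KEY, min_dup_cnt INTEGER NOT NULL);
""")

# Both steps run in one transaction, mirroring the BEGIN ATOMIC block.
with conn:
    # Step 1: record the smallest dup_cnt for each dup_id group.
    conn.execute("""
        INSERT INTO WorkingTable (dup_id, min_dup_cnt)
        SELECT dup_id, MIN(dup_cnt)
          FROM Test
         GROUP BY dup_id""")
    # Step 2: delete every row that is not the minimum of its group.
    conn.execute("""
        DELETE FROM Test
         WHERE dup_cnt > (SELECT min_dup_cnt
                            FROM WorkingTable
                           WHERE WorkingTable.dup_id = Test.dup_id)""")

survivors = conn.execute(
    "SELECT dup_id, dup_cnt FROM Test ORDER BY dup_id").fetchall()
print(survivors)
```

Because the row holding the group minimum never satisfies the predicate, exactly one row per dup_id is left.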

8.1.3 Deleting Based on Data in a Second Table

The WHERE clause can be as complex as you wish. This means you can have
subqueries that use other tables. For example, to remove customers who
have paid their bills from the Deadbeats table, you can use a correlated
EXISTS predicate, thus:

DELETE FROM Deadbeats
 WHERE EXISTS (SELECT *
                 FROM Payments AS P1
                WHERE Deadbeats.cust_nbr = P1.cust_nbr
                  AND P1.amtpaid >= Deadbeats.amtdue);

The scope rules from SELECT statements also apply to the WHERE clause of
a DELETE FROM statement, but it is a good idea to qualify all of the
column names.
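The Deadbeats deletion can be run as written against SQLite. The following is a small sketch, not a definitive setup: the column types and the sample customers are assumptions, since the text gives only the column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Deadbeats (cust_nbr INTEGER PRIMARY KEY, amtdue NUMERIC NOT NULL);
CREATE TABLE Payments (cust_nbr INTEGER NOT NULL, amtpaid NUMERIC NOT NULL);
-- Customer 1 paid in full, customer 2 paid only part, customer 3 paid nothing.
INSERT INTO Deadbeats VALUES (1, 100), (2, 250), (3, 75);
INSERT INTO Payments VALUES (1, 100), (2, 50), (4, 500);
""")

with conn:
    cur = conn.execute("""
        DELETE FROM Deadbeats
         WHERE EXISTS (SELECT *
                         FROM Payments AS P1
                        WHERE Deadbeats.cust_nbr = P1.cust_nbr
                          AND P1.amtpaid >= Deadbeats.amtdue)""")
    deleted = cur.rowcount  # number of rows removed by the DELETE

still_owing = [row[0] for row in conn.execute(
    "SELECT cust_nbr FROM Deadbeats ORDER BY cust_nbr")]
print(deleted, still_owing)
```

Every column reference is qualified, so it is clear which names belong to the outer Deadbeats table and which to the Payments rows inside the subquery.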

8.1.4 Deleting within the Same Table

SQL allows a DELETE FROM statement to use columns, constants, and
aggregate functions drawn from the table itself. For example, it is
perfectly all right to remove everyone who is below average in a class
with this statement:

DELETE FROM Students
WHERE grade < (SELECT AVG(grade) FROM Students);

But the DELETE FROM clause does not allow a correlation name on its
table, so not all WHERE clauses that could be written as part of a
SELECT statement will work in a DELETE FROM statement. For example, a
self-join on the working table in a subquery is impossible.


DELETE FROM Personnel AS B1 -- correlation name is INVALID SQL
 WHERE Personnel.boss_nbr = B1.emp_nbr
   AND Personnel.salary > B1.salary;

There are ways to work around this. One trick is to build a VIEW of the
table and use the VIEW instead of a correlation name. Consider the
problem of finding all employees who are now earning more than their
boss and deleting them. The employee table being used has a column for
the employee’s identification number, emp_nbr, and another column for
the boss’s employee identification number, boss_nbr.



CREATE VIEW Bosses
AS SELECT emp_nbr, salary FROM Personnel;

DELETE FROM Personnel
 WHERE EXISTS (SELECT *
                 FROM Bosses AS B1
                WHERE Personnel.boss_nbr = B1.emp_nbr
                  AND Personnel.salary > B1.salary);

Simply using the Personnel table in the subquery will not work. We
need an outer reference in the

WHERE

clause to the Personnel table in the
subquery, and we cannot get that if the Personnel table is in the
subquery. Such views should be as small as possible, so that the SQL
engine can materialize them in main storage.
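The VIEW technique can be tried out in SQLite as a quick sketch. The column types and the three sample employees are assumptions added for the demonstration; only emp_nbr, boss_nbr, and salary come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Personnel
(emp_nbr INTEGER PRIMARY KEY,
 boss_nbr INTEGER,               -- NULL for the top of the hierarchy
 salary NUMERIC NOT NULL);
-- Employee 1 is the boss; 2 earns more than the boss, 3 earns less.
INSERT INTO Personnel VALUES (1, NULL, 100), (2, 1, 120), (3, 1, 80);

CREATE VIEW Bosses
AS SELECT emp_nbr, salary FROM Personnel;
""")

with conn:
    conn.execute("""
        DELETE FROM Personnel
         WHERE EXISTS (SELECT *
                         FROM Bosses AS B1
                        WHERE Personnel.boss_nbr = B1.emp_nbr
                          AND Personnel.salary > B1.salary)""")

remaining = [row[0] for row in conn.execute(
    "SELECT emp_nbr FROM Personnel ORDER BY emp_nbr")]
print(remaining)
```

The view carries only the two columns the predicate needs, in line with the advice to keep such views small.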

Redundant Duplicates in a Table

Redundant duplicates are unneeded copies of a row in a table. You most
often get them because you did not put a UNIQUE constraint on the table
and then you inserted the same data twice. Removing the extra copies
from a table in SQL is much harder than you would think. In fact, if the
rows are exact duplicates, you cannot do it with a simple DELETE FROM
statement. Removing redundant duplicates involves saving one of them
while deleting the other(s). But if SQL has no way to tell them apart, it
will delete all rows that were qualified by the WHERE clause. Another
problem is that the deletion of a row from a base table can trigger
referential actions, which can have unwanted side effects.
For example, if there is a referential integrity constraint that says a
deletion in Table1 will cascade and delete matching rows in Table2,
removing redundant duplicates from Table1 can leave me with no matching
rows in Table2. Yet I still have a referential integrity rule that says
there must be at least one match in Table2 for the single row I
preserved in Table1. SQL allows constraints to be deferrable or
nondeferrable, so you might be able to suspend the referential actions
that the transaction below would cause:

BEGIN
INSERT INTO WorkingTable -- use DISTINCT to kill duplicates
SELECT DISTINCT * FROM MessedUpTable;
DELETE FROM MessedUpTable; -- clean out messed-up table
INSERT INTO MessedUpTable -- put working table into it
SELECT * FROM WorkingTable;
DROP TABLE WorkingTable; -- get rid of working table
END;
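
The copy-out and copy-back transaction can be sketched in SQLite as follows. The two-column layout of MessedUpTable and its sample rows are assumptions; no referential constraints are declared here, so the side effects discussed above do not arise in this small demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MessedUpTable (emp_id INTEGER NOT NULL, name TEXT NOT NULL);
INSERT INTO MessedUpTable
VALUES (1, 'Able'), (1, 'Able'),
       (2, 'Baker'), (2, 'Baker'), (2, 'Baker'),
       (3, 'Charles');
CREATE TABLE WorkingTable (emp_id INTEGER NOT NULL, name TEXT NOT NULL);
""")

# The copy-out, wipe, and copy-back steps share one transaction.
with conn:
    # use DISTINCT to kill duplicates
    conn.execute(
        "INSERT INTO WorkingTable SELECT DISTINCT * FROM MessedUpTable")
    # clean out messed-up table
    conn.execute("DELETE FROM MessedUpTable")
    # put working table into it
    conn.execute("INSERT INTO MessedUpTable SELECT * FROM WorkingTable")

# get rid of working table
conn.execute("DROP TABLE WorkingTable")

rows = conn.execute(
    "SELECT emp_id, name FROM MessedUpTable ORDER BY emp_id").fetchall()
print(rows)
```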
Removal of Redundant Duplicates with ROWID
Leonard C. Medel came up with several interesting ways to delete
redundant duplicate rows from a table in an Oracle database.
Let’s assume that we have a table:
CREATE TABLE Personnel
(emp_id INTEGER NOT NULL,
 name CHAR(30) NOT NULL,
 ...);
The classic Oracle “delete dups” solution is the statement:
DELETE FROM Personnel
 WHERE ROWID < (SELECT MAX(P1.ROWID)
                  FROM Personnel AS P1
                 WHERE P1.dup_id = Personnel.dup_id
                   AND P1.name = Personnel.name
                   AND ...);
The column, or more properly pseudo-column, ROWID is based on the
physical location of a row in storage. It can change after a user
session but not during the session. It is the fastest possible physical
access method into an Oracle table, because it goes directly to the
physical address of the data. It is also a complete violation of Dr. Codd’s
rules, which require that the physical representation of the data be
hidden from the users.
Doing a quick test on a 100,000-row table, Mr. Medel achieved a nearly
tenfold improvement with these two alternatives. In English, the first
alternative is to find the highest ROWID for each group of one or more
duplicate rows, and then delete every row, except the one with highest
ROWID.
DELETE FROM Personnel
 WHERE ROWID
       IN (SELECT P2.ROWID
             FROM Personnel AS P2,
                  (SELECT P3.dup_id, P3.name, ...
                          MAX(P3.ROWID) AS max_rowid
                     FROM Personnel AS P3
                    GROUP BY P3.dup_id, P3.name, ...)
                  AS P4
            WHERE P2.ROWID <> P4.max_rowid
              AND P2.dup_id = P4.dup_id
              AND P2.name = P4.name);
Notice that the GROUP BY clause needs all the columns in the table.
The second approach is to notice that the set of all rows in the table
minus the set of rows we want to keep defines the set of rows to delete.
This gives us the following statement:
DELETE FROM Personnel
 WHERE ROWID
       IN (SELECT P2.ROWID
             FROM Personnel AS P2
           EXCEPT
           SELECT MAX(P3.ROWID)
             FROM Personnel AS P3
            GROUP BY P3.dup_id, P3.name, ...);
Both of these approaches are faster than the short, classic version
because they avoid a correlated subquery expression in the WHERE clause.
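The EXCEPT form can be tried outside Oracle, with a caveat. The sketch below uses SQLite, whose rowid is an internal integer row key and not Oracle's physical-address ROWID, so it illustrates only the set logic, not the performance claims. The two-column Personnel table and its rows are assumptions, with emp_id and name standing in for "all the columns in the table".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- No PRIMARY KEY, so SQLite keeps its own hidden rowid for every row.
CREATE TABLE Personnel (emp_id INTEGER NOT NULL, name CHAR(30) NOT NULL);
INSERT INTO Personnel
VALUES (1, 'Able'), (1, 'Able'),
       (2, 'Baker'), (2, 'Baker'), (2, 'Baker'),
       (3, 'Charles');
""")

with conn:
    # All rowids EXCEPT the highest rowid of each group = the rows to delete.
    conn.execute("""
        DELETE FROM Personnel
         WHERE rowid
               IN (SELECT P2.rowid
                     FROM Personnel AS P2
                   EXCEPT
                   SELECT MAX(P3.rowid)
                     FROM Personnel AS P3
                    GROUP BY P3.emp_id, P3.name)""")

rows = conn.execute(
    "SELECT emp_id, name FROM Personnel ORDER BY emp_id").fetchall()
print(rows)
```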

8.1.5 Deleting in Multiple Tables without Referential Integrity

There is no way to directly delete rows from more than one table in a
single DELETE FROM statement. There are two approaches to removing
related rows from multiple tables. One is to use a temporary table of the
deletion values; the other is to use referential integrity actions. For the
purposes of this section, let us assume that we have a database with an
Orders table and an Inventory table. Our business rule is that when
something is out of stock, we delete it from all the orders.

Assume that no referential integrity constraints have been declared at
all. First, create a temporary table of the products to be deleted based on
your search criteria, then use that table in a correlated subquery to
remove rows from each table involved.

CREATE MODULE Foobar
CREATE LOCAL TEMPORARY TABLE Discontinue
(part_nbr INTEGER NOT NULL UNIQUE)
ON COMMIT DELETE ROWS;

PROCEDURE CleanInventory( )
BEGIN ATOMIC
INSERT INTO Discontinue
SELECT DISTINCT part_nbr -- pick out the items to be removed
  FROM ...
 WHERE ...; -- using whatever criteria you require
DELETE FROM Orders
 WHERE part_nbr IN (SELECT part_nbr FROM Discontinue);
DELETE FROM Inventory
 WHERE part_nbr IN (SELECT part_nbr FROM Discontinue);
COMMIT WORK;
END;

END MODULE;

In the Standard SQL model, the temporary table is persistent in the
schema, but its content is not. TEMPORARY tables are always empty at
the start of a session, and they always appear to belong only to the user
of the session. The GLOBAL option means that each application gets one
copy of the table for all the modules, while LOCAL would limit the scope
to the module in which it is declared.
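The temporary-table approach can be sketched in SQLite, which has neither CREATE MODULE nor ON COMMIT DELETE ROWS. A Python function stands in for the procedure and an explicit DELETE empties the temporary table. The qty_on_hand column and its "zero means out of stock" rule are hypothetical selection criteria, and order_nbr is likewise an assumed column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inventory
(part_nbr INTEGER PRIMARY KEY, qty_on_hand INTEGER NOT NULL);
CREATE TABLE Orders (order_nbr INTEGER NOT NULL, part_nbr INTEGER NOT NULL);
CREATE TEMP TABLE Discontinue (part_nbr INTEGER NOT NULL UNIQUE);
INSERT INTO Inventory VALUES (10, 5), (20, 0), (30, 0);
INSERT INTO Orders VALUES (1, 10), (1, 20), (2, 30), (3, 10);
""")

def clean_inventory(conn):
    """Remove out-of-stock parts from Orders and Inventory in one transaction."""
    with conn:
        # pick out the items to be removed (hypothetical criterion)
        conn.execute("""
            INSERT INTO Discontinue
            SELECT DISTINCT part_nbr
              FROM Inventory
             WHERE qty_on_hand = 0""")
        conn.execute("""
            DELETE FROM Orders
             WHERE part_nbr IN (SELECT part_nbr FROM Discontinue)""")
        conn.execute("""
            DELETE FROM Inventory
             WHERE part_nbr IN (SELECT part_nbr FROM Discontinue)""")
        # stand-in for ON COMMIT DELETE ROWS, which SQLite lacks
        conn.execute("DELETE FROM Discontinue")

clean_inventory(conn)
orders_left = conn.execute(
    "SELECT order_nbr, part_nbr FROM Orders ORDER BY order_nbr").fetchall()
parts_left = [row[0] for row in conn.execute(
    "SELECT part_nbr FROM Inventory ORDER BY part_nbr")]
print(orders_left, parts_left)
```

Using the same Discontinue list for both deletions keeps the two tables consistent even though no referential constraint links them.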
8.2 INSERT INTO Statement
The INSERT INTO statement is the only way to get new data into a base
table. In practice, there are always other tools for loading large amounts
of data into a table, but they are very vendor-dependent.
8.2.1 INSERT INTO Clause

The syntax for INSERT INTO is:

<insert statement> ::=
    INSERT INTO <table name>
    <insert columns and source>

<insert columns and source> ::=
    [(<insert column list>)]
    <query expression>
    | VALUES <table value constructor list>
    | DEFAULT VALUES

<table value constructor list> ::=
    <row value constructor> [{<comma> <row value constructor>}...]

<row value constructor> ::=
    <row value constructor element>
    | <left paren> <row value constructor list> <right paren>
    | <row subquery>

<row value constructor list> ::=
    <row value constructor element>
    [{<comma> <row value constructor element>}...]

<row value constructor element> ::=
    <value expression> | NULL | DEFAULT

The two basic forms of an INSERT INTO are a table constant (usually
a single row) insertion and a query insertion. The table constant
insertion is done with a VALUES() clause. The list of insert values
usually consists of constants or explicit NULLs, but in theory they could
be almost any expression, including scalar SELECT subqueries.
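
The insertion forms can be exercised in SQLite with a short sketch. The Inventory and Reorders tables and their columns are assumptions made for the demonstration. The key value that DEFAULT VALUES produces here comes from SQLite's automatic numbering of an INTEGER PRIMARY KEY column, which is product-specific behavior and not part of Standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inventory
(part_nbr INTEGER PRIMARY KEY,
 qty_on_hand INTEGER NOT NULL DEFAULT 0);
CREATE TABLE Reorders (part_nbr INTEGER NOT NULL, qty INTEGER NOT NULL);
""")

with conn:
    # Table constant insertion: two rows of constants in one VALUES clause.
    conn.execute("""
        INSERT INTO Inventory (part_nbr, qty_on_hand)
        VALUES (10, 5), (20, 0)""")
    # A scalar SELECT subquery used as one of the inserted values.
    conn.execute("""
        INSERT INTO Inventory (part_nbr, qty_on_hand)
        VALUES ((SELECT MAX(part_nbr) + 10 FROM Inventory), 7)""")
    # DEFAULT VALUES: every column takes its default.
    conn.execute("INSERT INTO Inventory DEFAULT VALUES")
    # Query insertion: the rows come from a SELECT.
    conn.execute("""
        INSERT INTO Reorders (part_nbr, qty)
        SELECT part_nbr, 100
          FROM Inventory
         WHERE qty_on_hand = 0""")

inventory = conn.execute(
    "SELECT part_nbr, qty_on_hand FROM Inventory ORDER BY part_nbr").fetchall()
reorders = conn.execute(
    "SELECT part_nbr, qty FROM Reorders ORDER BY part_nbr").fetchall()
print(inventory)
print(reorders)
```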
