
Joe Celko's SQL for Smarties: Advanced SQL Programming

682 CHAPTER 30: GRAPHS IN SQL

The most common way to model a graph in SQL is with an adjacency
list model. Each edge of the graph is shown as a pair of nodes in which
the ordering matters, and then any values associated with that edge are
shown in another column.

30.1 Basic Graph Characteristics

The following code is from John Gilson. This code uses an adjacency list
model of the graph, with nodes in a separate table. This is the most
common method for modeling graphs in SQL.

CREATE TABLE Nodes
(node_id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE AdjacencyListGraph
(begin_node_id INTEGER NOT NULL REFERENCES Nodes (node_id),
end_node_id INTEGER NOT NULL REFERENCES Nodes (node_id),
PRIMARY KEY (begin_node_id, end_node_id),
CHECK (begin_node_id <> end_node_id));

It is also possible to load an acyclic directed graph into a nested set
model by splitting the nodes.

CREATE TABLE NestedSetsGraph
(node_id INTEGER NOT NULL REFERENCES Nodes (node_id),
lft INTEGER NOT NULL CHECK (lft >= 1) PRIMARY KEY,
rgt INTEGER NOT NULL UNIQUE,
CHECK (rgt > lft),
UNIQUE (node_id, lft));



To split nodes, start at the sink nodes and move up the tree. When
you come to a node with an indegree greater than one, replace it with
that many copies of the node under each of its superiors. Continue to do
this until you get to the root. The acyclic graph will become a tree, but
with duplicated node values. There are advantages to this model; we will
discuss them in Section 30.3.
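
As a rough illustration of node splitting, the following Python sketch builds nested-set rows for a small acyclic graph. It is not the author's code: the function name nested_sets_from_dag and the sample edge list are assumptions made for this example. It walks the graph top-down from the source nodes instead of bottom-up from the sinks, which should produce the same tree with one copy of a node per path that reaches it. It assumes the input really is acyclic.

```python
from collections import defaultdict


def nested_sets_from_dag(edges):
    """Return (node_id, lft, rgt) rows for the tree obtained by node splitting.

    edges is an iterable of (begin_node_id, end_node_id) pairs of an
    acyclic directed graph (a hypothetical input format for this sketch).
    """
    children = defaultdict(list)
    nodes, has_parent = set(), set()
    for begin_node, end_node in edges:
        children[begin_node].append(end_node)
        nodes.update((begin_node, end_node))
        has_parent.add(end_node)

    rows = []
    counter = 0

    def visit(node):
        # Each visit creates a fresh copy of the node, so a node with
        # indegree k ends up with one copy under each of its k superiors.
        nonlocal counter
        counter += 1
        lft = counter
        for child in sorted(children[node]):
            visit(child)
        counter += 1
        rows.append((node, lft, counter))

    # Source nodes (no incoming edge) become the roots.
    for root in sorted(nodes - has_parent):
        visit(root)
    return sorted(rows, key=lambda row: row[1])


# Diamond graph: node 4 has indegree 2, so it is split into two copies.
diamond_rows = nested_sets_from_dag([(1, 2), (1, 3), (2, 4), (3, 4)])
print(diamond_rows)
```

For the diamond graph, node 4 appears twice in the result, once under node 2 and once under node 3, each copy with its own (lft, rgt) pair.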

30.1.1 All Nodes in the Graph

To view all nodes in the graph, use the following:

CREATE VIEW GraphNodes (node_id)
AS
SELECT DISTINCT node_id FROM NestedSetsGraph;

30.1.2 Path Endpoints

A path through a graph is a traversal of consecutive nodes along a
sequence of edges. Clearly, the node at the end of one edge in the
sequence must also be the node at the beginning of the next edge in the
sequence. The length of the path is the number of edges that are
traversed along the path.
Path endpoints are the first and last nodes of each path in the graph.
For a path of length zero, the path endpoints are the same node. If there
is more than one path between two nodes, each path will be
distinguished by its own distinct set of number pairs for the nested-set
representation.

If there is only one path, P, between two nodes, but P is a subpath of
more than one distinct path, then the endpoints of P will have number
pairs for each of these greater paths. As a canonical form, the least-
numbered pairs are returned for these endpoints.

CREATE VIEW PathEndpoints
(begin_node_id, end_node_id,
begin_lft, begin_rgt,
end_lft, end_rgt)
AS
SELECT G1.node_id, G2.node_id,
G1.lft, G1.rgt, G2.lft, G2.rgt
FROM (SELECT node_id, MIN(lft), MIN(rgt)
FROM NestedSetsGraph
GROUP BY node_id) AS G1 (node_id, lft, rgt)
INNER JOIN
NestedSetsGraph AS G2
ON G2.lft >= G1.lft
AND G2.lft < G1.rgt;

30.1.3 Reachable Nodes

If a node is reachable from another node, then a path exists from the one
node to the other. It is assumed that every node is reachable from itself.

CREATE VIEW ReachableNodes (begin_node_id, end_node_id)
AS
SELECT DISTINCT begin_node_id, end_node_id
FROM PathEndpoints;

30.1.4 Edges

Edges are pairs of adjacent connected nodes in the graph. If edge E is
represented by the pair of nodes (n0, n1), then n1 is reachable from n0
in a single traversal.

CREATE VIEW Edges (begin_node_id, end_node_id)
AS
SELECT begin_node_id, end_node_id
FROM PathEndpoints AS PE
WHERE begin_node_id <> end_node_id
AND NOT EXISTS
(SELECT *
FROM NestedSetsGraph AS G
WHERE G.lft > PE.begin_lft
AND G.lft < PE.end_lft
AND G.rgt > PE.end_rgt);
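
To see these views at work, here is a minimal sketch that runs them in SQLite through Python's sqlite3 module. Several things are assumptions made for the example: the sample data (a diamond graph with edges 1 to 2, 1 to 3, 2 to 4, and 3 to 4, where node 4 has been split into two copies), the simplified table without a foreign key to Nodes, and the rewrite of the derived-table column list as AS aliases, since SQLite does not accept the AS G1 (node_id, lft, rgt) form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified copy of NestedSetsGraph (no foreign key to Nodes).
CREATE TABLE NestedSetsGraph
(node_id INTEGER NOT NULL,
 lft INTEGER NOT NULL PRIMARY KEY CHECK (lft >= 1),
 rgt INTEGER NOT NULL UNIQUE,
 CHECK (rgt > lft),
 UNIQUE (node_id, lft));

-- Assumed sample data: diamond graph after node 4 is split in two.
INSERT INTO NestedSetsGraph (node_id, lft, rgt)
VALUES (1, 1, 10), (2, 2, 5), (4, 3, 4), (3, 6, 9), (4, 7, 8);

CREATE VIEW PathEndpoints AS
SELECT G1.node_id AS begin_node_id, G2.node_id AS end_node_id,
       G1.lft AS begin_lft, G1.rgt AS begin_rgt,
       G2.lft AS end_lft, G2.rgt AS end_rgt
  FROM (SELECT node_id, MIN(lft) AS lft, MIN(rgt) AS rgt
          FROM NestedSetsGraph
         GROUP BY node_id) AS G1
       INNER JOIN
       NestedSetsGraph AS G2
       ON G2.lft >= G1.lft
          AND G2.lft < G1.rgt;

CREATE VIEW Edges AS
SELECT begin_node_id, end_node_id
  FROM PathEndpoints AS PE
 WHERE begin_node_id <> end_node_id
   AND NOT EXISTS
       (SELECT *
          FROM NestedSetsGraph AS G
         WHERE G.lft > PE.begin_lft
           AND G.lft < PE.end_lft
           AND G.rgt > PE.end_rgt);
""")

# Reachable pairs, including every node paired with itself.
reachable = conn.execute(
    "SELECT DISTINCT begin_node_id, end_node_id "
    "FROM PathEndpoints ORDER BY 1, 2").fetchall()

# Only the pairs with no intermediate node between them are edges.
edges = conn.execute(
    "SELECT begin_node_id, end_node_id FROM Edges ORDER BY 1, 2").fetchall()
print(reachable)
print(edges)
```

On this data the Edges view should give back exactly the four edges of the diamond, while the pair (1, 4) shows up only among the reachable pairs, because a copy of node 2 or node 3 sits between the two endpoints.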

30.1.5 Indegree and Outdegree

The indegree of a node, n, is the number of distinct edges ending at n.
Because the view below uses a LEFT OUTER JOIN from GraphNodes, nodes
that have an indegree of zero are returned with a count of zero. To
determine the indegree of all nodes in the graph:

CREATE VIEW Indegree (node_id, node_indegree)
AS
SELECT N.node_id, COUNT(E.begin_node_id)
FROM GraphNodes AS N
LEFT OUTER JOIN
Edges AS E
ON N.node_id = E.end_node_id
GROUP BY N.node_id;

The outdegree of a node, n, is the number of distinct edges
beginning at n. Because the view below uses a LEFT OUTER JOIN from
GraphNodes, nodes that have an outdegree of zero are returned with a
count of zero. To determine the outdegree of all nodes in the graph:

CREATE VIEW Outdegree (node_id, node_outdegree)
AS
SELECT N.node_id, COUNT(E.end_node_id)
FROM GraphNodes AS N
LEFT OUTER JOIN
Edges AS E
ON N.node_id = E.begin_node_id
GROUP BY N.node_id;
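
The same counting pattern can be tried directly against the adjacency list tables from the start of this section. The sketch below is an illustration under assumptions, not the book's code: it substitutes Nodes and AdjacencyListGraph for the GraphNodes and Edges views, runs in SQLite through Python's sqlite3 module, and uses made-up sample data (a diamond graph plus an unconnected node 5). With a LEFT OUTER JOIN and COUNT on a column of the edge table, a node with no matching edge still appears, with a count of zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Nodes
(node_id INTEGER NOT NULL PRIMARY KEY);

CREATE TABLE AdjacencyListGraph
(begin_node_id INTEGER NOT NULL REFERENCES Nodes (node_id),
 end_node_id INTEGER NOT NULL REFERENCES Nodes (node_id),
 PRIMARY KEY (begin_node_id, end_node_id),
 CHECK (begin_node_id <> end_node_id));

-- Assumed sample data: node 5 belongs to no edge.
INSERT INTO Nodes VALUES (1), (2), (3), (4), (5);
INSERT INTO AdjacencyListGraph VALUES (1, 2), (1, 3), (2, 4), (3, 4);
""")

# COUNT(E.begin_node_id) ignores the NULLs produced by the outer join,
# so unmatched nodes get a count of zero rather than disappearing.
indegree = conn.execute("""
SELECT N.node_id, COUNT(E.begin_node_id)
  FROM Nodes AS N
       LEFT OUTER JOIN
       AdjacencyListGraph AS E
       ON N.node_id = E.end_node_id
 GROUP BY N.node_id
 ORDER BY N.node_id
""").fetchall()

outdegree = conn.execute("""
SELECT N.node_id, COUNT(E.end_node_id)
  FROM Nodes AS N
       LEFT OUTER JOIN
       AdjacencyListGraph AS E
       ON N.node_id = E.begin_node_id
 GROUP BY N.node_id
 ORDER BY N.node_id
""").fetchall()
print(indegree)
print(outdegree)
```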

30.1.6 Source, Sink, Isolated, and Internal Nodes

A source node of a graph has a positive outdegree but an indegree of
zero; that is, it has edges leading from, but not to, the node. This
assumes there are no isolated nodes (nodes belonging to no edges).

CREATE VIEW SourceNodes (node_id, lft, rgt)
AS
SELECT node_id, lft, rgt
FROM NestedSetsGraph AS G1
WHERE NOT EXISTS
(SELECT *
FROM NestedSetsGraph AS G2
WHERE G1.lft > G2.lft
AND G1.lft < G2.rgt);

Likewise, a sink node of a graph has positive indegree but an
outdegree of zero; that is, it has edges leading to, but not from, the node.
This assumes there are no isolated nodes.

CREATE VIEW SinkNodes (node_id)
AS
SELECT node_id
FROM NestedSetsGraph AS G1
WHERE lft = rgt - 1
AND NOT EXISTS
(SELECT *
FROM NestedSetsGraph AS G2
WHERE G1.node_id = G2.node_id
AND G2.lft < G1.lft);

An isolated node belongs to no edges; i.e., it has zero indegree and
zero outdegree.

CREATE VIEW IsolatedNodes (node_id, lft, rgt)
AS
SELECT node_id, lft, rgt
FROM NestedSetsGraph AS G1
WHERE lft = rgt - 1
AND NOT EXISTS
(SELECT *
FROM NestedSetsGraph AS G2
WHERE G1.lft > G2.lft
AND G1.lft < G2.rgt);

An internal node of a graph has an indegree greater than zero and an
outdegree greater than zero; that is, it has edges leading both to and
from the node.

CREATE VIEW InternalNodes (node_id)
AS
SELECT node_id
FROM (SELECT node_id, MIN(lft) AS lft, MIN(rgt) AS rgt
FROM NestedSetsGraph
WHERE lft < rgt - 1
GROUP BY node_id) AS G1
WHERE EXISTS
(SELECT *
FROM NestedSetsGraph AS G2
WHERE G1.lft > G2.lft
AND G1.lft < G2.rgt);
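
The four definitions can also be checked against each other by computing both degrees per node and classifying on them. The following sketch does that in SQLite through Python's sqlite3 module. It is an illustration under assumptions: it works on a simplified adjacency list (Nodes and AdjacencyListGraph without constraints) instead of the nested sets table, the node_type labels are invented for the example, and the sample data is a diamond graph plus an unconnected node 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Nodes
(node_id INTEGER NOT NULL PRIMARY KEY);

CREATE TABLE AdjacencyListGraph
(begin_node_id INTEGER NOT NULL,
 end_node_id INTEGER NOT NULL,
 PRIMARY KEY (begin_node_id, end_node_id));

-- Assumed sample data: node 5 belongs to no edge.
INSERT INTO Nodes VALUES (1), (2), (3), (4), (5);
INSERT INTO AdjacencyListGraph VALUES (1, 2), (1, 3), (2, 4), (3, 4);
""")

# Classify each node by its indegree and outdegree.
node_types = conn.execute("""
SELECT node_id,
       CASE WHEN indegree = 0 AND outdegree = 0 THEN 'isolated'
            WHEN indegree = 0 THEN 'source'
            WHEN outdegree = 0 THEN 'sink'
            ELSE 'internal' END AS node_type
  FROM (SELECT N.node_id,
               (SELECT COUNT(*)
                  FROM AdjacencyListGraph AS E
                 WHERE E.end_node_id = N.node_id) AS indegree,
               (SELECT COUNT(*)
                  FROM AdjacencyListGraph AS E
                 WHERE E.begin_node_id = N.node_id) AS outdegree
          FROM Nodes AS N) AS D
 ORDER BY node_id
""").fetchall()
print(node_types)
```

Testing for the isolated case first keeps the source and sink branches from having to assume that isolated nodes are absent.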

30.2 Paths in a Graph

Finding a path in a graph is the most important commercial application
of graphs. Graphs model transportation networks, electrical and cable
systems, process control flow and thousands of other things.
A path, P, of length L from a node n0 to a node nk in the graph is
defined as a traversal of (L + 1) contiguous nodes along a sequence of
edges, where the first node is node number 0 and the last is node
number k.

CREATE VIEW Paths
(begin_node_id, end_node_id, this_node_id,
seq_nbr,
begin_lft, begin_rgt, end_lft, end_rgt,
this_lft, this_rgt)
AS
SELECT PE.begin_node_id, PE.end_node_id, G1.node_id,
(SELECT COUNT(*)
FROM NestedSetsGraph AS G2
WHERE G2.lft > PE.begin_lft
AND G2.lft <= G1.lft
AND G2.rgt >= G1.rgt),
PE.begin_lft, PE.begin_rgt,
PE.end_lft, PE.end_rgt,
G1.lft, G1.rgt
FROM PathEndpoints AS PE
INNER JOIN
NestedSetsGraph AS G1
ON G1.lft BETWEEN PE.begin_lft
AND PE.end_lft
AND G1.rgt >= PE.end_rgt;

30.2.1 Length of Paths

The length of a path is the number of edges that are traversed along the
path. A path of n nodes has a length of (n - 1).

CREATE VIEW PathLengths
(begin_node_id, end_node_id,
path_length,
begin_lft, begin_rgt,
end_lft, end_rgt)
AS
SELECT begin_node_id, end_node_id, MAX(seq_nbr),
begin_lft, begin_rgt, end_lft, end_rgt
FROM Paths
GROUP BY begin_lft, end_lft, begin_rgt, end_rgt,
begin_node_id, end_node_id;

30.2.2 Shortest Path

The following code gives the shortest path length between every pair
of connected nodes, but it does not tell you what the actual path is.
There are other queries that use the new CTE feature and recursion,
which we will discuss in Section 30.3.

CREATE VIEW ShortestPathLengths
(begin_node_id, end_node_id, path_length,
begin_lft, begin_rgt, end_lft, end_rgt)
AS
SELECT PL.begin_node_id, PL.end_node_id,
PL.path_length,
PL.begin_lft, PL.begin_rgt,
PL.end_lft, PL.end_rgt
FROM (SELECT begin_node_id, end_node_id,
MIN(path_length) AS path_length
FROM PathLengths
GROUP BY begin_node_id, end_node_id) AS MPL
INNER JOIN
PathLengths AS PL
ON MPL.begin_node_id = PL.begin_node_id
AND MPL.end_node_id = PL.end_node_id
AND MPL.path_length = PL.path_length;
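
To make the chain of views concrete, here is a small sketch that runs trimmed-down versions of PathEndpoints, Paths, and PathLengths in SQLite through Python's sqlite3 module. Several things are assumptions made for the example: the sample graph (edges 1 to 2, 2 to 3, and 1 to 3, so node 3 is split into two copies), the reduced column lists, the AS aliases that replace the view and derived-table column lists, and a final query that simply takes the minimum length per endpoint pair instead of joining back as ShortestPathLengths does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE NestedSetsGraph
(node_id INTEGER NOT NULL,
 lft INTEGER NOT NULL PRIMARY KEY CHECK (lft >= 1),
 rgt INTEGER NOT NULL UNIQUE,
 CHECK (rgt > lft),
 UNIQUE (node_id, lft));

-- Assumed sample data: node 3 sits under node 2 and directly under node 1.
INSERT INTO NestedSetsGraph (node_id, lft, rgt)
VALUES (1, 1, 8), (2, 2, 5), (3, 3, 4), (3, 6, 7);

CREATE VIEW PathEndpoints AS
SELECT G1.node_id AS begin_node_id, G2.node_id AS end_node_id,
       G1.lft AS begin_lft, G1.rgt AS begin_rgt,
       G2.lft AS end_lft, G2.rgt AS end_rgt
  FROM (SELECT node_id, MIN(lft) AS lft, MIN(rgt) AS rgt
          FROM NestedSetsGraph
         GROUP BY node_id) AS G1
       INNER JOIN
       NestedSetsGraph AS G2
       ON G2.lft >= G1.lft
          AND G2.lft < G1.rgt;

-- seq_nbr is the position of each node on the path, starting at zero.
CREATE VIEW Paths AS
SELECT PE.begin_node_id AS begin_node_id,
       PE.end_node_id AS end_node_id,
       PE.begin_lft AS begin_lft, PE.end_lft AS end_lft,
       (SELECT COUNT(*)
          FROM NestedSetsGraph AS G2
         WHERE G2.lft > PE.begin_lft
           AND G2.lft <= G1.lft
           AND G2.rgt >= G1.rgt) AS seq_nbr
  FROM PathEndpoints AS PE
       INNER JOIN
       NestedSetsGraph AS G1
       ON G1.lft BETWEEN PE.begin_lft AND PE.end_lft
          AND G1.rgt >= PE.end_rgt;

-- lft is unique, so (begin_lft, end_lft) identifies one path.
CREATE VIEW PathLengths AS
SELECT begin_node_id, end_node_id, MAX(seq_nbr) AS path_length
  FROM Paths
 GROUP BY begin_node_id, end_node_id, begin_lft, end_lft;
""")

all_lengths = conn.execute("""
SELECT path_length FROM PathLengths
 WHERE begin_node_id = 1 AND end_node_id = 3
 ORDER BY path_length
""").fetchall()

shortest = conn.execute("""
SELECT begin_node_id, end_node_id, MIN(path_length)
  FROM PathLengths
 GROUP BY begin_node_id, end_node_id
 ORDER BY begin_node_id, end_node_id
""").fetchall()
print(all_lengths)
print(shortest)
```

Node 3 is reached from node 1 both directly and through node 2, so PathLengths holds two rows for that pair and the minimum picks the direct edge.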

30.2.3 Paths by Iteration

First, let’s build a graph that has a cost associated with each edge and put
it into an adjacency list model.

INSERT INTO Edges (out_node, in_node, cost)
VALUES ('A', 'B', 50),
('A', 'C', 30),
('A', 'D', 100),
('A', 'E', 10),
('C', 'B', 5),
('D', 'B', 20),
('D', 'C', 50),
('E', 'D', 10);

To find the shortest paths from one node to the other nodes it can
reach, we can write this recursive VIEW.

CREATE VIEW ShortestPaths (out_node, in_node, path_length)
AS
WITH RECURSIVE Paths (out_node, in_node, path_length)
AS
(SELECT out_node, in_node, 1
   FROM Edges
 UNION ALL
 SELECT E1.out_node, P1.in_node, P1.path_length + 1
   FROM Edges AS E1, Paths AS P1
  WHERE E1.in_node = P1.out_node)
SELECT out_node, in_node, MIN(path_length)
  FROM Paths
 GROUP BY out_node, in_node;

out_node in_node path_length
============================
'A'      'B'     1
'A'      'C'     1
'A'      'D'     1
'A'      'E'     1
'C'      'B'     1
'D'      'B'     1
'D'      'C'     1
'E'      'B'     2
'E'      'C'     2
'E'      'D'     1
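
The recursive query can be tried out in SQLite through Python's sqlite3 module, as in the sketch below. The Edges table definition is an assumption, since the text only shows the INSERT statement, and the query is run directly rather than wrapped in a view. Because the recursion has no cycle check, this only terminates on an acyclic graph such as the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed table definition; the text gives only the rows.
CREATE TABLE Edges
(out_node CHAR(1) NOT NULL,
 in_node CHAR(1) NOT NULL,
 cost INTEGER NOT NULL,
 PRIMARY KEY (out_node, in_node));

INSERT INTO Edges (out_node, in_node, cost)
VALUES ('A', 'B', 50), ('A', 'C', 30), ('A', 'D', 100), ('A', 'E', 10),
       ('C', 'B', 5), ('D', 'B', 20), ('D', 'C', 50), ('E', 'D', 10);
""")

# Each recursive step prepends one edge to a path found so far.
shortest_paths = conn.execute("""
WITH RECURSIVE Paths (out_node, in_node, path_length)
AS
(SELECT out_node, in_node, 1
   FROM Edges
 UNION ALL
 SELECT E1.out_node, P1.in_node, P1.path_length + 1
   FROM Edges AS E1, Paths AS P1
  WHERE E1.in_node = P1.out_node)
SELECT out_node, in_node, MIN(path_length)
  FROM Paths
 GROUP BY out_node, in_node
 ORDER BY out_node, in_node
""").fetchall()

for row in shortest_paths:
    print(row)
```

For this data, node E reaches both B and C in two steps through D, and every other reachable pair is joined by a single edge.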

To find the shortest paths without recursion, stay in a loop and add
one edge at a time to the set of paths defined so far.


CREATE PROCEDURE IteratePaths()
LANGUAGE SQL
MODIFIES SQL DATA
BEGIN
DECLARE old_path_tally INTEGER;
SET old_path_tally = 0;
DELETE FROM Paths; -- clean out working table
INSERT INTO Paths
SELECT out_node, in_node, 1
  FROM Edges; -- load the edges
-- add one edge to each path
WHILE old_path_tally < (SELECT COUNT(*) FROM Paths)
DO SET old_path_tally = (SELECT COUNT(*) FROM Paths);
   INSERT INTO Paths (out_node, in_node, lgth)
   SELECT E1.out_node, P1.in_node, (1 + P1.lgth)
     FROM Edges AS E1, Paths AS P1
    WHERE E1.in_node = P1.out_node
      AND NOT EXISTS -- path is not here already
          (SELECT *
             FROM Paths AS P2
            WHERE E1.out_node = P2.out_node
              AND P1.in_node = P2.in_node);
END WHILE;
END;
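
SQLite has no stored procedures, so one way to try this loop is to drive the same statements from Python's sqlite3 module. The sketch below is an illustration under assumptions: the definitions of the Edges table and the Paths working table are guesses (the text shows neither), and the helper function count_paths is introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed definitions; the text does not show them.
CREATE TABLE Edges
(out_node CHAR(1) NOT NULL,
 in_node CHAR(1) NOT NULL,
 cost INTEGER NOT NULL,
 PRIMARY KEY (out_node, in_node));
CREATE TABLE Paths
(out_node CHAR(1) NOT NULL,
 in_node CHAR(1) NOT NULL,
 lgth INTEGER NOT NULL);

INSERT INTO Edges (out_node, in_node, cost)
VALUES ('A', 'B', 50), ('A', 'C', 30), ('A', 'D', 100), ('A', 'E', 10),
       ('C', 'B', 5), ('D', 'B', 20), ('D', 'C', 50), ('E', 'D', 10);
""")


def count_paths():
    """Hypothetical helper: current number of rows in Paths."""
    return conn.execute("SELECT COUNT(*) FROM Paths").fetchone()[0]


conn.execute("DELETE FROM Paths")  # clean out working table
conn.execute("INSERT INTO Paths SELECT out_node, in_node, 1 FROM Edges")

# Keep adding one edge to each path until a pass inserts nothing new.
old_path_tally = 0
while old_path_tally < count_paths():
    old_path_tally = count_paths()
    conn.execute("""
    INSERT INTO Paths (out_node, in_node, lgth)
    SELECT E1.out_node, P1.in_node, (1 + P1.lgth)
      FROM Edges AS E1, Paths AS P1
     WHERE E1.in_node = P1.out_node
       AND NOT EXISTS -- path is not here already
           (SELECT *
              FROM Paths AS P2
             WHERE E1.out_node = P2.out_node
               AND P1.in_node = P2.in_node)
    """)

iterated_paths = conn.execute(
    "SELECT out_node, in_node, lgth FROM Paths "
    "ORDER BY out_node, in_node").fetchall()
print(iterated_paths)
```

On the sample data the loop stops after two passes: the first adds the two-step paths from E to B and from E to C, and the second finds nothing new.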

The least cost path is basically the same algorithm, but instead of a
constant of one for the path length, we use the actual costs of the edges.

CREATE PROCEDURE IterateCheapPaths ()
LANGUAGE SQL
MODIFIES SQL DATA
BEGIN
-- despite its name, this variable holds the previous row count
DECLARE old_path_cost INTEGER;
SET old_path_cost = 0;
DELETE FROM Paths; -- clean out working table
INSERT INTO Paths
SELECT out_node, in_node, cost
  FROM Edges; -- load the edges
-- add one edge to each path
WHILE old_path_cost < (SELECT COUNT(*) FROM Paths)
DO SET old_path_cost = (SELECT COUNT(*) FROM Paths);
   INSERT INTO Paths (out_node, in_node, cost)
   SELECT E1.out_node, P1.in_node, (E1.cost + P1.cost)
     FROM Edges AS E1
          INNER JOIN
          (SELECT out_node, in_node, MIN(cost)
             FROM Paths
            GROUP BY out_node, in_node)
          AS P1 (out_node, in_node, cost)
          ON E1.in_node = P1.out_node
             AND NOT EXISTS
                 (SELECT *
                    FROM Paths AS P2
                   WHERE E1.out_node = P2.out_node
                     AND P1.in_node = P2.in_node
                     AND P2.cost <= E1.cost + P1.cost);
END WHILE;
END;
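
The cost version can be exercised the same way, with Python's sqlite3 module driving the loop in place of a stored procedure. Again this is a sketch under assumptions: the Edges and Paths table definitions and the helper count_paths are invented for the example, and the derived-table column list is rewritten with an AS alias because SQLite does not accept the AS P1 (out_node, in_node, cost) form. The loop only ever adds rows, so the cheapest cost per pair is read off with MIN at the end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed definitions; the text does not show them.
CREATE TABLE Edges
(out_node CHAR(1) NOT NULL,
 in_node CHAR(1) NOT NULL,
 cost INTEGER NOT NULL,
 PRIMARY KEY (out_node, in_node));
CREATE TABLE Paths
(out_node CHAR(1) NOT NULL,
 in_node CHAR(1) NOT NULL,
 cost INTEGER NOT NULL);

INSERT INTO Edges (out_node, in_node, cost)
VALUES ('A', 'B', 50), ('A', 'C', 30), ('A', 'D', 100), ('A', 'E', 10),
       ('C', 'B', 5), ('D', 'B', 20), ('D', 'C', 50), ('E', 'D', 10);
""")


def count_paths():
    """Hypothetical helper: current number of rows in Paths."""
    return conn.execute("SELECT COUNT(*) FROM Paths").fetchone()[0]


conn.execute("DELETE FROM Paths")  # clean out working table
conn.execute("INSERT INTO Paths SELECT out_node, in_node, cost FROM Edges")

# Extend the cheapest known path per pair by one edge; insert the result
# only when it beats every cost already recorded for that pair.
old_path_count = 0
while old_path_count < count_paths():
    old_path_count = count_paths()
    conn.execute("""
    INSERT INTO Paths (out_node, in_node, cost)
    SELECT E1.out_node, P1.in_node, (E1.cost + P1.cost)
      FROM Edges AS E1
           INNER JOIN
           (SELECT out_node, in_node, MIN(cost) AS cost
              FROM Paths
             GROUP BY out_node, in_node) AS P1
           ON E1.in_node = P1.out_node
              AND NOT EXISTS
                  (SELECT *
                     FROM Paths AS P2
                    WHERE E1.out_node = P2.out_node
                      AND P1.in_node = P2.in_node
                      AND P2.cost <= E1.cost + P1.cost)
    """)

cheapest = dict(
    ((out_node, in_node), cost)
    for out_node, in_node, cost in conn.execute(
        "SELECT out_node, in_node, MIN(cost) FROM Paths "
        "GROUP BY out_node, in_node"))
print(cheapest)
```

On the sample data, the direct edge from A to D costs 100, but the route through E costs only 20, and A reaches B for 35 through C rather than 50 directly.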

30.2.4 Listing the Paths

I took the data for this table from the book Introduction to Algorithms
(Cormen, Leiserson, and Rivest 1990), page 518. This book was very
popular in college courses in the United States. I made one decision that
will be important later: I added self-traversal edges (i.e., the node is both
the out_node and the in_node of an edge) with weights of zero.

INSERT INTO Edges VALUES ('s', 's', 0);
INSERT INTO Edges VALUES ('s', 'u', 3);
INSERT INTO Edges VALUES ('s', 'x', 5);
INSERT INTO Edges VALUES ('u', 'u', 0);
INSERT INTO Edges VALUES ('u', 'v', 6);
INSERT INTO Edges VALUES ('u', 'x', 2);
INSERT INTO Edges VALUES ('v', 'v', 0);
INSERT INTO Edges VALUES ('v', 'y', 2);
INSERT INTO Edges VALUES ('x', 'u', 1);
INSERT INTO Edges VALUES ('x', 'v', 4);
INSERT INTO Edges VALUES ('x', 'x', 0);
INSERT INTO Edges VALUES ('x', 'y', 6);
INSERT INTO Edges VALUES ('y', 's', 3);
INSERT INTO Edges VALUES ('y', 'v', 7);
INSERT INTO Edges VALUES ('y', 'y', 0);

I am not happy about this approach, because I have to decide the
maximum number of edges in a path before I start looking for an
answer. But this solution will work, and I know that a path that does
not revisit a node will have no more steps than the total number of
nodes in the graph, which is five here. Let’s create a table to hold
the paths:

CREATE TABLE Paths
(step1 CHAR(2) NOT NULL,
step2 CHAR(2) NOT NULL,
step3 CHAR(2) NOT NULL,
step4 CHAR(2) NOT NULL,
step5 CHAR(2) NOT NULL,
total_cost INTEGER NOT NULL,
path_length INTEGER NOT NULL,
PRIMARY KEY (step1, step2, step3, step4, step5));
