Basic data structures in programming


CONTENTS
Data structures
• Sqrt-decomposition
• Fenwick tree
• The system of disjoint sets
• Segment tree
• Treap (Cartesian tree, "deramida")
• Modification of the stack and queue for finding the minimum in O(1)
• Randomized heap
Sqrt-decomposition
Sqrt-decomposition is a method, or a data structure, that allows performing some
typical operations (summing the elements of a subarray, finding the minimum /
maximum, etc.) in O(sqrt(n)), significantly faster than the O(n) of the trivial
algorithm.
First we describe a data structure for one of the simplest applications of this idea,
then show how to generalize it to solve some other problems, and, finally, look at a
somewhat different use of this idea: splitting the input queries into sqrt-blocks.
A data structure based on sqrt-decomposition
Problem statement. Given an array a[0..n-1], implement a data structure that can
find the sum of the elements a[l..r] for arbitrary l and r in O(sqrt(n)) operations.
Description
The basic idea of sqrt-decomposition is to do the following precomputation: divide
the array a into blocks of length approximately sqrt(n), and in each block precompute
in advance the sum of the elements in it.
We can assume that both the length of one block and the number of blocks equal one and
the same number - the square root of n, rounded up:
len = ceil(sqrt(n)),
so the array a is divided into blocks like this:
a[0..len-1], a[len..2*len-1], ...
Although the last block may contain fewer than len elements (if n is not divisible
by len), this is not essential.
Thus, for each block k we know the sum b[k] of the elements in it:
b[k] = a[k*len] + a[k*len + 1] + ... (up to the end of the block or of the array).
So, suppose the values b[k] have been precomputed (which obviously takes O(n)
operations). What can they give when computing the answer to a query (l, r)? Note that
if the segment [l; r] is long, it will contain several blocks as a whole, and for
those blocks we can find the sum in a single operation each. As a result, only the two
blocks falling into [l; r] partially remain, and these pieces we have to sum by the
trivial algorithm.
Here let k denote the number of the block in which l lies, and p the number of the
block in which r lies. To compute the sum on the segment [l; r], it is then necessary
to sum the elements of only two "tails", a[l..(k+1)*len - 1] and a[p*len..r], and to
sum the values b[i] over all the blocks starting from k+1 and ending with p-1:
sum = (a[l] + ... + a[(k+1)*len - 1]) + (b[k+1] + ... + b[p-1]) + (a[p*len] + ... + a[r]).
(Note: this formula is incorrect when k = p: in that case some elements would be added
twice; in this case one must simply sum the elements from l to r directly.)
Thus we save a significant number of operations. Indeed, the size of each of the
"tails" clearly does not exceed the block length len, and the number of blocks does not
exceed sqrt(n). Since we have chosen len ≈ sqrt(n), computing the sum on the
segment [l; r] requires only O(sqrt(n)) operations.
Implementation
We give first a simple implementation:

// input data
int n;
vector<int> a (n);

// precomputation
int len = (int) sqrt (n + .0) + 1; // both the size of a block and the number of blocks
vector<int> b (len);
for (int i=0; i<n; ++i)
    b[i / len] += a[i];

// answering the queries
for (;;) {
    int l, r; // read the input data - the next query
    int sum = 0;
    for (int i=l; i<=r; )
        if (i % len == 0 && i + len - 1 <= r) {
            // i points to the beginning of a block lying entirely in [l; r]
            sum += b[i / len];
            i += len;
        }
        else {
            sum += a[i];
            ++i;
        }
}
The disadvantage of this implementation is that it contains unreasonably many division
operations (which are known to run significantly slower than other operations).
Instead, one can compute the numbers c_l and c_r of the blocks in which the
boundaries l and r respectively lie, and then make a loop over the blocks
from c_l+1 to c_r-1, treating the "tails" in the blocks c_l and c_r separately.
Moreover, with such an implementation the case c_l = c_r becomes special and requires
separate processing:
int sum = 0;
int c_l = l / len, c_r = r / len;
if (c_l == c_r)
    for (int i=l; i<=r; ++i)
        sum += a[i];
else {
    for (int i=l, end=(c_l+1)*len-1; i<=end; ++i)
        sum += a[i];
    for (int i=c_l+1; i<=c_r-1; ++i)
        sum += b[i];
    for (int i=c_r*len; i<=r; ++i)
        sum += a[i];
}
Other problems
We considered the problem of finding the sum of array elements on some of its
subsegments. This problem can be extended a little: let us also allow changing
individual elements of the array a. Indeed, if some element a[i] changes, it suffices
to update the value of b[k] in the block in which this element lies (k = i / len):
b[k] += (new value of a[i]) - (old value of a[i]).
On the other hand, instead of the problem of the sum, the problems of the minimum or
maximum element on a segment can be solved similarly. If these problems also allow
changes of individual elements, then we will have to recalculate the value of the
block that owns the changed element, but recalculate it fully, by a pass over all the
elements of the block, in O(sqrt(n)) operations.
Sqrt-decomposition can be applied similarly to a variety of other similar problems:
finding the number of zero elements, the first non-zero element, counting the number
of certain elements, etc.
There is also a whole class of problems where there are changes of elements on
whole subsegments: adding or assigning values to the elements on some subsegment of
the array a.
For example, suppose we need to perform the following two types of queries: add a
value x to all the elements of some segment [l; r], and find out the value of an
individual element a[i]. Then let b[k] be the value that must be added to all the
elements of the k-th block (initially all b[k] = 0); when executing an "addition"
query we perform the addition of x to all the elements a[i] of the "tails", and the
addition of x to b[k] for all the blocks lying entirely in the segment [l; r]. The
answer to the second query is then simply a[i] + b[i / len]. Thus, the addition on a
segment runs in O(sqrt(n)), and the query for an individual element - in O(1).
Finally, one can combine both variants of the task: changing the elements on a
segment and answering queries on a segment. Both types of operations can then be
carried out in O(sqrt(n)). For this, two "block" arrays have to be kept: one for the
changes on segments, the other for answering the queries, as the sketch below shows.
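To make this concrete, here is a minimal sketch of such a structure (our illustration,
not code from the original text; the names SqrtRange, block_sum and block_add are
ours), supporting both "add x on [l; r]" and "sum on [l; r]" in O(sqrt(n)):

#include <cmath>
#include <vector>
using namespace std;

struct SqrtRange {
    int n, len;
    vector<long long> a;           // the elements themselves
    vector<long long> block_sum;   // sum of a[] inside each block
    vector<long long> block_add;   // deferred addition for a whole block

    SqrtRange (const vector<long long> &init) : n((int)init.size()), a(init) {
        len = (int) sqrt (n + .0) + 1;
        block_sum.assign (len, 0);
        block_add.assign (len, 0);
        for (int i=0; i<n; ++i)
            block_sum[i / len] += a[i];
    }

    void update (int l, int r, long long x) {   // add x to a[l..r]
        for (int i=l; i<=r; )
            if (i % len == 0 && i + len - 1 <= r) {
                block_add[i / len] += x;        // whole block: defer the addition
                i += len;
            }
            else {
                a[i] += x;
                block_sum[i / len] += x;
                ++i;
            }
    }

    long long query (int l, int r) {            // sum of a[l..r]
        long long sum = 0;
        for (int i=l; i<=r; )
            if (i % len == 0 && i + len - 1 <= r) {
                sum += block_sum[i / len] + block_add[i / len] * len;
                i += len;
            }
            else {
                sum += a[i] + block_add[i / len];
                ++i;
            }
        return sum;
    }
};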
One can also give other examples of problems to which sqrt-decomposition is
applicable. For example, one can solve the problem of maintaining a set of numbers
with the ability to add / remove numbers, check whether a number belongs to the set,
and search for the k-th smallest number. To solve it, the numbers must be stored in
sorted order, divided into several blocks of sqrt(n) numbers each. When a number is
added or removed, the blocks have to be "rebalanced", by moving numbers from the
beginning / end of one block into the beginning / end of the neighbouring blocks (a
sketch of one possible implementation follows).
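A minimal sketch of such a set, under one simplifying assumption of ours: instead of
shifting numbers into neighbouring blocks, an overgrown block is simply split in two,
which preserves the same O(sqrt(n)) bounds (the names BlockedSet and LEN are ours):

#include <algorithm>
#include <vector>
using namespace std;

struct BlockedSet {
    static const int LEN = 256;          // target block size, ~sqrt(n)
    vector< vector<int> > blocks{ {} };  // sorted numbers, split into blocks

    vector< vector<int> >::iterator locate (int x) {  // block that may contain x
        auto it = blocks.begin();
        while (it + 1 != blocks.end() && !it->empty() && it->back() < x)
            ++it;
        return it;
    }
    void insert (int x) {
        auto it = locate (x);
        it->insert (lower_bound (it->begin(), it->end(), x), x);
        if ((int) it->size() > 2 * LEN) {             // rebalance: split the block
            vector<int> right (it->begin() + LEN, it->end());
            it->resize (LEN);
            blocks.insert (it + 1, right);
        }
    }
    void erase (int x) {
        auto it = locate (x);
        auto pos = lower_bound (it->begin(), it->end(), x);
        if (pos != it->end() && *pos == x) it->erase (pos);
        if (it->empty() && blocks.size() > 1) blocks.erase (it);
    }
    bool contains (int x) {
        auto it = locate (x);
        return binary_search (it->begin(), it->end(), x);
    }
    int kth (int k) {                     // 0-indexed k-th smallest
        for (auto &b : blocks) {
            if (k < (int) b.size()) return b[k];
            k -= (int) b.size();
        }
        return -1;                        // k is out of range
    }
};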
Sqrt-decomposition of the input queries
Consider now a very different application of the sqrt-decomposition idea.
Suppose that we have some problem in which we are given some input data, and then
commands / queries arrive, each of which we have to process and answer. We consider
the case when the queries are both reading (they do not change the state of the
system, but only ask for some information) and modifying (i.e. affecting the state of
the system, which is initially set by the input data).
The specific task can be quite complex, and an "honest" solution of it (which reads
one query, processes it, changing the state of the system, and returns the answer)
may be technically difficult or even beyond the solver's power. On the other hand,
the solution of the "offline" version of the task, i.e. when there are no modifying
operations and only reading queries remain, is often much simpler. Suppose that we
know how to solve the "offline" version of the problem, i.e. to build in some
time B a data structure that can answer reading queries, but we do not know how to
handle modifying queries.

Then let us divide the input queries into blocks (of what length is not specified
yet; denote this length by s). At the start of processing each block we construct the
data structure for the "offline" version of the task on the state of the data at the
beginning of this block.
Now we take the queries of the current block one by one and process each of them. If
the current query is modifying, we skip it. If the current query is a reading one, we
consult the data structure for the offline version of the problem, but first take
into account all the modifying queries skipped in the current block. Taking such
modifying queries into account is not always possible, and it should be fast
enough - in time O(s) or a little worse; denote this time by t(s).
Thus, if we have m queries in total, processing them requires about
(m / s) * B + m * t(s) time. The value s should be chosen based on the specific form
of the functions B and t. For instance, if the construction works in O(n) and the
accounting in t(s) = O(s), then the total time is O((m / s) * n + m * s), which is
minimized by the choice s ≈ sqrt(n), giving the final asymptotics O(m * sqrt(n)).
Since the arguments given above are rather abstract, we present a few examples of
problems to which this sqrt-decomposition applies.
An example problem: adding on a segment
Problem statement: given an array of numbers a[0..n-1], and queries of two types
arrive: find the value of the i-th element of the array, and add a number x to all
the elements of the array on some segment [l; r].
Although this problem can be solved without this technique of splitting the queries
into blocks, we present it here as a simple and illustrative application of the
method.
So, let us divide the input queries into blocks of length sqrt(m) (where m is the
number of queries). At the beginning of the first block no structures have to be
built, we just store the array a. Now let us walk through the queries of the first
block. If the current query is an addition query, for now we skip it. If the current
query is a query to read the value at some position i, then first we simply take the
value a[i] as the answer. Then we loop through all the addition queries skipped so
far in this block, and for those whose segment covers i, we apply their increase to
the current answer.
Thus, we have learned to answer reading queries in time O(sqrt(m)).
It only remains to note that at the end of each block of queries we must apply all
the modifying queries of this block to the array a. But this is easy to do
in O(n): for each addition query (l, r, x) it is enough to note in an auxiliary
array the number x at the point l and the number -x at the point r+1, and then walk
through this array, adding a running total to the array a.
Thus, the final solution will have the asymptotics O((n + m) * sqrt(m)).
(A minimal sketch of this scheme is given below.)
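The following sketch is our own illustration, not code from the original text; the
Query structure and the name process are ours:

#include <cmath>
#include <utility>
#include <vector>
using namespace std;

struct Query { int type, l, r; long long x; }; // type 0: add x on [l; r]; type 1: read a[l]

vector<long long> process (vector<long long> a, const vector<Query> &qs) {
    int n = (int) a.size(),  m = (int) qs.size();
    int s = (int) sqrt (m + .0) + 1;             // block length for the queries
    vector<long long> answers;
    vector<Query> pending;                       // skipped add-queries of the block
    for (int qi=0; qi<m; ++qi) {
        const Query &q = qs[qi];
        if (q.type == 0)
            pending.push_back (q);               // modifying query: just remember it
        else {
            long long ans = a[q.l];              // the "offline" answer...
            for (size_t j=0; j<pending.size(); ++j)   // ...corrected by the block's adds
                if (pending[j].l <= q.l && q.l <= pending[j].r)
                    ans += pending[j].x;
            answers.push_back (ans);
        }
        if ((qi + 1) % s == 0 || qi + 1 == m) {  // end of block: apply the adds in O(n)
            vector<long long> d (n + 1, 0);
            for (size_t j=0; j<pending.size(); ++j) {
                d[pending[j].l] += pending[j].x;
                d[pending[j].r + 1] -= pending[j].x;
            }
            long long run = 0;
            for (int i=0; i<n; ++i)
                a[i] += (run += d[i]);
            pending.clear();
        }
    }
    return answers;
}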
An example problem: disjoint-set-union with edge deletion
There is an undirected graph with n vertices and m edges. Queries of three kinds
arrive: add an edge (a, b), remove the edge (a, b), and check whether or not the
vertices a and b are connected by a path.
If there were no removal queries, then the solution of the problem would be the
well-known data structure disjoint-set-union (the system of disjoint sets). However,
in the presence of deletions the task becomes much more complicated.
We do the following. At the beginning of each block of queries we look at which edges
will be removed within this block, and immediately remove them from the graph. Now we
construct a system of disjoint sets (DSU) on the resulting graph.
How do we now respond to the next query from the current block? Our system of
disjoint sets "knows" all the edges, except for those that are added / removed in the
current block. However, deletions from the DSU are not needed - we have removed all
such edges from the graph in advance. Thus, all that can remain are extra, added
edges, of which there can be at most sqrt(m).
Consequently, to answer the current reading query we can simply run a breadth-first
traversal over the connected components of the DSU, which works in O(sqrt(m)), since
the traversal considers only the sqrt(m) extra edges. (A rough sketch is given below.)
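A rough sketch of this block processing (our own illustration with simplified
bookkeeping, not code from the original text; it assumes that every removal query
refers to an edge actually present in the graph):

#include <algorithm>
#include <map>
#include <queue>
#include <utility>
#include <vector>
using namespace std;

struct DSU {
    vector<int> p;
    void init (int n) { p.resize (n); for (int i=0; i<n; ++i) p[i] = i; }
    int find (int v) { return p[v] == v ? v : p[v] = find (p[v]); }
    void unite (int a, int b) { a = find (a); b = find (b); if (a != b) p[b] = a; }
};

struct Q { int type, a, b; };   // 0: add edge, 1: remove edge, 2: connected?

vector<bool> solve (int n, map< pair<int,int>, int > edges,  // edge multiplicities
                    const vector<Q> &qs, int s) {            // s ~ sqrt(#queries)
    vector<bool> answers;
    for (size_t start=0; start<qs.size(); start += s) {
        size_t end = min (qs.size(), start + (size_t) s);
        map< pair<int,int>, int > volat;   // edges added / removed inside this block
        for (size_t i=start; i<end; ++i)
            if (qs[i].type != 2)
                volat[ make_pair (min (qs[i].a, qs[i].b), max (qs[i].a, qs[i].b)) ] = 1;
        DSU dsu;  dsu.init (n);
        vector< pair<int,int> > extra;     // volatile edges currently present
        for (auto &e : edges)
            if (e.second > 0) {
                if (volat.count (e.first)) extra.push_back (e.first);
                else dsu.unite (e.first.first, e.first.second);  // stable edge
            }
        for (size_t i=start; i<end; ++i) {
            pair<int,int> key (min (qs[i].a, qs[i].b), max (qs[i].a, qs[i].b));
            if (qs[i].type == 0) { ++edges[key];  extra.push_back (key); }
            else if (qs[i].type == 1) {
                --edges[key];
                extra.erase (find (extra.begin(), extra.end(), key));
            }
            else {
                // BFS over DSU leaders, using only the O(s) volatile edges
                map< int, vector<int> > adj;
                for (auto &e : extra) {
                    int u = dsu.find (e.first),  v = dsu.find (e.second);
                    adj[u].push_back (v);  adj[v].push_back (u);
                }
                int src = dsu.find (qs[i].a),  dst = dsu.find (qs[i].b);
                map<int,bool> seen;  queue<int> bfs;
                bfs.push (src);  seen[src] = true;
                while (!bfs.empty()) {
                    int v = bfs.front();  bfs.pop();
                    for (int to : adj[v])
                        if (!seen[to]) { seen[to] = true;  bfs.push (to); }
                }
                answers.push_back (seen.count (dst) > 0);
            }
        }
    }
    return answers;
}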
Offline problems with queries on subsegments of an array,
and a universal sqrt-heuristic for them
Consider another interesting variation of the idea of sqrt-decomposition.
Suppose we have some problem in which there is an array of numbers, and reading
queries of the form (l, r) arrive - to learn something about the subsegment a[l..r].
We assume that the queries are not modifying, and that they are known to us in
advance, i.e. the task is offline.
Finally, we introduce the last restriction: we assume that we can quickly recalculate
the answer to a query when its left or right boundary changes by one unit. I.e., if
we knew the answer to the query (l, r), then we can quickly find the answer to the
query (l+1, r), (l-1, r), (l, r+1) or (l, r-1).
We now describe a universal heuristic for all such problems. Sort the queries by the
pair (l / sqrt(n), r): i.e. we sort the queries by the number of the sqrt-block in
which their left end lies, and, in case of equality, by the right end.
Consider now a group of queries with the same value of l / sqrt(n), and process all
the queries of this group. The answer to the first query is computed the trivial way.
Each subsequent query is computed on the basis of the previous answer: i.e. we move
the left and right boundaries of the previous query to the boundaries of the next
query, maintaining the current answer along the way. Let us estimate the asymptotics:
the left boundary each time moves by no more than sqrt(n), and the right one - by no
more than n in total over all the queries of the current group. In total, if the
current group consisted of k queries, no more than k * sqrt(n) + n recalculations
will be made in processing it. Over the whole algorithm this
gives O((m + n) * sqrt(n)) recalculations for m queries on an array of n elements.

A simple example of this heuristic is the following task: find the number of distinct
numbers on a segment [l; r] of the array (a sketch is given below).
A slightly more sophisticated version of this problem appeared in one of the
Codeforces rounds.
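A minimal sketch of this heuristic for the distinct-numbers problem (our
illustration, not code from the original text; the technique is widely known as Mo's
algorithm; it assumes a non-empty array of small non-negative values, otherwise the
values should be compressed first):

#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>
using namespace std;

vector<int> distinct_on_segments (const vector<int> &a,
                                  const vector< pair<int,int> > &queries) {
    int n = (int) a.size(),  m = (int) queries.size();
    int len = (int) sqrt (n + .0) + 1;
    vector<int> order (m);                   // process queries in sorted order,
    for (int i=0; i<m; ++i) order[i] = i;    // but answer in the original order
    sort (order.begin(), order.end(), [&](int i, int j) {
        int bi = queries[i].first / len,  bj = queries[j].first / len;
        return bi != bj ? bi < bj : queries[i].second < queries[j].second;
    });
    vector<int> cnt (*max_element (a.begin(), a.end()) + 1, 0);
    vector<int> answers (m);
    int curL = 0, curR = -1, distinct = 0;   // the current segment is empty
    auto add = [&](int i) { if (cnt[a[i]]++ == 0) ++distinct; };
    auto del = [&](int i) { if (--cnt[a[i]] == 0) --distinct; };
    for (int qi : order) {
        int l = queries[qi].first,  r = queries[qi].second;
        while (curL > l) add (--curL);       // move the boundaries one step at a time,
        while (curR < r) add (++curR);       // maintaining the current answer
        while (curL < l) del (curL++);
        while (curR > r) del (curR--);
        answers[qi] = distinct;
    }
    return answers;
}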

Fenwick tree
A Fenwick tree is a data structure - a tree on an array - with the following
properties:
1) it allows calculating the value of a reversible operation G on any
interval [L; R] in time O(log N);
2) it allows changing the value of any element in O(log N);
3) it requires O(N) memory, or, more precisely, exactly as much as an array of N
elements;
4) it is easily generalized to the case of multidimensional arrays.
The most common use of a Fenwick tree is for calculating the sum on a segment,
i.e. the function G (X1, ..., Xk) = X1 + ... + Xk.
The Fenwick tree was first described in the article "A new data structure for
cumulative frequency tables" (Peter M. Fenwick, 1994).
Description
For ease of description, we assume that the operation G for which we build the tree
is the sum.
Suppose we are given an array A[0..N-1]. A Fenwick tree is an array T[0..N-1] in
which each element stores the sum of some elements of the array A:
T[i] = the sum of A[j] for all F(i) <= j <= i,
where F(i) is a function that we will define later.
Now we can write pseudocode for the function calculating the sum on the
interval [0; R] and for the function changing a cell:

int sum (int r)
{
    int result = 0;
    while (r >= 0) {
        result += t[r];
        r = f(r) - 1;
    }
    return result;
}

void inc (int i, int delta)
{
    for all j, for which F(j) <= i <= j
        t[j] += delta;
}
The sum function works as follows. Instead of walking over all the elements of the
array A, it moves through the array T, making "jumps" over segments where possible.
First it adds to the answer the value of the sum on the interval [F(R); R], then
takes the sum on the interval [F(F(R)-1); F(R)-1], and so on, until it reaches zero.
The inc function moves in the opposite direction - in the direction of increasing
indices, updating the sum values T[j] only for the positions where this is needed,
i.e. for all j for which F(j) <= i <= j.
Obviously, the speed of both operations depends on the choice of the function F. We
now consider a function that allows achieving logarithmic performance in both cases.

Define the value F(X) as follows. Consider the binary representation of this number
and look at its least significant bit. If it equals zero, then F(X) = X. Otherwise
the binary representation of X ends in a group of one or more ones. Replace all the
ones of this group by zeros, and assign the resulting number as the value of the
function F(X).
This rather complicated description corresponds to a very simple formula:
F(X) = X & (X + 1),
where & is the bitwise logical "AND".
It is easy to convince yourself that this formula corresponds to the verbal
description of the function given above.

It remains for us to learn how to quickly find all the numbers j for
which F(j) <= i <= j. It is not hard to make sure that all such numbers j are
obtained from i by successive replacements of the rightmost (least significant) zero
in the binary representation. For example, for i = 10 we find that j = 11, 15, 31,
63, etc. Curiously, such an operation (replacing the least significant zero with a
one) also corresponds to a very simple formula:
H(X) = X | (X + 1),
where | is the bitwise logical "OR".
Implementation of a Fenwick tree for the sum (the one-dimensional case)
vector<int> t;
int n;

void init (int nn)
{
    n = nn;
    t.assign (n, 0);
}

int sum (int r)
{
    int result = 0;
    for (; r >= 0; r = (r & (r+1)) - 1)
        result += t[r];
    return result;
}

void inc (int i, int delta)
{
    for (; i < n; i = (i | (i+1)))
        t[i] += delta;
}

int sum (int l, int r)
{
    return sum (r) - sum (l-1);
}

void init (vector<int> a)
{
    init ((int) a.size());
    for (unsigned i = 0; i < a.size(); ++i)
        inc (i, a[i]);
}
Implementation of a Fenwick tree for the minimum (the one-dimensional case)
It should be noted immediately that, since a Fenwick tree allows finding the value of
the function only on intervals of the form [0; R], we will not be able to find the
minimum on an interval [L; R] with L > 0. Further, all value changes must only be
decreasing (again, because the function min cannot be inverted). These are
significant limitations.
vector<int> t;
int n;
const int INF = 1000 * 1000 * 1000;

void init (int nn)
{
    n = nn;
    t.assign (n, INF);
}

int getmin (int r)
{
    int result = INF;
    for (; r >= 0; r = (r & (r+1)) - 1)
        result = min (result, t[r]);
    return result;
}

void update (int i, int new_val)
{
    for (; i < n; i = (i | (i+1)))
        t[i] = min (t[i], new_val);
}

void init (vector<int> a)
{
    init ((int) a.size());
    for (unsigned i = 0; i < a.size(); ++i)
        update (i, a[i]);
}
Implementation of a Fenwick tree for the sum (the two-dimensional case)
As already noted, a Fenwick tree is easily generalized to the multidimensional case.
vector< vector<int> > t;
int n, m;

int sum (int x, int y)
{
    int result = 0;
    for (int i = x; i >= 0; i = (i & (i+1)) - 1)
        for (int j = y; j >= 0; j = (j & (j+1)) - 1)
            result += t[i][j];
    return result;
}

void inc (int x, int y, int delta)
{
    for (int i = x; i < n; i = (i | (i+1)))
        for (int j = y; j < m; j = (j | (j+1)))
            t[i][j] += delta;
}
The system of disjoint sets
This article describes the data structure "system of disjoint sets" (in English
"disjoint set union", or simply "DSU").
This data structure provides the following capabilities. Initially there are several
elements, each of which is in a separate (its own) set. In one operation you can
unite any two sets, and you can also ask in which set a specified element currently
is. Also, in the classic version, one more operation is introduced - the creation of
a new element, which is placed into a separate set.
Thus, the basic interface of this data structure consists of only three operations:
• make_set(v) - adds a new element v, placing it into a new set consisting of it
alone.
• union_sets(a, b) - unites the two specified sets (the set in which the element a is
located, and the set in which the element b is located).
• find_set(v) - returns in which set the specified element v is located. In fact, it
returns one of the elements of that set (called the representative or leader). This
representative is selected in each set by the data structure itself (and may change
over time, namely, after calls of union_sets).
For example, if the call of find_set for any two elements has returned the same
value, this means that these elements are in one and the same set, and otherwise - in
different sets.
The data structure described below allows performing each of these operations in
nearly O(1) on average (for more details on the asymptotics, see below, after the
description of the algorithm).
Also, one of the subsections describes an alternative implementation of the DSU,
which achieves the asymptotics of O(log n) on average per query; and when the number
of queries is much greater than the number of elements - even O(1) time on average
per query (see "Storing the DSU as an explicit list of sets").
Building an efficient data structure
Let us first define the form in which we will store all the information.
The sets of elements will be stored in the form of trees: one tree corresponds to one
set. The root of the tree is the representative (leader) of the set.
When implemented, this means that we keep an array parent[] in which, for each
element, we store a reference to its ancestor in the tree. For the roots of the trees
we assume that their ancestor is themselves (i.e. the reference loops at this point).
The naive implementation
We can already write a first implementation of the system of disjoint sets. It will
be quite inefficient, but later we will improve it with the help of two techniques,
eventually obtaining almost constant time of work.
So, all the information about the sets of elements is stored by means of the
array parent[].
To create a new element (the operation make_set(v)), we simply create a tree rooted
at the vertex v, noting that its ancestor is itself.
To unite two sets (the operation union_sets(a, b)), we first find the leader of the
set in which a is located, and the leader of the set in which b is located. If the
leaders coincide, we do nothing - it means the sets have already been united.
Otherwise one can simply indicate that the ancestor of the vertex b is a (or vice
versa) - thereby joining one tree to the other.
Finally, the implementation of the leader search operation (find_set(v)) is simple:
we ascend the ancestors from the vertex v until we reach the root, i.e. while the
reference to the ancestor does not lead to itself. This operation is more convenient
to implement recursively (especially since this will be convenient later, in view of
the optimizations to be added).
void make_set (int v) {
    parent[v] = v;
}

int find_set (int v) {
    if (v == parent[v])
        return v;
    return find_set (parent[v]);
}

void union_sets (int a, int b) {
    a = find_set (a);
    b = find_set (b);
    if (a != b)
        parent[b] = a;
}
However, such an implementation of the system of disjoint sets is very inefficient.
It is easy to construct an example where, after several unions of sets, a set turns
out to be a tree degenerated into a long chain. As a result, each call
of find_set on such a test works in time of the order of the depth of the tree,
i.e. in O(n).
This is very far from the asymptotics we are going to obtain (nearly constant time).
Therefore, we consider two optimizations that (even applied separately) significantly
improve performance.
The path compression heuristic
This heuristic is designed to accelerate the work of find_set.
It lies in the fact that when, after the call of find_set(v), we find the desired
leader p of the set, we remember that for the vertex v and for all the vertices
passed along the way this leader is p. The easiest way to do this is to redirect
their parent[] to this vertex p.
Thus, the meaning of the array of ancestors parent[] changes slightly: now it is a
compressed array of ancestors, i.e. for each vertex it may store not its immediate
ancestor, but the ancestor of an ancestor, the ancestor of the ancestor of an
ancestor, etc.
On the other hand, it is clear that we cannot make these pointers parent[] always
point to the leader: otherwise, during a union_sets operation we would have to update
the leaders of O(n) elements.
Thus, the array parent[] should be treated exactly as an array of ancestors, possibly
partially compressed.
The new implementation of the operation find_set is as follows:
int find_set (int v) {
    if (v == parent[v])
        return v;
    return parent[v] = find_set (parent[v]);
}
This simple implementation does everything intended: first, by recursive calls, the
leader of the set is found, and then, as the stack unwinds, this leader is assigned
to the parent links of all the elements passed.
This operation can also be implemented non-recursively, but then two passes along the
tree have to be performed: the first finds the desired leader, the second assigns it
to all the vertices of the path. In practice, however, a non-recursive implementation
does not provide a significant benefit.
Evaluation of the asymptotics when applying the path compression heuristic
We show that the application of the path compression heuristic alone achieves
logarithmic asymptotics: O(log n) per query on average.
Note that, since the operation union_sets is two calls of the operation find_set
plus O(1) more operations, we can concentrate in the proof only on estimating the
working time of the find_set operations.
Let us call the weight w(v) of a vertex v the number of descendants of this vertex
(including itself). The weights of the vertices, obviously, can only increase during
the work of the algorithm.
Let us call the span of an edge the difference of the weights of its
ends: |w(a) - w(b)| (obviously, an ancestor vertex always has a greater weight than a
descendant vertex). It can be noted that the span of any fixed edge can only increase
during the work of the algorithm.
In addition, we divide the edges into classes: we say that an edge has the class k if
its span belongs to the segment [2^k; 2^(k+1) - 1]. Thus, the class of an edge is a
number from 0 to log n.
Let us now fix an arbitrary vertex x and watch how the edge into its ancestor
changes: at first it does not exist (while the vertex x is a leader), then an edge is
drawn from x into some vertex (when the set with the vertex x joins another set), and
then it may change during path compression in the course of find_set calls. It is
clear that we are interested in the asymptotics of only the last case (path
compression): all the other cases require O(1) time per query.
Consider the work of some call of the operation find_set: it passes in the tree along
some path, erasing all the edges of this path and redirecting them to the leader.
Consider this path and exclude from consideration the last edge of each class (i.e.
no more than one edge of each class 0..log n). Thereby we have excluded O(log n)
edges per query.
Let us now consider all the remaining edges of this path. For each such edge, if it
has the class k, there is one more edge of the class k in the path (otherwise we
would have been obliged to exclude the current edge as the sole representative of the
class k). Thus, after the path compression this edge will be replaced by an edge of
class at least k+1. Taking into account that the weight of an edge cannot decrease,
we obtain that, for each vertex affected by the query find_set, the edge into its
ancestor was either excluded or strictly increased its class.
Hence we finally obtain the asymptotics of the work of m queries: O((n + m) log n),
which (for m >= n) means logarithmic working time per query on average.
The union by rank heuristic
We consider here another heuristic, which by itself can accelerate the working time
of the algorithm, and in combination with the path compression heuristic is even
capable of achieving almost constant running time per query on average.
This heuristic consists in a slight change of the work of union_sets: in the naive
implementation, which tree gets attached to which is determined by chance; now we
will do it on the basis of ranks.
There are two variants of the rank heuristic: in one variant the rank of a tree is
the number of vertices in it, in the other - the depth of the tree (more precisely,
an upper bound on the depth of the tree, since with the joint application of the path
compression heuristic the real depth of the tree may decrease).
In both variants the essence of the heuristic is the same: when
performing union_sets, the tree with the lower rank gets attached to the tree with
the higher rank.
Here is the implementation of the rank heuristic based on the sizes of the trees:
void make_set (int v) {
    parent[v] = v;
    size[v] = 1;
}

void union_sets (int a, int b) {
    a = find_set (a);
    b = find_set (b);
    if (a != b) {
        if (size[a] < size[b])
            swap (a, b);
        parent[b] = a;
        size[a] += size[b];
    }
}
Here is the implementation of the rank heuristic based on the depth of the trees:
void make_set (int v) {
    parent[v] = v;
    rank[v] = 0;
}

void union_sets (int a, int b) {
    a = find_set (a);
    b = find_set (b);
    if (a != b) {
        if (rank[a] < rank[b])
            swap (a, b);
        parent[b] = a;
        if (rank[a] == rank[b])
            ++rank[a];
    }
}
Both variants of the rank heuristic are equivalent from the point of view of the
asymptotics, so in practice one may use either of them.
Evaluation of the asymptotics when applying the rank heuristic
We show that the asymptotics of the system of disjoint sets using only the rank
heuristic, without the path compression heuristic, is logarithmic per query on
average: O(log n).
Here we show that, for either of the two variants of the rank heuristic, the depth of
each tree will be a value O(log n), which automatically means the logarithmic
asymptotics for the find_set query, and hence for the union_sets query.
Consider the rank heuristic by the depth of the tree. We show that if the rank of a
tree equals k, then this tree contains at least 2^k vertices (from here it will
automatically follow that the rank, and hence the depth, of a tree is a
value O(log n)). We prove it by induction: for k = 0 this is obvious. When
compressing paths the depth can only decrease. The rank of a tree increases
from k-1 to k when a tree of rank k-1 is joined to it; applying the induction
hypothesis about the sizes to these two trees, we get that the new tree of rank k
really has at least 2^(k-1) + 2^(k-1) = 2^k vertices, as required.
Let us now consider the rank heuristic by the sizes of the trees. We show that if the
size of a tree equals k, then its height is not more than log k. We prove it by
induction: for k = 1 the statement is true. When compressing paths the depth can only
decrease, so path compression breaks nothing. Suppose now two trees of
sizes k1 and k2 merge; then by the induction hypothesis their heights are less than
or equal to, respectively, log k1 and log k2. Without loss of generality we assume
that the first tree is the smaller one (k1 <= k2), so after the merge the depth of
the resulting tree of k1 + k2 vertices will equal:
h = max (log k1 + 1, log k2).
To complete the proof, we must show that:
h <= log (k1 + k2),
which is an almost obvious inequality, since k1 + k2 >= 2*k1 and k1 + k2 >= k2.
Combining the heuristics: path compression plus union by rank
As mentioned above, the combined use of these heuristics gives especially good
results, in the end achieving almost constant running time.
We do not give the proof of the asymptotics here, since it is very voluminous (see,
e.g., Cormen, Leiserson, Rivest, Stein, "Introduction to Algorithms"). This proof was
first carried out by Tarjan (1975).
The final result is as follows: with the joint application of the heuristics of path
compression and union by rank, the running time per query is O(α(n)) on average,
where α(n) is the inverse Ackermann function, which grows very slowly - so slowly
that for all reasonable restrictions on n it does not exceed 4 (approximately
for n <= 10^600).
It is for this reason that it is appropriate to speak of the asymptotics of the
system of disjoint sets as "almost constant working time".
Here is the final implementation of the system of disjoint sets, implementing both of
these heuristics (the rank heuristic relative to the depths of the trees is used):
void make_set (int v) {
    parent[v] = v;
    rank[v] = 0;
}

int find_set (int v) {
    if (v == parent[v])
        return v;
    return parent[v] = find_set (parent[v]);
}

void union_sets (int a, int b) {
    a = find_set (a);
    b = find_set (b);
    if (a != b) {
        if (rank[a] < rank[b])
            swap (a, b);
        parent[b] = a;
        if (rank[a] == rank[b])
            ++rank[a];
    }
}
Use in various problems and improvements
In this section we consider some applications of the data structure "system of
disjoint sets", both trivial ones and those using some improvements to the data
structure.
Maintaining the connected components of a graph
This is one of the most obvious applications of the data structure "system of
disjoint sets", which, apparently, also stimulated the study of this structure.
Formally, the problem can be stated as follows: initially an empty graph is given;
vertices and undirected edges may gradually be added to this graph, and queries
arrive - "are the vertices a and b in the same connected component or not?".
Directly applying the data structure described above, we get a solution that
processes an addition of a vertex / edge, or a connectivity query for two
vertices, in almost constant time on average.
Given that almost exactly the same problem is posed when using Kruskal's algorithm
for finding the minimum spanning tree, we immediately obtain an improved version of
that algorithm, which works almost in linear time.
Sometimes in practice an inverted version of this problem is encountered: initially
there is a graph with some vertices and edges, and queries arrive to remove edges. If
the task is given offline, i.e. we can learn all the queries in advance, this problem
is solved as follows: let us turn the problem backwards: we assume that we have an
empty graph, to which edges can be added (first add the edge of the last query, then
the penultimate one, etc.). Thus, as a result of inverting the problem, we arrive at
the usual problem, whose solution was described above.
Searching for connected components in an image
One of the most straightforward applications of DSU is the solution of the following
problem: there is an image of n * m pixels. Initially the entire image is white, but
then some black pixels are drawn. It is required to determine the size of each
"white" connected component in the final image.
To solve it, we simply iterate over all the white cells of the image, for each cell
iterate over its four neighbours, and if a neighbour is also white - call union_sets
for these two vertices. Thus, we will have a DSU with n * m vertices corresponding to
the pixels of the image. The resulting trees of the DSU are the sought connected
components. (A sketch is given below.)
This problem can also be solved simply by depth-first search (or breadth-first
search), but the method described here has a definite advantage: it can process the
matrix line by line (operating only with the current line, the previous line, and a
system of disjoint sets built for the elements of one line), i.e. using O(min(n, m))
memory.
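A minimal sketch of this solution (our illustration of the simple whole-matrix
version, not the line-by-line memory optimization mentioned above; the names DSU
and whiteComponentSizes are ours):

#include <algorithm>
#include <vector>
using namespace std;

struct DSU {
    vector<int> p, sz;
    DSU (int n) : p(n), sz(n, 1) { for (int i=0; i<n; ++i) p[i] = i; }
    int find (int v) { return p[v] == v ? v : p[v] = find (p[v]); }
    void unite (int a, int b) {
        a = find (a);  b = find (b);
        if (a == b) return;
        if (sz[a] < sz[b]) swap (a, b);
        p[b] = a;  sz[a] += sz[b];
    }
};

// white[i][j] == true for white pixels; returns the size of every white component
vector<int> whiteComponentSizes (const vector< vector<bool> > &white) {
    int n = (int) white.size(),  m = (int) white[0].size();
    DSU dsu (n * m);                       // one DSU vertex per pixel
    for (int i=0; i<n; ++i)
        for (int j=0; j<m; ++j) {
            if (!white[i][j]) continue;
            if (i+1 < n && white[i+1][j]) dsu.unite (i*m + j, (i+1)*m + j);
            if (j+1 < m && white[i][j+1]) dsu.unite (i*m + j, i*m + j + 1);
        }
    vector<int> sizes;                     // one entry per DSU tree (component)
    for (int i=0; i<n; ++i)
        for (int j=0; j<m; ++j)
            if (white[i][j] && dsu.find (i*m + j) == i*m + j)
                sizes.push_back (dsu.sz[i*m + j]);
    return sizes;
}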
Support for additional information for each set
The "system of disjoint sets" makes it easy to store any additional information
pertaining to the sets.
A simple example is the sizes of the sets: how to store them was described in the
discussion of the rank heuristic (there the information was recorded for the current
leader of the set).
Thus, together with the leader of each set, one can store any additional information
required in a specific task.
Applying DSU for compressing "jumps" along a segment. The problem of
painting subsegments offline
One of the common applications of DSU is the following: if there is a set of cells,
and from each cell exits one edge, then we can quickly (in almost constant time) find
the end point at which we arrive by moving along the edges from a given starting
point.
A good example of this application is the problem of painting subsegments: there is
a segment of length L, each cell of which (i.e. each piece of length 1) has the zero
color. Queries of the form (l, r, c) arrive - repaint the segment [l; r] in the
color c. It is required to find the final color of each cell. The queries are assumed
to be known in advance, i.e. the task is offline.
For the solution we can make a DSU structure that, for each cell, stores a reference
to the nearest unpainted cell to its right. Thus, initially each cell points to
itself, and after painting the first subsegment, the cell before the subsegment will
point to the cell after the end of the subsegment.
Now, to solve the problem, we consider the repainting queries in reverse order, from
the last to the first. To execute a query, each time we simply find, using our DSU,
the leftmost unpainted cell inside the segment, repaint it, and move the pointer from
it to the next empty cell to the right.
Thus, we are actually using a DSU with the path compression heuristic, but without
the rank heuristic (because it is important here who becomes the leader after the
merge). Consequently, the final asymptotics amounts to O(log n) on average per
query (which is, however, a small constant compared to other data structures).
Implementation:
void init() {
    for (int i=0; i<L; ++i)
        make_set (i);
}

void process_query (int l, int r, int c) {
    for (int v=l; ; ) {
        v = find_set (v);
        if (v >= r) break;
        answer[v] = c;
        parent[v] = v+1;
    }
}
However, this solution can also be implemented with the rank heuristic: we will
store for each set, in some array end[], where this set ends (i.e. its rightmost
point). Then it will be possible to unite two sets into one according to their rank
heuristic, afterwards assigning to the resulting set the new right border. Thus we
obtain a solution working in almost constant time on average per query.
Support for distances to the leader
Sometimes, in specific applications of the system of disjoint sets, the requirement
pops up to maintain the distance to the leader (i.e. the path length in edges in the
tree from the current node to the root of the tree).
If there were no path compression heuristic, no difficulties would arise - the
distance to the root would simply equal the number of recursive calls made by the
function find_set.
However, as a result of path compression, several edges of a path may be compressed
into a single edge. Thus, with each vertex we will have to store additional
information: the length of the current edge from the vertex to its ancestor.
When implementing, it is convenient to make the array parent[] store pairs, and the
function find_set return not one number but a pair of numbers: the leader vertex and
the distance to it:
void make_set (int v) {
    parent[v] = make_pair (v, 0);
    rank[v] = 0;
}

pair<int,int> find_set (int v) {
    if (v != parent[v].first) {
        int len = parent[v].second;
        parent[v] = find_set (parent[v].first);
        parent[v].second += len;
    }
    return parent[v];
}

void union_sets (int a, int b) {
    a = find_set (a).first;
    b = find_set (b).first;
    if (a != b) {
        if (rank[a] < rank[b])
            swap (a, b);
        parent[b] = make_pair (a, 1);
        if (rank[a] == rank[b])
            ++rank[a];
    }
}
Support for the parity of the path length, and checking a graph for
bipartiteness online
By analogy with the path length to the leader, it is likewise possible to maintain
the parity of the path length to it. Why was this application singled out into a
separate paragraph?
The fact is that the requirement of storing the parity of paths usually emerges in
connection with the following problem: given an initially empty graph, edges may be
added to it, and queries of the form "is the connected component containing a given
vertex bipartite?" have to be answered.
To solve this problem, we can make a system of disjoint sets for storing the
connected components, and store with each vertex the parity of the path length to its
leader. Thus, we can quickly check whether adding a specified edge leads to a
violation of bipartiteness or not: namely, if the ends of the edge lie in the same
connected component and have the same parity of the path length to the leader, then
adding this edge creates a cycle of odd length and turns the current component into a
non-bipartite one.
The main difficulty we face here is that we must carefully compute the parities when
uniting two trees in the function union_sets.
If we add an edge (a, b) connecting two components into one, then when attaching one
tree to another we must assign it such a parity that, as a result, the
vertices a and b would obtain different parities of the path length.
Let us derive a formula for this parity, which we assign to the leader of one set
when attaching it to the leader of the other set. Let x denote the parity of the path
length from the vertex a to the leader of its set, y the parity of the path length
from the vertex b to the leader of its set, and t the required parity that we have to
assign to the attached leader. If the set containing the vertex b joins the set
containing the vertex a, becoming its subtree, then after the attachment the parity
of the vertex a does not change and remains x, while the parity of the vertex b
becomes y XOR t (the symbol XOR here denotes the exclusive or, i.e. the symmetric
difference). We require that these two parities differ, i.e. that their XOR equal
one. I.e. we obtain an equation for t:
x XOR (y XOR t) = 1,
solving which, we find:
t = x XOR y XOR 1.
Thus, regardless of which set is attached to which, this formula should be used to
set the parity of the edge drawn from one leader to the other.
We present an implementation of the DSU with parity support. As in the previous
paragraph, for convenience we use pairs to store the ancestors and the result of the
operation find_set. In addition, for each set we store, in the array bipartite[],
whether it is still bipartite or not.
void make_set (int v) {
    parent[v] = make_pair (v, 0);
    rank[v] = 0;
    bipartite[v] = true;
}

pair<int,int> find_set (int v) {
    if (v != parent[v].first) {
        int parity = parent[v].second;
        parent[v] = find_set (parent[v].first);
        parent[v].second ^= parity;
    }
    return parent[v];
}

void add_edge (int a, int b) {
    pair<int,int> pa = find_set (a);
    a = pa.first;
    int x = pa.second;

    pair<int,int> pb = find_set (b);
    b = pb.first;
    int y = pb.second;

    if (a == b) {
        if (x == y)
            bipartite[a] = false;
    }
    else {
        if (rank[a] < rank[b])
            swap (a, b);
        parent[b] = make_pair (a, x ^ y ^ 1);
        bipartite[a] &= bipartite[b];
        if (rank[a] == rank[b])
            ++rank[a];
    }
}

bool is_bipartite (int v) {
    return bipartite[ find_set(v).first ];
}
An algorithm for finding the RMQ (the minimum on a segment) in almost
constant time on average, offline
Formally, the problem is stated as follows: we need to implement a data structure
that supports two types of queries: adding a specified number x (insert(x)), and
finding and extracting the current minimum number (extract_min()). We assume that
each number is added exactly once.
In addition, we assume that the entire sequence of queries is known to us in advance,
i.e. the task is offline.
The idea of the solution is as follows. Instead of answering each query in turn, we
iterate over the numbers in increasing order and determine which query each number
should be the answer to. For this we need to find the first unanswered extraction
query coming after the addition of this number - it is easy to see that this is
precisely the query whose answer is this number.
Thus, the idea here turns out to be similar to the problem of painting segments.
One can obtain a solution in O(log n) on average per query if we abandon the rank
heuristic and simply store, in each element, a link to the nearest extract_min query
to its right, and use path compression to maintain these links after the merges. (A
sketch is given below.)
One can also obtain a solution in O(α(n)) if we use the rank heuristic and
additionally store, in each set, the position at which it ends (what in the previous
variant of the solution was achieved automatically, due to the fact that the links
always went only to the right - now it will have to be stored explicitly).
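A minimal sketch of the first variant (our illustration, not code from the original
text; the query encoding and the names find_free and offline_extract_mins are ours):

#include <algorithm>
#include <utility>
#include <vector>
using namespace std;

// DSU "jump" structure with path compression: parent[i] leads towards the
// nearest still-unanswered extraction query at position >= i
int find_free (vector<int> &parent, int i) {
    return parent[i] == i ? i : parent[i] = find_free (parent, parent[i]);
}

// qs[i] = (0, x): insert the number x;  qs[i] = (1, 0): extract the minimum.
// Returns the extracted number for every extraction query, in query order.
vector<int> offline_extract_mins (const vector< pair<int,int> > &qs) {
    int m = (int) qs.size();
    vector<int> nxt (m + 1);          // nearest extraction at or after position i
    nxt[m] = m;
    for (int i=m-1; i>=0; --i)
        nxt[i] = (qs[i].first == 1) ? i : nxt[i+1];
    vector<int> parent (m + 1);
    for (int i=0; i<=m; ++i) parent[i] = i;
    vector< pair<int,int> > nums;     // (value, insertion position)
    vector<int> ext_index (m, -1);    // ordinal number of each extraction query
    int extractions = 0;
    for (int i=0; i<m; ++i)
        if (qs[i].first == 0) nums.push_back (make_pair (qs[i].second, i));
        else ext_index[i] = extractions++;
    sort (nums.begin(), nums.end());  // handle the numbers in increasing order
    vector<int> answers (extractions, -1);
    for (size_t k=0; k<nums.size(); ++k) {
        int pos = nums[k].second;
        int j = find_free (parent, nxt[pos]);  // first unanswered '-' after the insert
        if (j >= m) continue;                  // this number is never extracted
        answers[ ext_index[j] ] = nums[k].first;
        parent[j] = nxt[j+1];                  // mark this extraction as answered
    }
    return answers;
}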
An algorithm for finding the LCA (the lowest common ancestor in a tree) in
almost constant time on average, offline
Tarjan's algorithm for finding the LCA in almost constant time on average, offline,
is described in the corresponding article. This algorithm compares favourably with
other LCA search algorithms in its simplicity (especially compared to the optimal
algorithm of Farach-Colton and Bender).
Storing the DSU as an explicit list of sets. Application of this
idea when merging various data structures
One of the alternative ways of storing the DSU is to keep each set as an explicitly
stored list of its elements. At the same time, each element also keeps a reference
to the representative (leader) of its set.
At first glance this seems an inefficient data structure: when uniting two sets, we
will have to add one list to the end of the other, as well as update the leader in
all the elements of one of the two lists.
However, as it turns out, the use of a weight heuristic, similar to the one described
above, can significantly reduce the asymptotics of the work: to O(m + n log n) for
performing m queries over n elements.
The weight heuristic means that we always add the smaller of the two sets into the
larger one. Adding one set to another is easy to implement in time of the order of
the size of the added set, and the search for a leader - in time O(1) with this
method of storage.
Let us prove the asymptotics O(m + n log n) for performing m queries. Fix an
arbitrary element x and examine how it is affected by the merge
operations union_sets. When the element x is affected for the first time, we can say
that the size of its new set will be at least 2. When it is affected for the second
time, it gets into a set of size at least 4 (since we add the smaller set into the
larger one). And so on - we see that an element x can be affected by at
most log n merge operations. Thus, the sum over all the vertices is O(n log n),
plus O(1) for every query - as required.
Here is an example implementation:
vector<int> lst[MAXN];
int parent[MAXN];

void make_set (int v) {
    lst[v] = vector<int> (1, v);
    parent[v] = v;
}

int find_set (int v) {
    return parent[v];
}

void union_sets (int a, int b) {
    a = find_set (a);
    b = find_set (b);
    if (a != b) {
        if (lst[a].size() < lst[b].size())
            swap (a, b);
        while (!lst[b].empty()) {
            int v = lst[b].back();
            lst[b].pop_back();
            parent[v] = a;
            lst[a].push_back (v);
        }
    }
}
Also, the idea of adding the elements of the smaller set into the larger one can be
used outside of the DSU as well, when solving other problems.
For example, consider the following problem: given a tree, each leaf of which has a
number assigned (the same number may occur several times at different leaves). It is
required, for every node of the tree, to find out the number of distinct numbers in
its subtree.
Applying the same idea to this problem, we can obtain the following solution: run a
depth-first traversal over the tree, which will return a pointer to a set of
numbers - the list of all the numbers in the subtree of that node. Then, to get the
answer for the current node (unless, of course, it is a leaf), we call the
depth-first traversal for all the children of that node, and merge all the received
sets into one, whose size will be the answer for the current node. To efficiently
merge several sets into one, we simply apply the technique described above: we merge
two sets by adding the elements of the smaller set into the larger one. As a result,
we obtain a solution in O(n log^2 n), since the addition of one element to a set is
done in O(log n). (A sketch is given below.)
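A minimal sketch (our illustration, not code from the original text; here a number is
written at every vertex, the leaves-only statement being a special case; the
arrays children, val and answer are ours and must be sized by the caller):

#include <algorithm>
#include <set>
#include <vector>
using namespace std;

vector< vector<int> > children;  // children[v] = the children of vertex v
vector<int> val;                 // val[v] = the number written at vertex v
vector<int> answer;              // answer[v] = distinct numbers in the subtree of v

set<int>* dfs (int v) {
    set<int> *res = new set<int>;
    res->insert (val[v]);
    for (size_t i=0; i<children[v].size(); ++i) {
        set<int> *child = dfs (children[v][i]);
        if (child->size() > res->size())
            swap (res, child);            // always pour the smaller set...
        for (set<int>::iterator it = child->begin(); it != child->end(); ++it)
            res->insert (*it);            // ...into the larger one
        delete child;
    }
    answer[v] = (int) res->size();
    return res;                           // the caller is responsible for deleting
}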

Storing the DSU with preservation of the explicit tree structure.
Re-rooting. An algorithm for finding bridges in a graph in O(log n) on
average, online
One of the most powerful applications of the data structure "system of disjoint sets"
is that it allows storing both the compressed and the uncompressed tree structure at
the same time. The compressed structure can be used for fast merging of trees and for
checking whether two vertices belong to one tree, and the uncompressed one - for
example, for searching for the path between two given vertices, or for other
traversals of the tree structure.
When implementing this, it means that, in addition to the usual array of compressed
ancestors parent[], we keep an array of normal, uncompressed
ancestors real_parent[]. It is clear that maintaining such an array does not worsen
the asymptotics: changes in it occur only when two trees merge, and only in one
element.
On the other hand, in practical applications it is often required to join two trees
by an edge not necessarily exiting from their roots. This means that we have no other
choice but to re-root one of the trees at the specified vertex, so that we are then
able to attach this tree to the other one, making the root of this tree a child of
the second end of the edge being added.
At first glance it appears that this re-rooting operation is very expensive and will
greatly worsen the asymptotics. Indeed, for re-rooting a tree at the vertex v we must
walk from this vertex to the root of the tree, updating all the
pointers parent[] and real_parent[].
However, in reality things are not so bad: it is enough to re-root the smaller of the
two trees to obtain an asymptotics of one union equal to O(log n) on average.
For more details (including a proof of the asymptotics), see the algorithm for
finding bridges in a graph in O(log n) on average, online.
Historical retrospective
The data structure "system of disjoint sets" has been known for a relatively long
time.
The way of storing this structure in the form of a forest of trees was, apparently,
first described by Galler and Fischer in 1964 (Galler, Fischer, "An Improved
Equivalence Algorithm"), but the full analysis of the asymptotics was carried out
much later.