Advanced Databases Presentation: Point Access Methods

Point Access Method

Group 1:
Lâm Tuấn Anh
Nguyễn Đình Tân Anh
Lê Minh Châu


Point Access Method

1. Spatial Data
2. Main Memory Structure
3. Point Access Methods


Spatial Data


Characteristics of Spatial Data

Complex structure
Dynamic
Spatial databases tend to be large
There is no standard algebra defined on spatial data
Many spatial operators are not closed
Spatial database operators are more expensive than standard relational operators
There is no total order among spatial objects


Queries in Spatial Data

Exact Match Query (EMQ)
Condition: Given an object o’ with spatial extent o’.G in d-dimensional Euclidean space
Target: Find all objects o with the same spatial extent as o’
Query: EMQ(o’) = { o | o’.G = o.G }

Point Query (PQ)
Condition: Given a point p in d-dimensional Euclidean space
Target: Find all objects o overlapping p
Query: PQ(p) = { o | p ∩ o.G = p }


Queries in Spatial Data


Enclosure Query (EQ)
Condition: Given an object o’ with spatial extent o’.G in d-dimensional Euclidean space
Target: Find all objects o enclosing o’
Query: EQ(o’) = { o | (o’.G ∩ o.G) = o’.G }


Queries in Spatial Data


Spatial Join
Condition: Given two collections R and S of spatial objects and a spatial predicate θ
Target: Find all pairs of objects (o, o’) ∈ R × S for which θ(o.G, o’.G) evaluates to true
Query: R ⋈θ S = { (o, o’) | o ∈ R ∧ o’ ∈ S ∧ θ(o.G, o’.G) }
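
The four query types on these slides can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: every spatial extent o.G is approximated by an axis-aligned box given as a (lower corner, upper corner) pair. The names Box, exact_match_query, point_query, enclosure_query and spatial_join are hypothetical, and the queries scan all objects linearly instead of using an index.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner): assumed stand-in for o.G


def contains_point(box: Box, p: Point) -> bool:
    """True if point p lies inside the closed box."""
    lo, hi = box
    return all(l <= x <= h for l, x, h in zip(lo, p, hi))


def encloses(outer: Box, inner: Box) -> bool:
    """True if outer fully contains inner (boundaries may touch)."""
    (olo, ohi), (ilo, ihi) = outer, inner
    return (all(a <= b for a, b in zip(olo, ilo))
            and all(b <= a for a, b in zip(ohi, ihi)))


def intersects(a: Box, b: Box) -> bool:
    """True if the two closed boxes share at least one point."""
    (alo, ahi), (blo, bhi) = a, b
    return all(l1 <= h2 and l2 <= h1
               for l1, h1, l2, h2 in zip(alo, ahi, blo, bhi))


def exact_match_query(objs: Sequence[Box], q: Box) -> List[Box]:
    """EMQ: objects whose extent equals the extent of q."""
    return [o for o in objs if o == q]


def point_query(objs: Sequence[Box], p: Point) -> List[Box]:
    """PQ: objects overlapping the point p."""
    return [o for o in objs if contains_point(o, p)]


def enclosure_query(objs: Sequence[Box], q: Box) -> List[Box]:
    """EQ: objects enclosing q."""
    return [o for o in objs if encloses(o, q)]


def spatial_join(r: Sequence[Box], s: Sequence[Box],
                 theta: Callable[[Box, Box], bool]) -> List[Tuple[Box, Box]]:
    """Naive nested-loop join: all pairs (o, o') for which theta holds."""
    return [(a, b) for a in r for b in s if theta(a, b)]
```

Passing intersects as theta gives an intersection join. The nested loop inspects every pair of objects, which is the cost an access method is meant to reduce.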


Spatial Data


Requirements for Multidimensional Access Methods

Dynamics
Secondary/tertiary storage management
Broad range of supported operations
Independence of the input data and insertion sequence
Simplicity
Scalability
Time efficiency
Space efficiency
Concurrency and recovery


Main Memory Structure

Figure 9. Running example. Legend: pi = i-th point, ri = i-th polygon, ci = i-th centroid, mi = i-th minimum bounding box.


Main Memory Structure

Figure 10. k-d tree construction. Legend: pi = i-th point, ri = i-th polygon, ci = i-th centroid, mi = i-th minimum bounding box.


Main Memory Structure

Figure 11. k-d tree. Legend: pi = i-th point, ri = i-th polygon, ci = i-th centroid, mi = i-th minimum bounding box.



Main Memory Structure





Designed for main-memory applications, where all the data are available without accessing the disk.
They do not take secondary storage management into account explicitly.
In many spatial database applications, however, the amount of data to be managed is notoriously large.
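
As an illustration of the main-memory k-d tree from the figures, here is a minimal Python sketch. The slides do not say how the tree is built, so the sketch assumes a static construction that splits at the median and cycles through the dimensions level by level. KDNode, build_kdtree and contains are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point                      # point stored at this node
    axis: int                         # dimension used to split at this node
    left: Optional["KDNode"] = None   # points with a smaller coordinate on axis
    right: Optional["KDNode"] = None  # points with an equal or larger coordinate


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a k-d tree by median split, cycling through the dimensions."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    # Move left past ties so that everything in the left subtree is
    # strictly smaller than the pivot on this axis.
    while mid > 0 and pts[mid - 1][axis] == pts[mid][axis]:
        mid -= 1
    return KDNode(pts[mid], axis,
                  build_kdtree(pts[:mid], depth + 1),
                  build_kdtree(pts[mid + 1:], depth + 1))


def contains(node: Optional[KDNode], target: Point) -> bool:
    """Exact-match search: follow one root-to-leaf path."""
    while node is not None:
        if node.point == target:
            return True
        if target[node.axis] < node.point[node.axis]:
            node = node.left
        else:
            node = node.right
    return False
```

The whole structure lives in Python objects linked by references, which is exactly the situation described above: nothing in it decides which nodes share a disk page.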


Point Access Methods

• Multidimensional Hashing
• Hierarchical Access Method


Multidimensional Hashing





There is no total order on objects in two- or higher-dimensional space that completely preserves spatial proximity.
Idea: construct hashing functions that preserve proximity at least to some extent.
Goal: objects located close to each other in the original space should be likely to be stored close together on disk
=> minimizing the number of disk accesses per range query
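
The slides do not name a specific function. As an illustration only, one well-known mapping that preserves proximity to some extent is bit interleaving (Z-order, also called Morton order). The sketch below assumes two-dimensional, non-negative integer grid coordinates; z_order_key is a hypothetical name.

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one integer key.

    Bit i of x goes to bit 2*i of the key, bit i of y to bit 2*i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key
```

The cells (0,0), (1,0), (0,1) and (1,1) receive the keys 0, 1, 2 and 3, so a 2 x 2 block is stored contiguously. The neighbouring cells (1,0) and (2,0) receive the keys 1 and 4, which shows that proximity is preserved only partially.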



The Grid File


A d-dimensional orthogonal grid is superimposed on the universe.
The grid is not necessarily regular; the resulting cells may be of different shapes and sizes.
Each cell is associated with one bucket, but a bucket may contain several adjacent cells.
Since the directory may grow large, it is usually kept on secondary storage.
To guarantee that data items are always found with no more than two disk accesses for exact match queries, the grid itself is kept in main memory, represented by d one-dimensional arrays called scales.
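
The two-access lookup can be sketched as follows. This is a simplified illustration, not a full grid file. It assumes that each scale is a sorted list of split positions, that the directory is a dictionary from cell index to bucket id, and that buckets are plain lists. In a real grid file the directory and the buckets would be disk pages. All names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


def locate_cell(scales: Sequence[Sequence[float]], p: Point) -> Cell:
    """Use the in-memory scales to find the grid cell containing p."""
    return tuple(bisect_right(scale, x) for scale, x in zip(scales, p))


def exact_match(scales: Sequence[Sequence[float]],
                directory: Dict[Cell, str],
                buckets: Dict[str, List[Point]],
                p: Point) -> bool:
    cell = locate_cell(scales, p)      # main memory only, no disk access
    bucket_id = directory[cell]        # stands in for disk access 1 (directory page)
    return p in buckets[bucket_id]     # stands in for disk access 2 (data page)


# Example: x is split at 5, y is split at 3 and 7, giving a 2 x 3 grid.
scales = [[5.0], [3.0, 7.0]]
directory = {
    (0, 0): "A", (0, 1): "A",          # two adjacent cells share bucket A
    (0, 2): "B",
    (1, 0): "C", (1, 1): "C", (1, 2): "D",
}
buckets = {"A": [(2.0, 4.0)], "B": [], "C": [(6.0, 1.0)], "D": [(9.0, 9.0)]}
```

Because the scales are irregular lists of split positions, the cells need not have the same size, and several directory entries may refer to the same bucket.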


EXCELL



Decomposes the universe regularly: all grid cells are of equal size.
Each new split halves all cells and therefore doubles the directory size.
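
The doubling effect can be made concrete with a small sketch. It assumes a two-dimensional directory stored as a list of rows of bucket ids and assumes that, right after a split, both halves of a cell still refer to the old bucket. double_directory is a hypothetical name, and the sketch leaves out the redistribution of the overflowing bucket.

```python
from typing import List


def double_directory(directory: List[List[str]], axis: int) -> List[List[str]]:
    """Halve every cell along one axis, which doubles the number of entries.

    axis 0 halves the cells along x (each row gets twice as many columns);
    axis 1 halves the cells along y (the number of rows doubles).
    """
    if axis == 0:
        return [[b for b in row for _ in range(2)] for row in directory]
    return [list(row) for row in directory for _ in range(2)]
```

Starting from a directory with two entries, one split along x gives four entries and a further split along y gives eight, even if only a single bucket overflowed.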


The Two-Level Grid File









Uses a second grid file to manage the grid directory.
First level: the root directory. Second level: the actual grid directory.
The root directory contains pointers to the directory pages of the lower level, which in turn contain pointers to the data pages.
Splits are often confined to the subdirectory regions without affecting the surroundings too much
=> slower directory growth
Does not solve the problem of superlinear directory size.


The Twin Grid File





Increases space utilization by introducing a second grid file.
The relationship between the two grid files is not hierarchical but somewhat more balanced.
Both grid files span the whole universe.
The distribution of the data between the two files is performed dynamically.


Hierarchical Access Method








Based on binary or multi-way tree structures
Like hashing methods, they store data in buckets
Each bucket corresponds to a leaf node of the tree and to a disk page
Interior nodes of the tree guide the search
Search: top-down tree traversal
Main difference between the methods: the characteristics of the regions


Hierarchical Access Method


k-d-B-tree











Combination of the adaptive k-d-tree and the B-tree
Partitions the universe like an adaptive k-d-tree
Associates the resulting subspaces with tree nodes
Each interior node corresponds to an interval-shaped region
Regions of nodes at the same level are mutually disjoint
Perfectly balanced (like the B-tree)
Search: straightforward, as in the k-d-tree
Insertion: search for the right bucket; if it overflows, split it and move half of its data to the new bucket
Deletion: search for the entry and remove it; if necessary, merge the node with its siblings
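
Because the regions on one level are mutually disjoint, an exact-match search follows a single path from the root to one bucket. The sketch below illustrates only this search step. It assumes half-open box regions [lo, hi) and uses hypothetical class names PointPage and RegionPage; splitting and merging are not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # half-open region [lo, hi), an assumption of this sketch


@dataclass
class PointPage:
    points: List[Point]    # leaf: a bucket of data points (one disk page)


@dataclass
class RegionPage:
    # interior node: disjoint regions, each with a pointer to a child page
    entries: List[Tuple[Box, Union["RegionPage", PointPage]]]


def in_region(region: Box, p: Point) -> bool:
    lo, hi = region
    return all(l <= x < h for l, x, h in zip(lo, p, hi))


def search(page: Union[RegionPage, PointPage, None], p: Point) -> bool:
    """Descend from the root; at most one child region can contain p."""
    while isinstance(page, RegionPage):
        page = next((child for region, child in page.entries
                     if in_region(region, p)), None)
    return page is not None and p in page.points
```

A point outside the universe matches no region, so the search stops and reports that the point is absent.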


Hierarchical Access Method


LSD tree






Directory is organized like an adaptive k-d-tree
Better adaptation to the data distribution (compared with fixed binary partitioning)
External balancing property: the heights of the external subtrees differ by at most one
Combines two split strategies to accommodate skewed data:
Data-dependent: based on the stored data; tries to achieve the most balanced structure (an equal number of data items on both sides of the split)
Distribution-dependent: splits at a fixed dimension and position (a known data distribution is assumed)
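
The difference between the two split strategies can be shown on a skewed bucket. This is a hedged sketch, not the LSD tree's actual algorithm. It assumes the median as the data-dependent split position and the midpoint of the region as the distribution-dependent one, which corresponds to assuming a uniform distribution. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def data_dependent_split(points: Sequence[Point], axis: int) -> float:
    """Split position taken from the stored points (here: their median)."""
    values = sorted(p[axis] for p in points)
    return values[len(values) // 2]


def distribution_dependent_split(lo: float, hi: float) -> float:
    """Fixed split position that ignores the stored points.

    The midpoint of the region [lo, hi] assumes uniformly distributed data.
    """
    return (lo + hi) / 2.0


def partition(points: Sequence[Point], axis: int,
              pos: float) -> Tuple[List[Point], List[Point]]:
    """Distribute the points to the two sides of the split position."""
    left = [p for p in points if p[axis] < pos]
    right = [p for p in points if p[axis] >= pos]
    return left, right
```

For the points (1,0), (2,0), (3,0) and (50,0) in a region from 0 to 100 on the x axis, the median split at 3 puts two points on each side, while the midpoint split at 50 leaves three points on the left and one on the right.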

