Advanced Databases Presentation: Point Access Method


Point Access Method

Group 1:
Lâm Tuấn Anh
Nguyễn Đình Tân Anh
Lê Minh Châu

Point Access Method

1. Spatial Data
2. Main Memory Structure
3. Point Access Methods

Spatial Data

Characteristics of Spatial Data

• Complex structure
• Dynamic
• Spatial databases tend to be large
• There is no standard algebra defined on spatial data
• Many spatial operators are not closed
• Spatial database operators are more expensive than standard relational operators
• There is no total order among spatial objects

Queries in Spatial Data

• Exact Match Query (EMQ)
  • Condition: given an object o’ with spatial extent o’.G in d-dimensional Euclidean space
  • Target: find all objects o with the same spatial extent as o’
  • Query: EMQ(o’) = { o | o’.G = o.G }

• Point Query (PQ)
  • Condition: given a point p in d-dimensional Euclidean space
  • Target: find all objects o overlapping p
  • Query: PQ(p) = { o | p ∩ o.G = p }

• Enclosure Query (EQ)
  • Condition: given an object o’ with spatial extent o’.G in d-dimensional Euclidean space
  • Target: find all objects o enclosing o’
  • Query: EQ(o’) = { o | (o’.G ∩ o.G) = o’.G }

• Spatial Join
  • Condition: given two collections R and S of spatial objects and a spatial predicate θ
  • Target: find all pairs of objects (o, o’) in R × S for which θ(o.G, o’.G) evaluates to true
  • Query: R ⋈θ S = { (o, o’) | o ∈ R ∧ o’ ∈ S ∧ θ(o.G, o’.G) }
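The query types above can be sketched for the simplest case, where every spatial extent o.G is an axis-aligned box. The `Box` type and all function names here are illustrative assumptions, not taken from the slides, and the linear scans only show what each query returns; an access method exists precisely to avoid such scans.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in d dimensions (assumed stand-in for o.G)."""
    lo: Point
    hi: Point

    def contains_point(self, p: Point) -> bool:
        return all(l <= x <= h for l, x, h in zip(self.lo, p, self.hi))

    def encloses(self, other: "Box") -> bool:
        return (all(a <= b for a, b in zip(self.lo, other.lo))
                and all(a >= b for a, b in zip(self.hi, other.hi)))

    def intersects(self, other: "Box") -> bool:
        return (all(a <= b for a, b in zip(self.lo, other.hi))
                and all(a <= b for a, b in zip(other.lo, self.hi)))


def exact_match_query(objs: Sequence[Box], q: Box) -> List[Box]:
    # EMQ: objects whose extent equals the extent of q.
    return [o for o in objs if o == q]


def point_query(objs: Sequence[Box], p: Point) -> List[Box]:
    # PQ: objects whose extent overlaps the point p.
    return [o for o in objs if o.contains_point(p)]


def enclosure_query(objs: Sequence[Box], q: Box) -> List[Box]:
    # EQ: objects whose extent encloses the extent of q.
    return [o for o in objs if o.encloses(q)]


def spatial_join(r: Sequence[Box], s: Sequence[Box],
                 theta: Callable[[Box, Box], bool]) -> List[Tuple[Box, Box]]:
    # Naive nested-loop join over R x S with spatial predicate theta.
    return [(a, b) for a in r for b in s if theta(a, b)]
```

Passing `Box.intersects` as `theta` gives an intersection join; any other two-argument predicate on extents can be substituted.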

Spatial Data

Requirements for Multidimensional Access Methods

• Dynamics
• Secondary/tertiary storage management
• Broad range of supported operations
• Independence of the input data and insertion sequence
• Simplicity
• Scalability
• Time efficiency
• Space efficiency
• Concurrency and recovery

Main Memory Structure

Notation used in the figures: pi = i-th point, ri = i-th polygon, ci = i-th centroid, mi = i-th minimum bounding box.

[Figure 9. Running example.]
[Figure 10. k-d construction]
[Figure 11. k-d tree]

• Designed for main memory applications, where all the data are available without accessing the disk.
• Do not take secondary storage management into account explicitly.
• In many spatial database applications the amount of data to be managed is notoriously large, so structures that assume everything fits in main memory are often not sufficient on their own.
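As an illustration of the k-d tree shown in the figures, here is a minimal in-memory sketch. It builds the tree in bulk by splitting at the median and cycling through the axes, which is an assumption made for brevity; the figures may show a construction by successive insertion instead. The names `KDNode`, `build_kdtree` and `range_search` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point                      # splitting point stored in this node
    axis: int                         # axis on which this node splits
    left: Optional["KDNode"] = None   # points with coordinate <= split value
    right: Optional["KDNode"] = None  # points with coordinate >= split value


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    if not points:
        return None
    axis = depth % len(points[0])     # cycle through the d axes
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2               # median split (assumed strategy)
    return KDNode(pts[mid], axis,
                  build_kdtree(pts[:mid], depth + 1),
                  build_kdtree(pts[mid + 1:], depth + 1))


def range_search(node: Optional[KDNode], lo: Point, hi: Point,
                 out: Optional[List[Point]] = None) -> List[Point]:
    """Collect all points inside the closed box [lo, hi]."""
    if out is None:
        out = []
    if node is None:
        return out
    if all(l <= x <= h for l, x, h in zip(lo, node.point, hi)):
        out.append(node.point)
    a = node.axis
    if lo[a] <= node.point[a]:        # query box reaches into the left half
        range_search(node.left, lo, hi, out)
    if hi[a] >= node.point[a]:        # query box reaches into the right half
        range_search(node.right, lo, hi, out)
    return out
```

Every node lives in memory and is reached by pointer, so nothing here groups points into disk pages; that is the gap the point access methods in the next part address.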

Point Access Methods

• Multidimensional Hashing
• Hierarchical Access Method

Multidimensional Hashing
• There is no total order for objects in two- and higher-dimensional space that completely preserves spatial proximity.
• Multidimensional hashing tries to construct hashing functions that preserve proximity at least to some extent.
• Goal: objects located close to each other in the original space should be likely to be stored close together on the disk.
• => This minimizes the number of disk accesses per range query.
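One well-known way to map points to one-dimensional keys while keeping some proximity is bit interleaving (Z-order). The slides do not name this technique; it is used here only as an assumed illustration of a partially proximity-preserving mapping, for non-negative integer grid coordinates in two dimensions.

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into a single key.

    Bit i of x goes to position 2*i, bit i of y to position 2*i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def page_of(x: int, y: int, cells_per_page: int = 4) -> int:
    # Hypothetical page assignment: consecutive keys share a page.
    return z_order_key(x, y) // cells_per_page
```

With `cells_per_page = 4`, the four cells of each aligned 2 x 2 block land on the same page, but two neighbouring cells on either side of a block boundary do not, which is why the preservation is only partial.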

The Grid File
• A d-dimensional orthogonal grid is superimposed on the universe.
• The grid is not necessarily regular; the resulting cells may be of different shapes and sizes.
• Each cell is associated with one bucket, but a bucket may contain several adjacent cells.
• Since the directory may grow large, it is usually kept on secondary storage.
• To guarantee that data items are always found with no more than two disk accesses for exact match queries, the grid itself is kept in main memory, represented by d one-dimensional arrays called scales.
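The lookup path described above can be sketched as follows. This is a simplified in-memory model under stated assumptions: the class name `GridFile`, the dictionary used as directory, and the bucket labels are all hypothetical, and splitting and merging are left out.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple


class GridFile:
    def __init__(self, scales: List[List[float]],
                 directory: Dict[Tuple[int, ...], str]) -> None:
        # scales[k] is the sorted list of split positions along axis k;
        # in a real grid file these d arrays stay in main memory.
        self.scales = scales
        # directory maps a cell index to a bucket id; in a real grid
        # file it lives on disk, so reading an entry is one disk access.
        self.directory = directory

    def cell_of(self, point: Sequence[float]) -> Tuple[int, ...]:
        # One binary search per axis, no disk access needed.
        return tuple(bisect_right(s, x) for s, x in zip(self.scales, point))

    def bucket_of(self, point: Sequence[float]) -> str:
        # Directory access (1st disk access); fetching the returned
        # bucket would be the 2nd.
        return self.directory[self.cell_of(point)]


# 3 x 2 cells; buckets "A" and "D" each span two adjacent cells.
grid = GridFile(
    scales=[[10.0, 20.0], [5.0]],
    directory={(0, 0): "A", (0, 1): "A", (1, 0): "B",
               (1, 1): "C", (2, 0): "D", (2, 1): "D"},
)
```

Because the split positions in `scales` are arbitrary, the cells have different sizes, matching the irregular grid described in the bullets.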

EXCELL
• Decomposes the universe regularly: all grid cells are of equal size.
• Each new split halves all cells and therefore doubles the directory size.
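The doubling behaviour can be illustrated with a small sketch over the unit square [0, 1) x [0, 1). The class `RegularDirectory` and its methods are hypothetical names, and bucket overflow handling is omitted; only the effect of a split on the directory is modelled.

```python
from typing import Dict, List, Tuple


class RegularDirectory:
    def __init__(self) -> None:
        self.splits: List[int] = [0, 0]            # halvings done per axis
        self.cells: Dict[Tuple[int, int], int] = {(0, 0): 0}  # cell -> bucket

    def cell_of(self, x: float, y: float) -> Tuple[int, int]:
        # All cells have equal size, so the index is plain arithmetic.
        return (int(x * (1 << self.splits[0])),
                int(y * (1 << self.splits[1])))

    def double(self, axis: int) -> None:
        """Halve every cell along `axis`; the directory doubles in size."""
        new_cells: Dict[Tuple[int, int], int] = {}
        for cell, bucket in self.cells.items():
            for half in (0, 1):
                idx = list(cell)
                idx[axis] = 2 * cell[axis] + half
                # Both halves initially point to the old bucket.
                new_cells[(idx[0], idx[1])] = bucket
        self.cells = new_cells
        self.splits[axis] += 1
```

Because cell boundaries are implicit, no scales are needed, but a single overflowing bucket forces every cell to be halved.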

The Two-Level Grid File
• Uses a second grid file to manage the grid directory.
• The first of the two levels is called the root directory.
• The second level is the actual grid directory.
• The root directory contains pointers to the directory pages of the lower level, which in turn contain pointers to the data pages.
• Splits are often confined to the subdirectory regions without affecting the surroundings too much.
• => Slower directory growth.
• Does not solve the problem of superlinear directory size.

The Twin Grid File
• Increases space utilization by introducing a second grid file.
• The relationship between the two grid files is not hierarchical but somewhat more balanced.
• Both grid files span the whole universe.
• The distribution of the data among the two files is performed dynamically.

Hierarchical Access Method
• Based on binary or multi-way tree structures.
• Like hashing, data are stored in buckets.
• Each bucket corresponds to a leaf node of the tree and to a disk page.
• Interior nodes of the tree guide the search.
• Search: top-down tree traversal.
• The methods differ mainly in the characteristics of the regions.

k-d-B-tree

• Combination of the adaptive k-d-tree and the B-tree.
• Partitions the universe like an adaptive k-d-tree.
• Associates the resulting subspaces with tree nodes.
• Each interior node corresponds to an interval-shaped region.
• Regions of nodes on the same level are mutually disjoint.
• Perfectly balanced (like a B-tree).
• Search: straightforward, like in a k-d-tree.
• Insertion: search for the right bucket; if it overflows, split it and move half of the data to the new bucket.
• Deletion: search, remove, and if necessary merge the node with its siblings.
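The search step can be sketched as below. Because the regions on one level are disjoint, an exact match query follows exactly one path from the root to a point page. The type names `RegionPage` and `PointPage`, the half-open region convention, and the function names are assumptions for this sketch; splitting and merging are not shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Point = Tuple[float, ...]
Region = Tuple[Point, Point]          # half-open box [lo, hi), an assumption


@dataclass
class PointPage:
    """Leaf page (bucket) holding the data points."""
    points: List[Point] = field(default_factory=list)


@dataclass
class RegionPage:
    """Interior page: disjoint regions, each with a child page."""
    entries: List[Tuple[Region, Union["RegionPage", PointPage]]]


def in_region(p: Point, region: Region) -> bool:
    lo, hi = region
    return all(l <= x < h for l, x, h in zip(lo, p, hi))


def find_page(node: Union[RegionPage, PointPage], p: Point) -> PointPage:
    # Descend into the single child whose region contains p.
    while isinstance(node, RegionPage):
        for region, child in node.entries:
            if in_region(p, region):
                node = child
                break
        else:
            raise ValueError("point lies outside the universe")
    return node


def exact_match(root: RegionPage, p: Point) -> bool:
    return p in find_page(root, p).points
```

The number of pages visited equals the height of the tree, which is the same for every point because the tree is balanced.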


LSD tree

• The directory is organized like an adaptive k-d-tree.
• Adapts better to the data distribution than fixed binary partitioning.
• External balancing property: the heights of the external subtrees differ by at most one.
• Combines two split strategies to accommodate skewed data:
  • Data-dependent: based on the data; tries to achieve the most balanced structure (an equal number of data items on both sides of the split).
  • Distribution-dependent: splits at a fixed dimension and position (a known distribution is assumed).
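The difference between the two split strategies can be seen on a skewed bucket. This is a minimal sketch: the function names are hypothetical, the data-dependent split is taken to be the median, and the distribution-dependent split assumes a uniform distribution over the bucket's interval, so it picks the midpoint; both choices are assumptions for illustration.

```python
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def data_dependent_split(points: Sequence[Point], axis: int) -> float:
    # Split position chosen from the stored data: the median coordinate.
    return median(p[axis] for p in points)


def distribution_dependent_split(lo: float, hi: float) -> float:
    # Split position fixed in advance, independent of the stored data;
    # the midpoint is used here under an assumed uniform distribution.
    return (lo + hi) / 2


def partition(points: Sequence[Point], axis: int,
              pos: float) -> Tuple[List[Point], List[Point]]:
    left = [p for p in points if p[axis] < pos]
    right = [p for p in points if p[axis] >= pos]
    return left, right


# Skewed bucket over the interval [0, 128) on axis 0.
bucket = [(1.0,), (2.0,), (3.0,), (4.0,), (100.0,)]
balanced = partition(bucket, 0, data_dependent_split(bucket, 0))
fixed = partition(bucket, 0, distribution_dependent_split(0.0, 128.0))
```

On this bucket the median split separates the points 2 to 3, while the midpoint split separates them 4 to 1, which shows why the data-dependent strategy gives the more balanced result when the data do not follow the assumed distribution.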
