Chapter 8 Physical Database Design


Database Design Process
[Figure: database design process, with these labels: conceptual requirements for Applications 1 to 4, Conceptual Model, Logical Model, an External Model for each application, Internal Model, Physical Design]

Physical Database Design
• Many physical database design decisions are implicit in the technology adopted
  – Organizations may also have standards or an “information architecture” that specifies operating systems, DBMS, and data access languages, thus constraining the range of possible physical implementations
• We will be concerned with some of the possible physical implementation issues

Physical Database Design
• The primary goal of physical database design is data processing efficiency
• We will concentrate on choices often available to optimize performance of database services
• Physical database design requires information gathered during earlier stages of the design process

Physical Design Information

• Information needed for physical file and database design includes:
  – Normalized relations plus size estimates for them
  – Definitions of each attribute
  – Descriptions of where and when data are used: entered, retrieved, deleted, updated, and how often
  – Expectations and requirements for response time, and data security, backup, recovery, retention and integrity
  – Descriptions of the technologies used to implement the database

Physical Design Decisions
• There are several critical decisions that will affect the integrity and performance of the system
  – Storage format
  – Physical record composition
  – Data arrangement
  – Indexes
  – Query optimization and performance tuning

Storage Format
• Choose the storage format of each field (attribute). The DBMS provides a set of data types that can be used for the physical storage of fields in the database
• The data type (format) is chosen to minimize storage space and maximize data integrity

Objectives of data type selection
• Minimize storage space
• Represent all possible values
• Improve data integrity
• Support all data manipulations
• The correct data type should, in minimal space, represent every possible value (but eliminate illegal values) for the associated attribute and support the required data manipulations (e.g., numerical or string operations)

Access Data Types
• Numeric (1, 2, 4, or 8 bytes; fixed or float)
• Text (255 characters max)
• Memo (64,000 characters max)
• Date/Time (8 bytes)
• Currency (8 bytes; 15 digits + 4 decimal digits)
• AutoNumber (4 bytes)
• Yes/No (1 bit)
• OLE (limited only by disk space)
• Hyperlink (up to 64,000 characters)

Access Numeric types
• Byte
  – Stores numbers from 0 to 255 (no fractions). 1 byte
• Integer
  – Stores numbers from -32,768 to 32,767 (no fractions). 2 bytes
• Long Integer (default)
  – Stores numbers from -2,147,483,648 to 2,147,483,647 (no fractions). 4 bytes
• Single
  – Stores numbers from -3.402823E38 to -1.401298E-45 for negative values and from 1.401298E-45 to 3.402823E38 for positive values. 4 bytes
• Double
  – Stores numbers from -1.79769313486231E308 to -4.94065645841247E-324 for negative values and from 4.94065645841247E-324 to 1.79769313486231E308 for positive values. Decimal precision: 15. 8 bytes
• Replication ID
  – Globally unique identifier (GUID). Decimal precision: N/A. 16 bytes
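
The goal of representing every legal value in minimal space can be illustrated with a small lookup over the integer-valued types listed above. This is only a sketch: the function name and the sample attributes are hypothetical, and the ranges are the ones given on this slide.

```python
# Integer-valued Access numeric types from the slide: (name, min, max, bytes),
# ordered from smallest to largest storage size.
ACCESS_INTEGER_TYPES = [
    ("Byte", 0, 255, 1),
    ("Integer", -32_768, 32_767, 2),
    ("Long Integer", -2_147_483_648, 2_147_483_647, 4),
]


def smallest_integer_type(lo, hi):
    """Return (name, bytes) of the smallest type holding every value in [lo, hi]."""
    for name, tmin, tmax, size in ACCESS_INTEGER_TYPES:
        if tmin <= lo and hi <= tmax:
            return name, size
    raise ValueError("range does not fit any integer type listed")


# Hypothetical attributes: an age in years and a signed temperature reading.
print(smallest_integer_type(0, 120))   # ('Byte', 1)
print(smallest_integer_type(-50, 60))  # ('Integer', 2)
```

Note that Byte cannot hold negative values, so even a narrow signed range needs at least the 2-byte Integer type.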

Designing Physical Records
• A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit
• Fields may be fixed length or variable length

Data Storage
• Storing data: disks
• Buffer manager
• Representing relational data on disk

The Memory Hierarchy
• Processor cache
  – Access time: 10 nanoseconds
  – 512 KB
• Main memory = disk cache
  – Volatile
  – 256 MB to 1 GB
  – Access time: 10-100 nanoseconds
• Disk
  – Persistent
  – 10-100 GB storage
  – Speed: rate = 5-10 MB/s; access time = 10-15 milliseconds
• Tape
  – 1.5 MB/s transfer rate
  – 280 GB typical capacity
  – Only sequential access
  – Not for operational data

Main Memory
• Fastest, most expensive (excluding cache)
• Today: 512 MB is common even on PCs
• Many databases could fit in memory
  – New industry trend: main memory databases
  – E.g., TimesTen
• Main issue is volatility

Secondary Storage
• Disks
• Slower, cheaper than main memory
• Persistent!
• The unit of disk I/O is the block
  – Typically 1 block = 4 KB
  – A disk block is also called a disk page or simply a page
• Used with a main memory buffer

Block
• The blocking factor (bfr) for a file is the average number of records stored in a disk block.
• Suppose the block size of a database system is 2000 bytes. The Customer table has an average record length of 190 bytes. Assume each block carries 100 bytes of overhead.
  – What is the blocking factor?
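
One way to work the exercise, sketched under two assumptions that the slide does not state: records are not split across blocks, and the average record length is treated as if every record had that length. The function names and the figure of 5000 customers are hypothetical.

```python
import math


def blocking_factor(block_size, block_overhead, record_length):
    """Whole records per block when records are not split across blocks."""
    usable = block_size - block_overhead
    return usable // record_length


def blocks_needed(num_records, bfr):
    """Blocks required to store num_records at the given blocking factor."""
    return math.ceil(num_records / bfr)


bfr = blocking_factor(block_size=2000, block_overhead=100, record_length=190)
print(bfr)                       # 10
print(blocks_needed(5000, bfr))  # 500 (5000 customers is a made-up figure)
```

With 1900 usable bytes and 190-byte records, exactly 10 records fit and no space is left over.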

The Mechanics of Disk
Mechanical characteristics:
• Rotation speed (5400 RPM)
• Number of platters (1-30)
• Number of tracks (<= 10000)
• Number of sectors (256/track)
• Number of bytes per sector (2^9 = 512)
• Block size (2^12 = 4096)
[Figure: disk anatomy, labeling platters, spindle, disk head, arm assembly, arm movement, tracks, sector, and cylinder]
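
A few quantities follow directly from the figures on this slide. The sketch below derives them; the variable names are illustrative, and it assumes every track has the same number of sectors.

```python
RPM = 5400
SECTORS_PER_TRACK = 256
BYTES_PER_SECTOR = 2 ** 9   # 512
BLOCK_SIZE = 2 ** 12        # 4096

# A block is made up of several consecutive sectors.
sectors_per_block = BLOCK_SIZE // BYTES_PER_SECTOR         # 8
blocks_per_track = SECTORS_PER_TRACK // sectors_per_block  # 32
track_bytes = SECTORS_PER_TRACK * BYTES_PER_SECTOR         # 131072 (128 KiB)

# One full rotation, in milliseconds: 60,000 ms per minute / rotations per minute.
rotation_ms = 60_000 / RPM

print(sectors_per_block, blocks_per_track, track_bytes)
print(f"{rotation_ms:.2f} ms per rotation")  # 11.11 ms per rotation
```

At 5400 RPM one rotation takes about 11 ms, which is close to the round figure of 10 ms often used in back-of-the-envelope estimates.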

Important Disk Access Characteristics
• Block access time = disk latency + transfer time
• Disk latency = seek time + rotational latency
• Seek time = time for the head to reach the right track
  – 10 ms to 40 ms
• Rotational latency = rotation time to get to the right sector
  – Time for one rotation = 10 ms
  – Average rotational latency = 10 ms / 2 = 5 ms
• Transfer time = block size / transfer rate; the transfer rate is typically 5-10 MB/s
• Disks read/write one block at a time (typically 4 KB)
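
The formulas above can be worked through numerically. The sketch below takes the transfer time as block size divided by transfer rate, and it assumes 1 MB = 10^6 bytes and a 4096-byte block; the function name is hypothetical.

```python
def block_access_time_ms(seek_ms, rotation_ms, block_bytes, rate_bytes_per_s):
    """Block access time = seek time + average rotational latency + transfer time."""
    rotational_latency_ms = rotation_ms / 2          # half a rotation on average
    transfer_ms = block_bytes / rate_bytes_per_s * 1000
    return seek_ms + rotational_latency_ms + transfer_ms


# Best case on the slide's ranges: 10 ms seek, 10 MB/s transfer rate.
best = block_access_time_ms(10, 10, 4096, 10_000_000)
# Worst case: 40 ms seek, 5 MB/s transfer rate.
worst = block_access_time_ms(40, 10, 4096, 5_000_000)

print(f"{best:.2f} ms to {worst:.2f} ms")  # 15.41 ms to 45.82 ms
```

The transfer itself takes well under 1 ms, so seek time and rotational latency dominate the cost of reading a single block.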

Representing Data Elements
• Relational database elements:
    CREATE TABLE Product (
        pid INT PRIMARY KEY,
        name CHAR(20),
        description VARCHAR(200),
        maker CHAR(10) REFERENCES Company(name)
    );
• A tuple is represented as a record

Record Formats: Fixed Length
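• Information about field types is the same for all records in a file; it is stored in the system catalogs.
• Finding the i'th field does not require a scan of the record: its address is the base address plus the lengths of the preceding fields.
• Note the importance of schema information!
[Figure: a record at base address B with fields F1-F4 of lengths L1-L4; the address of F3 is B + L1 + L2]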
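
The address computation shown in the figure (Address = B + L1 + L2 for the third field) generalizes to any field of a fixed-length record. A minimal sketch, with a hypothetical function name and made-up field lengths and base address:

```python
def field_address(base, field_lengths, i):
    """Address of the i'th field (1-based): base plus the lengths of fields 1..i-1."""
    return base + sum(field_lengths[: i - 1])


lengths = [4, 20, 10, 8]  # hypothetical L1..L4, in bytes
base = 1000               # hypothetical base address B

print(field_address(base, lengths, 3))  # 1024, i.e. B + L1 + L2
```

Because the lengths come from the schema in the system catalog, nothing in the record itself has to be read to locate a field.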

Record Header
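[Figure: a record with fields F1-F4 of lengths L1-L4, preceded by a header that holds a pointer to the schema, the record length, and a timestamp]
Need the header because:
• The schema may change; for a while, new and old versions may coexist
• Records from different relations may coexist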

Variable Length Records
[Figure: a record with fields F1-F4 of lengths L1-L4, preceded by a header that holds the record length and other header information]
Place the fixed-length fields first: F1, F2
Then the variable-length fields: F3, F4
Null values take only 2 bytes
Sometimes they take 0 bytes (when at the end of the record)
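
One possible byte layout along these lines is sketched below. The slide does not specify the header contents in detail, so everything concrete here is an assumption: the header holds the record length plus a 2-byte length slot per variable-length field, 0xFFFF in a slot marks NULL, F1 is a 4-byte integer, and F2 is a CHAR(20). Omitting trailing NULLs entirely (the 0-byte case) is not implemented.

```python
import struct

NULL = 0xFFFF                   # hypothetical marker: this variable-length field is NULL
HEADER = struct.Struct("<HHH")  # record length, length of F3, length of F4
FIXED = struct.Struct("<i20s")  # F1: 4-byte integer, F2: CHAR(20), zero-padded


def encode(f1, f2, f3, f4):
    """Pack header, then fixed-length fields, then variable-length fields."""
    var = [None if v is None else v.encode("utf-8") for v in (f3, f4)]
    lengths = [NULL if v is None else len(v) for v in var]
    body = FIXED.pack(f1, f2.encode("utf-8")) + b"".join(v or b"" for v in var)
    return HEADER.pack(HEADER.size + len(body), *lengths) + body


def decode(record):
    """Inverse of encode: returns (f1, f2, f3, f4), with None for NULL fields."""
    _total, len3, len4 = HEADER.unpack_from(record, 0)
    f1, f2 = FIXED.unpack_from(record, HEADER.size)
    pos = HEADER.size + FIXED.size
    out = []
    for n in (len3, len4):
        if n == NULL:
            out.append(None)    # a NULL costs only its 2-byte header slot
        else:
            out.append(record[pos:pos + n].decode("utf-8"))
            pos += n
    return f1, f2.rstrip(b"\x00").decode("utf-8"), out[0], out[1]


rec = encode(7, "widget", "a long description", None)
print(len(rec))     # 48 = 6 (header) + 24 (fixed) + 18 (F3) + 0 (F4 is NULL)
print(decode(rec))  # (7, 'widget', 'a long description', None)
```

Putting the fixed-length fields first means F1 and F2 sit at known offsets in every record, and only the variable-length tail needs the length slots.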

Records With Referencing Fields
[Figure: a record with fields F1-F3 of lengths L1-L3, preceded by a header that holds the record length and other header information]
E.g., to represent one-many or many-many relationships

Storing Records in Blocks
• Blocks have a fixed size (typically 4 KB)
[Figure: a block holding records R1, R2, R3, and R4]

Spanning Records Across Blocks
• When records are very large
• Or even medium size: spanning saves space in blocks
[Figure: two blocks, each with a block header; the first holds R1 and the start of R2, the second holds the rest of R2 and R3]
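
The space saving for medium-size records can be estimated with a rough comparison. The sketch below uses made-up numbers (a 4 KB block with a hypothetical 96-byte block header, 2100-byte records) and ignores the small extra bookkeeping that each record fragment would need in practice.

```python
import math


def blocks_unspanned(num_records, record_size, payload):
    """Blocks needed when every record must fit entirely inside one block."""
    per_block = payload // record_size
    if per_block == 0:
        raise ValueError("record larger than a block: spanning is required")
    return math.ceil(num_records / per_block)


def blocks_spanned(num_records, record_size, payload):
    """Blocks needed when records may continue into the next block."""
    return math.ceil(num_records * record_size / payload)


PAYLOAD = 4096 - 96  # hypothetical: 4 KB block minus a 96-byte block header

print(blocks_unspanned(1000, 2100, PAYLOAD))  # 1000 (one record per block)
print(blocks_spanned(1000, 2100, PAYLOAD))    # 525
```

Here a record is slightly larger than half a block, so without spanning almost half of every block is wasted.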
