
Chapter 8 Physical Database Design

Database Design Process
[Figure: database design process. Labels: Applications 1-4, each with its own conceptual requirements and external model; conceptual model; logical model; internal model; physical design.]

Physical Database Design

Many physical database design decisions are implicit
in the technology adopted

Also, organizations may have standards or an
“information architecture” that specifies operating
systems, DBMS, and data access languages, thus
constraining the range of possible physical
implementations.

We will be concerned with some of the possible
physical implementation issues

Physical Database Design

The primary goal of physical database design is data processing
efficiency

We will concentrate on choices often available to optimize
performance of database services

Physical Database Design requires information gathered during earlier
stages of the design process

Physical Design Information


Information needed for physical file and database
design includes:

Normalized relations plus size estimates for them

Definitions of each attribute

Descriptions of where and when data are used: entered, retrieved,
deleted, updated, and how often

Expectations and requirements for response time, and
data security, backup, recovery, retention and integrity

Descriptions of the technologies used to implement the
database

Physical Design Decisions

There are several critical decisions that will affect the integrity and
performance of the system

Storage Format

Physical record composition

Data arrangement

Indexes


Query optimization and performance tuning

Storage Format

Choosing the storage format of each field (attribute). The DBMS
provides some set of data types that can be used for the physical
storage of fields in the database

Data Type (format) is chosen to minimize storage space and maximize
data integrity

Objectives of data type selection

Minimize storage space

Represent all possible values

Improve data integrity

Support all data manipulations

The correct data type should, in minimal space,
represent every possible value (but exclude
illegal values) for the associated attribute and
support the required data manipulations (e.g.,
numerical or string operations)

Access Data Types


Numeric (1, 2, 4, 8 bytes, fixed or float)

Text (255 max)

Memo (64000 max)

Date/Time (8 bytes)

Currency (8 bytes, 15 digits + 4 digits decimal)

Autonumber (4 bytes)

Yes/No (1 bit)

OLE (limited only by disk space)

Hyperlinks (up to 64000 chars)

Access Numeric types

Byte

Stores numbers from 0 to 255 (no fractions). 1 byte

Integer

Stores numbers from –32,768 to 32,767 (no fractions) 2 bytes

Long Integer (Default)


Stores numbers from –2,147,483,648 to 2,147,483,647 (no fractions). 4
bytes

Single

Stores numbers from -3.402823E38 to –1.401298E–45 for negative
values and from 1.401298E–45 to 3.402823E38 for positive values.
4 bytes

Double

Stores numbers from –1.79769313486231E308 to –4.94065645841247E–324
for negative values and from 4.94065645841247E–324 to
1.79769313486231E308 for positive values. Decimal precision: 15.
8 bytes

Replication ID

Globally unique identifier (GUID). Decimal precision: N/A. 16 bytes
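The "minimize storage space, represent all possible values" objective can be illustrated with the three whole-number types listed above. This is only a sketch: the ranges and sizes come from the list, while the function name and the idea of choosing by expected minimum and maximum value are assumptions made for the example.

```python
# Whole-number Access types from the list above: (name, low, high, bytes),
# ordered from smallest to largest storage size.
INTEGER_TYPES = [
    ("Byte", 0, 255, 1),
    ("Integer", -32_768, 32_767, 2),
    ("Long Integer", -2_147_483_648, 2_147_483_647, 4),
]


def smallest_integer_type(min_value: int, max_value: int) -> tuple[str, int]:
    """Return (type name, size in bytes) of the smallest type covering the range.

    Hypothetical helper: it only considers the three whole-number types above.
    """
    for name, low, high, size in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name, size
    raise ValueError("range does not fit any whole-number type listed")


print(smallest_integer_type(0, 120))      # an age column fits in a Byte
print(smallest_integer_type(-5, 120))     # negative values rule out Byte
print(smallest_integer_type(0, 100_000))  # too large for Integer
```

Choosing Byte for a column that can never exceed 255 both saves space and rejects illegal values such as negative numbers.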

Designing Physical Records

A physical record is a group of fields stored in adjacent memory
locations and retrieved together as a unit

Fixed-length and variable-length fields

Data Storage

Storing Data: Disks


Buffer manager

Representing relational data in a disk

The Memory Hierarchy
Processor Cache: access time 10 nanoseconds; 512K

Main Memory (= Disk Cache): volatile; 256M-1G; access time 10-100 nanoseconds

Disk: persistent; 10-100 GB storage; transfer rate 5-10 MB/s; access time 10-15 msecs

Tape: 1.5 MB/s transfer rate; 280 GB typical capacity; only sequential access; not for operational data

Main Memory

Fastest, most expensive (excluding cache)

Today: 512MB are common even on PCs

Many databases could fit in memory

New industry trend: Main Memory Database

E.g., TimesTen

Main issue is volatility

Secondary Storage

Disks


Slower, cheaper than main memory

Persistent !!!

The unit of disk I/O = block

Typically 1 block = 4k

A disk block is also called a disk page or simply a page

Used with a main memory buffer

Block

Blocking factor (bfr) for a file is the average number of records stored
in a disk block.

Suppose the block size of a database system is 2000 bytes. Customer
table has an average record length of 190 bytes. Assume the overhead
of a block for the data is 100 bytes.

What is the blocking factor?
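A worked answer, assuming records are not spanned across blocks and the 100-byte overhead is subtracted from the block before any records are placed (the function name is illustrative):

```python
def blocking_factor(block_size: int, block_overhead: int, record_length: int) -> int:
    """Number of whole records that fit in one block (unspanned records)."""
    usable_bytes = block_size - block_overhead
    # Integer division: a partial record does not count.
    return usable_bytes // record_length


bfr = blocking_factor(block_size=2000, block_overhead=100, record_length=190)
print(bfr)  # (2000 - 100) // 190 = 1900 // 190 = 10
```

With these numbers the 1900 usable bytes hold exactly 10 records, so no space in the block is wasted.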

The Mechanics of Disk
Mechanical characteristics:

Rotation speed (5400RPM)

Number of platters (1-30)


Number of tracks (<=10000)

Number of sectors (256/track)

Number of bytes / sector (2^9 = 512)

Block size (2^12 = 4096)
[Figure: disk components: platters, spindle, disk head, arm assembly, arm movement, tracks, sector, cylinder]

Important Disk Access Characteristics

Block access time = Disk latency + transfer time

Disk latency = seek time + rotational latency

Seek time = time for the head to reach the right track

10ms – 40ms


Rotational latency = rotation time to get to the right sector

Time for one rotation = 10ms

Average rotation latency = 10ms/2

Transfer time = block size / transfer rate (transfer rate is typically 5-10MB/s)

Disks read/write one block at a time (typically 4kB)
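The formulas above can be combined into a rough estimate of the time to read one block. This is a sketch using the figures quoted on this slide; it assumes 1 MB = 1,000,000 bytes and a 4096-byte block, and the function name is illustrative.

```python
def block_access_time_ms(seek_ms: float, rotation_ms: float,
                         block_bytes: int, transfer_mb_per_s: float) -> float:
    """Block access time = seek time + average rotational latency + transfer time."""
    rotational_latency_ms = rotation_ms / 2  # on average, half a rotation
    transfer_ms = block_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotational_latency_ms + transfer_ms


# Best case on the slide: 10 ms seek, 10 MB/s transfer.
best = block_access_time_ms(seek_ms=10, rotation_ms=10,
                            block_bytes=4096, transfer_mb_per_s=10)
# Worst case on the slide: 40 ms seek, 5 MB/s transfer.
worst = block_access_time_ms(seek_ms=40, rotation_ms=10,
                             block_bytes=4096, transfer_mb_per_s=5)
print(round(best, 2), round(worst, 2))  # about 15.41 and 45.82 milliseconds
```

The transfer itself takes under a millisecond; seek time and rotational latency dominate, which is why placing related records in the same or adjacent blocks matters.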

Representing Data Elements

Relational database elements:
CREATE TABLE Product (
    pid INT PRIMARY KEY,
    name CHAR(20),
    description VARCHAR(200),
    maker CHAR(10) REFERENCES Company(name)
);

A tuple is represented as a record

Record Formats: Fixed Length

Information about field types same for all
records in a file; stored in system catalogs.

Finding the i’th field does not require a scan of the record: its address is the base address plus the lengths of the preceding fields.

Note the importance of schema information!

[Figure: record with fields F1, F2, F3, F4 of lengths L1, L2, L3, L4, starting at base address B; address of F3 = B+L1+L2]
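The address calculation Address = B+L1+L2 generalizes to any field: add the lengths of all preceding fields to the base address. A minimal sketch, in which the base address and the field lengths are hypothetical values chosen for the example:

```python
from itertools import accumulate


def field_addresses(base: int, field_lengths: list[int]) -> list[int]:
    """Start address of each field in a fixed-length record.

    The offset of field i is the sum of the lengths of fields before it,
    so no scan of the record contents is needed.
    """
    offsets = list(accumulate(field_lengths, initial=0))[:-1]
    return [base + offset for offset in offsets]


lengths = [4, 20, 200, 10]   # hypothetical L1..L4 in bytes
addresses = field_addresses(base=1000, field_lengths=lengths)
print(addresses)  # F3 starts at B + L1 + L2 = 1000 + 4 + 20 = 1024
```

Because the lengths are the same for every record in the file, they can be kept once in the system catalog rather than in each record.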

Record Header
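[Figure: record with a header followed by fields F1, F2, F3, F4 of lengths L1, L2, L3, L4; the header holds a pointer to the schema, the record length, and a timestamp]

Need the header because:

The schema may change; for a while, new and old may coexist

Records from different relations may coexist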

Variable Length Records
[Figure: record with a header (record length and other header information) followed by fields F1, F2, F3, F4 of lengths L1, L2, L3, L4]

Place the fixed-length fields first: F1, F2

Then the variable-length fields: F3, F4

Null values take 2 bytes only

Sometimes they take 0 bytes (when at the end)

Records With Referencing Fields
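[Figure: record with a header (record length and other header information) followed by fields F1, F2, F3 of lengths L1, L2, L3, where a field references another record]

E.g., to represent one-many or many-many relationships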

Storing Records in Blocks

Blocks have fixed size (typically 4k)
[Figure: a block holding records R1, R2, R3, R4]

Spanning Records Across Blocks

When records are very large

Or even medium size: saves space in blocks
[Figure: two blocks, each with a block header; the first holds R1 and the first part of R2, the second holds the rest of R2 and R3]
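The space saving for medium-size records can be estimated by comparing the number of blocks needed with and without spanning. This is a rough sketch: the record count, record length, and block header size are hypothetical, and the extra bookkeeping that a real system adds to each record fragment is ignored.

```python
def blocks_needed(num_records: int, record_length: int,
                  block_size: int, block_header: int, spanned: bool) -> int:
    """Blocks required to store the records, with or without spanning."""
    usable = block_size - block_header
    if spanned:
        # Records are packed end to end, so only the last block may have free space.
        total_bytes = num_records * record_length
        return (total_bytes + usable - 1) // usable   # ceiling division
    per_block = usable // record_length
    if per_block == 0:
        raise ValueError("record is larger than a block; it must be spanned")
    return (num_records + per_block - 1) // per_block


# Hypothetical file: 1000 records of 1500 bytes, 4096-byte blocks, 96-byte header.
unspanned = blocks_needed(1000, 1500, 4096, 96, spanned=False)
spanned = blocks_needed(1000, 1500, 4096, 96, spanned=True)
print(unspanned, spanned)  # 500 blocks unspanned versus 375 spanned
```

Without spanning, each block holds two 1500-byte records and wastes 1000 of its 4000 usable bytes; spanning recovers that space at the cost of reading two blocks for some records.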
