
Database Systems (cơ sở dữ liệu): examination notes, Lê Thị Bảo Thu (sinhvienzone com)


Overview

Secondary index on a key (unique values): Number of index entries =
Number of records. Dense.
Secondary index on a non-key field, three options: 1. Duplicate index entries with the same K(i) value.
2. Keep a list of pointers <P(i, 1), ..., P(i, k)> in the index entry for K(i). 3. One entry for each distinct index field value + an
extra level of indirection to handle the multiple pointers.
Ordered file: Primary, Clustering.
Indexing field is key: Primary, Secondary.
Indexing field is not key: Clustering, Secondary.
Dense: Secondary. Nondense: all.
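• Multi-level index: insertion and deletion of index entries is a
problem because every index level is an ordered file → use a tree.
B-tree: (p ∗ P) + ((p − 1) ∗ (Pt + V)) ≤ B
B+-tree: internal node: ⌈p/2⌉ ≤ q (pointers) ≤ p, with
q − 1 search values. Leaf node: ⌈pleaf/2⌉ ≤ q (data pointers) ≤
pleaf
Internal node: (p ∗ P) + ((p − 1) ∗ V) ≤ B
Leaf node: (pleaf ∗ (Pt + V)) + P ≤ B
(p: maximum number of tree pointers per node; pleaf: maximum number of data pointers per leaf; P: block pointer size; Pt: data pointer size; V: search field size; B: block size)
Insert: Leaf full: split, the first j = ⌈(pleaf + 1)/2⌉ entries remain in the original leaf. Parent full:
split, the first j − 1 search values remain, where j = ⌊(p + 1)/2⌋.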
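The capacity inequalities above can be solved for the largest node order that fits in a block. The following is a minimal sketch; the function names and the sample sizes (B = 512, P = 6, Pt = 7, V = 9 bytes) are illustrative assumptions, not values from these notes.

```python
def btree_order(B, P, Pt, V):
    """Largest p with (p*P) + ((p-1)*(Pt+V)) <= B."""
    # Rearranged: p * (P + Pt + V) <= B + Pt + V
    return (B + Pt + V) // (P + Pt + V)


def bplus_internal_order(B, P, V):
    """Largest p with (p*P) + ((p-1)*V) <= B."""
    # Rearranged: p * (P + V) <= B + V
    return (B + V) // (P + V)


def bplus_leaf_order(B, P, Pt, V):
    """Largest pleaf with (pleaf*(Pt+V)) + P <= B."""
    return (B - P) // (Pt + V)


# Assumed sizes in bytes, for illustration only.
B, P, Pt, V = 512, 6, 7, 9
print(btree_order(B, P, Pt, V))        # B-tree node order
print(bplus_internal_order(B, P, V))   # B+-tree internal node order
print(bplus_leaf_order(B, P, Pt, V))   # B+-tree leaf node order
```

Because an internal B+-tree node stores no data pointers, it fits more entries per block than a B-tree node of the same size.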

Disk - File Structures - Hash
Open addressing: the program checks the subsequent positions in order until an unused (empty) position is found. Chaining: overflow locations are kept, and a collision is resolved by placing the new record in an unused overflow location that is linked from the occupied address. Multiple hashing: the program applies a second hash function if the first results in a collision.
Dynamic and extendible hashing do not require an overflow
area. In linear hashing, blocks are split in linear order as the file expands.
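• Formulas: bfr = ⌊B/R⌋. b = ⌈r/bfr⌉. Number of levels t of a multilevel index: the smallest t with f0^t ≥ r1, i.e. t = ⌈log_f0(r1)⌉
(r: number of records; R: record size; B: block size; r1: number of entries at level 1; f0: bfr of the index at level 1).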
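As a worked instance of these formulas, the sketch below computes the blocking factor, the number of blocks, and the number of index levels. The function names and the sample file (30000 records of 100 bytes, 1024-byte blocks, 15-byte index entries) are assumptions chosen for illustration.

```python
def blocking_factor(B, R):
    """bfr = floor(B / R): whole records per block."""
    return B // R


def num_blocks(r, bfr):
    """b = ceil(r / bfr), written as integer ceiling division."""
    return -(-r // bfr)


def index_levels(r1, f0):
    """Smallest t with f0**t >= r1 (requires f0 >= 2)."""
    t, capacity = 1, f0
    while capacity < r1:
        capacity *= f0
        t += 1
    return t


# Assumed data file: r = 30000 records, R = 100 bytes, B = 1024 bytes.
bfr = blocking_factor(1024, 100)
b = num_blocks(30000, bfr)
# Assumed primary index: one 15-byte entry per data block.
f0 = blocking_factor(1024, 15)
t = index_levels(b, f0)
print(bfr, b, f0, t)
```

The loop avoids floating-point logarithms, which can round the wrong way when r1 is an exact power of f0.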
Disk: rd = (1/2) * (1/p) min = 30000/p msec; btt = B/tr msec; btr = (B/(B + G)) * tr bytes/msec; Trw = 2 * rd msec = 60000/p msec
If we have a track size of 50 Kbytes and p is 3600 rpm,
then the transfer rate in bytes/msec is tr = (50 * 1000)/(60 *
1000/3600) = 3000 bytes/msec
The average time needed to find and transfer a block,
given its block address, is estimated by (s + rd + btt) msec
Seek time: s msec. Rotational delay: rd msec. Block transfer time: btt msec. Rewrite time: Trw msec. Transfer rate: tr
bytes/msec. Bulk transfer rate: btr bytes/msec. Block size: B
bytes. Interblock gap size: G bytes. Disk speed: p rpm (revolutions per minute)
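The disk parameters can be checked numerically. This is a small sketch using the 50 Kbyte track and 3600 rpm from the example; the block size, gap size and seek time (B = 512, G = 128, s = 10) and the function name are assumptions added only to make the arithmetic concrete.

```python
def disk_times(p_rpm, track_bytes, B, G, s):
    """Return the disk timing parameters in msec and bytes/msec."""
    rev = 60_000 / p_rpm          # time of one revolution in msec
    rd = rev / 2                  # average rotational delay
    tr = track_bytes / rev        # transfer rate in bytes/msec
    btt = B / tr                  # block transfer time
    btr = (B / (B + G)) * tr      # bulk transfer rate
    return {
        "rd": rd,
        "Trw": 2 * rd,            # rewrite time: one full revolution
        "tr": tr,
        "btt": btt,
        "btr": btr,
        "access": s + rd + btt,   # find and transfer one block
    }


# B, G and s are assumed values, not taken from the notes.
times = disk_times(p_rpm=3600, track_bytes=50 * 1000, B=512, G=128, s=10)
for name, value in times.items():
    print(f"{name}: {value:.3f}")
```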

Query Processing and Optimization
• External sorting: sorting algorithms for large files that do not fit
entirely in main memory
• Sort-Merge strategy: starts by sorting small subfiles
(runs), then merges the sorted runs
Sorting phase: nR = ⌈b/nB⌉ (e.g. with nB = 3: read 3 blocks of the file
→ sort → write a run of 3 blocks)
Merging phase: dM = min(nB − 1, nR); nP =
⌈log_dM(nR)⌉ (Each step: 1. Read 1 block from each of (nB − 1) runs
into the buffer. 2. Merge → temp block. 3. If the temp block is full: write it to the
file. 4. If any input block is empty: read the next block from the corresponding
run.)
nR: number of initial runs; b: number of file blocks; nB:
available buffer space (in blocks); dM: degree of merging; nP: number of
passes
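The run and pass counts above can be computed directly. A minimal sketch follows; the function name and the sample file size (b = 1024 blocks with nB = 5 buffers) are assumptions for illustration.

```python
def sort_merge_plan(b, nB):
    """Return (nR, dM, nP) for a file of b blocks and nB buffer blocks."""
    if nB < 3:
        raise ValueError("need at least 3 buffer blocks to merge two runs")
    nR = -(-b // nB)              # ceil(b / nB) initial runs
    dM = min(nB - 1, nR)          # degree of merging
    nP, runs = 0, nR
    while runs > 1:               # each pass merges dM runs into one
        runs = -(-runs // dM)
        nP += 1
    return nR, dM, nP


print(sort_merge_plan(1024, 5))
```

Counting passes with repeated ceiling division gives the same result as ⌈log_dM(nR)⌉ without relying on floating-point logarithms.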
• Implementing the SELECT Operation

Index
• Primary: data file ordered on a key field. One index entry for each
block. Number of index entries = number of blocks. Nondense

• Clustering: data file ordered on a non-key field. One index entry →
each distinct value of the field (index entry → first data block
that contains records with that field value). Number of index entries
= number of distinct indexing field values. Nondense
A file can have at most one primary index or one clustering index, but not
both
• Secondary: the index is an ordered file with two fields
(indexing field + block pointer or record pointer). A file can have many
secondary indexes

CuuDuongThanCong.com


Using a secondary (B+-tree) index: on an equality
comparison, retrieves a single record if the indexing field has unique values (is a key) or
multiple records if the indexing field is not a key. Can also be used for comparisons involving >, ≥, <, or
≤.
Conjunctive (AND) selection: if an attribute involved
in any single simple condition in the conjunctive condition has an
access path that permits the use of one of the methods S2 to S6,
use that condition to retrieve the records and then check whether
each retrieved record satisfies the remaining simple conditions in
the conjunctive condition.
Conjunctive selection using a composite index: if
two or more attributes are involved in equality conditions in the
conjunctive condition and a composite index (or hash structure)
exists on the combined field, we can use the index directly.
Algorithms for set operations: UNION, INTERSECTION (both inputs must be sorted on the same attributes).

Steps in converting a query tree during heuristic optimization:
1. Initial (canonical) query tree.
2. Move SELECT operations down the query tree.
3. Apply the more restrictive SELECT operations first.
4. Replace Cartesian Product and SELECT with a JOIN operation.
5. Move PROJECT operations down the query tree.
• Cost estimate
Information about the size of a file: number of records
(tuples) (r), record size (R), number of blocks (b), blocking factor
(bfr)
Information about indexes and indexing attributes
of a file: number of levels (x) of each multilevel index; number
of first-level index blocks (bI1); number of distinct values (d) of
an attribute; selectivity (sl) of an attribute; selection cardinality
(s) of an attribute (s = sl ∗ r)
Linear search: CS1a = b. For an equality condition on a key:
if the record is found, CS1b = (b/2) on average; else CS1a = b.
Binary search: CS2 = log2 b + ⌈s/bfr⌉ − 1. For an equality
condition on a unique (key) attribute, CS2 = log2 b
Using a primary index (S3a) or hash key (S3b) to
retrieve a single record: CS3a = x + 1; CS3b = 1 for static or
linear hashing; CS3b = 2 for extendible hashing
Using an ordering index to retrieve multiple
records: for a comparison condition on a key field with an
ordering index, CS4 = x + (b/2)
Using a clustering index to retrieve multiple
records for an equality condition: CS5 = x + ⌈s/bfr⌉
Using a secondary (B+-tree) index: equality comparison, CS6a = x + s (options 1 & 2); CS6a = x + s + 1
(option 3); comparison condition such as >, <, >=, <=: CS6b =
x + (bI1/2) + (r/2)
Conjunctive selection using a composite index:
same as S3a, S5 or S6a, depending on the type of index.
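The selection cost functions translate directly into code. The sketch below follows the formulas as listed; the function names, the choice to round log2 b up to whole block accesses, and the sample statistics (b = 2000, bfr = 5, x = 3, s = 20) are assumptions for illustration.

```python
import math


def ceil_div(a, b):
    return -(-a // b)


def cs1_linear(b, key_found=False):
    """Linear search: b blocks, or b/2 on average when a key is found."""
    return b / 2 if key_found else b


def cs2_binary(b, s, bfr):
    """Binary search on an ordered file; log2 b rounded up (assumption)."""
    return math.ceil(math.log2(b)) + ceil_div(s, bfr) - 1


def cs3a_primary(x):
    return x + 1


def cs5_clustering(x, s, bfr):
    return x + ceil_div(s, bfr)


def cs6a_secondary(x, s, option3=False):
    """Equality on a secondary index; option 3 adds one indirection block."""
    return x + s + (1 if option3 else 0)


# Assumed file statistics, for illustration only.
print(cs1_linear(2000), cs2_binary(2000, 1, 5))
print(cs5_clustering(3, 20, 5), cs6a_secondary(2, 20))
```

An optimizer would evaluate every applicable function for a given condition and pick the access path with the smallest estimate.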
• Cost functions for JOIN: |(R ⋈C S)| = js ∗ |R| ∗ |S| (js: join selectivity)

Nested-loop join: CJ1 = bR + (bR ∗ bS) + ((js ∗ |R| ∗
|S|)/bfrRS) (use R for the outer loop)
Single-loop join (an access structure exists on join attribute B of S):
For a secondary index: CJ2a = bR +
(|R| ∗ (xB + sB)) + ((js ∗ |R| ∗ |S|)/bfrRS);
For a clustering index: CJ2b = bR + (|R| ∗ (xB +
(sB/bfrB))) + ((js ∗ |R| ∗ |S|)/bfrRS);
For a primary index: CJ2c = bR + (|R| ∗ (xB + 1)) + ((js ∗
|R| ∗ |S|)/bfrRS);
If a hash key exists for one of the two join attributes, say B of
S: CJ2d = bR + (|R| ∗ h) + ((js ∗ |R| ∗ |S|)/bfrRS) (h: the average
number of block accesses to retrieve a record, given its hash key
value, h >= 1)
Sort-merge join: CJ3a = CS + bR + bS + ((js ∗ |R| ∗
|S|)/bfrRS) (CS: cost of sorting the files)
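To compare join strategies, the cost functions can be evaluated side by side. This is a hedged sketch: the function names are hypothetical, and the statistics (a 10000-record, 2000-block file joined with a 125-record, 13-block file, js = 1/125, bfrRS = 4, a one-level primary index) are assumed values for illustration.

```python
import math
from fractions import Fraction


def result_blocks(js, r_R, r_S, bfr_RS):
    """Blocks needed to write the join result: (js * |R| * |S|) / bfrRS."""
    return math.ceil(js * r_R * r_S / bfr_RS)


def cj1_nested_loop(b_outer, b_inner, out_blocks):
    return b_outer + b_outer * b_inner + out_blocks


def cj2_single_loop(b_R, r_R, probe_cost, out_blocks):
    """probe_cost is the per-record lookup cost in S, e.g. xB + 1 or h."""
    return b_R + r_R * probe_cost + out_blocks


# Fraction keeps js exact, so the ceiling is not disturbed by rounding.
out = result_blocks(Fraction(1, 125), 10000, 125, 4)

print(cj1_nested_loop(13, 2000, out))       # small file as outer loop
print(cj1_nested_loop(2000, 13, out))       # large file as outer loop
print(cj2_single_loop(2000, 10000, 1 + 1, out))  # primary index, xB = 1
```

The two nested-loop results show why the file with fewer blocks is normally chosen for the outer loop.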

Transaction
A transaction is a logical unit of database processing (an atomic unit of work)
that includes one or more access operations (read: retrieval; write:
insert or update, delete).
• Why concurrency control is needed
The Lost Update Problem: two transactions that access
the same database items have their operations interleaved in a
way that makes the value of some database item incorrect
The Temporary Update (or Dirty Read) Problem:
one transaction updates a database item and then the transaction fails for some reason; the updated item is accessed by another transaction before it is changed back to its original value
The Incorrect Summary Problem: one transaction is
calculating an aggregate summary function on a number of
records while other transactions are updating some of these
records
The Unrepeatable Read Problem: transaction T reads
the same item twice and the item is changed by another transaction T’ between the two reads
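The lost update problem is easy to reproduce with a toy interleaving. In this sketch the schedule format, the item X = 80 and the two updates (T1 subtracts 5, T2 adds 4) are assumptions chosen for illustration.

```python
def run(schedule, x):
    """Execute (txn, op, arg) steps against a single database item X."""
    local = {}                       # each transaction's private copy of X
    for txn, op, arg in schedule:
        if op == "read":
            local[txn] = x
        elif op == "add":
            local[txn] += arg
        elif op == "write":
            x = local[txn]
    return x


serial = [
    ("T1", "read", None), ("T1", "add", -5), ("T1", "write", None),
    ("T2", "read", None), ("T2", "add", 4), ("T2", "write", None),
]
interleaved = [
    ("T1", "read", None), ("T2", "read", None),
    ("T1", "add", -5), ("T2", "add", 4),
    ("T1", "write", None), ("T2", "write", None),
]

print(run(serial, 80))       # both updates applied
print(run(interleaved, 80))  # T1's update is overwritten by T2
```

In the interleaved schedule T2 reads X before T1 writes it, so T2's write silently discards T1's change.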
• Why recovery is needed
A computer failure (system crash): a hardware or
software error occurs in the computer system during transaction
execution
A transaction or system error: some operation in the
transaction may cause it to fail, such as integer overflow or division by zero
Local errors or exception conditions: certain conditions
necessitate cancellation of the transaction; a programmed abort
in the transaction causes it to fail
Concurrency control enforcement: the concurrency
control method may decide to abort the transaction, to be
restarted later, because it violates serializability or because several transactions are in a state of deadlock
Disk failure: some disk blocks may lose their data because
of a read or write malfunction or because of a disk read/write
head crash
Physical problems and catastrophes: an
endless list of problems that includes power or air-conditioning
failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and mounting of a wrong tape by the operator
• Types of log record: [start_transaction,T],
[write_item,T,X,old_value,new_value], [read_item,T,X], [commit,T], [abort,T].
• Commit point of a transaction: all its operations that
access the database have been executed successfully and the
effect of all the transaction operations on the database has been
recorded in the log
• Roll back of transactions: needed for transactions that
have a [start_transaction,T] entry in the log but no commit
entry [commit,T] in the log
• ACID properties
Atomicity: a transaction is either performed in its entirety or not performed at all (recovery)
Consistency preservation: a correct execution of the transaction takes the database from one
consistent state to another (programmer, concurrency control, recovery, integrity constraints)
Isolation: the execution of a transaction should not be interfered with by any other transaction executing concurrently
(concurrency control)
Durability or permanency: changes applied by a committed transaction must never be lost
because of a subsequent failure (recovery)
• Recoverable schedule: no transaction T in S commits
until all transactions T’ that have written an item that T reads
have committed
• Cascadeless schedule: one where every transaction reads
only the items that are written by committed transactions
• Strict schedule: a transaction can neither read nor write
an item X until the last transaction that wrote X has committed (or aborted)
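The three schedule classes can be tested mechanically. The following is a simplified sketch: the schedule encoding (tuples of transaction, operation, item, with "r", "w" and "c") and the function name are assumptions, and schedules containing aborts are deliberately not handled.

```python
def classify(schedule):
    """Return (recoverable, cascadeless, strict) for an abort-free schedule."""
    last_writer = {}       # item -> transaction that wrote it most recently
    committed = set()
    read_from = {}         # txn -> uncommitted writers it has read from
    recoverable = cascadeless = strict = True
    for txn, op, item in schedule:
        if op == "c":
            # Recoverable: every writer this txn read from committed first.
            if any(w not in committed for w in read_from.get(txn, ())):
                recoverable = False
            committed.add(txn)
            continue
        writer = last_writer.get(item)
        dirty = writer is not None and writer != txn and writer not in committed
        if dirty:
            strict = False                 # read or write of uncommitted data
            if op == "r":
                cascadeless = False        # dirty read
                read_from.setdefault(txn, set()).add(writer)
        if op == "w":
            last_writer[item] = txn
    return recoverable, cascadeless, strict


s = [("T1", "w", "X"), ("T2", "r", "X"), ("T1", "c", None), ("T2", "c", None)]
print(classify(s))
```

The example schedule is recoverable because T1 commits before T2, but it is neither cascadeless nor strict because T2 reads X before T1 has committed.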

Under the strict two-phase locking scheme, conflicting transactions may get
deadlocked
Intention-shared (IS): indicates that a shared lock(s)
will be requested on some descendant node(s). Intention-exclusive (IX): indicates that an exclusive lock(s) will be
requested on some descendant node(s). Shared-intention-exclusive (SIX): indicates that the current node is locked in
shared mode but an exclusive lock(s) will be requested on some
descendant node(s).
A node N can be locked by a transaction T in S or IS mode
only if the parent node is already locked by T in either IS or IX
mode. A node N can be locked by T in X, IX, or SIX mode
only if the parent of N is already locked by T in either IX or SIX
mode
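The parent-lock rules can be expressed as a lookup table. This sketch assumes the first rule covers S and IS requests, as in the standard multiple granularity locking protocol; the table and function names are hypothetical, and the root node (which has no parent) is out of scope.

```python
# Lock mode requested on node N -> modes T must already hold on N's parent.
REQUIRED_PARENT_MODES = {
    "S": {"IS", "IX"},
    "IS": {"IS", "IX"},
    "X": {"IX", "SIX"},
    "IX": {"IX", "SIX"},
    "SIX": {"IX", "SIX"},
}


def can_lock(mode, parent_mode):
    """True if T may lock N in `mode` given its lock on N's parent."""
    return parent_mode in REQUIRED_PARENT_MODES[mode]


print(can_lock("S", "IS"))    # reading below an intention-shared parent
print(can_lock("X", "IS"))    # writing needs IX or SIX on the parent
```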

Concurrency Control Techniques
• Two-Phase Locking Techniques
Lock and Unlock are atomic operations.
A transaction is well-formed if it locks a data item
before it reads or writes it, does not lock an already
locked data item, and does not try to unlock a free data item
For a transaction, the two phases must be mutually exclusive, that is, during the locking phase the unlocking phase must not
start, and during the unlocking phase the locking phase must not begin
Conservative: prevents deadlock by locking all desired
data items before the transaction begins execution
Basic: the transaction locks data items incrementally. This
may cause deadlock, which must then be dealt with
Strict: a transaction T does not release any of its exclusive (write) locks until after it commits or aborts.
Rigorous: a transaction T does not release any of its
locks (exclusive or shared) until after it commits or aborts
• Dealing with Deadlock and Starvation
Deadlock prevention: a transaction locks all data items
it refers to before it begins execution
Deadlock detection and resolution: the scheduler
maintains a wait-for graph for detecting cycles
Deadlock avoidance: the Wound-Wait (a younger transaction is allowed to
wait for an older one) and Wait-Die (an older transaction is allowed to wait for a younger one) algorithms use timestamps to avoid deadlocks by rolling back the victim. The Wound-Wait and Wait-Die schemes can avoid starvation
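The two timestamp-based schemes differ only in who is rolled back. A minimal sketch, assuming a smaller timestamp means an older transaction; the function names and return strings are hypothetical.

```python
def wait_die(ts_requester, ts_holder):
    """Older requester waits; younger requester dies (is rolled back)."""
    return "wait" if ts_requester < ts_holder else "abort requester"


def wound_wait(ts_requester, ts_holder):
    """Older requester wounds (aborts) the holder; younger requester waits."""
    return "abort holder" if ts_requester < ts_holder else "wait"


# T1 (timestamp 1) is older than T2 (timestamp 2).
print(wait_die(1, 2), "|", wait_die(2, 1))
print(wound_wait(1, 2), "|", wound_wait(2, 1))
```

In both schemes the rolled-back transaction is always the younger one, and it restarts with its original timestamp, which is why neither scheme starves a transaction.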

• Timestamp Ordering (strict)
write_item(X): if TS(T) > write_TS(X), then delay T
until the transaction T’ that wrote X has terminated (committed
or aborted)
read_item(X): if TS(T) > write_TS(X), then delay T
until the transaction T’ that wrote X has terminated (committed
or aborted)
Ensures that the schedules are both strict and conflict serializable
• Thomas’s Write Rule (for a write_item(X) issued by T): if read_TS(X) > TS(T), then
abort and roll back T and reject the operation. If write_TS(X)
> TS(T), then just ignore the write operation and continue execution, because it is already outdated and obsolete. Otherwise, execute the write and set write_TS(X) to TS(T).
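Thomas's write rule reduces to a three-way decision for each write_item(X). In the sketch below the function name and return strings are hypothetical, and the final branch (perform the write and advance write_TS(X)) is the usual completion of the rule when neither test applies.

```python
def thomas_write(ts_t, read_ts_x, write_ts_x):
    """Decide what to do when T (timestamp ts_t) issues write_item(X)."""
    if read_ts_x > ts_t:
        return "abort"     # a younger transaction already read X
    if write_ts_x > ts_t:
        return "ignore"    # a younger transaction already overwrote X
    return "write"         # perform the write, set write_TS(X) = TS(T)


print(thomas_write(ts_t=5, read_ts_x=7, write_ts_x=3))
print(thomas_write(ts_t=5, read_ts_x=4, write_ts_x=8))
print(thomas_write(ts_t=5, read_ts_x=4, write_ts_x=3))
```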
• Multiversion Concurrency Control Techniques: a
read operation in this mechanism is never rejected. Side effects: more storage (RAM and disk) is required to keep the versions, and garbage collection must be run to remove old versions.
A new version of Xi is created only by a write operation
• Multiversion Two-Phase Locking Using Certify
Locks
Allows a transaction T’ to read a data item X while it is
write locked by a conflicting transaction T
This is accomplished by maintaining two versions of each
data item X: 1. One version must always have been written by
some committed transaction. 2. The second version is created
when a transaction acquires a write lock on the item, so a write operation always creates a new version of X
Read and write operations from conflicting transactions can
be processed concurrently
May delay transaction commit because a transaction must obtain certify locks on all the items it wrote. It avoids cascading aborts but, like the
strict two-phase locking scheme, conflicting transactions may get deadlocked

Database Recovery Techniques
Flushing of cache buffers to disk is controlled by the Modified (dirty) and Pin-Unpin bits
• Data Update
Immediate update: as soon as a data item is modified
in cache, the disk copy is updated
Deferred update: all modified data items in the cache
are written either after a transaction ends its execution or after a
fixed number of transactions have completed their execution
Shadow update: the modified version of a data item
does not overwrite its disk copy but is written at a separate disk
location
In-place update: the disk version of the data item is
overwritten by the cache version
• Steal/No-Steal and Force/No-Force
Steal: a cache page updated by a transaction can be flushed before the transaction commits.
No-Steal: a cache page updated by a transaction cannot be flushed before the transaction commits.
Force: all pages updated by a transaction are immediately flushed (forced) to disk before the transaction commits.
No-Force: otherwise.
Steal/No-Force (Undo/Redo). Steal/Force (Undo/No-redo). No-Steal/No-Force (No-undo/Redo). No-Steal/Force (No-undo/No-redo)
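The four policy combinations follow from two independent facts: steal makes undo necessary, and no-force makes redo necessary. A small sketch, with a hypothetical function name:

```python
def recovery_needs(steal, force):
    """Map a buffer policy to the recovery actions it requires."""
    return {
        "undo": steal,       # uncommitted updates may have reached disk
        "redo": not force,   # committed updates may be missing from disk
    }


for steal in (True, False):
    for force in (True, False):
        name = f"{'Steal' if steal else 'No-Steal'}/{'Force' if force else 'No-Force'}"
        print(name, recovery_needs(steal, force))
```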
• Deferred Update (No-undo/Redo): a set of transactions records their updates in the log. At the commit point, under the
WAL scheme, these updates are saved on the database disk. After
reboot from a failure, the log is used to redo all the transactions
affected by this failure. No undo is required because no AFIM (after image) is
flushed to the disk before a transaction commits.
With concurrent users, this environment requires some concurrency control mechanism to guarantee the isolation property of transactions. In a system recovery, transactions that were recorded in the log after
the last checkpoint are redone. The recovery manager may scan
some of the transactions recorded before the checkpoint to get
the AFIMs.

Active table: all active transactions are entered in this table. Commit table: transactions to be committed are entered in
this table. During recovery, all transactions in the commit table
are redone and all transactions in the active table are ignored, since
none of their AFIMs reached the database
• Immediate Update
Undo/Redo algorithm (single-user environment):
undo a transaction if it is in the active table; redo a transaction if it is in the commit table
Undo/Redo algorithm (concurrent execution): to
minimize the work of the recovery manager, checkpointing is
used.

ARIES Recovery Algorithm: Steal/no-force (UNDO/REDO);
consists of three steps: 1. Analysis: identifies the dirty
(updated) pages in the buffer and the set of transactions active
at the time of the crash. The appropriate point in the log where redo
is to start is also determined. 2. Redo: the necessary redo operations
are applied. 3. Undo: the log is scanned backwards and the operations of transactions active at the time of the crash are undone in
reverse order. A log record is written for: data update,
transaction commit, transaction abort, undo (in the case of undo,
a compensating log record is written)
• The following steps are performed for recovery
Analysis phase: start at the begin_checkpoint record
and proceed to the end_checkpoint record. Access the transaction table and dirty page table that were appended to the log at the checkpoint, then continue to the end of the log, modifying the transaction table and dirty
page table: an end log record is encountered for T → delete
entry T from the transaction table. Some other type of log record is
encountered for T’ → insert an entry T’ into the transaction table
if not already present, and modify its last LSN. The log record corresponds to a change for page
P → insert an entry P (if not present) with the associated LSN
into the dirty page table.
Redo phase: starts redoing at a point in the log where it knows that
previous changes to dirty pages have already been applied to
disk, found by taking the smallest LSN, M, of all the dirty pages in the
Dirty Page Table. A change recorded in the log pertains to a
page P that is not in the Dirty Page Table → no redo. A change
recorded in the log (LSN = N) pertains to page P and the Dirty
Page Table contains an entry for P with LSN > N → no redo.
Page P is read from disk and the LSN stored on that page > N
→ no redo
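The redo-phase tests can be sketched as a single predicate applied to each logged change. The data structures here (dictionaries mapping page ids to LSNs) and the function names are assumptions for illustration; the three conditions follow the text as written.

```python
def redo_start_lsn(dirty_page_table):
    """Redo begins at M, the smallest LSN in the Dirty Page Table."""
    return min(dirty_page_table.values())


def needs_redo(lsn, page, dirty_page_table, page_lsn_on_disk):
    """Decide whether the logged change with LSN = lsn must be reapplied."""
    if page not in dirty_page_table:
        return False                    # page was not dirty at the crash
    if dirty_page_table[page] > lsn:
        return False                    # change precedes the page's entry
    if page_lsn_on_disk[page] > lsn:
        return False                    # disk copy already holds the change
    return True


# Hypothetical tables: page id -> LSN.
dpt = {"P1": 10, "P2": 30}
on_disk = {"P1": 15, "P2": 5}
print(redo_start_lsn(dpt))
print(needs_redo(20, "P1", dpt, on_disk))
print(needs_redo(20, "P2", dpt, on_disk))
```

The first two tests use only in-memory tables, so the page is read from disk only when neither of them rules the redo out.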
