Relational Database Design

LESSON 2B: NORMALIZING AND DENORMALIZING DATA

Objectives
In this section, you will learn to:
Describe the Top-down and Bottom-up approach
Describe data redundancy
Describe the first, second, and third normal forms
Describe the Boyce-Codd Normal Form (BCNF)
Appreciate the need for denormalization
Normalizing and Denormalizing Data Lesson 2B / Slide 1 of 18©NIIT
INSTRUCTOR NOTES
Lesson Overview
The lesson introduces the top-down and bottom-up approaches of logical database
design. This lesson also explains normalization as a technique to avoid data
redundancy and covers the various normal forms. In addition, this lesson explains
denormalization as a technique for improving query performance.
Pre-assessment Questions
1. In a scenario where a student can do only one project and no other student can do
the same project, the relationship between student and project is a ______
relationship.
a. One-to-One
b. One-to-Many
c. Many-to-One
d. Many-to-Many
2. Which of the following options is true?
a. The primary key of the supertype is the primary key of the subtype.
b. The foreign key of the supertype is the primary key of the subtype.
c. The primary key of the supertype is the foreign key of the subtype.
d. The foreign key of the supertype is the foreign key of the subtype.
Pre-assessment Questions (Contd.)
3. A candidate key that does not become a primary key is called a(n) ______ key.
a. Candidate key
b. Foreign key
c. Alternate key
d. Composite key
4. Which of the following problems arises when a primary key is allowed to contain NULL values?
a. It becomes difficult to identify the rows uniquely.
b. It becomes difficult to identify the columns uniquely.
c. It becomes difficult to join tables.
d. It becomes difficult to identify the foreign key.
5. In ______, every higher-level entity must also be a lower-level entity.
a. Generalization
b. E/R diagram
c. Specialization
d. Many-to-Many relationship
Solutions
Ans1. One-to-One
Ans2. The primary key of the supertype is the foreign key of the subtype.
Ans3. Alternate key
Ans4. It becomes difficult to identify the rows uniquely.
Ans5. Generalization
NORMALIZATION
Top-Down and Bottom-Up Approach

There are two approaches to logical database design:

The top-down approach

The bottom-up approach

The E/R modeling technique is the top-down approach. It involves
identifying entities, relationships and attributes, drawing the E/R
diagram, and mapping the diagram to tables.

Normalization is the bottom-up approach. It is a step-by-step
decomposition of complex records into simple records.

Normalization reduces redundancy using the principle of non-loss
decomposition.


Non-loss decomposition is the reduction of a table to smaller tables
without any loss of information.

The bottom-up approach is best for validation of existing designs.
In the previous sessions, we described logical database design using the entity-
relationship diagramming technique. There are two approaches to logical database
design:
The top-down approach
The bottom-up approach
The E/R modeling technique is the top-down approach. It involves identifying entities,
relationships and attributes, drawing the E/R diagram, and mapping the diagram to
tables.
In this session, we will explain Normalization, which is the bottom-up approach.
Normalization is a step-by-step decomposition of complex records into simple records.
Normalization reduces redundancy using the principle of non-loss decomposition. Non-
loss decomposition is the reduction of a table to smaller tables without any loss of
information.
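To make the idea of non-loss decomposition concrete, here is a minimal Python sketch. It splits a small table into two smaller tables and joins them back. The sample rows and the helper names `project` and `natural_join` are assumptions made for illustration; they are not part of the lesson material. The second half shows a badly chosen split whose join produces extra rows.

```python
# Rows of a hypothetical table: (StudentId, StudentName, Semester, Score)
rows = {
    ("001", "Mary", "SEM-1", 40),
    ("001", "Mary", "SEM-2", 56),
    ("002", "Jake", "SEM-1", 93),
}

def project(table, indexes):
    """Keep only the given column positions; duplicate rows collapse in the set."""
    return {tuple(row[i] for i in indexes) for row in table}

def natural_join(left, right):
    """Join two tables on their first column, which both tables share."""
    return {l + r[1:] for l in left for r in right if l[0] == r[0]}

# Split on StudentId, which determines StudentName.
students = project(rows, (0, 1))      # (StudentId, StudentName)
scores = project(rows, (0, 2, 3))     # (StudentId, Semester, Score)
is_non_loss = natural_join(students, scores) == rows

# A poor split on Semester, which determines nothing else in the table.
bad_left = project(rows, (2, 0, 1))   # (Semester, StudentId, StudentName)
bad_right = project(rows, (2, 3))     # (Semester, Score)
lossy = natural_join(bad_left, bad_right)
original_reordered = project(rows, (2, 0, 1, 3))
has_spurious_rows = lossy > original_reordered   # proper superset of the original
```

The first split rejoins to exactly the original rows. The second produces additional rows that were never in the original table, so the original information can no longer be recovered from it.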
Very often, the process of normalization follows the process of drawing E/R diagrams.
However, depending on how detailed and precise the E/R diagram is, the process of
normalization may not be necessary at all. The tables derived from the E/R diagram
may already be normalized. In fact, they will always be at least in the first normal
form.
Designers who strictly follow the bottom-up approach do not go through the E/R modeling
process at all. After the collection of data to be stored in the database is complete, the
data is normalized. The bottom-up approach is best for validation of existing designs.

Data Redundancy

Redundancy means repetition of data.

Redundancy increases the time involved in updating, adding, and
deleting data.

Redundancy also increases the utilization of disk space, and hence,
disk I/O increases.

Redundancy can lead to:

Update anomalies—Inserting, modifying, and deleting data
may cause inconsistencies.

Inconsistencies—Errors are more likely to occur when facts
are repeated.

Unnecessary utilization of extra disk space.
Redundancy means repetition of data. Redundancy increases the time involved in
updating, adding, and deleting data. It also increases the utilization of disk space and
hence, disk I/O increases.
For example, consider the structure of the Student table:
Student

StudentId
StudentName
StudentBirthdate
StudentAddress
StudentCity
StudentZip
StudentClass
StudentSemester
StudentTest1
StudentTest2
The sample data for the Student table would be:
StudentId  StudentName  ……  StudentSemester  StudentTest1  StudentTest2
001        Mary         ……  SEM-1            40            65
001        Mary         ……  SEM-2            56            48
002        Jake         ……  SEM-1            93            84
002        Jake         ……  SEM-2            85            90
The details of the students along with the marks are present in one table called
Student. The details of the students like StudentId, StudentName, and
StudentAddress are repeated while recording marks of different semesters. The
repeated data is redundant. In addition, if you need to modify the address of a
student, it has to be modified in multiple rows for that student. If not done, it could
lead to data inconsistency across rows.
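The inconsistency described above can be reproduced with a short Python sketch that uses the standard `sqlite3` module. The table holds only a subset of the Student columns, and the address values are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (StudentId TEXT, StudentName TEXT, StudentAddress TEXT, "
    "StudentSemester TEXT, StudentTest1 INTEGER, StudentTest2 INTEGER)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("001", "Mary", "12 Hill Road", "SEM-1", 40, 65),  # address is hypothetical
        ("001", "Mary", "12 Hill Road", "SEM-2", 56, 48),
    ],
)

# A partial update: only the SEM-1 row receives the new address.
conn.execute(
    "UPDATE Student SET StudentAddress = ? "
    "WHERE StudentId = ? AND StudentSemester = ?",
    ("7 Lake View", "001", "SEM-1"),
)

# The same student now has two different addresses on record.
addresses = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT StudentAddress FROM Student "
        "WHERE StudentId = ? ORDER BY StudentAddress",
        ("001",),
    )
]
inconsistent = len(addresses) > 1
```

Because the address is stored once per semester row, nothing in the table structure prevents the two copies from drifting apart.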

If there are one thousand students and the details for each student occupy two
hundred bytes, then every additional semester recorded repeats two hundred thousand
bytes of student details. Hence, a lot of disk space is used up unnecessarily.
Redundancy can, therefore, lead to:
Update anomalies—Inserting, modifying, and deleting data may cause
inconsistencies.
Inconsistencies—Errors are more likely to occur when facts are repeated.

Unnecessary utilization of extra disk space.
You can use your experience and common sense to design a database. However, you
can also use systematic approaches like normalization to reduce redundancy and duplication.
Need for Normalization

Normalization is a scientific method of breaking down complex
table structures into simple table structures by using certain rules.

Using normalization, you can reduce redundancy in a table and
eliminate the problems of inconsistency and disk space usage.

You can also ensure that there is no loss of information.

Normalization has several benefits as follows:


It enables faster sorting and index creation.

It helps to create more clustered indexes.

It requires few indexes per table.

It reduces the number of NULL values in a table.

It makes the database compact.
Need for Normalization (Contd.)

The performance of an application is directly linked to the database
design.

Some rules that should be followed to achieve a good database
design are:

Each table should have an identifier.

Each table should store data for a single type of entity.

Columns that accept NULLs should be avoided.

The repetition of values or columns should be avoided.
Normalization is a scientific method of breaking down complex table structures into
simple table structures by using certain rules. Using this method, you can reduce
redundancy in a table and eliminate the problems of inconsistency and disk space
usage. You can also ensure that there is no loss of information.
Normalization has several benefits. It enables faster sorting and index creation, helps
to create more clustered indexes, requires fewer indexes per table, reduces the number
of NULLs, and makes the database compact. Normalization helps to simplify the
structure of tables. The performance of an application is directly linked to the database
design. A poor design hinders the performance of the system. The logical design of the
database lays the foundation for an optimal database.
Some rules that should be followed to achieve a good database design are:
Each table should have an identifier.
Each table should store data for a single type of entity.
Columns that accept NULLs should be avoided.
The repetition of values or columns should be avoided.
INSTRUCTOR NOTES
Normalization
First, ask the students the following question to elicit their understanding of the
approaches to logical database design:
What are the various database design approaches that you can think of?
You can give the following additional information about non-loss decomposition:
Non-loss decomposition: Non-loss decomposition ensures that a join produces an
exact copy of the original table. A decomposition is not non-loss if a join produces a
superset of the original table, in which the original rows can no longer be uniquely identified.
You can give the following additional information about data redundancy:
Data redundancy results in wastage of space and loss of data integrity in the
database. Data redundancy leads to three types of anomalies. These are:
1. Update anomaly: This is data inconsistency resulting from data redundancy
and partial update.
2. Deletion anomaly: This is the unintended loss of data due to deletion of other
data.
3. Insertion anomaly: This is the inability to add data to the database due to
absence of other data.
DIFFERENT NORMAL FORMS AND DENORMALIZATION
Normal Forms


Normalization results in the formation of tables that satisfy certain
specified rules and represent certain normal forms.

The normal forms are used to ensure that various types of
anomalies and inconsistencies are not introduced in the database.

A table structure is always in a certain normal form.

The most important and widely used normal forms are:

First Normal Form (1NF)

Second Normal Form (2 NF)

Third Normal Form (3 NF)

Boyce-Codd Normal Form (BCNF)
Normalization results in the formation of tables that satisfy certain specified rules and
represent certain normal forms. The normal forms are used to ensure that various
types of anomalies and inconsistencies are not introduced in the database. A table
structure is always in a certain normal form. Several normal forms have been
identified. The most important and widely used normal forms are:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
[Figure: levels of normalization, in order: unnormalized relation, 1NF, 2NF, 3NF, BCNF]
The first, second, and third normal forms were originally defined by Dr. E. F. Codd.
Later, Boyce and Codd introduced another normal form called the Boyce-Codd Normal
Form. As displayed in the above figure, a relation that is in first normal form may also
be in second normal form or third normal form.
Functional Dependency

The normalization theory is based on the fundamental notion of
functional dependency.

In a relation R, attribute A is functionally dependent on attribute B
if each value of B in R is associated with precisely one value of A.

Attribute B is called the determinant.


All attributes of a table must be functionally dependent on the key.
However, functional dependency does not require an attribute to
be the key in order to functionally determine other attributes.

Functional dependency can also be defined as follows:
Given a relation R, attribute A is functionally dependent on B if and
only if, whenever two tuples of R agree on their B value, they also
agree on their A value.

Functional dependencies represent many-to-one relationships.
The normalization theory is based on the fundamental notion of functional
dependency. First, let’s examine the concept of functional dependency.
Given a relation (you may recall that a table is also called a relation) R, attribute A is
functionally dependent on attribute B if each value of B in R is associated with
precisely one value of A.
In other words, attribute A is functionally dependent on B if and only if, for each value
of B, there is exactly one value of A.
Attribute B is called the determinant.
Consider the following table Employee:
Employee
Code  Name    City
E1    Mac     New York
E2    Sandra  CA
E3    Henry   Paris
Given a particular value of Code, there is precisely one corresponding value for
Name. For example, for Code E1 there is exactly one value of Name, Mac. Hence,
Name is functionally dependent on Code. Similarly, there is exactly one value of City
for each value of Code. Hence, the attribute City is functionally dependent on the
attribute Code. The attribute Code is the determinant. You can also say that Code
determines City and Name.
Now that we know something about functional dependencies, let us redefine the
concept of keys in terms of functional dependencies. In the above example of the
Employee table, the attribute Code is unique in every tuple. Hence, it is a
candidate key. All attributes must be functionally dependent on the key.
However, functional dependency does not require an attribute to be the key in order
to functionally determine other attributes. The following example explains this.
Suppose you need to store information about scores of students for distance education
programs. The attributes that you need to store are:
ID : The identity codes of the students to whom the scorecard is sent
CITY : The city to which the scorecard is sent
C_CODE : The course code for which the scorecard is sent
SCORE : The total score of the student
Note that the city to which the scorecard is sent is also the city where the student is
located. Therefore, ID functionally determines CITY. But ID is not a candidate key. The
candidate key, in this case, is the combination of ID and C_CODE. Attributes ID and
C_CODE are the foreign keys that reference the tables that store the student and
course information respectively. Therefore, even though ID is not a candidate key, it
still functionally determines another attribute (CITY).
Functional dependency can also be defined as follows:
Given a relation R, attribute A is functionally dependent on B if and only if, whenever
two tuples of R agree on their B value, they also agree on their A value.
Assume that the information about scores was stored in a table named SCORES_INFO
as shown in the following figure. Notice that for a particular value of ID, the value of
CITY is the same in every tuple.
SCORES_INFO
ID      CITY      C_CODE  SCORE
AD0036  London    C1      90
AD0078  New York  C1      88
CC0075  New York  C2      93
CC0097  Florida   C1      75
AD0036  London    C2      87
CC0075  New York  C1      66
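The tuple-based definition can be tried out on the SCORES_INFO rows with a small Python sketch. The function name `determines` is an assumption made for this illustration. Note that sample data can only show that a dependency is violated; whether a dependency really holds is a statement about the real world, not about six rows.

```python
def determines(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True

columns = ("ID", "CITY", "C_CODE", "SCORE")
scores_info = [
    dict(zip(columns, values))
    for values in [
        ("AD0036", "London", "C1", 90),
        ("AD0078", "New York", "C1", 88),
        ("CC0075", "New York", "C2", 93),
        ("CC0097", "Florida", "C1", 75),
        ("AD0036", "London", "C2", 87),
        ("CC0075", "New York", "C1", 66),
    ]
]

id_determines_city = determines(scores_info, ["ID"], ["CITY"])
id_determines_score = determines(scores_info, ["ID"], ["SCORE"])
city_determines_id = determines(scores_info, ["CITY"], ["ID"])
key_determines_score = determines(scores_info, ["ID", "C_CODE"], ["SCORE"])
```

In this data, ID determines CITY, but ID alone does not determine SCORE; only the combination of ID and C_CODE does. CITY does not determine ID, because New York appears with two different students.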
Recognizing functional dependencies is a part of the process of understanding what
the data means. For instance, in the relation SCORES_INFO, the fact that ID
functionally determines CITY means that each student is located precisely in one city.
This means that:
There is a constraint in the real world that the database models: each student is
located in only one city.
This constraint must be observed in the database.
Also, note that functional dependencies represent many-to-one relationships. This
functional dependency (ID determines CITY) also means that there are many students
located in a city, but one student is located in only one city.
We will soon see that the concepts of normalization lead to very simple means of
declaring such functional dependencies.
First Normal Form (1NF)

A table is said to be in the 1 NF when each cell of the table
contains precisely one value.
Consider the following table Project.
Project
Ecode  Dept     DeptHead  ProjCode  Hours
E101   Systems  E901      P27       90
                          P51       101
                          P20       60
E305   Sales    E906      P27       109
                          P22       98
E508   Admin    E908      P51       NULL
                          P27       72
The data in the table is not normalized because the cells in ProjCode and Hours have
more than one value.
By applying the 1NF definition to the Project table, you arrive at the following table:
Project
Ecode  Dept     DeptHead  ProjCode  Hours
E101   Systems  E901      P27       90
E101   Systems  E901      P51       101
E101   Systems  E901      P20       60
E305   Sales    E906      P27       109
E305   Sales    E906      P22       98
E508   Admin    E908      P51       NULL
E508   Admin    E908      P27       72
The relational model does not permit tables that are unnormalized.
Therefore, the tables obtained from the E/R diagram should at least be
in 1 NF.
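The conversion of the Project table to 1NF can be sketched in Python as follows. The sketch assumes that the values in ProjCode and Hours are paired by position (the first project code goes with the first hours value, and so on) and represents NULL as `None`. The function name `to_1nf` is hypothetical.

```python
# Each record holds lists in ProjCode and Hours, so the cells are multi-valued.
unnormalized = [
    {"Ecode": "E101", "Dept": "Systems", "DeptHead": "E901",
     "ProjCode": ["P27", "P51", "P20"], "Hours": [90, 101, 60]},
    {"Ecode": "E305", "Dept": "Sales", "DeptHead": "E906",
     "ProjCode": ["P27", "P22"], "Hours": [109, 98]},
    {"Ecode": "E508", "Dept": "Admin", "DeptHead": "E908",
     "ProjCode": ["P51", "P27"], "Hours": [None, 72]},
]

def to_1nf(records):
    """Emit one row per (employee, project) pair so every cell holds one value."""
    flat = []
    for rec in records:
        for proj_code, hours in zip(rec["ProjCode"], rec["Hours"]):
            flat.append((rec["Ecode"], rec["Dept"], rec["DeptHead"], proj_code, hours))
    return flat

project_1nf = to_1nf(unnormalized)
```

Each of the three original records expands into as many rows as it has project codes, and the employee and department values are repeated in every one of those rows. That repetition is what the later normal forms address.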
Second Normal Form (2NF)

A table is said to be in 2 NF when it is in 1 NF and every attribute
in the row is functionally dependent upon the whole key, and not
just part of the key.

Guidelines for converting a table to 2 NF:

Find and remove attributes that are functionally dependent on
only a part of the key and not on the whole key. Place them in
a different table.

Group the remaining attributes.
A table is said to be in 2 NF when it is in 1 NF and every attribute in the row is
functionally dependent upon the whole key, and not just part of the key.
Consider the Project table:
Project
ECode
ProjCode
Dept
DeptHead
Hours
The table has the following rows:
Ecode  ProjCode  Dept     DeptHead  Hours
E101   P27       Systems  E901      90
E305   P27       Finance  E909      10
E508   P51       Admin    E908      NULL
E101   P51       Systems  E901      101
E101   P20       Systems  E901      60
E508   P27       Admin    E908      72
This situation could lead to the following problems:
Insertion
The department of a particular employee cannot be recorded until the employee
is assigned a project.
Update
For a given employee, the employee code, department name, and department
head are repeated several times. Hence, if an employee is transferred to
another department, this change will have to be recorded in every row of the
Project table pertaining to that employee. Any omission will lead to
inconsistencies.
Deletion
When an employee completes work on a project, the employee’s record is
deleted. The information regarding the department to which the employee
belongs will also be lost.
The primary key here is composite (ECode + ProjCode).
The table satisfies the definition of 1NF. You need to now check if it satisfies 2NF.
In the table, for each value of ECode, there is more than one value of Hours. For
example, for ECode, E101, there are three values of Hours - 90, 101 and 60. Hence,
Hours is not functionally dependent on ECode. Similarly, for each value of ProjCode,
there is more than one value of Hours. For example, for ProjCode P27, there are
three values of Hours - 90, 10 and 72. However, for a combination of the ECode and
ProjCode values, there is exactly one value of Hours. Hence, Hours is functionally
dependent on the whole key, ECode + ProjCode.
Now, you must check if Dept is functionally dependent on the whole key,
ECode+ProjCode. For each value of ECode, there is exactly one value of Dept. For
example, for ECode E101, there is exactly one value, the Systems department. Hence,
Dept is functionally dependent on ECode. However, for a value of ProjCode,
there can be more than one value of Dept. For example, ProjCode P27 is associated with
three values of Dept: Systems, Finance, and Admin. Hence, Dept is not functionally
dependent on ProjCode. Dept is, therefore, functionally dependent on part of the key
(which is ECode) and not functionally dependent on the whole key (ECode+ProjCode).
A similar dependency holds for the DeptHead attribute. Therefore, the table Project is
not in 2NF. For the table to be in 2NF, the non-key attributes must be fully functionally
dependent on the whole key and not part of the key.
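The check performed above by inspection can be sketched in Python. The helper names `determines` and `partial_dependencies` are assumptions made for this illustration, and the result reflects only the six sample rows of the Project table.

```python
from itertools import combinations

def determines(rows, determinant, dependent):
    """Return True if rows that agree on `determinant` also agree on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        if seen.setdefault(key, row[dependent]) != row[dependent]:
            return False
    return True

def partial_dependencies(rows, key, non_key):
    """Find non-key attributes that depend on a proper subset of the composite key."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in non_key:
                if determines(rows, subset, attr):
                    found.append((subset, attr))
    return found

columns = ("ECode", "ProjCode", "Dept", "DeptHead", "Hours")
project = [
    dict(zip(columns, values))
    for values in [
        ("E101", "P27", "Systems", "E901", 90),
        ("E305", "P27", "Finance", "E909", 10),
        ("E508", "P51", "Admin", "E908", None),
        ("E101", "P51", "Systems", "E901", 101),
        ("E101", "P20", "Systems", "E901", 60),
        ("E508", "P27", "Admin", "E908", 72),
    ]
]

violations = partial_dependencies(
    project, key=("ECode", "ProjCode"), non_key=("Dept", "DeptHead", "Hours")
)
```

For this data, the sketch reports that Dept and DeptHead depend on ECode alone, while Hours depends on no single part of the key. Those two attributes are the ones that must move to a separate table.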
Guidelines for Converting a Table to 2NF

Find and remove attributes that are functionally dependent on only a part of the
key and not on the whole key. Place them in a different table.
Group the remaining attributes.
To convert the table Project into 2NF, you must remove the attributes that are not
fully functionally dependent on the whole key and place them in a different table along
with the attribute that they are functionally dependent on. In the above example, since
Dept is not fully functionally dependent on the whole key ECode+ProjCode, you
place Dept along with ECode in a separate table called EmployeeDept. You also
move DeptHead to the EmployeeDept table.
Now, the table Project will contain ECode, ProjCode, and Hours.
EmployeeDept
ECode  Dept     DeptHead
E101   Systems  E901
E305   Finance  E909
E508   Admin    E908

Project
ECode  ProjCode  Hours
E101   P27       90
E101   P51       101
E101   P20       60
E305   P27       10
E508   P51       NULL
E508   P27       72

Third Normal Form (3NF)

A table is said to be in the 3 NF when it is in 2 NF and every non-
key attribute is functionally dependent only on the primary key.

Guidelines for converting a table to 3 NF:

Find and remove non-key attributes that are functionally
dependent on attributes that are not the primary key. Place
them in a different table.

Group the remaining attributes.
A relation is said to be in 3 NF when it is in 2 NF and every non-key attribute is
functionally dependent only on the primary key.
Consider the table Employee.
Employee
ECode  Dept     DeptHead
E101   Systems  E901
E305   Finance  E909
E402   Sales    E906
E508   Admin    E908
E607   Finance  E909
E608   Finance  E909
The problems with dependencies of this kind are:
Insertion
The department head of a new department that does not have any employees at
present cannot be entered in the DeptHead column. This is because the
primary key, ECode, would be unknown.
Update
For a given department, the code for a particular department head (DeptHead)
is repeated several times. Hence, if a department head moves to another
department, the change will have to be made consistently across the table.
Deletion
If the record of an employee is deleted, the information regarding the head of
the department will also be deleted. Hence, there will be a loss of information.
You must check if the table is in 3NF. Since each cell in the table has a single value,
the table is in 1NF.
The primary key in the Employee table is ECode. For each value of ECode, there is
exactly one value of Dept. Hence, the attribute Dept is functionally dependent on the
primary key, ECode. Similarly, for each value of ECode, there is exactly one value of
DeptHead. Therefore, DeptHead is functionally dependent on the primary key
ECode. Hence, all the attributes are functionally dependent on the whole key, ECode.
Hence the table is in 2NF.
However, the attribute DeptHead is dependent on the attribute Dept also. As per
3NF, all non-key attributes have to be functionally dependent only on the primary key.
This table is not in 3NF since DeptHead is functionally dependent on Dept, which is
not a primary key.
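The dependency of DeptHead on Dept, and the repetition it causes, can be made visible with a short Python sketch over the Employee rows. The variable names are chosen for this illustration only.

```python
from collections import defaultdict

# (ECode, Dept, DeptHead) rows of the Employee table
employee = [
    ("E101", "Systems", "E901"),
    ("E305", "Finance", "E909"),
    ("E402", "Sales", "E906"),
    ("E508", "Admin", "E908"),
    ("E607", "Finance", "E909"),
    ("E608", "Finance", "E909"),
]

heads_by_dept = defaultdict(set)   # every DeptHead value seen for each Dept
rows_by_dept = defaultdict(int)    # how many rows store each Dept

for ecode, dept, head in employee:
    heads_by_dept[dept].add(head)
    rows_by_dept[dept] += 1

# Dept determines DeptHead if every department maps to exactly one head.
dept_determines_head = all(len(heads) == 1 for heads in heads_by_dept.values())

# Departments whose head is stored in more than one row.
repeated_facts = {dept: n for dept, n in rows_by_dept.items() if n > 1}
```

Every department maps to exactly one head, so DeptHead depends on the non-key attribute Dept. The fact that E909 heads Finance is stored three times, once for each Finance employee.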
Guidelines for Converting a Table to 3NF


Find and remove non-key attributes that are functionally dependent on
attributes that are not the primary key. Place them in a different table.
Group the remaining attributes.
To convert the table Employee into 3NF, you must remove the column DeptHead,
since it is not functionally dependent on only the primary key ECode, and place it in
another table called Department along with the attribute Dept that it is functionally
dependent on.
Employee
ECode  Dept
E101   Systems
E305   Finance
E402   Sales
E508   Admin
E607   Finance
E608   Finance

Department
Dept     DeptHead
Systems  E901
Sales    E906
Admin    E908
Finance  E909
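The following sketch builds the two 3NF tables with Python's standard `sqlite3` module and revisits the three problems listed earlier. The column types, the key and foreign key declarations, and the values "Research", "E950", and "E777" are assumptions added for the illustration; the lesson itself does not prescribe them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Department (Dept TEXT PRIMARY KEY, DeptHead TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE Employee (ECode TEXT PRIMARY KEY, "
    "Dept TEXT NOT NULL REFERENCES Department(Dept))"
)
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [("Systems", "E901"), ("Sales", "E906"), ("Admin", "E908"), ("Finance", "E909")],
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("E101", "Systems"), ("E305", "Finance"), ("E402", "Sales"),
     ("E508", "Admin"), ("E607", "Finance"), ("E608", "Finance")],
)

# Insertion: a department without employees can now be recorded (hypothetical values).
conn.execute("INSERT INTO Department VALUES (?, ?)", ("Research", "E950"))

# Update: a change of department head touches exactly one row.
cursor = conn.execute(
    "UPDATE Department SET DeptHead = ? WHERE Dept = ?", ("E777", "Finance")
)
rows_changed = cursor.rowcount

# Deletion: removing the only Sales employee keeps the Sales head on record.
conn.execute("DELETE FROM Employee WHERE ECode = ?", ("E402",))
sales_head = conn.execute(
    "SELECT DeptHead FROM Department WHERE Dept = ?", ("Sales",)
).fetchone()[0]

# A join brings the department head back alongside each employee.
joined = conn.execute(
    "SELECT e.ECode, e.Dept, d.DeptHead FROM Employee e "
    "JOIN Department d ON d.Dept = e.Dept ORDER BY e.ECode"
).fetchall()
```

The join at the end shows the cost of the decomposition: the department head is no longer available from the Employee table alone and has to be fetched from Department.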
