
Microsoft SQL Server 2008 Study Guide, part 11 (PDF)


Nielsen c03.tex V4 - 07/21/2009 12:07pm Page 62
Part I Laying the Foundation
To use the standard organization chart as an example, each tuple in the employee entity represents
one employee. Each employee reports to a supervisor who is also listed in the employee entity. The
ReportsToID foreign key points to the supervisor’s primary key.
Because EmployeeID is a primary key and ReportsToID is a foreign key, the relationship cardinality
is one-to-many, as shown in Figure 3-12. One manager may have several direct reports, but each
employee may have only one manager.
FIGURE 3-12
The reflexive, or recursive, relationship is a one-to-many relationship between two tuples of the same
entity. This shows the organization chart for members of the Adventure Works IT department.
Contact (Primary Key: ContactID)   ReportsToID (Foreign Key)
Ken Sánchez                        <NULL>
Jean Trenary                       Ken Sánchez
Stephanie Conroy                   Jean Trenary
François Ajenstat                  Jean Trenary
Dan Wilson                         Jean Trenary
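The reflexive relationship can be sketched as a table whose foreign key points back at its own primary key. The following is a minimal illustration using Python's built-in sqlite3 module rather than SQL Server; the integer ContactID values and the direct_reports helper are assumptions made for the example, while the names come from Figure 3-12.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("""
    CREATE TABLE Contact (
        ContactID   INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,
        ReportsToID INTEGER REFERENCES Contact (ContactID)  -- NULL at the top of the chart
    )
""")
# ContactID values are invented for this sketch; the names are from Figure 3-12.
conn.executemany(
    "INSERT INTO Contact (ContactID, Name, ReportsToID) VALUES (?, ?, ?)",
    [
        (1, "Ken Sánchez", None),
        (2, "Jean Trenary", 1),
        (3, "Stephanie Conroy", 2),
        (4, "François Ajenstat", 2),
        (5, "Dan Wilson", 2),
    ],
)


def direct_reports(manager_name):
    """Return the names of the contacts who report directly to manager_name."""
    rows = conn.execute(
        """
        SELECT e.Name
        FROM Contact AS e
        JOIN Contact AS m ON e.ReportsToID = m.ContactID  -- self-join on the same entity
        WHERE m.Name = ?
        ORDER BY e.ContactID
        """,
        (manager_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(direct_reports("Jean Trenary"))
```

Because ReportsToID holds a single value per tuple, each contact can have only one manager, while any number of tuples may point at the same manager.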
A bill of materials is a more complex form of the recursive pattern because a part may be built from sev-
eral source parts, and the part may be used to build several parts in the next step of the manufacturing
process, as illustrated in Figure 3-13.
FIGURE 3-13
The conceptual diagram of a many-to-many recursive relationship shows multiple cardinality at each
end of the relationship.
Part
An associative entity is required to resolve the many-to-many relationship between the component parts
being used and the part being assembled. In the MaterialSpecification sample database, the BoM
(bill of materials) associative entity has two foreign keys that both point to the Part entity, as shown
in Figure 3-14. The first foreign key points to the part being built. The second foreign key points to the
source parts.
FIGURE 3-14
The physical implementation of the many-to-many reflexive relationship must include an associative
entity to resolve the many-to-many relationship, just like the many-to-many two-entity relationship.
Part (Primary Key: ContactID): Widget, Super Widget, Part A, Part B, Part C, Thing1, Bolt
BoM sample rows:
Foreign Key: AssemblyID   Foreign Key: ComponentID
Widget                    Part A
Super Widget              Part A
Part A                    Thing1
Part A                    Bolt

In the sample data, Part A is constructed from two parts (a Thing1 and a bolt) and is used in the assem-
bly of two parts (Widget and SuperWidget).
The first foreign key points to the material being built. The second foreign key points to the source
material.
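As a rough sketch of the bill-of-materials pattern, the associative entity can be modeled with two foreign keys that both reference the Part entity. This example uses Python's sqlite3 module, not the MaterialSpecification database itself; the PartID surrogate key and the helper functions are assumptions, and only the rows described in the sample-data paragraph above are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Part (
        PartID INTEGER PRIMARY KEY,      -- surrogate key assumed for this sketch
        Name   TEXT NOT NULL UNIQUE
    );
    CREATE TABLE BoM (
        AssemblyID  INTEGER NOT NULL REFERENCES Part (PartID),  -- the part being built
        ComponentID INTEGER NOT NULL REFERENCES Part (PartID),  -- a source part
        PRIMARY KEY (AssemblyID, ComponentID)
    );
""")
conn.executemany(
    "INSERT INTO Part (Name) VALUES (?)",
    [(name,) for name in ["Widget", "SuperWidget", "Part A", "Thing1", "Bolt"]],
)


def add_component(assembly, component):
    """Record that `component` is used to build `assembly` (both looked up by name)."""
    conn.execute(
        """
        INSERT INTO BoM (AssemblyID, ComponentID)
        SELECT a.PartID, c.PartID
        FROM Part AS a, Part AS c
        WHERE a.Name = ? AND c.Name = ?
        """,
        (assembly, component),
    )


for assembly, component in [
    ("Part A", "Thing1"),
    ("Part A", "Bolt"),
    ("Widget", "Part A"),
    ("SuperWidget", "Part A"),
]:
    add_component(assembly, component)


def components_of(name):
    """Source parts that go into the named part."""
    rows = conn.execute(
        """
        SELECT c.Name
        FROM BoM
        JOIN Part AS a ON a.PartID = BoM.AssemblyID
        JOIN Part AS c ON c.PartID = BoM.ComponentID
        WHERE a.Name = ?
        ORDER BY c.Name
        """,
        (name,),
    ).fetchall()
    return [n for (n,) in rows]


def used_in(name):
    """Assemblies that consume the named part."""
    rows = conn.execute(
        """
        SELECT a.Name
        FROM BoM
        JOIN Part AS a ON a.PartID = BoM.AssemblyID
        JOIN Part AS c ON c.PartID = BoM.ComponentID
        WHERE c.Name = ?
        ORDER BY a.Name
        """,
        (name,),
    ).fetchall()
    return [n for (n,) in rows]


print(components_of("Part A"), used_in("Part A"))
```

The same Part tuple appears on both sides of the relationship: Part A is an assembly in two BoM rows and a component in two others.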
Entity-Value Pairs Pattern
Every couple of months, I hear about data modelers working with the entity-value pairs pattern, also known
as the entity-attribute-value (EAV) pattern, sometimes called the generic pattern or property bag/property
table pattern, illustrated in Figure 3-15. In the SQL Server 2000 Bible, I called it the “dynamic/relational
pattern.”
FIGURE 3-15
The entity-value pairs pattern is a simple design with only four tables: class/type, attribute/column,
object/item, and value. The value table stores every value for every attribute for every item in one long list.
Diagram entities: Class (Category), Object (Item), Attribute (Property), Value
This design can be popular when applications require dynamic attributes. Sometimes it’s used as an OO
DBMS physical design within an RDBMS product. It’s also gaining popularity with cloud databases.
At first blush, the entity-value pairs pattern is attractive, novel, and appealing. It offers unlimited logical
design alterations without any physical schema changes — the ultimate flexible extensible design.
But there are problems. Many problems . . .
■ The entity-value pairs pattern lacks data integrity — specifically, data typing. The data
type is the most basic data constraint. The basic entity-value pairs pattern stores every
value in a single nvarchar or sql_variant column and ignores data typing. One
option that I wouldn’t recommend is to create a value table for each data type. While
this adds data typing, it certainly complicates the code.
■ It’s difficult to query the entity-value pairs pattern. I’ve seen two solutions. The most
common method is hard-coding .NET code to extract and normalize the data. Another
option is to code-gen a table-valued UDF or crosstab view for each class/type to
extract the data and return a normalized data set. This has the advantage of being
usable in normal SQL queries, but performance and inserts/updates remain difficult.
Either solution defeats the dynamic goal of the pattern.
■ Perhaps the greatest complaint against the entity-value pairs pattern is that it’s nearly
impossible to enforce referential integrity.
Can the entity-value pairs pattern be an efficient, practical solution? I doubt it. I continue to hear of projects using
this pattern that initially look promising and then fail under the weight of querying once it’s fully populated.
Nonetheless, someday I’d like to build out a complete EAV code-gen tool and test it under a heavy load, just
for the fun of it.
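To make the typing and querying problems concrete, here is a small sketch of the four-table design using Python's sqlite3 module. All table names, column names, and sample products are invented for illustration; the crosstab query stands in for the per-class view the sidebar describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Class     (ClassID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Attribute (AttributeID INTEGER PRIMARY KEY,
                            ClassID INTEGER NOT NULL REFERENCES Class (ClassID),
                            Name TEXT NOT NULL);
    CREATE TABLE Item      (ItemID INTEGER PRIMARY KEY,
                            ClassID INTEGER NOT NULL REFERENCES Class (ClassID));
    CREATE TABLE ItemValue (ItemID INTEGER NOT NULL REFERENCES Item (ItemID),
                            AttributeID INTEGER NOT NULL REFERENCES Attribute (AttributeID),
                            ValueText TEXT,   -- every value is stored as text, whatever its real type
                            PRIMARY KEY (ItemID, AttributeID));

    INSERT INTO Class VALUES (1, 'Product');
    INSERT INTO Attribute VALUES (1, 1, 'Name'), (2, 1, 'Price');
    INSERT INTO Item VALUES (10, 1), (11, 1);
    -- 'cheap' is not a price, but nothing in the schema can reject it
    INSERT INTO ItemValue VALUES (10, 1, 'Widget'), (10, 2, '9.99'),
                                 (11, 1, 'Bolt'),   (11, 2, 'cheap');
""")


def product_rows():
    """Crosstab the value list back into one row per item for the Product class."""
    return conn.execute("""
        SELECT v.ItemID,
               MAX(CASE WHEN a.Name = 'Name'  THEN v.ValueText END) AS Name,
               MAX(CASE WHEN a.Name = 'Price' THEN v.ValueText END) AS Price
        FROM ItemValue AS v
        JOIN Attribute AS a ON a.AttributeID = v.AttributeID
        GROUP BY v.ItemID
        ORDER BY v.ItemID
    """).fetchall()


for row in product_rows():
    print(row)
```

The crosstab query has to name each attribute explicitly, so adding an attribute still means regenerating the query, which is the sense in which this workaround gives up the pattern's dynamic goal.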

Database design layers
I’ve observed that every database can be visualized as three layers: domain integrity (lookup) layer, busi-
ness visible layer, and supporting layer, as drawn in Figure 3-16.
FIGURE 3-16
Visualizing the database as three layers can be useful when designing the conceptual diagram and
coding the SQL DDL implementation.
• Domain Integrity: lookup tables
• Business Entities (Visible): objects the user can describe
• Supporting Entities: associative tables
While you are designing the conceptual diagram, visualizing the database as three layers can help orga-
nize the entities and clarify the design. When the database design moves into the SQL DDL implementa-
tion phase, the database design layers become critical in optimizing the primary keys for performance.
The center layer contains those entities that the client or subject-matter expert would readily recognize
and understand. These are the main work tables that contain working data such as transaction, account,
or contact information. When a user enters data on a daily basis, these are the tables hit by the insert
and update. I refer to this layer as the visible layer or the business entity layer.
Above the business entity layer is the domain integrity layer. This top layer has the entities used for val-
idating foreign key values. These tables may or may not be recognizable by the subject-matter expert or
a typical end-user. The key point is that they are used only to maintain the list of what’s legal for a for-
eign key, and they are rarely updated once initially populated.
Below the visible layer live the tables that are a mystery to the end-user — associative tables used to
materialize a many-to-many logical relationship are a perfect example of a supporting table. Like the vis-
ible layer, these tables are often heavily updated.

Normal Forms
Taking a detailed look at the normal forms moves this chapter into a more formal study of relational
database design.
Contrary to popular opinion, the forms are not a progressive methodology, but they do represent a pro-
gressive level of compliance. Technically, you can’t be in 2NF until 1NF has been met. Don’t plan on
designing an entity and moving it through first normal form to second normal form, and so on. Each
normal form is simply a different type of data integrity fault to be avoided.
First normal form (1NF)
The first normalized form means the data is in an entity format, such that the following three conditions
are met:
■ Every unit of data is represented within scalar attributes. A scalar value is a value “capable of
being represented by a point on a scale,” according to Merriam-Webster.
Every attribute must contain one unit of data, and each unit of data must fill one attribute.
Designs that embed multiple pieces of information within an attribute violate the first normal
form. Likewise, if multiple attributes must be combined in some way to determine a single
unit of data, then the attribute design is incomplete.
■ All data must be represented in unique attributes. Each attribute must have a unique name and a
unique purpose. An entity should have no repeating attributes. If the attributes repeat, or the
entity is very wide, then the object is too broadly designed.
A design that repeats attributes, such as an order entity that includes item1, item2, and
item3 attributes to hold multiple line items, violates the first normal form.
■ All data must be represented within unique tuples. If the entity design requires or permits
duplicate tuples, that design violates the first normal form.
If the design requires multiple tuples to represent a single item, or multiple items are repre-
sented by a single tuple, then the table violates first normal form.

For an example of the first normal form in action, consider the listing of base camps and tours from the
Cape Hatteras Adventures database. Table 3-3 shows base camp data in a model that violates the
first normal form. The repeating tour attribute is not unique.
TABLE 3-3
Violating the First Normal Form
BaseCamp        Tour1                    Tour2                    Tour3
Ashville        Appalachian Trail        Blue Ridge Parkway Hike
Cape Hatteras   Outer Banks Lighthouses
Freeport        Bahamas Dive
Ft. Lauderdale  Amazon Trek
West Virginia   Gauley River Rafting
To redesign the data model so that it complies with the first normal form, resolve the repeating group
of tour attributes into a single unique attribute, as shown in Table 3-4, and then move any multiple
values to a unique tuple. The BaseCamp entity contains a unique tuple for each base camp, and the Tour
entity’s BaseCampID refers to the primary key in the BaseCamp entity.
TABLE 3-4
Conforming to the First Normal Form
Tour Entity
BaseCampID (FK)  Tour
1                Appalachian Trail
1                Blue Ridge Parkway Hike
2                Outer Banks Lighthouses
3                Bahamas Dive
4                Amazon Trek
5                Gauley River Rafting

BaseCamp Entity
BaseCampID (PK)  Name
1                Ashville
2                Cape Hatteras
3                Freeport
4                Ft. Lauderdale
5                West Virginia
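The move from Table 3-3 to Table 3-4 can be sketched in a few lines. This example uses Python's sqlite3 module as a stand-in for SQL Server; the wide_rows list simply transcribes Table 3-3, and the loop that assigns BaseCampID values in listing order is an assumption of the sketch.

```python
import sqlite3

# Table 3-3 as printed: one row per base camp with repeating Tour1..Tour3 attributes.
wide_rows = [
    ("Ashville", "Appalachian Trail", "Blue Ridge Parkway Hike", None),
    ("Cape Hatteras", "Outer Banks Lighthouses", None, None),
    ("Freeport", "Bahamas Dive", None, None),
    ("Ft. Lauderdale", "Amazon Trek", None, None),
    ("West Virginia", "Gauley River Rafting", None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE BaseCamp (
        BaseCampID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Tour (
        BaseCampID INTEGER NOT NULL REFERENCES BaseCamp (BaseCampID),
        Tour       TEXT NOT NULL,
        PRIMARY KEY (BaseCampID, Tour)
    );
""")

for base_camp_id, (name, *tours) in enumerate(wide_rows, start=1):
    conn.execute("INSERT INTO BaseCamp VALUES (?, ?)", (base_camp_id, name))
    for tour in tours:
        if tour is not None:  # each filled repeating column becomes its own tuple
            conn.execute("INSERT INTO Tour VALUES (?, ?)", (base_camp_id, tour))

tour_rows = conn.execute(
    "SELECT BaseCampID, Tour FROM Tour ORDER BY BaseCampID, Tour"
).fetchall()
for row in tour_rows:
    print(row)
```

A base camp can now offer any number of tours without a schema change, and no tuple carries empty Tour2 or Tour3 placeholders.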

Another example of a data structure that desperately needs to adhere to the first normal form is a cor-
porate product code that embeds the department, model, color, size, and so forth within the code. I’ve
even seen product codes that were so complex they included digits to signify the syntax for the follow-
ing digits.
In a theoretical sense, this type of design is wrong because the attribute isn’t a scalar value. In practical
terms, it has the following problems:
■ Using a digit or two for each data element means that the database will soon run out of
possible data values.
■ Databases don’t index based on the internal values of a string, so searches require scanning the
entire table and parsing each value.
■ Business rules are difficult to code and enforce.
Entities with non-scalar attributes need to be completely redesigned so that each individual data element
has its own attribute. Smart keys may be useful for humans, but they are best generated by combining
data from the tables.
Second normal form (2NF)
The second normal form ensures that each attribute does in fact describe the entity. It’s a dependency
issue. Does the attribute depend on, or describe, the item identified by the primary key?
If the entity’s primary key is a single value, this isn’t too difficult. Composite primary keys can some-
times get into trouble with the second normal form if the attributes aren’t dependent on every attribute
in the primary key. If an attribute depends on one of the primary key attributes but not the other, that
is a partial dependency, which violates the second normal form.
An example of a data model that violates the second normal form is one in which the base camp phone
number is added to the BaseCampTour entity, as shown in Table 3-5. Assume that the primary key
(PK) is a composite of both the BaseCamp and the Tour, and that the phone number is a permanent
phone number for the base camp, not a phone number assigned for each tour.

TABLE 3-5
Violating the Second Normal Form
PK-BaseCamp     PK-Tour                  Base Camp PhoneNumber
Ashville        Appalachian Trail        828-555-1212
Ashville        Blue Ridge Parkway Hike  828-555-1212
Cape Hatteras   Outer Banks Lighthouses  828-555-1213
Freeport        Bahamas Dive             828-555-1214
Ft. Lauderdale  Amazon Trek              828-555-1215
West Virginia   Gauley River Rafting     828-555-1216
The problem with this design is that the phone number is an attribute of the base camp but not the
tour, so the PhoneNumber attribute is only partially dependent on the entity’s primary key.
An obvious practical problem with this design is that updating the phone number requires either
updating multiple tuples or risking having two phone numbers for the same base camp.
The solution is to remove the partially dependent attribute from the entity with the composite keys, and
create an entity with a unique primary key for the base camp, as shown in Table 3-6. This new entity is
then an appropriate location for the dependent attribute.
TABLE 3-6
Conforming to the Second Normal Form
Tour Entity
PK-Base Camp    PK-Tour
Ashville        Appalachian Trail
Ashville        Blue Ridge Parkway Hike
Cape Hatteras   Outer Banks Lighthouses
Freeport        Bahamas Dive
Ft. Lauderdale  Amazon Trek
West Virginia   Gauley River Rafting

Base Camp Entity
PK-Base Camp    PhoneNumber
Ashville        828-555-1212
Cape Hatteras   828-555-1213
Freeport        828-555-1214
Ft. Lauderdale  828-555-1215
West Virginia   828-555-1216
The PhoneNumber attribute is now fully dependent on the entity’s primary key. Each phone number is
stored in only one location, and no partial dependencies exist.
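The update problem and its fix can be demonstrated with a short sketch. It uses Python's sqlite3 module rather than SQL Server, loads the rows from Table 3-5, and derives the two entities of Table 3-6 from them; the replacement phone number 828-555-9999 is made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table 3-5: PhoneNumber lives in an entity keyed by (BaseCamp, Tour).
    CREATE TABLE BaseCampTour (
        BaseCamp    TEXT NOT NULL,
        Tour        TEXT NOT NULL,
        PhoneNumber TEXT NOT NULL,
        PRIMARY KEY (BaseCamp, Tour)
    );
    INSERT INTO BaseCampTour VALUES
        ('Ashville',       'Appalachian Trail',       '828-555-1212'),
        ('Ashville',       'Blue Ridge Parkway Hike', '828-555-1212'),
        ('Cape Hatteras',  'Outer Banks Lighthouses', '828-555-1213'),
        ('Freeport',       'Bahamas Dive',            '828-555-1214'),
        ('Ft. Lauderdale', 'Amazon Trek',             '828-555-1215'),
        ('West Virginia',  'Gauley River Rafting',    '828-555-1216');

    -- Table 3-6: the phone number moves to an entity keyed by the base camp alone.
    CREATE TABLE BaseCamp (
        BaseCamp    TEXT PRIMARY KEY,
        PhoneNumber TEXT NOT NULL
    );
    CREATE TABLE Tour (
        BaseCamp TEXT NOT NULL REFERENCES BaseCamp (BaseCamp),
        Tour     TEXT NOT NULL,
        PRIMARY KEY (BaseCamp, Tour)
    );
    INSERT INTO BaseCamp SELECT DISTINCT BaseCamp, PhoneNumber FROM BaseCampTour;
    INSERT INTO Tour SELECT BaseCamp, Tour FROM BaseCampTour;
""")

# In the 2NF-violating design, an update that touches only one tuple
# leaves Ashville with two different phone numbers.
conn.execute(
    "UPDATE BaseCampTour SET PhoneNumber = '828-555-9999' WHERE Tour = 'Appalachian Trail'"
)
phones_in_old_design = conn.execute(
    "SELECT COUNT(DISTINCT PhoneNumber) FROM BaseCampTour WHERE BaseCamp = 'Ashville'"
).fetchone()[0]

# In the corrected design there is exactly one place to change.
conn.execute("UPDATE BaseCamp SET PhoneNumber = '828-555-9999' WHERE BaseCamp = 'Ashville'")
phones_in_new_design = conn.execute(
    "SELECT COUNT(*) FROM BaseCamp WHERE BaseCamp = 'Ashville'"
).fetchone()[0]

print(phones_in_old_design, phones_in_new_design)
```

The primary key on BaseCamp guarantees a single phone number per base camp, so the inconsistency in the first design cannot be expressed in the second.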
Third normal form (3NF)
The third normal form checks for transitive dependencies. A transitive dependency is similar to a partial
dependency in that they both refer to attributes that are not fully dependent on a primary key. A
dependency is transitive when attribute1 is dependent on attribute2, which is dependent on the
primary key.
The second normal form is violated when an attribute depends on part of the key. The third normal
form is violated when the attribute does depend on the key but also depends on another non-key
attribute.
The key phrase when describing third normal form is that every attribute “must provide a fact about the
key, the whole key, and nothing but the key.”
Just as with the second normal form, the third normal form is resolved by moving the non-dependent
attribute to a new entity.
Continuing with the Cape Hatteras Adventures example, a guide is assigned as the lead guide responsible
for each base camp. The BaseCampGuide attribute belongs in the BaseCamp entity; but it is a
violation of the third normal form if other information describing the guide is stored in the base camp,
as shown in Table 3-7.
TABLE 3-7
Violating the Third Normal Form
Base Camp Entity
BaseCampPK      BaseCampPhoneNumber  LeadGuide     DateofHire
Ashville        1-828-555-1212       Jeff Davis    5/1/99
Cape Hatteras   1-828-555-1213       Ken Frank     4/15/97
Freeport        1-828-555-1214       Dab Smith     7/7/2001
Ft. Lauderdale  1-828-555-1215       Sam Wilson    1/1/2002
West Virginia   1-828-555-1216       Lauren Jones  6/1/2000

The DateofHire describes the guide, not the base, so the hire-date attribute is not directly dependent
on the BaseCamp entity’s primary key. The DateOfHire’s dependency is transitive (it describes the
key and a non-key attribute) in that it goes through the LeadGuide attribute.
Creating a Guide entity and moving its attributes to the new entity resolves the violation of the third
normal form and cleans up the logical design, as demonstrated in Table 3-8.
TABLE 3-8
Conforming to the Third Normal Form
Base Camp Entity
BaseCampPK      LeadGuide
Ashville, NC    Jeff Davis
Cape Hatteras   Ken Frank
Freeport        Dab Smith
Ft. Lauderdale  Sam Wilson
West Virginia   Lauren Jones

LeadGuide Entity
LeadGuidePK     DateofHire
Jeff Davis      5/1/99
Ken Frank       4/15/97
Dab Smith       7/7/2001
Sam Wilson      1/1/2002
Lauren Jones    6/1/2000
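A minimal sketch of the Table 3-8 design, again using Python's sqlite3 module: the hire date is stored once with the guide and reached from the base camp through the LeadGuide foreign key. The column names, the two sample rows, and the helper function are choices made for the sketch, and the dates are kept as the printed strings rather than a date type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Guide (
        GuideName  TEXT PRIMARY KEY,   -- the printed tables key the guide by name
        DateOfHire TEXT NOT NULL       -- kept as the printed string for simplicity
    );
    CREATE TABLE BaseCamp (
        BaseCamp  TEXT PRIMARY KEY,
        LeadGuide TEXT NOT NULL REFERENCES Guide (GuideName)
    );
    INSERT INTO Guide VALUES ('Jeff Davis', '5/1/99'), ('Ken Frank', '4/15/97');
    INSERT INTO BaseCamp VALUES ('Ashville', 'Jeff Davis'), ('Cape Hatteras', 'Ken Frank');
""")


def hire_date_of_lead_guide(base_camp):
    """Follow BaseCamp -> LeadGuide -> DateOfHire; return None for an unknown base camp."""
    row = conn.execute(
        """
        SELECT g.DateOfHire
        FROM BaseCamp AS b
        JOIN Guide AS g ON g.GuideName = b.LeadGuide
        WHERE b.BaseCamp = ?
        """,
        (base_camp,),
    ).fetchone()
    return row[0] if row else None


before = hire_date_of_lead_guide("Ashville")

# Reassigning the lead guide changes one attribute; the hire date follows the guide.
conn.execute("UPDATE BaseCamp SET LeadGuide = 'Ken Frank' WHERE BaseCamp = 'Ashville'")
after = hire_date_of_lead_guide("Ashville")

print(before, after)
```

In the Table 3-7 layout the same reassignment would have required updating DateofHire alongside LeadGuide, with nothing to stop the two from drifting apart.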
Best Practice
If the entity has a good primary key and every attribute is scalar and fully dependent on the primary key,
then the logical design is in the third normal form. Most database designs stop at the third normal form.
The additional forms prevent problems with more complex logical designs. If you tend to work with
mind-bending modeling problems and develop creative solutions, then understanding the advanced forms
will prove useful.
The Boyce-Codd normal form (BCNF)
The Boyce-Codd normal form occurs between the third and fourth normal forms, and it handles a
problem with an entity that has multiple candidate keys. One of the candidate keys is chosen as the primary
key and the others become alternate keys. For example, a person might be uniquely identified by his or
her social security number (ssn), employee number, and driver’s license number. If the ssn is the pri-
mary key, then the employee number and driver’s license number are the alternate keys.
The Boyce-Codd normal form simply stipulates that in such a case every attribute must describe every
candidate key. If an attribute describes one of the candidate keys but not another candidate key, then
the entity violates BCNF.
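BCNF is a property of the design rather than something a query can check, but the candidate keys it reasons about can at least be declared. The sketch below, using Python's sqlite3 module, marks the ssn as the primary key and the other two identifiers as alternate keys with UNIQUE constraints; the column names, sample values, and try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Person (
        SSN            TEXT PRIMARY KEY,       -- candidate key chosen as the primary key
        EmployeeNumber TEXT NOT NULL UNIQUE,   -- alternate key
        DriversLicense TEXT NOT NULL UNIQUE,   -- alternate key
        Name           TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO Person VALUES ('000-00-0001', 'E100', 'D-555', 'Pat Example')")


def try_insert(row):
    """Attempt an insert; return False if any candidate key would be duplicated."""
    try:
        conn.execute("INSERT INTO Person VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Same employee number as the existing row: rejected by the alternate key.
duplicate_employee = try_insert(("000-00-0002", "E100", "D-556", "Sam Example"))
# All three identifiers are new: accepted.
distinct_person = try_insert(("000-00-0002", "E101", "D-556", "Sam Example"))

print(duplicate_employee, distinct_person)
```

With every candidate key enforced, each one identifies exactly one person, which is the precondition for asking whether an attribute describes all of them.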
Fourth normal form (4NF)
The fourth normal form deals with problems created by complex composite primary keys. If two inde-
pendent attributes are brought together to form a primary key along with a third attribute but the two
attributes don’t really uniquely identify the entity without the third attribute, then the design violates the
fourth normal form.
For example, assume the following conditions:
1. The BaseCamp and the base camp’s LeadGuide were used as a composite primary key.
2. An Event and the Guide were brought together as a primary key.
3. Because both used a guide, all three were combined into a single entity.
The preceding example violates the fourth normal form.
The fourth normal form is used to help identify entities that should be split into separate entities. Usu-
ally this is only an issue if large composite primary keys have brought too many disparate objects into a
single entity.
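One way to see why the combined entity is a problem, assuming the base-camp assignments and the event assignments are independent facts about a guide, is to keep them in two entities and look at what a single combined entity would have to store. The sketch uses Python's sqlite3 module; the event names and the idea of one guide covering two base camps are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two independent facts about a guide, each in its own entity.
    CREATE TABLE BaseCampGuide (
        BaseCamp TEXT NOT NULL,
        Guide    TEXT NOT NULL,
        PRIMARY KEY (BaseCamp, Guide)
    );
    CREATE TABLE EventGuide (
        Event TEXT NOT NULL,
        Guide TEXT NOT NULL,
        PRIMARY KEY (Event, Guide)
    );
    -- Hypothetical sample data: one guide, two base camps, three events.
    INSERT INTO BaseCampGuide VALUES ('Ashville', 'Jeff Davis'), ('Cape Hatteras', 'Jeff Davis');
    INSERT INTO EventGuide VALUES ('Spring Hike', 'Jeff Davis'),
                                  ('Fall Hike',   'Jeff Davis'),
                                  ('Winter Hike', 'Jeff Davis');
""")

# Rows needed when the two facts are stored separately.
separate_row_count = conn.execute(
    "SELECT (SELECT COUNT(*) FROM BaseCampGuide) + (SELECT COUNT(*) FROM EventGuide)"
).fetchone()[0]

# Rows a single (BaseCamp, Event, Guide) entity would need to say the same thing:
# every base camp paired with every event for the guide.
combined = conn.execute("""
    SELECT b.BaseCamp, e.Event, b.Guide
    FROM BaseCampGuide AS b
    JOIN EventGuide AS e ON e.Guide = b.Guide
    ORDER BY b.BaseCamp, e.Event
""").fetchall()

print(separate_row_count, len(combined))
```

The combined entity grows multiplicatively, and adding one event for the guide means inserting a row for every base camp, which is the kind of redundancy that splitting the entity removes.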
Fifth normal form (5NF)
The fifth normal form provides the method for designing complex relationships that involve multiple
(three or more) entities. A three-way or ternary relationship, if properly designed, is in the fifth normal
form. The cardinality of any of the relationships could be one or many. What makes it a ternary
relationship is the number of related entities.
As an example of a ternary relationship, consider a manufacturing process that involves an operator, a
machine, and a bill of materials. From one point of view, this could be an operation entity with three
foreign keys. Alternately, it could be thought of as a ternary relationship with additional attributes.
Just like a two-entity many-to-many relationship, a ternary relationship requires a resolution entity in
the physical schema design to resolve the many-to-many relationship into multiple artificial one-to-many
relationships; but in this case the resolution entity has three or more foreign keys.
In such a complex relationship, the fifth normal form requires that each entity, if separated from the
ternary relationship, remains a proper entity without any loss of data.
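The manufacturing example can be sketched as a resolution entity with three foreign keys. This is an illustration in Python's sqlite3 module, not the book's schema: the table names, surrogate keys, sample rows, and try_operation helper are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Operator (OperatorID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Machine  (MachineID  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE BoM      (BoMID      INTEGER PRIMARY KEY, Name TEXT NOT NULL);

    -- Resolution entity: one row per (operator, machine, bill of materials) combination.
    CREATE TABLE Operation (
        OperatorID INTEGER NOT NULL REFERENCES Operator (OperatorID),
        MachineID  INTEGER NOT NULL REFERENCES Machine (MachineID),
        BoMID      INTEGER NOT NULL REFERENCES BoM (BoMID),
        PRIMARY KEY (OperatorID, MachineID, BoMID)
    );

    -- Hypothetical sample rows.
    INSERT INTO Operator VALUES (1, 'Operator One'), (2, 'Operator Two');
    INSERT INTO Machine  VALUES (1, 'Press');
    INSERT INTO BoM      VALUES (1, 'Widget BoM');
""")


def try_operation(operator_id, machine_id, bom_id):
    """Insert a ternary relationship row; return False if any foreign key is invalid."""
    try:
        conn.execute(
            "INSERT INTO Operation VALUES (?, ?, ?)", (operator_id, machine_id, bom_id)
        )
        return True
    except sqlite3.IntegrityError:
        return False


valid = try_operation(1, 1, 1)        # all three entities exist
invalid = try_operation(1, 99, 1)     # no such machine

# Removing the relationship rows leaves each participating entity intact.
conn.execute("DELETE FROM Operation")
operator_count = conn.execute("SELECT COUNT(*) FROM Operator").fetchone()[0]

print(valid, invalid, operator_count)
```

Each of the three entities stands on its own, and the Operation rows carry nothing but the relationship, which is the separation the fifth normal form asks for.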
It’s commonly stated that third normal form is enough. Boyce-Codd, fourth, and fifth normal forms may
be complex, but violating them can cause severe problems. It’s not a matter of more entities vs. fewer
entities; it’s a matter of properly aligned attributes and keys.
As I mentioned earlier in this chapter, Louis Davidson (aka Dr. SQL) and I co-present a
session at conferences on database design. I recommend his book Pro SQL Server 2008
Relational Database Design and Implementation (Apress, 2008).
Summary
Relational database design, covered in Chapter 2, showed why the database physical schema is critical
to the database’s performance. This chapter looked at the theory behind the logical correctness of the
database design and the many patterns used to assemble a database schema.
■ There are three phases in database design: the conceptual (diagramming) phase, the SQL
DDL (create table) phase, and the physical layer (partition and file location) phase. Databases
designed with only the conceptual phase perform poorly.