Relational Database Design
IN THIS CHAPTER
■ Introducing entities, tuples, and attributes
■ Conceptual diagramming vs. SQL DDL
■ Avoiding normalization over-complexity
■ Choosing the right database design pattern
■ Ensuring data integrity
■ Exploring alternative patterns
■ Normal forms
I play jazz guitar — well, I used to play before life became so busy.
(You can listen to some of my MP3s on my ‘‘about’’ page on
www.sqlserverbible.com.) There are some musicians who can
hear a song and then play it; I’m not one of those. I can feel the rhythm, but
I have to work through the chords and figure them out almost mathematically
before I can play anything but a simple piece. To me, building chords and chord
progressions is like drawing geometric patterns on the guitar neck using the frets
and strings.
Music theory encompasses the scales, chords, and progressions used to make
music. Every melody, harmony, rhythm, and song draws from music theory. For
some musicians, there's just a feeling that the song sounds right. Those who
make music their profession understand the theory behind why a song feels
right. Great musicians have both the feel and the theory in their music.


Designing databases is similar to playing music. Databases are designed by
combining the right patterns to correctly model a specific solution to a problem.
Normalization is the theory that shapes the design. There's both the mathematical
theory of relational algebra and the intuitive feel of an elegant database.
Designing databases is both science and art.
Database Basics
The purpose of a database is to store the information required by an organization.
Any means of collecting and organizing data is a database. Prior to the Informa-
tion Age, information was primarily stored on cards, in file folders, or in ledger
books. Before the adding machine, offices employed dozens of workers who spent
all day adding columns of numbers and double-checking the math of others. The
job title of those who had that exciting career was computer.
Author’s Note
Welcome to the second of five chapters that deal with database design. Although they're spread out in
the table of contents, they weave a consistent theme that good design yields great performance:
■ Chapter 2, ‘‘Data Architecture,’’ provides an overview of data architecture.
■ This chapter details relational database theory.
■ Chapter 20, ‘‘Creating the Physical Database Schema,’’ discusses the DDL layer of
database design and development.
■ Partitioning the physical layer is covered in Chapter 68, ‘‘Partitioning.’’
■ Designing data warehouses for business intelligence is covered in Chapter 70,
‘‘BI Design.’’
There’s more to this chapter than the standard ‘‘Intro to Normalization.’’ This chapter draws on the lessons
I’ve learned over the years and has a few original ideas.
This chapter covers a book's worth of material (which is why I rewrote it three times), but I tried to concisely
summarize the main ideas. The chapter opens with an introduction to database design terms and concepts.
Then I present the same concept from three perspectives: first with the common patterns, then with my
custom Layered Design concept, and lastly with the normal forms. I’ve tried to make the chapter flow, but
each of these ideas is easier to comprehend after you understand the other two, so if you have the time, read
the chapter twice to get the most out of it.
As the number crunching began to be handled by digital machines, human labor, rather than being
eliminated, shifted to other tasks. Analysts, programmers, managers, and IT staff have replaced the
human ‘‘computers’’ of days gone by.
Speaking of old computers, I collect abacuses, and I know how to use them too — it
keeps me in touch with the roots of computing. On my office wall is a very cool
nineteenth-century Russian abacus.
Benefits of a digital database
The Information Age and the relational database brought several measurable benefits to organizations:
■ Increased data consistency and better enforcement of business rules
■ Improved sharing of data, especially across distances
■ Improved ability to search for and retrieve information
■ Improved generation of comprehensive reports
■ Improved ability to analyze data trends
The general theme is that a computer database originally didn’t save time in the entry of data, but rather
in the retrieval of data and in the quality of the data retrieved. However, with automated data collection
in manufacturing, bar codes in retailing, databases sharing more data, and consumers placing their own
orders on the Internet, the effort required to enter the data has also decreased.
The previous chapter’s sidebar titled ‘‘Planning Data Stores’’ discusses different types or
styles of databases. This chapter presents the relational database design principles and pat-
terns used to develop operational, or OLTP (online transaction processing), databases.
Some of the relational principles and patterns may apply to other types of databases, but databases that
are not used for first-generation data (such as most BI, reporting databases, data warehouses, or
reference data stores) do not necessarily benefit from normalization.
In this chapter, when I use the term ‘‘database,’’ I’m referring exclusively to a relational, OLTP-style
database.
Tables, rows, columns
A relational database collects related, or common, data in a single list. For example, all the product
information may be listed in one table and all the customers in another table.
A table appears similar to a spreadsheet and is constructed of columns and rows. The appeal (and the
curse) of the spreadsheet is its informal development style, which makes it easy to modify and add to
as the design matures. In fact, managers tend to store critical information in spreadsheets, and many
databases started as informal spreadsheets.
In both a spreadsheet and a database table, each row is an item in the list and each column is a specific
piece of data concerning that item, so each cell should contain a single piece of data about a single item.
Whereas a spreadsheet tends to be free-flowing and loose in its design, database tables should be very
consistent in terms of the meaning of the data in a column. Because row and column consistency is so
important to a database table, the design of the table is critical.
Over the years, different development styles have referred to these concepts with various terms,
listed in Table 3-1.
TABLE 3-1

Comparing Database Terms

Development Style                    The List of Common Items             An Item in the List           A Piece of Information in the List
Legacy software                      File                                 Record                        Field
Spreadsheet                          Spreadsheet/worksheet/named range    Row                           Column/cell
Relational algebra/logical design    Entity, or relation                  Tuple (rhymes with couple)    Attribute
SQL DDL design                       Table                                Row                           Column
Object-oriented design               Class                                Object instance               Property
SQL Server developers generally refer to database elements as tables, rows, and columns when discussing
the SQL DDL layer or physical schema, and sometimes use the terms entity, tuple, and attribute when
discussing the logical design. The rest of this book uses the SQL terms (table, row, column), but this
chapter is devoted to the theory behind the design, so I also use the relational algebra terms (entity,
tuple, and attribute).
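To make the terms concrete, here is a minimal T-SQL sketch; the Product table and its columns are hypothetical examples, not a schema from this book. The table holds the list, each row is one item in it, and each column is one fact about that item.

-- A table (entity): one group of similar things
CREATE TABLE Product (
    ProductID INT NOT NULL PRIMARY KEY,  -- uniquely identifies each row (tuple)
    ProductName NVARCHAR(50) NOT NULL,   -- an attribute: one fact about the product
    ListPrice MONEY NOT NULL             -- another single fact, one per row
);

-- Each row (tuple) is one item in the list
INSERT INTO Product (ProductID, ProductName, ListPrice)
VALUES (1, N'Widget', 9.99);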
Database design phases
Traditionally, data modeling has been split into two phases, the logical design and the physical design;
but Louis Davidson and I have been co-presenting at conferences on the topic of database design and
I’ve become convinced that Louis is right when he defines three phases to database design. To avoid
confusion with the traditional terms, I’m defining them as follows:
■ Conceptual Model: The first phase digests the organizational requirements and identifies the
entities, their attributes, and their relationships.
The conceptual diagram model is great for understanding, communicating, and verifying the
organization’s requirements. The diagramming method should be easily understood by all the
stakeholders — the subject-matter experts, the development team, and management.
At this layer, the design is implementation independent: It could end up on Oracle, SQL
Server, or even Access. Some designers refer to this as the ‘‘logical model.’’
■ SQL DDL Layer: This phase concentrates on performance without losing the fidelity of the
logical model as it applies the design to a specific version of a database engine — SQL Server
2008, for example — generating the DDL for the actual tables, keys, and attributes. Typically,
the SQL DDL layer generalizes some entities and replaces some natural keys with surrogate
computer-generated keys (see the sketch following this list).
The SQL DDL layer might look very different from the conceptual model.
■ Physical Layer: The implementation phase considers how the data will be physically stored
on the disk subsystems using indexes, partitioning, and materialized views. Changes made to
this layer won’t affect how the data is accessed, only how it’s stored on the disk.
The physical layer ranges from simple, for small databases (under 20GB), to complex, with
multiple filegroups, indexed views, and data routing partitions.
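As a rough illustration of the move from conceptual model to SQL DDL layer, the following sketch uses hypothetical names: a Customer entity identified conceptually by its natural key (email), which the DDL layer demotes in favor of a surrogate computer-generated key.

-- Conceptual model: a Customer entity identified by its natural key (email).
-- At the SQL DDL layer, a surrogate key typically becomes the primary key,
-- while the natural key is still enforced as unique.
CREATE TABLE Customer (
    CustomerID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
    Email NVARCHAR(128) NOT NULL UNIQUE,                -- natural key, still enforced
    CustomerName NVARCHAR(100) NOT NULL
);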
This chapter focuses on designing the conceptual model, with a brief look at normalization followed by
a repertoire of database patterns.
Implementing a database without working through the SQL DDL layer design phase is a
certain path to a poorly performing database. I’ve seen far too many database purists who
didn’t care to learn SQL Server implement conceptual designs only to blame SQL Server for the horrible
performance.
The SQL DDL layer is covered in Chapter 20, ‘‘Creating the Physical Database Schema.’’
Tuning the physical layer is discussed in Chapters 64, ‘‘Indexing Strategies,’’ and 68,
‘‘Partitioning.’’
Normalization
In 1970, Dr. Edgar F. Codd published ‘‘A Relational Model of Data for Large Shared Data Banks’’ and
became the father of the relational database. During the 1970s Codd wrote a series of papers that defined
the concept of database normalization. He wrote his famous ‘‘Codd’s 12 Rules’’ in 1985 to define what
constitutes a relational database and to defend the relational database from software vendors who
were falsely claiming to be relational. Since that time, others have amended and refined the concept of
normalization.
The primary purpose of normalization is to improve the data integrity of the database by reducing or
eliminating modification anomalies that can occur when the same fact is stored in multiple locations
within the database.

Duplicate data raises all sorts of interesting problems for inserts, updates, and deletes. For example, if
the product name is stored in the order detail table, and the product name is edited, should every order
detail row be updated? If so, is there a mechanism to ensure that the edit to the product name
propagates down to every duplicate entry of the product name? If data is stored in multiple locations, is it
safe to read just one of those locations without double-checking other locations? Normalization prevents
these kinds of modification anomalies.
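To see the anomaly concretely, here is a simplified sketch using hypothetical Product and OrderDetail tables shaped like the example above; the denormalized version duplicates the product name, while the normalized version stores it once.

-- Denormalized sketch: the product name is duplicated in every order row.
CREATE TABLE OrderDetailDenormalized (
    OrderID INT NOT NULL,
    ProductName NVARCHAR(50) NOT NULL,  -- the same fact stored many times
    Quantity INT NOT NULL
);

-- Renaming the product means updating every duplicate; missing even one
-- row leaves the data inconsistent (an update anomaly).
UPDATE OrderDetailDenormalized
SET ProductName = N'Widget Pro'
WHERE ProductName = N'Widget';

-- Normalized sketch: the name is stored once and referenced by key, so a
-- rename touches exactly one row and no anomaly is possible.
CREATE TABLE Product (
    ProductID INT NOT NULL PRIMARY KEY,
    ProductName NVARCHAR(50) NOT NULL
);

CREATE TABLE OrderDetail (
    OrderID INT NOT NULL,
    ProductID INT NOT NULL REFERENCES Product(ProductID),
    Quantity INT NOT NULL
);

UPDATE Product
SET ProductName = N'Widget Pro'
WHERE ProductID = 1;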
Besides the primary goal of consistency and data integrity, there are several other very good reasons to
normalize an OLTP relational database:
■ Performance: Duplicate data requires extra code to perform extra writes, maintain consis-
tency, and manipulate data into a set when reading data. On my last large production contract
(several terabytes, OLTP, 35K transactions per second), I tested a normalized version of the
database vs. a denormalized version. The normalized version was 15% faster. I’ve found similar
results in other databases over the years.
Normalization also reduces locking contention and improves multiple-user concurrency.
■ Development costs: While it may take longer to design a normalized database, it’s easier to
work with a normalized database and it reduces development costs.
■ Usability: By placing columns in the correct table, it’s easier to understand the database and
easier to write correct queries.
■ Extensibility: A non-normalized database is often more complex and therefore more difficult
to modify.
The three ‘‘Rules of One’’
Normalization is well defined as normal forms — specific issues that address specific potential
errors in the design (there's a whole section on normal forms later in this chapter). But I don't design a
database with errors and then normalize the errors away; I follow normalization from the beginning to
the conclusion of the design process. That’s why I prefer to think of normalization as positively stated
principles.
When I teach normalization I open with the three ‘‘Rules of One,’’ which summarize normalization from
a positive point of view. The key to designing
a schema that avoids update anomalies is to ensure that each single fact in real life is modeled by a
single data point in the database. Three principles define a single data point:
■ One group of similar things is represented by one entity (table).
■ One thing is represented by one tuple (row).
■ One descriptive fact about the thing is represented by one attribute (column).
Grok these three simple rules and you’ll be a long way toward designing a properly normalized
database.
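As a small illustration (a hypothetical Customer example, not from this chapter), the three rules push a repeating fact such as multiple phone numbers out of the Customer table and into its own entity, where each phone number is one tuple and each fact about it is one attribute.

-- One entity for one group of similar things: all customers in one table.
CREATE TABLE Customer (
    CustomerID INT NOT NULL PRIMARY KEY,   -- one row per customer
    CustomerName NVARCHAR(100) NOT NULL    -- one column per fact
);

-- Columns like Phone1, Phone2, Phone3 would cram several facts into the
-- customer row; instead each phone number becomes one tuple here.
CREATE TABLE CustomerPhone (
    CustomerID INT NOT NULL REFERENCES Customer(CustomerID),
    PhoneType VARCHAR(10) NOT NULL,    -- e.g., 'home' or 'mobile'
    PhoneNumber VARCHAR(25) NOT NULL,
    PRIMARY KEY (CustomerID, PhoneType)
);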
Normalization As Story
The Time Traveler's Wife, by Audrey Niffenegger, is one of my favorite books. Without giving away the
plot or any spoilers, it’s an amazing sci-fi romance story. She moves through time conventionally, while
he bounces uncontrollably through time and space. Even though the plot is more complex than the average
novel, I love how Ms. Niffenegger weaves every detail together into an intricate flow. Every detail fits and
builds the characters and the story.
In some ways, a database is like a good story. The plot of the story is in the data model, and the data
represents the characters and the details.
Normalization is the grammar of the database.
When two writers tell the same story, each crafts the story differently. There’s no single correct way to tell a
story. Likewise, there may be multiple ways to model the database. There’s no single correct way to model
a database — as long as the database contains all the information needed to extract the story and it follows
the normalized grammar rules, the database will work. (Don’t take this to mean that any design might be
a correct design. While there may be multiple correct designs, there are many more incorrect designs.) A
corollary is that just as some books read better than others, so do some database schemas flow well, while
other database designs are difficult to query.
As with writing a novel, the foundation of data modeling is careful observation, an understanding of reality,
and clear thinking. Based on those insights, the data modeler constructs a logical system — a new virtual
world — that models a slice of reality. Therefore, how the designer views reality and identifies entities and
their interactions will influence the design of the virtual world. Like postmodernism, there’s no single perfect
correct representation, only the viewpoint of the author/designer.
Identifying entities
The first step to designing a database conceptual diagram is to identify the entities (tables). Because any
entity represents only one type of thing, it takes several entities together to represent an entire process
or organization.
Entities are usually discovered from several sources:
■ Examining existing documents (order forms, registration forms, patient files, reports)
■ Interviews with subject-matter experts
■ Diagramming the process flow
At this early stage the goal is to simply collect a list of possible entities and their facts. Some of the
entities will be obvious nouns, such as customers, products, flights, materials, and machines.
Other entities will be verbs: shipping, processing, assembling parts to build a product. Verbs may be
entities, or they may indicate a relationship between two entities.
The goal is to simply collect all the possible entities and their attributes. At this early stage, it’s also
useful to document as many known relationships as possible, even if those relationships will be edited
several times.
Generalization
Normalization has a reputation of creating databases that are complex and unwieldy. It’s true that some
database schemas are far too complex, but I don’t believe normalization, by itself, is the root cause.
I’ve found that the difference between elegant databases that are a joy to query and overly complex
designs that make you want to polish your resume is the data modeler’s view of entities.
When identifying entities, there’s a continuum, illustrated in Figure 3-1, ranging from a broad
all-inclusive view to a very specific narrow definition of the entity.
FIGURE 3-1
Entities can be identified along a continuum, from overly generalized with a single table, to overly
specific with too many tables.
[The figure shows the continuum running from ‘‘Overly Simple: One Table’’ to ‘‘Overly Complex:
Specific Tables,’’ with the sweet spot in the middle yielding a data-driven design, fewer tables, and a
database that's easier to extend.]
The overly simple view groups together entities that are in fact different types of things, e.g., storing
machines, products, and processes in a single entity. This approach might risk data integrity for two
reasons. First, it’s difficult to enforce referential integrity (foreign key constraints) because the primary
key attempts to represent multiple types of items. Second, these designs tend to merge entities with
different attributes, which means that many of the attributes (columns) won’t apply to various rows
and will simply be left null. Many nullable columns means the data will probably be sparsely filled and
inconsistent.
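For example, an over-generalized design might look like the following sketch (table and column names are hypothetical); note how most columns apply to only some rows.

-- One entity trying to store machines, products, and processes. Most
-- columns apply to only one type of row, so the table fills with NULLs,
-- and a foreign key to ItemID can't tell which kind of item it points to.
CREATE TABLE Item (
    ItemID INT NOT NULL PRIMARY KEY,
    ItemType VARCHAR(20) NOT NULL,      -- 'machine', 'product', or 'process'
    MachineSerialNo VARCHAR(30) NULL,   -- applies only to machines
    ProductListPrice MONEY NULL,        -- applies only to products
    ProcessDurationMinutes INT NULL     -- applies only to processes
);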
At the other extreme, the overly specific view segments entities that could be represented by a single
entity into multiple entities, e.g., splitting different types of subassemblies and finished products into
multiple different entities. This type of design risks flexibility and usability:
■ The additional tables create additional work at every layer of the software.
■ Database relationships become more complex because what could have been a single
relationship is now multiple relationships. For example, instead of a single assembly
relationship that can relate to any part, the assembly must now relate separately to each type of part.
■ The database has now hard-coded the specific types of similar entities, making it very difficult
to add another similar type of entity. Using the manufacturing example again, if there’s an
entity for every type of subassembly, then adding another type of subassembly means changes
at every level of the software.
The sweet spot in the middle generalizes, or combines, similar entities into single entities. This approach
creates a more flexible and elegant database design that is easier to query and extend:
■ Look for entities with similar attributes, or entities that share some attributes.
■ Look for types of entities that might have an additional similar entity added in the future.
■ Look for entities that might be summarized together in reports.
When designing a generalized entity, two techniques are essential:
■ Use a lookup entity to organize the types of entities. For the manufacturing example, a
subassemblytype attribute would serve the purpose of organizing the parts by subassembly
type. Typically, this would be a foreign key to a
subassemblytype entity.
■ Typically, the different entity types that could be generalized together do have some differences
(which is why a purist view would want to segment them). Employing the supertype/subtype
(discussed in the ‘‘Data Design Patterns’’ section) solves this dilemma perfectly.
I’ve heard from some that generalization sounds like denormalization — it’s not. When generalizing, it’s
critical that the entities comply with all the rules of normalization.
Generalized databases tend to be data-driven, have fewer tables, and are easier to extend. I was once
asked to optimize a database design that was modeled by a very specific-style data modeler. His design
had 78 entities, mine had 18 and covered more features. For which would you rather write stored
procedures?
On the other hand, be careful to merge entities only when they actually do share a root meaning in
the data. Don't merge unlike entities just to save programming. The result will be more complex
programming.
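A minimal sketch of these two techniques, using the manufacturing example with hypothetical names, might look like this:

-- A type lookup entity organizes the generalized parts by subassembly type.
CREATE TABLE SubassemblyType (
    SubassemblyTypeID INT NOT NULL PRIMARY KEY,
    TypeName VARCHAR(50) NOT NULL
);

-- The supertype holds the attributes every part shares.
CREATE TABLE Part (
    PartID INT NOT NULL PRIMARY KEY,
    SubassemblyTypeID INT NOT NULL
        REFERENCES SubassemblyType(SubassemblyTypeID),
    PartName NVARCHAR(100) NOT NULL
);

-- A subtype table holds the attributes unique to one kind of part,
-- sharing the supertype's primary key.
CREATE TABLE ElectricalPart (
    PartID INT NOT NULL PRIMARY KEY REFERENCES Part(PartID),
    Voltage DECIMAL(6,2) NOT NULL
);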
Best Practice
Granted, knowing when to generalize and when to segment can be an art form and requires a repertoire of
database experience, but generalization is the buffer against database over-complexity; and consciously
working at understanding generalization is the key to becoming an excellent data modeler.
In my seminars I use an extreme example of specific vs. generalized design, asking groups of three to
four attendees to model the database in two ways: first using an overly specific data modeling technique,
and then modeling the database trying to hit the generalization sweet spot.
Assume your team has been contracted to develop a database for a cruise ship’s activity director — think
Julie McCoy, the cruise director on the Love Boat.
The cruise offers a lot of activities: tango dance lessons, tweetups, theater, scuba lessons, hang-gliding,
off-boat excursions, authentic Hawaiian luau, hula-dancing lessons, swimming lessons, Captain’s dinners,
aerobics, and the ever-popular shark-feeding scuba trips. These various activities have differing require-
ments, are offered multiple times throughout the cruise, and some are held at different locations. A pas-
senger entity already exists; you’re expected to extend the database with new entities to handle activities
but still use the existing passenger entity.
In the seminars, the specialized designs often have an entity for every activity, every time an activity is
offered, activities at different locations, and even activity requirements. I believe the record number
of entities produced by a seminar group is 36. Admittedly, it's an extreme example for illustration purposes, but
I’ve seen database designs in production using this style.
Each group’s generalized design tends to be similar to the one shown in Figure 3-2. A generalized
activity entity stores all activities and descriptions of their requirements organized by activity type. The
ActivityTime entity has one tuple (row) for every instance or offering of an activity, so if hula-dance
lessons are offered three times, there will be three tuples in this entity.
FIGURE 3-2
A generalized cruise activity design can easily accommodate new activities and locations.
[The figure shows the generalized design: ActivityType, Activity, ActivityTime, Location, Passenger,
and SignUp entities related together.]
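A minimal T-SQL sketch of this generalized design might look like the following; the column names are my own assumptions, not taken from the figure.

-- New activities and locations become rows, not new tables.
CREATE TABLE ActivityType (
    ActivityTypeID INT NOT NULL PRIMARY KEY,
    TypeName VARCHAR(50) NOT NULL
);

CREATE TABLE Location (
    LocationID INT NOT NULL PRIMARY KEY,
    LocationName VARCHAR(50) NOT NULL
);

CREATE TABLE Activity (
    ActivityID INT NOT NULL PRIMARY KEY,
    ActivityTypeID INT NOT NULL REFERENCES ActivityType(ActivityTypeID),
    ActivityName NVARCHAR(100) NOT NULL,
    Requirements NVARCHAR(500) NULL
);

-- One row per offering: hula lessons offered three times = three rows.
CREATE TABLE ActivityTime (
    ActivityTimeID INT NOT NULL PRIMARY KEY,
    ActivityID INT NOT NULL REFERENCES Activity(ActivityID),
    LocationID INT NOT NULL REFERENCES Location(LocationID),
    StartTime DATETIME NOT NULL
);

CREATE TABLE SignUp (
    ActivityTimeID INT NOT NULL REFERENCES ActivityTime(ActivityTimeID),
    PassengerID INT NOT NULL,  -- points to the existing Passenger entity
    PRIMARY KEY (ActivityTimeID, PassengerID)
);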
Primary keys
Perhaps the most important concept of an entity (table) is that it has a primary key — an attribute or
set of attributes that can be used to uniquely identify the tuple (row). Every entity must have a primary
key; without a primary key, it’s not a valid entity.
By definition, a primary key must be unique and must have a value (not null).
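As a brief sketch (a hypothetical table), SQL Server enforces both requirements once the primary key is declared:

-- A primary key must be unique and must have a value; the PRIMARY KEY
-- constraint enforces both NOT NULL and uniqueness.
CREATE TABLE Passenger (
    PassengerID INT NOT NULL PRIMARY KEY,
    PassengerName NVARCHAR(100) NOT NULL
);

INSERT INTO Passenger (PassengerID, PassengerName) VALUES (1, N'Ann');
-- This second insert fails: a duplicate key value violates uniqueness.
INSERT INTO Passenger (PassengerID, PassengerName) VALUES (1, N'Bob');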