Tải bản đầy đủ (.pdf) (20 trang)

Beginning Database Design- P15 pdf

Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (566.11 KB, 20 trang )

The ADVERTISEMENT table is decided on being the fact table. The MUSICIAN table is a single layer fact
table. The
BAND, MERCHANDISE, SHOWS, and DISCOGRAPHY tables constitute a two-layer dimensional
hierarchy.
There are some gray areas between static and dynamic information in this situation. There are other
ways of building this data warehouse database model.
Project Management
When it comes to planning and establishing timelines, there are some remarkably good project-planning
software tools available. When planning any software project, including a database model design, there
is often more than one person involved in a project. This excludes the planner. If not using a software
tool, and multiple people are to perform multiple overlapping, interdependent tasks, you could end up
with lots of drawings, and an awfully full garbage can. These things change. Plans can change. People
get sick and go on vacation. They find new jobs. Sometimes people simply don’t show up at all! In other
words, expect things to change.
The fact is when it comes to planning, establishing timelines, and budgeting there is no enforceable or
reasonably useful applicable set of rules and steps. It just doesn’t exist. In any planning process (including
budgeting), for any type of project, there is no way to accurately predict to any degree of mathematical
certainty. None of us can really read the future. Planning, establishing timelines, and budgeting entail a
flexible process, which is almost entirely dependant on the expertise, and past experience, of the planner.
And then again, if changes and unexpected snags or events are not anticipated, a project can very quickly
spiral out of control.
Project Planning and Timelines
A software development company can meet with its financial demise by not allowing enough time in
project plans. It is quite common in the software development field for project-tender budgets from
multiple companies to be as much as 50 times different. In other words, the difference in dollar estimates
between the lowest and highest bidders can be enormous. This indicates guesswork. It is often the case
that the most inexperienced companies put in the lowest bids. The highest bids are probably greedy.
Those between the average and the highest are probably the most realistic. Also, they are likely to make a
profit, and thus be around to support very expensive software tools, say, 5 to 10 years from now.
As already stated, there is much research into project planning in general. Unfortunately, few, if any
quantifiable and useful results have been obtained. Expert level assessment based on past experience of


experts, is usually the best measure of possibilities.
There is an International Standards Institute (ISO) model called the “ISO 9000-3 Model.” This model
is used more to give a method of quality assurance against the final product, of, say, an analysis of a
database model. This ISO model does not give a set of instructions as to how to go about performing
analysis process itself, but rather presents a method of validation after the fact.
253
Planning and Preparation Through Analysis
15_574906 ch09.qxd 11/4/05 10:49 AM Page 253
The accuracy of a planned budget is dependent on the experience of the planner. Database designers,
administrators, and programmers think that their project managers and planners do nothing. They are
wrong. The planners take all the risk, make all the wild guesses; the programmers get to write all that
mathematically precise programming code. So, the next time you see your project manager in a cold
sweat, you know why. It’s not your job on the line — it’s theirs.
Here are some interesting—and sometimes amusing —terms, often used to describe planning, budgeting,
and project management:
❑ “Why did the chicken cross the road?” —This is a well-known quotation about a now unfortu-
nately discontinued software development consultancy company. This company was famous
for producing enormous amounts of paper. Lots of paper can help to make more profit for the
software development company. The more paper produced, the more useless waffle. However,
the more paper produced, the less likely something has been overlooked, and the more likely
the plan and budget are accurate. The other problem with lots of paper is no one has time to
read it, and doesn’t really want to either.
❑ “Get someone with a great, big, long signature and a serious title to sign off on it ASAP.” —This one is
also called “passing the buck.” That buck has to stop somewhere, and the higher up, the better,
and preferably not with the planner. The planner has enough things to think about without
worrying about whether their wildest guesses will bear fruit, or simply crash and burn.
❑ “Don’t break it if it’s already fixed.” —If something is working properly, why change it?
❑ “Use information in existing systems, be they computerized or on paper.” — Existing structures can
usually tell more about a company that its people can. There is, however, a danger that if a sys-
tem is being replaced, there is probably something very wrong with the older system.

❑ “Try not to reinvent the wheel.” —When planning on building software or a new database model,
use other people’s ideas if they are appropriate. Of course, beware of outdated ideas. Do thor-
ough research. The Internet is an excellent source of freely available ideas, both old and new.
❑ “More resources do not mean faster work.”—The more people involved in a project, the more con-
fusion. Throwing more bodies at a project can just as likely make a project impossible to man-
age, as it can make it go faster.
Figure 9-30 shows a pretty picture of what a project timeline might look like, containing multiple people,
multiple skills levels, multiple tasks (both overlapping and interdependent). Project timelines can
become incredibly complicated. Simplify, if possible. Too much interdependency can lead to problems if
one area overruns on time limitations. Keep something spare in terms of people, available hours, and
otherwise. Expect changes. Plan with some resources in reserve.
Figure 9-30 shows five separate tasks in a simplistic project-timeline Gantt chart. Task 1 is assigned to
Jim. There is no time conflict between Task 2 and Task 3, and can both be assigned to Joe (one person).
Task 3 and Task 4 overlap between both Janet and Joe, in more ways than one. Joe is going to be very
busy indeed. A project manager would allow overlap where the person doing the assigned tasks is
known to be capable of multiple concurrent activities.
254
Chapter 9
15_574906 ch09.qxd 11/4/05 10:49 AM Page 254
Gantt charts were invented as a graphical tool for project management. Gantt charts allow for a pictorial
illustration of a schedule, thus helping with the planning, coordination between, and tracking of multiple
tasks. Tasks can be independent or interdependent. Many off-the-shelf software tools allow computerized
project management with Gantt charts (and otherwise), such as Microsoft Project, Excel spreadsheet
plug-ins, and even Visio.
Figure 9-30: An example project timeline Gantt chart.
Budgeting
When it comes to budgeting for a project, budgeting is part of planning, and is thus open to expert
interpretation. Once again, there is research on how to apply formal methods to budgeting. Much of that
research is unsuccessful. In the commercial world, most observations offer a lot of guesswork based
on experience. Also, there is an intense reluctance on the part of the experts to justify and quantify the

how and why of the results they come up with. Budgeting is, as already stated, just like planning. It is
educated guesswork. There is nothing to really present in a book such as this one, telling someone like
you, the reader, how to go about budgeting a project, such as a database modeling project.
One big problem with software development (including database model development and design) is its
intangibility. Software projects are difficult to quantify, partially because they are unpredictable, they can
change drastically during development. The sheer complexity of software is another factor. Complexity is
not only within each separate unit and step. One step could be the database model analysis and design.
Another step could be writing of front-end application code. There is a complexity issue simply because of
the need to account for all possibilities. There is also complexity because of the heavy interdependence
between all parts of software development. This interdependence stems from the highest level (comparing,
for example, database model and application coding) even down to the smallest piece (comparing two
different fields in a single table, in a database model).
In short, database model analysis and design is extremely difficult to budget for. The most common and
probably successful practice is to guess at an estimated cost based on estimated time to be spent (work
hours). Then take the result and add a percentage, even as much as 50 percent, and sometimes multiply-
ing the first conservative estimate by much more.
Task Name
Task number 1
5/23/2005
Task number 2
Task number 4
Task number 3
Task number 5
MTWThF
5/30/2005
MTWThF
6/6/2005
MTWThF
Jim
Joe

Joe
Janet
Janet and Joe
255
Planning and Preparation Through Analysis
15_574906 ch09.qxd 11/4/05 10:49 AM Page 255
A number of detail areas already have been discussed previously in this chapter:
❑ Hiring help costs money —Hired help in the form of software development expertise can be astro-
nomically expensive.
❑ Hardware costs—Hardware can be cheap to incredibly expensive. Generally, more expensive
hardware leads to the following scenarios:
❑ Expensive hardware can help to alleviate software costs, but usually in the form of
hiding performance problems. Sloppy database design and software development can
be overcome by upgrading hardware. If growth is expected, and the hardware selected
is ever outgrown, costly problems could be the result. The temptation when using
expensive hardware is to not worry too much about building database and software
applications properly. Fast-performing hardware lets you get away with a lot of things.
❑ Expensive hardware is complicated. Hiring people to set up and maintain that complexity
can be extremely expensive. Simplistic and cheap hardware requires less training and
lower skills levels. Less-skilled labor is much cheaper.
❑ Maintenance —Maintenance is all about complexity and quality. The more complex something
is, the more difficult it is to maintain. Somewhat unrelated, but just as important, poor quality
results in a lot more maintenance. Indirectly, maintenance is related to the life of a database
model, and ultimately applications. How long will they last, how long will something be useful,
and help to turn a profit? There is a point where constant maintenance is outweighed by doing a
complete rewrite. In other words, sometimes rebuilding databases and software applications
from scratch can save money, rather than continuing to maintain older software that has seen so
many changes that it is no longer cost-effective to maintain.
❑ Training —Training affects all levels, from technical staff to unknown huge quantities of end-users
on the Internet. Obviously, you don’t want to have to train Internet users located at far-flung parts

of the globe. Attempting to train Internet users is pointless. People lose interest in applications that
are difficult to use. You might as well start again in this case. Training in-house staff is different.
The higher the complexity, the more training is involved. Training costs money —sometimes a lot
of money!
Summary
In this chapter, you learned about:
❑ The basics of analysis and design of a database model
❑ A usable set of semi-formal steps, for the database modeling analysis process
❑ Some common problem areas and misconceptions that can arise
❑ How talking, listening, and following a paper trail can be of immense value
❑ How to create a database model to cover objectives for both an OLTP and a data warehouse
database model
256
Chapter 9
15_574906 ch09.qxd 11/4/05 10:49 AM Page 256
❑ How to refine a database model using the application of business rules, through basic analysis,
creating both an OLTP and a data warehouse database model
❑ How to apply everything learned so far with a comprehensive case study, creating both an
OLTP and a data warehouse database model
This chapter has attempted to apply everything learned in previous chapters, by going through the
motions of beginning with the process of creating a database model. An online auction company has
been used as the case study example.
Chapter 10 expands on the analysis process (simplistic in this chapter at best) and design with more
detail provided for the OLTP and database warehouse database models, as presented analytically in this
chapter. The idea is to build things step-by-step, with as much planning as possible. This chapter has
begun that database model building process, as the first little step in the right direction.
Exercises
Use the ERDs in Figure 9-19 and Figure 9-29 to help you answer these questions.
1. Create scripts to create tables for the OLTP database model Figure 9-19. Create the tables in the
proper order by understanding the relationships between the tables.

2. Create scripts to create tables for the data warehouse database model Figure 9-29. Once again,
create the tables in the proper order by understanding the relationships between the tables.
257
Planning and Preparation Through Analysis
15_574906 ch09.qxd 11/4/05 10:49 AM Page 257
15_574906 ch09.qxd 11/4/05 10:49 AM Page 258
10
Creating and Refining Tables
During the Design Phase
“Everything in the world must have design or the human mind rejects it. But in addition, it must
have purpose or the human conscience shies away from it.” (John Steinbeck)
Analysis is all about what needs to done. Design does it!
This chapter builds on and expands on the basic analytical process and structure discovered during
the case study approach in Chapter 9, which covered analysis. Analysis is the process of discovering
what needs to be done. Analysis is all about the operations of a company, what it does to make a
living. Design is all about how to implement that which was analyzed into a useful database model.
This chapter passes from the analysis phase into the design phase by discovering through a case
study example how to build and relate tables in a relational database model.
By the end of this chapter, you will have a much deeper understanding of how database models
are created and designed. This chapter teaches you how to begin the implementation of what was
analyzed (discovered) in Chapter 9. In short, implementation is the process of creating tables, and
sticking those tables together with appropriate relationships.
In this chapter, you learn about the following:
❑ Database model design
❑ The difference between analysis and design
❑ Creating tables
❑ Enforcing inter-table relationships and referential integrity
❑ Normalization without going too far
❑ Denormalization without going too far
16_574906 ch10.qxd 11/4/05 10:46 AM Page 259

A Little More About Design
When designing a database model, the intention is to refine database model structure for reasons of
maintaining data integrity, good performance, and adaptations to specific types of applications, such as
a data warehouse. A data warehouse database model requires a different basic table structure than an
OLTP database model, mostly for reasons of acceptable reporting performance.
Performance is not exclusively a programming or implementation construction or even a testing or
“wait until it’s in production” activity. Performance should be included at the outset in both analysis and
design stages. This is particularly important for database model analysis and design for two reasons:
❑ Redesigning a database model after applications are written will change applications (a little
like a rewrite—a complete waste of time and money).
❑ There is simply no earthly reason or excuse to not build for performance right from the beginning
of the development process. If this is not possible, you might want additional expertise in the
form of specialized personnel. Hiring a short-term consultant at the outset of a development
project could save enormous amounts of maintenance, rewriting, redevelopment costs, and time
in the future.
Business events and operations discovered in the analysis stage should be utilized to drive the design
process, which consists of refining table pictures and ERDs already drawn. For larger projects, the
design stage can also consist of detailed technical specifications. Technical specifications are used by
programmers and administrators to create databases and programming code.
Essentially, the beginning of the design process marks the point where the thought processes of analysts
and programmers begin to mix. In the analysis stage, the approach was one of a business operation (a
business-wide view) of a company. In the design stage, it starts to get more technical.
When designing a database model, you should begin thinking about a database model from the
perspective of how applications will use that database model. In other words, when considering how to
build fact and dimensional table structures in a data warehouse database model, consider how reports
will be structured. Consider how long those reports will take to run. For a data warehouse, not only are
the table contents and relationships important, but factors such as reconstruction of data using materialized
views and alternate indexing can help with the building and performance of data warehouse reporting.
Relational database model design includes the following:
❑ Refine Database Models —At this stage of the game, most of this is about normalization and

denormalization.
❑ Finalization and Approval—This includes finalization (and most especially, approval) of business
and technical design issues. You need to get it signed off for two reasons:
❑ Software development involves large amounts of investment in both money and time.
Management, and probably even executive management approval and responsibility, is
required. A designer does not need this level of worry, but may need some powerful clout
to back up the development process. Other departments and managers getting in the way
of the development process could throw your schedule and budget for a complete loop.
260
Chapter 10
16_574906 ch10.qxd 11/4/05 10:46 AM Page 260
Every project needs a sponsor and champion; otherwise, there is no point in progressing.
❑ You need to cover your back. You also need to be sure that you are going in the right
direction because you may very well not be. There is usually a good reason why
your boss is your boss, and it usually has a lot do with him or her having a lot more
experience. Experience is always valuable. It is extremely likely that this person
will help you get things moving when you need, such as when you need
information and help from other departments.
So, it’s not really about passing the buck up the ladder. It’s really about getting a job done. The stronger
approval and support for a project, the better the chance of success —unless, of course, it’s your money.
In that case, it’s your problem! So, pinch those pennies. Of course, some entrepreneurs say that the
secrets to making money are all about cash flow, spending it, and not stashing it in the bank for a rainy
day. Be warned, however, that software development is a very expensive and risky venture!
So, the design stage is the next stage following the analysis stage. Design is a process of figuring out how
to implement what was discovered during the analysis stage. As described in Chapter 9, analysis is
about what needs to be done. Design is about how it should be done. The design stage deals with the
following aspects of database model creation:
❑ More precise tables, including practical application of normalization.
❑ Establishment of primary and foreign key fields.
❑ Enforcement of referential integrity by establishing and quantifying precise relationships

between tables, using primary and foreign key fields.
❑ Denormalization in the design stage (the sooner the better), particularly in the case of data
warehouse table structures.
❑ Alternate and extra indexing in addition to that of referential integrity, primary and foreign
keys; however, alternate indexing is more advanced (detailed) design, and is discussed in
Chapter 11.
❑ Advanced database structures, such as materialized views, and some specialized types of
indexing. Similar to alternate indexing, this is also more advanced (detailed) design, and is
discussed in Chapter 11.
❑ Precise field definitions, structure, and their respective datatypes (again advanced design).
The intention of this chapter is to focus on the firm and proper establishment of inter-table relationships,
through the application of normalization and denormalization for both OLTP and data warehouse
database models. This process is performed as a case study, continuing with the use of the online auction
company introduced in Chapter 9.
Let’s create some tables.
261
Creating and Refining Tables During the Design Phase
16_574906 ch10.qxd 11/4/05 10:46 AM Page 261
262
Chapter 10
Case Study: Creating Tables
In Chapter 9, tables were created on an analytical level, creating basic pictures. Following the basic pictures,
simple ERDs were constructed. In this section, basic commands are used to create the initial simple tables,
as shown in the analytical process of Chapter 9. The idea is to retain the step-by-step instruction of each
concept layer, in the database modeling design process, for both OLTP and data warehouse database
models. These models are created for the online auction house case study database models.
The OLTP Database Model
Figure 10-1 shows a simple analytical diagram, covering the various operational aspects of the online
auction house, OLTP database model. Notice how the
BIDS table is connected to both the LISTING and

BUYER tables. This is the only table in this database structure that is connected to more than one table
and not as part of a hierarchy. Category tables are part of a hierarchy.
Figure 10-1: Analytical OLTP online auction house database model.
Figure 10-2 shows the application of business rules to the simple analytical diagram shown in Figure 10-1.
Once again, notice the dual links for the
BID table (now called BID for technical accuracy because each
record represents a single bid), to both the
LISTING and the BUYER tables. This double link represents a
Primary
Category
Secondary
Category
Tertiary
Category
Buyer
History
Listing
Seller
History
Seller Bids
Buyer
16_574906 ch10.qxd 11/4/05 10:46 AM Page 262
263
Creating and Refining Tables During the Design Phase
many-to-many relationship. The BID table is in actuality a resolution of the many-to-many relationship
between the
LISTING and BUYER tables. In other words, each buyer can place many bids on a single item
listed for auction; however, each auction listed item can also have many buyers making those many bids.
Figure 10-2: Business rules OLTP online auction house database model.
The easiest way to create tables in a database model is to create them in a top-down fashion, from static

to dynamic tables, gradually introducing more and more dependent detail. In others words, information
that does not change very often is created first. Information changing frequently is created after static
tables. Technically speaking, it’s all about catering to dependencies first. The first tables created are those
with no dependencies. Subsequent tables are created when tables depended on have already been
created. Begin by creating the three category static tables. The
SECONDARY table depends on the PRIMARY
table, and the TERTIARY table depends on both the SECONDARY, and thus the PRIMARY table as well;
therefore you need to create
PRIMARY, then SECONDARY, followed by TERTIARY tables:
CREATE TABLE CATEGORY_PRIMARY(PRIMARY STRING);
CREATE TABLE CATEGORY_SECONDARY(SECONDARY STRING);
CREATE TABLE CATEGORY_TERTIARY(TERTIARY STRING);
Some databases (if not many databases) do not allow use of keywords, such as PRIMARY, or even
SECONDARY. PRIMARY could be reserved to represent a primary key and SECONDARY could be reserved
to represent secondary (alternate) indexing. If you get an error, simply use another name.
Listing
listing#
description
image
start_date
listing_days
currency
starting_price
reserve_price
buy_now_price
number_of_bids
winning_price
Seller
seller
popularity_rating

join_date
address
return_policy
international
payment_methods
Seller_History
buyer
comment_date
listing#
comments
Buyer_History
seller
comment_date
listing#
comments
Buyer
buyer
popularity_rating
join_date
address
Bid
bidder
seller
bid_price
bid_date
Category_Primary
primary
Category_Tertiary
tertiary
Category_Secondary

secondary
16_574906 ch10.qxd 11/4/05 10:46 AM Page 263
264
Chapter 10
The SELLER and BUYER tables are also static. According to the ERD shown in Figure 10-2 they are not
dependencies. So you can create the
SELLER and BUYER tables next:
CREATE TABLE SELLER
(
SELLER STRING,
POPULARITY_RATING INTEGER,
JOIN_DATE DATE,
ADDRESS STRING,
RETURN_POLICY STRING,
INTERNATIONAL STRING,
PAYMENT_METHODS STRING
);
CREATE TABLE BUYER
(
BUYER STRING,
POPULARITY_RATING INTEGER,
JOIN_DATE DATE,
ADDRESS STRING
);
In these table creation script sections, I have begun to use de-formalized datatypes, such as STRING
(representing text strings of any length), plus DATE for dates, and INTEGER for whole numbers.
Next, you can create the two history tables because the
SELLER and BUYER tables are now available.
Note how the
SELLER_HISTORY table does not have a SELLER field, because this is implied by the direct

parent relationship to the
SELLER table. The same applies to the BUYER_HISTORY table, containing the
SELLER field only.
CREATE TABLE SELLER_HISTORY
(
BUYER STRING,
COMMENT_DATE DATE,
LISTING# STRING,
COMMENTS STRING
);
CREATE TABLE BUYER_HISTORY
(
SELLER STRING,
COMMENT_DATE DATE,
LISTING# STRING,
COMMENTS STRING
);
Next, create the LISTING table:
16_574906 ch10.qxd 11/4/05 10:46 AM Page 264
265
Creating and Refining Tables During the Design Phase
CREATE TABLE LISTING
(
LISTING# STRING,
DESCRIPTION STRING,
IMAGE BINARY,
START_DATE DATE,
LISTING_DAYS INTEGER,
CURRENCY STRING,
STARTING_PRICE MONEY,

RESERVE_PRICE MONEY,
BUY_NOW_PRICE MONEY,
NUMBER_OF_BIDS INTEGER,
WINNING_PRICE MONEY
);
This table introduces a new generic datatype called BINARY, used to store an image. That image could
be a JPG, BMP, or any other type of graphic file format. Binary object datatypes allow storage of binary
formatted data inside relational databases. The
BINARY datatype is not really important to this book;
however, storing images into text strings is awkward.
Lastly, create the
BID table (the BID table is dependent on the LISTING table):
CREATE TABLE BID
(
BIDDER STRING,
SELLER STRING,
BID_PRICE MONEY,
BID_DATE DATE
);
These two tables introduce a new generic datatype called MONEY. You can assume that the MONEY
datatype, for most currencies, will have two fixed decimal places. Therefore, $100 will be represented as
100.00. There is a
CURRENCY field in the LISTING table. Online auction companies can operate
internationally, implying international sales and international bids from buyers in other countries.
Different currencies may be involved. A few unusual currencies (such as some of the Arabian Peninsula
currencies) actually use three rather than two decimal places. Thus, 100 “whatever’s” would be stored
as 100.000.
The Data Warehouse Database Model
The previous section created very basic tables for the online auction house OLTP database model. Now
do exactly the same thing for the data warehouse database model of the online auction house. Figure

10-3 shows a simple analytical diagram displaying the various operational sections for the online
auction house data warehouse database model. All the fact information is shoved into a single table.
Later on in this chapter, this information will be separated into separate fact tables (or separate star
schemas).
16_574906 ch10.qxd 11/4/05 10:46 AM Page 265
Figure 10-3: Analytical data warehouse online auction house database model.
Multiple star schemas within a single data warehouse are sometimes known as individual data marts.
Figure 10-4 shows the application of business rules to the simple analytical diagram shown in Figure
10-3 for the data warehouse database model. Once again, you must take table dependencies into
account.
It is significant to observe how the three category tables, shown in Figure 10-2 (the OLTP database
model), have been merged into a single hierarchical category table (
CATEGORY_HIERARCHY) in the
data warehouse model shown in Figure 10-4. This is a form of denormalization, used especially in data
warehouse databases to simplify and compress dimensional information use.
Category
Hierarchy
Time
Seller
Location
Buyer
Listing
-Bids
-History
Listing
-Bids
-History
266
Chapter 10
16_574906 ch10.qxd 11/4/05 10:46 AM Page 266

267
Creating and Refining Tables During the Design Phase
Figure 10-4: Business rules data warehouse online auction house database model.
Now you create the tables for the data warehouse model shown in Figure 10-4. In a well-designed data
warehouse star schema, there is only one layer of dependence between a single layer of unrelated
dimensions and a single fact table. Create the dimensions:
CREATE TABLE LOCATION
(
REGION STRING,
COUNTRY STRING,
STATE STRING,
CITY STRING
);
CREATE TABLE TIME
Seller
seller
popularity_rating
join_date
address
return_policy
international
payment_methods
Category_Hierarchy
parent
category
Buyer
buyer
popularity_rating
join_date
address

Location
region
country
state
city
Time
month
quarter
year
Listing_Bids_History
listing#
listing_description
listing_image
listing_start_date
listing_days
listing_currency
listing_starting_price
listing_reserve_price
listing_buy_now_price
listing_number_of_bids
listing_winning_price
listing_winner_buyer
bidder
bidder_price
bidder_date
history_buyer
histor
y_buyer_comment_date
history_buyer_comments
history_seller

history_seller_comment_date
history_seller_comments
16_574906 ch10.qxd 11/4/05 10:46 AM Page 267
268
Chapter 10
(
MONTH INTEGER,
QUARTER INTEGER,
YEAR INTEGER
);
CREATE TABLE CATEGORY_HIERARCHY
(
PARENT STRING,
CATEGORY STRING
);
CREATE TABLE BUYER
(
BUYER STRING,
POPULARITY_RATING INTEGER,
JOIN_DATE DATE,
ADDRESS BIGSTRING
);
CREATE TABLE SELLER
(
SELLER STRING,
POPULARITY_RATING INTEGER,
JOIN_DATE DATE,
ADDRESS BIGSTRING,
RETURN_POLICY BIGSTRING,
INTERNATIONAL STRING,

PAYMENT_METHODS BIGSTRING
);
In this table script, yet another datatype called BIGSTRING has been introduced. A BIGSTRING
datatype is used to represent fields that may become multiple fields, or even a subsidiary table at a later
stage, through normalization. For example,
ADDRESS represents an address. An address can consist of
many fields, such as
STREET, ZIPCODE, CITY, STATE, and so on. The field PAYMENT_METHODS is
named with plurality to indicate multiple possible acceptable methods of payment. Different sellers are
likely to accept a number of a group of acceptable payment methods. Thus, the
BIGSTRING datatype
applies to the
PAYMENT_METHODS field. For example, one seller may be willing to accept personal
checks and cash. Another seller might only accept cash and credit cards, but not personal checks.
Now, let’s create the fact table:
CREATE TABLE LISTING_BIDS_HISTORY
(
LISTING# STRING,
LISTING_DESCRIPTION STRING,
LISTING_IMAGE BINARY,
LISTING_START_DATE DATE,
LISTING_DAYS INTEGER,
LISTING_CURRENCY STRING,
LISTING_STARTING_PRICE MONEY,
LISTING_RESERVE_PRICE MONEY,
LISTING_BUY_NOW_PRICE MONEY,
LISTING_NUMBER_OF_BIDS INTEGER,
16_574906 ch10.qxd 11/4/05 10:46 AM Page 268
269
Creating and Refining Tables During the Design Phase

LISTING_WINNING_PRICE MONEY,
LISTING_WINNER_BUYER STRING,
BIDDER MONEY,
BIDDER_PRICE MONEY,
BIDDER_DATE DATE,
HISTORY_BUYER STRING,
HISTORY_BUYER_COMMENT_DATE DATE,
HISTORY_BUYER_COMMENTS BIGSTRING,
HISTORY_SELLER STRING,
HISTORY_SELLER_COMMENT_DATE DATE,
HISTORY_SELLER_COMMENTS BIGSTRING
);
This fact table shows the source of all fields as being listing, bidder, buyer history, and seller history.
Tables have been created for the OLTP and data warehouse database models for the online auction
house. The next step is to establish and enforce referential integrity.
Case Study: Enforcing Table Relationships
Referential integrity maintains and enforces the data integrity of relationships between tables. In other
words, referential integrity ensures that where a child table record exists, the parent table record exists
as well.
Referential Integrity
In Figure 10-2, you cannot delete a SELLER record without deleting all the seller’s listings first. If the
seller is deleted, their listings become orphaned records. An orphaned record is term applied to a record
not findable within the logical table structure of a database model. Essentially, the seller’s name and
address details are stored in the
SELLER table and not in the LISTING table. If the seller record was
deleted, any of their listings are useless because you don’t know who is selling it. Similarly, if buyer
information with winning bids were deleted, the seller wouldn’t know who to mail it to. Referential
integrity, through the use of primary and foreign keys, acts to ensure that the following activities are
prohibited:


INSERT check —Achild record cannot be added to a child table unless the parent record exists in
the parent table.

UPDATE check —Parent and child table records cannot have their linking key field values
changed, unless both are changed simultaneously (“simultaneously” implies within the same
transaction).

DELETE check —Aparent table record cannot be deleted when a child table record exists, unless
all related child table records are deleted first.

DELETE CASCADE check —Aparent table record can be deleted if all child records are deleted
first. This is known as a cascading delete. Cascade deletion is rarely implemented because it
can result in serious data loss (usually through operator or programmer error). Additionally,
cascade deletions can cause serious locking conflicts because large numbers of child records
could have to be deleted, when deleting a parent record. This type of locking problem can occur
when a parent record has many records in child tables, through a multiple level table structure.
16_574906 ch10.qxd 11/4/05 10:46 AM Page 269
Primary and Foreign Keys
Primary and foreign keys are used to establish referential integrity relationships between parent and
child tables. The parent table contains the primary key, and the child table the foreign key. The term
primary implies most significant field for a table, and thus uniquely identifying. Each seller record would
have a unique seller name (the name of the person or company selling the item at auction). Two sellers
can’t have the same name, leading to the obvious silly result. How would you dissimilate between two
different sellers? Impossible. The term foreign implies a key that is foreign to a child table, whose
uniquely identifying value lies in another table (the parent table containing the primary key).
Now demonstrate implementation of primary and foreign keys by re-creating the OLTP database model
tables, as shown in Figure 10-2. The ERD in Figure 10-2 has been changed to the ERD shown in Figure 10-5.
Figure 10-5: Defining primary and foreign keys for the OLTP online auction house database model.
The table contents in Figure 10-5 may appear somewhat confusing. All the dotted lines have changed to
solid lines (non-identifying to identifying relationships) and the primary keys are largely composite key

fields. Yuk! Use surrogate keys instead. Before explaining why relationships have been changed from
non-identifying to identifying, first implement surrogate keys.
Seller_History
Seller
seller
popularity_rating
join_date
address
return_policy
international
payment_methods
seller (FK)
buyer
comment_date
listing#
comments
Buyer_History
buyer (FK)
seller
comment_date
listing#
comments
Buyer
buyer
popularity_rating
join_date
address
Category_Primary
primary
Category_Secondary

primary (FK)
secondary
Category_Tertiary
primary (FK)
secondary (FK)
tertiary
Listing
listing#
primary (FK)
secondary (FK)
tertiary (FK)
seller (FK)
buyer (FK)
description
image
start_date
listing_date
currency
starting_price
reserve_price
buy_now_price
number_of_bids
winning_price
Bid
primary (FK)
secondary (FK)
tertiary (FK)
seller (FK)
listing# (FK)
buyer (FK)

bidder
bid_price
bid_date
270
Chapter 10
16_574906 ch10.qxd 11/4/05 10:46 AM Page 270
Using Surrogate Keys
Figure 10-6 shows the same model as shown in Figure 10-5, except with surrogate keys added. The only
table without a surrogate primary key field is the
LISTING table, still using the LISTING# field as its pri-
mary key.
LISTING# is likely to be some type of auto counter anyway, and so a surrogate key is not nec-
essary in this case. Figure 10-6 still has composite primary key fields.
Figure 10-6: Using surrogate keys as primary keys with identifying relationships.
One important point about Figure 10-6 is that the three category tables, plus the buyer and seller table,
have additional identifier fields as primary keys (for example,
BUYER.BUYER_ID). However, these tables
still have their original primary key fields, now no longer as the primary key (for example,
BUYER.BUYER).
The identifiers have become surrogates (replacements) for the original string datatype names.
Identifying relationships in Figure 10-6 imply that a child table record is specifically identified by a parent
table record, through the connection between parent table primary key, and child table foreign key.
Seller_History
Seller
seller_id
seller
popularity_rating
join_date
address
return_policy

international
payment_methods
seller_id (FK)
buyer
comment_date
listing#
comments
Buyer_History
buyer_id (FK)
seller
comment_date
listing#
comments
Buyer
buyer_id
buyer
popularity_rating
join_date
address
Category_Primary
primary_id
primary
secondary
Category_Secondary
primary_id (FK)
secondary_id
Category_Tertiary
prim
ary_id (FK)
secondary_id (FK)

tertiary_id
tertiary
Listing
primary_id (FK)
secondary_id (FK)
tertiary_id (FK)
seller_id (FK)
buyer_id (FK)
listing#
description
image
start_date
listing_date
currency
starting_price
reserve_price
buy_now_price
number_of_bids
winning_price
Bid
primary_id (FK)
secondary_id (FK)
tertiary_id (FK)
seller_id (FK)
buyer_id (FK)
listing# (FK)
bidder
bid_price
bid_date
271

Creating and Refining Tables During the Design Phase
16_574906 ch10.qxd 11/4/05 10:46 AM Page 271
272
Chapter 10
Identifying versus Non-Identifying Relationships
One more improvement that can be made is to change all the identifying relationships back to
non-identifying relationships, where appropriate, as shown in Figure 10-7. Using non-identifying
relationships (represented by dotted lines in Figure 10-7), child tables are no longer uniquely identified
by parent table primary key values (by the parent table primary key). Notice how much simpler the ERD
in Figure 10-7 has become. Also, notice how the
BID, SELLER_HISTORY, and BUYER_HISTORY tables are
still related to their parent tables using identifying relationships (solid, non-dotted lines in Figure 10-7).
The two history tables do not have primary keys of their own. This means they can contain duplicated
records for each. In other words, a buyer can contain many history entries, and, therefore, each buyer can
have many
BUYER_HISTORY records. The BID table does not have its own exclusive primary key field
because it is a table resolving the many-to-many relationship between the
LISTING and BID tables.
Figure 10-7: Using surrogate keys as primary keys with some non-identifying relationships.
Parent Records without Children
A parent record without children is where a parent table can have records, such that child related records
are not required, within child tables. For example, there is no reason why a secondary category should
always be created for a primary category. In other words, a primary category can be created where no
secondary or tertiary categories exist for that primary category. For the online auction house, it depends
on how categories are organized. It may be that some types of secondary and tertiary categories sell
extremely well. This could perhaps make them primary categories as well. This particular complication
is ignored.
Seller_History
Seller
seller_id

seller
popularity_rating
join_date
address
return_policy
international
payment_methods
seller_id (FK)
buyer
comment_date
listing#
comments
Buyer_History
buyer_id (FK)
seller
comment_date
listing#
comments
Buyer
buyer_id
buyer
popularity_rating
join_date
address
Category_Primary
primary_id
primary
secondary
Category_Secondary
secondary_id

primary_id (FK)
tertiary
Category_Tertiary
tertiary_id
secondary_id (FK)
Listing
buyer_id (FK)
seller_id (FK)
secondary_id (FK)
tertiary_id (FK)
description
image
start_date
listing_days
currency
starting_price
reserve_price
buy_now_price
number
_of_bids
winning_price
Bid
buyer_id (FK)
listing# (FK)
bidder
bid_price
bid_date
listing#
16_574906 ch10.qxd 11/4/05 10:46 AM Page 272

×