SQL PROGRAMMING STYLE

CHAPTER 5: DATA ENCODING SCHEMES

5.3 General Guidelines for Designing Encoding Schemes

These are general guidelines for designing encoding schemes in a
database, not firm, hard rules. You will find exceptions to all of them.

5.3.1 Existing Encoding Standards

The use of existing standard encoding schemes is always recommended.
If everyone uses the same codes, data will be easy to transfer and collect
uniformly. Also, someone who sat down and did nothing else but work
on this scheme probably did a better job than you could while trying to
get a database up and running.
As a rule of thumb, if you don’t know the industry in which you are
working, ask a subject-area expert. Although that sounds obvious, I have
worked on a media library database project where the programmers
actively avoided talking to the professional librarians who were on the
other side of the project. As a result, recordings were keyed on GUIDs
and there were no Schwann catalog numbers in the system. If you cannot
find an expert, then Google for standards. First, check to see if ISO has a
standard, then check the U.S. government, and then check industry
groups and organizations.

5.3.2 Allow for Expansion

Allow for expansion of the codes. The ALTER statement can create more
storage when a single-character code becomes a two-character code, but
it will not change the spacing on the printed reports and screens. Start
with at least one more decimal place or character position than you think
you will need. Visual psychology makes “01” look like an encoding,
whereas “1” looks like a quantity.
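
As a minimal sketch of this guideline, suppose a warehouse has only seven zones today. The table name, column names, and data below are hypothetical, and the SIMILAR TO predicate is Standard SQL that not every product supports; substitute whatever pattern test your product offers.

```sql
-- Hypothetical example: one digit would be enough today, but the
-- code is declared with two positions so that a tenth zone does not
-- force a change to the column, the reports, or the screens.
CREATE TABLE WarehouseZones
(zone_code CHAR(2) NOT NULL PRIMARY KEY
   CHECK (zone_code SIMILAR TO '[0-9][0-9]'),  -- exactly two digits
 zone_name VARCHAR(30) NOT NULL);

INSERT INTO WarehouseZones (zone_code, zone_name)
VALUES ('01', 'Receiving'),
       ('02', 'Cold storage');
```

Because the code is a fixed-width string with a leading zero, it reads as an identifier rather than a count, and '02' will still sort ahead of '10' when the tenth zone arrives.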

5.3.3 Use Explicit Missing Values to Avoid NULLs

Rationale:

Avoid using NULLs as much as possible by putting special values in
the encoding scheme instead. SQL handles NULLs differently than
values, and NULLs don’t tell you what kind of missing value you are
dealing with.
All-zeros are often used for missing values and all-nines for
miscellaneous values. For example, the ISO gender codes are 0 =
Unknown, 1 = Male, 2 = Female, and 9 = Not Applicable. “Not applicable”
means a lawful person, such as a corporation, which has no gender.
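
The gender codes just described could be kept in a small table such as the following sketch. The table and column names are assumptions made for illustration; the four values and their meanings are the ones listed above.

```sql
-- Explicit missing values: 0 says "we do not know" and 9 says
-- "the question does not apply", so the column never needs a NULL.
CREATE TABLE SexCodes
(sex_code SMALLINT NOT NULL PRIMARY KEY
   CHECK (sex_code IN (0, 1, 2, 9)),
 sex_definition VARCHAR(15) NOT NULL);

INSERT INTO SexCodes (sex_code, sex_definition)
VALUES (0, 'Unknown'),
       (1, 'Male'),
       (2, 'Female'),
       (9, 'Not applicable');
```

A query can now tell an unknown gender apart from a corporation, which a single NULL could not do.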

Versions of FORTRAN before the 1977 standard read blank
(unpunched) columns in punchcards as zeros, so if you did not know a
value, you skipped those columns and punched them later, when you
did know. Likewise, using encoding schemes with leading zeros was a
security trick to prevent blanks in a punchcard from being altered. The
FORTRAN 77 standard fixed its “blank versus zero” problem, but it lives
on in SQL in poorly designed systems that cannot tell a NULL from a
blank string, an empty string, or a zero.
The use of all-nines or all-Z’s for miscellaneous values will make those
values sort to the end of the screen or report. NULLs sort either always
to the front or always to the rear, but which way they sort is
implementation defined.


Exceptions:

NULLs cannot always be avoided. For example, consider the column
“termination_date” in the case of a newly hired employee. The use of a
NULL makes computations easier and correct. The code simply leaves
the NULL date as it is or uses COALESCE (some_date,
CURRENT_TIMESTAMP), as appropriate.
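
A minimal sketch of this exception follows. The Personnel table and its columns are hypothetical, and CURRENT_DATE is used in place of CURRENT_TIMESTAMP only because the column is declared as a DATE.

```sql
-- Hypothetical personnel table: a current employee has no
-- termination date yet, so the column is allowed to be NULL.
CREATE TABLE Personnel
(emp_id INTEGER NOT NULL PRIMARY KEY,
 hire_date DATE NOT NULL,
 termination_date DATE,  -- NULL means "still employed"
 CHECK (hire_date <= termination_date));

-- Treat open-ended employment as running through today.
SELECT emp_id, hire_date,
       COALESCE (termination_date, CURRENT_DATE) AS employed_thru_date
  FROM Personnel;
```

The CHECK() constraint accepts a NULL termination date, because a constraint rejects a row only when its search condition is FALSE, not when it is UNKNOWN.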

5.3.4 Translate Codes for the End User

As much as possible, avoid displaying pure codes to users, but try to
provide a translation for them. Translation in the front end is not required
for codes that are common and well known to users. For example,
most people do not need to see the two-letter state abbreviation written
out in words. At the other extreme, however, nobody could read the
billing codes used by several long-distance telephone companies.
A part of translation is formatting the display so that it can be read by
a human being. Punctuation marks, such as dashes, commas, currency
signs, and so forth, are important. However, in a tiered architecture,
display is done in the front end, not the database. Trying to add leading
zeros or commas to numeric values in the database is a common newbie error.
Suddenly, everything is a string and you lose all temporal and numeric
computation ability.
These translation tables are one kind of auxiliary table; we will discuss
other types later. They do not model an entity or relationship in the
schema but are used like a function call in a procedural language. The
general form for these tables is:

CREATE TABLE SomeCodes
(encode <datatype> NOT NULL PRIMARY KEY,
 definition <datatype> NOT NULL);

Sometimes you might see the definition as part of the primary key or
a CHECK() constraint on the “encode” column, but because these are
read-only tables, which are maintained outside of the application, we
generally do not worry about having to check their data integrity in the
application.
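
To make the general form concrete, here is a sketch of one translation table and the join that uses it "like a function call." The shipping-method codes, the Orders table, and all of the names are hypothetical.

```sql
-- A hypothetical translation table that follows the general form.
CREATE TABLE ShipMethodCodes
(ship_method_code CHAR(2) NOT NULL PRIMARY KEY,
 ship_method_definition VARCHAR(30) NOT NULL);

INSERT INTO ShipMethodCodes (ship_method_code, ship_method_definition)
VALUES ('00', 'Unknown'),
       ('01', 'Ground'),
       ('02', 'Overnight air'),
       ('99', 'Miscellaneous');

-- A hypothetical table that stores only the code.
CREATE TABLE Orders
(order_nbr INTEGER NOT NULL PRIMARY KEY,
 ship_method_code CHAR(2) DEFAULT '00' NOT NULL);

-- The join plays the role of the lookup function: the front end
-- receives the readable definition instead of the bare code.
SELECT O.order_nbr, S.ship_method_definition
  FROM Orders AS O
       INNER JOIN ShipMethodCodes AS S
       ON S.ship_method_code = O.ship_method_code;
```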

5.3.4.1 One True Lookup Table

Sometimes a practice is both so common and so stupid that it gets a
name, and, much like a disease, if it is really bad, it gets an abbreviation.
I first ran into the One True Lookup Table (OTLT) design flaw in a
thread on a CompuServe forum in 1998, but I have seen it rediscovered
in newsgroups every year since.
Instead of keeping each set of encodings and their definitions in a table
of its own, the OTLT puts all of the encodings in one huge table. The
schema for this table looks like this:

CREATE TABLE OneTrueLookupTable
(code_type INTEGER NOT NULL,
 encoding VARCHAR(n) NOT NULL,
 definition VARCHAR(m) NOT NULL,
 PRIMARY KEY (code_type, encoding));

In practice, m and n are usually something like 255 or 50, the default
values particular to the SQL product.
The rationale for having all encodings in one table is that it would let
the programmer write a single front-end program to maintain all of the
encodings. This method really stinks, and I strongly discourage it.
Without looking at the following paragraphs, sit down and make a list of
all the disadvantages of this method and see if you found anything that I
missed. Then read the following list:
1. Normalization. The real reason that this approach does not
work is that it is an attempt to violate first normal form. I can
see that these tables have a primary key and that all of the
columns in a SQL database have to be scalar and of one data
type, but I will still argue that it is not a first normal form table.
The fact that two domains use the same data type does not
make them the same attribute. The extra “code_type” column
changes the domain of the other columns and thus violates first
normal form because the column is not atomic. A table should
model one set of entities or one relationship, not hundreds of
them. As Aristotle said, “To be is to be something in particular;
to be nothing in particular is to be nothing.”
2. Total storage size. The total storage required for the OTLT is
greater than the storage required for the one encoding, one
table approach because of the redundant encoding type
column. Imagine having the entire International Classification
of Diseases (ICD) and the Dewey Decimal system in one table.
With one table per encoding, only the small tables that are
actually needed have to be brought into main storage, while the
entire OTLT has to be pulled in and paged in and out of main
storage to jump from one encoding to another.
3. Data types. All encodings are forced into one data type, which
has to be a string as long as the longest encoding, present or
future, used in the system, but VARCHAR(n) is not always the
best way to represent data. The first thing that happens is that
someone inserts a huge string that looks right on the screen but
has trailing blanks or an odd character to the far right side of
the column. The table quickly collects garbage.
CHAR(n) data often has advantages for access and storage
in many SQL products. Numeric encodings can take advantage
of arithmetic operators for ranges, check digits, and so forth
with CHECK() clauses. Dates can be used as codes that are
translated into holidays and other events. Data types are not a
one-size-fits-all affair. If one encoding allows NULLs, then all
of them must in the OTLT.
4. Validation. The only way to write a CHECK() clause on the
OTLT is with a huge CASE expression of the form:

CREATE TABLE OneTrueLookupTable
(code_type CHAR(n) NOT NULL
   CHECK (code_type IN (<type 1>, ..., <type n>)),
 encoding VARCHAR(n) NOT NULL
   CHECK (CASE WHEN code_type = <type 1>
                    AND <validation 1>
               THEN 1
               ...  -- assume that your SQL product can support
                    -- a huge CASE expression
               WHEN code_type = <type n>
                    AND <validation n>
               THEN 1
               ELSE 0 END = 1),
 definition VARCHAR(m) NOT NULL,
 PRIMARY KEY (code_type, encoding));


This means that validation is going to take a long time,
because every change will have to be considered by all the
WHEN clauses in this oversized CASE expression until the
SQL engine finds one that tests TRUE. You also need to add a
CHECK() clause to the “code_type” column to be sure that the
user does not create an invalid encoding name.
5. Flexibility. The OTLT is created with one column for the
encoding, so it cannot be used for n-valued encodings where
n > 1. For example, if I want to translate (longitude, latitude)
pairs into a location name, I would have to carry an extra
column.
6. Maintenance. Different encodings can use the same value, so
you constantly have to watch which encoding you are working
with. For example, both the ICD and Dewey Decimal system
have three digits, a decimal point, and three digits.
7. Security. To avoid exposing rows in one encoding scheme to
unauthorized users, the OTLT has to have VIEWs defined on it
that restrict users to the “code_type” values they are allowed to
update. At this point, some of the rationale for the single table
is gone, because the front end must now handle VIEWs in
almost the same way it would handle multiple tables. These
VIEWs also have to have the WITH CHECK OPTION clause,
so that users do not make a valid change that is outside the
scope of their permissions.
8. Display. You have to CAST() every encoding for the front end.
This can be a lot of overhead and a source of errors when the
same monster string is CAST() to different data types in
different programs.
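
For contrast with the OTLT, the one encoding, one table approach that the list above argues for might look like the following sketch. All three tables, their names, and their ranges are hypothetical; they are chosen only to echo the data type, validation, and flexibility points.

```sql
-- Each encoding gets its own table, its own data type, and its own
-- simple constraint; no huge CASE expression and no CAST() needed.

-- Numeric encoding: the range check is a single predicate.
CREATE TABLE SeverityCodes
(severity_code SMALLINT NOT NULL PRIMARY KEY
   CHECK (severity_code BETWEEN 0 AND 9),
 severity_definition VARCHAR(30) NOT NULL);

-- A date used as a code that translates into an event.
CREATE TABLE Holidays
(holiday_date DATE NOT NULL PRIMARY KEY,
 holiday_name VARCHAR(30) NOT NULL);

-- A two-valued encoding, which the single-column OTLT cannot hold
-- without carrying an extra column.
CREATE TABLE Locations
(longitude DECIMAL(9,6) NOT NULL
   CHECK (longitude BETWEEN -180 AND 180),
 latitude DECIMAL(8,6) NOT NULL
   CHECK (latitude BETWEEN -90 AND 90),
 location_name VARCHAR(30) NOT NULL,
 PRIMARY KEY (longitude, latitude));
```

Privileges can then be granted table by table, so the VIEWs described under the security point are not needed just to keep one encoding away from another.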

5.3.5 Keep the Codes in the Database

A part of the database should have all of the codes stored in tables. These
tables can be used to validate input, to translate codes in displays, and as
part of the system documentation.
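
One common way to use a code table to validate input is a FOREIGN KEY reference, sketched below. The claim-status encoding and both tables are hypothetical.

```sql
-- The hypothetical code table is the single source of valid values.
CREATE TABLE ClaimStatusCodes
(claim_status CHAR(2) NOT NULL PRIMARY KEY,
 claim_status_definition VARCHAR(30) NOT NULL);

INSERT INTO ClaimStatusCodes (claim_status, claim_status_definition)
VALUES ('00', 'Unknown'),
       ('01', 'Filed'),
       ('02', 'Paid'),
       ('99', 'Miscellaneous');

-- The referencing table accepts only codes found in the code table;
-- a row with any other claim_status is rejected by the constraint.
CREATE TABLE Claims
(claim_nbr INTEGER NOT NULL PRIMARY KEY,
 claim_status CHAR(2) DEFAULT '00' NOT NULL
   REFERENCES ClaimStatusCodes (claim_status)
   ON UPDATE CASCADE);

INSERT INTO Claims (claim_nbr, claim_status)
VALUES (1001, '01');
```

The same table then serves the other two purposes named above: a join on it translates the code for display, and its rows document what each code means.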
