SQL PROGRAMMING STYLE- P29 ppsx


4.3 Using Scales

Absolute and ratio scales are also called extensive scales because they
deal with quantities, as opposed to the remaining scales, which are
intensive because they measure qualities. Quantities can be added and
manipulated together, whereas qualities cannot. Table 4.1 describes the
different types of scales and their attributes.
The origin for the absolute scale is numeric zero, and the natural
functions are simple arithmetic. However, things are not always this
simple. Temperature has an origin point at absolute zero, and its natural
functions average heat over mass. This is why you cannot defrost a
refrigerator, which is at 0 degrees Celsius, by putting a chicken whose
body temperature is 35 degrees Celsius inside of it. The chicken does not
have enough mass relative to heat. However, a bar of white-hot steel will
do a nice job.

4.4 Scale Conversion

Scales can be put in a partial order based on the permissible
transformations:

Table 4.1 Scale properties

Type of Scale   Natural Ordering   Natural Origin   Functions   Example
Nominal         No                 No               No          City names (“Atlanta”)
Categorical     No                 No               No          Species (dog, cat)
Absolute        Yes                Yes              Yes         Eggs (dozen)
Ordinal         Yes                No               No          Preferences (agree 1 to 5 scale)
Rank            Yes                Yes              No          Contests (win, place, show)
Interval        Yes                No               Yes         Time (hours, minutes)
Ratio           Yes                Yes              Yes         Length (meters), Mass (grams)

78 CHAPTER 4: SCALES AND MEASUREMENTS



An attribute might not fit exactly into any of these scales. For
example, you might mix nominal and ordinal information in a single scale,
such as in questionnaires that have several nonresponse categories. It is
also common to have scales that mix ordinal and interval information by
assuming the attribute is really a smooth monotone function. Subjective
rating scales (“strongly agree,” “agree,” . . . “strongly disagree”) have no
equally spaced intervals between the ratings, but there are statistical
techniques to ensure that the difference between two intervals is within
certain limits. A binary variable is at least an interval scale, and it might
be a ratio or absolute scale, if it means that the attribute exists or does
not exist.
The important principle of measurement theory is that you can
convert from one scale to another only if they are of the same type and
measure the same attribute. Absolute scales do not convert, which is why
they are called absolute scales. Five apples are five apples, no matter how
many times you count them or how you arrange them on the table.
Nominal scales are converted to other nominal scales by a mapping
between the scales.
That means you look things up in a table. For example, I can convert
my English city names to Polish city names with a dictionary. The
problem comes when there is not a one-to-one mapping between the
two nominal scales. For example, English uses the word “cousin” to
identify the offspring of your parents’ siblings, and tradition treats them
all pretty much alike.
Chinese language and culture have separate words for the same
relations based on the genders of your parents’ siblings and the age
relationships among them (e.g., the oldest son of your father’s oldest
brother is a particular type of cousin and you have different social
obligations to him). Something is lost in translation.
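
As a minimal sketch of nominal conversion by table lookup, the Python below maps English city names to Polish ones and checks whether the mapping is one-to-one. The dictionary contents and the function names (convert_nominal, is_one_to_one, invert) are illustrative assumptions, not something taken from the text.

```python
# Hypothetical lookup table: English city names -> Polish city names.
ENGLISH_TO_POLISH = {
    "Warsaw": "Warszawa",
    "Cracow": "Kraków",
    "Krakow": "Kraków",  # two English spellings share one Polish name
    "Paris": "Paryż",
}


def convert_nominal(value, mapping):
    """Convert a nominal value by table lookup; unknown values raise KeyError."""
    return mapping[value]


def is_one_to_one(mapping):
    """True when no two source values share the same target value."""
    return len(set(mapping.values())) == len(mapping)


def invert(mapping):
    """Build the reverse lookup, refusing when the mapping is not one-to-one."""
    if not is_one_to_one(mapping):
        raise ValueError("mapping is not one-to-one; reverse conversion is ambiguous")
    return {target: source for source, target in mapping.items()}
```

Because two English spellings collapse onto one Polish name, invert() refuses to build the reverse table. That is the "lost in translation" problem in miniature.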

Ordinal scales are converted to ordinal scales by a monotone
function. That means you preserve the ordering when you convert.
Looking at the Mohs Scale for Hardness (MSH) for geologists, I can pick
another set of minerals, plastics, or metals to scratch, but rock samples
that were definitely softer than others are still softer. Again, there are
problems when there is not a one-to-one mapping between the two scales.
My new scale may be able to tell the difference between rocks that the
MSH could not tell apart.
Rank scales are converted to rank scales by a monotone function that
preserves the ordering, like ordinal scales. Again, there are problems
when there is not a one-to-one mapping between the two scales. For
example, different military branches have slightly different ranks that
don’t quite correspond to each other.
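
The ordinal and rank conversions described above can be sketched in Python as a check that a conversion never reverses an ordering. The ten reference minerals are the standard Mohs ones. The three-level "coarse" scale and the helper names are assumptions made up for this illustration.

```python
# The ten reference minerals of the Mohs scale, softest to hardest.
MOHS = {
    "talc": 1, "gypsum": 2, "calcite": 3, "fluorite": 4, "apatite": 5,
    "orthoclase": 6, "quartz": 7, "topaz": 8, "corundum": 9, "diamond": 10,
}


def to_coarse(mohs_value):
    """Hypothetical three-level scale: 1 = soft, 2 = medium, 3 = hard."""
    if mohs_value <= 3:
        return 1
    if mohs_value <= 6:
        return 2
    return 3


def is_monotone(convert, values):
    """True when convert never reverses the order of two input values."""
    converted = [convert(v) for v in sorted(values)]
    return all(a <= b for a, b in zip(converted, converted[1:]))


def is_one_to_one(convert, values):
    """True when distinct input values always stay distinct after conversion."""
    distinct = set(values)
    return len({convert(v) for v in distinct}) == len(distinct)
```

Here to_coarse is monotone, so softer samples never come out harder, but it is not one-to-one: talc and gypsum differ on the MSH and look equal on the coarse scale.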
In both the nominal and the ordinal scales, the problem was that
things that looked equal on one scale were different on another. This has
to do with range and granularity, which were discussed in section 4.1.1 of
this chapter.
Interval scales are converted to interval scales by a linear function;
that is, a function of the form y = a × x + b. This preserves the ordering
but shifts the origin point when you convert. For example, I can convert
temperature from degrees Celsius to degrees Fahrenheit using the
formula F = (9.0 ÷ 5.0 × C) + 32.
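
A short Python sketch of the linear conversion may help here. The helper name linear and the sample temperatures are assumptions for illustration only. It shows that ordering and ratios of intervals survive the conversion, while ratios of the values themselves do not, because the origin moves.

```python
import math


def linear(a, b):
    """Return the interval-scale conversion y = a * x + b as a function."""
    def convert(x):
        return a * x + b
    return convert


celsius_to_fahrenheit = linear(9.0 / 5.0, 32.0)

# Ratios of raw values are NOT preserved: 20 C is not "twice" 10 C in Fahrenheit.
ratio_in_celsius = 20.0 / 10.0
ratio_in_fahrenheit = celsius_to_fahrenheit(20.0) / celsius_to_fahrenheit(10.0)

# Ratios of intervals ARE preserved: equal steps stay equal steps.
step_low = celsius_to_fahrenheit(20.0) - celsius_to_fahrenheit(10.0)
step_high = celsius_to_fahrenheit(30.0) - celsius_to_fahrenheit(20.0)
equal_steps = math.isclose(step_low, step_high)
```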
Ratio scales are converted to ratio scales by a constant multiplier,
because both scales have the same ordering and origin point. For
example, I can convert from pounds to kilograms using the formula
k = 0.4536 × p, where p is the weight in pounds and k is the weight in
kilograms. This is why people like to use ratio scales.
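
As a rough Python illustration of a ratio-scale conversion, the sketch below uses the rounded factor 0.4536 kilograms per pound from the text. The constant and function names are assumptions. Zero maps to zero and ratios between values are preserved, which is what a constant multiplier guarantees.

```python
import math

KG_PER_POUND = 0.4536  # rounded factor quoted in the text


def pounds_to_kilograms(pounds):
    """Ratio-scale conversion: one constant multiplier, no offset."""
    return KG_PER_POUND * pounds


def kilograms_to_pounds(kilograms):
    """The inverse conversion divides by the same constant."""
    return kilograms / KG_PER_POUND


# Something twice as heavy in pounds is still twice as heavy in kilograms.
ratio_preserved = math.isclose(
    pounds_to_kilograms(200.0) / pounds_to_kilograms(100.0), 2.0
)
```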

4.5 Derived Units

Many of the scales that we use are not primary units but rather derived
units. These measures are constructed from primary units, such as miles
per hour (time and distance) or square miles (distance and distance).
You can use only ratio and interval scales to construct derived units.
If you use an absolute scale with a ratio or interval scale, you are
dealing with statistics, not measurements. For example, using weight
(ratio scale) and the number of people in New York (absolute scale), we
can compute the average weight of a New Yorker, which is a statistic, not
a unit of measurement.
The SI measurements use a basic set of seven units (i.e., meter for
length, kilogram for mass, second for time, ampere for electrical current,
kelvin for temperature, mole for amount of substance, and candela for
luminous intensity) and construct derived units from them. ISO standard
2955 (“Information
processing—Representation of SI and other units for use in systems with
limited character sets”) has a notation for expressing SI units in ASCII
character strings. The notation uses
parentheses, spaces, multiplication (shown by a period), division
(shown by a solidus, or slash), and exponents (shown by numerals
immediately after the unit abbreviation). There are also names for most
of the standard derived units. For example, “10 kg.m/s2” converts to
10 newtons (the unit of force), written as “10 N” instead.
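
One way to picture derived units is as a set of exponents over base units. The Python sketch below builds the newton from kilogram, meter, and second, then prints it using only the conventions described above (period for multiplication, solidus for division, trailing numeral for an exponent). It is not a full implementation of ISO 2955. The dictionary representation and the function names are assumptions made for this example.

```python
def multiply(u, v):
    """Multiply two units by adding the exponents of their base units."""
    result = dict(u)
    for base, exponent in v.items():
        result[base] = result.get(base, 0) + exponent
    return {base: exp for base, exp in result.items() if exp != 0}


def divide(u, v):
    """Divide two units by subtracting the exponents of the divisor."""
    return multiply(u, {base: -exp for base, exp in v.items()})


def to_ascii(unit):
    """Render a unit with '.' for multiplication and '/' for division."""
    def term(base, exponent):
        return base if exponent == 1 else f"{base}{exponent}"

    top = [term(b, e) for b, e in unit.items() if e > 0]
    bottom = [term(b, -e) for b, e in unit.items() if e < 0]
    numerator = ".".join(top) if top else "1"
    if not bottom:
        return numerator
    denominator = ".".join(bottom)
    if len(bottom) > 1:  # parenthesize a compound divisor (assumed convention)
        denominator = f"({denominator})"
    return f"{numerator}/{denominator}"


KILOGRAM, METER, SECOND = {"kg": 1}, {"m": 1}, {"s": 1}
ACCELERATION = divide(METER, multiply(SECOND, SECOND))
NEWTON = multiply(KILOGRAM, ACCELERATION)
```

With these definitions, to_ascii(NEWTON) yields the string kg.m/s2, the long form of the unit that the text abbreviates as N.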

4.6 Punctuation and Standard Units

A database stores measurements as numeric data represented in a binary
format, but when the data is input or output, a human being wants
readable characters and punctuation. Punctuation identifies the units
being used and can be used for prefix, postfix, or infix symbols. It can
also be implicit or explicit.
If I write $25.15, you know that the unit of measure is the dollar
because of the explicit prefix dollar sign. If I write 160 lbs., you know
that the unit of measure is pounds because of the explicit postfix
abbreviation for the unit. If I write 1989 MAR 12, you know that this is a
date because of the implicit infix separation among month, day, and
year, achieved by changing from numerals to letters, and the optional
spaces. The ISO and SQL defaults represent the same date, using explicit
infix punctuation, with 1989-03-12 instead. Likewise, a column header
on a report that gives the units used is explicit punctuation.
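
The display examples above can be sketched in Python as formatting functions applied to values that are stored without punctuation. The function names and the fixed month table are assumptions for illustration. The month abbreviations are hard-coded so that the result does not depend on the machine's locale.

```python
from datetime import date
from decimal import Decimal

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def display_dollars(amount):
    """Explicit prefix punctuation: a dollar sign in front of the amount."""
    return f"${amount:.2f}"


def display_pounds(weight):
    """Explicit postfix punctuation: the unit abbreviation after the number."""
    return f"{weight} lbs."


def display_date_letters(d):
    """Implicit infix punctuation: switching from numerals to letters."""
    return f"{d.year} {MONTHS[d.month - 1]} {d.day:02d}"


def display_date_iso(d):
    """Explicit infix punctuation: the ISO 8601 form with dashes."""
    return d.isoformat()


stored_amount = Decimal("25.15")   # no dollar sign is stored
stored_date = date(1989, 3, 12)    # no separators are stored
```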
Databases do not generally store punctuation. The sole exception
might be the proprietary MONEY or CURRENCY data type found in
many SQL implementations as a vendor extension. Punctuation wastes
storage space, and the units can be represented in some internal format
that can be used in calculations. Punctuation is only for display.
It is possible to put the units in a column next to a numeric column
that holds their quantities, but this is awkward and wastes storage space.
If everything is expressed in the same unit, the units column is
redundant. If things are expressed in different units, you have to convert
them to a common unit to do any calculations. Why not store them in a
common unit in the first place? The DBA has to be sure that all data in a
column of a table is expressed in the same units before it is stored. There
are some horror stories about multinational companies sending the same
input programs used in the United States to their European offices,
where SI and English measurements were mixed into the same database
without conversion.
Ideally, the DBA should be sure that data is kept in the same units in
all the tables in the database. If different units are needed, they can be
provided in a VIEW that hides the conversions (thus the office in the
United States sees English measurements and the European offices see SI
units and date formats; neither is aware of the conversions being done
for it).
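
A minimal sketch of a VIEW that hides a unit conversion follows, run through Python's standard sqlite3 module so it can be tried without a server. The table, view, and column names are hypothetical, and the factor 0.4536 kilograms per pound is the rounded value used earlier in the chapter. The base table stores kilograms only, and the view presents pounds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base table: every weight is stored in one common unit (kilograms).
CREATE TABLE Shipments
(shipment_id INTEGER NOT NULL PRIMARY KEY,
 shipment_wgt_kg REAL NOT NULL CHECK (shipment_wgt_kg > 0));

-- View for the United States office: same rows, weights shown in pounds.
CREATE VIEW Shipments_US
AS SELECT shipment_id,
          shipment_wgt_kg / 0.4536 AS shipment_wgt_lb
     FROM Shipments;

INSERT INTO Shipments (shipment_id, shipment_wgt_kg) VALUES (1, 45.36);
""")

kilograms = conn.execute(
    "SELECT shipment_wgt_kg FROM Shipments WHERE shipment_id = 1"
).fetchone()[0]
pounds = conn.execute(
    "SELECT shipment_wgt_lb FROM Shipments_US WHERE shipment_id = 1"
).fetchone()[0]
```

Users of Shipments_US never see the conversion factor, and a change to the factor or its precision is made in one place.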

4.7 General Guidelines for Using Scales in a Database

The following are general guidelines for using measurements and scales
in a database and not firm, hard rules. You will find exceptions to all of
them.
1. In general, the more unrestricted the permissible transformations on
a scale are, the more restricted the statistics. Almost all statistics
are applicable to measurements made on ratio scales, but only
a limited group of statistics may be applied to measurements
made on nominal scales.
2. Use CHECK() clauses on table declarations to make sure that only
the allowed values appear in the database. If you have the
CREATE DOMAIN feature of SQL-92, use it to build your
scales. Nominal scales would have a list of possible values;
other scales would have range checking. Likewise, use the
DEFAULT clauses to be sure that each scale starts with its
origin value, a NULL, or a default value that makes sense.
3. Declare at least one more decimal place than you think you will need
for your smallest units. In most SQL implementations, rounding
and truncation will improve with more decimal places.
The downside of SQL is that precision and the rules for
truncation and rounding are implementation dependent, so a
query with calculations might not give the same results on
another product. However, SQL is more merciful than older
file systems, because the DBA can ALTER a numeric column so
it will have more precision and a greater range without
destroying existing data or queries. Host programs may have to
be changed to display the extra characters in the results,
however.
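
Guideline 2 can be sketched as a table declaration with CHECK() and DEFAULT clauses, again run through Python's sqlite3 module. SQLite has no CREATE DOMAIN, so the constraints sit directly on the columns. The table, the column names, and the permitted values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE SurveyResponses
(respondent_id INTEGER NOT NULL PRIMARY KEY,
 -- nominal scale: an explicit list of allowed values
 city_name VARCHAR(20) NOT NULL
   CHECK (city_name IN ('Atlanta', 'Boston', 'Chicago')),
 -- ordinal scale: range check, default at the neutral midpoint
 agreement_score INTEGER DEFAULT 3 NOT NULL
   CHECK (agreement_score BETWEEN 1 AND 5),
 -- ratio scale: range check, default at the origin
 body_wgt_kg DECIMAL(6,2) DEFAULT 0.00 NOT NULL
   CHECK (body_wgt_kg >= 0.00))
""")


def try_insert(sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


INSERT_DEFAULTS = "INSERT INTO SurveyResponses (respondent_id, city_name) VALUES (?, ?)"
INSERT_SCORE = ("INSERT INTO SurveyResponses (respondent_id, city_name, agreement_score) "
                "VALUES (?, ?, ?)")

accepted = try_insert(INSERT_DEFAULTS, (1, "Atlanta"))       # defaults fill the scales
bad_city = try_insert(INSERT_DEFAULTS, (2, "Gotham"))        # not in the nominal list
bad_score = try_insert(INSERT_SCORE, (3, "Boston", 9))       # outside the 1 to 5 range
```

The first row is accepted and picks up the default score and weight. The other two are rejected by the database itself rather than by application code.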
