White Paper
An Architect's Evaluation of Form and
Function - the Dimensional Data Model

Donavon Gooldy, Senior Principal

Tuesday, May 27, 2014



Proprietary and Confidential - ©2014 Clarity Solution Group, Inc.
Table of contents
1 Introduction
2 Model Characteristics
3 Dimensional Model Architectural Origins
3.1 The Entity Relationship Model Form
3.2 An Organized Performance Architecture Response
4 The Dimensional Model Form
5 The Dimensional Model Function
6 The Limits of Single Form Design
6.1 Function Limiting Characteristics of the Dimensional Form
6.1.1 The Dimensional Form Does Not Extend Well
6.1.2 The Dimensional Form Is Not Flexible
6.1.3 The Form Does Not Describe the Business
7 Applying the Dimensional Form without Requirements
7.1 Client A
7.2 Client B
7.3 Common Characteristics
7.4 Bottom-Up Warehouse Design
8 System Architecture Form to Fulfill Multiple Functions
8.1 Combining Model Forms
8.2 Integrating Model Form with Technology Form
9 Conclusion



1 Introduction

"It is the pervading law of all things organic and inorganic, of all things physical and
metaphysical, of all things human and all things superhuman, of all true manifestations of the
head, of the heart, of the soul, that the life is recognizable in its expression, that form ever
follows function. This is the law."
Louis Sullivan

“Form follows function - that has been misunderstood. Form and function should be one,
joined in a spiritual union.”
Frank Lloyd Wright

To be an architect of information solutions is to understand the concept of form following
function intuitively, as a matter of nature, because design (the creation of form) is about
enabling informational function. Taking the title "architect" affirms that one follows a
conscious, methodical, design-based decision process for aligning form with functional needs.
As one examines form's relationship to function within the dimensional model, the evaluation
of the model form must be based not solely on Sullivan's statement but on Wright's: form not
only follows function, but function also follows form.

The concept of form and function unity highlights that form is not only based on function, but
also limits it, many times strictly. Form and function are bound together in a cause and effect
relationship; function is the cause of the form, while form both facilitates function and limits it.
When considering the data warehouse function, one considers the overall goal: to deliver
information, allowing the business to measure its activity and understand the impacts of its
actions in the marketplace. This high-level statement of function, though, is far too general for
the evaluation of model form. As will be demonstrated, a more detailed understanding of
system functionality is needed before determining model form application.
The function-limiting impact of form is often overlooked in design, particularly data model
design. By implementing a specific design form, are the broader limits on function considered?
What system design steps are needed to mitigate those limitations?
Too often data practitioners apply the form they know best, the latest form they've come to
appreciate or a form that is deemed a "best practice" in their circles.
True architects are not practitioners of "best practices". They practice the application of forms
to function based on principles derived from cause and effect analysis.
The architect studies the relationship of form and function, of cause and effect, and then
applies forms specific to the required functions. The architect deals with the complexity of the
client's multi-functional needs and devises multi-component solution forms to deliver
functionality that single-form solutions cannot deliver.

2 Model Characteristics
One generally thinks about a model form in terms of certain characteristics. Through the
evaluation of these characteristics and examination of model form, it becomes evident how they
align with, support and limit function in relationship to data and information delivery.
o The model's ability to extend
  - to extend a data model for new content/capability without disruption and
    redesign of processes
o The model's ability to be flexible
  - to support multiple purposes or functions
o The model's ability to describe the business and subjects within the corporate
  structure
  - to document the business using data
o The model's ability to support any valid business question
  - to answer business questions without specific design structuring
  - not a matter of ease or performance but a matter of ability
o The model's ability to efficiently and quickly answer business questions (report query
  performance)
  - to provide acceptable query performance for corporate decision support
    and analysis
o The model's ability to demonstrate business performance
  - to measure business performance

The critical examination of the limiting aspects of the dimensional model gives the architect the
foundational principles necessary to understand the application of the dimensional form in
Information Architecture solutions.

3 Dimensional Model Architectural Origins

The dimensional model form is designed to greatly simplify database optimization for queries
that would otherwise be applied against an Entity Relationship (ER) model. Because the
dimensional model is a design response used to overcome ER form limits, the ER form and its
characteristics must first be examined as a basis for comparison.
3.1 The Entity Relationship Model Form

Codd stated the following goals for the further normalization of relations:
1. To free the collection of relations from undesirable insertion, update and deletion
dependencies;
2. To reduce the need for restructuring the collection of relations, as new types of data are
introduced, and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are
liable to change as time goes by.
- E.F. Codd, "Further Normalization of the Data Base Relational Model"
Each of Codd's goals not only provides insight into the function of the ER model, but is also
instructive as to the reasons for the dimensional model form.
The Data Architect produces an ER model that describes the business through "Entities"
representing each of the objects, actors, organizational fictions, contracts, business activities
and other subjects in the business landscape. If it can be named as a subject, it must be
represented as an entity within the model. Each entity is given an identifier known as the
primary key. Additional attributes are added to describe only the primary key.
Foreign key relationships document each business relationship existing between entities. These
relationships are instilled in the model logically rather than by direct data association. This
distinction is fundamental to the examination of the characteristics of the ER and dimensional
model forms and of each form's ability to deliver specific functionality.
This examination won't delve into the application of normalization rules, except to state that
many modelers deal with normalization intuitively as a matter of entity definition and
evaluation of attributes when creating the ER model. Normalization rules represent a method of
thinking regarding the evaluation of data content in model development. Normalization
ensures all entities are defined purely and that all business relationships within the model are
defined logically rather than by physical association.






As one examines Codd's goals, it is evident that they align with some of the model
characteristics previously discussed. Those characteristics are:
  - extensibility
  - flexibility
  - ability to describe the subject
  - ability to support any valid business question
Codd's fourth goal may appear somewhat cryptic, but it is central to an architect's
understanding of both model forms and to the support of Codd's preceding goals.
In a fully normalized model there is no statistical data relationship bias that emphasizes one
relationship or eliminates another, because relationships are implemented logically. Data that
is not normalized is associated physically on the same row, creating a bias. When data is
organized this way, certain questions can be answered, while others cannot.
Applying rules of normalization ensures no bias exists for one type of business question or
another.
One can ask any valid business question of a normalized model. Based on the model's
logically implemented (foreign key) relationships, one will always get the answer. There is no
need to know future questions. It will always work if each entity germane to the question is
represented within the model and each relationship between the entities is documented
logically. As long as one is willing to write the necessary queries and wait, the model will
answer.
Therefore, the normalized entity relationship model form is designed for flexibility, to answer
any business question. It eliminates relationship bias by describing each entity purely and
documenting all business relationships logically, providing data relationship neutrality.
Extensibility is another outcome of eliminating relational bias, as will be seen later.
The normalized form that gives us this functionality also limits function. To answer more than
simple business questions, complex queries need to be written with many joins that follow
relational paths and with correlated sub-queries that identify specific content within data sets.
The query may need to do mixed aggregation to common GROUP BY levels as well as use outer
joins, complicating query optimization. Temp tables and multiple query steps may need to be
used in some cases. In data warehousing, all of this complex query optimization results in issues
of access and join serialization, along with heavy I/O from reading, buffering and sorting large
volumes of data.
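As a minimal sketch of this trade-off, the following Python example uses the standard sqlite3
module against a small normalized schema. Every table name, column name and data value is a
hypothetical illustration and does not come from any system discussed in this paper. Neither
question below was anticipated by the schema, yet both can be answered by following the foreign
key paths; the cost is the number of joins each question requires.

```python
import sqlite3

# Hypothetical normalized schema; all names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_channel (channel_id INTEGER PRIMARY KEY, channel_name TEXT);
CREATE TABLE customer      (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE subscription  (subscription_id INTEGER PRIMARY KEY,
                            customer_id INTEGER REFERENCES customer,
                            channel_id  INTEGER REFERENCES sales_channel);
CREATE TABLE invoice       (invoice_id INTEGER PRIMARY KEY,
                            subscription_id INTEGER REFERENCES subscription,
                            invoice_date TEXT);
CREATE TABLE invoice_line  (invoice_id INTEGER REFERENCES invoice,
                            line_no INTEGER, amount REAL,
                            PRIMARY KEY (invoice_id, line_no));

INSERT INTO sales_channel VALUES (1, 'Retail'), (2, 'Online');
INSERT INTO customer VALUES (10, 'Ann'), (11, 'Bob');
INSERT INTO subscription VALUES (100, 10, 1), (101, 10, 2), (102, 11, 2);
INSERT INTO invoice VALUES (1000, 100, '2014-04-15'), (1001, 101, '2014-04-20'),
                           (1002, 102, '2014-05-02');
INSERT INTO invoice_line VALUES (1000, 1, 30.0), (1000, 2, 5.0),
                                (1001, 1, 20.0), (1002, 1, 45.0);
""")

# Question 1: revenue by sales channel, reached through three joins.
revenue_by_channel = conn.execute("""
    SELECT ch.channel_name, SUM(il.amount) AS revenue
    FROM invoice_line il
    JOIN invoice       i  ON i.invoice_id = il.invoice_id
    JOIN subscription  s  ON s.subscription_id = i.subscription_id
    JOIN sales_channel ch ON ch.channel_id = s.channel_id
    GROUP BY ch.channel_name
    ORDER BY ch.channel_name
""").fetchall()

# Question 2: revenue by customer, using a different path over the same relations.
revenue_by_customer = conn.execute("""
    SELECT c.customer_name, SUM(il.amount) AS revenue
    FROM invoice_line il
    JOIN invoice      i ON i.invoice_id = il.invoice_id
    JOIN subscription s ON s.subscription_id = i.subscription_id
    JOIN customer     c ON c.customer_id = s.customer_id
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()

print(revenue_by_channel)
print(revenue_by_customer)
```

At warehouse volumes, each additional join and aggregation step in queries of this kind adds to
the optimization and I/O burden described above.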
No one wants to wait hours for BI report results. In the early days of data warehousing, on at
least one RDBMS, the longer the query ran, the more likely it would end in error due to the
database's concurrency architecture.
3.2 An Organized Performance Architecture Response

At the time of the first edition of Ralph Kimball's The Data Warehouse Toolkit, most data
warehouses were hosted on SMP database servers. These types of servers do not scale
parallel processing linearly as MPP clusters do, and often led to a variety of very limiting data
forms that were intended to improve query performance.
The introduction of the dimensional model provided an organized, systematic design basis for
a performance architecture form leading to predictable query optimization.
It also addressed another issue at the time: a dimensional model is much simpler to write
queries against. Hand coding queries against an ER model for any sort of complicated reporting
requires a good deal of skill, experience and time. While users still need to write some queries
manually, Business Intelligence software has reduced that need by supporting metadata-driven
abstraction that interprets the physical data model for the user.
When dimensional models are designed properly for reporting, they require only selection of
the attributes and measures required, direct joins to the dimensions needed, application of
WHERE or JOIN filters, appropriate aggregate functions and GROUP BY clauses (and perhaps a
HAVING clause).
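The query shape just described can be sketched as follows, again using Python's standard
sqlite3 module. The star schema here (fact_revenue, dim_date, dim_channel) and its data are
assumptions made purely for illustration.

```python
import sqlite3

# Hypothetical star schema; all names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_channel  (channel_key INTEGER PRIMARY KEY, channel_name TEXT);
CREATE TABLE fact_revenue (date_key INTEGER REFERENCES dim_date,
                           channel_key INTEGER REFERENCES dim_channel,
                           revenue REAL);
INSERT INTO dim_date VALUES (20140415, '2014-04'), (20140420, '2014-04'),
                            (20140502, '2014-05');
INSERT INTO dim_channel VALUES (1, 'Retail'), (2, 'Online');
INSERT INTO fact_revenue VALUES (20140415, 1, 35.0), (20140420, 2, 20.0),
                                (20140502, 2, 45.0), (20140502, 1, 5.0);
""")

STAR_QUERY = """
SELECT d.month, c.channel_name, SUM(f.revenue) AS revenue  -- attributes and a measure
FROM fact_revenue f
JOIN dim_date    d ON d.date_key = f.date_key              -- direct joins to dimensions
JOIN dim_channel c ON c.channel_key = f.channel_key
WHERE d.month >= '2014-04'                                 -- filter
GROUP BY d.month, c.channel_name                           -- report grouping
HAVING SUM(f.revenue) >= 20                                -- optional HAVING clause
ORDER BY d.month, c.channel_name
"""
star_rows = conn.execute(STAR_QUERY).fetchall()
print(star_rows)
```

Every join runs directly from the fact table to a dimension, so no relational path has to be
followed and no correlated sub-query is needed.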

4 The Dimensional Model Form

Dimensional modeling achieves its performance advantage by designing denormalizations
into data organizations specific to answering a limited range of business questions. These
denormalizations take the form of placing data in physical relationships and eliminating the
logical business-based relationships that follow an entity-to-entity-to-entity form, in favor of
more direct relationships between report grouping references and business metrics.
In other words, the dimensional model form creates explicit relationship biases to simplify
queries, reduce I/O and eliminate query optimization complexity, which delivers answers to
business questions efficiently and quickly.
The pattern of denormalization follows the form of a central table called a fact table
containing one or more business measurements called facts. The facts may be sourced from a
variety of transactional and reference sources, all of which may be used in combination to
answer certain classes of business questions.
The fact table row always has the context of a time period, either a date or a date and time
together. The time period may be either a date or a higher-level time period, such as week,
month, quarter or year. Facts may be transactional, a point-in-time snapshot of the state of
metrics, or a period-based aggregate.
The fact table also has foreign key relationship attributes relating the fact rows to reference
tables called dimensions. Dimensions may represent a single entity identity of data, but
typically contain attributes from, or derived from, multiple entities describing a subject.
Typically there is at least one dimension associated with the fact table that has as its basis
an entity with a natural business-based relationship to the business activity represented in the
facts of the fact table. There are usually other dimension relationships that are one or two
entities removed from the business activity documented in the fact table. There may also be
additional dimensions related to the facts that must be derived by processing other business
activity.
Keep in mind that if a source does not actually document all of the data relationships, for
example the customer's origination sales channels, then these relationships must be derived
from processing business activity records, such as sales or service orders.
One must also build into the process and structure of the star schema all of the complex
processing that would be needed against the entity relationship model to bring data up to a
common simplified form, fit for answering functionally similar business questions.

The philosophy of the dimensional model is to do all of the processing once to form a common
basis for a class of business questions or analysis, storing the results of that process in the star
schema so that BI queries avoid that complex process at report runtime. It is a "process once,
use it many times" approach.
The end result should be a star schema capable of delivering measurements based on simple
SELECT, JOIN, WHERE and GROUP BY statements.
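The "process once, use it many times" approach can be sketched under simple assumptions. In
the Python example below (standard sqlite3 module), a hypothetical src_order table records the
channel of each order but nothing states which channel originated each customer. A load step
derives that relationship once, from the earliest order, and stores it in the star schema; the
report query then needs only a join and a GROUP BY. The schema, the load_star function and the
derivation rule are all illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical source: orders carry a channel, but no table documents
-- which channel originated each customer.
CREATE TABLE src_order (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                        channel TEXT, order_date TEXT, amount REAL);
INSERT INTO src_order VALUES
  (1, 10, 'Retail', '2014-01-05', 50.0),
  (2, 10, 'Online', '2014-02-10', 20.0),
  (3, 11, 'Online', '2014-01-20', 40.0),
  (4, 11, 'Retail', '2014-03-01', 10.0);

CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, origination_channel TEXT);
CREATE TABLE fact_sales   (customer_key INTEGER, order_month TEXT, amount REAL);
""")

def load_star(conn):
    """Run the complex derivation once and store the result in the star schema."""
    # Assumed rule: the origination channel is the channel of the earliest order.
    conn.execute("""
        INSERT INTO dim_customer (customer_key, origination_channel)
        SELECT o.customer_id, o.channel
        FROM src_order o
        WHERE o.order_date = (SELECT MIN(o2.order_date) FROM src_order o2
                              WHERE o2.customer_id = o.customer_id)
    """)
    # Aggregate order amounts to the customer-by-month grain of the fact table.
    conn.execute("""
        INSERT INTO fact_sales (customer_key, order_month, amount)
        SELECT customer_id, substr(order_date, 1, 7), SUM(amount)
        FROM src_order
        GROUP BY customer_id, substr(order_date, 1, 7)
    """)
    conn.commit()

load_star(conn)

# Report time: no correlated sub-query, only SELECT, JOIN and GROUP BY.
sales_by_origination = conn.execute("""
    SELECT c.origination_channel, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    GROUP BY c.origination_channel
    ORDER BY c.origination_channel
""").fetchall()
print(sales_by_origination)
```

The correlated sub-query runs once at load time rather than in every report that groups sales
by origination channel.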

5 The Dimensional Model Function

One concludes that the dimensional form is a performance architecture intended to improve
report query performance. So far, however, a full understanding of why dimensional models
perform so well and what limits them has yet to be exposed.
The star schema design is created to measure business. It is created with a business function
orientation, as opposed to the subject area orientation of the ER model.
The form is one of centralization of a series of measures (facts) surrounded by attributes that
give business context to those measurements.
While some consumers may refer to the content as subjects, the real orientation is focused on
business reporting and analysis. It may be Sales Analysis or Risk Analysis, but these are
organized to support specific business functions and not to provide general data as a subject.
Instead of presenting data as it exists in an ER model, or in the source, data is organized to
make decisions.
Some of Webster's definitions of the word "Information" are:
1. "knowledge obtained from investigation, study, or instruction"
2. "INTELLIGENCE, NEWS"
3. "FACTS, DATA"

Architects do not design dimensional models that deliver measurements (facts) randomly as
data. The purpose is to deliver organized information to the business clients that supports the
client's business decision making function.
To be "information," measures have to be organized and presented with functional context;
without that, it is simply data. Providing data is what an ER model does. It delivers it without
bias. It is up to the consumer to discern how to make it provide information. In a dimensional
model, much of that work of organizing data as information is performed in advance of the
report execution.
Therefore, a primary function for which the dimensional form is employed is that of a
performance architecture built upon the direct structuring of information for specific business
functions.
It is important to make this distinction because there are other means of implementing
performance architectures for delivering information that do not rely on data denormalizations
in a database.
This is not to say that dimensional model content is the final state of the information
organization. In systems that employ the dimensional form, it represents the foundational state
of information that is further organized into reporting to deliver KPIs, comparisons, trends,
graphics and other business oriented presentations of information.

6 The Limits of Single Form Design

All that has been examined to this point represents the foundation for the remaining
examination.
Architects realize that there are limits to form. An automobile maker creates a variety of
forms for different functional needs. Each of those forms has recognizable limits. A Freightliner
semi-truck with a raised roof sleeper, Hendrickson AIRTEK axles, and front suspensions is
designed for long distance freight hauling in comfort, but it is not functional for the morning
commute. One might drive it downtown, but the fuel consumption empties the wallet and,
guaranteed, it won't fit in the parking garage.

Clearly design form has limits. The architect's role is to understand those design form limits and
produce system designs using integrated design forms to fulfill functional requirements.
The forms available for examination include not only model forms, but also a wide variety of
technology-based design forms.
6.1 Function Limiting Characteristics of the Dimensional Form
The dimensional model is a powerful performance architecture form for the delivery of
information to businesses when properly applied. Like the ER form, the dimensional form has
limitations in its recognized function.
6.1.1 The Dimensional Form Does Not Extend Well
Ability to extend is a relative evaluation comparing one form to another. The evaluation is
really about how much disruption to process, existing data and retesting is involved in existing
implementations.
Purveyors of the dimensional model sometimes state that extending the dimensional form is as
easy as adding new attributes to dimensions, or new dimensions and dimensional keys to an
existing fact table from a specific point in time forward, and backfilling attributes and foreign
keys with the standard defaults for NULL or Not Applicable definition.
The reality of dimensional model extension is rather different.
1. Changes in Processing
Even when this approach can be taken, the addition of new content means there is a change
in existing processing. Aside from additional sourcing, the processing typically involves
integration with content sourced from multiple entity sources. If the target is an existing fact
with new dimensionality, the amount of disruption will depend on whether the new dimension
needs to be set in the primary key or not.



2. Effects of Historic Context Changes
This also assumes that the historic data context does not need to be updated to reflect the
new content. If it does, then Type-2 Dimension keys will need to be restated to take into
account a new source of temporal attribute change. This in turn means that dimension key
references on the facts need to be restated, which results in complete rebuilds of fact tables as
a consequence.
3. Cost of Disruption Often Avoided
These are often the primary reasons why star schemas in some implementations don't get
updated, or are not updated for long periods of time, even when business needs change. The
development and testing time needed to implement these changes is painful for some clients.
4. Comparison to the Entity Relationship Form
The ER model form is one that was designed explicitly for extensibility. First, by modeling based
on pure entities and identifying keys and attributes that describe those entities, there is a very
good chance of ensuring that entity definitions are complete and less likely to need new
attributes in the future.
ER modeling is based on subject area organization. The subject area is typically associated
with a single prime entity that serves as a parent and ancestor to all other content in the
subject area. Modeling should always be based on parent dependencies. Because of this,
entities left for later are child entities. As new entities are added to the model using foreign
keys from existing parent entities rather than adding a parent, the need to add new foreign
keys to existing entities is eliminated.
Because there is no disruption to parent entities in the addition of a new one, only new load
processes are added instead of changing existing processes of the surrounding entities.
If new historic attributes are added to existing entities of an ER-modeled EDW, there is no
disruption to the key structures cascading to other referencing entities. In an ER-modeled EDW
the entity's primary key (a surrogate key) substitutes only for the natural key and not for the
temporal key of the attribute's historic context. The temporal context of the data is not
transferred through the foreign key reference, and therefore a change in history never forces
a change to foreign key values in any related entity.
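The contrast between points 2 and 4 above can be sketched in plain Python. The example makes
several simplifying assumptions that are not part of the original discussion: a single
customer, ISO date strings, one Type-2 row per effective date, and surrogate keys that are
retained for existing version rows and newly issued for new ones. The helper names issue_keys
and fact_keys are hypothetical.

```python
from bisect import bisect_right

# --- Dimensional form: a Type-2 surrogate key identifies one VERSION of the customer ---

def issue_keys(version_starts, existing=None):
    """Keep surrogate keys already issued; issue new keys for new version rows."""
    keys = dict(existing or {})
    next_key = max(keys.values(), default=0) + 1
    for start in version_starts:
        if start not in keys:
            keys[start] = next_key
            next_key += 1
    return keys

def fact_keys(fact_dates, keys):
    """For each fact date, return the key of the version in effect on that date."""
    starts = sorted(keys)
    return [keys[starts[bisect_right(starts, d) - 1]] for d in fact_dates]

plan_changes = ["2014-01-01", "2014-03-01"]          # existing Type-2 attribute
fact_dates = ["2014-01-15", "2014-02-15", "2014-03-15", "2014-04-15"]

keys = issue_keys(plan_changes)
keys_before = fact_keys(fact_dates, keys)

# A new temporal attribute arrives with its own change history.
region_changes = ["2014-02-01"]
keys = issue_keys(sorted(set(plan_changes) | set(region_changes)), existing=keys)
keys_after = fact_keys(fact_dates, keys)

# Facts whose dimension key reference must be restated.
restated_facts = [d for d, b, a in zip(fact_dates, keys_before, keys_after) if b != a]

# --- ER form: the surrogate key stands for the natural key only ---

subscriptions = [{"subscription_id": 100, "customer_key": 1},
                 {"subscription_id": 101, "customer_key": 1}]
customer_history = [(1, "plan", "2014-01-01", "Basic"),
                    (1, "plan", "2014-03-01", "Plus")]

fks_before = [s["customer_key"] for s in subscriptions]
customer_history.append((1, "region", "2014-02-01", "West"))  # new historic attribute
fks_after = [s["customer_key"] for s in subscriptions]

print(keys_before, keys_after, restated_facts)
print(fks_before == fks_after)
```

In this sketch the new attribute splits an existing dimension version, so the fact dated
within the split period must be re-pointed to a new key. In the ER form the new history rows
sit beside the entity and no foreign key changes.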
6.1.2 The Dimensional Form Is Not Flexible
The star schema is built to answer the business questions for which it was designed. It is a
performance architecture based on the creation of relational biases and the inclusion of
processing that supports its performance relative to the range of business analysis and reports
intended.
An individual star schema has a limited range of business questions that can be asked of it. This
is not only due to the fact that the measures or facts represented on the fact table relate to
limited business activity, but also due to limits imposed by the reporting role context of the
dimensional relationships.

For example, in the cellular industry, reporting revenue and network usage together is often
needed, since network usage partially drives revenue. Revenue is realized by monthly bill
cycle, which overlaps two months.
Since reporting covers many millions of customers or subscriptions, detailed billing and usage
data must be aggregated to the bill cycle level.
In typical reporting, a bill cycle context is often used. If, however, reporting needs a monthly
context, a separate fact table is required.
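A small Python sketch shows why the two contexts cannot share one aggregate. It assumes a
hypothetical bill cycle that starts on the 15th of each month and one unit of usage per day;
neither assumption comes from a real billing system.

```python
from collections import defaultdict
from datetime import date, timedelta

def bill_cycle_start(d, cycle_day=15):
    """Start date of the bill cycle containing d (assumed to begin on cycle_day)."""
    if d.day >= cycle_day:
        return date(d.year, d.month, cycle_day)
    prev_month_end = date(d.year, d.month, 1) - timedelta(days=1)
    return date(prev_month_end.year, prev_month_end.month, cycle_day)

# Hypothetical detail: one unit of usage per day from 15 April to 14 June 2014.
daily_usage = {date(2014, 4, 15) + timedelta(days=i): 1 for i in range(61)}

by_cycle = defaultdict(int)   # grain of a bill cycle fact table
by_month = defaultdict(int)   # grain of a calendar month fact table
for day, units in daily_usage.items():
    by_cycle[bill_cycle_start(day)] += units
    by_month[(day.year, day.month)] += units

print(dict(by_cycle))
print(dict(by_month))
```

May usage is spread across both bill cycle rows. Once the detail has been aggregated to the
bill cycle grain, the monthly totals can no longer be recovered from it, which is why the
monthly context needs its own fact table built from the detail.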
Additionally, one can only measure facts by the contextual role-based relationships that have
been foreseen and included by way of a dimensional relationship to the fact. If one wants to
measure the success of a sales channel by revenue generation, then a sales channel
dimension has to be associated to the fact based on the channel the subscription was sold
through. The same would be true of measuring program success and measuring promotion
effectiveness.
To ask additional questions or even broaden the question, additional star schemas capable of
answering those questions need to be designed or modifications to existing designs need to be
made. It must be recognized that new questions in the future, if not just variations of old ones,
will likely require further star schema development.
6.1.3 The Form Does Not Describe the Business
The fact that the dimensional form does not describe subject content is at the heart of the
form's inflexibility. Yes, the form has attribution describing certain dimensions, but one cannot
look at a dimensional model and derive an understanding of how the business works in
relationship to the entities that make up the business. And one cannot discern the business
relationships that exist between business entities.

This is the major limiting factor to its broader use when Enterprise Data Warehousing needs to
function as a central data repository solution, able to deliver any form of business data
required for any use.
Once data is dimensionally cast for business function, as opposed to modeled for business
description, the denormalizations of data and relationships eliminate the ability to understand
the basis of the denormalizations and how they were applied in the first place. Only a
reporting relationship can be determined rather than a business data relationship.
There may be a model in the architect's head as to how the entities represented in the
dimension are actually related, but there is nothing in the dimensional model that describes
the business.
The guidance of an ER model, either documented or undocumented (mental understanding),
is required to know how the entities and relationships describe the business model in order to
build other star schemas.
The star schema is incapable of supporting functions broader than its purpose. While it is
possible to join facts from two or more star schemas, the business context of the question
asked of the combined fact tables is limited to the fact tables' common dimensional
context. Because of this, it is possible that the context of the questions asked of the combined
fact tables is more limited than that of a single fact table.
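This narrowing of context can be sketched in Python with two hypothetical fact tables, one
for revenue and one for network usage. The dimension names, the data values and the roll_up
helper are illustrative assumptions only.

```python
from collections import defaultdict

# Hypothetical fact rows: each is (dimension values, measure).
fact_revenue = [
    ({"month": "2014-04", "channel": "Retail", "plan": "Basic"}, 35.0),
    ({"month": "2014-04", "channel": "Online", "plan": "Plus"}, 20.0),
    ({"month": "2014-05", "channel": "Online", "plan": "Plus"}, 45.0),
]
fact_usage = [
    ({"month": "2014-04", "cell_site": "A1", "plan": "Basic"}, 300.0),
    ({"month": "2014-04", "cell_site": "B7", "plan": "Plus"}, 120.0),
    ({"month": "2014-05", "cell_site": "A1", "plan": "Plus"}, 500.0),
]

def dimensions(fact):
    """Names of the dimensions carried by a fact table."""
    return set(fact[0][0])

def roll_up(fact, dims):
    """Aggregate a fact table to the grain of the given dimensions."""
    totals = defaultdict(float)
    for row_dims, measure in fact:
        totals[tuple(row_dims[d] for d in dims)] += measure
    return dict(totals)

# Only dimensions present on BOTH facts can give context to the combined result.
common = sorted(dimensions(fact_revenue) & dimensions(fact_usage))
revenue = roll_up(fact_revenue, common)
usage = roll_up(fact_usage, common)
combined = {key: (revenue.get(key, 0.0), usage.get(key, 0.0))
            for key in sorted(set(revenue) | set(usage))}

print(common)
print(combined)
```

Revenue alone can be reported by channel and usage alone by cell site, but the combined
result can be reported only by month and plan.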

7 Applying the Dimensional Form without Requirements

It is evident from the examination of the dimensional form that its design needs to be guided
by detailed use requirements. Without those requirements, one cannot properly identify
needed business measurements and align those measurements with business function. This
alignment of measures to business function is crucial to not only attaining the performance
goals of the dimensional model, but also producing a usable model.
Yet practitioners still try to deliver data warehouses based solely on "data requirements"
because it is believed that an EDW should be dimensional. These practitioners don't realize
that no rules dictate a specific model organization for an EDW except those of form and
function.
In his career, the author has reviewed a number of EDW implementations, some of which
completely failed and others breathing on political life support. The following two examples
outline the consequences of delivering dimensional form without the instruction of use
requirements.
7.1 Client A
Client A had a data warehouse built by a consulting firm. The "architect" who drafted the
solution document explained that the dimensional form was chosen because "everyone
knows that an EDW has to be dimensional." Reporting and use requirements were never
documented, but a dimensional model was implemented nonetheless. Most of the common
characteristics, documented in the next section, were present in the implementation.
Client A experienced a user revolt. IT delivered an EDW at significant expense that made no
sense to the business users. The EDW could not answer business questions the users posed
because it did not conform to any real business use requirements.
Instead, the users insisted on using the EDW source staging area because it was somewhat
normalized and left the data in a state still capable of delivering on their use requirements.
7.2 Client B
Client B also hired a consulting firm to build its EDW system. Client B recognized the system had
great difficulty delivering reports, but, due to the client's large investment, the system was not
deemed a failure.
A more detailed evaluation reveals several errors related to the data model. The system's
"architect" did not first design an overall solution framework based on the client's needs, but
instead took a piecemeal approach and built on a component by component basis.
The "architect" first delivered an ODS that was Entity Relationship modeled. In this case the
ODS had no other purpose than to serve as a data integration area. It fed no other system
and provided no other functionality.


In an attempt to create value for the ODS, the architect decided that the EDW had to be
dimensionally modeled, because the optics of an identically modeled ODS and EDW would
point to the fact that the ODS was nothing more than a very expensive staging area.
Because the implementation was based on a bottom-up approach and IT was well insulated
from the business, the client could not gather use requirements.
A dimensional model with all of the Common Characteristics of a "dimensional" form without
a purpose (detailed in the next section) was developed and implemented for the EDW.
An application that presented EDW-sourced data to the client's customers was then
developed. The application needed the historic perspective of the EDW; however, the
application's data requirements dictated normalized data. Therefore additional processing
had to be developed to properly re-normalize the "dimensionally" denormalized data of the
EDW.
The application's model was remarkably similar to that of the ODS.
The EDW system was deployed on a very robust MPP server cluster. It had the horsepower to
deliver from an ER model and answer the questions the business was starting to ask. However,
because the data was organized in a form guided not by requirements, but by making the data
"look" dimensional, the architect and modeler had baked in relational biases that were difficult
or impossible to resolve by query alone.
After reviewing the model and the use requirements the client had started to see, the author
told Client B that the data marts could NOT be virtualized but needed to be physically
implemented. The reasoning was delivered with a certain amount of political sensitivity
because the IT client could not afford to have the project viewed as a failure.
In the author's opinion, the system was an architectural failure. Client B had implemented the
EDW on a database server capable of virtualizing much of the information delivery through
SQL processing for business reporting from an ER model. The architect and modeler had
created such disorder that Client B now had to spend a lot more money to write processing
that would undo the denormalization as part of creating the real reporting model that was
needed.
7.3 Common Characteristics

Both of these clients' implementations had the same application of an artificial dimensional
form applied to the warehouse, guided not by reporting or analytics requirements, but by a
modeler's imagination of how data could be "made dimensional" or how the client "might" use
the data. Since it has been established that the dimensional form needs to be aligned to
business function, the use of the term "artificial dimensional form" is warranted any time
dimensional model design is not guided by use requirements, which define the function.
In both of these examples the model practitioners did not understand that the dimensional
form only gains its performance architecture status when it is functionally aligned to deliver
actual reporting and analysis capabilities (usable information).

Most modelers who attempt to model dimensionally without requirements believe that by
shaping the data in a dimensional form they are somehow making it easier to use, and that
the users will be able to make their reporting work with the new model.
Typical patterns repeated by modelers taking this approach are evident in the following
features:
1. All facts containing measurement data are single-transaction-based facts
Transactions are automatically converted to facts because these are a source basis for
many business measures. Typically though, a good deal of business reporting does not look
at a single type of business activity in isolation.
For instance, in the cellular phone industry, reports combine subscription base, additions
and defections as well as contract renewals to provide context.
Combining multiple fact tables requires preplanning to ensure that a proper common
dimensional context supports the reporting requirement. Additionally, joining multiple fact
tables, unless planned for, may not support performance SLAs.
Because no use requirements exist, there is nothing to guide the modeler to produce facts
that would facilitate a specific business function or SLA.
Typically, fact tables created in this manner have a dimensional context based on
relationships that are immediately associated with the transaction in the source. They may
also associate the fact to additional dimensions based on entities that have a parent
relationship to those entities forming the basis of dimensions immediately associated with
the fact or transaction.
Finally, the modeler might create any derived relationships to the fact if they learn the
client finds that reference useful.
The point is that when real reporting requirements arise that demand additional
dimensional relationships and relationships based on additional role types, the fact table
needs to be rebuilt anyway.
2. A one-to-one relationship exists between a very large fact and a very large dimension
The condition arises when the modeler directly casts a transaction into a fact and the fact
has many non-measurement attributes with which to contend. The modeler believes they
cannot be left as degenerate dimensions on the fact, and the attributes have to go
somewhere.
Based on requirements, one would normally identify which of these are actually needed for
business consumption. Further, it would likely be found that some actually describe another
dimension or that they can be logically grouped into multiple junk dimensions.
The use of a one-to-one dimension is counterproductive to the function of the dimensional
model. I/O is the major factor in database performance. Additional joins in the star schema
when joining to small dimensions have minimal impact on performance when compared
to significant increases in I/O for multi-table joins based on large fact cardinality.

Measures used together in reporting are organized on the same fact table to eliminate
fact-to-fact joins, which represent additional I/O. A join of a one-to-one fact-to-dimension
relationship is no different than a fact-to-fact join, and that I/O will occur every time a user
needs as little as a single attribute from that dimension.
Joins do not by themselves significantly decrease performance in a star schema; I/O does.
The smaller the dimension, the more likely the dimension is to be cached in memory.
Unless the implementation is on a columnar database, this practice is counter to the
performance architecture function of the dimensional form.
3. Presence of “Factless” Facts

What does one do with reference data not directly related to a fact table if one thinks the
reference data belongs in the data warehouse, yet has no supporting reporting
requirements? One creates a Factless Fact!
Turn all those entities into dimensions, eliminate the business-rule-based relationships
between them, create a “fact” table, and associate all the entity-based dimensions with
one another by way of the fact, thus obscuring an understanding of the natural
relationship of the entities to one another. By creating a data bias not informed by
requirements, the modeler has no idea whether the bias will actually be useful to the
function of the star.
The legitimate purpose of a factless fact is to produce row counts based on a requirement
for reporting. Factless facts are typically rare in a model based on real use requirements.
They are common in dimensional models driven by the desire to cast data in the
dimensional “form” where no requirements exist.
4. Many-to-Many relationships are left unresolved
Data architects work with business users to eliminate many-to-many relationships in a
dimensional model because it is understood that if improperly used, the many-to-many
causes a duplication of measurement rows in the output.
There are several techniques to eliminate the many-to-many relationship. The first is to
create specific role based relationships of single members of the many-to-many dimension.
The other is to ―flatten out‖ the many to a single row that represents the mix of the
combinations that the many represent. This has to be done based on requirements and
working with the business to produce a representation that provides them the information
they need in the report.
A typed many-to-many that can identify a single typed row is the last resort for inclusion of
a many-to-many in a dimensional model.
To leave many-to-many dimensional relationships unresolved is to leave a reporting
accident waiting to happen and the dimension useless for reporting purposes.
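The duplication of measurement rows described in item 4 can be seen in a small sketch. The
tables and names below (sales_fact, account_holder_bridge, holder_role) are hypothetical and
are not taken from either client; SQLite is used only because it ships with Python. The sketch
compares the true total with a join across an unresolved many-to-many, and then with a
role-based relationship that selects a single member.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical fact: one row per sale, measured in whole currency units.
    CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, account_id INTEGER, amount INTEGER);
    -- Hypothetical many-to-many: an account can have several holders.
    CREATE TABLE account_holder_bridge (account_id INTEGER, holder_name TEXT, holder_role TEXT);

    INSERT INTO sales_fact VALUES (1, 100, 50), (2, 100, 25), (3, 200, 10);
    INSERT INTO account_holder_bridge VALUES
        (100, 'Ann', 'PRIMARY'), (100, 'Bob', 'SECONDARY'), (200, 'Cara', 'PRIMARY');
""")

# The total the business expects to see.
true_total = conn.execute("SELECT SUM(amount) FROM sales_fact").fetchone()[0]

# Unresolved many-to-many: every sale on account 100 is repeated once per holder.
naive_total = conn.execute("""
    SELECT SUM(f.amount)
    FROM sales_fact f
    JOIN account_holder_bridge b ON b.account_id = f.account_id
""").fetchone()[0]

# Role-based relationship: exactly one holder row per account, so no duplication.
role_total = conn.execute("""
    SELECT SUM(f.amount)
    FROM sales_fact f
    JOIN account_holder_bridge b
      ON b.account_id = f.account_id AND b.holder_role = 'PRIMARY'
""").fetchone()[0]

print(true_total, naive_total, role_total)
```

Which role is the right one to single out is a business decision, which is why the text insists
that the resolution be driven by requirements.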


The consequence of misapplying the dimensional form is the creation of a good deal of
unnecessary dysfunction for the client later, when they actually need to use the warehouse for
business. These implementations occur whenever the modeler has to provision data in the
warehouse based on a bottom-up approach (data requirements first, reporting requirements
sometime later) and believes a dimensional model must be delivered.
7.4 Bottom-Up Warehouse Design
It is always desirable to base any data warehouse implementation on actual use
requirements, even when this means extensive business consulting to help the organization
understand how best to measure its business.
However, there are a number of clients who cannot commit to the process of requirements
analysis that informs data warehouse design.
It should be recognized that many companies implementing a data warehouse or enabling
business intelligence for the first time are, in fact, just starting a process of transitioning
from an operational focus to one that is more strategic. They are determining how they should
measure or gain business insights, and how to focus attention on customers.
These clients can’t always determine their requirements at the early stages of the process, but
they do benefit from business data availability.
While the process of this focus shift progresses, reporting requirements change rapidly and the
data warehouse must be able to respond as the business identifies useful information.
Fulfilling a data requirement still needs to be guided by analysis that determines the subject
areas and sources necessary to enable specific business analysis capabilities. Doing so not
only ensures that the data delivered is useful, but also allows delivery of priority capabilities
first.
The data warehouse in these cases needs to be easily extendable to assimilate new subject
content when missing data is identified. As implied, it has to be flexible to answer questions not
yet thought of.
Flexibility, to answer any question, means that every subject area in the EDW can be cast and
recast in structure and content organization to meet any function or functions the business
has.
The functionality described here is that of a central business data repository. It is fulfilled by a
normalized Entity Relationship Model (most likely with modest denormalizations such as
allowing some repeating groups).
Data from the central repository, called the data warehouse or “Enterprise Data Warehouse”,
will feed the organized information components of the solution architecture. Some call these
components data marts; some call them information delivery. They are typically in the
dimensional form to efficiently deliver functionally aligned measurements to the business for
reporting and analysis.

In a top-down approach, where reporting and use requirements guide the design of the EDW,
information delivery structures are directly defined. However, in some cases there may still be
functional reasons that a central repository is needed.
Additionally, there are a number of ways to deliver dimensional capabilities with BI tools,
including BI cube functionality. It is a waste of company resources to deliver a star schema
with the sole purpose of creating a cube if the cube can be created from a query of an ER
source model.
When a client can articulate reporting requirements, and there are no other functions that
require an ER business-based model as the foundation of the warehouse, there is no reason for
the client to pay for more functionality than is needed.

8 System Architecture Form to Fulfill Multiple Functions

It is common for clients to have system requirements calling for the support of multiple
functions, as implied in prior sections. The functional distinction between a central business
data repository and organized information structures exists largely because form does
determine function, which was Wright’s point.
These two functions are in conflict with one another because the model form required for
each does not support the other’s function on the most common technology used for data
warehouses today, SMP servers.

Applying multiple forms to support the multiple functions in a system solution is the means by
which to deal with the limits of a single form.
8.1 Combining Model Forms
Bill Inmon’s Corporate Information Factory is a concept that was developed to deal with such
requirements. The central idea is to build a central historic data warehouse repository that is
entity relationship modeled for flexibility, extensibility and business descriptiveness. This is
the foundation form from which various information delivery structures can be created in
response to use requirements.
The advantage that this system architecture form has is that it can accomplish what two
individual model forms cannot by themselves.
With a well-documented ER business modeled foundation of the data warehouse in place,
any information delivery form can be cast and recast as requirements change, always more
easily than from an operational system source. This is because the data warehouse model is
based on business organization and presentation of the data, rather than how the operational
system stores and treats data. The differences between these perspectives can be significant.
Additionally the data warehouse can also contain a number of architectural features and
structures put in place to ease the delivery of common dimensional patterns used across
many fact tables.
Many times dimensional components such as dimensions and even simpler facts can be the
result of database or materialized views, eliminating the expense of ETL processing.
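As an illustration of dimensional components delivered as views rather than by ETL, the
following is a minimal sketch under stated assumptions. The ER tables (product_category,
product, order_line) and the view names (dim_product, fact_sales) are hypothetical. SQLite is
used only because it ships with Python; it supports plain views but not materialized views, so
a materialized variant would depend on the database engine actually deployed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized (ER) warehouse tables.
    CREATE TABLE product_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id INTEGER REFERENCES product_category(category_id)
    );
    CREATE TABLE order_line (
        order_line_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES product(product_id),
        quantity INTEGER,
        unit_price INTEGER
    );

    INSERT INTO product_category VALUES (1, 'Phones'), (2, 'Plans');
    INSERT INTO product VALUES (10, 'Handset A', 1), (11, 'Handset B', 1), (20, 'Unlimited', 2);
    INSERT INTO order_line VALUES (1, 10, 2, 100), (2, 11, 1, 150), (3, 20, 3, 40);

    -- Dimension as a view: flattens product and its parent category.
    CREATE VIEW dim_product AS
        SELECT p.product_id, p.product_name, c.category_name
        FROM product p
        JOIN product_category c ON c.category_id = p.category_id;

    -- Simple fact as a view: one row per order line with a derived measure.
    CREATE VIEW fact_sales AS
        SELECT order_line_id, product_id, quantity * unit_price AS sales_amount
        FROM order_line;
""")

# A star-style query written against the views; no ETL job populated them.
rows = conn.execute("""
    SELECT d.category_name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product d ON d.product_id = f.product_id
    GROUP BY d.category_name
    ORDER BY d.category_name
""").fetchall()

print(rows)
```

If requirements change, the views can be dropped and redefined without reloading data, which
is the sense in which the delivery form can be cast and recast from the ER foundation.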
The limits of this form are cost and time. This form is typically viewed as the most expensive
way to deliver a data warehouse and also as the slowest. While the observation regarding
cost is valid in the short run, the form can deliver more efficiency and value for the many
clients facing requirement changes as part of their typical business cycle.
As for time, the way the delivery is organized can greatly influence implementation. The ER
model form is built to absorb new content easily. This means such warehouses are built not
monolithically, but by subject area. Functional information delivery can drive the subject
content order of the EDW. Don’t wait for the entire EDW to be delivered to deliver reporting,
but rather prioritize subject content delivery to the EDW based on reporting requirements.


The key here is to define the architecture that fits the customer’s needs. The client should not
pay for an architecture they don’t need, nor should they rely on an architecture that doesn’t
fulfill their information management system requirements.
8.2 Integrating Model Form with Technology Form
Combining the ER form with an MPP database server architecture creates a high degree of
data warehouse flexibility.
The MPP form brings the ability to apply linearly scalable parallel processing to the data
warehouse. The technology form of the MPP architecture is its shared-nothing, multi-node
distributed computing environment. Once data is properly distributed, parallel query
optimization is far more straightforward than on SMP servers.
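To make “properly distributed” concrete, the following is a simplified sketch of the
shared-nothing idea, not any vendor’s actual algorithm. The node count, the modulo
distribution rule and the function names are all assumptions made for illustration. Each row
is assigned to exactly one node by its distribution key, each node aggregates only its own
rows, and the per-node results are then combined.

```python
from collections import defaultdict

NODE_COUNT = 4  # hypothetical cluster size


def distribute(rows, key, node_count=NODE_COUNT):
    """Assign each row to exactly one node based on its distribution key."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        # Modulo on an integer key stands in for a real hash function.
        nodes[row[key] % node_count].append(row)
    return nodes


def local_aggregate(node_rows, group_key, measure):
    """Aggregate on one node using only the rows that node owns."""
    totals = defaultdict(int)
    for row in node_rows:
        totals[row[group_key]] += row[measure]
    return dict(totals)


def merge(partials):
    """Combine per-node results; keys never collide when grouping on the distribution key."""
    combined = {}
    for partial in partials:
        combined.update(partial)
    return combined


rows = [{"customer_id": c, "amount": a}
        for c, a in [(1, 10), (2, 20), (5, 5), (1, 7), (6, 3), (2, 1)]]

nodes = distribute(rows, "customer_id")
partials = [local_aggregate(n, "customer_id", "amount") for n in nodes]
result = merge(partials)

print(result)
```

Because the grouping column is also the distribution key, all rows for a customer sit on one
node and no data has to move between nodes before aggregating. Grouping on a different
column would require redistributing rows or merging partial sums, which is why the choice of
distribution key matters.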
Teradata’s consulting organization has a bottom-up approach that emphasizes delivering
industry-based ER models. They use the power of the database engine to sidestep much of
the need for delivering the information-organized performance architecture that the
dimensional model form represents.
Instead, the implementations virtualize as much of the physical information organization as
possible with database views and create lightly processed materializations of information
organization structures where additional performance is needed to meet SLAs. This provides
the flexibility of the Corporate Information Factory with less development cost than the
approach of the prior section, which requires a full suite of physical dimensional schemas
supported by ETL processes.
ETL process development represents the most significant labor cost of warehouse system
development.
Clients pay a premium for MPP server technology. The technology exists to provide the ability
to perform significant parallel data processing. It makes some sense to use the technology to
allow for greater flexibility in the data warehouse. The MPP platform itself is a performance
architecture based on technology.
The architect needs to carefully consider the decision of forgoing extensibility and flexibility
when deploying a model based performance architecture designed for the SMP servers on
technology that is a performance architecture itself.
This is not to say that such an implementation is wrong, but implementing dimensional
models on such a platform as a default practice is not the practice of an architect.

9 Conclusion

The tool the architect works with is the knowledge of form’s function-enabling and
function-limiting characteristics. While this discussion centers on the relationship that model
form and function have with one another, the same principle of form’s support for function
and form’s limiting effect on function applies broadly across architectural disciplines,
whether in the evaluation of model form, technology form or even methodology.
To be an architect is to be a student of form and function and to apply form based on these
principles of form’s effects.
The architect’s role is to recognize the client’s needs and apply form based on those needs.
There is always a balancing of the application of form, usually constrained by the client’s
tolerance for cost. The architect has to ensure the client fully understands the impact of
compromises in form’s application.
The architect who can engage the client in a fact-based discussion in terms of cause and
effect related to their needs has a much better opportunity to deliver what is appropriate and
ensure client satisfaction.
Architectural leadership is grounded in the knowledge that form and function are in fact
united in a co-dependent relationship of cause and effect. For the architect, in the
application of form, the question of “why” is of far more importance than the statement of
“what”.
