CHAPTER 2
Best Practices for Database Programming
Software development is not just a practical discipline performed by coders, but also an area of
academic research and theory. There is now a great body of knowledge concerning software
development, and lengthy academic papers have been written to propose, dissect, and discuss different
approaches to development. Various methodologies have emerged, including test-driven development
(TDD), agile and extreme programming (XP), and defensive programming, and there have been
countless arguments concerning the benefits afforded by each of these schools of thought.
The practices described in this chapter, and the approach taken throughout the rest of this book, are
most closely aligned with the philosophy of defensive programming. However, the topics discussed here
can be applied just as readily in any environment. While software theorists may argue the finer
differences between different methodologies (and undoubtedly, they do differ in some respects), when it
comes down to it, the underlying features of good programming remain the same whatever
methodology you apply.
I do not intend to provide an exhaustive, objective guide as to what constitutes best practice, but
rather to highlight some of the standards that I believe demonstrate the level of professionalism that
database developers require in order to do a good job. I will present the justification of each argument
from a defensive point of view, but remember that they are generally equally valid in other
environments.
Defensive Programming
Defensive programming is a methodology used in software development that suggests that developers
should proactively anticipate and make allowances for (or “defend against”) unforeseen future events.
The objective of defensive programming is to create applications that can remain robust and effective,
even when faced with unexpected situations.
Defensive programming essentially involves taking a pessimistic view of the world—if something
can go wrong, it will: network resources will become unavailable halfway through a transaction; required
files will be absent or corrupt; users will input data in any number of ways different from that expected,
and so on. Rather than leave anything to chance, a defensive programmer will have predicted the
possibility of these eventualities, and will have written appropriate handling code to check for and deal
with these situations. This means that potential error conditions can be detected and handled before an
actual error occurs.
Note that defensive programming does not necessarily enable an application to continue when
exceptional circumstances occur, but it does make it possible for the system to behave in a predictable,
controlled way—degrading gracefully, rather than risking a crash with unknown consequences. In many
cases, it may be possible to identify and isolate a particular component responsible for a failure, allowing
the rest of the application to continue functioning.
There is no definitive list of defensive programming practices, but adopting a defensive stance to
development is generally agreed to include the following principles:
• Keep things simple (or KISS: keep it simple, stupid). Applications are not made
powerful and effective by their complexity, but by their elegant simplicity.
Complexity allows bugs to be concealed, and should be avoided in both
application design and in coding practice itself.
• “If it ain’t broke, fix it anyway.” Rather than waiting for things to break, defensive
programming encourages continuous, proactive testing and future-proofing of an
application against possible breaking changes in the future.
• Be challenging, thorough, and cautious at all stages of development. “What if?”
analyses should be conducted in order to identify possible exceptional scenarios
that might occur during normal (and abnormal) application usage.
• Extensive code reviews and testing should be conducted with different peer
groups, including other developers or technical teams, consultants, end users, and
management. Each of these different groups may have different implicit
assumptions that might not be considered by a closed development team.
• Assumptions should be avoided wherever possible. If an application requires a
certain condition to be true in order to function correctly, there should be an
explicit assertion to this effect, and relevant code paths should be inserted to
check and act accordingly based on the result.
• Applications should be built from short, highly cohesive, loosely coupled modules.
Modules that are well encapsulated in this way can be thoroughly tested in
isolation, and then confidently reused throughout the application. Reusing
specific code modules, rather than duplicating functionality, reduces the chances
of introducing new bugs.
Throughout the remainder of this chapter, I'll be providing simple examples of what I believe to be
best practices demonstrating each of these principles, and these concepts will be continually
reexamined in later chapters of this book.
Attitudes to Defensive Programming
The key advantages of taking a defensive approach to programming are essentially twofold:
• Defensive applications are typically robust and stable, require fewer essential bug
fixes, and are more resilient to situations that may otherwise lead to expensive
failures or crashes. As a result, they have a long expected lifespan, and relatively
cheap ongoing maintenance costs.
• In many cases, defensive programming can lead to an improved user experience.
By actively foreseeing and allowing for exceptional circumstances, errors can be
caught before they occur, rather than having to be handled afterward. Exceptions
can be isolated and handled with a minimum negative effect on user experience,
rather than propagating an entire system failure. Even in the case of extreme
unexpected conditions being encountered, the system can still degrade gracefully
and act according to documented behavior.
However, as with any school of thought, defensive programming is not without its opponents. Some
of the criticisms commonly made of defensive coding are listed below, each followed by a reasoned
response.
Defensive code takes longer to develop.
It is certainly true that following a defensive methodology can result in a longer up-front development
time when compared to applications developed following other software practices. Defensive
programming places a strong emphasis on the initial requirements-gathering and architecture design
phases, which may be longer and more involved than in some methodologies. Coding itself takes longer
because additional code paths may need to be added to handle checks and assertions of assumptions.
Code must be subjected to an extensive review that is both challenging and thorough, and then must
undergo rigorous testing. All these factors contribute to the fact that the overall development and release
cycle for defensive software is longer than in other approaches.
There is a particularly stark contrast between defensive programming and so-called “agile”
development practices, which focus on releasing frequent iterative changes on a very accelerated
development and release cycle. However, this does not necessarily mean that defensive code takes
longer to develop when considered over the full life cycle of an application. The additional care and
caution invested in code at the initial stages of development are typically paid back over the life of the
project, because there is less need for code fixes to be deployed once the project has gone live.
Writing code that anticipates and handles every possible scenario makes defensive
applications bloated.
Code bloat suggests that an application contains unnecessary, inefficient, or wasteful code. Defensive
code protects against events that may be unlikely to happen, but that certainly doesn’t mean that they
can’t happen. Taking actions to explicitly test for and handle exceptional circumstances up front can
save many hours of tracing and debugging in the future. Defensive applications may
contain more total lines of code than other applications, but all of that code should be well designed,
with a clear purpose. Note that the label of “defensive programming” is sometimes misused: the
addition of unnecessary checks at every opportunity without consideration or justification is not
defensive programming. Such actions lead to code that is both complex and rigid. Remember that true
defensive programming promotes simplicity, modularization, and code reuse, which actually reduces
code bloat.
Defensive programming hides bugs that then go unfixed, rather than making them
visible.
This is perhaps the most common misconception about defensive practices, and it stems from a
failure to understand the fundamental attitude toward errors in defensive applications. By explicitly
identifying and checking exceptional scenarios, defensive programming actually takes a very proactive
approach to the identification of errors. However, having encountered a condition that could lead to an
exceptional circumstance, defensive applications are designed to fail gracefully—that is, at the point of
development, potential scenarios that may lead to exceptions are identified and code paths are created
to handle them. To demonstrate this in practical terms, consider the following code listing, which
describes a simple stored procedure to divide one number by another:
CREATE PROCEDURE Divide (
    @x decimal(18,2),
    @y decimal(18,2)
)
AS BEGIN
    SELECT @x / @y
END;
GO
With the code as written above, it would be very easy to cause an exception using this
procedure if, for example, the supplied value of @y was 0. If you were simply trying to prevent the error
message from occurring, it would be possible to consume (or “swallow”) the exception in a catch block,
as follows:
ALTER PROCEDURE Divide (
    @x decimal(18,2),
    @y decimal(18,2)
)
AS BEGIN
    BEGIN TRY
        SELECT @x / @y
    END TRY
    BEGIN CATCH
        /* Do Nothing */
    END CATCH
END;
GO
However, it is important to realize that the preceding code listing is not defensive: it does nothing
to prevent the exceptional circumstance from occurring, and its only effect is to allow the system to
continue operating, pretending that nothing bad has happened. Exception hiding such as this can be
very dangerous, and makes it almost impossible to ensure the correct functioning of an application. The
defensive approach would be, before attempting to perform the division, to explicitly check that all the
requirements for that operation to be successful are met. This means asserting that
values for @x and @y are supplied (i.e., they are not NULL), that @y is not equal to zero, that the
supplied values lie within the range that can be stored within the decimal(18,2) datatype, and so on.
The following code listing provides a simplified defensive approach to this same procedure:
ALTER PROCEDURE Divide (
    @x decimal(18,2),
    @y decimal(18,2)
)
AS BEGIN
    IF @x IS NULL OR @y IS NULL
    BEGIN
        PRINT 'Please supply values for @x and @y';
        RETURN;
    END
    IF @y = 0
    BEGIN
        PRINT '@y cannot be equal to 0';
        RETURN;
    END
    BEGIN TRY
        SELECT @x / @y
    END TRY
    BEGIN CATCH
        PRINT 'An unhandled exception occurred';
    END CATCH
END;
GO
For the purposes of the preceding example, each assertion was accompanied by a simple PRINT
statement to advise which of the conditions necessary for the procedure to execute failed. In real life,
these code paths may handle such assertions in a number of ways, typically logging the error, reporting
a message to the user, and attempting to continue system operation if it is possible to do so. In doing so,
they prevent the kind of unpredictable behavior associated with an exception that has not been
expected.
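As a rough sketch of what such handling might look like, the following variation logs each failed
assertion and then raises an error that the calling code can detect. The ErrorLog table, its columns,
and the choice of severity level 16 are assumptions made purely for illustration; they are not part of
the original example, and a real system would probably use its own logging conventions.

```sql
-- Hypothetical logging table: the name and columns are assumptions for this sketch.
CREATE TABLE ErrorLog(
    LogID int IDENTITY(1,1),
    ProcName nvarchar(128),
    Message nvarchar(4000),
    LoggedAt datetime DEFAULT GETDATE());
GO

ALTER PROCEDURE Divide (
    @x decimal(18,2),
    @y decimal(18,2)
)
AS BEGIN
    DECLARE @msg nvarchar(2047);

    -- Assert that both inputs were supplied and that the divisor is usable
    IF @x IS NULL OR @y IS NULL
        SET @msg = 'Please supply values for @x and @y';
    ELSE IF @y = 0
        SET @msg = '@y cannot be equal to 0';

    IF @msg IS NOT NULL
    BEGIN
        -- Record the failed assertion, then report it to the caller
        INSERT INTO ErrorLog(ProcName, Message)
        VALUES(OBJECT_NAME(@@PROCID), @msg);
        RAISERROR('%s', 16, 1, @msg);
        RETURN;
    END

    BEGIN TRY
        SELECT @x / @y;
    END TRY
    BEGIN CATCH
        -- Anything not anticipated above is logged and re-reported, not swallowed
        SET @msg = LEFT(ERROR_MESSAGE(), 2047);
        INSERT INTO ErrorLog(ProcName, Message)
        VALUES(OBJECT_NAME(@@PROCID), @msg);
        RAISERROR('%s', 16, 1, @msg);
    END CATCH
END;
GO
```

Calling EXEC Divide 10, 0; with this version should add a row to ErrorLog and return an error to the
caller instead of a printed message. One caveat of this design is that if the caller later rolls back an
enclosing transaction, the logged row is rolled back with it.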
Defensive programming can be contrasted with the fail-fast methodology, which focuses on
immediate recognition of any errors encountered by causing the application to halt whenever an
exception occurs. Just because the defensive approach doesn’t espouse ringing alarm bells and flashing
lights doesn’t mean that it hides errors—it just reports them more elegantly to the end user and, if
possible, continues operation of the core part of the system.
Why Use a Defensive Approach to Database Development?
As stated previously, defensive programming is not the only software development methodology that
can be applied to database development. Other common approaches include TDD, XP, and fail-fast
development. So why have I chosen to focus on just defensive programming in this chapter, and
throughout this book in general? I believe that defensive programming is the most appropriate approach
for database development for the following reasons:
Database applications tend to have a longer expected lifespan than other
software applications. Although it may be an overused stereotype to suggest that
database professionals are the sensible, fastidious people of the software
development world, the fact is that database development tends to be more
slow-moving and cautious than other technologies. Web applications, for example, may
be revised and relaunched on a nearly annual basis, in order to take advantage of
whatever technology is current at the time. In contrast, database development
tends to be slow and steady, and a database application may remain current for
many years without any need for updating from a technological point of view. As a
result, it is easier to justify the greater up-front development cost associated with
defensive programming. The benefits of reliability and bug resistance will typically
be enjoyed for a longer period.
Users (and management) are less tolerant of bugs in database applications. Most
end users have come to tolerate and even expect bugs in desktop and web
software. While undoubtedly a cause of frustration, many people are routinely in
the habit of hitting Ctrl+Alt+Delete to reset their machine when a web browser
hangs, or because some application fails to shut down correctly. However, the
same tolerance that is shown to personal desktop software is not typically extended
to corporate database applications. Recent highly publicized scandals in which
bugs have been exploited in the systems of several governments and large
organizations have further heightened the general public’s ultrasensitivity toward
anything that might present a risk to database integrity.
Any bugs that do exist in database applications can have more severe
consequences than in other software. It can be argued that people are absolutely
right to be more worried about database bugs than bugs in other software. An
unexpected error in a desktop application may lead to a document or file becoming
corrupt, which is a nuisance and might lead to unnecessary rework. But an
unexpected error in a database may lead to important personal, confidential, or
sensitive data being placed at risk, which can have rather more serious
consequences. The nature of data typically stored in a database warrants a
cautious, thorough approach to development, such as defensive programming
provides.
Designing for Longevity
Consumer software applications have an increasingly short expected shelf life, with compressed release
cycles pushing out one release barely before the predecessor has hit the shelves. However, this does not
have to be the case. Well-designed, defensively programmed applications can continue to operate for
many years. In one organization I worked for, a short-term tactical management information data store
was created so that essential business reporting functions could continue while the organization’s systems
went through an integration following a merger. Despite only being required for an immediate post-merger
period, the (rather unfortunately named) Short Term Management Information database remained in
use for up to ten years afterward, as it proved more reliable and robust than subsequent attempted
replacements.
And let that be a lesson in choosing descriptive names for your databases that won’t age with time!
Best Practice SQL Programming Techniques
Having looked at some of the theory behind different software methodologies, and in particular the
defensive approach to programming, you’re now probably wondering about how to put this into
practice. As in any methodology, defensive programming is more concerned with the mindset with
which you should approach development than prescribing a definitive set of rules to follow. As a result,
this section will only provide examples that illustrate the overall concepts involved, and should not be
treated as an exhaustive list. I’ll try to keep the actual examples as simple as possible in every case, so
that you can concentrate on the reasons I consider these to be best practices, rather than the code itself.
Identify Hidden Assumptions in Your Code
One of the core tenets of defensive programming is to identify all of the assumptions that lie behind the
proper functioning of your code. Once these assumptions have been identified, the function can either
be adjusted to remove the dependency on them, or be made to test each condition explicitly and make provisions
should it not hold true. In some cases, “hidden” assumptions exist as a result of code failing to be
sufficiently explicit.
To demonstrate this concept, consider the following code listing, which creates and populates a
Customers and an Orders table:
CREATE TABLE Customers(
CustID int,
Name varchar(32),
Address varchar(255));
INSERT INTO Customers(CustID, Name, Address) VALUES
(1, 'Bob Smith', 'Flat 1, 27 Heigham Street'),
(2, 'Tony James', '87 Long Road');
GO
CREATE TABLE Orders(
OrderID INT,
CustID INT,
OrderDate DATE);
INSERT INTO Orders(OrderID, CustID, OrderDate) VALUES
(1, 1, '2008-01-01'),
(2, 1, '2008-03-04'),
(3, 2, '2008-03-07');
GO
Now consider the following query to select a list of every customer order, which uses columns from
both tables:
SELECT
Name,
Address,
OrderID
FROM
Customers c
JOIN Orders o ON c.CustID = o.CustID;
GO
The query executes successfully and we get the results expected:
Bob Smith    Flat 1, 27 Heigham Street    1
Bob Smith    Flat 1, 27 Heigham Street    2
Tony James   87 Long Road                 3
But what is the hidden assumption? The column names listed in the SELECT query were not qualified
with table names, so what would happen if the table structure were to change in the future? Suppose
that an Address column were added to the Orders table to enable a separate delivery address to be
attached to each order, rather than relying on the address in the Customers table:
ALTER TABLE Orders ADD Address varchar(255);
GO
The unqualified column name, Address, specified in the SELECT query, is now ambiguous, and if we
attempt to run the original query again we receive an error:
Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'Address'.
By not recognizing and correcting the hidden assumption contained in the original code, the query
subsequently broke as a result of the additional column being added to the Orders table. The simple
practice that could have prevented this error would have been to ensure that all column names were
prefixed with the appropriate table name or alias:
SELECT
c.Name,
c.Address,
o.OrderID
FROM
Customers c
JOIN Orders o ON c.CustID = o.CustID;
GO
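Qualifying the columns also makes it straightforward to return both addresses once the Orders table
has its own Address column. The following is a minimal sketch that assumes the ALTER TABLE statement
shown earlier has been run; the aliases CustomerAddress and DeliveryAddress are illustrative names only.

```sql
-- Assumes Orders.Address has already been added by the earlier ALTER TABLE statement.
-- The column aliases are hypothetical names chosen for this sketch.
SELECT
    c.Name,
    c.Address AS CustomerAddress,
    o.Address AS DeliveryAddress,
    o.OrderID
FROM
    Customers c
    JOIN Orders o ON c.CustID = o.CustID;
GO
```

Because the new column was added without a default value, the three existing orders return NULL for
the delivery address until one is supplied.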
In the previous case, it was pretty easy to spot the hidden assumption, because SQL Server gave a
descriptive error message that would enable any developer to locate and fix the broken code fairly
quickly. However, sometimes you may not be so fortunate, as shown in the following example.
Suppose that you had a table, MainData, containing some simple values, as shown in the following
code listing:
CREATE TABLE MainData(
ID int,
Value char(3));
GO
INSERT INTO MainData(ID, Value) VALUES
(1, 'abc'), (2, 'def'), (3, 'ghi'), (4, 'jkl');
GO
Now suppose that every change made to the MainData table was to be recorded in an associated
ChangeLog table. The following code demonstrates this structure, together with a mechanism to
automatically populate the ChangeLog table by means of an UPDATE trigger attached to the MainData table:
CREATE TABLE ChangeLog(
ChangeID int IDENTITY(1,1),
RowID int,
OldValue char(3),
NewValue char(3),
ChangeDate datetime);
GO
CREATE TRIGGER DataUpdate ON MainData
FOR UPDATE
AS
    DECLARE @ID int;
    SELECT @ID = ID FROM INSERTED;
    DECLARE @OldValue varchar(32);
    SELECT @OldValue = Value FROM DELETED;
    DECLARE @NewValue varchar(32);
    SELECT @NewValue = Value FROM INSERTED;
    INSERT INTO ChangeLog(RowID, OldValue, NewValue, ChangeDate)
    VALUES(@ID, @OldValue, @NewValue, GetDate());
GO
We can test the trigger by running a simple UPDATE query against the MainData table:
UPDATE MainData SET Value = 'aaa' WHERE ID = 1;
GO
The query appears to be functioning correctly; SQL Server Management Studio reports the following:
(1 row(s) affected)
(1 row(s) affected)
And, as expected, we find that one row has been updated in the MainData table:
ID   Value
1    aaa
2    def
3    ghi
4    jkl
and an associated row has been created in the ChangeLog table:
ChangeID  RowID  OldValue  NewValue  ChangeDate
1         1      abc       aaa       2009-06-15 14:11:09.770
However, once again, there is a hidden assumption in the code. Within the trigger logic, the
variables @ID, @OldValue, and @NewValue are assigned values that will be inserted into the ChangeLog table.
Clearly, each of these scalar variables can only be assigned a single value, so what would happen if you
were to attempt to update two or more rows in a single statement?
UPDATE MainData SET Value = 'zzz' WHERE ID IN (2,3,4);
GO
If you haven’t worked it out yet, perhaps the messages reported by SQL Server Management Studio
will give you a clue as to the result:
(1 row(s) affected)
(3 row(s) affected)