
SQL Server 2012 Data Integration Recipes


www.it-ebooks.info


For your convenience Apress has placed some of the front
matter material after the index. Please use the Bookmarks
and Contents at a Glance links to access them.



Contents at a Glance
About the Author
About the Technical Reviewers
Acknowledgments
Introduction
Chapter 1: Sourcing Data from MS Office Applications
Chapter 2: Flat File Data Sources
Chapter 3: XML Data Sources
Chapter 4: SQL Databases
Chapter 5: SQL Server Sources
Chapter 6: Miscellaneous Data Sources
Chapter 7: Exporting Data from SQL Server
Chapter 8: Metadata
Chapter 9: Data Transformation
Chapter 10: Data Profiling
Chapter 11: Delta Data Management
Chapter 12: Change Tracking and Change Data Capture
Chapter 13: Organising And Optimizing Data Loads
Chapter 14: ETL Process Acceleration
Chapter 15: Logging and Auditing
Appendix A: Data Types
Appendix B: Sample Databases and Scripts
Index



Introduction
Microsoft SQL Server 2012 is a vast subject. One part of the ecosystem of this powerful and comprehensive
database which has evolved considerably over many years is data integration – or ETL if you want to use another
virtually synonymous term. Long gone are the days when BCP was the only available tool to load or export data.
Even DTS is now a distant memory. Today the user is spoilt for choice when it comes to the plethora of tools and
options available to get data into and out of the Microsoft RDBMS. This book is an attempt to shed some light on
many of the ways in which data can be both loaded into SQL Server and sent from it into the outside world. I also
try to give some ideas as to which techniques are the most appropriate to use when faced with various different
challenges and situations.
This book is not, however, just an SSIS manual. I have a profound respect for this excellent product, but
do not believe that it is the “one stop shop” which some developers take it to be. I wanted to show readers that
there are frequently alternative technologies which can be applied fruitfully in many ETL scenarios. Indeed my
philosophy is that when dealing with data you should always apply the right solution, and never believe that
there is only one answer. Consequently this book includes recipes on many of the other tools in the SQL Server
universe. Sometimes I have deliberately shown varied ways of dealing with essentially the same challenge. I
hope by doing this to arouse your curiosity and also to provide some practical examples of ways to get data from
myriad sources into SQL Server databases cleanly and efficiently.
Although this book specifically targets users of SQL Server 2012 I try, wherever feasible, to say if a recipe can
be applied to previous versions of the database. I also try to highlight any new features and differences between
SQL Server 2012 and older versions. This is because users are unlikely to deal only with the latest version of
this RDBMS, and most sites are likely to have multiple versions in production. I only ever go back to
SQL Server 2005 when pointing out how the database has evolved, as this was the version that introduced SSIS,
which was the major turning point in SQL Server-based ETL.
As the book is focused on SQL Server nearly all the code used is T-SQL. Some of the samples given are
extremely simple, others are more complex. All of it is concentrated on ETL requirements. Consequently you will
find no OLTP or DBA-based examples in this book. You will find a few touches of MDX where handling Analysis
Services data is concerned and some VB.Net where SSIS script tasks are used. I have chosen to use VB.Net in
nearly all the SSIS script tasks described in this book as it is, in my experience, the .Net language that many
T-SQL programmers are most familiar with. Nonetheless I have added one or two snippets of C# (particularly
where CLR assemblies are used) to avoid accusations of neglecting this particular language.
Data integration is a vast subject. Consequently, in an attempt to apply a little structure to a potentially
enormous and disparate domain, this book is divided into two main parts.
The first part (Chapters 1 through 7) deals with the mechanics of getting data into and out of SQL Server.
Here you will find the essential details of how to connect to various data sources, and then ingest the data.
As many potential pitfalls and traps as possible are brought to your attention for each data source.
The second part (Chapters 8 through 15) deals with the wider ETL environment. Here we progress from the
nuts and bolts to the coordinated whole of extracting, transforming, and (efficiently) loading data. These chapters
take the reader on a trip through the process of metadata analysis, data transformation, profiling source data,
logging data processes, and some of the ways of optimizing data loads.
For this book I decided to avoid the ubiquitous AdventureWorks, and use my own sample database. There
are a few reasons for this. Firstly, I thought that AdventureWorks was so large and complex that it could divert
attention from some of the techniques which I wanted to explain. I prefer to use an extremely simplistic data
structure so that the reader is free to focus on the essence of what is being explained, and not the data itself.
Secondly I wished to avoid the added complexity of the multiple interrelated tables and foreign keys present in
AdventureWorks. Finally I did not want to be using data which took time to load. This way, once again, you can
concentrate on process and principle, and not develop “ETL-stare” while you watch a clock ticking as thousands
of records churn into a table, accompanied by whirling on-screen images or the blinking of a bleary-eyed hard
disk indicator. Consequently I have preferred to use an extremely uncluttered set of source data. A full description
of the source database(s) is given in Appendix B.
Please also note that this book is not destined to be a progressive self-tuition manual. You are strongly
advised to drop and recreate the sample databases between recipes to ensure a clean environment to test the
examples that are given. Indeed the whole philosophy of the recipe-based approach is that you can dip in
anywhere to find help, except in the rare cases where there are specific indications that a recipe requires prior
reading or builds on a previous explanation.
The recipes in this book cover a wide variety of needs, from the extremely simple to the relatively complex.
This is in an attempt to cover as wide a range of subjects as possible. The consequence is that some recipes may
seem far too simplistic for certain readers, while others may wonder if the more advanced solutions are relevant
to their work. I can only hope that SQL Server beginners will find easy answers and that advanced users will
nonetheless find tweaks and suggestions which add to their knowledge. In all cases I sincerely hope that you will
find this book useful.
Inevitably, not every question can be answered and not every issue resolved in one book. I truly hope that I
have covered many of the essential ETL tasks that you will face, and have provided ways of solving a reasonable
number of the problems that you may encounter. My apologies, then, to any reader who does not find the answer
to their specific issue, but writing an encyclopaedia was not an option. In any case, I can only encourage you to
read recipes other than those that cover the precise subject that interests you, as you may find potential solutions
elsewhere in this book.
I wish you good luck in using SQL Server to extract, transform, and load data. And I sincerely hope that you
have as much fun with it as I had writing this book.
—Adam Aspin




Chapter 1

Sourcing Data from MS Office Applications
I suspect that many industrial-strength SQL Server applications have begun life as a much smaller MS Office-based
idea, which has then grown and been extended until it has finished as a robust SQL Server application. In
any case, two Microsoft Office programs—Excel and Access—are among the most frequently used sources of data
for eventual loading into SQL Server. There are many reasons for this, from their sheer ubiquity to the ease with
which users can enter data into Access databases and Excel spreadsheets. So it is no wonder that we developers
and DBAs spend so much of our time loading data from these sources into SQL Server.
There are a number of ways in which data can be pushed or pulled from MS Office sources into SQL Server.
These include:


• Using T-SQL (OPENDATASOURCE and OPENROWSET)
• Linked Servers (yes, an Access database or even an Excel spreadsheet can be a linked server)
• SSIS
• The SQL Server Import Wizard
• The SQL Server Migration Assistant for Access

This chapter examines all these techniques and tries to give you some guidelines on their optimal uses (and
inevitable limitations).
Any sample files used in this chapter are found in the C:\SQL2012DIRecipes\CH01 directory—assuming
that you have downloaded the samples from the book’s companion web site and installed them as described in
Appendix B.

1-1. Ensuring Connectivity to Access and Excel
Problem
You want to be able to import data from all versions of Excel and Access (including the latest file formats) in both
32-bit and 64-bit environments.

Solution
You need to install the Microsoft Access Connectivity Engine (ACE) driver. Here are the steps to follow:
1. Click Download on the requisite web page. This will download the executable file to your selected directory.

■■Note  The ACE driver can be found at www.microsoft.com/en-us/download/details.aspx?id=13255. This
location could change over time, but a quick Internet search should point you to the current source fast enough.

2. Double-click the AccessDatabaseEngine.exe file that you have downloaded. This will
be AccessDatabaseEngine_x64.exe for the 64-bit version.
3. Follow the instructions.
4. In SSMS, expand Server Objects ➤ Linked Servers ➤ Providers.
5. Assuming that the driver installation was successful, you should see the
Microsoft.ACE.OLEDB.12.0 provider.
6. Double-click the provider and check Allow InProcess and Dynamic Parameter.


As an alternative to steps 4-6, if you prefer a command-line approach, run the following T-SQL snippet
(C:\SQL2012DIRecipes\CH01\SetACEProperties.Sql in the samples for this book):
EXECUTE master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0' , N'AllowInProcess' , 1;
GO
EXECUTE master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0' , N'DynamicParameters' , 1;
GO
You now have the driver installed and ready to use.
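If you want to confirm from a query window that the provider is registered, one option is to list the OLE DB providers known to the instance. This is only a sketch: it assumes that the undocumented extended stored procedure xp_enum_oledb_providers is present on your instance and that your login is allowed to run it. It is not part of the sample scripts for this book.

```sql
-- Assumption: xp_enum_oledb_providers is undocumented and may change between versions.
-- It returns one row per OLE DB provider registered on the server; after a successful
-- installation the list should include Microsoft.ACE.OLEDB.12.0.
EXECUTE master.dbo.xp_enum_oledb_providers;
```

If the ACE provider is missing from the result, the most likely cause is a mismatch between the bitness of the driver and that of the SQL Server instance.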


How It Works
Before attempting to read data from Excel or Access, it is vital to ensure that the drivers that allow the files to
be read are installed on your server. Only the “old” 32-bit Jet driver is currently installed with an SQL Server
installation, and that driver has severe limitations. These are principally that it cannot read the latest versions of
Access and Excel, and that it will not function in a 64-bit environment.
Using the latest ACE driver generally makes your life much easier, as the newest versions have
all the capabilities of the older versions as well as adding extra functionality. Despite being called the
“AccessDatabaseEngine,” this driver also reads and writes data to Excel files, as well as to text files.
Confusingly, the 2007 Office System Driver and the Microsoft Access Engine 2010 redistributable are both
found as “Microsoft.ACE.OLEDB.12.0” in the list of linked server providers in SSMS. 64-bit SQL Server
applications can access 32-bit Jet and 2007 Office System files by using 32-bit SQL Server Integration Services
(SSIS) on 64-bit Windows.
The versions of the Office drivers currently available are listed in Table 1-1.


Table 1-1.  MS Office Drivers

| Driver Title | Driver Name | Source | Comments |
| --- | --- | --- | --- |
| OLEDB Provider for Microsoft Jet | Microsoft.Jet.OLEDB.4.0 | SQL Server Installation (installed with the client tools) | 32-bit only. Reads and writes Excel & Access 97-2003. Accepts .xls and .mdb formats. |
| 2007 Office System Driver | Microsoft.ACE.OLEDB.12.0 | www.microsoft.com/downloads/thankyou.aspx?familyId=7554f536-8c28-4598-9b72-ef94e038c891&displayLang=en | 32-bit only. Reads and writes Excel & Access 97-2007. Accepts .xls/.xlsx/.xlsm/.xlsb and .mdb/.accdb formats. |
| Microsoft Access Engine 2010 redistributable | Microsoft.ACE.OLEDB.12.0 | www.microsoft.com/downloads/en/details.aspx?familyid=C06B8369-60DD-4B64-A44B-84B371EDE16D&displaylang=en#Instructions | 32-bit or 64-bit versions available. Reads and writes Excel & Access 97-2010. Accepts .xls/.xlsx/.xlsm/.xlsb and .mdb/.accdb formats. |


Hints, Tips, and Traps


• If you still want to use the old 32-bit Jet driver, then you can do so provided that you save
the Excel source in Excel 97–2003 format and are working in a 32-bit environment.
• The ACE drivers are supported by Windows 7; Windows Server 2003 R2, 32-bit x86;
Windows Server 2003 R2, x64 editions; Windows Server 2008 R2; Windows Server 2008
with Service Pack 2; Windows Vista with Service Pack 1; and Windows XP with
Service Pack 3.
• You can install either the 64-bit version or the 32-bit version of the ACE driver on a given
server, but not both. This means that you cannot develop in Business Intelligence Development
Studio (BIDS) or SQL Server Data Tools (SSDT) with the 64-bit ACE driver
installed, as BIDS/SSDT is a 32-bit environment. However, if you install the 32-bit ACE
driver instead, then you cannot run a 64-bit package, and have to use one of the 32-bit
workarounds. Ideally, you should develop in a 32-bit environment with the 32-bit ACE
driver installed (or on a 64-bit machine, but do not expect to run the package normally),
and deploy to a 64-bit environment where the 64-bit driver is ready and waiting.
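
Because the choice of driver depends on whether the instance is 32-bit or 64-bit, it can help to check this before downloading anything. The following is a minimal sketch using two built-in functions; the column aliases are my own.

```sql
-- On a 64-bit instance the Edition text normally ends with "(64-bit)"
-- and the version string normally contains "(X64)".
SELECT SERVERPROPERTY('Edition') AS Edition,
       @@VERSION AS VersionString;
```

This tells you about the database engine only. The bitness of the design tools (BIDS/SSDT) on your workstation is a separate matter, as described in the last point above.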

1-2. Importing Data from Excel
Problem
You want to import data from an Excel spreadsheet as fast and as simply as possible.


Solution
Run the SQL Server Import and Export Wizard and use it to guide you through the import process.
Here is the process to follow:
1. In SQL Server Management Studio, right-click a database (preferably the one into
which you want the data imported), click Tasks ➤ Import Data (see Figure 1-1).

Figure 1-1.  Launching the Import/Export Wizard from SSMS
2. Skip the splash screen. The Choose a Data Source screen appears.
3. Select Microsoft Excel as the data source, and enter or browse for the file to import.
Be sure to select the Excel version that corresponds to the type of source file from the
pop-up list, and specify if your data includes headers (see Figure 1-2).


Figure 1-2.  Choosing a Data Source in the Import/Export Wizard
4. Click Next. The Choose a Destination dialog box appears (see Figure 1-3).



Figure 1-3. Choosing a Destination in the Import/Export Wizard
5. Ensure that the destination is SQL Server Native Client, that the server name is
correct, and that you have selected the right destination database (CarSales_Staging
in this example) and the authentication mode which you are using (with the
appropriate username and password for SQL Server authentication).
6. Click Next. The Specify Table Copy or Query dialog box appears (see Figure 1-4).


Figure 1-4.  Specifying Table Copy or Query in the Import/Export Wizard
7. Accept the default “Copy data from one or more tables or views”.
8. Click Next. The Select Source Tables or Views dialog box appears (see Figure 1-5).


Figure 1-5.  Choosing the Source Table(s) in the Import/Export Wizard
9. Select the worksheet(s) to import.

10. Click Next. The Save and Run Package dialog box appears (see Figure 1-6).


Figure 1-6.  Running the Import/Export Wizard package
11. Ensure that Run Immediately is checked and that Save SSIS Package is not checked.
12. Click Next. The Complete the Wizard dialog box appears (see Figure 1-7).


Figure 1-7.  Completing the Import/Export Wizard
13. Click Finish. The Execution Results dialog box appears. Assuming that all went well,
the data has loaded successfully (see Figure 1-8).



Figure 1-8.  Successful execution using the Import/Export Wizard
14. Click Close to end the process.

How It Works
There will probably be times when your sole aim is to get a load of data from an Excel spreadsheet into an SQL
Server table as fast as possible. Now, when I say “fast,” I do not only mean that the time to load is very short, but
that the time spent setting up the load process is minimal and that the job gets done without going to the bother
of setting up an SSIS package, defining a linked server, or writing T-SQL using OPENROWSET to do the job. This is
where the SQL Server Import and Export Wizard (DtsWizard for short) comes into its own. An extra inducement
is that the guidance provided by the DtsWizard application can be invaluable if you only import spreadsheet data
infrequently.


As this is the first time that the Import and Export Wizard is explained in this book, I have tried to make the
explanation as complete as possible. The advantage is that you will find many of the techniques explained here
useable for other types of source data, too.
You should use the SQL Server Import and Export Wizard:



• When you need to import data from an Excel spreadsheet into an SQL Server table just once.
• When you do not intend to perform the action regularly or frequently.
• When you rarely import Excel data and do not want to get lost in the arcane world of SSIS
and/or rarely used SQL commands. You want the data imported fast.
• When you want to import data from multiple worksheets or ranges in the same workbook.

Assuming that your Excel data is clean and structured like a data table, then the data will load. It can either
be transferred to a new table (or new tables), which are created in the destination database with the same
name(s) as the source worksheets, or into existing SQL Server tables. You can decide which of these alternatives
you prefer in step 8.

Hints, Tips, and Traps


• If you are working in a 64-bit environment, the 32-bit version of the Import/Export
Wizard runs from SSMS. To force the 64-bit version to run, choose Start ➤ All Programs
➤ Microsoft SQL Server 2012 ➤ Import and Export Data (64 bit). Should you need to
install the 32-bit version of the wizard, select either Client Tools or SQL Server Data Tools
(SSDT) during setup.
• If you plan on using DtsWizard.exe frequently, add the path to the executable to your
system path variable, unless it has already been added.
• You can also launch the SQL Server Import and Export Wizard executable by choosing
Start ➤ Run and entering DtsWizard.exe (normally found in C:\Program Files\Microsoft SQL
Server\110\DTS\Binn), or by double-clicking the executable in a Windows Explorer
window (or even by running it from a command window).

1-3. Modifying Excel Data During a Load
Problem
You want to import data from an Excel spreadsheet, but need to perform a few basic modifications during the
import. These could include altering column mapping, changing data types, or choosing the destination table(s),
among other things.

Solution
Apply some of the available options of the SQL Server Import and Export Wizard. As we are looking at options
for the SQL Server Import and Export Wizard, I will describe them as a series of “mini-recipes,” which extend the
previous recipe.

■■Note  Step numbers in the sections to follow refer to the process in Recipe 1-2.


Querying the Source Data
To filter the source data, at step 6, choose the “Write a query to specify the data to transfer” option. You see the
dialog box in Figure 1-9.

Figure 1-9.  Specifying a source query to select Excel data
Here you can enter an SQL query to select the source data. If you have saved an SQL query, you can browse
to load it. Note that you use the same kind of syntax as when using OPENROWSET, as described in Recipe 1-4. When
writing queries, note that worksheet data sources have a “$” suffix, but ranges do not.
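
As an illustration of the naming rule, here are two queries of the kind you could type into this dialog box. They are a sketch that assumes the Stock worksheet and the TinyRange named range from the sample workbook, each with ID and Marque columns. The first reads a worksheet, so the name takes the “$” suffix and square brackets; the second reads a named range, so it takes neither. Enter one query at a time, and do not add comments, since the Excel driver does not accept them.

```sql
SELECT ID, Marque FROM [Stock$] ORDER BY Marque

SELECT ID, Marque FROM TinyRange
```
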

Altering the Destination Table Name
In step 8, you can change the destination table name to override the default worksheet or range name.


Replacing the Data in the Destination Table
Another available option is to replace all the data in the destination table. Of course, this will only affect an
existing table; if the table does not exist, then DtsWizard creates one, whichever option is selected.
To do this, at step 8 from earlier, click Edit Mappings. The Column Mappings dialog box appears
(see Figure 1-10).

Figure 1-10.  Editing column mappings in the Import/Export Wizard
Selecting Delete Rows in Destination Table truncates the destination table before inserting the new data.
This option is only available if the table exists already.


Enabling Identity Insert
The Column Mappings dialog box (see Figure 1-10) also lets you enable identity insert, and insert values into an
SQL Server Identity column. Simply check the “Enable identity insert” check box.
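
For readers who have not met identity insert before, the T-SQL below sketches what the option amounts to when done by hand. The table dbo.StockWithIdentity and the inserted values are hypothetical and exist only for this illustration; this is not the code that the wizard generates.

```sql
-- Hypothetical table with an identity column, for illustration only.
CREATE TABLE dbo.StockWithIdentity
(
    ID int IDENTITY(1,1) NOT NULL,
    Marque nvarchar(50) NULL
);
GO

-- Allow explicit values in the identity column for this session.
SET IDENTITY_INSERT dbo.StockWithIdentity ON;

-- An explicit column list is required while IDENTITY_INSERT is ON.
INSERT INTO dbo.StockWithIdentity (ID, Marque)
VALUES (101, N'Rolls Royce');

-- Switch the setting off again so that later inserts auto-number as usual.
SET IDENTITY_INSERT dbo.StockWithIdentity OFF;
```

Without the option, SQL Server generates the identity values itself and the ID values in the source data are not preserved.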


Adjusting Column Mappings
The Column Mappings dialog box also lets you specify which source column maps to which specific destination
column. Simply select the required destination column from the pop-up list—or <Ignore> if you do not wish to
import the data for a specific column.

Changing Field Types for New Tables
You can—within the permissible limits of data type mappings—change both field types and lengths/sizes.
Altering the size of a text field avoids the default 255-character import text field length. Changing the field type
modifies the field type during the data load.
If you are creating a new table, then the new table is created with the newly defined field types and sizes.
However, be warned, altering data types will not alter the data, and any types or data lengths that you choose
must be compatible with the source data, or the load will fail.
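
One way to reduce the risk of a failed load is to measure the source data before choosing a field length. The query below is a sketch under several assumptions: it reuses the OPENROWSET technique from Recipe 1-4 (so ad hoc distributed queries must be enabled), it assumes the ACE driver is installed, and it assumes the Stock worksheet of the sample CarSales.xlsx workbook with a Marque column.

```sql
-- Returns the length of the longest Marque value in the worksheet, so that you can
-- pick a destination column size that is at least this large.
SELECT MAX(LEN(Marque)) AS LongestMarque
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 12.0;Database = C:\SQL2012DIRecipes\CH01\CarSales.xlsx',
'SELECT Marque FROM [Stock$]');
```

If the result is, say, 30, then a field of 50 characters is safe, whereas one of 20 would cause the load to fail.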

Creating an SQL Server Integration Services (SSIS) Package
from the Import/Export Wizard
An extremely useful feature of the Import/Export Wizard is the ability to create a fully-fledged SSIS package from
the parameters that you have set when configuring your import. This is probably no surprise, as the Import/
Export Wizard is, essentially, an SSIS package generator. While the packages that it generates are not perfect, they
are a good—and fast—start to an ETL creation process.
To generate the SSIS package, simply check the Save SSIS Package box in the Save and Run
Package dialog box (see step 10, Figure 1-6). You are prompted for a file location. The package is created when
you click Finish.

How It Works
Having stressed (I hope) that DtsWizard is a fabulous tool for rapid, simple data imports, I wanted to extend your
understanding by showing how versatile a tool the DtsWizard can prove to be in more complex import scenarios.
This is due to the wide range of options and parameters that are available to help you to fine-tune Excel imports.

Hints, Tips, and Traps


• If you are using SQL Server 2005, then you will find a couple of minor differences in the
Choose a Data Source dialog box shown in Figure 1-2.
• Clicking on any messages in the message column of the final dialog box (see Figure 1-8) is
invaluable for getting error messages should there be any problems.

1-4. Specifying the Excel Data to Load During an Ad-Hoc Import
Problem
You want to import only a specific subset of data from an Excel spreadsheet by defining the rows to load or
filtering the source data.



Solution
Use SQL Server’s OPENROWSET or OPENDATASOURCE command as part of a SELECT statement. This lets you use
standard T-SQL to subset the source data. For example, you can follow these steps:
1. In the CarSales_Staging database, create a destination table named LuxuryCars
defined as follows (C:\SQL2012DIRecipes\CH01\tblLuxuryCars.Sql):
CREATE TABLE dbo.LuxuryCars
(
    InventoryNumber int NULL,
    VehicleType nvarchar(50) NULL
);
GO

2. Enable remote queries, either by running the Facets/Surface Area Configuration tool
(or the Surface Area Configuration tool directly in SQL Server 2005), or by running the
following T-SQL (C:\SQL2012DIRecipes\CH01\AllowDistributedQueries.Sql):
EXECUTE master.dbo.sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
EXECUTE master.dbo.sp_configure 'ad hoc distributed queries', 1;
GO
RECONFIGURE;
GO

3. Run the following SQL snippet
(C:\SQL2012DIRecipes\CH01\OpendatasourceInsertACE.Sql):
INSERT INTO CarSales_Staging.dbo.LuxuryCars (InventoryNumber, VehicleType)
SELECT CAST(ID AS INT) AS InventoryNumber, LEFT(Marque, 50) AS VehicleType
FROM OPENDATASOURCE(
'Microsoft.ACE.OLEDB.12.0',
'Data Source = C:\SQL2012DIRecipes\CH01\CarSales.xls;Extended Properties = Excel 12.0')...Stock$
WHERE MAKE LIKE '%royce%'
ORDER BY Marque;
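
Since ad hoc distributed queries are switched off by default, you may prefer to disable them again once the import is finished. The following sketch simply reverses step 2; whether this is appropriate depends on your own security policy, and it is not one of the sample scripts for this book.

```sql
-- Disable ad hoc distributed queries again (this assumes that
-- 'show advanced options' is still set to 1 from step 2).
EXECUTE master.dbo.sp_configure 'ad hoc distributed queries', 0;
GO
RECONFIGURE;
GO
-- Optionally hide the advanced options once more.
EXECUTE master.dbo.sp_configure 'show advanced options', 0;
GO
RECONFIGURE;
GO
```

After this, any OPENROWSET or OPENDATASOURCE call against an ad hoc source will fail until the option is enabled again.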

How It Works
There are times when quick access to the data in an Excel worksheet is all you need. This could be because you
need to perform a quick SELECT...INTO or INSERT INTO...SELECT using Excel as the data source. In this case,
firing up SSIS—or even running the Import Wizard (see Recipe 1-2)—to load data can seem like overkill. This
is where judicious application of SQL Server’s OPENDATASOURCE and OPENROWSET commands as part of a SELECT
statement can be extremely useful.


Indeed, as you will see shortly, once you know how to connect to the source file, even quite complex T-SQL
SELECT statements can be used on Excel source data. And, as you are writing standard SQL commands, they can
be run from a query window or as part of a stored procedure. This is particularly useful when:



• You want to read the contents of an Excel worksheet, but don’t want to clutter up your
database with extra tables of information.
• The data will be read infrequently.
• You know the file (workbook) and worksheet names, and have a good idea of the data
structures (in other words, you can open the file to read it).
• You want to perform ad hoc querying, and choose the columns and filter the data
using standard SQL commands.

There are some variations on this theme, which I show here without attempting to be exhaustive. I use the
Jet driver and the ACE driver interchangeably, and Excel worksheets in both 97–2003 and 2007–2010 formats,
because the techniques described work with all these formats. I am not adding INSERT INTO or SELECT...INTO code
here, but presume that you will be using one or the other in a real-world scenario.

■■Note  As this is, after all, an ad-hoc scenario, you could well have to run SSMS in “Administrator” mode, by
right-clicking SQL Server Management Studio in the Start menu and selecting “Run as Administrator”. This
is because the user running SSMS must have read and write permissions on the TEMP directory used by the SQL
Server startup account.
Assuming that you have a named range (TinyRange in the sample file), then you can return the data in the
range using T-SQL like this:
SELECT ID, Marque FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database = C:\SQL2012DIRecipes\CH01\CarSales.xls', TinyRange);

If the range does not contain column headers, then you will need to add the HDR = NO property to the T-SQL,
as follows. Otherwise, the first row is presumed to be column headers.
SELECT ID, Marque FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;HDR = NO;Database = C:\SQL2012DIRecipes\CH01\CarSales.xls', TinyRange);
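
To turn one of these queries into an actual load, you would typically wrap it in SELECT...INTO (to create a new table) or INSERT INTO...SELECT (to fill an existing one). Here is a minimal sketch of the first form; dbo.TinyRangeImport is a hypothetical table name chosen for this example, and the snippet assumes that you run it in the CarSales_Staging database.

```sql
-- Creates dbo.TinyRangeImport (hypothetical name) from the named range.
-- SELECT...INTO fails if a table of that name already exists.
SELECT ID, Marque
INTO dbo.TinyRangeImport
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database = C:\SQL2012DIRecipes\CH01\CarSales.xls', TinyRange);
```

The data types of the new table are inferred from what the driver returns, so it is worth checking them before relying on the table.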
If you know the Excel range references corresponding to the data that you want to return, then you can use
an SQL snippet like this:
SELECT ID, Marque FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 12.0;Database = C:\SQL2012DIRecipes\CH01\CarSales.xlsx',
'SELECT * FROM [Stock$A2:B3]');
You must remember to provide the worksheet as well as the range, as no default worksheet is presumed.
Similarly, remember to add HDR = NO if the range does not contain column headers.
As the previous snippet showed, you can pass an entire SELECT statement via the OLEDB driver to Excel. This
presents a whole range of possibilities, such as choosing individual columns. For example:
SELECT ID, Marque FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 12.0;Database = C:\SQL2012DIRecipes\CH01\CarSales.xlsx',
'SELECT ID, Marque FROM [Stock$A1:C3]');


Just as in a standard T-SQL statement, you can alias the columns returned. For example:
SELECT InventoryNumber,VehicleType FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 12.0;Database = C:\SQL2012DIRecipes\CH01\CarSales.xlsx',
'SELECT ID AS InventoryNumber, Marque AS VehicleType FROM [Stock$A2:C3]');
The “pass-through” query that you send to Excel can also sort the data that is returned. The following
example sorts by Marque:
SELECT ID, Marque FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 12.0;Database = C:\SQL2012DIRecipes\CH01\CarSales.xlsx',
'SELECT ID, Marque FROM [Stock$A2:C3] ORDER BY Marque');
Finally, if you want to add a WHERE clause, you can do so:
SELECT InventoryNumber,VehicleType FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 12.0;Database = C:\SQL2012DIRecipes\CH01\CarSales.xlsx',
'SELECT ID AS InventoryNumber, Marque AS VehicleType
FROM Stock$ WHERE MAKE LIKE ''%royce%'' ORDER BY Marque');
In the provider options, you need to check Supports 'Like' Operator for such a filter to work. Note also that you
must double the single quotes around the LIKE pattern, because the pass-through query is itself a quoted string.
You might have a source file without headers for the data. In this case, all you need to do is add HDR = NO;
to the syntax. In these circumstances, it is probably best to use column aliases to give the output data greater
readability, or the OLEDB provider will merely rename all the columns F1, F2, and so forth. For example:
SELECT InventoryNumber,VehicleType FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 12.0;HDR = NO;Database = C:\SQL2012DIRecipes\CH01\CarSales.xlsx',
'SELECT F1 AS InventoryNumber, F2 AS VehicleType FROM [Stock$A2:C3] WHERE MAKE LIKE
''%royce%'' ORDER BY Marque');
HDR is not the only property that you might need to know about when importing Excel data. Table 1-2
describes your options. Understanding the IMEX (mixed data types) property is also useful in some cases.
Table 1-2.  Jet and ACE Extended Properties

Property Name   Description                                                          Example
HDR             Specifies if the first row returned contains headers.               HDR = NO
IMEX            Allows for mixed data types to be imported inside a single column.  IMEX = 1

Extended properties do require further explanation. Here, HDR merely indicates to the driver whether your
source data contains header rows. As the presumption (at least using the Jet and ACE drivers) is that there are
header rows, setting this property to NO when there are no headers avoids not only having the first record appear
as the column names, but also a potential mismatch of data types. It is worth noting that you do not need to
specify the Excel file type (.xls/.xlsx/.xlsm/.xlsb) as the ACE driver will recognize the file type automatically.
IMEX is marginally trickier. Setting IMEX = 1 does not convert the data in a column to text; it tells the driver to
apply the mixed data type defined in the registry for this OLEDB driver. As this registry entry is text by default,
the practical effect is nearly always that the data comes in as text. Depending on the driver (that is, when using
the Jet driver in most cases), not setting IMEX = 1 can cause a load failure or return NULLs instead of numeric values
in a column containing text and numbers.
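As a minimal sketch of where IMEX goes, the following query sets IMEX = 1 alongside HDR in the extended properties string. It assumes the same sample workbook and Stock worksheet used in the earlier examples; whether any column actually holds mixed data types depends on your own data.

```sql
-- Sketch: ask the driver to apply the registry-defined mixed type (text by default)
-- to columns that contain more than one data type.
SELECT ID, Marque FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 12.0;HDR = YES;IMEX = 1;Database = C:\SQL2012DIRecipes\CH01\CarSales.xlsx',
'SELECT ID, Marque FROM [Stock$]');
```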

1-5. Planning for Future Use of a Linked Server
Problem
You want to import only a subset of data from an Excel spreadsheet, but you suspect that you will need to carry
out this operation repeatedly, and eventually migrate it to a linked server solution. You do not want to have to
rewrite everything further down the line.


Solution
Use SQL Server’s OPENDATASOURCE command as part of a SELECT statement, for example
(C:\SQL2012DIRecipes\CH01\OpendatasourceSelect.Sql):
SELECT ID AS InventoryNumber, LEFT(Marque,20) AS VehicleType
INTO RollsRoyce
FROM OPENDATASOURCE(
'Microsoft.ACE.OLEDB.12.0',
'Data Source = C:\SQL2012DIRecipes\CH01\CarSales.xls;Extended Properties = Excel 8.0')...Stock$
WHERE MAKE LIKE '%royce%'
ORDER BY Marque;

How It Works
The OPENROWSET command is suited to ad hoc querying. However, you may be evaluating data connection
possibilities with a view to eventually using a linked server. In this case, you may prefer to use the
OPENDATASOURCE command as a kind of “halfway house” to linked servers (described in the next recipe). This sets
the scene for you to update your code to replace OPENDATASOURCE with a four-part linked server reference.
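To illustrate the kind of change involved, here is a hedged sketch of the same query once a linked server exists. The linked server name ExcelCarSales is purely hypothetical and is assumed to point at the same workbook; only the FROM clause differs from the OPENDATASOURCE version.

```sql
-- Sketch: ExcelCarSales is a hypothetical linked server name, not defined in this recipe.
-- The OPENDATASOURCE(...) call is replaced by a four-part name; the rest is unchanged.
SELECT ID AS InventoryNumber, LEFT(Marque,20) AS VehicleType
FROM ExcelCarSales...Stock$
ORDER BY Marque;
```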
Inevitably, there are many variations on this particular theme (which only selects all the data from a source
worksheet and uses only the ACE driver), so here are a few of them. As the objective is to import data into SQL
Server, I will let you choose whether to include this code in either a SELECT...INTO or an INSERT INTO...SELECT
statement. Of course, you can use the Jet driver if you prefer. If you are using Excel 2007/2010, you must set the
extended properties in the T-SQL to Excel 12.0.
SELECT ID, Marque FROM OPENDATASOURCE(
'Microsoft.ACE.OLEDB.12.0',
'Data Source = C:\SQL2012DIRecipes\CH01\CarSales.xlsx;Extended Properties = Excel 12.0')...Stock$;
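If you prefer to load an existing table rather than create one, the INSERT INTO...SELECT form might look like the sketch below. The destination table dbo.StockImport and its two columns are hypothetical and are assumed to exist already with compatible data types.

```sql
-- Sketch: dbo.StockImport (InventoryNumber, VehicleType) is a hypothetical,
-- pre-existing destination table; adjust names and types to your own schema.
INSERT INTO dbo.StockImport (InventoryNumber, VehicleType)
SELECT ID, Marque FROM OPENDATASOURCE(
'Microsoft.ACE.OLEDB.12.0',
'Data Source = C:\SQL2012DIRecipes\CH01\CarSales.xlsx;Extended Properties = Excel 12.0')...Stock$;
```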
To select all the data in a named range, use the following T-SQL:
SELECT ID, Marque
FROM OPENDATASOURCE(
'Microsoft.ACE.OLEDB.12.0',
'Data Source = C:\SQL2012DIRecipes\CH01\CarSales.xls;Extended Properties = Excel 8.0')...TinyRange;

To select (and, if you wish, alias) columns in the Excel source data, use T-SQL like the following. Note that
the column selection and aliasing are applied in the T-SQL, and are not part of a pass-through query.
SELECT ID AS InventoryNumber, Marque AS VehicleType
FROM OPENDATASOURCE(
'Microsoft.ACE.OLEDB.12.0',
'Data Source = C:\SQL2012DIRecipes\CH01\CarSales.xls;Extended Properties = Excel 8.0')...Stock$;
