SQL Server 2000 Design and Implementation, Part IX -
Importing and Exporting Data...

This series of articles is focusing primarily on the physical data store Microsoft developers are most likely to use for their applications, SQL Server. The main reference for this series of articles is MCSE: SQL Server 2000 Design by Israel and Jones.


By: Chris Sully Date: June 5, 2004 Printer Friendly Version

Introduction

This series of articles is focussing primarily on the physical data store Microsoft developers are most likely to use for their applications, SQLServer. The main reference for this series of articles is MCSE: SQLServer 2000 Design by Israel and Jones.

Here's a quick overview of what we've covered so far as well as what we shall be covering:

Article I: we started with developing a logical data model, before looking at how we go about implementing the physical database in SQLServer 2000.

Article II: creating and maintaining databases, in particular the files that comprise the database and how they can be split into filegroups and why they should be on occasion.

Article III: creating and maintaining tables - looking at what we can store in tables, and how we do so.

Article IV: data integrity - default values, check constraints and rules, primary keys, unique constraints, as well as foreign keys and relationships.

Article V: indexes and the related area of statistics.

Article VI: creating and maintaining the various database objects that exist- views, stored procedures, transactions, user defined functions and triggers. This concludes the look at the physical design per se. The remainder of the articles look at issues relating to data access.

Article VII: accessing data

Article VIII: modifying data

Article IX: importing and exporting data

Article X: locking

Article XI: security

Article XII: analysing and optimising data access

Thus far we've reached article IX: importing and exporting data. In this article we'll be looking at the two sets of tools provided by SQL Server for this purpose: bulk copy and DTS. For a change in this series we don't have a great deal of ground to cover in a singe article.

Bulk Copy/ BULK INSERT

SQLServer includes two utilities for the bulk copying of data: BCP and BULK INSERT.

BCP is the legacy method for loading or extracting data and has been included with SQLServer for a number of versions. BCP is a distinct program called bcp.exe, which is located in the MSSQL\BINN folder and is a command line utility run from the command prompt. As a command line utility it lends itself to being scripted.

BCP was written to import and extract data between SQLServer and a data file. This data file can be in a text format, or it could be in a binary format that is understood by SQLServer (native format). The utility can be run in an interactive mode with minimal options specified or it can be automated with additional text files.

If the data file and the SQLServer table are in different formats, with different column names or orders, then a format file must be used. This specifies the mapping between the data file and SQLServer.

The syntax of bcp is:

bcp {[[database_name.][owner].]{table_name | view_name} | "query"}
    {in | out | queryout | format} data_file
    [-m max_errors] [-f format_file] [-e err_file]
    [-F first_row] [-L last_row] [-b batch_size]
    [-n] [-c] [-w] [-N] [-V (60 | 65 | 70)] [-6]
    [-q] [-C code_page] [-t field_term] [-r row_term]
    [-i input_file] [-o output_file] [-a packet_size]
    [-S server_name[\instance_name]] [-U login_id] [-P password]
    [-T] [-v] [-R] [-k] [-E] [-h "hint [,...n]"]

For further details of the various arguments see Books Online.

BULK INSERT is an enhancement to the TSQL language that allows the execution of a bulk copy program from within the server. This command is similar to the BCP program but you can use it from within a TSQL batch. Unlike the BCP program only members of the Sysadmin or Bulkadmin fixed server roles have permission to run the BULK INSERT command.

The syntax of bulk insert is:

BULK INSERT [ [ 'database_name'.] [ 'owner' ].] { 'table_name' FROM 'data_file' }
  [ WITH
    (
      [ BATCHSIZE [ = batch_size ] ]
      [ [ , ] CHECK_CONSTRAINTS ]
      [ [ , ] CODEPAGE [ = 'ACP' | 'OEM' | 'RAW' | 'code_page' ] ]
      [ [ , ] DATAFILETYPE [ =
        { 'char' | 'native'| 'widechar' | 'widenative' } ] ]
      [ [ , ] FIELDTERMINATOR [ = 'field_terminator' ] ]
      [ [ , ] FIRSTROW [ = first_row ] ]
      [ [ , ] FIRE_TRIGGERS ]
      [ [ , ] FORMATFILE = 'format_file_path' ]
      [ [ , ] KEEPIDENTITY ]
      [ [ , ] KEEPNULLS ]
      [ [ , ] KILOBYTES_PER_BATCH [ = kilobytes_per_batch ] ]
      [ [ , ] LASTROW [ = last_row ] ]
      [ [ , ] MAXERRORS [ = max_errors ] ]
      [ [ , ] ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) ]
      [ [ , ] ROWS_PER_BATCH [ = rows_per_batch ] ]
      [ [ , ] ROWTERMINATOR [ = 'row_terminator' ] ]
      [ [ , ] TABLOCK ]
    )
  ]

For further details of the various arguments see Books Online. They are similar to those for bcp.

Example:

Create a text file with note pad containing names with first and surnames separated by tabs on each line, e.g.:

Chris Sully
Sian  Davies
Fiona  Phelps
Tania  Barra-Shaw

Save this as c:\test.txt

In Query Analyzer create a new table to transfer this data to:

CREATE TABLE BulkInsertTest
(
  Firstname varchar(20),
  Surname varchar(20)
)

Now BULK INSERT the data:

BULK INSERT BulkInsertTest FROM 'c:\test.txt'

and check it's worked:

select * from BulkInsertTest

DTS

DTS is a subsystem of SQLServer designed to move and manipulate data between formats and platforms. It is much more powerful than BCP. The subsystem includes DTS connections that allow access to data in various formats, DTS tasks that can transform data or perform some actions, and the ability to link these items together into a workflow using constraints. The entire collection of tasks, connections and constraints is called a package.

The DTS programming environment is primarily a GUI based development tool called the DTS designer. By default all tasks that are added to a package execute in parallel when the package is run. Tasks can be arranged in a workflow however, by arranging the tasks in sequence. Once each task completes it returns a success or failure code.

Connections are sources or destinations of data based on OLEDB. These may be SQLServers, Access databases, Excel spreadsheets, text files etc.

Tasks are units of functionality that can be used to complete various processes. These tasks can be added to a package and linked together to perform almost any type of data transfer, manipulation or extraction. Many tasks require connections and the DTS designer will ask for the source and destination connections of the task. Example tasks are: FTP, transform data, execute SQL, send mail, etc. There are almost 20 standard tasks.

Once the tasks are added to a package they can be connected together in a workflow. This can be of three types: on completion, on success and on failure.

A collection of connections, tasks, and workflows is known as a DTS package. Most often the whole package will be run using the DTSRUN command line utility. Packages can be saved in four different formats:

As a simple example for starters you may like to use the DTS designer to import the data as per the BULK INSERT example earlier. The steps you will need to accomplish on the designer surface are:

  1. Add a connection to the destination database of your package.
  2. Add a Bulk Insert task specifying c:\test.txt as the source data file.
  3. Save and execute the package.

There we leave importing and exporting data. In the next article we consider issues surrounding locking in SQLServer.

References

MCSE: SQLServer 2000 Design
Israel and Jones
Sybex

MSDN

Books Online