SQL Server 2000 Design and Implementation, Part VII -
Accessing Data...

This series of articles shall focus on the data store you use for your .NET applications. In particular it shall consider relational database management systems in general for the first article and SQL Server 2000 in particular for the remaining.


By: Chris Sully Date: May 22, 2004

Introduction

This series of articles is focussing primarily on the physical data store Microsoft developers are most likely to use for their applications, SQLServer. The main reference for this series of articles is MCSE: SQLServer 2000 Design by Israel and Jones.

Here's a quick overview of what we've covered so far as well as what we shall be covering:

Article I: we started with developing a logical data model, before looking at how we go about implementing the physical database in SQLServer 2000.

Article II: creating and maintaining databases, in particular the files that comprise the database and how they can be split into filegroups and why they should be on occasion.

Article III: creating and maintaining tables - looking at what we can store in tables, and how we do so.

Article IV: data integrity - default values, check constraints and rules, primary keys, unique constraints, as well as foreign keys and relationships.

Article V: indexes and the related area of statistics.

Article VI: creating and maintaining the various database objects that exist- views, stored procedures, transactions, user defined functions and triggers. This concludes the look at the physical design per se. The remainder of the articles look at issues relating to data access.

Article VII: accessing data

Article VIII: modifying data

Article IX: importing and exporting data

Article X: locking

Article XI: security

Article XII: analysing and optimising data access

Thus far we've reached article VII: accessing data. In this article we'll be looking at the SELECT statement, cursors, distributed queries and accessing data via XML in SQLServer.

The examples presented are based on the Northwind database supplied with SQLServer.

SELECT

Basics – single table queries

The simplest type of SELECT query involves a single table:

SELECT * FROM Customers

The data is returned in natural order, the order in which the data is stored in the table. The asterisk translates to all columns. You can specify columns in which case you are engaging in vertical partitioning of the data. It is wasteful of resources to return columns that are not required.

You can use the ORDER BY clause to order the returned list by any of the columns.

SELECT * FROM Customers ORDER BY ContactName ASC

The default is ascending (ASC) which can be switched to DESC. You may also include multiple column names in the ORDER BY clause, separating with commas. Note that text, ntext and image columns are not valid in the ORDER BY clause.

The WHERE clause allows filtering on row criteria:

select * from Customers where Country='UK'

You can specify multiple criteria via use of the keyword AND between the qualifiers. You can use any valid TSQL operator in the WHERE clause, e.g. >, <, != or <> and LIKE.

You can specify aliases (alternate names) for columns.

select
  CustomerID 'Customer Code',
  CompanyName as 'Company',
  'Contact' = ContactName
from Customers where Country='UK'

As you can see from the above there are three different ways for an alias to be specified. Note that it is the AS syntax which is the ANSI standard and hence should be preferred.

There are a couple of other ways to limit the results: SET ROWCOUNT and TOP. The former is included for backward compatibility and should not be used if possible though there are instances where it offers required functionality not possible with TOP. With TOP the programmer can specify an absolute number of rows or a percentage of the result set. Note if there is an ORDER BY clause this is applied before TOP.

SELECT TOP 5 * from Customers where Country='UK'

Joins – multiple table queries

The most common method of linking information in different tables is the inner join. An inner join works by comparing columns in two tables and returning the requested information if the values of the columns match.

Commonly you will employ aliases to shorten code length, as follows:

SELECT c.CustomerID AS 'Customer Code',
  c.Companyname AS 'Company',
  o.OrderID AS 'Order Code',
  o.OrderDate
  FROM Customers AS c, Orders AS o
  WHERE c.CustomerID = o.CustomerID
  ORDER BY o.OrderDate

As you can see you can apply aliases to tables names as well as table columns.

However the above isn't ANSI standard; the following is:

SELECT c.CustomerID AS 'Customer Code',
  c.Companyname AS 'Company',
  o.OrderID AS 'Order Code',
  o.OrderDate
  FROM Customers AS c INNER JOIN Orders AS o
  ON c.CustomerID = o.CustomerID
  ORDER BY o.OrderDate

Whereas inner joins require that the qualifying conditions be met for each row to be included in the result set outer joins allow all rows from one or more tables to be included in the result set. There are three types of outer join that can be written in TSQL: left outer joins, right outer joins and full outer joins.

The left and right outer joins are very similar, differing only in which table has all its rows included. The left outer join includes all the rows from the table on the left side of the join syntax, while the right outer join is the opposite.

The full outer join returns all rows from both tables, matching up the rows wherever a match can be found and placing NULLs in the places where no matching rows exist.

The cross join, aka cross product, is rarely used except for creating test data. If you cross join two tables you get all possible combinations of rows in the first with rows in the second.

All the joins follow a similar syntax.

Aggregate Operators

Aggregate operators provide a summary of information in a query. Supported in SQLServer are: SUM, AVG, COUNT, MIN, MAX.

Whenever a non-aggregated field is included in a query, it must be accompanied by a GROUP BY clause and listed in this clause.

You can also use a HAVING clause which is like a WHERE clause for aggregates.

CUBE and ROLLUP are used with the GROUP BY clause to add additional rows that summarise the totals by groups. CUBE will provide additional summary rows for every combination of group and sub group that is returned by the query. ROLLUP will provide summary rows in hierarchical order from the lowest level group to the highest.

The COMPUTE is included for backward compatibility to produce aggregates as extra result sets. COMPUTE also includes an optional BY keyword that will break the results down into further results sets.

The UNION command is used to join two SELECT queries. The results can include or exclude duplicates (UNION vs UNION ALL). Both queries that are included in the UNION must have the same number of columns.

The DISTINCT command is used to remove duplicates from a single result set.

A scalar, aka literal, is essentially a constant value that you want to return. You can also use a scalar in combination with a logical test.

Here's an example of a SELECT query using the GROUP BY clause:

SELECT o.CustomerID, SUM(od.Quantity*od.UnitPrice) AS 'Total Sales'
FROM Orders o, [Order Details] od
WHERE o.OrderID = od.OrderID
AND o.OrderDate> '31/12/1997'
GROUP BY o.CustomerID

Subqueries

A subquery is a query embedded inside another query. They can appear in many places including:

Functions

The following are available for use in your SELECT statements and elsewhere. See Books Online if you don't know what purpose any of the following serve.

General

String

Date and time

Cursors

Cursors simulate a table structure derived from a result set, and allow the programmer to access the data row by row. Note that whilst essential occasionally, cursors should be avoided if possible as they are at odds with the efficient set processing fashion in which SQLServer prefers to deal with data.

Cursors can be used to retrieve data and update it row by row, but cannot be used to insert data. Options can dictate whether the changes to the cursor data are reflected in the underlying data source.

Cursor options:

LOCAL|GLOBAL: LOCAL being the default.

LOCAL cursors are visible only within the batch, stored procedure or trigger in which they are declared. They can only be referenced by variables within these constructs, though they can be assigned to an output parameter from a stored procedure and assigned to a cursor variable in the calling program. A LOCAL cursor is deallocated implicitly when the connection closes.

A GLOBAL cursor has a scope that includes the current connection and may span multiple batches, stored procedures or triggers. The cursor will only be implicitly deallocated when the connection closes.

It is best practice to deallocate a cursor as soon as it is no longer needed.

FORWARD_ONLY|SCROLL

Controls behaviour with regard to accessing data within the cursor.

FORWARD_ONLY performs faster/ uses less resources but SCROLL allows you to move backwards through the data if required. You should use FORWARD_ONLY unless you have a good reason to use SCROLL.

STATIC|KEYSET|DYNAMIC|FAST_FORWARD

Controls how the cursor interacts with the data on which it is based as well as how SQLServer controls the cursor internally.

STATIC cursors consist of a temporary copy of the data from the underlying sources. They are implemented as a temporary table in Tempdb, as defined by the SELECT statement. Changes to the underlying data are not reflected in the cursor. This type of cursor is not updateable.

KEYSET cursors also use Tempdb, but only to store the key values that identify the rows in the underlying data. The main feature of this type of cursor is that changes to non-key values are reflected in the cursor. This allows the most up-to-date information about non-key fields to be available inside the cursor. Inserts as well as changes to key values are not reflected in the cursor unless the update is performed through the cursor and the WHERE CURRENT OF clause is used.

DYNAMIC cursors are effectively views of the underlying data source and reflect all changes to data made outside the cursor. Absolute fetches are not supported.

FAST_FORWARD cursors are optimised forward-scrolling, read-only cursors.

As you might expect, the greater the functionality offered the greater the resources required/ the poorer the performance. So choose appropriately.

READ_ONLY|SCROLL_LOCKS|OPTIMISTIC

READ_ONLY: no updates allowed.

SCROLL_LOCKS: updates through the cursor are guaranteed to succeed as SQLServer locks the rows as they are read into the cursor! Consider the implications of this!

OPTIMISTIC: used when locks on the rows read into the cursor are not required. If an update or delete is made to a row in the cursor that was changed by another source, the update or delete in the cursor will fail.

TYPE_WARNING: indicates a message should be sent to the client if the server changes the type of cursor that was submitted (e.g. if the client specifies options that necessitate such a change).

FOR UPDATE: specifies columns in the cursor that are updateable.

Once the cursor has been declared the data inside the cursor is retrieved using the FETCH command. Each time this statement is issued, it returns a single row of data into local variables that must match the data types of the columns included in the cursor. Options of FETCH, also dependent on the type of cursor defined, are: NEXT, PRIOR, FIRST, LAST, ABSOLUTE, RELATIVE.

Each FETCH execution sets the value of the global variable @@FETCH_STATUS. The value will be 0 if no error occurred.

An example is:

DECLARE Employee_Cursor CURSOR FOR
SELECT LastName, FirstName FROM Northwind.dbo.Employees
OPEN Employee_Cursor
FETCH NEXT FROM Employee_Cursor
WHILE @@FETCH_STATUS = 0
BEGIN
FETCH NEXT FROM Employee_Cursor
...--do something with the data!
END
CLOSE Employee_Cursor
DEALLOCATE Employee_Cursor

Distributed queries

The concept of linked servers allows a variety of data stores to be queried as if the data were stored in a table on the same system. Distributed queries are also supported by distributed partitioned views.

Any OLEDB provider can be used to create a linked server. The data sources exposed in this fashion can either expose their data as a rowset or return data from a query as a rowset. If the remote data source has a database engine that can return a rowset, the section of the query that deals with the remote data source is sent to that datasource for processing. For OLEDB sources that do not have an engine, such as Excel or text files, the OLEDB provider is responsible for executing the query and returning the rowset.

SQLServer registers information about the location of the data source, the drivers needed and any security information that is required. The data source is then queried from within a database using a four-part name of the object being queried, i.e.:

linked_server_name.catalog.schema.object_name

where catalog is the SQLServer instance name and schema is equivalent to the owner of the table. This four-part name is the same as that which could equally be used to identify local objects, but is not normally required.

The sp_addlinkedserver system stored procedure adds a linked server, which takes as parameters the values necessary to create the link.

Security in a linked server environment is handled by storing information about remote logins on the local server. The remote logins can be mapped directly to local logins, or a single account can be chosen for all queries sent to the linked server. One last alternative is to allow the current login’s credentials to be 'passed through' to the linked server. The linked server will still authenticate the login to the pertinent database objects. Adding a remote server login is accomplished via the sp_addlinkedsrvlogin system stored procedure.

Registering a linked server is a little cumbersome if you are only going to execute a few queries against the linked server. Thus SQLSever provides the OPENROWSET function for such scenarios which allows ad hoc access to the remote data source within the FROM clause of a SELECT query. It may also be referenced as the target of an INSERT, UPDATE or DELETE clause if the OLE DB provider supports this functionality. Again this functionality is achieved via an OLEDB data provider. You simply replace the table reference within the FROM statement of the SELECT clause with the OPENROWSET function call, passing in the appropriate parameters.

In traditional SQLServer views if a UNION operator is included in the view definition the view is not updateable. A distributed partitioned view is a special view that includes a union of data from multiple servers but is allowed to be updated if it meets certain criteria. Again you simply reference the external data source via the 4-part naming scheme as above when creating the view.

SQLServer 2000 and XML

I'll assume the reader is familiar with the general principles of XML.

The SELECT statement syntax includes a FOR XML clause that will return data as XML rather than a standard recordset. There are three different modes:

RAW: transforms each row in the recordset into an XML element with the identifier row. Each non-NULL column is mapped into an attribute of the element.

AUTO: allows SQLServer to format an XML document in a logical way based on the table schema. Each row becomes an element, with each column becoming an attribute.

EXPLICIT – requires the query to specify how the XML document will be formed via a universal table, thus providing maximum power and flexibility. See Books Online for further information.

FOR XML options are:

XMLDATA – returns the data schema as an inline schema, pre-pended to the document.

ELEMENTS – applicable to AUTO mode, the columns of the recordset are returned as sub-elements rather than attributes

BINARY BASE64 - indicates binary data should be returned as BASE64 encoded

An example:

SELECT Customers.CustomerID, Orders.OrderID, Customers.ContactName
FROM Customers, Orders
WHERE Customers.CustomerID = Orders.CustomerID
FOR XML AUTO

returns (partial result)

<Customers CustomerID="ALFKI" ContactName="Maria Anders">
<Orders OrderID="10643"/>
<Orders OrderID="10692"/>
<Orders OrderID="10702"/>
<Orders OrderID="10835"/>
<Orders OrderID="10952"/>
<Orders OrderID="11011"/>
</Customers>
<Customers CustomerID="ANATR" Contact "/>
<Orders OrderID="11063"/>
...

Accessing data through a URL

SQLServer 2000 integrates with IIS to provide access to SQLServer data. The data is returned as XML format and queried using the FOR XML option. You must configure IIS to support the option – a virtual root must be created that is accessible via a URL. Data will then be accessible via this URL in a few different ways: SQL queries can be appended to the URL as a parameter named sql.

You can also choose to create template files on the server that specify the query. These are themselves XML documents of a set format. Templates are not limited to inline SQL queries as stored procedures may be called as well, to which parameter values can be supplied.

XSL can be used to reformat the result sets delivered by templates.

There we conclude our look at the various ways in which we may obtain data from SQLServer. In the next article we turn to the ways in which we may add data to SQLServer.

References

MCSE: SQLServer 2000 Design
Israel and Jones
Sybex

MSDN

Books Online