SQL Server 2000 Design and Implementation, Part III -
Creating and Maintaining Tables...

This series of articles shall focus on the data store you use for your .NET applications. In particular it shall consider relational database management systems in general for the first article and SQL Server 2000 in particular for the remaining.


By: Chris Sully Date: April 16, 2004

Introduction

This series of articles is focussing primarily on the physical data store Microsoft developers are most likely to use for their applications, SQLServer. The main reference for this series of articles is MCSE: SQLServer 2000 Design by Israel and Jones.

Here's a quick overview of what we've covered so far as well as what we shall be covering:

Article I: developing a logical data model, before looking at how we go about implementing the physical database in SQLServer 2000.

Article II: creating and maintaining databases, in particular the files that comprise the database and how they can be split into filegroups and why they should be on occasion.

Article III: creating and maintaining tables - looking at what we can store in tables, and how we do so.

Article IV: data integrity - default values, check constraints and rules, primary keys, unique constraints, as well as foreign keys and relationships.

Article V: indexes and the related area of statistics.

Article VI: creating and maintaining the various database objects that exist- views, stored procedures, transactions, user defined functions and triggers. This concludes the look at the physical design per se. The remainder of the articles look at issues relating to data and data access.

Article VII: accessing data

Article VIII: modifying data

Article IX: importing and exporting data

Article X: locking

Article XI: security

Article XII: analysing and optimising data access

Thus far we've reached article III: creating and maintaining database tables. We'll be looking at what you can store in tables, how you go about creating and altering them as well as how tables themselves are structured and stored.

Creating a table

The full syntax for creating a table (see Books Online) is complex with the definition containing components that raise issues that will be considered in the remaining articles in this series but the basic syntax is:

CREATE TABLE tablename
({columnname datatype} [NULL |NOT NULL] [,...n]
)

Datatypes

Every column in a table has a datatype, except computed columns, which we'll encounter shortly. Data types are a part of domain integrity as they limit the nature of information that can be stored in a column. Datatypes can be system defined or user defined.

Nulls

A value of a specific column may (NOT NULL) or may not (NULL) be required. NULL means unknown. Note that NULL is not the same as zero or the empty string as NULL values will not count by default in operations like averages. If not specified the default value is controlled by the ANSI null default database option, obtainable via:

SELECT DATABASEPROPERTYEX('databasename','IsAnsiNullDefault')

You use ALTER DATABASE with the SET command option to change the value.

Unique Identifiers

There are several ways to make your table rows uniquely identified, e.g. the IDENTITY and ROWGUIDCOL properties.

Identity is SQLServer's auto-numbering property allowing specification of the seed (first created value) and subsequent increments thereof (default values are 1,1). When you insert a row a new identity value is automatically created; if you delete a row the identity value it used will not be reused. Unless you SET IDENTITY INSERT tablename ON you can't explicitly insert values into an identity column. A table can have only one identity column and it must be an integer.

A common need is to obtain the identity value last inserted. Three solutions:

  1. The global variable @@IDENTITY returns the last identity value inserted in the current session across all scopes. A scope is a stored procedure, trigger, function or a batch. This is an important consideration.
  2. IDENT_CURRENT is a function that returns the last inserted identity value in a specific table in any session and in any scope.
  3. SCOPE_IDENTITY is a function that returns the last inserted identity in the current session and scope.

When defining an identity you can specify it as NOT FOR REPLICATION meaning that the column will retain its values in the replicated table; otherwise they will be automatically generated which could cause issues.

Finally, you can query the table's identity column without knowing its name, referring to the IDENTITYCOL keyword instead.

Identity values ensure local uniqueness but do not guarantee global uniqueness. The UNIQUEIDENTIFIER datatype and the ROWGUIDCOL property are used to indicate that a column is a globally unique identifier (GUID – a 128-bit number guaranteed to be unique generated by the system).

With the ROWGUIDCOL property set the column can be queried with the ROWGUIDCOL keyword in a SELECT statement.

See Books Online for further details.

Collation

See the last article for introductory information on collations. Each character column of a table can have a different character set or sort order. If you don't define a collation for a column then the column will default to that of the table. Generally, defining collations at the column level is not recommended – if you have internationalisation considerations it is better to use Unicode columns (NCHAR, NVARCHAR, NTEXT data types) and the database's default collation. You will use double the storage space but the resultant system will be easier to manage.

Filegroups

As previously introduced tables can be placed on specific filegroups (logical entities of one or more data files) and assigned for storage on a particular underlying system disk. A typical scenario might be archiving sales data to a separate database on a separate filegroup on a separate disk when sales have been invoiced. This will keep the sales table itself small and optimised for daily use until it is has been fully processed and it can be archived.

Filegroup placement is achieved simply by the ON keyword, e.g.

CREATE TABLE Clients ON filegroupname

By default the filegroup would be Primary.

Altering a table

There are a number of rules to changing a column structure. Some key rules are that the altered columns cannot be:

Note that extra modifications are allowed via Enterprise Manager as temporary tables are used in the background. Just keep in mind the extra resources such operations might use.

The other key consideration is that the new datatype must be implicitly convertible from the old.

Changing collation of a column is not a good idea if the column already contains data. Changing the collation may change the character set and sort order so data may be lost and query results may change.

You can also change nullability. Obviously if you want to disallow nulls you must ensure that existing data for that column does not contain nulls.

Lastly if you want to change or define a ROWGUIDCOL column you must first delete any reference to the GUID, then define another UniqueIdentifier column as being the new ROWGUIDCOL.

To add a column you use the ALTER DATABASE command with the ADD keyword.

e.g.

ALTER TABLE Clients ADD Email varchar(50)

If the table contains data and the added column does not allow NULL values, it should have a default value.

Similarly you can DROP columns easily. You cannot drop a column if:

Columns and data types

Datatypes are used to define column storage as well as the parameters of stored procedures and user defined functions and variables in TSQL scripts.

System datatypes

There are 27 datatypes provided by default by SQLServer and these are referred to as system datatypes. We won't go into comprehensive detail regarding the datatypes (see Books Online) but in overview: system datatypes can be grouped into six families:

1. exact numeric:
Integers: TINYINT (UNSIGNED), SMALLINT, INT and BIGINT. NUMERIC and DECIMAL are synonyms and represent fixed precision and scale numeric values. Precision represents the total number of digits in the number and scale the total number of decimal digits in the number. MONEY and SMALLMONEY are used to represent Euro compatible currency values. BIT: may have values 0,1, or NULL. Use a byte that can be shared by multiple BIT columns of the same table.
2. approximate numeric: FLOAT and REAL are floating point numbers.
3. date and time: DATETIME and SMALLDATETIME.
DATETIME is stored as two 4 byte integer values with the first integer representing the number of days before or after Jan 1 1900 and the second the number of milliseconds after midnight. SMALLDATETIME uses two 2 byte integer values which restricts the first integer to a positive value and the second to seconds rather than milliseconds, otherwise the same.
4. character strings:
CHAR & VARCHAR (single byte characters), NCHAR & NVARCHAR (double byte Unicode characters), limited respectively to 8000 and 4000 characters (8000 bytes). CHAR and NCHAR are fixed length and occupy all the defined space whilst NCHAR & NVARCHAR are variable length so they occupy the space they really occupy plus 2 bytes overhead per value. TEXT and NTEXT can be used when you need more than 8000 bytes of storage. They may be stored in the same data page as the other fields or in one or many separate pages.
5. binary strings: BINARY and VARBINARY are as per character strings but for binary strings. IMAGE is used to store larger items of data.
6. special:
SQLVARIANT offers the possibility to store almost everything in a column, parameter or variable. TABLE: allows the creation of a temporary result set that can be used as a parameter of a stored procedure or used with table valued user defined functions. It can’t be used for table columns. TIMESTAMP and ROWVERSION are synonyms. Note: TIMESTAMP is not the same as ANSI SQL-92 timestamp (which is equivalent to TSQL DATETIME), and hence the introduction of the ROWVERSION synonym. ROWVERSION: automatically stamps a row with a version id UNIQUEIDENTIFIER: is used for GUIDs as described earlier.

To be ANSI SQL-92 compliant SQLServer also offers synonyms to standard datatypes with limited support for them – it converts them directly to one of the above types.

User defined datatypes

You can create your own data types based on the system data types (not on other user defined datatypes). You can attach a default value and a validation rule to a user defined datatype. Once done the column inherits this validation rule and default value.

To create a datatype you use the sp_addtype stored procedure, e.g.

sp_addtype postcode,'char(8)', NULL

sp_droptype drops a user defined type if it is not referenced in tables, stored procedures or user defined functions.

The datatype definition is stored in the Systypes system table.

Computed columns

Computed columns are virtual columns, not physically stored in the database but still parts of a table structure. Their values are calculated on the fly from values of one or many other columns in the same table. An example would be an order total being defined as price * quantity. Defining this column as

Total AS Price * Quantity

is perfectly valid TSQL. You can even define the column using IF or CASE statements so as to use a different formula depending on other column values.

Computed columns in SQLServer 2000 may be indexed.

Extended properties

This new feature to SQL Server 2000 allows the creation of custom properties on almost every database object. An extended property is a SQL_VARIANT storage area that can be created on databases, users, user defined datatypes, tables, views, stored procedures, user defined functions, defaults, rules, columns, parameters, indexes, constraints and triggers.

In TSQL extended properties are managed through three system stored procedures: sp_addextendedproperty, sp_updateextendedproperty, sd_dropextendedproperty. Properties are stored in the Sysproperties table.

You may define extended properties at three object levels where objects at level 0 may or may not have child objects at levels 1 and/ or 2. For example, you (dbo – level 0) may want to add a description extended property to a column (level 2) of a table (level 1). The TQL would follow the following template:

sp_addextendedproperty 'Description','Description of column',
'user','dbo'
'table','tablename',
'column','columnname'

Thus if you want to reach a level two object you must specify the level 0 and 1 objects as well. Some objects don't have a level 2 (e.g. default and rule objects) and the user defined datatype object only exists at level 0.

To find the value of an extended property you can query the Sysproperties table directly or use the fn_listextendedproperty function.

Query Analyzer also offers a graphical way to manage extended properties via the normal interface.

Table storage

Each time a record is inserted into a table it is placed in an allocated page with sufficient available space to store it. The way it is physically stored varies depending on its constituent datatypes. Records are stored in 8KB data pages. The maximum length of a record is 8060 bytes, not including text, ntext and image columns. A record is divided into 5 zones:

  1. The row header is 4 bytes long and contains information about the row.
  2. The fixed length datazone contains the data for the fixed length datatypes of the records.
  3. The null block contains the nullability value of each column.
  4. The variable block contains the stored variable-length data lengths.
  5. The variable length data zone contains the actual data for the variable length datatypes.

You can check this structure by running DBCC PAGE.

Text, ntext and image storage (aka BLOBs – binary large objects) are managed differently as their size can be greater than that of the standard containing page itself. Columns of these datatypes are therefore stored in 'image pages' with only pointers to these structures being stored in the record structure. However, SQLServer 2000 introduced the new text in row option to allow storage of these datatypes directly in the row.

When text in a row is set to off each BLOB column is stored in an image B-tree structure, and the address of the root node – a 16-byte address – is stored in the row. When set to on every BLOB column is stored in the row if it fits in the page and if its size is less than the predefined maximum size (default 256 bytes – can set between 24 and 7000); if not it is stored in its own B-tree structure.

The text in row option is controlled via the sp_tableoption system stored procedure:

sp_tableoption tablename, 'text in row', 'on'

To set the maximum BLOB size:

sp_tableoption columnname, 'text in row', 'max size'

A final note with regard to BLOBs: by default BLOB data is stored in the standard table filegroup. The TEXTIMAGE_ON keyword allows storing BLOB columns in a specific filegroup. For most applications this would be a sensible option to take.

For further details of these options see Books Online.

In the next article we shall consider how we implement data integrity in SQLServer.

References

MCSE: SQLServer 2000 Design
Israel and Jones
Sybex

Books Online

MSDN