SQL Server 2000 Design and Implementation, Part IV -
Implementing Data Integrity...
This series of articles shall focus on the data store you use for your .NET applications. In particular it shall consider relational database management systems in general for the first article and SQL Server 2000 in particular for the remaining.
This series of articles is focussing primarily on the physical data store Microsoft developers are most likely to use for their applications, SQLServer. The main reference for this series of articles is MCSE: SQLServer 2000 Design by Israel and Jones.
Here's a quick overview of what we’ve covered so far as well as what we shall be covering:
Article I: we started with developing a logical data model, before looking at how we go about implementing the physical database in SQLServer 2000.
Article II: creating and maintaining databases, in particular the files that comprise the database and how they can be split into filegroups and why they should be on occasion.
Article III: creating and maintaining tables - looking at what we can store in tables, and how we do so.
Article IV: data integrity - default values, check constraints and rules, primary keys, unique constraints, as well as foreign keys and relationships.
Article V: indexes and the related area of statistics.
Article VI: creating and maintaining the various database objects that exist- views, stored procedures, transactions, user defined functions and triggers. This concludes the look at the physical design per se. The remainder of the articles look at issues relating to data access.
Article VII: accessing data
Article VIII: modifying data
Article IX: importing and exporting data
Article X: locking
Article XI: security
Article XII: analysing and optimising data access
Thus far we've reached article IV: implementing data integrity.
Data integrity defines rules for data accuracy and correctness. There are four types of data integrity:
These four types can be implemented in two ways:
Declarative integrity rules are always checked before the insert/ update/ delete so the operation is cancelled before the operation is undertaken. This is generally the converse for procedural integrity with the exception of INSTEAD OF triggers, which we'll meet later.
The following table shows the relationships between types and implementation:
| Integrity Type | Declarative implementation | Procedural implementation |
|---|---|---|
| domain | datatype nullability DEFAULT constraint CHECK constraint | default rule |
| entity | PRIMARY KEY constraint UNIQUE constraint | stored procedure trigger |
| referential | FOREIGN KEY constraint | stored procedure trigger |
| enterprise | N/A | stored procedure trigger |
When should you use which? Basically, constraints should always be preferred if they are functionally equivalent for efficiency reasons. In some situations however you need to resort to triggers, e.g.
Declarative integrity is implemented via the CREATE TABLE or ALTER TABLE statements and can be defined at the table or column level (see Books Online for the syntactic differences). As elsewhere DROP is used to remove constraints.
A few examples of constraints from the Northwind database:
Customers has a primary key constraint:
ALTER TABLE [dbo].[Customers] WITH NOCHECK ADD
CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED
(
[CustomerID]
)
Employees has primary key, check and foreign key which is actually self referencing which is entirely legitimate.
ALTER TABLE [dbo].[Employees] WITH NOCHECK ADD
CONSTRAINT [PK_Employees] PRIMARY KEY CLUSTERED
(
[EmployeeID]
) ON [PRIMARY] ,
CONSTRAINT [CK_Birthdate] CHECK ([BirthDate] < getdate())
GO
ALTER TABLE [dbo].[Employees] ADD
CONSTRAINT [FK_Employees_Employees] FOREIGN KEY
(
[ReportsTo]
) REFERENCES [dbo].[Employees] (
[EmployeeID]
To drop:
ALTER TABLE [dbo].[Employees] DROP CONSTRAINT FK_Employees_Employees
With CHECK and FOREIGN KEY constraints specifying WITH CHECK (the default) tells SQLServer to check existing values for compliance (as opposed to NO CHECK).
Naming the constraint is optional – SQLServer will generate a name for you if not supplied; in which case you can use sp_helpconstraint table_name to discover the name.
We'll now take a look at the available declarative implementations of each of the types of integrity in a little more detail.
Nullability and default values define the value inserted in a column when it is not specified in an INSERT statement.
The default constraint can be created at the time of table creation or added or dropped afterwards. Each column can have only one default constraint. TIMESTAMP, IDENTITY and ROWGUIDCOL columns cannot have default values as the value is defined already. The default value (/expression) can be defined as a constant, a system function, a system global variable or a user defined function. Note however that the name of another column cannot be used in a default expression.
Default objects (aka defaults) are another way of defining a default value for a column. Defaults are implemented as global variables that can be bound to columns or to user defined data types. The latter is where their power comes from and when you should consider using them. The CREATE DEFAULT statement is used to create defaults. Once created you use sp_bindefault to bind to a column.
They are primarily still in existence for legacy reasons.
Check rules and constraints limit the possible values that can be entered into a column, contributing to domain integrity. This is achieved via defining:
Check constraints are declarative, rules are procedural. Both can be bound to columns or user defined data types. As per defaults, constraints should be used unless binding to data types.
Check constraints are part of the table definition (see example above). They can be enabled or disabled. A column can have multiple check constraints. They are validated in creation order during inserts and updates. They must evaluate to a boolean expression. They can reference other columns of the same table.
sp_helpconstraint table_name gives summary info.
SQLServer automatically names constraints as follows:
Table level: CK_tablename_randomnumber
Column level: CK_tablename_columnname_randomnumber
Check constraints referencing more than one column must be defined at table level.
Like default objects they are primarily still in existence for legacy reasons and are similar in use. They are created with CREATE RULE and then assigned to objects (column or user defined data type) with sp_bindrule. A rule expression cannot reference other table columns so they work like column level check constraints.
Primary keys form the basic functionality for entity integrity checking. A primary key uniquely identifies each row and is formed by one or more columns in the table. In SQL Server 2000 the definition of a primary key automatically creates a unique index on the non-null columns that form the key. A table can have only one primary key.
A primary key can be created at the time of table creation or table modification. You cannot modify an existing primary key in TSQL – you need to drop the existing key and issue an ALTER TABLE statement to apply a new primary key.
A single column primary key can be defined at column level or at table level. A multi-column primary key must be defined at table level.
When created, SQLServer assigns a default name that begins with PK_ followed by the table name and random figures and letters. A specific name can be indicated via the CONSTRAINT keyword. Unless you specify the key to be NONCLUSTERED it will default to a CLUSTERED index.
Once created, the uniqueness of the key is enforced by the index and if you try to insert a duplicate primary key you will get an error.
Using a primary key is the preferred technique to implement a primary key in a table but a table can also hold alternate keys that can be implemented as unique constraints.
Unique constraints enforce the uniqueness of rows. While a table can have only one primary key constraint, it can have many unique constraints. Another difference between primary key and unique constraints is that unique constraints can be created on columns defined as NULL, although they cannot contain more than one null value.
As primary keys, unique constraints can be referenced by foreign key constraints to define relationships. Also a single column unique constraint can be created at column or table level, and a multi column unique constraint can only be created at table level. The auto-generated names will be UQ_tablename_random_number. A unique constraint is supported by a unique index that enforces the uniqueness of values. The index name is the constraint name.
You can add a unique constraint when you create the table or by adding a new column to an already created table.
In SQLServer 2000 relationships are declaratively defined with foreign key constraints. As with all other constraints a foreign key can be created at table creation or added afterwards. A foreign key constraint can reference columns defined as the primary key or unique constraints only and only in the same database. A foreign key behaves like a check constraint as it limits its values to that of the primary key or unique column values to which it is linked.
A foreign key can be defined on one or more columns. A one column foreign key can be declared at column or table level in the CREATE TABLE statement. A multi-column foreign key can only be declared at table level.
Cascading updates and deletes are not enforced by default. Enforcing a cascading delete means that if a parent row is deleted, all its child rows will be deleted in the same transaction. The same is true when you enforce a cascading update. If the parent row is updated, the child keys will be updated as well.
Triggers can enforce cascading updates and deletes, but constraints are preferred for performance reasons. Note that triggers offer further functionality. You cannot create a cascading foreign key constraint if the table has an INSTEAD OF DELETE or UPDATE trigger, you’ll receive an error.
You can disable enforcement of constraints via NOT FOR REPLICATION.
Again you can define the foreign key constraint at time of table creation or table modification.
That concludes our look at data integrity. In the next article we turn to the use of indexes to speed performance.
MCSE: SQLServer 2000 Design
Israel and Jones
Sybex
MSDN
Books Online