SQLServer 2000 Design and Implementation, Part I...
This series of articles shall focus on the data store you use for your .NET applications. In particular it shall consider relational database management systems in general for the first article and SQLServer 2000 in particular for the remaining.
This series of articles shall focus on the data store you use for your .NET applications. In particular it shall consider relational database management systems in general for the first article and SQLServer 2000 in particular for the remaining. The primary target audience for this series is the (.NET) developer wanting a better appreciation of the capabilities of SQLServer so that they are able to design and develop correct and optimised SQLServer database driven applications.
The main reference for this series of articles is MCSE: SQLServer 2000 Design by Israel and Jones. This book is primarily a study guide for Microsoft exam 70-229 and we are indeed covering all of the requirements (in overview) for the exam in this series of articles, if passing this exam is also a target of the reader. Note however that you will, in particular, need to gain further practical experience of the areas covered as well as related detail as we do not have the space to cover everything here. If you do decide to take the exam the Israel and Jones book is a good source of additional information, examples and exercises and is generally accepted to be one of the better study guides in the marketplace.
During our consideration of the various areas we shall be focussing on ideas and concepts and leaving the implementation detail for the reader to discover and experiment with. All of this detail will be found within SQLServer Books Online, and we will point you in the direction of this resource where appropriate. For example, we might highlight the different high-level categories of SELECT statement are possible without providing detailed examples of each. The idea is that we let you know what the possibilities are for you to explore further when you need to utilize this information. I'm firmly of the view that it is most important to know about the breadth of options available to you rather than the detail of one particular option. This way you are better prepared to make the best design decision. You can always look up the detailed syntax for a particular operation after all but if you don't know that there are 4 different ways to deliver a given set of functionality, each with its own pros and cons, you won’t be in a position to choose which option is best for your application.
Here's a quick overview of what we shall be covering:
Article I: we shall start in this article with developing the conceptual and logical data models.
Article II: we start looking at the physical data model with creating and maintaining databases.
Article III: creating and maintaining tables - looking at what we can store in tables, and how we do so.
Article IV: data integrity - default values, check constraints and rules, primary keys, unique constraints, as well as foreign keys and relationships.
Article V: indexes and the related area of statistics.
Article VI: creating and maintaining the various database objects that exist- views, stored procedures, transactions, user defined functions and triggers. This concludes the look at the physical design per se. The remainder of the articles look at issues relating to data and data access.
Article VII: accessing data
Article VIII: modifying data
Article IX: importing and exporting data
Article X: locking
Article XI: security
Article XII: analysing and optimising data access
Let's start with the conceptual and logical views of the data with an introduction to data modelling.
In common with the project/ application that encompasses it, the development of a database system may be split into 5 stages:
and in this article we shall be concentrating on stages 2 through 3 with a minimal amount to say on the subject of analysis. Subsequent articles will be looking at 4 and 5.
The ER model refers to the conceptual design stage and the relational model to the logical design stage. The ER model has become very popular, as it is easy to create the relational model from it.
The ER model converts your analysed information into data requirements and is used to facilitate communications between the database architect and the project stakeholders. An ER model is made up of three different elements:
entity: which represents real world concepts such as places, object, events, people, orders, customers, etc.
relationship: which represents associations between objects, such as the fact that a customer may place an order (or several orders for that matter)
attribute: which describes the entity, e.g. an invoice entity might have an invoice date, a customer might have a surname, etc.
Oversimplifying the process a little, identifying entities, attributes and relationships from the analysis documentation is achieved by transforming the nouns (subjects) to entities, the adjectives or nouns (direct objects) into the attributes and the verbs into the relationships.
Entities and attributes
We'll see shortly that key attributes play an important role in relationships between entities.
In an ER model diagram each entity is represented by a labelled rectangle with the label being the name of the entity. Each entity attribute is normally listed inside the entity rectangle.
Relationships
Relationships represent associations between entities and bind them with a set of defined rules. As well as being associated with verbs/ verb phrases in the conceptual model they have 3 other main characteristics:
Direction indicates the source/ parent entity
For example, a customer places and order, so the relationship goes from the customer to the order. A relationship always goes from a parent to one or more children.
Cardinality defines the number of instances of a specific entity that could be associated with an
instance(s) of another entity.
For example an employee may apply for one or more holidays.
Existence determines the precedence between entities
That is, the entity that must exist before another entity is created. It may be optional or mandatory. For
example an employee may apply for a holiday but each employee must belong to a department.
A relationship is represented by a line between both entities in an ER model. The direction of the relationship is indicated by an arrow and the line should be labelled with the cardinality and name of the relationship. Different tools will use slightly different notations to indicate cardinality. For example, you might see a 1 at one end of the relationship line with an M at the other indicating a 1:M or 1 to many relationship. SQLServer, when representing the physical design, uses a key on the one side and infinity symbol on the many.
Cardinality
one to one: one instance of a parent entity is associated with (at most) one instance of the child entity. In such a relationship the direction is from the independent entity to the dependent entity.
e.g. a CEO and a company as a CEO cannot be the CEO of more than one company.
Here the CEO is the dependent entity.
Now, you may be wondering why can't we just make CEO an attribute of company. Well, you can, and indeed this is what you would do in the majority of cases but you may decide to logically split information to keep entities small and manageable.
one to many: this most frequently used relationship occurs when one instance of the parent is associated with zero, one or many instances of the child entity. The direction of a one to many relationship is always from the one side to the many side.
e.g. a customer may place many orders.
many to many: occurs when one instance of the parent is associated with zero, one or many instances of the child entity and when one instance of the child entity is associated with zero, one or many instances of the parent entity.
e.g. orders and products; a product can be on many orders and an order can contain many products. Actually a product can be contained in zero, one or many orders and an order contains one or many products.
Many to many relationships cannot be directly implemented in a relational database, but must be transformed into at least 2 one to many relationships. In a many to many relationship the direction is arbitrary.
To circumvent the issue you introduce another (associative) entity, say OrderDetails. Then an order is made of one or many order details, and that each product may be referenced by zero, one or many order details. The original cardinality and existence are conserved by the new entity and relationships. A majority of many: many relationships are resolved this way.
A recursive relationship is exemplified by the common situation where an employee reports to his manager ... but the manager is an employee too. This is not an issue when it comes to implementation within a RDBMS but is another category of relationship to be aware of.
Keys
Key attributes allow us to uniquely identify records and to allow relationships to be created between entities.
The primary key is an attribute or set of attributes (in which case it is a composite primary key) identifying unique instances of each entity where every value has a value (no nulls). An entity might have multiple sets of such attributes in which case it has multiple candidate keys with the candidate keys not selected to be the primary key being alternate keys.
You may prefer to use an artificial key as the primary key if there are no candidate keys or the candidate keys are too large.
In general the primary key is identified in the ER model by underlining the name of the attributes that compose the key and optionally listing is at the beginning of the attributes list.
Every entity should really have a primary key and this scenario is necessary if the schema is to be considered to be in 1NF.
A foreign key is an attribute or set of attributes that identifies the child side of a relationship. A foreign key is in fact the ‘migrating’ primary key (or alternate key) of the parent entity.
A relationship is said to be identifying if the primary key of a child entity contains all the attributes of a foreign key. If the primary key of the child entity does not contain all the attributes of a foreign key, then the relationship is non-identifying.
Let's reinforce these ideas with a typical example. Consider the Order-Order details relationship. The Orders entity would have a primary key of OrderID, which would also appear in the OrderDetail entity as a foreign key to link the two entities with a parent-child relationship.
Data integrity rules
There are four types of integrity in RDBMSs
Domain integrity defines the possible values of an attribute via rules. In a database system the domain integrity is defined by the:
Entity integrity focuses on the existence of the primary key as entity integrity says that each instance should be uniquely identified.
Referential integrity relates to the relationships between entities and is generally defined as follows:
Which in summary says that orphans can't exist. There are several sets of rules for database insertions, updates and deletions that control how such situations are handled. For example, cascade delete will delete all child instances if the parent is deleted. SQLServer has default settings for the rules, which you may override if you so choose.
Enterprise integrity means business rules and these are normally defined in the functional model rather than the data model (another output of your conceptual/ logical design).
Thus far we have focussed on the conceptual model of the database looking at the ER model. We now turn to the logical view of the data and the relational model. In this relational model the database is a collection of 2 dimensional tables each referring to a 'thing' or entity. The organisation of these tables is the logical model, or logical view. Creation of the physical model is the next step in the process and this is the actual way the data is stored.
Going from ER model to relational model is fairly straightforward. Step 1 involves some differences in terminology:
| ER Model | Relational Model | Formal Name | Physical model |
|---|---|---|---|
| entity | table | relation | table |
| entity instance | row | tuple | record |
| attribute | column | attribute | field |
A relational table matches an ER entity. It defines the logical representation of the data and follows 6 rules:
Note that we're still talking about the logical representation here – you may already know that SQLServer doesn't actually automatically enforce some of the above mentioned constraints in the physical model.
Normalising data is the process of eliminating duplicated data by defining keys and creating new relationships and entities. The process starts with the logical model and each step results in a new, normalised model. The model can include up to 6 normal forms but its common practice to stop at the 3rd.
Normal form theory is based on functional dependency between columns. Column A is said to be functionally dependent on column B if each value of B is associated with only one value of A. We also say that A is a determinant of B.
The goal of normal forms is to remove redundant data from relational tables by splitting the tables into smaller tables, without losing data. Thus you'll be able to get back the data of the original data via a join of the new tables.
We'll quickly run through the first three normal forms and the processes to move from 1NF to 3NF. I would suggest you take the time to undertake a worked example of the process to gain a fuller understanding, as we won't have the space here. You may find that your database designs are in, or close to, 3NF already without following the prescribed process.
A relational table is in 1NF (actually a table is relational) if:
Note that we haven’t needed to consider functional dependency as yet – this follows in the remaining NFs.
Thus, for example if you had a table containing articles authors had written you might start with an author_id and a articles column with articles containing a comma separated list of articles that author has written. This fails point 2 so you split articles in article 1, article 2 and article 3. This fails point 3. So you add a new column article_id and create a composite primary key with author_id. The table is now in 1NF.
A relational table is in 2NF if:
Decomposing a table in 1NF to achieve 2NF is a logical process:
A relational table is in 3NF if:
Decomposing a table in 2NF to achieve 3NF is also a logical process:
I appreciate that this is not the simplest of processes so e-Mail me (chris.sully@cymru-web.net) if you'd be interested in an article just focussing on a worked example of the normalisation process to support the above.
Denormalisation introduces redundancy to improve data access performance. Note that data integrity is endangered by denormalisation and update performance may decrease.
Denormalisation techniques are:
Another way to improve performance is by partitioning tables:
We'll start looking at how to implement the logical data model as a physical model in SQLServer in the next article.
MCSE: SQLServer 2000 Design
Israel and Jones
Sybex
Books Online
MSDN