Analyzing Requirements and Defining Microsoft.NET Solution Architectures, Part VIII...

Creating the Logical Design (logical data model)...


By: Chris Sully Date: August 14, 2005

Introduction

The aim of this series of articles is to provide an overview of issues surrounding the architecture of .NET Solutions. In doing so it is following Microsoft’s recommended approach and, in particular, looking at the requirements of the related Microsoft certification exam (70-300), which itself is one of the requirements for the MCSD .NET qualification. The main reference for this series of articles is the Que publication: 'Exam Cram2 .NET Solution Architectures'.

We’ve looked at the following thus far:

Article I: phase 1 of the process, the envisioning phase.

Article II: started on phase 2 (gathering and analysing requirements) focussing on business requirements.

Article III: continued with phase 2, focussing on user requirements and issues surrounding globalisation and localisation of your application.

Article IV: finished off phase 2, focussing on operational and infrastructure requirements

Article V: phase 3, developing specifications.

Article VI: phase 4, creating the conceptual design.

Article VII: phase 5, creating the logical design (application architecture)

In this article we’ll complete our consideration of phase 5 with a look at creating the logical data model.

Designing a logical data model is all about preparing for the physical data design. You must try and ignore thoughts you have about the physical and concentrate on the logical, even though there is likely to be a close correlation between the two.

Many forms of data store are possible but you are likely to choose a database and this is likely to be a relational database, where data is organised into related objects. Relational database concepts are based on a paper written by Dr E F Codd in 1970 and little has changed since then.

Data modelling and logical data design

A data model is the result of developing the logical design for a proposed system and ends up being the definition around which all applications in the system must work. A common basis for data modelling is the Entity Relationship Model (ER Model).

Entity relationship modelling

The data must be organised into separate elements called entities that will later make up the physical tables within the database. Entities can be characterised as kernel, associative or characteristic and are defined by their attributes, which specify the data elements (fields and columns) of an entity.

The key attribute is the entity's identifier. Entities are usually interdependent, each holding information that relates to other entities with the relationship definable via dependencies.

The modelling process commences by identifying all the data elements. Then the elements can be divided into a logical entity structure with entities, attributes, identifiers and dependencies characterised as follows:

Entity – the basic division of data, likely to become tables in the relational database, of the following types:

Attribute – a descriptive element or property of an entity. These will become fields.

Identifier – usually a single attribute that defines one unique element of an entity. Identifiers allow for individual selection of records from an entity. These will become primary and foreign keys.

Dependency – occurs when an entity has little meaning or can't exist without at least one other entity in the database. These become table relationships in the physical design; there are three basic types:

Business Rules

Business rules help enforce data integrity or data correctness. In the logical design the business rule is defined as a limitation placed on the data. These rules can be implemented as a property of the data or as a function of another tier in the overall logical design of the project. The database server can validate an attribute so that it cannot exceed an upper limit, must have a value equal to one of a set of choices or must be of a set data type or format. When you implement business rules you can immediately take action to correct the data before it is stored to the database. These types of rules can be placed at the entity or attribute level but could also be performed in the user interface or within a business rules tier between the UI and data storage.

Defining tables and columns

Entities represent the things, places, people and concepts involved in a real world situation. An easy way to identify candidates for entities is to look for nouns in a case scenario. Four key steps need to be undertaken when determining entities:

  1. Isolate key nouns in the case study/ domain.
  2. Identify which nouns are needed as entities.
  3. Draw the entities.
  4. Write a brief description for each.

You should use a product like Visio to document your work diagrammatically.

Defining attributes/ columns

After recognising each entity we should identify a complete list of attributes (fields or columns in the physical model). Attributes are characteristics or properties defined within a single entity and correspond to real-world properties. Initially, each table should contain only the elements specific to that table and a key element to ensure that each record in the table can be uniquely identified. Sometimes you will add an attribute so it may act only as a key identifier, in which case it is referred to as a surrogate key.

You may need to decompose your attributes into their base elements, for example an address. This aids in developing a normalised database structure and also is a function of usage – will you need to utilise an individual’s postal code as a distinct item? It also aids data integrity of the decomposed items.

Defining primary and foreign keys

A primary key is not a necessity but providing one offers numerous benefits so should be considered for all tables. The primary key normally defines uniqueness in a table, in that every record of a table has its own unique primary key. The primary key is used in developing relationships and indexes.

The foreign keys use the primary key as a reference. For each foreign key there must be a matching primary key in the referred table. This constraint is known as referential integrity, prevents ‘phantom’ entries, serves as a checking mechanism and allows for cascading operations.

Normalising tables

Normalisation is the division of entities to achieve the most efficient use of data storage. Conversely denormalisation may be employed as planned redundancy to improve response time and use of resources. You undertake normalisation to the level required of your application.

There are three basic rules, known as normal forms (NF), to normalising a database structure. Their use provides these basic advantages:

1NF

An entity attribute shouldn't have more than one definable piece of data or repeating groups. e.g. name =>firstname and surname

2NF

A non-key attribute of an entity must depend on the entire primary key, not just a portion of the key. e.g. if the primary key of an orders entity contained two fields, Customer ID and Product ID, the attribute field Product Description wouldn't belong as it has no connection to the customer ID.

3NF

A non-key field must not depend on another non-key field. e.g. address information – postal code depends on city. In most situations this information would be denormalised, in pure 3rd normal form a separate entity would be used to provide additional address information.

4 and 5NF

Are largely academic so we won’t consider.

Denormalisation

After you have a logical design 'completely' normalised, rarely do you keep it in that state as you proceed to the physical design of the actual database. Although normalisation gives you a great deal of storage efficiency and might result in increased performance in some situations, there are a number of drawbacks to a completely normalised database. There are trade offs between storage efficiency, performance, and maintainability to be considered.

Data warehousing schemas often use a denormalised approach referred to as a star or snowflake schema. This schema structure takes advantage of typical decision support queries by using one central 'fact' table for the subject area and many dimension tables containing denormalised descriptions of the facts.

Additionally, if a join requires implementing using more than three tables, denormalisation should be considered. Further, the design is a factor of use and if the use of the database changes over time a restructuring should be considered.

Defining Relationships

That is, logically linking entities. They can be 1:1, 1:M and M:N as introduced earlier.

Relationships are implemented as parent and child entities. A key attribute in a child entity is attached to a related key value in a parent.

As M:N relationships are not supported by the ER model there are represented by creating two 1:m relationships with a new entity in the middle. The new entity is known as an associate or join entity.

For example consider the relationship between doctors and patients. A patient can have many doctors and a doctor can have many patients. This situation is resolved by creating an associative entity DoctorPatient then linking 1: m relationships from Doctor to DoctorPatient and Patient to DoctorPatient.

Setting up the relationships finalises a draft of the ER model. This draft will undergo modifications as the database approaches a physical design. You now have a working model that allows you to proceed with development.

A general listing of attributes for each entity and the relationship between these entities is important for progressing through the database design to the eventual completed system.

Defining the XML Schema

Defining in an XML schema could be considered to be taking the logical data structure that resulted from ER modelling to the next step. Note we’ll be focussing on XML schema rather than the less featureful DTDs. The XML schema provides a grammatical structure for XML data that is far superior to that of a DTD.

Within the XML schema document valid data is declared using element and attribute elements with the data structure created using simpleType and complexType elements.

An XML namespace is a set of names that can be used as element or attribute names in an XML document. The namespace uniquely qualifies element names to avoid conflicts between elements with the same name, and is identified by a URL.

For more information see my article on XSD Schemas: http://www.dotnetjohn.com/articles.aspx?articleid=43.

Summary

ER modelling clarifies data content and relationships through the use of entities that contain attributes and that are related to one another via identity keys. After the model is defined, normalisation rules are applied. These alter the design of the data model in an attempt to minimise data redundancy. Often data denormalisation will then be purposefully applied to add redundant data to improve system performance.

Unique, primary keys are applied to provide quick access to records. Foreign keys allow the definition of relationships between tables allowing imposition of business rules on the data. These and other business rules and constraints coincide with the company's business goals and help maintain data integrity and focus the system design to meet the business goals.

References

Exam Cram 2 .NET Solution Architectures
Cornish et al.
Que

http://msdn.microsoft.com/architecture/