SQL Server 2000 Design and Implementation, Part II -
Creating and Managing a DataBase (A)...
This series of articles shall focus on the data store you use for your .NET applications. In particular it shall consider relational database management systems in general for the first article and SQL Server 2000 in particular for the remaining.
This series of articles is focussing primarily on the physical data store Microsoft developers are most likely to use for their applications, SQLServer. The main reference for this series of articles is MCSE: SQLServer 2000 Design by Israel and Jones.
Here's a quick overview of what we’ve covered so far as well as what we shall be covering:
Article I: we started with developing the conceptual and logical data models.
Article II: in this article we shall start looking at the physical data model with creating and maintaining databases.
Article III: creating and maintaining tables - looking at what we can store in tables, and how we do so.
Article IV: data integrity - default values, check constraints and rules, primary keys, unique constraints, as well as foreign keys and relationships.
Article V: indexes and the related area of statistics.
Article VI: creating and maintaining the various database objects that exist- views, stored procedures, transactions, user defined functions and triggers. This concludes the look at the physical design per se. The remainder of the articles look at issues relating to data and data access.
Article VII: accessing data
Article VIII: modifying data
Article IX: importing and exporting data
Article X: locking
Article XI: security
Article XII: analysing and optimising data access
Thus far we've reached article II: creating and maintaining databases. Creating a database in SQLServer is easy; creating a database optimised for the specific tasks it is required to perform is a little more difficult. It is the issues surrounding this optimisation we shall be considering.
The complexity of database creation lies in the database file management, which is SQLServer specific and which needs to be known and understood if you are to take advantage of this knowledge to improve the performance of your applications in a production environment.
As there is a fair bit of ground to cover we'll split the subject matter into two parts: A and B. In this, part A, we shall look at Creating a Database, Query Processing and the Transaction Log, Space Management and Table Creation and Management Options.
As with many SQLServer operations there are two (main) ways to create a database, either via Enterprise Manager or directly in TSQL using query analyzer. Considering TSQL, the basic statement is:
| CREATE DATABASE database_name |
This will use the default parameters as alternatives have not been supplied. Two of these are a log file size of 1MB and a data file size the same as that of the model database (1MB unless its been changed since installation).
You can correctly infer from the last paragraph that a SQLServer database is made up of at least two physical files: a data file and a log file containing a log of the transactions of the database. However, the database can span multiple data and log files, and we'll consider why you might want to adopt this approach a little later. Database files are one of the following three types:
primary data file (.mdf). A database must have at least one .mdf file containing the database system tables and user tables.
secondary data file (.ndf). A database can have up to 32,766 .ndf files. The secondary data files are optional and contain user and system data not stored in the primary data file.
log file (.ldf). A database can have up to 32,766 .ldf files containing the transaction log.
By default data and log files are placed in the c:\program files\Microsoft SQL Server\MSSQL\Data folder.
For the full syntax of the CREATE TABLE command see Books Online, but here's a slightly more complex example for consideration:
e.g.
|
CREATE DATABASE Sales ON (NAME = Sales_dat, FILENAME = 'c:\program files\microsoft sql server\mssql\data\saledat.mdf', SIZE = 10, MAXSIZE = 50, FILEGROWTH = 5 ) LOG ON (NAME = 'Sales_log', FILENAME = 'c:\program files\microsoft sql server\mssql\data\salelog.ldf', SIZE = 5MB, MAXSIZE = 25MB, FILEGROWTH = 5MB ) |
We'll look at some of these options, and others, in a little more detail shortly. Firstly we need to review a little more information to provide some background knowledge for some of these options namely:
If you perform an update to data, this is done in memory ... it is the transaction log for the operation that will be written to disk. So when is the modification to the data itself written to disk? This depends on the activity of the database and the memory available. Pages of data not yet written to the disk are referred to as 'dirty pages'. Two processes control the writing back of dirty pages to the disk:
The checkpoint depends on the recovery interval database option. This depends on the activity of the database. By default in SQLServer 2000 the value of recovery interval is 0, which means that in the case of failure of SQLServer each database will recover in less than one minute. Recovery means that when SQLServer is restarted the transactions will be replayed to restore the database to the state before the service was lost.
Additionally if the available pages between two checkpoints decrease and fall below a predefined threshold, the lazy writer process flushes dirty pages from memory to disk to keep the number of available pages always above the threshold. This threshold is approximately 5 percent of the SQLServer cache.
Note that in SQLServer 2000 the cache is unified – there is no distinction between data and procedure cache (e.g. for use by stored procedures).
The lesson here is to keep in mind that updates, inserts etc. all occur in memory.
An understanding of SQLServer space management helps optimisation of queries. Primary and secondary data files are structured similarly; log files are a little different. Note that the topic of space management in SQLServer could occupy an article in itself so there’s a lot more to it than presented here. Here’s a flavour.
Each data file is made up of 64KB ‘extents’ and each extent is made up of 8KB 'pages'. A page is the unit of space allocation for tables and indexes. Different types of pages exist, depending on the content.
There are two types of extents: uniform (allocated entirely to one table or index) and mixed (shared between multiple tables and/ or indexes). The process of extent allocation is as follows: at the time of index or table creation the first 8 pages are allocated from mixed extents. From the 9th page onwards they are allocated in uniform extents.
8 different types of pages may exist in a database:
| data | user or system data except text, ntext and image |
| index | index data |
| text/image | text, ntext and image data |
| IAM | Index Allocation Map – how extends are used by an index or table |
| GAM/SGAM | Global Allocation Map/ Secondary GAM – how extents are allocated |
| PFS | Page Free Space how pages are allocated and free space in these pages |
| BCM | Bulk Changed Map – how extents have changed since the last log backup |
| DCM | Differential Changed Map – how extents have changed since the last database differential backup |
See Books Online for further information.
Log files have a different, simpler structure to data files that reflects the more restricted set of required operations. Log file are never modified, only written to most of the time and truncated occasionally.
For performance and internal management purposes a transaction log is divided into virtual log files, the number and size of which depends on the size of the log.
The use of the log depends on the recovery model. If not truncated, it grows indefinitely. Two choices for recovery model:
See Books Online for further information.
We'll now return to the CREATE TABLE and ALTER TABLE command options and shall also consider the allied command specifics where appropriate.
When you define the filename you provide the file both a physical name and a logical name corresponding to the NAME and FILENAME items of the CREATE DATABASE statement. The logical name is used within TSQL whereas the physical name is used at the OS level.
Each data and log file has three properties controlling their size and their growth operations:
SIZE defines the initial size of the file, by default in MBs. The size cannot be smaller than the size of the model database (initial default 1MB).
MAXSIZE defines the maximum size to which the file can grow though you may specify UNLIMITED (though obviously unlimited only up to the size of the disk!).
FILEGROWTH defines the growth increment of the file, specified in MBs by default though you can also specify a percentage. The value cannot exceed MAXSIZE.
Good advice is to think carefully about these settings:
With SQLServer 2000, the character set, sort order, and Unicode collation have been grouped into a 'collation', which can be defined at the server, database, or column level. Two types of collation names exist: Windows and SQL. Both can be used with the COLLATE clause of the CREATE TABLE statement.
From the developer perspective it is recommended to use Unicode rather than collations. Collations should be reserved for use only if you have to manage servers using different locales.
See Books Online for more information on the issues surrounding collations.
As well as growing automatically, a database can shrink manually or automatically controlled by the settings you specify. Automatic shrinking is not enabled by default on any version of SQLServer except desktop. Every 30 minutes a housekeeping process recovers ghost record space (records that have been logically deleted) and checks whether a shrink is necessary by analysing empty space. If more than 25% of a file is unused it is shrunk with the target size being either the initial file size or a size where 25% is unused space, whichever is greater. To set the syntax is:
| ALTER DATABASE databasename SET AUTO_SHRINK ON|OFF |
Note: it is not advisable to enable autoshrinking on production databases as the process can commence at anytime and may hog resources better employed elsewhere. Better is to monitor space usage and shrink manually when you know the server isn't busy.
To manually shrink a database you use the DBCC SHRINKDATABASE statement with parameters of the database name and the free space target size as a percentage of the overall size, e.g. to retain 50% of the free space:
| DBCC SHRINKDATABASE (databasename, 50) |
Note that if you asked for 50% and the data actually needed 60% of the existing space the shrink would be to 60%, i.e. the command will not delete data!
If you run the command without the % parameter you will receive information regarding how much free space there is in the database.
Two more options:
NOTRUNCATE moves the pages to the beginning of files and does not release the freed extents to the operating system.
TRUNCATEONLY does not move any data inside the files. The file is shrunk until an allocated extent is encountered.
The shrinking process not only shrinks data and log files, it moves data from one extent to another, or from one file to another moving data from the end of files to unused extents at the beginning of the file.
With the possible specification of multiple files and filegroups the process can become quite complex. See Books Online for further details.
When you shrink a database you shrink the log file at the same time, though the process for log files is a little different. The two options that apply to log files are:
Log truncation – deletion of the inactive portion of the log, i.e. those transactions which have been checkpointed. A checkpoint marks the writing of dirty pages to the disk.
Log shrinking means that part of the truncated inactive portion of the log can be released to the operating system.
You can also shrink a particular file with DBCC SHRINKFILE. This allows shrinkage to below its initial size. The new file size is specified in MBs. Note that the target size will only be reached if there is sufficient free space in the file!
Amongst many options you can:
Considering each briefly:
To modify or add a database file:
|
ALTER DATABASE databasename MODIFY FILE | ADD FILE (options) |
with options as per CREATE TABLE.
To remove a file you must first empty it, as follows:
| DBCC SHRINKFILE (filename, EMPTYFILE) |
You may then proceed to remove the file as you'd expect:
|
ALTER DATABASE databasename REMOVE FILE filename |
You can rename a database in two ways: ALTER DATABASE and sp_renamedb:
|
ALTER DATABASE databasename MODIFY NAME=newdatabasename |
Note that it is not possible to rename a database via EM.
That concludes part A of our look at creating and maintaining databases and related issues. In part B we'll complete our examination of this area with a look at Database Options and File Groups.
MCSE: SQLServer 2000 Design
Israel and Jones
Sybex
SQLServer Books Online
MSDN