Accessing and Manipulating XML Data in .NET III...
This series of articles shall examine how we can access and manipulate XML data using VB.NET and SQLServer.
Part III: XSD Schemas, XML and SQLServer
Knowledge assumed: VB.NET, VS.NET, XML, SQLServer, TSQL; Parts I & II
This series of articles examines how we can access and manipulate XML data using VB.NET and SQLServer. In particular we covered the following in parts I and II of the series:
1 Accessing data in an XML file
DOM
XmlReader
XmlNode and XmlDocument
XPath
2 Synchronising DataSets with XML via the DataDocument
The topics remaining for discussion in this final article of the series are:
3 XSD Schemas
4 XML and SQLServer
Generating data: FOR XML, ExecuteXmlReader
Updating data: SQLXML, DiffGrams
First off - you may want to take a look at another article on DotNetJohn on the subject of XSD schemas (http://www.dotnetjohn.com/articles/articleid43.aspx). This is a more general overview of XSD. In the present article we shall be more concerned with .NET specifics.
You can create an XSD schema in VS.NET by dragging and dropping XML elements from the toolbox. However, there may be occasions when you want to create a schema programmatically to match an existing object. This is what we’re now going to look at. The obvious source of this information is an XML file. This may contain explicit schema information or just implicit information through the structure and data therein.
If the file contains explicit schema information you can use a DataSet to read that information (via the ReadXmlSchema method) and create the corresponding schema as a separate file. Similarly you can read in schema information from a distinct schema file. A code snippet for the former might be:
|
Dim xtr As XmlTextReader = New XmlTextReader(server.mappath("articles.xml")) Dim ds As DataSet = New DataSet() ds.ReadXmlSchema(xtr) Dim sw As StringWriter = New StringWriter() ds.WriteXmlSchema(sw) response.write(sw.tostring()) |
If it doesn't contain explicit information you can still use the DataSet as it also exposes the functionality to infer an XML schema based on the data in the XML file. It is important to remember however that the XML file in question may not be representative of the extent of the schema elements, e.g. the range of an integer variable in the data sample may be 1-65 but the true schema may accept values in the range 1-99. A code snippet for this might be:
|
Dim xtr As XmlTextReader = New XmlTextReader(server.mappath("articles.xml")) Dim ds As DataSet = New DataSet() Dim ns As String() ds.InferXmlSchema(xtr, ns) Dim sw As StringWriter = New StringWriter() ds.WriteXmlSchema(sw) response.write(sw.tostring()) |
The primary use of an XSD file is to allow validation of a corresponding XML file: to ensure it is valid in addition to being well-formed. This can be achieved via the XmlValidatingReader class, which provides an additional layer between the XmlReader and XmlDocument, validating the document as it is read in between the two. Again you can validate against an inline schema or an external schema.
The XmlValidatingReader is implemented not to halt on errors by default but to raise an event for each error, which you may act upon. Note that an inline schema cannot contain an entry for the root element of the document, so even when the document is otherwise valid you will receive an error from that node.
I'm not going to cover the practicalities of this topic any further here as there is already an article on DotNetJohn on the subject: Upload an XML File and Validate Against a Schema (http://www.dotnetjohn.com/articles/articleid1.aspx). See this article for implementation details.
Note that you can also validate documents against other standards of schema files, for example Document Type Definitions (DTDs) can also be validated with the XmlValidatingReader class but as we’re happy in the modern Microsoft world we won’t enter into further details here!
No doubt Microsoft has several more improvements almost ready for the world in the forthcoming "Yukon", the code name for the next release of Microsoft SQL Server, currently in development, with both .NET integration and XML support. The boys have been steadily integrating XML support with SQLServer 2000 however, and these are the features we will focus on here in the forms of generating XML directly from SQLServer and updating SQLServer databases via XML.
FOR XML is the TSQL extension that causes results to be returned as XML.
ExecuteXmlReader is the ADO.NET method that allows .NET to accept results generated by SQLServer when the FOR XML syntax is used.
To use the FOR XML extension your simply add the two keywords to your TSQL SELECT statement. Further, there are three options for controlling the formatting of these results - termed modes.
When you use the RAW mode SQLServer returns one element always named row for each row of the resultset with the individual columns represented as attributes.
e.g. using the pubs database:
| SELECT * from EMPLOYEE FOR XML RAW |
returns
<row emp_id="PMA42628M" fname="Paolo" minit="M" lname="Accorti" job_id="13" job_lvl="35" pub_id="0877" hire_date="1992-08-27T00:00:00"/><row emp_id="PSA89086M" fname="Pedro" minit="S" lname="Afonso" job_id="14" job_lvl="89" pub_id="1389" hire_date="1990-12 ...
Note that if the output contains binary columns you must include the BINARY BASE64 option to avoid an error.
e.g. SELECT * from EMPLOYEE FOR XML RAW, BINARY BASE64
The second option is FOR XML AUTO where nested elements in the resultset are represented as nested elements in XML. Columns are still represented as attributes.
e.g.
| SELECT * from EMPLOYEE FOR XML AUTO |
returns
<EMPLOYEE emp_id="PMA42628M" fname="Paolo" minit="M" lname="Accorti" job_id="13" job_lvl="35" pub_id="0877" hire_date="1992-08-27T00:00:00"/><EMPLOYEE emp_id="PSA89086M" fname="Pedro" minit="S" lname="Afonso" job_id="14" job_lvl="89" pub_id="1389" hire_dat ...
You also have the option of using the ELEMENTS modifier which produces element-centric as opposed to attribute-centric XML as follows:
| SELECT * from EMPLOYEE FOR XML AUTO, ELEMENT |
returns
<EMPLOYEE><emp_id>PMA42628M</emp_id><fname>Paolo</fname><minit>M </minit><lname>Accorti</lname><job_id>13</job_id><job_lvl>35</job_lvl ><pub_id>0877</pub_id><hire_date>1992-08-27T00:00:00</hire_date></EMPLOYEE> <EMPLOYEE><emp_id>PSA89086M</emp_id><fname>Pedro ...
Finally, you have the EXPLICIT option, the most flexible option where you effectively explicitly define the form and hierarchy of the XML to be produced. With this flexibility comes added complexity.
For the EXPLICIT mode to produce the XML document, the rowset must have a certain format (called the universal table). This requires the SELECT query to be written in a certain way, which can then be processed to produce the requested XML document.
First the EXPLICIT mode requires the query to produce two meta-data columns:
The first column specified in the SELECT clause must be a named (Tag) number. The Tag column stores the tag number of the current element. Tag is an integer data type.
The second column specified must be a named (Parent) number of the parent element. The Parent column stores the tag number of the parent element.
These columns are used to determine the parent-child hierarchy in the XML tree. This information is then used to produce the desired XML tree. If the parent tag value stored in Parent column is 0 or NULL, the row is placed on the top level of the XML hierarchy.
The remainder of the universal table fully describes the resulting XML document.
Here an example that uses the Northwind database.
|
SELECT 1 as Tag, NULL as Parent, Customers.CustomerID as [Customer!1!CustomerID], NULL as [Order!2!OrderID] FROM Customers UNION ALL SELECT 2, 1, Customers.CustomerID, Orders.OrderID FROM Customers, Orders WHERE Customers.CustomerID = Orders.CustomerID ORDER BY [Customer!1!CustomerID], [Order!2!OrderID] FOR XML EXPLICIT |
which produces the output:
<Customer CustomerID="ALFKI"><Order OrderID="10643"/><Order OrderID="10692"/> <Order OrderID="10702"/><Order OrderID="10835"/><Order OrderID="10952"/> <Order OrderID="11011"/></Customer><Customer CustomerID="ANATR"><Order OrderID="10308"/> <Order OrderID="106 ...
If interested in writing such queries I suggest your review the topic in further detail on MSDN.
Finally, you can generate schema information as part of a SQLServer by including the XMLDATA option, e.g.:
| SELECT * from EMPLOYEE FOR XML AUTO, XMLDATA |
This is the ADO.NET method of the command class that facilitates the handling of SQLServer data obtained via FOR XML. Usage is quite simple, ExecuteXmlReader() returns an XmlReader so you set up the command with the necessary parameters and then assign the method result to an XmlReader object, e.g.
| Dim xr as XmlReader = cmd.ExecuteXmlReader() |
DiffGrams are the key concept here. These are special XML messages that the .NET Framework uses internally as a means of serialising changes in a DataSet for sending between one application tier to another, for example. Diffgrams can also be used by us developers to update data in SQLServer. However, you will need to install the necessary software support in the form of the SQLXML managed classes which provide an interface between .NET and SQLServer. This software is currently in version 3.0 and available for download from Microsoft. It includes the following upgrades to SQLServer:
Before you download and install SQLXML (you'll find it via www.microsoft.com/downloads) you would be wise to check that your system meets the pre-requisites. Notable amongst these for full functionality is the Microsoft SOAP Toolkit 2.0, though not necessary for the capabilities we're about to look at.
After you’ve installed SQLXML you can use the SqlXmlCommand class to execute a DiffGram, as follows.
Start with a new web form project and add a reference to the SQLXML.NET library, which will be located at:
C:\Program Files\SQLXML3.0\bin\Microsoft.Data.SqlXml.dll
To your web form code behind add:
|
Imports Microsoft.Data.SqlXml Imports Sytem.IO |
In the Page Load event add the code to connect to and modify the database based on the DiffGram. Alter the connection string if your setup differs:
|
'connect to SQLServer Dim sxc As SqlXmlCommand = New SqlXmlCommand("Provider=SQLOLEDB;" & _ "Server=(local);database=Northwind; Integrated Security=SSPI") 'set up the DiffGram sxc.CommandType = SqlXmlCommandType.DiffGram sxc.SchemaPath = "diffgram.xsd" sxc.CommandStream = New FileStream("diffgram.xml", FileMode.Open) 'execute sxc.ExecuteNonQuery() Response.Write("Database was updated!") |
Now we should define the required schema and xml file. Add the following XML, or something similar, to the project, naming it diffgram.xml:
|
<?xml version="1.0" standalone="yes"?> <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1"> <NewDataSet> <Customers diffgr:id="Customers1" msdata:rowOrder="0" diffgr:hasChanges="modified"> <CustomerID>BERGS</CustomerID> <ContactName>Chris Berglund</ContactName> </Customers> </NewDataSet> <diffgr:before> <Customers diffgr:id="Customers1" msdata:rowOrder="0"> <CustomerID>BERGS</CustomerID> <ContactName>Christina Berglund</ContactName> </Customers> </diffgr:before> </diffgr:diffgram> |
A DiffGram can be considered to be a before and after snapshot of a part of a SQLServer table. Here the first section lists a row in the Customers table and indicates that it has been modified. The second section specifies the existing data to be changed, data which allows SQLServer to find the necessary record.
In addition to the DiffGram a schema file is required that maps the element names in the DiffGram back to entities in the SQLServer database. The sql:relation attribute in the schema file indicates the table mappings and the sql:field attributes indicate the field mappings. Here's the schema:
|
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql="urn:schemas-microsoft-com:mapping-schema"> <xsd:element name="Customers" sql:relation="Customers" > <xsd:complexType> <xsd:sequence> <xsd:element name="CustomerID" sql:field="CustomerID" type="xsd:string" /> <xsd:element name="ContactName" sql:field="ContactName" type="xsd:string" /> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema> |
If you run the page it should perform the update and you can check that the change has indeed been made via Query Analyzer. Note you may need to ensure the ASPNET account has sufficient write access to the involved files.
Note that DiffGrams can insert or delete data as well as modify data. For an insertion the DiffGram would contain the data for the new row but no old data and for a deletion data for the old row but not the new.
That's the final article in our overview of handling XML data in .NET. In this part we looked at XSD schema issues as well as the facilities available within SQLServer.
I hope the entire series of three articles has introduced you to some new ideas that will prove useful in your dealings with XML data using .NET and SQLServer.
.NET SDK
Developing XML WebServices and Server Components with VB.NET and the .NET Framework
Mike Gunderloy
Que
SQLServer Books Online
MSDN
http://www.msdn.microsoft.com