Accessing and Manipulating XML Data in .NET II

This series of articles examines how we can access and manipulate XML data using VB.NET and SQL Server.


By: Chris Sully
Date: August 26, 2003


Part II: XPath, DataSets and XmlDataDocuments

Knowledge assumed: VB.NET, VS.NET, XML; Part I

Introduction

This series of articles examines how we can access and manipulate XML data using VB.NET and SQL Server. In part I of the series we covered the following:

1 Accessing data in an XML file
    DOM
    XmlReader
    XmlNode and XmlDocument

and shall finish off our look at access methods in this article with a look at using XPath to select and navigate XML nodes before continuing on to

2 Synchronising DataSets with XML via the XmlDataDocument

The topics remaining for discussion in this series are then:

3 XSD Schemas

4 XML and SQL Server
    Generating data: FOR XML, ExecuteXmlReader
    Updating data: SQLXML, DiffGrams

which I intend to cover in the 3rd and final article in this series.

XPath

XPath is another W3C standard and .NET supports version 1.0 of the standard. It can be thought of as a query language conceptually similar to SQL. Whereas SQL allows you to select a set of information from a table or set of tables, XPath allows you to select a set of nodes from the DOM representation of an XML document.

Why would you want to use XPath? For reasons of speed and efficiency: an XPath search is commonly faster, and consumes fewer system resources, than alternatives such as walking the node tree yourself in code. It also usually takes far less code to write.

An important concept in XPath is the current context, which defines the set of nodes that will be searched by the XPath query. There are four choices for where an expression starts:

./ : the current node
/ : the root node
.// : the XML hierarchy starting with the current node
// : the whole XML document

To identify a set of elements using XPath, you use the path down the DOM tree structure to those elements, separating tags with forward slashes. For example, the following XPath expression selects all the title elements in the articles.xml file (as employed in part I of this series of articles):

/articles/article/title

You could select all the title elements without having to concern yourself with the exact path to them by using:

//title

You can also use wildcards in your expression:

/articles/*/title

If you are interested in attributes rather than elements, the convention is to prefix the attribute name with an ‘@’:

//article/@pages

@* allows selection of all attributes:

//article/@*

though in this particular case the result will be the same, as pages is the only attribute in the document.

XPath also offers a predicate language to allow you to specify smaller groups of nodes in the XML tree, similar to the filtering capability offered by the SQL WHERE clause. For example, you could specify a particular value of an element:

/articles/article[./PubDate="12/12/2002"]

which would give you any and all articles published on this date. You can similarly filter on attributes.

XPath supports a selection of filtering functions, e.g.

/articles/article[starts-with(title,".NET")]

would return all items with a title starting with .NET.

Square brackets containing a number select by position within a collection, with positions counted from 1 rather than 0. Thus

/articles/article[1]

would return the first article node.

There is a last() function that will return the last element in a collection:

/articles/article[last()]

Finally, another useful operator is '|' which means union and so you can form the union of two XPath expressions to create a result set.
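
To make the expressions above concrete, here is a minimal console sketch that runs several of them and reports how many nodes each one selects. The inline XML is only a hypothetical stand-in for articles.xml (the real file from part I may be structured differently), and the module and variable names are invented for illustration.

```vbnet
Imports System
Imports System.Xml

Module XPathExpressionsSketch
    Sub Main()
        ' Hypothetical stand-in for articles.xml; the real file may differ.
        Dim xml As String = _
            "<articles>" & _
            "<article pages=""4""><title>.NET XML basics</title><PubDate>12/12/2002</PubDate></article>" & _
            "<article pages=""6""><title>XPath in practice</title><PubDate>01/03/2003</PubDate></article>" & _
            "</articles>"

        Dim xd As New XmlDocument()
        xd.LoadXml(xml)

        ' A path, an attribute selection, two predicates, last() and a union.
        Dim queries() As String = { _
            "/articles/article/title", _
            "//article/@pages", _
            "/articles/article[PubDate=""12/12/2002""]", _
            "/articles/article[starts-with(title,"".NET"")]", _
            "/articles/article[last()]", _
            "//title | //article/@pages"}

        For Each query As String In queries
            Dim matches As XmlNodeList = xd.SelectNodes(query)
            Console.WriteLine(query & " -> " & matches.Count & " node(s)")
        Next
    End Sub
End Module
```

Note that the double quotes inside each expression are doubled up because they sit inside VB.NET string literals; XPath itself also accepts single quotes around string values.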

That's a very brief overview of the XPath language. Let's now introduce a couple of the key classes in .NET that utilise XPath expressions.

SelectNodes

The SelectNodes method of the XmlDocument takes an XPath expression and evaluates that expression over the document. The following example will demonstrate how to use SelectNodes within your code as well as allowing you to familiarise yourself with XPath via experimentation.

Drag and drop textbox, label and button controls to a web form and name them tbXPath, lblOutput and btnSearch. In the code behind file enter the following code to handle the button click:

'load the xml file
Dim xtr As XmlTextReader = New XmlTextReader(Server.MapPath("articles.xml"))
xtr.WhitespaceHandling = WhitespaceHandling.None
Dim xd As XmlDocument = New XmlDocument
xd.Load(xtr)
'the document is now held in memory, so the reader can be closed
xtr.Close()
'retrieve matching nodes
Dim xnl As XmlNodeList = xd.DocumentElement.SelectNodes(tbXPath.Text)
'output the results
lblOutput.Text = ""
Dim xnod As XmlNode
For Each xnod In xnl
  If xnod.NodeType = XmlNodeType.Element AndAlso Not xnod.FirstChild Is Nothing Then
    'for elements display the value of the first child (normally the text node)
    lblOutput.Text += (xnod.NodeType.ToString & ": " & xnod.Name & " = " & xnod.FirstChild.Value & "<br/>")
  Else
    'attributes, text nodes and empty elements: display the node's own value
    lblOutput.Text += (xnod.NodeType.ToString & ": " & xnod.Name & " = " & xnod.Value & "<br/>")
  End If
Next

Remember to include 'Imports System.Xml' in your code behind file.

This program allows you to enter an XPath expression and displays the results of applying the expression to articles.xml. It achieves this by instantiating an XmlDocument object and using the SelectNodes method of the object with the entered XPath expression as the argument.
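
Because the expression comes straight from a text box, it may not be valid XPath, in which case SelectNodes throws an XPathException. The following is a rough sketch of one way to guard against that; TrySelect is a hypothetical helper name rather than part of the framework, and the inline XML is again just a stand-in for articles.xml.

```vbnet
Imports System
Imports System.Xml
Imports System.Xml.XPath

Module SafeSelectSketch
    ' Hypothetical helper: returns Nothing when the expression is not valid XPath.
    Function TrySelect(ByVal root As XmlNode, ByVal xpath As String) As XmlNodeList
        Try
            Return root.SelectNodes(xpath)
        Catch ex As XPathException
            Console.WriteLine("Invalid XPath expression: " & ex.Message)
            Return Nothing
        End Try
    End Function

    Sub Main()
        Dim xd As New XmlDocument()
        xd.LoadXml("<articles><article><title>Sample</title></article></articles>")

        ' A well-formed expression returns a node list as usual.
        Dim good As XmlNodeList = TrySelect(xd.DocumentElement, "//title")
        If Not good Is Nothing Then Console.WriteLine("Matches: " & good.Count)

        ' The unclosed predicate makes this expression invalid.
        Dim bad As XmlNodeList = TrySelect(xd.DocumentElement, "//title[")
        If bad Is Nothing Then Console.WriteLine("Second expression was rejected")
    End Sub
End Module
```

In the web form you would write the error message to lblOutput rather than the console.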

XPathNavigator

The XPathNavigator class of the System.Xml.XPath namespace provides read-only, random access to XML documents (as opposed to the forward-only access provided by the XmlReader class). Two distinct tasks can be performed with the XPathNavigator class: selecting a set of nodes with an XPath expression and navigating the DOM representation of an XML document.

Selecting nodes

The XPathNavigator class can be used by any of XmlDocument, XmlDataDocument and XPathDocument objects though the latter should be used if the primary interest is in performing XPath operations as this object is optimised for such query operations.

The XPathDocument class exposes the CreateNavigator method which returns an XPathNavigator object for use. Similarly to the XmlReader a pointer to a current node in the DOM is maintained. An example:

'load the xml
Dim xpd As XPathDocument = New XPathDocument(Server.MapPath("articles.xml"))
'create the associated navigator
Dim xpn As XPathNavigator = xpd.CreateNavigator()
'select nodes to match the supplied expression
Dim xpni As XPathNodeIterator = xpn.Select(tbXPath.Text)
'output the results
lblOutput.Text = ""
While xpni.MoveNext
  lblOutput.Text += (xpni.Current.NodeType.ToString & ": " & xpni.Current.Name & " = " & xpni.Current.Value & "<br/>")
End While

This is the code behind the button click on this occasion; the corresponding web controls are exactly as per the previous example (SelectNodes). Use the 'Imports System.Xml.XPath' directive this time.

Thus the Select method of the XPathNavigator class returns an XPathNodeIterator object allowing you to visit each of the selected set of nodes in turn. This class exposes a Current property and a MoveNext method which are used to move through the nodeset above, presenting pertinent information along the way.

From the output of the program you can see that the node Value property is the concatenated text of all nodes beneath that node.
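
The concatenation behaviour is easy to see in isolation. This small console sketch uses a hypothetical two-element article (not the real articles.xml) and selects the parent element, so that Value has more than one text node to join.

```vbnet
Imports System
Imports System.IO
Imports System.Xml.XPath

Module NavigatorValueSketch
    Sub Main()
        ' Hypothetical sample data, loaded from a string rather than a file.
        Dim xml As String = _
            "<articles><article><title>One</title><author>Two</author></article></articles>"
        Dim xpd As New XPathDocument(New StringReader(xml))
        Dim xpn As XPathNavigator = xpd.CreateNavigator()

        ' Select the article element, which has no text of its own.
        Dim xpni As XPathNodeIterator = xpn.Select("/articles/article")
        While xpni.MoveNext()
            ' Value joins the text of every descendant: prints "article = OneTwo"
            Console.WriteLine(xpni.Current.Name & " = " & xpni.Current.Value)
        End While
    End Sub
End Module
```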

Navigating the DOM

The XPathNavigator class can also be used to move around the DOM via the following exposed methods, amongst others:

MoveToRoot
MoveToParent
MoveToPrevious
MoveToNext
MoveToFirstChild

Importantly, the MoveTo methods that can fail (MoveToParent, MoveToNext and so on) do not throw an error when the requested navigation cannot be performed. Instead they return False and leave the navigator where it was.
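
As a rough illustration of this style of navigation, the sketch below walks the children of the document element using the Boolean return values to control the loop. The XML is a hypothetical sample and the module name is invented for the example.

```vbnet
Imports System
Imports System.IO
Imports System.Xml.XPath

Module NavigatorWalkSketch
    Sub Main()
        ' Hypothetical sample data with two article elements.
        Dim xml As String = _
            "<articles><article><title>One</title></article>" & _
            "<article><title>Two</title></article></articles>"
        Dim xpn As XPathNavigator = _
            New XPathDocument(New StringReader(xml)).CreateNavigator()

        xpn.MoveToRoot()         ' the root node, which sits above <articles>
        xpn.MoveToFirstChild()   ' now on <articles>

        ' MoveToFirstChild and MoveToNext return False when there is nowhere to go.
        If xpn.MoveToFirstChild() Then
            Do
                Console.WriteLine(xpn.Name & ": " & xpn.Value)
            Loop While xpn.MoveToNext()
        End If

        ' The failed MoveToNext did not move the navigator off the last article.
        Console.WriteLine("Still on: " & xpn.Name & " (" & xpn.Value & ")")
    End Sub
End Module
```

Testing the return value in the loop condition replaces the exception handling or bounds checking you might otherwise expect to need.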

For further information see the SDK documentation.

Synchronising DataSets with XML via the XmlDataDocument

Returning to the XmlDataDocument class introduced in part I of this series of articles, we'll now take a look at how XML data and DataSet objects may be synchronised. You'll be aware that ADO.NET is able to provide a complete in-memory representation of a relational database through the DataSet object. The System.Xml namespace adds functionality to synchronise the DataSet with an equivalent XML file, and vice versa.

The XmlDocument class allows you to work with XML via the DOM, but for interaction with DataSets you need the XmlDataDocument class. This inherits from XmlDocument and adds some members, including a DataSet property, which exposes a DataSet representation of the XmlDataDocument, and a Load method, which loads the XmlDataDocument and synchronises it with a DataSet.

The synchronisation process can be undertaken starting from a variety of objects: an XmlDataDocument, a DataSet or a schema-only DataSet. There follows an example of creating a DataSet from an XmlDataDocument before briefly considering the remaining two options.

Add a DataGrid called dgXML to your web form. In the page load event of the web form place the following code:

Dim xtr As XmlTextReader = New XmlTextReader(Server.MapPath("articles.xml"))
'the object to synchronize
Dim xdd As XmlDataDocument = New XmlDataDocument
'the associated DataSet
Dim ds As DataSet = xdd.DataSet
'initialise the DataSet by reading the schema from the XML document
ds.ReadXmlSchema(xtr)
'reset the XmlTextReader (forward only)
xtr.Close()
xtr = New XmlTextReader(Server.MapPath("articles.xml"))
'ignore whitespace
xtr.WhitespaceHandling = WhitespaceHandling.None
'load the synchronized object
xdd.Load(xtr)
'display the resulting DataSet
dgXML.DataSource = ds
dgXML.DataMember = "article"
dgXML.DataBind()
'clean up
xtr.Close()

Import the System.Data and System.Xml namespaces before running the code.

This code is a little more complex than you might expect as two steps are necessary: you must explicitly create the DataSet schema from the XML before you can actually load the data, hence the double use of the XmlTextReader. In this case we simply let the DataSet infer a schema that matches the XML document.

If you start with a DataSet (dsDataSet) you can create an XmlDataDocument using one of its constructors, e.g.

Dim xd As XmlDataDocument = New XmlDataDocument(dsDataSet)

If you have a schema you would read in the explicit schema to the DataSet, create a matching XmlDataDocument from this DataSet then load the xml file into the XmlDataDocument, e.g.

Dim ds As DataSet = New DataSet()
ds.ReadXmlSchema(Server.MapPath("articles.xsd"))
Dim xd As XmlDataDocument = New XmlDataDocument(ds)
xd.Load(Server.MapPath("articles.xml"))

What's the advantage of this? You can write the schema so that it excludes items of the data you are not interested in, and they won't be included in the DataSet view of the data. Thus you can ensure data you are not interested in are filtered out.

Thus we've seen that the synchronisation process can be started from a number of objects. Once the objects are synchronised changes to one object are automatically reflected in the other. You then have the flexibility of using the full range of methods and properties available to either class. For example you might do some manipulation via the DataSet before saving the XML to a file using the XmlDataDocument.
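
The following console sketch illustrates that last point under some assumptions: it builds a tiny DataSet in code (a hypothetical "article" table with one "title" column, not the real articles.xml schema), synchronises an XmlDataDocument with it, edits a value through the DataSet and then reads the result back through the DOM. XmlDataDocument belongs to the .NET Framework, so this is not expected to compile against later runtimes that omit the class.

```vbnet
Imports System
Imports System.Data
Imports System.Xml

Module SyncSketch
    Sub Main()
        ' Hypothetical data: one "article" table with a single "title" column.
        Dim ds As New DataSet("articles")
        Dim table As DataTable = ds.Tables.Add("article")
        table.Columns.Add("title", GetType(String))
        table.Rows.Add(New Object() {"Original title"})

        ' Creating the XmlDataDocument from the DataSet synchronises the two.
        Dim xdd As New XmlDataDocument(ds)

        ' Change the value through the DataSet ...
        table.Rows(0)("title") = "Edited via the DataSet"

        ' ... and read it back through the DOM view of the same data.
        Dim titles As XmlNodeList = xdd.GetElementsByTagName("title")
        Console.WriteLine(titles.Item(0).InnerText)

        ' The XmlDataDocument can then write the XML out, here to the console.
        xdd.Save(Console.Out)
    End Sub
End Module
```

In a real application you would pass a file path to Save instead of Console.Out.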

Conclusion

That's article two of three in our look at XML data in .NET completed. In this part we continued to examine how we may access data via the facilities available within the .NET Framework, namely those that support the XPath query language. We also looked at how we can synchronise data between XML and DataSets via the XmlDataDocument.

In the next and final article in this series we'll look at XSD Schemas and how SQL Server can be used to integrate with our .NET XML based applications.

References

.NET SDK

Developing XML WebServices and Server Components with VB.NET and the .NET Framework
Mike Gunderloy
Que

You may download the example code here.