Accessing and Manipulating XML Data in .NET II...
This series of articles shall examine how we can access and manipulate XML data using VB.NET and SQLServer.
Part II: XPath, DataSets and XmlDataDocuments
Knowledge assumed: VB.NET, VS.NET, XML; Part I
This series of articles examines how we can access and manipulate XML data using VB.NET and SQLServer. In particular we covered the following in part I of the series:
1 Accessing data in an XML file
DOM
XmlReader
XmlNode and XmlDocument
and shall finish off our look at access methods in this article with a look at using XPath to select and navigate XML nodes before continuing on to
2 Synchronising DataSets with XML via the XmlDataDocument
The topics remaining for discussion in subsequent articles in this series are:
3 XSD Schemas
4 XML and SQLServer
Generating data: FOR XML, ExecuteXmlReader
Updating data: SQLXML, DiffGrams
which I intend to cover in the 3rd and final article in this series.
XPath is another W3C standard and .NET supports version 1.0 of the standard. It can be thought of as a query language conceptually similar to SQL. Whereas SQL allows you to select a set of information from a table or set of tables, XPath allows you to select a set of nodes from the DOM representation of an XML document.
Why would you want to use XPath? Chiefly for reasons of speed and efficiency: XPath searches are commonly faster and consume fewer system resources than alternative approaches.
An important concept in XPath is the current context, which defines the set of nodes that will be searched by the XPath query. There are four choices:
./ : the current node
/ : the root node
.// : the XML hierarchy starting with the current node
// : the whole XML document
To identify a set of elements using XPath, you use the path down the DOM tree structure to those elements, separating tags with forward slashes. For example, the following XPath expression selects all the title elements in the articles.xml file (as employed in part I of this series of articles):
/articles/article/title
You could select all the title elements without having to concern yourself with the exact path to them by using:
//title
You can also use wildcards in your expression:
/articles/*/title
If you are interested in attributes rather than elements, the convention is to prefix the attribute name with an @:
//article/@pages
@* allows selection of all attributes:
//article/@*
though in this particular case the result will be the same, as pages is the only attribute in the document.
XPath also offers a predicate language to allow you to specify smaller groups of nodes in the XML tree, similar to the filtering capability offered by the SQL WHERE clause. A predicate is written in square brackets directly after the node it filters, and string values are quoted. For example, you could specify a particular value of an element:
/articles/article[./PubDate='12/12/2002']
which would give you any and all articles published on this date. You can similarly filter on attributes.
XPath supports a selection of filtering functions, e.g.
/articles/article[starts-with(title,'.NET')]
would return all articles with a title starting with .NET.
Square brackets are also used to indicate indexing, with collections indexed from 1. Thus
/articles/article[1]
would return the first article node.
There is a last() function that returns the position of the last node in the current node set, so it can be used as an index to select the last element in a collection:
/articles/article[last()]
Finally, another useful operator is '|', which means union: you can combine two XPath expressions with it to create a single result set.
That's a very brief overview of the XPath language. Let's now introduce a couple of the key classes in .NET that utilise XPath expressions.
The SelectNodes method of the XmlDocument takes an XPath expression and evaluates that expression over the document. The following example will demonstrate how to use SelectNodes within your code as well as allowing you to familiarise yourself with XPath via experimentation.
Drag and drop textbox, label and button controls onto a web form and name them tbXPath, lblOutput and btnSearch. In the code behind file enter the following code to handle the button click:

    'load the xml file
    Dim xtr As XmlTextReader = New XmlTextReader(Server.MapPath("articles.xml"))
    xtr.WhitespaceHandling = WhitespaceHandling.None
    Dim xd As XmlDocument = New XmlDocument
    xd.Load(xtr)

    'retrieve matching nodes
    Dim xnl As XmlNodeList = xd.DocumentElement.SelectNodes(tbXPath.Text)

    'output the results
    lblOutput.Text = ""
    Dim xnod As XmlNode
    For Each xnod In xnl
        'for elements display the corresponding text entity
        '(the HasChildNodes check avoids an error on empty elements)
        If xnod.NodeType = XmlNodeType.Element AndAlso xnod.HasChildNodes Then
            lblOutput.Text += (xnod.NodeType.ToString & ": " & xnod.Name & " = " & xnod.FirstChild.Value & "<br/>")
        Else
            lblOutput.Text += (xnod.NodeType.ToString & ": " & xnod.Name & " = " & xnod.Value & "<br/>")
        End If
    Next

    'clean up
    xtr.Close()

Remember to include 'Imports System.Xml' in your code behind file.
This program allows you to enter an XPath expression and displays the results of applying the expression to articles.xml. It achieves this by instantiating an XmlDocument object and using the SelectNodes method of the object with the entered XPath expression as the argument.
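If you would rather experiment outside a web form, the following is a minimal console sketch of the same SelectNodes technique. It is not the article's own code: the module name, the SelectFromSample helper and the inline sample XML are all assumptions made for illustration, and the sample only approximates the articles.xml file from part I, which may be structured differently.

```vb
Imports System
Imports System.Xml

Module XPathSketch
    ' Hypothetical stand-in for articles.xml; the real file from part I may differ.
    Private Const SampleXml As String = _
        "<articles>" & _
        "<article pages=""4""><title>.NET XML Part I</title><PubDate>12/12/2002</PubDate></article>" & _
        "<article pages=""6""><title>XPath Basics</title><PubDate>01/15/2003</PubDate></article>" & _
        "</articles>"

    ' Evaluates an XPath expression over the sample document (illustrative helper).
    Function SelectFromSample(ByVal xpath As String) As XmlNodeList
        Dim xd As New XmlDocument()
        xd.LoadXml(SampleXml)
        Return xd.SelectNodes(xpath)
    End Function

    Sub Main()
        ' A few of the expression styles discussed above.
        Dim expressions() As String = { _
            "/articles/article/title", _
            "//article/@pages", _
            "/articles/article[PubDate='12/12/2002']", _
            "/articles/article[starts-with(title,'.NET')]/title", _
            "/articles/article[last()]/title", _
            "//title | //article/@pages"}

        For Each expr As String In expressions
            Dim xnl As XmlNodeList = SelectFromSample(expr)
            Console.WriteLine(expr & " -> " & xnl.Count.ToString() & " node(s)")
            For Each xnod As XmlNode In xnl
                ' InnerText works for both elements and attributes
                Console.WriteLine("  " & xnod.NodeType.ToString() & ": " & xnod.Name & " = " & xnod.InnerText)
            Next
        Next
    End Sub
End Module
```

InnerText is used here instead of FirstChild.Value simply because it behaves sensibly for elements, attributes and empty nodes alike.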
The XPathNavigator class of the System.Xml.XPath namespace provides read-only, random access to XML documents (as opposed to the forward-only access provided by the XmlReader class). Two distinct tasks can be performed by the XPathNavigator class: selecting a set of nodes with an XPath expression and navigating the DOM representation of an XML document.
An XPathNavigator can be obtained from any of the XmlDocument, XmlDataDocument and XPathDocument objects, though the latter should be used if the primary interest is in performing XPath operations, as this object is optimised for such query operations.
The XPathDocument class exposes the CreateNavigator method, which returns an XPathNavigator object for use. As with the XmlReader, a pointer to a current node in the document is maintained. An example:

    'load the xml
    Dim xpd As XPathDocument = New XPathDocument(Server.MapPath("articles.xml"))

    'create the associated navigator
    Dim xpn As XPathNavigator = xpd.CreateNavigator()

    'select nodes to match the supplied expression
    Dim xpni As XPathNodeIterator = xpn.Select(tbXPath.Text)

    'output the results
    lblOutput.Text = ""
    While xpni.MoveNext
        lblOutput.Text += (xpni.Current.NodeType.ToString & ": " & xpni.Current.Name & " = " & xpni.Current.Value & "<br/>")
    End While

This is the code behind the button click on this occasion, with the corresponding web controls exactly as per the previous example (SelectNodes). Use the 'Imports System.Xml.XPath' directive this time.
Thus the Select method of the XPathNavigator class returns an XPathNodeIterator object allowing you to visit each of the selected set of nodes in turn. This class exposes a Current property and a MoveNext method which are used to move through the nodeset above, presenting pertinent information along the way.
From the output of the program you can see that the node Value property is the concatenated text of all nodes beneath that node.
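The behaviour of the Value property can be seen in isolation with a small console sketch. The module name, the SelectedValues helper and the one-article sample XML are assumptions for illustration only (the real articles.xml may differ), and List(Of String) assumes .NET 2.0 or later.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml.XPath

Module NavigatorValueSketch
    ' Hypothetical sample data; not the actual articles.xml from part I.
    Private Const SampleXml As String = _
        "<articles>" & _
        "<article pages=""4""><title>.NET XML Part I</title><PubDate>12/12/2002</PubDate></article>" & _
        "</articles>"

    ' Returns the Value of every node selected by the expression (illustrative helper).
    Function SelectedValues(ByVal xpath As String) As List(Of String)
        Dim xpd As New XPathDocument(New StringReader(SampleXml))
        Dim xpni As XPathNodeIterator = xpd.CreateNavigator().Select(xpath)
        Dim values As New List(Of String)
        While xpni.MoveNext()
            values.Add(xpni.Current.Value)
        End While
        Return values
    End Function

    Sub Main()
        ' For an element, Value is the concatenated text of its descendants;
        ' for an attribute it is simply the attribute value.
        Dim expressions() As String = {"/articles/article", "/articles/article/title", "/articles/article/@pages"}
        For Each expr As String In expressions
            For Each v As String In SelectedValues(expr)
                Console.WriteLine(expr & " = " & v)
            Next
        Next
    End Sub
End Module
```

Note that the text of the title and PubDate children runs together in the Value of the article element, with no separator, which is why selecting the leaf elements directly is usually more useful for display.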
The XPathNavigator class can also be used to move around the DOM via the following exposed methods, amongst others:
MoveToRoot
MoveToParent
MoveToPrevious
MoveToNext
MoveToFirstChild
Importantly, the MoveTo methods that report success (such as MoveToParent, MoveToNext and MoveToFirstChild) will not throw an error when the requested navigation cannot be performed; they simply return False.
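As a rough illustration of navigating with these methods, the sketch below walks from the root to the children of the first article and relies on the Boolean return values to know when to stop. The module, helper names and sample XML are hypothetical, and List(Of String) assumes .NET 2.0 or later.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml.XPath

Module NavigationSketch
    ' Hypothetical sample data; not the actual articles.xml from part I.
    Private Const SampleXml As String = _
        "<articles>" & _
        "<article pages=""4""><title>.NET XML Part I</title><PubDate>12/12/2002</PubDate></article>" & _
        "<article pages=""6""><title>XPath Basics</title><PubDate>01/15/2003</PubDate></article>" & _
        "</articles>"

    Function CreateSampleNavigator() As XPathNavigator
        Return New XPathDocument(New StringReader(SampleXml)).CreateNavigator()
    End Function

    ' Names of the child elements of the first article, in document order.
    Function FirstArticleChildNames() As List(Of String)
        Dim names As New List(Of String)
        Dim xpn As XPathNavigator = CreateSampleNavigator()
        xpn.MoveToRoot()
        ' root -> articles -> first article -> its first child element
        If xpn.MoveToFirstChild() AndAlso xpn.MoveToFirstChild() AndAlso xpn.MoveToFirstChild() Then
            Do
                names.Add(xpn.Name)
            Loop While xpn.MoveToNext() ' returns False after the last sibling
        End If
        Return names
    End Function

    ' The root node has no parent, so this navigation fails quietly.
    Function CanMoveAboveRoot() As Boolean
        Dim xpn As XPathNavigator = CreateSampleNavigator()
        xpn.MoveToRoot()
        Return xpn.MoveToParent()
    End Function

    Sub Main()
        For Each name As String In FirstArticleChildNames()
            Console.WriteLine(name)
        Next
        Console.WriteLine("MoveToParent from the root: " & CanMoveAboveRoot().ToString())
    End Sub
End Module
```

Because a failed move leaves the navigator where it was and only returns False, loops such as the Do ... Loop While above need no exception handling.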
For further information see the SDK documentation.
Returning to the XmlDataDocument class introduced in part I of this series of articles, we'll now take a look at how XML data and DataSet objects may be synchronised. You'll be aware that ADO.NET is able to provide a complete in-memory representation of a relational database through the DataSet object. The System.Xml namespace adds functionality to synchronise the DataSet with an equivalent XML file, and vice versa.
The XmlDocument class allows you to work with XML via the DOM, but for interaction with DataSets you need the XmlDataDocument class. This inherits from XmlDocument and adds some members, including a DataSet property, which exposes a DataSet representation of the XmlDataDocument, and a Load method, which loads the XmlDataDocument and synchronises it with a DataSet.
The synchronisation process can be undertaken starting from a variety of objects: an XmlDataDocument, a DataSet or a schema-only DataSet. There follows an example of creating a DataSet from an XmlDataDocument before briefly considering the remaining two options.
Add a DataGrid called dgXML to your web form. In the page load event of the web form place the following code:

    Dim xtr As XmlTextReader = New XmlTextReader(Server.MapPath("articles.xml"))

    'the object to synchronize
    Dim xdd As XmlDataDocument = New XmlDataDocument

    'the associated DataSet
    Dim ds As DataSet = xdd.DataSet

    'initialise the DataSet by reading the schema from the XML document
    ds.ReadXmlSchema(xtr)

    'reset the XmlTextReader (forward only)
    xtr.Close()
    xtr = New XmlTextReader(Server.MapPath("articles.xml"))

    'ignore whitespace
    xtr.WhitespaceHandling = WhitespaceHandling.None

    'load the synchronized object
    xdd.Load(xtr)

    'display the resulting DataSet
    dgXML.DataSource = ds
    dgXML.DataMember = "article"
    dgXML.DataBind()

    'clean up
    xtr.Close()

Import the System.Data and System.Xml namespaces before running the code.
This code is a little more complex than you might expect, as two steps are necessary: you must explicitly create the DataSet schema from the XML before you can actually load the data, hence the double use of the XmlTextReader. In this case we automatically infer a schema that matches the XML document anyway.
If you start with a DataSet (dsDataSet) you can create an XmlDataDocument using one of its constructors, e.g.

    Dim xd As XmlDataDocument = New XmlDataDocument(dsDataSet)

If you have a schema you would read the explicit schema into the DataSet, create a matching XmlDataDocument from this DataSet, then load the XML file into the XmlDataDocument, e.g.

    Dim ds As DataSet = New DataSet()
    ds.ReadXmlSchema("articles.xsd")
    Dim xd As XmlDataDocument = New XmlDataDocument(ds)
    xd.Load("articles.xml")

What's the advantage of this? Well, you can specify the schema to exclude items of the data you are not interested in, and they won't be included in the DataSet view of the XmlDataDocument. Thus you can ensure data you are not interested in are filtered out.
Thus we've seen that the synchronisation process can be started from a number of objects. Once the objects are synchronised changes to one object are automatically reflected in the other. You then have the flexibility of using the full range of methods and properties available to either class. For example you might do some manipulation via the DataSet before saving the XML to a file using the XmlDataDocument.
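To make the synchronisation concrete, here is a small console sketch in which a value is changed through the DataSet and then read back through the XML side. The module name, the TitleAfterDataSetEdit helper and the inline sample XML are assumptions for illustration, and the schema is inferred from the sample rather than taken from a real articles.xml. It also assumes the .NET Framework, where XmlDataDocument is available (later Framework versions mark the class obsolete).

```vb
Imports System
Imports System.Data
Imports System.IO
Imports System.Xml

Module SyncSketch
    ' Hypothetical sample data; not the actual articles.xml from part I.
    Private Const SampleXml As String = _
        "<articles>" & _
        "<article pages=""4""><title>.NET XML Part I</title></article>" & _
        "</articles>"

    ' Edits a row via the DataSet, then reads the title back via XPath on the XML.
    Function TitleAfterDataSetEdit() As String
        Dim xdd As New XmlDataDocument()

        ' Step 1: create the DataSet schema (inferred here from the XML itself)
        xdd.DataSet.ReadXmlSchema(New StringReader(SampleXml))

        ' Step 2: load the data into the synchronised document
        xdd.Load(New StringReader(SampleXml))

        ' Change the data through the relational view...
        Dim row As DataRow = xdd.DataSet.Tables("article").Rows(0)
        row("title") = "Edited via the DataSet"

        ' ...and read it back through the XML view
        Return xdd.SelectSingleNode("/articles/article/title").InnerText
    End Function

    Sub Main()
        Console.WriteLine(TitleAfterDataSetEdit())
    End Sub
End Module
```

From here you could call the Save method inherited from XmlDocument to write the modified XML to a file, which is the workflow described above.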
That's article two of three in our look at XML data in .NET completed. In this part we continued to examine how we may access data via the facilities available within the .NET Framework, namely those that support the XPath query language. We also looked at how we can synchronise data between XML and DataSets via the XmlDataDocument.
In the next and final article in this series we'll look at XSD Schemas and how SQLServer can be used to integrate with our .NET XML based applications.
.NET SDK
Developing XML WebServices and Server Components with VB.NET and the .NET Framework, Mike Gunderloy, Que
You may download the example code here.