Accessing and Manipulating XML Data in .NET I...
Part I: DOM, XmlReader, XmlNode and XmlDocument.
Knowledge assumed: VB.NET, VS.NET, XML.
This series of articles shall examine how we can access and manipulate XML data using VB.NET and SQL Server. In particular we shall be looking at the following:
1. Accessing data in an XML file
   - DOM
   - XmlReader
   - XmlNode and XmlDocument
   - XPath to select and navigate XML nodes
2. Synchronising DataSets with XML via the XmlDataDocument
3. XSD Schemas
4. XML and SQL Server
   - Generating data: FOR XML, ExecuteXmlReader
   - Updating data: SQLXML, DiffGrams
We'll probably get through most of 1 in the first article, finish 1 and look at 2 in the second and cover 3 and 4 in the third and fourth articles.
Starting at the beginning, one of the most basic operations with XML data is simply accessing the data it contains. There are several ways to achieve this in VB.NET, as we'll start to investigate in this article. The .NET Framework lets you access the data either by treating it as a simple stream of information or by treating it as the hierarchical structure of elements that it is. The XmlReader class falls into the former category, while the XmlNode and XmlDocument classes provide the more structured access of the latter.
Before we start looking at the first of the three alternative techniques, let's make sure you're happy with the concept of the DOM.
The Document Object Model is the W3C Internet standard for representing the information contained in an XML document as a tree of nodes. A problem arises in that different vendors implement the W3C standard in different ways and to different degrees. The .NET Framework includes support for the DOM level 1 core and DOM level 2 core specifications but it also extends the standard with its own classes and class members, as Microsoft has a tendency to do.
An XML document is a series of nested items including elements and attributes. This nested structure can be represented as a tree by making the outermost item the root of the tree, the items nested directly inside it the root's children, and so on. The DOM specifies the standard for constructing this tree, including a classification for individual nodes and rules for which nodes can have children.
Take for example the following XML document that we shall use throughout this series of articles.
Listing 1: articles.xml

    <?xml version="1.0" encoding="utf-8" ?>
    <articles>
      <article pages="8">
        <title>XSLT and .NET</title>
        <url>http://www.dotnetjohn.com/articles.aspx?articleid=10</url>
        <PubDate>2002-12-25</PubDate>
      </article>
      <article pages="10">
        <title>An Introduction to WebServices</title>
        <url>http://www.aspalliance.com/sullyc/articles/intro_to_web_services.aspx</url>
        <PubDate>2003-01-30</PubDate>
      </article>
      <article>
        <title>Caching in ASP.NET</title>
        <url>http://www.dotnetjohn.com/articles.aspx?articleid=13</url>
        <PubDate>2003-01-05</PubDate>
      </article>
    </articles>

In its simplest form this XML document could be represented as a tree of nodes:

    Articles
      Article
        Title
        URL
        PubDate
      Article
        Title
        URL
        PubDate
      Article
        Title
        URL
        PubDate

Note, however, the absence of attributes.
The XmlReader class is designed to provide forward-only, read-only access to an XML file. It does this via the concept of a single current node: each call to the Read method moves to the next XML node and makes that the current node.
Various other methods and properties are at your disposal. We'll encounter a small sample of them in the example that follows shortly, and many more are documented in the SDK and elsewhere.
XmlReader is an abstract class, i.e. you cannot instantiate it directly but only via one of its child classes. Typically this will be the XmlTextReader class, which implements XmlReader for use with text streams.
Let's look at an example. I'm going to be using VS.NET so if I use a VS.NET IDE feature just do the equivalent in whatever IDE you are using.
Add an XML document to your project, e.g. articles.xml from listing 1 above.
Add a new web form. Add the following code to the page load event of the page, remembering to also import the System.Xml namespace into the page:

    Dim intI As Integer
    Dim strNode As String
    Dim xtr As XmlTextReader = New XmlTextReader(Server.MapPath("articles.xml"))

    ' Read() moves to the next node and returns False at the end of the file
    Do While xtr.Read()
        strNode = ""
        ' indent with one dash per level of nesting
        For intI = 1 To xtr.Depth
            strNode &= "-"
        Next
        strNode &= "Name:" & xtr.Name & " "
        strNode &= "Nodetype:" & xtr.NodeType.ToString() & " "
        If xtr.HasValue Then
            strNode &= "Value:" & xtr.Value
        End If
        Response.Write(strNode & "<br/>")
    Loop
    xtr.Close()

View the page and review the output. You'll get something along the lines of the following, depending on your XML document.
Name:xml Nodetype:XmlDeclaration Value:version="1.0" encoding="utf-8"
Name: Nodetype:Whitespace Value:
Name:articles Nodetype:Element
-Name: Nodetype:Whitespace Value:
-Name:article Nodetype:Element
--Name: Nodetype:Whitespace Value:
--Name:title Nodetype:Element
---Name: Nodetype:Text Value:XSLT and .NET
--Name:title Nodetype:EndElement
--Name: Nodetype:Whitespace Value:
--Name:url Nodetype:Element
---Name: Nodetype:Text Value:http://www.dotnetjohn.com/articles.aspx?articleid=10
--Name:url Nodetype:EndElement
--Name: Nodetype:Whitespace Value:
--Name:PubDate Nodetype:Element
---Name: Nodetype:Text Value:2002-12-25
--Name:PubDate Nodetype:EndElement
and so on.
You can see that the DOM includes nodes for everything in the XML file, including the XML declaration as well as any whitespace, e.g. carriage returns between lines.
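If the whitespace nodes are just noise for your purposes, the reader can be asked to skip them. The following is a minimal sketch rather than part of the article's web form: it assumes a console program, and the in-memory XML string is a hypothetical stand-in for articles.xml. It uses the WhitespaceHandling property of XmlTextReader.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module SkipWhitespaceDemo
    Sub Main()
        ' Hypothetical in-memory sample standing in for articles.xml
        Dim nl As String = Environment.NewLine
        Dim source As String = "<articles>" & nl & _
                               "  <article pages=""8"">" & nl & _
                               "    <title>XSLT and .NET</title>" & nl & _
                               "  </article>" & nl & _
                               "</articles>"

        Dim xtr As New XmlTextReader(New StringReader(source))
        ' Ask the reader not to return Whitespace or SignificantWhitespace nodes
        xtr.WhitespaceHandling = WhitespaceHandling.None

        Do While xtr.Read()
            ' One dash per level of nesting, then the node name and type
            Console.WriteLine(New String("-"c, xtr.Depth) & xtr.Name & " " & xtr.NodeType.ToString())
        Loop
        xtr.Close()
    End Sub
End Module
```

With the property left at its default, the same loop would also report the line breaks and indentation between the tags as Whitespace nodes.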
Looking at some of the class members used:
| Depth | the depth of the current node in the XML document |
| NodeType | the type of the current node, a member of the XmlNodeType enumeration |
| HasValue | indicates whether the current node can have a value |
| Value | the text value of the current node |
| Name | the name of the current node, e.g. PubDate in our XML document |
You may also have noticed the absence of any XML attribute data in the above. The DOM does not treat attributes as children of the element they belong to, so the Read method does not stop on them. Instead, Microsoft has provided the MoveToNextAttribute method, which lets you step through an element's attributes as though they were nodes. Alternatively, you can retrieve an attribute's value via the Item property of the XmlTextReader.
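Both approaches to attributes can be sketched as follows. This is an illustrative console program, not the article's web form, and the in-memory XML string is an assumed cut-down version of Listing 1. HasAttributes and MoveToElement are further XmlReader members not used elsewhere in this article.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module AttributeDemo
    Sub Main()
        ' Hypothetical in-memory sample standing in for articles.xml
        Dim source As String = "<articles><article pages=""8"">" & _
                               "<title>XSLT and .NET</title></article></articles>"
        Dim xtr As New XmlTextReader(New StringReader(source))

        Do While xtr.Read()
            If xtr.NodeType = XmlNodeType.Element AndAlso xtr.HasAttributes Then
                ' Item is the default property, so xtr("pages") looks the attribute up by name
                Console.WriteLine(xtr.Name & " has pages=" & xtr("pages"))

                ' Alternatively, visit each attribute in turn as the current node
                While xtr.MoveToNextAttribute()
                    Console.WriteLine("  " & xtr.Name & " = " & xtr.Value)
                End While

                ' Step back to the element that owns the attributes
                xtr.MoveToElement()
            End If
        Loop
        xtr.Close()
    End Sub
End Module
```

Looking an attribute up by name is convenient when you know what to expect; MoveToNextAttribute suits a generic dump like the one in the earlier example.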
As I'm sure you know the items represented in the XML tree are referred to as nodes. Many different entities can be represented as such nodes: elements, attributes, whitespace, end tags, and so on. Each of these is categorised as a distinct node type within the DOM and via the XmlNodeType enumeration of the .NET Framework.
The XmlNode class can be used to represent any such node from the DOM representation of an XML document. You could use this object simply to expose the data and metadata for use, and/or you could alter the properties of the object and write them back to the original XML document. Thus the XmlNode class provides two-way access to the underlying XML, an extension of the capabilities of the XmlReader class.
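As a rough sketch of that two-way access, the console program below changes a node and writes the document out again. The in-memory XML and the replacement title are made up for illustration. It relies on the LoadXml and Save methods of XmlDocument and the InnerText property of XmlNode.

```vbnet
Imports System
Imports System.Xml

Module EditNodeDemo
    Sub Main()
        Dim xd As New XmlDocument()
        ' Hypothetical in-memory sample standing in for articles.xml
        xd.LoadXml("<articles><article pages=""8"">" & _
                   "<title>XSLT and .NET</title></article></articles>")

        ' articles -> first article -> first child (title)
        Dim xnodTitle As XmlNode = xd.DocumentElement.FirstChild.FirstChild
        Console.WriteLine("Before: " & xnodTitle.InnerText)

        ' Alter the node in memory (the new title is an invented value)
        xnodTitle.InnerText = "XSLT and the .NET Framework"

        ' Write the modified document out; passing a file path to Save
        ' instead would write it back to disk
        xd.Save(Console.Out)
    End Sub
End Module
```

Nothing reaches the original file until Save is called with its path, so changes can be built up in memory first.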
In addition, the System.Xml namespace provides classes that inherit from XmlNode and represent particular types of node, such as XmlElement, XmlAttribute and XmlText, corresponding to members of the XmlNodeType enumeration.
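One way to see these specialised classes is to hold a few nodes in XmlNode variables and ask each for its NodeType and its runtime class. This is a small illustrative sketch; the XML fragment is an assumption based on Listing 1.

```vbnet
Imports System
Imports System.Xml

Module NodeClassDemo
    Sub Main()
        Dim xd As New XmlDocument()
        ' Hypothetical fragment based on a single article from Listing 1
        xd.LoadXml("<article pages=""8""><title>XSLT and .NET</title></article>")

        ' Each of these is held as a plain XmlNode reference
        Dim xnodArticle As XmlNode = xd.DocumentElement
        Dim xnodPages As XmlNode = xnodArticle.Attributes("pages")
        Dim xnodText As XmlNode = xnodArticle.FirstChild.FirstChild

        Report(xnodArticle)
        Report(xnodPages)
        Report(xnodText)
    End Sub

    ' Prints the XmlNodeType member alongside the class of the actual object
    Private Sub Report(ByVal xnod As XmlNode)
        Console.WriteLine(xnod.NodeType.ToString() & " is held in a " & xnod.GetType().Name)
    End Sub
End Module
```

When you need members specific to one kind of node, a TypeOf ... Is test followed by a cast to the specialised class is the usual route.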
You work with XmlNode objects via the parent XmlDocument object. So you would create an XmlDocument from an XML file and then retrieve XmlNode objects from the XmlDocument.
In the following example we'll navigate through the DOM representation of an XML document using the XmlNode and XmlDocument classes. As before add the following code to the page load event of a web form:

    ' Load the file through a reader that discards insignificant whitespace
    Dim xtr As XmlTextReader = New XmlTextReader(Server.MapPath("articles.xml"))
    xtr.WhitespaceHandling = WhitespaceHandling.None
    Dim xd As XmlDocument = New XmlDocument()
    xd.Load(xtr)
    ' the document is now in memory, so release the file
    xtr.Close()

    ' Process each child of the root element in turn
    Dim xnodRoot As XmlNode = xd.DocumentElement
    Dim xnodWorking As XmlNode
    If xnodRoot.HasChildNodes Then
        xnodWorking = xnodRoot.FirstChild
        While Not IsNothing(xnodWorking)
            ProcessChildren(xnodWorking, 0)
            xnodWorking = xnodWorking.NextSibling
        End While
    End If

and also add the ProcessChildren subroutine, which writes its output to a Label control named lblOutput that you will need to add to the form:

    Private Sub ProcessChildren(ByVal xnod As XmlNode, _
                                ByVal Depth As Integer)
        Dim strNode As String
        Dim intI, intJ As Integer
        Dim atts As XmlAttributeCollection

        ' we're only going to process Text and Element nodes
        If (xnod.NodeType = XmlNodeType.Element) OrElse _
           (xnod.NodeType = XmlNodeType.Text) Then
            strNode = ""
            ' indent by depth; a non-breaking space survives HTML rendering
            For intI = 1 To Depth
                strNode &= "&nbsp;"
            Next
            strNode &= xnod.Name & " "
            strNode &= xnod.NodeType.ToString()
            strNode &= ": " & xnod.Value
            lblOutput.Text &= strNode & "<br/>"

            ' attributes too (Attributes is Nothing for non-element nodes)
            atts = xnod.Attributes
            If Not atts Is Nothing Then
                For intJ = 0 To atts.Count - 1
                    strNode = ""
                    For intI = 1 To Depth + 1
                        strNode &= "-"
                    Next
                    strNode &= atts(intJ).Name & " "
                    strNode &= atts(intJ).NodeType.ToString()
                    strNode &= ": " & atts(intJ).Value
                    lblOutput.Text &= strNode & "<br/>"
                Next
            End If

            ' recursively process the children of this node
            Dim xnodWorking As XmlNode
            If xnod.HasChildNodes Then
                xnodWorking = xnod.FirstChild
                While Not IsNothing(xnodWorking)
                    ProcessChildren(xnodWorking, Depth + 1)
                    xnodWorking = xnodWorking.NextSibling
                End While
            End If
        End If
    End Sub

Remember again to import the namespace (Imports System.Xml). View the form in your browser and the XML file should be processed and displayed.
This example uses recursion to visit all nodes in the XML file. It starts at the root node of the document, returned via the DocumentElement property of the XmlDocument object, and visits each child node in turn displaying a little information along the way.
Review the class members used in the code above. Again I suggest you investigate these and the other class members available in further detail via the SDK or similar reference source.
That's the first article of our overview of dealing with XML data in .NET done and dusted. In this part we started to examine how we may access data via the facilities available within the .NET Framework, namely the XmlReader, XmlNode and XmlDocument classes. In the next article we'll finish off our look at accessing XML data with a look at how .NET supports the XPath query language and continue on to looking at how DataSets and XML can be synchronised via the XmlDataDocument class.
.NET SDK
Developing XML WebServices and Server Components with VB.NET and the .NET Framework, Mike Gunderloy, Que