Generating an Excel Workbook using a DataSet...
This article details the process of creating a Microsoft Excel workbook using the Microsoft Primary Interop Assemblies (PIA) API. The article explains what the PIAs are and how they are installed.
Microsoft Excel spreadsheets are used in many real world applications. From tracking employee extensions to charting financial statements, Excel has been integrated into the business world to provide stakeholders with an easy interface to their data. Even with the movement to more sophisticated and integrated applications, most companies still rely on the functionality of Excel. Using a DATAGRID or a reporting tool is quite often not enough for applications. Developers consistently finish tasks, only to hear, "That's great, but can you export it to Excel?"
This article details the process of creating a Microsoft Excel workbook using the Microsoft Primary Interop Assemblies (PIA) API. The article explains what the PIAs are and how they are installed. The article also explains how COM interacts with .NET, and parts of the PIA object model. The article includes an example that generates and displays an Excel workbook from a DATASET.
It is assumed you have prior knowledge of the DATASET and DATATABLE objects. Further, an understanding of Microsoft Excel is necessary in order to understand the terminology used throughout the article.
There are many options for creating an Excel spreadsheet solution using .NET. XML is probably the easiest solution. The process involves creating an XML document from the existing DATASET object. Adding the the following line of XML to your XML document integrates XML and Excel:
<?xml-stylesheet type='text/xsl' href='customers.xsl'?>
XML, however, offers you minimal functionality support and acts as a data dump. If you still want to utilize XML, there is another option. The Excel XML specification supports most Excel functionality, and has a schema defined that you can use to aide in your development. To learn more about the Excel XML language, go to http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnexcl2k2/html/odc_xlsmlinss.asp.
Third party components, such as CarlosAG Excel Writer and SoftArtisans ExcelWriter, are a little more complex to use. However, once you learn their object model – which is significantly smaller than the PIA object model – these components can make the integration of Excel very easy. Further, both products support most of the functionality required when creating an Excel spreadsheet solution.
Situations may arise where you require complete control over an Excel document or your organization prevents you from using using third party components. PIAs can be used to solve both of these problems. PIAs are COM objects that allow you to interact with any of the applications in the Microsoft Office suite. Each application has its own object library, and can be referenced in the same fashion as other .NET assemblies. For more information on how COM interacts with .NET libraries, go to: http://www.csharphelp.com/archives2/archive348.html.
If you already have the Microsoft .NET framework and Microsoft Office installed on your system and are developing an application where each client has the same version of Microsoft Office installed, you can skip this step. However, if you are working within a team environment, or are developing a WinForm application that will be used by clients that may have different version of Office, this step is necessary.
You can now use the PIA API to integrate Excel into .NET. To verify that the PIAs are installed:
You should now see a reference called Excel in you reference lists.
The Excel Object Model outlines the functionality you will use when working with Microsoft Excel. The object model contains many objects to integrate Excel with .NET, including CHART, WORKSHEET, DIALOG and SPEECH objects. This article focuses on the objects that are required to create an Excel spreadsheet. We will not examine each object in the Excel Object Model. However, once you are familiar with the basic Excel objects that we are examining, using other objects to produce interactive Excel applications is very similar.
There are (4) primary objects used to create an Excel spreadsheet solution:
The APPLICATION object represents the Excel application. Every object within the Excel PIA object model is contained within an instance of the APPLICATION object. The APPLICATION object is used to access:
The following code will create a new Excel application and workbook, we would use the following code:
Dim exclApp As Application = New PIA.Application()
Dim book1 As Workbook = CType(exclApp.Workbooks.Add(System.Reflection.Missing.Value), Workbook)
The code creates a new Excel application and stores the object in the exclApp variable. With this object, we can now create a new WORKBOOK object. The WORKBOOK object is created by calling the WORKBOOKS collection object. By invoking the Add() method, a new WORKBOOK object is returned. The Add() method can accept a name of a workbook template to use, if one exists. Since we are not going to use a template, we just pass the method the System.Reflection.Missing.Value value to ignore it.
This should cause some of you to spring forward with the question: "Why doesn't the add object take a Workbook object as its property instead of returning a Workbook object?". This is a pattern that occurs throughout the Excel object model. You do not create an object and pass it into an Add() method. Rather, you use the Add() method to return a reference to an object that the user can use. The WORKSHEETS object is the same, using its Add() method to return a reference to a WORKSHEET object.
The WORKBOOK object represents a workbook within an Excel application. Since Excel can contain multiple workbooks, each workbook can be accessed from the WORKBOOKS object, which is contained within the APPLICATION object. In the example, you use the Add() method to add a new WORKBOOK object to the collection and use the indexer property to gain reference to the existing WORKBOOK object as follows:
Dim bookRef As Workbook = CType(exclApp.Workbooks(0), Workbook)
In both examples, you casted the return value. The Excel PIAs treat almost everything as an object, and do not strongly type many of the return values and parameters for certain methods. The WORKBOOK object does not, however, allow you to enter data into Excel. To enter data, you need a worksheet. To create a worksheet in Excel, you use the WORKBOOK object. Whenever a new WORKBOOK object is created, (3) worksheets are created by default. All WORKSHEET objects are stored in the WORKSHEETS COLLECTION object. The WORKSHEETS COLLECTION is part of the WORKBOOK object. If more worksheets are required, they can be added by using the Add() method of the WORKSHEETS COLLECTION object. Further, if a worksheet needs to be removed, it can be by using the Remove() method of the WORKSHEETS COLLECTION object. The following code creates a new WORKSHEET:
Dim sheet1 As Worksheet = CType(book1.Sheets.Add(System.Reflection.Missing.Value, System.Reflection.Missing.Value, System.Reflection.Missing.Value, System.Reflection.Missing.Value), Worksheet)
The workbook (book1) exposes a SHEETS property (a reference to the WORKSHEETS COLLECTION object). The SHEETS property provides the Add() method. The Add() method creates a new WORKSHEET object within the WORKSHEETS COLLECTION and returns the new WORKSHEET object. The Add() method has four parameters:
Dim sheet1 As Worksheet = CType((Worksheet)book1.Sheets[1], System.Reflection.Missing.Value, System.Reflection.Missing.Value, System.Reflection.Missing.Value), Worksheet)
Dim sheet1 As Worksheet = CType(System.Reflection.Missing.Value, (Worksheet)book1.Sheets[2], System.Reflection.Missing.Value, System.Reflection.Missing.Value), Worksheet)
The SHEETS property also has a DELETE() method to remove all of the sheets within a workbook. Other parameters and methods inside the WORKSHEET object allow you to perform calculations, create styles, manipulate data, and to access and manipulate charts. The following code adds data to the first cell in the current worksheet:
sheet1.Cells(1,1) = "Hello, World"
The CELLS collection gives the user access to each cell within a worksheet. Cells are indexed through rows and columns, similar to cell access in a DATATABLE object. The CELLS collection brings us to our next object, the RANGE object.
The RANGE object manipulates a series of cells. A range can be 1 or more cells within an application. In the previous code, the CELLS collection returns a RANGE object to work with the data. Beyond creating the Excel application, RANGES will be the most used object within Excel. RANGES can allow you to access cells, create styles, add calculations, auto size columns, auto fill columns, and perform many other tasks. This is a key object to learn, and will be the foundation of all the data manipulation you perform with Excel.
With a basic understanding of what the main objects are and how to use them, we will now create an application that pieces all of the objects together.
The sample application is something that a developer has heard at one point or another in their career: "I want to export that data to Excel." In this case, that whiny boss voice you just heard in your head, wants his/her employee contact data to be exported to Excel. To complete our task, we will use WinForms and XML.
| Control | Properties |
|---|---|
| DataGrid | Name: dgEmployeeContacts |
| Button | Name: btnSaveToExcel Text: Save To Excel |
Your form should look as follows:
While this screen will not win the Best Designed UI award at the next .NET conference, it will get the job done. The DATAGRID displays our employee contact data. Clicking the Save To Excel button exports the displayed data into Excel.
The next step is to display the data in the DATAGRID. First, add a new XML file to the project and save it as Contacts.xml. Add the following data to the XML file:
<?xml version="1.0" encoding="utf-8" ?>
<employees>
<employee id="1">
<firstName>Tyler</firstName>
<lastName>Davey</lastName>
<phone>555-5555</phone>
<cell>444-4444</cell>
</employee>
<employee id="2">
<firstName>Joe</firstName>
<lastName>Smith</lastName>
<phone>555-5556</phone>
<cell>444-4445</cell>
</employee>
<employee id="3">
<firstName>Jane</firstName>
<lastName>Doe</lastName>
<phone>555-5557</phone>
<cell>444-4446</cell>
</employee>
</employees>
The XML document contains all of the employee contact information. There are (3) employees, Tyler Davey, Joe Smith and Jane Doe. Each employee has an ID, a phone number and a cell phone number. With our data established, we can now load the data into our DATASET. To do this, double click on the form. This action causes Visual Studio to create the event handler Form1_Load(object sender, System.EventArgs e). *Note – if you renamed your form to something else, Form1 will be replaced with the name of your form. The Form1_Load(object sender, System.EventArgs e) event handler is invoked when the application first loads.
Inside the event handler, add the following code:
Dim contactData As DataSet
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
contactData = New DataSet
contactData.ReadXml("E:\NonFiction\Articles\DotNetJohn\GeneratingExcelFromDataSet\EmployeeContactsVB\Contacts.xml")
Me.dgEmployeeContacts.DataSource = contactData
Me.dgEmployeeContacts.DataMember = "employee"
End Sub
The code utilizes a DATASET object to read in the XML file we created previously. A DATASET object represents data from a datasource into a .NET usable object. DataSets are one of the more robust and powerful objects within the .NET framework. Anytime you access a datasource, such as SQL Server, XML, or Access, the data that is returned from the datasource is generally stored within a DataSet object. The DATASET object uses the ReadXml() method to take an XML object and convert it into the DATASET. To do this, .NET breaks apart the XML datasource into logical chunks that can assume the form of the DATASET object. Looking closely at the XML object, you should be able to distinguish what represents the table (<employees>), what represents the records (<employee>) and what represents the columns (<firstName>, <lastName>, <cell>, <phone> and id).
Our DATAGRID, dgEmployeeContacts, can now assign the DataSource property to our DATASET. The DataMember property of dgEmployeeContacts specifies what the record set is. We do not have to specify this, but for display purposes, we will.
If you run the application, you should have the following screen appear:
Close the form, and double click on the button Save To Excel in the designer. This will cause the designer to move to the buttons event handler, btnSaveToExcel_Click. To take the data from our DataSet and place it into Excel, we first must create an Excel application instance and add a workbook to the application.
Dim excelApplication As PIA.Application = New PIA.Application
Dim excelWorkbook As PIA.Workbook = CType(excelApplication.Workbooks.Add(System.Reflection.Missing.Value), PIA.Workbook)
With the workbook created, just as Excel does when you instantiate a new workbook, three worksheets are created by default. To be able to write to the first worksheet, we must access it within the Worksheets collection.
Dim excelSheet As PIA.Worksheet = CType(excelWorkbook.Sheets(1), PIA.Worksheet)
With the workbook created and a reference to an Excel worksheet, we can now add our columns.
For i As Integer = 0 To contactData.Tables(0).Columns.Count - 1
CType(excelSheet.Cells(1, i + 1), PIA.Range).Value2 = contactData.Tables(0).Columns(i).ColumnName
Next i
The for loop iterates over each column within the first DATATABLE in our DATASET and places the ColumnName value into a cell within the worksheet. By specifying the row index and column index, we can access the correct cell in Excel. Remember that Excel is not 0 based. That is, the first element indexer for either the row or column position is not referenced by 0. Since all of our columns are to appear along the first row, we can hard code the value 1 as the row index.
Columns and rows within Excel are not differentiated from each other as they would be in a database application. To create that difference, we are going to add a style to our columns.
excelSheet.Range(excelSheet.Cells(1, 1), excelSheet.Cells(1, contactData.Tables(0).Columns.Count)).Font.Bold = True
excelSheet.Range(excelSheet.Cells(1, 1), excelSheet.Cells(1, contactData.Tables(0).Columns.Count)).HorizontalAlignment = PIA.XlHAlign.xlHAlignCenter
The Range() method accepts two CELL objects. The first CELL object is the starting cell, the second CELL object is the finishing cell. By using the two CELL objects, the Range() method returns a RANGE object that provides access to styles. We have set the Bold property of the Font property to true, making all our column names bold. Finally, we set the HorizontalAlignment to the enumeration value XlHAlign.xlHAlignCenter. Just a quick reminder: the Range() method can be any range. In the above code block, we have a range from the cell in the first row and first column to the cell in the first row and last column, or from (1,1) to (1,5). That will return 5 CELL objects in the RANGE object. However, we can also access a single cell by using the Range() method and specifiying a range of (1,1) to (1,1).
We now just need to add the rows to our Excel worksheet.
For i As Integer = 0 To contactData.Tables(0).Rows.Count - 1
excelSheet.Cells.Range(excelSheet.Cells(i + 2, 1), excelSheet.Cells(i + 2, contactData.Tables(0).Columns.Count)).Value2 = contactData.Tables(0).Rows(i).ItemArray
Next i
To add our records, we loop over each row in the first DATATABLE in our DATASET. Then, by using the Range() method, we place all the values from the row into the corresponding cells. When a range of cells are created, one of the properties that they have available is Value2. Value2 returns an object when it is accessed. This object is generally one of three things:
The single value is created when a single cell is specified in a range. For instance, if we were to write the following code:
Dim tmp As string
tmp = excelSheet.Cells.Range(excelSheet.Cells(2, 1), excelSheet.Cells(2, 1)).Value2.ToString()
The value of the cell in the second row in the first column would be entered in the variable tmp. An array is created when a range of cells is specified over a single row. In the for loop, we created a range that spanned a single row of cells. By calling the Value2 property, an array is returned. The ItemArray property of a DATAROW returns an array of objects. Therefore, we can use ItemArray to populate our Value2 property.
The last returned value, the 2 dimensional array, occurs when a range of cells spans multiple rows. If we modified our code to look like this:
Dim values As Object
Values = excelSheet.Cells.Range(excelSheet.Cells(2, 1), excelSheet.Cells(contactData.Tables(0).Rows.Count, contactData.Tables(0).Columns.Count)).Value2
We would return a 2 dimensional array that had all the rows and columns stored within it.
With the rows stored in Excel, our last step is to close the application and display the Excel document to the user.
excelWorkbook.Close(True, "C:\temp08102004.xls", Nothing)
System.Runtime.InteropServices.Marshal.ReleaseComObject(excelWorkbook)
System.Runtime.InteropServices.Marshal.ReleaseComObject(excelApplication)
excelWorkbook = Nothing
excelApplication = Nothing
Dim p As Process = New Process
p.StartInfo.FileName = "C:\temp08102004.xls"
p.Start()
The Close() method requires two attributes:
Another alternative to saving the file in the Close() method is to use the Save() method in the Application object. However, the Save() method will not close the application. The next four lines of code are for memory management. I can hear the moans and cries now from those reading this. Yes, .NET does handle memory management. Yes, .NET uses a garbage collector to destroy objects so you don't have to. But we aren't really dealing with .NET. Since the PIAs are COM objects, if we do not release the objects we created, the processes for those objects will continue to run even after our application has exited. By calling the ReleaseComObject() method, we tell the garbage collector to destroy our COM object as soon as possible.
Finally, we need to display the newly created Excel workbook to the user. If we did not do this, the workbook would still exist where we saved it. But for the sake of aiding our users in their day to day processes, it is nice to show them the new document. Plus, it gives me a chance to explain the SYSTEM.DIAGNOSTICS.PROCESS object.
The PROCESS object allows your application to run any application or process. This can be very advantageous to any program that runs in a client server relationship that may need to invoke another application. For instance, if you needed to launch Notepad for the user, you can use the PROCESS object to bring it up. Or you could use the PROCESS object to launch Internet Explorer. In our case, we are using it to open our new Excel file. By specifying the FileName in the StartInfo object, we just need to call the Start() method, and the application will launch.
By pressing the Save To Excel button, Excel will launch, and the document should look similar to this:
I hope I’ve given you another option to creating Excel files. While COM does have its downfalls in creating and reading Excel files, it does have its advantages over the other options available. The biggest advantage being the flexibility that the PIAs give your application. But with that flexibility comes a cost. This article should have alleviated some of those pitfalls that you will run in, as I’ve gone over most of the "discoveries" I made when I began learning the PIAs.
You may download the code here.