Friday, January 28, 2011

Reading Excel in VB- Connection Strings and Registry Edits

A client had a request to upload a .xls file and place the data into a SQL table.  At the time there was only the Microsoft Jet 4.0 engine in x86 (32bit).   The connection string for VB is:
myconnection = New System.Data.OleDb.OleDbConnection("provider=Microsoft.Jet.OLEDB.4.0;data source=C:\filename.xls;Extended Properties=" + Chr(34) + "Excel 8.0;HDR=Yes;IMEX=1" + Chr(34) + ";")
The problem was that as the code ran through the rows, some of the values would be returned as null even though there was data in the cell.  The reason this happens is that the reader is defaulted (in the registry), to look at the first 8 rows and determine the cell format.  The way to get around this issue is to edit the registry values.
HKEY_LOCAL_MACHINE>>SOFTWARE>>Wow6432Node>>Microsoft>>Jet>>4.0>>Engines>>Excel
We want to make sure that when there is a "Mixed Type" of data imported, we want the data read as Text, and we want set TypeGuessRows to 0, meaning scan all rows before deciding the format of the data.

As time progressed, the client decided that in 2010, people should be able to upload .xlsx files as well.  So I downloaded the appropriate 2007 Office System Driver Connectivity Components, which would only handle .xlsx files, not .xls files...  No worries, my connection string turned into:
 If LCase(Right(strFilePath, 3)) = "xls" Then
myconnection = New System.Data.OleDb.OleDbConnection("provider=Microsoft.Jet.OLEDB.4.0;data source=C:\filename.xls;Extended Properties=" + Chr(34) + "Excel 8.0;HDR=Yes;IMEX=1" + Chr(34) + ";")
End If
If LCase(Right(strFilePath, 4)) = "xlsx" Then
myconnection = New System.Data.OleDb.OleDbConnection("provider=Microsoft.ACE.OLEDB.12.0;data source=C:\filename.xls;Extended Properties=" + Chr(34) + "Excel 12.0;HDR=Yes;IMEX=1" + Chr(34) + ";")
End If
Pretty simple fix... 
A couple of weeks later I read about the 2010 Office System Driver Connectivity Components, which will handle .xls and .xlsx files, and can be downloaded here.  I uninstalled the 2007 package and installed the 2010, and went to update my connection string as per Microsoft's How to Use Instruction #2.
myconnection = New System.Data.OleDb.OleDbConnection("provider=Microsoft.ACE.OLEDB.12.0;data source=C:\filename.xls;Extended Properties=" + Chr(34) + "Excel 14.0;HDR=Yes;IMEX=1" + Chr(34) + ";")
This returned a "Could not find installable ISAM" error.  After searching and reading, I discovered that the only modifications necessary were to remove the "If/Then" statements.  My connection string now looked like:
myconnection = New System.Data.OleDb.OleDbConnection("provider=Microsoft.ACE.OLEDB.12.0;data source=C:\filename.xls;Extended Properties=" + Chr(34) + "Excel 12.0;HDR=Yes;IMEX=1" + Chr(34) + ";")
Everything ran well except now the .xls and .xlsx files were returning null values where cells had data.  Time to run Regedit.
HKEY_LOCAL_MACHINE>>SOFTWARE>>Wow6432Node>>Microsoft>>Office>>14.0>>Access Connectivity Engine>>Engines>>Excel
And just like before, we want set TypeGuessRows to 0.
2 folders below the Engines folder is the ISAM folder.  Take a look in there, and you will notice that the Excel 12.0 is for *.xlsb files, and that 12.0 Xml is for *.xlsx files.  My final connection string looks like this:
Dim myconnection As System.Data.OleDb.OleDbConnection = New System.Data.OleDb.OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0; Data Source=C:\filename.xl; Extended Properties=" + Chr(34) + "Excel 12.0 Xml;HDR=YES;IMEX=1" + Chr(34) + ";")
It's the only one needed for reading both .xls and .xlsx files.

No comments:

Post a Comment