Making SOA Real: January 2006

Today I thought I'd drill into one of the core components we use in our service oriented infrastructure and invoked from inside our EDAF business actions and business components that implement our service operations. Like most applications, in order communicate with our enterprise database we use a data access helper component written in .NET v1.1 called the DataFactory. First though, a short digression on the nature of data access helpers.

Architecturally, Data Access Helpers are Data Access Logic (DAL) Components that are internal to the Data Services Layer. As such, these components are called from the bodies of methods in the components that make up the public interface of the Data Services Layer. As a result, they are used via composition with other Data Access Logic components and primarily for the following reasons:

Code Simplification. By abstracting code that is often duplicated in similar classes, developers can be more productive by having less code to write.
Provider Abstraction. Since ADO.NET is built on the concept of .NET Data Providers, or specific concrete classes used to connect to either a narrow (like SQL Server or Oracle) or broad (like ODBC and OLE DB) data source, data access helpers can abstract the provider code thereby allowing the code to execute against different providers without recompilation.
Database Interface. Data Access helpers can also incorporate code to dynamically discover and call database object such as stored procedures. In this way, data access helpers provide an interface to the database.

In addition to the MSDN Data Access Application Blocks, you can use a variety of home grown approaches, two of which I've discussed more fully in chapter 18 of Teach Yourself ADO.NET in 21 Days. In order to understand how our DataFactory works let's first review it's poor cousin the ProviderFactory.

Provider Factory
This class of approximately 120 lines is a variant of the Abstract Factory Pattern documented by the GoF. This class abstracts the creation of .NET Data Provider specific objects into a single class that exposes factory methods for each of the four primary kinds of concrete classes in the provider.

The parent object then simply needs to create an instance of the ProviderFactory and call the factory’s methods in place of the instantiation of concrete classes. For example:

IDbCommand com = _pf.CreateCommand(“usp_GetProduct”, con);

As a result, there is basically a one to one correspondence between the code with or without the ProviderFactory.

Internally, the ProviderFactory relies on the Activator.CreateInstance method to create objects of the appropriate type and cast them to the generic interface supported in the System.Data namespace.

For example, the code for the CreateCommand method is as follows:


Public Function CreateCommand(ByVal cmdText As String,
  _ ByVal connection As IDbConnection) As IDbCommand
  Dim c As IDbCommand
  c = CType(Activator.CreateInstance(_comType(_pType), _
    False), IDbCommand)
  c.CommandText = cmdText
  c.Connection = connection
  Return c
End Function

You’ll notice that the CreateInstance method accepts the type of object to create stored in the array _comType. The array is defined as a private class level variable like so:

Private _comType(3) As Type

And populated in the instance constructor as follows:


_comType(ProviderType.SqlClient) = GetType(SqlCommand)
_comType(ProviderType.OLEDB) = GetType(OleDbCommand)
_comType(ProviderType.ODBC) = GetType(OdbcCommand)
_comType(ProviderType.Oracle) = GetType(OracleCommand)

The ProviderType enumeration is a custom enumeration that references the supported providers. In this case the four providers that ship with VS .NET 2003 are hardcoded into the constructor.

Now let's move on to the DataFactory and discuss the three key features of the factory including:

Provider Abstraction. Since the data factory needn’t expose provider specific objects for the most part, the technique used to abstract the provider is different from that used in the provider factory. Although it could also be rolled into the provider factory class. The data factory class is hard-coded to use any of the four providers that ship with VS .NET 2003 but can extended through a configuration file to include other providers.
Database Independence. Independence from syntax particular to a database server is accomplished in the data factory through the use of XML statement files. These files include the database specific syntax (i.e. the actual Transact-SQL statements used against SQL Server) to define the commands that get executed. Statement files can be used both with stored procedures and inline SQL.
Caching. Internally, the data factory caches instantiated command objects in a shared Hashtable that is checked each time a command is executed. In order to alleviate contention for the command object, its ICloneable interface is used to create copies of the object when it is needed.

The data factory class exposes the following members.

*CacheFilePath
*Connection
*Provider
*UseCache
BeginTransaction
CreateDataAdapter
CreateSqlFiles (shared)
CreateSqlFile (shared)
ExecuteDataReader
ExecuteNonQuery
ExecuteScalar
ExecuteSqlXmlReader
GetDataSet
GetDataTable
GetProviders (shared)
GetXsltResults
RemoveStatement
SyncDataSet
SqlFilesCreated (event)

The “Execute” methods execute statements and either don’t return results or return a reader that must be traversed. The “Get” methods return complete objects.

The class makes heavy use of overloading to supply varying signatures for many of the methods. For example, the ExecuteNonQuery method includes four overloaded signatures defined as follows:

public int ExecuteNonQuery(string statement, HybridDictionary parms)

public int ExecuteNonQuery(string statement, HybridDictionary parms,
ref int returnVal)

public int ExecuteNonQuery(string statement, HybridDictionary parms,
ref int returnVal, IDbTransaction transaction)

public int ExecuteNonQuery(string statement, HybridDictionary parms,
 ref int returnVal, IDbTransaction transaction,
 ref Object[] outputParms)

Note that the statement name is passed into the methods as a string while the parameters to the statement are passed in a HybridDictionary object.

Note: A System.Collections.Specialized.HybridDictionary object uses a ListDictionary for internal storage when the number of items in the collection is small and a Hashtable when the number of items is large. This is the case since the Hashtable is not as efficient for storing a small number of elements.

The Provider property, which can be passed to the constructor or is defaulted to SqlClient, is used by the private _createProviderTypes method to create Type objects to hold the provider specific types. It does this either by using a hardcoded case statement or reading from a DataFactory.config file.

Once the types are created it uses the CreateInstance method of the System.Activator class to instantiate the provider specific objects at the appropriate times as shown in the snippet from _createProviderTypes:


Private Sub _createProviderTypes()
 Select Case Me.Provider
  Case "SqlClient"
    _conType = GetType(SqlConnection)
    _comType = GetType(SqlCommand)
    _drType = GetType(SqlDataReader)
    _daType = GetType(SqlDataAdapter)
    _parmType = GetType(SqlParameter)
  Case Else
    ' Load the types from the configuration file
 End Select

 ' Create an instance of the connection object
 _connection = CType(Activator.CreateInstance(_conType, _
   False), IDbConnection)
 _connection.ConnectionString = _connect
End Sub

In order to abstract the vendor specific SQL syntax from the data factory, each SQL command is encapsulated in an XML statement file like that here.


<?xml version="1.0" encoding="utf-8" ?>
<DataFactory>
  <Statement name="GetTitles" type="StoredProcedure">
    <Sql>usp_GetTitles</Sql>
    <Parameters>
     <Param name="author" SqlName="@author" type="string“
       maxLength="30" direction="Input" />
     <Param name="title" SqlName="@title" type="string“
       maxLength="100" direction="Input" />
    </Parameters>
  </Statement>
</DataFactory>

Note that the statement file includes a name attribute that contains the friendly name while the Sql element contains the SQL statement. Each parameter to the statement is defined in a Param element and optionally includes the length, direction, and the SourceColumn to map the parameter to in a DataTable.

In this case a stored procedure is referenced but statement files can also be used to handle inline SQL. For example, the Sql element could be replaced with:

SELECT * FROM Titles WHERE author = @author AND title LIKE @title + ‘%’

And the resulting statement file would be functionally equivalent to that shown on the slide.

Internally, the data factory class uses shared, synchronized, provider-specific Hashtable objects (since there may be multiple instances of the data factory using different providers) to store statement objects that have been parsed into structures using an XmlTextReader. The structure definitions are (in VB):


Friend Structure Statement
  Public Name As String
  Public SQL As String
  Public CommandType As CommandType
  Public Parms As ArrayList
  Public Command As IDbCommand
End Structure

Friend Structure Parm
  Public Name As String
  Public SQLName As String
  Public Type As String
  Public Direction As ParameterDirection
  Public maxLength As Integer
  Public SourceColumn As String
End Structure

The Hashtable object that holds each of the provider-specific Hashtable objects is declared as (in VB):


Private Shared _provCache As Hashtable =
  Hashtable.Synchronized(New Hashtable())

After the statement files have been created and are read into an instance of the Statement structure they are placed in provider-specific Hashtable that is assigned to the instance when the Provider property is set.

When a statement is executed the private _getStatement method does the following:

1. Determine if the statement is in the cache
2. If not it loads and parses the statement file creating the command object with parameters before placing the command in the cache
3. If so then it pulls the command object out of the cache and clones it
4. It then populates the parameters
5. And finally returns the command

The most interesting code in this method is that used to find the statement if it exists and then to instantiate the command object accordingly.


' See if its in the cache
If Not Me.UseCache OrElse _procs.ContainsKey(statement) Then
  ' Pull it out of the cache
  s = CType(_procs.Item(statement), statement)
  newCom = CType(_cloneObject(s.Command), IDbCommand)
Else
  s = _getStatementFromFile(statement)
  ' Build the command, add the parameters
  com = CType(Activator.CreateInstance(_comType, False), _
   IDbCommand)
  com.CommandText = s.SQL
  com.CommandType = s.CommandType
  ' Add the parameters
  ' Add the statement to the cache and clone it
End If
' Populate the parameters
Return newCom

The _cloneObject method relies on the ICloneable interface to create a copy of the command object whos parameters can be populated.

Note: Only providers who command objects support deep copies using the ICloneable interface will work with the data factory.

We've had good success with the DataFactory since it simplifies the code we need to write in our business actions, provides a level of abstraction when stored procedures change, and improves performance by caching command objects.

Making SOA Real

Tuesday, January 31, 2006

Using Local Transactions

Thursday, January 26, 2006

Data Access Helpers

About Me

Recent Posts

All Posts

Links

SOA Links

Other Blogs

Archives