SQL Server As An Object Store – Not the Best but…

     In a prior post I discussed using SQL Server as a tool to store C# objects.  There are two downsides to this approach: the need to manage object size and the need to optimize retrieval times.  The basic approach is to use a key/value column layout to store the data.  We first define a column to hold the object key, then any other columns which will aid in retrieval of the object (let's call this metadata).  In my case I am primarily interested in retrieving objects based on key and a date range.  To optimize this we use an integer column for the id and make it the primary key.  I do not store the date as the SQL Server datetime type, which is too complex and too slow for our purposes.  The simple and typical solution is a fixed-width character format ( char(8) ) storing the date as an ISO string in the form YYYYMMDD.  This is simple and sorts in the expected manner without fuss or muss.  In SQL Server, additional indexes can be created on the metadata columns if needed.

     Having defined both the key and the metadata, we are left with the question of how to define the column that holds the XML-serialized form of the C# object.  Since we don't know the maximum size of the XML string derived from the objects, we need something which will flex as the size grows.  SQL Server's upper bound for an ordinary variable-length string is 8,000 characters.  If all of our objects will be 8,000 characters or less we could define the object column as varchar(8000).  If the object size could grow beyond this maximum we will need to use what SQL Server calls varchar(max).  At this time varchar(max) can hold up to 2,147,483,647 characters ( 2 to the 31st power minus one, if you are checking).  This is good and will serve for most objects.
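As a concrete sketch, the table layout described so far might look like this, created through ADO.NET.  The ObjectStore name and its columns are my illustration of the layout above, not a fixed design; adjust the metadata columns to your own retrieval needs.

using System.Data.SqlClient;

public static void CreateObjectStore( SqlConnection conn )
{
    // conn is assumed to be open.  Id is the integer object key (primary key);
    // AsOfDate is the ISO YYYYMMDD date; ObjectData holds the serialized object.
    const string ddl =
        "CREATE TABLE ObjectStore ( " +
        "    Id         int          NOT NULL PRIMARY KEY, " +
        "    AsOfDate   char(8)      NOT NULL, " +
        "    ObjectData varchar(max) NOT NULL )";
    using ( SqlCommand cmd = new SqlCommand( ddl, conn ) )
    {
        cmd.ExecuteNonQuery( );
    }
}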

     We can make an additional optimization.  Looking at the (rather poorly written) SQL Server documentation on varchar(max), we find that if the string to be stored (our object serialized as an XML string) is 8,000 bytes or less it can be stored ‘in row’ with the rest of the data.  If the size is greater than 8,000 bytes the data is stored ‘elsewhere’ and a pointer to it is kept in the row.  When this happens a SELECT statement returns pointers to the data, and only when we directly access the column in our code is the data string retrieved.  This indirect access can be costly.  Note: to be sure that strings of 8,000 characters or less are stored ‘in row’, you must set a table option on the SQL table containing your varchar(max) column.  The T-SQL command is:

sp_tableoption N'YOURTABLENAMEHERE', 'large value types out of row', 'OFF'

If this option is 'ON', even strings of 8,000 characters or less will be stored ‘elsewhere’.  Got it? Good.
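As an aside, you can check how the option is currently set by reading the large_value_types_out_of_row flag from sys.tables.  A small sketch; the helper name is my own and the table name is a placeholder:

public static bool LargeValuesStoredOutOfRow( SqlConnection conn, string tableName )
{
    // sys.tables exposes the option as a bit column; true means the option is 'ON'
    const string sql =
        "SELECT large_value_types_out_of_row FROM sys.tables WHERE name = @name";
    using ( SqlCommand cmd = new SqlCommand( sql, conn ) )
    {
        cmd.Parameters.AddWithValue( "@name", tableName );
        return Convert.ToBoolean( cmd.ExecuteScalar( ) );
    }
}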

     We can make one other generalized optimization.  From Framework 2.0 on, GZIP has been part of the base class library underneath C#.  We can compress the XML string before storing it, at a small CPU cost.  The GZIP code is compact and simple but very specific.  The basic program flow is:

Object => XML String => byte array => GZIP compression => byte array => Base64 Encoded String

Note that GZIP works on byte arrays, not strings.  We need a string to push to SQL Server, and the only portable, safe conversion from a compressed byte array to a string is to call the Framework function that creates a Base64-encoded string.  Retrieval reverses this process:

Base64 Encoded String => byte array => GZIP decompression => byte array => XML string
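For completeness, the Object => XML String step at the front of the pipeline can be done with XmlSerializer.  A minimal sketch; the generic helpers are my own illustration, not part of the original code:

using System.IO;
using System.Xml.Serialization;

public static string ToXmlString<T>( T obj )
{
    // serialize the object graph to an XML string in memory
    XmlSerializer xs = new XmlSerializer( typeof( T ) );
    using ( StringWriter sw = new StringWriter( ) )
    {
        xs.Serialize( sw, obj );
        return sw.ToString( );
    }
}

public static T FromXmlString<T>( string xml )
{
    // rebuild the object from its XML string form
    XmlSerializer xs = new XmlSerializer( typeof( T ) );
    using ( StringReader sr = new StringReader( xml ) )
    {
        return (T)xs.Deserialize( sr );
    }
}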

I am seeing compressions like these, at a CPU time cost of less than 10 ms on my desktop:

XML String Size    Compressed Bytes    Base64 Encoded String
          6,603                 964                    1,288
         25,414               2,532                    3,376
         42,495               4,075                    5,436
        116,422               8,500                   11,336

The maximum string that GZIP will compress is 4GB, so we will hit the SQL Server limits long before the GZIP limit.

Here is the code (condensed from many examples on the web).

Standard Libraries:

using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

Compression:

public static byte[] Compress( byte[] bytes )
{
    using ( MemoryStream ms = new MemoryStream( ) )
    {
        // leaveOpen: true so the MemoryStream survives disposal of the GZipStream
        using ( GZipStream gs = new GZipStream( ms, CompressionMode.Compress, true ) )
        {
            gs.Write( bytes, 0, bytes.Length );
        }   // disposing the GZipStream flushes the compressed data into ms
        ms.Position = 0L;
        return ToByteArray( ms );
    }
}

public static byte[] ToByteArray( Stream stream )
{
    List<byte> result = new List<byte>( );
    try
    {
        byte[] buffer = new byte[0x20000];   // 128 KB read buffer
        int bytes;
        while ( ( bytes = stream.Read( buffer, 0, buffer.Length ) ) > 0 )
        {
            for ( int i = 0; i < bytes; i++ )
            {
                result.Add( buffer[i] );
            }
        }
    }
    catch ( Exception ex )
    {
        // log ex here, then re-throw; a bare throw preserves the stack trace
        throw;
    }
    return result.ToArray( );
}

Decompression:

public static byte[] DeCompress( byte[] bytes )
{
    byte[] result;
    using ( MemoryStream ms = new MemoryStream( ) )
    {
        ms.Write( bytes, 0, bytes.Length );
        using ( GZipStream gs = new GZipStream( ms, CompressionMode.Decompress, true ) )
        {
            ms.Position = 0L;   // rewind so the GZipStream reads from the start
            result = ToByteArray( gs );
        }
    }
    return result;
}

Wrapper Methods:

/// <summary>
/// GZip Compression Wrapper
/// </summary>
/// <param name="sin">Regular string (not Base64 encoded)</param>
/// <returns>Base64 encoded compressed string</returns>
public static string Compress( string sin )
{
    // Encoding.Unicode is UTF-16: two bytes per character before compression
    byte[] cBytes = Compress( Encoding.Unicode.GetBytes( sin ) );
    return Convert.ToBase64String( cBytes );
}

/// <summary>
/// GZip DeCompression Wrapper
/// </summary>
/// <param name="sin64">Base64 encoded compressed string</param>
/// <returns>Uncompressed regular string</returns>
public static string DeCompress( string sin64 )
{
    byte[] cBytes = Convert.FromBase64String( sin64 );
    byte[] ucBytes = DeCompress( cBytes );
    return Encoding.Unicode.GetString( ucBytes );
}
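Putting it all together, storage and retrieval through ADO.NET might look like the sketch below.  The ObjectStore table and column names follow the illustrative schema earlier in the post and are assumptions, not part of the original code:

using System.Data.SqlClient;

public static void StoreObject( SqlConnection conn, int id, string asOfDate, string xml )
{
    const string sql =
        "INSERT INTO ObjectStore ( Id, AsOfDate, ObjectData ) " +
        "VALUES ( @id, @asOfDate, @data )";
    using ( SqlCommand cmd = new SqlCommand( sql, conn ) )
    {
        cmd.Parameters.AddWithValue( "@id", id );
        cmd.Parameters.AddWithValue( "@asOfDate", asOfDate );
        cmd.Parameters.AddWithValue( "@data", Compress( xml ) );   // Base64 string
        cmd.ExecuteNonQuery( );
    }
}

public static string LoadObject( SqlConnection conn, int id )
{
    const string sql = "SELECT ObjectData FROM ObjectStore WHERE Id = @id";
    using ( SqlCommand cmd = new SqlCommand( sql, conn ) )
    {
        cmd.Parameters.AddWithValue( "@id", id );
        string stored = (string)cmd.ExecuteScalar( );
        return DeCompress( stored );   // back to the XML string
    }
}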

 

The bottom line?  SQL Server will work as a key/value object store but:

(0) A lot of hand tooling is required; and

(1) It is not blazingly fast for larger objects (larger than 8,000 bytes); and

(2) There are size limits which must be watched closely; and

(3) A lot of translation is involved between the raw object and what is ultimately stored.

None of these issues is fatal, but they make one wonder whether CouchDB or some other key/value store wouldn't be a whole lot easier to work with.
