Archive for the ‘Programming’ Tag

SQL Server As An Object Store – Not the Best but…   1 comment

     In a prior post I discussed using SQL Server as tool to store C# objects.  There are two downsides to this approach, the need to manage object size and the need to optimize retrieval times.  The basic approach is to use a key/value column layout to store the data.  We first define a column to hold the object key.  Then we need to define any other columns which will be used to aid in retrieval of object ( lets call this metadata).  In my case I am primarily interested in retrieving objects based on Key and a date range.  To optimize this we used an integer column for the id and made a primary key on this column.  I store date not as the SQL Server datetime object, which is too complex and too slow for our purposes. The simple and typical solution is to use a fixed width character format ( char(8) ) and store the date as a ISO string in the form YYYYMMDD.  This is simple and sorts in expected manner without fuss or muss. In SQL server additional indexes can be created on the metadata if needed) 

     Having defined both the key and meta data we are left with the question of how to define the column to hold the XML serialized form of the C# object.  Since we don’t know the maximum size of the XML string derived from the objects we need something which will flex as the size grows.  Currently SQL server has a maximum size for a variable length string of 8,000.  If all of our object will be 8,000 characters or less we could define the object column as: varchar(8000).  If the object size could grow beyond this maximum will need to use what SQL Server calls:  varchar(max).  In this case, at this time, varchar(max) can support up to 2,147,483,647  ( 2 to the 34th power if you are checking).  This is good and will serve for most objects.

     We need to turn additional optimization.  Looking at the (rather poorly written) SQL Server documentation on varchar(max) we find that if the string to be stored (our object serialized as an XML string) is 8,000 bytes it can be stored ‘in line’ with the data row.  If the size is greater than 8,000 the data will be stored ‘elsewhere’ and a pointer to the data is stored in the column.  When this happens a SELECT statement will return pointers to the data and only when we directly access the column in our code, the data string is retrieved.  This indirect access can be costly. Note:  to be sure that string of 8,000 characters or less are stored ‘in line’ you must set a TABLE option on the SQL table containing your varchar(max) column.  The format of this T-SQL command is”

sp_tableoption N’YOURTABLENAMEHERE’, ‘Large Value Types Out Of Row’, ‘OFF’

If this option is ‘ON’ even strings equal to or smaller than 8,000 characters will be stored ‘elsewhere’.  Got it? Good.

     We can make one other generalized optimization.  From Framework 2.0 on, GZIP has been part of the base system underneath C#.  We can compress the XML string to be stored in GZIP format as a small cpu cost cost.  The GZIP code is compact and simple but very specific.  The basic program flow is:

Object => XML String => byte array => GZIP compression => byte array => Base64 Encoded String

Note that GZIP works on byte arrays not strings.  We need a string to push to SQL Server, the only portable safe conversion from a compressed byte array to string to call the Framework function to create a Base64 encoded string. Retrieval reverses this process:

Base64 Encoded String => byte array => GZIP decompression => byte array => XML string

I am seeing compressions like these at a cpu time cost of  less than 10 ms on my desk top:

XML String Size

Compressed Bytes

Base64 Encoded String

6,603

964

1,288

25,414

2,532

3,376

42,495

4,075

5,436

116,422

8,500

11,336

The maximum string that GZIP will compress is 4GB, so we will hit the SQL Server limits long before the GZIP limit.

Here is the code (condensed from many examples on the web).

Standard Libraries:

using System.IO;
using System.IO.Compression;

Compression:

public static byte[] Compress( byte[] bytes )
{
using ( MemoryStream ms = new MemoryStream( ) )
{
using ( GZipStream gs = new GZipStream( ms, CompressionMode.Compress, true ) )
{
gs.Write( bytes, 0, bytes.Length );
}
ms.Position = 0L;
return ToByteArray( ms );
}
}

public static byte[] ToByteArray( Stream stream )
{
int count = 0;
List<byte> result = new List<byte>( );
try
{
byte[] buffer = new byte[0x20000];
int bytes = 0;
while ( ( bytes = stream.Read( buffer, 0, 0x20000 ) ) > 0 )
{
count += bytes;
for ( int i = 0; i < bytes; i++ )
{
result.Add( buffer[i] );
}
}
}
catch ( Exception ex )
{
//log error and re-throw

               throw ex;
}
return result.ToArray( );
}

Decompression:

public static byte[] DeCompress( byte[] bytes )
{
byte[] result;
using ( MemoryStream ms = new MemoryStream( ) )
{
ms.Write( bytes, 0, bytes.Length );
using ( GZipStream gs = new GZipStream( ms, CompressionMode.Decompress, true ) )
{
ms.Position = 0L;
result = ToByteArray( gs );
}
}
return result;
}

Wrapper Methods:

         /// <summary>
/// GZip Compression Wrapper
/// </summary>
///regular string (not Base64 Encoded)
/// Base64 Encoded Compressed String
public static string Compress( string sin )
{
byte [] cBytes=Compress( Encoding.Unicode.GetBytes( sin ) );
return Convert.ToBase64String( cBytes );
}

       /// <summary>
/// GZip DeCompression Wrapper
/// </summary>
///sin64″>64BitEncoded Compressed String
/// <returns>UnCompressed Regular String</returns>
public static string DeCompress( string sin64 )
{
byte [] cBytes2=Convert.FromBase64String( sin64 );
byte [] ucBytes=DeCompress( cBytes2 );
return Encoding.Unicode.GetString( ucBytes );
}

 

The bottom line?  SQL Server will work as a key/value object store but:

( 0 ) A lot of hand tooling is required; and

( 1 ) It is not blazingly fast for larger objects (larger than 8,000 bytes);  and

(2) There are size limits which must be watched closely; and

(3) A lot of translation is involved between the raw object and what is ultimately stored.

None of these issues are fatal but it makes one wonder if couchDB or some other key/value store wouldn’t be a whole lot easier to work with.

REST, Ruby On Rails, CouchDB and Me – Part 0   4 comments

Part 0 – This Post

Part 1 – Ruby, The Command Line Version

Part 2 – Aptana IDE For Ruby

Part 3 – CouchDB Up and Running on Windows

Part 4 – CouchDB, Curl and RUBY

Part 5 – Getting The Data Ready for CouchDB

Part 7 – JQUERY,JPlayer and HTML5

The is the first in asmall series of blog posts concerning what is often known as as Open Source Stack for Web Development.  My interests here are in WEB 2.0 development.  We will be working towards using JQUERY and CouchDB to develop REST(ful) Web sites.  I will start with a focus on Rudy and Ruby on Rails (aka RadRails).   However, I will be using free and open source products not on UNIX but on Windows based systems.  Windows runs on fairly cheap hardware (compared to Apples UNIX based platforms) and Windows is much more accessible to new developers than Linux systems.  I will be working here with Windows XP and Windows Vista but you can play alone on Linux and MAC x86.  To get started let’s get together some software and supporting documentation and begin.

Ruby

Version 1.8.7 is the current stable release but since we are interested in using CouchDB I have installed the required ‘edge’ version 1.9.2.  Either of these version will support Ruby on Rails development. Ruby installations are available from several sites. Until you know exactly what you are doing stick with distributions from RubyInstaller.org since these are the baseline Ruby builds which everyone tests against first.  As you get deeper into it (or if  things get a little more sticky)  you try an integrated open stack like the one offered by BitNami.  At some point you may need to install the DevKit.  This is not as heavy as it sounds (but you will need to work with your system paths to get everything in place).  You can get the DevKit here and installation instructions are here.   Note:  the devkit download is now a Windows MSI file but the instructions are still based on a ZIP file extraction, but it all works trust me.  Don’t install either the BitNami Stack or the DevKit until you know you need it.   If you completely new to Ruby a nice introduction can be found in Why’s Poignant Guide To Ruby.

Ruby On Rails

There are a lot of different options to developing Ruby on Rails applications.  If I was not an IDE freak I could use  a simple text editor (I often use the non-open Primal Script) but being new to Ruby on Rails I wanted the support of and IDE.  For this option I selected the Aptana RadRails Development Environment.  This free development environment is based on the open Eclipse editor.  I downloaded the base Aptana IDE (Studio 2) and then added RADRails as a plug in.  These are available in Linux and Mac x86 installers in addition to windows.  You could install only RADRails but then you would have a crippled version of the Aptana product.  We will be noting Ruby on Rails training materials as we move along.

CouchDB

Although we will be using mySQL or SQLlite for some of our development, our real content manager will be the NOSQL couchDB.  This is our real goal in the project – testing the viability  of a REST(ful) HTTP addressable NOSQL database.  This project is from the Apache organization and is available for Linux, MAC x86 and Windows.  It runs as a service on the Windows OS.  The Windows installer is available here.  There is an excellent open source on line couchDB book available here.  For Ruby on Rails we will be using – CouchRest for our Ruby on Rails work with couchdb.  CouchRest is available as a ruby GEM and can be installed on Linux, MAC x86 and Windows versions of Ruby.

JQUERY

Web 2.0 is not possible with out modern, sophisticated JavaScript libraries.  One of the best of these is JQUERY and not surprisingly couchDB ships with a powerful jquery library (couchdb.js) to facilitate browser side manipulation of couchdb data.  For browser work you should be pulling your JQuery from CDN Host.  For use within Ruby for Rails project you will need to add JQUERY  support to your Aptana IDE.  JQuery works with all modern browsers (and even some which not).PacalII

%d bloggers like this: