Archive for the ‘couchdb’ Category

Hey Flickr, Where Did My Statistics Go? The CouchBase Connection. Part III

This is the third post in this series on how to harvest statistical data from your (or a friend’s) Flickr Picture View data.  The series begins here.  Today we are looking at CouchBase as a noSQL database to store our Flickr data.  This post will get as far as standing up the shell of a console application and will defer working code samples to the next blog post.

CouchBase is a commercialized version of the open source project Apache CouchDB.  CouchDB is open source; CouchBase is a commercial product, and commercial development with CouchBase is NOT free.  Both support API libraries for many platforms, including .Net and Java; the CouchDB wiki lists five active C# libraries for CouchDB.  I have written about CouchDB and Ruby in a prior series of posts which can be found here.  Both systems support multi-server nodes and concurrency controls.  Neither of these features will be touched on in this post.  Our focus here will be on the minimum administration skills and API coding necessary for our goal of storing information about Users, Photos and View Counts through time.  Along the way we will also discuss JSON serialization / deserialization using Newtonsoft.Json and open source application logging with NLog.  I will defer the discussion of CouchBase views to a subsequent post.

Data Model Overview.

Ultimately we want to store information about each User.  For each User we will store information for one or more Photos, and for each Photo, information on one or more View Counts.  Users and Photos have their own Primary Keys, supplied as ID fields from Flickr.  Our view counts will be collected each day, and the Primary Key of each Count is the date the view count data was collected.  This could be modeled in a traditional RDBMS in third normal form, but the pattern is most naturally represented as a nesting of lists of objects within container objects.  Rather than say we are storing nested objects, it is more typical today to say that this data can be thought of as a structured Document.  The most natural way to store and access this data is by simple (or compound) primary keys.  When we get to the point of manipulating the data for statistical analysis and summary, our most natural mode of access will be by a key composed of the User ID and Photo ID, iterating over their view counts by Date ID (or Date ID range).  A very simple way to model this is with a Key / Value noSQL database based on document storage (aka a Document Store).  We could call this design an object oriented database model, but that would be old-fashioned.  Here is the visual of the data model:

The full Document could be represented as a compound C# object:

using System.Collections.Generic;

// The full document: one user plus that user's photos.
class CObject
{
    public CUser User { get; set; }
    public List<CPhoto> Photo { get; set; }
}

// Flickr account details and API credentials.
public class CUser
{
    public string FullName { get; set; }
    public string Username { get; set; }
    public string UserId { get; set; }
    public string APIKey { get; set; }
    public string SharedSecret { get; set; }
    public string Token { get; set; }
    public string TokenSecret { get; set; }
}

// One photo, with its daily view counts.
public class CPhoto
{
    public string ID { get; set; }
    public string Title { get; set; }
    public string ThumbnailURL { get; set; }
    public List<CView> Views { get; set; }
}

// One day's view count; Date acts as the key within a photo.
public class CView
{
    public string Date { get; set; }
    public int Views { get; set; }
}
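
The compound key mentioned in the data model overview (User ID plus Photo ID, with view counts walked by Date ID) can be sketched in a few lines.  This is only an illustration under stated assumptions: the DocumentKeys helper, the "::" separator and the yyyy-MM-dd Date ID format are all hypothetical choices of mine, not anything defined by Flickr or CouchBase.

```csharp
using System;

// Hypothetical helper: builds the compound keys described in the data model.
// The "::" separator and the key layout are assumptions for illustration only.
public static class DocumentKeys
{
    private const string Separator = "::";

    // Key for one photo document: <UserId>::<PhotoId>
    public static string ForPhoto(string userId, string photoId)
    {
        if (string.IsNullOrEmpty(userId)) throw new ArgumentException("userId is required", nameof(userId));
        if (string.IsNullOrEmpty(photoId)) throw new ArgumentException("photoId is required", nameof(photoId));
        return userId + Separator + photoId;
    }

    // Date ID for one day's view count, in sortable yyyy-MM-dd form.
    public static string ForDate(DateTime collectedOn)
    {
        return collectedOn.ToString("yyyy-MM-dd", System.Globalization.CultureInfo.InvariantCulture);
    }

    // True when a Date ID falls inside an inclusive range. Ordinal string
    // comparison is enough because yyyy-MM-dd sorts chronologically.
    public static bool InRange(string dateId, string fromDateId, string toDateId)
    {
        return string.CompareOrdinal(dateId, fromDateId) >= 0
            && string.CompareOrdinal(dateId, toDateId) <= 0;
    }
}
```

Keeping the Date ID in a sortable text form means a date range check needs no date parsing, which is convenient when the dates live inside a JSON document as plain strings.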

In this post we will set up a single server CouchBase instance and develop a single user application to manipulate documents in a CouchBase “bucket”.  We will not model the complete object in this post.  To keep things as simple as possible while we get our feet wet with CouchBase CRUD operations and simple CouchBase server administration, we will work only with a modified, simplified version of the CPhoto object (Document).


Getting The Stuff You Need.

Shopping List

Setting up a single node Windows CouchBase Server and performing basic administration is easy and fun.  Download and run the CouchBase installer from here.  Fred willing, all will go well and you will be placed at a local page in your default browser.  Bookmark this page and note the Port Number that has been assigned to the default instance of CouchBase.  On first use you will need to create an administrator Username and Password.  I left the defaults alone for the Cluster and Server Nodes.  Select the Data Buckets tab.  You will need to decrease the Quota Usage limits for the default Bucket.  With the space you freed up, create a new Bucket called “DLR.Flickr.Example1”.  Here is what mine looks like:


And Here is the Bucket Settings Page:


OK.  Now take some time and review the documentation for the .Net SDK here.  You can read through or code along with the examples given there. Done? Now let’s get to work.

Starting to Code CouchBase

Open Visual Studio and select Tools/NuGet Package Manager/Package Manager Console and enter the command:

Install-Package CouchbaseNetClient

Create a new Visual Studio Console application.  I called mine:

DLR.CCDB.ConsoleApp and set the default namespace to DLR.CCDB.  Add references to:




[ If you can not resolve Newtonsoft.Json:  Right click on the root of the project and select: Manage NuGet Packages.  Search on Newtonsoft.Json.  Select Install on JSON.Net.  Now try adding the Newtonsoft reference again.]

Now is a good time to add the open source Logging solution to your project.  Select: Manage NuGet Packages.  Search on NLOG. Install both  NLog and NLog Configuration.

Open your App.Config project file.  You will need to make several changes.  Here is what mine looks like after the changes.

Red items are added manually by me (you) and the Blue entries are added by the NuGet Package manager during the steps you followed above.

<?xml version="1.0" encoding="utf-8"?>
Couchbase.Configuration.CouchbaseClientSection, Couchbase" />
<servers bucket="DLR.Flickr.Example1" bucketPassword="">
uri="" />
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5" />
<assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
<assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
<bindingRedirect oldVersion="" newVersion="" />
<assemblyIdentity name="Enyim.Caching" publicKeyToken="05e9c6b5a9ec94c2" culture="neutral" />
<bindingRedirect oldVersion="" newVersion="" />
<assemblyIdentity name="NLog" publicKeyToken="5120e14c03d0593c" culture="neutral" />
<bindingRedirect oldVersion="" newVersion="" />

We are most interested in this section:

<servers bucket="DLR.Flickr.Example1" bucketPassword="">
uri="" />



This sets your default API calls to the bucket “DLR.Flickr.Example1” which you created above.  Although we will not develop the theme here you can override the default bucket during runtime to deal with calls to multiple buckets in the same program.


This sets your local node.  The host address is a constant for development projects (localhost) and 8091 is the port assigned to CouchBase during installation (double check this value on your system by navigating to the CouchBase Console page you added to your favorites list above).

While we are here let’s make some changes (without explanation why at this point) in NLog.Config (which was created when you installed NLog above).  Replace the entire contents of the file with (mind the wrap):

<?xml version="1.0" encoding="utf-8" ?>
<nlog xmlns=""
<target name="debugfile" xsi:type="File" layout="${message}" fileName="C:/temp/DLR.Flickr/Debug.txt" archiveNumbering="Rolling" archiveEvery="Day" maxArchiveFiles="7" ConcurrentWrites="true" />
<target name="logfile" xsi:type="File" layout="${message}" fileName="C:/temp/DLR.Flickr/Info.txt" archiveNumbering="Rolling" archiveEvery="Day" maxArchiveFiles="7" ConcurrentWrites="true" />
<target name="Errorsfile" xsi:type="File" layout="${message}" fileName="C:/temp/DLR.Flickr/Error.txt" archiveNumbering="Rolling" archiveEvery="Day" maxArchiveFiles="7" ConcurrentWrites="true" />
<target name="Fatalfile" xsi:type="File" layout="${message}" fileName="C:/temp/DLR.Flickr/Fatal.txt" archiveNumbering="Rolling" archiveEvery="Day" maxArchiveFiles="7" ConcurrentWrites="true" />
<logger name="*" minlevel="Debug" maxlevel="Debug" writeTo="debugfile" />
<logger name="*" minlevel="Info" maxlevel="Fatal" writeTo="logfile" />
<logger name="*" minlevel="Error" maxlevel="Fatal" writeTo="Errorsfile" />
<logger name="*" minlevel="Fatal" maxlevel="Fatal" writeTo="Fatalfile" />

We will get back to the details of this configuration file in the next post.

Write the minimum test code possible. 

Replace the contents of Program.cs with

using System;
using Couchbase;
using Enyim.Caching.Memcached;
using Newtonsoft.Json;
using NLog;

namespace DLR.CCDB.ConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            CouchbaseClient client = new CouchbaseClient();
        }
    }
}




Build and run.  You should have no errors and client should not equal null after the call

CouchbaseClient client=new CouchbaseClient();

Success?  Take a break and we will continue next week.


Ramp Up for QCon 2012 San Francisco

I am getting ready to attend QCon 2012 in San Francisco; this will be my third QCon SF.  You can read my posts for the 2011 edition here and here.  I enjoy QCon and find the tutorials especially useful.  But let me say right off that I don’t attend on my own dime.  In the past I have attended QCon and Microsoft’s MIX conference.  Sadly, Microsoft has decided to kill MIX, which is bad for Microsoft and for the developer community.  I have written posts about MIX here.  Now that MVC 4 is open sourced, one wonders why Microsoft corporate has killed MIX.

QCon is run by the same folks who put out the web magazine InfoQ.  For my money (and it’s not), the QCon schedule has in the past been a little long on Agile and Java and not strong enough on JavaScript libraries.  The 2012 tracks rectify this problem somewhat.  QCon’s greatest strength is its embrace of open and free (or almost free) technologies to power the web next year.  This year’s tracks include a reprise of “Cross Platform Mobile”, “Programmable Web”, “No SQL” (on all three days) and a bevy of tracks on hard core web development (“Real Time Web”, “UX”, “Taming HTML5 and JavaScript” and “Dynamic Languages for the Web”), plus the usual cast of characters: Java, Agile and what looks like a strong three day track of end to end “solutions”.  Strangely, jQuery is not featured anywhere in this conference.

The tutorials include a full day session on cross platform development using PhoneGap.  Robinson and Webber, who presented an excellent and well attended session on RESTful Development, will return with a Neo4j programming class.  With excellent timing, Peter Bell will present a four hour tutorial on CoffeeScript.  If there were a MIX 2012, they would be talking about how MVC4 is supporting CoffeeScript (sigh).  Tracks and tutorials can be found here and here.  Note the venue has also changed this year from the centrally located, and very urban, Union Square area of SF to the more upscale Embarcadero area.  For folks who like to explore the city, as opposed to pub crawling, this is a bummer.  But it is ocean view and close to the (now departed) Occupy SF site.

Well, those are the basic facts.  Now the question is: Should YOU attend QCon 2012?  The best part of QCon is that it is not a standard vendor conference and (except for Agile) nobody is trying to sell you anything.  The attendees tend to be working programmers from startups in the SF area and from Europe, which helps keep the sessions focused on what really works for programmers and startup firms.  There is no focus on mega systems like Oracle products or SharePoint.  In this environment, couchdb (and this year’s bad boy Neo) are about as establishment as you get.  QCon has traditionally been very friendly towards NO SQL databases and has been a consistently good source of information on this topic.  This year promises to continue the trend.  I regularly avoid pure vendor sessions, so of the three days of regular conference sessions I can look forward to about 2.5 days of good sessions.  The tutorials that I have attended are top notch and honest.  I can not evaluate the daily keynote speeches since I normally sleep in past these.  Attendees have told me the keynote party and mixer is nice, but I am always in the Haight district when this happens on Wednesday night.  If the boss is paying and you are coding the web and web devices you must attend this conference.  If it’s your dime….

I will be blogging QCon 2012 on my Nexus 7.

My (Virtual) Year On Tour With the Grateful Dead


From Tape to The Internet

Crimson flames tied through my ears
Rollin’ high and mighty traps
Pounced with fire on flaming roads
Using ideas as my maps
“We’ll meet on edges, soon,” said I
Proud ’neath heated brow
Ah, but I was so much older then
I’m younger than that now

My Back Pages by Bob Dylan

I first heard the Grateful Dead live in Albuquerque in 1971.  Like many, this was a seminal experience for me, changing my understanding of the meaning of performance and of Rock and Roll.  That overweight, black clad Prankster with a halo of unmanageable black hair playing guitar was clearly the center of the band and the performance.  I was young and was unaware that this was Captain Trips.  Captain America was more like it.  Nor was I aware of the mythical aura that was even then growing around the band and Jerry and their fans.  Like many, my first experience of the Dead was prepared only by my youth and The Bear’s purple haze of the night.  In the intervening years I was more attracted to Frank Zappa, who produced a consistent recording experience that the Dead never would achieve.  After the (limited) success of the Grateful Dead Movie in capturing what the Dead were, the world moved on.  But the band played on.  Perhaps it is better that way.  The early taping of Dead shows from the sound boards (thank you Owsley) and later by dedicated deadhead tapers left us with a rich vein of music and magic in the over 3000 individual live performances available in one form or another.  Beginning in the last decade of the last century, the Dead organization began to issue live sound board recordings from this corpus.

Dick’s Picks and Me

Half-wracked prejudice leaped forth
“Rip down all hate,” I screamed
Lies that life is black and white
Spoke from my skull. I dreamed
Romantic facts of musketeers
Foundationed deep, somehow
Ah, but I was so much older then
I’m younger than that now

While a great resource for those of us not conversant in BitTorrent, these releases were frequently expensive and (to my ears) over produced, which subtracted from the raw energy of the original sound board tapes.  I am lucky to have access to a great music store with plentiful numbers of used Grateful Dead CDs.  If you are in Maine visit a local Bull Moose Music store.  I was lucky to be able to pick up many live Dead concert CDs at a reasonable price.  Thank you Bull Moose.  Although the official Dead releases enhanced my life, there are some problems:

I) The list prices are quite high (try to get used copies)

II) Selections for the Dick’s Picks series seem to be primarily based on best complete shows.  (The Dead were often hit and miss in the same night, and limiting yourself to the best complete show skips a lot of great music.  This problem has been reduced by the newer Road Trips series and specialty releases like Ladies and Gentlemen… the Grateful Dead, which cooks down the best of a four night stand at the Fillmore, 1971.)

III) The processing of the raw tapes, IMHO, sometimes cooks the life out of some of the releases.

IV) Some of the specialty releases seem to be picked more for their historical importance than the quality of the performance (The Closing of Winterland, for example).

Having said all that, if you can get the official releases used, some of them are great.

A Short Divergence in Our Story

Girls’ faces formed the forward path
From phony jealousy
To memorizing politics
Of ancient history
Flung down by corpse evangelists
Unthought of, though, somehow
Ah, but I was so much older then
I’m younger than that now

I started to cook down my copies of the official live releases into playlist CDs (favorites of 1974, Dark Star releases, etc.).  Then I had open heart surgery, caught a post operative wound infection and almost died.  Stephen Jay Gould wrote someplace that the greatest species in evolution are bacteria.  They are everywhere.  There are more bacteria in your body than body cells.  And I was in a three month war between the bugs and myself (to be honest I had massive antibiotic infusions on my side).  My day was composed of pain medicine, James Joyce’s Ulysses, The Bible, and my CDs of live Dark Star performances.  Let’s just say that Dark Star and the Gospel of Mark were more significant than Tramadol in my recovery.  Rehab consisted of countless hours of treadmill work.  That, and an MP3 player packed with Scarlet Begonias and Fire on the Mountain.

BTW: Tom Constanten said somewhere that they didn’t play Dark Star, it was always going on,  they just joined in.  Although T.C. recommends ‘any East Coast Dark Star’ my favorites are early West Coast versions.

The Internet Archive Connection

In a soldier’s stance, I aimed my hand
At the mongrel dogs who teach
Fearing not that I’d become my enemy
In the instant that I preach
My pathway led by confusion boats
Mutiny from stern to bow
Ah, but I was so much older then
I’m younger than that now

The Internet Archive, in early 2000, began collecting, digitizing and making available for re-distribution the large body of Grateful Dead concert recordings in circulation, both those made by independent tapers and sound board (SB) recordings.  By policy SB recordings are available for playing on the web site and non-SB recordings are available for downloading.  There are multiple recordings available for most shows and these vary in quality from commercially releasable to barely audible.  There are over 8,000 individual recordings of about 1,900 shows.  About 1,000 of these are SB.  While vast, the Internet Archive is not the most accessible site.  Like most people I started with the feature of the Grateful Dead collection called:

Click Me:

 Grateful Dead Shows on This Day In History 

(If you have never been there – try the link right now).

For 08-30 (today, while I am writing this) the Internet Archive will display 30 recordings (for shows of this date in 1985, 1983, 1981, 1980, 1970 and 1969).  There are limited sort options for these results.  Selecting a given recording brings one to a new web page containing an online player and (if the show is not an SB) download options.  I was hooked on the musical possibilities but trapped by the limited user interface of the Internet Archive.  I wanted more.  Much more…

Hacking The Internet Archive

A self-ordained professor’s tongue
Too serious to fool
Spouted out that liberty
Is just equality in school
“Equality,” I spoke the word
As if a wedding vow
Ah, but I was so much older then
I’m younger than that now

My goal was to be able to listen to ALL of the Dead’s concerts, using only the best recordings, and to move through the collection with a better user interface which would let me decide where and when to go to any individual date.  I gave myself 12 months to at least sample all 1,900 concerts and listen completely to all SB concert recordings.  To do this I would first need to wrestle the Internet Archive (IA) to its knees.  Little did I know that this would take me on a programming journey involving three programming languages (Ruby, JavaScript and C#), two data specifications (XML and JSON), two database engines (couchdb and SQL Server) as well as understanding the (somewhat loosely documented) search engine of IA, and more….  Readers interested in the technical details should see my series of postings on Ruby on Rails and CouchDB.  Part 5 has the details of how to hack the Internet Archive to get at the data for the Grateful Dead recordings on IA.  Thus armed with the complete dataset from the Internet Archive of Grateful Dead recordings and a new front end, I was ready to begin my listening project.  There are over 8,000 recordings of over 2,000 concerts on the Internet Archive.  My first cut on the recordings was to use an algorithm to select ONE recording for each recording date for review.  This is a very simple selection based on the first of:

  • Was processed by Charlie Miller (IMHO the BEST processor of Grateful Dead Tapes)
  • Is a Sound Board Recording
  • Is  a Matrix Recording
  • Is the most recently posted tape for a given date.
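
The "first of" rule above can be sketched as a ranking followed by a tie-breaker.  This is a minimal illustration, not the code I actually ran: the Recording class and its field names are hypothetical stand-ins for the Internet Archive metadata, and using the posting date to break ties inside the same rank is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one Internet Archive recording; the field names are
// assumptions for illustration, not the archive's actual metadata schema.
public class Recording
{
    public string ShowDate { get; set; }       // e.g. "1977-05-08"
    public bool ByCharlieMiller { get; set; }
    public bool IsSoundBoard { get; set; }
    public bool IsMatrix { get; set; }
    public DateTime PostedOn { get; set; }
}

public static class RecordingPicker
{
    // Lower rank wins: 0 = Charlie Miller, 1 = sound board, 2 = matrix, 3 = anything else.
    private static int Rank(Recording r)
    {
        if (r.ByCharlieMiller) return 0;
        if (r.IsSoundBoard) return 1;
        if (r.IsMatrix) return 2;
        return 3;
    }

    // Picks ONE recording per show date: best rank first, then the most
    // recently posted recording as the tie-breaker (an assumption).
    public static Dictionary<string, Recording> PickOnePerDate(IEnumerable<Recording> recordings)
    {
        return recordings
            .GroupBy(r => r.ShowDate)
            .ToDictionary(
                g => g.Key,
                g => g.OrderBy(r => Rank(r)).ThenByDescending(r => r.PostedOn).First());
    }
}
```

Because the rule is expressed as a sort order rather than a chain of queries, adding or reordering a criterion only means touching the Rank method.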

Does this process miss some gems?  Undoubtedly, but it did give me 2,000 tapes to review rather than 8,000.  With these criteria in place, my local copy of the IA database and my own UI for IA, I started listening in July, 2011.  I did not attempt to listen to all 2,000 recordings completely.  If a recording was of poor quality or the band was out of tune or Jerry was ‘uninspired’, I abandoned the tape after brief samples of my favorite tunes.  In the end I reviewed about 1,000 concerts in thirteen months (I finished during the ‘days between’ period [August 1 and August 9]).  I ended up with about 475 concerts on my personal playlist of ‘greatest concerts’.  Along the way I wrote several reviews on this blog of concerts which I thought were particularly of note, and compiled a hyperlinked list of shows by year (the series starts here) and hyperlinks to Dark Star concerts and Scarlet Begonias –> Fire on the Mountain concerts.  All of these blogs contain links to jump right into the concert within the Internet Archive (but you still need to use the IA music player).  Do I have a favorite sequence of songs, a favorite concert, a favorite era?  Yes.  Am I going to tell you?  No.  Dig in, visit the Internet Archive and start listening.  It could save your life.


Days Between Grateful Dead

and there were days
and there were days I know
when all we ever wanted
was to learn and love and grow
Once we grew into our shoes
we told them where to go
walked halfway around the world
on promise of the glow
stood upon a mountain top

walked barefoot in the snow
gave the best we had to give
how much we’ll never know we’ll never know

Days Between by Garcia and Hunter





Was It Worth The Trip?




To Bear and Captain Trips, we say Thank You and Rest In Peace.

All photos by cloud2013 except Bear and Captain Trips Credit: Rosi McGee 


PS: Stupid Grateful Dead Statistics From the Internet Archive Database

Top 12 Most Played By Era (excluding Space and Drums):

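Title                       | 1967-1971 | 1972-1978 | 1979-1990 | 1991-1995
----------------------------+-----------+-----------+-----------+----------
Althea                      |           |           |     *     |
Big River                   |           |     *     |           |
Brown Eyed Women            |           |     *     |           |
Casey Jones                 |     *     |           |           |
Cassidy                     |           |           |     *     |
China Cat Sunflower         |     *     |           |           |
Corrina                     |           |           |           |     *
Crazy Fingers               |           |           |           |     *
Cryptical Envelopment       |     *     |           |           |
Cumberland Blues            |     *     |           |           |
Dark Star                   |     *     |           |           |
Deal                        |           |     *     |           |
El Paso                     |           |     *     |           |
Estimated Prophet           |           |           |     *     |
Eyes Of The World           |           |           |           |     *
Good Lovin                  |     *     |           |           |
Hard to Handle              |     *     |           |           |
I Know You Rider            |     *     |           |     *     |
Jack Straw                  |           |     *     |           |
Lazy River Road             |           |           |           |     *
Little Red Rooster          |           |           |     *     |
Looks Like Rain             |           |           |     *     |
Me and My Uncle             |     *     |           |           |
Mexicali Blues              |           |     *     |           |
Not Fade Away               |     *     |     *     |     *     |     *
Playing In The Band         |           |     *     |     *     |
Sugar Magnolia              |           |     *     |     *     |     *
Sugaree                     |           |     *     |           |
Tennessee Jed               |           |     *     |           |
Terrapin Station            |           |           |           |     *
The Other One               |           |           |     *     |
Throwing Stones             |           |           |           |     *
Truckin                     |           |     *     |     *     |
Turn On Your Lovelight      |     *     |           |           |
Uncle Johns Band            |     *     |           |           |     *
Wang Dang Doodle            |           |           |           |     *
Way To Go Home              |           |           |           |     *
Wharf Rat                   |           |           |     *     |
When I Paint My Masterpiece |           |           |           |     *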

Internet Archive:  All Recordings and Sound Board Recordings


Concert Length


Song Counts By Year (Dark Star, Playin’ in the Band and Scarlet Begonias –> Fire On The Mountain)


QCON 2011 San Francisco and Occupy California

Let me say right off that I do not pay for my own ticket to QCON; my boss picks up the tab.  I love QCON.  It is definitely not MIX.  I go there to see what is happening in the world which is NOT Oracle and NOT Microsoft.  That’s the same reason I read their online zine: InfoQ.  QCON always provides a look at what is current and recent in the open stack world.  This year we looked closely at REST, Mobile development, Web API and NOSQL.  As they did last year, QCON provided a nice look at what is open and emerging.  Big metal will always be with us, but the desktop is looking very weak over the next few years while Mobile devices of all kinds and makers are exploding.  The biggest fallout is that while HTML5 is only slowly emerging on desktops in place, all new Mobile devices (which is to say most new systems) will be fully HTML5 compliant.  Not only that but, with the exception of Windows Phones, the rendering engine for all mobile devices is based on WebKit.  What this means for those of us in the cubes is that worrying about how to bridge to pre-HTML5 browsers with HTML5 code is a non-issue.  Mobile development is HTML5 development.  The big metal end of the supply chain is being segmented into Web API servers (which service JSON XHR2 data calls) and the NOSQL engines which serve the Web API farms.  Remember, a native mobile app ideally has pre-loaded all of its pages; its interactions are solely over JSON XHR2 for data (be it documents, data or HTML fragments).  The traditional JSP or ASPX web server is not really in play with native mobile apps and has an increasingly small role to play in “native like” or browser based mobile apps.  Let’s move on.

“IPad Light by cloud2013”

Speaking of moving on: There is an occupation going on in this country.  I visited occupation sites in San Francisco, UC Berkeley and Berkeley “Tent City”.  These are all very active and inspiring occupy sites.  Now if we can only get to Occupy Silicon Valley!

I attended the REST in Practice tutorial this year and it was very nice.  The authors were well informed and the agenda comprehensive.  I personally like the Richardson maturity model but think that people are not facing up to the fact that level three is rarely achieved in practice and that the rules of web semantics necessary to interoperate at level 3 are almost non-existent.  Remember, the original REST model is client/server.  The basic model is a finite state machine, and the browser (and the user) are in this model required to be dumb as fish.  Whether JavaScript is a strong enough model, and whether late binding semantics can be made clear enough to pull off level three, is really an open question which no one has an answer to.  If we forget about interoperability (except for OAuth) things start to fall into place, but we thought OPENNESS was important to REST.

Workshop: REST In Practice by the Authors: Ian Robinson & Jim Webber

Why REST? The claims:

· Scalable

· Fault Tolerant

· Recoverable

· Secure

· Loosely coupled

Questions / Comment:

Do we agree with these goals?

Does REST achieve them?

Are there other ways to achieve the same goals?

REST design is important for serving AJAX requests and AJAX requests are becoming central to Mobile device development, as opposed to intra-corporate communication. See Web API section below.

Occupy Market Street (San Francisco)            

The new basic Document for REST: Richardson Maturity Model (with DLR modifications)

Level 0:

One URI endpoint

One HTTP method [Get]


Level 1:

Multiple URI,

One HTTP Method [Get]

Century Level HTTP Codes (200,300,400,500)

Level 2:

Multiple URI,

Multiple HTTP Methods

Fine Grain HTTP Codes (“Any code below 500 is not an error, it’s an event”)

URI Templates

Media Format Negotiation (Accept request-header)

Headers become major players in the interaction between client and server

Level 3:  The Semantic Web

Level 2 plus

Links and Forms Tags (Hypermedia as the engine of application state)

Plus emergent semantics

<shop xmlns="">






<link rel="self" href="" type="application/restbucks+xml"/>

<link rel="rb:order-form" href="" type="application/restbucks+xml"/>



Think of the browser (user) as a finite state machine where the workflow is driven by link tags which tell the client which states it may transition to and the URI associated with each state transition.
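
A rough sketch of that finite state machine idea follows.  It is an in-memory illustration only: the Representation and HypermediaClient classes are hypothetical names of mine, the fetch delegate stands in for a real HTTP GET, and nothing here comes from the Rest in Practice material.  The point is that the client knows link relations (such as rb:order-form) but never constructs a URI on its own.

```csharp
using System;
using System.Collections.Generic;

// One representation returned by the server: a state name plus the links
// (rel -> URI) that say which transitions are legal from here.
public class Representation
{
    public string State { get; set; }
    public Dictionary<string, string> Links { get; set; } = new Dictionary<string, string>();
}

// A deliberately "dumb" client: it knows link relations, not URIs or workflow.
public class HypermediaClient
{
    private readonly Func<string, Representation> _fetch;   // stand-in for an HTTP GET
    public Representation Current { get; private set; }

    public HypermediaClient(Func<string, Representation> fetch, string entryUri)
    {
        _fetch = fetch;
        Current = fetch(entryUri);
    }

    // Follows a link only if the current representation offers that rel;
    // otherwise the transition is refused and the state does not change.
    public bool TryFollow(string rel)
    {
        string uri;
        if (!Current.Links.TryGetValue(rel, out uri)) return false;
        Current = _fetch(uri);
        return true;
    }
}
```

If the server stops offering a link, the client simply cannot make that transition, which is how the workflow stays under the server's control.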

The classic design paper on applied REST architecture is here: How To GET a Cup Of Coffee.  Moving beyond level 1 requires fine grain usage of HTTP Status Codes, Link tags, the change headers and media type negotiation.  Media formats beyond POX and JSON are required to use level 3 efficiently (OData and AtomPub, for example).

Dude, where’s my two phase commit?  Not supported directly; use the change headers (If-Modified-Since, If-None-Match and ETag) or architectural redesign (redefine resources or workflow).  The strategic choices are the design of the finite state machine and the definition of resource granularity.
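
To make the "use the change headers" advice concrete, here is a small in-memory sketch of optimistic concurrency with ETags.  It is an assumption-laden illustration: EtagStore is a hypothetical class of mine, there is no real HTTP involved, and I use If-Match semantics for the conditional write (the usual partner of ETag for updates) rather than any header named in the tutorial.  The status codes mirror HTTP: 200 OK, 404 Not Found, 412 Precondition Failed.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for a server-side resource store that versions each
// resource and exposes the version as an ETag.
public class EtagStore
{
    private class Entry { public string Body; public int Version; }
    private readonly Dictionary<string, Entry> _resources = new Dictionary<string, Entry>();

    private static string EtagOf(Entry e) { return "\"" + e.Version + "\""; }

    // Unconditional PUT: creates or overwrites, and returns the new ETag.
    public string Put(string uri, string body)
    {
        Entry e;
        if (!_resources.TryGetValue(uri, out e)) { e = new Entry(); _resources[uri] = e; }
        e.Body = body;
        e.Version++;
        return EtagOf(e);
    }

    // GET: returns the status code and hands back the body and current ETag.
    public int Get(string uri, out string body, out string etag)
    {
        Entry e;
        if (!_resources.TryGetValue(uri, out e)) { body = null; etag = null; return 404; }
        body = e.Body;
        etag = EtagOf(e);
        return 200;
    }

    // Conditional PUT (If-Match semantics): succeeds only if the caller's
    // ETag still matches, so a stale writer cannot clobber a newer version.
    public int PutIfMatch(string uri, string ifMatch, string body, out string newEtag)
    {
        Entry e;
        newEtag = null;
        if (!_resources.TryGetValue(uri, out e)) return 404;
        if (EtagOf(e) != ifMatch) return 412;
        e.Body = body;
        e.Version++;
        newEtag = EtagOf(e);
        return 200;
    }
}
```

A client that gets 412 back re-reads the resource, reapplies its change and tries again; no lock is held across requests, which is the trade made in place of two phase commit.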


(Slide from Rest in Practice)

Architectural Choices:

The Bad Old Days: One resource many, many ‘verbs’.

The Happy Future: Many, many resources, few verbs.

The Hand Cuff Era: Few Resources, Few verbs.

The Greater Verbs:

GET: Retrieve a representation of a resource

POST: Create a new resource (Server sets the key)

PUT: Create new resource (Client sets the key); ( or Update an existing resource ?)

DELETE: Delete an existing resource

Comment: The proper use of PUT vs. POST is still subject to controversy and indicates (to me) that level 3 is still not well defined.

Typically they say POST to create a blog entry and PUT to append a comment to a blog.  In CouchDB we POST to create a document and PUT to add a revision (not a delta) and get back a new version number.  The difference here is how the resource is being defined, which is an architectural choice.
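
One way to see the POST vs. PUT distinction is to strip away HTTP and look only at who chooses the key.  The sketch below is a hypothetical in-memory collection (ResourceCollection is my own name, and the 201/200 return values are just the conventional HTTP codes for created and replaced); it is not how CouchDB or any particular server implements these verbs.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for a resource collection, to contrast the two verbs.
public class ResourceCollection
{
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();
    private int _nextId = 1;

    // POST: the server picks the key and returns it (think of a Location header).
    // Posting the same body twice creates two resources.
    public string Post(string body)
    {
        string key = (_nextId++).ToString();
        _items[key] = body;
        return key;
    }

    // PUT: the client names the key; returns 201 when created, 200 when replaced.
    // Repeating the same PUT leaves the same state behind (idempotent).
    public int Put(string key, string body)
    {
        bool existed = _items.ContainsKey(key);
        _items[key] = body;
        return existed ? 200 : 201;
    }

    public int Count { get { return _items.Count; } }
}
```

Seen this way, most of the controversy is really about how the resource is defined: if a comment is its own resource it gets created, if it is part of the blog entry then the entry gets replaced.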


The Lesser Verbs:

OPTIONS: See which verbs a resource understands

HEAD: Return only the header (no response body)

PATCH: Not available from HTML5 forms.  This is the delta verb (the method itself is defined for HTTP in RFC 5789), but no one could agree on a specification for the content.  Microsoft did some early work on this with their XML Diffgram but no one else followed suit.


Authentication (in order of increased security)

Basic Auth

Basic Auth + SSL


WSSE Authentication (ATOM uses this)

Message Security:

Message Level Encrypt (WS-SEC)

For the Microsoft coders I highly recommend

RESTful .NET (WCF for REST, Framework 3.5) by Jon Flanders

There are significant advantages to building your RESTful services using .Net.  Here is a comparison table to get you oriented:

DLR’s Cross Reference:
#  | Web Service Standard | REST Service                      | WCF For REST (Framework 3.5)
1  | TCP/IP + others      | TCP/IP                            | TCP/IP
3  | SOAP Headers         | HTTP Headers                      | HTTP Headers
4  | WS*Security          | Basic Auth/SSL                    | Basic Auth/SSL or WS*Security
5  | Early Binding        | Late Binding                      | Late Binding
7  | XML                  | Media Negotiation                 | Media Negotiation
8  | SOAP FAULTS          | HTTP Response Codes               | HTTP Response Codes
9  | Single Endpoint      | Multiple Endpoints, URI Templates | Multiple Endpoints, URI Templates
10 | Client Proxy         | Custom                            | auto-generated JavaScript proxy


The REST of the Week

Wednesday is more or less vendor day at QCON and the sessions are a step down from the tutorials, but the session quality picked up again on Thursday and Friday.  XXX XXXX, who gave an excellent tutorial last year, gave an informative talk on ‘good code’.  The Mobile Development and HTML5 tracks were well attended and quite informative.  The field is wide open, with many supporting systems being free to the developer (support will cost you extra), and the choices are broad: from browser ‘responsive design’ applications to native appearing applications to native apps (and someone threw “hybrid app” into the mix).  The Mobile panel of IBM Dojo, jQuery Mobile and Sencha was hot.  I am new (to say the least) to Mobile development but here are my (somewhat) random notes on these sessions:

MOBILE Development is HTML5 Development

HTML5 is the stack. Phone and Tablet applications use WebKit based rendering engines and HTML5 conformant browsers only (Windows Phone 7 is the exception here). HTML5 has its own new security concerns.

Four major application development approaches are:

· Browser Applications;

· Native like Applications;

· Hybrid Applications; and

· Native Applications.

Browser applications may emulate the screens seen on the parallel desk top browser versions on the front end but in practice the major players (Facebook, YouTube, Gmail) make substantial modifications to at least the non-visual parts of the Mobile experience making extensive use of local storage and the HTML5 manifest standard for performance and to allow for a reasonable off line experience. Browser applications fall under the guidelines of Responsive Design (aka adaptive Design) and tend to be used when content will appear similarly between desktop and Mobile devices.

“Native like” applications use:

· The Browser in full screen Mode with no browser ‘chrome’; and

· Widgets are created using CSS, JS and HTML5 which simulate the ‘look and feel’ of a native application;

· No Access to Native Functionality (GPS, Camera, etc)

· Tend to use, but do not require, the HTML5 manifest or local storage; their use is strongly encouraged.

A Native application is still an HTML5 application with the following characteristics:

· All JS Libraries, CSS and HTML are packaged and pre-loaded using a vendor specific MSI/Setup package;

· AJAX type calls for data are allowed;

· Access to Native Widgets and/or Widgets are created using CSS, JS and HTML5

· Access to Native Functionality (GPS, Camera, etc)

· Standard HTTP GET or POST are NOT allowed

A Hybrid Application is a “Native Like” Application placed within a wrapper which allows access to device hardware and software (like the camera) via a special JavaScript interface and, with additional special coding, can be packaged within a MSI/Setup and distributed as a pure Native application.

AJAX calls are made via XHR2 (aka XMLHttpRequest Level 2) which, among other things, relaxes the single domain requirement of XHR and adds support for the Blob and File interfaces.

The following major vendors offer free libraries and IDE for development:

Native Apps: PhoneGap, Appcelerator

Native App Like: Sencha, PhoneGap, IBM Dojo

Browser App: JQuery.Mobile

PhoneGap does NOT require replacement of the Sencha, JQuery.Mobile, Dojo.Mobile or JQuery libraries.

PhoneGap allows JavaScript to call PhoneGap JavaScript libraries which abstract access to device hardware (camera, GPS, etc).

Sencha does not require replacement of the JQuery.Mobile, Dojo.Mobile or JQuery libraries.

Although it is theoretically possible to create “Native like” applications with only JQuery.Mobile this is NOT encouraged.

Local Storage

This is a major area of performance efforts and is still very much open in terms of how best to approach the problem:

The major elements are:

App Cache (for pre-fetch and the Native App approach)

DOM Storage (aka Web Storage)

IndexedDB (vs. Web SQL)

File API (this is really part of XHR2)

Storing Large Amounts of Data Locally

If you are looking to store many Megabytes – or more, beware that there are limits in place, which are handled in different ways depending on the browser and the particular API we’re talking about. In most cases, there is a magic number of 5MB. For Application Cache and the various offline stores, there will be no problem if your domain stores under 5MB. When you go above that, various things can happen: (a) it won’t work; (b) the browser will request the user for more space; (c) the browser will check for special configuration (as with the “unlimited_storage” permission in the Chrome extension manifest).



Web SQL Database is a web page API for storing data in databases that can be queried using a variant of SQL.

Storage Non-Support as of two weeks ago.

          | IE         | Chrome    | Safari     | Firefox    | iOS        | BBX[RIM]   | Android
IndexedDB | Supported  | Supported | No Support | Supported  | No Support | No Support | No Support
WEB SQL   | No Support | Supported | Supported  | No Support | Supported  | Supported  | Supported


Doing HTML5 on non-HTML5 Browsers: If you are doing responsive design and need to work with Desktop and Mobile using the same code base, look at JQuery.Mobile, DOJO and Modernizr (there is strong Microsoft support for this JavaScript library).


What is it? Just a name for breaking out the AJAX servers from the web server. This is an expansion of REST into just serving data for XHR. It is a helpful way to specialize our design discussions by separating serving pages (with MVC or whatever) from serving data calls from the web page. Except for security the two can be architecturally separated.

Web APIs Technology Stack


Look familiar? Looks like our old web server stack to me.


The CAP Theorem

  • Consistency: (all nodes have the same data at the same time)
  • Availability: (every request receives a response – no timeouts, offline)
  • Partition tolerance: (the system continues to operate despite arbitrary message loss)

Pick Any Two

If some of the data you are serving can tolerate Eventual Consistency then NOSQL is much faster.

If you need two phase commit, either use a SQL database OR redefine your resource to eliminate the need for the 2Phase Commit.

NoSQL databases come in two basic flavors:

Key/Value: These are popular with content management and where response time must be minimal. In general you define what b-trees you want to use before the fact. There are no on the fly joins or projections. MongoDB and CouchDB are typical leaders in this area.

Column Map: This is what Google calls Big Table. This is better for delivering groups of records based on criteria which may be defined ‘on the fly’. Cassandra is the leader in this group.

Web Sockets:

Sad to say this is still not standardized and preliminary support libraries are still a little rough.  Things do not seem to have moved along much since the Microsoft sessions I attended at MIX 11.

Photos: All Photos by Cloud2013

REST, Ruby On Rails, CouchDB and Me – Part 6 Getting The Data Into And Out Of CouchDB   1 comment

Part 0 – REST, Ruby On Rails, CouchDB and Me

Part 1 – Ruby, The Command Line Version

Part 2 – Aptana IDE For Ruby

Part 3 CouchDB Up and Running on Windows

Part 4 – CouchDB, Curl and RUBY

Part 5 – Getting The Data Ready for CouchDB

Part 7 – JQUERY,JPlayer and HTML5

 [This just in: couchDB and Microsoft MVC 3: here]

Our mission in today’s post: 

  • Serialize the Concert object to a JSON string (i.e. a JSON document);
  • Do POST requests to insert  a JSON document for each concert into the couchdb database;
  • Create couchDB views to allow optimized data retrieval; and
  • Create a couchDB view to optimize retrieval of recordings for all years for an arbitrary Month and Day (this duplicates the data provided by the “Grateful Dead Shows on This Day In History” selection in the Internet Archive).

Our last post outlined the formal structure of our document and the JSON format of the document we intend to post to our couchdb database.  Also in that post we outlined the RUBY code used to collect and clean the data we accessed from the Internet Archive.  In order to place our concert information into our couchdb database we need to transform our RUBY objects into proper JSON format and then call an HTTP POST to place our resource (document) into the database.  Why a POST rather than a PUT command you ask?  Better to ask a true RESTafarian.  In REST PUT is used to create a resource (in couchdb terms this is the database) and POST is used to modify a resource (in our case create a document within the database – which modifies the database I guess).  Hmmmm.

Why JSON?  Ask These Guys:


To transform a RUBY object into JSON format requires the use of a RUBY GEM: json.  This GEM comes in two flavors, a pure RUBY form (slower) and a C form (faster, but it requires that you have a working RUBY development stack, GNU C compiler etc. deployed on your local machine).  We are happy with the speed of the pure RUBY form, so that is the one we use here.  The downside of this GEM is that it will not serialize pure RUBY objects as is.  To do that you need to code custom serialization handlers. By default, the JSON serializer will work directly on primitive (native) types (strings, ints, what have you), and the simple structures: arrays and hash tables.  So our first task will be to transform our data into hash table format.  Assume a RUBY array of concert objects (@selectList) and a parallel array of data for each track (tracks). In pseudo code we have:

@selectList.each do |recording|
  PostRecording(makeJSONFull(recording,tracks))
end

Our method makeJSONFull takes the RUBY objects and returns a JSON string as:

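def makeJSONFull(recording,tracks)
  tList={"tracks" => []}   #the BASE fields of recording are added to this Hash here (omitted)
  tracks.each do |t|
    tItem={}
    tItem["track"]=t.track #this could be derived as offset - 1
    tList["tracks"] << tItem   #the other track fields are set the same way (omitted)
  end
  return JSON.generate(tList)
end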

where JSON.generate(tList) is the JSON GEM method generate (to serialize) and tList is a Hash containing only primitive types (strings) and a Hash list of track data.  The track data contains only primitive types (strings).  We pass the returned string (which contains our data in JSON format) to our routine PostRecording(jsonString).  This Routine looks like:

def PostRecording(jsonString)
  responseBody=open(uri,:method=> :post, :body => jsonString,"Content-Type" => "application/json").read
  puts 'POST Response Success: ' + responseBody
rescue OpenURI::HTTPError => the_error
  puts '##########Post Response Error: ' + the_error.io.status[0]
end

OK, this routine depends on the RUBY GEM: rest-open-uri.  This GEM is a single file of RUBY code which depends on the RUBY components: uri and net/http (these are part of the base RUBY system).  “rest-open-uri” takes the base RUBY open-uri component, which handles only the HTTP GET verb, and extends it to include all of the HTTP verbs (PUT, POST, DELETE and HEAD) necessary to implement any REST system.  The open method, as used here:

open(uri,:method=> :post, :body => jsonString,"Content-Type" => "application/json").read

takes these parameters:

uri: the address of our couchdb database

Content-Type: our MIME type for this call ("application/json")

body: our document

method: the HTTP verb POST

If this POST command fails we want to capture the error and (for this example) display the HTTP status code of the error:

rescue OpenURI::HTTPError => the_error
  puts '##########Post Response Error: ' + the_error.io.status[0]

Actual error recovery is left as an exercise for the reader.
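
As a starting point for that exercise, here is a minimal sketch of the same POST written against net/http from the standard RUBY library instead of rest-open-uri.  The database address (localhost on CouchDB's default port 5984, with a database called gd_concerts) and the method names build_post_request and post_document are assumptions made for illustration; they are not the names used in my scripts.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Hypothetical database address; 5984 is CouchDB's default port.
DB_URI = URI.parse('http://localhost:5984/gd_concerts')

# Build (but do not send) the POST request for one JSON document.
def build_post_request(db_uri, json_string)
  request = Net::HTTP::Post.new(db_uri.path)
  request['Content-Type'] = 'application/json'
  request.body = json_string
  request
end

# Send the request and report the outcome.
# Returns true when the server answers with a 2xx status, false otherwise.
def post_document(db_uri, json_string)
  response = Net::HTTP.start(db_uri.host, db_uri.port) do |http|
    http.request(build_post_request(db_uri, json_string))
  end
  if response.is_a?(Net::HTTPSuccess)
    puts 'POST Response Success: ' + response.body
    true
  else
    puts '##########Post Response Error: ' + response.code
    false
  end
rescue SystemCallError, SocketError => e
  # Connection-level failures (server down, bad host name) end up here.
  puts '##########Post Failed: ' + e.message
  false
end
```

Calling post_document(DB_URI, jsonString) needs a running CouchDB.  Returning true or false rather than just printing lets the caller decide whether to retry or to log the failed document.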


OK, we now have data in our database.  In our case this is 2,014 documents (i.e. 2,014 separate Grateful Dead concerts).  Using Futon we see the database as:


In Futon, we can look at our data:


And we can drill in on an individual document (for example the primordial recording on 1965-11-01):



Of course what we need is to be able to return our document with a simple HTTP GET request.  Further we will want to return only those fields we will actually need from a document (i.e. we need to shape our data to minimize bandwidth).  We do that in couchdb with a view which is defined in a design document.  We can define multiple views in a single design document and can have multiple design documents in a given database.

For our first view let's set the task to return all documents which occur on a given month and day (regardless of year).  This is the filter. We want to shape our data to return only a subset of fields: the venue, the IAKey and the document tracks.  The key for this view will be the month+day (in MMDD format).  Our design document might look like:

{
  "_id": "_design/basic",
  "_rev": "19-fd2c9b34d2536ce1f187ab2d4e5413de",
  "views": {
    "MonthDay": {
      "map": "function(doc){emit(doc._id.substr(5,2)+doc._id.substr(8,2),[doc.venue , doc.IAKey, doc.tracks ])}"
    }
  }
}


What this view (called MonthDay) does is maps each document (doc) into a key (doc._id.substr(5,2)+doc._id.substr(8,2)) and an array of return fields:

[doc.venue , doc.IAKey, doc.tracks ]

Note that doc.tracks itself returns an array of tracks.
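
To make the key derivation concrete, here is a small RUBY sketch of what the map function computes for one document.  It is not part of the CouchDB view (views are written in JavaScript); the method names month_day_key and month_day_row and the 'example-key' value are made up for illustration.

```ruby
# RUBY equivalent of the key expression in the MonthDay map function.
# For an _id such as "1969-08-16", characters 5-6 hold the month and 8-9 the day.
def month_day_key(id)
  id[5, 2] + id[8, 2]
end

# Mimic what the view emits for one document: a [key, value] pair.
def month_day_row(doc)
  [month_day_key(doc['_id']), [doc['venue'], doc['IAKey'], doc['tracks']]]
end

doc = { '_id' => '1969-08-16', 'venue' => 'Woodstock Music',
        'IAKey' => 'example-key', 'tracks' => [] }
key, value = month_day_row(doc)
puts key            # the MMDD lookup key
puts value.inspect  # the shaped data returned for this key
```

Because the year is dropped from the key, every concert played on the same month and day lands under the same key in the view's b-tree.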

We can use Futon to test our work and find the results for all New Year's Day Grateful Dead Concerts.  The first document looks like:



(additional lines are omitted from this screen snap).


As an HTTP GET request we would write:

This command uses:


The View: _design/basic/_view/MonthDay

startkey: this is a key word parameter field which tells couchdb to go to the b-tree which represents the MonthDay view, find the first Key match and continue through the tree returning the rows that follow.  To restrict the result to a single Key, pass the same value as endkey (or use the key parameter instead).

If we wanted a range of Keys (say all holiday season Concerts) we can use the startkey and the endkey parameters:

startkey gives us a starting key in the b-tree and couchdb returns all documents up to (and including) the b-tree key defined by endkey.
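
Here is a hedged sketch of how such a request URL can be put together in RUBY.  The server address and database name (localhost:5984, gd_concerts) and the method name view_url are assumptions; only the view path comes from the design document above.  The one detail worth showing is that CouchDB expects key parameters to be JSON values, so a string key travels with its double quotes.

```ruby
require 'uri'
require 'json'

# Hypothetical server and database name; the view path is the one defined above.
VIEW_BASE = 'http://localhost:5984/gd_concerts/_design/basic/_view/MonthDay'

# Build a view query URL. Keys are sent as JSON values, so string keys keep
# their double quotes (percent-encoded as %22 in the URL).
def view_url(base, startkey, endkey = nil)
  params = { 'startkey' => startkey.to_json }
  params['endkey'] = endkey.to_json if endkey
  base + '?' + URI.encode_www_form(params)
end

puts view_url(VIEW_BASE, '0101', '0101')   # New Year's Day only
puts view_url(VIEW_BASE, '1224', '1231')   # a range of holiday season keys
```

The resulting string can be handed to open (from open-uri or rest-open-uri) for a plain GET.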



For more details see the Definitive Guide to CouchDB chapter: Finding Your Data With Views.  The key to simple view definition is defining an appropriate lookup key and defining what data to return.  We can define a simple listing view as:

"Jump": { "map": "function(doc){emit(doc._id,'['+doc.venue+'] ' +''+doc.IAKey)}" },

This returns a lookup key equal to the original key used in the database (doc._id) and the data as venue (doc.venue) and a URL for the concert recording (''+doc.IAKey). Now if we want to return all concerts in August, 1969 we can issue the GET:

This will return the rows as:

{"total_rows":2013,"offset":156,"rows":[
{"id":"1969-08-02","key":"1969-08-02","value":"[Family Dog at the Great Highway];},
{"id":"1969-08-03","key":"1969-08-03","value":"[Family Dog at the Great Highway];},
{"id":"1969-08-16","key":"1969-08-16","value":"[Woodstock Music];},
{"id":"1969-08-21","key":"1969-08-21","value":"[Aqua Theater];},
{"id":"1969-08-23","key":"1969-08-23","value":"[Pelletier Farm];},
{"id":"1969-08-28","key":"1969-08-28","value":"[Family Dog at the Great Highway];},
{"id":"1969-08-29","key":"1969-08-29","value":"[Family Dog at the Great Highway];},
{"id":"1969-08-30","key":"1969-08-30","value":"[Family Dog at the Great Highway]


Get it? Good.

What Would Dweelze Do?

Now let’s return to RUBY and write a web page which will display concert and track data for any given day in Grateful Dead History, using the MonthDay view as defined above:

"MonthDay": {
"map": "function(doc){emit(doc._id.substr(5,2)+doc._id.substr(8,2),[doc.venue , doc.IAKey, doc.tracks ])}"

Let's return data using RUBY for concerts for all years for the Month and Day of today's date.  In Ruby on Rails we need to define a controller and an erb HTML file to accomplish this.  Let's say our page will be called player.  Our controller might look like:

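require 'rest-open-uri'
require 'json'

class PlayerController < ApplicationController
  def player
    #url is built here for the MonthDay view, keyed on today's month and day (MMDD); construction omitted
    @parm=returnJSONString(url)
  end

  def returnJSONString(url)
    jsonString=''
    open(url) do |x|
      x.each_line do |y|
        jsonString=jsonString+y
      end
    end
    return jsonString
  end
end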


This passes the JSON returned from couchdb, unaltered, as a parameter to the erb page for player (player.html.erb).  If we wanted to work with the data within RUBY we would need to change the JSON back into a RUBY format by calling the json GEM's JSON.parse method on the string.
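
For readers who do want the data on the RUBY side, here is a minimal sketch of that conversion.  The response string is a shortened, hand-made sample in the shape CouchDB returns for the MonthDay view (the IAKey value is a placeholder), and the method name rows_from is an assumption.

```ruby
require 'json'

# A shortened, hand-made response in the shape of a view query result.
SAMPLE_RESPONSE = '{"total_rows":2013,"offset":0,"rows":[' \
  '{"id":"1969-08-16","key":"0816","value":["Woodstock Music","example-key",[]]}]}'

# Parse the JSON text and return the array of row hashes.
def rows_from(json_string)
  JSON.parse(json_string)['rows']
end

rows_from(SAMPLE_RESPONSE).each do |row|
  # Each value is the array emitted by the view: venue, IAKey, tracks.
  venue, ia_key, tracks = row['value']
  puts "#{row['id']}: #{venue} (#{tracks.length} tracks, key #{ia_key})"
end
```

JSON.parse hands back ordinary hashes and arrays, so nothing CouchDB-specific is needed after this point.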


For our purposes, however, we want to pass the data directly to the browser and process it there using JavaScript.  This will in almost all cases be faster than processing the JSON back to RUBY and then formatting the data on the erb page.  In our first pass the route map for our page will be simple:

map.connect 'player', :controller => 'player', :action => 'player'

and our first pass at an HTML page will look like this:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Player: On This Day In Grateful Dead History</title>
<script type="text/javascript">
gdData=<%=  @parm  %> ;
</script>
</head>
<body>
<h3>On This Day In Grateful Dead History</h3>
</body>

This page does absolutely NOTHING…  Except gets our JSON data to the Javascript on the browser page.  And that, dear reader is as far as we need to go with today’s post.



REST, Ruby On Rails, CouchDB and Me – Part 5 Getting The Data Ready for CouchDB   3 comments

Part 0 – REST, Ruby On Rails, CouchDB and Me

Part 1 – Ruby, The Command Line Version

Part 2 – Aptana IDE For Ruby

Part 3 CouchDB Up and Running on Windows

Part 4 – CouchDB, Curl and RUBY

Part 5 – Getting The Data Ready for CouchDB

Part 6 – Getting The Data Into And Out Of CouchDB

Part 7 – JQUERY,JPlayer and HTML5

The Internet Archive and the Grateful Dead

The Internet Archive (IA) is a 501(c)(3) non-profit corporation dedicated to disseminating public domain digital artifacts.  This includes such items as books, videos of all kinds (TV shows, shorts and feature films) and audio recordings of all kinds (musical and spoken word).  One of their most popular projects is a truly huge collection of Grateful Dead concert recordings.  One of the most visited pages of the Internet Archive’s web site is the “Grateful Dead Shows on This Day In History” page, which lists and allows playback of any Grateful Dead concerts they have in their collection for whatever day you visit the site.  Did I say that IA has a large number of Grateful Dead concert recordings?  There are recordings of around 2,000 separate concert dates.  Any given concert may be represented by multiple recordings from a variety of sources: soundboard, audience recordings using professional and amateur equipment.  Original media ranges from cassette tapes to 7 inch reel to reel and digital media.  If you are going to be working with the IA Grateful Dead collection please review the FAQ on the collections policy notes as well as the special notes here.

IA uses a very sophisticated data repository of meta data and an advanced query engine to allow retrieving both the meta data and the recordings.  Meta data can be retrieved directly using the “advanced search” engine.  On the day I started this post I visited IA and used the “Grateful Dead Shows on This Day In History” page.  The query returned data on 8 concerts (and 25 recordings of those 8 concerts).  A partial page image is given below:


Clicking on any of these entries moves us to a second screen in order to play the concert recording.  A screen shot of the playback screen looks like this:


Looking closer at the second screen we see the music player:


Can we design a faster, simpler and better looking interface into the Grateful Dead Archive?  Can couchDB help us?  The first question will be addressed in a later post. This current post will look at how couchDB can help us achieve a faster, more efficient information system.  IA does a super job of serving up the music files on demand, so there is no reason to duplicate their storage system.  However, IA is fairly slow to serve up meta data (such as the results of the “Grateful Dead Shows on This Day In History” query).  Abstracting the IA metadata into a CouchDB database will allow us to serve up the meta data much faster than the IA query system.

Getting Data Into CouchDB

Our basic plan for using RUBY to get data from IA and into couchdb consists of:

  1. Prepare a URL query request to get the basic recording meta data (not the track meta data);
  2. Submit A GET request to IA using the URL query;
  3. Parse the XML returned to get at the individual Concert meta data fields;
  4. Select the BEST recording for any given concert (more on this below);
  5. Prepare a URL to request track listing XML file based on the IA Primary Key of the selected concert recording;
  6. Submit a GET request to IA;
  7. Parse the XML returned to get at the individual track meta data fields;
  8. Create a ruby object which can be safely serialized to JSON;
  9. Serialize the object to a JSON string (i.e. a JSON document);
  10. Do a POST request to insert a JSON document for each concert into the couchdb database;
  11. Create couchDB views to allow optimized data retrieval; and
  12. Create a couchDB view to optimize retrieval of recordings for all years for an arbitrary Month and Day (this duplicates the data provided by the “Grateful Dead Shows on This Day In History” selection in the Internet Archive).

Note we are neither accessing nor storing the actual music files.  Before discussing how this plays out in practice let's define our JSON couchDB document.  We will cover items one through eight in this post.  We turn to items nine through twelve in the next post.

CouchDB Document Schema

CouchDB databases start with documents as the basic unit.  Typically a couchdb based application will have one database holding one or more variant document types.  There will be one or more design documents which provide multiple views, show functions and map functions as necessary to facilitate the application.  We will use a single document type which will represent an abstract of the meta data contained in IA for individual recordings (we are going to select the one ‘best’ recording per concert).  Our couchdb database will hold one document per concert.  The tracks (actually the track meta data) will be stored as arrays within the concert document.  We will populate the couchdb database in a single background session pulling meta data (NOT THE MUSIC FILES) from IA and we will include the IA publication date in the document so we can update our database when (if) new recordings are added to IA in the Grateful Dead collection.

Here are the document fields which we will use:

Field | Notes | Typical Values
_id | couchdb primary key.  We will use a natural key: a string representation of the concert date. | 1969-07-04
_rev | revision number provided by couchDB | 1-6ea272d20d7fc80e51c1ba53a5101ac1
IAKey | Internet Archive Key for this Recording | gd1965-11-01.sbd.bershaw.5417.sbeok.shnf
pubdate | Internet Archive Date When Recording was published to the web | 2009-03-14
venue | Where the concert took place | Fillmore East Ballroom
description | free text describing the concert – provided by the uploader | Neal Cassady & The Warlocks 1965 1. Speed Limit studio recording/Prankster production tape circa late 1965
cm | boolean – Recording by Charlie Miller – used to select the ‘best’ recording | true or false
sb | boolean – Recording was made from a soundboard – used to select the ‘best’ recording | true or false
mx | boolean – A matrix style recording – used to select the ‘best’ recording | true or false
tracks | an array of meta data for each track of the recording | see below

Each track in the tracks array formally looks like:

Field | Notes | Typical value
IAKey | The Internet Archive key for this track.  This key is unique within a given recording (see the IAKey above) | gd1965-11-01.sbd.bershaw.5417.sbeok.shnf/Acid4_01_vbr
track | track number | 02
title | the song title | Cold Rain and Snow
time | the length of the track in minutes and seconds | 09:48

Let's call everything except the tracks our BASE data and the track data our TRACK data.

We insert documents to the database (using an HTTP post) as JSON so a typical document would look like this in JSON format:

{
  "_id": "1966-07-30",
  "IAKey": "gd1966-07-30.sbd.GEMS.94631.flac16",
  "pubdate": "2008-09-22",
  "venue": "P.N.E. Garden Auditorium",
  "description": "Set 1 Standing On The Corner I Know You Rider Next Time You See Me",
  "cm": false,
  "sb": true,
  "mx": false,
  "tracks": [
    {
      "IAKey": "gd1966-07-30.sbd.GEMS.94631.flac16/gd1966-07-30.d1t01_vbr",
      "track": "01",
      "title": "Standing On The Corner",
      "time": "03:46"
    },
    {
      "IAKey": "gd1966-07-30.sbd.GEMS.94631.flac16/gd1966-07-30.d1t02_vbr",
      "track": "02",
      "title": "I Know You Rider",
      "time": "03:18"
    },
    {
      "IAKey": "gd1966-07-30.sbd.GEMS.94631.flac16/gd1966-07-30.d1t03_vbr",
      "track": "03",
      "title": "Next Time You See Me",
      "time": "04:00"
    }
  ]
}
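
The same document shape can be produced from plain RUBY hashes and arrays, which is all the json GEM needs.  This is a minimal sketch; the method names track_entry and concert_document are assumptions for illustration, and the values are taken from the example document.

```ruby
require 'json'

# Build one TRACK entry as a plain hash of strings.
def track_entry(ia_key, track, title, time)
  { 'IAKey' => ia_key, 'track' => track, 'title' => title, 'time' => time }
end

# Combine the BASE fields with the TRACK array into one concert document.
def concert_document(base, tracks)
  base.merge('tracks' => tracks)
end

base = {
  '_id' => '1966-07-30',
  'IAKey' => 'gd1966-07-30.sbd.GEMS.94631.flac16',
  'pubdate' => '2008-09-22',
  'venue' => 'P.N.E. Garden Auditorium',
  'cm' => false, 'sb' => true, 'mx' => false
}
tracks = [
  track_entry('gd1966-07-30.sbd.GEMS.94631.flac16/gd1966-07-30.d1t01_vbr',
              '01', 'Standing On The Corner', '03:46')
]

# pretty_generate is used only to make the output readable;
# JSON.generate produces the compact form for the POST body.
puts JSON.pretty_generate(concert_document(base, tracks))
```

The _rev field is left out on purpose: couchDB assigns it when the document is stored.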


Hacking The Internet Archive: Getting Data From IA and Into CouchDB:

Here is the URL to obtain the page for “Grateful Dead Shows on This Day In History”:

This is a simple GET request with a query for the IA “collection” of Grateful Dead items filtered on the date string: 19??-08-04 and sorted descending by the concert date.  This GET returns an HTML page.  This type of interface is known as an HTTP RPC interface.  RPC (Remote Procedure Call) interfaces are not pure REST interfaces but they are somewhat RESTful in that they allow us to make a data request using a late bound, loosely coupled HTTP call.  See here and here for more theoretic background on RPC calls.  IA provides an “Advanced Search” function which allows us to return data for an arbitrarily complex query in one of several different data formats other than HTML.  We selected XML as the format for our work here.  XML is the traditional format for HTTP RPC but other formats may be better for certain applications.  Unfortunately IA does not directly document the format of the RPC data request but they do provide a QBE (query by example) page to build the request.  The page looks like this:


Using this screen we can compose a HTTP RPC request which will mimic the URL produced by “Grateful Dead Shows on This Day In History” and with a little brain effort and experimentation we can  understand how to compose requests without using the QBE screen.  By feeding the RPC request query back into advanced search and selecting XML as an output format as shown here:


we produce an example of the HTTP RPC request which will return our desired data in our desired format.  Thus we generate a URL-encoded RPC request like:


where we replace #{_dateString} with a date string like 19??-08-08.  Of course, to get one year's worth of data we could use a date string like: 1968-??-??.  It is a simple extension of the query language to replace the singular date request: date%3A#{_dateString} with a date range.
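
The substitution and encoding step can be sketched in RUBY as follows.  This builds only the query expression, not the full request URL; the collection name GratefulDead, the exact form of the expression and the method names are assumptions for illustration.  Unlike the hand-built request, this sketch also encodes the ? wildcards (as %3F), which a server should decode back to the same characters.

```ruby
require 'uri'

# Build the query expression for one date pattern.
# "GratefulDead" as the collection name is an assumption.
def ia_query(date_string, collection = 'GratefulDead')
  "collection:#{collection} AND date:#{date_string}"
end

# Percent-encode the expression so it can travel as a URL parameter value.
def encoded_ia_query(date_string)
  URI.encode_www_form_component(ia_query(date_string))
end

puts encoded_ia_query('19??-08-08')   # every year, one month and day
puts encoded_ia_query('1968-??-??')   # one full year
```

The colon becomes %3A, which is where the date%3A in the request above comes from.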

With the date string 19??-08-08 the request returns Grateful Dead recording data for all years of the last century which were recorded on 08-08.  The XML output returned to the caller looks like:


In a more graphic format the output looks like:


Within Ruby we will need to make the HTTP GET request with a desired date range, transform the body of the response into an XML document and use XPATH to parse the XML and retrieve the meta data values for each recording (see below).  There is NOTHING inherently wrong with this RPC interface.  It is flexible and allows us to select only the data fields we are interested in and return data only for the dates we wish.  Since RUBY supports native consumption of neither JSON nor XML, the XML format of the data is as good as any other, and numerous tools exist in RUBY to manipulate XML data.  I wish RUBY had a more native interface for JSON but it does not.

At this point, we do not have meta-data about individual tracks in a given recording.  It turns out that we can get this data but not through an HTTP RPC request.  It turns out, dear reader, that if we have the IAKey for the recording we can obtain an xml file with track meta data by making the following call: {IAKEY}/{IAKEY}_files.xml.

This file contains assorted XML data; its contents vary with the formats in which IA makes the individual tracks available.  IA serves the file via an HTTP redirect (a 3xx response).  This is not an RPC call so we are far from a RESTful interface here.  We do not have control over the fields or range of the data included in this call.  It is all or nothing.  But at least the XML format is simple to manipulate.  With the IAKey in hand for an individual recording and making some reasonable guesses we can parse the XML file of track data and compose the TRACKS array for our couchDB document using XPATH. A single entry for the high bit rate mp3 track recording looks like:

<file name="gd89-08-04d2t01_vbr.mp3" source="derivative">
<creator>Grateful Dead</creator>
<album>1989-08-04 – Cal Expo Amphitheatre</album>
<format>VBR MP3</format>
</file>

Note that we have the IAKey for the track (gd89-08-04d2t01 ) as part of the name attribute.
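
Pulling that key out with rexml can be sketched like this.  The sample XML is the entry above wrapped in a files root element; the rule of stripping the "_vbr.mp3" suffix and the method name vbr_track_keys are assumptions based on the example, not a documented IA convention.

```ruby
require 'rexml/document'

FILES_XML = <<~XML
  <files>
    <file name="gd89-08-04d2t01_vbr.mp3" source="derivative">
      <creator>Grateful Dead</creator>
      <album>1989-08-04 - Cal Expo Amphitheatre</album>
      <format>VBR MP3</format>
    </file>
  </files>
XML

# Return the track keys of all VBR MP3 entries, taken from the name attribute
# with the "_vbr.mp3" suffix removed (the suffix rule is an assumption).
def vbr_track_keys(xml_string)
  document = REXML::Document.new(xml_string)
  keys = []
  document.elements.each('files/file') do |file|
    fmt = file.elements['format']
    next unless fmt && fmt.text == 'VBR MP3'
    keys << file.attributes['name'].sub(/_vbr\.mp3\z/, '')
  end
  keys
end

puts vbr_track_keys(FILES_XML)
```

Filtering on the format element first skips the other derivative formats listed in the same file.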


Using a background Ruby Process to Read the Data

The following RUBY GEMS are required to complete this step:

rest-open-uri : This GEM extends open-uri to support the POST, PUT and DELETE HTTP commands

json : This GEM handles serialization and de-serialization of a limited subset of RUBY into JSON strings.

From the standard RUBY library we will also be using

rexml : This library creates XML documents from XML Strings and supports XPATH which we will use to read the XML documents from IA

Our first step is to get the data via HTTP and parse the XML file returned to find individual recordings.  There will (in most cases) be multiple recordings per concert (per date) and we want to retain for the database only the “best”.

In pseudo Ruby code:

require 'rest-open-uri'
require 'rexml/document'

def initialize(_dateString)
  #HTTP GET, create a string of the response body and transform the string into an XML node tree
  #mind the screen wrap and html Encoding:
  #(@uri is set here to the Advanced Search request for _dateString; URL omitted)
  xmlString=''
  open(@uri) do |x|       #build a representation of the response body as a string
    x.each_line do |y|
      xmlString=xmlString+y
    end
  end #open
  if xmlString==''
    puts 'No String Returned From Internet Archive'
    return
  end
  @IAXMLDocument=REXML::Document.new(xmlString)  #turn the string into an XML document
end

Now we need  to loop through the XML document and pull out each ‘doc’ section using XPATH and read each doc section for the meta data for that recording.

#use XPATH and find each response/result/doc node and yield
def get_recordings(document)
  document.elements.each('response/result/doc') do |doc|
    yield doc
  end
end

#get the XML document and yield
def get_record(xmldoc)
  get_recordings(xmldoc) do |doc|
    yield doc
  end
end

#general purpose XPATH method to extract element.text (the metadata values) for arbitrary XPATH expressions
def extract_ElmText(doc,xpath)
  doc.elements.each(xpath) { |element|  return element.text }
end

def worker(xmldoc)
  #main loop
  get_recordings(xmldoc) do |doc|
    #(_date and _title are extracted the same way; omitted)
    _pubdate=extract_ElmText(doc,'date[@name="publicdate"]')[0..9]  #there is a bug here , corrected by lines below
    if (_pubdate.length==0)
      puts "No Publication Date: #{_date} #{_title}"
    end
    #make a RUBY class object to hold one recording _date, _description, _tracklist, _title, _keylist, _pubdate,_uri
    #save the recording class objects in an array
  end
end

In this code the ‘worker’ method calls the helper methods to:

0) do the HTTP GET to make the RPC request and read the response body one line at a time;

1) transform the lines into a single string and convert the string into an XML document for processing by XPATH;

2) loop through the doc nodes of the XML tree and extract the values of the meta data fields;

3) pass the meta data values to a RUBY class (GDRecording) which holds this meta data for later processing; and

4) finally, temporarily store the recordings in an array for the next processing step.
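
Steps 1 through 4 can be tried without a network connection against a hand-made sample.  In this sketch the response/result/doc path and the date[@name="publicdate"] expression come from the code above; the str[@name="identifier"] element, the time portion of the date and the method names first_text and read_recordings are assumptions for illustration.

```ruby
require 'rexml/document'

# Hand-made sample in the response/result/doc shape the XPATH expressions expect.
SAMPLE_XML = <<~XML
  <response>
    <result>
      <doc>
        <str name="identifier">gd1965-11-01.sbd.bershaw.5417.sbeok.shnf</str>
        <date name="publicdate">2009-03-14T00:00:00Z</date>
      </doc>
    </result>
  </response>
XML

# Return the text of the first element matching xpath, or nil when none matches.
def first_text(node, xpath)
  element = node.elements[xpath]
  element ? element.text : nil
end

# Collect one hash of meta data per doc node.
def read_recordings(xml_string)
  document = REXML::Document.new(xml_string)
  recordings = []
  document.elements.each('response/result/doc') do |doc|
    pubdate = first_text(doc, 'date[@name="publicdate"]')
    recordings << {
      'IAKey' => first_text(doc, 'str[@name="identifier"]'),
      'pubdate' => pubdate ? pubdate[0..9] : nil   # keep only YYYY-MM-DD
    }
  end
  recordings
end

puts read_recordings(SAMPLE_XML).inspect
```

Returning nil for a missing element, instead of slicing whatever came back, is one way to avoid the publication date problem flagged in the worker method.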

Note that these routines work whether the query returns a single day (with multiple recordings) or multiple days or even the whole dataset!  What is essential is that we process the file as N ‘doc’ sub trees (which represent single recordings) and have the recording date (at least) to group our data and extract the ‘best’ recording within each date group.

Our next step will be to group the recordings by day (i.e. concert) and provide our own filter to select a single ‘best’ recording for each concert.

Shake and Bake:  Finding A ‘Best’ Recording.


What is the best Grateful Dead concert?  Why, the first one I went to, of course.  Just ask any Deadhead and you will probably get the same answer.  But what is the best recording of any given GD concert? My approach is very simple.

  • Most recent posted recordings are better than older recordings. (least important criteria)
  • Soundboard recordings are better than audience recordings.
  • Matrix recordings are even better.
  • Recordings mixed by Charlie Miller are best of all. (most important criteria)

Well these are MY criteria.  Whatever criteria you choose, as long as they are hierarchical you can code the select in a very trivial manner.  If we have a field in each recording for the concert date and a field for each selection criterion (we derive these from the keywords field in IA) we sort the recordings by date and then by each of the criteria from most important (Charlie Miller in my case) to least important (date posted) and then select the first recording in sort order within each date group. In Ruby the sort of our list of recordings is trivial to code and easy to manipulate (add new criteria or change the priority of criteria). The sort statement looks like this:

@list.sort! { |a,b| ( ) <=> ( )   }
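The two sort keys are not shown in the statement above.  As a minimal sketch of how they could be filled in, assume each recording is a Hash with the hypothetical fields :date, :cm (Charlie Miller), :mx (matrix), :sb (soundboard) and :pubdate (these names are my assumptions for illustration, not the fields of the real class).  Ruby compares arrays element by element, so the priority of the criteria is just the order of the array:

```ruby
# Map a boolean flag to a sort rank so that true sorts ahead of false.
def rank(flag)
  flag ? 0 : 1
end

# Sort key for one recording. The posting date is taken from the OTHER
# recording so that, when two keys are compared, newer postings sort first.
def sort_key(rec, other)
  [rec[:date], rank(rec[:cm]), rank(rec[:mx]), rank(rec[:sb]), other[:pubdate]]
end

# Sort by concert date, then by each criterion from most to least important.
def sort_recordings(list)
  list.sort { |a, b| sort_key(a, b) <=> sort_key(b, a) }
end

# Made-up sample data, for illustration only.
recordings = [
  { date: "1972-07-22", cm: false, mx: false, sb: true,  pubdate: "2005-01-10", id: "a" },
  { date: "1972-07-22", cm: true,  mx: false, sb: true,  pubdate: "2008-08-15", id: "b" },
  { date: "1972-07-21", cm: false, mx: true,  sb: false, pubdate: "2006-03-02", id: "c" }
]

sort_recordings(recordings).each { |rec| puts "#{rec[:date]} #{rec[:id]}" }
```

Adding a criterion or changing its priority means adding or moving one element in the array built by sort_key.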

Once sorted we create a list of best recordings as:

def newSelect
if  @list==nil or @list.count==0
puts ‘No Recordings.’
foreach do |rec|
if _dateGroup!
if dateGroup!=nil
if dateGroup!=nil

Note that this code is not only simple but it is independent of the selection criteria we are using.
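Here is a small sketch of the "first recording within each date group" idea, written against plain Hashes rather than the real recording class.  The method name select_best and the :date and :id fields are my assumptions for illustration.  The routine only looks at the date, which is why it does not care what criteria the sort used:

```ruby
# Expects a list already sorted by date and then by the selection criteria.
# Keeps the first recording seen for each date and skips the rest.
def select_best(sorted)
  best = []
  last_date = nil
  sorted.each do |rec|
    next if rec[:date] == last_date   # same concert, a lower ranked recording
    best << rec
    last_date = rec[:date]
  end
  best
end

# Made-up, already sorted sample data.
sorted = [
  { date: "1972-07-21", id: "c" },
  { date: "1972-07-22", id: "b" },
  { date: "1972-07-22", id: "a" }
]

if sorted.empty?
  puts "No Recordings."
else
  select_best(sorted).each { |rec| puts "#{rec[:date]} #{rec[:id]}" }
end
```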

Now that we have a list of recordings we are interested in,  we can get the XML file of track meta data using the IAKey discussed above and making a simple GET call and parsing the XML file for the meta data for each.  Much of the code used duplicates the XML code presented above so we need not reproduce all the code except to show a short section which uses a slightly different XML XPATH syntax:

open (filesURI) do |x| x.each_line do |y| xmlString=xmlString+y end end‘files’) do |doc|

doc.elements.each('file') {|file_elm|
file_elm.elements.each('title') { |element| title=element.text }
file_elm.elements.each('track') { |element| trackString=element.text}
file_elm.elements.each('length') { |element| lengthString=element.text}

{omitted code}
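For readers who want something they can run, here is a self-contained sketch of the same kind of parsing using REXML from the Ruby standard distribution.  The XML sample is made up and much smaller than a real files document, and the method name parse_tracks and the Hash layout are my assumptions, not the code used for this series:

```ruby
require "rexml/document"

# Return one Hash per <file> element that carries a <title>.
# Entries without a title (for example metadata files) are skipped.
def parse_tracks(xml_string)
  doc = REXML::Document.new(xml_string)
  tracks = []
  doc.elements.each("files/file") do |file_elm|
    title = file_elm.elements["title"]
    next if title.nil?
    track  = file_elm.elements["track"]
    length = file_elm.elements["length"]
    tracks << {
      title:  title.text,
      track:  track ? track.text : nil,
      length: length ? length.text : nil
    }
  end
  tracks
end

# Made-up sample in the general shape of a files document.
sample = <<~XML
  <files>
    <file name="track01.flac">
      <title>Bertha</title>
      <track>1</track>
      <length>05:45</length>
    </file>
    <file name="example_meta.xml">
      <format>Metadata</format>
    </file>
  </files>
XML

parse_tracks(sample).each { |t| puts "#{t[:track]} #{t[:title]} #{t[:length]}" }
```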


Okay now we have a (hash) list of recording meta data,  each item of which contains a (hash) list of track meta data for that recording.  In our next post we will leave this unRestful world behind and move into the RESTful world of couchDB when we:

  • Serialize the object to a JSON string (i.e. a JSON document);
  • Do POST requests to insert  a JSON document for each concert into the couchdb database;
  • Create couchDB views to allow optimized data retrieval; and
  • Create a couchDB view to optimize retrieval of recordings for all years for an arbitrary Month and Day (this duplicates the data provided by the “Grateful Dead Shows on This Day In History” selection in the Internet Archive).


REST, Ruby On Rails, CouchDB and Me – Part 4 – CURL on Windows And Ruby POST

Part 0 – REST, Ruby On Rails, CouchDB and Me

Part 1 – Ruby, The Command Line Version

Part 2 – Aptana IDE For Ruby

Part 3 – CouchDB Up and Running on Windows

Part 4 – CouchDB, Curl and RUBY

Part 5 – Getting The Data Ready for CouchDB

Part 6 – Getting The Data Into And Out Of CouchDB

Part 7 – JQUERY,JPlayer and HTML5

In The Post:

  • CURL and Couchdb
  • Documents Design and Otherwise
  • Posting Documents to couchDB Using Ruby

If you are like me you have spent some time with the free ebook: CouchDB The Definitive Guide.  If you are a windows user you may have run into some problems with the examples given in the chapter  on “Design Documents”.  Specifically they don’t work ‘out of the box’.  The examples in that chapter show us how to: create a database, to create and post a design document and to post a document to the database.  These examples use  CURL in a command shell.


Since we are running Windows, we first need to install CURL on our system.  We can get a Windows version here.  Use the version labeled DOS, Win32-MSVC or Win64 depending on your system, then set your system path to include the CURL executable.  We assume here that couchDB has been installed successfully on your system.  Now open a ‘command prompt’ on your system.  If you must have a UNIX type shell you need to install Cygwin or some other UNIX emulator for Windows.  If you are using the Aptana IDE like me you need to create an “external command” to open a command shell within Aptana.  This figure illustrates the setup within the Aptana IDE to do this:


In the command shell you can create a couchdb database using a PUT command and CURL.  Couchdb is RESTful so we use a PUT command for all actions which CREATE a resource, of which a database is one example.  The format of the command is:

curl -X PUT http://{couchdb}/{yourdatabasename}

I want to create a database named deadbase so on my system this command and response looks like:

C:\Documents and Settings\dredfield\My Documents\Aptana Studio Workspace\couchDB01

>curl -X PUT


Here {"ok":true} is the response body of the HTTP response to my PUT command.  Confirm your work by starting a browser and navigating to the Futon user interface of your couchdb installation.  On my system this url is:

you should see something like this:


CURL and Documents

OK, now let’s make a design document for this database and PUT that document to the new database.  With slight modifications to the example given in CouchDB The Definitive Guide my first cut at a design document looks like this:


{
     "_id" : "_design/example",
     "views" : {
        "View00" : {
           "map" : "function(doc){emit(doc._id,}"
        }
     }
}




This is a JSON formatted document.  Initial syntax checking is up to you.  Basically couchDB will accept anything within the outer brackets, whether or not it is formatted as usable JSON.  We have several options for checking syntax.  There are free online syntax checkers like JSONLint.  The interface to JSONLint looks like:


An installable open source JSON checker and visualizing tool, JSON View is available here.  JSON View’s output looks like:


Now that we know our syntax is correct (if not the logic of the design document – more on this in the next installment) we can PUT this document to our database.  We can have more than one design document in a given database.  The name (id) of this document is “_design/example”, where “_design” tells couchdb this is indeed a design document and its name is “example”.  My document is named mydesign.json on my file system.  The CURL command to PUT this into the database looks like:

curl -X PUT -d @mydesign.json

couchdb will respond:


Note here that this is NOT the syntax shown in CouchDB The Definitive Guide.  The syntax there will not work in a Windows shell (i.e. command prompt).  Even when you have a syntactically correct JSON document and the correct format of the PUT statement, on Windows you may receive an error message from CURL complaining about UTF8 errors within the document and have a PUT failure.  The problem here is that the Windows file system supports several encoding schemes and various Windows programs save documents in different default encodings.  If you are using Notepad.exe to create your files be sure to save the files in ANSI format.
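If the shell quoting and encoding issues get tiresome, the same syntax check and PUT can be sketched in Ruby with only the standard library.  This is a sketch under assumptions: the base URL http://127.0.0.1:5984 (5984 is the default couchdb port, your host may differ), the method names, and the map function body (a placeholder of mine, not the one from my design document) are all for illustration.  The request is built but not sent, so the sketch runs without a server:

```ruby
require "json"
require "net/http"
require "uri"

# True when the text parses as JSON; JSON.parse raises on malformed input.
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

# Build (but do not send) the PUT request for a design document.
def build_design_put(base_url, db_name, json_text)
  doc = JSON.parse(json_text)
  uri = URI.parse("#{base_url}/#{db_name}/#{doc['_id']}")
  request = Net::HTTP::Put.new(uri.path)
  request["Content-Type"] = "application/json"
  request.body = json_text.encode("UTF-8")
  [uri, request]
end

# Placeholder design document. In practice you could read the file with
# File.read("mydesign.json", encoding: "UTF-8") instead.
design = <<~DOC
  {
    "_id": "_design/example",
    "views": {
      "View00": { "map": "function(doc){ emit(doc._id, null); }" }
    }
  }
DOC

if valid_json?(design)
  uri, request = build_design_put("http://127.0.0.1:5984", "deadbase", design)
  puts "#{request.method} #{uri}"
  # With couchdb running, send it like this:
  # response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
else
  puts "The design document is not valid JSON."
end
```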

Check your work using the Futon interface: locate the “_design/example” document in deadbase.


Double click on the document:


Note that “views” is a “Field” within the document.  Select the “Source” tab  and take a look inside the document:


Now let’s POST a document into the database.  Since we have not defined any validation fields we can push anything into the database, even documents which consist of just “{}”.  CouchDB defines only one innate restriction:

If a document defines the id field (“_id”) then the value of _id must not conflict with an existing value of the ID field of ANY other document in the database.

If the document does not define an ID field, couchDB will generate an ID (as a UUID) and apply it to the document.  You can also supply your own ID values.  You can either generate your own value (Ruby can generate a GUID for you) or you can request a GUID from couchdb with a GET command.  See this page for more information.  In the sample program I will be developing for this series I will be using a ‘natural key’ – that is a key whose value has an actual meaning (a Social Security number is such a natural key for example, but please never use this).  If you try to POST a document and use a duplicate key you will get back a 409 status code for the error.
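A short sketch of the two ways of making your own ID in Ruby.  The method names are hypothetical.  The random version uses SecureRandom from the standard library and strips the dashes to give a 32 character hex string; the natural key version simply formats the concert date:

```ruby
require "securerandom"
require "date"

# A random ID: a version 4 UUID with the dashes removed (32 hex characters).
def random_id
  SecureRandom.uuid.delete("-")
end

# A natural key: the concert date formatted as YYYY-MM-DD.
def natural_id(concert_date)
  concert_date.strftime("%Y-%m-%d")
end

puts random_id
puts natural_id(Date.new(1972, 7, 22))
```

The natural key is easy to read and to query by, but it also means a second document for the same date will collide and come back as a 409.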

The document I will be using in the next post looks like this:


“_id” : “1972-07-22”,

“IAKey” : “gd1972-07-22.sbd.miller.94112.sbeok.flac16”,

“description” : “Set 1 Bertha Me And My Uncle You Win Again Jack Straw Bird Song Beat It On Down The Line Sugaree Black Throated …

“pubdate”: “2008-08-15”,

“sb”: true,

“cm”: true,

“mx”: false,

“venue”: “Paramount Northwest Theatre”,


If I save this document as ConcertRecord.json I can use CURL to POST this document as:

curl -H “Content-Type: application/json” -X POST -d @ConcertRecord.json

and couchdb will reply with an HTTP status 201 and a response body of:


In couchDB Futon this document looks like:


Note that the order of the fields is set by couchDB not the order in the first loaded document.

Ruby At Last

OK, enough of the command shell let’s do some couchDB work using RUBY.  I am going to access couchDB from a fairly low level within Ruby in these posts.  There are several ActiveRecord type GEMS which will interface with couchDB but my focus here will be on: (1)  speed of access and (2) transferability of knowledge between Ruby access and direct Javascript/Browser access to couchDB.

Here’s a minimum of what we need to POST a document to a couchdb using RUBY.

The GEMS we need are:

JSON : This will always load the pure Ruby version of the JSON module.  If you want the native version (i.e. a C based module) you will need to have the Ruby/Windows DevKit installed on your system.  For our purposes the native version is not necessary.

REST-OPEN-URI:  This extends open-uri by using the net/http and the uri libraries to cover all of the REST verbs (GET, POST, PUT and DELETE).  This is a very light install and is only lightly documented.

Here is the basic plan:

Assume we have a RUBY object (call it “rec”) which includes, among other things, the fields we want to POST into the deadbase as a deadbase document like the one developed above.  We first need to convert the fields into a JSON string and then to POST the JSON string into the deadbase.  The JSON GEM is used to achieve the first goal and REST-Open-URI is used to accomplish the second.

JSON Strings:

The JSON GEM will only serialize Ruby base types (strings, numbers, bools and HASH objects).  The JSON GEM is quite limited in that it will not serialize a Ruby object derived from the base RUBY object into a JSON string, even if that object consists only of base types and Hash objects.  Although you may extend JSON we did not choose to do so.  Rather we will create a simple Hash object and populate it manually via Ruby code with the fields we want to use for a document.  In simple form this could look like:

def makeJSON(rec)  #we know that JSON can serialize this type of object









return JSON.generate(thing)  #this returns a JSON String
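To make the idea concrete, here is a runnable sketch of the same approach.  The Recording Struct is a hypothetical stand-in for the real recording class, and its attribute names are my assumptions; the document keys and sample values are the ones from the concert document shown earlier:

```ruby
require "json"

# Hypothetical stand-in for the recording object; attribute names are assumed.
Recording = Struct.new(:date, :ia_key, :pubdate, :sb, :cm, :mx, :venue)

# Copy the fields we want into a plain Hash, which JSON can serialize.
def make_json(rec)
  thing = {
    "_id"     => rec.date,
    "IAKey"   => rec.ia_key,
    "pubdate" => rec.pubdate,
    "sb"      => rec.sb,
    "cm"      => rec.cm,
    "mx"      => rec.mx,
    "venue"   => rec.venue
  }
  JSON.generate(thing)   # returns a JSON string
end

rec = Recording.new("1972-07-22", "gd1972-07-22.sbd.miller.94112.sbeok.flac16",
                    "2008-08-15", true, true, false, "Paramount Northwest Theatre")
puts make_json(rec)
```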



Our POST routine will use the output from makeJSON and POST the JSON string to the deadbase.  In simple form this routine looks like:

def PostRecording(jsonString)

uri=""   #this is our database

begin

responseBody=open(uri,:method=> :post, :body => jsonString,"Content-Type" => "application/json").read

puts 'POST Response Success: ' + responseBody

rescue OpenURI::HTTPError => the_error

puts 'Post Response Error: ' + the_error.io.status[0]

end

end



The key line is, of course:

responseBody=open(uri,:method=> :post, :body => jsonString,"Content-Type" => "application/json").read

If we ran this line as:

responseBody=open(uri,:method=> :post, :body => jsonString).read

we would get an HTTP status code for an “Unsupported Media Type” (415).  That’s because the default “Content-Type” for POST commands is “application/x-www-form-urlencoded” which is the typical format of an HTML “form” involved in a POST from a web browser.  We are far from a browser here and our “Content-Type” needs to be “application/json”.  The way to add Headers to the POST is to provide one or more key/value pairs with the desired header information.  Hence:

"Content-Type" => "application/json"

and the correct Ruby line is:

responseBody=open(uri,:method=> :post, :body => jsonString,"Content-Type" => "application/json").read

We need to wrap the POST command in an exception block where the line:

rescue OpenURI::HTTPError => the_error

is only executed IF the HTTP response status is > 399.  You can then do more fine grained responses to the error condition.  Specifically, if the_error.io.status[0]=="409" you have attempted to POST the same document twice (at least two documents with the same ID).
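For comparison, here is a sketch of the same POST using only Net::HTTP from the standard library instead of REST-Open-URI.  Note that this is a swapped-in technique, not the code used in this series: Net::HTTP does not raise an exception for a 4xx response, so we inspect the status code ourselves.  The method names and the URL http://127.0.0.1:5984/deadbase are assumptions for illustration, and the request is only built, not sent, unless you uncomment the last line:

```ruby
require "net/http"
require "uri"

# Build (but do not send) a POST that carries a JSON document.
def build_post(db_url, json_string)
  uri = URI.parse(db_url)
  request = Net::HTTP::Post.new(uri.path)
  request["Content-Type"] = "application/json"   # without this couchdb rejects the POST
  request.body = json_string
  [uri, request]
end

# Turn an HTTP status code into a short description for logging.
def describe_status(code)
  case code.to_i
  when 200..299 then "stored"
  when 409 then "conflict: a document with this _id already exists"
  when 415 then "unsupported media type: check the Content-Type header"
  else "failed with HTTP status #{code}"
  end
end

# Send the document to a running couchdb and describe the outcome.
def post_recording(db_url, json_string)
  uri, request = build_post(db_url, json_string)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  describe_status(response.code)
end

uri, request = build_post("http://127.0.0.1:5984/deadbase", '{"_id":"1972-07-22"}')
puts "#{request.method} #{uri} (#{request['Content-Type']})"
# puts post_recording("http://127.0.0.1:5984/deadbase", '{"_id":"1972-07-22"}')
```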

That looks like a wrap for now.


Posted 2011/07/22 by Cloud2013 in Aptana, couchdb, REST, Ruby
