Archive for the ‘couchdb’ Tag

Hey Flickr, Where Did My Statistics Go? The CouchBase Connection. Part III

This is the third post in this series on how to harvest statistical data from your (or a friend’s) Flickr Picture View data. The series begins here. Today we are looking at CouchBase as a noSQL database to store our Flickr data. This post gets as far as setting up the shell of a console application; code samples are deferred to the next blog post.

CouchBase is a commercialized version of the Apache CouchDB project. CouchDB is open source and CouchBase is not; commercial development with CouchBase is NOT free. The CouchDB wiki lists five active C# libraries for CouchDB, and CouchBase supports many API libraries, including .Net and Java. I have written about CouchDB and Ruby in a prior series of posts which can be found here. Both systems support multi-server nodes and concurrency controls, but neither feature will be touched on in this post. Our focus here is an introduction to the minimum administration skills and API coding needed to reach our goal of storing information about Users, Photos and View Counts through time. Along the way we will also discuss JSON serialization/deserialization using Newtonsoft.Json and open source application logging with NLog. I will defer the discussion of CouchBase views to a subsequent post.

Data Model Overview.

Ultimately we want to store information about each User. For each User we will store information on one or more Photos, and for each Photo, information on one or more View Counts. Users and Photos have their own Primary Keys, supplied as ID fields from Flickr. Our view counts will be collected each day, and the Primary Key of a Count is the date the view count data was collected. This could be modeled in a traditional RDBMS in third normal form, but the pattern is most naturally represented as lists of objects nested within container objects. Rather than say we are storing nested objects, it is more typical today to say that this data can be thought of as a structured Document. The most natural way to store and access this data is by simple (or compound) primary keys. When we get to manipulating the data for statistical analysis and summary, our most natural mode of access will be by a key composed of the User ID and Photo ID, iterating its view counts by Date ID (or Date ID range). A very simple way to model this is with a Key/Value noSQL database based on document storage (aka a Document Store). We could call this design an object oriented database model, but that would be old-fashioned. Here is a visual of the data model:
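A minimal JavaScript sketch of the nesting described above, and of iterating one photo's view counts over a Date ID range. The compound key format user::<UserId>::photo::<PhotoId> and ISO YYYY-MM-DD Date IDs are hypothetical choices for illustration; neither is defined in this post.

```javascript
// Hypothetical compound key; the "user::...::photo::..." format is illustrative only.
function photoKey(userId, photoId) {
  return "user::" + userId + "::photo::" + photoId;
}

// View counts whose Date ID lies in [fromDate, toDate], oldest first.
// Assumes ISO YYYY-MM-DD Date IDs, which sort correctly as plain strings.
function viewsInRange(photo, fromDate, toDate) {
  return photo.Views
    .filter(function (v) { return v.Date >= fromDate && v.Date <= toDate; })
    .sort(function (a, b) { return a.Date < b.Date ? -1 : a.Date > b.Date ? 1 : 0; });
}

// Illustrative photo document using the CPhoto/CView property names below.
var photo = {
  ID: "p1",
  Title: "Example",
  ThumbnailURL: "",
  Views: [
    { Date: "2013-05-03", Views: 12 },
    { Date: "2013-05-01", Views: 7 },
    { Date: "2013-05-02", Views: 9 }
  ]
};

console.log(photoKey("u42", photo.ID)); // user::u42::photo::p1
console.log(viewsInRange(photo, "2013-05-01", "2013-05-02").map(function (v) { return v.Views; })); // [ 7, 9 ]
```

Because the Date IDs are fixed-width ISO strings, a range query needs no date parsing at all, which is one reason to prefer that format for the count keys.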

The full Document could be represented as a compound C# object:

using System.Collections.Generic;

public class CObject
{
    public CUser User { get; set; }
    public List<CPhoto> Photo { get; set; }
}

public class CUser
{
    public string FullName { get; set; }
    public string Username { get; set; }
    public string UserId { get; set; }
    public string APIKey { get; set; }
    public string SharedSecret { get; set; }
    public string Token { get; set; }
    public string TokenSecret { get; set; }
}

public class CPhoto
{
    public string ID { get; set; }
    public string Title { get; set; }
    public string ThumbnailURL { get; set; }
    public List<CView> Views { get; set; }
}

public class CView
{
    public string Date { get; set; }
    public int Views { get; set; }
}

In this post we will set up a single-server CouchBase instance and develop a single-user application to manipulate documents in a CouchBase “bucket”. We will not model the complete object in this post. To keep things as simple as possible while we get our feet wet with CouchBase CRUD operations and simple CouchBase server administration, we will work only with a modified version of the CPhoto object (Document).


Getting The Stuff You Need.

Shopping List

Setting up a single-node Windows CouchBase Server and doing basic administration is easy and fun. Download and run the CouchBase installer from here. Fred willing, all will go well and you will be placed at a local console page in your default browser. Bookmark this page and note the port number that has been assigned to the default instance of CouchBase. On first use you will need to create an administrator username and password. I left the defaults alone for the Cluster and Server Nodes. Select the Data Buckets tab. You will need to decrease the Quota Usage limits for the default bucket. With the space you freed up, create a new bucket called “DLR.Flickr.Example1”. Here is what mine looks like:


And Here is the Bucket Settings Page:


OK.  Now take some time and review the documentation for the .Net SDK here.  You can read through or code along with the examples given there. Done? Now let’s get to work.

Starting to Code CouchBase

Create a new Visual Studio Console application. I called mine DLR.CCDB.ConsoleApp and set the default namespace to DLR.CCDB. With the project open, select Tools/NuGet Package Manager/Package Manager Console and enter the command:

Install-Package CouchbaseNetClient

Then add references to:




[If you cannot resolve Newtonsoft.Json: right-click on the root of the project and select Manage NuGet Packages. Search on Newtonsoft.Json and select Install on Json.NET. Now try adding the Newtonsoft reference again.]

Now is a good time to add the open source logging solution to your project. Select Manage NuGet Packages, search on NLog, and install both NLog and NLog Configuration.

Open your App.Config project file.  You will need to make several changes.  Here is what mine looks like after the changes.

Some of these entries you add manually; others, such as the assembly binding redirects, are added by the NuGet Package Manager during the steps you followed above.

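<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <configSections>
    <section name="couchbase" type="Couchbase.Configuration.CouchbaseClientSection, Couchbase" />
  </configSections>
  <couchbase>
    <servers bucket="DLR.Flickr.Example1" bucketPassword="">
      <add uri="" />
    </servers>
  </couchbase>
  <startup><supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5" /></startup>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <!-- version ranges are generated by NuGet for the packages you installed -->
      <dependentAssembly><assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
        <bindingRedirect oldVersion="" newVersion="" /></dependentAssembly>
      <dependentAssembly><assemblyIdentity name="Enyim.Caching" publicKeyToken="05e9c6b5a9ec94c2" culture="neutral" />
        <bindingRedirect oldVersion="" newVersion="" /></dependentAssembly>
      <dependentAssembly><assemblyIdentity name="NLog" publicKeyToken="5120e14c03d0593c" culture="neutral" />
        <bindingRedirect oldVersion="" newVersion="" /></dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>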

We are most interested in this section:

<servers bucket="DLR.Flickr.Example1" bucketPassword="">
  <add uri="" />
</servers>



This sets your default API calls to the bucket “DLR.Flickr.Example1”, which you created above. Although we will not develop the theme here, you can override the default bucket at runtime to work with multiple buckets in the same program.


The uri attribute of the add element sets your local node: the host is a constant for development projects (localhost), and 8091 is the port assigned to CouchBase during installation (double-check this value on your system by navigating to the CouchBase Console page you bookmarked above).

While we are here, let’s make some changes (without explanation at this point) to NLog.config, which was created when you installed NLog above. Replace the entire contents of the file with (mind the wrap):

<?xml version="1.0" encoding="utf-8" ?>
<nlog xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <targets>
    <target name="debugfile" xsi:type="File" layout="${message}" fileName="C:/temp/DLR.Flickr/Debug.txt" archiveNumbering="Rolling" archiveEvery="Day" maxArchiveFiles="7" concurrentWrites="true" />
    <target name="logfile" xsi:type="File" layout="${message}" fileName="C:/temp/DLR.Flickr/Info.txt" archiveNumbering="Rolling" archiveEvery="Day" maxArchiveFiles="7" concurrentWrites="true" />
    <target name="Errorsfile" xsi:type="File" layout="${message}" fileName="C:/temp/DLR.Flickr/Error.txt" archiveNumbering="Rolling" archiveEvery="Day" maxArchiveFiles="7" concurrentWrites="true" />
    <target name="Fatalfile" xsi:type="File" layout="${message}" fileName="C:/temp/DLR.Flickr/Fatal.txt" archiveNumbering="Rolling" archiveEvery="Day" maxArchiveFiles="7" concurrentWrites="true" />
  </targets>
  <rules>
    <logger name="*" minlevel="Debug" maxlevel="Debug" writeTo="debugfile" />
    <logger name="*" minlevel="Info" maxlevel="Fatal" writeTo="logfile" />
    <logger name="*" minlevel="Error" maxlevel="Fatal" writeTo="Errorsfile" />
    <logger name="*" minlevel="Fatal" maxlevel="Fatal" writeTo="Fatalfile" />
  </rules>
</nlog>

We will get back to the details of this configuration file in the next post.

Write the minimum test code possible. 

Replace the contents of Program.cs with

using System;
using Couchbase;
using Enyim.Caching.Memcached;
using Newtonsoft.Json;
using NLog;

namespace DLR.CCDB.ConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // Reads the "couchbase" section of App.config (bucket and server uri)
            CouchbaseClient client = new CouchbaseClient();
        }
    }
}

Build and run.  You should have no errors and client should not equal null after the call

CouchbaseClient client=new CouchbaseClient();

Success?  Take a break and we will continue next week.


Grateful Dead: Scarlet Begonias -> Fire On The Mountain Concert Tapes On The Internet Archive

While getting ready to do a blog post on Microsoft’s MVC Web API I dumped my CouchDB database of Grateful Dead Internet Archive tapes into SQL Server 2008.  To test the full text search on SQL Server I produced this list.  I hope you enjoy it.  Here is a list of Soundboard or Matrix concert recordings on the Internet Archive where Scarlet Begonias and Fire On The Mountain are played during the same concert.  This is a computer generated list.  There are 163 concerts listed here. There are 217 concert recordings with this song combination if we were to include audience tapes also.
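The list above came from SQL Server full text search. Purely as an illustration of the same selection criteria, here is a JavaScript sketch that filters documents shaped like my CouchDB concert documents (sb and mx flags plus a tracks array with title fields). Matching titles by case-insensitive substring is an assumption, and the sample documents are made up, with abbreviated track lists.

```javascript
// True if any track title contains the given text (case-insensitive).
function hasSong(doc, text) {
  var needle = text.toLowerCase();
  return doc.tracks.some(function (t) {
    return t.title.toLowerCase().indexOf(needle) !== -1;
  });
}

// Soundboard (sb) or matrix (mx) recordings that contain both songs.
function scarletFire(docs) {
  return docs
    .filter(function (d) { return d.sb || d.mx; })
    .filter(function (d) {
      return hasSong(d, "scarlet begonias") && hasSong(d, "fire on the mountain");
    })
    .map(function (d) { return d._id + " - " + d.venue; });
}

// Made-up sample documents with abbreviated track lists.
var docs = [
  { _id: "1977-01-01", venue: "Example Hall", sb: true, mx: false,
    tracks: [{ title: "Scarlet Begonias" }, { title: "Fire On The Mountain" }] },
  { _id: "1978-01-01", venue: "Example Arena", sb: false, mx: false,
    tracks: [{ title: "Scarlet Begonias" }, { title: "Fire On The Mountain" }] },
  { _id: "1979-01-01", venue: "Example Theatre", sb: false, mx: true,
    tracks: [{ title: "Scarlet Begonias" }] }
];

console.log(scarletFire(docs)); // [ '1977-01-01 - Example Hall' ]
```

Dropping the sb/mx filter is what widens the result to include audience tapes as well.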

Click on the image below for a list of the ‘best’ SB -> FOTM concert recordings.

[this is a hypertext list – click on a date and you will jump to the Internet Archive Music Player for that concert]

1977-03-18 – Winterland Arena

1977-04-23 – Springfield Civic Center Arena

1977-04-27 – Capitol Theatre

1977-05-04 – The Palladium

1977-05-05 – New Haven Coliseum

1977-05-08 – Barton Hall

1977-05-11 – St. Paul Civic Center Arena

1977-05-13 – Auditorium Theatre

1977-05-17 – Memorial Coliseum

1977-05-25 – Mosque

1977-10-02 – Paramount Theatre

1977-10-09 – McNichols Sports Arena

1977-11-06 – Broome County Arena

1977-12-27 – Winterland Arena

1977-12-31 – Winterland Arena

1978-01-31 – Uptown Theater

1978-04-08 – Veterans Memorial Coliseum

1978-04-11 – Fox Theater

1978-04-16 – Huntington Civic Center

1978-04-24 – Horton Field House – Illinois S

1978-05-07 – Field House – Rensselaer Polyte

1978-07-07 – Red Rocks Amphitheatre

1978-09-02 – Giants Stadium

1978-10-17 – Winterland Arena

1978-11-18 – Uptown Theatre

1978-12-16 – Nashville Municipal Auditorium

1978-12-30 – Pauley Pavilion – University of

1979-01-17 – Veterans’ Memorial Coliseum

1979-02-10 – Soldier’s And Sailors Memorial

1979-04-16 – Unknown studio

1979-04-22 – Spartan Stadium – San Jose Stat

1979-05-05 – Baltimore Civic Center

1979-08-05 – Oakland Auditorium

1979-09-05 – Madison Square Garden

1979-10-24 – Springfield Civic Center Arena

1979-11-01 – The Nassau Coliseum

1979-11-08 – Capitol Center

1979-11-30 – Stanley Theatre

1979-12-03 – Uptown Theater

1979-12-10 – Soldiers & Sailors Memorial Aud

1979-12-30 – Oakland Auditorium Arena

1980-03-30 – Capitol Theatre

1980-05-11 – Cumberland County Civic Center

1980-05-30 – Milwaukee Auditorium

1980-08-17 – Kansas City Municipal Aud

1980-08-30 – The Spectrum

1980-10-22 – Radio City Music Hall

1980-10-27 – Radio City Music Hall

1980-11-30 – Fox Theatre

1980-12-26 – Oakland Auditorium Arena

1980-12-31 – Oakland Auditorium Arena

1981-02-27 – Uptown Theater

1981-03-10 – Madison Square Garden

1981-03-20 – Rainbow Theater

1981-05-11 – Veterans’ Memorial Coliseum

1981-05-15 – Athletic Center – Rutgers U

1981-08-16 – MacArthur Court – University of

1981-08-31 – Aladdin Theater

1981-09-12 – Greek Theatre, U. Of California

1981-09-25 – Stabler Arena

1981-10-04 – Rainbow Theater

1981-10-19 – Sports Palace

1981-12-03 – Dane County Coliseum

1981-12-09 – Events Center, U of Colorado

1981-12-26 – Oakland Auditorium

1982-02-17 – Warfield Theater

1982-03-14 – Recreation Hall – University of

1982-04-03 – Scope

1982-04-15 – Providence Civic Center

1982-05-23 – Greek Theatre – University of C

1982-07-25 – Compton Terrace Amphitheatre

1982-07-31 – Manor Downs

1982-08-06 – St. Paul Civic Center

1982-09-20 – Madison Square Garden

1982-11-25 – Bob Marley Performing Arts Cent

1982-11-26 – Bob Marley Performing Arts Cent

1982-12-27 – Oakland Auditorium

1983-03-26 – Aladdin Theatre

1983-04-13 – The Patrick Gymnasium, U of V

1983-06-18 – Saratoga Performing Arts Center

1983-06-27 – Poplar Creek Music Theater

1983-07-31 – Ventura County Fairgrounds

1983-09-04 – Park West Ski Resort

1983-09-13 – Manor Downs

1983-10-14 – Hartford Civic Center

1983-10-21 – The Centrum

1983-12-27 – San Francisco Civic Center

1984-03-31 – Marin County Veterans Aud

1984-04-13 – Hampton Coliseum

1984-04-20 – Philadelphia Civic Center

1984-04-26 – Providence Civic Center

1984-06-21 – Kingswood Music Theatre

1984-07-13 – Greek Theatre – University of C

1984-10-15 – Civic Center

1984-10-30 – Berkeley Community Theater

1984-12-31 – San Francisco Civic Auditorium

1985-02-20 – Henry J. Kaiser Convention Cent

1985-06-16 – Greek Theater, U. of California

1985-07-01 – Merriweather Post Pavilion

1985-09-15 – Devore Field

1985-11-11 – Brendan Byrne Arena

1986-02-12 – Henry J. Kaiser Convention Cent

1986-05-03 – Cal Expo Amphitheatre

1986-12-28 – Henry J. Kaiser Convention Cent

1987-01-29 – San Francisco Civic Center

1987-03-01 – Henry J. Kaiser Convention Cent

1987-04-02 – The Centrum

1987-04-09 – UIC Pavilion – University of Il

1987-05-10 – Laguna Seca Raceway

1987-06-14 – Ventura County Fairgrounds

1987-06-30 – Kingswood Music Theatre

1987-07-08 – Roanoke Civic Center

1987-08-15 – Town Park

1987-08-23 – Calaveras Co. Fairgrounds

1987-11-15 – Long Beach Arena

1988-03-16 – Henry J Kaiser Convention

1988-03-27 – Coliseum

1988-04-15 – Rosemont Horizon

1988-06-28 – Saratoga Performing Arts Center

1988-07-15 – Greek Theater – University of C

1988-09-02 – Capital Centre

1989-03-27 – The Omni

1989-04-06 – Crisler Arena – University of M

1989-04-16 – Mecca

1989-08-06 – Cal Expo Amphitheatre

1989-10-20 – The Spectrum

1990-02-26 – Oakland-Alameda County Coliseum

1990-03-22 – Copps Coliseum

1990-06-15 – Shoreline Amphitheatre

1990-07-04 – Sandstone Amphitheatre

1990-07-21 – World Music Theatre

1990-09-10 – The Spectrum

1990-09-14 – Madison Square Garden

1990-10-19 – Internationales Congress Centru

1990-12-09 – Compton Terrace Amphitheatre

1990-12-27 – Coliseum Arena

1991-03-21 – Capital Centre

1991-03-27 – Nassau Veterans Memorial

1991-04-27 – Sam Boyd Silver Bowl, U.N.L.V.

1991-06-25 – Sandstone Amphitheatre

1991-08-16 – Shoreline Amphitheatre

1991-09-04 – Richfield Coliseum

1991-09-13 – Madison Square Garden

1991-12-27 – Oakland-Alameda County Coliseum

1992-05-21 – Cal Expo Amphitheatre

1992-05-31 – Sam Boyd Silver Bowl

1992-06-12 – Knickerbocker Arena

1992-06-22 – Star Lake Amphitheater

1992-12-17 – Oakland Coliseum Arena

1993-03-28 – Knickerbocker Arena

1993-06-21 – Deer Creek Music Center

1993-08-25 – Shoreline Amphitheatre

1993-09-13 – The Spectrum

1993-09-24 – Boston Garden

1993-12-10 – Los Angeles Sports Arena

1993-12-19 – Oakland-Alameda County Coliseum

1994-03-16 – Rosemont Horizon

1994-03-30 – The Omni

1994-06-19 – Autzen Stadium – University of

1994-10-01 – Boston Garden

1994-10-14 – Madison Square Garden

1995-05-26 – Memorial Stadium

(More Concert Info is available by clicking on the image below)

Grateful Dead – The 1966 Tapes On The Internet Archive

The Grateful Dead Concert Tapes On The Internet Archive

The Internet Archive’s Grateful Dead is a great collection of concert tapes. There are over 2,000 unique concerts, with multiple recordings of most concerts. This series attempts to provide a simple interface to this collection by selecting one recording for each concert on a given date. Preference is given to tapes processed by the great Charlie Miller and to soundboard tapes. Not all shows are well recorded, but there are great moments here. Dig in and enjoy! You didn’t have to be there. Good luck.

The Grateful Dead: Dark Star Rising (1966 – 1969)

After leaving the Army in 1960 (the story goes that he lost a tank), Jerry Garcia gravitated towards bluegrass (playing banjo in his band Mother McCree’s Uptown Jug Champions) and LSD (1964). Jerry, Pigpen, Phil, Bill and (the under-aged) Bob formed the Warlocks in 1965, playing as the ‘house band’ with Ken Kesey’s Acid Tests in San Francisco. Tom joined in 1968 and was very important to the massive Dark Stars of yore. Owsley Stanley (The Bear) was both chemist and sound man for the Grateful Dead. Between 1965 and 1967, Stanley produced and helped distribute more than 1.25 million doses of LSD. Owsley helped develop the distinctive ‘Steal Your Face’ skull logo for the Dead. Many of the early live recordings of the Grateful Dead were due to Owsley’s efforts, and his unique approach to clarity and simplicity in miking allows us to experience much of the early shows ‘as they were.’

In 1968 The Grateful Dead included these members:

Jerry Garcia
Bob Weir
Ron “Pigpen” McKernan
Phil Lesh
Bill Kreutzmann
Mickey Hart
Tom Constanten

In This Series:

1966 1967 1968 1969
1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
1990 1991 1992 1993 1994 1995

Additional details of the Internet Archive Collection and how these lists were generated can be found here.

My blogs on individual Grateful Dead show can be found here.

A list of all soundboard tapes of Dark Star performances on the Internet Archive can be found here.

A list of all soundboard tapes of Scarlet Begonias -> Fire On The Mountain performances on the Internet Archive can be found here.

The amazing Internet Archive is a global treasure of archived public domain Moving Images, Live Music, Audio Recordings and Text. For Deadheads, the Internet Archive’s Grateful Dead collection is priceless. Here is the basic statement of purpose from the Internet Archive on the Grateful Dead collection:

About Grateful Dead concerts on the Archive

Audience-made Grateful Dead concert recordings are available as downloads while available soundboards are accessible in streaming format only.

The Grateful Dead is being separated from the Live Music Archive into its own collection (with its own forum) to avoid confusion about lossless availability. The metadata and reviews for shows and recordings, even those not available for regular download, will remain available for those who maintain direct links. No filesets have been deleted from the Archive; certain items are simply not public now. Prior to our completing the changes, text files are easily referenced at a separate database.

At this time, the Grateful Dead collection is not open to public uploads. The Grateful Dead Internet Archive Project (GDIAP) will continue its direct management of this collection for the time being.

As far as we know, there has been no change to standard GD fan trading. It is common for bands to have policies that differ between fan trading, versus archiving here.

Please note the comments on “The Grateful Dead Internet Archive Project” above. Recordings range from the primordial Acid Test recordings of 1965 through the final Grateful Dead concert in 1995. There are multiple recordings of most concerts, ranging from audience recordings through fully digital soundboard ‘tapes’. Quality ranges from very poor to almost perfect. Note that concerts which have been released by the Grateful Dead organization or by Rhino Records are no longer available on the Internet Archive.

In 2011 I started a demonstration project for CouchDB on the Windows platform using Ruby on Rails and (later) Microsoft MVC. This led to a series of blog posts on reverse engineering the Internet Archive’s Grateful Dead collection and collecting an abstract of its metadata into CouchDB (note that the recordings themselves were NOT loaded into my local CouchDB database). The Ruby On Rails series of posts can be found here. The MVC post can be found here. Reviewing the concerts has led me to post a series of blogs on Grateful Dead concert recordings I personally found interesting. This series, which can be found here, is ongoing.

The 1966 Tapes

This current blog lists hyperlinks into the Internet Archive for the 1966 recordings. Please note I am listing here ONE recording per performance. The archive includes tapes contributed by multiple sources. I am using a simple hierarchical method to select which recording to list for each date:

  1. The recording is a soundboard recording processed by Charlie Miller
  2. The recording is a soundboard recording
  3. The recording is the most recently submitted tape for a given date
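The three rules above amount to a ranked sort. Here is a minimal JavaScript sketch, assuming each recording carries an sb (soundboard) flag and a pubdate as in my CouchDB documents, plus a hypothetical charlieMiller flag; how Charlie Miller transfers are actually detected is not described here. Ties within a tier go to the most recently submitted tape, which extends rule 3 to every tier.

```javascript
// Tier for one recording (lower is better), following the three rules above.
// charlieMiller is a hypothetical flag; sb is the soundboard flag.
function tier(rec) {
  if (rec.sb && rec.charlieMiller) return 0; // 1. soundboard processed by Charlie Miller
  if (rec.sb) return 1;                      // 2. any soundboard
  return 2;                                  // 3. everything else
}

// Pick one recording for a date: best tier, then most recent pubdate (YYYY-MM-DD).
function pickRecording(recordings) {
  return recordings.slice().sort(function (a, b) {
    var byTier = tier(a) - tier(b);
    if (byTier !== 0) return byTier;
    return a.pubdate < b.pubdate ? 1 : a.pubdate > b.pubdate ? -1 : 0;
  })[0];
}

var tapes = [
  { id: "aud", sb: false, pubdate: "2010-01-01" },
  { id: "sbd-old", sb: true, pubdate: "2005-01-01" },
  { id: "sbd-new", sb: true, pubdate: "2008-01-01" }
];
console.log(pickRecording(tapes).id); // sbd-new
```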

The tape linked here is not necessarily the best recording for a given date, but it is the best an algorithm can select. Just click the Listen hyperlink to jump to the Internet Archive and its player to hear the concert. If you are interested in other recordings on the same date, click on the “check for other copies” link within the Internet Archive’s page. I will be posting additional years as time passes. Have fun and keep dancing. Remember, Jerry died for our sins.

1966-00-00 – Various – See info file Listen

1966-01-01 – Viola Lee Blues Reahearsals Listen

1966-01-08 – Fillmore Auditorium Listen

1966-01-29 – Sound City Recording Studios Listen

1966-02-12 – Youth Opportunities Center Listen

1966-02-25 – Ivar Theater Listen

1966-03-12 – Danish Center Listen

1966-03-19 – Carthay Studios Listen

1966-03-25 – Trouper’s Hall Listen

1966-05-19 – Avalon Ballroom Listen

1966-06-01 – Scorpio Studio Outtakes Listen

1966-06-15 – Various Listen

1966-07-01 – Unknown (perhaps Fillmore Auditorium) Listen

1966-07-03 – Fillmore Auditorium Listen

1966-07-16 – Fillmore Auditorium Listen

1966-07-17 – Fillmore Auditorium Listen

1966-07-29 – P.N.E. Garden Auditorium Listen

1966-07-30 – P.N.E. Garden Auditorium Listen

1966-08-01 – Various Listen

1966-09-16 – Avalon Ballroom Listen

1966-10-02 – San Francisco State University Listen

1966-10-07 – Winterland Arena Listen

1966-10-31 – Winterland Arena Listen

1966-11-19 – Fillmore Auditorium Listen

1966-11-29 – The Matrix Listen

1966-12-01 – The Matrix Listen

1966-12-04 – The Matrix Club Listen

1966-12-05 – Studio Listen

Microsoft MVC 3 and CouchDB – Low Level Get Calls

I have written elsewhere on CouchDB on Windows and on using Ruby on Rails to interface to this system. These posts can be found here:

Part 0 – REST, Ruby On Rails, CouchDB and Me

Part 1 – Ruby, The Command Line Version

Part 2 – Aptana IDE For Ruby

Part 3 CouchDB Up and Running on Windows

Part 4 – CouchDB, Curl and RUBY

Part 5 – Getting The Data Ready for CouchDB

Part 6 – Getting The Data Into And Out Of CouchDB

Part 7 – JQUERY,JPlayer and HTML5

At work I am in a Microsoft shop, which for us means Microsoft servers on the back end and (mostly) pure HTML/AJAX front ends. We are transitioning towards using Microsoft MVC 3 to provide HTTP endpoints for our AJAX calls. Here are some notes from my POC work in this area. My couch data consists of documents describing Grateful Dead concerts stored on the great site Internet Archive; if you have never visited the Internet Archive, please do so. I reverse engineered the metadata of IA’s extensive collection of Dead concerts (over 2,000 concert recordings). Visit the Grateful Dead Archive Home at the Internet Archive here.

CouchDB Documents and Views

I stored the metadata in a local CouchDB (running on Windows XP). The basic document I am storing is a master/detail set for the ‘best’ recording of each Dead concert. The master part of the document contains the date, venue and other data for the concert, and the detail set is an array of metadata on each song performed during the concert. As is traditional with CouchDB, the documents are represented as JSON strings. Here is the document for the Ur recording (1965-11-01) found on the IA:


{
  "_id": "1965-11-01",
  "_rev": "1-6ea272d20d7fc80e51c1ba53a5101ac1",
  "mx": false,
  "pubdate": "2009-03-14",
  "sb": true,
  "venue": "various",
  "tracks": [
    {
      "uri": " 01.sbd.bershaw.5417.sbeok.shnf/Acid4_01_vbr.mp3",
      "track": "01",
      "title": "Speed Limit",
      "time": "09:48"
    },
    {
      "uri": "…",
      "track": "02",
      "title": "Neil Cassidy Raps",
      "time": "02:19"
    }
  ]
}


CouchDB allows the creation of views, which are B-trees with user-defined keys and user-defined subsets of the document data. If one wanted to return the venue and the tracks for each concert on a given month and day (across all years), the view created in CouchDB would look like:

"MonthDay": {
  "map": "function(doc){emit(doc._id.substr(5,2)+doc._id.substr(8,2),[doc.venue , doc.IAKey, doc.tracks ])}"
}


This view allows us to use an HTTP GET to pass in a month/day key (e.g. “1101”) and get back, as a JSON array, one row per matching concert containing:

the key (MMDD: doc._id.substr(5,2)+doc._id.substr(8,2)), alongside the row’s id, which is the concert date;

the venue (doc.venue);

the IA URI of the concert (doc.IAKey); and

an array of track data (doc.tracks)
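To see what the MonthDay map function emits without a CouchDB server, here is a small JavaScript harness. The emit collector is my own stand-in for CouchDB's built-in emit; the map source is copied from the view above. The first sample document is the 1965-11-01 example trimmed to one track, and the IAKey values are placeholders because that field is not shown in the example document.

```javascript
// Compile a CouchDB-style map function from its source string and give it a
// local emit() that collects rows (a stand-in for CouchDB's built-in emit).
function runView(mapSource, docs) {
  var rows = [];
  var currentId = null;
  function emit(key, value) {
    rows.push({ id: currentId, key: key, value: value });
  }
  var mapFn = new Function("emit", "return " + mapSource)(emit);
  docs.forEach(function (doc) {
    currentId = doc._id;
    mapFn(doc);
  });
  // Order by key, then document id. Plain string comparison is enough for these
  // digit keys; CouchDB itself uses Unicode collation.
  return rows.sort(function (a, b) {
    if (a.key !== b.key) return a.key < b.key ? -1 : 1;
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
  });
}

// Map function source copied from the MonthDay view.
var mapSource = "function(doc){emit(doc._id.substr(5,2)+doc._id.substr(8,2),[doc.venue , doc.IAKey, doc.tracks ])}";

// Sample documents; IAKey values are placeholders.
var docs = [
  { _id: "1965-11-01", venue: "various", IAKey: "placeholder-1",
    tracks: [{ track: "01", title: "Speed Limit", time: "09:48" }] },
  { _id: "1969-08-16", venue: "Woodstock Music", IAKey: "placeholder-2", tracks: [] }
];

var rows = runView(mapSource, docs);
console.log(rows.map(function (r) { return r.key; })); // [ '0816', '1101' ]
```

A GET with key "1101" against the real view would return just the second of these rows.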

MVC URL Routing Maps

Although we could call CouchDB directly from the browser, we normally work through a gateway system for security, so we will build a shim to sit between the browser and CouchDB. This allows us to flow the authentication/authorization stack separately from CouchDB’s security system. In MS MVC we can create a new HTTP endpoint for AJAX calls (our shim) in a very simple manner. Let’s create an endpoint which will look like:

http://{our server path}/DeadBase/MonthDay/{month}/{day}

so that

http://{our server path}/DeadBase/MonthDay/11/01

would request concerts for month 11 and day 01. In MVC we can declare this routing as:




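routes.MapRoute(
    "DeadBaseMonthDay",                  // route name (illustrative)
    "DeadBase/MonthDay/{month}/{day}",   // URL pattern
    new { controller = "DeadBase", action = "RestMonthDay",
          month = UrlParameter.Optional, day = UrlParameter.Optional });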

Done. Interestingly, in MVC 3 this route definition will accept either the form:

http://{our server path}/DeadBase/MonthDay/{month}/{day} ; or

http://{our server path}/DeadBase/MonthDay?month="??"&day="??"

In the second form, parameter order does not matter, but case does; quotation marks are optional and need to be dealt with internally by the action method. Either of these calls will resolve to the same controller and method.

MVC Controller and Method Handler

We now need to create the shim which will be the target for the Http Endpoint.  In C# this looks like:

using System.Web.Mvc;

public class DeadBaseController : Controller
{
    public string RestMonthDay(string month, string day)
    {
        // our shim code goes here; it returns CouchDB's JSON response as a string
        return string.Empty;
    }
}


We are able to use string as our return type because we will be calling CouchDB, which returns JSON in string form by default. As a side note, if we wanted to use MVC 3 to return JSON from a native C# object, our controller method would take a different form:

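// Inside a controller; needs System.Collections.Generic and System.Web.UI.WebControls
public JsonResult GetStateList()
{
    List<ListItem> list = new List<ListItem>() {
        new ListItem() { Value = "1", Text = "VA" },
        new ListItem() { Value = "2", Text = "MD" },
        new ListItem() { Value = "3", Text = "DC" } };

    // MVC refuses JSON responses to GET requests unless AllowGet is specified
    return this.Json(list, JsonRequestBehavior.AllowGet);
}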

Our AJAX call from the browser does not need to know any of these details.  Here is one way to code the call in JavaScript using JQuery:

var url = urlBase + "?" + args;

$.ajax({
    url: url,
    dataType: 'json',
    success: okCallBack,
    error: nookCallBack
});

function okCallBack(data) {
    gdData = data;
    // do something useful here
}

function nookCallBack(xhr, ajaxOptions, errorThrown) {
    alert("ErrorText:" + errorThrown + " " + "Error Code:" + xhr.status);
}


From Handler to CouchDB in C#

Here is the rest of the generic C# code to go from the Handler to CouchDB and back.

Clean the parameters and pass the call to a generic CouchDB GET caller:
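As a sketch of this cleaning step, in JavaScript rather than the original C#: strip the optional quotation marks mentioned above, accept only one- or two-digit numbers, and left-pad month and day so they match the view's MMDD keys. The helper name and the range checks are my assumptions.

```javascript
// Hypothetical helper: normalize month/day values into the view's MMDD key.
// Returns null when either value is not a 1-2 digit number in range.
function cleanMonthDay(month, day) {
  function clean(v) {
    // Trim, then drop one leading and one trailing quotation mark if present.
    var s = String(v == null ? "" : v).trim().replace(/^["']|["']$/g, "");
    return /^\d{1,2}$/.test(s) ? s.padStart(2, "0") : null;
  }
  var m = clean(month);
  var d = clean(day);
  if (m === null || d === null) return null;
  if (Number(m) < 1 || Number(m) > 12 || Number(d) < 1 || Number(d) > 31) return null;
  return m + d;
}

console.log(cleanMonthDay('"11"', "1")); // 1101
console.log(cleanMonthDay("13", "01"));  // null
```

Rejecting bad input here keeps malformed keys from ever reaching CouchDB.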


Format the view name and parameter into couchdb format  and pass to the low level couchDB caller:
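A JavaScript sketch of this formatting step. CouchDB queries a view at /{database}/_design/{design document}/_view/{view}, and the key parameter must be JSON, so a string key keeps its double quotes before being URL-encoded. The server address, the database name deadbase and the design document name concerts are hypothetical.

```javascript
// Build a CouchDB view query URL. The key is JSON-encoded first (a string key
// keeps its double quotes) and then URL-encoded.
function viewUrl(server, db, designDoc, view, key) {
  return server.replace(/\/$/, "") +
    "/" + encodeURIComponent(db) +
    "/_design/" + encodeURIComponent(designDoc) +
    "/_view/" + encodeURIComponent(view) +
    "?key=" + encodeURIComponent(JSON.stringify(key));
}

// Hypothetical server, database and design document names.
console.log(viewUrl("http://localhost:5984/", "deadbase", "concerts", "MonthDay", "1101"));
// http://localhost:5984/deadbase/_design/concerts/_view/MonthDay?key=%221101%22
```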


Classic Framework HTTP code to make the HTTP GET and return the results as a string back up the call stack:
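A JavaScript counterpart of this step: a GET that returns the body as unparsed text, so the shim can hand CouchDB's JSON straight back to the browser. It assumes Node 18 or later for the global fetch; the fetchImpl parameter is my addition so the function can be exercised without a server.

```javascript
// GET a URL and resolve with the raw response body as a string.
// Rejects on a non-2xx status instead of passing an error body along silently.
async function getString(url, fetchImpl) {
  var doFetch = fetchImpl || globalThis.fetch;
  var response = await doFetch(url, { method: "GET", headers: { Accept: "application/json" } });
  if (!response.ok) {
    throw new Error("GET " + url + " failed with HTTP " + response.status);
  }
  return response.text();
}

// Usage with a stand-in fetch (no server needed).
var fakeFetch = async function () {
  return { ok: true, status: 200, text: async function () { return '{"rows":[]}'; } };
};
getString("http://localhost:5984/deadbase", fakeFetch).then(function (body) {
  console.log(body); // {"rows":[]}
});
```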


We could (and did) take our browser code from the Ruby on Rails project above and, with minimal changes, call our MVC shim.

Simple, clean and fun.


Ruby On Rails, CouchDB and Me – Part 7 – JQUERY,JPlayer and HTML5


We have two missions in the current post:

  • Getting Our browser side chops together: Using Javascript, DHTML, CSS Level 3, JQUERY against our JSON feed
  • Using JQuery UI and a JQuery UI Plugin: JPlayer to play songs from Internet Archive based on our JSON feed

Recall that our JSON feed provides concert and track data for concerts performed on an arbitrary date. The top-level data of the feed can be visualized like this:


The field total_rows refers to the total number of rows in the view, not the number of rows in this feed. The field offset indicates the position in the view’s B-tree of the first row in this feed. Beats me why these would be useful to the calling program! Following this ‘header’ data, each concert is listed in key order. The offsets and values of each row’s value array are:

  • 0: Venue
  • 1: IAKey
  • 2: Array of Track Data
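A short JavaScript sketch of reading rows with this layout. The feed object is illustrative (placeholder IA key and track titles), and total_rows is deliberately larger than the number of rows returned, as it can be for a keyed query.

```javascript
// Illustrative feed with the layout described above (placeholder values).
var feed = {
  total_rows: 2,
  offset: 0,
  rows: [
    { id: "1969-08-16", key: "0816",
      value: ["Woodstock Music", "placeholder-ia-key", [
        { uri: "", track: "01", title: "Track A", time: "05:00" },
        { uri: "", track: "02", title: "Track B", time: "07:30" }
      ]] }
  ]
};

// Name the value offsets: 0 = venue, 1 = IAKey, 2 = array of track data.
function describeRows(feed) {
  return feed.rows.map(function (row) {
    return {
      date: row.id,
      venue: row.value[0],
      iaKey: row.value[1],
      trackCount: row.value[2].length
    };
  });
}

console.log(feed.rows.length, feed.total_rows); // 1 2
console.log(describeRows(feed)[0].venue);       // Woodstock Music
```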

We can visualize the expanded  track data array as:

Within each offset of the array we have the fields:

  • uri     – The pointer into IA for the mp3 file
  • track – order in the concert of this track
  • title   – track name
  • time  – Track length as MM:SS

We clearly could iterate through these fields and list the concerts and tracks statically on the web page using standard ROR tools, but let’s be more dynamic. Let’s first display the concert dates and venues, and then display the tracks for a concert when the user clicks on it, without a round trip to the server (and ROR).


Someone once said that Javascript is the only language that people use without knowing how. Don’t be one of those people. The cleanest approach to learning Javascript is Crockford’s Javascript: The Good Parts. Simple, clean Javascript fun. (Steal This Book Here) Read this even if you ‘know’ Javascript. If you don’t like to read, try the movie:

JQUERY: It Puts the Dynamic into DHTML.

JQuery is my favorite Javascript library.  Not necessarily the best or the most common.  Just my favorite.  JQuery accomplishes two goals very well:

  • Eliminating (or at least simplifying) DHTML coding differences between all mainstream browsers (and most non-mainstream ones);
  • Simplifying and abstracting the operations necessary to drive DHTML via Javascript.

The design of JQUERY leverages the CSS 3 selector syntax so you will need to understand modern CSS selectors.

DHTML was first introduced as a Microsoft extension. Netscape (remember Netscape?) soon followed with a similar, but not identical, DHTML API of its own. Further, each of these browsers tended to render certain edge cases differently. And the CSS Level 3 Selectors and HTML5 specifications were coming down the pike. Both CSS3 and HTML5 are now a reality on Chrome, Firefox and Safari (and, some day real soon, on IE9). What to do? John Resig had an idea and the idea was called JQUERY. The BASIC idea is to use CSS Level 3 selectors to select sets of HTML tags and then to perform actions on those tags using a common API which masks the differences between browsers (and between versions of browsers). Along the way JQUERY attempts to provide features not available in some browsers, as long as those features would appear in the (then emergent) HTML5 specification. Learning JQUERY is difficult only because the API is abstract and there is no BEST text on JQUERY. Here is how John explains JQUERY:

OK, So Let’s See Some Code Already!

Iterating The JSON Object In Javascript And Display Using JQUERY

Please refer to our prior post for a description of how the JSON object is delivered to the page via the Rails mark up in our rb file.  Basically we had a single line:

gdData=<%=  @parm  %> ;

Let’s work with this data to display the structure on the browser screen.

We start with two EMPTY HTML tags on our page:

<div id="concertdiv"></div>

<ul id="track"></ul>

We can iterate this object  using javascript as:


where ConcertList2 is defined as:

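function ConcertList2(o){
  var iaURL = "";   // base URL of the Internet Archive links (not shown in the original)
  for (var ndx = o["rows"].length - 1; ndx != -1; ndx--) {
    var cdate = o["rows"][ndx].id;                  // the concert date is the document id
    var venue = o["rows"][ndx]["value"][0];         // value[0]: the venue
    var itemID = ndx.toString();
    var uri = iaURL + o["rows"][ndx]["value"][1];   // value[1]: the IAKey
    var href = "<a id='href' href='" + uri + "' target='_blank'> – IA –</a>";
    var className = 'normal';
    if (ndx == 0) {
      // (the rest of this branch was not preserved in the original listing)
    }
    var item = "<p id='" + itemID + "' class='concert " + className + "'>" + cdate + ' – ' + venue + href + '</p>';
    // ... the JQuery line that inserts item into the page (described below) goes here
  }
}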

The javascript variable “item” for a given concert would contain a string of HTML:

<p id='0' class='concert normal'>1969-08-16 – Woodstock Music <a href='…'>-IA-</a></p>

Note that this tag contains two classes: ‘concert’ and ‘normal’.

The JQuery code line:


consists of a selector:


an action verb:


and an argument:


The selector uses CSS 3 syntax so it selects the SET of all tags with the ID of ‘concertdiv’.  In our page this is a set of one item.

Iterating through our JSON object will insert our items after the tag associated with concertdiv.

The results look like this on the browser page:

1969-08-16 – Woodstock Music – IA –

1980-08-16 – Mississippi River Festival – IA –

1981-08-16 – MacArthur Court – University of Oregon – IA –

1987-08-16 – Town Park – IA –

1991-08-16 – Shoreline Amphitheatre – IA –

Simple, no?

We can iterate and display the tracks as:


where TrackList is defined as:

function TrackList(o, ndx){
  var ndx1 = 0;
  for (ndx1 = o["rows"][ndx]["value"][2].length - 1; ndx1 != -1; ndx1--) {
    var title = o["rows"][ndx]["value"][2][ndx1].title;
    var time  = o["rows"][ndx]["value"][2][ndx1].time;
    var track = o["rows"][ndx]["value"][2][ndx1].track;
    var uri   = o["rows"][ndx]["value"][2][ndx1].uri;
    var item  = "<li>" + track + ' ' + time + ' ' + title + '</li>';
    // ... the JQuery line that adds item to the track list goes here (not shown in the original)
  }
}


In this case our ‘item’ variable contains a simple HTML string like:

<li>01 03:08 Stage Announcements, Introduction</li>

The results on the browser page for a given concert will look like this:

  • 01 03:08 Stage Announcements, Introduction
  • 02 02:04 Saint Stephen >
  • 03 02:42 Mama Tried >
  • 04 00:38 High Time false start
  • 05 10:28 Stage Banter. Technical Difficulties
  • 06 19:05 Dark Star >
  • 07 06:10 High Time
  • 08 38:32 Turn On Your Lovelight
  • 09 01:52 Applause, Stage Announcements

We can bind these two display routines together with two simple Javascript functions so that when we click on a concert name the page will refresh the track list without a visit back to the web server.

First we will use JQUERY to BIND a function to the click event of every tag with the concert class:

    function bindClick(){
        $('.concert').click(function() {
            // ... (the body of the click handler was not preserved in the original listing)
        });
    }

    This bound function uses the pre-defined function ‘removehilite’ to swap the ‘normal’ and ‘hilite’ classes:

    function removehilite(){
        // ... (the body was not preserved in the original listing)
    }

    and a simple inline CSS definition:

    .normal {color:#0B559B;}
    .hilite {color:#FF0000;}

    We pull this all together into a simple driver as:

                gdData=<%=  @parm  %> ;

    Got it? Good. Now let’s use a JQUERY UI plug-in to allow us to play concerts from our browser page.

    JQUERY UI and JQUERY UI Widgets

    As useful as JQUERY is for dynamic web pages, let’s go further and use the JQuery UI system and the UI widget JPlayer to allow us to play the mp3 files which reside on the Internet Archive. JQuery UI is a system built on top of JQUERY for the systematic development of UI widgets, which page developers can deploy with a minimum of unwanted interactions between widgets. Further, the JQuery UI system (and widgets developed within that system) can use a systematic set of theme classes whose color scheme can be generated with a nice tool called ThemeRoller. I will not have a lot to say about these products in general (except to say they are free and work great), and you will need to visit the links noted in this paragraph to learn more about these tools.

    HTML5 Audio Tag

    HTML5 has introduced a new tag to allow playing audio without using a plug-in. There are some issues still being worked out, since there is NOT yet common agreement about whether the standard should universally support MP4 or OGG files. MP3 is widely, though not universally, supported by browsers which support HTML5. Nominally the new tag looks like this:

    <audio controls="controls">
    <source src="horse.mp3" type="audio/mpeg" />
    Your browser does not support the audio element.
    </audio>

    Note that the line after the “source” tag is what is rendered if your browser does NOT support HTML5. If we replace this line with appropriate code to support a plug-in like Flash, we have a control which will play well in both HTML5 and HTML4 environments. We could develop our own solution, but I have been working with JPlayer, a very nice JQUERY UI widget, and will use that for this post. I like this widget because JPlayer:

    • Is a JQuery widget
    • Works with JQuery Themeroller
    • Has a very active user community
    • Displays graphics and video as well as audio tracks

    I developed my final browser page in this series using a modified version of the ‘demo 2’ code example which is downloaded along with JPlayer. Here is the plan:

    Display the Concert list the same way as above (with a few extras for visual appeal). Prepare the track list in a way similar to that used above, but modified into a form that JPlayer can consume, so that JPlayer both displays the track list for us and loads it into the player (more on this below). We are going to modify the RoR rb file but not the underlying RoR code. We will let the browser do the work. I follow this strategy because the next phase of the project will allow the user to select the date for which concert data is displayed and played, using AJAX calls in a RESTful manner (more on this next time), rather than round-tripping to the server when we want to load a new date (or date range).

    What changes?

    Two new Javascript files:  one for JPlayer and one to handle preparing the track list for JPlayer to consume; and a reference to the themeroller prepared CSS file:

    <link href="/skin/" rel="stylesheet" type="text/css" />

    <script type="text/javascript" src="…"></script>

    <script type="text/javascript" src="/javascripts/jquery.jplayer.min.js"></script>

    <script type="text/javascript" src="/javascripts/Playlist.js"></script>

    Playlist.js is the new file I am contributing to the mix; the others are the JQuery and JPlayer Javascript files.

    Using Themeroller styles I can generate a completely different style for the page and change only the link reference to the CSS page to change how the page appears. There are NO embedded style elements on the page.

    Rather than calling the TrackList method, I am going to call a new method, makePlayList, when a concert is selected. This code looks like this:

    function makePlayList(selected){
        var ndx1 = 0;
        var tList = new Array();
        for (ndx1 = gdData["rows"][selected]["value"][2].length - 1; ndx1 != -1; ndx1--) {
            var t = gdData["rows"][selected]["value"][2][ndx1];   // one track of the selected concert
            tList[ndx1] = buildTrack(t.track, t.time, t.title, t.uri, gdData["rows"][selected].id.substring(0, 4));
        }
        return tList;
    }


    In turn, buildTrack looks like:

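    var buildTrack = function(num, time, title, ref, cYear){
        var dwnldicon = 'pic/download.png';
        var nameFMT = "$0     $1     $2<img src='$4'>";
        var track = new Object();
        var name = nameFMT.replace("$0", num);
        // ... (the remaining replacements and the track properties were not preserved in the original listing)
        return track;
    };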

    All of which is returned to JPlayer as:

    mediaPlaylist.playlist = makePlayList(selected);

    Our core Javascript code now looks like:

    var mediaPlayer=null;


    ConcertList2(gdData);  //displays the concert list at the top of the page

    bindClick();  //binds a click event on a concert to loading a new playlist to JPlayer

    mediaPlaylist = new Playlist("1", makePlayList(0), {  //jump start with the first concert item
        ready: function() {
            mediaPlaylist.displayPlaylist();  //show the playlist
            // ... (further ready-handler lines not preserved)
        },
        ended: function() {
            // ... (handler body not preserved)
        },
        swfPath: "javascripts",   //jplayer option
        solution: "flash, html",  //jplayer option
        supplied: "mp3"           //jplayer option
    });


    Most of our HTML tags are stolen directly from the JPlayer ‘demo 2’ code and mostly deal with setting up the player controls (play, pause, stop, next, etc.).

    OK.  The new browser page looks like this (in two Parts):

    Concert Listing Section:


    I am using icons for hyperlinks to the Internet Archive Grateful Dead Collection, and smaller icons to link to all recordings for a given date. The bottom half of the screen contains the JPlayer and its user controls as well as a user-selectable track list:

    The image is associated with the selected track (there is a JPlayer bug with the images: if the same image is associated with two successive tracks, the second picture will not be displayed; they are working on this). I use the selected concert text (in this case “Madison Square Garden: 1987-09-16”) as a hyperlink to the page containing the concert recording on the Internet Archive.

    These screen captures are from Chrome (Safari and Firefox look the same). On IE 8 (and lower) HTML5 is not supported and the player reverts to Flash. The track list on IE 8 is not as pretty and is no longer selectable (although the player controls still work):

    The Sad IE 8 Track Display

    What more can I say?

REST, Ruby On Rails, CouchDB and Me – Part 6 Getting The Data Into And Out Of CouchDB

Part 0 – REST, Ruby On Rails, CouchDB and Me

Part 1 – Ruby, The Command Line Version

Part 2 – Aptana IDE For Ruby

Part 3 CouchDB Up and Running on Windows

Part 4 – CouchDB, Curl and RUBY

Part 5 – Getting The Data Ready for CouchDB

Part 7 – JQUERY,JPlayer and HTML5

 [This just in: couchDB and Microsoft MVC 3: here]

Our mission in today’s post: 

  • Serialize the Concert object to a JSON string (i.e. a JSON document);
  • Do POST requests to insert a JSON document for each concert into the couchdb database;
  • Create couchDB views to allow optimized data retrieval; and
  • Create a couchDB view to optimize retrieval of recordings for all years for an arbitrary Month and Day (this duplicates the data provided by the “Grateful Dead Shows on This Day In History” selection in the Internet Archive).

Our last post outlined the formal structure of our document and the JSON format of the document we intend to post to our couchdb database. That post also outlined the RUBY code used to collect and clean the data we accessed from the Internet Archive. To place our concert information into our couchdb database, we need to transform our RUBY objects into proper JSON format and then send an HTTP POST to place our resource (document) into the database. Why a POST rather than a PUT command, you ask? Better to ask a true RESTafarian. Roughly, PUT creates (or replaces) the resource at the URL you name. In couchdb terms that is the database itself, or a document at a URL that includes its id. POST asks an existing resource (our database) to accept a new document, which modifies the database, I guess. Since each of our JSON documents already carries its own _id, a POST to the database works fine. Hmmmm.
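To make the PUT/POST distinction concrete, here is a minimal sketch using Ruby’s standard net/http library (not rest-open-uri). It only builds the three requests and prints their verbs and paths; nothing is sent. The database name gdead is a hypothetical stand-in for our couchdb database.

```ruby
require 'net/http'
require 'json'

# Hypothetical database name; no server is contacted, the requests are only built.
doc = { 'venue' => 'P.N.E. Garden Auditorium' }

# PUT /gdead creates the database itself.
create_db = Net::HTTP::Put.new('/gdead')

# PUT /gdead/<id> creates the document whose id is part of the URL
# (updating an existing document this way also requires its current _rev).
put_doc = Net::HTTP::Put.new('/gdead/1966-07-30', 'Content-Type' => 'application/json')
put_doc.body = JSON.generate(doc)

# POST /gdead asks the database to accept a new document; the id comes from the
# _id field in the body (couchdb generates one if _id is missing).
post_doc = Net::HTTP::Post.new('/gdead', 'Content-Type' => 'application/json')
post_doc.body = JSON.generate(doc.merge('_id' => '1966-07-30'))

[create_db, put_doc, post_doc].each { |req| puts "#{req.method} #{req.path}" }
```

The POST form suits our bulk load: every document we generate already names its own _id, so the database URL is all we need.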

Why JSON?  Ask These Guys:


To transform a RUBY object into JSON format requires the use of a RUBY GEM: json. This GEM comes in two flavors: a pure RUBY form (slower) and a C form (faster, but it requires a working RUBY development stack, GNU C compiler etc. on your local machine). We are happy with the speed of the pure RUBY form, so that is what we use here. The downside of this GEM is that it will not serialize arbitrary RUBY objects as is; to do that you need to code custom serialization handlers. By default, the JSON serializer works directly on primitive (native) types (strings, ints, what have you) and on the simple structures: arrays and hash tables. So our first task will be to transform our data into hash table format. Assume a RUBY array of concert objects (@selectList) and a parallel array of data for each track (tracks). In pseudo code we have:

@selectList.each do |recording|



Our method makeJSONFull takes the RUBY objects and returns a JSON string as:

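def makeJSONFull(recording,tracks)
  # ... (lines that copy the BASE fields of recording into the Hash tList were not preserved)
  tracks.each do |t|
    tItem = {}
    tItem["track"] = t.track #this could be derived as offset - 1
    # ... (the remaining track fields and the line adding tItem to tList were not preserved)
  end
  return JSON.generate(tList)
end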

where JSON.generate(tList) is the JSON GEM method generate (to serialize), and tList is a Hash containing only primitive types (strings and booleans) and an array of track data Hashes. The track data contains only primitive types (strings). We pass the returned string (which contains our data in JSON format) to our routine PostRecording(jsonString). This routine looks like:

def PostRecording(jsonString)
  # uri: the address of our couchdb database (its definition was not preserved)
  responseBody = open(uri, :method => :post, :body => jsonString, "Content-Type" => "application/json").read
  puts 'POST Response Success: ' + responseBody
rescue OpenURI::HTTPError => the_error
  puts '##########Post Response Error: ' + the_error.io.status[0]
end


OK, this routine depends on the RUBY GEM rest-open-uri. This GEM is a single file of RUBY code which depends on the RUBY components uri and net/http (these are part of the base RUBY system). The standard open-uri library supports only the HTTP GET verb; “rest-open-uri” extends it to include all of the HTTP verbs (PUT, POST, DELETE and HEAD) necessary to implement any REST system. The open method, as used here:

open(uri, :method => :post, :body => jsonString, "Content-Type" => "application/json").read

uri: the address of our couchdb database

Content-Type: our MIME type for this call (“application/json”)

body: our document

method: the HTTP verb POST
If this POST command fails we want to capture the error and (for this example) display the error:

rescue OpenURI::HTTPError => the_error
  puts '##########Post Response Error: ' + the_error.io.status[0]

Actual error recovery is left as an exercise for the reader.
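For readers who want a head start on that exercise, here is a minimal sketch that swaps rest-open-uri for Ruby’s standard net/http library. The database URL is a hypothetical local couchdb address. One difference worth knowing is that net/http does not raise an exception for an HTTP error status. The sketch therefore checks the response class instead of rescuing OpenURI::HTTPError.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Hypothetical local couchdb database; adjust host, port and name to your setup.
DB_URI = URI('http://127.0.0.1:5984/gdead')

# Build the POST separately so it can be inspected without a running server.
def build_post(uri, json_string)
  request = Net::HTTP::Post.new(uri.path, 'Content-Type' => 'application/json')
  request.body = json_string
  request
end

# Send one document; net/http returns error statuses instead of raising, so check the class.
def post_recording(uri, json_string)
  response = Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(build_post(uri, json_string))
  end
  if response.is_a?(Net::HTTPSuccess)          # e.g. 201 Created from couchdb
    puts 'POST Response Success: ' + response.body
  else
    puts "##########Post Response Error: #{response.code} #{response.body}"
  end
  response
rescue SystemCallError => e                    # e.g. connection refused: couchdb not running
  puts "##########Post Failed: #{e.message}"
  nil
end

req = build_post(DB_URI, JSON.generate({ '_id' => '1966-07-30' }))
puts "#{req.method} #{req.path} #{req['Content-Type']}"
```

Calling post_recording(DB_URI, jsonString) against a running couchdb would send the document. If a document with the same _id already exists, couchdb answers 409 Conflict, so re-running the load reports an error for each existing concert rather than silently overwriting it.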


OK, we now have data in our database. In our case this is 2,014 documents (i.e. 2,014 separate Grateful Dead concerts). Using Futon we see the database as:


In Futon, we can look at our data:


And we can drill in on an individual document (for example the primordial recording on 1965-11-01):



Of course, we need to be able to return our documents with a simple HTTP GET request. Further, we will want to return only those fields we actually need from a document (i.e. we need to shape our data to minimize bandwidth). We do that in couchdb with a view, which is defined in a design document. We can define multiple views in a single design document and can have multiple design documents in a given database.

For our first view let’s set the task to return all documents which occur on a given month and day (regardless of year). This is the filter. We want to shape our data to return only a subset of fields: the venue, the IAKey and the document tracks. The key for this view will be the month+day (in MMDD format). Our design document might look like:

{
  "_id": "_design/basic",
  "_rev": "19-fd2c9b34d2536ce1f187ab2d4e5413de",
  "views": {
    "MonthDay": {
      "map": "function(doc){emit(doc._id.substr(5,2)+doc._id.substr(8,2),[doc.venue , doc.IAKey, doc.tracks ])}"
    }
  }
}


What this view (called MonthDay) does is map each document (doc) to a key (doc._id.substr(5,2)+doc._id.substr(8,2)) and an array of return fields:

[doc.venue , doc.IAKey, doc.tracks ]

Note that doc.tracks itself returns an array of tracks.
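Because the map function is plain Javascript, its key logic is easy to check offline. Here is a small RUBY sketch that mirrors it. Ruby’s String#[start, length] plays the role of Javascript’s substr(start, length). It is an illustration only, not something couchdb runs.

```ruby
# Mirrors the MonthDay map function: "1966-07-30" -> "07" + "30" -> "0730".
def month_day_key(doc_id)
  doc_id[5, 2] + doc_id[8, 2]
end

# The row the view would emit for one document: [key, [venue, IAKey, tracks]].
def month_day_row(doc)
  [month_day_key(doc['_id']), [doc['venue'], doc['IAKey'], doc['tracks']]]
end

doc = { '_id' => '1966-07-30', 'venue' => 'P.N.E. Garden Auditorium',
        'IAKey' => 'gd1966-07-30.sbd.GEMS.94631.flac16', 'tracks' => [] }
p month_day_row(doc)
# prints ["0730", ["P.N.E. Garden Auditorium", "gd1966-07-30.sbd.GEMS.94631.flac16", []]]
```

Since every concert on August 16 maps to the key "0816", regardless of year, one key lookup on this view returns all of them.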

We can use Futon to test our work and find the results for all New Year’s Day Grateful Dead concerts. The first document looks like:



(additional lines are omitted from this screen snap).


As an HTTP Get Verb we would write:

This command uses:


The View: _design/basic/_view/MonthDay

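startkey: this is a keyword parameter which tells couchdb to go to the b-tree which represents the MonthDay view, find the first key equal to (or, failing that, just after) the startkey value, and continue through the tree from there. Without an endkey (or a limit), couchdb keeps returning rows to the end of the view. To return only the rows for one exact key, use the key parameter instead.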

If we wanted a range of Keys (say all holiday season Concerts) we can use the startkey and the endkey parameters:

startkey gives us a starting key in the b-tree, and couchdb returns all rows from there up to (and including) the b-tree key defined by endkey.
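One detail that trips people up: couchdb expects key, startkey and endkey values as JSON. A string key must therefore arrive wrapped in double quotes, which are URL-encoded as %22. Here is a small RUBY sketch that builds such query strings. The database name gdead is hypothetical, and the December 20 to December 31 range is just one example of a holiday-season query.

```ruby
require 'json'
require 'uri'

# Hypothetical database name; the design document and view names are the ones above.
VIEW_PATH = '/gdead/_design/basic/_view/MonthDay'

# couchdb reads key/startkey/endkey as JSON values, so string keys are JSON-encoded
# (wrapped in double quotes) before being URL-encoded.
def view_query(key: nil, startkey: nil, endkey: nil)
  params = { 'key' => key, 'startkey' => startkey, 'endkey' => endkey }
           .reject { |_, v| v.nil? }
           .map { |k, v| [k, JSON.generate(v)] }
  "#{VIEW_PATH}?#{URI.encode_www_form(params)}"
end

puts view_query(key: '0101')                      # New Year's Day, all years
puts view_query(startkey: '1220', endkey: '1231') # a holiday-season range
```

Without the quotes, couchdb would try to parse 0101 as a JSON number and reject or misread the key.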



For more details see the Definitive Guide to CouchDB chapter: Finding Your Data With Views.  The key to simple view definition is defining an appropriate lookup key and defining what data to return.  We can define a simple listing view as:

"Jump": { "map": "function(doc){emit(doc._id,'['+doc.venue+'] ' +''+doc.IAKey)}" },
This returns a lookup key equal to the original key used in the database (doc._id) and the data as venue (doc.venue)
and a URL for the concert recording (''+doc.IAKey). Now if we want to return all
concerts in August, 1969 we can issue the GET:

This will return the rows as:

{"total_rows":2013,"offset":156,"rows":[
{"id":"1969-08-02","key":"1969-08-02","value":"[Family Dog at the Great Highway] …"},
{"id":"1969-08-03","key":"1969-08-03","value":"[Family Dog at the Great Highway] …"},
{"id":"1969-08-16","key":"1969-08-16","value":"[Woodstock Music] …"},
{"id":"1969-08-21","key":"1969-08-21","value":"[Aqua Theater] …"},
{"id":"1969-08-23","key":"1969-08-23","value":"[Pelletier Farm] …"},
{"id":"1969-08-28","key":"1969-08-28","value":"[Family Dog at the Great Highway] …"},
{"id":"1969-08-29","key":"1969-08-29","value":"[Family Dog at the Great Highway] …"},
{"id":"1969-08-30","key":"1969-08-30","value":"[Family Dog at the Great Highway] …"},
…
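If we wanted to consume this response in RUBY rather than in the browser, JSON.parse turns it back into Hashes and Arrays. Below is a minimal sketch that uses a shortened copy of the rows above, with the link part of each value left out. In the response, total_rows counts every row in the view, and offset is the position of the first row returned.

```ruby
require 'json'

# Shortened copy of the view response above (links omitted from the values).
body = '{"total_rows":2013,"offset":156,"rows":[' \
       '{"id":"1969-08-02","key":"1969-08-02","value":"[Family Dog at the Great Highway]"},' \
       '{"id":"1969-08-16","key":"1969-08-16","value":"[Woodstock Music]"}]}'

result = JSON.parse(body)
result['rows'].each do |row|
  # each value starts with "[venue]"; capture the text between the brackets
  venue = row['value'][/\[(.*?)\]/, 1]
  puts "#{row['id']}  #{venue}"
end
```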


Get it? Good.

What Would Dweelze Do?

Now let’s return to RUBY and write a web page which will display concert and track data for any given day in Grateful Dead history, using the MonthDay view as defined above:

"MonthDay": {
  "map": "function(doc){emit(doc._id.substr(5,2)+doc._id.substr(8,2),[doc.venue , doc.IAKey, doc.tracks ])}"
}

Let’s use RUBY to return data on concerts from all years for the month and day of today’s date. In Ruby we need to define a controller and an erb HTML file to accomplish this. Let’s say our page will be called player. Our controller might look like:

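require 'rest-open-uri'

require 'json'

class PlayerController < ApplicationController
  def player
    # ... (the body that builds the view URL for today's month and day, using a
    #      [5..-1] slice, and sets @parm was not preserved)
  end

  def returnJSONString(url)
    jsonString = ''
    open(url) do |x|
      x.each_line do |y|
        jsonString += y   # accumulate the response body one line at a time
      end
    end
    return jsonString
  end
end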


This returns the JSON returned from couchdb unaltered as a parameter to the erb page for player (player.html.erb).  If we wanted to work with the data within RUBY we would need to change the JSON back into a RUBY format by calling:


For our purposes, however, we want to pass the data directly to the browser and process it there using JavaScript. In almost all cases this will be faster than converting the JSON back to RUBY and then formatting the data on the erb page. In our first pass the route map for our page will be simple:

map.connect 'player', :controller => 'player', :action => 'player'

and our first pass at an HTML page will look like this:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Player: On This Day In Grateful Dead History</title>
<script type="text/javascript">
gdData=<%=  @parm  %> ;
</script>
<h3>On This Day In Grateful Dead History</h3>

This page does absolutely NOTHING… except get our JSON data to the Javascript on the browser page. And that, dear reader, is as far as we need to go with today’s post.



REST, Ruby On Rails, CouchDB and Me – Part 5 Getting The Data Ready for CouchDB

Part 0 – REST, Ruby On Rails, CouchDB and Me

Part 1 – Ruby, The Command Line Version

Part 2 – Aptana IDE For Ruby

Part 3 CouchDB Up and Running on Windows

Part 4 – CouchDB, Curl and RUBY

Part 5 – Getting The Data Ready for CouchDB

Part 6 – Getting The Data Into And Out Of CouchDB

Part 7 – JQUERY,JPlayer and HTML5

The Internet Archive and the Grateful Dead

The Internet Archive (IA) is a 501(c)(3) non-profit corporation dedicated to disseminating public domain digital artifacts. These include books, videos of all kinds (TV shows, shorts and feature films) and audio recordings of all kinds (musical and spoken word). One of their most popular projects is a truly huge collection of Grateful Dead concert recordings. One of the most visited pages of the Internet Archive’s web site is the “Grateful Dead Shows on This Day In History” page. It lists, and allows playback of, any Grateful Dead concerts in their collection for whatever day you visit the site. Did I say that IA has a large number of Grateful Dead concert recordings? There are recordings of around 2,000 separate concert dates. Any given concert may be represented by multiple recordings from a variety of sources: soundboard recordings, and audience recordings made with professional and amateur equipment. Original media range from cassette tapes to 7 inch reel to reel and digital media. If you are going to be working with the IA Grateful Dead collection, please review the FAQ on the collection’s policy notes as well as the special notes here.

IA uses a very sophisticated repository of meta data and an advanced query engine to allow retrieving both the meta data and the recordings. Meta data can be retrieved directly using the “advanced search” engine. On the day I started this post I visited IA and used the “Grateful Dead Shows on This Day In History” page. The query returned data on 8 concerts (and 25 recordings of those 8 concerts). A partial page image is given below:


Clicking on any of these entries moves us to a second screen in order to play the concert recording.  A screen shot of the playback screen looks like this:


Looking closer at the second screen we see the music player:


Can we design a faster, simpler and better looking interface into the Grateful Dead Archive? Can couchDB help us? The first question will be addressed in a later post. This post will look at how couchDB can help us achieve a faster, more efficient information system. IA does a super job of serving up the music files on demand; there is no reason to duplicate their storage system. However, IA is fairly slow to serve up meta data (such as the results of the “Grateful Dead Shows on This Day In History” query). Abstracting the IA meta data into a couchDB database will allow us to serve up the meta data much faster than the IA query system.

Getting Data Into CouchDB

Our basic plan for using RUBY to get data from IA and into couchdb consists of:

  1. Prepare a URL query request to get the basic recording meta data (not the track meta data);
  2. Submit A GET request to IA using the URL query;
  3. Parse the XML returned to get at the individual Concert meta data fields;
  4. Select the BEST recording for any given concert (more on this below);
  5. Prepare a URL to request track listing XML file based on the IA Primary Key of the selected concert recording;
  6. Submit a GET request to IA;
  7. Parse the XML returned to get at the individual track meta data fields;
  8. Create a ruby object which can be safely serialized to JSON;
  9. Serialize the object to a JSON string (i.e. a JSON document);
  10. Do a POST request to insert a JSON document for each concert into the couchdb database;
  11. Create couchDB views to allow optimized data retrieval; and
  12. Create a couchDB view to optimize retrieval of recordings for all years for an arbitrary Month and Day (this duplicates the data provided by the “Grateful Dead Shows on This Day In History” selection in the Internet Archive).

Note that we are neither accessing nor storing the actual music files. Before discussing how this plays out in practice, let’s define our JSON couchDB document. We will cover items one through eight in this post. We turn to items nine through twelve in the next post.

CouchDB Document Schema

CouchDB databases start with documents as the basic unit. Typically a couchdb-based application will have one database holding one or more variant document types. There will be one or more design documents which provide multiple views, show functions and map functions as necessary to facilitate the application. We will use a single document type which represents an abstract of the meta data contained in IA for individual recordings (we are going to select the one ‘best’ recording per concert). Our couchdb database will hold one document per concert. The tracks (actually the track meta data) will be stored as an array within the concert document. We will populate the couchdb database in a single background session, pulling meta data (NOT THE MUSIC FILES) from IA. We will also include the IA publication date in the document so we can update our database when (if) new recordings are added to the Grateful Dead collection on IA.

Here are the document fields which we will use:

Field | Notes | Typical values
_id | couchdb primary key. We will use a natural key: a string representation of the concert date. | 1969-07-04
_rev | revision number provided by couchDB | 1-6ea272d20d7fc80e51c1ba53a5101ac1
IAKey | Internet Archive key for this recording | gd1965-11-01.sbd.bershaw.5417.sbeok.shnf
pubdate | date when the Internet Archive published the recording to the web | 2009-03-14
venue | where the concert took place | Fillmore East Ballroom
description | free text describing the concert, provided by the uploader | Neal Cassady & The Warlocks 1965 1. Speed Limit studio recording/Prankster production tape circa late 1965
cm | boolean: recording by Charlie Miller, used to select the ‘best’ recording | true or false
sb | boolean: recording was made from a soundboard, used to select the ‘best’ recording | true or false
mx | boolean: a matrix style recording, used to select the ‘best’ recording | true or false
tracks | an array of meta data for each track of the recording | see below
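The post does not spell out the exact rule for choosing the ‘best’ recording. The following is therefore only a hypothetical sketch of how the cm, sb and mx flags could drive that choice. It groups candidate recordings by concert date and prefers a Charlie Miller transfer, then a soundboard, then a matrix recording. The Candidate struct, the preference order and the sample keys are all made up for illustration.

```ruby
# Hypothetical candidate record: concert date, IAKey and the three flags from the table above.
Candidate = Struct.new(:date, :iakey, :cm, :sb, :mx)

# Assumed preference order (not from the original post): cm first, then sb, then mx.
def rank(c)
  [c.cm ? 1 : 0, c.sb ? 1 : 0, c.mx ? 1 : 0]
end

# Keep one recording per concert date: the one with the highest rank.
def best_per_concert(candidates)
  candidates.group_by(&:date).transform_values { |list| list.max_by { |c| rank(c) } }
end

recordings = [
  Candidate.new('1966-07-30', 'hypothetical-aud-copy', false, false, false),
  Candidate.new('1966-07-30', 'hypothetical-sbd-copy', false, true,  false),
  Candidate.new('1969-08-16', 'hypothetical-mx-copy',  false, false, true)
]

best_per_concert(recordings).each { |date, rec| puts "#{date}: #{rec.iakey}" }
```

Comparing arrays lexicographically keeps the rule easy to change. Reordering the three elements in rank reorders the preferences.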

Each track in the tracks array formally looks like:

Field | Notes | Typical value
IAKey | The Internet Archive key for this track. This key is unique within a given recording (see the IAKey above). | gd1965-11-01.sbd.bershaw.5417.sbeok.shnf/Acid4_01_vbr
track | track number | 02
title | the song title | Cold Rain and Snow
time | the length of the track in minutes and seconds | 09:48

Let’s call everything except the tracks our BASE data and the track data our TRACK data.

We insert documents to the database (using an HTTP post) as JSON so a typical document would look like this in JSON format:

{
  "_id": "1966-07-30",
  "IAKey": "gd1966-07-30.sbd.GEMS.94631.flac16",
  "pubdate": "2008-09-22",
  "venue": "P.N.E. Garden Auditorium",
  "description": "Set 1 Standing On The Corner I Know You Rider Next Time You See Me",
  "cm": false,
  "sb": true,
  "mx": false,
  "tracks": [
    {
      "IAKey": "gd1966-07-30.sbd.GEMS.94631.flac16/gd1966-07-30.d1t01_vbr",
      "track": "01",
      "title": "Standing On The Corner",
      "time": "03:46"
    },
    {
      "IAKey": "gd1966-07-30.sbd.GEMS.94631.flac16/gd1966-07-30.d1t02_vbr",
      "track": "02",
      "title": "I Know You Rider",
      "time": "03:18"
    },
    {
      "IAKey": "gd1966-07-30.sbd.GEMS.94631.flac16/gd1966-07-30.d1t03_vbr",
      "track": "03",
      "title": "Next Time You See Me",
      "time": "04:00"
    }
  ]
}
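Here is a minimal RUBY sketch of building exactly this shape. The BASE fields go into a Hash, the TRACK data becomes an Array of Hashes, and the json GEM serializes the whole thing without custom handlers. The Recording and Track structs are hypothetical stand-ins for the GDRecording objects, not the original class.

```ruby
require 'json'

# Hypothetical stand-ins for the recording and track objects.
Recording = Struct.new(:date, :iakey, :pubdate, :venue, :description, :cm, :sb, :mx)
Track     = Struct.new(:iakey, :track, :title, :time)

# Only Hashes, Arrays, strings and booleans: the json GEM serializes these directly.
def make_json_full(recording, tracks)
  doc = {
    '_id'         => recording.date,
    'IAKey'       => recording.iakey,
    'pubdate'     => recording.pubdate,
    'venue'       => recording.venue,
    'description' => recording.description,
    'cm'          => recording.cm,
    'sb'          => recording.sb,
    'mx'          => recording.mx,
    'tracks'      => tracks.map { |t| { 'IAKey' => t.iakey, 'track' => t.track,
                                        'title' => t.title, 'time' => t.time } }
  }
  JSON.generate(doc)
end

rec = Recording.new('1966-07-30', 'gd1966-07-30.sbd.GEMS.94631.flac16', '2008-09-22',
                    'P.N.E. Garden Auditorium',
                    'Set 1 Standing On The Corner I Know You Rider Next Time You See Me',
                    false, true, false)
tracks = [Track.new('gd1966-07-30.sbd.GEMS.94631.flac16/gd1966-07-30.d1t01_vbr',
                    '01', 'Standing On The Corner', '03:46')]

json_string = make_json_full(rec, tracks)
puts json_string
```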


Hacking The Internet Archive: Getting Data From IA and Into CouchDB:

Here is the URL to obtain the page for “Grateful Dead Shows on This Day In History”:

This is a simple GET request with a query for the IA “collection” of Grateful Dead items, filtered on the date string 19??-08-04 and sorted descending by concert date. This GET returns an HTML page. This type of interface is known as an HTTP RPC interface. RPC (Remote Procedure Call) interfaces are not pure REST interfaces. They are somewhat RESTful in that they allow us to make a data request using a late bound, loosely coupled HTTP call. See here and here for more theoretical background on RPC calls. IA provides an “Advanced Search” function which allows us to return data for an arbitrarily complex query in one of several data formats other than HTML. We selected XML as the format for our work here. XML is the traditional format for HTTP RPC, but other formats may be better for certain applications. Unfortunately IA does not directly document the format of the RPC data request, but they do provide a QBE (query by example) page to build the request. The page looks like this:


Using this screen we can compose an HTTP RPC request which mimics the URL produced by “Grateful Dead Shows on This Day In History”. With a little brain effort and experimentation, we can understand how to compose requests without using the QBE screen. We do this by feeding the RPC request query back into advanced search and selecting XML as the output format, as shown here:


This produces an example of the HTTP RPC request which will return our desired data in our desired format. Thus we generate a URL-encoded RPC request like:


where we replace #{_dateString} with a date string like 19??-08-08. Of course, to get one year’s worth of data we could use a date string like 1968-??-??. It is a simple extension of the query language to replace the singular date request date%3A#{_dateString} with a date range.

The 19??-08-08 request returns Grateful Dead recording data for all years of the last century which were recorded on 08-08. The XML output returned to the caller looks like:


In a more graphic format the output looks like:


Within Ruby we will need to make the HTTP GET request with a desired date range, transform the body of the response into an XML document and use XPATH to parse the XML and retrieve the meta data values for each recording (see below). There is NOTHING inherently wrong with this RPC interface. It is flexible and allows us to select only the data fields we are interested in and to return data only for the dates we wish. RUBY supports neither JSON nor XML natively, so the XML format of the data is as good as any other, and numerous tools exist in RUBY to manipulate XML data. I wish RUBY had a more native interface for JSON, but it does not.
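Here is a minimal sketch of composing that request in RUBY. The endpoint and parameter names (advancedsearch.php, q, fl[], rows, output) and the collection name GratefulDead are assumptions, based on the kind of URL the advanced search QBE page produces. Check them against the URL the page actually generates for you.

```ruby
require 'uri'

# Assumed endpoint and parameter names; verify against the QBE page's generated URL.
SEARCH_BASE = 'https://archive.org/advancedsearch.php'

# date_pattern can be a wildcard string such as "19??-08-08" or "1968-??-??".
def ia_query_url(date_pattern, fields: %w[identifier date publicdate], rows: 500)
  query  = "collection:GratefulDead AND date:#{date_pattern}"
  params = [['q', query]] +
           fields.map { |f| ['fl[]', f] } +
           [['rows', rows.to_s], ['output', 'xml']]
  "#{SEARCH_BASE}?#{URI.encode_www_form(params)}"   # encodes ':' as %3A, '?' as %3F, space as +
end

puts ia_query_url('19??-08-08')
```

Letting URI.encode_www_form do the encoding avoids hand-editing %3A and %3F sequences when the date pattern or field list changes.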

At this point, we do not have meta data about individual tracks in a given recording. It turns out that we can get this data, but not through an HTTP RPC request. It turns out, dear reader, that if we have the IAKey for the recording we can obtain an XML file with track meta data by making the following call: {IAKEY}/{IAKEY}_files.xml.

This file contains assorted XML data. Its content varies with the formats in which IA makes the individual tracks available, and it is served via an HTTP redirect. This is not an RPC call, so we are far from a RESTful interface here. We have no control over the fields or range of the data included in this call. It is all or nothing. But at least the XML format is simple to manipulate. With the IAKey for an individual recording in hand, and by making some reasonable guesses, we can use XPATH to parse the XML file of track data and compose the TRACKS array for our couchDB document. A single entry for the high bit rate mp3 track recording looks like:

<file name="gd89-08-04d2t01_vbr.mp3" source="derivative">
<creator>Grateful Dead</creator>
<album>1989-08-04 – Cal Expo Amphitheatre</album>
<format>VBR MP3</format>
</file>

Note that we have the IAKey for the track (gd89-08-04d2t01) as part of the name attribute.
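Here is a minimal sketch of that parsing step with REXML, run against a small sample modeled on the entry above. It keeps only the VBR MP3 derivatives and derives the track key from the name attribute by dropping the .mp3 extension. This gives the same ..._vbr style of key shown in the TRACK table. The second file entry (the Flac original) and the title child are made up for illustration.

```ruby
require 'rexml/document'

# Sample modeled on the <file> entry above; the Flac entry and <title> are hypothetical.
xml = <<~XML
  <files>
    <file name="gd89-08-04d2t01_vbr.mp3" source="derivative">
      <creator>Grateful Dead</creator>
      <album>1989-08-04 - Cal Expo Amphitheatre</album>
      <format>VBR MP3</format>
      <title>Hypothetical Song Title</title>
    </file>
    <file name="gd89-08-04d2t01.flac" source="original">
      <format>Flac</format>
    </file>
  </files>
XML

doc = REXML::Document.new(xml)
tracks = []
doc.elements.each('files/file') do |file|
  fmt = file.elements['format']
  next unless fmt && fmt.text == 'VBR MP3'        # keep only the VBR MP3 derivatives
  name  = file.attributes['name']
  title = file.elements['title']
  tracks << { 'IAKey' => File.basename(name, '.mp3'),   # "gd89-08-04d2t01_vbr"
              'title' => title ? title.text : '' }
end

tracks.each { |t| puts "#{t['IAKey']}: #{t['title']}" }
```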


Using a background Ruby Process to Read the Data

The following RUBY GEMS are required to complete this step:

rest-open-uri : This GEM extends open-uri to support the POST, PUT and DELETE HTTP commands

json : This GEM handles serialization and de-serialization of a limited subset of RUBY into JSON strings.

From the standard RUBY library we will also be using

rexml : This library creates XML documents from XML strings and supports XPATH, which we will use to read the XML documents from IA

Our first step is to get the data via HTTP and parse the XML file returned to find individual recordings. There will (in most cases) be multiple recordings per concert (per date), and we want to retain only the “best” one for the database.

In pseudo Ruby code:

require 'rest-open-uri'
require 'rexml/document'

def initialize(_dateString)
  # @uri: the IA query URL for _dateString (its construction is not shown)
  # HTTP GET: build a string from the response body, then turn it into an XML node tree
  xmlString = ''
  open(@uri) do |x|
    x.each_line do |y|
      xmlString = xmlString + y
    end
  end # open
  if xmlString == ''
    puts 'No String Returned From Internet Archive'
  end
  @IAXMLDocument = REXML::Document.new(xmlString) # turn the string into an XML document
end
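The read-and-parse step in `initialize` can be exercised without a network call, because anything that responds to `each_line` will do. A sketch, assuming a `StringIO` stands in for the object that `open(@uri)` yields and a made-up miniature response; `xml_from_lines` is a hypothetical name, and unlike the code above it raises instead of printing when the body is empty:

```ruby
require 'stringio'
require 'rexml/document'

# Rebuild the response body line by line, then parse it into an XML node tree.
def xml_from_lines(io)
  xml_string = ''
  io.each_line { |line| xml_string += line }
  raise 'No String Returned From Internet Archive' if xml_string.empty?
  REXML::Document.new(xml_string)
end

# Made-up miniature of the response/result/doc shape, standing in for the HTTP body.
body = StringIO.new("<response>\n<result>\n<doc/>\n</result>\n</response>\n")
doc  = xml_from_lines(body)
puts doc.elements.to_a('response/result/doc').size   # prints 1
```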

Now we need to loop through the XML document, pull out each ‘doc’ section using XPath, and read the metadata for that recording from each section.

#use XPath to find each response/result/doc node and yield it

def get_recordings(document)
  document.elements.each('response/result/doc') do |doc|
    yield doc
  end
end

#get the XML document and yield each recording

def get_record(xmldoc)
  get_recordings(xmldoc) do |doc|
    yield doc
  end
end

#general-purpose XPath method: return element.text (the metadata value) of the first
#element matching an arbitrary XPath expression, or '' if nothing matches

def extract_ElmText(doc, xpath)
  doc.elements.each(xpath) { |element| return element.text.to_s }
  ''
end

def worker(xmldoc)

  #main loop: one pass per 'doc' node (i.e. per recording)
  get_recordings(xmldoc) do |doc|
    # {extraction of _date, _title and the other metadata fields omitted}
    _pubdate = extract_ElmText(doc, 'date[@name="publicdate"]')[0..9]  # publicdate may be missing; checked below

    if _pubdate.length == 0
      puts "No Publication Date: #{_date} #{_title}"
    end

    #make a Ruby class object to hold one recording: _date, _description, _tracklist, _title, _keylist, _pubdate, _uri

    #save the recording class objects in an array
  end
end
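To see the yield-based helpers and the empty publication date case working together, here is a small self-contained sketch. The sample XML is made up; only the `response/result/doc` path and the `publicdate` field come from the code above, and the `identifier` field is an assumption about the response layout.

```ruby
require 'rexml/document'

# Made-up miniature of an IA search response: the second doc lacks a publicdate.
SAMPLE = <<~XML
  <response><result>
    <doc><str name="identifier">rec-a</str><date name="publicdate">2004-06-01T00:00:00Z</date></doc>
    <doc><str name="identifier">rec-b</str></doc>
  </result></response>
XML

# Yield each response/result/doc node.
def get_recordings(document)
  document.elements.each('response/result/doc') { |doc| yield doc }
end

# Text of the first element matching xpath, or '' when nothing matches.
def extract_ElmText(doc, xpath)
  doc.elements.each(xpath) { |element| return element.text.to_s }
  ''
end

pubdates = []
get_recordings(REXML::Document.new(SAMPLE)) do |doc|
  pubdates << extract_ElmText(doc, 'date[@name="publicdate"]')[0..9]
end
p pubdates   # prints ["2004-06-01", ""]
```

The second doc has no publicdate, so it yields the empty string that the length check in worker catches.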



In this code the ‘worker’ method calls the helper methods to:

0) do the HTTP GET that makes the RPC request, and read the response body one line at a time;

1) join the lines into a single string and convert the string into an XML document for processing by XPath;

2) loop through the doc nodes of the XML tree and extract the values of the metadata fields;

3) pass the metadata values to a Ruby class (GDRecording), which holds this metadata for later processing; and

4) finally, temporarily store the recordings in an array for the next processing step.
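The GDRecording class itself is not shown. A hypothetical stand-in, assuming a plain `Struct` with the fields listed in the worker comment is enough, could look like this; the values are made up:

```ruby
# Hypothetical stand-in for GDRecording: one Struct member per field in the worker comment.
GDRecording = Struct.new(:date, :description, :tracklist, :title, :keylist, :pubdate, :uri)

recordings = []   # the array that holds recordings for the next processing step
recordings << GDRecording.new('1989-08-04', 'made-up description', [], 'made-up title',
                              ['soundboard'], '2004-06-01', nil)

puts recordings.size                # prints 1
puts recordings.first.to_h[:date]   # prints 1989-08-04
```

A Struct also converts to a Hash with `to_h`, which is convenient when the recording later has to be serialized to JSON.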

Note that these routines work whether the query returns a single day (with multiple recordings), multiple days or even the whole dataset!  What is essential is that we process the file as N ‘doc’ subtrees (each representing a single recording) and have the recording date (at least), so that we can group our data and extract the ‘best’ recording within each date group.

Our next step will be to group the recordings by day (i.e. by concert) and apply our own filter to select a single ‘best’ recording for each concert.
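As an aside, grouping by concert date on its own is a one-liner with `Enumerable#group_by`; the approach below gets the same effect by sorting instead. A sketch, assuming a hypothetical `Rec` struct with `date` and `key` fields in place of GDRecording:

```ruby
# Hypothetical minimal record: only the fields needed for grouping.
Rec = Struct.new(:date, :key)
recs = [Rec.new('1989-08-04', 'a'), Rec.new('1989-08-05', 'b'), Rec.new('1989-08-04', 'c')]

by_concert = recs.group_by(&:date)   # Hash: date => [recordings for that concert]
by_concert.each { |date, list| puts "#{date}: #{list.map(&:key).join(', ')}" }
# prints:
# 1989-08-04: a, c
# 1989-08-05: b
```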

Shake and Bake:  Finding A ‘Best’ Recording.


What is the best Grateful Dead concert?  Why, the first one I went to, of course.  Just ask any Deadhead and you will probably get the same answer.  But what is the best recording of any given GD concert? My approach is very simple:

  • Most recently posted recordings are better than older recordings. (least important criterion)
  • Soundboard recordings are better than audience recordings.
  • Matrix recordings are even better.
  • Recordings mixed by Charlie Miller are best of all. (most important criterion)

Well, these are MY criteria.  Whatever your criteria, as long as they are hierarchical you can code the selection in a very trivial manner.  If we have a field in each recording for the concert date and a field for each selection criterion (we derive these from the keywords field in IA), we sort the recordings by date and then by each of the criteria from most important (Charlie Miller in my case) to least important (date posted), and then select the first recording in sort order within each date group.  In Ruby the sort of our list of recordings is trivial to code and easy to manipulate (add new criteria or change the priority of criteria).  The sort statement looks like this:

@list.sort! { |a,b| ( ) <=> ( )   }
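The arrays inside that sort block are not shown, but the general technique can be sketched: build an array of keys for each recording, most important first, and let `Array#<=>` compare them element by element. Everything named here (`Rec`, `rank`, the flag fields) is hypothetical; the flags stand for values derived from the IA keywords field.

```ruby
# Hypothetical record with one flag per selection criterion.
Rec = Struct.new(:key, :date, :miller, :matrix, :sbd, :pubdate)

# Booleans do not define <=>, so map each flag to 0 (has it) or 1 (lacks it);
# the smaller value then sorts first.
def rank(flag)
  flag ? 0 : 1
end

list = [
  Rec.new('aud-new', '1989-08-04', false, false, false, '2010-01-01'),
  Rec.new('sbd-old', '1989-08-04', false, false, true,  '2003-01-01'),
  Rec.new('miller',  '1989-08-04', true,  false, true,  '2005-01-01'),
  Rec.new('sbd-new', '1989-08-04', false, false, true,  '2008-01-01')
]

# Compare arrays element by element: date, then Miller, matrix, soundboard, and
# finally posting date. b.pubdate and a.pubdate are swapped so newer posts sort first.
list.sort! do |a, b|
  [a.date, rank(a.miller), rank(a.matrix), rank(a.sbd), b.pubdate] <=>
    [b.date, rank(b.miller), rank(b.matrix), rank(b.sbd), a.pubdate]
end
p list.map(&:key)   # prints ["miller", "sbd-new", "sbd-old", "aud-new"]
```

Adding a criterion or changing its priority is a matter of inserting or moving one element in both arrays.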

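Once the list is sorted, we create a list of best recordings:

def newSelect
  if @list == nil or @list.count == 0
    puts 'No Recordings.'
    return []
  end
  best = []
  dateGroup = nil
  @list.each do |rec|                 # @list is sorted by date, then best-first
    next if rec.date == dateGroup     # rec.date: assumed accessor for the concert date
    best << rec                       # first recording seen for a date is the 'best'
    dateGroup = rec.date
  end
  best
end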

Note that this code is not only simple but also independent of the selection criteria we are using.
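Because the list is sorted by date first and best-first within each date, the same selection can also be expressed with `Array#uniq`, which keeps the first element for each distinct block value. A sketch with a hypothetical `Rec` struct:

```ruby
# Hypothetical record with just a key and a concert date.
Rec = Struct.new(:key, :date)

# Already sorted: by date, then best-first within each date.
sorted = [Rec.new('miller', '1989-08-04'), Rec.new('aud', '1989-08-04'),
          Rec.new('sbd', '1989-08-05')]

# uniq keeps the first occurrence per date, i.e. the best recording of each concert.
best = sorted.uniq(&:date)
p best.map(&:key)   # prints ["miller", "sbd"]
```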

Now that we have a list of the recordings we are interested in, we can get the XML file of track metadata for each one with a simple GET call using the IAKey discussed above, and parse that file for the track metadata.  Much of this code duplicates the XML code presented above, so we show only a short section that uses a slightly different XPath pattern (nested element loops):

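xmlString = ''
open(filesURI) do |x|
  x.each_line { |y| xmlString = xmlString + y }
end
REXML::Document.new(xmlString).elements.each('files') do |doc|
  doc.elements.each('file') do |file_elm|
    title = trackString = lengthString = nil   # declare first so the blocks below assign to these
    file_elm.elements.each('title') { |element| title = element.text }
    file_elm.elements.each('track') { |element| trackString = element.text }
    file_elm.elements.each('length') { |element| lengthString = element.text }
    # {omitted code}
  end
end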
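The nested `files`/`file` loops can be packaged into a function that returns one Hash per track, ready to become the TRACKS array. A sketch with a made-up `_files.xml` miniature; `tracks_from` and `child_text` are hypothetical helpers, and only the element names `files`, `file`, `title`, `track` and `length` come from the code above:

```ruby
require 'rexml/document'

# Made-up miniature of a _files.xml document (values invented for illustration).
FILES_XML = <<~XML
  <files>
    <file name="t01_vbr.mp3" source="derivative"><title>Track One</title><track>1</track><length>05:12</length></file>
    <file name="t02_vbr.mp3" source="derivative"><title>Track Two</title><track>2</track><length>07:40</length></file>
  </files>
XML

# Text of the first child element called name, or nil if there is none.
def child_text(elm, name)
  child = elm.elements[name]
  child ? child.text : nil
end

# One Hash per <file> entry under the <files> root.
def tracks_from(xml_string)
  REXML::Document.new(xml_string).elements.to_a('files/file').map do |file_elm|
    {
      'name'   => file_elm.attributes['name'],
      'title'  => child_text(file_elm, 'title'),
      'track'  => child_text(file_elm, 'track').to_i,  # nil.to_i is 0 for files without a track number
      'length' => child_text(file_elm, 'length')
    }
  end
end

p tracks_from(FILES_XML).map { |t| t['track'] }   # prints [1, 2]
```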


Okay, now we have a (hash) list of recording metadata, each item of which contains a (hash) list of track metadata for that recording.  In our next post we will leave this unRESTful world behind and move into the RESTful world of couchDB when we:

  • Serialize each object to a JSON string (i.e. a JSON document);
  • Do POST requests to insert a JSON document for each concert into the couchDB database;
  • Create couchDB views to allow optimized data retrieval; and
  • Create a couchDB view to optimize retrieval of recordings for all years for an arbitrary month and day (this duplicates the data provided by the “Grateful Dead Shows on This Day In History” selection in the Internet Archive).

