Archive for the ‘Hypertext Transfer Protocol’ Tag

Microsoft MVC 3 and CouchDB – Low Level Get Calls   1 comment

I have written elsewhere on couchdb on Windows and using Ruby on Rails to interface to this system.  These posts can be found here:couchdb

Part 0 – REST, Ruby On Rails, CouchDB and Me

Part 1 – Ruby, The Command Line Version

Part 2 – Aptana IDE For Ruby

Part 3 CouchDB Up and Running on Windows

Part 4 – CouchDB, Curl and RUBY

Part 5 – Getting The Data Ready for CouchDB

Part 6 – Getting The Data Into And Out Of CouchDB

Part 7 – JQUERY,JPlayer and HTML5

In my work life I work in a Microsoft shop which for us means Microsoft servers for the back end and (mostly) pure HTML/AJAX frontends.  We are transitioning towards using Microsoft MVC 3 to provide HTTP end points for our AJAX calls.  Here are some notes from my POC work in this area.  My couch data consists of documents describing Grateful Dead concerts stored on the great site Internet Archive, if you have never visited the Internet Archive, please do so.  I back engineered the meta data of IA’s extensive collection of Dead concerts (over 2,000 concert recordings).  Visit the Grateful Dead Archive Home at the Internet Archive here.

CouchDB Documents and Views

I stored the meta data into a local couchdb (running on Windows XP).  The basic document I am storing is a master detail set for the ‘best’ recording for each Dead concert.  The Master part of the document contains the date, venue and other data of the concert and the detail set is an array of meta data on each song preformed during the concert.  As is traditional with couchdb, the documents are represented as JSON strings.  Here is what the document for the UR recording (1965-11-01) found on the IA:

{

“_id”: “1965-11-01”,tumblr_ld4jfoNw7F1qai6ym

“_rev”: “1-6ea272d20d7fc80e51c1ba53a5101ac1”,

“mx”: false,

“pubdate”: “2009-03-14”,

“sb”: true,

“venue”: “various”,

“tracks”: [

{

“uri”: “http://www.archive.org/download/gd1965-11- 01.sbd.bershaw.5417.sbeok.shnf/Acid4_01_vbr.mp3”,

“track”: “01”,

“title”: “Speed Limit”,

“time”: “09:48”

},

{

“uri”: “http://www.archive.org/download/gd1965-11-01.sbd.bershaw.5417.sbeok.shnf/Acid4_02_vbr.mp3”,

“track”: “02”,

“title”: “Neil Cassidy Raps”,

“time”: “02:19”

}

]

}

Couchdb allow the creation of views which are binary trees with user defined Keys and user defined sub sets of the document data.  If one wanted to return the venue and the tracks for each concert for a given Month and Day (across all years) the view created in couchdb would look like:

“MonthDay”: {

“map”: “function(doc){emit(doc._id.substr(5,2)+doc._id.substr(8,2),[doc.venue , doc.IAKey, doc.tracks ])}”

}

This view allows us to use and HTTP GET to pass in a monthday key (e.g. “1101”) and get back (as a JSON array)

the date (MMDDYY: doc._id.substr(5,2)+doc._id.substr(8,2))

the venue (doc.venue);

the AI URI of the concert (doc.IAKey); and

an array of track data (doc.tracks)

MVC URL Routing Maps

Although we could call couchdb directly from the browser, we normally work through a gateway system for security, so we will build a shim to sit between the browser and couchdb.  This allows us to flow the authentication / authorization stack separately from couchdb’s security system.  In MS MVC we can create a new HTTP endpoint for AJAX calls (our shim) is a very simple manner. Let’s create an endpoint which will look like:

http:\\{our server path}\DeadBase\MonthDay\{month}\{day}

where vacuum_routing

http:\\{our server path}\DeadBase\MonthDay\111

would request month:11 and day:01 concerts.  In MVC we can declare this routing as:

routes.MapRoute(

“MyMonthDay”,

“{controller}/{action}/{month}/{day}”, 

new { controller = “DeadBase”, action = “RestMonthDay”,null} );

Done.  Interestingly in MVC 3 this route definition will accept either the form:

http:\\{our server path}\DeadBase\MonthDay\{month}\{day} ; or

http:\\{our server path}\DeadBase\MonthDay?month=”??”&day=”??”

In the second form,  parameter order does not matter, but case does; quotation marks are optional and need to be dealt with internally by the action method.

either of these call will resolve to the same controller and method.

MVC Controller and Method HandlerMVC

We now need to create the shim which will be the target for the Http Endpoint.  In C# this looks like:

public class DeadBaseController : Controller

public string RestMonthDay( string month, string day )
{
//our shim code goes here

      }

    }

We able to use string as our return type because we will be calling couchdb which returns a string from of JSON by default.  As a side note if we wanted to use MVC 3 to return JSON from a native C# object our controller method takes a different form:

public JsonResult GetStateList()

{

List<ListItem> list = new List<ListItem>() {

new ListItem() { Value = “1”, Text = “VA” },

new ListItem() { Value = “2”, Text = “MD” },

new ListItem() { Value = “3”, Text = “DC” } };

return this.Json(list);

}

Our AJAX call from the browser does not need to know any of these details.  Here is one way to code the call in JavaScript using JQuery:

var url = urlBase + “?” + args;ajax

$.ajax({

url: url,

dataType: ‘json’,

success: okCallBack,

error: nookCallBack

});

function okCallBack(data) {

gdData = data;

//do something useful here

}

function nookCallBack(xhr, ajaxOptions, errorThrown) {

alert(“ErrorText:” + errorThrown + ” ” + “Error Code:” + xhr.status);

}

}

From Handler to CouchDB in C#

Here is the rest of the generic C# code to go from the Handler to CouchDB and back.

Clean the parameters and pass the call to a generic couchDB GET caller:mvc

image

Format the view name and parameter into couchdb format  and pass to the low level couchDB caller:

image

Classic Framework HTTP code to make the HTTP GET and return the results as a string back up the call stack:

image

We could (and did) take our Browser code from the Ruby on Rails project above and with minimum changes call our MVC shim.

Simple clean and fun.

Occupy your mind2

REST, Ruby On Rails, CouchDB and Me – Part 5 Getting The Data Ready for CouchDB   3 comments

Part 0 – REST, Ruby On Rails, CouchDB and Me

Part 1 – Ruby, The Command Line Version

Part 2 – Aptana IDE For Ruby

Part 3 CouchDB Up and Running on Windows

Part 4 – CouchDB, Curl and RUBY

Part 5 – Getting The Data Ready for CouchDB

Part 6 – Getting The Data Into And Out Of CouchDB

Part 7 – JQUERY,JPlayer and HTML5

The Internet Archive and the Grateful Dead

The Internet Archive (IA)  is a 501( c ) 3 non-profit corporation dedicated to disseminating of public domain digital artifacts.  This includes such items as books, videos of imagesall kinds (TV Shows, shorts and feature films) audio recordings of all kinds (musical and spoken word).    One of their most popular projects is a truly huge collection of Grateful Dead concert recordings.  One 418910of the most visited pages of the Internet Archive’s web site is the “Grateful Dead Shows on This Day In History” page, which lists and allows play back of any Grateful Dead concerts they have in their collection of whatever day you visit the site.  Did I say that there IA has a large number of Grateful Dead concert recordings?   The are recordings of around 2,000 separate concert dates.  Any given concert may be represented by multiple recordings from a variety of sources: soundboard, audience recordings using professional and amateur equipment.  Original media ranges from cassette tapes, to 7 inch reel to reel and digital media.  If you are going to be working with the IA Grateful Dead collection please review the FAQ on the collections policy notes as well as the special notes here.

IA uses a very sophisticated data repository of meta data and an advanced query engine to allow retrieving both the meta data and the recordings.  Meta data can be retrieved directly using the “advanced search” engine.  On the day I started this post I visited IA and used the “Grateful Dead Shows on This Day In History”  The query returned data on 8  concerts (and 25 recordings of those 8 concerts).  A partial page image is given below:

image

Clicking on any of these entries moves us to a second screen in order to play the concert recording.  A screen shot of the playback screen looks like this:

image

Looking closer at the second screen we see the music player:

image

Can we design a faster, simpler and better looking interface into the Grateful Dead Archive?  Can couchDB help us? gratefuldead_20070108135140 The the first question will be addressed in a later post. This current post will look at how couchDB can  help us achieve a faster more efficient information system.  IA does a super job of serving up the music files on demand – there is no reason to duplicate their storage system.   However, IA is fairly slow to serve up meta data (such as the results of the “Grateful Dead Shows on This Day In History” query) Abstracting the IA metadata into a CouchDB database will allow us to serve up the meta data much faster than the IA query system.

Getting Data Into CouchDB

Our basic plan for using RUBY to get data from IA and into couchdb consists of:

  1. Prepare a URL query request to get the basic recording meta data (not the track meta data);
  2. Submit A GET request to IA using the URL query;
  3. Parse the XML returned to get at the individual Concert meta data fields;ruby
  4. Select the BEST recording for any given concert (more on this below);
  5. Prepare a URL to request track listing XML file based on the IA Primary Key of the selected concert recording;
  6. Submit a GET request to IA;
  7. Parse the XML returned to get at the individual track meta data fields;
  8. Create a ruby object which can be safely serialized to JSON;
  9. Serialize the object to a JSON string (i.e. a JSON document);
  10. Do a POST request to insert a JSON document for each concert into the couchdb database;
  11. Create couchDB views to allow optimized data retrieval; and
  12. Create a couchDB view to optimize retrieval recordings for all years for an arbitrary Month and Day (this duplicates the data provided by the “Grateful Dead Shows on This Day In History” selection in the Internet Archive.

Note we are not accessing nor storing the actual music files.  Before discussing how this plays out in practice lets define our JSON couchDB document.  We will cover items one through eight in this post.  We turn to items nine through twelve in the next post.

CouchDB Document Schema

CouchDB databases start with documents as the basic unit.  Typically a couchdb based application will have one database holding one or more variant document types.  There will be one or more design documents which provide multiple views, show functions and map functions as necessary to facilitate the application.  We will use a single document which will represent a abstract of the meta data contained in IA for individual recordings ( we are going to select the one ‘best’ recording per concert).  Our couchdb database will hold one document per concert.   The tracks (actually the track meta data will be stored as arrays within the  concert document).  We will populate the couchdb database in a single background session  pulling meta data (NOT THE MUSIC FILES) from IA and we will  include the IA publication date in the document so we can update our database when (if) new recordings are added to IA in the Grateful Dead collection.

Here are the document fields  which we will use:

Field

Notes

Typical Values

_id couchdb primary key.  We will use a natural key: a string representation of the concert date. 1969-07-04
_rev revision number provided by couchDB 1-6ea272d20d7fc80e51c1ba53a5101ac1
IAKey Internet Archive Key for this Recording gd1965-11-01.sbd.bershaw.5417.sbeok.shnf
pubdate Internet Archive Date When Recording was published to the web 2009-03-14
venue Wherethe concert took place Fillmore East Ballroom
description free text describing the concert – provided by the uploader Neal Cassady & The Warlocks 1965 1. Speed Limit studio recording/Prankster production tape circa late 1965
cm boolean – Recording by Charlie MIller  – used to select the ‘best’ recording true or false
sb boolean – Recording was made from a soundboard – used to select the ‘best’ recording true or false
mx boolean – A matrix style recording – used to select the ‘best’ recording true or false
tracks an array of meta data for each track of the recording see below

Each track in the tracks array  formally looks like:

Field Notes Typical value
IAKey The Internet Archive key for this track.  This key is unique within a given recording  (see the IAKey above) gd1965-11-01.sbd.bershaw.5417.sbeok.shnf/Acid4_01_vbr
track track number 02
title the song title Cold Rain and Snow
time the length of the track in minutes and seconds 09:48

Let call everything except the tracks our BASE data and the track data our TRACK data.

We insert documents to the database (using an HTTP post) as JSON so a typical document would look like this in JSON format:

{
“_id”: “1966-07-30”,
“IAKey”: “gd1966-07-30.sbd.GEMS.94631.flac16”,
“pubdate”: “2008-09-22”,
“venue”: “P.N.E. Garden Auditorium”,
“description”: “Set 1 Standing On The Corner I Know You Rider Next Time You See Me”,
“cm”: false,
“sb”: true,
“mx”: false,
“tracks”: [
{
“IAKey”: “gd1966-07-30.sbd.GEMS.94631.flac16/gd1966-07-30.d1t01_vbr”,
“track”: “01”,
“title”: “Standing On The Corner”,
“time”: “03:46”
},
{
“IAKey”: “gd1966-07-30.sbd.GEMS.94631.flac16/gd1966-07-30.d1t02_vbr”,
“track”: “02”,
“title”: “I Know You Rider”,
“time”: “03:18”
},
{
“IAKey”: “gd1966-07-30.sbd.GEMS.94631.flac16/gd1966-07-30.d1t03_vbr”,
“track”: “03”,
“title”: “Next Time You See Me”,
“time”: “04:00”
}
]
}

2654190796_c0a810ec44

Hacking The Internet Archive: Getting Data From IA and Into CouchDB:

Here is the URL to obtain the page for “Grateful Dead Shows on This Day In History”:

http://www.archive.org/search.php?query=collection:GratefulDead%20date:19??-08-04&sort=-/metadata/date

This is a simple GET request with a query for the IA “collection” of Grateful Dead items filtered on the date string: 19??-08-04 and sorted descending by the concert date.  This get returns an HTML page.  This type of interface is known as an HTTP RPC interface.  RPC (Remote Procedure Call) interfaces are not pure REST interfaces but they are somewhat RESTful inthat they allow us to make a data request using a late bound, loosely coupled HTTP call.  See here and here for more theoretic background on RPC calls.  IA provides an  “Advanced Search” function will allow us to return data for an arbitrarily complex query in one of several different data formats other than HTML.  We selected XML as the format  for our work here.  XML is the traditional format for HTTP RPC but other formats may be better for certain applications.  Unfortunaely IA does not directly document the format of the RPC data request but they do provide a QEB page to build the request.  The page looks like this:

image

Using this screen we can compose a HTTP RPC request which will mimic the URL produced by “Grateful Dead Shows on This Day In History” and with a little brain effort and experimentation we can  understand how to compose requests without using the QBE screen.  By feeding the RPC request query back into advanced search and selecting XML as an output format as shown here:

clip_image001

we produce both an example of the HTTP RPC request which will return our desired data in our desired format.  Thus we generate a HTMLEncoded RPC request like:

@uri=”http://www.archive.org/advancedsearch.php?q=collection%3AGratefulDead+date%3A#{_dateString}&fl%5B%5D=avg_rating&fl%5B%5D=date&fl%5B%5D=description&fl%5B%5D=downloads&fl%5B%5D=format&fl%5B%5D=identifier&fl%5B%5D=publicdate&fl%5B%5D=subject&fl%5B%5D=title&sort%5B%5D=date+desc&sort%5B%5D=&sort%5B%5D=&rows=2000&page=1&callback=callback&output=xml”

where we replace #{_dateString} with a date string like 19??-08-08.  Of course to one years worth of data we could use a data string like: 1968-??-??.  It is a simple extension of the query languge to replace the singular date request: date%3A#{_dateString} with a date range.

which returns Grateful Dead recording data for all years of the last century which were recorded on 08-08.  The XML output returned to the caller looks like:

clip_image001[4]

In a more graphic format the output looks like:

clip_image001[6]

Within Ruby we will need to make the HTTP Get request with a desired date range, transform the body of the return request into an XML document and use XPATH to parse the XML and retrieve the meta data values for each recording (see below).  The is NOTHING inherently wrong with this RPC interface.  It is flexible and allows us to select only the data fields we are interested in and return data only for the dates we wish.  Since RUBY supports neither native consumption of JSON nor XML. So the XML format of the data is as good as any other and numerous tools exist in RUBY to manipulate XML data.  I which RUBY had a more native interface for JSON but it does not.

At this point, we do not have meta-data about individual tracks in a given recording.  It turns out that we can get this data but not through an HTTP RPC request.  It turns our, dear reader, that if we have the IAKey for the recording we can obtain an xml file with track meta data by making the following call:

http://www.archive.org/download/{IAKEY}/{IAKEY}_files.xml.

This file contains assorted XML data, it varies by what formats IA makes available the individual tracks via a 309 (HTTP redirect).  This is not an RPC call so we are far from a RESTful interface here.  We do not have control over the fields or range of the data included in this call.  It is all or nothing.  But at least the XML format is simple to mainipulate.  With the IAKey in hand for an individual recording and making some reasonable guesses we can parse the XML file of track data and compose the TRACKS array for our couchDB document using XPATH. A single entry for the high bit rate mp3 track recording looks like:

<file name=”gd89-08-04d2t01_vbr.mp3″ source=”derivative”>
<creator>Grateful Dead</creator>
<title>Tuning</title>
<album>1989-08-04 – Cal Expo Amphitheatre</album>
<track>13</track>
<bitrate>194</bitrate>
<length>00:32</length>
<format>VBR MP3</format>
<original>gd89-08-04d2t01.flac</original>
<md5>91723df9ad8926180b855a0557fd9871</md5>
<mtime>1210562971</mtime>
<size>794943</size>
<crc32>2fb41312</crc32>
<sha1>80a1a78f909fedf2221fb281a11f89a250717e1d</sha1>
</file>

Note that we have the IAKey for the track (gd89-08-04d2t01 ) as part of the name attribute.

Garcia

Using a background Ruby Process to Read the Data

The following RUBY GEMS are required to complete this step:

rest-open-uri : This GEM extends open-uri to support POST, PUT and DELTE HTTP command

json : This GEM handles serialization and de-serialization of a limited subset of RUBY into JSON strings.

From the standard RUBY library we will also be using

rexml : This GEM creates XML documents from XML Strings and supports XPATH which we will use to read the XML documents from IA

Our first step is to extract the get the the data via HTTP and parse the XML file returned to find individual recordings.  There are  (in most cases) be multiple recordings per concert (per date) and we want to retain for the database only the “best”.

In pseudo Ruby code:

require ‘rest-open-uri’

require ‘rexml/document’

 def initialize(_dateString)

#HTTP GET, create a string of the response body and transform the string into an XML node tree
#mind the screen wrap and html Encoding:
@uri=http://www.archive.org/advancedsearch.php?q=collection%3AGratefulDead+date%3A#{_dateString}&fl%5B%5D=avg_rating&fl%5B%5D=date&fl%5B%5D=description&fl%5B%5D=downloads&fl%5B%5D=format&fl%5B%5D=identifier&fl%5B%5D=publicdate&fl%5B%5D=subject&fl%5B%5D=title&sort%5B%5D=date+desc&sort%5B%5D=&sort%5B%5D=&rows=2000&page=1&callback=callback&output=xml

  xmlString=”
open (@uri) do |x|       #build a representation of the response body as a string
x.each_line do |y|
xmlString=xmlString+y
end
if xmlString==”
puts ‘No String Returned From Internet Archive’
quit
end
end
@IAXMLDocument= REXML::Document.new(xmlString)  #turn the string into an XML document
end #open

Now we need  to loop through the XML document and pull out each ‘doc’ section using XPATH and read each doc section for the meta data for that recording.

#use XPATH and find each response/result/doc node and yield

def get_recordings(document)

document.elements.each(‘response/result/doc’)do |doc|

yield doc
end
end

#get the XML document and yield

def get_record(xmldoc)

get_recordings(xmldoc) do |doc|
yield doc
end
end

#general purpose XPATH method to extract element.text (the metadata values) for arbitrary XPATH expressions

def extract_ElmText(doc,xpath)

doc.elements.each(xpath) { |element|  return element.text }
end

def worker(xmldoc)

#main loop

_docCount=0

get_recordings(xmldoc) do |doc|
_docCount+=1
_date=extract_ElmText(doc,’date[@name=”date”]’)[0..9]
_description=extract_ElmText(doc,”str[@name=’description’]”)
_title=extract_ElmText(doc,”str[@name=’title’]”)
_title=clean_Title(_title)
_keylist=pull_keys(doc)
_pubdate=extract_ElmText(doc,’date[@name=”publicdate”]’)[0..9]  #there is a bug here , corrected by lines below

if (_pubdate.length==0)
_pubdate=’1999-01-01′
puts “#No Publication Date: {_date} #{_title}”
end
_uri=extract_ElmText(doc,’str[@name=”identifier”]’)

#make a RUBY class object to hold one recording
_record=GDRecording.new _date, _description, _tracklist, _title, _keylist, _pubdate,_uri

#save the recording class objects in an array

@list[@list.count]=_record

end

In this code the ‘worker’ method calls the helper methods to:

0) Do the HTTP  get to make the RPC request and read the response body one line at a time and

1) transform the lines into a single string and convert (REXML::Document.new) the string into an XML document for processing by XPATH

2) loop through the doc nodes of the xml tree and extract the values of the  meta data fields

3) the meta data values are passed to an RUBY class ( GDRecording) which holds this meta data for later processing,

4 finally we temporarily store the recordings in an array for the next processing step.

Note that these routines work  whether the query returns a single day (with multiple recordings) or multiple days or even the whole dataset!  What is essencial is that we process the file as N ‘doc’ sub trees (which represent single recordings) and have recording date (at least) to group our data and extract the ‘best’ recording within each date group.

Our next step will be group the recordings by day (i.e. concert) and provide our own filter to select a single ‘best’ recording for each concert.

Shake and Bake:  Finding A ‘Best’ Recording.

best

What is the best Grateful Dead concert.  Why the first one I went to of course.  Just ask any Deadhead and you will probably get the same answer.  But what is the best recording of any given GD concert? My approach is very simple.

  • Most recent posted recordings are better than older recordings. (least important criteria)
  • Soundboard recordings are better than audience recordings.
  • Matrix recordings are even better.
  • Recordings mixed by Charlie Miller are best of all. (most important criteria)

Well these are MY criteria.  What ever criteria as long as they are hieratical  you can code the select in a very trivial manner.  If we have a field in each recording for the concert date and a field for each selection criteria (we derive these from the keywords field in IA) we sort the recordings by date and then by each of the criteria from most important (Charlie Miller in may case) to least important (date posted) and then select the first recording in sort order within each date group. On Ruby the sort of our list of recordings is trivial to code and easy to maniuplate (add new criteria or change the priority of criteria). The sort statement looks like this:

@list.sort! { |a,b| (a.date+a.cm.to_s+a.sb.to_s+a.mx.to_s+a.pubdate ) <=> (b.date+b.cm.to_s+b.sb.to_s+b.mx.to_s+b.pubdate )   }

Once sorted we create a list of best recordings as:

def newSelect
_dateGroup=nil
_list=Array.new
if  @list==nil or @list.count==0
puts ‘No Recordings.’
return;
end
foreach do |rec|
if _dateGroup!=rec.date
if dateGroup!=nil
@selectList[@selectList.count]=list[0]
end
_dateGroup=rec.date
_list=Array.new
end
_list[_list.count]=rec
end
if dateGroup!=nil
@selectList[@selectList.count]=list[0]
end
end

Note that is code is not only simple but it is independent of the selection criteria we are using.

Now that we have a list of recordings we are interested in,  we can get the XML file of track meta data using the IAKey discussed above and making a simple GET call and parsing the XML file for the meta data for each.  Much of the code used duplicates the XML code presented above so we need not reproduce all the code except to show a short section which uses a slightly different XML XPATH syntax:

open (filesURI) do |x| x.each_line do |y| xmlString=xmlString+y end end
REXML::Document.new(xmlString).elements.each(‘files’) do |doc|

doc.elements.each(‘file’) {|file_elm|
obj=nil
title=nil
trackString=nil
lengthString=nil
obj=file_elm.attributes[“name”]
file_elm.elements.each(‘title’) { |element| title=element.text }
file_elm.elements.each(‘track’) { |element| trackString=element.text}
file_elm.elements.each(‘length’) { |element| lengthString=element.text}

{omitted code}

end

Okay now we have a (hash) list of recording meta data,  each item of which contains a (hash) list of track meta data for that recording.  In our next post we will leave this unRestful world behind and move into the RESTful world of couchDB when we:

  • Serialize the object to a JSON string (i.e. a JSON document);
  • Do POST requests to insert  a JSON document for each concert into the couchdb database;
  • Create couchDB views to allow optimized data retrieval; and
  • Create a couchDB view to optimize retrieval recordings for all years for an arbitrary Month and Day (this duplicates the data provided by the “Grateful Dead Shows on This Day In History” selection in the Internet Archive.

cat on fancy couch

%d bloggers like this: