Maps and Profiling Performance

Dec 07, 2011    

Kurt Cagle over at XML Today posted a great blog post about the map:map library in MarkLogic.  map:map is a hashtable-like map implementation inside MarkLogic that has measurable performance advantages over raw sequences and predicates in XQuery.  Wait … performance advantages?  How can one tell?  The quickest way is to punch up a code example in XQuery and run it in CQ or the new MarkLogic 5 Query Console using the “Profile” mode.  You’ll get the total run time plus a breakdown of each step of the XQuery evaluation and how much it contributed to that total.
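
If you would rather profile from code than from the console, something like the following sketch may work.  It assumes the built-in prof:eval function is available under the default prof prefix and that profiling is allowed on the app server you run it against; the query string is only a placeholder.

xquery version "1.0-ml";

(: Sketch only: profile a query string programmatically.
   Assumes profiling is enabled on the current app server. :)
let $query := '
  let $a := for $i in (1 to 10000) return xs:string($i)
  return fn:count($a)
'
(: The returned sequence should include a prof:report node alongside the query result. :)
return prof:eval($query)

The report carries the same kind of per-expression timing that the “Profile” mode displays, so it is handy when you want to compare two variants in a script.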

Here’s a trick with maps that Kurt didn’t mention.  You can subtract one map from another to quickly diff two lists of information.  First, the slower approach using plain XQuery sequences:

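(: $a holds the strings "1" through "10000" :)
let $a := for $i in (1 to 10000) return xs:string($i)
(: $b is the same list with roughly 1% of the values randomly dropped :)
let $b := for $i in (1 to 10000) return if (xdmp:random(1000) > 990) then () else xs:string($i)
return
(: keep each item of $a that matches nothing in $b :)
$a[ fn:not( . = $b ) ]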

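This takes around 2.8 to 3 seconds on my laptop using the Query Console profile feature.  The predicate compares every item in $a against the whole of $b, so the cost grows with the product of the two list sizes.  There are probably better ways to do this with XQuery, but I’m trying to demonstrate how cool maps are, so I’ll let that go for now.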

Here is the same code implemented with maps.  This has about a 0.1 second average run time on my laptop.  The performance gap only gets wider as I increase the size of the lists being compared.

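(: same input lists as in the sequence version :)
let $a := for $i in (1 to 10000) return xs:string($i)
let $b := for $i in (1 to 10000) return if (xdmp:random(1000) > 990) then () else xs:string($i)
let $mapa := map:map()
let $mapb := map:map()
(: load each list into its own map, using the value as both key and value :)
let $_ := for $av in $a return map:put($mapa, $av, $av)
let $_ := for $bv in $b return map:put($mapb, $bv, $bv)
return
(: the difference keeps only the entries of $mapa that are missing from $mapb :)
map:keys( $mapa - $mapb )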
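
Subtraction is not the only map operator.  As a small sketch on toy data (the variable names $left and $right are made up for illustration), the + and * operators should give you the union and intersection of two maps in the same style.  As far as I understand, these operators compare key/value pairs rather than keys alone, which is why each value is stored under itself as the key here.

xquery version "1.0-ml";

let $left := map:map()
let $right := map:map()
(: store each value under itself as the key, as in the example above :)
let $_ := for $k in ("a", "b", "c") return map:put($left, $k, $k)
let $_ := for $k in ("b", "c", "d") return map:put($right, $k, $k)
return (
  (: difference: entries in $left that are not in $right :)
  map:keys( $left - $right ),
  (: intersection: number of entries present in both maps :)
  fn:count( map:keys( $left * $right ) ),
  (: union: number of entries present in either map :)
  fn:count( map:keys( $left + $right ) )
)

Note that map:keys makes no promise about ordering, so sort the result if the order of the diff matters to you.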