Things I should have known: fn:doc-available($url)

Mar 29, 2012    

Sometimes you run across something in a technology or domain that you really should have known long long ago but didn’t.  This is one of those things.  Thank you to the person who pointed it out to me.

fn:exists(fn:doc($url)) causes MarkLogic to pull a fragment from storage and “post-filter” it to confirm the existence of a document node. The faster way of doing this is fn:doc-available($url) or xdmp:exists(fn:doc($url)), neither of which needs to filter a fragment.

xdmp:exists( … ) is basically equivalent to xdmp:estimate( … ) gt 0: both are answered from the indexes without filtering fragments.
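
As a minimal sketch of that equivalence, the two expressions can be run side by side. This assumes a document has already been inserted at "/1.xml" (the same URI used in the test further down); in that case I would expect both items in the result to be true.

```xquery
xquery version "1.0-ml";
(: Sketch only: assumes a document already exists at "/1.xml". :)
(: Both checks are resolved unfiltered, so no fragment is read to confirm the match. :)
(
  xdmp:exists(fn:doc("/1.xml")),
  xdmp:estimate(fn:doc("/1.xml")) gt 0
)
```

Running this with xdmp:query-trace turned on is an easy way to check whether a "fragment to filter" line shows up in the error log for either form.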

The way to confirm for yourself that one has less impact on storage than the other is to turn on xdmp:query-trace before running each command. Compare the MarkLogic error log after running the following:

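xquery version "1.0-ml";
(: Statement 1: insert a small test document. :)
xdmp:document-insert("/1.xml", <a>1</a>);

xquery version "1.0-ml";
(: Statement 2: trace the filtered existence check. :)
xdmp:query-trace(fn:true()),
fn:exists(fn:doc("/1.xml"));

xquery version "1.0-ml";
(: Statement 3: trace the same check using fn:doc-available. :)
xdmp:query-trace(fn:true()),
fn:doc-available("/1.xml")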

The error log should have something like this:

Analyzing path: fn:doc("/1.xml")
Step 1 is searchable: fn:doc("/1.xml")
Path is fully searchable.
Gathering constraints.
Step 1 contributed 1 constraint: fn:doc("/1.xml")
Executing search.
Selected 1 fragment to filter
Analyzing path: fn:doc("/1.xml")
Step 1 is searchable: fn:doc("/1.xml")
Path is fully searchable.
Gathering constraints.
Step 1 contributed 1 constraint: fn:doc("/1.xml")
Executing search.

It’s that “Selected 1 fragment to filter” line that tells you touching storage was necessary for fn:exists(). You want to avoid this for most storage solutions. Interestingly enough, this is true whether or not I turn on the URI Lexicon.