Tutorial: Mobile Shakespeare (Part 3 – Adding Search)

Dec 17, 2011    

[Image: Mobile Shakespeare search screen]

In the last part of this tutorial we skinned the Mobile Shakespeare app to make it more memorable and distinctive.  Now it’s time to add some search functionality.

The complete code base for this sample is now up on GitHub for your reference:

github/derickson/shake/xquery2

I’ve cleaned up the XQuery for readability and will be linking only portions of the code here.

REST

First we’ll set up a few new REST targets in /lib/config.xqy

<get path="play/:id/act/:act/scene/:scene/speech/:speech"><to>play#scene</to></get>
        
<get path="search"><to>search#get</to></get>
<post path="search"><to>search#get</to></post>

Note that I’ve also added the ability to jump to a specific SPEECH in the /play resource.  You can check out that code yourself in the GitHub copy of the code.

Search Resource

Next we’ll make a new resource for the search REST targets in /resource/search.xqy.  At the top of this file we are going to import the MarkLogic Search API, which is a high-level XQuery library that sits on top of the core MarkLogic search functions in the cts:* library.  The Search API is a great place to start when building XQuery web apps because it does so much for you.  As with all high-level tools, you may eventually outgrow parts of the Search API and decide to use the cts:* functions directly (I do this all the time for intricate multi-tiered facets).  The cts functions are incredibly easy to compose and work much like boolean functions, but for now we’ll stick to the Search API.  So, on to that import statement:

import module namespace search = "http://marklogic.com/appservices/search" at "/MarkLogic/appservices/search/search.xqy";
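For comparison, here is a rough sketch of what composing the cts functions directly can look like.  It is not code from the sample app: the search words are made up for illustration, and it assumes the Shakespeare XML is loaded in the database the query runs against.

```xquery
xquery version "1.0-ml";

(: Illustrative only: speeches that mention "kingdom" and "horse"
   but not "crown". Each cts:*-query is a value that nests inside
   another, much like a boolean expression. :)
let $query :=
    cts:and-query((
        cts:word-query("kingdom"),
        cts:word-query("horse"),
        cts:not-query(cts:word-query("crown"))
    ))
return
    (: first ten matching SPEECH elements, in relevance order :)
    cts:search(//SPEECH, $query)[1 to 10]
```

The Search API grammar builds a query tree of this shape from the one-line search string, which is why it is the easier place to start.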

The most basic call of the Search API is

search:search( $searchTerm, $options)
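As a quick way to try this in Query Console, a minimal sketch might look like the following.  The search string is just an example, and the call leaves out $options so that the Search API defaults apply.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

(: Example query text; with no options the default grammar is used :)
let $response := search:search("kingdom AND horse")
return (
    (: estimated number of matches :)
    fn:data($response/@total),
    (: URIs of the documents on the first page of results :)
    for $result in $response/search:result
    return fn:string($result/@uri)
)
```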

The Search API $options parameter

The $searchTerm is a one-line string that fits a grammar specified in the second parameter, $options.  If you omit the options XML parameter, the Search API defaults do a good job of emulating a “Google-like” search syntax, but I want to make some modifications.  Rather than learning from scratch how to construct the XML that makes up these options, we can get the Search API defaults to use as a starting point by calling the following function in Query Console or cq.  (Don’t forget the import of the Search API module.)

search:get-default-options()
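With the import included, a complete Query Console snippet could look like this sketch.  The search:check-options call is my addition rather than part of the original walkthrough; it is handy for validating the options again after you start editing them.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

let $defaults := search:get-default-options()
return (
    (: the default options element: copy it and edit from there :)
    $defaults,
    (: returns search:report elements describing any problems found :)
    search:check-options($defaults)
)
```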

You can start editing from the Search API standard functionality.  Here is my finished $options object:

(: Search API options :)
declare variable $options :=
    <options xmlns="http://marklogic.com/appservices/search">
        <concurrency-level>8</concurrency-level>
        <debug>0</debug>
        <page-length>10</page-length>
        <search-option>score-logtfidf</search-option>
        <quality-weight>1.0</quality-weight>
        <return-constraints>false</return-constraints>
        <!-- Turning off the things we don't use -->
        <return-facets>false</return-facets>
        <return-qtext>false</return-qtext>
        <return-query>false</return-query>
        <return-results>true</return-results>
        <return-metrics>false</return-metrics>
        <return-similar>false</return-similar>
        <searchable-expression>//SPEECH</searchable-expression>
        <sort-order direction="descending">
            <score/>
        </sort-order>
        <term apply="term">
            <!-- "" $term returns no results -->
            <empty apply="no-results" />
            <!-- Not sure why this isn't a default -->
            <term-option>case-insensitive</term-option>
        </term>
        <grammar>
            <quotation>"</quotation>
            <implicit>
                <cts:and-query strength="20" xmlns:cts="http://marklogic.com/cts"/>
            </implicit>
            <starter strength="30" apply="grouping" delimiter=")">(</starter>
            <starter strength="40" apply="prefix" element="cts:not-query">-</starter>
            <joiner strength="10" apply="infix" element="cts:or-query" tokenize="word">OR</joiner>
            <joiner strength="20" apply="infix" element="cts:and-query" tokenize="word">AND</joiner>
            <joiner strength="30" apply="infix" element="cts:near-query" tokenize="word">NEAR</joiner>
            <joiner strength="30" apply="near2" consume="2" element="cts:near-query">NEAR/</joiner>
            <joiner strength="50" apply="constraint">:</joiner>
            <joiner strength="50" apply="constraint" compare="LT" tokenize="word">LT</joiner>
            <joiner strength="50" apply="constraint" compare="LE" tokenize="word">LE</joiner>
            <joiner strength="50" apply="constraint" compare="GT" tokenize="word">GT</joiner>
            <joiner strength="50" apply="constraint" compare="GE" tokenize="word">GE</joiner>
            <joiner strength="50" apply="constraint" compare="NE" tokenize="word">NE</joiner>
        </grammar>
        <!-- Custom rendering code for "Snippet" -->
        <transform-results apply="snippet" ns="http://framework/lib/l-util" at="/lib/l-util.xqy" />
    </options>;

Let’s go through it.  I turn off return of data I won’t be using for rendering.  For example my app has no facets:

<!-- Turning off the things we don't use -->
<return-facets>false</return-facets>

I want our search results to be the SPEECH elements inside the PLAY root elements.  In cts we would specify a “searchable expression” as the first parameter of cts:search.  In the Search API we add the following:

<searchable-expression>//SPEECH</searchable-expression>
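For reference, the cts equivalent mentioned above might be sketched like this.  The search word is an arbitrary example; the point is only that //SPEECH takes the first-argument slot that the searchable-expression option fills in the Search API.

```xquery
xquery version "1.0-ml";

(: First argument: the searchable expression (what comes back).
   Second argument: the query (what has to match). :)
cts:search(
    //SPEECH,
    cts:word-query("horse")
)[1 to 10]
```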

Lastly, I change some of the default text term options.  When a user doesn’t type anything, I skip executing the search rather than pass back the first SPEECH in document order in the repository (which they can already get from the Play button on the new home page).  Also, when testing the app I found that a search for “My kingdom for a horse” returned zero results.  That’s because, with no case option set, a term containing an uppercase letter is treated as case-sensitive, and the line in the play reads “my kingdom for a horse”.  (Note: it’s a good idea to turn on the index for fast case-sensitive searches if that is the behavior you want.)  But my users might well type “My kingdom for a horse”, so I’ll set a term option to case-insensitive:

<term apply="term">
    <!-- "" $term returns no results -->
    <empty apply="no-results" />
    <!-- Not sure why this isn't a default -->
    <term-option>case-insensitive</term-option>
</term>
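To see the case behavior in isolation, here is a small sketch using cts:contains on an in-memory element.  The LINE element is typed in by hand for the test rather than read from the database.

```xquery
xquery version "1.0-ml";

(: Hand-typed test node, not loaded from the shake database :)
let $line := <LINE>A horse! a horse! my kingdom for a horse!</LINE>
return (
    (: no case option: the capital "M" makes the term case-sensitive :)
    cts:contains($line, cts:word-query("My kingdom")),
    (: explicit option, the cts counterpart of
       <term-option>case-insensitive</term-option> :)
    cts:contains($line, cts:word-query("My kingdom", "case-insensitive"))
)
```

I would expect the first test to come back false and the second true, which matches what I saw in the app before and after adding the term option.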

The last step is to specify a custom snippeting library function.  The Search API assumes I am searching on text that is too large to present in a result, but I’d like my users to see the whole SPEECH in order to give the highlighted words context.  I’ll let you look at the highlight code yourself on GitHub under /lib/l-util.xqy, but the portion of the $options that specifies which code to use is:

<!-- Custom rendering code for "Snippet" -->
<transform-results apply="snippet"
    ns="http://framework/lib/l-util"
    at="/lib/l-util.xqy" />
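If you just want the general shape of such a function, here is a minimal sketch.  It is not the code in /lib/l-util.xqy: the u prefix and the function body are my own stand-ins, and only the namespace, the function name (from apply="snippet"), and the three-argument signature the Search API expects are taken as given.

```xquery
xquery version "1.0-ml";

(: Sketch of a library module that could live at /lib/l-util.xqy :)
module namespace u = "http://framework/lib/l-util";

declare namespace search = "http://marklogic.com/appservices/search";

(: Return the whole matching node with each hit wrapped in
   search:highlight, instead of a truncated snippet. :)
declare function u:snippet(
    $result   as node(),
    $ctsquery as schema-element(cts:query),
    $options  as element(search:transform-results)?
) as element(search:snippet)
{
    <search:snippet>{
        cts:highlight(
            $result,
            (: turn the serialized query back into a cts:query value :)
            cts:query($ctsquery),
            <search:highlight>{ $cts:text }</search:highlight>
        )
    }</search:snippet>
};
```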

Wrapping up the Page

Next I’ll need a good search form.  I decided to add a Phrase Search toggle switch because mobile users often can’t reach the quotation marks of the standard Search API grammar without switching to an alternate (SHIFT) keyboard:

<!-- Search Form -->
                <form action="/search" method="get" data-transition="fade" class="ui-body ui-body-b ui-corner-all">
                    <fieldset >
                        <label for="term">Search all lines:</label>
                        <input type="search" name="term" id="term" value="{$term}" data-theme="b" />    
                    </fieldset>
                    <div data-role="fieldcontain">
                        <label for="phrase">Phrase search:</label>
                        <select name="phrase" id="phrase" data-role="slider" >
                            <option value="off">
                                Off
                            </option>
                            <option value="on">
                                {
                                    (: Dynamic inline attribute of the option element :)
                                    if($phrase eq "on") then 
                                        attribute selected {"selected"} 
                                    else 
                                        ()
                                }
                                On
                            </option>
                        </select>
                    </div>
                    <button type="submit" data-theme="b" data-transition="fade">Submit</button>
                </form>
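The form submits two fields, term and phrase.  How the sample framework hands those to the resource is something to check in the GitHub code; as an assumption for illustration, reading them directly with the built-in request functions could look like this:

```xquery
xquery version "1.0-ml";

(: Assumption: fields are read straight from the HTTP request,
   not through whatever helper the framework may provide. :)

(: empty sequence when the user has not searched yet :)
let $term   := xdmp:get-request-field("term")
(: fall back to "off" when the toggle was not submitted :)
let $phrase := xdmp:get-request-field("phrase", "off")
return ($term, $phrase)
```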

And I’ll actually need to call the search.  I pass the Search API results XML to a transform function, which you can go through on GitHub:

<p>
            {
                (: Search Results Area :)
                
                (: 
                 Modify the typed search term.  
                 Add Quotes if the $phrase flag is "on"
                 If the term is empty sequence, use ""
                :)
                let $searchTerm := 
                    if(fn:exists($term)) then 
                        if($phrase eq "on" and fn:not( fn:starts-with($term,'"') and fn:ends-with($term,'"'))) then
                            fn:concat('"',$term,'"')
                        else
                            $term
                    else 
                        ""
                return
                
                    (: 
                      Think Functionally ...
                      XQuery passes the result of evaluating search:search
                      straight into local:transform-results
                    :) 
                    
                    (: transform results into HTML5 :)
                    local:transform-results( 
                        (: execute the search with the Search API :)
                        search:search($searchTerm, $options)//search:result 
                    )
            }
            </p>
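The real local:transform-results is in the GitHub code.  Purely as a hypothetical stand-in to show the idea, a function that turns search:result elements into a jQuery Mobile list might be sketched as follows; the markup, the data-uri attribute, and the “no matches” message are all my own inventions.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical stand-in, not the function from the sample app :)
declare function local:transform-results(
    $results as element(search:result)*
) as element()
{
    if (fn:empty($results)) then
        <em>No matching lines.</em>
    else
        <ul data-role="listview">{
            for $result in $results
            return
                <li data-uri="{ fn:string($result/@uri) }">{
                    (: text of whatever the snippet function returned :)
                    fn:string-join($result/search:snippet/fn:string(.), " ")
                }</li>
        }</ul>
};

(: example call with a made-up search term and default options :)
local:transform-results(search:search("horse")//search:result)
```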

So now, in one XQuery script, I have a basic search app that is already optimized for mobile browsers.  I’m happy with the performance of the site on my relatively new Android phone, but I imagine we’d want to pay close attention to the HTTP caching response headers being returned from MarkLogic, given that the content is completely static.  Here are some items for improvement and exploration I could think of:

  • Check performance after “Phonegapping” the HTML5 into a native iOS or Android App
  • Add the HTML5 meta tags for specifying an Apple icon when this site is bookmarked on the iOS home screen.
  • Add paging to the search screen
  • Add the ability to search within a specific play (this could be done quickly with a Search API constraint and a drop down)
  • Allow users to “star” lines as their favorites (no login really necessary) and put links to the most popular lines on the Mobile Shakespeare home screen.
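Of the items above, paging is probably the quickest to try, because search:search accepts a start position and a page length as its third and fourth arguments.  In the sketch below, the page request field, the search term, and the trimmed-down options are assumptions for illustration, not part of the sample app.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

(: Trimmed-down options, just enough for the example :)
declare variable $options :=
    <options xmlns="http://marklogic.com/appservices/search">
        <searchable-expression>//SPEECH</searchable-expression>
    </options>;

(: "page" is a hypothetical 1-based request field :)
let $page        := xs:integer(xdmp:get-request-field("page", "1"))
let $page-length := 10
(: page 1 starts at result 1, page 2 at result 11, and so on :)
let $start       := ($page - 1) * $page-length + 1
return
    search:search(
        "horse",
        $options,
        xs:unsignedLong($start),
        xs:unsignedLong($page-length)
    )
```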

Phrase-Through and Phrase-Around

However, instead of spending time on that, let’s tune the MarkLogic indices slightly.  One of the strengths of MarkLogic over other full-text search indexers is that MarkLogic preserves structure.  As a result, MarkLogic can do inferred metadata search and full-text search out of the same indices without extra configuration or integration (it scales well too!).  Here’s one of my favorite lines from Macbeth:

[Image: quote from Macbeth, Act 5, Scene 8]

Some search engines flatten the text in their documents before generating search indexes.  This makes resolving relevance based on the surrounding XML or HTML tags very difficult.  Imagine searching for the phrase “Untimely ripp’d Accursed be” in the following examples:

1.)
<div>... was <b>untimely</b> ripp'd. Accursed be ...</div>

2.)
<div><p>... was untimely ripp'd.</p><p>Accursed be ...</p></div>

3.)
<div>... was untimely
  <annotation class="hidden tooltip">Awesome!</annotation>
ripp'd.  Accursed be ...</div>

4.)
<SPEECH>
  <LINE>Tell thee, Macduff was from his mother's womb</LINE>
  <LINE>Untimely ripp'd.</LINE>
</SPEECH>
<SPEECH>
  <LINE>Accursed be the tongue ...

The first sample might lead you to believe that flattening the text is a good idea.  By ignoring all structure, we get a relevant result because the words “untimely,” “ripp’d,” “accursed,” and “be” are adjacent.  The tags represent style, not substance!  MarkLogic has an index setting called “Phrase-Through”, which tells the indexer to let phrases run through an element boundary.  Obvious examples from XHTML, WordML, and other common namespaces come preconfigured so you don’t have to worry about them.

The second sample destroys the notion of text flattening.  “Ripp’d” and “Accursed” aren’t just in different sentences (as the period might inform some indexers); they are in different paragraphs and do not form a semantic “phrase”.  MarkLogic won’t Phrase-Through a <p> tag unless we tell it to, so we get the correct behavior.

The third sample above is trickier.  Embedded in the text is markup that represents an inline annotation of the semi-structured data.  If I flatten the text, the word “Awesome” messes up our phrase, but a default parse of the XML structure also breaks up the semantics of the “untimely ripp’d” phrase.  MarkLogic can solve this with a “Phrase-Around” on the annotation tag.  A Phrase-Around setting tells the MarkLogic indexer to link the words before and after the element into a phrase but ignore the words inside.

The fourth sample is our data from the Shakespeare XML demo.  To get a good phrase search on “was from his mother’s womb untimely ripp’d” we need to set up a Phrase-Through on the LINE element under Databases > shake > Phrase-Throughs in the MarkLogic admin interface.  Once we’ve done this, a phrase-enabled search for “was from his mother’s womb untimely ripp’d” results in:

[Image: correctly highlighted quote]

Not bad.  By allowing Phrase-Through and Phrase-Around flexibility on the source XML schema, we don’t have to transform the data to index it.  We get to preserve structure and have full-text search at the same time!
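If you would rather script the Phrase-Through setting than click through the admin interface, the Admin API can make the same change.  This is a sketch under a couple of assumptions: the database is named shake as in this tutorial, the LINE element is in no namespace, and the setting has not already been added (adding a duplicate is likely to raise an error).

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $db     := xdmp:database("shake")
(: "" because LINE has no namespace in the Shakespeare XML :)
let $phrase-through := admin:database-phrase-through("", "LINE")
let $config :=
    admin:database-add-phrase-through($config, $db, $phrase-through)
return
    (: saving the configuration applies the change :)
    admin:save-configuration($config)
```

A Phrase-Around can be scripted the same way with admin:database-phrase-around and admin:database-add-phrase-around.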

— Dave