Monday, February 22, 2010

Catalog Key Errors

Apparently I am the patron saint of key errors. The latest one was with the catalog, and it was way nicer than some of the previous monsters because at least it didn't bring down the whole site or involve sweaty, hand shaking stress. To the end user, certain keyword searches didn't in livesearch, advanced queries. This key error manifested in the logs as:

Module Products.ZCTextIndex.BaseIndex, line 203, in search_phrase
  Module Products.ZCTextIndex.OkapiIndex, line 161, in _search_wids
KeyError: -1525983394

I had a feeling that this would happen because a few days earlier, I came across a few weird errors trying to emply some jedi mind tricks in the db. An example of one:

2010-02-17 21:46:36,505 ERROR @/opt/Zope-2.9/lib/python/Products/PluginIndexes/common/UnIndex.py/UnIndex.py UnIndex 194  : DateIndex: unindex_object could not remove documentId -1324564720 from index modified.  This should not happen.
Traceback (most recent call last):
 File "/opt/Zope-2.9/lib/python/Products/PluginIndexes/common/UnIndex.py", line 168, in removeForwardIndexEntry
   indexRow.remove(documentId)
KeyError: -1825924534

Whoops.  They didn't cause anything obvious to happen so I did what every lazy sys admin does and ignored it. As far as I can tell there isn't a required correlation between the two errors, but it will at least point you in the right direction.

That was Friday night. Then customers got on the system Monday and we started getting the ZCTextIndex KeyError sporadically. It affected all searches with the word 'jackson' and using indexes set up with OkapiIndex algorithm (PloneLexicon). This is the default choice for the Title, SearchableText, and Description indexes.

Looking this error up on Google shows a few answers, which involve clearing and rebuilding the catalog. I'm sure this works, but our index alone is 1.2GB so that was really a last resort option. It would take so long that before reindexing we probably just would have monkeypatched the catalog code to just ignore those keys. I was determined to find a way to fix without rebuilding. And I did.

Warning: the following explanation is based on eye crossing interpretations of zope code. I may be off a little but I think its a worthy attempt nonethless.

The text indexes need a way to correlate a word set (what the user is searching for) with a document and vice-versa.  For example, when I lookup 'jackson', I need to know that 17 documents have that keyword. If you have been mucking around with the catalogs or something went crazy on your catalog (like your sys admin) then you can get this words list of sync with other indexes and the real catalog. To confirm this, try idx.getEntryForObject(docid) on the failing index, where docid is the KeyError integer in your error log. In my example, the keyword 'jackson' in the words BTree pointed to a list of documents, all of which were valid except -1525983394. By just extracting the bad reference to this non-existing document, then there is no need to reindex everything.

Note that for this to work you need to know the word that triggers the error. You can't do a reverse lookup to find out which words reference that document because that mapping is the one that is still correct. It will always return empty due to a silent fail. If your users won't tell you, then you can just log it when it errors out later. If its really bad, iterate through all words and what they point to and see if they throw a key error (hint: use the _wordinfo.iterkeys() iterator + other tools listed here). Or just reindex.

As always, backup before hand and if you are nervous work on a copy first. I'm sure this can easily be modified for other indexes/schemes as well.

From the debug prompt:
'''
@docids is a list of the key error items as ints, not strings
@word is the word that triggers the index error
@cat is the catalog object that has the error, i.e. app.foo.portal_catalog
'''
def removeFromWordCatalog(docids, word, cat):
    import transaction
    for docid in docids:
        for indexObj in cat.getIndexObjects():
            try:
                itype = indexObj.getIndexType()
            except AttributeError:
                continue # some indexes don't implement this method
            lex = indexObj.getLexicon()
            if itype == 'Okapi BM25 Rank':
                wids = lex.termToWordIds(word)
                idx = indexObj.index
                for wid in wids:                
                    try:
                        # using excepts because BTrees key lookup is annoying
                        # see if its even really an error before ousting
                        blah = idx._wordinfo[wid][docid]
                    except KeyError: # not listed - nothing wrong
                        continue
                        
                    idx._wordinfo[wid].pop(docid)
                    idx._wordinfo[wid] # persistance
                    try: 
                        idx._wordinfo[wid][docid]
                    except KeyError:
                        print "sucessfully removed"
                    transaction.savepoint(1)
                    transaction.commit() 

docids = [-1525983376, -3423423445, ...]
word = 'jackson'
cat = app.foo.portal_catalog
removeFromWordCatalog(docids, word, cat) 

UPDATE: I already got to the point where the word by word thing became annoying so I wrote a generic function to "unword" a catalog entry. Much nicer, and although I thought it would take a long time it was actually not toooooo bad. I also pieced out the code so I could get as generic or specific as I want in the case that something else happens. Which it will. Woohoo!

def removeDocFromWordlist(idx, wid, docid):
    import transaction
    try:
        # using excepts because BTrees key lookup is annoying
        # see if its even really an error before ousting
        blah = idx._wordinfo[wid][docid]
    except KeyError: # not listed - nothing wrong
        return 
        
    idx._wordinfo[wid].pop(docid)
    idx._wordinfo[wid] # persistance
    try: 
        idx._wordinfo[wid][docid]
    except KeyError:
        print "sucessfully removed"
    transaction.savepoint(1)
    transaction.commit() 
    
    
def cleanUpBadDocId(docid, idx):
    iterati = idx._wordinfo.iterkeys()
    while True:
        try:
            wid = iterati.next()
        except StopIteration:
            break
        
        if idx._wordinfo[wid].has_key(docid):
            print "removing %s from word list %s"%(docid, wid)
            removeDocFromWordlist(idx, wid, docid)
            
            
def removeFromWordCatalogs(docids, cat):
    for docid in docids:
        for indexObj in cat.getIndexObjects():
            try:
                itype = indexObj.getIndexType()
            except AttributeError:
                continue # some indexes don't implement this method
            lex = indexObj.getLexicon()
            if itype == 'Okapi BM25 Rank':
                idx = indexObj.index            
                cleanUpBadDocId(docid, idx)


docids = [-1525983376, -3423423445, ...]
cat = app.foo.portal_catalog
removeFromWordCatalogs(docids, cat)

2 comments:

  1. Thanks Liz. I've run into this once or twice but have always just rebuilt the catalog rather than actually trying to figure out what was going on...I'm sure this'll be useful next time.

    ReplyDelete
  2. In CleanUpBadDocId(), instead of

    iterati = idx._wordinfo.iterkeys()
    while True:
    try:
    wid = iterati.next()
    except StopIteration:
    break

    you can just do

    for wid in idx._wordinfo:

    ReplyDelete