Monday, February 22, 2010

Catalog Key Errors

Apparently I am the patron saint of key errors. The latest one was with the catalog, and it was way nicer than some of the previous monsters because at least it didn't bring down the whole site or involve sweaty, hand shaking stress. To the end user, certain keyword searches didn't in livesearch, advanced queries. This key error manifested in the logs as:

Module Products.ZCTextIndex.BaseIndex, line 203, in search_phrase
  Module Products.ZCTextIndex.OkapiIndex, line 161, in _search_wids
KeyError: -1525983394

I had a feeling that this would happen because a few days earlier, I came across a few weird errors trying to emply some jedi mind tricks in the db. An example of one:

2010-02-17 21:46:36,505 ERROR @/opt/Zope-2.9/lib/python/Products/PluginIndexes/common/UnIndex.py/UnIndex.py UnIndex 194  : DateIndex: unindex_object could not remove documentId -1324564720 from index modified.  This should not happen.
Traceback (most recent call last):
 File "/opt/Zope-2.9/lib/python/Products/PluginIndexes/common/UnIndex.py", line 168, in removeForwardIndexEntry
   indexRow.remove(documentId)
KeyError: -1825924534

Whoops.  They didn't cause anything obvious to happen so I did what every lazy sys admin does and ignored it. As far as I can tell there isn't a required correlation between the two errors, but it will at least point you in the right direction.

That was Friday night. Then customers got on the system Monday and we started getting the ZCTextIndex KeyError sporadically. It affected all searches with the word 'jackson' and using indexes set up with OkapiIndex algorithm (PloneLexicon). This is the default choice for the Title, SearchableText, and Description indexes.

Looking this error up on Google shows a few answers, which involve clearing and rebuilding the catalog. I'm sure this works, but our index alone is 1.2GB so that was really a last resort option. It would take so long that before reindexing we probably just would have monkeypatched the catalog code to just ignore those keys. I was determined to find a way to fix without rebuilding. And I did.

Warning: the following explanation is based on eye crossing interpretations of zope code. I may be off a little but I think its a worthy attempt nonethless.

The text indexes need a way to correlate a word set (what the user is searching for) with a document and vice-versa.  For example, when I lookup 'jackson', I need to know that 17 documents have that keyword. If you have been mucking around with the catalogs or something went crazy on your catalog (like your sys admin) then you can get this words list of sync with other indexes and the real catalog. To confirm this, try idx.getEntryForObject(docid) on the failing index, where docid is the KeyError integer in your error log. In my example, the keyword 'jackson' in the words BTree pointed to a list of documents, all of which were valid except -1525983394. By just extracting the bad reference to this non-existing document, then there is no need to reindex everything.

Note that for this to work you need to know the word that triggers the error. You can't do a reverse lookup to find out which words reference that document because that mapping is the one that is still correct. It will always return empty due to a silent fail. If your users won't tell you, then you can just log it when it errors out later. If its really bad, iterate through all words and what they point to and see if they throw a key error (hint: use the _wordinfo.iterkeys() iterator + other tools listed here). Or just reindex.

As always, backup before hand and if you are nervous work on a copy first. I'm sure this can easily be modified for other indexes/schemes as well.

From the debug prompt:
'''
@docids is a list of the key error items as ints, not strings
@word is the word that triggers the index error
@cat is the catalog object that has the error, i.e. app.foo.portal_catalog
'''
def removeFromWordCatalog(docids, word, cat):
    import transaction
    for docid in docids:
        for indexObj in cat.getIndexObjects():
            try:
                itype = indexObj.getIndexType()
            except AttributeError:
                continue # some indexes don't implement this method
            lex = indexObj.getLexicon()
            if itype == 'Okapi BM25 Rank':
                wids = lex.termToWordIds(word)
                idx = indexObj.index
                for wid in wids:                
                    try:
                        # using excepts because BTrees key lookup is annoying
                        # see if its even really an error before ousting
                        blah = idx._wordinfo[wid][docid]
                    except KeyError: # not listed - nothing wrong
                        continue
                        
                    idx._wordinfo[wid].pop(docid)
                    idx._wordinfo[wid] # persistance
                    try: 
                        idx._wordinfo[wid][docid]
                    except KeyError:
                        print "sucessfully removed"
                    transaction.savepoint(1)
                    transaction.commit() 

docids = [-1525983376, -3423423445, ...]
word = 'jackson'
cat = app.foo.portal_catalog
removeFromWordCatalog(docids, word, cat) 

UPDATE: I already got to the point where the word by word thing became annoying so I wrote a generic function to "unword" a catalog entry. Much nicer, and although I thought it would take a long time it was actually not toooooo bad. I also pieced out the code so I could get as generic or specific as I want in the case that something else happens. Which it will. Woohoo!

def removeDocFromWordlist(idx, wid, docid):
    import transaction
    try:
        # using excepts because BTrees key lookup is annoying
        # see if its even really an error before ousting
        blah = idx._wordinfo[wid][docid]
    except KeyError: # not listed - nothing wrong
        return 
        
    idx._wordinfo[wid].pop(docid)
    idx._wordinfo[wid] # persistance
    try: 
        idx._wordinfo[wid][docid]
    except KeyError:
        print "sucessfully removed"
    transaction.savepoint(1)
    transaction.commit() 
    
    
def cleanUpBadDocId(docid, idx):
    iterati = idx._wordinfo.iterkeys()
    while True:
        try:
            wid = iterati.next()
        except StopIteration:
            break
        
        if idx._wordinfo[wid].has_key(docid):
            print "removing %s from word list %s"%(docid, wid)
            removeDocFromWordlist(idx, wid, docid)
            
            
def removeFromWordCatalogs(docids, cat):
    for docid in docids:
        for indexObj in cat.getIndexObjects():
            try:
                itype = indexObj.getIndexType()
            except AttributeError:
                continue # some indexes don't implement this method
            lex = indexObj.getLexicon()
            if itype == 'Okapi BM25 Rank':
                idx = indexObj.index            
                cleanUpBadDocId(docid, idx)


docids = [-1525983376, -3423423445, ...]
cat = app.foo.portal_catalog
removeFromWordCatalogs(docids, cat)

Tuesday, February 16, 2010

PloneChix: Hot and Not

Instead of boring you with a lengthy explanation of our official purpose, I'd like to address what makes PloneChix exciting and unique and then tackle some misconceptions about what people think PloneChix represents. As a followup post, I'd like to invite all PloneChix members to explain why PloneChix exists for them. There will be more meaning and inspiration from their stories than reading an tedious political statement.


What is so hot about PloneChix? PloneChix is ...
  • ... a no flame zone. That means no RTFMs, no stupid questions, no negative feedback, and no smack talk. This is the most important aspect of who we are. Women in technology, especially open source, are more likely to withdraw from a community because of actions/comments/answers perceived as harsh.   By making a special effort to make a no flame zone, we are encouraging people to ask for help and giving them helpful, thorough, honest answers by any means possible. Who knows, maybe all this kindness will be contagious. 
  • ... a place to be heard. Speak up, let us help you, stick around Plone for a little bit longer and we think you'll like what you see.
  • ... a place to ask non-technical questions (although we like technical too). For example: What is the job situation in the bay area for Plone integrators? Does company X have a supportive female developer community? What can I do to market myself better as an integrator/developer/documenter? 
  • ... an open and supportive learning environment. This is the perfect arena to tell a story and get honest feedback. Did you blow that interview because you underrepresented yourself or did you really not have enough experience? Let's talk about it. It's a place to get advice from women with different experiences and backgrounds.
  • ... an opportunity to tackle that project with a group instead of doing it on your own. We have just started working on defining our projects for this year. Why not add yours? If you have wanted to commit to the core for a while but just didn't want to tackle such a huge project on your own or want some help, we'll make sure to hook you up with a veteran who will show you the ropes.
  • ... a place to network. PloneChix is here to share and encourage each other to write that first book or give a talk at that conference. In my opinion, visibility is the #1 issue with women in the Plone community. Be visible, be heard, and get your name out there. There are lots of special opportunities for women in technology, and if we don't take advantage of them women in other communities will (i.e. Ruby women are fierce). Posting job opportunities, calls for papers/talks, scholarships, et al is highly encouraged.
Hot indeed. There are already some strange rumors about PloneChix that have surfaced that I'd like to take a moment to address. PloneChix is NOT...
  • ... a group of man-hating femi-nazis. In fact it would be way easier to justify if we were. However, we do have a tendency to focus on the needs of women and womens issues in the Plone, python, and Open Source Software (OSS) communities.
  • ... a derogatory name. It's an homage to LinuxChix, one of the finest female-friendly OSS movements to this day. There are many other groups that pay tribute to the original, including DevChix, DrupalChix and CodeChix, just to name a few. The word "Chix" has become synonymous with female fronted groups that discourage friendly fire (no RTFMs) and encourage low barrier to entry.
  • ... a "women only" club. Anyone is welcome as long as they are supportive of the cause and contribute positively back to it and/or the Plone community.
  • ... a support group for complaining or flaming, especially about other members of the community. PloneChix is a place for positive feedback, encouragement, and help. We are about positive change, not negative reinforcement.
  • ... looking for 50:50 male to female ratio just for the sake of having a 50:50 male to female ratio. On the contrary, we are all interested in seeing the entire Plone community grow and thrive in the way that Kirrily Roberts* describes.
  • ... a replacement for the mailing list, irc, or other more formal means of communication about Plone. It is an additional communication channel lead by like-minded individuals. It is also a place to talk about careers, meetups, projects, or even just to brag about a new project in a friendly, open environment. 
This is the part of the post where I start practicing what I've been preaching. Girls, boys, aliens, machines: how does this make you feel? Hint: I was shooting for all warm and tingly inside.

If you are interested in participating, male or female, check out our wiki, which has all the information you need to get started.

* Please read this post 50 times. Take a nap. Then 50 more. Then we'll talk.