Why DB XML Doesn't Do Path Indexes
Every so often we have a Berkeley DB XML user ask us if we implement path indexes. Or maybe they assume that we do implement them and are confused about why their index isn't working. It turns out that a lot of these users have come to us after using eXist, which for a long time has only been able to specify path indexes.
A path index is specified by describing the path to the nodes that you want to index - often with a subset of XPath. So you might want to index all "firstname" elements, but only if they are children of an "author" element - which you might write like this:
//author/firstname
The good thing about path indexes is that they can be very small targeted indexes, which only record exactly the information that you want.
The trouble with path indexes, I've always explained, is that it's really hard to work out when you can use one in a query. You miss out on lots of potential optimisation opportunities because query analysis can't prove that an index is applicable. It's not surprising really - path indexes are a subset of materialised views which is known to be a very hard problem, in the general case.
So it's not surprising to see that recent work on eXist has added a new type of index which works in exactly the same way as DB XML's do. The author of the blog post even describes the problems that I have been telling our users about, and recommends that path indexes be used sparingly if at all.
It's good to see other people coming to the same conclusion that I came to. DB XML may implement path indexes some day, but they won't be anywhere near as useful as some users think they should be.
Posted by john at January 17, 2008 04:55 PM