11 July 2005

Store MPTT trees with history

When searching for a method to store trees in a database (in this case MySQL) in the spring of 2004, I found a method called "Nested Sets" by Joe Celko (other version). This method is also known as MPTT (Modified Preorder Tree Traversal). Using this method, queries to get all parents or all childs are much faster, then recursively getting the next parent or next childs.

To be able to save the history of my tree, I added two columns to the database: valid_from and valid_till. To insert an item in the tree, I backuped the items to be changed (the items on the right and the parent items) setting the valid_till field to now, inserted copies of the backuped ones with higher left and right values (and a valid_from field set to now) to make room for the new item, and finally inserted the new item. Doing so the number of rows in the database grew fast; every insert caused all right items and parents to be copied for history.

I researched alternatives for the MPTT algoritm and found some good articles on the internet. Vadim Tropashko wrote an article about "Nested Intervals" as an alternative to Nested Sets (other articles). Using that method you did not need updating parent items, but the right items must be updated as well.
Another drawback an implementation of Nested Intervals is, that you must use fractions or very accurate floats. When using fractions your performance decreases because they must be evaluated to select items. Using floats may give round errors if there are many levels and/or many items.

When brainstorming with my colleague Hermen van den Hoorn about our growing database table containing the tree, we got the ultimate idea. Why should there be no gaps in the tree? Why are we reformatting the tree each time to use each value between the left and right value of the root? The MPTT model does not dictate that.
Wanting to save the history of the tree, we will never delete an item; only expire it by setting its valid_till field to now. If we leave room for the deleted item, i.e. do not rebuild the left and right values of the tree, the tree remains correct. Adding an element requires making room for it, but does not need to backup the entire tree.
The final tree will contain all items: deleted items, current items and perhaps future items. If we take care of the whole tree, even the deleted and future items, if changing it, we are able to get the tree as it was on each timestamp.

An example to illustrate.
Base table (from Joe Celko's article):
id name mptt_left mptt_right mptt_level valid_from valid_till
1 Albert 1 12 1 2000-01-01 9999-12-31
2 Bert 2 3 2 2000-01-01 9999-12-31
3 Chuck 4 11 2 2000-01-01 9999-12-31
4 Donna 5 6 3 2000-01-01 9999-12-31
5 Eddie 7 8 3 2000-01-01 9999-12-31
6 Fred 9 10 3 2000-01-01 9999-12-31

The corresponding tree (after 2000-01-01) looks like:
          Albert (1,12)
           /       \
         /           \
   Bert (2,3)   Chuck (4,11)
                  /   |  \
                /     |    \
              /       |      \
            /         |        \
       Donna (5,6) Eddie (7,8) Fred (9,10)

Now I want to add employee Jack under Bert per 2003-01-01. We make room and the table becomes:
id name mptt_left mptt_right mptt_level valid_from valid_till
1 Albert 1 14 1 2000-01-01 9999-12-31
2 Bert 2 5 2 2000-01-01 9999-12-31
3 Chuck 6 13 2 2000-01-01 9999-12-31
4 Donna 7 8 3 2000-01-01 9999-12-31
5 Eddie 9 10 3 2000-01-01 9999-12-31
6 Fred 11 12 3 2000-01-01 9999-12-31

For all employees right of Bert the left and right values are increased by 2. For all parents, including Bert, the right value is increased by 2. The tree is still valid.
After adding Jack the table becomes:
id name mptt_left mptt_right mptt_level valid_from valid_till
1 Albert 1 14 1 2000-01-01 9999-12-31
2 Bert 2 5 2 2000-01-01 9999-12-31
3 Chuck 6 13 2 2000-01-01 9999-12-31
4 Donna 7 8 3 2000-01-01 9999-12-31
5 Eddie 9 10 3 2000-01-01 9999-12-31
6 Fred 11 12 3 2000-01-01 9999-12-31
7 Jack 3 4 3 2003-01-01 2004-01-01

And the tree (per 2003-01-01):
          Albert (1,14)
           /       \
         /           \
   Bert (2,5)   Chuck (6,13)
      /           /   |  \
    /           /     |    \
 Jack (3,4)   /       |      \
            /         |        \
       Donna (7,8) Eddie (9,10) Fred (11,12)

When Jack stopped on 2004-01-01 the tree is:
          Albert (1,14)
           /       \
         /           \
   Bert (2,5)   Chuck (6,13)
                  /   |  \
                /     |    \
              /       |      \
             /         |        \
       Donna (7,8) Eddie (9,10) Fred (11,12)

The table is the same as above.

It is very simple to maintain the tree. You can move an item by expiring it and insert it as new node (after having made room) with the same id. Note that the id, the valid_from and valid_till columns form an unique key.

I am now busy with the conversion from our 72,000 records table to a < 900 records table...

No comments: