OSDev.org

Posted: **Fri Dec 21, 2012 10:41 am**

Hi,

in a traditional B-Tree a node is split when it has n+1 entries (when it is over full) producing 2 nodes with n/2 and (n+1)/2 nodes respectively. Also when a node has less than n/2 entries it either steals some entries from its neighbours or is merged with one of its neighbours. When keys are inserted into a B-Tree in order you end up with all nodes half full, the worst case scenario.

I've read about another variant where a full node shoves entries into its neighbours to make space or 2 full nodes are split into 3 nodes with 2n/3 entries. The same way a node with less than 2n/3 entries steals some entries from its neighbours or 3 nodes are merged into 2. This garanties a higher utilization.

In both cases though I can see an access pattern of alternating insert/remove where each operation causes an expensive split or merge.

So my questions are:

What other split/merge rules do you know?
What do you recommend and why?

MfG,
Mrvn

Posted: **Fri Dec 21, 2012 10:57 am**

mrvn wrote:What do you recommend and why?

No without actual measurable data set, and I would just pick any one and measure before I even decide to optimize it or not.

As a side note, from my experience, in many situation if you want sorted data (so you want a tree); instead of finding a better tree there is also possibility that in some case you can do with unsorted data by altering the algorithm and use direct hash.

So think again, why do you need sorted data?

Posted: **Fri Dec 21, 2012 5:08 pm**

bluemoon wrote:So think again, why do you need sorted data?

Who said I want sorted data? I just mentioned that you get the worst case space utilization in a B-Tree if you insert sorted keys. Just to show that it does easily happen.

Posted: **Sat Dec 22, 2012 2:11 am**

mrvn wrote:Who said I want sorted data?

If not, why bother with tree?

Obviously you missed my point in 1st sentence. I meant you cannot compare between algorithm without (sampled) data set and actual measurement.
Each algorithm has its good and worst case scenario, furthermore, and even two algorithm both got the same O(ln) they are not directly comparable since the operation costs are in different unit.(e.g. some function is more complex and take longer to execute)

Therefore, without knowing what you want to do, or any sample data set, I recommend do not change your current implementation until you do some measurement.

Posted: **Sat Dec 22, 2012 10:43 am**

bluemoon wrote:Therefore, without knowing what you want to do, or any sample data set, I recommend do not change your current implementation until you do some measurement.

What I want to do is learn about different split/merge algorithm for B-Trees so I can study them, implement those that probably fit my current use case and compare them. I don't want you to compare them for me and I don't want to have to ask again next month when I have a different use case because you only told me about one algorithm that only fits this weeks data.

OSDev.org

When to split/merge B-Tree nodes?

When to split/merge B-Tree nodes?

Re: When to split/merge B-Tree nodes?

Re: When to split/merge B-Tree nodes?

Re: When to split/merge B-Tree nodes?

Re: When to split/merge B-Tree nodes?