Re: How do B-tree-based file systems work?
Posted: Sun Mar 13, 2016 12:42 am
When I first sat down to write a response, I confused myself. That, combined with the fact that I'm using iffy terminology at best, should show you the grip I have on the subject. It might stem from the fact that I'm trying to think of B-trees in terms of linked lists, a data structure I understand well enough. So, let's reapproach this.alexfru wrote: Within a node, keys and their respective values are logically tied/paired. You can have an array of k-v pairs in a node. Or you can have it split into two distinct arrays, an array of keys and an array of values, still directly corresponding to one another.
Obviously, if you want quick key search (which is the hole point of the data structure at hand), you keep the array of keys ordered/sorted. So, when you insert a new key between, let's say, key3 and key4 (because key3 < new key < key 4), you also insert its corresponding value between value3 and value4 (irrespective of how values relate to one another). The new key and its value both have the same index in separate arrays or in the same array, the index that used to belong to key4-value4 pair. key4-value4 moves one place.
The same ordering logic applies to links to children as those links are stuck between adjacent keys and there are two extra links on the sides.
I'm sorry, I'm having trouble understanding what it is that you don't understand.
I'm looking at the Wikipedia article for a 2-3 tree. It says,
So, it sounds like this is a general representation of the following nodes/leaves (some of the syntax is pseudocodish, I wrote it quickly to see if I was getting the point):Wikipedia wrote:...where every node with children (internal node) has either two children (2-node) and one data element or three children (3-nodes) and two data elements. Nodes on the outside of the tree (leaf nodes) have no children and one or two data elements.
Code: Select all
typedef struct _node2 {
data_t data; // Some hypothetical data
node* children[2]; // Two child links
} node2_t; // 2-node
typedef struct _node3 {
data_t data[2]; // Two data elements
node* children[3]; // Three child links
} node3_t; // 3-node
typedef struct _leaf1 {
data_t data;
} leaf1_t; // Leaf with one data element
typedef struct _leaf2 {
data_t data[2];
} leaf2_t; // Leaf with two data element
As to the key-value stuff, is this as simple as exchanging the hypothetical "data_t" element in the above representation to say, char* key and char* value?
I suppose what is tripping me up now is the determination of where keys go in the tree. The Wikipedia article's section on insertion talks about searching for the proper insertion point for the key - what determines where an arbitrary key goes when inserted into the tree?