SuperLeaf1995 wrote:read lock: "please anyone don't read anything because i will be writing over here"
write lock: "please anyone don't write anything because i will be reading over here"
This is usually called a read-write lock and the meaning is typically the reverse of what you want. A read-write lock can be read-locked an arbitrary number of times, but can be write-locked only when it is free (not even read-locked). The usual usage is to read-lock a structure in places where it will be read, and write-lock it in places where it will be written. A connection between read and write is necessary, because otherwise it is possible for readers and writers to miss each other, and cause data races, and nobody wants that.
A simple implementation of this would just use a counter. When the counters is 0, the lock is free, when it is positive, the lock is read-locked, and when it is -1, it is write-locked. Read-locks basically just atomically increment the counter, but special case a read of -1, which means the task has to be suspended until the counter is no longer -1. Unlocks from read-locks then just atomically decrement the counter, and maybe wake up any waiting thread if the counter is 0 by the end. Write locks just try to atomically replace a 0 with a -1 in the counter. If that fails (because the counter is not 0), they must block until it is zero. That implementation is simple, but can lead to writer starvation: If the lock is constantly in read-locked state, because multiple threads are constantly reading the same data structure, so statistically, some thread will always be in the critical section, then the writers will never be able to observe a 0 value on the counter, and a write lock will never succeed.
A more complicated algorithm would be the ticket algorithm. The idea is like at the DMV: Everyone who comes in takes a ticket and waits for that ticket to be called. This would allow for some fairness, but the algorithm can be very tricky to implement. Especially for a read-write lock. You may want to allow for early return for readers unless there is a writer queued up. So that requires you to track the number of writers queued up. Unlocking is a bit tricky as well. You can't just increase a counter, you would have to set it to your own ticket number, unless the counter is already higher than that. Then there is handling the multiple readers: You don't want to just always call up the next ticket, since the ticket just turned in might be a reader, but the next one might be a writer. With multiple readers allowed into the lock at the same time, it is possible that some might overtake each other. There is also the question of how to handle failure and back-off. Maybe a signal arrived in a thread while blocking. Alright, how do you back out of this ticket operation? By the time you back off, you might no longer be the most recent arrival. And how do you handle overflow? All of these are tricky questions to answer.
SuperLeaf1995 wrote:Locks are created per node and unlocked when node is no longer used.
Well, that is hard. How does that work? If the lock is inside the node, than whoever takes a lock must already know the address of a node. So the lock inside the node cannot protect the node itself, only its contents. Consider this pseudo-code:
Code: Select all
find_node(key):
n = root
while (n):
lock(n->lock)
if (key < n->key)
next = n->left
else if (key > n->key)
next = n->right
else
return n
unlock(n->lock)
n = next
This already illustrates one problem: Does the code return the node in locked state or not? If yes, the calling code must unlock the node when it is done. Possible but error-prone. If not, then a different process might delete the node from the tree as soon as the unlock is processed. Or you could return only the value, but then you must copy the value into a temporary before releasing the lock, so there are no accesses to the node while the lock is not held.
But now for the actual reason to bring this up: The problem is, as soon as the unlock is called, that node and all of its children might vanish. Therefore, the access to the nodes child-links, even though copied to a temporary, might be invalid, since by the time the lock in the child nodes is taken, the child node itself might no longer exist. Or look at the problem from the other side: While the locator thread is between the unlock on its current node and the lock on the next, it is holding no locks. Therefore, finding the locks free is not actually a guarantee that nobody holds a reference to the node, and so it is never safe to delete a node. Garbage collection would take care of this at great cost.
Whereas, if you just had one big lock for the tree, if you hold it, you can manipulate the tree however you want, since you definitely are the only manipulator in the tree at that time. If the lock is also generally not held for a great length of time (only for a single look-up, insert, or remove), then the fine-grained approach may very well be more trouble than worth, especially considering:
SuperLeaf1995 wrote:The tree does not self-balance, to make initial insertions faster
Unless you know your input data is not going to devolve into pathological cases (sorted input sequence being a pathological case), this is a horrible idea. Especially with the fine-grained locking approach. An unbalanced tree can devolve into a linked list (every node with only one child). If that were to happen, the order of locking would be the same in all threads that attempt to use the list, and therefore no overtaking of threads would be possible. All threads would queue up at exactly the same locks in the same order, and you get the same effect as if you just had a big tree lock, but in a more complicated fashion.
With binary trees, deletion is always more complicated than insertion. If ease of implementation is your primary concern, might I suggest using Anderson trees? And if speed is the reason: Red-black trees will only require at most two tree rotations on insert. It does not get much better.
bzt wrote:Why do you need read locks? It's enough to have only write locks. If the lock is free, nodes can be read. If it's set, then further reads and writes are put on hold.
What? Then if a reader is preempted after testing the lock, but before actually performing the read, it will read stale data. A read-write lock is necessary so the writer can know with confidence that no other thread is currently accessing the same data structure.