Voix event model...
Posted: Tue Mar 27, 2007 1:50 pm
Ok, can't resist describing this, as it took me some days to design, and another to proof-of-concept (currently my pipes support it).
Let's first look at the issue: you want to read/write various files/sockets/whatever, most of which are slow, and you have a lot of them (say a webserver serving 10k people). You want to serve them all in one thread to avoid spawning a HUGE number of threads (which is bad), but you need to serve them in whatever order data is available, so the guy fetching stuff over a gigabit link doesn't have to wait for the other guy on a 300-baud modem fetching stuff your webserver needs to load over NFS... well, you get the point.
Or a webbrowser that wants to read lots of sites while staying responsive to the user. Ok, you could spawn threads for that in many cases, but it's neater to be able to do it all without. So: the traditional approach on a Unix-style system would be to make all files non-blocking, and then select() or poll() them all to see where useful work can be done.
That works fine for 10 files/sockets. It works not so fine for 10k files/sockets, because you'll spend tons of time running through the list of descriptors to be waited on, and building and tearing down structures on every call. So... mm... I decided my kernel doesn't natively support that kind of silliness at all (though it can be emulated, of course).
Instead I designed the following scheme (which I guess is not too different from Linux's new epoll, but I haven't looked at its implementation so I wouldn't know): file descriptors keep track of events that have happened (ev_high). They also contain a mask field (ev_mask), which says which events are of interest. Finally, they have a queue field (ev_queue), which says where interesting events are to be reported.
So basically, every time (ev_high & ev_mask) transitions from zero to non-zero, a pointer to the open file object is posted to ev_queue. All three fields are per file descriptor. Since ev_high is only ever cleared by explicitly manipulating the descriptor (say, ev_can_read is cleared by attempting a read, if after the attempt there is nothing left to read without blocking), it's only ever necessary to have one slot in the queue for each descriptor reporting to it, and overflows are impossible (provided system calls are suitably written to take care not to break things).
So one can do, among other things:
- create a temporary one-slot queue, point a file descriptor at it, and wait on it (to wait on a single object)
- take a poll list, create a temporary queue of matching size, point the file descriptors at it, wait on it, tear it down, and report the list of events to userspace
- create a permanent queue, sized for the maximum number of waitable files, point whatever files at it (in the same or different system calls), use it for a while, wait on it several times, add and remove file descriptors as you see fit, and never need to push big lists into the kernel
It's also rather easy to temporarily take a file descriptor, point it at a temporary queue, wait, and then point it back where it was, if one wants to do a blocking call on a descriptor that is normally part of a wait-list.
So... one of my goals stated in the mission statement in the announcement forum is now complete: I have a system which allows waiting on an arbitrary number of objects, as long as they are visible to users as what is ATM called file descriptors.
They need not even support read()/write() or anything like that. I'll rename file descriptors to handles later if it seems I can't convince myself that everything is represented by a file descriptor.
The limitation of the system is that a single file descriptor can only report to a single queue at any time, but then again several threads can wait on the same queue, so I don't see this as such a huge problem.