Yeah, I like those ideas. Let me (or rather, let "us") know what you get working, since I wouldn't mind knowing myself; I have virtually no experience designing a cache system. I would enjoy reading about what you or anyone else gets working as a cache in their operating system.
Asynchronous and/or non-blocking IO for disks?
- Kevin McGuire
Well, for a traditional, single-device cache, the main problem is figuring out which blocks are worth keeping. LRU is common, but has the problem that a sequential scan (either a one-pass scan, or a loop larger than what fits into the cache) can trash the whole cache for absolutely no benefit. In the worst case of a loop over N+1 blocks, where N is the size of the cache, one always ends up throwing away exactly the block that's going to be needed next.
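To make that worst case concrete, here is a toy LRU cache in C; the cache size, integer block numbering, and access loop are all made up for the demonstration. Looping over N+1 distinct blocks with an N-entry cache never hits, no matter how many passes you make:

```c
/* Toy LRU cache demonstrating the sequential-loop worst case.
 * CACHE_SIZE and the block numbering are made up for the demo. */
#include <stdio.h>

#define CACHE_SIZE 4            /* N: number of cached blocks */

static int cache[CACHE_SIZE];   /* cache[0] = LRU ... cache[used-1] = MRU */
static int used;

/* Returns 1 on hit, 0 on miss; keeps the array in LRU order. */
static int access_block(int block)
{
    for (int i = 0; i < used; i++) {
        if (cache[i] == block) {                /* hit: move to MRU end */
            for (int j = i; j < used - 1; j++)
                cache[j] = cache[j + 1];
            cache[used - 1] = block;
            return 1;
        }
    }
    if (used == CACHE_SIZE) {                   /* miss on a full cache: */
        for (int j = 0; j < used - 1; j++)      /* evict the LRU block */
            cache[j] = cache[j + 1];
        used--;
    }
    cache[used++] = block;
    return 0;
}

int main(void)
{
    int hits = 0, total = 0;
    for (int pass = 0; pass < 10; pass++)
        for (int b = 0; b <= CACHE_SIZE; b++) { /* N+1 distinct blocks */
            hits += access_block(b);
            total++;
        }
    printf("hits: %d / %d\n", hits, total);     /* prints: hits: 0 / 50 */
    return 0;
}
```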
So somehow one wants to prefer blocks that actually have a chance of surviving in the cache until the next access, and somehow one wants to avoid keeping blocks in the cache that will never be accessed again. The device-speed dependency then complicates this further by also requiring a cost estimate that we can scale to device speed.
One simple strategy to avoid the LRU worst case of a sequential scan trashing it all seems to be keeping two caches: a smaller cache into which any new block is loaded, and a larger cache into which we move blocks that are requested again while they are still in the smaller cache. This is more or less the idea of the so-called 2Q (two queues) strategy; a sketch follows below. I think this would be workable for the multiple-device case too, if one keeps track of average load times for each device and keeps two queues per device.
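As a rough illustration of that two-queue idea (the simplified variant described above, not the full 2Q algorithm, which also keeps a ghost queue of recently evicted block numbers), here is a sketch in C. The queue sizes and array-based queues are placeholders, and real code would locate blocks through a hash table rather than linear search:

```c
/* Sketch of the simplified two-queue scheme described above.
 * A1_SIZE/AM_SIZE and the array queues are placeholders; real code
 * would find blocks via a hash table instead of linear search. */
#include <string.h>

#define A1_SIZE  8      /* accessed-once queue (FIFO) */
#define AM_SIZE 24      /* accessed-twice queue (LRU) */

static int a1[A1_SIZE], a1_used;
static int am[AM_SIZE], am_used;

static int find(const int *q, int used, int block)
{
    for (int i = 0; i < used; i++)
        if (q[i] == block)
            return i;
    return -1;
}

/* Remove entry i from a queue, shifting the tail down. */
static void take(int *q, int *used, int i)
{
    memmove(&q[i], &q[i + 1], (size_t)(*used - i - 1) * sizeof(int));
    (*used)--;
}

/* Append at the MRU end, evicting the oldest entry if full. */
static void push(int *q, int *used, int max, int block)
{
    if (*used == max)
        take(q, used, 0);
    q[(*used)++] = block;
}

void access_block(int block)
{
    int i;
    if ((i = find(am, am_used, block)) >= 0) {
        take(am, &am_used, i);              /* hit in Am: refresh LRU pos */
        push(am, &am_used, AM_SIZE, block);
    } else if ((i = find(a1, a1_used, block)) >= 0) {
        take(a1, &a1_used, i);              /* second access: promote */
        push(am, &am_used, AM_SIZE, block);
    } else {
        push(a1, &a1_used, A1_SIZE, block); /* new block: goes to A1 */
    }
}
```

The point is that a sequential scan now only churns through the small accessed-once queue and leaves the larger queue intact.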
Now, normally one keeps some ratio between the sizes of the two queues; say, the accessed-once queue is 1/3 of the size of the accessed-twice queue. For the device-speed-dependent case, one would then handle the accessed-once queues of all devices as a single queue with the speed-scaled LRU version, and the accessed-twice queues of all devices in the same way.
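One possible way to do that speed scaling (just a guess at a workable formula; all names here are invented for illustration) is to keep a per-device moving average of measured load times, and evict the block whose age divided by its device's average load time is largest, i.e. the block that is both stale and cheap to fetch again:

```c
/* Hedged sketch of speed-scaled eviction over one merged queue.
 * All names (avg_load_us, note_load_time, pick_victim) and the
 * scoring formula are invented for illustration. */
#include <stdint.h>
#include <stddef.h>

#define MAX_DEVICES 8

struct cached_block {
    int      device;       /* index into avg_load_us[] */
    uint64_t last_access;  /* timestamp of most recent use, in us */
    /* ... block number, data pointer, dirty flag, ... */
};

/* Exponential moving average of measured load times, per device. */
static uint64_t avg_load_us[MAX_DEVICES];

void note_load_time(int device, uint64_t elapsed_us)
{
    avg_load_us[device] = (avg_load_us[device] * 7 + elapsed_us) / 8;
}

/* Pick an eviction victim from a queue holding blocks of all devices:
 * the higher a block's age relative to its device's load time, the
 * better a victim it is (stale, and cheap to fetch again). */
size_t pick_victim(const struct cached_block *blocks, size_t n, uint64_t now)
{
    size_t victim = 0;
    uint64_t best = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t age   = now - blocks[i].last_access;
        uint64_t cost  = avg_load_us[blocks[i].device] + 1;
        uint64_t score = (age << 10) / cost;   /* scaled integer ratio */
        if (score >= best) {
            best = score;
            victim = i;
        }
    }
    return victim;
}
```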
That would be pretty simple to implement. The downside is that it's obviously suboptimal, since it will never promote any blocks into the accessed-twice queue if you loop over data larger than what fits. The optimal strategy would keep the cache full with a constant set of blocks, and not even try to keep the rest of the blocks in the cache. So the other LRU worst case remains.
Anyway, I've currently got a block-device cache of 256 entries total, and I've noticed that, together with a floppy drive, that's actually pretty nice for testing cache performance: it's small enough that reading through a long file throws directories out of the cache (exactly the worst case for LRU). So I guess I could modify it to use a 256-entry cache shared by all devices, and then try to build something that performs well on top of that. The floppy is nice because every cache hit or miss is perceivable, thanks to its awful access times.
The real problem with goto is not with the control transfer, but with environments. Properly tail-recursive closures get both right.