Disk IO caching?

Post by earlz »

How is disk caching done in most OSs?

I have this idea, but it seems so simple that it seems every OS would have some form of it..

anyway..

So disk reading is given top priority... If a process requests to read a file, it happens right then and there, as long as another process isn't also reading a file (in which case it'd just have to be put on a queue).

If a process requests to write a file, then that gets put in a memory cache. When there is some free time for the disk (as in, no process trying to read from it), the cache is synced to the disk. This also means that reading from a freshly written file is very fast, as it is still cached.
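To make the idea above concrete, here is a minimal sketch of such a write-back cache in Python. It is only an illustration under stated assumptions: the "disk" is a plain dict, and the names `WriteBackCache`, `max_dirty` and `on_disk_idle` are made up for this example and do not come from any real OS.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy write-back cache: writes land in RAM, the disk is updated later."""

    def __init__(self, disk, max_dirty=4):
        self.disk = disk            # hypothetical backing store: dict of block -> bytes
        self.dirty = OrderedDict()  # blocks written but not yet on disk, oldest first
        self.max_dirty = max_dirty  # assumed limit before a flush is forced

    def read(self, block):
        # A freshly written block is served from memory without touching the disk.
        if block in self.dirty:
            return self.dirty[block]
        return self.disk.get(block)

    def write(self, block, data):
        self.dirty[block] = data
        self.dirty.move_to_end(block)
        # Idle time may never come, so flush the oldest block once the cache is full.
        while len(self.dirty) > self.max_dirty:
            self.flush_one()

    def flush_one(self):
        block, data = self.dirty.popitem(last=False)
        self.disk[block] = data

    def on_disk_idle(self):
        # Called by a (hypothetical) scheduler when no reads are pending.
        while self.dirty:
            self.flush_one()


disk = {}
cache = WriteBackCache(disk, max_dirty=2)
cache.write(7, b"hello")
print(cache.read(7), 7 in disk)  # readable before it has reached the disk
cache.on_disk_idle()
print(disk[7])
```

Note that anything still in `dirty` is lost on a crash or power failure, which is the usual price of deferring writes.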

Re: Disk IO caching?

Post by pcmattman »

earlz wrote:If a process requests to read a file, it happens right then and there, as long as another process isn't also reading a file (in which case it'd just have to be put on a queue).
You should be able to do multiple reads at once - there's nothing in the filesystem or disk drive stopping you ;)
earlz wrote:If a process requests to write a file, then that gets put in a memory cache. When there is some free time for the disk (as in, no process trying to read from it), the cache is synced to the disk. This also means that reading from a freshly written file is very fast, as it is still cached.
I'm personally a fan of reading into cache (where possible; you'll need to be able to dynamically resize your cache area if you run out of RAM) so that all future reads come from that cache. As soon as a write is performed, it not only updates the cache but also writes to disk, i.e. a write-through cache.

This means that a write will *always* go to disk, but a read can come from cache.

There are of course other, faster, ways to go about caching but this is probably the easiest to get your head around and requires minimal effort to implement.
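A rough sketch of that write-through scheme, for comparison. The fixed `capacity` with least-recently-used eviction is my own assumption (the post only says the cache has to shrink when RAM runs out), and `WriteThroughCache` and `disk_reads` are hypothetical names for illustration.

```python
from collections import OrderedDict


class WriteThroughCache:
    """Toy write-through cache: reads may hit RAM, writes always reach the disk."""

    def __init__(self, disk, capacity=128):
        self.disk = disk             # hypothetical backing store: dict of block -> bytes
        self.capacity = capacity     # assumed fixed size; a real cache would resize
        self.blocks = OrderedDict()  # cached blocks, least recently used first
        self.disk_reads = 0          # counts how often the disk was really read

    def read(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as recently used
            return self.blocks[block]
        self.disk_reads += 1
        data = self.disk[block]
        self._remember(block, data)
        return data

    def write(self, block, data):
        self.disk[block] = data      # the write always goes to disk first
        self._remember(block, data)  # then the cache is updated to match

    def _remember(self, block, data):
        self.blocks[block] = data
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


disk = {0: b"boot", 1: b"data"}
cache = WriteThroughCache(disk, capacity=1)
cache.read(0)
cache.read(0)
print(cache.disk_reads)  # the second read came from the cache
```

Because the disk is never behind the cache, eviction is trivial: a block can simply be dropped without writing anything back.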

Re: Disk IO caching?

Post by salil_bhagurkar »

Write caches are maintained until the system finds enough writes to a physically localised area of the hard disk that they can be done in the shortest possible time, with as few time-consuming disk head seeks as possible. Waiting until the system goes idle may not always work, as you don't always have enough RAM to store all those writes.
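One possible way to express "enough writes to a localised area" is to group the dirty block numbers into contiguous runs and flush once a run is long enough or the cache is getting full. The sketch below assumes block numbers map to physical position, and the thresholds `min_run` and `max_dirty` are arbitrary illustrative values.

```python
def coalesce(dirty_blocks):
    """Group dirty block numbers into (start, count) runs of adjacent blocks."""
    runs = []
    for block in sorted(set(dirty_blocks)):
        if runs and block == runs[-1][0] + runs[-1][1]:
            start, count = runs[-1]
            runs[-1] = (start, count + 1)  # extend the current run
        else:
            runs.append((block, 1))        # start a new run
    return runs


def should_flush(dirty_blocks, min_run=8, max_dirty=64):
    """Flush when a long enough run exists, or when too much is cached."""
    longest = max((count for _, count in coalesce(dirty_blocks)), default=0)
    return longest >= min_run or len(set(dirty_blocks)) >= max_dirty


print(coalesce([10, 3, 11, 4, 12, 100]))
```

Each run can then be written with a single seek followed by one sequential transfer.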

Re: Disk IO caching?

Post by earlz »

salil_bhagurkar wrote:Write caches are maintained until the system finds enough writes to a physically localised area of the hard disk that they can be done in the shortest possible time, with as few time-consuming disk head seeks as possible. Waiting until the system goes idle may not always work, as you don't always have enough RAM to store all those writes.
Well, of course there would be a limit at which it starts flushing the cache..

Also, head seek time is negligible with almost any hard drive made within the last few years.. and then there are thumb drives and SSDs... I don't think that kind of approach would work well on the new hardware people have...

Re: Disk IO caching?

Post by Hyperdrive »

earlz wrote:
salil_bhagurkar wrote:Write caches are maintained until the system finds enough writes to a physically localised area of the hard disk that they can be done in the shortest possible time, with as few time-consuming disk head seeks as possible. Waiting until the system goes idle may not always work, as you don't always have enough RAM to store all those writes.
Well, of course there would be a limit at which it starts flushing the cache..

Also, head seek time is negligible with almost any hard drive made within the last few years.. and then there are thumb drives and SSDs... I don't think that kind of approach would work well on the new hardware people have...
For SSDs there are other things you probably want to consider, e.g. erase block size. There is a T13 proposal related to that: Solid State Drive Identify Proposal for ATA8-ACS.
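To illustrate why erase block size matters for a write cache, here is a small sketch that counts how many erase blocks a write touches. The 128 KiB value is purely an assumption for the example (real drives differ, and how the size is reported is up to the drive), and both function names are hypothetical.

```python
ERASE_BLOCK = 128 * 1024  # assumed erase block size in bytes; real drives vary


def erase_blocks_touched(offset, length, erase_block=ERASE_BLOCK):
    """Number of erase blocks covered by a write of length > 0 bytes at offset."""
    first = offset // erase_block
    last = (offset + length - 1) // erase_block
    return last - first + 1


def align_up(offset, erase_block=ERASE_BLOCK):
    """Round offset up to the next erase block boundary."""
    return -(-offset // erase_block) * erase_block


print(erase_blocks_touched(0, ERASE_BLOCK))     # aligned write
print(erase_blocks_touched(4096, ERASE_BLOCK))  # same size, misaligned
```

A cache that flushes in erase-block-sized, aligned chunks touches fewer erase blocks for the same amount of data than one that flushes at arbitrary offsets.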

--TS

Re: Disk IO caching?

Post by salil_bhagurkar »

earlz wrote:
salil_bhagurkar wrote:Write caches are maintained until the system finds enough writes to a physically localised area of the hard disk that they can be done in the shortest possible time, with as few time-consuming disk head seeks as possible. Waiting until the system goes idle may not always work, as you don't always have enough RAM to store all those writes.
Well, of course there would be a limit at which it starts flushing the cache..

Also, head seek time is negligible with almost any hard drive made within the last few years.. and then there are thumb drives and SSDs... I don't think that kind of approach would work well on the new hardware people have...
IMHO disk seek times are still not that negligible on newer hard drives. I don't know much about high-end systems, which might use better technology. Random r/w on disks still slows throughput down to a few MB per second, as against sequential r/w, which can go up to 50-60 MB per second on a typical home computer.
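A back-of-the-envelope calculation shows where numbers like these come from. The 9 ms average seek and 7200 rpm spindle speed below are assumed figures for a generic desktop drive, not measurements; only the 60 MB/s sequential rate is taken from the post.

```python
def random_throughput_mb_s(request_bytes, seek_ms, rpm, sequential_mb_s):
    """Estimate throughput when every request pays a seek and rotational delay."""
    rotational_ms = 0.5 * 60_000 / rpm  # on average, wait half a revolution
    transfer_ms = request_bytes / (sequential_mb_s * 1e6) * 1000
    total_ms = seek_ms + rotational_ms + transfer_ms
    return (request_bytes / 1e6) / (total_ms / 1000)


# Assumed drive: 9 ms average seek, 7200 rpm, 60 MB/s sequential.
for size in (4 * 1024, 64 * 1024):
    rate = random_throughput_mb_s(size, seek_ms=9, rpm=7200, sequential_mb_s=60)
    print(f"{size // 1024} KiB random requests: {rate:.2f} MB/s")
```

With these assumptions the positioning time (about 13 ms) dwarfs the transfer time, so small random requests end up one to two orders of magnitude slower than sequential access.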

As far as drives other than hds are concerned, you don't need to worry about data localisation.

Re: Disk IO caching?

Post by Combuster »

It's also the head settling time that cuts you down here. If you read a disk sequentially, you read all the tracks in the same cylinder in succession, which means that you only need one head seek. Once that's done, you can read all the tracks in one go as they pass under the heads.

If you do random access, you'll end up moving the head around for each sector (or group of sectors), which means you pay the seek and settling time for each individual access instead of just once.

While HDs are the easiest to discuss, data locality is just as important on some non-HD media, like floppy drives, ZIP drives and CDs (where the delays are much higher), as well as tape drives (I so hope you are not going to seek on one of those), to name a few.

Only solid-state storage comes close to being as fast at random reads as at sequential reads.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie

Re: Disk IO caching?

Post by jal »

Combuster wrote:It's also the head settling time that cuts you down here. If you read a disk sequentially, you read all the tracks in the same cylinder in succession, which means that you only need one head seek. Once that's done, you can read all the tracks in one go as they pass under the heads. If you do random access, you'll end up moving the head around for each sector (or group of sectors), which means you pay the seek and settling time for each individual access instead of just once.
So this seems to imply that if the next read is close enough (in terms of disk geography) to the current read, it may be better to continue reading and discard some data, as there's no head up/down/settle, right?


JAL

Re: Disk IO caching?

Post by Brendan »

Hi,
jal wrote:
Combuster wrote:It's also the head settling time that cuts you down here. If you read a disk sequentially, you read all the tracks in the same cylinder in succession, which means that you only need one head seek. Once that's done, you can read all the tracks in one go as they pass under the heads. If you do random access, you'll end up moving the head around for each sector (or group of sectors), which means you pay the seek and settling time for each individual access instead of just once.
So this seems to imply that if the next read is close enough (in terms of disk geography) to the current read, it may be better to continue reading and discard some data, as there's no head up/down/settle, right?
For hard drives, if the new read is on a different cylinder then you have to shift the disk heads to get to it and continuing to read from the wrong cylinder won't help. If the data is on the same cylinder then you can access it without shifting the heads, and continuing to read still won't help.

For CD-ROMs the data is in a big spiral (and not in cylinders like on hard drives), so continuing to read can prevent the need to shift the head. However, I'd assume CD-ROM drives automatically do this for you (e.g. if you ask for a sector that's fairly close to where the head is, then it'll keep tracking the spiral instead of shifting the heads). Spirals suck for a different reason - reading from sector N and then asking for sector N-1 always involves shifting the heads.

Also note that if you always do reads/writes according to how close they are to the current head position, then you can have reads/writes at the start or end of the disk that never get done (if the heads are kept busy doing reads/writes in the middle of the disk). Even if these reads/writes at the start or end of the disk do get done, it's likely that they'll wait for an unfair amount of time first. To prevent this it's better to do things in ascending order (e.g. do all reads/writes for cylinder 0, then cylinder 1, then cylinder 2, ..., then the last cylinder, then wrap back to cylinder 0 again). Fortunately, this works well for CD-ROMs too, and you can just do all reads/writes in order of their starting LBA for both hard drives and CD-ROMs.
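The ascending-order-with-wraparound policy described above is commonly known as circular SCAN (C-SCAN). A minimal sketch, assuming requests are identified only by their starting LBA and that `head_lba` is the driver's guess at the current head position (the function name is hypothetical):

```python
import bisect


def cscan_order(pending_lbas, head_lba):
    """Service requests at or beyond the head in ascending order, then wrap."""
    ordered = sorted(pending_lbas)
    split = bisect.bisect_left(ordered, head_lba)
    # First everything from the head position upwards, then start again from
    # the lowest LBA, so requests near the ends of the disk cannot starve.
    return ordered[split:] + ordered[:split]


print(cscan_order([95, 10, 50, 70, 20], head_lba=60))
```

In a real driver the queue changes while the sweep is in progress, so the same rule would be applied each time the next request is picked rather than once up front.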

The other problem is that some reads/writes are more important than others. For example, if you need to read a page from swap space and also need to read a page for defragmenting the file system in the background, and the second read is closer to the heads, then which read would you do first?
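One simple answer to that question is to sort by priority class first and by LBA only within a class. The sketch below assumes three made-up priority classes and uses strict priority, which is itself a design choice: without some form of aging, background requests can wait indefinitely while higher classes stay busy.

```python
import heapq
from enum import IntEnum


class Priority(IntEnum):
    SWAP = 0        # hypothetical classes; lower value is served first
    NORMAL = 1
    BACKGROUND = 2


class RequestQueue:
    """Orders requests by priority class, then by starting LBA."""

    def __init__(self):
        self._heap = []

    def submit(self, priority, lba, tag):
        heapq.heappush(self._heap, (priority, lba, tag))

    def next_request(self):
        if not self._heap:
            return None
        _priority, _lba, tag = heapq.heappop(self._heap)
        return tag


queue = RequestQueue()
queue.submit(Priority.BACKGROUND, 100, "defrag read")
queue.submit(Priority.SWAP, 90000, "page-in")
print(queue.next_request())  # the swap read wins even though it is further away
```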


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: Disk IO caching?

Post by salil_bhagurkar »

Brendan wrote: The other problem is that some reads/writes are more important than others. For example, if you need to read a page from swap space and also need to read a page for defragmenting the file system in the background, and the second read is closer to the heads, then which read would you do first?
That is a very interesting concept, and you can see how effectively it works on Windows Vista and later. They have an I/O priority mechanism which generally assigns a background priority to processes like defragmentation, so in this case, even though loading the page from swap space is more expensive, it would be given more importance. This makes the system more responsive and more productive where it needs to be.

Re: Disk IO caching?

Post by Brendan »

Hi,
salil_bhagurkar wrote:That is a very interesting concept, and you can see how effectively it works on Windows Vista and later. They have an I/O priority mechanism which generally assigns a background priority to processes like defragmentation, so in this case, even though loading the page from swap space is more expensive, it would be given more importance. This makes the system more responsive and more productive where it needs to be.
Yes.

The other thing to consider is being able to cancel I/O requests. For example, imagine if the application asks the VFS to read from a file, the VFS asks the file system to fetch some data for the file, then the file system asks the storage device driver to load some sectors, then the application terminates. In this case you'd want the VFS to tell the file system to cancel the read, and the file system to tell the storage device driver to cancel the read, and for everything to work right regardless of whether the read was actually canceled or not (because the operation may have been canceled too late).
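The "canceled too late" case is easiest to see with an explicit request state. This is only a single-threaded sketch with hypothetical names (`IoRequest`, `State`); a real kernel would need locking or atomic state changes around every transition, and each layer (VFS, file system, driver) would forward the cancel to the one below it.

```python
from enum import Enum


class State(Enum):
    QUEUED = "queued"
    IN_FLIGHT = "in flight"
    DONE = "done"
    CANCELLED = "cancelled"


class IoRequest:
    def __init__(self, lba):
        self.lba = lba
        self.state = State.QUEUED
        self.data = None

    def cancel(self):
        """Return True only if the request was stopped before the device began."""
        if self.state is State.QUEUED:
            self.state = State.CANCELLED
            return True
        return False  # too late: the caller must still cope with a completion

    def start(self):
        """Called by the driver; refuses requests that are no longer queued."""
        if self.state is not State.QUEUED:
            return False
        self.state = State.IN_FLIGHT
        return True

    def complete(self, data):
        self.data = data
        self.state = State.DONE


request = IoRequest(lba=1234)
request.start()
print(request.cancel())  # the device already has it, so the cancel fails
```

The important part is that `cancel()` reports what actually happened, so the layer above can either free its buffers immediately or wait for the completion and discard the data.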


Cheers,

Brendan

Re: Disk IO caching?

Post by bewing »

Brendan wrote: For CD-ROMs the data is in a big spiral (and not in cylinders like on hard drives)
Well, not exactly. For CD-Rs, there are two formatting "types" for blank media. There are Audio CDs, and Data CDs. 99% of what you will find in stores is Data CDs. You probably need to order Audio CDs directly from a distributor (and they cost a lot more). Data CDs are recorded in actual cylinders, not a helical spiral. Audio CDs are helical spirals. You can burn both audio and data to either type of CD -- the only real difference is that on an Audio CD the head does not need to be moved from track to track if you are reading the disc sequentially. So if you burn music to a data CD, then you are making the CD drive do a teeny bit of extra work, because at the end of each track it is expecting the head to automatically move to the next track -- but it doesn't -- so it must do one extra head movement on every rotation.

Re: Disk IO caching?

Post by Brendan »

Hi,
bewing wrote:
Brendan wrote: For CD-ROMs the data is in a big spiral (and not in cylinders like on hard drives)
Well, not exactly. For CD-Rs, there are two formatting "types" for blank media. There are Audio CDs, and Data CDs. 99% of what you will find in stores is Data CDs. You probably need to order Audio CDs directly from a distributor (and they cost a lot more). Data CDs are recorded in actual cylinders, not a helical spiral.
For CD-R, wikipedia says:
"The blank disc has a pre-groove track onto which the data are written. The pre-groove track, which also contains timing information, ensures that the recorder follows the same spiral path as a conventional CD."

AFAIK all optical disks, including CD, CD-R, CD-RW, DVD, Blu-Ray, etc are written in spirals. This is partly for compatibility with the original CD format (e.g. so no extra mechanics or control logic are needed in drives to support all formats), and mainly to avoid undesirable jumps/delays when reading data sequentially.


Cheers,

Brendan

Re: Disk IO caching?

Post by bewing »

I've been a (tiny) computer store owner for 12 years, and my info comes directly from my recordable media distributor (Horizon USA). So I'm willing to directly challenge Wikipedia on this one. There are both "audio" and "data" blank CDs, they are different, and the difference is helical vs circular tracks.

Re: Disk IO caching?

Post by JackScott »

<$0.02>
While I'm not confident either way, what bewing says does make sense in that I can remember back in the day when your old audio CD players couldn't play CD-Rs. My portable CD player doesn't read CD-RWs. Just anecdotal evidence, but it would make sense.

Although, it could have something to do with the reflective chemicals used?
</$0.02>