Why still seek and read?

OSwhatever
Member
Posts: 595
Joined: Mon Jul 05, 2010 4:15 pm

Why still seek and read?

Post by OSwhatever »

I know there is a risk of being beheaded for questioning anything about POSIX here, but I'm going to take my chances. I wonder why the POSIX seek and read pair still lives on to this day. In practice, seek and read can be merged into one read call that also states the position you want to read from. I often find the extra seek call unnecessary, and it also hurts performance by issuing an extra call. I don't know the origin of the seek call, but I guess it dates from a time when non-volatile storage was much slower, seeking to a certain position took ages, and data could often only be read sequentially (tape). Today the situation is completely different: flash drives have minimal seek times, and "seeking" on a flash drive is really a pointless operation.

As we implement new operating systems, why should we implement seek down to the lowest level?
kfreezen
Member
Posts: 46
Joined: Tue Jul 21, 2009 11:36 am

Re: Why still seek and read?

Post by kfreezen »

Seek and write could be merged into one call, but then what? You would have to keep track of the position iterator on your own. Having seek and read/write as two separate calls is quite a nice abstraction in my opinion because otherwise, after every single call, you would have to add the amount read to the old position iterator (or store the new position iterator over it). I don't really know if I like that idea very well.
sortie
Member
Posts: 931
Joined: Wed Mar 21, 2012 3:01 pm
Libera.chat IRC: sortie

Re: Why still seek and read?

Post by sortie »

You should probably do some research before asking this question, but there are good answers to your question:

There already exists such a system call: It's called pread(2). It's basically read(2) with an off_t parameter and it has no effect on the implicit file descriptor position used by read(2). There's naturally also a pwrite system call.
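A minimal sketch of the difference, assuming a POSIX system. The helper names `read_at_with_seek`, `read_at_with_pread` and `demo_read_at` are made up for illustration; only `lseek`, `read` and `pread` are real calls.

```c
#define _XOPEN_SOURCE 700   /* expose pread(2) and fileno(3) in strict modes */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Two calls: move the descriptor's offset, then read from it.
 * Side effect: the file offset ends up at off + (bytes read). */
static ssize_t read_at_with_seek(int fd, void *buf, size_t n, off_t off)
{
    assert(buf != NULL);
    if (lseek(fd, off, SEEK_SET) == (off_t)-1)
        return -1;                      /* e.g. ESPIPE on a pipe */
    return read(fd, buf, n);
}

/* One call: the offset travels with the request and the descriptor's
 * own offset is left untouched. */
static ssize_t read_at_with_pread(int fd, void *buf, size_t n, off_t off)
{
    assert(buf != NULL);
    return pread(fd, buf, n, off);
}

/* Usage sketch on a scratch file; returns 0 if both variants agree. */
static int demo_read_at(void)
{
    char a[3], b[3];
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (write(fd, "0123456789", 10) != 10) {
        fclose(f);
        return -1;
    }
    int ok = read_at_with_pread(fd, a, 3, 4) == 3
          && lseek(fd, 0, SEEK_CUR) == 10   /* pread left the offset alone */
          && read_at_with_seek(fd, b, 3, 4) == 3
          && lseek(fd, 0, SEEK_CUR) == 7    /* lseek + read moved it */
          && memcmp(a, b, 3) == 0
          && memcmp(a, "456", 3) == 0;
    fclose(f);
    return ok ? 0 : -1;
}
```

The practical difference is the side effect on the descriptor's offset, which matters as soon as two threads or two processes share the same descriptor.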

The important thing to realize about read(2) is that it works on a lot of different devices. You are probably just thinking of files that have an offset and are seekable, but there are also streams (pipes, character devices, sockets, ...) that are not seekable. The read system call is meant for streams, while the pread system call is meant for files. When you use read on a file, it emulates a stream by maintaining a file offset in the file descriptor (which the kernel then uses to rewrite the request to a pread call on the kernel device). The lseek system call is meant to control the emulated stream, but it can also be used to learn things about the file (such as its size, and, with Linux extensions such as SEEK_DATA and SEEK_HOLE, to navigate sparse files).
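As a rough illustration of using lseek to learn things about a descriptor, the sketch below probes whether it is seekable and, if so, finds the file size. The names `is_seekable` and `size_via_lseek` are hypothetical helpers. The sketch assumes the stream case reports ESPIPE, which is what POSIX specifies for pipes, FIFOs and sockets; other device types may behave differently.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd has an offset that lseek can query, 0 if it is a
 * stream (lseek fails with ESPIPE), -1 on any other error. */
static int is_seekable(int fd)
{
    assert(fd >= 0);
    if (lseek(fd, 0, SEEK_CUR) != (off_t)-1)
        return 1;
    return errno == ESPIPE ? 0 : -1;
}

/* Learns the size of a seekable file by seeking to its end, then
 * restores the previous offset. Returns (off_t)-1 on failure. */
static off_t size_via_lseek(int fd)
{
    assert(fd >= 0);
    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved == (off_t)-1)
        return (off_t)-1;
    off_t end = lseek(fd, 0, SEEK_END);
    if (end == (off_t)-1)
        return (off_t)-1;
    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)
        return (off_t)-1;
    return end;
}
```

A program that only ever calls read never needs either helper, which is exactly why it keeps working when its input is a pipe.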

Indeed, it's a feature that most programs use read (or fgetc/fread) instead of pread. Unless they really need seekable files, it allows these programs to use streams (pipes in the most common cases) as input data rather than just files. Streams can be thought of as a subset or inferior version of files, and if you don't strictly need the file semantics, then just use the stream operations.

You might also want to check out the readv(2) and preadv(2) system calls.
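For a feel of what readv adds, here is a small sketch that fills two separate buffers with one call. The `struct record` layout and the `read_record` helper are invented for the example; `readv` and `struct iovec` come from `<sys/uio.h>`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical on-wire record: a 4-byte header followed by 8 bytes. */
struct record {
    char header[4];
    char body[8];
};

/* Scatter one read across both fields. Like read(2), this may return
 * fewer bytes than the two buffers can hold. */
static ssize_t read_record(int fd, struct record *r)
{
    assert(r != NULL);
    struct iovec iov[2] = {
        { .iov_base = r->header, .iov_len = sizeof r->header },
        { .iov_base = r->body,   .iov_len = sizeof r->body   },
    };
    return readv(fd, iov, 2);
}
```

preadv takes the same iovec array plus an explicit offset, in the same way pread extends read.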

Note how the read and write operations are only required to do at least one byte of I/O. For instance, if the read system call were required to read as much data as requested, it wouldn't work well with the stream principle of "read as much data as currently possible, but at least one byte". read(2) is a good low-level abstraction, but in many cases you want the easier fread, which doesn't need a loop to read more than a single byte.
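The loop that fread spares you looks roughly like this. It is a sketch, not a reference implementation; `read_full` is a made-up name, and retrying on EINTR is one common policy rather than the only valid one.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Keeps calling read(2) until n bytes have arrived, the stream ends,
 * or an error occurs. Returns the number of bytes read (less than n
 * only at end of stream) or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t n)
{
    assert(buf != NULL);
    char *p = buf;
    size_t got = 0;

    while (got < n) {
        ssize_t r = read(fd, p + got, n - got);
        if (r > 0)
            got += (size_t)r;       /* partial read: keep going */
        else if (r == 0)
            break;                  /* end of stream */
        else if (errno != EINTR)
            return -1;              /* real error */
    }
    return (ssize_t)got;
}
```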
Last edited by sortie on Mon Nov 18, 2013 3:09 pm, edited 2 times in total.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: Why still seek and read?

Post by Combuster »

The major problem is that in practical use, many things can be read and/or written but have no notion of a position to seek to. Even with tapes gone, you still have things such as standard I/O, pipes, sockets and special devices. Therefore logic dictates that the two operations should (still) be separate.

And it's not POSIX per se that's responsible here. The C standard dictates the exact same practice, for pretty much the same reasons.

EDIT: I see Sortie is ninjaediting for the bonus points on this one :D
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Why still seek and read?

Post by Brendan »

Hi,
OSwhatever wrote: As we implement new operating systems, why should we implement seek down to the lowest level?
For "block devices" I've always combined seek and read/write, and made them byte addressable rather than block addressable (e.g. so you can ask to read 123 bytes at offset 4567 if you really want to). I mostly do this because it reduces the number of messages and task switches in a micro-kernel (where the VFS is in a separate process). I'd also say there's no sane reason why you can't do this and still provide a POSIX/C library that behaves the same, simply by having the library track the current position in the file.
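A sketch of what such a library layer could look like. Everything here is hypothetical: `struct lib_file`, `lib_read` and `lib_seek` are invented names, and pread merely stands in for whatever positional read message the kernel or VFS actually provides.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Library-side handle: the lower layer only ever sees (fd, offset)
 * requests, while the current position lives in user space. */
struct lib_file {
    int   fd;
    off_t pos;
};

static ssize_t lib_read(struct lib_file *f, void *buf, size_t n)
{
    assert(f != NULL && buf != NULL);
    ssize_t r = pread(f->fd, buf, n, f->pos);  /* stand-in primitive */
    if (r > 0)
        f->pos += r;
    return r;
}

/* SEEK_SET and SEEK_CUR need no round trip to the lower layer. */
static off_t lib_seek(struct lib_file *f, off_t off, int whence)
{
    assert(f != NULL);
    off_t base;
    if (whence == SEEK_SET)
        base = 0;
    else if (whence == SEEK_CUR)
        base = f->pos;
    else
        return (off_t)-1;   /* SEEK_END would need a size query; left out */
    if (base + off < 0)
        return (off_t)-1;
    f->pos = base + off;
    return f->pos;
}

/* Usage sketch on a scratch file; returns 0 on success. */
static int demo_lib_file(void)
{
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return -1;
    struct lib_file f = { fileno(tmp), 0 };
    char buf[4];
    int ok = write(f.fd, "abcdefgh", 8) == 8
          && lib_read(&f, buf, 4) == 4 && buf[0] == 'a'
          && lib_read(&f, buf, 4) == 4 && buf[0] == 'e'
          && lib_seek(&f, 2, SEEK_SET) == 2
          && lib_read(&f, buf, 2) == 2 && buf[0] == 'c'
          && f.pos == 4;
    fclose(tmp);
    return ok ? 0 : -1;
}
```

One thing a sketch like this glosses over: POSIX lets several processes share one file offset after fork or dup, which a purely per-process position cannot reproduce without extra machinery.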

For "character devices", these are streams and not files. Most file operations (seek, append, truncate, copy, etc) don't make any sense for streams. For this reason, I think it was a mistake for C/POSIX to attempt to treat them the same as files.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: Why still seek and read?

Post by bluemoon »

It depends on the application's usage pattern. read/fread/pread/mmap each have their own pros and cons; you may optimize for a certain scenario, but there is no best API, and they do not conflict.
Jezze
Member
Posts: 395
Joined: Thu Jul 26, 2007 1:53 am
Libera.chat IRC: jfu

Re: Why still seek and read?

Post by Jezze »

I chose to have the offset as part of my read function because that means I don't need to keep state in the kernel that tracks the current position for each file. Fewer states in general means it is easier to debug and validate for correctness. For streams this offset is simply ignored, but you could also make it mean "discard this many bytes". This is up to the underlying filesystem.
Fudge - Simplicity, clarity and speed.
http://github.com/Jezze/fudge/
Combuster

Re: Why still seek and read?

Post by Combuster »

Jezze wrote: For streams this offset is simply ignored but you could also make it mean discard this many bytes. This is up to the underlying filesystem.
Which means that reading the first byte and second byte would actually read the first and third byte on anything that's not a file... Is that really what you want?
linguofreak
Member
Posts: 510
Joined: Wed Mar 09, 2011 3:55 am

Re: Why still seek and read?

Post by linguofreak »

Brendan wrote: For "character devices", these are streams and not files. Most file operations (seek, append, truncate, copy, etc) don't make any sense for streams. For this reason, I think it was a mistake for C/POSIX to attempt to treat them the same as files.
Look at it this way: under C/POSIX a stream isn't just treated as a file, but rather "file" really means "stream", and many (indeed most) files are members of a derived class* that implements extra operations. This isn't how the C and POSIX standards describe it or how things are usually talked about, but it's how a system that implements the standards actually behaves.

*C isn't object oriented, so it's a bit of an abuse of terminology to be talking about classes here, but we can look at things in object oriented terms even if they aren't implemented in an object oriented language.
mrstobbe
Member
Posts: 62
Joined: Fri Nov 08, 2013 7:40 pm

Re: Why still seek and read?

Post by mrstobbe »

I'd like to add here that the concept of "seek and read" has serious ramifications on I/O buffering. For example...
  1. Starting at "0" and doing a read has allowed an OS to pre-read buffers as needed (the vast majority of cases so why not?).
  2. Starting at "0" but then seeking potentially wastes some I/O cycles (see point 1), but allows the OS to know where the program is about to read/write.
  3. Reading at one point but then seeking to another and reading again... is well, you know... bad. mkay. But not super bad because of point 2.
Until disk I/O beats memory I/O, I don't think any of the above will become obsolete. Seeking is a good behavior. Fuzzy pre-cognition of the user's intention... It's really as simple as that. It's a good thing.
Kevin
Member
Posts: 1071
Joined: Sun Feb 01, 2009 6:11 am
Location: Germany

Re: Why still seek and read?

Post by Kevin »

The OS doesn't really know if the program is going to read from the current offset, if it's going to write there, or if the next thing is an lseek. It can guess that a read will come and optimise for that case. But the same way, it can guess that after a pread(fd, buf, n, off), the next thing the program will do is a read from off + n. It's no more and no less valid than making the assumption with separate read/lseek.
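To make that concrete, here is a toy version of such a guess. It is purely illustrative and not how any particular kernel does it: `struct ra_state`, `ra_on_read` and the window limits are invented. The point is that the heuristic only looks at the offset of each request, so it does not care whether the offset came from a kernel-held position (read) or from the caller (pread).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-open-file prefetch state (hypothetical). */
struct ra_state {
    uint64_t next_expected;  /* offset just past the previous request */
    size_t   window;         /* bytes to prefetch; 0 means none */
};

enum { RA_MIN = 4096, RA_MAX = 131072 };  /* assumed limits */

/* Called for every read request. Returns how many bytes past the
 * request the cache should prefetch. */
static size_t ra_on_read(struct ra_state *s, uint64_t off, size_t len)
{
    assert(s != NULL);
    if (off == s->next_expected) {
        /* Continues where the last request ended: looks sequential,
         * so open the window or double it up to the cap. */
        if (s->window == 0)
            s->window = RA_MIN;
        else if (s->window < RA_MAX)
            s->window *= 2;
    } else {
        /* Jumped elsewhere: looks random, stop prefetching. */
        s->window = 0;
    }
    s->next_expected = off + len;
    return s->window;
}
```

A run of pread calls at consecutive offsets grows the window exactly as a run of plain reads would, and a jump resets it either way.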
Developer of tyndur - community OS of Lowlevel (German)
sortie

Re: Why still seek and read?

Post by sortie »

Programs can use functions like readahead(2) to tell the kernel which file data they will be using shortly. I'm sure there are also other APIs for describing the access pattern.
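One such API in POSIX is posix_fadvise. The sketch below wraps two of its advice values; the wrapper names `hint_sequential` and `hint_will_need` are made up. The call is advisory only, so the system is free to ignore it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Declares that the whole file will be read front to back.
 * Returns 0 on success or an error number (not -1 with errno),
 * which is how posix_fadvise itself reports failure. */
static int hint_sequential(int fd)
{
    assert(fd >= 0);
    /* A length of 0 means "from the offset to the end of the file". */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
}

/* Declares that the given byte range will be needed soon. */
static int hint_will_need(int fd, off_t off, off_t len)
{
    assert(fd >= 0);
    return posix_fadvise(fd, off, len, POSIX_FADV_WILLNEED);
}
```

On a pipe or FIFO the call fails, which fits the earlier point that these hints only make sense for things that have offsets.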
Kevin

Re: Why still seek and read?

Post by Kevin »

True, but I've yet to see anyone use them. ;)

Anyway, my point was that all of these optimisations are not really coupled to the question of read/pread, they work with both.
mrstobbe

Re: Why still seek and read?

Post by mrstobbe »

My point is that OSes can do that (and quite frankly should if they aren't already). Seeking someplace is an immediate indicator that, as a program, you intend to do something there. If the file has been opened read-only, the OS knows exactly what the program plans to do there. Taking advantage of any information provided is clearly a good thing.

EDIT: syntax (comma) to clarify.
Combuster

Re: Why still seek and read?

Post by Combuster »

Kevin wrote: Anyway, my point was that all of these optimisations are not really coupled to the question of read/pread, they work with both.
Unless you consider why you would want to use pread - it's when the offset for individual reads is likely to be non-consecutive, and thus you could predict that readahead is probably less effective on that file, and spend some disk transfer cycles elsewhere.