The read and write functions (and their pread and pwrite counterparts, and the readv, writev, preadv, and pwritev functions) have really good semantics as IO primitives; they're just not always what the user expects. I've been meaning to write a proper blog post about these functions and why they are good primitives, but for now I'll just write something here, even though I just woke up.
The read and write functions promise to either fail or transfer at least one byte of IO. The exact behaviour depends on the code implementing the inode, and the kernel code will usually do whatever is easiest. For instance, if a file is being read and 1337 bytes were requested but only 42 bytes are in the cache, then it's perfectly reasonable to just return those 42 bytes. Perhaps the program is able to do something useful with them; if it really needs more bytes, it can simply invoke the read system call again.
Alternatively, try to imagine if the system calls were required to complete all the requested IO. Network programming would become bothersome: normally you allocate a larger buffer and read into it, and the kernel fills in whatever it can. If you forced it to fill the buffer entirely, you could easily deadlock when the remote sends all the data it wishes to send and then waits for a response, but that data wasn't enough to fill the buffer. The same applies to pipes and unix sockets. Arguably it isn't a problem for files, but it's best to keep the same IO semantics for pipes and files. That said, the IO likely always does complete entirely for files, and a lot of programs unfortunately depend on this implementation detail.
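To make the short-read case concrete, here is a minimal sketch of the usual pattern when reading from a socket or pipe; the function name and buffer size are my own choices for illustration:
Code:
#include <stdio.h>
#include <unistd.h>

/* Hypothetical example, not from any particular codebase: read whatever is
   currently available, up to the buffer size, and process it. A short read
   is normal here; it is not an error. */
void handle_input(int fd)
{
	char buf[4096];
	ssize_t amount = read(fd, buf, sizeof(buf));
	if ( amount < 0 )
		perror("read"); /* a genuine error */
	else if ( amount == 0 )
		puts("end of stream"); /* the other end closed the connection */
	else
		printf("got %zd bytes\n", amount); /* use whatever arrived */
}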
It's also worth noting that the return type is ssize_t, but the count parameter is size_t. It is unspecified what happens if the count exceeds SSIZE_MAX, but given the "give me at least one byte or fail" semantics, it's reasonable to simply do if ( (size_t) SSIZE_MAX < count ) { count = SSIZE_MAX; } and truncate the request.
(The same discussion applies to the other read and write functions mentioned above.)
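If you want that truncation in one place, a thin wrapper will do; this is just a sketch of the clamp described above, with a name of my own invention:
Code:
#include <limits.h>
#include <unistd.h>

/* Hypothetical wrapper, not a standard function: clamp oversized requests so
   the ssize_t return value can always represent how many bytes were
   transferred; callers simply call again for the rest. */
ssize_t read_clamped(int fd, void* buf, size_t count)
{
	if ( (size_t) SSIZE_MAX < count )
		count = SSIZE_MAX;
	return read(fd, buf, count);
}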
There is another possible IO primitive that could give the user more control and settle this: "Give me at least x bytes, but at most y bytes". It is a bit more bothersome to implement in the kernel, and perhaps not even worth it, as it is trivial to build such functions upon the read/write functions:
Code:
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Read at least `least` bytes and at most `max` bytes into buf. Returns the
   number of bytes read; a return value less than `least` means error or EOF. */
size_t readleast(int fd, void* buf, size_t least, size_t max)
{
	ssize_t amount = read(fd, buf, max);
	if ( amount < 0 ) { return 0; }
	if ( least && !amount ) { return 0; /* unexpected EOF */ }
	if ( (size_t) amount < least )
	{
		void* nextbuf = (uint8_t*) buf + amount;
		size_t nextleast = least - amount;
		size_t nextmax = max - amount;
		amount += readleast(fd, nextbuf, nextleast, nextmax);
	}
	return amount;
}

/* Write at least `least` bytes and at most `max` bytes from buf. Returns the
   number of bytes written; a return value less than `least` means an error. */
size_t writeleast(int fd, const void* buf, size_t least, size_t max)
{
	ssize_t amount = write(fd, buf, max);
	if ( amount < 0 ) { return 0; }
	if ( least && !amount ) { return 0; /* unexpected EOF */ }
	if ( (size_t) amount < least )
	{
		const void* nextbuf = (const uint8_t*) buf + amount;
		size_t nextleast = least - amount;
		size_t nextmax = max - amount;
		amount += writeleast(fd, nextbuf, nextleast, nextmax);
	}
	return amount;
}
The key thing about read/write is that the kernel code can do whatever is easiest and most efficient, and then rely on the program to make another call if that wasn't enough. This potentially even makes the system more responsive. Note how these semantics are great for a kernel, but not really what users expect. This is why layers such as FILE with fread/fwrite have been built on top of the Unix IO primitives. However, a large number of programs use the primitives directly, which means they have to deal with the primitives likely not doing what they want.
I provide the above functions in my libc to ease file descriptor programming. I also provide readall, writeall, preadleast, pwriteleast, preadall, and pwriteall (the all versions are simply calls where least=max, that is, "give me exactly N bytes of input and only less upon error"). You can check for errors in these calls by testing whether they return less than least. An error could potentially have occurred if they return something between least and max, but it's not an error for your program at that point, and you'll get the error for real on the next read call on the file descriptor.
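For the record, the all variants can be expressed directly in terms of the least variants above, since they are just the least=max case; a sketch in my own formulation, so the actual libc code may differ:
Code:
/* Sketch of the all variants as the least=max case: anything less than count
   signals an error or EOF to the caller. */
size_t readall(int fd, void* buf, size_t count)
{
	return readleast(fd, buf, count, count);
}

size_t writeall(int fd, const void* buf, size_t count)
{
	return writeleast(fd, buf, count, count);
}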
I hope this clears things up. I would advise against changing these semantics to cater to a higher level like FILE or C++ streams; rather, implement those layers on top of the primitives as described here. Also note that you are free to make read/write on files always complete the requested amount, but programs written for your OS will then likely become non-portable, because they will come to assume these semantics, which does all other operating systems a disservice. It's better to make people use a higher-level API or some extensions like readleast.