FILE buffer - delayed or not? (PDCLib design Q)

Posted: Wed Aug 12, 2009 6:18 am
by Solar
In a C lib, a stream has an attached buffer.

This buffer can be changed with the setvbuf() function, which allows a buffer size other than the default BUFSIZ and lets the caller supply user-allocated memory for the buffer.
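For reference, a minimal sketch of the size-only use of setvbuf(), assuming a hosted C library. The helper name request_buffer_size() is made up for illustration; passing NULL as the buffer leaves the allocation to the library, and the call has to come before any other operation on the stream.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: ask for a fully buffered stream whose buffer is
 * `size` bytes instead of the default BUFSIZ. The NULL buffer argument
 * means the library allocates the memory itself.
 * Returns 0 on success, nonzero if the request cannot be honoured. */
int request_buffer_size(FILE *fp, size_t size)
{
    assert(fp != NULL);
    return setvbuf(fp, NULL, _IOFBF, size);
}
```

A caller would open the stream, call request_buffer_size(fp, 4 * BUFSIZ) right away, and only then start reading or writing.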

My design question. I could:

A) Allocate the stream buffer in fopen(). Disadvantage: If the buffer is changed via setvbuf(), I've wasted a malloc() / free() cycle on the default buffer (which I no longer need). If the stream is closed without any real I/O operation, I also wasted a malloc() / free() cycle (two, if the user actually did call setvbuf() and closed the stream without real I/O). (That last part is actually quite uncommon, I'd think.)

B) Delay allocation of the stream buffer until the first "real" I/O operation. Disadvantage: On each (internal) fill-the-buffer call, and on every write-to-buffer call, I have to check whether the buffer has already been allocated.
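The two options can be sketched roughly as follows. This is an illustration only: struct stream and all function names are hypothetical and are not PDCLib's actual internals, and flushing is left out.

```c
#include <assert.h>
#include <stdio.h>   /* BUFSIZ */
#include <stdlib.h>  /* malloc, free */

/* Hypothetical, stripped-down stream type. */
struct stream {
    char  *buffer;   /* NULL while no buffer is attached */
    size_t bufsize;  /* capacity of the buffer in bytes */
    size_t bufidx;   /* number of bytes currently buffered */
};

/* Option A: the open path allocates immediately, so the I/O paths
 * never have to test for a missing buffer. Returns 0 on success. */
int stream_init_eager(struct stream *s)
{
    assert(s != NULL);
    s->bufsize = BUFSIZ;
    s->bufidx  = 0;
    s->buffer  = malloc(s->bufsize);
    return s->buffer != NULL ? 0 : -1;
}

/* Option B: the open path only records the intended size. */
void stream_init_lazy(struct stream *s)
{
    assert(s != NULL);
    s->buffer  = NULL;
    s->bufsize = BUFSIZ;
    s->bufidx  = 0;
}

/* Option B, continued: every buffer access pays for the NULL check.
 * Returns 0 on success, -1 on allocation failure or a full buffer. */
int stream_putc_lazy(struct stream *s, char c)
{
    assert(s != NULL);
    if (s->buffer == NULL) {
        s->buffer = malloc(s->bufsize);
        if (s->buffer == NULL)
            return -1;
    }
    if (s->bufidx == s->bufsize)
        return -1;   /* a real library would flush here instead */
    s->buffer[s->bufidx++] = c;
    return 0;
}

/* Release the buffer, whichever way it was allocated. */
void stream_close(struct stream *s)
{
    assert(s != NULL);
    free(s->buffer);
    s->buffer = NULL;
}
```

With A), a setvbuf() call right after opening has to free the buffer that stream_init_eager() just allocated; with B), it only has to overwrite bufsize or set buffer.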

Apparently (according to the man pages), glibc does B), although I can't imagine why. I consider A) the much cleaner design, and think the chances of A) taking a performance hit are slim (as I consider setvbuf() a rarely used function).

However, I always say "do not assume", so I would like to hear your opinions...?!?


PS: Not a vote, because I want to hear the reasoning, not a count of hands. ;)

Re: FILE buffer - delayed or not? (PDCLib design Q)

Posted: Wed Aug 12, 2009 6:48 am
by NickJohnson
I really doubt there would be any significant difference in speed: a write-to-buffer operation should only happen once per fwrite() call, a single if statement takes very little time, and the applications it affects would be limited by the OS's I/O performance anyway. So I would agree with your choice of A, because it is at least slightly cleaner and, if anything, slows down code outside the inner loop rather than inside it. It's still kind of splitting hairs though.

Re: FILE buffer - delayed or not? (PDCLib design Q)

Posted: Wed Aug 12, 2009 7:08 am
by Brendan
Hi,

For streams, I'd probably only use "setvbuf()" if I thought "malloc()" would fail. For example, create a buffer in the ".bss" so that I could open a file when the heap is exhausted (or when the heap is corrupted, possibly for debugging purposes?).
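A minimal sketch of that idea, assuming a hosted C library; emergency_buf and use_emergency_buffer() are names made up for illustration. Note that this only helps if fopen() itself succeeds without touching the heap, which depends on the library.

```c
#include <assert.h>
#include <stdio.h>

/* File-scope array with static storage duration; on typical toolchains
 * this ends up in .bss, so no heap memory is involved. */
static char emergency_buf[BUFSIZ];

/* Hypothetical helper: point a freshly opened stream at the static
 * buffer. Must be the first operation on the stream, and only one
 * stream may use the buffer at a time. Returns 0 on success. */
int use_emergency_buffer(FILE *fp)
{
    assert(fp != NULL);
    return setvbuf(fp, emergency_buf, _IOFBF, sizeof emergency_buf);
}
```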

If you want a much cleaner design, then you could hide the ugly bits in a library, or... Oh, sorry. ;)


Cheers,

Brendan

Re: FILE buffer - delayed or not? (PDCLib design Q)

Posted: Wed Aug 12, 2009 1:01 pm
by Craze Frog
I have a third solution. Have a static global variable to hold any "unused" buffer.

On the first call to fopen(), allocate the buffer.
On any call to setvbuf(), check if your global variable is null. If it is, instead of freeing the old buffer, assign it to the global variable. If the global variable is not null, free() the stream's buffer.

On subsequent calls to fopen(), check if the global variable is 0. If it is, allocate the file buffer with malloc(). Else, simply use the file buffer stored in the global variable and set it to 0.

This way, the only case with bad performance is when the program opens a lot of files and then does setvbuf() on the streams after opening all the files (instead of using setvbuf() at once after each file).
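Roughly, the scheme described above could look like this. It is a single-threaded sketch with hypothetical names (spare_buffer, acquire_default_buffer(), release_default_buffer()), and it assumes only default-sized buffers are ever parked.

```c
#include <assert.h>
#include <stdio.h>   /* BUFSIZ */
#include <stdlib.h>  /* malloc, free */

/* At most one parked, unused default-sized buffer. */
static char *spare_buffer = NULL;

/* Would be called from fopen(): reuse the parked buffer if there is
 * one, otherwise allocate a fresh one. May return NULL if malloc fails. */
char *acquire_default_buffer(void)
{
    if (spare_buffer != NULL) {
        char *buf = spare_buffer;
        spare_buffer = NULL;
        return buf;
    }
    return malloc(BUFSIZ);
}

/* Would be called from setvbuf() when the default buffer is replaced:
 * park it if the slot is empty, otherwise return it to the heap. */
void release_default_buffer(char *buf)
{
    assert(buf != NULL);
    if (spare_buffer == NULL)
        spare_buffer = buf;
    else
        free(buf);
}
```

An fopen() followed immediately by setvbuf() then costs one malloc() the first time and none afterwards, as long as the streams are handled one after another.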

Re: FILE buffer - delayed or not? (PDCLib design Q)

Posted: Wed Aug 12, 2009 1:27 pm
by NickJohnson
Craze Frog wrote:I have a third solution. Have a static global variable to hold any "unused" buffer.

On the first call to fopen(), allocate the buffer.
On any call to setvbuf(), check if your global variable is null. If it is, instead of freeing the old buffer, assign it to the global variable. If the global variable is not null, free() the stream's buffer.

On subsequent calls to fopen(), check if the global variable is 0. If it is, allocate the file buffer with malloc(). Else, simply use the file buffer stored in the global variable and set it to 0.

This way, the only case with bad performance is when the program opens a lot of files and then does setvbuf() on the streams after opening all the files (instead of using setvbuf() at once after each file).
That definitely would solve all the *performance* problems with either design, but it introduces some style/implementation issues. First, it's non-obvious and uses global variables when there are implementations that don't. Second, it uses a standard buffer's worth of extra memory if setvbuf() is called. Third, what happens if you call setvbuf() twice on the same file descriptor? The buffer you store in the global variable has to be something you can use later, but if you cache an arbitrarily sized buffer set by the first setvbuf(), it would be useless to fopen() and probably take up even more memory.

Once again, I really think it's splitting hairs in terms of speed - a small part of fopen(), fwrite(), and/or setvbuf() is not the major performance bottleneck in any real program. :roll:

Re: FILE buffer - delayed or not? (PDCLib design Q)

Posted: Wed Aug 12, 2009 3:47 pm
by Solar
NickJohnson wrote:Third, what happens if you call setvbuf() twice on the same file descriptor?
Undefined behaviour in any case.

But I don't like to have more globals flying around than strictly necessary. They'll become a pain as soon as I try to make the lib multithreading-safe.

I go with the assignment on fopen(). Thanks for the input!

Re: FILE buffer - delayed or not? (PDCLib design Q)

Posted: Thu Aug 13, 2009 4:47 am
by Craze Frog
NickJohnson wrote:Third, what happens if you call setvbuf() twice on the same file descriptor?
Free the previously allocated buffer and allocate a new one.
Solar wrote:I go with the assignment on fopen(). Thanks for the input!
Hehe, now that's a pun.

Re: FILE buffer - delayed or not? (PDCLib design Q)

Posted: Thu Aug 13, 2009 6:09 am
by Solar
Craze Frog wrote:
NickJohnson wrote:Third, what happens if you call setvbuf() twice on the same file descriptor?
Free the previously allocated buffer and allocate a new one.
As I said, setvbuf() may be called (successfully) only once on a given stream. Behaviour of a second call is undefined.