FILE buffer - delayed or not? (PDCLib design Q)

Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Post by Solar »

In a C lib, a stream has an attached buffer.

This buffer can be changed with the setvbuf() function, which lets you use a buffer size other than the default BUFSIZ and supply user-allocated memory for the buffer.

My design question. I could:

A) Allocate the stream buffer in fopen(). Disadvantage: If the buffer is changed via setvbuf(), I've wasted a malloc() / free() cycle on the default buffer (which I no longer need). If the stream is closed without any real I/O operation, I also wasted a malloc() / free() cycle (two, if the user actually did call setvbuf() and closed the stream without real I/O). (That last part is actually quite uncommon, I'd think.)

B) Delayed allocation of the stream buffer on first "real" I/O operation. Disadvantage: On each (internal) fill-the-buffer call, and on every write-to-buffer call, I have to check if the buffer has already been allocated.
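
To make the trade-off concrete, here is a minimal sketch of both options in C. It assumes a hypothetical `toy_stream` struct and `toy_*` helper names invented for illustration; none of this is PDCLib's (or glibc's) actual FILE layout or code. The point is only to show where the extra check of option B ends up.

```c
#include <stdio.h>   /* BUFSIZ */
#include <stdlib.h>  /* malloc, free */
#include <stddef.h>  /* size_t */

/* Hypothetical toy stream, for illustration only. */
struct toy_stream {
    char  *buffer;   /* NULL until a buffer is attached */
    size_t bufsize;  /* capacity of the buffer in bytes */
    size_t bufidx;   /* number of bytes currently buffered */
};

/* Option A: allocate the default buffer when the stream is opened. */
int toy_open_eager(struct toy_stream *s)
{
    s->buffer = malloc(BUFSIZ);
    if (s->buffer == NULL)
        return -1;
    s->bufsize = BUFSIZ;
    s->bufidx = 0;
    return 0;
}

/* Option B: open without a buffer; allocation is deferred to first I/O. */
void toy_open_lazy(struct toy_stream *s)
{
    s->buffer = NULL;
    s->bufsize = BUFSIZ;
    s->bufidx = 0;
}

/* Write one byte into the buffer. The NULL check is the per-call cost of
   option B; with option A the buffer always exists and the branch is
   never taken. */
int toy_putc(struct toy_stream *s, char c)
{
    if (s->buffer == NULL) {
        s->buffer = malloc(s->bufsize);
        if (s->buffer == NULL)
            return -1;
    }
    if (s->bufidx == s->bufsize)
        return -1;  /* buffer full; a real library would flush here */
    s->buffer[s->bufidx++] = c;
    return 0;
}

/* Release the buffer, whichever way it was allocated. */
void toy_close(struct toy_stream *s)
{
    free(s->buffer);
    s->buffer = NULL;
    s->bufidx = 0;
}
```

With option A the check in `toy_putc()` could be dropped entirely, at the price of one malloc() in every open.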

Apparently (according to man pages), glibc does B), although I couldn't imagine why. I consider A) the much cleaner design, and think that the chances of A) receiving a performance hit are slim (as I consider setvbuf() a rarely-used function).

However, I always say "do not assume", so I would like to hear your opinions...?!?


PS: Not a vote, because I want to hear the reasoning, not a count of hands. ;)
Every good solution is obvious once you've found it.
NickJohnson
Member
Posts: 1249
Joined: Tue Mar 24, 2009 8:11 pm
Location: Sunnyvale, California

Re: FILE buffer - delayed or not? (PDCLib design Q)

Post by NickJohnson »

I really doubt there would be any significant difference in speed: a write-to-buffer operation should only happen once per fwrite() call, a single if statement takes very little time, and the applications it affects would be limited by the OS's I/O performance anyway. So I would agree with your choice of A, because it is at least slightly cleaner and, if anything, moves the cost outside the inner loop rather than inside it. It's still kind of splitting hairs, though.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: FILE buffer - delayed or not? (PDCLib design Q)

Post by Brendan »

Hi,

For streams, I'd probably only use "setvbuf()" if I thought "malloc()" would fail. For example, create a buffer in the ".bss" so that I could open a file when the heap is exhausted (or when the heap is corrupted, possibly for debugging purposes?).
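
A minimal sketch of that use case in C, using only standard calls. The names `emergency_buf` and `open_with_static_buffer` are made up for illustration. Note the assumption: whether fopen() itself needs the heap depends on the C library, so this only avoids the allocation of the stream buffer, not necessarily every allocation.

```c
#include <stdio.h>

/* Static storage (typically placed in .bss), so no heap is needed for it.
   It must outlive every stream that uses it. Hypothetical name. */
static char emergency_buf[BUFSIZ];

/* Hypothetical helper: open `path` and attach the static buffer.
   Returns NULL on failure. */
FILE *open_with_static_buffer(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return NULL;

    /* setvbuf() has to be called before any other operation on the
       stream; it returns nonzero if the request cannot be honoured. */
    if (setvbuf(f, emergency_buf, _IOFBF, sizeof emergency_buf) != 0) {
        fclose(f);
        return NULL;
    }
    return f;
}
```

Since there is only one static buffer here, only one stream at a time may use it.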

If you want a much cleaner design, then you could hide the ugly bits in a library, or... Oh, sorry. ;)


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Craze Frog
Member
Posts: 368
Joined: Sun Sep 23, 2007 4:52 am

Re: FILE buffer - delayed or not? (PDCLib design Q)

Post by Craze Frog »

I have a third solution. Have a static global variable to hold any "unused" buffer.

On the first call to fopen(), allocate the buffer.
On any call to setvbuf(), check if your global variable is null. If it is, instead of freeing the old buffer, assign it to the global variable. If the global variable is not null, free() the stream's buffer.

On subsequent calls to fopen(), check if the global variable is 0. If it is, allocate the file buffer with malloc(). Else, simply use the file buffer stored in the global variable and set it to 0.

This way, the only case with bad performance is when the program opens a lot of files and then calls setvbuf() on the streams after opening all the files (instead of calling setvbuf() immediately after opening each file).
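
The scheme described above could be sketched in C roughly as follows. The `spare_stream` struct and the `spare_*` names are hypothetical and stand in for the library's real fopen()/setvbuf() internals. The sketch assumes that only library-allocated, default-sized buffers are ever cached, and it makes no attempt at thread safety.

```c
#include <stdio.h>   /* BUFSIZ */
#include <stdlib.h>  /* malloc, free */
#include <stddef.h>  /* size_t */

/* Hypothetical toy stream, for illustration only. */
struct spare_stream {
    char  *buffer;
    size_t bufsize;
    int    owns_buffer;  /* nonzero if the library allocated the buffer */
};

/* Holds at most one unused default-sized buffer. Not thread-safe. */
static char *spare_buffer = NULL;

/* fopen() side: reuse the spare buffer if there is one, else malloc(). */
int spare_open(struct spare_stream *s)
{
    if (spare_buffer != NULL) {
        s->buffer = spare_buffer;
        spare_buffer = NULL;
    } else {
        s->buffer = malloc(BUFSIZ);
        if (s->buffer == NULL)
            return -1;
    }
    s->bufsize = BUFSIZ;
    s->owns_buffer = 1;
    return 0;
}

/* setvbuf() side: stash the default buffer instead of freeing it,
   unless a spare is already being held. */
void spare_setvbuf(struct spare_stream *s, char *user_buf, size_t size)
{
    if (s->owns_buffer) {
        if (spare_buffer == NULL)
            spare_buffer = s->buffer;  /* keep for the next open */
        else
            free(s->buffer);           /* already holding a spare */
    }
    s->buffer = user_buf;
    s->bufsize = size;
    s->owns_buffer = 0;
}
```

An open / setvbuf / open sequence then costs a single malloc(): the second open picks up the buffer the first stream gave back.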

Re: FILE buffer - delayed or not? (PDCLib design Q)

Post by NickJohnson »

Craze Frog wrote: I have a third solution. Have a static global variable to hold any "unused" buffer.

On the first call to fopen(), allocate the buffer.
On any call to setvbuf(), check if your global variable is null. If it is, instead of freeing the old buffer, assign it to the global variable. If the global variable is not null, free() the stream's buffer.

On subsequent calls to fopen(), check if the global variable is 0. If it is, allocate the file buffer with malloc(). Else, simply use the file buffer stored in the global variable and set it to 0.

This way, the only case with bad performance is when the program opens a lot of files and then calls setvbuf() on the streams after opening all the files (instead of calling setvbuf() immediately after opening each file).
That definitely would solve all the *performance* problems with either design, but it introduces some style/implementation issues. First, it's non-obvious and uses global variables when there are implementations that don't. Second, it uses a standard buffer's worth of extra memory if setvbuf() is called. Third, what happens if you call setvbuf() twice on the same file descriptor? The buffer you store in the global variable has to be something you can use later, but if you cache an arbitrarily sized buffer set by the first setvbuf(), it would be useless to fopen() and probably take up even more memory.

Once again, I really think it's splitting hairs in terms of speed - a small part of fopen(), fwrite(), and/or setvbuf() is not the major performance bottleneck in any real program. :roll:

Re: FILE buffer - delayed or not? (PDCLib design Q)

Post by Solar »

NickJohnson wrote: Third, what happens if you call setvbuf() twice on the same file descriptor?
Undefined behaviour in any case.

But I don't like to have more globals flying around than strictly necessary. They'll become a pain as soon as I try to make the lib multithreading-safe.

I go with the assignment on fopen(). Thanks for the input!

Re: FILE buffer - delayed or not? (PDCLib design Q)

Post by Craze Frog »

NickJohnson wrote: Third, what happens if you call setvbuf() twice on the same file descriptor?
Free the previously allocated buffer and allocate a new one.
Solar wrote: I go with the assignment on fopen(). Thanks for the input!
Hehe, now that's a pun.

Re: FILE buffer - delayed or not? (PDCLib design Q)

Post by Solar »

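Craze Frog wrote:
NickJohnson wrote: Third, what happens if you call setvbuf() twice on the same file descriptor?
Free the previously allocated buffer and allocate a new one.
As I said, setvbuf() may be called (successfully) only once on a given stream: the standard allows it only before any other operation on the stream, other than an unsuccessful call to setvbuf() itself. The behaviour of a second call is undefined.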