In a C lib, a stream has an attached buffer.
This buffer can be changed with the setvbuf() function, which lets you choose a different buffer size (than the default BUFSIZ) and use user-allocated memory for the buffer.
My design question. I could:
A) Allocate the stream buffer in fopen(). Disadvantage: If the buffer is changed via setvbuf(), I've wasted a malloc() / free() cycle on the default buffer (which I no longer need). If the stream is closed without any real I/O operation, I also wasted a malloc() / free() cycle (two, if the user actually did call setvbuf() and closed the stream without real I/O). (That last part is actually quite uncommon, I'd think.)
B) Delayed allocation of the stream buffer on first "real" I/O operation. Disadvantage: On each (internal) fill-the-buffer call, and on every write-to-buffer call, I have to check if the buffer has already been allocated.
Apparently (according to the man pages), glibc does B), although I can't imagine why. I consider A) the much cleaner design, and think the chance of A) taking a performance hit is slim (as I consider setvbuf() a rarely-used function).
However, I always say "do not assume", so I would like to hear your opinions...?!?
PS: Not a vote, because I want to hear the reasoning, not a count of hands.
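To make the two options in the question concrete, here is a minimal sketch. The `struct my_stream` type and the `open_eager()`, `open_lazy()`, and `put_lazy()` functions are hypothetical names made up for illustration; they are not PDCLib's or glibc's actual internals, and the "flush" step is only a placeholder.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stream type; not the real FILE layout of any library. */
struct my_stream {
    char  *buf;   /* NULL until a buffer is attached */
    size_t size;  /* capacity of buf */
    size_t used;  /* bytes currently buffered */
};

/* Option A: allocate the default buffer when the stream is opened. */
static int open_eager(struct my_stream *s)
{
    assert(s != NULL);
    s->buf = malloc(BUFSIZ);
    if (s->buf == NULL)
        return -1;
    s->size = BUFSIZ;
    s->used = 0;
    return 0;
}

/* Option B: open without a buffer ... */
static void open_lazy(struct my_stream *s)
{
    assert(s != NULL);
    s->buf = NULL;
    s->size = 0;
    s->used = 0;
}

/* ... and pay one pointer check on every write-to-buffer call. */
static int put_lazy(struct my_stream *s, char c)
{
    assert(s != NULL);
    if (s->buf == NULL) {          /* the extra check option B requires */
        s->buf = malloc(BUFSIZ);
        if (s->buf == NULL)
            return -1;
        s->size = BUFSIZ;
    }
    if (s->used == s->size)
        s->used = 0;               /* placeholder: a real library would flush here */
    s->buf[s->used++] = c;
    return 0;
}
```

With option A the allocation failure surfaces at open time; with option B it surfaces on the first write, which is part of what makes A feel cleaner.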
FILE buffer - delayed or not? (PDCLib design Q)
Every good solution is obvious once you've found it.
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
Re: FILE buffer - delayed or not? (PDCLib design Q)
I really doubt there would be any significant difference in speed, considering that a write-to-buffer operation should really only happen once every fwrite() operation, a single if statement takes so little time, and the applications it impacts would be limited by the OS's I/O performance anyway. So I would agree with your choice of A, because it is at least slightly cleaner, and if anything slows down stuff outside the inner loop as opposed to inside it. It's still kind of splitting hairs though.
Re: FILE buffer - delayed or not? (PDCLib design Q)
Hi,
For streams, I'd probably only use "setvbuf()" if I thought "malloc()" would fail. For example, create a buffer in the ".bss" so that I could open a file when the heap is exhausted (or when the heap is corrupted, possibly for debugging purposes?).
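A minimal sketch of that idea, assuming a hypothetical helper named `open_with_static_buffer()` and a single static array `emergency_buf` (both names are made up here). Note that whether fopen() itself needs the heap for the FILE object depends on the implementation, so this only takes the stream buffer out of the picture.

```c
#include <assert.h>
#include <stdio.h>

/* Static storage; on typical toolchains an uninitialised static array
 * ends up in .bss, so no heap allocation is needed for it. */
static char emergency_buf[BUFSIZ];

/* Hypothetical helper: open a file and attach the static buffer.
 * setvbuf() has to come before any other operation on the stream. */
static FILE *open_with_static_buffer(const char *path, const char *mode)
{
    FILE *fp;

    assert(path != NULL && mode != NULL);
    fp = fopen(path, mode);
    if (fp == NULL)
        return NULL;
    if (setvbuf(fp, emergency_buf, _IOFBF, sizeof emergency_buf) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

The buffer must outlive the stream, and only one open stream at a time may use it.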
If you want a much cleaner design, then you could hide the ugly bits in a library, or... Oh, sorry.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- Craze Frog
- Member
- Posts: 368
- Joined: Sun Sep 23, 2007 4:52 am
Re: FILE buffer - delayed or not? (PDCLib design Q)
I have a third solution. Have a static global variable to hold any "unused" buffer.
On the first call to fopen(), allocate the buffer.
On any call to setvbuf(), check if your global variable is null. If it is, instead of freeing the old buffer, assign it to the global variable. If the global variable is not null, free() the stream's buffer.
On subsequent calls to fopen(), check if the global variable is 0. If it is, allocate the file buffer with malloc(). Else, simply use the file buffer stored in the global variable and set it to 0.
This way, the only case with bad performance is when the program opens a lot of files and then calls setvbuf() on the streams only after opening all the files (instead of calling setvbuf() immediately after each fopen()).
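The scheme above could be sketched roughly like this. Everything here is hypothetical and for illustration only: `struct my_stream`, `my_open()`, `my_setvbuf()`, and the `spare_buf` global stand in for whatever the library would really use, and the sketch is not thread-safe.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stream type, for illustration only. */
struct my_stream {
    char  *buf;
    size_t size;
    int    owns_buf;  /* nonzero if buf is the default buffer from my_open() */
};

/* At most one unused default-sized buffer, parked for the next open. */
static char *spare_buf = NULL;

static int my_open(struct my_stream *s)
{
    assert(s != NULL);
    if (spare_buf != NULL) {
        s->buf = spare_buf;        /* reuse the parked buffer */
        spare_buf = NULL;
    } else {
        s->buf = malloc(BUFSIZ);
        if (s->buf == NULL)
            return -1;
    }
    s->size = BUFSIZ;
    s->owns_buf = 1;
    return 0;
}

static void my_setvbuf(struct my_stream *s, char *user_buf, size_t size)
{
    assert(s != NULL && user_buf != NULL);
    if (s->owns_buf) {
        if (spare_buf == NULL)
            spare_buf = s->buf;    /* park it instead of freeing */
        else
            free(s->buf);          /* slot already taken */
    }
    s->buf = user_buf;
    s->size = size;
    s->owns_buf = 0;
}
```

Only the default buffer from `my_open()` is ever parked, so the cached buffer is always BUFSIZ bytes.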
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
Re: FILE buffer - delayed or not? (PDCLib design Q)
Craze Frog wrote:
> I have a third solution. Have a static global variable to hold any "unused" buffer.
> On the first call to fopen(), allocate the buffer.
> On any call to setvbuf(), check if your global variable is null. If it is, instead of freeing the old buffer, assign it to the global variable. If the global variable is not null, free() the stream's buffer.
> On subsequent calls to fopen(), check if the global variable is 0. If it is, allocate the file buffer with malloc(). Else, simply use the file buffer stored in the global variable and set it to 0.
> This way, the only case with bad performance is when the program opens a lot of files and then calls setvbuf() on the streams only after opening all the files (instead of calling setvbuf() immediately after each fopen()).
That definitely would solve all the *performance* problems with either design, but it introduces some style/implementation issues. First, it's non-obvious and uses global variables when there are implementations that don't. Second, it uses a standard buffer's worth of extra memory if setvbuf() is called. Third, what happens if you call setvbuf() twice on the same file descriptor? The buffer you store in the global variable has to be something you can use later, but if you cache an arbitrarily sized buffer set by the first setvbuf(), it would be useless to fopen() and probably take up even more memory.
Once again, I really think it's splitting hairs in terms of speed - a small part of fopen(), fwrite(), and/or setvbuf() is not the major performance bottleneck in any real program.
Re: FILE buffer - delayed or not? (PDCLib design Q)
NickJohnson wrote:
> Third, what happens if you call setvbuf() twice on the same file descriptor?
Undefined behaviour in any case.
But I don't like to have more globals flying around than strictly necessary. They'll become a pain as soon as I try to make the lib multithreading-safe.
I go with the assignment on fopen(). Thanks for the input!
- Craze Frog
- Member
- Posts: 368
- Joined: Sun Sep 23, 2007 4:52 am
Re: FILE buffer - delayed or not? (PDCLib design Q)
> Third, what happens if you call setvbuf() twice on the same file descriptor?
Free the previously allocated buffer and allocate a new one.
> I go with the assignment on fopen(). Thanks for the input!
Hehe, now that's a pun.
Re: FILE buffer - delayed or not? (PDCLib design Q)
Craze Frog wrote:
> > Third, what happens if you call setvbuf() twice on the same file descriptor?
> Free the previously allocated buffer and allocate a new one.
As I said, setvbuf() may be called (successfully) only once on a given stream. Behaviour of a second call is undefined.