
Problem loading file.

Posted: Sat Feb 06, 2010 12:01 pm
by suthers
Hey, I've recently been trying to load a large file (approx. 12 MB) into a string.
But every time, the ifstream object returns eof long before the end of the file.
If I then try to get the next line again, it just returns the last line again.
Also no particular character seems to stop it.
I use the following code:

Code: Select all

ifstream file("address");
string buffer = "", store = "";
if(file.is_open())
{
          while(!file.eof())
          {
                   getline(file, buffer);
                   store += buffer;
          }
          file.close();
}
Any ideas what's wrong???
Thanks in advance,

Jules
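A minimal sketch of one way around this, assuming the file holds binary data (the post does not say so). In text mode some platforms, Windows being the usual example, translate line endings and may treat a particular byte as end-of-file, so the sketch opens the stream with std::ios::binary and copies every byte. The helper name load_file is hypothetical.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole file into a std::string, byte for byte.
std::string load_file(const std::string& path)
{
    // std::ios::binary disables text-mode translation of the data.
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file)
        throw std::runtime_error("cannot open " + path);

    // Copy every byte up to the real end of the file.
    return std::string(std::istreambuf_iterator<char>(file),
                       std::istreambuf_iterator<char>());
}
```

If line-by-line reading is really what is wanted, looping on `while (getline(file, buffer))` stops at the first failed read, whereas `while (!file.eof())` only notices after a read has already failed.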

Re: Problem loading file.

Posted: Sun Feb 07, 2010 2:13 am
by Thomas
Hi Suthers,
This is the correct way of doing things :) . I haven't tested it though. Another thing to note is that repeatedly appending to a string can force it to reallocate and copy its contents as it grows, so it is not the recommended approach when you are doing a lot of modifications.

Code: Select all

char buffer[BUFFER_MAX];   // BUFFER_MAX: maximum line length, defined elsewhere
while (file.getline(buffer, sizeof(buffer)))   // the stream tests false once a read fails
{
    cout << buffer;
}
--Thomas

Re: Problem loading file.

Posted: Sun Feb 07, 2010 12:01 pm
by suthers
Hey, I tried your code, but now instead of getting only 5000 or so characters, I have about 50...
Interestingly though, it always does the same number of getlines before it returns null... Always 4.
Any ideas what's wrong with this?

Thanks in advance,

Jules

Re: Problem loading file.

Posted: Sun Feb 07, 2010 4:56 pm
by Combuster
That code is just begging for buffer overflows, especially if you apply that on binary files.

Looking at the STL documentation, there seems to be no call without additional traits that doesn't have it choke on binary files (if that's what you are trying to do)

Without iostreams, you can do something along the lines of the following (from memory, untested, no error checking):

Code: Select all

#include <cstdio>
#include <cstdlib>
FILE * file = fopen(filename, "rb");          // open as binary
fseek(file, 0, SEEK_END);                     // move to end of file
long filesize = ftell(file);                  // get the byte counter (= file size)
char * data = (char *) malloc(filesize);      // reserve memory
fseek(file, 0, SEEK_SET);                     // reset to start
fread(data, filesize, 1, file);               // read the entire file at once
fclose(file);

Re: Problem loading file.

Posted: Mon Feb 08, 2010 5:57 am
by Owen
... I've never understood why C lacks a proper get file length function... It would seem like such an obvious thing

Re: Problem loading file.

Posted: Mon Feb 08, 2010 6:54 am
by Solar
Because C was meant to be portable. And back when C became popular, file storage techniques varied wildly, so the definition of file structure is pretty loose.

From the C standard, chapter 7.19.2 Streams:
Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation. Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character. Whether space characters that are written out immediately before a new-line character appear when read in is implementation-defined.

A binary stream is an ordered sequence of characters that can transparently record internal data. Data read in from a binary stream shall compare equal to the data that were earlier written out to that stream, under the same implementation. Such a stream may, however, have an implementation-defined number of null characters appended to the end of the stream.
Lots of "implementation-defined" in there...

And then remember that this was a time when core memory was measured in kilobytes, and punch cards / tape drives were common. Even if those devices had the metadata to tell the file size beforehand, you just didn't load the whole file into memory; you processed it record by record.

And even if you knew the file size, given how streams are defined, you still didn't know how much space the file would require in memory.

Re: Problem loading file.

Posted: Mon Feb 08, 2010 12:02 pm
by Thomas
Hi Suthers,
It's better to create a memory-mapped file than to load the entire thing into memory :wink: . Also, each += on a string may reallocate and copy its buffer as it grows, so many small appends to a large string can get expensive :) .

{Please do excuse my poor english}
--Thomas

Re: Problem loading file.

Posted: Wed Feb 10, 2010 7:05 am
by suthers
Hey, thanks guys!
That works great!
It now loads the whole file without a problem...
Now I just have to extract the sound samples from it, beat detect, stretch to the correct BPM, normalise and Fourier transform it!!!
But that will be easy compared to the first hurdle :wink: :lol:
Thanks for all the help!

Jules

Re: Problem loading file.

Posted: Wed Feb 10, 2010 11:29 am
by Selenic
Owen wrote:... I've never understood why C lacks a proper get file length function... It would seem like such an obvious thing
stat(), anyone? It also gives you useful information like owner, permission bits (IIRC) and so on; I think it's a POSIX standard, so all Unix-likes and even Windows should support it.

As Thomas said, malloc(<file length>) is a Bad Idea in general; think about trying to do it with a multi-gigabyte video file, and you see the problem. Besides, 99% of files (especially audio and video) are sequential, so you almost definitely only need a small portion of the file at once...
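To illustrate the sequential alternative, here is a small sketch that walks a file in fixed-size chunks so memory use stays constant regardless of file size. The function name count_bytes_chunked and the 64 KiB chunk size are arbitrary choices for the example; real code would process each chunk where the comment indicates.

```cpp
#include <fstream>

// Hypothetical example: walk a file in 64 KiB chunks and count its bytes.
// Returns false if the file cannot be opened or a read error occurs.
bool count_bytes_chunked(const char* path, unsigned long long& total)
{
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file)
        return false;

    char chunk[64 * 1024];
    total = 0;
    for (;;) {
        file.read(chunk, sizeof chunk);
        std::streamsize got = file.gcount();   // bytes actually read this round
        if (got <= 0)
            break;
        total += static_cast<unsigned long long>(got);
        // ... process chunk[0 .. got) here ...
    }
    return file.eof();   // true if we stopped because the file ended
}
```

The last read is usually a short one, which is why the loop relies on gcount() rather than on the stream state alone.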

Re: Problem loading file.

Posted: Thu Feb 11, 2010 2:43 am
by Solar
Selenic wrote:
Owen wrote:... I've never understood why C lacks a proper get file length function... It would seem like such an obvious thing
stat(), anyone? It also gives you useful information like owner, permission bits (IIRC) and so on; I think it's a POSIX standard, so all Unix-likes and even Windows should support it.
I'm pretty sure Owen meant the language standard itself, i.e. <stdio.h>.

But C (just like C++) never was about having an all-enabling standard library, so just use what the OS API provides.

Re: Problem loading file.

Posted: Thu Feb 11, 2010 2:14 pm
by Selenic
Solar wrote:I'm pretty sure Owen meant the language standard itself, i.e. <stdio.h>.

But C (just like C++) never was about having an all-enabling standard library, so just use what the OS API provides.
That's what I meant (though you've expressed it better) - although the language standard doesn't specify stat() (it is, after all, supposed to be usable on embedded systems), other standards that are implemented by all reasonably modern OSes *do*.