I need contiguous blocks easily...

pcmattman · Post by **pcmattman** » Sat Jul 21, 2007 10:25 pm

I'm stuck.

Without the trusty += operator of the std::string class I cannot (easily) read in large amounts of data from a file while keeping the final string contiguous.

At the moment, this is my hack to read a file in its entirety:

	// get the size of the file (this is truly pathetic)
	int filesz = 0;
	char tmp[2];
	while( !feof( fp ) )
	{
		fread( tmp, 1, 1, fp );
		filesz++;
	}
	
	// now read it in
	fileData = (char*) kmalloc( filesz );
	fread( fileData, 1, filesz, fp );

The double read is slower than a single one (which is slow enough as it is, as I only support PIO for my kernel-space ATA driver).

Please note that this is NOT code I've written for a user-space app; it's kernel-space, so these functions are NOT POSIX compliant (for a start, fp is an int).

Is there any way of getting around this problem?

Brendan · Post by **Brendan** » Sat Jul 21, 2007 11:19 pm

Hi,

pcmattman wrote:

	// get the size of the file (this is truly pathetic)
	int filesz = 0;
	char tmp[2];
	while( !feof( fp ) )
	{
		fread( tmp, 1, 1, fp );
		filesz++;
	}

The file system must know how large the file is (from its directory entry or other information), so why not just ask the file system how large the file is?

With a standard (POSIX) C library I'd use the "fstat()" function for this, but you could have a simpler (non-standard) file system function like "off_t fgetfilesize(int fileDes)".
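
As a rough user-space sketch of that idea (not kernel code), the hypothetical "fgetfilesize()" named above could be a thin wrapper around POSIX "fstat()". The error convention (returning -1 and printing via "perror()") is an assumption made for illustration; a kernel would report errors its own way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper modelled on the "fgetfilesize()" idea:
 * returns the file size in bytes, or -1 on error. */
off_t fgetfilesize(int fileDes)
{
    struct stat st;

    assert(fileDes >= 0);
    if (fstat(fileDes, &st) != 0) {
        perror("fstat");        /* fstat() failed; errno says why */
        return (off_t)-1;
    }
    return st.st_size;          /* size as recorded by the file system */
}
```

The point is that the size comes from file system metadata, so no data sectors have to be read just to count bytes.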

Cheers,

Brendan

pcmattman · Post by **pcmattman** » Sat Jul 21, 2007 11:21 pm

Well, the problem is that it's in my kernel and I don't have any standard library there. For the time being I've just written an fsize() function.

In the newlib (or any other standard C library), does feof() call fstat to find out how big the file is? How does it know when the end of the file comes around?

Brendan · Post by **Brendan** » Sun Jul 22, 2007 1:25 am

Hi,

pcmattman wrote: Well, the problem is that it's in my kernel and I don't have any standard library there. For the time being I've just written an fsize() function.

pcmattman wrote: In the newlib (or any other standard C library), does feof() call fstat to find out how big the file is? How does it know when the end of the file comes around?

There are two types of file I/O functions. The first type is low level I/O using functions like "open()", "close()", "read()", "write()", "lseek()", "fstat()", "stat()", etc. Conceptually, these functions deal directly with the (virtual) file system.

The second type is streams, which are a high level wrapper (on top of the lower level functions) that exists inside the library. This includes functions like "fopen()", "fclose()", "fread()", "fwrite()", "fseek()", "fflush()", "feof()", etc. The library itself maintains file buffers and other things so that (for example) you can get one byte at a time: the library just takes the byte from its buffer instead of asking the kernel/VFS for it, and does a "read()" of several KB at a time when the buffer is empty.
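
To make the layering concrete, here is a minimal sketch of a buffered byte reader built on top of "read()". The names "struct kstream", "kstream_init()" and "kstream_getc()", the 4 KB buffer size and the "refills" counter are all hypothetical, chosen only for illustration; a real library's streams do far more (writing, seeking, error flags).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/types.h>

#define KBUF_SIZE 4096              /* hypothetical buffer size */

/* Hypothetical minimal "stream": a descriptor plus a private buffer. */
struct kstream {
    int fileDes;                    /* descriptor passed to read() */
    unsigned char buf[KBUF_SIZE];
    size_t pos;                     /* next unread byte in buf */
    size_t len;                     /* number of valid bytes in buf */
    unsigned long refills;          /* how many times read() was called */
};

void kstream_init(struct kstream *s, int fileDes)
{
    assert(s != NULL);
    s->fileDes = fileDes;
    s->pos = 0;
    s->len = 0;
    s->refills = 0;
}

/* Returns the next byte (0..255), or -1 on end of file or error. */
int kstream_getc(struct kstream *s)
{
    if (s->pos == s->len) {
        /* Buffer is empty: ask the lower layer for a whole block. */
        ssize_t n = read(s->fileDes, s->buf, sizeof s->buf);
        s->refills++;
        if (n <= 0) {
            return -1;
        }
        s->pos = 0;
        s->len = (size_t)n;
    }
    return s->buf[s->pos++];
}
```

Fetching bytes one at a time through "kstream_getc()" only reaches the lower layer once per buffer refill, which is the property that makes byte-at-a-time loops tolerable on top of a slow driver.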

The "feof()" function is for streams. It'd check if there are more bytes in the buffer and return zero if there are. If the buffer is empty it'd probably try to do a "read()" and see if it can get more data into the buffer (and return zero if it does get more data). It may also use some sort of "I'm at the end of the file" flag or keep track of the size of the file so it doesn't need to check "read()", but this may or may not work, depending on what type of stream it is, how it was opened and how file sharing works.

For example, if the stream is a socket then you might be at the end of the stream until someone sends data to the socket. For some (most?) systems, if you open a normal file as "read only" then some other software can open the same file as "append only", and the end of the file can keep shifting as more data is appended to the file (I'd assume this is common for things like logs - imagine having to stop Apache just to read the web server log). In both these cases the library would need to check "read()" each time you call "feof()" because the end of the stream may have moved.
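
For comparison, in a hosted standard C library "feof()" only tests the stream's end-of-file indicator, which is set when an earlier read runs into the end of the file; it does not read ahead itself. The sketch below (function names are hypothetical) shows what that means for a counting loop in the style of the first post: testing "feof()" before each read counts one byte too many, while a loop driven by the return value of "fread()" does not.

```c
#include <assert.h>
#include <stdio.h>

/* Counting loop in the style of the first post: tests feof() before reading. */
long count_with_feof_loop(FILE *fp)
{
    long n = 0;
    char tmp;

    assert(fp != NULL);
    while (!feof(fp)) {
        size_t got = fread(&tmp, 1, 1, fp);
        (void)got;      /* result ignored: the last, failed read is still counted */
        n++;
    }
    return n;
}

/* Counting loop driven by fread()'s return value instead. */
long count_with_fread_result(FILE *fp)
{
    long n = 0;
    char tmp;

    assert(fp != NULL);
    while (fread(&tmp, 1, 1, fp) == 1) {
        n++;
    }
    return n;
}
```

Note that the first loop would also spin forever on a read error, because an error sets the error indicator rather than the end-of-file indicator.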

Also remember that a function like "read()" may not return all the data you ask for, even when there is enough data. For example, if you try to "read(fileDes, myBuffer, sizeOfFile)" then the file system code might read part of the file and stop. This allows the file system code to return the first N sectors/bytes and wait for you to ask for the next N sectors/bytes, which makes it much easier to write the file system code and can have other advantages.

For example, if there's a bad sector in the middle of the file it can return all the data before the bad sector (and then return a "bad sector error" when you try to get the next N sectors/bytes). Alternatively, the file might be fragmented (where the file system driver returns N contiguous sectors only), or the hardware might not handle large transfers (where the file system driver returns what it can do in one transfer, and sets up a new transfer when it gets the next "read()" request), or the file system driver might be optimizing disk access (doing all pending transfers that don't involve shifting the disk heads before shifting the disk heads), or using some sort of file I/O priorities (stop transferring your data to/from disk if a higher priority request is received).

This means your code should be something like:

    int filesz = 0;
    int totalBytes = 0;
    int readBytes;

    /* Ask the file system for the size instead of counting bytes */
    filesz = fsize( fp );
    if( filesz <= 0 ) {
        return ERROR_CODE;
    }
    fileData = kmalloc( filesz );

    /* read() may return less than was asked for, so keep asking */
    while( totalBytes < filesz ) {
        /* Same parameters as the normal read(): descriptor, buffer, count */
        readBytes = read( fp, fileData + totalBytes, filesz - totalBytes );
        if(readBytes == 0) {
            /* EOF sooner than expected */
            return ERROR_CODE;
        }
        if(readBytes < 0) {
            /* Some sort of error reading the data */
            return ERROR_CODE;
        }
        totalBytes += readBytes;
    }
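
The same loop can be sketched as a self-contained user-space helper using POSIX "read()". The name "read_fully()", the 0/-1 return convention and the retry on EINTR are assumptions for illustration, not part of the kernel code discussed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical helper: keeps calling read() until `size` bytes have arrived.
 * Returns 0 on success, -1 on error or if EOF comes sooner than expected. */
int read_fully(int fileDes, void *buffer, size_t size)
{
    unsigned char *out = buffer;
    size_t total = 0;

    assert(buffer != NULL || size == 0);
    while (total < size) {
        /* read() may hand back fewer bytes than requested */
        ssize_t got = read(fileDes, out + total, size - total);
        if (got == 0) {
            return -1;          /* EOF sooner than expected */
        }
        if (got < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted by a signal: retry */
            }
            return -1;          /* some sort of error reading the data */
        }
        total += (size_t)got;
    }
    return 0;
}
```

A caller would get the size first (for example from "fstat()"), allocate that many bytes, and then pass the buffer and size to "read_fully()" so the result ends up in one contiguous block.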

BTW it doesn't matter if you've got a standard library in the kernel or not - if you're using functions that look like they are part of a standard library then make them the same as the functions you'd find in a standard library. Otherwise use completely different function names to avoid confusing everyone (e.g. "kread()" and not "fread()", just like the way you're using "kmalloc()" and not "malloc()").

Cheers,

Brendan

pcmattman · Post by **pcmattman** » Sun Jul 22, 2007 5:22 pm

Brendan wrote: BTW it doesn't matter if you've got a standard library in the kernel or not - if you're using functions that look like they are part of a standard library then make them the same as the functions you'd find in a standard library. Otherwise use completely different function names to avoid confusing everyone (e.g. "kread()" and not "fread()", just like the way you're using "kmalloc()" and not "malloc()").

They are exactly the same in usage, except they do not do the buffering that happens behind the scenes. So, basically, every time you call fread() you're doing a read of the hard drive. I have yet to figure out how to create the streams and either way I find it irrelevant when these functions will only be used to load the shell from the hard drive.

At the moment I'm using them to test my FAT32 implementation (almost completed fwrite, which is not really high up on my 'things I enjoy doing' list - it is massive and is likely to cause me much trouble with corrupted filesystems and such).

Brendan · Post by **Brendan** » Sun Jul 22, 2007 11:58 pm

Hi,

pcmattman wrote: At the moment I'm using them to test my FAT32 implementation (almost completed fwrite, which is not really high up on my 'things I enjoy doing' list - it is massive and is likely to cause me much trouble with corrupted filesystems and such).

That doesn't make much sense to me - the kernel and FAT32 code should use low level I/O (functions like "read()" and "write()" ) and shouldn't be using higher level streams (functions like "fread()" and "fwrite()"). The higher level streams are just an abstraction created by the library that applications use, and aren't something the filesystem code should know about or care about....

You're almost doing this - you've got a function for reading data from a file that doesn't do any buffering (just like "read()") and that takes similar parameters to the normal "read()" function, but you've called it "fread()". Don't you think it's strange that "fread()" uses a "FILE *fp" as a reference to the file (which you're not using), and "read()" uses "int fileDes" as a reference to the file (which you are using)?

I've got a function that shuts down the entire OS. I'm going to call it "exit()". I hope that doesn't confuse anyone....

Cheers,

Brendan

OSDev.org
