Proper buffer design for file reading

SpyderTL · Post by **SpyderTL** » Wed Jul 10, 2019 1:44 pm

It's rare that you actually want an entire file loaded into memory at once. For very large files, this risks crashing the system.

Because storage devices rarely give you access to a single byte of data, you are forced to deal with blocks of data. However, blocks are rarely useful to an application, so it may be desirable to hide all of the block details from it, which is what I've done.

For an application, what you normally want is to be able to read X bytes from file offset address Y, so that is where I've focused most of my attention. All of the block swapping is handled behind the scenes. As an aside, all of my other "data" access is handled through the same basic interface, including low level storage, system RAM, audio input/output, network connections and even things like the keyboard and mouse. The design is heavily influenced by the Reader/Writer concept in .NET. A reader or writer object has a current position, and a ReadByte, ReadInt16, ReadInt32, ReadInt64, ReadString, etc. set of functions. This makes it trivial to find information even in very large files without having to worry about having the entire file in memory.
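To make the idea concrete, here is a minimal sketch in C of a reader with a current position that hides block access behind a one-block cache. It is not SpyderTL's actual code: `struct reader`, `block_read_fn`, `reader_read`, `reader_read_u32le`, `struct ram_disk`, `ram_read_block` and the 512-byte `BLOCK_SIZE` are all names and values assumed for illustration.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 512u  /* assumed block size for this sketch */

/* Hypothetical block source: fills `out` with block `index`, returns 0 on success. */
typedef int (*block_read_fn)(void *ctx, uint32_t index, uint8_t out[BLOCK_SIZE]);

struct reader {
    block_read_fn read_block;  /* device-specific block fetch */
    void *ctx;                 /* opaque device state handed to read_block */
    uint64_t position;         /* current byte offset */
    uint32_t cached_index;     /* which block currently sits in `cache` */
    int cache_valid;           /* 0 until a block has been loaded */
    uint8_t cache[BLOCK_SIZE];
};

/* Copy `count` bytes from the current position, loading blocks on demand. */
static int reader_read(struct reader *r, void *dst, size_t count)
{
    uint8_t *out = dst;
    while (count > 0) {
        uint32_t index = (uint32_t)(r->position / BLOCK_SIZE);
        size_t offset = (size_t)(r->position % BLOCK_SIZE);
        size_t chunk = BLOCK_SIZE - offset;
        if (chunk > count)
            chunk = count;
        if (!r->cache_valid || r->cached_index != index) {
            if (r->read_block(r->ctx, index, r->cache) != 0) {
                r->cache_valid = 0;
                return -1;
            }
            r->cached_index = index;
            r->cache_valid = 1;
        }
        memcpy(out, r->cache + offset, chunk);
        out += chunk;
        r->position += chunk;
        count -= chunk;
    }
    return 0;
}

/* Read a little-endian 32-bit integer, independent of host byte order. */
static int reader_read_u32le(struct reader *r, uint32_t *value)
{
    uint8_t b[4];
    if (reader_read(r, b, sizeof b) != 0)
        return -1;
    *value = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
             (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    return 0;
}

/* Example backend: a byte array in RAM exposed as blocks. */
struct ram_disk { const uint8_t *data; size_t size; };

static int ram_read_block(void *ctx, uint32_t index, uint8_t out[BLOCK_SIZE])
{
    const struct ram_disk *d = ctx;
    size_t start = (size_t)index * BLOCK_SIZE;
    if (start >= d->size)
        return -1;
    size_t n = d->size - start;
    if (n > BLOCK_SIZE)
        n = BLOCK_SIZE;
    memset(out, 0, BLOCK_SIZE);   /* zero-pad a short final block */
    memcpy(out, d->data + start, n);
    return 0;
}
```

A read that straddles two blocks (say, four bytes starting at offset 510) is split into two cached block loads without the caller noticing. Swapping `ram_read_block` for a disk-backed function would, under these assumptions, leave the application-facing calls unchanged.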

There are several other approaches that you can use, but loading the entire file into memory is probably not what you want, long term.

eekee · Post by **eekee** » Thu Jul 11, 2019 12:16 pm

SpyderTL wrote:As an aside, all of my other "data" access is handled through the same basic interface, including low level storage, system RAM, audio input/output, network connections and even things like the keyboard and mouse. The design is heavily influenced by the Reader/Writer concept in .NET. A reader or writer object has a current position, and a ReadByte, ReadInt16, ReadInt32, ReadInt64, ReadString, etc. set of functions.

Apart from RAM and the convenience functions, this sounds exactly like Plan 9. Take out network and optionally audio too, and you have Unix. (Remember /dev/dsp for audio?) (I think RAM is available as a file now too, but I'm not sure if the addresses correspond.) Standard C incorporates the basic functions: read, write and seek. It's a powerful design!

ReadString would be a nice addition to Unix/Plan 9. I think in early Unix, they looped getc or fgetc to get the same effect. Get a null byte -> exit loop. I wouldn't be surprised if there's still some code doing that in Plan 9, or there's a buffering library which can do it more efficiently than getc. Oh... this buffering library already has it.

Brdstr in bio.
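For illustration, the getc loop described above might look roughly like this in standard C. `read_cstring` is a made-up name for this sketch; Plan 9's Brdstr is the real library routine and has a different interface.

```c
#include <stdio.h>
#include <stddef.h>

/* Read bytes up to a NUL terminator or EOF into dst, storing at most
 * cap - 1 bytes and always NUL-terminating. Returns the number of bytes
 * stored, not counting the terminator. If dst fills up first, the rest
 * of the string is left unread in the stream. */
static size_t read_cstring(FILE *in, char *dst, size_t cap)
{
    size_t n = 0;
    int c;

    if (cap == 0)
        return 0;
    while (n + 1 < cap && (c = fgetc(in)) != EOF && c != '\0')
        dst[n++] = (char)c;
    dst[n] = '\0';
    return n;
}
```

Because stdio already buffers the underlying reads, the per-byte fgetc call is cheaper than it looks, though a routine that scans the buffer directly can still do better.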

As for the other convenience functions, I think ReadInt32 is like if(read(fd, &some_int32_var, sizeof(int32_t)) != sizeof(int32_t)){error();}. I don't know if the .NET convenience functions have a byteswapping feature, but read obviously doesn't.

Anyway, this basic scheme was used almost everywhere until mmap got popular. Its biggest shortfall is a lack of atomicity (seek+read is 2 syscalls), which is easily solved by providing syscalls which combine both.
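On POSIX systems, pread is one such combined call: it takes the offset as an argument and leaves the descriptor's file position alone. A minimal sketch of a ReadInt32-style helper built on it follows; `read_exact_at` and `read_u32le_at` are names invented here, and the little-endian decoding is an assumption about the file format.

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <stddef.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly `count` bytes at `offset` without moving the file position.
 * Returns 0 on success, -1 on error or if the file ends too early. */
static int read_exact_at(int fd, void *dst, size_t count, off_t offset)
{
    uint8_t *out = dst;
    while (count > 0) {
        ssize_t n = pread(fd, out, count, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted, try again */
            return -1;
        }
        if (n == 0)
            return -1;             /* end of file before `count` bytes */
        out += n;
        offset += n;
        count -= (size_t)n;
    }
    return 0;
}

/* Read a little-endian 32-bit integer at `offset`; the bytes are assembled
 * by hand so the result does not depend on host byte order. */
static int read_u32le_at(int fd, off_t offset, uint32_t *value)
{
    uint8_t b[4];
    if (read_exact_at(fd, b, sizeof b, offset) != 0)
        return -1;
    *value = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
             (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    return 0;
}
```

Since no shared position is updated, two threads can call `read_u32le_at` on the same descriptor without a seek from one changing where the other reads.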

deleted8917 · Post by **deleted8917** » Thu Jul 11, 2019 4:44 pm

Well, it works now, but I still need to specify the buffer size (there seems to be no choice). If I try to read a file bigger than 512 bytes, the HDD gets stuck reading with a software reset.


int fat32_read_file(uint8_t* filename, uint8_t* buffer, uint32_t buffsiz, struct file fp)
{
	/* Check HDD presence and arguments, I don't want to shoot myself in the foot */
	if (!hd_exists() || !filename || !buffer)
		return 1;
	hd_read(start_of_root, FAT32_FILES_PER_DIRECTORY * sizeof(struct DirectoryEntry), (uint8_t*)&drce[0]);

	uint8_t sector[512];                  /* one data sector */
	uint32_t fat[512 / sizeof(uint32_t)]; /* one sector of FAT entries (real storage, not a null pointer) */
	uint8_t fil[12];
	for (int i = 0; i < FAT32_FILES_PER_DIRECTORY; ++i) {
		fat2human(drce[i].file_name, fil);
		trimName(fil, 11);
		if (strcmp((char*)fil, (char*)filename) == 0) {
			/* The cluster number is 32 bits wide; a uint8_t would truncate it */
			uint32_t ncluster = ((uint32_t)drce[i].cluster_number_hi) << 16 | ((uint32_t)drce[i].cluster_number_lo);
			uint32_t file_size = (uint32_t)fp.file_size;
			/* Never copy more than the caller's buffer can hold */
			uint32_t remaining = file_size < buffsiz ? file_size : buffsiz;
			uint32_t copied = 0;

			kputs("\nFile content: \n");

			/* Walk the cluster chain; 0x0FFFFFF7 marks a bad cluster and
			   0x0FFFFFF8 and above mark the end of the chain */
			while (remaining > 0 && ncluster >= 2 && ncluster < 0x0FFFFFF7) {
				uint32_t fsect = start_of_data + bpb.sectors_per_cluster * (ncluster - 2);

				/* Read the cluster sector by sector; the sector index restarts
				   at 0 for every cluster and stays below sectors_per_cluster */
				for (uint32_t s = 0; s < bpb.sectors_per_cluster && remaining > 0; ++s) {
					uint32_t chunk = remaining < 512 ? remaining : 512;
					hd_read(fsect + s, 512, sector);
					/* Append to the output instead of overwriting its start */
					memcpy(buffer + copied, sector, chunk);
					copied += chunk;
					remaining -= chunk;
				}

				/* Look up the next cluster in the FAT */
				hd_read(fat_start + ncluster / (512 / sizeof(uint32_t)), 512, (uint8_t*)fat);
				ncluster = fat[ncluster % (512 / sizeof(uint32_t))] & 0x0FFFFFFF;
			}
			return 0;
		}
	}
	kputs("\nFile %s not found\n", filename);
	return 1;
}

GMorgan · Post by **GMorgan** » Sat Jul 20, 2019 3:47 pm

I'd think about this from the user's page perspective. If I had a fixed size file this is 100% a mmap where you just map the file as a special page table swap file. Then the memory manager handles the working set for you. You have a good algorithm for this already.
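Seen from userland on a POSIX system, the fixed-size case could be sketched as below. This only shows the application's side of the idea, not the kernel machinery behind it; `struct mapped_file`, `map_file` and `unmap_file` are names invented for the example.

```c
#include <stddef.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct mapped_file {
    const unsigned char *data;  /* start of the read-only mapping */
    size_t size;                /* file size in bytes */
};

/* Map a whole file read-only. Returns 0 on success, -1 on failure
 * (including an empty file, which cannot be mapped). */
static int map_file(const char *path, struct mapped_file *out)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    out->data = p;
    out->size = (size_t)st.st_size;
    return 0;
}

/* Release the mapping and clear the handle. */
static void unmap_file(struct mapped_file *m)
{
    munmap((void *)m->data, m->size);
    m->data = NULL;
    m->size = 0;
}
```

Pages are only read from disk when `data` is first touched, so the memory manager decides how much of the file is resident at any moment, which is the working-set behaviour described above.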

For a stream, you just pick a buffer that is an appropriate whole number of blocks, behind the scenes.

If it is random access and potentially variable, make them give you a pointer and length.

OSDev.org
