It's rare that you actually want an entire file loaded into memory at once. For very large files, this risks crashing the system.
Because storage devices rarely give you access to a single byte of data, you are forced to deal with blocks of data. However, blocks are rarely useful to an application, so it may be desirable to hide all of the block details from the application, which is what I've done.
For an application, what you normally want is to be able to read X bytes from file offset address Y, so that is where I've focused most of my attention. All of the block swapping is handled behind the scenes. As an aside, all of my other "data" access is handled through the same basic interface, including low level storage, system RAM, audio input/output, network connections and even things like the keyboard and mouse. The design is heavily influenced by the Reader/Writer concept in .NET. A reader or writer object has a current position, and a ReadByte, ReadInt16, ReadInt32, ReadInt64, ReadString, etc. set of functions. This makes it trivial to find information even in very large files without having to worry about having the entire file in memory.
There are several other approaches that you can use, but loading the entire file into memory is probably not what you want, long term.
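To make the "read X bytes from offset Y" idea concrete, here is a minimal sketch in C of a reader that keeps a current position and hides sector boundaries behind a one-sector cache. Every name in it (reader_t, read_sector_fn, ramdisk_t and so on) is made up for illustration and is not taken from OZone; the 512-byte sector size and the little-endian decoding are assumptions as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u

/* Hypothetical block device interface: it can only read one whole sector. */
typedef int (*read_sector_fn)(void *dev, uint32_t lba, uint8_t *out);

typedef struct {
    read_sector_fn read_sector;   /* backend callback (assumed interface) */
    void    *dev;                 /* opaque backend state */
    uint64_t pos;                 /* current byte position */
    uint32_t cached_lba;          /* sector number held in cache[] */
    int      cache_valid;         /* 0 until the first sector is loaded */
    uint8_t  cache[SECTOR_SIZE];
} reader_t;

void reader_init(reader_t *r, read_sector_fn fn, void *dev)
{
    assert(r != NULL && fn != NULL);
    r->read_sector = fn;
    r->dev = dev;
    r->pos = 0;
    r->cached_lba = 0;
    r->cache_valid = 0;
}

void reader_seek(reader_t *r, uint64_t pos) { r->pos = pos; }

/* Read n bytes at the current position, crossing sector boundaries as needed.
   Returns 0 on success, -1 if the backend reports an error. */
int reader_read(reader_t *r, void *dst, size_t n)
{
    uint8_t *out = dst;
    while (n > 0) {
        uint32_t lba = (uint32_t)(r->pos / SECTOR_SIZE);
        size_t off = (size_t)(r->pos % SECTOR_SIZE);
        size_t chunk = SECTOR_SIZE - off;
        if (chunk > n)
            chunk = n;
        if (!r->cache_valid || r->cached_lba != lba) {
            r->cache_valid = 0;
            if (r->read_sector(r->dev, lba, r->cache) != 0)
                return -1;
            r->cached_lba = lba;
            r->cache_valid = 1;
        }
        memcpy(out, r->cache + off, chunk);
        out += chunk;
        r->pos += chunk;
        n -= chunk;
    }
    return 0;
}

int reader_read_u8(reader_t *r, uint8_t *v) { return reader_read(r, v, 1); }

/* Decode a 32-bit little-endian value regardless of host byte order. */
int reader_read_u32le(reader_t *r, uint32_t *v)
{
    uint8_t b[4];
    if (reader_read(r, b, sizeof b) != 0)
        return -1;
    *v = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
         (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    return 0;
}

/* Demo backend: a RAM disk, standing in for a real driver. */
typedef struct { const uint8_t *data; uint32_t sectors; } ramdisk_t;

int ramdisk_read_sector(void *dev, uint32_t lba, uint8_t *out)
{
    ramdisk_t *d = dev;
    if (lba >= d->sectors)
        return -1;
    memcpy(out, d->data + (size_t)lba * SECTOR_SIZE, SECTOR_SIZE);
    return 0;
}
```

The caller only ever sees positions and byte counts; swapping the RAM disk for a real disk driver would mean supplying a different read_sector callback. A real implementation would probably want more than one cached sector.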
Proper buffer design for file reading
Re: Proper buffer design for file reading
Project: OZone
Source: GitHub
Current Task: LIB/OBJ file support
"The more they overthink the plumbing, the easier it is to stop up the drain." - Montgomery Scott
Re: Proper buffer design for file reading
SpyderTL wrote: As an aside, all of my other "data" access is handled through the same basic interface, including low level storage, system RAM, audio input/output, network connections and even things like the keyboard and mouse. The design is heavily influenced by the Reader/Writer concept in .NET. A reader or writer object has a current position, and a ReadByte, ReadInt16, ReadInt32, ReadInt64, ReadString, etc. set of functions.
Apart from RAM and the convenience functions, this sounds exactly like Plan 9. Take out network and optionally audio too, and you have Unix. (Remember /dev/dsp for audio?) (I think RAM is available as a file now too, but I'm not sure if the addresses correspond.) Standard C incorporates the basic functions: read, write and seek. It's a powerful design!
ReadString would be a nice addition to Unix/Plan 9. I think in early Unix, they looped getc or fgetc to get the same effect. Get a null byte -> exit loop. I wouldn't be surprised if there's still some code doing that in Plan 9, or there's a buffering library which can do it more efficiently than getc. Oh... this buffering library already has it. Brdstr in bio.
As for the other convenience functions, I think ReadInt32 is like if(read(fd, &some_int32_var, sizeof(int32_t)) != sizeof(int32_t)){error();}. I don't know if the .NET convenience functions have a byteswapping feature, but read obviously doesn't.
Anyway, this basic scheme was used almost everywhere until mmap got popular. Its biggest shortfall is a lack of atomicity (seek+read is 2 syscalls), which is easily solved by providing syscalls which combine both, as POSIX pread and pwrite do.
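As a rough illustration of a ReadInt32-style helper built on a combined seek+read call, here is a sketch using POSIX pread. The helper names (pread_full, read_int32_le_at, demo_read_back) are hypothetical, and the choice of little-endian decoding is an assumption made for the example, not something read itself provides.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly n bytes at offset off without moving the file position.
   Returns 0 on success, -1 on error or if the file ends too early.
   (A production version would also retry on EINTR.) */
int pread_full(int fd, void *buf, size_t n, off_t off)
{
    uint8_t *p = buf;
    while (n > 0) {
        ssize_t got = pread(fd, p, n, off);
        if (got <= 0)
            return -1;
        p += got;
        off += got;
        n -= (size_t)got;
    }
    return 0;
}

/* ReadInt32-style helper: one call, explicit offset, explicit byte order. */
int read_int32_le_at(int fd, off_t off, int32_t *out)
{
    uint8_t b[4];
    assert(out != NULL);
    if (pread_full(fd, b, sizeof b, off) != 0)
        return -1;
    uint32_t u = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
                 (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    *out = (int32_t)u;
    return 0;
}

/* Demo: write 8 known bytes to a temporary file, then read an int32 back. */
int demo_read_back(off_t off, int32_t *out)
{
    static const uint8_t bytes[8] = { 0xAA, 0xBB, 0xCC, 0xDD,
                                      0x78, 0x56, 0x34, 0x12 };
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    fwrite(bytes, 1, sizeof bytes, f);
    fflush(f);
    int rc = read_int32_le_at(fileno(f), off, out);
    fclose(f);
    return rc;
}
```

Because the offset travels with the call, two threads sharing one descriptor cannot disturb each other's position, which is the atomicity problem with separate seek and read.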
Kaph — a modular OS intended to be easy and fun to administer and code for.
"May wisdom, fun, and the greater good shine forth in all your work." — Leo Brodie
Re: Proper buffer design for file reading
Well, it works now, but I still need to specify the buffer size (there seems to be no other choice). If I try to read a file bigger than 512 bytes, the HDD gets stuck reading and only a software reset recovers it.
Code:
int fat32_read_file(uint8_t* filename, uint8_t* buffer, uint32_t buffsiz, struct file fp)
{
    /* Check HDD presence and arguments, I don't want to shoot my foot */
    if (!hd_exists() || !filename || !buffer)
        return 1;

    hd_read(start_of_root, FAT32_FILES_PER_DIRECTORY * sizeof(struct DirectoryEntry), (uint8_t*)&drce[0]);

    uint8_t buff[512];      /* one sector of file data */
    uint8_t fatbuff[512];   /* one sector of the FAT (was a null pointer) */
    uint8_t fil[12];

    for (int i = 0; i < FAT32_FILES_PER_DIRECTORY; ++i) {
        fat2human(drce[i].file_name, fil);
        trimName(fil, 11);
        if (strcmp((char*)fil, (char*)filename) != 0)
            continue;

        /* Cluster numbers are 28 bits wide, so a uint8_t would truncate them */
        uint32_t ncluster = ((uint32_t)drce[i].cluster_number_hi) << 16 | ((uint32_t)drce[i].cluster_number_lo);
        uint32_t remaining = fp.file_size;
        if (remaining > buffsiz)
            remaining = buffsiz;    /* never write past the caller's buffer */
        uint32_t copied = 0;

        /* Walk the cluster chain until the data or the chain runs out */
        while (remaining > 0 && ncluster >= 2 && ncluster < 0x0FFFFFF8) {
            uint32_t fsect = start_of_data + bpb.sectors_per_cluster * (ncluster - 2);

            /* The sector index restarts at 0 for every cluster */
            for (uint32_t s = 0; s < bpb.sectors_per_cluster && remaining > 0; ++s) {
                uint32_t chunk = remaining > 512 ? 512 : remaining;
                hd_read(fsect + s, 512, buff);
                memcpy(buffer + copied, buff, chunk);   /* append, don't overwrite */
                copied += chunk;
                remaining -= chunk;
            }

            /* Look up the next cluster in the FAT (128 entries per sector) */
            uint32_t fsectcurrentcl = ncluster / (512 / sizeof(uint32_t));
            uint32_t foffsectcurrentcl = ncluster % (512 / sizeof(uint32_t));
            hd_read(fat_start + fsectcurrentcl, 512, fatbuff);
            ncluster = ((uint32_t*)fatbuff)[foffsectcurrentcl] & 0x0FFFFFFF;
        }
        return 0;
    }
    kputs("\nFile %s not found\n", filename);
    return 1;
}
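The cluster arithmetic in the function above can be checked on its own, without a disk. Below is a small sketch that does the same three steps (cluster to sector, locating a FAT entry, following the chain) against an in-memory copy of the FAT. The function names are hypothetical, and it assumes 512-byte sectors and that the whole FAT fits in one array, which a real driver would not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE      512u
#define FAT32_ENTRY_MASK 0x0FFFFFFFu   /* top 4 bits of an entry are reserved */
#define FAT32_EOC_MIN    0x0FFFFFF8u   /* values from here up mean end of chain */

/* First sector of a data cluster; clusters are numbered starting at 2. */
uint32_t cluster_to_sector(uint32_t data_start, uint32_t sectors_per_cluster,
                           uint32_t cluster)
{
    assert(cluster >= 2);
    return data_start + (cluster - 2) * sectors_per_cluster;
}

/* Which FAT sector holds the entry for a cluster, and where inside it. */
void fat_entry_location(uint32_t cluster, uint32_t *fat_sector, uint32_t *index)
{
    const uint32_t per_sector = SECTOR_SIZE / sizeof(uint32_t);   /* 128 */
    *fat_sector = cluster / per_sector;
    *index = cluster % per_sector;
}

/* Follow a chain through an in-memory FAT, recording each cluster visited.
   Stops at end-of-chain, at an out-of-range entry, or when out[] is full.
   Returns the number of clusters written to out[]. */
size_t walk_chain(const uint32_t *fat, size_t fat_entries, uint32_t first,
                  uint32_t *out, size_t max_out)
{
    size_t n = 0;
    uint32_t c = first;
    while (c >= 2 && c < FAT32_EOC_MIN && c < fat_entries && n < max_out) {
        out[n++] = c;
        c = fat[c] & FAT32_ENTRY_MASK;
    }
    return n;
}
```

Bounding the loop by max_out also protects against a corrupted FAT that loops back on itself, which would otherwise keep the disk reading forever.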
Re: Proper buffer design for file reading
I'd think about this from the user's page perspective. If the file has a fixed size, this is 100% a job for mmap: you map the file as if it were a special swap file behind the page tables, and the memory manager handles the working set for you. You have a good algorithm for this already.
For a stream you just pick an appropriate integer block buffer behind the scenes.
If it is random access and potentially variable, make them give you a pointer and length.
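For the fixed-size case, this is roughly what the mmap approach looks like from the caller's side on a POSIX system. It is only a sketch: file_view_t and the helper names are invented for the example, and a hobby kernel would be implementing the mapping itself rather than calling it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/* A read-only view of a whole file; pages are faulted in on first touch. */
typedef struct { const uint8_t *data; size_t size; } file_view_t;

int file_view_open(int fd, file_view_t *v)
{
    struct stat st;
    assert(v != NULL);
    if (fstat(fd, &st) != 0 || st.st_size <= 0)
        return -1;
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    v->data = p;
    v->size = (size_t)st.st_size;
    return 0;
}

void file_view_close(file_view_t *v)
{
    munmap((void *)v->data, v->size);
    v->data = NULL;
    v->size = 0;
}

/* Demo: build a 10000-byte temporary file and fetch one byte through the map.
   Returns the byte value, or -1 on failure or an out-of-range offset. */
int demo_mmap_byte(size_t off)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    for (int i = 0; i < 10000; i++)
        fputc(i % 251, f);
    fflush(f);

    file_view_t v;
    int result = -1;
    if (file_view_open(fileno(f), &v) == 0) {
        if (off < v.size)
            result = v.data[off];
        file_view_close(&v);
    }
    fclose(f);
    return result;
}
```

No buffer size is chosen anywhere by the caller; which pages are resident at any moment is left to the memory manager.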