True Streaming File I/O
Posted: Wed Jul 30, 2014 1:38 pm
I'm pretty sure I know what most of you guys are going to say, so let me start by saying this: if you think this is a bad idea in general, don't bother replying. I'll take your silence as a generic "that's stupid" response by default.
On the other hand, if you have a specific concern that you would like to bring up that needs to be taken into consideration, feel free to elaborate.
So, it just so happens that I am working on a file system explorer (in C#) that reads the raw file system of a disk and displays all of the data tables and file entries in a navigable tree view, similar to Windows Explorer (but showing the raw data tables instead of just file icons). I also just finished reading a thread by wxwsk8er asking about reading individual bytes from a file system. Of course, the answer is that you can't read individual bytes from a storage device... you have to read one or more blocks.
So, I ran into an issue with the file system explorer where Windows would randomly throw an error when reading individual bytes from a raw file stream. Google turned up a post about this issue: apparently, Windows may or may not require block-aligned reads, depending on the mode Windows used to open the raw device, the way the device driver handles data transfers, etc. So I changed my code to read in 512- or 2048-byte chunks, which fixed the problem. But it got me thinking...
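For reference, the chunked workaround amounts to rounding each request out to sector boundaries and then slicing the wanted bytes out of the buffer. Here is a minimal Python sketch, assuming 512-byte sectors (2048 for optical media) and a seekable binary stream. `aligned_span` and `read_unaligned` are illustrative names, not part of any Windows or .NET API:

```python
import io

SECTOR_SIZE = 512  # assumption: hard disk / floppy sectors; CD-ROM media uses 2048


def aligned_span(offset, length, sector_size=SECTOR_SIZE):
    """Return (start, size) of the smallest sector-aligned span covering the request."""
    start = (offset // sector_size) * sector_size
    end = offset + length
    aligned_end = -(-end // sector_size) * sector_size  # round up to the next boundary
    return start, aligned_end - start


def read_unaligned(stream, offset, length, sector_size=SECTOR_SIZE):
    """Read `length` bytes at `offset` using only sector-aligned seeks and reads."""
    start, size = aligned_span(offset, length, sector_size)
    stream.seek(start)
    block = stream.read(size)
    skip = offset - start
    return block[skip:skip + length]


if __name__ == "__main__":
    # Stand-in for a raw device handle: 4 sectors of recognisable data.
    disk = io.BytesIO(bytes(range(256)) * 8)
    print(aligned_span(510, 4))          # (0, 1024): the 4 bytes straddle two sectors
    print(read_unaligned(disk, 510, 4))  # b'\xfe\xff\x00\x01'
```

In C# the same rounding would be applied before calling `Seek` and `Read` on the `FileStream` for the raw device.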
Would it be possible to read individual bytes from the FDC / IDE controllers and stream them one by one to the application? It seems plausible if you are reading in PIO (polling) mode. (In DMA mode it doesn't make much sense.) The controller itself may complain if the application does not read the data fast enough, but other than that, what would be the downside?
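One detail worth keeping in mind for the IDE case: the ATA data register is 16 bits wide, so the smallest unit a PIO read gives you is a word (two bytes), and a 512-byte sector arrives as 256 words. Below is a rough Python sketch of a byte-at-a-time stream on top of that. `PioByteStream` is hypothetical, and `read_word` stands in for a 16-bit port read of the data register (0x1F0 on the primary channel). It is assumed that `read_word` does the BSY/DRQ polling itself:

```python
from typing import Callable, Optional

WORDS_PER_SECTOR = 256  # a 512-byte sector is transferred as 256 16-bit words


class PioByteStream:
    """Streams sector data to the caller one byte at a time.

    `read_word` is a hypothetical stand-in for a 16-bit read of the ATA data
    register (port 0x1F0 on the primary channel); it is assumed to wait for
    BSY to clear and DRQ to be set before each sector, as a PIO driver would.
    """

    def __init__(self, read_word: Callable[[], int], sectors: int):
        self._read_word = read_word
        self._words_left = sectors * WORDS_PER_SECTOR
        self._high_byte: Optional[int] = None  # second half of the last word read

    def read_byte(self) -> Optional[int]:
        """Return the next byte, or None once the command's data is exhausted."""
        if self._high_byte is not None:
            b, self._high_byte = self._high_byte, None
            return b
        if self._words_left == 0:
            return None
        word = self._read_word()  # one port read yields two bytes
        self._words_left -= 1
        self._high_byte = (word >> 8) & 0xFF
        return word & 0xFF  # the first byte of the pair is the low-order byte

    def drain(self) -> int:
        """Read and discard the rest of the transfer so the drive can complete
        the command; returns the number of words discarded."""
        discarded = 0
        while self._words_left:
            self._read_word()
            self._words_left -= 1
            discarded += 1
        self._high_byte = None
        return discarded
```

The `drain()` method is there because, on real ATA hardware, the drive keeps DRQ asserted until the host has read the whole block (or the channel is reset). Stopping early therefore does not free the controller by itself.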
Obviously, multi-threading would be an issue. Only one file could be open at any given time, unless you wanted to time-slice access to the controller and reset and resend the read/write commands every time you had a "context switch" to another "thread". I can see how that would be slow, if not impossible to manage.
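To put a rough number on that context-switch cost, here is a toy Python model. It is not a driver. `SimulatedController`, `SharedStream` and the reissue-on-switch policy are all hypothetical, and the model assumes the controller accepts a new read command at any time:

```python
SECTOR_SIZE = 512


class SimulatedController:
    """Hypothetical single-channel controller that can run one read command at a time.

    Simplified: a command streams bytes from its starting LBA indefinitely, and the
    reset a real driver would need before abandoning a command is not modelled.
    """

    def __init__(self, disk: bytes):
        self.disk = disk
        self.owner = None          # stream whose command is currently active
        self.pos = 0               # disk offset of the next byte the controller will transfer
        self.commands_issued = 0
        self.bytes_transferred = 0

    def issue_read(self, owner, lba: int):
        self.owner = owner
        self.pos = lba * SECTOR_SIZE
        self.commands_issued += 1

    def next_byte(self) -> int:
        b = self.disk[self.pos]
        self.pos += 1
        self.bytes_transferred += 1
        return b


class SharedStream:
    """One open 'file' sharing the controller; re-issues its read on every switch."""

    def __init__(self, ctrl: SimulatedController, start_lba: int):
        self.ctrl = ctrl
        self.offset = start_lba * SECTOR_SIZE  # disk offset of this stream's next byte

    def read_byte(self) -> int:
        if self.ctrl.owner is not self:
            # Another stream used the controller since our last read: start a new
            # command at our sector, then re-transfer and discard the bytes we
            # already consumed from it, because commands start on sector boundaries.
            lba, skip = divmod(self.offset, SECTOR_SIZE)
            self.ctrl.issue_read(self, lba)
            for _ in range(skip):
                self.ctrl.next_byte()
        b = self.ctrl.next_byte()
        self.offset += 1
        return b
```

Alternating single-byte reads between two streams, four times each, issues 8 commands and moves 20 bytes across the bus to deliver 8. The overhead grows with each stream's position inside its current sector.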
What other disadvantages can you guys think of? Are there any compelling advantages to this approach?
In certain scenarios, I can see how this would be faster overall than copying a whole block into memory and then reading that block, if you are only looking for one or two values near the beginning of the block. For example, when reading file system tables...
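As a concrete case, take a FAT boot sector. Per Microsoft's FAT specification, bytes-per-sector is a little-endian 16-bit field at offset 11 and sectors-per-cluster is one byte at offset 13. The Python sketch below contrasts the two approaches. `bpb_from_block` and `bpb_from_stream` are illustrative names, and `next_byte` is a hypothetical callable that returns the next byte from the device:

```python
import struct
from typing import Callable

# FAT boot sector (BPB) field offsets, per Microsoft's FAT specification.
BPB_BYTS_PER_SEC = 11  # 2 bytes, little-endian
BPB_SEC_PER_CLUS = 13  # 1 byte


def bpb_from_block(sector: bytes):
    """Block approach: the whole 512-byte sector is already in memory."""
    return struct.unpack_from("<HB", sector, BPB_BYTS_PER_SEC)


def bpb_from_stream(next_byte: Callable[[], int]):
    """Streaming approach: pull bytes only until the last field of interest."""
    header = bytes(next_byte() for _ in range(BPB_SEC_PER_CLUS + 1))  # 14 bytes
    return struct.unpack_from("<HB", header, BPB_BYTS_PER_SEC)
```

Only 14 of the 512 bytes pass through the stream. Whether that saves anything on real hardware depends on whether the controller still expects the rest of the sector to be read before it will accept the next command.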
It would also allow the application to read data directly from the controller to the CPU without using any system memory, or at least, it would give you that option.
Of course, most modern devices can't even transfer individual bytes and rely on DMA transfers or PCI bus mastering instead, so this approach would really only apply to older controllers, like the FDC and IDE controllers.
Still, as a purely academic exercise, is there any reason that this would not work?
(Constructive comments only, please...)