Well, right now it reads subdirectories as well. File reading is now reasonably fast if you read in big chunks (it first plans which blocks it needs, so it doesn't have to consult the FAT after every block), but directory reading is still awful. There is no block cache yet, and every entry is read separately: it reads the FAT, then one entry, then the FAT again, then one entry, and so on...
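The "plan first" approach for file reads can be sketched roughly like this. It is a minimal sketch, not the actual code: it assumes the FAT has already been decoded into an array of next-cluster numbers (real FAT12 on a floppy packs entries into 12 bits), and `plan_extents` and `struct extent` are hypothetical names. It walks the cluster chain once and merges physically adjacent clusters into runs, so the data can then be read run by run without going back to the FAT between blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: fat[c] holds the cluster that follows c, already unpacked from
 * the on-disk FAT12 encoding; values >= FAT_EOC mark the end of the chain. */
#define FAT_EOC 0xFF8

/* A run of physically contiguous clusters: first, first+1, ..., first+count-1. */
struct extent {
    uint16_t first;
    uint16_t count;
};

/* Walk the chain starting at `start` exactly once, merging adjacent clusters
 * into extents. Returns the number of extents filled in; stops early once
 * max_ext extents are in use. */
size_t plan_extents(const uint16_t *fat, uint16_t start,
                    struct extent *ext, size_t max_ext)
{
    assert(fat != NULL && ext != NULL);
    size_t n = 0;
    uint16_t c = start;
    while (c >= 2 && c < FAT_EOC) {          /* clusters 0 and 1 are reserved */
        if (n > 0 && ext[n - 1].first + ext[n - 1].count == c) {
            ext[n - 1].count++;              /* next cluster continues the run */
        } else {
            if (n == max_ext)
                break;                       /* out of room: stop planning here */
            ext[n].first = c;
            ext[n].count = 1;
            n++;
        }
        c = fat[c];                          /* one FAT lookup per cluster, done up front */
    }
    return n;
}
```

For a chain 2→3→4→9→10 this yields two runs, (2, 3 clusters) and (9, 2 clusters), i.e. two data reads instead of five FAT lookups interleaved with five block reads.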
Because of the readdir()+stat() nonsense I copied from POSIX, directory reading actually keeps doing useless work: every time you stat a file (to get attributes like its size and whether it's a directory), it searches through the directory one entry at a time... and for every entry it reads during that search, it reads the FAT from the disk, then one block, then the FAT again for the next entry, and probably the very same block again, and so on.
The root directory and files (when you read them in decent chunks) don't suffer from this, because my floppy driver is clever enough: it uses multi-track reads, two full tracks at a time, and remembers which tracks are currently in the DMA buffer, skipping the read if the relevant data is already there.
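The track-buffer trick boils down to a one-entry cache keyed by cylinder. A minimal sketch, assuming "two full tracks" means both heads of one cylinder on a 1.44 MB disk (18 sectors per track, 2 heads), with a hypothetical stub standing in for the real controller code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 1.44 MB 3.5" floppy geometry. */
#define SECTOR_SIZE       512
#define SECTORS_PER_TRACK 18
#define HEADS             2
#define TOTAL_SECTORS     2880
#define SECTORS_PER_CYL   (SECTORS_PER_TRACK * HEADS)  /* one multi-track read */
#define NO_CYLINDER       (-1)

/* Hypothetical stand-in for the real FDC code: a multi-track read of both
 * heads of cylinder `cyl` into buf. Here it just fills a recognizable pattern. */
static void fdc_read_cylinder_stub(int cyl, uint8_t *buf)
{
    memset(buf, (uint8_t)cyl, SECTORS_PER_CYL * SECTOR_SIZE);
}

struct track_buffer {
    int cached_cyl;                               /* cylinder held in data[], or NO_CYLINDER */
    unsigned long hw_reads;                       /* reads actually issued to the drive */
    uint8_t data[SECTORS_PER_CYL * SECTOR_SIZE];  /* plays the role of the DMA buffer */
};

void tb_init(struct track_buffer *tb)
{
    tb->cached_cyl = NO_CYLINDER;
    tb->hw_reads = 0;
}

/* Copy one sector out, going to the drive only if its cylinder is not buffered. */
void tb_read_sector(struct track_buffer *tb, uint32_t lba, uint8_t *out)
{
    assert(lba < TOTAL_SECTORS);
    int cyl = (int)(lba / SECTORS_PER_CYL);
    if (cyl != tb->cached_cyl) {
        fdc_read_cylinder_stub(cyl, tb->data);
        tb->cached_cyl = cyl;
        tb->hw_reads++;
    }
    /* LBA order within a cylinder: head 0 sectors 1..18, then head 1. */
    memcpy(out, tb->data + (lba % SECTORS_PER_CYL) * SECTOR_SIZE, SECTOR_SIZE);
}
```

Reading LBA 0 and then LBA 35 costs one drive read; LBA 36 moves to the next cylinder and costs a second one.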
So another nice result of writing FAT first, instead of a "better" filesystem, is that I catch things like API scalability issues (the whole POSIX-style readdir()+stat() nonsense).
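To put a number on the readdir()+stat() cost, here is a small counting model of the uncached path. It is a hypothetical model, not driver code: it assumes every entry read costs exactly one FAT read plus one block read, and that stat() scans linearly from the first entry until it finds the name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model of the uncached path: reading any single directory
 * entry means one FAT read followed by one block read. */
struct io_counter {
    unsigned long fat_reads;
    unsigned long block_reads;
};

static void read_entry(struct io_counter *io)
{
    io->fat_reads++;     /* consult the FAT to find the block holding the entry */
    io->block_reads++;   /* read that whole block to extract a single entry */
}

/* stat(name): scan from the first entry until the match at index `target`. */
static void stat_lookup(struct io_counter *io, size_t target)
{
    for (size_t i = 0; i <= target; i++)
        read_entry(io);
}

/* "ls -l" style listing: readdir() returns entry i, then stat() looks it up again. */
void list_with_stat(struct io_counter *io, size_t n_entries)
{
    assert(io != NULL);
    for (size_t i = 0; i < n_entries; i++) {
        read_entry(io);       /* readdir() */
        stat_lookup(io, i);   /* stat() on the name readdir() just returned */
    }
}
```

Listing an n-entry directory this way costs n + n(n+1)/2 entry reads; for 100 entries that is 5150 FAT reads and 5150 block reads, against 100 of each if readdir() alone had supplied the attributes.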
So, mm... what's missing is a proper block cache. Maybe a dentry cache as well, although I think the readdir()+stat() nonsense would be better solved by having readdir() return the stat() information too (the kernel has that info available in readdir() anyway). I also have to figure out how to get rid of the duplicate code for reading the root directory (making it use the normal directory-reading code instead), but because a directory's size can't be known without finding all of its blocks, that's surprisingly tricky to do in a sane way.
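The readdir()-returns-stat idea might look something like this. The `readdir_stat` call and `struct dirent_stat` are hypothetical (not POSIX), and the sketch assumes the directory's raw entries are already in memory; the decoding follows the standard 32-byte FAT directory entry layout (8.3 name at offset 0, attribute byte at 11, first cluster at 26, file size at 28). Since the entry already holds the size, start cluster and directory bit, handing them back alongside the name means a listing never has to search the directory again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIRENT_SIZE 32
#define ATTR_VOLUME 0x08   /* also set on long-file-name entries (attr 0x0F) */
#define ATTR_DIR    0x10

/* Hypothetical combined readdir()+stat() result. */
struct dirent_stat {
    char     name[13];      /* "NAME.EXT" + NUL */
    uint32_t size;
    uint16_t first_cluster;
    int      is_dir;
};

/* Turn the space-padded 11-byte 8.3 name into "NAME.EXT". */
static void format_83(const uint8_t *raw, char *out)
{
    int end = 8, n;
    while (end > 0 && raw[end - 1] == ' ') end--;
    memcpy(out, raw, (size_t)end);
    n = end;
    end = 11;
    while (end > 8 && raw[end - 1] == ' ') end--;
    if (end > 8) {
        out[n++] = '.';
        memcpy(out + n, raw + 8, (size_t)(end - 8));
        n += end - 8;
    }
    out[n] = '\0';
}

/* Fill *out with the next live entry at or after *pos; return 0 at end of directory. */
int readdir_stat(const uint8_t *entries, size_t count, size_t *pos,
                 struct dirent_stat *out)
{
    assert(pos != NULL && out != NULL);
    while (*pos < count) {
        const uint8_t *e = entries + *pos * DIRENT_SIZE;
        (*pos)++;
        if (e[0] == 0x00) { *pos = count; return 0; }  /* no entries after this */
        if (e[0] == 0xE5) continue;                     /* deleted entry */
        if (e[11] & ATTR_VOLUME) continue;              /* volume label or LFN piece */
        format_83(e, out->name);
        out->is_dir = (e[11] & ATTR_DIR) != 0;
        out->first_cluster = (uint16_t)(e[26] | (e[27] << 8));
        out->size = (uint32_t)e[28] | ((uint32_t)e[29] << 8)
                  | ((uint32_t)e[30] << 16) | ((uint32_t)e[31] << 24);
        return 1;
    }
    return 0;
}
```

A caller just loops `while (readdir_stat(buf, count, &pos, &st))` and gets size and type along with each name, with no per-name lookup afterwards.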
