FAT VBR

Posted: Mon Nov 16, 2015 10:47 am
by BASICFreak
Hello again,

I have been working on a FAT VBR, specifically an all-in-one (AIO) VBR for FAT12/16/32.

Every size optimization I could find, while keeping everything as a "module", got me down to 50 bytes overweight. At least that is down from 104 bytes, lol.

I have worked on this for about 5 full hours so far. My next step will be to remove the "modular" functionality from the VBR code, though I still do not see this fitting in 420 bytes...

Before I get too much farther into this, I just want to ask if this is even possible (has anyone here done so)?

I know I can do a FAT12/16 VBR and a separate FAT32 VBR, but then I have to keep track of multiple FAT VBRs, and I would much rather have one VBR for all FATs.






Best regards,


B!

EDIT: without worrying about a modular FS I have got it down to 36 bytes over...

Here is the section I would like some help optimizing (for size, not speed)...

Code: Select all

.NextCluster:
	mov ebx, DWORD [FAT_Start]
	mov edx, DWORD [CurrentCluster]
	mov edi, DWORD 0x1000
	mov cx, 2
	movzx eax, WORD [BPB_TOTALSECTS16]
	test ax, ax
	jz .FAT32
	movzx edx, BYTE [BPB_SECTSPERCLUST]
	div dx
	cmp ax, 0x0FF6
	ja .FAT16
	.FAT12:
		mov ax, dx
		shr ax, 1
		add ax, dx
		mov dx, ax
		shr dx, 8
		add ebx, edx
		call ReadSectors
		movzx bx, al
		mov ax, WORD [di + bx]
		test bl, 1
		jnz .FAT12ODD
		and ax, 0x0FFF
		cmp ax, 0x0FF0
		jae .EOF
		jmp .continue
		.FAT12ODD:
			shr ax, 4
			cmp ax, 0x0FF0
			jae .EOF
			jmp .continue
	.FAT16:
		movzx ax, dl
		shr dx, 8
		add ebx, edx
		call ReadSectors
		shl ax, 1
		xchg bx, ax
		mov ax, WORD [di + bx]
		cmp ax, 0xFFF0
		jae .EOF
		jmp .continue
	.FAT32:
		mov al, dl
		and ax, 0xF
		shr edx, 4
		add ebx, edx
		call ReadSectors
		shl ax, 2
		xchg bx, ax
		mov eax, DWORD [di + bx]
		cmp eax, 0x0FFFFFF0
		jb .continue
	.EOF:
		xor eax, eax
	.continue:
		mov DWORD [CurrentCluster], eax
	popa
	ret
I know there has to be a more efficient way to find the next cluster...

Re: FAT VBR

Posted: Mon Nov 16, 2015 8:08 pm
by CelestialMechanic
First suggestion: ditch the 32-bit instructions. You are in real mode, where all segments are 16-bit and every 32-bit operand requires an operand-size prefix byte (66h). Every byte counts!

Second suggestion: Set all segments to 0000H early, even CS. Don't assume that the BIOS will jump to 0000:7C00H. El Torito will jump to 07C0:0000. With all the segments set to zero, set BP to 7C00H. That way you can access the BPB with instructions like MOV AX, [BP+1AH] which will use just three bytes, instead of instructions like MOV AX, [7C1AH] which uses four. When I revised my boot sector to use this I saved 23 bytes. Every byte counts!

Third suggestion: If you use my BP trick above, you can also use the 128 bytes below the boot sector for local data, and again instructions like MOV AX,[BP-20H] will only take three bytes. Every byte counts!
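A minimal sketch of the second and third suggestions, assuming NASM syntax and a FAT32-sized BPB placeholder; the label names and the 0x1A and -0x20 offsets are only illustrative. The byte counts in the comments are the encodings I would expect in 16-bit mode, and a listing file (`nasm -l`) is the way to confirm them. The stack pointer starts below the 128-byte local area so that pushes do not overwrite the locals.

```nasm
bits 16
org 0x7C00

    jmp short start               ; EB xx
    nop                           ; 90
    times 0x5A - ($ - $$) db 0    ; placeholder BPB (a FAT32 BPB ends at offset 0x5A)

start:
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax                    ; the CPU holds off interrupts for the next instruction
    mov sp, 0x7C00 - 0x80         ; stack grows down from below the local-data area
    jmp 0x0000:.cs_zero           ; far jump forces CS = 0000h
.cs_zero:
    mov bp, 0x7C00                ; BP-relative accesses default to SS, which is 0 here

    mov bx, [bp + 0x1A]           ; 8B 5E 1A      3 bytes, BPB field at offset 1Ah
    mov bx, [0x7C1A]              ; 8B 1E 1A 7C   4 bytes, same field
    mov [bp - 0x20], bx           ; 89 5E E0      3 bytes, local at 0x7BE0
    mov [0x7BE0], bx              ; 89 1E E0 7B   4 bytes, same local
```

One caveat: with AX as the register, the absolute form also has a 3-byte encoding (A1 1A 7C), so the saving shows up with the other registers and with instructions that combine a memory operand with an immediate, such as `cmp word [bp + 0x16], 0`.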

A meditation: A VBR must accomplish one thing and one thing only: it must read at least one sector into memory for execution. If it fails, it should tell the user so, which means that it must be capable of printing strings. Also, in the event of failure, it should offer the user the ability to press a key and try again. The only BIOS functions that are relevant are INTs 10H, 13H, 16H, and 19H. Unfortunately there is not room enough to decipher a file system. Two sectors will give this capability and somehow the sector numbers must be written to the VBR so that they may be found and loaded. Once they are in place then it is possible to load a file given its path and filename.

I did make one exception to my meditation above: I use a simple test to determine that the CPU is an 80286 or better so that I can use instructions like PUSHA, POPA, PUSH immediate, and shift by an immediate count. PUSHA and POPA are especially useful. Since I keep SS=0 throughout my time in real mode, PUSH SS has the effect of pushing zero onto the stack and it only takes one byte. You know the drill by now: every byte counts!
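The post does not say which CPU test is used. One widely known candidate, shown here purely as an assumption, relies on `PUSH SP` behaving differently before and after the 80286. The second routine illustrates the `PUSH SS` trick next to the immediate form; both label names are hypothetical.

```nasm
bits 16

; out: ZF set on an 80286 or later, ZF clear on an 8086/8088
cpu_is_286_plus:
    push sp               ; 8086/8088 push the already-decremented SP
    pop ax                ; 80286 and later push the value SP had before the push
    cmp ax, sp            ; equal only on the newer CPUs
    ret

; Two ways to get a zero word onto the stack while SS = 0
zero_on_stack:
    push ss               ; 16      1 byte
    pop ax
    push word 0           ; 6A 00   2 bytes, and not available on the 8086
    pop ax
    ret
```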

Re: FAT VBR

Posted: Mon Nov 16, 2015 9:55 pm
by BASICFreak
CelestialMechanic wrote:First suggestion: ditch the 32-bit instructions. You are in real mode, where all segments are 16-bit and every 32-bit operand requires an operand-size prefix byte (66h). Every byte counts!
Not as possible as one may think: FAT32 entries are 28 bits wide, and LBA addresses are 28 or 48 bits wide. I did limit the destination from EDI to DI and forced a 0 into the high part of the DAP's destination buffer, which saved me 4 bytes.
CelestialMechanic wrote:...set BP to 7C00H. That way you can access the BPB with instructions like MOV AX, [BP+1AH] which will use just three bytes, instead of instructions like MOV AX, [7C1AH] which uses four. When I revised my boot sector to use this I saved 23 bytes. Every byte counts!
This alone saved me 10 bytes, thanks for that pointer.
CelestialMechanic wrote:Third suggestion: If you use my BP trick above, you can also use the 128 bytes below the boot sector for local data, and again instructions like MOV AX,[BP-20H] will only take three bytes. Every byte counts!
Saved another 10 bytes with this.

Also moved my DAP out of the binary (which saved 12 bytes), so now I have 11 Bytes to spare thus far.
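The post does not show how the DAP was moved out of the binary. One common approach, sketched here as an assumption, is to build the 16-byte packet on the stack just before the INT 13h AH=42h call. The routine name and register assignments are hypothetical, and the sketch assumes DS = SS and a BIOS that supports the INT 13h extensions.

```nasm
bits 16

; in:  EBX = starting LBA, ES:DI = destination buffer, CX = sector count,
;      DL = BIOS drive number, DS = SS
; out: CF set if the BIOS reported an error; 16-bit registers preserved
read_sectors_lba:
    pusha
    push dword 0          ; LBA bits 32..63
    push ebx              ; LBA bits 0..31
    push es               ; buffer segment
    push di               ; buffer offset
    push cx               ; number of sectors to read
    push word 0x0010      ; packet size = 16, reserved byte = 0
    mov si, sp            ; DS:SI -> packet
    mov ah, 0x42          ; extended read
    int 0x13
    popa                  ; discard the 16-byte packet; POPA leaves the flags alone
    popa                  ; restore the caller's registers
    ret
```

The packet is exactly the size of a `PUSHA` frame, so the first `POPA` removes it without disturbing the carry flag that reports the BIOS result.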

Thanks for the tips.

Now, I should probably test this before making any more modifications... Then, if I can, I'll squeeze in an ERROR message (or make it chain load, or reboot...)

Re: FAT VBR

Posted: Tue Nov 17, 2015 5:51 am
by Combuster

Code: Select all

 jnz .FAT12ODD
      and ax, 0x0FFF
      cmp ax, 0x0FF0
      jae .EOF
      jmp .continue
      .FAT12ODD:
         shr ax, 4
         cmp ax, 0x0FF0
         jae .EOF
         jmp .continue
There's a lot of duplication when the only thing you need to skip is the SHR.
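A sketch of what the merged tail could look like, assuming the raw word is in AX and the cluster number in DX (both register choices and the label name are assumptions). It tests the cluster number itself, since the byte offset n + n/2 can be odd for an even n (n = 2 gives offset 3). The 0x0FF0 threshold is taken from the code under discussion.

```nasm
bits 16

; in:  AX = word read from the FAT at byte offset n + n/2
;      DX = cluster number n
; out: AX = 12-bit FAT entry, flags set by the compare against 0x0FF0
fat12_entry:
    test dl, 1            ; odd cluster number?
    jz .even
    shr ax, 4             ; odd: the entry is the upper 12 bits
.even:
    and ax, 0x0FFF        ; even: keep the lower 12 bits (harmless after the shift)
    cmp ax, 0x0FF0        ; caller follows with JAE for end of chain
    ret
```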


I also see a number of places where the address generation unit could save a few bytes here and there. After all, you can do two additions and a multiplication within the [brackets]. Together with LEA you can write many multiply-accumulate combinations as one-liners. Compare:

Code: Select all

    shl ax, 1
      xchg bx, ax
      mov ax, WORD [di + bx]

Code: Select all

mov ax, [edi + 2 * eax]
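For reference, here are the encodings I would expect NASM to produce for the two variants in 16-bit mode; checking them against a listing file (`nasm -l`) is worthwhile before relying on the counts.

```nasm
bits 16

; Variant A: 16-bit addressing, 5 bytes in total
    shl ax, 1                 ; D1 E0
    xchg bx, ax               ; 93
    mov ax, [di + bx]         ; 8B 01

; Variant B: 32-bit addressing with a scaled index, 4 bytes in total
    mov ax, [edi + 2*eax]     ; 67 8B 04 47
```

Variant B only behaves in real mode if the upper halves of EDI and EAX are zero, because the effective address still has to fall inside the 64 KiB segment limit. Any instructions needed to guarantee that count against the saving.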

Re: FAT VBR

Posted: Tue Nov 17, 2015 1:03 pm
by BASICFreak
Combuster wrote:

Code: Select all

 jnz .FAT12ODD
      and ax, 0x0FFF
      cmp ax, 0x0FF0
      jae .EOF
      jmp .continue
      .FAT12ODD:
         shr ax, 4
         cmp ax, 0x0FF0
         jae .EOF
         jmp .continue
There's a lot of duplication when the only thing you need to skip is the SHR
I'm not sure why I thought the AND would break it... lol, well that saves some space.

I also see a number of places where the address generation unit could save a few bytes here and there. After all, you can do two additions and a multiplication within the [brackets]. Together with LEA you can write many multiply-accumulate combinations as one-liners. Compare:

Code: Select all

    shl ax, 1
      xchg bx, ax
      mov ax, WORD [di + bx]

Code: Select all

mov ax, [edi + 2 * eax]
I'll attempt this after a while, too much to do today.


The good news is that I know FAT16 works; I haven't tested the FAT32 part (or the FAT12 part) yet.

Had to change how I decided FAT size: FAT16 can have its total sectors in one of two places...
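A sketch of the "two places" lookup, assuming BP = 0x7C00 with SS = 0 and using the field names and offsets from the Microsoft FAT specification (BPB_TotSec16 at 13h, BPB_FATSz16 at 16h, BPB_TotSec32 at 20h). The routine name is hypothetical, and this is not necessarily how the code in this thread ended up doing it.

```nasm
bits 16

; in:  BP = 0x7C00, SS = 0
; out: EAX = total sectors on the volume
;      ZF set if BPB_FATSz16 is zero, which the specification reserves for FAT32
total_sectors:
    movzx eax, word [bp + 0x13]   ; BPB_TotSec16, zero when the count needs 32 bits
    test ax, ax
    jnz .have_total
    mov eax, [bp + 0x20]          ; BPB_TotSec32
.have_total:
    cmp word [bp + 0x16], 0       ; BPB_FATSz16
    ret
```

Telling FAT12 from FAT16 still needs the cluster count, which is not computed here.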

2 bytes free. With all this information I'm fairly sure I can get it down by several more.




Thanks for the help,


B!

Re: FAT VBR

Posted: Tue Nov 17, 2015 1:31 pm
by Combuster
BASICFreak wrote:I'm not sure why I thought the AND would break it... lol, well that saves some space.
Even if it did matter, you would:

Code: Select all

    JNZ .ifblock
.elseblock: 
    AND AX, 0x0FFF
    JMP .restblock 
.ifblock:
    SHR AX, 4
.restblock:
    CMP AX, 0x0FF0
    JAE (...)
    JMP (...)
which is still two instructions fewer :wink:

Re: FAT VBR

Posted: Tue Nov 17, 2015 2:47 pm
by BASICFreak
Combuster wrote:
BASICFreak wrote:I'm not sure why I thought the AND would break it... lol, well that saves some space.
Even if it did matter, you would:

Code: Select all

    JNZ .ifblock
.elseblock: 
    AND AX, 0x0FFF
    JMP .restblock 
.ifblock:
    SHR AX, 4
.restblock:
    CMP AX, 0x0FF0
    JAE (...)
    JMP (...)
which is still two instructions fewer :wink:
I added the end-of-chain compare late, as one of the last things, because I forgot that the chain does not end with 0... But yes, you are right (as usual). Looks like I may have free time, for a little while, so I'll take what I've learned and attempt to apply more optimization.


UPDATE:
Combuster wrote:I also see a number of places where the address generation unit could save a few bytes here and there. After all, you can do two additions and a multiplication within the [brackets]. Together with LEA you can write many multiply-accumulate combinations as one-liners.
Because your suggestion requires 32-bit registers, it actually ends up 1 byte bigger.
I have to clear the upper halves of edi [+3 bytes] and eax [+1 byte]; on its own it does save 3 bytes...
If I had one more place to do this it would break even.

But I will keep this in mind for future code.