NASM Addressing and Raw Binaries

human00731582
Member
Member
Posts: 38
Joined: Wed Jul 19, 2017 9:46 pm

NASM Addressing and Raw Binaries

Post by human00731582 »

Hey everyone! Since this is my first post, I'd like to start by saying I love this community. I've been lurking for a few years on and off, and I decided to take the development plunge completely this time, since I have so much free time on my hands lately. This is the first time I've been able to get past the GDT and almost clear the IDT hurdle. I feel very enthusiastic and excited about my progress.

I'm using QEMU and developing strictly 32-bit (personal preference), using NASM as my assembler. Now I believe the way I put this disk image together is quite different from many suggestions here, and it may be part of my issue. I plan to do this all in assembly, no C whatsoever. Again, personal preference, even if it is an order of magnitude harder to do. :D

About my issue...
When loading up my IDT, I'm having issues with the part of each entry that holds the OFFSETs. I believe this is because what I assemble is not truly one large ASM file, so the label addressing in NASM is acting screwy and my IDT doesn't point to the proper places. It seems to be mostly configured properly because it crashes when I press a key, so at least the interrupts are firing!

Code: Select all

%define SECTION_BASE 0x1000
PIC1			equ 0x20	; IO base address for master PIC.
PIC2			equ 0xA0	; IO base address for slave PIC.
PIC1_COMMAND	equ PIC1
PIC1_DATA		equ PIC1+1
PIC2_COMMAND	equ PIC2
PIC2_DATA		equ PIC2+1
PIC_EOI			equ 0x20	; End-of-interrupt command code.


%macro ISR_OFFSETS 1
	ISRLOW%1 equ (SECTION_BASE + isr%1 - $$) & 0xFFFF		; lower 16 bits of offset
	ISRHIGH%1 equ (SECTION_BASE + isr%1 - $$) >> 16		; upper 16 bits of offset
%endmacro

%macro ISR_NOERRORCODE 1
	isr%1:
		cli
		push byte 0
		push byte %1
		jmp isr_common
%endmacro

%macro IDTENTRY 1
 .entry%1:
	dw ISRLOW%1			; Offset 0-15
	dw CODE_SELECTOR	; Selector (from GDT)
	db 0				; reserved
	db 10001110b		; Present, Ring 0, (0) Storage, 32-bit int gate
	dw ISRHIGH%1		; Offset 16-31
%endmacro

IDT:
 IDTENTRY 0
 ...
 IDTENTRY 33    ;keyboard
 ...
 IDTENTRY 47
IDT_Desc:
	dw $ - IDT - 1		; IDT size
	dd IDT			; IDT Offset/Base
	

; Set up ISRs
ISR_NOERRORCODE 0
...
ISR_NOERRORCODE 33		; keyboard
...
ISR_NOERRORCODE 47

; Define ISR location data for IDT Entries
ISR_OFFSETS 0
...
ISR_OFFSETS 31				; end of built-in software interrupts
ISR_OFFSETS 32				; PIC HARDWARE INTERRUPTS START HERE (0x20)
ISR_OFFSETS 33				; PIC keyboard IRQ (remapped)
...
ISR_OFFSETS 47


PICmaster_Mask		dw 0
PICslave_Mask		dw 0


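PIC_sendEOI:	; send end-of-interrupt command to PIC(s)
	; ARGS -> 1: irq # (0-15), passed on the stack
	mov ebx, [esp + 4]		; irq number pushed by the caller
	mov al, PIC_EOI			; the command is a single byte
	cmp ebx, 8
	jl PIC_sendEOI.skipSlave
	out PIC2_COMMAND, al	; IRQs 8-15 also need an EOI on the slave
 .skipSlave:
	out PIC1_COMMAND, al	; byte write; a word write would also hit PIC1_DATA
	ret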
	

PIC_remap:		; bh = offsetMaster, bl = offsetSlave
	; Save masks
	in al, PIC1_DATA
	mov byte [PICmaster_Mask], al
	in al, PIC2_DATA
	mov byte [PICslave_Mask], al
	
	; Initialization command 0x11
	mov al, 0x11
	out PIC1_COMMAND, al
	out PIC2_COMMAND, al
	
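	; Update vector offsets (ICW2): bh -> master, bl -> slave
	mov al, bh
	out PIC1_DATA, al
	mov al, bl
	out PIC2_DATA, al
	
	; Cascading (ICW3): the slave is wired to the master's IRQ2
	mov al, 4				; master: bit mask of the slave's input line
	out PIC1_DATA, al
	mov al, 2				; slave: its cascade identity
	out PIC2_DATA, al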
	
	; Additional environment information. 
	mov al, 1
	out PIC1_DATA, al
	out PIC2_DATA, al
	
	; Restore masks
	mov al, [PICmaster_Mask]
	out PIC1_DATA, al
	mov al, [PICslave_Mask]
	out PIC2_DATA, al
	
	ret


isr_common:
        ; Code doesn't even get here, so if this is erroneous, please don't mind (but tips are welcome)
	pushad					; save state (EDI, ESI, EBP, ESP, EBX, EDX, ECX, EAX) -> 32 bytes
	
	mov ax, ds				; ax = current data segment selector
	push eax				; saved onto the stack (4 bytes)
	
	mov ax, DATA_SELECTOR	; activate the ring 0 (kernel) data selector
	mov ds, ax				; this handles calls with highest permission
	mov es, ax
	mov fs, ax
	mov gs, ax

	call isr_handler		
	pop eax					; restore original selector to all data segments
	mov ds, ax                                  ; useful for userspace implementations WAY later
	mov es, ax
	mov fs, ax
	mov gs, ax
	
	popad				; restore state
	add esp, 8				; clean up extra stack variables from IRQ routines (pushed error codes and ISR numbers)
	sti					; set interrupts
	iret
	
	
isr_handler:
        ; Code doesn't get here either, so don't mind.
	mov dword [esi], 0x00515249		; "IRQ" <-- I did this instead of a string ptr because those don't work either
	mov dx, 0x0707
	mov bl, 0x0F
	call _screenWrite
	mov eax, [esp + 40]		; reach back into the stack and pull out the IRQ# pushed earlier
	;call _screenPrintDecimal
	ret

Also, here is my "kernel" that I do the actual calls from. The bootloader, which I can share if you need it, loads it into memory at 0x1000. The bootloader also contains the paging setup (page structures at 0x70000 to 0x73000, mapping 0x0 to 0x100000 right before calling the kernel).

Code: Select all

;NO org statement, loaded by bootloader to 0x1000
GLOBAL kernel_main
[BITS 32]
...
;includes and blah blah
...
; This is all functional. But actually pressing a key will cause a crash and instant reboot.
kernel_main:
	cld
	lidt [IDT_Desc]
	
	call _screenCLS

	pushad
	mov word [cursorOffset], 0x0A01
	mov dx, [cursorOffset]
	mov dword [esi], 0x00636465 ;"dce"
	mov bl, 0x0F  ; attrib
	call _screenWrite
	mov dword [esi], 0x00636465 ; just testing again for the cursor position updates
	mov bl, 0x4E
	call _screenWrite
	
	mov bl, 0x0A
	mov esi, szTestHello   ; <-- This isn't the problem I'm asking about per se, but it could be related
	call _screenWrite       ; Does NOT work, no matter how I try to move the pointer around.
	
	mov word [cursorOffset], dx
	popad

	
	mov bh, 0x20
	mov bl, 0x28
	call PIC_remap
	
	; Unmask only the keyboard for now (bits !NOT! flagged are enabled IRQs)
	mov al, 0xFD		; mask = 1111 1101 // PIC1 IRQ #1 (0 being the clock), keyboard enabled.
	out PIC1_DATA, al
	mov al, 0xFF
	out PIC2_DATA, al
	sti

        hlt

Over in my 'kernel' (as you probably read), I was trying to point the ESI register at a string that was defined in KERNEL.asm, and it would either crash or, when it did get past the _screenWrite function, spit out garbage. It seems that any labels defined outside of the bootloader are causing errors, and this is the one constant obstacle I've faced.

And before I forget, here's my very simple build script... Yeah, I'm doing this on Windows (pls no bully :oops: )

Code: Select all

nasm "BOOT.asm" -f bin -o "..\bin\boot.bin"
nasm "KERNEL.asm" -f bin -o "..\bin\kernel.bin"
dd if="..\bin\boot.bin" of="..\bin\image.img" bs=512
dd if="..\bin\kernel.bin" of="..\bin\image.img" bs=512 seek=1
"%PROGRAMFILES%\qemu\qemu-system-i386" -drive format=raw,index=0,file="..\bin\image.img"
Am I missing an important lesson on addressing/memory here? If you need any more info, please tell me.
Also, please go easy on me. :mrgreen:
Thanks in advance!
2024-05-07: Returning from a 7-year disappearing act; please be kind.
~
Member
Member
Posts: 1228
Joined: Tue Mar 06, 2007 11:17 am
Libera.chat IRC: ArcheFire

Re: NASM Addressing and Raw Binaries

Post by ~ »

Looking at your code, it is easy to see that it's not simple enough yet.

It has taken me at least 12 years just to make a very simple console kernel. It can load programs from floppy, can use the PS/2 keyboard, can use the timer, can print strings, can execute console commands, can pass a command line to programs, and can check if a program was fully loaded without errors by checking header and footer signatures, but it has no paging or multitasking yet...

It has very clean code, which I have needed in order not to get lost in the assembly code.

It will very probably save you around 10 years of random effort by showing, in an easy and working way, how to do things like that.

You can see several examples on how to print strings in its code:
BOOTCFG__v2017-06-16.zip

See how to use it, you just need DOS to launch it:
http://f.osdev.org/viewtopic.php?t=32121
http://devel.archefire.org/forum/viewto ... 4263&hl=en

_______________________________________________________________

For things like this, it is key that you focus mainly on building the structure of everything you do in your own mind, from nothing, from your own creativity.

Then, once you have created the structure of your own projects in your own mind, you will be able to make effective and strong use of external information like this and adjust it to your own structure.

But you need to keep working out your own tricks entirely by yourself so that you cover all of their logic and are able to extend it. It's part of your mind, so it's logical that it's best to do things this way.

Re: NASM Addressing and Raw Binaries

Post by human00731582 »

~ wrote:Looking at your code, it is easy to see that it's not simple enough yet.
...
It has very clean code, which I have needed in order not to get lost in the assembly code.

It will very probably save you around 10 years of random effort by showing, in an easy and working way, how to do things like that.
Thank you for this, I completely agree.
Many of my past projects were built in waves of ideas that were just rushing out of my head. Because everything came out in one stream, it was difficult to comment the code adequately while I was so full of ideas. Then, when coming back later, I had to relearn what I had made previously, and it was just tiresome. I'm sure there's a word out there for it somewhere. :D

I will consider your advice and slow it down, keep it tidy, and hope to not have to sanity-check my work 80 times every time I need to edit a source document. Thanks again for the response!
MichaelFarthing
Member
Member
Posts: 167
Joined: Thu Mar 10, 2016 7:35 am
Location: Lancaster, England, Disunited Kingdom

Re: NASM Addressing and Raw Binaries

Post by MichaelFarthing »

Code: Select all

mov dword [esi], 0x00636465 ;"dce"
mov bl, 0x0F  ; attrib
call _screenWrite

I think what you are trying to do here is make esi point to a zero-terminated string "dce" (it's actually "edc", but that's a trivial slip).
However, the code gives no clue what is in esi when this instruction is reached. What have you put in it?
I'm a bit afraid that the answer might be that you haven't initialised it at all - and if that is the case the instruction has written your zero terminated string to whatever part of memory esi happened to be pointing to at the time.

I'm also wondering if you properly understand the difference between "mov DWORD [esi], 0x00636465" and "mov esi, 0x00636465" in more detail than just being able to say [esi] means a pointer and esi doesn't. Can you explain exactly what the processor does in these two cases and how memory is affected?

Forgive me if you do understand properly, but the code you've posted really leaves me wondering. If you cannot answer the question, "What is in esi before the instruction executes?" you really do have understanding problems.
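
To make the distinction concrete, here is a minimal sketch of the two forms side by side. The scratch buffer and the org 0x1000 load address are assumptions for illustration only; they are not taken from the poster's bootloader.

```nasm
bits 32
org 0x1000                          ; assumed load address

demo:
    mov esi, scratch                ; ESI = address of scratch; no memory is written
    mov dword [esi], 0x00636465     ; writes bytes 65 64 63 00 ("edc", 0) at that address
    mov esi, 0x00636465             ; ESI = 0x00636465; no memory is written
    ret

scratch:
    dd 0                            ; hypothetical 4-byte buffer
```

The bracketed form is only safe when ESI already holds the address of memory you own, which is why the value of ESI before the instruction matters.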

Re: NASM Addressing and Raw Binaries

Post by human00731582 »

MichaelFarthing wrote:

Code: Select all

mov dword [esi], 0x00636465 ;"dce"
mov bl, 0x0F  ; attrib
call _screenWrite

...However, the code gives no clue what is in esi when this instruction is reached. What have you put in it?
In my bootloader, I've arbitrarily pointed the ESI register to address 0x60000. This is one of those concepts-in-the-works sort of things, where it's subject to change later of course (just like the rest of the project :D ). At least for now, I am mapping the stack to 0x90000, ESI to 0x60000, pages to 0x70000-0x73000, etc etc.
MichaelFarthing wrote: I'm also wondering if you properly understand the difference between "mov DWORD [esi], 0x00636465" and "mov esi, 0x00636465" in more detail than just being able to say [esi] means a pointer and esi doesn't. Can you explain exactly what the processor does in these two cases and how memory is affected?

I appreciate your response, I truly do. But I would rather focus on the core issue I am having, as errors printing strings are a trivial-ish issue with lower priority than finishing my IDT. That said, the IDT offset values and these string addresses share a related problem: their addresses are not being resolved properly at runtime.

The label szStringToPrint denotes the start of the bytes I am defining. When I mov it to ESI, it is making ESI point to the address labeled as szStringToPrint, so that when I lodsb (w/d/q), it is retrieving values pointed to by those locations and incrementing accordingly, until I break the loop by catching the null-terminator.

The issue is not how I print, or how I work with ESI, but rather with defining labels outside of the bootloader. The same thing happened when creating my GDT. Instead of doing the work and lgdt after the kernel jump (which was a bad idea anyways), I had to put the GDT in the bootloader for the labels to even work properly.

Needless to say that I am very bad at articulating my knowledge anyways, and I would make a horrible teacher. So if I sound ditzy about something, feel free to offer a correction if you wish to, but understand that I am very enthusiastic about learning everything I can, not to mention the process of using that information -- and any information the more experienced of you can offer is 100% valuable to me! :)

Thanks again!

Re: NASM Addressing and Raw Binaries

Post by ~ »

You could probably simply fix it by using ORG in your main kernel file.

For example, if you are loading your kernel to 0x100000 from your boot code, you could use:

Code: Select all

org 0x100000

That will make all memory references point properly where they should, instead of pointing to base address 0.

In 64-bit mode the code is supposed to be much more position-independent, but it's always good practice to specify the base address of the binary image.

It will probably always be needed wherever data labels are referenced by absolute address.
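
As a sketch of what ORG changes in a flat binary, the fragment below assumes the image is loaded at 0x1000, as described in the first post. The string and the placeholder IDT are made up for illustration.

```nasm
; Assemble with: nasm kernel.asm -f bin -o kernel.bin
bits 32
org 0x1000                  ; must equal the address the bootloader loads this image to

kernel_main:
    cli                     ; keep interrupts off while the IDT is only a placeholder
    mov esi, szTestHello    ; encoded as 0x1000 + (szTestHello - $$), not the bare file offset
    lidt [IDT_Desc]         ; the memory operand is adjusted by the same base
.hang:
    hlt
    jmp .hang

szTestHello:
    db "Hello", 0

IDT:
    times 48 dq 0           ; placeholder: 48 empty (not-present) gates
IDT_Desc:
    dw IDT_Desc - IDT - 1   ; limit = size in bytes minus one
    dd IDT                  ; linear base, also org-adjusted
```

Without the org line, NASM's bin output assumes a base of 0, so every one of these addresses would come out 0x1000 too low.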

Re: NASM Addressing and Raw Binaries

Post by human00731582 »

~ wrote: You could probably simply fix it by using ORG in your main kernel file.
...
That will make all memory references point properly where they should, instead of pointing to base address 0.
Wow, that fixed it!!! =D> Such a simple solution, yet it's one I didn't think of even trying... A bit embarrassing, to be frank.

So for anyone browsing this thread in the future who's also using NASM: if your references to labels are not working and you are assembling an ASM-only kernel with a custom bootloader, put an ORG statement, matching the address the file is loaded at, in each file that is assembled on its own (i.e. not just pulled in with "%include"). I'm probably pointing out the absolute obvious, but hey, if it happened to me it could happen to anyone else. :)

Thank you so so so much, Mr. ~! I can finally move on with working out the specifics of my IDT! You rock! [-o<
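
For the IDT specifically, here is a rough sketch of how one gate's offset halves can be emitted once the load address is known. NASM generally refuses to apply & or >> to a bare label, so the label is first turned into a plain number by subtracting $$ and adding the base back. LOAD_BASE, the ISR_ADDR macro, the stub handler and the 0x08 code selector are assumptions for illustration.

```nasm
bits 32
LOAD_BASE equ 0x1000                ; assumed load address, same value as the org below
org LOAD_BASE

; (label - $$) is a plain number, so bitwise operators are allowed on the result
%define ISR_ADDR(label) (LOAD_BASE + (label) - $$)

isr33:
    iretd                           ; stub handler, for illustration only

gate33:                             ; one 8-byte gate, as it would sit at index 33 of the IDT
    dw ISR_ADDR(isr33) & 0xFFFF     ; offset bits 0-15
    dw 0x08                         ; code selector (assumed GDT layout)
    db 0                            ; reserved
    db 10001110b                    ; present, ring 0, 32-bit interrupt gate
    dw ISR_ADDR(isr33) >> 16        ; offset bits 16-31
```

The base used in the macro and the one given to org have to agree with where the bootloader actually places the image; otherwise the gate and the rest of the code disagree about where the handler lives.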

Re: NASM Addressing and Raw Binaries

Post by human00731582 »

Also, I wanted to add something real quick, for the sake of some more conversation and dialogue with the folks here. Maybe this could be added as a small aside to the "ISR not returning" section of the "ISR not working" page...

I was finishing up the keyboard ISR handler and it was weirdly not letting me iretd from the interrupt routine. I deliberated for hours, checking constantly that my stack was in order, that I was not trashing any registers, and that my functions were passing data between each other correctly. Then it hit me: the interrupt was returning to just past a hlt instruction, with nothing after it but a small data section and empty space. [-X

So, I changed the "hlt" at the end of my kernel to this:

Code: Select all

.repeatISRTest:
	mov ecx, 500
 .repThis:
	mov eax, ecx
	loop kernel_main.repThis
	jmp kernel_main.repeatISRTest
A weird way to hang up the processor, but it worked like a charm. Hope this helps somebody someday! 8)
iansjack
Member
Member
Posts: 4706
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: NASM Addressing and Raw Binaries

Post by iansjack »

Wouldn't it be simpler to do:

Code: Select all

jmp $
LtG
Member
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: NASM Addressing and Raw Binaries

Post by LtG »

iansjack wrote:Wouldn't it be simpler to do:

Code: Select all

jmp $
Lol. Though you'll want to throw HLT in there, which both of you decided to leave out =)

Btw, what's with the moving to eax and double loop?

Presumably there was a reason why the ISR returned to a HLT, instead of looping that HLT you did something completely different?
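
A minimal sketch of the idle loop being suggested, in NASM syntax (where the "current address" symbol is $ rather than a dot). It assumes the IDT and PIC have already been set up so that enabling interrupts is safe; the label names are made up.

```nasm
bits 32

idle:
    sti             ; allow IRQs to wake the CPU
.wait:
    hlt             ; sleep until the next interrupt arrives
    jmp .wait       ; the handler's iret lands here, so halt again
```

After the handler's iret, execution resumes at the instruction following the hlt, which is why the jump back is needed.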

Re: NASM Addressing and Raw Binaries

Post by human00731582 »

LtG wrote: Lol. Though you'll want to throw HLT in there, which both of you decided to leave out =)

Btw, what's with the moving to eax and double loop?

Presumably there was a reason why the ISR returned to a HLT, instead of looping that HLT you did something completely different?
It was a quick-fix sort of solution to keep the processor busy while not receiving INTs. What I realized is that doing that useless loop, I was burning up a lot of CPU resources in the interim between my interrupts. I have since refined the code to a much simpler format that is much more energy-efficient as well (and isn't completely useless :D ).

Code: Select all

; Hang and wait for some ISRs.
	sti
 .repHalt:
	call _parserCheckQueue
	hlt
	jmp kernel_main.repHalt

I have since learned that if you only HLT at the end of your main code, then after an INT the EIP will return to the address directly after the HLT instruction and start trying to execute whatever comes next, which for me was my misc data section. This loop is a much calmer, resource-efficient way to wait for IRQs and INTs to trigger command checks (such as 'did the user hit enter?').

Also, this is relevant, because in every %include file I've used to run a command and check the input buffer, I've created super-variables (or globals, w/e you want to call them) like COMMAND_QUEUE and INPUT_BUFFER and the only way to access those variables without causing a GPF or triple-fault was to add the KERNEL_OFFSET variable, which ~ told me about earlier.

That code is executed when the kernel is done initializing everything, right at the end of the main function. I'm adding a shell command to proceed to userspace from there, and eventually I'll probably just ditch the raw shell on startup altogether for straight userspace.