trying to understand paging

digo_rp · Post by **digo_rp** » Thu May 31, 2007 9:36 am

Guys, I'm trying to understand paging and I have this situation:

In my OS I have a load_program function, and I'm trying to implement paging. For example:

I created a simple flat 32-bit program, compiled with gcc like my kernel, but now I don't know how to set up paging for that new application when it is loaded.
Is this link.ld right for that program?

```
OUTPUT_FORMAT("binary")
ENTRY(start)
SECTIONS
{
    /* The program expects to be loaded (mapped) at virtual address 0. */
    .text 0x000000 : {
        code = .; _code = .; __code = .;
        *(.text)
        *(.rodata*)
        . = ALIGN(4096);
    }
    .data : {
        data = .; _data = .; __data = .;
        *(.data)
        . = ALIGN(4096);
    }
    .bss : {
        bss = .; _bss = .; __bss = .;
        *(COMMON)   /* uninitialized globals emitted as common symbols */
        *(.bss)
        . = ALIGN(4096);
    }
    . = ALIGN(4096);
    end = .; _end = .; __end = .;
}
```

Do I have to put something like ORG 0 in load.asm?

And do I have to set up the page table to map virtual address 0 to the physical address where I load the application?

How do I create a user application, how do I write a link.ld for that program, and how do I set up paging?

Guys, could anyone please help me?

jnc100 · Post by **jnc100** » Thu May 31, 2007 4:39 pm

The line

digo_rp wrote: .text 0x000000

in your linker script is the address at which the test program expects to be loaded. It's okay to load it at 0x0, as long as your kernel is mapped elsewhere (e.g. if you followed the HigherHalf barebones).

I guess you're using a flat binary as your file format because you just want to get it working and avoid having to parse section tables for the time being. That's okay, but you have to bear in mind that there's no way to encode section information in a flat binary. In fact, I don't actually know what providing a linker script like that does for a flat binary... As I recall, it complains about not supporting the .bss section and converts it to a large block of zeros in the output.

In essence, though, if your program expects to be loaded at 0x0, then that's where you're going to have to map it to. In other words, assuming protected mode, you have to:

- Load the program to start on a page boundary.
- Create a page table
- Set the first entry of this page table to point to the physical address of the start of the program.
- If the program is larger than 4096 bytes, then set the next entry to point to start of program + 4096 bytes and so on until you map the whole program.
- Set the new page table you have created as the first entry in the page directory (that entry covers virtual addresses 0 to 4 MiB).
- Jump to the start of the program, presumably 0x0 as it's a flat binary.
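A minimal C sketch of the steps above, assuming 32-bit non-PAE paging, a program smaller than 4 MiB (so a single page table is enough), and a higher-half kernel that can reach the page directory and page table through pointers while also knowing the table's physical address. The function name `map_program_at_zero` and its parameters are hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE    4096u
#define PTE_PRESENT  0x001u  /* bit 0: the entry is valid */
#define PTE_WRITABLE 0x002u  /* bit 1: writes allowed */
#define PTE_USER     0x004u  /* bit 2: accessible from ring 3 */

/* Hypothetical helper: map `size` bytes of a program that was loaded at the
 * page-aligned physical address `phys_start` so that it appears at virtual
 * address 0. `pt` is the kernel's pointer to a 1024-entry page table whose
 * physical address is `pt_phys`; `page_dir` is the kernel's pointer to the
 * page directory. Returns 0 on success, -1 on bad arguments. */
int map_program_at_zero(uint32_t *page_dir, uint32_t *pt, uint32_t pt_phys,
                        uint32_t phys_start, uint32_t size)
{
    if (phys_start % PAGE_SIZE != 0 || pt_phys % PAGE_SIZE != 0)
        return -1;                      /* both must start on a page boundary */

    uint32_t pages = (uint32_t)(((uint64_t)size + PAGE_SIZE - 1) / PAGE_SIZE);
    if (pages > 1024)
        return -1;                      /* one page table only covers 4 MiB */

    for (uint32_t i = 0; i < 1024; i++)
        pt[i] = 0;                      /* unused entries stay not-present */

    /* Entry i maps virtual i * 4096 to physical phys_start + i * 4096. */
    for (uint32_t i = 0; i < pages; i++)
        pt[i] = (phys_start + i * PAGE_SIZE) | PTE_PRESENT | PTE_WRITABLE | PTE_USER;

    /* Page directory entry 0 covers virtual 0x00000000 to 0x003FFFFF. */
    page_dir[0] = pt_phys | PTE_PRESENT | PTE_WRITABLE | PTE_USER;
    return 0;
}
```

Not shown: if page directory entry 0 previously mapped something else, the kernel would need to reload CR3 (or use `invlpg`) so stale TLB entries aren't used, and it still has to switch to ring 3 before jumping to 0x0. Marking every page writable is a simplification; the .text pages could be mapped read-only instead.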

I recommend, however, that you read the ELF page in the wiki. Using a proper file format makes it a lot easier in the long run: you can have the entry point anywhere in the image, you can properly support .bss, and it's actually not that difficult to do, especially if you have paging working.
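To illustrate the .bss point: in ELF, each PT_LOAD program header gives both a file size (p_filesz) and a memory size (p_memsz), and the difference is memory the loader must zero. The sketch below assumes a little-endian host reading an i386 (32-bit, little-endian) executable, and uses a plain buffer `mem` to stand in for the process address space starting at `mem_base`; in a real kernel the pages would be mapped first. The struct layouts follow the ELF specification; `elf_load` is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PT_LOAD 1   /* program header type for a loadable segment */

typedef struct {                 /* ELF32 file header (52 bytes) */
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
} Elf32_Ehdr;

typedef struct {                 /* ELF32 program header (32 bytes) */
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
} Elf32_Phdr;

/* Copy every PT_LOAD segment into `mem` (virtual address mem_base upward)
 * and zero the part of each segment that is not in the file (.bss).
 * Returns 0 and stores the entry point on success, -1 on a malformed image. */
int elf_load(const uint8_t *img, size_t img_len, uint8_t *mem,
             uint32_t mem_base, size_t mem_len, uint32_t *entry)
{
    Elf32_Ehdr eh;
    if (img_len < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);
    if (eh.e_ident[0] != 0x7f || eh.e_ident[1] != 'E' || eh.e_ident[2] != 'L' ||
        eh.e_ident[3] != 'F' || eh.e_ident[4] != 1 /* ELFCLASS32 */ ||
        eh.e_ident[5] != 1 /* ELFDATA2LSB */ || eh.e_phentsize != sizeof(Elf32_Phdr))
        return -1;

    for (uint32_t i = 0; i < eh.e_phnum; i++) {
        Elf32_Phdr ph;
        uint64_t off = (uint64_t)eh.e_phoff + (uint64_t)i * sizeof ph;
        if (off + sizeof ph > img_len)
            return -1;
        memcpy(&ph, img + off, sizeof ph);
        if (ph.p_type != PT_LOAD)
            continue;                                   /* only loadable segments */
        if (ph.p_filesz > ph.p_memsz ||
            (uint64_t)ph.p_offset + ph.p_filesz > img_len ||
            ph.p_vaddr < mem_base ||
            (uint64_t)(ph.p_vaddr - mem_base) + ph.p_memsz > mem_len)
            return -1;
        uint8_t *dst = mem + (ph.p_vaddr - mem_base);
        memcpy(dst, img + ph.p_offset, ph.p_filesz);            /* file part */
        memset(dst + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz); /* .bss part */
    }
    *entry = eh.e_entry;
    return 0;
}
```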

digo_rp wrote:do I have to put at load.asm something like ORG 0?

You haven't told us what load.asm is, so I really can't help you there.

Regards,
John.

Brendan · Post by **Brendan** » Fri Jun 01, 2007 3:20 am

Hi,

jnc100 wrote:I guess you're using a flat binary as your file format because you just want to get it working and avoid having to parse section tables for the time being. That's okay, but you have to bear in mind that there's no way to encode section information in a flat binary.

Encoding information about sections is actually relatively easy for flat binary file formats - all you do is invent your own header, and do something like (in NASM):

```nasm
Header:
    dd ENTRY_POINT
    dd CODE_START
    dd CODE_SIZE
    dd DATA_START
    dd DATA_SIZE
    dd BSS_START
    dd BSS_SIZE
```

Of course the header itself needs to be at a fixed offset in the file (e.g. at the beginning) and the information about sections can be much more complex if you want...
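On the kernel side, a loader for such a header might look roughly like this hedged C sketch. It assumes the header sits at offset 0 of the file, that the file is a memory image starting at CODE_START (so file offset = address - CODE_START), and that the sections come in the order code, data, bss; `struct flat_header` and `flat_load` are hypothetical names, and `mem` stands in for the process memory.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the seven dwords of the NASM header, in the same order. */
struct flat_header {
    uint32_t entry_point;
    uint32_t code_start, code_size;
    uint32_t data_start, data_size;
    uint32_t bss_start, bss_size;
};

/* Hypothetical loader: `mem` represents process memory starting at code_start.
 * Returns 0 and stores the entry point on success, -1 if the header is bad. */
int flat_load(const uint8_t *file, size_t file_len, uint8_t *mem,
              size_t mem_len, uint32_t *entry)
{
    struct flat_header h;
    if (file_len < sizeof h)
        return -1;
    memcpy(&h, file, sizeof h);

    if (h.data_start < h.code_start || h.bss_start < h.data_start)
        return -1;                                   /* wrong order */
    uint64_t data_off = h.data_start - h.code_start;
    uint64_t bss_off  = h.bss_start - h.code_start;
    if (h.code_size > data_off || data_off + h.data_size > bss_off)
        return -1;                                   /* overlapping sections */
    if (data_off + h.data_size > file_len || bss_off + h.bss_size > mem_len)
        return -1;                                   /* truncated file or no room */
    if (h.entry_point < h.code_start ||
        h.entry_point - h.code_start >= h.code_size)
        return -1;                                   /* entry point outside code */

    memcpy(mem, file, (size_t)(data_off + h.data_size)); /* code + data (+ padding) */
    memset(mem + bss_off, 0, h.bss_size);                 /* bss is not in the file */
    *entry = h.entry_point;
    return 0;
}
```

Rejecting headers whose sections overlap or whose entry point lies outside the code is a design choice here; a real loader would also map the pages with suitable rights rather than copy into one buffer.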

I've done something similar with GCC/LD, using my own linker script and a custom startup library that contains the header, although I'm certainly no expert on GCC/LD and I'd assume creating your own cross-compiler would help.

Also, because you're inventing your own header you can impose your own requirements, and include anything you like. For example, you could have a "flags" field that enables/disables certain features for the process, a "version" field intended for auto-update (package management), process requirements (e.g. "needs SSE1/2/3" flags), a process name string ("Hello World"), a process information string ("This code is copyright under GPL", etc), a developer/support email address, etc.

This allows for some more advanced and/or more user-friendly stuff (e.g. checking if the computer has everything necessary to run the process before it's started, automatically checking for a newer version online, automatically generating "Your program crashed" emails to be sent to the developers, etc).
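As a hedged sketch of the "check before it's started" idea, suppose the custom header carried a requirements bitmask with hypothetical REQ_SSE1/2/3 bits. The loader could compare it against CPUID leaf 1, where SSE is reported in EDX bit 25, SSE2 in EDX bit 26 and SSE3 in ECX bit 0. Only the comparison is shown; executing CPUID itself (e.g. with inline assembly) is left out.

```c
#include <stdint.h>

/* Hypothetical "requirements" bits a custom header might carry. */
#define REQ_SSE1 (1u << 0)
#define REQ_SSE2 (1u << 1)
#define REQ_SSE3 (1u << 2)

/* Returns the subset of `required` that the CPU lacks, given the EDX and ECX
 * values returned by CPUID leaf 1. A result of 0 means the process can start;
 * anything else could be reported to the user before the process is created. */
uint32_t missing_features(uint32_t required, uint32_t cpuid1_edx, uint32_t cpuid1_ecx)
{
    uint32_t have = 0;
    if (cpuid1_edx & (1u << 25)) have |= REQ_SSE1;   /* SSE  */
    if (cpuid1_edx & (1u << 26)) have |= REQ_SSE2;   /* SSE2 */
    if (cpuid1_ecx & (1u << 0))  have |= REQ_SSE3;   /* SSE3 */
    return required & ~have;
}
```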

IMHO, unless you need position-independent code and shared libraries, flat binary is much easier and much more flexible in the long run.

Cheers,

Brendan

jnc100 · Post by **jnc100** » Fri Jun 01, 2007 4:06 am

Brendan wrote:Encoding information about sections is actually relatively easy for flat binary file formats - all you do is invent your own header

When you start doing that it's hardly a flat binary anymore. Why not use a file format that is already supported by gcc, binutils and nasm?

Brendan wrote:Also, because you're inventing your own header you can impose your own requirements, and include anything you like. For example, you could have a "flags" field that enables/disables certain features for the process, a "version" field intended for auto-update (package management), process requirements (e.g. "needs SSE1/2/3" flags), a process name string ("Hello World"), a process information string ("This code is copyright under GPL", etc), a developer/support email address, etc.

Such information can easily be incorporated into a separate section in a standard binary format, removing the need to create a separate binary format for your own OS. Microsoft managed a similar feat by including metadata for managed code in their existing PE file format; as I recall, they didn't even need to put it in a separate section. In my opinion, it is silly to create a new executable format for each new operating system, and we should only create a new one if none of the existing ones can possibly satisfy our needs. I remember a similar comment in a 1999 copy of the os-faq by Dark Fiber, about every new operating system seeming to come with its own version of FAT.
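For example, with GCC the metadata could be emitted into its own ELF section with the `section` attribute. This is a hedged sketch: the `struct os_info` layout and the `.osinfo` section name are invented for illustration and are not part of any standard; `used` keeps the compiler from discarding the object even though no code refers to it.

```c
#include <stdint.h>

/* Hypothetical metadata record; fields and section name are made up. */
struct os_info {
    uint32_t version;             /* e.g. 0x00010002 for version 1.2 */
    uint32_t required_features;   /* e.g. "needs SSE2"-style flag bits */
    char     name[32];
    char     copyright[64];
};

/* GCC places this object in an ELF section named ".osinfo". */
__attribute__((section(".osinfo"), used))
const struct os_info app_info = {
    .version           = 0x00010002u,
    .required_features = 0,
    .name              = "Hello World",
    .copyright         = "This code is copyright under GPL",
};
```

With binutils, `readelf -x .osinfo prog` would hex-dump the section; an OS loader would find it by name through the section header table, and other systems' tools would simply ignore it.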

Regards,
John.

Brendan · Post by **Brendan** » Fri Jun 01, 2007 6:51 am

Hi,

When you add your own required/custom entries to an existing format like ELF, you end up with your own proprietary/custom format (i.e. "extended ELF"). When you add your own required/custom header to an existing format like flat binary, you end up with your own proprietary/custom format (i.e. "extended flat binary").

I don't see a difference.

Why not use a file format that is already supported by gcc, binutils, NASM/YASM, a86/a386, MASM, TASM, FASM, Borland C, etc?

The only reason I can think of is position-independent code and dynamic linking.

I'd also assume that most people who think they support ELF only actually support a subset of ELF - i.e. ELF executables that have one .text, .rodata, .data and .bss section, and don't have arbitrary sections (e.g. .myText1, .myText2, .myText3). I'd also assume most people don't use any extra information that may be present (like debugging information).

When I see something like "If anyone around knows what this is about, you're welcome ... -- PypeClicker" then it really does make me wonder how easy it would be to fully support ELF (rather than only supporting a subset of ELF).

jnc100 wrote:In my opinion, it is silly to create a new executable format for each new operating system and we should only create a new one if none of the others is possibly able to satisfy our needs.

Why? Unless your OS can actually run executables that are designed for another OS, there doesn't seem to be much point using the same executable file format as another OS...

The funny part is that there's no standard "OS identifier" for ELF (unlike the Architecture identifier that says what CPU/architecture the file is intended for). This means that unless you support the whole "System V ABI" (which includes curses, X, standard C libraries, etc) you must create your own proprietary/custom format (i.e. "extended ELF") so that you can tell the difference between a standard ELF executable and a native "Your OS ELF executable". Of course this is bad news for a Linux user who tries to start a "Your OS ELF executable". This is mainly because ELF is part of the "System V ABI", and is expected to be used by UNIX clones (where all UNIX clones can run all ELF executables).

Cheers,

Brendan

jnc100 · Post by **jnc100** » Fri Jun 01, 2007 9:50 am

Brendan wrote:I'd also assume that most people who think they support ELF only actually support a subset of ELF - i.e. ELF executables that have one .text, .rodata, .data and .bss section, and don't have arbitrary sections (e.g. .myText1, .myText2, .myText3). I'd also assume most people don't use any extra information that may be present (like debugging information).

If you divide the parts of a file into loadable and non-loadable, then you can get all the information on segments that a statically-linked executable needs to run (i.e. the loadable parts) from the program headers. They don't differentiate between, for example, a .text and a .myText1 section as long as both are read-only and executable. You simply map each segment as needed and set the access rights in the page tables.
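A small C sketch of "set the access rights in the page tables" for classic 32-bit x86 paging. The PF_* values come from the ELF specification; the helper names are hypothetical. Because a segment's p_vaddr need not be page aligned, the second helper counts how many 4 KiB pages a segment actually touches.

```c
#include <stdint.h>

#define PF_X 0x1u   /* ELF p_flags: segment is executable */
#define PF_W 0x2u   /* ELF p_flags: segment is writable */
#define PF_R 0x4u   /* ELF p_flags: segment is readable */

#define PTE_PRESENT  0x001u
#define PTE_WRITABLE 0x002u
#define PTE_USER     0x004u

/* Page-table flags for a user-mode segment. Classic (non-PAE) 32-bit paging
 * has no no-execute bit, so PF_X cannot be enforced: a read-only code
 * segment and a read-only data segment get the same flags. */
uint32_t pte_flags_for_segment(uint32_t p_flags)
{
    uint32_t flags = PTE_PRESENT | PTE_USER;
    if (p_flags & PF_W)
        flags |= PTE_WRITABLE;
    return flags;
}

/* Number of 4 KiB pages touched by [vaddr, vaddr + memsz). */
uint32_t segment_page_count(uint32_t vaddr, uint32_t memsz)
{
    if (memsz == 0)
        return 0;
    uint64_t first = vaddr & ~0xFFFu;            /* start of the first page */
    uint64_t end   = (uint64_t)vaddr + memsz;    /* one past the last byte */
    return (uint32_t)((end - first + 0xFFF) >> 12);
}
```

If two segments share a page (possible when they are not page aligned), the loader has to combine their rights; one simple policy is to make the page writable if either segment needs it.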

The non-loadable information contains debugging information, the format of which isn't defined by ELF, and dynamic library information.

Brendan wrote:When I see something like "If anyone around knows what this is about, you're welcome ... -- PypeClicker" then it really does make me wonder how easy it would be to fully support ELF (rather than only supporting a subset of ELF).

He's referring to the segment marked 'DYNAMIC' in the program header table, which corresponds to the .dynamic section. This is basically information that must be loaded and is used by the run-time dynamic linker. It is an array of Elf32_Dyn structures, the last of which has its d_tag set to DT_NULL. Each entry defines a different aspect of dynamic loading, e.g. dependencies and relocation information. This is in the ELF documentation, although I do admit it is marked as SysV-specific.
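To make the Elf32_Dyn description concrete, here is a hedged C sketch that walks a .dynamic array until the DT_NULL terminator and records the DT_NEEDED entries (required libraries, given as offsets into the dynamic string table) and DT_STRTAB. The tag values and struct layout follow the ELF specification; `scan_dynamic`, `struct dyn_info` and the `MAX_NEEDED` limit are made up for this example, and a real dynamic linker handles many more tags (relocations, symbol and hash tables, and so on).

```c
#include <stdint.h>
#include <stddef.h>

#define DT_NULL   0   /* marks the end of the .dynamic array */
#define DT_NEEDED 1   /* d_val: string-table offset of a required library */
#define DT_STRTAB 5   /* d_ptr: address of the dynamic string table */

typedef struct {
    int32_t d_tag;                                   /* Elf32_Sword */
    union { uint32_t d_val; uint32_t d_ptr; } d_un;  /* Elf32_Word / Elf32_Addr */
} Elf32_Dyn;

#define MAX_NEEDED 8   /* arbitrary limit for this sketch */

struct dyn_info {                 /* hypothetical summary of the entries used */
    uint32_t strtab;
    size_t   n_needed;
    uint32_t needed[MAX_NEEDED];  /* offsets into the string table at strtab */
};

/* Returns 0 when the DT_NULL terminator is found within max_entries,
 * -1 otherwise (a corrupt or truncated .dynamic array). */
int scan_dynamic(const Elf32_Dyn *dyn, size_t max_entries, struct dyn_info *out)
{
    out->strtab = 0;
    out->n_needed = 0;
    for (size_t i = 0; i < max_entries; i++) {
        switch (dyn[i].d_tag) {
        case DT_NULL:
            return 0;                               /* end of the array */
        case DT_NEEDED:
            if (out->n_needed < MAX_NEEDED)
                out->needed[out->n_needed++] = dyn[i].d_un.d_val;
            break;
        case DT_STRTAB:
            out->strtab = dyn[i].d_un.d_ptr;
            break;
        default:
            break;                                  /* other tags ignored here */
        }
    }
    return -1;
}
```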

Regards,
John.