Tim Robinson's Memory Management-1
Posted: Sun Oct 12, 2003 11:58 am
by Neo
..So as long as our address space is continuous (that is, there’s no gaps or jumps) we can get the processor to associate any virtual address with any physical address. In our example before, with the kernel located at 0xC0000000 but loaded at 0x100000, we need a segment base address which turns 0xC0000000 into 0x1000000; that is, 0xC0000000 + base = 0x1000000. This is easy: our segment base address needs to be 0x41000000
I just wanted to know if this is right or if I'm missing something here. Tim talks about the addresses 0xC0000000 and 0x100000 (1MB), but then shows the association between 0xC0000000 and 0x1000000 (16MB). Is this what was intended, or is there something else? Please let me know.
Re:Tim Robinson's Memory Management-1
Posted: Sun Oct 12, 2003 4:11 pm
by Tim
That's a typo. It should be 0x0010_0000 (1MB) in both places.
Re:Tim Robinson's Memory Management-1
Posted: Mon Oct 13, 2003 12:21 pm
by Neo
Thanks for letting me know. I appreciate it.
Re:Tim Robinson's Memory Management-1
Posted: Tue Dec 09, 2003 11:30 am
by Neo
@Tim: I just wanted to know more about the ISA DMA addresses you talk about in your Memory Management 1 tutorial. How do you handle them using either the stack or the bitmap method? Some info about the floppy would be nice too.
By the way, I've decided to use the bitmap method for allocating physical memory, since there usually isn't much of it. Any ideas/opinions on this?
Re:Tim Robinson's Memory Management-1
Posted: Tue Dec 09, 2003 12:07 pm
by Whatever5k
Bitmap is just fine. The main point against it is that it is slow. So try to implement a system with super bitmaps. I think there is some more information on BonaFide ... *looking* ... yes, here it is:
http://osdev.neopages.net/tutorials/cot ... ?the_id=47
Re:Tim Robinson's Memory Management-1
Posted: Tue Dec 09, 2003 12:56 pm
by Tim
The problem with bitmaps is that allocation time increases linearly with the number of pages allocated -- i.e. the first allocation happens instantly, then the second one needs to skip over the first allocated bit before it succeeds, and so on. Allocating from a stack or a linked list takes constant time -- you just grab the first item.
You can grab specific addresses with a stack or linked list, too. Mobius maintains two linked lists (note: not stacks any more) of free pages: free_low and free. free_low contains the pages below 1MB; free contains the rest. When the floppy driver wants to allocate its buffer, it allocates from free_low; everything else allocates from free. The normal allocator will allocate from free_low if it runs out of spare pages in free.
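Here's a minimal C sketch of that two-list idea -- not the actual Mobius code; page_t is a made-up node type, and the "free" list is renamed free_hi to avoid clashing with the C library's free():
Code: Select all
#include <stddef.h>
#include <stdint.h>

/* One node per free physical page (made-up layout). */
typedef struct page {
    uintptr_t    addr;              /* physical address of the page */
    struct page *next;
} page_t;

static page_t *free_low = NULL;     /* free pages below 1MB (ISA DMA capable) */
static page_t *free_hi  = NULL;     /* all other free pages ("free" in the post) */

/* Return a page to the appropriate list: O(1). */
void free_page(page_t *p)
{
    page_t **list = (p->addr < 0x100000) ? &free_low : &free_hi;
    p->next = *list;
    *list   = p;
}

/* Normal allocation: O(1) pop from the high list, falling back to the
   low list only when the high list runs out of spare pages. */
page_t *alloc_page(void)
{
    page_t **list = free_hi ? &free_hi : &free_low;
    page_t  *p    = *list;
    if (p)
        *list = p->next;
    return p;
}

/* The floppy driver allocates its DMA buffer from the low list only. */
page_t *alloc_page_low(void)
{
    page_t *p = free_low;
    if (p)
        free_low = p->next;
    return p;
}
Both allocation paths are O(1) pops; the only difference is which list a caller is allowed to draw from.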
Re:Tim Robinson's Memory Management-1
Posted: Tue Dec 09, 2003 1:31 pm
by Neo
Yeah, I was thinking of using 'super_pages'.
The linear search won't be much of a delay if the number of pages is low, as it is for physical memory. What say you?
Re:Tim Robinson's Memory Management-1
Posted: Tue Dec 09, 2003 2:46 pm
by Tim
It's all relative. The lower the 'order' of an algorithm, the better. "Big O" notation represents the order of an algorithm, and gives an idea as to how efficient it is. When considering the efficiency of an algorithm you always think about the worst case, and you always think relatively -- you can say "this algorithm is twice as fast as this one", but you can't say "this algorithm takes 2 seconds less to run than this one".
Constant, or O(1), is best. O(1) algorithms execute in constant time regardless of the number of items; that is, in time c * 1.
Linear, or O(n) is next. O(n) algorithms execute in a length of time proportional to the number of items; that is, in time c * n.
O(log2 n) is another one. This covers things like binary trees, where the time taken increases as c * log2(n). For example, the time to search a binary tree increases proportionally to the number of levels in the tree; the number of levels increases as log2 of the number of items; so a search of a binary tree executes in O(log2 n) time.
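To make those orders concrete, here's a small illustrative C sketch (my example, not from the tutorial): indexing an array is O(1), a linear scan is O(n), and a binary search of a sorted array is O(log2 n):
Code: Select all
#include <stddef.h>

/* O(1): one array index, regardless of how many items there are. */
int get_item(const int *items, size_t i)
{
    return items[i];
}

/* O(n): in the worst case every item is examined once. */
int linear_search(const int *items, size_t n, int key)
{
    for (size_t i = 0; i < n; i++)
        if (items[i] == key)
            return (int)i;
    return -1;
}

/* O(log2 n): 'items' must be sorted; each step halves the search range. */
int binary_search(const int *items, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (items[mid] == key)
            return (int)mid;
        else if (items[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}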
Next is O(n^2), or order n squared. This is pretty bad. This routine, which marks duplicate items, is O(n^2):
Code: Select all
for (i = 0; i < count; i++)
    for (j = 0; j < count; j++)
        if (i != j && items[i] == items[j])
            items[j] = -1;
The time taken for this code to run is proportional to count*count: if count doubles, the time taken to run quadruples.
Higher order algorithms (O(n^3) and above) just get worse. If you find yourself writing one of these -- stop, and think how you could do it better.
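For instance, one way to do the duplicate-marking routine above better (assuming the order of the items doesn't matter) is to sort first and then mark repeats in a single pass -- O(n log n) overall instead of O(n^2). A rough sketch:
Code: Select all
#include <stddef.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Mark every repeated value as -1. qsort is O(n log n) and the marking
   pass is O(n), so the whole thing is O(n log n) instead of O(n^2). */
void mark_duplicates(int *items, size_t count)
{
    if (count == 0)
        return;
    qsort(items, count, sizeof items[0], cmp_int);
    int prev = items[0];
    for (size_t i = 1; i < count; i++) {
        if (items[i] == prev)
            items[i] = -1;      /* duplicate of the previous value */
        else
            prev = items[i];
    }
}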
So:
(1) Allocating one page from a stack is O(1): it takes the same amount of time regardless of the number of pages
(2) Allocating one page from low memory from a stack of all pages is O(n): the time it takes depends on the number of pages in the stack (since you might need to search the whole stack)
(3) Allocating one page from a bitmap is also O(n): again, you might need to search the whole bitmap for a free page
It's impossible to say whether (2) is faster than (3) without knowing exactly how the two implementations were written. At this point, you need to profile your code by measuring the time taken for, say, 1000 allocations to happen.
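A rough sketch of that kind of measurement, using a toy stack allocator and a toy bitmap allocator (none of this is anyone's real code, and the tick counts are only meaningful relative to each other):
Code: Select all
#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define NPAGES   32768              /* e.g. 128MB of 4KB pages */
#define NALLOCS  1000

/* Toy stack allocator: O(1) pop of a free frame number. */
static uint32_t frame_stack[NPAGES];
static int      stack_top = NPAGES; /* pre-filled with frames 0..NPAGES-1 */

static int stack_alloc(void)
{
    return stack_top > 0 ? (int)frame_stack[--stack_top] : -1;
}

/* Toy bitmap allocator: O(n) first-fit scan for a clear bit (0 = free). */
static uint32_t bitmap[NPAGES / 32];

static int bitmap_alloc(void)
{
    for (int i = 0; i < NPAGES; i++)
        if (!(bitmap[i / 32] & (1u << (i % 32)))) {
            bitmap[i / 32] |= 1u << (i % 32);
            return i;
        }
    return -1;
}

int main(void)
{
    clock_t t;
    int     i;

    for (i = 0; i < NPAGES; i++)    /* pre-fill the frame stack */
        frame_stack[i] = (uint32_t)i;

    t = clock();
    for (i = 0; i < NALLOCS; i++)
        stack_alloc();
    printf("stack:  %ld ticks for %d allocations\n", (long)(clock() - t), NALLOCS);

    t = clock();
    for (i = 0; i < NALLOCS; i++)
        bitmap_alloc();
    printf("bitmap: %ld ticks for %d allocations\n", (long)(clock() - t), NALLOCS);
    return 0;
}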
Re:Tim Robinson's Memory Management-1
Posted: Tue Dec 09, 2003 3:03 pm
by Neo
What about the space complexity?
The stack approach needs 1MB of space for 128MB of RAM, doesn't it (or am I wrong)?
Re:Tim Robinson's Memory Management-1
Posted: Tue Dec 09, 2003 3:06 pm
by Tim
A few more points:
If there were such a thing, O(0) would be best. This algorithm executes in no time regardless of the number of items. Also known as "not doing it at all". In other words, the fastest code is the code that isn't there.
Another example: Some scheduler algorithms.
Code: Select all
For t1 in threads
    For t2 in threads
        If t2 != t1 and t2->Priority > t1->Priority
            Return t2;
        End If
Return Idle;
This looks at every thread and tries to find one with a higher priority; if there is one, it returns it. OK, it's probably not a very good scheduler, but that's not its main problem. As I hope you can see, it executes in O(n^2): if n is the number of threads, then for each thread it looks at, it has to go through the list of n threads again. As n gets big, the time taken to run gets really big.
Code: Select all
MaxPriority = 0
Current = Idle
For t in threads
    If t->Priority > MaxPriority
        MaxPriority = t->Priority
        Current = t
    End If
This is more useful as a scheduler. It looks for the thread with the highest priority and runs it. It executes in O(n). When you add another thread to the list, it takes a fixed amount of time extra to run.
Code: Select all
For i = MIN_PRIORITY to MAX_PRIORITY
    If Priorities[i].First != NULL
        Current = Priorities[i].First
        Break
    End If
This algorithm is the same as the previous one -- it picks the thread with the highest priority -- but this one is O(1). The different priority levels are separated out into queues, and it's easy to find the highest-priority queue containing a thread. Finding the right queue is O(1) (it depends on the values of MIN_PRIORITY and MAX_PRIORITY, which are fixed), and grabbing the head of the selected queue is also O(1).
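In C, the same O(1) idea might look roughly like this (my sketch; thread_t, queue_t and run_queue are invented names, and it assumes a lower number means a higher priority):
Code: Select all
#include <stddef.h>

#define MIN_PRIORITY 0              /* assumed: lower number = higher priority */
#define MAX_PRIORITY 31

typedef struct thread {
    int            priority;
    struct thread *next;
} thread_t;

/* One FIFO queue of runnable threads per priority level. */
typedef struct {
    thread_t *first;
    thread_t *last;
} queue_t;

static queue_t run_queue[MAX_PRIORITY + 1];

/* O(1): the loop length depends only on the fixed number of priority
   levels, never on the number of threads in the system. */
thread_t *pick_next(void)
{
    for (int i = MIN_PRIORITY; i <= MAX_PRIORITY; i++)
        if (run_queue[i].first != NULL)
            return run_queue[i].first;
    return NULL;                    /* nothing runnable: run the idle thread */
}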
If you have an algorithm consisting of several operations of different orders, always take the highest order operation. The reason for this is that you should be thinking of really big numbers. For suitably low values of n, an O(1) loop containing an O(n) loop would be dominated by the O(1) loop outside -- say, if the outer loop was going from 0 to 100, the inner loop was going from 0 to n, and n=7. But that doesn't help us. If n=10000, the effect of that inner loop is going to far outweigh the outer loop. So the overall algorithm is O(n).
These couple of posts have really been about optimisation. Always remember some golden rules of optimisation:
Premature optimization is the root of all evil - Donald Knuth (said to be the best programmer in the world)
Rules of optimization:
1. Don't
2. (for experts only) Don't -- yet - anonymous
It's better to have a correct program running slowly than an incorrect program running quickly. I can write an incorrect program and have it execute in zero time.
This is turning into "Tim Robinson's Optimisation"...
Re:Tim Robinson's Memory Management-1
Posted: Tue Dec 09, 2003 3:10 pm
by Tim
Neo wrote: What about the space complexity?
The stack approach needs 1MB of space for 128MB of RAM, doesn't it (or am I wrong)?
Well, a stack with 32 bits per page would need 1KB for every 1MB of RAM (or less than 0.1%), so there's not much space overhead. A bitmap would need 1/32 of that amount, but you pay for that with decreased allocation speed.
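As a rough worked example for the 128MB figure above (4KB pages, 32 bits per stack entry as described):
Code: Select all
128MB of RAM / 4KB per page     = 32,768 pages
Stack:  32,768 pages * 4 bytes  = 128KB  (about 0.1% of 128MB)
Bitmap: 32,768 pages * 1 bit    = 4KB    (1/32 of the stack's size)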
Re:Tim Robinson's Memory Management-1
Posted: Sat Dec 13, 2003 8:40 am
by Neo
Help: GDT
My code was working fine until today, when I tried to enable paging. I want the kernel at 0xC0000000, so I changed the GDT descriptors to all have a base address of 0x40100000, so that (0xC0000000 + 0x40100000) wraps around to 0x100000. But now I get an error as soon as I reload the GDT in the bootloader (just before I set the PE bit in CR0). Does anybody have an idea why?
My GDT file is here for reference.
;-------------------------------------------------------;
;                GLOBAL DESCRIPTOR TABLE                 ;
;-------------------------------------------------------;
gdt:
NULL_SEL    EQU $-gdt
    dw 0            ; seg_length_0_15
    dw 0            ; base_addr_0_15
    db 0            ; base_addr_16_23
    db 0            ; access
    db 0            ; dflags / seg_length_16_19
    db 0            ; base_addr_24_31
CODE_SEL    EQU $-gdt
    dw 0xFFFF       ; seg_length_0_15
    dw 0x0000       ; base_addr_0_15
    db 0x10         ; base_addr_16_23
    db 0x9A         ; access
    db 0xCF         ; dflags / seg_length_16_19
    db 0x40         ; base_addr_24_31
DATA_SEL    EQU $-gdt
    dw 0xFFFF
    dw 0x0000
    db 0x10
    db 0x92
    db 0xCF
    db 0x40
STACK_SEL   EQU $-gdt
    dw 0xBFFF
    dw 0x0000
    db 0x10
    db 0x92
    db 0x49
    db 0x40
gdt_end:

gdt_descriptor:
GDT_SIZE    dw gdt_end - gdt - 1
GDT_BASE    dd 0xF00    ; I also tried 0x40000F00 with the same results
;-----------------------------------------------------------------------;
Re:Tim Robinson's Memory Management-1
Posted: Sat Dec 13, 2003 9:46 am
by Slasher
Hi,
You use a modified selector base before you set the PE bit. This is so that your addresses prior to enabling paging and your addresses after enabling paging map to the same physical addresses.
This is what you have to do. Let's say your kernel is at 0x100000 but linked at 0xC0000000. Before paging, in order to run the code you need selectors with the base set to 0x40100000, since 0x40100000 + 0xC0000000 wraps around to 0x100000.
You also need to map the address space where the paging-enabling code lives to the same address in the page tables, i.e. if the code is in the 1MB - 2MB range, identity map 1MB - 2MB linear to 1MB - 2MB physical. (THIS CAUSED ME HEADACHES FOR A WHOLE WEEK BEFORE I GOT IT.)
Then, before you enable paging, change the base address in the GDT to zero and the limit to 4GB. Then enable paging, then reload the selectors for DS, ES, ..., then jump _new_cs_sel_zero_base:0xC0000000.
Also, GDT_BASE equ 0xF00 is wrong if you are not copying the GDT to that address.
If you are using the GDT where it was assembled, then GDT_BASE should simply be the address of gdt.
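Not Slasher's code, but a rough C sketch of the double mapping he describes (assuming 4KB pages, a kernel loaded physically at 1MB and linked at 0xC0000000, and that this code itself still runs with linear addresses equal to physical ones, so the array addresses can go straight into the directory):
Code: Select all
#include <stdint.h>

#define PAGE_PRESENT 0x1
#define PAGE_WRITE   0x2

/* One page directory and one page table covering physical 0..4MB.
   Both must be page-aligned (GCC syntax shown). */
static uint32_t page_dir[1024]   __attribute__((aligned(4096)));
static uint32_t page_table[1024] __attribute__((aligned(4096)));

void setup_higher_half(void)
{
    /* Fill the page table so it maps 4MB of physical memory 1:1. */
    for (uint32_t i = 0; i < 1024; i++)
        page_table[i] = (i * 0x1000) | PAGE_PRESENT | PAGE_WRITE;

    /* Identity mapping: linear 0..4MB -> physical 0..4MB, so the code
       that sets CR0.PG keeps executing after paging comes on. */
    page_dir[0] = (uint32_t)(uintptr_t)page_table | PAGE_PRESENT | PAGE_WRITE;

    /* Higher-half alias: linear 0xC0000000..0xC0400000 -> physical 0..4MB,
       so the kernel linked at 0xC0000000 appears at 0xC0100000. */
    page_dir[0xC0000000 >> 22] = (uint32_t)(uintptr_t)page_table | PAGE_PRESENT | PAGE_WRITE;

    /* After this: load CR3 with the physical address of page_dir, set the
       PG bit in CR0, load a flat (base 0, 4GB limit) GDT, reload the data
       selectors and far-jump to the kernel's 0xC0000000 entry point. */
}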
Re:Tim Robinson's Memory Management-1
Posted: Sat Dec 13, 2003 12:03 pm
by Neo
Err... I can't say that I understood that completely.
Slasher wrote: you also need to map the address space where the paging-enabling code lives to the same address in the page tables, i.e. if the code is in the 1MB - 2MB range, identity map 1MB - 2MB linear to 1MB - 2MB physical.
Can you give me a more detailed explanation please?
And yes, I do relocate the GDT to that base address.
Re:Tim Robinson's Memory Management-1
Posted: Sat Dec 13, 2003 1:46 pm
by Neo
I guess my real question is: what really happens when I use LGDT [desc]? Does the CPU straight away start using this new GDT, or does it continue using the old CS etc. values until I explicitly issue a jmp instruction?