I don't remember who said that PSE (Page Size Extension) doesn't work under Bochs, but here is a short example that does work under Bochs.
It identity-maps (1:1) the first 4 MB of memory with no problem.
// Page Attributes
#define PG_PRESENT 1 // 0 = not present, 1 = present
#define PG_READWRITE 2 // 0 = read-only, 1 = read-write
#define PG_US 4 // 0 = supervisor, 1 = user
#define PG_PWT 8 // Page write-through 0 = disabled, 1 = enabled
#define PG_PCD 16 // Page cache disable 0 = cached, 1 = not cached
#define PG_ACCESSED 32 // 0 = not accessed, 1 = accessed
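#define PG_DIRTY 64 // Page table entries (and 4 MB page directory entries): 0 = not written to, 1 = written to
#define PGDT_SIZE 64 // Page directory entries that point to a page table: should always be set to 0
#define PG_SIZE 128 // Page directory entries only: 0 = 4 KB pages, 1 = 4 MB page (needs CR4.PSE)
// PG_PAT (bit 7 in a 4 KB page table entry, bit 12 in a 4 MB page directory entry): don't want to support this yet.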
/*
PG_GLOBAL: if set, the entry is not flushed from the TLB when CR3 is reloaded.
In order to work, the PGE flag in CR4 needs to be set.
*/
#define PG_GLOBAL 256 // Bit 8
// Bits 9,10,11 are available to software
// If PG_PRESENT is clear, bits 1 to 31 are available to software.
extern void paging_install();
#define CR4_PSE_ENABLE 16 // Page size extension
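For anyone who wants to see how these flags fit together, here is a minimal sketch of what an identity-mapping paging_install() could look like. This is not the original poster's code: page_directory, make_pde_4mb, build_identity_directory and CR0_PG are names made up for illustration, assert() stands in for whatever check your kernel has, and the aligned attribute and inline assembly assume GCC or Clang. The privileged part is only compiled for 32-bit x86 and assumes the directory's virtual address equals its physical address at that point.

```c
#include <assert.h>
#include <stdint.h>

#define PG_PRESENT     1u          /* bit 0 */
#define PG_READWRITE   2u          /* bit 1 */
#define PG_SIZE        128u        /* bit 7: entry maps a 4 MB page */
#define CR4_PSE_ENABLE 16u         /* bit 4 of CR4 */
#define CR0_PG         0x80000000u /* bit 31 of CR0: paging enable (assumed name) */

/* Hypothetical page directory: 1024 entries, 4 KB aligned. */
static uint32_t page_directory[1024] __attribute__((aligned(4096)));

/* Build a page directory entry that maps one 4 MB page.
   The physical base must be 4 MB aligned (low 22 bits clear). */
static uint32_t make_pde_4mb(uint32_t phys_base, uint32_t flags)
{
    assert((phys_base & 0x003FFFFFu) == 0);
    return phys_base | flags | PG_SIZE | PG_PRESENT;
}

/* Identity-map the first 4 MB with one entry; everything else is not present. */
static void build_identity_directory(void)
{
    for (int i = 0; i < 1024; i++)
        page_directory[i] = 0;
    page_directory[0] = make_pde_4mb(0x00000000u, PG_READWRITE);
}

#if defined(__i386__) && defined(__GNUC__)
/* Privileged part: only meaningful in ring 0 on 32-bit x86. */
void paging_install(void)
{
    uint32_t cr0, cr4;

    build_identity_directory();

    /* Enable PSE first so that PG_SIZE is honoured once paging is on. */
    __asm__ volatile("mov %%cr4, %0" : "=r"(cr4));
    cr4 |= CR4_PSE_ENABLE;
    __asm__ volatile("mov %0, %%cr4" : : "r"(cr4));

    /* Load the directory address (assumed physical == virtual here). */
    __asm__ volatile("mov %0, %%cr3" : : "r"((uint32_t)page_directory) : "memory");

    /* Turn paging on. */
    __asm__ volatile("mov %%cr0, %0" : "=r"(cr0));
    cr0 |= CR0_PG;
    __asm__ volatile("mov %0, %%cr0" : : "r"(cr0) : "memory");
}
#endif
```

With these flag values the single entry comes out as 0x00000083 (present, read-write, 4 MB page, physical base 0), so no page table is needed for the first 4 MB.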
PSE is mainly useful in conjunction with the "GLOBAL" bit, because it reduces the number of TLB entries that have to be kept "sticky" across all address spaces.
It may also be interesting for, e.g., mapping 16 MB of video memory into some address space at low cost.
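To make the video memory case concrete: 16 MB is four 4 MB pages, so it takes four page directory entries and no page tables at all. A rough sketch follows; map_4mb_range and PAGE_4MB are hypothetical names, and the addresses in the usage note are invented (a real framebuffer address would come from the PCI BAR or the VBE mode info).

```c
#include <stdint.h>

#define PG_PRESENT   1u
#define PG_READWRITE 2u
#define PG_PCD       16u
#define PG_SIZE      128u
#define PAGE_4MB     0x00400000u /* assumed helper constant */

/* Map `bytes` of physical memory starting at `phys` to virtual address
   `virt` using 4 MB page directory entries (requires CR4.PSE).
   Both addresses must be 4 MB aligned. Returns the number of entries
   written, or -1 if the request is misaligned or does not fit. */
static int map_4mb_range(uint32_t *pgdir, uint32_t virt, uint32_t phys,
                         uint32_t bytes, uint32_t flags)
{
    if ((virt | phys) & (PAGE_4MB - 1u))
        return -1;

    /* Round the size up to whole 4 MB pages without overflowing. */
    uint32_t count = bytes / PAGE_4MB + (bytes % PAGE_4MB != 0u);
    uint32_t first = virt >> 22;      /* directory index of the first page */

    if (count > 1024u - first || count > 1024u - (phys >> 22))
        return -1;

    for (uint32_t i = 0; i < count; i++)
        pgdir[first + i] = (phys + i * PAGE_4MB) | flags | PG_SIZE | PG_PRESENT;

    return (int)count;
}
```

For example, map_4mb_range(dir, 0xD0000000, 0xE0000000, 16 * 1024 * 1024, PG_READWRITE | PG_PCD) fills directory entries 832 to 835. Using PG_PCD here is only an assumption; for a framebuffer you may prefer write-combining through PAT or the MTRRs.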
TLBs commonly have separate entries for 4 MB (or 2 MB) pages, so large mappings don't have to be spread over loads of 4 KB entries. Using 4 MB pages therefore not only frees up a lot of 4 KB entries, it also makes the memory behind those 4 MB pages a lot quicker to access.
Combine that with the global bit and your kernel pages stay in the TLB across address-space switches, which speeds things up quite a lot.
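As a small illustration of combining the two features, the sketch below builds a global 4 MB kernel entry and the CR4 value that enables both PSE and PGE. The global flag is bit 8 of the entry (value 256) and PGE is bit 7 of CR4 (value 128). CR4_PGE_ENABLE, make_kernel_pde, cr4_with_global_large_pages and small_pages_covered are names I made up to match the style of the header above.

```c
#include <stdint.h>

#define PG_PRESENT     1u
#define PG_READWRITE   2u
#define PG_SIZE        128u /* bit 7: 4 MB page */
#define PG_GLOBAL      256u /* bit 8: survives CR3 reloads when CR4.PGE is set */
#define CR4_PSE_ENABLE 16u  /* bit 4 of CR4 */
#define CR4_PGE_ENABLE 128u /* bit 7 of CR4 (assumed name) */

/* Page directory entry for one global, writable, supervisor-only 4 MB
   kernel page. The low 22 bits of the address are masked off. */
static uint32_t make_kernel_pde(uint32_t phys_base)
{
    return (phys_base & 0xFFC00000u)
         | PG_GLOBAL | PG_SIZE | PG_READWRITE | PG_PRESENT;
}

/* CR4 value with both 4 MB pages and global pages enabled. */
static uint32_t cr4_with_global_large_pages(uint32_t cr4)
{
    return cr4 | CR4_PSE_ENABLE | CR4_PGE_ENABLE;
}

/* How many 4 KB mappings the same range would need instead. */
static uint32_t small_pages_covered(uint32_t large_pages)
{
    return large_pages * 1024u;
}
```

One such entry covers what would otherwise take 1024 separate 4 KB mappings, which is where the TLB savings come from.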