Hi,
Enabling PSE doesn't actually do anything unless/until you also set the "PS" (page size) bit in one or more page directory entries.
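To make that concrete, here's a minimal sketch of the two kinds of page directory entry for plain 32-bit paging. The bit positions (CR4 bit 4 for PSE, entry bit 7 for page size) are from Intel's manuals; the helper names make_pde_4mb() and make_pde_table() are made up for illustration, and the sketch assumes "flags" only contains low flag bits.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits in a 32-bit (non-PAE) page directory entry. */
#define PDE_PRESENT  (1u << 0)
#define PDE_WRITABLE (1u << 1)
#define PDE_PS       (1u << 7)   /* page size: entry maps a 4 MB page */

/* Bit in CR4 that has to be set before the CPU honours PDE_PS. */
#define CR4_PSE      (1u << 4)

/* Hypothetical helper: build an entry that maps one 4 MB page directly. */
uint32_t make_pde_4mb(uint32_t phys_addr, uint32_t flags)
{
    assert((phys_addr & 0x003FFFFFu) == 0);   /* must be 4 MB aligned */
    return phys_addr | flags | PDE_PS;
}

/* Hypothetical helper: build an entry that points at a page table
   (1024 entries of 4 KB pages); the page size bit stays clear. */
uint32_t make_pde_table(uint32_t table_phys, uint32_t flags)
{
    assert((table_phys & 0x00000FFFu) == 0);  /* must be 4 KB aligned */
    return (table_phys | flags) & ~PDE_PS;
}
```

With CR4_PSE set, both kinds of entry can sit side by side in the same page directory, which is what "mixed page sizes" means below.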
Supporting large page sizes alone (without any 4 KB pages) wastes huge amounts of memory, because every allocation gets rounded up to a whole 2/4 MB page.
Supporting mixed page sizes complicates the physical memory manager because you need to keep track of 4 KB pages and 2/4 MB pages separately. You'd also need to be able to split a 2/4 MB page into many 4 KB pages, and find a way to combine many 4 KB pages back into a 2/4 MB page (without this "re-combining", sooner or later everything turns into 4 KB pages anyway).
For this reason I'd suggest you figure out how you're going to manage physical memory before making this decision. For example, if you want to use fast free page stacks then you can't re-combine - you'd need to use a page allocation map, and even then the chance of being able to re-combine may be small (it depends on what percentage of RAM is free 4 KB pages).
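As a rough sketch of the page allocation map idea (not a complete allocator): one bit per 4 KB page, plus a check for whether an aligned 4 MB region is entirely free and could be handed out as one large page. The names page_map, mark_page() and can_recombine(), and the 32 MB of RAM, are assumptions for the example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_LARGE 1024u                   /* 4 MB / 4 KB (non-PAE) */
#define TOTAL_PAGES     (8u * PAGES_PER_LARGE)  /* assumed: 32 MB of RAM */

/* One bit per 4 KB page: 1 = used, 0 = free. */
static uint32_t page_map[TOTAL_PAGES / 32];

/* Mark a single 4 KB page as used or free. */
void mark_page(size_t page, bool used)
{
    assert(page < TOTAL_PAGES);
    if (used)
        page_map[page / 32] |= 1u << (page % 32);
    else
        page_map[page / 32] &= ~(1u << (page % 32));
}

/* True if every 4 KB page inside the given 4 MB region is free,
   i.e. the region could be re-combined into one large page. */
bool can_recombine(size_t large_index)
{
    assert(large_index < TOTAL_PAGES / PAGES_PER_LARGE);
    size_t first_word = large_index * (PAGES_PER_LARGE / 32);
    for (size_t i = 0; i < PAGES_PER_LARGE / 32; i++) {
        if (page_map[first_word + i] != 0)
            return false;   /* at least one 4 KB page is still in use */
    }
    return true;
}
```

A free page stack can't answer the can_recombine() question without searching the entire stack, which is why it rules out re-combining. Note that a single used 4 KB page is enough to block a whole 4 MB region.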
I've heard of people exchanging used pages for free pages so that re-combining can happen much more often (e.g. when most of the 4 KB pages in a larger page are free, you search for the remaining used 4 KB pages and exchange them for other free 4 KB pages, so you can create a single free 2/4 MB page). I'm not sure how much overhead this would cause.
For PAE everything is different. Instead of having 3 levels (page directory, page table and page) you get 4 levels (page directory pointer table, page directory, page table and page). It also makes each entry in all of these tables 8 bytes instead of 4 bytes, and if a page directory entry has its "PS" (page size) bit set you get 2 MB pages instead of 4 MB pages.
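Here's a small sketch of how a 32-bit linear address is carved up under PAE with 4 KB pages: 2 bits select the page directory pointer table entry, 9 bits the page directory entry, 9 bits the page table entry, and 12 bits are the offset within the page. The struct and function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Table indices for one linear address under PAE (4 KB pages). */
struct pae_indices {
    unsigned pdpt;    /* bits 31..30: 4 entries   */
    unsigned pd;      /* bits 29..21: 512 entries */
    unsigned pt;      /* bits 20..12: 512 entries */
    unsigned offset;  /* bits 11..0: byte in page */
};

/* Split a linear address into its PAE table indices. */
struct pae_indices pae_split(uint32_t linear)
{
    struct pae_indices ix;
    ix.pdpt   = linear >> 30;
    ix.pd     = (linear >> 21) & 0x1FFu;
    ix.pt     = (linear >> 12) & 0x1FFu;
    ix.offset = linear & 0xFFFu;
    return ix;
}

/* Inverse of pae_split(), mostly useful as a sanity check. */
uint32_t pae_join(struct pae_indices ix)
{
    assert(ix.pdpt < 4 && ix.pd < 512 && ix.pt < 512 && ix.offset < 4096);
    return ((uint32_t)ix.pdpt << 30) | ((uint32_t)ix.pd << 21) |
           ((uint32_t)ix.pt << 12) | ix.offset;
}
```

Without PAE the split is 10 + 10 + 12 bits instead (1024-entry page directory and page tables). For a 2 MB page the page table level is skipped and the low 21 bits become the offset.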
If you intend to use PAE and support older CPUs that don't have PAE then you'll need to write most of your linear memory management code twice. In this case I'd suggest starting with non-PAE paging and adding PAE support later on.
Because the page directory entries and page table entries use 8 bytes each instead of 4 bytes, PAE doubles the amount of memory used to maintain linear address spaces. For computers with less than 4 GB of RAM it's more efficient to avoid PAE...
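To put rough numbers on the doubling, here's a sketch that counts the bytes of page tables needed to map a range with 4 KB pages. It ignores the page directories (and the page directory pointer table for PAE), which are small by comparison; page_table_bytes() is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Bytes of page tables needed to map mapped_bytes of linear address
   space with 4 KB pages. entry_size is 4 (non-PAE) or 8 (PAE). */
uint64_t page_table_bytes(uint64_t mapped_bytes, unsigned entry_size)
{
    assert(entry_size == 4 || entry_size == 8);
    uint64_t entries_per_table = PAGE_SIZE / entry_size;   /* 1024 or 512 */
    uint64_t coverage = entries_per_table * PAGE_SIZE;     /* 4 MB or 2 MB */
    uint64_t tables = (mapped_bytes + coverage - 1) / coverage;
    return tables * PAGE_SIZE;   /* each page table occupies one 4 KB page */
}
```

For 1 GB of mapped space this gives 1 MB of page tables without PAE and 2 MB with PAE, per address space.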
BTW, for complete implementation details see the System Programming Guide (volume 3 of Intel's Software Developer's Manual).

...
Cheers,
Brendan