The I/O bitmap is, for userland drivers, the fastest
safe method to allow port I/O.
I don't know about the erratum that says to add extra bytes, but the only reason it would be needed is to handle port I/O that wraps around the 16-bit port space, which is a software error anyway. If you need to lose the byte on the processor in question, remove the last byte (i.e. make the TSS size 12k - 1 byte); from what I've seen there's nothing mapped to ports FFFC-FFFF. You'd better not drop the first 32, as the DMA controller starts at port 0000 and you can't do "proper" (no Dex, don't) floppy and Sound Blaster drivers without that range.
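For anyone who hasn't touched the bitmap before, here is a rough sketch of the bit handling. The `iopb_*` names are made up for illustration and not taken from any particular kernel. The only hardware facts assumed are that there is one bit per port, eight ports per byte, and that a set bit means the access faults.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IOPB_BYTES 8192u  /* 65536 ports / 8 ports per byte */

/* Deny every port: a set bit means "fault on access". */
static void iopb_deny_all(uint8_t *bitmap)
{
    memset(bitmap, 0xFF, IOPB_BYTES);
}

/* Allow a single port by clearing its bit. */
static void iopb_allow(uint8_t *bitmap, uint16_t port)
{
    bitmap[port >> 3] &= (uint8_t)~(1u << (port & 7));
}

/* Revoke a single port by setting its bit again. */
static void iopb_deny(uint8_t *bitmap, uint16_t port)
{
    bitmap[port >> 3] |= (uint8_t)(1u << (port & 7));
}

/* True if the bitmap lets the task touch this port. */
static bool iopb_is_allowed(const uint8_t *bitmap, uint16_t port)
{
    return (bitmap[port >> 3] & (1u << (port & 7))) == 0;
}
```

Since each byte covers eight ports, the last byte of a full bitmap (index 8191) holds the bits for the eight highest ports.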
The fully correct solution is to have a page of ones, or no page, directly after the TSS. This also has the advantage that you can reuse your "white page" for all TSSes until the process in question actually uses ports, at which point you can allocate the needed page on demand and save space. If you care about the bits, you can also just use three pages.
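The shared-page idea can be modelled in plain userland C. This is only a sketch under assumptions: a real kernel would swap page mappings behind the TSS instead of swapping a pointer, and `struct task_ports`, `task_ports_grant` and friends are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define IOPB_BYTES 8192u

/* One deny-all bitmap shared by every task that was never granted a port. */
static uint8_t shared_deny_all[IOPB_BYTES];

struct task_ports {
    uint8_t *bitmap;  /* points at shared_deny_all until the first grant */
};

/* Fill the shared bitmap with ones once, at startup. */
static void ports_init_shared(void)
{
    memset(shared_deny_all, 0xFF, sizeof shared_deny_all);
}

/* New tasks start out on the shared bitmap and cost no extra memory. */
static void task_ports_init(struct task_ports *t)
{
    t->bitmap = shared_deny_all;
}

/* Grant one port; a private bitmap is allocated on first use.
 * Returns 0 on success, -1 if the allocation fails. */
static int task_ports_grant(struct task_ports *t, uint16_t port)
{
    if (t->bitmap == shared_deny_all) {
        uint8_t *priv = malloc(IOPB_BYTES);
        if (priv == NULL)
            return -1;
        memset(priv, 0xFF, IOPB_BYTES);
        t->bitmap = priv;  /* a kernel would remap the pages here instead */
    }
    t->bitmap[port >> 3] &= (uint8_t)~(1u << (port & 7));
    return 0;
}
```

The point of the design is that tasks which never do port I/O never pay for a bitmap of their own, and the shared copy is never written after startup.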
Note that you might want to account for the fact that the interrupt redirection bitmap sits directly below the I/O bitmap, in case you ever want to do Virtual 8086 mode (so you'd lose the top 256+8 ports with only 2 pages, which again was never a problem here).
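To make the ordering concrete, here is a sketch of one possible layout. It packs everything contiguously for illustration, which is an assumption and not the page-aligned arrangement described above; the struct and function names are hypothetical. The hardware facts it relies on are the 104-byte 32-bit TSS with the I/O map base field at offset 0x66, and the 32-byte redirection bitmap ending right where the I/O bitmap starts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct tss32 {
    uint32_t regs[25];    /* 0x00-0x63: link, stacks, CR3, registers, selectors, LDT */
    uint16_t trap;        /* 0x64: bit 0 is the debug trap flag */
    uint16_t iomap_base;  /* 0x66: offset of the I/O bitmap from the TSS base */
};

struct tss_with_maps {
    struct tss32 tss;
    uint8_t int_redirect[32];  /* 256 bits, one per software interrupt vector */
    uint8_t io_bitmap[8192];   /* 65536 bits, one per port */
    uint8_t terminator;        /* trailing all-ones byte */
};

/* Point the TSS at its bitmaps and deny everything by default. */
static void tss_init_maps(struct tss_with_maps *t)
{
    t->tss.iomap_base = (uint16_t)offsetof(struct tss_with_maps, io_bitmap);
    /* Set bit: the software interrupt goes to the protected-mode handler. */
    memset(t->int_redirect, 0xFF, sizeof t->int_redirect);
    /* Set bit: access to the port faults. */
    memset(t->io_bitmap, 0xFF, sizeof t->io_bitmap);
    t->terminator = 0xFF;
}

/* Segment limit for the TSS descriptor: offset of the last valid byte. */
static uint32_t tss_limit(void)
{
    return (uint32_t)offsetof(struct tss_with_maps, terminator);
}
```

With this layout the redirection bitmap is always found at `iomap_base - 32`, so moving the I/O bitmap moves the redirection bitmap with it.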
I won't bother posting my TSS implementation - it uses some really nasty hacks to save spacetime on SMP systems. There's a link in my signature for the daring ones.