Still, if you want to assume a "modern" processor, you can relieve a bit of register pressure by offloading integer computations to the MMX registers and floats to the SSE registers. You don't need full vector instructions to use them; they handle scalar values as well.
This can of course be impractical in kernel space, where you want to avoid the FPU, or when you cannot safely assume a Pentium III or better.
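
As a rough illustration of the idea, here is a minimal sketch using compiler intrinsics, assuming a 32-bit x86 target with GCC or Clang and MMX/SSE enabled (e.g. `-mmmx -msse`). The helper names `sum_ints_mmx` and `sum_floats_sse` are made up for this example, and the compiler still does the final register allocation, so treat it as a hint rather than a guarantee that the accumulators stay out of the general-purpose registers.

```c
#include <mmintrin.h>   /* MMX intrinsics (Pentium MMX and later) */
#include <xmmintrin.h>  /* SSE intrinsics (Pentium III and later) */

/* Hypothetical helper: keep an integer accumulator in an MMX register. */
static int sum_ints_mmx(const int *v, int n)
{
    __m64 acc = _mm_setzero_si64();
    for (int i = 0; i < n; ++i) {
        /* movd the value into an MMX register, then paddd;
           only the low 32-bit lane carries meaningful data. */
        acc = _mm_add_pi32(acc, _mm_cvtsi32_si64(v[i]));
    }
    int result = _mm_cvtsi64_si32(acc);  /* movd the low lane back out */
    _mm_empty();  /* emms: MMX aliases the x87 registers, so clear the state */
    return result;
}

/* Hypothetical helper: keep a float accumulator in an SSE register. */
static float sum_floats_sse(const float *v, int n)
{
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; ++i) {
        /* addss: scalar add on the low lane only, no vector work needed */
        acc = _mm_add_ss(acc, _mm_load_ss(&v[i]));
    }
    return _mm_cvtss_f32(acc);
}
```

Note the `_mm_empty()` call: because the MMX registers share state with the x87 FPU, forgetting it before later floating-point code is a classic source of bugs, and it is part of why this trick is awkward in kernel code.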