For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
I recommend using aligned buffers for 4 bytes in each iteration using any general purpose register (GPR) or using 8 bytes per iteration using MMX. Putting 1 byte per iteration is really not a good idea in my opinion.
On the field with sword and shield amidst the din of dying of men's wails. War is waged and the battle will rage until only the righteous prevails.