Size of Data

Columbus
Posts: 18
Joined: Sat Sep 13, 2014 2:26 am

Size of Data

Post by Columbus »

Why is the smallest addressable unit of data always 1 byte (8 bits) wide?
Why hasn't someone extended it to 16 bits?
Wouldn't that reduce the size and/or complexity of some mechanisms?
Maybe one could even introduce a 32-bit-wide "Byte" (smallest addressable unit).

What may be the reasons, and how do you think about it?
NickJohnson
Member
Posts: 1249
Joined: Tue Mar 24, 2009 8:11 pm
Location: Sunnyvale, California

Re: Size of Data

Post by NickJohnson »

Assuming we're talking about x86 here, you can do memory operations at 8 bit, 16 bit, 32 bit, 64 bit, and 128 bit (using SSE), often with arbitrary alignment. There's a difference between the smallest addressable unit of memory and the only addressable unit of memory.
Columbus
Posts: 18
Joined: Sat Sep 13, 2014 2:26 am

Re: Size of Data

Post by Columbus »

But as far as I understand, the CPU loads values byte by byte. (Or am I mistaken?)
So wouldn't it be an improvement to make that smallest unit wider?
AndrewBuckley
Member
Posts: 95
Joined: Thu Jan 29, 2009 9:13 am

Re: Size of Data

Post by AndrewBuckley »

Memory operations are not limited to 8 bits wide; CPUs are heavily optimized to load a full 32-bit or 64-bit value from memory at once. A "byte" was not always 8 bits, but now that we have 45 years' worth of code that assumes an 8-bit byte, any change would break everything. Each architecture has its own rules for using multi-byte-wide math operations efficiently, and it's the compiler that maps your code onto them.
NickJohnson
Member
Posts: 1249
Joined: Tue Mar 24, 2009 8:11 pm
Location: Sunnyvale, California

Re: Size of Data

Post by NickJohnson »

Modern processors definitely don't load memory on a byte-by-byte basis; the reality is much more complex. At the very least, the smallest unit of data transfer between main memory and the last level of cache (L2 or L3) is the size of a last level cache line, which is (e.g. on Haswell) 64 *bytes* per line, or 512 bits. This ignores prefetching, unaligned accesses, and other latency-hiding mechanisms that might increase how much data is transferred.

So, in reality, the processor is not capable of doing single-byte transfers or even 128-bit transfers to/from memory. The real transfers are much larger.
alexfru
Member
Posts: 1111
Joined: Tue Mar 04, 2014 5:27 am

Re: Size of Data

Post by alexfru »

Columbus wrote:Why is the smallest addressable data always 1 Byte or 8 Bits wide?
Why not? It's handy, not too small, not too large.
Columbus wrote:Why hasn't someone extended it to 16 Bits?
How do you know? A number of Texas Instruments' DSP CPUs have their smallest addressable unit equal to 16 bits.
Columbus wrote:Wouldn't that reduce the size and/or complexity of some mechanics.
In the CPU? Absolutely. Now take the software side. Suddenly you need to waste extra instructions and cycles to extract and pack 8-bit quantities out of and into 16-bit ones. Or you just waste half the available memory.
Columbus wrote:Maybe one could introduce a 32 bit wide "Byte" (smallest addressable unit).
Maybe one could read up on CPUs and find out that such CPUs have existed in the past and exist today?
Columbus wrote:What may be the reasons, and how do you think about it?
Is this your homework?
willedwards
Member
Posts: 96
Joined: Sat Mar 15, 2014 3:49 pm

Re: Size of Data

Post by willedwards »

The 8-bit byte is legacy from the IBM 360.

Before the 360 there were lots of other variations: bit-addressable machines, decimal-digit machines, and word-addressable machines (with various word sizes).

Nearly 50 years is so much legacy that it's not viable to challenge it: you have to interact with the rest of the world, which is firmly 8-bit.

Historically x86 and ARM CPUs have penalized non-aligned loads, but minimizing this penalty has been getting attention in recent years and the gap has narrowed.

Most new SIMD instruction sets require aligned loads.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Size of Data

Post by Brendan »

Hi,
NickJohnson wrote:Modern processors definitely don't load memory on a byte-by-byte basis; the reality is much more complex. At the very least, the smallest unit of data transfer between main memory and the last level of cache (L2 or L3) is the size of a last level cache line, which is (e.g. on Haswell) 64 *bytes* per line, or 512 bits. This ignores prefetching, unaligned accesses, and other latency-hiding mechanisms that might increase how much data is transferred.
Yes - modern 80x86 typically loads cache lines.
NickJohnson wrote:So, in reality, the processor is not capable of doing single-byte transfers or even 128-bit transfers to/from memory. The real transfers are much larger.
For "uncached" areas (e.g. memory mapped IO) the CPU will read/write individual bytes when software tells it to; including areas of RAM that are configured as "uncached" (e.g. the firmware's SMM area). In addition, it's possible to force the CPU to write a byte to RAM even for "write-back" cached areas (e.g. by using a MASKMOVDQU instruction where all bytes are masked except one that's followed by an SFENCE or MFENCE).
Columbus wrote:Why is the smallest addressable data always 1 Byte or 8 Bits wide?
Why hasn't someone extended it to 16 Bits?
There are CPUs that don't allow misaligned loads and are only capable of accessing (e.g.) 16 or 32 bits of data from RAM at a time. The problem is that you end up emulating byte accesses in software (e.g. doing a "load, shift, mask" instead of a 1-byte load, and a "load, mask, shift, or, store" instead of a 1-byte store), which is significantly slower. The alternative is for software to never use anything smaller than the CPU's minimum access size (e.g. "CHAR_BIT == 16" in C), which can waste a lot of memory, make caches less efficient, and also end up significantly slower.
Columbus wrote:Wouldn't that reduce the size and/or complexity of some mechanics.
Maybe one could introduce a 32 bit wide "Byte" (smallest addressable unit).
It would reduce the complexity of the CPU a little (and make the CPU slower in practice). However, CPU manufacturers are trying to do the opposite - they've got a budget of "many millions of transistors" and are trying to find ways of using those transistors to improve performance. For example, Intel's Haswell CPUs use around 1.4 billion transistors, and Apple's A8 chip (which contains an ARMv8 core) uses around 2 billion. They can afford a few extra transistors to make byte accesses fast.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
SpyderTL
Member
Posts: 1074
Joined: Sun Sep 19, 2010 10:05 pm

Re: Size of Data

Post by SpyderTL »

Also, I still use a lot of Boolean values when I'm coding, so I'm already wasting 7 bits every time I store a 0 or a 1 value in a byte. Larger "bytes" would just make the problem worse...
Project: OZone
Source: GitHub
Current Task: LIB/OBJ file support
"The more they overthink the plumbing, the easier it is to stop up the drain." - Montgomery Scott
willedwards
Member
Posts: 96
Joined: Sat Mar 15, 2014 3:49 pm

Re: Size of Data

Post by willedwards »

SpyderTL wrote:Also, I still use a lot of Boolean values when I'm coding, so I'm already wasting 7 bits every time I store a 0 or a 1 value in a byte. Larger "bytes" would just make the problem worse...
Well, if you are programming in C you can use bit-field width specifiers on struct members, e.g.

Code: Select all

struct flags {
    unsigned int my_bool : 1;  /* occupies a single bit within the struct */
};
However, conventional wisdom is that it's usually slower, since the compiler has to emit shift-and-mask sequences to get at the individual bits.
onlyonemac
Member
Posts: 1146
Joined: Sat Mar 01, 2014 2:59 pm

Re: Size of Data

Post by onlyonemac »

When I designed my own CPU architecture, I did at one point consider using 16-bit addressable data units, but then I realised that I would still need to split them into bytes for compatibility with e.g. ASCII. There are simply too many things that still work with single-byte data units, so it would be too awkward to restrict a system to larger units only; with the way computing has developed, not being able to address single bytes just isn't feasible. Plus what about the wasted disk space when storing small numbers?

(For the CPU I decided that memory accesses would access the given address for the low byte and the next address for the high byte, thus each memory byte can form either the low or the high byte of a 16-bit word.)
When you start writing an OS you do the minimum possible to get the x86 processor in a usable state, then you try to get as far away from it as possible.

Syntax checkup:
Wrong: OS's, IRQ's, zero'ing
Right: OSes, IRQs, zeroing