Sigh. I really ought to leave this alone, just leave your imbecilic ranting buried under the killfile, but noooooo...
~ wrote:
About using automatically-sized variables for increased portability, if you think about it, it would be better to use something like intwide_t/uintwide_t, than having to decide between standard longs and ints.
That is exactly what int and long already are! The standard C numeric types are (IIRC) defined as:
char - an integer value of at least eight bits, capable of holding an ASCII character, with its size and signedness defined by the platform and the specific compiler.
short int, or short - an integer value no smaller than the size of a char, and no larger than the size of an int.
int - an integer value no smaller than a short, but no larger than a long.
long int, or long - an integer value no smaller than an int.
float - a floating-point type, preferably matching one of the native FPU formats, not larger than a double.
double - a floating-point type, preferably matching one of the native FPU formats, not smaller than a float.
In all cases, the signed versions would use (one of) the native sign method(s) of the CPU's arithmetic operations, though most C code assumes 2's-complement wherever it matters, because very, very few CPUs designed after the mid-1960s used anything else.
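Don't take my word for it - ask the compiler. A minimal sketch (the exact numbers will of course vary by platform and compiler, which is the whole point):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* The standard only promises minimum ranges and the ordering
           char <= short <= int <= long; the concrete sizes printed
           here are whatever this platform and compiler chose. */
        printf("char:   %zu byte(s), CHAR_BIT = %d\n", sizeof(char), CHAR_BIT);
        printf("short:  %zu byte(s)\n", sizeof(short));
        printf("int:    %zu byte(s)\n", sizeof(int));
        printf("long:   %zu byte(s)\n", sizeof(long));
        printf("float:  %zu byte(s)\n", sizeof(float));
        printf("double: %zu byte(s)\n", sizeof(double));
        return 0;
    }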
This ambiguity was deliberate, because C doesn't just run on x86 systems, or even just on 32-bit and 64-bit systems. The language standard leaves a lot of wiggle room, in part to avoid having to deprecate a lot of existing code, but mostly because its authors couldn't predict the hardware it would be used on.
However, this came at the price of exactitude in the language - you could have two compilers, on different systems or even on the same system, which used different bit widths. The fixed-size integer types - int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, added in C99's <stdint.h> - were introduced to bring back the predictability, but at the cost of - you guessed it - the very flexibility you are looking for. The flexibility, I might remind you, which is still part of the language with the native-sized types.
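And for what it's worth, the same <stdint.h> also keeps that flexibility around, with 'least' and 'fast' variants alongside the exact-width names - a quick sketch:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int32_t       exact = 0; /* exactly 32 bits, or it won't compile */
        int_least32_t least = 0; /* smallest type with at least 32 bits */
        int_fast32_t  fast  = 0; /* 'fastest' type with at least 32 bits */

        printf("int32_t:       %zu bytes\n", sizeof exact);
        printf("int_least32_t: %zu bytes\n", sizeof least);
        printf("int_fast32_t:  %zu bytes\n", sizeof fast);
        return 0;
    }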
Now, on the 32-bit x86, and most other 32-bit systems, compilers usually mapped these as:
char - one byte,
short - two bytes,
int - four bytes,
long - usually four bytes, but occasionally eight (in which case some special data juggling would be needed, since that is wider than the largest native register). However, this was just a convention, not a part of the language definition, or even a necessary choice on the x86.
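If your code quietly depends on that convention, the honest thing is to say so explicitly and let the compiler reject any platform where it doesn't hold - a sketch using C11's static_assert:

    #include <assert.h>
    #include <limits.h>

    /* These are the common ILP32 conventions, not guarantees from
       the standard; compilation fails wherever they don't hold. */
    static_assert(CHAR_BIT == 8,      "char is not 8 bits here");
    static_assert(sizeof(short) == 2, "short is not 2 bytes here");
    static_assert(sizeof(int) == 4,   "int is not 4 bytes here");
    static_assert(sizeof(long) >= 4,  "long is narrower than 4 bytes?!");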
When 64-bit systems started to appear, this led to an obvious compatibility problem for code that mistakenly assumed a 32-bit long, so the long long int type was added (in C99), defined as no shorter than long and at least 64 bits wide. One wonders what will happen if 128-bit registers, addresses, and data paths ever become widely used...
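We can already get a small taste of that future: GCC and Clang offer a 128-bit integer as a compiler extension (not standard C, mind you). A sketch:

    #include <stdio.h>

    int main(void)
    {
        long long big = 1LL << 40; /* long long: at least 64 bits */
        printf("%lld\n", big);

    #if defined(__SIZEOF_INT128__)
        /* GCC/Clang extension - a hint of what a hypothetical
           'long long long' might look like someday. */
        __int128 huge = (__int128)big * big; /* 2^80 overflows long long */
        printf("__int128 is %zu bits wide\n", sizeof huge * 8);
    #endif
        return 0;
    }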
For addresses - or rather, pointers - things get a bit trickier, because some CPU ISAs have multiple pointer sizes, either for different memory modes (as in the x86) or for different kinds of addressing (I can't think of any examples OTTOMH, but I understand it was something known to come up in things like the Burroughs mainframes). The default assumption is that addresses are all of a fixed size for a given CPU mode, that the address space is flat, and that the CPU won't change modes or mix memory models at runtime. This led to some complications for the MS-DOS C compilers, which meant that you generally needed to select a memory model at compile time, but again, the standard allowed for this. Hence the 'tiny' (code, data, and stack all in a single segment), 'small' (one code segment, one combined data/stack segment), 'medium' (multiple code segments, one data segment), 'compact' (one code segment, multiple data segments), 'large' (multiple code and data segments), and 'huge' (anything goes, but all pointers are FAR, and watch the segment boundaries) models used by most x86 compilers of the era.
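This, incidentally, is what C99's intptr_t/uintptr_t are for - an integer type wide enough to round-trip an object pointer, whatever the pointer size happens to be on a given platform. A minimal sketch:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int x = 42;
        /* uintptr_t is optional in the standard but near-universal
           in practice; it holds any object pointer losslessly. */
        uintptr_t addr = (uintptr_t)&x;
        printf("pointer width here: %zu bits\n", sizeof(void *) * 8);
        printf("address as integer: %" PRIuPTR "\n", addr);
        return 0;
    }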
I am, as I said, working mostly from memory, so I probably have some of this wrong or out of date. Comments and corrections welcome.
~ wrote:
With automatic size selection, you bring automatic register/word width portability to any CPU architecture, not just x86, and with that your code would be much cleaner.
I really don't know why types like ptrwide_t/uintwide_t/intwide_t were never added to the C/C++ standard and assembly language. Code today would be incredibly more portable now.
Well, gee, we can just use the type tags too... oh, wait, you tell me that the x86 doesn't support tagged memory, and neither do any of the other major general-purpose ISAs today? Shocking!
(Actually, it is shocking, or at least unfortunate, but the truth is that tagged memory has never been common, mainly for reasons only tangentially related to the technique itself. I would love to see a tagged memory architecture go into widespread use, but I don't expect it to ever happen.)
Write this on the blackboard 1000 times, ~:
THERE ARE NO DATA TYPES IN x86 ASSEMBLY. None. They don't exist. There are data sizes, but they only exist to tell the CPU how many bytes to fetch, and, more importantly, they are part of the opcode. The assembler syntax may show them as modifiers, rather than giving them separate mnemonics, but in the actual machine code, they are part of the instruction. Changing the size after the code is assembled would require changing the executable image - either by putting patch tags in for the linker to reset, or by using self-modifying code at runtime.
It can be done, but it would require either changing the object format to support it, or adding extra code to perform the runtime modification. That's not too hard in real mode, though it would be difficult to do safely and consistently; in the protected modes or long mode, it would require a system call asking the kernel to remove the 'executable, read-only' flags on the page in question, make the modification, and set the flags back, all before returning to the user application.
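On a POSIX system, that protected-mode dance looks roughly like this - a sketch only, using mprotect(2), with a hypothetical patch_byte() helper of my own invention, not anyone's real API:

    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Make the page writable, patch one byte of code, then restore
       the 'executable, read-only' protection. */
    static int patch_byte(void *addr, unsigned char value)
    {
        long pagesize = sysconf(_SC_PAGESIZE);
        uintptr_t page = (uintptr_t)addr & ~((uintptr_t)pagesize - 1);

        if (mprotect((void *)page, (size_t)pagesize,
                     PROT_READ | PROT_WRITE) != 0)
            return -1;                  /* kernel said no */

        *(unsigned char *)addr = value; /* the actual modification */

        /* Put the flags back before letting the code run again. */
        return mprotect((void *)page, (size_t)pagesize,
                        PROT_READ | PROT_EXEC);
    }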
Better, I would say, to do it all in a high-level language - and C doesn't count as one, really - which can abstract the numeric types entirely, either by default or by specific syntax, and leave all this work to the compiler.
~ wrote:
Even when automatic word width selection is a very important programming concept used in x86,
WUT?!?! Which x86 are you talking about? There is no such thing in x86! This is one of the most absurd statements you have ever made, and frankly, that's astounding by itself!
~ wrote:
it doesn't seem to be integrated anywhere else, not even in the most recent .NET languages, Java, or anywhere else. I will add those types to my compilers because I know that they alone can simplify the whole programming facilities of any language in existence.
Actually, those are now more the exception than the rule. Dynamic languages, both older ones like SNOBOL, Icon, Prolog, and almost the entire Lisp family, and newer ones like Python, Ruby, and Io (and newer Lisps such as Clojure), all use flexible numbers of one variety or another - since there is no explicit typing, the compiler or interpreter is free to select the appropriate size and representation, and resize it when something bigger or less precise is needed - often with some sort of arbitrary-precision data types (AKA BigInt, BigFixed, and BigFloat types) used when there is an overflow. They generally have a 'numeric tower', or a class hierarchy that serves the same purpose, which will generally go from
native system integers -> integer bignums -> rational numbers -> fixed-point bignums -> native floating-point numbers -> floating-point bignums -> complex numbers
Not all have all of these, but most have at least native integers, bignum ints, and native floats. Some may have a Binary Coded Decimal fixed-point type as well, or (as Python does) have libraries to support one. When they do support rationals, fixnums, fixed BCDs, or complex numbers, they will generally either size the components dynamically, or just use bigints (for the numerator and denominator of rationals, for the underlying integer value of fixnums, or, with any of the above, for the real and imaginary parts of complex numbers) or a BCD equivalent (for big BCDs).
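C itself has no numeric tower, of course, but you can bolt the bignum layer on with a library like GMP - which is roughly what several of those language implementations do under the hood. A quick sketch (assuming GMP is installed; link with -lgmp):

    #include <gmp.h>

    int main(void)
    {
        /* 2^200 overflows every native integer type, so an
           arbitrary-precision integer takes over. */
        mpz_t big;
        mpz_init_set_ui(big, 1);
        mpz_mul_2exp(big, big, 200); /* big = 1 << 200 */
        gmp_printf("2^200 = %Zd\n", big);
        mpz_clear(big);
        return 0;
    }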
This doesn't apply only to implicitly-typed languages, either. Haskell, Erlang, Go, and (I think) Rust all have some ability to work in this way, though it isn't the default behavior in any of them, and they all have their quirks. (For example, while Haskell does require typing for all data, it allows you to define 'typeclasses' for groups of similar types - something almost, but not quite, completely unlike an OOP class hierarchy, but serving basically the same purpose; the Haskell compiler or interpreter applies type inference to determine what the actual type of a datum is, and uses that type to hold the value.)
Even C# has some ability to do this now, with the 'var' keyword, though IIRC that is purely compile-time type inference, and it's the 'dynamic' type that defers resolution to run time. I assume one or the other happens, because that's sort of an important feature of a well-designed OOP language with sensible support for polymorphism (not all do - I am looking at you, C++), but I can't be bothered to check.
Mind you, bignums are usually orders of magnitude slower than the native system numbers, so most of these languages let you coerce a value to a specific size and representation when you need the speed, at least to some extent.
Each language has its own way of handling (or not handling) ambiguities and loss of exactitude, but for the most part, they do a decent job of it for most purposes, and can force an inexact representation of an exact value, or vice versa, at the possible cost of precision (e.g., when going from a rational '1/3' to a floating-point '0.333333...').
In all of these cases, this sort of numeric flexibility depends on two things: a practical separation of the language definition from the language implementation, and a conceptual separation of the data from the variables. This means that in all of these languages, the assumed structure is that variables are references - either typed or untyped, depending - to typed data objects, but the translator (compiler, interpreter, whatev') is free to 'snap pointers' and use the memory of the references themselves to hold suitably small values, so long as doing so doesn't change the semantics or run-time behavior.
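The classic trick, if you want to see it in C terms, is low-bit tagging: a value is one machine word that is either a real pointer to a heap object or, when the low bit is set, a small integer stored directly in the word. A sketch (my own illustrative names, not any particular implementation's):

    #include <assert.h>
    #include <stdint.h>

    typedef uintptr_t value_t;  /* one word: pointer or inline 'fixnum' */

    static value_t  make_fixnum(intptr_t n) { return ((uintptr_t)n << 1) | 1; }
    static int      is_fixnum(value_t v)    { return (int)(v & 1); }
    static intptr_t fixnum_value(value_t v) { return (intptr_t)v >> 1; }

    int main(void)
    {
        value_t v = make_fixnum(21);
        /* No heap allocation: the integer lives in the reference
           itself; a real translator would fall back to a boxed
           bignum on overflow. */
        assert(is_fixnum(v) && fixnum_value(v) == 21);
        return 0;
    }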
As you can see, this just doesn't fit C well, and doesn't fit assembly language - which, by definition, approximates a one-to-one correspondence between mnemonics and machine opcodes.
I am done for now, but I have a feeling that you are going to say more foolish things in the near future.