Using -1 with unsigned integers only as missing marker
Using -1 with unsigned integers only as missing marker
I think that it would be much better to use -1 with unsigned int or unsigned long instead of using their signed variants.
The idea is to reserve the value -1 (all bits set to 1) solely to indicate a missing offset, but use unsigned integers for the rest of values as it should be, to avoid dividing the capacity of an offset to its half.
It just takes away 1 single unsigned number to be reserved similarly to NULL, so why isn't it normally implemented like this. I think that I will implement all of my offsets as unsigned integers and I will only reserve the unsigned value of -1 for missing offsets.
Nobody said to use unsigned offsets, all they said was to use -1 (all bits to 1) as the marker for a missing offset.
The idea is to reserve the value -1 (all bits set to 1) solely to indicate a missing offset, but use unsigned integers for the rest of values as it should be, to avoid dividing the capacity of an offset to its half.
It just takes away 1 single unsigned number to be reserved similarly to NULL, so why isn't it normally implemented like this. I think that I will implement all of my offsets as unsigned integers and I will only reserve the unsigned value of -1 for missing offsets.
Nobody said to use unsigned offsets, all they said was to use -1 (all bits to 1) as the marker for a missing offset.
YouTube:
http://youtube.com/@AltComp126
My x86 emulator/kernel project and software tools/documentation:
http://master.dl.sourceforge.net/projec ... 7z?viasf=1
http://youtube.com/@AltComp126
My x86 emulator/kernel project and software tools/documentation:
http://master.dl.sourceforge.net/projec ... 7z?viasf=1
Re: Using -1 with unsigned integers only as missing marker
Hi,
For this reason, a good C programmer won't use -1 for unsigned integers - they'd use (e.g.) UINT_MAX instead to ensure that it's portable (and ensure that everyone doesn't think their code is bad even when portability doesn't matter).
The next problem is that this value doesn't survive promotion. For example, if "bar()" returns an unsigned int and you do "unsigned long foo = bar(); if (foo == ULONG_MAX) {..." then you might or might not get bugs depending on the actual sizes (e.g. if "int" is 32-bit and "long" is 64-bit; then the 32-bit value 0xFFFFFFFF might be converted to the 64-bit value 0x00000000FFFFFFFF and won't match ULONG_MAX; but if "int" and "long" are both 32-bit the code will work). In other words; this can easily become another "implementation defined" portability disaster.
The next problem is that a good interface may return more than just one kind of error (note: admittedly the C standard library is pure trash, doesn't use good interfaces and ends up relying on the horrible "errno" mess for the same purpose). In this case, for extensibility, it'd make sense to use a range of values for "errors of any kind" and another range for "no error"; and the easiest way to do that is to use signed integer where negative values indicate errors. For example, something like -1 = bad parameters, -2 = not enough memory, -3 = permission denied, -4 = file not found, etc.
The final problem is that nobody has a clue what this is actually for. Something that's ideal for one thing might be completely inappropriate for something else. Sometimes you might want errors to cause immediate termination, or cause a signal, or throw an exception; so that the programmer can assume that errors are never returned from functions. Sometimes negative values may be valid and can't be used to indicate errors. Sometimes you might want to return "status" even when there was no error (e.g. "succeeded without problem" and "succeeded, but an error occurred and was corrected"). It's a little bit like saying "I think people should smile and laugh more" - it sounds nice if you don't spend any time thinking about the corner-cases (e.g. funerals).
Cheers,
Brendan
C is designed to be portable to different computers; where some computers use "two's complement" for signed numbers (where -1 == 0xFFFFFFFF), some computers use "sign and magnitude" (where -1 == 0x80000001), and some computers use something else (where -1 == something else). More correctly, something like "unsigned int x = -1;" relies implementation defined behaviour, where implementation defined behaviour is only slightly better than undefined behaviour.~ wrote:I think that it would be much better to use -1 with unsigned int or unsigned long instead of using their signed variants.
The idea is to reserve the value -1 (all bits set to 1) solely to indicate a missing offset, but use unsigned integers for the rest of values as it should be, to avoid dividing the capacity of an offset to its half.
It just takes away 1 single unsigned number to be reserved similarly to NULL, so why isn't it normally implemented like this. I think that I will implement all of my offsets as unsigned integers and I will only reserve the unsigned value of -1 for missing offsets.
Nobody said to use unsigned offsets, all they said was to use -1 (all bits to 1) as the marker for a missing offset.
For this reason, a good C programmer won't use -1 for unsigned integers - they'd use (e.g.) UINT_MAX instead to ensure that it's portable (and ensure that everyone doesn't think their code is bad even when portability doesn't matter).
The next problem is that this value doesn't survive promotion. For example, if "bar()" returns an unsigned int and you do "unsigned long foo = bar(); if (foo == ULONG_MAX) {..." then you might or might not get bugs depending on the actual sizes (e.g. if "int" is 32-bit and "long" is 64-bit; then the 32-bit value 0xFFFFFFFF might be converted to the 64-bit value 0x00000000FFFFFFFF and won't match ULONG_MAX; but if "int" and "long" are both 32-bit the code will work). In other words; this can easily become another "implementation defined" portability disaster.
The next problem is that a good interface may return more than just one kind of error (note: admittedly the C standard library is pure trash, doesn't use good interfaces and ends up relying on the horrible "errno" mess for the same purpose). In this case, for extensibility, it'd make sense to use a range of values for "errors of any kind" and another range for "no error"; and the easiest way to do that is to use signed integer where negative values indicate errors. For example, something like -1 = bad parameters, -2 = not enough memory, -3 = permission denied, -4 = file not found, etc.
The final problem is that nobody has a clue what this is actually for. Something that's ideal for one thing might be completely inappropriate for something else. Sometimes you might want errors to cause immediate termination, or cause a signal, or throw an exception; so that the programmer can assume that errors are never returned from functions. Sometimes negative values may be valid and can't be used to indicate errors. Sometimes you might want to return "status" even when there was no error (e.g. "succeeded without problem" and "succeeded, but an error occurred and was corrected"). It's a little bit like saying "I think people should smile and laugh more" - it sounds nice if you don't spend any time thinking about the corner-cases (e.g. funerals).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
-
- Member
- Posts: 232
- Joined: Mon Jul 25, 2016 6:54 pm
- Location: Adelaide, Australia
Re: Using -1 with unsigned integers only as missing marker
What if you had something at offset of MAX_INT? Cause valid ranges often go 0 to MAX_<something>~ wrote:I think that it would be much better to use -1 with unsigned int or unsigned long instead of using their signed variants.
The idea is to reserve the value -1 (all bits set to 1) solely to indicate a missing offset, but use unsigned integers for the rest of values as it should be, to avoid dividing the capacity of an offset to its half.
It just takes away 1 single unsigned number to be reserved similarly to NULL, so why isn't it normally implemented like this. I think that I will implement all of my offsets as unsigned integers and I will only reserve the unsigned value of -1 for missing offsets.
Nobody said to use unsigned offsets, all they said was to use -1 (all bits to 1) as the marker for a missing offset.
Re: Using -1 with unsigned integers only as missing marker
(unsigned int)-1 equals UINT_MAX. See the "Representations of types" and "Conversions" sections of the C standard.Brendan wrote:More correctly, something like "unsigned int x = -1;" relies implementation defined behaviour, where implementation defined behaviour is only slightly better than undefined behaviour.
For this reason, a good C programmer won't use -1 for unsigned integers - they'd use (e.g.) UINT_MAX instead to ensure that it's portable (and ensure that everyone doesn't think their code is bad even when portability doesn't matter).
Re: Using -1 with unsigned integers only as missing marker
@alexfru is correct -- "unsigned int x = -1" is a portable way to get all bits set in x. "unsigned int x = UINT_MAX" is, of course, more expressive.
As for what ~ originally wrote...
As for what ~ originally wrote...
- C++ does that (e.g. "std::string::npos" is defined as "static_cast<size_t>(-1)").
- If you're using "signed int" as an index, you're doing it wrong already -- as e.g. strlen() returns the unsigned type size_t, and you (hopefully) start getting signed / unsigned comparison warnings from your compiler, which you will (hopefully) not ignore.
- An offset is a different thing from an index, and can be negative.
Every good solution is obvious once you've found it.
Re: Using -1 with unsigned integers only as missing marker
Doesn't the compiler generate the right sequence for -1 and the other values whenever it's specified anyway, regardless of using it as an unsigned bit pattern or as an actual signed integer?
YouTube:
http://youtube.com/@AltComp126
My x86 emulator/kernel project and software tools/documentation:
http://master.dl.sourceforge.net/projec ... 7z?viasf=1
http://youtube.com/@AltComp126
My x86 emulator/kernel project and software tools/documentation:
http://master.dl.sourceforge.net/projec ... 7z?viasf=1
Re: Using -1 with unsigned integers only as missing marker
Errr... I beg your pardon?
Could you re-phrase that question? Because I have no idea what you just asked.
Could you re-phrase that question? Because I have no idea what you just asked.
Every good solution is obvious once you've found it.
- Schol-R-LEA
- Member
- Posts: 1925
- Joined: Fri Oct 27, 2006 9:42 am
- Location: Athens, GA, USA
Re: Using -1 with unsigned integers only as missing marker
You don't actually understand how two's complement works, do you? Not that it really matters; as Brendan stated, the C standard is written in a way that tries to avoid making assumptions about the underlying representation of integers, signed or unsigned, other than that there is one.~ wrote:Doesn't the compiler generate the right sequence for -1 and the other values whenever it's specified anyway, regardless of using it as an unsigned bit pattern or as an actual signed integer?
In a two's complement system - which is nearly universal today for system-word addressable integers, though exceptions exist (and floating-point numbers are entirely different, as are any bignums which a language or library might support) - there is no 'unsigned bit pattern' that the hardware can differentiate from a 'signed bit pattern'. Indeed, this is one of the main advantages of two's complement over, say, sign and magnitude, or one's complement: the exact same addition operation suffices for both signed and unsigned addition and subtraction.
In 2's complement, you negate a signed value by taking the complement of the value (the bitwise negation, that is to say, toggling the bits - ones goes to zeros and vice versa; this can be accomplished simply by XORing the value to a mask of all one's, though many ISAs including x86 have a built-in NOT instruction) and then adding one. You can also get the negated integer value by subtracting one from it's current value, then taking the complement of the result - the effect is the same.
For example, in a 4-bit signed value, -2 = (~0010) + 1 = 1101 + 1 = 1110.
This means that you don't need a separate subtraction system in hardware; you simply perform an arithmetic negation on the value being subtracted, and then add the two values normally.
It also mean that addition is the same for signed or unsigned values; whether the value is signed or not is entirely in the programmer's (or compiler's) interpretation of the value. The hardware doesn't have to even know which it is.
It also means that all the NEG instruction in the x86 has to do is simply apply the NOT and INC hardware in sequence - the circuit (or more likely, microcode/firmware) is made just but directing the output of the first operation to the input of the second.
Two's complement also has the advantage over one's complement that zero has the same representation for both signed and unsigned values. In one's complement (in negation is just the complement), a 4-bit zero is either 0000 or 1111; in two's complement, negating 0000 becomes 1111 + 1 = 0000 (due to wrapping overflow).
Last edited by Schol-R-LEA on Thu Mar 08, 2018 11:27 am, edited 1 time in total.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
- Schol-R-LEA
- Member
- Posts: 1925
- Joined: Fri Oct 27, 2006 9:42 am
- Location: Athens, GA, USA
Re: Using -1 with unsigned integers only as missing marker
Sorry for the sequential posts; I had to leave for an appointment earlier, before I had a chance to finish what i mean to say, and since I am uncertain if ~ read the previous post or not (the timing is really close to when i last edited it, meaning he may have seen the first posted version but not the fixes) thought it would be better to bite the bullet and post again.
Anyway, the rest of the story here is that, as Brendan said earlier, it isn't entirely clear what you are trying to accomplish in the first place. What is the use case for this? What is the context of the question? What 'missing offsets' are you concerned about?
I mean, honestly, you didn't ask a question, or even make a coherent statement really, you just sort of started into a something as if you were halfway through a conversation. What do you expect us to make of that? (A paper hat, perhaps... I may be dating myself with that reference, but whatev'.)
In the absence of other information I would say, bundle the value into a struct with a Boolean flag to indicate a valid/invalid value,
but I have no idea if that's even an applicable solution. You might use a struct with a bit field instead, if you know how the compiler will lay the values out (or don't need to know), but that can lead to a number of unexpected compiler dependencies.
Anyway, the rest of the story here is that, as Brendan said earlier, it isn't entirely clear what you are trying to accomplish in the first place. What is the use case for this? What is the context of the question? What 'missing offsets' are you concerned about?
I mean, honestly, you didn't ask a question, or even make a coherent statement really, you just sort of started into a something as if you were halfway through a conversation. What do you expect us to make of that? (A paper hat, perhaps... I may be dating myself with that reference, but whatev'.)
In the absence of other information I would say, bundle the value into a struct with a Boolean flag to indicate a valid/invalid value,
Code: Select all
typedef struct GOFFSET {
bool valid;
unsigned int offset; // note that the exact size may not matter here
} guarded_offset;
Code: Select all
typedef struct BF_GOFFSET {
bool valid: 1;
uint32_t offset: 31; // ... but it definitely does here
} bitfield_guarded_offset;
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Re: Using -1 with unsigned integers only as missing marker
The use case is an offset as big as possible, on disk, memory, etc...
I wouldn't like to have, say, 31 bits available for a file offset just to reserve -1 (specifically unsigned 0xFFFFFFFF or 0xFFFFFFFFFFFFFFFF, depending on the register width) as an "offset not found". I could just use an unsigned integer, use the full 32 or 64-bit range and just reserve -1 as a simple bit pattern generated by the compiler to signal "offset not found". 0 is a valid offset, -1 would be like NULL unless we return more than a value at once from a function to check an error/success flag (probably cleaner than reserving special values, like the CPU or BIOS do on error (CF, etc...) and more portable down to the machine level?)
I wouldn't like to have, say, 31 bits available for a file offset just to reserve -1 (specifically unsigned 0xFFFFFFFF or 0xFFFFFFFFFFFFFFFF, depending on the register width) as an "offset not found". I could just use an unsigned integer, use the full 32 or 64-bit range and just reserve -1 as a simple bit pattern generated by the compiler to signal "offset not found". 0 is a valid offset, -1 would be like NULL unless we return more than a value at once from a function to check an error/success flag (probably cleaner than reserving special values, like the CPU or BIOS do on error (CF, etc...) and more portable down to the machine level?)
YouTube:
http://youtube.com/@AltComp126
My x86 emulator/kernel project and software tools/documentation:
http://master.dl.sourceforge.net/projec ... 7z?viasf=1
http://youtube.com/@AltComp126
My x86 emulator/kernel project and software tools/documentation:
http://master.dl.sourceforge.net/projec ... 7z?viasf=1
Re: Using -1 with unsigned integers only as missing marker
This is already being done, using unsigned as index (not offset!) and -1 ("all bits set") as "fault" indicator.
Have you read what was written by others?
No. As usual, you just keep harping on.
Have you read what was written by others?
No. As usual, you just keep harping on.
Every good solution is obvious once you've found it.
Re: Using -1 with unsigned integers only as missing marker
Wait, this put a 2GB limit on file size. Consider use full 64-bit.~ wrote:31 bits available for a file offset
- Pype.Clicker
- Member
- Posts: 5964
- Joined: Wed Oct 18, 2006 2:31 am
- Location: In a galaxy, far, far away
- Contact:
Re: Using -1 with unsigned integers only as missing marker
I'd be tempted to use
for unsigned /* long long */ types.
Code: Select all
INVALID_WHATEVER=~0U/*LL*/
Re: Using -1 with unsigned integers only as missing marker
Pype, consider the StackOverflow link I posted earlier. ~0 works for two's complement, but not for others...
Every good solution is obvious once you've found it.
- Schol-R-LEA
- Member
- Posts: 1925
- Joined: Fri Oct 27, 2006 9:42 am
- Location: Athens, GA, USA
Re: Using -1 with unsigned integers only as missing marker
OK, first off, as Solar said, it seems you probably mean an 'index' (that is to say, an absolute location within the storage unit, whether that unit is primary memory or a disk partition) rather than an offset (which is relative to some point in the storage, such as the start of a file, the 200th element of the array, what have you). While one can view an index as a special case of an offset¹, the use of arbitrary offset² has very different implications about how you can use them effectively.~ wrote:The use case is an offset as big as possible, on disk, memory, etc...
Second, I gather you are trying to have a common 'offset type' for all kinds of storage units - that is to say, you would use a variable a type, let's call it offset_t, for both memory and disk structure.
Uhm... I can see this working if you were using a language which supported either inheritance or prototyping³, and had an abstract type/class which the specific cases could inherit a type interface from, but trying to use one index type to rule them all is going to bite you down the road, hard. If nothing else, you would be committing yourself to a specific size for that index on a given system (even if you defined the size itself in some standards document as being system-specific , a la the basic integer types in C), meaning that you will either waste memory where it is too large, or have to work around the offsets when they are too small (or more likely still, both).
I know that this sort of in-band error checking and exception flagging is still common - all too common, if you ask me - but it is one of those things that only sounds like a good idea but never really is. It was, unfortunately, enshrined in systems such as Unix back in the 1970s⁴, and from there carried into the C standard, but IMAO that's just one more thing Ken and Dennis are to blame for. I strongly advise not using a bare index or offset at all, and use a separate flag bundled with it in a struct (or use some other out-of-band mechanism if you can) if you can - and since this is your own OS design, and don't intend to adhere to POSIX or some other external standard, there is no reason I can see that you can't.~ wrote:I wouldn't like to have, say, 31 bits available for a file offset just to reserve -1 (specifically unsigned 0xFFFFFFFF or 0xFFFFFFFFFFFFFFFF, depending on the register width) as an "offset not found". I could just use an unsigned integer, use the full 32 or 64-bit range and just reserve -1 as a simple bit pattern generated by the compiler to signal "offset not found". 0 is a valid offset, -1 would be like NULL unless we return more than a value at once from a function to check an error/success flag
WTF is this supposed to even mean? It's a contradiction in terms!~ wrote:portable down to the machine level?)
Footnotes:
1. One that is 'relative' to the start of the whole storage unit.
2. By which I mean an offset that isn't an index, which doesn't necessarily imply 'any possible index', for you rules lawyers out there.
3. With or without polymorphism, though that could be a value too in this.
4. At the time, one could - and many did - argue that this was more efficient due to the limited memory available. However, even at the time this was (IMAO) false fire, as it required a lot more additional code to handle it correctly than you saved. this was a known issue even before Multics was designed, having cropped up in both CTSS and OS/360. In both Unix and MS-DOS (and indeed most contemporary OSes), this was resolved by simply ignoring the problems it caused, a decision we are still paying for today.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.