OSDev.org

Posted: **Thu Mar 12, 2015 1:05 pm**

Hi,

alexfru wrote:
Brendan wrote: a) C is bad because it has too many rules that are not enforceable by the compiler
True. IOW, the compiler can help only so much with broken code.

Brendan wrote: b) GCC is bad because it doesn't detect or report "invalid input" where it could (C's rules that are enforceable by the compiler)
Like what? I know that several years ago it didn't warn about indexing an array with an invalid (too large) index in code something like "int a[10]; a[11] = 1;". I also know that it doesn't always spot things like "a[i] = i++;" (a bit more complex expression will cause it to miss the problem). What else?

Like (where possible) detecting strict aliasing bugs, and making sure the source code actually checks the value returned by "malloc()", and array indexes, and various signed integer shifts, and whether the input/output parameters for an "extern void foo(void)" declaration actually match the definition, and probably hundreds of other corner cases.

Think of it like this. For C there are about 50 different static analysers. For just one of them (from the Coverity Wikipedia page) we get things like "the tool was used to examine over 150 open source applications for bugs; 6000 bugs found by the scan were fixed, across 53 projects". Of those maybe about 15% are bugs that the compiler could have detected but failed to mention (basically everything where a static analyser can find the bug with no "false negatives"), and the remaining bugs are problems in the C standard.

alexfru wrote:
Brendan wrote: c) the combination of bad language and bad compiler unnecessarily increases the number of bugs and security vulnerabilities users are exposed to for no reason whatsoever
For a long time I thought it was mainly a problem of teaching/learning the language properly, of availability of good books and articles. The language is clearly much less intuitive than others, less well defined than assembly language / CPU architecture and makes math even more inhumane.

I used to think it was mostly a problem with the programmers too (and in some cases it is). However, people with far more experience using C than I'll (hopefully) ever have are still creating the same bugs as everyone else. Mostly, it's easy to write code in C that "seems to work" that doesn't comply with the language's specification and has subtle bugs that might not be noticed for 10+ years even if/when hundreds of C programmers look at the code.

alexfru wrote:
Brendan wrote: d) this is probably the single largest "root cause" of security vulnerabilities
I'm not sure if it's the largest. You have to be pretty much in a paranoid mode when writing or fixing security-sensitive code. It's not a typical mindset/mode for most software developers. We also tend to overcomplicate things to the point at which it becomes hard to not miss an important edge case as such and not get overwhelmed by the amount of code we're dealing with. From my experience with Windows code I can tell that missing/insufficient/incorrect checks/validation were around the top issues implementation-wise. Even decent C/C++ coders from time to time will forget to check this or that.

It's much worse than that. For a simple test, see if you can write a 100% correct/valid version of this code:

Code: Select all

    int saturatingAdd(int a, int b) {
        if(a + b > INT_MAX) return INT_MAX;
        if(a + b < INT_MIN) return INT_MIN;
        return a + b;
    }

I'd be willing to bet that over half of the experienced/professional C programmers will end up with subtle bugs; especially if they're writing the code as part of their normal work and don't know it's a test/challenge.

alexfru wrote:
Brendan wrote: e) with a better language and better compiler the majority of security vulnerabilities could've been avoided without any significant disadvantages
You'd have to trade some performance to make C well/better defined.

Maybe, yes. It depends too much on what you change and how. For a simple example, if the C standard said signed integer overflow causes wrapping (and isn't undefined) it'd probably be faster on most CPUs.
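
To make the wrapping idea concrete, here is a minimal sketch of an addition that wraps instead of overflowing, built only from behaviour the standard already defines. The name `wrapping_add` is hypothetical, and the code assumes a two's complement `int` with `UINT_MAX == 2 * INT_MAX + 1`; that holds on mainstream targets, and the preprocessor checks refuse to build elsewhere.

```c
#include <limits.h>

/* Assumptions of this sketch, checked at compile time. */
#if UINT_MAX - INT_MAX - 1 != INT_MAX
#error "sketch assumes UINT_MAX == 2 * INT_MAX + 1"
#endif
#if INT_MIN + INT_MAX != -1
#error "sketch assumes two's complement int"
#endif

/* Hypothetical helper: adds two ints with wrap-around instead of UB. */
int wrapping_add(int a, int b) {
    /* Unsigned arithmetic is defined to wrap modulo UINT_MAX + 1. */
    unsigned int sum = (unsigned int)a + (unsigned int)b;

    /* Convert back without relying on an out-of-range conversion. */
    if (sum <= (unsigned int)INT_MAX) return (int)sum;
    return -(int)(UINT_MAX - sum) - 1;
}
```

With this, `wrapping_add(INT_MAX, 1)` yields `INT_MIN` rather than invoking undefined behaviour. Whether a compiler turns it into a single add instruction depends on the optimiser.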

Cheers,

Brendan

Posted: **Thu Mar 12, 2015 1:15 pm**

Assuming integers do not overflow is actually pretty important for loop optimizations, so barring some other changes to the type system, defining it would actually make it slower.

Posted: **Thu Mar 12, 2015 2:01 pm**

Hi,

Rusky wrote:Assuming integers do not overflow is actually pretty important for loop optimizations, so barring some other changes to the type system, defining it would actually make it slower.

For C, I doubt it. More likely is that those loop optimisations rely on the assumption that array indices don't overflow.

Cheers,

Brendan

Posted: **Thu Mar 12, 2015 7:08 pm**

That's what I said- several important loop optimizations rely on integers not overflowing (array indices and otherwise).

Posted: **Thu Mar 12, 2015 7:50 pm**

Hi,

Rusky wrote:That's what I said- several important loop optimizations rely on integers not overflowing (array indices and otherwise).

Yes. After that I replied "For C, I doubt that".

Essentially I think it's unfounded nonsense; given that for C most loops either:

Don't use signed integers in the first place
Do use signed integers but it can be proven that the integer doesn't overflow anyway
Do use signed integers, but these hypothetical optimisations still work
Do use signed integers, but the integer/s are used for array indexes and therefore the compiler can assume they don't overflow regardless

Can you provide an example that's common enough to matter?

Cheers,

Brendan

Posted: **Thu Mar 12, 2015 8:41 pm**

An awful lot of code does use signed int for the induction variable, and I would say any code where the assumption of non-overflow is important is also code that you can't prove much about the values (because they come in as function arguments or from user input).

As far as examples go, I have to admit I don't know enough to say. It does seem important enough to the GCC maintainers that they added -fno-strict-overflow in addition to -fwrapv, specifically to inhibit the optimizer in fewer situations: http://www.airs.com/blog/archives/120

Posted: **Thu Mar 12, 2015 11:33 pm**

Hi,

Rusky wrote:As far as examples go, I have to admit I don't know enough to say. It does seem important enough to the GCC maintainers that they added -fno-strict-overflow in addition to -fwrapv, specifically to inhibit the optimizer in fewer situations: http://www.airs.com/blog/archives/120

Originally, I think GCC just assumed that signed overflow never happened and optimised accordingly. Then everyone in the world complained (examples: 1, 2, 3) because it broke a massive amount of existing "not strictly correct" code (where the programmer just assumed wrapping behaviour because that's what CPUs do and what other compilers give you) and caused severe security vulnerabilities everywhere. They added "-fno-strict-overflow" and "-fwrapv" so people can make the compiler do what it always should have done.

Cheers,

Brendan

Posted: **Fri Mar 13, 2015 12:20 am**

Well, they originally added -fwrapv for that purpose, which specifies wrapping behavior. However (as explained in the link in my last post) they later added -fno-strict-overflow, which removes the UB without specifying the actual resulting value, for performance reasons. There are a couple micro-examples in the article I linked that I imagine come up occasionally.
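
As a rough illustration of the kind of micro-example in question (a sketch under stated assumptions, not copied from the linked article): with signed overflow undefined, a compiler may assume the loop below runs exactly n + 1 times for any n >= 0, because i can never wrap past INT_MAX. With wrapping semantics (as under -fwrapv) it also has to allow for n == INT_MAX, where i <= n is always true. The function name is hypothetical.

```c
/* Hypothetical example: sums a[0..n] using a signed induction variable. */
long sum_inclusive(const int *a, int n) {
    long total = 0;
    /* If i cannot overflow, the trip count is n + 1 for any n >= 0.
       If i wrapped, n == INT_MAX would make "i <= n" always true. */
    for (int i = 0; i <= n; i++) {
        total += a[i];
    }
    return total;
}
```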

Posted: **Fri Mar 13, 2015 1:51 am**

Brendan wrote:
alexfru wrote:
Brendan wrote: c) the combination of bad language and bad compiler unnecessarily increases the number of bugs and security vulnerabilities users are exposed to for no reason whatsoever
For a long time I thought it was mainly a problem of teaching/learning the language properly, of availability of good books and articles. The language is clearly much less intuitive than others, less well defined than assembly language / CPU architecture and makes math even more inhumane.
I used to think it was mostly a problem with the programmers too (and in some cases it is). However, people with far more experience using C than I'll (hopefully) ever have are still creating the same bugs as everyone else. Mostly, it's easy to write code in C that "seems to work" that doesn't comply with the language's specification and has subtle bugs that might not be noticed for 10+ years even if/when hundreds of C programmers look at the code.

Like I said, distractions, overload and fatigue can make even good programmers write buggy code. It's understandable.

It's also understandable that if you happen to be about the only C expert working on the project, others won't be able to point at your bugs because they aren't nearly as qualified as you are. And in big projects you wouldn't expect many people to read and meticulously review the parts not directly related to theirs.

If C was a bit more friendly, there would be more people capable of spotting C-specific bugs and there would be fewer of such bugs in the first place. I'll give you that.

But we don't have such a friendly dialect of C in existence or common use. So, if it's C, you're stuck with its problems and you can only help others learning it by pointing to the right resources, by reviewing their code with them and showing how you write your code, so they have good examples to learn from.

Brendan wrote:
alexfru wrote:
Brendan wrote: d) this is probably the single largest "root cause" of security vulnerabilities
I'm not sure if it's the largest. You have to be pretty much in a paranoid mode when writing or fixing security-sensitive code. It's not a typical mindset/mode for most software developers. We also tend to overcomplicate things to the point at which it becomes hard to not miss an important edge case as such and not get overwhelmed by the amount of code we're dealing with. From my experience with Windows code I can tell that missing/insufficient/incorrect checks/validation were around the top issues implementation-wise. Even decent C/C++ coders from time to time will forget to check this or that.
It's much worse than that. For a simple test, see if you can write a 100% correct/valid version of this code:
Code: Select all
    int saturatingAdd(int a, int b) {
        if(a + b > INT_MAX) return INT_MAX;
        if(a + b < INT_MIN) return INT_MIN;
        return a + b;
    }
I'd be willing to bet that over half of the experienced/professional C programmers will end up with subtle bugs; especially if they're writing the code as part of their normal work and don't know it's a test/challenge.

That's probably a bad example. I wouldn't call anyone an experienced/professional C programmer if they wrote the above piece of code. Because it shows two basic problems and doesn't even touch trickier and more interesting things.

First, it shows that the programmer either isn't testing their code or isn't using compiler warnings. The condition expressions in the if statements are all false. I don't know why my gcc 4.8.2 isn't showing any warnings (-O3 -Wall -Wextra -pedantic -std=c99), but Open Watcom C/C++ 1.9 does. I'd expect clang or Microsoft's C++ compiler to be able to issue a warning here at the appropriate warning level. This probably speaks in favor of your statement that gcc is a bad compiler.

Second, it would really be strange to not know that int+int yields int just as long+long yields long and double+double yields double and so on. The exception is when you get to deal with types smaller than int, e.g. short and char. This is where things become interesting and where even experienced C programmers can write nonsense. And the below variant for signed chars could have a chance to work:

Code: Select all

    signed char saturatingAdd(signed char a, signed char b) {
        if(a + b > SCHAR_MAX) return SCHAR_MAX;
        if(a + b < SCHAR_MIN) return SCHAR_MIN;
        return a + b;
    }
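
The promotion rule is easy to check: operands narrower than int are converted to int before the addition, so the sum of two signed chars has type int, not signed char. A small sketch using C11's `_Generic` (the macro and function names are made up for illustration, and it assumes int is wider than signed char):

```c
#include <limits.h>

/* Evaluates to 1 if the expression has type int, 0 otherwise (C11). */
#define IS_INT(expr) _Generic((expr), int: 1, default: 0)

/* Returns 1 when signed char + signed char is computed as int. */
int schar_sum_is_int(void) {
    signed char a = SCHAR_MAX, b = 1;
    return IS_INT(a + b);   /* _Generic does not evaluate a + b */
}

/* The promoted sum can exceed SCHAR_MAX when int is wider than signed char. */
int schar_sum_value(void) {
    signed char a = SCHAR_MAX, b = 1;
    return a + b;
}
```

That is why the comparisons against SCHAR_MAX and SCHAR_MIN can actually be true in the signed char variant, while the comparisons against INT_MAX and INT_MIN in the int version cannot.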

Typical and less immediately obvious bugs are like this:

Code: Select all

    uint32_t mul32_16x16(uint16_t a, uint16_t b) {
        return a * b;
    }

And like this:

Code: Select all

    uint32_t ReadAsLittleEndian32(uint8_t* p) {
        return p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24);
    }

While everything looks fine at first glance, most likely there's a UB hiding in plain sight.
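
To spell it out for the common case of a 32-bit int: the uint16_t operands are promoted to (signed) int, so 0xFFFF * 0xFFFF overflows int, and p[3] << 24 shifts a 1 into the sign bit whenever p[3] >= 0x80. A minimal sketch of one way around both, converting to uint32_t before the arithmetic (the `_fixed` names are hypothetical):

```c
#include <stdint.h>

/* Convert one operand first so the product is not computed in a 32-bit signed int. */
uint32_t mul32_16x16_fixed(uint16_t a, uint16_t b) {
    return (uint32_t)a * b;
}

/* Convert each byte to uint32_t before shifting so no bit lands in a sign bit. */
uint32_t read_le32_fixed(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```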

Your saturating add for signed ints becomes interesting when we try to avoid overflow and any other UB and such. Extra conditions, moving operands between left-hand and right-hand sides, fun. Been there, done that.

But what people often do is something like

Code: Select all

    size_t bufsz, item1sz, item2sz, item3sz;

    // get the sizes into the above variables

    if (item1sz + item2sz + item3sz > bufsz)
    {
        // error handling
    }

    // copy all items into the buffer

They fail to think well about individual overflows in item1sz + item2sz and in sum_of_item1sz_and_item2sz + item3sz.
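
One way to avoid the wrap-around is to subtract from the remaining space instead of adding up the sizes. A minimal sketch, with `items_fit` as a hypothetical helper name:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the three items fit in bufsz bytes.
   Each step only subtracts a value already known to be no larger than
   the remaining space, so no intermediate result can wrap. */
bool items_fit(size_t bufsz, size_t item1sz, size_t item2sz, size_t item3sz) {
    size_t remaining = bufsz;

    if (item1sz > remaining) return false;
    remaining -= item1sz;

    if (item2sz > remaining) return false;
    remaining -= item2sz;

    return item3sz <= remaining;
}
```

With item1sz equal to the largest size_t value and the other two equal to 1, the naive sum wraps around to 1 and passes the original check; this version rejects it.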

This becomes a larger mess when signed types get involved in size/count/index calculations.

Posted: **Fri Mar 13, 2015 8:05 am**

Hi,

alexfru wrote:
Brendan wrote:It's much worse than that. For a simple test, see if you can write a 100% correct/valid version of this code:
Code: Select all
    int saturatingAdd(int a, int b) {
        if(a + b > INT_MAX) return INT_MAX;
        if(a + b < INT_MIN) return INT_MIN;
        return a + b;
    }
I'd be willing to bet that over half of the experienced/professional C programmers will end up with subtle bugs; especially if they're writing the code as part of their normal work and don't know it's a test/challenge.
That's probably a bad example. I wouldn't call anyone an experienced/professional C programmer if they wrote the above piece of code. Because it shows two basic problems and doesn't even touch trickier and more interesting things.

I know it's buggy - the question is whether people can be expected to write a correct version. So far, nobody has had the courage to attempt this extremely simple piece of code.

For what it's worth, I don't want to attempt it either. For this case I'd be tempted to use inline assembly (where there is no undefined behaviour and where you can access the carry/overflow flags, and where it's very easy to be confident the code is correct).

alexfru wrote:And the below variant for signed chars could have a chance to work:
Code: Select all
    signed char saturatingAdd(signed char a, signed char b) {
        if(a + b > SCHAR_MAX) return SCHAR_MAX;
        if(a + b < SCHAR_MIN) return SCHAR_MIN;
        return a + b;
    }

You've tried to avoid problems by using signed char instead of int; but even in that case are you sure your code is correct in all cases? Hint: Imagine a computer (maybe a DSP) where CHAR_BIT == 32, and where sizeof(int) == sizeof(long) == 1.
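
On such a machine SCHAR_MAX equals INT_MAX, so a + b is computed in an int that can itself overflow. One way to make the hidden assumption explicit is to refuse to compile where int can't hold the sum of any two signed chars. A sketch, with a hypothetical function name:

```c
#include <limits.h>

/* Refuse to build where int cannot hold the sum of any two signed chars
   (for example a target where signed char and int have the same range). */
#if SCHAR_MAX > INT_MAX / 2 || SCHAR_MIN < INT_MIN / 2
#error "int is too narrow for this saturating add"
#endif

signed char saturating_add_schar(signed char a, signed char b) {
    int sum = a + b;   /* operands are promoted to int; cannot overflow here */
    if (sum > SCHAR_MAX) return SCHAR_MAX;
    if (sum < SCHAR_MIN) return SCHAR_MIN;
    return (signed char)sum;   /* in range, so the conversion is exact */
}
```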

alexfru wrote:While everything looks fine at first glance, most likely there's a UB hiding in plain sight.

Exactly. People (experienced C programmers) can't be expected to do trivial things in C correctly because the language itself makes it virtually impossible; and the only reason anything larger/more complex works is because everyone relies on "seems to work despite being technically wrong" (including relying on both undefined behaviour and implementation defined behaviour).

Cheers,

Brendan

Posted: **Fri Mar 13, 2015 10:05 am**

Just for the sake of trying (not tested at all):

Code: Select all

    int saturateAdd(int a, int b)
    {
        if ((b > 0) && (INT_MAX - b < a)) return INT_MAX;
        if ((b < 0) && (INT_MIN - b > a)) return INT_MIN;
        return a+b;
    }

Posted: **Fri Mar 13, 2015 10:46 am**

Hi,

Combuster wrote:Just for the sake of trying (not tested at all):
Code: Select all
int saturateAdd(int a, int b)
{
    if ((b > 0) && (INT_MAX - b < a)) return INT_MAX;
    if ((b < 0) && (INT_MIN - b > a)) return INT_MIN;
    return a+b;
}

As far as I can tell, that's correct (correct result, and no undefined or implementation defined behaviour).

For my next challenge, try saturating subtraction, saturating multiplication and saturating division.
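
For the subtraction case, the same trick of moving an operand to the other side of the comparison carries over. A sketch only, with a hypothetical name, mirroring the saturating add quoted above:

```c
#include <limits.h>

/* Hypothetical counterpart to the saturating add: a - b, clamped to int's range. */
int saturateSub(int a, int b)
{
    /* b < 0: a - b can only overflow upwards; INT_MAX + b cannot overflow. */
    if ((b < 0) && (a > INT_MAX + b)) return INT_MAX;
    /* b > 0: a - b can only overflow downwards; INT_MIN + b cannot overflow. */
    if ((b > 0) && (a < INT_MIN + b)) return INT_MIN;
    return a - b;
}
```

Multiplication needs more cases than this, and division needs a decision about what dividing by zero should return, so neither is attempted here.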

Cheers,

Brendan

Posted: **Fri Mar 13, 2015 11:03 am**

Brendan wrote:
alexfru wrote: That's probably a bad example. I wouldn't call anyone an experienced/professional C programmer if they wrote the above piece of code. Because it shows two basic problems and doesn't even touch trickier and more interesting things.
I know it's buggy - the question is whether people can be expected to write a correct version. So far, nobody has had the courage to attempt this extremely simple piece of code.

I left it as an exercise for the others.

Brendan wrote:
alexfru wrote:And the below variant for signed chars could have a chance to work:
Code: Select all
    signed char saturatingAdd(signed char a, signed char b) {
        if(a + b > SCHAR_MAX) return SCHAR_MAX;
        if(a + b < SCHAR_MIN) return SCHAR_MIN;
        return a + b;
    }
You've tried to avoid problems by using signed char instead of int; but even in that case are you sure your code is correct in all cases? Hint: Imagine a computer (maybe a DSP) where CHAR_BIT == 32, and where sizeof(int) == sizeof(long) == 1.

Nope, I haven't tried avoiding that. I just said [it] could have a chance to work, implying it would be a more conventional system than mentioned in your hint. Perhaps, I should've been more clear.

Brendan wrote:
alexfru wrote:While everything looks fine at first glance, most likely there's a UB hiding in plain sight.
Exactly. People (experienced C programmers) can't be expected to do trivial things in C correctly because the language itself makes it virtually impossible; and the only reason anything larger/more complex works is because everyone relies on "seems to work despite being technically wrong" (including relying on both undefined behaviour and implementation defined behaviour).

You have a point and it's a bit extreme.

Posted: **Fri Mar 13, 2015 1:25 pm**

Saturation is hard to get right, and harder to get right portably. It's a great example of when C's low level is hard on even experienced practitioners.

I think it also shows how C compilers are slowly improving. Newer compilers - LLVM and GCC - are aligning builtins for lots of things in this vein (just not saturation yet).

Right now, there are branchless C recipes using only bitwise ops that you can find on the web.

Saturated arithmetic is also available via MMX and SSE intrinsics.

On the Mill all appropriate ops come in four flavours: modulo, excepting, saturating and widening. This shows our DSP roots.

When the time comes I will push for LLVM and GCC builtins for saturating arithmetic, in line with the recent alignment on overflow: http://clang.llvm.org/docs/LanguageExte ... c-builtins
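
Until then, the overflow builtins already make a saturating add short. A sketch assuming a compiler that provides `__builtin_sadd_overflow` (a Clang and newer-GCC extension, not standard C); the function name is hypothetical:

```c
#include <limits.h>

/* Saturating add built on the checked-arithmetic builtin
   (compiler extension, not standard C). */
int saturating_add_builtin(int a, int b) {
    int result;
    if (__builtin_sadd_overflow(a, b, &result)) {
        /* Overflow requires both operands to share a sign,
           so the sign of a picks the bound. */
        return (a > 0) ? INT_MAX : INT_MIN;
    }
    return result;
}
```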

Posted: **Fri Mar 13, 2015 1:38 pm**

willedwards wrote: When the time comes I will push for LLVM and GCC builtins for saturating arithmetic, in line with the recent alignment on overflow: http://clang.llvm.org/docs/LanguageExte ... c-builtins

What about the same for types like uint32_t?
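
For unsigned fixed-width types a portable version is at least simpler to write by hand, because unsigned wrap-around is well defined. A minimal sketch (hypothetical name):

```c
#include <stdint.h>

/* Hypothetical unsigned saturating add: clamps to UINT32_MAX instead of wrapping. */
uint32_t saturating_add_u32(uint32_t a, uint32_t b) {
    /* The cast keeps the sum modulo 2^32 even where int is wider than 32 bits. */
    uint32_t sum = (uint32_t)(a + b);
    /* A wrapped sum is always smaller than either operand. */
    return (sum < a) ? UINT32_MAX : sum;
}
```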


Secure? How?
