Page 1 of 2

The Random Number Generator page needs reworking

Posted: Sat May 22, 2021 3:18 am
by nullplan
Hi all,

I think the Random Number Generator page is due for an overhaul. The information there is not bad per se, it's just incomplete and misleading. I think the main division of the page into "true" and "pseudo" RNGs is already misguided, and I think the article maybe ought to focus on entropy and CSPRNGs. I have some ideas how to start, but would end up throwing away most of the current article, and I don't know if that would be a welcome change. Any ideas how to get started?

Another thing that ought to be addressed is the prominent placement of the RDRAND instruction. That thing has the potential of seeding entropy buffers, sure, but I would not use it as sole source of random numbers. Because it is a black box. Does the CPU/mainboard contain an avalanche diode or something, or does RDRAND merely give the output of some NSA approved PRNG? Who knows, you can't check it.

Maybe I ought to draft it in my user namespace first... I'll get on it. Has anyone else tried to do a similar thing already?

Re: The Random Number Generator page needs reworking

Posted: Sat May 22, 2021 4:00 am
by nullplan
Update: Draft created: https://wiki.osdev.org/User:Nullplan/Ra ... _Generator. Only an outline for now.

Re: The Random Number Generator page needs reworking

Posted: Sat May 22, 2021 9:58 am
by kzinti
Looks good to me... Keep at it :)

Re: The Random Number Generator page needs reworking

Posted: Sat May 22, 2021 8:38 pm
by bzt
nullplan wrote:Another thing that ought to be addressed is the prominent placement of the RDRAND instruction. That thing has the potential of seeding entropy buffers, sure, but I would not use it as sole source of random numbers. Because it is a black box. Does the CPU/mainboard contain an avalanche diode or something, or does RDRAND merely give the output of some NSA approved PRNG? Who knows, you can't check it.
Actually, that's the purpose of a random generator... It has to be a black box without any recognizable algorithm. But I agree it's only good for setting the seed though.
nullplan wrote:Update: Draft created: https://wiki.osdev.org/User:Nullplan/Ra ... _Generator. Only an outline for now.
Otherwise your page looks good to me, but you must mention hardware random sources. It would also worth adding a section on how to add real entropy (for example by changing the seed on every interrupt, which includes some real-life event entropy, like the user's typing speed, how often the mouse is moved, when network packets arrives etc. These are non-algorithmic changes to the seed, impossible to calculate and backtrack.)
wiki wrote:Explain cryptographic security, and simple idea for SHA-based PRNG.
As for using sha on the seed, take a look at MBedTLS' entropy.c as example. But I'd like to point out this has nothing to do with cryptographically sound random, that requires real-life event entropy. (For example, have you ever generated a key with GPG? It's not using libc and srand at all, instead you have to type for quite a while because it's gaining entropy from the elapsed time between your keystrokes.)

Cheers,
bzt

Re: The Random Number Generator page needs reworking

Posted: Sun May 23, 2021 2:09 am
by nullplan
bzt wrote:Actually, that's the purpose of a random generator... It has to be a black box without any recognizable algorithm.
Only in use. In design, the random generator has to preclude the sequence of numbers being predicted. It can only do that when its design and its inner workings are known. For the PRNG in Linux, you can know exactly what goes into the entropy pool, and how it is mixed in, and see that there is no predetermined sequence of numbers being generated. For the RDRAND instruction, there are no such assurances. Only Intel's word that they are definitely not trying to undermine your security, honest!
bzt wrote:But I'd like to point out this has nothing to do with cryptographically sound random, that requires real-life event entropy.
That would be the section on high-entropy generators I'm preparing. The idea is that if the entropy pool has some entropy, and some message has entropy, then SHA-1(entropy pool + message) has the same amount of entropy as both combined, while compressing down to a single SHA-1 block size. You can also use SHA-2 or SHA-3, or even hashing algorithms not called SHA. In any case, distribution of entropy and compression are properties of hash algorithms.

You then have to jealously guard the entropy pool, so nobody gets a look at it. When asked to generate random numbers based on the pool, you take the pool contents, calculate a hash based on them, and return the hash. However, that would lead to a repeating pattern in the hash block size, so before you return it, you must also mix the output back into the entropy pool again. That is basically the entire design. In production, the algorithm to mix the entropy into the pool does not have to be secure, but the point of the article should be to start with simplicity and move on to speed afterwards.

The idea of cryptographic security of the RNG is also about the entropy pool: The state of the random generator must not be knowable for anyone observing the outputs before or after another process has used it for some secure purpose. That is also a failing of the low-entropy generators, that you can identify their internal state by observing the outputs. For an RNG that is shared across the entire system, this is even more important. If the two of us were sharing a computer, it would not help you to generate a random key if right before that, I figured out the state of the RNG and know exactly what random numbers you are going to get.

Most of my knowledge about this comes from a talk I forgot the title of, but the subtitle was "How I learned to stop worrying and love the urandom", which was held at some Chaos Communication Conference if memory serves, and it made so many things so much clearer for me. For instance why the blocking behavior of /dev/random on Linux is nonsense, except maybe at boot when not many random events have happened yet. But gauging the entropy of arbitrary events is really hard in general, so guessing how many bits you may have added now is an inexact science at best. Hard disk command timing is random due to the behavior of air molecules on the surface of the disk, but an SSD has the exact same interface and no randomness at all. And similar for all other sources.7

And then there's adversarial entropy, which I was looking for the paper DJB wrote about it (or was it just a blog post? I forget). Essentially, it is possible an adversarial entropy device would try to get the state of the entropy pool and then deliver "random" numbers to the system such that the entropy pool is forced to become less random. Which sounds far fetched, but really, all you need is a rootkit and a local network device willing to time packets just right. And suddenly the attack becomes scarily plausible and hard to defend against. And the rootkit doesn't have to write into kernel space, only read, and Meltdown just recently happened... oh boy. And if it isn't Meltdown, it's Rowhammer or something like that.

So yeah, I have some writing to do.

Re: The Random Number Generator page needs reworking

Posted: Sun Jun 20, 2021 7:52 am
by Korona
Instead of inventing your own SHA-based CSPRNG, it's probably better to link to something like Fortuna (methods such as simply taking the SHA hash of your entropy + a counter, or iteratively taking SHA, etc. are not robust against adversarial input). Fortuna (and other CSPRNGs from the literature) remains secure even in the presence of adversarial input, as long as there is some input that the adversary cannot predict. And they are easy to implement if you have the SHA + AES primitives (~ 160 lines of C code in our implementation, excluding the primitives).

Re: The Random Number Generator page needs reworking

Posted: Sun Jun 20, 2021 11:58 am
by nullplan
Well, the purpose of this website is to provide a proper start to things. Obviously I cannot provide a full background on all aspects of making cryptographically secure code, be it an RNG or something else. The important thing is to get the main ideas across: Entropy, and how to use it to make random numbers. But yes, I'll add a "further reading" section.

Re: The Random Number Generator page needs reworking

Posted: Sat Jul 03, 2021 3:51 am
by nullplan
Alright, I've found the time to add what was missing from the page. I believe the page to be in a releasable state now, and don't really have anything more to add to it. Any comments before I try to figure out how to move a page?

Re: The Random Number Generator page needs reworking

Posted: Sat Jul 03, 2021 5:10 am
by Korona
Again, I think the SHA-based code does more harm than good because it is not a state-of-the-art CSPRNG (for example, it is vulnerable to the attack mentioned in DJB's blog post). Maybe it helps to make the code more abstract and/or only have a textual description. Or at least state something like: "**Note that this code is still vulnerable to various attacks when applied as is, for further reading we refer to CSPRNGs such as [...]**" (e.g., Fortuna, or the EmbedTLS implementation that bzt posted).

Re: The Random Number Generator page needs reworking

Posted: Sat Jul 03, 2021 10:35 am
by bzt
I see you have added hardware random, great! :thumbsup:

But I must agree with Korona about the cryptographic part. SHA does not give you high-level entropy. SHA, AES, and all the other hashes are fully deterministic, meaning they will always produce the same output for the same input, therefore they can't increase the entropy at all. They are used to make an adversary's life more difficult, and not for improving entropy. Adding non-deterministic salt to the hash from real-life events at arbitrary and non-deterministic intervals is what makes it a high-level entropy.

Otherwise pretty nice page, I like it! Specially like the descriptions, those are very easy to understand and well-written. Maybe you should make the last part shorter, just mention that such a thing as cryptographically sound random exists, and pseudo-random won't cut it, but not going into the details that much? I don't think people lurking on the OSDev wiki and searching for RNG want to learn about cryptography, after all (but I could be wrong about this).

Maybe you could also mention Stanislaw Lem's His Master's Voice. That sci-fi novel starts with a customer sueing a company that offers random numbers (generated using cosmic radiation as the ultimate entropy source), for the numbers not being random at all :-) (In the novel little green E.T.s are the adversaries unintentionally messing with the random seed)

Cheers,
bzt

Re: The Random Number Generator page needs reworking

Posted: Sat Jul 03, 2021 1:57 pm
by nullplan
bzt wrote:I see you have added hardware random, great! :thumbsup:
Yeah, that's been there for a while. While researching the topic, I stumbled upon RDSEED, so I added that into the text as well.
bzt wrote:But I must agree with Korona about the cryptographic part. SHA does not give you high-level entropy. SHA, AES, and all the other hashes are fully deterministic, meaning they will always produce the same output for the same input, therefore they can't increase the entropy at all. They are used to make an adversary's life more difficult, and not for improving entropy. Adding non-deterministic salt to the hash from real-life events at arbitrary and non-deterministic intervals is what makes it a high-level entropy.
What are you on about? I used SHA-1 to illustrate the idea of compressing entropic data into a given form factor. It's not the function that is entropic, it is the input. mix_in_entropy() should be called with the new entropic data given as argument. Maybe I should make that more clear.

Also. AES isn't a hash function, or at least I have never seen it used to as such. AES is reversible, which is pretty much the one thing a hash function should never be.
bzt wrote:Maybe you should make the last part shorter, just mention that such a thing as cryptographically sound random exists, and pseudo-random won't cut it, but not going into the details that much?
I already see that I failed. I tried to show that the important thing is entropy, and how to manage it, and that there are pretty much only pseudo-random numbers out there. Indeed that is the point of one of my references, that no appreciable difference exists - for the purposes of cryptography - between the output of a fully seeded CSPRNG and some magical "true" random numbers. Also the point of the linked DJB blog post.
bzt wrote:I don't think people lurking on the OSDev wiki and searching for RNG want to learn about cryptography, after all (but I could be wrong about this).
I don't know about you, but I got into OSDev to learn how things work in this domain. I would quote Faust here, but I only know the German text. I cannot stand the explanation "that's too complicated for you"; I would very much like to decide that for myself, please. I dislike glossing over details. Therefore, I would rather err on the side of too many details than too few. It is possible that the bit about information-theoretic security is going overboard, but maybe it does impress the point onto the reader that deriding pseudo-random numbers for not being "secure" is not specific enough.

Now, I am not the most knowledgeable about cryptography. I have managed to amass a bit of knowledge by listening to people, but I have no formal education on the topic, beyond the basics I learned in my CS Bachelor's course. So it is entirely possible I got the details subtly wrong. But I would like to add them to the Wiki anyway, so if I am wrong, someone more knowledgeable than me can come along and correct them. Is that not the Wiki idea?
bzt wrote:Maybe you could also mention Stanislaw Lem's His Master's Voice. That sci-fi novel starts with a customer sueing a company that offers random numbers (generated using cosmic radiation as the ultimate entropy source), for the numbers not being random at all :-) (In the novel little green E.T.s are the adversaries unintentionally messing with the random seed)
Much as I like classic science-fiction, I would like to keep it to non-fiction. Else I would have to link this Dilbert cartoon.

Re: The Random Number Generator page needs reworking

Posted: Sat Jul 03, 2021 4:41 pm
by Octocontrabass
nullplan wrote:While researching the topic, I stumbled upon RDSEED, so I added that into the text as well.
Your description of RDSEED isn't quite right. It's actually much more easily exhausted than RDRAND, since it's an entropy source instead of a random number generator.

Re: The Random Number Generator page needs reworking

Posted: Sun Jul 04, 2021 3:29 am
by bzt
nullplan wrote:What are you on about? I used SHA-1 to illustrate the idea of compressing entropic data into a given form factor.
I guess what I'm trying to say is, the section's title "High-entropy PRNGs" suggests that the section, which talks too much about SHA for an illustration is about high-entropy PRNGs, while it's not. As a matter of fact, in the middle of the section you mention too (very correctly):
However, we can hash the entropy pool. The hash of the entropy pool is as random as the entropy pool itself
but that's somehow lost in the noise. There's nothing factually wrong about this section, it's just somehow misleading I think, I feel (and I must stress IMHO) that it unintentionally suggests that using SHA is enough to provide a high-entropy PRNG. I hope this helps. (Maybe you should rename this section to "Compressing Entropic Data", and everything would be fine? Yeah, that sounds reasonable.)
nullplan wrote:Much as I like classic science-fiction, I would like to keep it to non-fiction. Else I would have to link this Dilbert cartoon.

Code: Select all

int random()
{
    return 4; /* generated with a real-life random dice throw */
}
:-)

Cheers,
bzt

Re: The Random Number Generator page needs reworking

Posted: Mon Jul 12, 2021 3:41 pm
by moonchild
This discusses various problems with rdrand/rdseed, including their ability of a malicious implementation to poison other sources of entropy. I think the use of rdrand/rdseed should not be recommended at all, even for seeding entropy pools; or if it is recommended, it should be made very clear that you have to:

1, collect rdrand before other strong sources of entropy (e.g. hardware timings); and

2, don't rely solely on rdrand

Re: The Random Number Generator page needs reworking

Posted: Mon Jul 12, 2021 3:43 pm
by moonchild
The non-cryptographic RNGs section is, I think, remiss without mentions of pcg and xoshiro/xoroshiro.