Notes
I'm trying to keep myself organized by publishing work-in-progress. The thought that
someone other than me might actually see this stuff tends to encourage a certain coherency
to the diffs and their associated notes.
Any patches that do not have a date as part of the filename, or that have a not-so-recent date,
are unlikely to apply cleanly to -current. I still have all of these changes in one tree or
another, or they have been included in the official OpenBSD tree. The network and
sparc64 pages are particularly out-of-date (as in, “half a decade”).
- June 20 (2015)
- More papers on random numbers have been added.
- June 3 (2013)
- Links to some of my other projects have been added.
- Feb 18 (2008)
- The kernel rnd and VIA RNG patches have been synced with -current.
- Nov 9
- The kernel rnd and VIA RNG patches have been synced with -current.
- Sept 27
- There is a new .NET wrapper for Makoto Matsumoto's Mersenne Twister called RandomSFMT.
- Sept 26
- The c7random program inside the NIST SP 800-90 CTR_DRBG archive now uses /dev/urandom
to form part of the nonce.
- Sept 5
- Vista x64 and PasswordSafe do not get along perfectly. This should help.
- Sept 3
- NIST SP 800-90 CTR_DRBG compiles under VS2005 again.
- Aug 28
- An OpenSSL hack has been added to provide full entropy for the default RNG.
- Aug 14
- The minimal pr5205 diff was checked in.
- Aug 9
- It looks like there's a small diff that can fix pr5205.
- Aug 7
- c7random has been merged into the NIST SP 800-90 CTR_DRBG code. There have been some bugfixes,
Rijndael known-answer tests have been added, and the code has been reorganized.
- July 30
- I've gathered some random number generation information onto an Entropy and Random Numbers page.
- July 18
- The minor effort to make a port for DieHarder has turned into a major patch.
- July 17
- DieHarder has come along to provide a GPLed alternative to DIEHARD.
- June 22
- I've updated the kernel rnd and VIA C3/C7 RNG patches to -current.
- June 13
- I've added some links to NIST's take on RNGs.
Old notes have been archived.
Robert G. Brown has put together DieHarder, a GPLed RNG test suite that includes and expands
upon George Marsaglia's DIEHARD suite.
Here's a preliminary port that
should get it compiled:
dieharder-port-20070716.tgz
It does not install any documentation or header files. Also, I'm no port, automake, or
autoconf expert.
Alternatively, here's a patched version of dieharder-2.24.4.tgz
that has been lightly tested on amd64/FreeBSD and on i386/OpenBSD.
Full distribution: dieharder-20070718.tar.bz2
(679k)
Patch: dieharder-2.24.4-20070718.diff.bz2 (395k)
VIA RNG fix (pr5205)
The VIA
VT-310DP is a dual-processor Mini-ITX board that sometimes likes to panic under heavy
load. As pr5205
notes, the problem goes away when entropy collection from the CPU's hardware RNG is
disabled. Use of the AES hardware does not trigger this panic. Both require SSE, but the
former uses SSE in a callout whereas the latter does so in a kernel thread. More troubling,
the entropy collection callout enables SSE by directly manipulating the “Emulation” and
“Task Switched” bits of CR0.
This patch moves the entropy collection into a kernel thread (without any CR0
twiddling): pr5205_20070215.diff.gz
An alternative solution can be developed based on a theory about the cause of this
problem: Let's say that a process that uses the FPU has just run on CPU0 and is about to be
run on CPU1. The FPU state for the process is still living on CPU0, so CPU1 sends an IPI to
CPU0 to ask it to flush the FPU state to the appropriate PCB. Unfortunately, CPU0 is right
in the middle of its entropy collection loop—which means that it has already sampled CR0.
The npxdna_xmm() function gets called, clearing TS (through clts()). The entropy polling
resumes, and when it completes, it sets CR0 back to what it was before npxdna_xmm() cleared
it. At this point “TS” is set when it should not be set. Exactly how this leads to the
ensuing panic is beyond my understanding of the deferred FPU handling…
At any rate, here's a minimal diff that blocks IPIs while the entropy polling is
running: pr5205_minimal_20070809.diff.gz
(this was checked in to the OpenBSD tree on Aug
14.)
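The suspected lost update can be reproduced in a few lines of user-space C. This is a toy model, not kernel code: the globals `cr0` and `ipi_pending` and the functions `fpu_flush_ipi()`, `poll_entropy()` and `run_scenario()` are hypothetical names, and the "IPI" is just a flag checked at one point during the poll. The only hard facts it relies on are the CR0 bit positions (EM is bit 2, TS is bit 3). It models the ordering described above and says nothing about why a stale TS then leads to a panic.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR0 bits from the x86 architecture (bit 2 = EM, bit 3 = TS). */
#define CR0_EM 0x00000004u  /* Emulation */
#define CR0_TS 0x00000008u  /* Task Switched */

/* Toy model of CPU0's state; hypothetical names, not OpenBSD code. */
static uint32_t cr0;       /* simulated CR0 register */
static bool ipi_pending;   /* an "FPU flush" IPI has been sent to CPU0 */

/* Stand-in for the IPI path, which ends up calling clts(). */
static void fpu_flush_ipi(void)
{
    cr0 &= ~CR0_TS;
    ipi_pending = false;
}

/*
 * Stand-in for the entropy-collection callout: sample CR0, clear EM and
 * TS so the PadLock RNG can be used, poll, then restore the sample.
 * With block_ipis false the pending IPI is serviced mid-poll; with
 * block_ipis true it is held until CR0 has been restored.
 */
static void poll_entropy(bool block_ipis)
{
    uint32_t saved = cr0;            /* CR0 sampled before polling */

    cr0 &= ~(CR0_EM | CR0_TS);       /* "enable SSE" by direct CR0 twiddling */
    if (!block_ipis && ipi_pending)
        fpu_flush_ipi();             /* IPI lands in the middle of the loop */
    cr0 = saved;                     /* restores TS from the stale sample */

    if (ipi_pending)
        fpu_flush_ipi();             /* deferred IPI runs after the restore */
}

/* TS starts set and one IPI is pending; returns CR0 after the callout. */
uint32_t run_scenario(bool block_ipis)
{
    cr0 = CR0_TS;
    ipi_pending = true;
    poll_entropy(block_ipis);
    return cr0;
}
```

In this model `run_scenario(false)` leaves TS set (the clts() is overwritten by the restore), while `run_scenario(true)` leaves it clear, which is the effect the minimal diff is meant to achieve by holding IPIs off during the poll.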
The minimal diff makes the assumption that while the VIA PadLock hardware uses the SSE
datapath, it does not change the state of any of the SSE registers. The documentation seems
a bit unclear on this point. If this assumption is not correct, then programs that use any
of the FPU and/or SSE state that PadLock touches may occasionally find their math going
wrong. The kernel thread solution should avoid this issue, since a kernel thread is a
process and the kernel already knows how to deal with processes that use SSE. The crypto
acceleration is safe since it always runs in the context of the kernel's crypto thread.
The minimal diff does avoid the panics described in pr5205.
For reference, this is what I'm using in
production.
kernel rnd
The OpenBSD kernel has a single arc4-based generator that supplies almost all of the
kernel's random numbers (from /dev/arandom and the seeding of userland's arc4-based
generator to the selection of ephemeral network ports).
The entropy pool is derived from the Linux random driver written by Theodore Ts'o.
(Perhaps a mixing function based on SFMT could inspire
some changes?)
The paper Analysis of the Linux
Random Number Generator by Zvi Gutterman and Benny Pinkas may also be applicable
to OpenBSD's RNG given its common heritage. Barak and Halevi share their thoughts about an
alternative architecture in An
Architecture for Robust Pseudo-Random Generation and Applications to
/dev/random.
There are some possible races in the kernel rnd driver (which supports /dev/Xrandom).
This should help.
See this tech@
thread for details.
- Feb 18
- Yet again, I've regenerated the diff against -current.
- Nov 9
- Once again, I've regenerated the diff against -current.
- June 22
- I've regenerated the diff against -current and fixed two bugs in the old patch.
rnd_20080218.diff.gz
Enhanced VIA C3/C7 RNG Support
The VIA C3/C7 processors have internal random number generators. There is already support
in OpenBSD for harvesting the entropy from these generators, but here are some improvements.
The current code polls the generator until it has delivered the requested number of
bytes. This can take a while (some experiments suggest that a 1GHz C3 spends ~5% of the
total available CPU time polling the RNG). If someone decides to shut off the RNG—or if it
fails—the polling may never terminate without user intervention.
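The diff itself isn't reproduced here, but one general way to keep such a poll from running forever is to give up after a run of empty reads. The sketch below assumes a hypothetical `rng_read_fn` interface that reports how many bytes the generator had ready (0 if none); `rng_poll_bounded()`, `dead_rng()` and `counter_rng()` are illustrative names, and the last two are stand-ins for real hardware rather than anything in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source: copy up to max ready bytes, return count (0 if none). */
typedef size_t (*rng_read_fn)(void *ctx, uint8_t *dst, size_t max);

/*
 * Collect up to len bytes, but stop after max_idle consecutive empty
 * polls.  Returns the number of bytes gathered, so a disabled or failed
 * generator costs a bounded number of polls instead of hanging the caller.
 */
size_t rng_poll_bounded(rng_read_fn source, void *ctx, uint8_t *buf,
                        size_t len, unsigned max_idle)
{
    size_t got = 0;
    unsigned idle = 0;

    while (got < len && idle < max_idle) {
        size_t n = source(ctx, buf + got, len - got);
        if (n == 0) {
            idle++;              /* nothing ready this time */
        } else {
            idle = 0;            /* progress resets the give-up counter */
            got += n;
        }
    }
    return got;
}

/* Stand-in for a generator that has been switched off: never has data. */
size_t dead_rng(void *ctx, uint8_t *dst, size_t max)
{
    (void)ctx; (void)dst; (void)max;
    return 0;
}

/* Stand-in for a working generator: one byte per call from a counter. */
size_t counter_rng(void *ctx, uint8_t *dst, size_t max)
{
    unsigned *next = ctx;

    if (max == 0)
        return 0;
    *dst = (uint8_t)(*next)++;
    return 1;
}
```

With `dead_rng` the call returns 0 after `max_idle` polls instead of spinning; a caller can then log the failure and carry on with whatever entropy it has.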
Later versions of the core add a second entropy source. The diff makes sure that both
sources are enabled. On the single C7 where it has been tested, the whitened output rate
increased from ~21Mbit/s to ~35Mbit/s (WARNING: it turns out those numbers were collected
with apmd -aC running, so the CPU clock was changing).
The VIA PadLock Developer Center page has a link to the
VIA C5J programming guide for Assembler.
The VIA PadLock
Security Engine page has a link to
VIA C5J PadLock Security Engine and to an evaluation
whitepaper.
- Feb 18
- Yet again, I've regenerated the patch against -current.
- Nov 9
- Once again, I've regenerated the patch against -current.
- June 22
- I've regenerated the patch against -current.
viarng_20080218.diff.gz (This
requires the kern rnd patch.)
rndstats Monitor
Here's a utility to monitor the kernel's random number generator/entropy pool (the rnd
device).
The kernel entropy pool collects statistics about where bits come from, how much is
exported, and so on. This can be monitored through “sysctl kern.random”, which provides a
great deal of information in a single line:
kern.random=102543780 537536 0 220972 6 1288 0 0 0 0 0 0 3337411 114221 1447 755 8806 108 162 271 363 494 732 1070 1478 1747 2012 2401 3599 3990 3576 4570 3811 2329 1100 733 412 288 156 66 17 12 0 1 0 0 3 3293104 3293104 8516 0 10570 3055 22166 0 0 102086224 0 0 140963 31278 286917 0 0
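That line is easier to read once the fields have names. The layout below is inferred by matching it against the rndstats output further down (and, as far as I can tell, follows the struct rndstats of that era): six counters, five unused slots, five queue counters, a 32-bucket entropy histogram, then call and bit counts for eight sources. The enum names and `parse_kern_random()` are hypothetical. A handy cross-check: the 31-bit histogram bucket times 31 equals the "true" source's bit count (3293104 × 31 = 102086224).

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RND_SRC_NUM 8  /* true, timer, mouse, tty, disk, net, audio, video */

/* Field positions in "sysctl kern.random" (inferred layout, see above). */
enum {
    RS_TOTAL, RS_USED, RS_READS,
    RS_ARC4_READS, RS_ARC4_NSTIRS, RS_ARC4_STIRS,
    RS_PAD0,                          /* five unused slots */
    RS_WAITS = RS_PAD0 + 5,
    RS_ENQS, RS_DEQS, RS_DROPS, RS_DROPLE,
    RS_ED0,                           /* histogram: calls crediting 0..31 bits */
    RS_SC0 = RS_ED0 + 32,             /* per-source call counts */
    RS_SB0 = RS_SC0 + RND_SRC_NUM,    /* per-source bit counts */
    RS_NFIELDS = RS_SB0 + RND_SRC_NUM /* 64 fields in total */
};

/*
 * Parse "kern.random=N N N ..." (or just the numbers, as printed by
 * "sysctl -n kern.random") into out[].  Returns 0 on success, -1 if a
 * field is missing or malformed or if anything but whitespace is left over.
 */
int parse_kern_random(const char *s, uint64_t out[RS_NFIELDS])
{
    const char *eq = strchr(s, '=');

    if (eq != NULL)
        s = eq + 1;                   /* skip the "kern.random=" prefix */
    for (int i = 0; i < RS_NFIELDS; i++) {
        char *end;

        errno = 0;
        unsigned long long v = strtoull(s, &end, 10);
        if (end == s || errno == ERANGE)
            return -1;                /* missing or out-of-range field */
        out[i] = v;
        s = end;
    }
    while (isspace((unsigned char)*s))
        s++;
    return *s == '\0' ? 0 : -1;
}
```

After a successful parse, `out[RS_SC0 + 5]` is the number of calls from the net source and `out[RS_SB0 + 5]` the bits it was credited with.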
rndstats provides the same information in a slightly less compact format and will also
display the rate of change for the various counters. For example, here's “rndstats -vw 5”
on a 1.2GHz C7 (with apmd -aC running):
total = 113208394 bits (0.113208 Gbit 31.7348 kbit/s)
used = 539520 strong bits (0 bits/s)
reads = 0 calls (0 calls/s)
ARC4
reads = 226124 bytes (37.6 bytes/s)
nstirs = 7 calls (0 calls/s)
stirs = 1544 bits used (0 bits/s)
Queue:
waits = 0 (0 waits/s)
enqs = 3683621 calls (1026 calls/s)
deqs = 125122 calls (33.2 calls/s)
drops = 1495 (0 drops/s)
drople = 846 (0 droples/s)
Sources:
true = 3635512 calls, 112700872 bits (31 bits/call 1024.8 calls/s)
timer = 8548 calls, 0 bits
mouse = 0 calls, 0 bits
tty = 11817 calls, 157519 bits
disk = 3129 calls, 32346 bits (13.5 bits/call 0.4 calls/s)
net = 24615 calls, 318422 bits (15.25 bits/call 0.8 calls/s)
audio = 0 calls, 0 bits
video = 0 calls, 0 bits
Entropy Histogram:
0 bits = 8924 calls (0 calls/s 0 bits/s)
1 bits = 121 calls (0 calls/s 0 bits/s)
2 bits = 177 calls (0 calls/s 0 bits/s)
3 bits = 301 calls (0 calls/s 0 bits/s)
4 bits = 394 calls (0 calls/s 0 bits/s)
5 bits = 546 calls (0 calls/s 0 bits/s)
6 bits = 810 calls (0 calls/s 0 bits/s)
7 bits = 1176 calls (0 calls/s 0 bits/s)
8 bits = 1623 calls (0.2 calls/s 1.6 bits/s)
9 bits = 1891 calls (0 calls/s 0 bits/s)
10 bits = 2163 calls (0 calls/s 0 bits/s)
11 bits = 2584 calls (0 calls/s 0 bits/s)
12 bits = 3887 calls (0.4 calls/s 4.8 bits/s)
13 bits = 4337 calls (0 calls/s 0 bits/s)
14 bits = 3994 calls (0 calls/s 0 bits/s)
15 bits = 5147 calls (0 calls/s 0 bits/s)
16 bits = 4328 calls (0 calls/s 0 bits/s)
17 bits = 2596 calls (0.2 calls/s 3.4 bits/s)
18 bits = 1252 calls (0 calls/s 0 bits/s)
19 bits = 812 calls (0.2 calls/s 3.8 bits/s)
20 bits = 457 calls (0.2 calls/s 4 bits/s)
21 bits = 312 calls (0 calls/s 0 bits/s)
22 bits = 169 calls (0 calls/s 0 bits/s)
23 bits = 72 calls (0 calls/s 0 bits/s)
24 bits = 19 calls (0 calls/s 0 bits/s)
25 bits = 13 calls (0 calls/s 0 bits/s)
26 bits = 0 calls (0 calls/s 0 bits/s)
27 bits = 1 calls (0 calls/s 0 bits/s)
28 bits = 0 calls (0 calls/s 0 bits/s)
29 bits = 0 calls (0 calls/s 0 bits/s)
30 bits = 3 calls (0 calls/s 0 bits/s)
31 bits = 3635512 calls (1024.8 calls/s 31768.8 bits/s)
Perhaps one should take those entropy estimates with a grain of salt (and perhaps that
grain is large enough to make for a respectable bench press).
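The figures in parentheses appear to be computed from the change between two samples rather than from the running totals. The cumulative disk ratio is 32346 / 3129 ≈ 10.3 bits/call, yet the display shows 13.5, which is consistent with 27 new bits over 2 new calls in the 5-second window (0.4 calls/s); the histogram rows with nonzero rates add up to the same 6 calls and 88 bits as the disk and net lines. A minimal sketch of that arithmetic, assuming two samples taken `secs` seconds apart (the function names are hypothetical, not taken from rndstats):

```c
#include <stdint.h>

/* Events per second for a monotonically increasing counter sampled secs apart. */
double counter_rate(uint64_t prev, uint64_t cur, double secs)
{
    return secs > 0.0 ? (double)(cur - prev) / secs : 0.0;
}

/* Average entropy credited per call over the interval; 0 if there were no calls. */
double bits_per_call(uint64_t calls_prev, uint64_t calls_cur,
                     uint64_t bits_prev, uint64_t bits_cur)
{
    uint64_t calls = calls_cur - calls_prev;

    return calls ? (double)(bits_cur - bits_prev) / (double)calls : 0.0;
}
```

Guarding the zero-call case matters here because most sources (mouse, audio, video) produce nothing during a typical interval.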
rndstats_20070128.tgz
C7 Random Number Generator
The C7 has a built-in SHA256 engine in addition to a hardware random number generator. This
utility uses the two together to produce what should be a pretty solid random stream on
stdout. Each 256-bit output block is generated from 448 bits produced by the hardware RNG.
Enabling the “paranoid” mode (“-p” option) XORs the output of the hash with a fresh set of
RNG bits and iterates the whole generation process N times before generating an output
block (where N is the argument to the “-p” option).
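The entropy budget in that description can be written out. Assuming the normal mode hashes one 448-bit RNG sample $R$ into one 256-bit block $B$:

$$B = \mathrm{SHA256}(R), \qquad \frac{|R|}{|B|} = \frac{448}{256} = 1.75 \text{ raw bits per output bit}$$

For paranoid mode the wording allows more than one reading. If each of the $N$ rounds draws a fresh 448-bit sample for the hash and a fresh 256-bit sample for the XOR (an assumption, not something the description spells out), one block consumes

$$N \times (448 + 256) = 704N \text{ raw bits}, \qquad N = 4:\quad 2816 \text{ raw bits} = 11 \text{ raw bits per output bit}$$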
There's a new version that adds a “-N” option to produce random output that is hopefully
consistent with the “RBG” of NIST SP 800-90 Appendix D, using the C3/C7's entropy source
and an AES-256-based CTR_DRBG.
Here's a neat trick: since it isn't such a great idea to stall the kernel for long
periods while seeding the entropy pool, we move the heavy lifting into userland. For use
when the system is starting, build a static c7random binary (uncomment the LDSTATIC line
in the Makefile, then “make clean ; make”) and copy the resulting binary to the root
filesystem (e.g., /bin). Then edit /etc/rc to add two invocations of
/bin/c7random -p 4 -s 16384, one before the host.random file is read and one after
host.random is rewritten. It should look something like:
mount -s /usr >/dev/null 2>&1
mount -s /var >/dev/null 2>&1
/bin/c7random -p 4 -s 16384 > /dev/urandom
# if there's no /var/db/host.random, make one through /dev/urandom
if [ ! -f /var/db/host.random ]; then
dd if=/dev/urandom of=/var/db/host.random bs=1024 count=64 \
>/dev/null 2>&1
chmod 600 /var/db/host.random >/dev/null 2>&1
else
dd if=/var/db/host.random of=/dev/urandom bs=1024 count=64 \
> /dev/null 2>&1
dd if=/var/db/host.random of=/dev/arandom bs=1024 count=64 \
> /dev/null 2>&1
fi
# reset seed file, so that if a shutdown-less reboot occurs,
# the next seed is not a repeat
dd if=/dev/urandom of=/var/db/host.random bs=1024 count=64 \
> /dev/null 2>&1
/bin/c7random -p 4 -s 16384 > /dev/urandom
# clean up left-over files
Finally, to get the kernel pool reseeded every ten minutes, add
*/10 * * * * /bin/c7random -p 4 -s 8192 > /dev/urandom
to root's crontab. The command shouldn't take more than 100ms to run, so one could run it
more often. Increasing the size of the write would not be useful, as the kernel pool is only
4096 bytes (the kernel processes the data as it adds it to the pool, so a 4096-byte write
may not be enough to reach everything; hence the 8192-byte write). The point is to
completely reseed the entropy pool to make life more difficult for someone who has partial
knowledge of the pool contents. Small incremental writes to the pool are easier for the
attacker to guess and track (if they can get output between the updates).
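Why small increments are weaker is the standard catastrophic-reseeding argument (the reasoning behind the Yarrow and Fortuna designs). Suppose the attacker knows the pool state and sees output after every update. If $nk$ fresh bits arrive as $n$ separate $k$-bit updates, each update can be brute-forced on its own ($2^k$ guesses, each checked against the next output), so staying synchronized costs about $n \cdot 2^k$ guesses; a single $nk$-bit reseed forces a guess over all of the bits at once:

$$n \cdot 2^{k} \ \ll\ 2^{nk}, \qquad n = 64,\ k = 2:\quad 64 \cdot 2^{2} = 2^{8} \ \text{ versus } \ 2^{128}$$

The values of $n$ and $k$ are purely illustrative, not measurements of the OpenBSD pool.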
In reality, the kernel's normal C7 RNG polling will refill the entire entropy pool
once every two seconds or so… Then again, perhaps that means the pool needs to be
larger?
Come to think of it, an entropy-deprived host could use the C7 as a remote entropy
source (and c7random doesn't need any particular privileges):
$ ssh randomsource.host /bin/c7random -p 5 -s 32 | hexdump
0000000 1643 bef9 fedb a1b5 f5ab 8230 45f9 e8e8
0000010 fdc5 ca50 a4f3 8e88 39cc c5dd 3011 ae5f
0000020
Obviously, that is only as good as the link security. It could be useful for seeding a
server running SSL over a local LAN—as long as the web server is sensible enough to grab
new entropy every once in a while.
c7random doesn't check what kind of CPU it is running on. To use it on a CPU with the RNG
but without SHA256, use the “-x” option.
For comparison, here are 1MB runs from c7random and from /dev/urandom on the same box
(1.2GHz C7):
$ /bin/c7random | dd of=/dev/null bs=1k count=1k
1024+0 records in
1024+0 records out
1048576 bytes transferred in 0.429 secs (2443960 bytes/sec)
$ dd if=/dev/urandom of=/dev/null bs=1k count=1k
1024+0 records in
1024+0 records out
1048576 bytes transferred in 10.551 secs (99380 bytes/sec)
The former consumes more high-quality entropy bits than it generates output bits (so it
should provide solid prediction resistance) at ~2.4Mbyte/s, whereas the latter produces
~100kbyte/s from roughly 31kbit/s of entropy input (almost all of it from the C7's RNG).
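Those claims can be checked with a little arithmetic, assuming c7random's default mode draws 448 raw bits per 256 output bits (as described above) and taking the entropy input from the ≈31.7 kbit/s total that rndstats reported on a 1.2GHz C7:

$$2443960\ \text{B/s} \times 8 \times \frac{448}{256} \approx 34.2\ \text{Mbit/s of raw RNG input for} \approx 19.6\ \text{Mbit/s of output}$$

$$\frac{99380\ \text{B/s} \times 8}{31.7\ \text{kbit/s}} = \frac{795\ \text{kbit/s}}{31.7\ \text{kbit/s}} \approx 25\ \text{output bits per bit of entropy input}$$

The first figure is close to the ≈36 Mbit/s raw rate that via_rng measures for a 1.2GHz C7 with both sources enabled, which suggests c7random is running near the hardware's limit.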
Note that this does not require any of the above kernel changes.
The c7random code is now part of the NIST SP 800-90 CTR_DRBG distribution.
C3/C7 Random Number Benchmark (hack)
The C3/C7's internal random number generator has a few control and status bits that can
sometimes be of interest. This little hack displays them as well as an estimate of the
generator's output rate. The output of via_rng for a 1.2GHz C7 box looks like this:
CentaurHauls Type=0 Family=6 Model=10 Stepping=9
VIA Esther processor 1200MHz
RNG MSR 0x11b: 0x00000248 ( ENBL RNG-BOTH BIAS=0 )
Raw rate:
3.6 s total time
1.7795 us per 8 byte iteration
35.9652 Mbit/s
Kernel polling:
2.09 s total time
103.419 us per 64-byte iteration
1.03419% CPU
And here's the same box with only one of the RNG sources enabled:
CentaurHauls Type=0 Family=6 Model=10 Stepping=9
VIA Esther processor 1200MHz
RNG MSR 0x11b: 0x00000048 ( ENBL RNG-A BIAS=0 )
Raw rate:
6.28 s total time
3.07814 us per 8 byte iteration
20.7918 Mbit/s
Kernel polling:
3.75 s total time
184.049 us per 64-byte iteration
1.84049% CPU
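The numbers are internally consistent. Each raw iteration returns 8 bytes (64 bits), and the CPU figures match one 64-byte polling iteration every 10 ms, i.e. a 100 Hz tick (that interval is inferred from the output, not stated by the tool):

$$\frac{64\ \text{bit}}{1.7795\ \mu\text{s}} \approx 35.97\ \text{Mbit/s}, \qquad \frac{64\ \text{bit}}{3.07814\ \mu\text{s}} \approx 20.79\ \text{Mbit/s}$$

$$\frac{103.419\ \mu\text{s}}{10\ \text{ms}} \approx 1.034\%, \qquad \frac{184.049\ \mu\text{s}}{10\ \text{ms}} \approx 1.840\%$$

So enabling both sources cuts the per-iteration polling cost by roughly 44%.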
via_rng_20070128.tgz