Python Pseudorandom Bytes

A new algorithm for customizable, fast pseudorandom bytes that passes the tests with flying colors.

Photo by Ryunosuke Kikuno on Unsplash

Randomness is too important to leave to chance. Donald Knuth said that, and it’s a good mantra for programmers to adhere to. Generating pseudorandom numbers is a notoriously error prone challenge, and getting it right is extremely important.

Photo by Brett Jordan on Unsplash

The following very short pseudorandom byte generator Python class is easy to implement, and easy to analyze for any shortcomings, because it’s so simple compared to more complex algorithms. Most importantly, it passes standard tests of randomness, such as the software suite of tests provided by John Walker at Fourmilab in Switzerland.

I’ve implemented the algorithm as a Python class called Prb (Pseudo-random bytes) making it easy to import into your own projects. I’ll present the code first, and then explain how it works, and why it works as well as it does.

# prb.pyclass Prb:
def __init__(self, seed):
unique_seed = seed + "Customize here"
len_unique_seed = len(unique_seed)
self.p, self.q = 0, 0
self.buf = list(range(256))
for i in range(997):
n = ord(unique_seed[i % len_unique_seed])
self.next_byte(n + self.next_byte())
def next_byte(self, b=0):
self.p = (self.buf[self.p] + b + 1) % 256
self.q = (self.buf[self.q] + b + 2) % 256
self.buf[self.p], self.buf[self.q] = self.buf[self.q], self.buf[self.p]
return self.p ^ self.q

Here’s a short example of using the Prb class to generate and print twenty pseudorandom byte numerical values.

import prbseed = "Any unique string"
p = prb.Prb(seed)
for i in range(20):
n = p.next_byte()
print(n, " ", end="")
print("\nDone")# 134 61 24 206 151 152 51 100 177 155 39 225 144 198 13 39 102 141 58 11

The Core Variables

Internally, a buffer of 256 bytes containing the values 0 to 255 is maintained as each byte is generated. Variables p and q also contain values ranging from 0 to 255. The buffer and the variables p and q all act as pointers, or indexes, as well as byte values, a detail that makes this algorithm rather unique. The values in the buffer are never altered, instead their locations are swapped, so the buffer always contains a list of all possible byte values in an ever changing order. The possible unique states of the buffer, or the possible ordering of its bytes, is 256 factorial, a very large number. As a comparison, 64 factorial is a number greater than the number of electrons in the known universe, and 256 factorial is much, much larger.

A Whole Lot of Electrons Out There! — Photo by Luca Baggio on Unsplash

Initialization

The Prb class is initialized by passing a seed string. If even one bit of this string is changed, the output sequence of bytes is instantly and completely changed going forward. The value of p and q are altered with each step by the byte values of the seed string, and also by the byte values at the buffer locations they point to. Once altered to new values in the range 0 to 255, the bytes in the buffer at those locations are swapped. During the warmup phase of the initialization, the seed string is used repeatedly to shuffle the buffer state in a loop that repeats 997 times. The next_byte() method is actually called twice with each iteration, once with a value extracted from the seed, and once without.

Generating the bytes

The next_byte() method returns a byte each time it’s called, and this is the core algorithm in the Prb class. An optional byte (or larger) value can be passed to this method. The values of p and q are modified by adding this optional value with the buffer bytes they each point to, plus either 1 or 2 to force their divergence. The value of p and q are then each brought into the range 0 to 255 using the mod function, and the bytes pointed to in the buffer by their new values are swapped. Finally, the values of p and q are XOR’d together to create the byte returned by the method.

Note that there are 256 possible pairs of bytes that, when XOR’d together will return any given byte value. This adds difficulty to any attempt to predict the state of the generator based on back calculating from the output sequence.

Two Ways To Use the Generator

An initial seed string passed to Prb’s initialization method results in a completely unique generator state every time, even if only one bit is changed in the seed. Because the 256 byte buffer can exist in a huge number of unique states, the sequence of output bytes cannot realistically be used to figure out the state, even with billions of output bytes following the initialization. However, the next_byte() method can be called with an optional parameter. If any value is passed in, the output sequence is instantly and completely changed in an unpredictable way into a new output sequence. This provides a way to use this class as a hash function. After initialization, even using the same starting seed every time, if a stream of unique bytes from a file or other source is fed to the next_byte() method, a unique but repeatable sequence of output bytes will always be the result. Similar to SHA256, this action can be used to generate a statistically unique hash value of any desired length.

Ultimate Unpredictability

The Prb class passed a variety of random number tests. For example, binary files of 100 million bytes each were generated using a variety of seed strings, and the results were tested using John Walker’s ENT.exe test suite. In all cases the various tests for statistical randomness were very good.

However, because of the way this class is constructed, the author found it easy to add greatly to the unpredictability and uniqueness of the output bytes by feeding the output from one Prb instance as input to the next. Here’s a new class, called Prb2 and built in the same prb.py file, that uses two instances of Prb objects.

class Prb2:
def __init__(self, seed):
self.p1 = Prb(seed + "a")
self.p2 = Prb(seed + "b")
def next_byte(self, b=0):
b1 = self.p1.next_byte(b)
b2 = self.p2.next_byte(b1)
return b1 ^ b2

This new class object is used in an identical manner to how the original Prb class is used. The unpredictable pseudorandom bytes created by the first instance of Prb are used to continuously seed the second instance of a Prb object. Yes, this is a strong overkill in generating pseudorandom bytes, but for your own homegrown encryption it might be of value.

Warning! Until tested and approved by cryptographic experts, it is strongly advised that you do not use this algorithm for any significant cryptographic purposes. This algorithm very likely provides extreme security that you can trust for yourself, but use Python’s secrets module for tested and officially approved cryptographic security!

Going Even Further

The Prb object is very fast, involving no multiplications, divisions, or other complicated or repetitive math. The initialization loop is the slowest step, but once completed, billions of bytes can be generated very quickly with no predictable patterns in the output.

As an experiment, I decided to go ahead and repeat this same code pattern by creating yet another object in the prb.py file, this time named Prb4. As you might be guessing, this object creates two unique instances of the Prb2 object, where each of those in turn creates two unique instances of the Prb class. Here’s the resulting code.

class Prb4:
def __init__(self, seed):
self.p1 = Prb2(seed + "c")
self.p2 = Prb2(seed + "d")
def next_byte(self, b=0):
b1 = self.p1.next_byte(b)
b2 = self.p2.next_byte(b1)
return b1 ^ b2

The same code used to test the Prb class can be used to test this new Prb4 class:

import prbseed = "Any unique string"
p = prb.Prb4(seed)
for i in range(20):
n = p.next_byte()
print(n, " ", end="")
print("\nDone")

If you look close, the only change is in the call to instantiate and initialize the object, where prb.Prb4(seed) is called instead of prb.Prb(seed). Otherwise, the code is identical.

Ridiculousness

I couldn’t help myself, so I went ahead and followed the same pattern to create several more class objects, named Prb8, Prb16, Prb32, Prb64, Prb128, and finally Prb256. I won’t put all the code in this article, but here’s the last one in the sequence so you can verify the pattern if you care to recreate any of this code for yourself.

class Prb256:
def __init__(self, seed):
self.p1 = Prb128(seed + "o")
self.p2 = Prb128(seed + "p")
def next_byte(self, b=0):
b1 = self.p1.next_byte(b)
b2 = self.p2.next_byte(b1)
return b1 ^ b2

Yes, that last class ends up creating 256 unique instances of the core Prb class, where the results from each are fed as input to the next. This is way, way, way over the top overkill in creating pseudorandom bytes, but other than a slight noticeable delay during the initialization process of all those objects, the bytes were then still generated very quickly. I include this code just to demonstrate the efficiency of the Prb class itself.

Ridiculousness can be useful! — Photo by William Moreland on Unsplash

Customization

Early on I mentioned that this algorithm is customizable. Take a look at the following line of code in the Prb class.

unique_seed = seed + "Customize here"

You can change the string contents in this line to anything you want. Again, even a one-bit change in one character will result in a completely different output stream of bytes from the object instance. Unlike SHA256, changing this custom string allows the creation of a completely unique hash function. That might be handy in some unique situations.

Thanks for reading, and as always, have fun with Python!

John’s passion and mission is sharing Python code to help demystify life’s challenges and to have fun. John is the author of Python for Numworks , Python for OpenSCAD, and many other titles.

Author, inventor, entrepreneur — passionate about alternate energy, technology, and how Python programming can help you hack your life.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store