Summary
- Create an rng object with np.random.default_rng(). You can seed it for reproducible results.
- You can draw samples from probability distributions, including the binomial and normal distributions.
- You can shuffle arrays in place with rng.shuffle().
Whether you’re simulating probability distributions or just want a random number, it’s easy to do with Python’s NumPy library.
Creating the random number generator
To be able to generate random numbers with NumPy, you need to create the random number generator. You can do this after importing the library with a simple command:
import numpy as np
rng = np.random.default_rng()
Don’t forget to put the parentheses at the end. This creates a random number generator object. You can also seed the generator, which makes the sequence of random numbers it produces repeatable. If you don’t supply a seed, NumPy will use your operating system’s default source of randomness instead.
To create a seeded random number generator object with the number 42 as the seed:
seeded_rng = np.random.default_rng(42)
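As a quick check of the repeatability described above, here is a minimal sketch that creates two generators with the same seed and compares their output. The names rng_a, rng_b, first, and second are illustrative and not part of the original example.

```python
import numpy as np

# Two generators created with the same seed (42 is an arbitrary choice).
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first = rng_a.random(5)
second = rng_b.random(5)

# The same seed yields the same sequence of numbers.
print(np.array_equal(first, second))  # True

# An unseeded generator draws fresh entropy, so its numbers will differ.
unseeded = np.random.default_rng().random(5)
print(unseeded)
```

Seeding is mostly useful for tests, tutorials, and simulations that you want to rerun with identical results.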
Generating random numbers
To generate a random number from your newly created random number generator, just use the random method:
rng.random()
To create an array of random numbers, supply the length of the array you want. To create an array of 10 random numbers:
rng.random(10)
Since NumPy works on multidimensional arrays, you can use it to build tables of random numbers. Just pass the number of rows and the number of columns as a tuple, separated by a comma. To create a NumPy array with three rows and five columns:
A = rng.random((3,5))
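To see the three call styles side by side, the sketch below prints the shape of each result and checks the range of the values. The variable names single, vector, and table are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng()

single = rng.random()        # one float
vector = rng.random(10)      # 1-D array of 10 values
table = rng.random((3, 5))   # 2-D array: 3 rows, 5 columns

print(vector.shape)  # (10,)
print(table.shape)   # (3, 5)

# Every value lies in the half-open interval [0.0, 1.0).
in_range = bool(table.min() >= 0.0 and table.max() < 1.0)
print(in_range)  # True
```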
Generating random numbers from the binomial distribution
One reason NumPy is such a favorite for data analysis is that it’s easy to generate random numbers for simulation.
You can generate random numbers from specific probability distributions. The binomial distribution is perhaps the best-known discrete distribution. It represents the number of successes in n independent trials that each have the same probability of success.
You can generate random numbers from the binomial distribution with the binomial method.
We can simulate flipping a coin ten times. Since a fair coin has two different sides, heads and tails, we should have a 50% chance of getting either one.
rng.binomial(10,0.5)
This will return the number of successes over 10 coin flips. In this context, “successes” means the number of heads (or tails, whichever side you choose to count) out of 10 coin flips. The number of successes returned by the random number generator will usually be close to five, but it will vary. You might think six or seven successes out of 10 is more than the 50% you might expect for a fair coin. These coin flips are independent, meaning that one coin flip doesn’t affect the other. How would you know if the coin was really fair? This problem shows why it’s hard to tell from small samples. You would have to flip it some more, according to the law of large numbers (as explained in both the “weak” and “strong” versions on Wolfram MathWorld). Let’s try increasing the number of flips. Let’s try 50:
rng.binomial(50,0.5)
The value will fluctuate, but the number of successes will still be close to 25, or half of 50. Let’s try 100:
rng.binomial(100,0.5)
If you keep increasing the number of trials, the proportion of successes will get closer and closer to 50%, the equivalent of five successes out of 10. This illustrates the limiting behavior of probability: the observed proportion approaches the theoretical probability when the number of observations is very large.
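The effect of adding more flips can be sketched by printing the proportion of heads for several trial counts. The counts and the seed below are arbitrary choices for illustration. The seed is there only so the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded only so the run is repeatable

# Proportion of heads for an increasing number of fair coin flips.
proportions = {}
for flips in (10, 100, 10_000, 1_000_000):
    heads = rng.binomial(flips, 0.5)
    proportions[flips] = heads / flips
    print(f"{flips:>9} flips: {proportions[flips]:.4f} heads")
```

The small runs can land well away from 0.5, while the largest run should sit very close to it.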
You can also create arrays of binomial results. To create an array of the results of 10 trials of 10 coin flips each:
a = rng.binomial(10,0.5,10)
You can plot a histogram with Seaborn:
import seaborn as sns
sns.set_theme()
sns.displot(x=a)
It won’t look like much, but if you keep increasing the number of observations and plot the histograms of them, you’ll notice that the distribution of successes looks more like the normal distribution, with the famous bell-shaped curve.
For example, try 100 observations:
b = rng.binomial(10,.5,100)
sns.displot(x=b)
And 1000:
c = rng.binomial(10,.5,1000)
sns.displot(x=c)
The bell shape is an example of the Central Limit Theorem (also explained by Wolfram MathWorld): each value is the sum of 10 independent coin flips. If you look at the means of these arrays, you’ll notice that they converge on 5 as the arrays get larger, which means that the proportion of successes matches the theoretical 50% chance of getting heads:
a.mean()
b.mean()
c.mean()
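If you would rather check the shape numerically than by eye, one option is to compare the observed share of each outcome with the theoretical binomial probabilities. This is a sketch under assumptions not in the text above: the sample size of 100,000, the seed, and the names observed and expected are all illustrative.

```python
import math

import numpy as np

rng = np.random.default_rng(42)  # seeded only so the run is repeatable

n, p = 10, 0.5
samples = rng.binomial(n, p, 100_000)

# Share of samples that produced each possible count of heads (0 through 10).
observed = np.bincount(samples, minlength=n + 1) / samples.size

# Theoretical binomial probabilities: C(n, k) * p^k * (1 - p)^(n - k).
expected = np.array(
    [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
)

for k in range(n + 1):
    print(f"{k:>2} heads: observed {observed[k]:.3f}, expected {expected[k]:.3f}")

print(samples.mean())  # close to n * p = 5
```

With this many samples, the observed and expected columns should agree to about two decimal places.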
Random numbers from the normal distribution
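Besides watching binomial samples approximate the normal distribution, you can generate random numbers from the normal distribution directly with the standard_normal method, which draws from the standard normal distribution (mean 0, standard deviation 1).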
To take a single random number from the normal distribution:
rng.standard_normal()
This will print a random number if you’re in an interactive session. You can also generate arrays of values, as with the random method. For example, to get an array of 10 normally distributed numbers:
a = rng.standard_normal(10)
To check that the values really come from the normal distribution, you can generate a larger array and plot a histogram of it. Let’s try 100 numbers:
b = rng.standard_normal(100)
sns.displot(x=b)
That looks closer to the normal curve. Let’s try 1000:
c = rng.standard_normal(1000)
sns.displot(x=c)
You can also create an array with a specific mean and standard deviation by multiplying the array by the standard deviation you want and then adding the mean. To create a normally distributed array of 100 numbers with a mean of 4 and a standard deviation of 2:
a = 4 + 2 * rng.standard_normal(100)
You can check this by taking the mean and standard deviation of this sample.
a.mean()
a.std()
The results will be close to the values we chose.
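As an alternative to shifting and scaling by hand, the generator also has a normal method that takes the mean (loc) and standard deviation (scale) as arguments. The sketch below runs both approaches. The sample size of 100,000, the seed, and the variable names are choices made here so that the sample statistics land close to the targets.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded only so the run is repeatable

mean, std_dev = 4, 2

# Approach from the text: shift and scale standard normal draws.
scaled = mean + std_dev * rng.standard_normal(100_000)

# Alternative: let NumPy apply the mean (loc) and standard deviation (scale).
direct = rng.normal(loc=mean, scale=std_dev, size=100_000)

print(scaled.mean(), scaled.std())
print(direct.mean(), direct.std())
```

Both lines of output should be near 4 and 2. With only 100 numbers, as in the example above, expect noticeably more drift.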
Shuffling an array
You can use a NumPy random number generator to shuffle an array or a Python list in place. For example, to shuffle a list of fruit names:
fruits = ['apples','oranges','bananas','grapefruits']
rng.shuffle(fruits)
When you examine the list again, you’ll notice that the order of items is different (with only four items, it will occasionally come back in the same order).
You could use this function to randomize items if you were developing a small game. You could represent a deck of cards as a Python list and have it shuffled automatically.
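The card deck idea might look something like the sketch below. The card naming scheme and the five-card hand are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng()

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]

# Build a 52-card deck as a plain Python list of strings.
deck = [f"{rank} of {suit}" for suit in suits for rank in ranks]

# shuffle() reorders the list in place and returns None.
rng.shuffle(deck)

# Deal the top five cards.
hand = deck[:5]
print(hand)
```

Because the shuffle happens in place, keep a copy of the original list if you need the sorted order later.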
With NumPy, you can add a little randomness to your Python programs. It’s easy to generate random numbers by creating a random number generator with NumPy.