5.2. Histograms

5.2.1. Visualization using histograms

We can visualize a set of random numbers by counting their frequencies and plotting a histogram. The function below counts how many times each integer from 1 to nbr_outcomes appears in sequence:

function count_histogram(nbr_outcomes, sequence)
    count = zeros(nbr_outcomes)     # count[k] = number of times k appears
    for x in sequence
        count[x] += 1               # x must be an integer in 1:nbr_outcomes
    end
    count
end
count_histogram (generic function with 1 method)
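As a quick sanity check, the function can be applied to a short hand-made sequence. The sketch below repeats the definition so that it runs on its own; the example data is hypothetical and chosen only for illustration.

```julia
# Same counting logic as above, repeated so this snippet is self-contained.
function count_histogram(nbr_outcomes, sequence)
    count = zeros(nbr_outcomes)
    for x in sequence
        count[x] += 1
    end
    count
end

sequence = [1, 3, 3, 2, 3]                  # hypothetical example data
counts = count_histogram(3, sequence)
println(counts)                             # [1.0, 1.0, 3.0]
println(sum(counts) == length(sequence))    # true
```

Note that the counts are stored as floating-point numbers, since zeros(nbr_outcomes) creates a Float64 array, and that they always add up to the length of the sequence.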

We can now, for example, estimate and visualize the probability of each outcome 1, 2, …, 6 when rolling a fair die. The function below simulates rolling a die \(n\) times (ntrials in the code) by drawing \(n\) random integers between 1 and 6. It then uses the count_histogram function above to count the frequency of each outcome, and divides by \(n\) to estimate the probabilities:

using PyPlot

function simulate_die(ntrials)
    outcomes = collect(1:6)     # Simulate a fair die
    x = rand(outcomes, ntrials);
    bar(outcomes, count_histogram(6, x) / ntrials)
    xlabel("Die outcome")
    ylabel("Probability")
end

simulate_die(1000);
[Figure: bar chart of the estimated probability of each die outcome, 1000 rolls]
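For a fair die, each estimated probability should be close to 1/6, and the estimate should improve as ntrials grows. The sketch below checks this numerically without plotting. The helper name estimate_die_probabilities and the seed are assumptions made for this illustration and are not part of the code above.

```julia
using Random

# Hypothetical helper: estimated probability of each face from ntrials rolls.
function estimate_die_probabilities(ntrials)
    x = rand(1:6, ntrials)
    [count(==(k), x) / ntrials for k in 1:6]
end

Random.seed!(1)     # arbitrary seed, only so that reruns give the same numbers
for ntrials in (100, 10_000, 1_000_000)
    p = estimate_die_probabilities(ntrials)
    maxerr = maximum(abs.(p .- 1/6))
    println("ntrials = ", ntrials, ": largest deviation from 1/6 is ", maxerr)
end
```

The deviation is itself random, but it typically shrinks roughly in proportion to \(1/\sqrt{n}\), so 100 times more rolls give about one more correct digit.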

As a generalization, we can simulate rolling \(n\) dice (ndice in the code) and adding up all the outcomes, repeating this experiment ntrials times to estimate the distribution of the sum:

function simulate_sum_of_n_dice(ntrials, ndice)
    outcomes = collect(1:6)     # Simulate fair dice
    x = zeros(Int64, ntrials)
    for i = 1:ndice
        x .+= rand(outcomes, ntrials)
    end
    outcomesn = collect(1:6ndice)
    bar(outcomesn, count_histogram(6ndice, x) / ntrials)
    xlabel("Sum of n dice outcome")
    ylabel("Probability")
end

simulate_sum_of_n_dice(1000, 2);    # Two dice
[Figure: bar chart of the estimated probabilities of the sum of two dice, 1000 trials]
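For two dice the exact distribution is easy to compute, which gives something to compare the simulated bars against. The sketch below enumerates all 36 equally likely pairs; the variable names ways and exact are chosen here for illustration.

```julia
# Count how many of the 36 equally likely pairs (a, b) give each sum.
ways = zeros(Int, 12)
for a in 1:6, b in 1:6
    ways[a + b] += 1
end
exact = ways / 36           # exact probability of each sum 1..12

println(ways[2:12])         # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
println(exact[7])           # 0.16666666666666666
```

The triangular shape, peaking at a sum of 7 with probability 1/6, is what the simulated histogram should approximate. A sum of 1 is impossible, so the first bar is always zero.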

The famous central limit theorem states that the distribution of this sum, suitably centered and scaled, approaches a normal distribution as the number of dice in the sum increases:

simulate_sum_of_n_dice(10000, 50);
[Figure: bar chart of the estimated probabilities of the sum of 50 dice, 10000 trials]
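One way to make the comparison with the normal distribution concrete is to check the mean and spread of the simulated sums. A single fair die has mean 3.5 and variance 35/12, so the sum of \(n\) independent dice has mean \(3.5n\) and variance \(35n/12\). The sketch below is a plotting-free variant of the simulation; the function name sum_of_n_dice and the seed are assumptions made for this illustration.

```julia
using Random, Statistics

# Sum of ndice fair dice, repeated ntrials times (no plotting).
function sum_of_n_dice(ntrials, ndice)
    x = zeros(Int, ntrials)
    for i in 1:ndice
        x .+= rand(1:6, ntrials)
    end
    x
end

Random.seed!(1)                     # arbitrary seed, for reproducibility only
ndice = 50
x = sum_of_n_dice(10000, ndice)

mu = 3.5 * ndice                    # theoretical mean of the sum
sigma = sqrt(35 / 12 * ndice)       # theoretical standard deviation of the sum
println("sample mean = ", mean(x), ", theory = ", mu)
println("sample std  = ", std(x), ", theory = ", sigma)

# Fraction of sums within one standard deviation of the mean.
within = count(v -> abs(v - mu) <= sigma, x) / length(x)
println("fraction within one sigma = ", within)
```

For a normal distribution about 68% of the values lie within one standard deviation of the mean. The simulated fraction should be in that neighborhood, although it will not match exactly because the sum only takes integer values.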

5.2.2. General histogram into bins

PyPlot also provides a histogram function, hist, which sorts general (not necessarily integer) data into a chosen number of “bins”. For example, the code below creates a histogram of 10000 random numbers from the standard normal distribution by counting how many fall inside each of 50 equal-width bins:

x = randn(10000)
nbins = 50

plt.hist(x, nbins)
xlabel("Random variable x")
ylabel("Count");
[Figure: histogram of 10000 normally distributed random numbers in 50 bins; axes: Random variable x, Count]
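To see what counting into equal-width bins involves, the sketch below does the binning by hand. It assumes the bins span the range from the smallest to the largest data value and that the largest value is placed in the last bin; the function name bin_counts is hypothetical, and this is an illustration of the idea rather than the plotting library's actual implementation.

```julia
using Random

# Hypothetical helper: count the values of x in nbins equal-width bins
# between minimum(x) and maximum(x). Assumes x has at least two distinct values.
function bin_counts(x, nbins)
    lo, hi = extrema(x)
    width = (hi - lo) / nbins
    counts = zeros(Int, nbins)
    for v in x
        # Bin index in 1:nbins; the maximum value goes into the last bin.
        k = min(floor(Int, (v - lo) / width) + 1, nbins)
        counts[k] += 1
    end
    edges = range(lo, stop=hi, length=nbins + 1)
    counts, edges
end

Random.seed!(1)                     # arbitrary seed, for reproducibility only
x = randn(10000)
counts, edges = bin_counts(x, 50)
println(sum(counts))                # 10000
println(length(edges))              # 51
```

Every value lands in exactly one bin, so the counts add up to the number of samples, and 50 bins are delimited by 51 edges.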