This page is devoted to explaining our recent paper describing a model of numerosity perception. After a couple of sections explaining the basic premise, there are a few interactive demos for you to play with.

Noisy representations of number

Without counting, people can't perceive numbers perfectly. Perception is noisy, so there is always uncertainty about how many objects you see in front of you. For instance, if you are shown 5 objects, you might think that you probably saw 5 things, but there's some chance it was 4 or 6. The plot below illustrates this idea, with a distribution over numerical estimates centered at 5 with some variance.
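If you'd like to play with this idea in code, here is a minimal sketch of a noisy estimate distribution. The Gaussian shape and the noise level are purely illustrative assumptions, not the model from the paper:

```python
import numpy as np

def estimate_distribution(n, sigma=1.0, max_k=15):
    """Toy distribution over estimates k for a true numerosity n.

    Assumes (for illustration only) Gaussian noise with standard
    deviation `sigma`, discretized onto the integers 1..max_k.
    """
    ks = np.arange(1, max_k + 1)
    weights = np.exp(-0.5 * ((ks - n) / sigma) ** 2)
    return ks, weights / weights.sum()

ks, probs = estimate_distribution(5)
for k, p in zip(ks, probs):
    print(f"P(estimate = {k:2d} | shown 5) = {p:.3f}")
```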

Two systems?

People who study number have noted that small numbers behave differently when drawn this way. They seem to be represented essentially exactly, so if we drew their distributions, the distribution for each number \(n\) would just be a sharp peak at \(n\), as in the figure below.

But when you get to larger numbers, the distributions are more spread out, and in fact their standard deviation gets larger and larger as the number increases:
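Here is a small sketch contrasting the two regimes. The exact noise law isn't specified here; a common assumption in the numerosity literature is that the standard deviation grows roughly in proportion to the number (a constant Weber fraction), and that is what this illustration uses:

```python
import numpy as np

MAX_K = 30
ks = np.arange(1, MAX_K + 1)

def exact_distribution(n):
    """Small-number regime: all probability mass sits on n itself."""
    probs = np.zeros(MAX_K)
    probs[n - 1] = 1.0
    return probs

def noisy_distribution(n, weber=0.15):
    """Large-number regime: spread grows with n.

    The constant-Weber-fraction form (sd proportional to n) is a common
    assumption in the literature, used here only to illustrate the
    widening distributions; it is not taken from the paper.
    """
    weights = np.exp(-0.5 * ((ks - n) / (weber * n)) ** 2)
    return weights / weights.sum()

for n, dist in [(2, exact_distribution(2)), (3, exact_distribution(3)),
                (8, noisy_distribution(8)), (16, noisy_distribution(16)),
                (24, noisy_distribution(24))]:
    mean = np.sum(ks * dist)
    sd = np.sqrt(np.sum(dist * (ks - mean) ** 2))
    print(f"n = {n:2d}: mean = {mean:5.2f}, sd = {sd:.2f}")
```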

The difference between these two patterns is what has led people to think that there are separate systems underlying these number representations.

Bounded information capacity

In our paper, we set out to derive the optimal psychophysical function for numerical estimation given a limited informational capacity. We will call this function \(Q\). We first assume that people start with a prior distribution over numbers \(P(n) \propto 1/n^2\), meaning that people expect to encounter small numbers more often than large numbers (this is observed empirically for both humans and animals). Given numerosities \(n\) and estimates \(k\), we want a psychophysical function \(Q(k|n)\) that minimizes the expected squared error between \(n\) and \(k\). Of course, if people had unlimited neural resources, the best \(Q\) would be the one shown in the first plot in the previous section, with \(Q(k|n) = 1\) only for \(k=n\). However, perception is noisy and people do not have unlimited neural resources. So we also assume that there is an information bound \(B\) that caps the allowable KL-divergence from \(P\) to \(Q\). This can be thought of as bounding how "far away" your posterior \(Q\) can be from your prior \(P\) (in bits of information).
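Written out, the optimization described above looks roughly like this (our own notation, with the KL-divergence oriented so that it measures how far the posterior has moved from the prior, as in the description above):

\[
Q^{*}(\cdot \mid n) \;=\; \arg\min_{Q} \; \sum_{k} Q(k \mid n)\,(k-n)^{2}
\quad \text{subject to} \quad
D_{\mathrm{KL}}\!\left(Q(\cdot \mid n)\,\middle\|\,P\right) \;\le\; B .
\]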

The basic idea of this analysis, then, is that evolution has created a perceptual system that is efficient relative to its limits, meaning that it minimizes the error subject to the bound on information capacity. To get a feel for this optimization, below (after the example plots) you can try creating a distribution yourself and see how much information it requires and how much error it incurs: try to minimize the error while staying below the KL bound.

We first illustrate how this game works. Each plot shows our two probability distributions: the prior in blue (\(P\)) and the posterior in orange (\(Q\)). Given some numerosity \(n\), the goal is to minimize error while staying below the KL bound, which we have set to 3 bits. We start with the number 10, shown by the dotted line. If your posterior \(Q\) were the same as your prior \(P\), you would have 0 KL-divergence but extremely high error, as shown in the plot on the left. Of course, putting all the probability mass around 10 would give you the lowest squared error. But, as you can see in the middle plot, that exceeds the KL bound. Then there are intermediates, like the plot on the right, which has non-zero error but remains under the capacity bound.
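If you want to check the scoring offline, here is a sketch of how the two quantities in the game could be computed. The number range, the truncated prior, and the three candidate posteriors are our own illustrative choices, meant only to echo the three example plots:

```python
import numpy as np

MAX_N = 30
ns = np.arange(1, MAX_N + 1)

# Prior P(n) proportional to 1/n^2, as in the text, truncated to 1..MAX_N.
prior = 1.0 / ns**2
prior /= prior.sum()

def kl_bits(q, p):
    """KL-divergence D(q || p) in bits: how far q has moved from p."""
    mask = q > 0
    return np.sum(q[mask] * np.log2(q[mask] / p[mask]))

def squared_error(q, n):
    """Expected squared error of estimates drawn from q when the true number is n."""
    return np.sum(q * (ns - n) ** 2)

n = 10  # the target numerosity from the example

# Three candidate posteriors, loosely mirroring the three example plots:
q_prior = prior.copy()                 # left: Q equals the prior
q_delta = np.zeros(MAX_N)              # middle: all mass on 10
q_delta[n - 1] = 1.0
q_mid = np.exp(-0.5 * ((ns - n) / 6.0) ** 2)   # right: a broad intermediate guess
q_mid /= q_mid.sum()

for name, q in [("Q = prior", q_prior), ("all mass on 10", q_delta),
                ("intermediate", q_mid)]:
    print(f"{name:15s}  KL = {kl_bits(q, prior):5.2f} bits, "
          f"squared error = {squared_error(q, n):7.2f}")
```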

Now it's your turn! Click on the plot below to modify values of \(Q\). Again, you should try to get the squared error as low as possible while not exceeding 3 bits. The buttons on the left allow you to 1) pick a new number; 2) re-normalize \(Q\) (this is done under the hood regardless of whether you press the button); and 3) see the optimal solution. Check out the optimal solution if you're stuck!

Unified psychophysics

For each numerosity and information capacity, there is actually only one optimal \(Q\) that minimizes the squared error. In our paper, we derive this function analytically. You can see the optimal solutions in the above panel by clicking "show optimal". The plot below shows the optimal \(Q\) for each numerosity at different amounts of information; by dragging the slider, you can see how \(Q\) changes as a function of the number of bits available.
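For readers who prefer to poke at the optimum numerically rather than analytically, here is one way to approximate it. The sketch solves the constrained problem by the standard Lagrangian route, which gives posteriors of the form \(Q(k|n) \propto P(k)\,e^{-\beta (k-n)^2}\) with \(\beta\) tuned to hit the information bound. This is our own generic construction, not the closed-form solution derived in the paper, and the number range is an arbitrary choice:

```python
import numpy as np

MAX_N = 50
ks = np.arange(1, MAX_N + 1)
prior = 1.0 / ks**2
prior /= prior.sum()

def kl_bits(q, p):
    """KL-divergence D(q || p) in bits."""
    mask = q > 0
    return np.sum(q[mask] * np.log2(q[mask] / p[mask]))

def optimal_q(n, bound_bits, tol=1e-6):
    """Approximate the error-minimizing posterior for numerosity n under a KL bound.

    Minimizing expected squared error subject to D(Q || P) <= B yields
    Q(k|n) proportional to P(k) * exp(-beta * (k - n)^2); larger beta
    spends more bits. We bisect on beta until the bound is met.
    """
    def q_of(beta):
        w = prior * np.exp(-beta * (ks - n) ** 2)
        return w / w.sum()

    # The most informative achievable posterior is a point mass on n.
    max_bits = -np.log2(prior[n - 1])
    if bound_bits >= max_bits:
        q = np.zeros(MAX_N)
        q[n - 1] = 1.0
        return q

    lo, hi = 0.0, 1.0
    while kl_bits(q_of(hi), prior) < bound_bits:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_bits(q_of(mid), prior) < bound_bits:
            lo = mid
        else:
            hi = mid
    return q_of(0.5 * (lo + hi))

# Spread of the optimal posterior at a few numerosities for a fixed 3-bit bound:
for n in (2, 5, 10, 20):
    q = optimal_q(n, bound_bits=3.0)
    mean = np.sum(ks * q)
    sd = np.sqrt(np.sum(q * (ks - mean) ** 2))
    print(f"n = {n:2d}: mean estimate = {mean:5.2f}, sd = {sd:.2f}")
```

If this sketch is right, the printout should show the smallest numerosities collapsing to zero spread while the larger ones get progressively wider, which is the pattern described next.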

You will notice that as you increase the amount of information, more numbers are represented with perfect fidelity (i.e., the distribution puts probability 1 on \(n\)). But beyond some cutoff point, the precision begins to decrease for larger numbers. That is, the single psychophysical function \(Q\) recovers the observed psychophysics for both small and large numerosities.

In the final panel below, you can again change the number of bits to see what the model predicts about mean estimates (left) and absolute error of estimates (right) across numerosities. The discontinuity in psychophysics for small and large numerosities can be observed in the absolute error plot as you increase the number of bits. You'll also notice that at low amounts of information, the model predicts underestimation: with few bits, \(Q\) stays close to the prior, which is concentrated on small numbers, so mean estimates are pulled below the true numerosity.