Confidence can be conveyed using verbal terms: low, medium, high, and so on. But, for various reasons, it may be useful to have a numerical scale. Think of what the numerical probability scale can add to verbal probability phrases (likely, unlikely).

However, for a scale to be useful, it needs to have a clear meaning. What does it mean for your confidence in a given probability judgement to be 60?

The probability scale gets such meaning by calibrating itself against the scale of objective, known chances. What does it mean to say that the probability of rain tomorrow is 50%? It means that the probability is the same as the probability of picking a red ball from an urn containing 50 red balls and 50 blue ones. And what does it mean to say that your probability estimate for your country being the next soccer world champion is 50%? It means that, in your opinion, it is as likely that they will be champion as that the next ball picked from the 50 red-50 blue urn is red.

The numerical confidence scale involves a similar calibration, except it does not look at the (known) composition of the urn, but rather at the evidence available about its composition.

Consider an urn with 20 balls, each of which is red or blue. All you know about the composition comes from the observations of draws from the urn.

Consider how confident you are in the assessment that the probability of the next ball drawn from the urn being red is between 40% and 100%. (Equivalently, you could consider your confidence that there are at least 8 red balls in the urn.)
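The equivalence between the probability range and the number of red balls can be checked with a few lines of Python. This is only an illustrative sketch: the names `URN_SIZE`, `chance_of_red` and `compatible` are introduced here for the example, and exact fractions are used to avoid rounding at the 40% boundary.

```python
from fractions import Fraction

URN_SIZE = 20  # total number of balls, as in the example above

# Every possible number of red balls, with the chance of red it implies.
chance_of_red = {reds: Fraction(reds, URN_SIZE) for reds in range(URN_SIZE + 1)}

# Compositions for which the chance of red lies between 40% and 100%.
compatible = [
    reds
    for reds, chance in chance_of_red.items()
    if Fraction(40, 100) <= chance <= 1
]

print(min(compatible), max(compatible))  # prints: 8 20
```

So the 40%-to-100% assessment holds exactly when the urn contains between 8 and 20 red balls.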

On the basis of no observed draws, you may not have any confidence at all in this assessment.

However, after having observed 50 draws from the urn, half of which are red, you may be more confident in the probability assessment of 40%-to-100% for red.

And after having observed 100 draws, half of which are red, you will have even more confidence.

Indeed, at 100 draws, standard statistical tests would recommend the probability assessment with very high confidence. (The hypothesis that there are fewer than 8 red balls in the urn is rejected with p < 0.001.)
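One way to see how the evidence strengthens with more draws is to compute a p-value for the hypothesis of fewer than 8 red balls. The sketch below rests on assumptions that are not stated in the text: balls are drawn with replacement (there are more draws than balls), the test is a one-sided exact binomial test, and the null hypothesis is evaluated at its boundary case of 7 red balls. The function name `p_value_fewer_than_min_red` is made up for this illustration. The exact figure depends on the test chosen (an exact test and a normal approximation give somewhat different values), so the output need not match the quoted threshold digit for digit.

```python
from math import comb

URN_SIZE = 20
MIN_RED = 8  # "40%-to-100% for red" means at least 8 red balls out of 20


def p_value_fewer_than_min_red(draws: int, reds: int) -> float:
    """One-sided exact binomial p-value against 'fewer than MIN_RED red balls'.

    Assumes draws with replacement and uses the boundary case of the null
    hypothesis (MIN_RED - 1 red balls), the composition under which the
    observed number of reds is most likely while the null still holds.
    """
    p_null = (MIN_RED - 1) / URN_SIZE  # 7/20 = 0.35
    # Probability of seeing at least `reds` red balls in `draws` draws.
    return sum(
        comb(draws, k) * p_null**k * (1 - p_null) ** (draws - k)
        for k in range(reds, draws + 1)
    )


# Half of the observed draws are red in each scenario.
for draws in (0, 50, 100):
    print(draws, p_value_fewer_than_min_red(draws, draws // 2))
```

With no draws the p-value is 1, so the data give no grounds for rejecting the hypothesis; it shrinks as the number of draws grows, which mirrors the increase in confidence described above.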

So your confidence in the probability assessment of 40%-to-100% for red varies with the number of draws observed (half of which were red), from no confidence at all to a very high level. Just as urns of known composition can “calibrate” probability judgements, this range of situations with more or less evidence can “calibrate” confidence assessments:

**What does it mean to say that you have confidence 60 in a particular probability judgement? It means that you are as confident in it as in the assessment of probability 40%-to-100% for red on the next draw from the previously specified urn, after having observed 60 draws, half of which were red.**

If you choose any of the numerically based confidence-reporting formats, you will be asked to communicate your confidence on this scale.