In machine learning, we often take probability for granted. We want a system for representing uncertainty about the world, and Cox’s theorem tells us that if we accept some basic postulates about what such a system should look like, we will end up with probability.
So that should be the end of the story… right? Well, maybe not. The first Cox postulate is
Divisibility and comparability – The plausibility of a statement is a real number and is dependent on information we have related to the statement,
which seems quite innocent. However, who’s to say that there is anything fundamental about real numbers? Real numbers include strange things like irrational numbers and negative numbers (crazy, I know), but they’re lacking in comparison to complex numbers: no multiplication by a real number takes exactly four applications to return you to your original value, whereas multiplying by i does exactly that. It seems kind of arbitrary to choose real numbers. For a fun and interesting read, see the following link. It makes the point better than I can:
Negative numbers aren’t easy. Imagine you’re a European mathematician in the 1700s. You have 3 and 4, and know you can write 4 – 3 = 1. Simple.
But what about 3-4? What, exactly, does that mean? How can you take 4 cows from 3? How could you have less than nothing?
Negatives were considered absurd, something that “darkened the very whole doctrines of the equations” (Francis Maseres, 1759). Yet today, it’d be absurd to think negatives aren’t logical or useful. Try asking your teacher whether negatives corrupt the very foundations of math.
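To make the earlier remark about multiplying by i concrete, here is a minimal Python sketch. The helper name `apply_n_times` and the starting value 3 are my own choices for illustration. It shows that repeated multiplication by i cycles through four distinct values, while multiplication by -1 (the only real factor other than 1 that ever returns a nonzero value to itself) cycles after just two.

```python
def apply_n_times(factor, value, n):
    """Multiply `value` by `factor` n times and return the result."""
    for _ in range(n):
        value = value * factor
    return value

start = 3 + 0j  # arbitrary nonzero starting value (illustrative)

# Multiplying by i: 3 -> 3i -> -3 -> -3i -> 3, a cycle of length four.
orbit = [apply_n_times(1j, start, k) for k in range(5)]
print(orbit)

# Multiplying by -1 on the reals: 3 -> -3 -> 3, a cycle of length two.
real_orbit = [apply_n_times(-1.0, 3.0, k) for k in range(3)]
print(real_orbit)
```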
Complex numbers come up in the context of systems of uncertainty when we deal with quantum mechanics. The basic idea is that interactions operate over amplitudes (expressed as complex numbers); then, to determine the likelihood of a final configuration, you look at the squared norms of the amplitudes. For a relatively straightforward explanation, see here: http://lesswrong.com/lw/pd/configurations_and_amplitude/
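As a rough sketch of that last step, the snippet below turns a list of complex amplitudes into likelihoods by taking squared magnitudes and normalizing. The function name `born_probabilities` and the four amplitudes are made up for illustration; they do not come from the linked post.

```python
def born_probabilities(amplitudes):
    """Map complex amplitudes to probabilities via squared magnitudes."""
    weights = [abs(a) ** 2 for a in amplitudes]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical amplitudes for four final configurations.
amps = [1 + 0j, 1j, -1 + 0j, 1 - 1j]
probs = born_probabilities(amps)
print(probs)
```

Note that the first three configurations come out equally likely even though their amplitudes point in different directions in the complex plane. The phase only matters once amplitudes are added together, which is where these models start to differ from ordinary probability.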
So I don’t necessarily have any well-formed thoughts on the matter (yet?), but it’s fun to think about other principled ways of representing uncertainty. I’m curious to know if there are types of interactions useful for machine learning that would be hard to represent with standard probability models but that would be aided by these types of quantum models.
Finally, I leave you with this blog comment from The Blog of Scott Aaronson:
“graphical models with amplitudes instead of probabilities” is a fair definition of a quantum circuit (and therefore a quantum computer).
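One way to get a feel for that comment is to compare a tiny chain model in both settings. The sketch below is my own toy example, not something from the comment: it pushes a two-state variable through two identical steps, once with a stochastic matrix acting on probabilities (a fair coin flip) and once with a unitary matrix acting on amplitudes (the Hadamard gate). The helper `mat_vec` and the variable names are assumptions made for illustration.

```python
import math

def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    n = len(vector)
    return [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]

# Classical step: a fair coin flip, whatever the current state.
flip = [[0.5, 0.5],
        [0.5, 0.5]]

# Quantum step: the Hadamard gate, a unitary with entries +-1/sqrt(2).
s = 1 / math.sqrt(2)
hadamard = [[s, s],
            [s, -s]]

# Start in state 0 with certainty in both models.
classical = [1.0, 0.0]
amplitudes = [1 + 0j, 0j]

for _ in range(2):
    classical = mat_vec(flip, classical)
    amplitudes = mat_vec(hadamard, amplitudes)

# Squared magnitudes give the probabilities of each final state.
quantum = [abs(a) ** 2 for a in amplitudes]

print("classical:", classical)
print("quantum:  ", quantum)
```

After two steps the classical chain is still a 50/50 mixture, while the amplitude chain ends up back in state 0 with probability (numerically very close to) one, because the two paths leading to state 1 carry amplitudes of opposite sign and cancel. Cancellation of this kind is not available when every path weight is a non-negative probability.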
That seems to me worth understanding more deeply.