Entropy is a measure of randomness. In other words, it is a measure of unpredictability. Let's take the example of a coin toss. Suppose we tossed a coin 4 times, and the outcomes were {Head, Tail, Tail, Head}. Based solely on this observation, if you had to guess the output of the next coin toss, what would be your guess?

Umm... two heads and two tails: a fifty percent probability of getting a head and a fifty percent probability of getting a tail. You cannot be sure. The output is a random event between head and tail.

But what if we have a biased coin which, when tossed four times, gives the following output: {Tail, Tail, Tail, Head}? Here, if you had to guess the output of the next coin toss, what would be your guess? Chances are you would go with Tail. Why? Because, based on the sample set we have, there is a seventy-five percent chance that the output is a tail. In other words, the result is less random for the biased coin than it was for the fair coin.

We will take a moment here to give entropy a mathematical face for a binary event (like the coin toss, where the output can be either of two events, head or tail):

Entropy = -(probability(a) * log2(probability(a))) - (probability(b) * log2(probability(b)))

where probability(a) is the probability of getting a head and probability(b) is the probability of getting a tail.
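The formula above translates directly into a few lines of Python. Here is a minimal sketch (the function name `binary_entropy` is my own choice, not from the text) that computes the entropy for the fair coin and for the biased coin from the example:

```python
import math

def binary_entropy(p):
    """Entropy (in bits) of a binary event with probability p for one outcome
    and 1 - p for the other."""
    if p == 0 or p == 1:
        return 0.0  # a certain outcome carries no randomness
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))   # fair coin: 1.0 bit of entropy
print(binary_entropy(0.75))  # biased coin from the example: ~0.8113 bits
```

Notice that the biased coin's entropy is lower than the fair coin's, matching the intuition that its tosses are less random.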

Entropy, or randomness, is highest when both outcomes are equally likely, i.e. at p = 0.5. This gives us the following graph of entropy versus probability.
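You can reproduce the shape of that graph numerically; a quick sketch sweeping p over a few values shows entropy peaking at p = 0.5 and falling off symmetrically on either side:

```python
import math

def binary_entropy(p):
    """Entropy (in bits) of a binary event with outcome probabilities p and 1 - p."""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Entropy rises toward p = 0.5 and falls symmetrically on either side.
for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(f"p = {p:.2f}  entropy = {binary_entropy(p):.4f}")
```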

Of course, this formula can be generalised for n discrete outcomes as follows:

Entropy = -p(a1)*log2(p(a1)) - p(a2)*log2(p(a2)) - p(a3)*log2(p(a3)) - … - p(aN)*log2(p(aN))

where a1, a2, …, aN are the discrete events and p(a1), p(a2), …, p(aN) are their respective probabilities.
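The generalised formula is just a sum over the outcomes. A minimal sketch (the function name `entropy` is my own choice) that takes a list of probabilities:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution,
    given as a list of outcome probabilities summing to 1."""
    # Terms with p = 0 contribute nothing, so we skip them to avoid log2(0).
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 outcomes: 2.0 bits
print(entropy([0.5, 0.5]))                # fair coin: 1.0 bit
```

A uniform distribution over four outcomes gives 2 bits, exactly what you would need to encode one of four equally likely choices.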

Entropy is used in the decision tree classifier in machine learning. Entropy is an interdisciplinary concept: it originated in thermodynamics and finds uses in evolutionary studies, information theory, and quantum mechanics.
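To make the decision tree connection concrete: a tree picks the split that most reduces entropy, a quantity called information gain. Here is a minimal sketch (the names `entropy` and `information_gain` and the toy labels are my own, not from the text) computing the gain of a split that separates two classes perfectly:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Reduction in entropy from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Toy labels: a split that separates the classes perfectly gives maximal gain.
parent = ["yes", "yes", "no", "no"]
print(information_gain(parent, ["yes", "yes"], ["no", "no"]))  # 1.0
```

The parent set has 1 bit of entropy, each child has 0, so the split gains a full bit; a less clean split would gain less, and the tree prefers the former.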