But how and why does it work? Here is a simple argument I came up with, based on stochastic geometry.

Let us make a few assumptions. We assume a flat and infinite Earth (no tin-foil hats). In addition, we assume that the communication environment contains so many obstacles that the transmitting antenna and the receiving antenna cannot see each other. Think of a city full of houses, cars, trees, etc. The data signal then tends to reach the receiver along multiple paths, for example through different streets and around different houses, so the aggregate signal at the receiver will be Rayleigh faded. The city is also full of mobile phones exchanging data with their base stations and, on the other hand, causing interference to each other. We make the natural assumption that the interfering transmitters are distributed according to a Poisson point process.
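To make these assumptions concrete, here is a minimal sketch of one random realisation of the model: interferers scattered as a Poisson point process on a disc around the receiver, each with Rayleigh fading (so the received power is exponentially distributed). The function names, the path-loss exponent of 4, the disc radius and the minimum distance are illustrative assumptions of mine, not values from the derivation.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson sample by counting unit-rate exponential arrivals in [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_interference(rng, density, radius, power=1.0,
                        path_loss_exp=4.0, min_dist=1.0):
    """Aggregate interference power at the origin for one realisation.

    density        interferers per unit area (Poisson point process intensity)
    radius         radius of the disc in which interferers are placed (assumed)
    path_loss_exp  assumed power-law path-loss exponent
    min_dist       assumed guard distance, avoids the singularity at distance 0
    """
    n_interferers = sample_poisson(rng, density * math.pi * radius ** 2)
    total = 0.0
    for _ in range(n_interferers):
        # Uniform point on a disc: the distance is radius * sqrt(U).
        dist = max(radius * math.sqrt(rng.random()), min_dist)
        # Rayleigh-faded amplitude means exponentially distributed power.
        fading = rng.expovariate(1.0)
        total += power * fading * dist ** (-path_loss_exp)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [sample_interference(rng, density=0.01, radius=100.0)
               for _ in range(5)]
    print(samples)
```

Averaging many such realisations gives a numerical feel for the interference that the analytical success probability has to account for.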

As derived here, the probability of a successful transmission (for example, of a message to a friend) can be expressed as a function $p_s(P)$, where $P$ denotes the mean transmitting power.

In MIMO we use multiple antennas to send the message. Each antenna's message should be coded so that it cannot be confused with what the other antennas are transmitting; this can be done with orthogonal modulation. Assuming that the messages from the different antennas propagate to the receiver independently, the complementary probability gives the probability that at least one message is received:

$$1 - \left(1 - p_s(P/n)\right)^n$$

where $n$ is the number of antennas and $p_s(\cdot)$ is the single-antenna success probability from above, here evaluated at the per-antenna power $P/n$. Notice that we divided the transmitting power by the number of antennas, so we do not increase the aggregate power at all. On the other hand, had we no MIMO technology at hand, we could try to improve the link quality just by increasing the power $P$. In the following figure we compare these two cases.
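The comparison can also be sketched numerically. The snippet below is only an illustration: `success_probability` is a hypothetical stand-in of the form exp(-k / P) (Rayleigh fading against a fixed noise-plus-interference level lumped into the constant `k`), not the expression derived above, and the numbers are made up.

```python
import math


def success_probability(power, k=1.0):
    """Hypothetical single-antenna success probability, exp(-k / power).

    k lumps together threshold, path loss, noise and interference (assumed).
    """
    return math.exp(-k / power)


def mimo_success_probability(total_power, n_antennas, k=1.0):
    """Probability that at least one of n independent streams gets through.

    The total power is split evenly, so each antenna sends at total_power / n.
    """
    p_single = success_probability(total_power / n_antennas, k)
    return 1.0 - (1.0 - p_single) ** n_antennas


if __name__ == "__main__":
    total_power = 10.0  # illustrative value
    for n in (1, 2, 4):
        print("antennas:", n, "success:", mimo_success_probability(total_power, n))
    # Single antenna with the power doubled instead of adding antennas.
    print("double power:", success_probability(2 * total_power))
```

With these illustrative numbers, two antennas sharing the original power do better than one antenna at twice the power. The outcome depends on the assumed `k` and power, so treat the printed values as a sanity check rather than a result.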

It is evident that MIMO is a great solution for increasing the throughput of a wireless communication link. With MIMO we can achieve better data rates than by simply increasing the power of the transmitters.