The map equation — see also mapequation.org

In the Flash applet below, we explore the duality between finding community structure in networks and minimizing the description length of a random walker's movements on a network. For a given network partition M, the map equation specifies the theoretical limit L(M) of how concisely we can describe the trajectory of this random walk (PNAS 105, 1118 (2008)). The underlying code structure of the map equation is designed such that the description can be compressed if the network has regions in which the random walker tends to stay for a long time. Therefore, with a random walk as a proxy for real flow, minimizing the map equation over all possible network partitions reveals important aspects of network structure with respect to the dynamics on the network.

To take advantage of the regional structure of the network, one index codebook and m module codebooks, one for each module in the network, are used to describe the random walker's movements. The module codebooks have codewords for nodes within each module (and exit codes to leave the module), which are derived from the node visit/exit frequencies of the random walker. The index codebook has codewords for the modules, which are derived from the module switch rates of the random walker. Therefore, the average length of the code describing a step of the random walker is the average length of codewords from the index codebook and the module codebooks weighted by their rates of use (mouseover the map equation for more information):

The description length for module partition M. For module partition M of n nodes into m modules, the lower bound of the average length of the code describing a step of the random walker.

The rate at which the index codebook is used. The per-step use rate of the index codebook is given by the total probability that the random walker exits any of the m modules. The total height of the blocks under "Index codebook" in the visualization below corresponds to this rate.

The frequency-weighted average length of codewords in the index codebook. The entropy of the relative rates to use the module codebooks measures the smallest average codeword length that is theoretically possible. The heights of individual blocks under "Index codebook" in the visualization below correspond to the relative rates and the codeword lengths approximately correspond to the negative logarithm of the rates in base 2.

The rate at which the module codebooks are used. The per-step use rate of the module codebooks is given by the total use rate of the m module codebooks. This corresponds to the total height of the blocks under "Module codebooks" in the visualization below. For module i, this is given by the fraction of time the random walker spends in module i, which is given by the total probability that any node in the module is visited, plus the probability that it exits the module and the exit message is used.

The frequency-weighted average length of codewords in module codebook i. The entropy of the relative rates at which the random walker exits module i and visits each node in module i measures the smallest average codeword length that is theoretically possible. The heights of individual blocks under "Module codebooks" in the visualization below correspond to the relative rates and the codeword lengths approximately correspond to the negative logarithm of the rates in base 2.
A detailed description of the map equation is available here and C++ source code to partition large networks with millions of weighted directed links is available here.

Visualization

In the Flash applet below, release a random walker and see how, with multiple Huffman codebooks, we can represent its trajectory on the network with a compressed binary message. Change the assignment of nodes into modules by changing their node colors and see how the description length changes. The code structure corresponding to the current partition is shown by the stacked boxes on the right. With multiple module codebooks, each of which re-uses a similar set of codewords, we must specify which module codebook should be used. This is implemented by using the index codebook shown to the left and exit codes in each module (marked by arrows). The coding procedure is as follows. The index codebook specifies a module codebook, and the module codebook specifies a succession of nodes within that module. When the random walker leaves the module, and we need to return to the index codebook, we use the exit code. The codeword lengths in the index codebook are derived from the relative rates at which a random walker enters each module. The codeword lengths for each module codebook are derived from the relative rates at which a random walker visits each node in the module or exits the module. Mouseover objects in the visualization for more information. Visualization built by Daniel Edler with flare. Send questions to .