In the Flash applet below
, we explore the duality between finding community structure in networks and minimizing the description length of a random walker's movements on a network. For a given network partition M
, the map equation
specifies the theoretical limit L(M)
of how concisely we can describe the trajectory of this random walk (PNAS 105, 1118 (2008)
The underlying code structure of the map equation is designed such that the description can be compressed if the network has regions in which the random walker tends to stay for a long time. Therefore, with a random walk as a proxy for real flow, minimizing the map equation over all possible network partitions reveals important aspects of network structure with respect to the dynamics on the network.
To take advantage of the regional structure of the network, one index codebook and m
module codebooks, one for each module in the network, are used to describe the random walker's movements. The module codebooks have codewords for nodes within each module (and exit codes to leave the module), which are derived from the node visit/exit frequencies of the random walker. The index codebook has codewords for the modules, which are derived from the module switch rates of the random walker. Therefore, the average length of the code describing a step of the random walker is the average length of codewords from the index codebook and the module codebooks weighted by their rates of use (mouseover the map equation for more information):
A detailed description of the map equation is available here
and C++ source code to partition large networks with millions of weighted directed links is available here
In the Flash applet below
, release a random walker and see how, with multiple Huffman codebooks
, we can represent its trajectory on the network with a compressed binary message. Change the assignment of nodes into modules by changing their node colors and see how the description length changes. The code structure corresponding to the current partition is shown by the stacked boxes on the right. With multiple module codebooks, each of which re-uses a similar set of codewords, we must specify which module codebook should be used. This is implemented by using the index codebook shown to the left and exit codes in each module (marked by arrows). The coding procedure is as follows. The index codebook specifies a module codebook, and the module codebook specifies a succession of nodes within that module. When the random walker leaves the module, and we need to return to the index codebook, we use the exit code. The codeword lengths in the index codebook are derived from the relative rates at which a random walker enters each module. The codeword lengths for each module codebook are derived from the relative rates at which a random walker visits each node in the module or exits the module. Mouseover objects in the visualization for more information.