flashapplet
Update (March 6, 2013): Completely rewritten source code for multilevel community detection with Infomap is available at mapequation.org.

This page contains code related to my papers. They are written in C++ for the GNU Compiler Collection. If you are using Windows, I can recommend MinGW, Minimalist GNU for Windows, and their HOWTO Install the MinGW (GCC) Compiler Suite. Below is a description of how I installed MinGW and MSYS to run the code on a Windows machine. Don't hesitate to if you have questions or comments about the code.
Martin Rosvall and Carl T. Bergstrom,
PLoS ONE 6(4): e18209 (2011).
[arXiv:1010.0431]

infohiermap [Click to enlarge] Minimizing the hierarchical map equation over all network partitions gives an optimal clustering of the network with respect to the dynamics on the network.

Update (March 6, 2013): Completely rewritten source code for multilevel community detection with Infomap is available at mapequation.org.
Code for (weighted) undirected networks (April 15, 2011): infohiermap_undir.tgz.
Code for directed weighted networks (August 31, 2012): infohiermap_dir.tgz.

Extract the gzipped tar archive, run 'make' to compile and, for example, './infohiearmap 345234 ninetriangles.net 10' to run the code.
tar xzvf infohiermap_undir.tgz
cd infohiermap_undir
make
./infohiermap 345234 ninetriangles.net 10
Here infohiermap is the name of the executable, 345234 is a random seed (can be any positive integer value), ninetriangles.net is the network to partition (in Pajek format), and 10 is the number of attempts to partition the network (can be any integer value equal to or larger than 1).

The output file has the extension .tree (plain text file) and corresponds to the best partition (shortest description length) of the attempts. The output format has the pattern:
# Codelength = 3.48419 bits.
1:1:1 0.0384615 "7"
1:1:2 0.0384615 "8"
1:1:3 0.0384615 "9"
1:2:1 0.0384615 "4"
1:2:2 0.0384615 "5"
...
Each row (except the first one, which summarizes the result) begins with the multilevel module assignments of a node. The module assignments are colon separated from coarse to fine level, and all modules within each level are sorted by the total PageRank of the nodes they contain. Further, the integer after the last comma is the rank within the finest-level module, the decimal number is the steady state population of random walkers, and finally, within quotation marks, is the node name.

See the supplementary information of the paper for more information about the algorithm.

Don't hesitate to if you have questions or comments about the code.
Martin Rosvall and Carl T. Bergstrom,
PLoS ONE 5(1): e8694 (2010).
[arXiv:0812.1242] [Alluvial generator]

mapchange [Click to enlarge] Significance clustering of networks. The standard approach to cluster networks is to minimize an objective function over possible partitions of the network, as in the left side of the diagram. By repeated resampling of the weighted links from the original network, we create a "bootstrap world" of resampled networks. By clustering these as well, and comparing to the clustering of the original network, we can estimate the degree of support that the data provide in assigning each node to a cluster. In the bottom network, the darker nodes are clustered together in at least 95 percent of the 1000 bootstrap networks.

Code for weighted undirected networks (May 31, 2011): conf-infomap_undir.tgz.
Code for weighted directed networks (May 31, 2011): conf-infomap_dir.tgz.

The source code is combines a stochastic and recursive algorithm to minimize the map length, resampling of links with normally distributed link weights, a simulated annealing scheme to identify the largest significant set of nodes in each module, and a simple cooccurance analysis to identify non-standalone modules. Extract the gzipped tar archive, run 'make' to compile and, for example, './conf-infomap 344 flow.net 10 100 0.90' to run the code.
tar xzvf conf-infomap_dir.tgz
cd conf-infomap_dir
make
./conf-infomap 344 flow.net 10 100 0.90
Here conf-infomap is the name of executable, 344 is a random seed (can be any positive integer value), flow.net is the network to partition (in Pajek format), 10 is the number of attempts to partition the network (can be any integer value equal or larger than 1) and each of the 100 bootstrap networks. Finally, 0.90 is the confidence level for the significance analysis.

The output file with extension .smap is a plain text file that contains a significance analysis of the best partition. The output format has the pattern:
# modules: 4
# modulelinks: 4
# nodes: 16
# links: 20
# codelength: 3.32773
*Directed
*Modules 4
1 "Node 13,..." 0.25 0.0395432
2 "Node 5,..." 0.25 0.0395432
3 "Node 9,..." 0.25 0.0395432
4 "Node 1,..." 0.25 0.0395432
*Insignificants 2
4>2
3>2
*Nodes 16
1:1 "Node 13" 0.0820133
1;2 "Node 14" 0.0790863
1:3 "Node 16" 0.0459137
1:4 "Node 15" 0.0429867
2:1 "Node 5" 0.0820133
2;2 "Node 6" 0.0790863
2;3 "Node 8" 0.0459137
2;4 "Node 7" 0.0429867
3:1 "Node 9" 0.0820133
3:2 "Node 10" 0.0790863
3;3 "Node 12" 0.0459137
3;4 "Node 11" 0.0429867
4;1 "Node 1" 0.0820133
4;2 "Node 2" 0.0790863
4:3 "Node 4" 0.0459137
4:4 "Node 3" 0.0429867
*Links 4
1 4 0.0395432
2 3 0.0395432
3 1 0.0395432
4 2 0.0395432
This file contains the necessary information to generate a significance map in the alluvial generator on www.mapequation.org. The names under *Modules are derived from the node with the highest flow volume within the module and 0.25 0.0395432 represent, respectively, the aggregated flow volume of all nodes within the module and the per step exit flow from the module. The nodes are listed with their module assignments together with their flow volumes. Finally, all links between the modules are listed in order from high flow to low flow

This file also contains information about which modules that are not significantly standalone and which modules they most often are clustered together with. The notation 4>2 under *Insignificants 2 in the example file above means that the significant nodes in module 4 more often than the confidence level are clustered together with the significant nodes in module 2. In the module assignments, we use colons to denote significantly clustered nodes and semicolons to denote insignificantly clustered nodes. For example, the colon in 1:1 "Node 13" 0.0820133 means that the node belongs to the largest, measured by flow, set of nodes that are clustered together in at least a fraction of bootstrap networks that is given by the confidence level.

Don't hesitate to if you have questions or comments about the code.
Martin Rosvall and Carl T. Bergstrom,
PNAS 105, 1118 (2008).
[arXiv:0707.0609] [Dynamic visualization] [Map generator] [Interactive map]

infomap [Click to enlarge] Detecting communities by compressing the description of information flows on networks. (A) We want to describe the trajectory of a random walk on the network, such that important structures have unique names. (B) A basic approach is to give a unique name to every node in the network. The Huffman code illustrated here is an efficient way to do so. (C) A two-level description of the random walk, in which major clusters receive unique names but the names of nodes within clusters are reused, yields on average a 32% shorter description for this network. Module codes and exit codes are shown left and right of the arrows under the network. (D) Reporting only the module names, and not the locations within the modules, provides an optimal coarse-graining of the network.

Here is a dynamic visualization of the inference-compression duality and the mechanics of the map equation.

Thanks to Emmanuel Navarro, infomap will be part of release 0.6 of iGraph.

Update (March 6, 2013): Completely rewritten source code for multilevel community detection with Infomap is available at mapequation.org.
Code for (weighted) undirected networks (April 15, 2011): infomap_undir.tgz.
Code for directed weighted networks (April 15, 2011): infomap_dir.tgz.

The source code is based on a better stochastic and recursive algorithm to minimize the map length. Compared to the first code published here, this code gives noticeably better results for large networks. The directories also contain example networks. Extract the gzipped tar archive, run 'make' to compile and, for example, './infomap 345234 flow.net 10' to run the code.
tar xzvf infomap_dir.tgz
cd infomap_dir
make
./infomap 345234 flow.net 10
Here infomap is the name of the executable, 345234 is a random seed (can be any positive integer value), flow.net is the network to partition (in Pajek format), and 10 is the number of attempts to partition the network (can be any integer value equal to or larger than 1).

The output file has the extension .tree (plain text file) and corresponds to the best partition (shortest description length) of the attempts. The output format has the pattern:
# Code length 3.32773 in 4 modules.
1:1 0.0820133 "Node 5"
1:2 0.0790863 "Node 6"
1:3 0.0459137 "Node 8"
1:4 0.0429867 "Node 7"
2:1 0.0820133 "Node 9"
2:2 0.0790863 "Node 10"
...
For each row (except the first one, which summarizes the result), the first number is the module assignment, the second number is the rank within the module, the decimal number is the steady state population of random walkers, and finally, within quotation marks, is the node name.

For visualizing the result, the file with extension .map can be loaded in the Map generator on www.mapequation.org.

Results are also written to three files in Pajek format. The partition file with extension .clu should be used together with the original network (import both files and use the command Draw->Draw-Partition in Pajek). The file with extension _map.net is a network with all nodes and links aggregated according to the modular map. Finally, the vector file with extension _map.vec gives the size of the modules and should be used together with the file _map.net (import both files and use the command Draw->Draw-Vector in Pajek). In Pajek, the .net file should be imported as "Networks", the .clu file should be imported as "Partitions", and the .vec file should be imported as "Vectors."

By uncommenting an obvious line in infomap.cc, the directed code can also handle self-links. For more information about the algorithm, I can recommend the Supplementary Appendix of Mapping change in large networks and this short layout of the algorithm.

Don't hesitate to if you have questions or comments about the code.
Martin Rosvall and Carl T. Bergstrom,
PNAS 104, 7327 (2007).
[physics/0612035]

infomaod [Click to enlarge]Basic framework for detecting communities as a communication process. A signaler knows the full network structure and wants to send as much information as possible about the network to a receiver over a channel with limited capacity. The signaler therefore encodes the network into modules in a way that maximizes the amount of information about the original network. This figure illustrates an encoder that compresses the network into 3 modules i=circle, square, star, with ni nodes and lii links connected by lij links between the modules. The receiver can then decode the message and construct a set of possible candidates for the original network. The smaller the set of candidates, the more information the signaler has managed to transfer.

New code (June 10, 2009) for unweighted undirected networks: infomod.tgz.

Extract the gzipped tar archive, run 'make' to compile and './infomod 345234 MultiphysChemBioEco40_unweighted_undir.net 10' to run the code.
tar xzvf infomod.tgz
cd infomod
make
./infomod 345234 MultiphysChemBioEco40_unweighted_undir.net 10
Here infomap is the name of the executable, 345234 is a random seed (can be any positive integer value), MultiphysChemBioEco40W_unweighted_undir.net is the network to partition (in Pajek format), and 10 is the number of attempts to partition the network (can be any integer value equal to or larger than 1).

Two networks are included for reference: MultiphysChemBioEco40_unweighted_undir.net and karate.net. The first is a set of 40 journals connected by undirected links, and the second is the famous karate network by Zachary.

The source code is based on the stochastic and recursive search algorithm developed for the map equation above, but here updated to instead optimize this cluster-based compression. Compared to the first code published here, this code gives noticeably better results for large networks.

The results are written to two files with extensions .clu and .mod (plain text files). The .clu file stores each node's cluster number (which can be imported to Pajek as a partition), with reversed organization in the .mod file (enumeration of all nodes in each module).

infomod.cc is the main file that reads from and writes to files. Greedy.cc contains the algorithm and Node.cc defines the nodes and modules.

Don't hesitate to if you have questions or comments about the code.
MinGW for Windows is a port of the GNU Compiler Collection (GCC) for compiling source code developed in Linux. MinGW comes with all the necessary tools to run the code, but it is convenient to install also MSYS, which provides a Linux-like command line environment. Following the HOWTO Install the MinGW (GCC) Compiler Suite, here is what I did to install MinGW and MSYS on a Windows machine to be able to use the commands above to extract, compile, and run the source code:
  1. First I downloaded MinGW-5.1.6.exe (latest automated MinGW installer) from here, installed it to c:/MinGW, and upon request selected the components MinGW base tool, g++ compiler, and MinGW Make.
  2. Then I downloaded MSYS-1.0.11.exe (latest MSYS base system) from the same place, installed it to c:/msys and gave the search path c:/mingw when the post installer asked for the path to MinGW. The command line environment can be started from the Start Menu in Windows.
  3. With MinGW and MSYS installed, I used the file manager to navigate to the home folder of the command line environment (Computer->Local Disk C:->msys->home->User Name). Here I added the folder "code" to which I downloaded the source code.
  4. From within the command line environment, which works like any shell/terminal in Linux or Mac, I accessed the same folder by typing 'cd code' at the prompt of the default home folder. By typing 'ls' to list the contents of the directory I could see the downloaded gzipped tar archive. From here on it is straight forward to follow the code specific instructions above.
Don't hesitate to if you have questions or comments about this procedure.
Contact
Martin Rosvall
Assistant Professor 
+46 70 239 1973


Department of Physics
Umeå University
SE-901 87 Umeå
Sweden
Misc.
Are you looking for a PhD or postdoc position? Does the content of this page excite you? Please drop me an and let me know.
Short introduction to complex networks available in html format.
igroup Complete PhD thesis available in pdf format.
Links
CMOL
Eigenfactor.org
Carl Bergstrom
Infobaleen