This word cloud, based on the content of this page, demonstrates my interest in information and networks.
Renaud Lambiotte, Martin Rosvall, and Ingo Scholtes
Nature Physics (2019)
Rich data are revealing that complex dependencies between the nodes of a network may not be captured by models based on pairwise interactions. Higher-order network models go beyond these limitations, offering new perspectives for understanding complex systems.
Ulf Aslak, Martin Rosvall, and Sune Lehmann
Phys. Rev. E 97, 062312
Many real-world networks are representations of dynamic systems with interactions that change over time, often in uncoordinated ways and at irregular intervals. For example, university students connect in intermittent groups that repeatedly form and dissolve based on multiple factors, including their lectures, interests, and friends. Such dynamic systems can be represented as multilayer networks where each layer represents a snapshot of the temporal network. In this representation, it is crucial that the links between layers accurately capture real dependencies between those layers. Often, however, these dependencies are unknown. Therefore, current methods connect layers based on simplistic assumptions that cannot capture node-level layer dependencies. For example, connecting every node to itself in other layers with the same weight can wipe out essential dependencies between intermittent groups, making it difficult or even impossible to identify them. In this paper, we present a principled approach to estimating node-level layer dependencies based on the network structure within each layer. We implement our node-level coupling method in the community detection framework Infomap and demonstrate its performance compared to current methods on synthetic and real temporal networks. We show that our approach more effectively constrains information inside multilayer communities so that Infomap can better recover planted groups in multilayer benchmark networks that represent multiple modes with different groups and better identify intermittent communities in real temporal contact networks. These results suggest that node-level layer coupling can improve the modeling of information spreading in temporal networks and better capture their dynamic community structure.
Rubén Bernardo-Madrid, Joaquín Calatayud, Manuela González-Suarez, Martin Rosvall, Pablo M. Lucas, Marta Rueda, Alexandre Antonelli, and Eloy Revilla
bioRxiv 287300
Human activity leading to both species introductions and extinctions is widely known to influence diversity patterns on local and regional scales. Yet, it is largely unknown whether the intensity of this activity is enough to affect the configuration of biodiversity at broader levels of spatial organization. Zoogeographical regions, or zooregions, are surfaces of the Earth defined by characteristic pools of species, which reflect ecological, historical, and evolutionary processes acting over millions of years. Consequently, it is widely assumed that zooregions are robust and unlikely to change on a human timescale. Here, however, we show that human-mediated introductions and extinctions can indeed reconfigure the currently recognized zooregions of amphibians, mammals, and birds. In particular, introductions homogenize the African and Eurasian zooregions in mammals; reshape boundaries with the reallocation of Oceania to the New World zooregion in amphibians; and divide bird zooregions by increasing biotic heterogeneity. Furthermore, the combined effect of amphibian introductions and extinctions has the potential to divide two zooregions largely representing the Old and the New World. Interestingly, the robustness of zooregions against changes in species composition may largely explain such zoogeographical changes. Altogether, our results demonstrate that human activities can erode the higher-level organization of biodiversity formed over millions of years. Comparable reconfigurations have previously been detectable in Earth's history only after glaciations and mass extinction events, highlighting the profound and far-reaching impact of ongoing human activity and the need to protect the uniqueness of biotic assemblages from the effects of future species introductions and extinctions.
Tiago P. Peixoto and Martin Rosvall
Nature Communications 8, 582 (2017)
In evolving complex systems such as air traffic and social organizations, collective effects emerge from their many components' dynamic interactions. While the dynamic interactions can be represented by temporal networks with nodes and links that change over time, they remain highly complex. It is therefore often necessary to use methods that extract the temporal networks' large-scale dynamic community structure. However, such methods are subject to overfitting or suffer from effects of arbitrary, a priori imposed timescales, which should instead be extracted from data. Here we simultaneously address both problems and develop a principled data-driven method that determines relevant timescales and identifies patterns of dynamics that take place on networks as well as shape the networks themselves. We base our method on an arbitrary-order Markov chain model with community structure, and develop a nonparametric Bayesian inference framework that identifies the simplest such model that can explain temporal interaction data.
Daniel Edler, Ludvig Bohlin, and Martin Rosvall
Algorithms 10, 112 (2017)
Comprehending complex systems by simplifying and highlighting important dynamical patterns requires modeling and mapping higher-order network flows. However, complex systems come in many forms and demand a range of representations, including memory and multilayer networks, which in turn call for versatile community-detection algorithms to reveal important modular regularities in the flows. Here we show that various forms of higher-order network flows can be represented in a unified way with networks that distinguish physical nodes for representing a complex system's objects from state nodes for describing flows between the objects. Moreover, these so-called sparse memory networks allow the information-theoretic community detection method known as the map equation to identify overlapping and nested flow modules in data from a range of different higher-order interactions such as multistep, multi-source, and temporal data. We derive the map equation applied to sparse memory networks and describe its search algorithm Infomap, which can exploit the flexibility of sparse memory networks. Together they provide a general solution to reveal overlapping modular patterns in higher-order flows through complex systems.
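To make the distinction between physical nodes and state nodes concrete, here is a minimal Python sketch of our own (not the Infomap implementation) that turns observed pathways into second-order state nodes. A state node is assumed to be a (previous, current) pair whose physical node is `current`; the toy pathways and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def first_order(paths):
    """Memoryless estimate P(next | current) from consecutive pairs."""
    pairs = Counter()
    for p in paths:
        pairs.update(zip(p, p[1:]))
    out = Counter()
    for (u, _), n in pairs.items():
        out[u] += n
    return {(u, v): n / out[u] for (u, v), n in pairs.items()}

def second_order(paths):
    """P(next | previous, current) as transitions between state nodes
    (previous, current) -> (current, next)."""
    triples = Counter()
    for p in paths:
        triples.update(zip(p, p[1:], p[2:]))
    out = Counter()
    for (a, b, _), n in triples.items():
        out[(a, b)] += n
    return {((a, b), (b, c)): n / out[(a, b)] for (a, b, c), n in triples.items()}

def state_nodes_by_physical_node(transitions):
    """Group state nodes (previous, current) under their physical node `current`."""
    groups = defaultdict(set)
    for src, dst in transitions:
        for state in (src, dst):
            groups[state[1]].add(state)
    return dict(groups)

# Hypothetical pathways: flow through c returns to where it came from.
paths = [["a", "c", "a"]] * 3 + [["b", "c", "b"]] * 3
p1 = first_order(paths)
p2 = second_order(paths)
print(p1[("c", "a")])                  # 0.5: the memoryless walk forgets the origin
print(p2[(("a", "c"), ("c", "a"))])    # 1.0: the state node remembers it
print(sorted(state_nodes_by_physical_node(p2)["c"]))
```

In this toy case the first-order model sends half of the flow arriving at c on to b, while the two state nodes of physical node c keep the flows from a and b apart.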
Martin Rosvall, Jean-Charles Delvenne, Michael T. Schaub, and Renaud Lambiotte
Community detection, the decomposition of a graph into essential building blocks, has been a core research topic in network science over the past years. Since a precise notion of what constitutes a community has remained elusive, community detection algorithms have often been compared on benchmark graphs with a particular form of assortative community structure and classified based on the mathematical techniques they employ. However, this comparison can be misleading because apparent similarities in their mathematical machinery can disguise different goals and reasons for why we want to employ community detection in the first place. Here we provide a focused review of these different motivations that underpin community detection. This problem-driven classification is useful in applied network science, where it is important to select an appropriate algorithm for the given purpose. Moreover, highlighting the different facets of community detection also delineates the many lines of research and points out open directions and avenues for future research.
Michael T. Schaub, Jean-Charles Delvenne, Martin Rosvall, and Renaud Lambiotte
Appl. Netw. Sci. 2: 4 (2017)
Community detection, the decomposition of a graph into essential building blocks, has been a core research topic in network science over the past years. Since a precise notion of what constitutes a community has remained elusive, community detection algorithms have often been compared on benchmark graphs with a particular form of assortative community structure and classified based on the mathematical techniques they employ. However, this comparison can be misleading because apparent similarities in their mathematical machinery can disguise different goals and reasons for why we want to employ community detection in the first place. Here we provide a focused review of these different motivations that underpin community detection. This problem-driven classification is useful in applied network science, where it is important to select an appropriate algorithm for the given purpose. Moreover, highlighting the different facets of community detection also delineates the many lines of research and points out open directions and avenues for future research.
Daniel Edler, Thaís Guedes, Alexander Zizka, Martin Rosvall, and Alexandre Antonelli
Syst. Biol. 66 (2): 197-204 (2017)
arXiv:1512.00892 Infomap Bioregions
Biogeographical regions (bioregions) reveal how different sets of species are spatially grouped and therefore are important units for conservation, historical biogeography, ecology and evolution. Several methods have been developed to identify bioregions based on species distribution data rather than expert opinion. One approach successfully applies network theory to simplify and highlight the underlying structure in species distributions. However, this method lacks tools for simple and efficient analysis. Here we present Infomap Bioregions, an interactive web application that inputs species distribution data and generates bioregion maps. Species distributions may be provided as georeferenced point occurrences or range maps, and can be of local, regional or global scale. The application uses a novel adaptive resolution method to make best use of often incomplete species distribution data. The results can be downloaded as vector graphics, shapefiles or in table format. We validate the tool by processing large datasets of publicly available species distribution data of the world's amphibians using species ranges, and mammals using point occurrences. We then calculate the fit between the inferred bioregions and WWF ecoregions. As examples of applications, researchers can reconstruct ancestral ranges in historical biogeography or identify indicator species for targeted conservation.
Seung-Hee Bae, Daniel Halperin, Jevin West, Martin Rosvall, and Bill Howe
ACM Trans. Knowl. Discov. Data 11, 3, Article 32 (2017)
Community detection is an increasingly popular approach to uncover important structures in large networks. Flow-based community detection methods rely on communication patterns of the network rather than structural properties to determine communities. The Infomap algorithm in particular optimizes a novel objective function called the map equation and has been shown to outperform other approaches in third-party benchmarks. However, Infomap and its variants are inherently sequential, limiting their use for large-scale graphs. In this paper, we propose a novel algorithm to optimize the map equation called RelaxMap. RelaxMap provides two important improvements over Infomap: parallelization, so that the map equation can be optimized over much larger graphs, and prioritization, so that the most important work occurs first, iterations take less time, and the algorithm converges faster. We implement these techniques using OpenMP on shared-memory multicore systems, and evaluate our approach on a variety of graphs from standard graph clustering benchmarks as well as real graph datasets. Our evaluation shows that both techniques are effective: RelaxMap achieves 70% parallel efficiency on 8 cores, and prioritization improves algorithm performance by an additional 20%–50% on average, depending on the graph properties. Additionally, RelaxMap converges in a similar number of iterations and provides solutions of equivalent quality to the serial Infomap implementation.
Masoumeh Kheirkhah, Andrea Lancichinetti, and Martin Rosvall
Phys. Rev. E 93, 032309 (2016)
Community detection of network flows conventionally assumes one-step dynamics on the links. For sparse networks and interest in large-scale structures, longer timescales may be more appropriate. Conversely, for large networks and interest in small-scale structures, shorter timescales may be better. However, current methods for analyzing networks at different timescales require expensive and often infeasible network reconstructions. To overcome this problem, we introduce a method that takes advantage of the inner workings of the map equation and evades the reconstruction step. This makes it possible to efficiently analyze large networks at different Markov times with no extra overhead cost. The method also evades the costly unipartite projection for identifying flow modules in bipartite networks.
Ludvig Bohlin, Alcides Viamontes Esquivel, Andrea Lancichinetti, and Martin Rosvall
J. Assn. Inf. Sci. Tec. 67, 2527 (2016)
As the number of scientific journals has multiplied, journal rankings have become increasingly important for scientific decisions. From submissions and subscriptions to grants and hirings, researchers, policy makers, and funding agencies make important decisions with influence from journal rankings such as the ISI journal impact factor. Typically, the rankings are derived from the citation network between a selection of journals and unavoidably depend on this selection. However, little is known about how robust rankings are to the selection of included journals. Here we compare the robustness of three journal rankings based on network flows induced on citation networks. They model pathways of researchers navigating scholarly literature, stepping between journals and remembering their previous steps to different degrees: zero-step memory as impact factor, one-step memory as Eigenfactor, and two-step memory, corresponding to zero-, first-, and second-order Markov models of citation flow between journals. We conclude that a second-order Markov model is slightly more robust, because it combines the advantages of the lower-order models: perturbations that remain local and citation weights that depend on journal importance. However, the robustness gain comes at the cost of requiring more data, because the second-order Markov model requires citation data from twice as long a period.
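A small Python sketch, under simplifying assumptions of our own, contrasts a zero-step ranking (each journal's share of received citations, in the spirit of the impact factor) with a one-step ranking (the stationary flow of a random walk along citations). The damping factor `alpha = 0.85` with uniform teleportation is an assumption and a simplification of how Eigenfactor is actually computed; the citation matrix is invented, with `C[i, j]` counting citations from journal i to journal j.

```python
import numpy as np

def zero_order_rank(C):
    """Share of all citations each journal receives (no memory)."""
    received = C.sum(axis=0)
    return received / received.sum()

def first_order_rank(C, alpha=0.85, tol=1e-12, max_iter=10_000):
    """Stationary flow of a random walk along citations with uniform teleportation."""
    n = C.shape[0]
    out = C.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix; journals that cite nothing teleport uniformly.
    P = np.where(out > 0, C / np.where(out > 0, out, 1), 1.0 / n)
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = alpha * (p @ P) + (1 - alpha) / n
        if np.abs(new - p).sum() < tol:
            break
        p = new
    return new

# Hypothetical citation counts between three journals.
C = np.array([[0, 3, 1],
              [1, 0, 0],
              [4, 1, 0]], dtype=float)
print(zero_order_rank(C))   # [0.5 0.4 0.1]
print(first_order_rank(C))
```

A second-order ranking would run the same kind of walk on state nodes (previous journal, current journal) built from citation pathways, which is why it needs citation data from a longer period.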
Christian Persson, Ludvig Bohlin, Daniel Edler, and Martin Rosvall
To better understand the flows of ideas or information through social and biological systems, researchers develop maps that reveal important patterns in network flows. In practice, network flow models have implied memoryless first-order Markov chains, but recently researchers have introduced higher-order Markov chain models with memory to capture patterns in multi-step pathways. Higher-order models are particularly important for effectively revealing actual, overlapping community structure, but higher-order Markov chain models suffer from the curse of dimensionality: their vast parameter spaces require exponentially increasing data to avoid overfitting and therefore make mapping inefficient even for moderate-sized systems. To overcome this problem, we introduce an efficient cross-validated mapping approach based on network flows modeled by sparse Markov chains. To illustrate our approach, we present a map of citation flows in science with research fields that overlap in multidisciplinary journals. Compared with currently used categories in science of science studies, the research fields form better units of analysis because the map more effectively captures how ideas flow through science.
Fariba Karimi, Ludvig Bohlin, Ann Samoilenko, Martin Rosvall, and Andrea Lancichinetti
Palgrave Communications 1, 15041 (2015)
We live in a global village where electronic communication has eliminated the geographical barriers of information exchange. The road is now open to worldwide convergence of information interests, shared values and understanding. Nevertheless, interests still vary between countries around the world. This raises important questions about what today’s world map of information interests actually looks like and what factors cause the barriers of information exchange between countries. To quantitatively construct a world map of information interests, we devise a scalable statistical model that identifies countries with similar information interests and measures the countries’ bilateral similarities. From the similarities we connect countries in a global network and find that countries can be mapped into 18 clusters with similar information interests. Through regression we find that language and religion best explain the strength of the bilateral ties and formation of clusters. Our findings provide a quantitative basis for further studies to better understand the complex interplay between shared interests and conflict on a global scale. The methodology can also be extended to track changes over time and capture important trends in global information exchange.
E. R. Hotchkiss, R. O. Hall Jr, R. A. Sponseller, D. Butman, J. Klaminder, H. Laudon, M. Rosvall, and J. Karlsson
Nature Geoscience 8, 696-699 (2015)
Carbon dioxide (CO2) evasion from streams and rivers to the atmosphere represents a substantial flux in the global carbon cycle[1–3]. The proportions of CO2 emitted from streams and rivers that come from terrestrially derived CO2 or from CO2 produced within freshwater ecosystems through aquatic metabolism are not well quantified. Here we estimated CO2 emissions from running waters in the contiguous United States, based on freshwater chemical and physical characteristics and modelled gas transfer velocities at 1,463 United States Geological Survey monitoring sites. We then assessed CO2 production from aquatic metabolism, compiled from previously published measurements of net ecosystem production from 187 streams and rivers across the contiguous United States. We find that CO2 produced by aquatic metabolism contributes about 28% of CO2 evasion from streams and rivers with flows between 0.0001 and 19,000 m³ s⁻¹. We mathematically modelled CO2 flux from groundwater into running waters along a stream–river continuum to evaluate the relationship between stream size and CO2 source. Terrestrially derived CO2 dominates emissions from small streams, and the percentage of CO2 emissions from aquatic metabolism increases with stream size. We suggest that the relative role of rivers as conduits for terrestrial CO2 efflux and as reactors mineralizing terrestrial organic carbon is a function of their size and connectivity with landscapes.
Manlio De Domenico, Andrea Lancichinetti, Alex Arenas, and Martin Rosvall
Physical Review X 5, 011027 (2015)
To comprehend interconnected systems across the social and natural sciences, researchers have developed many powerful methods to identify functional modules. For example, with interaction data aggregated into a single network layer, flow-based methods have proven useful for identifying modular dynamics in weighted and directed networks that capture constraints on flow processes. However, many interconnected systems consist of agents or components that exhibit multiple layers of interactions, possibly from several different processes. Inevitably, representing this intricate network of networks as a single aggregated network leads to information loss and may obscure the actual organization. Here, we propose a method based on a compression of network flows that can identify modular flows both within and across layers in nonaggregated multilayer networks. Our numerical experiments on synthetic multilayer networks, with some layers originating from the same interaction process, show that the analysis fails in aggregated networks or when treating the layers separately, whereas the multilayer method can accurately identify modules across layers that originate from the same interaction process. We capitalize on our findings and reveal the community structure of two multilayer collaboration networks with topics as layers: scientists affiliated with the Pierre Auger Observatory and scientists publishing works on networks on the arXiv. Compared to conventional aggregated methods, the multilayer method uncovers connected topics and reveals smaller modules with more overlap that better capture the actual organization.
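The multilayer idea can be sketched as a random walk on state nodes (layer, node). The coupling below is one common choice and an assumption here, not the paper's code: with a relax rate `r` the walker follows a link of the same physical node in any layer, otherwise it stays in its current layer. The function name and the toy layers are hypothetical.

```python
import numpy as np

def multilayer_transitions(layers, r):
    """Transition matrix between state nodes (layer, node), indexed layer * n + node.

    layers: array of shape (L, n, n); layers[a, i, j] is the weight of the
    link i -> j in layer a. With probability 1 - r the walker follows a link
    in its current layer; with probability r it follows any link of the same
    physical node i in any layer and lands in that link's layer.
    """
    L, n, _ = layers.shape
    P = np.zeros((L * n, L * n))
    total_out = layers.sum(axis=(0, 2))          # out-weight of i summed over layers
    for a in range(L):
        layer_out = layers[a].sum(axis=1)        # out-weight of i in layer a
        for i in range(n):
            row = a * n + i
            if total_out[i] == 0:
                P[row, row] = 1.0                # isolated physical node: stay put
                continue
            # If i has no links in layer a, the walker must relax.
            stay = (1 - r) if layer_out[i] > 0 else 0.0
            if stay > 0:
                P[row, a * n:(a + 1) * n] += stay * layers[a, i] / layer_out[i]
            for b in range(L):
                P[row, b * n:(b + 1) * n] += (1 - stay) * layers[b, i] / total_out[i]
    return P

# Two hypothetical layers over three physical nodes.
layers = np.array([[[0, 1, 0], [1, 0, 0], [0, 0, 0]],
                   [[0, 0, 1], [0, 0, 0], [1, 0, 0]]], dtype=float)
P = multilayer_transitions(layers, r=0.2)
print(P.sum(axis=1))   # every row sums to one
```

Aggregating the layers into one network would merge these state nodes into three physical nodes and lose which layer the flow is moving in.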
Tatsuro Kawamoto and Martin Rosvall
Physical Review E 91, 012809 (2015)
A community detection algorithm is considered to have a resolution limit if the scale of the smallest modules that can be resolved depends on the size of the analyzed subnetwork. The resolution limit is known to prevent some community detection algorithms from accurately identifying the modular structure of a network. In fact, any global objective function for measuring the quality of a two-level assignment of nodes into modules must have some sort of resolution limit or an external resolution parameter. However, it is not yet known how the resolution limit affects the so-called map equation, which is known to be an efficient objective function for community detection. We derive an analytical estimate and conclude that the resolution limit of the map equation is set by the total number of links between modules instead of the total number of links in the full network as for modularity. This mechanism makes the resolution limit much less restrictive for the map equation than for modularity; in practice, it is orders of magnitude smaller. Furthermore, we argue that the effect of the resolution limit often results from shoehorning multilevel modular structures into two-level descriptions. As we show, the hierarchical map equation effectively eliminates the resolution limit for networks with nested multilevel modular structures.
Martin Rosvall, Alcides V. Esquivel, Andrea Lancichinetti, Jevin D. West, and Renaud Lambiotte
Nature Communications 5, 4630 (2014)
Random walks on networks are the standard tool for modelling spreading processes in social and biological systems. This first-order Markov approach is used in conventional community detection, ranking and spreading analysis, although it ignores a potentially important feature of the dynamics: where flow moves to may depend on where it comes from. Here we analyse pathways from different systems, and although we only observe marginal consequences for disease spreading, we show that ignoring the effects of second-order Markov dynamics has important consequences for community detection, ranking and information spreading. For example, capturing dynamics with a second-order Markov model allows us to reveal actual travel patterns in air traffic and to uncover multidisciplinary journals in scientific communication. These findings were achieved only by using more available data and making no additional assumptions, and therefore suggest that accounting for higher-order memory in network flows can help us better understand how real systems are organized and function.
Morten L. Bech, Carl T. Bergstrom, Martin Rosvall, and Rodney J. Garratt
Physica A 424, 44-51 (2014)
NY Fed 507
We use an information-theoretic approach to describe changes in lending relationships between financial institutions around the time of the Lehman Brothers failure. Unlike previous work that conducts maximum likelihood estimation on undirected networks, our analysis distinguishes between borrowers and lenders and looks for broader lending relationships (multi-bank lending cycles) that extend beyond the immediate counter-parties. We detect significant changes in lending patterns following implementation of the Interest on Required and Excess Reserves policy by the Federal Reserve in October 2008. Analysis of micro-scale rates of change in the data suggests these changes were triggered by the collapse of Lehman Brothers a few weeks before.
Ludvig Bohlin and Martin Rosvall
PLoS ONE 9(7): e103006 (2014)
Although the understanding of and motivation behind individual trading behavior is an important puzzle in finance, little is known about the connection between an investor's portfolio structure and her trading behavior in practice. In this paper, we investigate the relation between what stocks investors hold and what stocks they buy, and show that investors with similar portfolio structures to a great extent trade in a similar way. With data from the central register of shareholdings in Sweden, we model the market in a similarity network, by considering investors as nodes, connected with links representing portfolio similarity. From the network, we find investor groups that not only identify different investment strategies, but also represent individual investors trading in a similar way. These findings suggest that the stock portfolios of investors hold meaningful information, which could be used to gain a better understanding of stock market dynamics.
Daril A. Vilhena, Jacob G. Foster, Martin Rosvall, Jevin D. West, James Evans, and Carl T. Bergstrom
Sociological Science 1, 221 (2014)
Divergent interests, expertise, and language form cultural barriers to communication. No formalism has been available to characterize these “cultural holes.” Here we use information theory to measure cultural holes and demonstrate our formalism in the context of scientific communication using papers from JSTOR. We extract scientific fields from the structure of citation flows and infer field-specific cultures by cataloging phrase frequencies in full text and measuring the relative efficiency of between-field communication. We then combine citation and cultural information in a novel topographic map of science, mapping citations to geographic distance and cultural holes to topography. By analyzing the full citation network, we find that communicative efficiency decays with citation distance in a field-specific way. These decay rates reveal hidden patterns of cohesion and fragmentation. For example, the ecological sciences are balkanized by jargon, whereas the social sciences are relatively integrated. Our results highlight the importance of enriching structural analyses with cultural data.
Renaud Lambiotte, Vsevolod Salnikov, and Martin Rosvall
Journal of Complex Networks (2014)
Pathways of diffusion observed in real-world systems often require stochastic processes going beyond first-order Markov models, as implicitly assumed in network theory. In this work, we focus on second-order Markov models, and derive an analytical expression for the effect of memory on the spectral gap and thus, equivalently, on the characteristic time needed for the stochastic process to asymptotically reach equilibrium. Perturbation analysis shows that standard first-order Markov models can either overestimate or underestimate the diffusion rate of flows across the modular structure of a system captured by a second-order Markov network. We test the theoretical predictions on a toy example and on numerical data, and discuss their implications for network theory, in particular in the case of temporal or multiplex networks.
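The spectral gap is 1 − |λ2|, where λ2 is the second-largest eigenvalue in modulus of the transition matrix, and its inverse sets the characteristic relaxation time. A minimal NumPy sketch for a two-state chain, chosen because λ2 = 1 − a − b is known in closed form; the parameter values are arbitrary, and the same function could be applied to a second-order transition matrix defined on state nodes.

```python
import numpy as np

def spectral_gap(P):
    """1 - |lambda_2| for a row-stochastic transition matrix P."""
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return 1.0 - moduli[1]

# Two-state chain: leave state 0 with probability a, state 1 with probability b.
a, b = 0.1, 0.2
P = np.array([[1 - a, a],
              [b, 1 - b]])
gap = spectral_gap(P)
print(gap)       # close to a + b = 0.3
print(1 / gap)   # characteristic relaxation time, in steps
```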
Atieh Mirshahvalad, Alcides Viamontes, Ludvig Lizana, and Martin Rosvall
Phys. Rev. E 89, 012809 (2014)
To better understand the inner workings of information spreading, network researchers often use simple models to capture the spreading dynamics. But most models only highlight the effect of local interactions on the global spreading of a single information wave, and ignore the effects of interactions between multiple waves. Here we take into account the effect of multiple interacting waves by using an agent-based model in which the interaction between information waves is based on their novelty. We analyzed the global effects of such interactions and found that information that actually reaches nodes reaches them faster. This effect is caused by selection between information waves: slow waves die out and only fast waves survive. As a result, and in contrast to models with non-interacting information dynamics, the access to information decays with the distance from the source. Moreover, when we analyzed the model on various synthetic and real spatial road networks, we found that the decay rate also depends on the path redundancy and the effective dimension of the system. In general, the decay of the information wave frequency as a function of distance from the source follows a power law distribution with an exponent between -0.2 for a two-dimensional system with high path redundancy and -0.5 for a tree-like system with no path redundancy. We found that the real spatial networks provide an infrastructure for information spreading that lies in between these two extremes. Finally, to better understand the mechanics behind the scaling results, we provide analytic calculations of the scaling for a one-dimensional system.
Ludvig Bohlin, Daniel Edler, Andrea Lancichinetti, and Martin Rosvall
Book chapter in Measuring Scholarly Impact: Methods and Practice
Available as a tutorial to
Large networks contain plentiful information about the organization of a system. The challenge is to extract useful information buried in the structure of myriad nodes and links. Therefore, powerful tools for simplifying and highlighting important structures in networks are essential for comprehending their organization. Such tools are called community detection methods and they are designed to identify strongly intra-connected modules that often correspond to important functional units. Here we describe one such method, known as the map equation, and its accompanying algorithms for finding, evaluating, and visualizing the modular organization of networks. The map equation framework is very flexible and can with its search algorithm Infomap, for example, identify two-level, multi-level and overlapping organization in weighted, directed, and multiplex networks. Because the map equation framework operates on the flow induced by the links of a network, it naturally captures flow of ideas and citation flow, and is therefore well-suited for analysis of bibliometric networks.
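For an undirected, unweighted network, the two-level map equation can be written in expanded form as L(M) = plogp(q) − 2 Σ_i plogp(q_i) − Σ_α plogp(p_α) + Σ_i plogp(q_i + Σ_{α∈i} p_α), where plogp(x) = x log2 x, p_α is the visit rate of node α, q_i is the exit rate of module i, and q = Σ_i q_i. The Python sketch below only evaluates this expression for a given partition; it does not search for partitions as Infomap does. The function names and the toy graph are ours.

```python
import math
from collections import defaultdict

def plogp(x):
    return x * math.log2(x) if x > 0 else 0.0

def map_equation(edges, modules):
    """Two-level map equation L(M) in bits for an undirected, unweighted graph.

    edges: list of (u, v) pairs; modules: dict node -> module id.
    Visit rates are p_a = degree / 2m; module exit rates q_i count the
    link ends leaving module i, divided by 2m.
    """
    degree = defaultdict(int)
    exit_links = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if modules[u] != modules[v]:
            exit_links[modules[u]] += 1
            exit_links[modules[v]] += 1
    two_m = 2 * len(edges)
    p = {a: d / two_m for a, d in degree.items()}
    q = {i: exit_links[i] / two_m for i in set(modules.values())}
    module_flow = defaultdict(float)
    for a, pa in p.items():
        module_flow[modules[a]] += pa
    return (plogp(sum(q.values()))
            - 2 * sum(plogp(qi) for qi in q.values())
            - sum(plogp(pa) for pa in p.values())
            + sum(plogp(q[i] + module_flow[i]) for i in q))

# Two triangles joined by a single link.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
one = map_equation(edges, {a: 0 for a in range(6)})
two = map_equation(edges, {a: a // 3 for a in range(6)})
print(round(one, 3), round(two, 3))   # 2.557 2.321
```

With a single module the codelength reduces to the entropy of the node visit rates; splitting the graph into its two triangles gives a shorter description, which is what the map equation rewards.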
Atieh Mirshahvalad, Olivier H. Beauchesne, Éric Archambault, and Martin Rosvall
PLoS ONE 8(1): e53943 (2013)
Community detection helps us simplify the complex configuration of networks, but communities are reliable only if they are statistically significant. To detect statistically significant communities, a common approach is to resample the original network and analyze the communities. But resampling assumes independence between samples, while the components of a network are inherently dependent. Therefore, we must understand how breaking dependencies between resampled components affects the results of the significance analysis. Here we use scientific communication as a model system to analyze this effect. Our dataset includes citations among articles published in journals in the years 1984–2010. We compare parametric resampling of citations with non-parametric article resampling. While citation resampling breaks link dependencies, article resampling maintains such dependencies. We find that citation resampling underestimates the variance of link weights. Moreover, this underestimation explains most of the differences in the significance analysis of ranking and clustering. Therefore, when only link weights are available and article resampling is not an option, we suggest a simple parametric resampling scheme that generates link-weight variances close to the link-weight variances of article resampling. Nevertheless, when we highlight and summarize important structural changes in science, the more dependencies we can maintain in the resampling scheme, the earlier we can predict structural change.
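The difference between citation resampling and article resampling can be illustrated with a toy Python simulation. Modeling citation resampling as independent Poisson draws of the link weight is our assumption, not necessarily the paper's scheme, and the per-article citation counts are invented. When citations cluster within articles, the article bootstrap yields a larger link-weight variance, consistent with the underestimation described above.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical data: citations that each of 20 articles sends along one link.
# A few articles send many citations, most send none.
per_article = np.array([10] * 5 + [0] * 15)
weight = per_article.sum()   # observed link weight: 50 citations

def citation_resampling(w, samples):
    """Parametric resampling (assumed Poisson): every citation is independent."""
    return rng.poisson(w, size=samples)

def article_resampling(counts, samples):
    """Non-parametric bootstrap over articles: redraw articles with replacement
    and recount, keeping each article's citations together."""
    idx = rng.integers(0, len(counts), size=(samples, len(counts)))
    return counts[idx].sum(axis=1)

cit = citation_resampling(weight, 5000)
art = article_resampling(per_article, 5000)
print(cit.var(), art.var())   # theoretical values: 50 versus 375
```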
Seung-Hee Bae, Daniel Halperin, Jevin West, Martin Rosvall, and Bill Howe
2013 IEEE 13th International Conference on Data Mining Workshops (ICDMW), 303 (2013)
Community detection is a powerful approach to uncover important structures in large networks. Since networks often describe the flow of some entity, flow-based community-detection methods are particularly interesting. One such algorithm, Infomap, optimizes the objective function known as the map equation. While Infomap is known to be an effective algorithm, its serial implementation cannot take advantage of multicore processing in modern computers. In this paper, we propose a novel parallel generalization of Infomap called RelaxMap. This algorithm relaxes concurrency assumptions to avoid lock overhead, achieving 70% parallel efficiency in shared-memory multicore experiments while exhibiting similar convergence properties and finding community structures similar to those of the serial algorithm. We evaluate our approach on a variety of real graph datasets as well as synthetic graphs produced by a popular graph generator used for benchmarking community detection algorithms. We describe the algorithm, the experiments, and some emerging research directions in high-performance community detection on massive graphs.
Johan Lindholm, Mattias Derlén, Martin Rosvall, and Atieh Mirshahvalad
Europarättslig Tidskrift 3, 517 (2013)
Recent research has demonstrated the potential of network analysis to improve our understanding of law. In this study we apply network analysis to the case law of the European Court of Justice (CJEU) in order to understand its role as a source of law. In doing so, we apply network analysis tools not previously used in legal scholarship, most significantly (i) a modified version of the PageRank algorithm, (ii) the Map Equation, and (iii) resampling to infer “missing” links. In the article we demonstrate that this method can help us to understand not only the CJEU’s case law but law generally.
Renaud Lambiotte and Martin Rosvall
Phys. Rev. E 85, 056107 (2012)
Random teleportation is a necessary evil for ranking and clustering directed networks based on random walks. Teleportation enables ergodic solutions, but the solutions must necessarily depend on the exact implementation and parametrization of the teleportation. For example, in the commonly used PageRank algorithm, the teleportation rate must trade off a heavily biased solution with a uniform solution. Here we show that teleportation to links rather than nodes enables a much smoother trade-off and effectively more robust results. We also show that, by not recording the teleportation steps of the random walker, we can further reduce the effect of teleportation with dramatic effects on clustering.
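A hedged sketch of how the teleportation choice enters the computation: PageRank by power iteration, with the teleportation distribution passed in as a parameter. Teleporting to nodes is the uniform default. For teleportation to links we assume, for illustration only, that the walker picks a link with probability proportional to its weight and lands on its target, so nodes are reached in proportion to their weighted in-degree; the paper's exact definition and its unrecorded-teleportation variant may differ and are not shown. The teleportation rate alpha = 0.15 is a conventional choice, not taken from the paper, and the function names are hypothetical.

```python
def pagerank(links, n, alpha=0.15, teleport=None, tol=1e-12, max_iter=10_000):
    """PageRank by power iteration with a pluggable teleportation distribution.

    links: dict mapping source node -> {target node: link weight}, nodes 0..n-1.
    teleport: n probabilities summing to 1; defaults to uniform (teleport to nodes).
    """
    if teleport is None:
        teleport = [1.0 / n] * n
    out_strength = [sum(links.get(i, {}).values()) for i in range(n)]
    p = [1.0 / n] * n
    for _ in range(max_iter):
        new = [0.0] * n
        moved = 0.0  # probability mass that follows a link in this step
        for i in range(n):
            if out_strength[i] > 0:
                step = (1 - alpha) * p[i]
                moved += step
                for j, w in links[i].items():
                    new[j] += step * w / out_strength[i]
        # The rest teleports: rate alpha from nodes with out-links,
        # plus all of the mass sitting on dangling nodes.
        jump = 1.0 - moved
        for k in range(n):
            new[k] += jump * teleport[k]
        if sum(abs(a - b) for a, b in zip(new, p)) < tol:
            return new
        p = new
    return p


def in_strength_teleport(links, n):
    # ASSUMPTION: "teleport to a link" = choose a link in proportion to its weight
    # and land on its target, so each node is reached in proportion to its in-strength.
    s = [0.0] * n
    for targets in links.values():
        for j, w in targets.items():
            s[j] += w
    total = sum(s)
    return [x / total for x in s]


# Hypothetical toy network: node 3 links out to node 0, but nothing links to node 3.
links = {0: {1: 1.0}, 1: {2: 1.0}, 2: {0: 1.0, 1: 1.0}, 3: {0: 1.0}}
p_nodes = pagerank(links, 4)
p_links = pagerank(links, 4, teleport=in_strength_teleport(links, 4))
print([round(x, 3) for x in p_nodes])
print([round(x, 3) for x in p_links])
```

With node teleportation, node 3 still receives alpha/4 = 0.0375 of the flow purely from teleportation; under the link-based choice it receives none, so its rank reflects only the links of the network.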
Atieh Mirshahvalad, Johan Lindholm, Mattias Derlén, and Martin Rosvall
PLoS ONE 7(3): e33721 (2012)
Researchers use community-detection algorithms to reveal large-scale organization in biological and social networks, but community detection is useful only if the communities are significant and not a result of noisy data. To assess the statistical significance of the network communities, or the robustness of the detected structure, one approach is to perturb the network structure by removing links and measure how much the communities change. However, perturbing sparse networks is challenging because they are inherently sensitive; they shatter easily if links are removed. Here we propose a simple method to perturb sparse networks and assess the significance of their communities. We generate resampled networks by adding extra links based on local information, then we aggregate the information from multiple resampled networks to find a coarse-grained description of significant clusters. In addition to testing our method on benchmark networks, we use our method on the sparse network of the European Court of Justice (ECJ) case law, to detect significant and insignificant areas of law. We use our significance analysis to draw a map of the ECJ case law network that reveals the relations between the areas of law.
Robin Haring, Martin Rosvall, Uwe Völker, Henry Völzke, Heyo Kroemer, Matthias Nauck, and Henri Wallaschofski
PLoS ONE 7(6): e39461 (2012)
The additional clinical value of clustering cardiovascular risk factors to define the metabolic syndrome (MetS) is still under debate. However, it is unclear which cardiovascular risk factors tend to cluster predominantly and how individual risk factor states change over time. We used data from 3,187 individuals aged 20–79 years from the population-based Study of Health in Pomerania for a network-based approach to visualize clustered MetS risk factor states and their change over a five-year follow-up period. MetS was defined by harmonized Adult Treatment Panel III criteria, and each individual's risk factor burden was classified according to the five MetS components at baseline and follow-up. We used the map generator to depict 32 (2⁵) different states and highlight the most important transitions among the 1,024 (32²) possible transitions between states in the weighted directed network. At baseline, we found the largest fraction (19.3%) of all individuals free of any MetS risk factors and identified hypertension (15.4%) and central obesity (6.3%), as well as their combination (19.0%), as the most common MetS risk factors. Analyzing risk factor flow over the five-year follow-up, we found that most individuals remained in their risk factor state and that low high-density lipoprotein cholesterol (HDL) (6.3%) was the most prominent additional risk factor beyond hypertension and central obesity. Also among individuals without any MetS risk factor at baseline, low HDL (3.5%), hypertension (2.1%), and central obesity (1.6%) were the first risk factors to manifest during follow-up. We identified hypertension and central obesity as the predominant MetS risk factor cluster and low HDL concentrations as the most prominent new onset risk factor.
Alcides Viamontes Esquivel and Martin Rosvall
In network science, researchers often use mutual information to understand the difference between network partitions produced by community detection methods. Here we extend the use of mutual information to covers, that is, the cases where a node can belong to more than one module. In our proposed solution, the underlying stochastic process used to compare partitions is extended to deal with covers, and the random variables of the new process are simply fed into the usual definition of mutual information. For partitions, our extended process behaves exactly like the conventional approach, and thus the mutual information values obtained are the same. We also describe how to perform sampling and error estimation for our extended process, as both are necessary steps for a practical application of this measure. The stochastic process that we define here is applicable not only to networks but also to comparing more general set-to-set binary relations.
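For reference, the conventional partition case that the paper generalizes can be sketched in a few lines. We assume the simplest underlying process, picking a node uniformly at random and reading off its module in each partition; the cover extension, sampling, and error estimation from the abstract are not reproduced. The normalization 2I/(H(A)+H(B)) is one common choice among several, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of the module sizes in a partition."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (bits) between two partitions given as per-node module labels.

    Assumed process: pick a node uniformly at random and read its module in a and in b.
    """
    if len(a) != len(b):
        raise ValueError("both partitions must label the same nodes")
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (c / n) * math.log2(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in Counter(zip(a, b)).items()
    )


def normalized_mi(a, b):
    # One common normalization, 2 I(A;B) / (H(A) + H(B)), which lies in [0, 1].
    h = entropy(a) + entropy(b)
    return 1.0 if h == 0 else 2 * mutual_information(a, b) / h


print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # identical partitions
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions
```

Identical partitions share all their information (here 1 bit), independent ones share none, and relabeling modules leaves the value unchanged.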
Alcides Viamontes Esquivel and Martin Rosvall
Phys. Rev. X 1, 021025 (2011)
arXiv:1105.0812 Source code
To better understand the organization of overlapping modules in large networks with respect to flow, we introduce the map equation for overlapping modules. In this information-theoretic framework, we use the correspondence between compression and regularity detection. The generalized map equation measures how well we can compress a description of flow in the network when we partition it into modules with possible overlaps. When we minimize the generalized map equation over overlapping network partitions, we detect modules that capture flow and determine which nodes at the boundaries between modules should be classified in multiple modules and to what degree. With a novel greedy-search algorithm, we find that some networks, for example, the neural network of the nematode Caenorhabditis elegans, are best described by modules dominated by hard boundaries, but that others, for example, the sparse European-roads network, have an organization of highly overlapping modules.
Atieh Mirshahvalad and Martin Rosvall
Phys. Rev. E 84, 036102 (2011)
In social systems, people communicate with each other and form groups based on their interests. The pattern of interactions, the network, and the ideas that flow on the network naturally evolve together. Researchers use simple models to capture the feedback between changing network patterns and ideas on the network, but little is understood about the role of past events in the feedback process. Here we introduce a simple agent-based model to study the coupling between people's ideas and social networks, and better understand the role of history in dynamic social networks. We measure how information about ideas can be recovered from information about network structure and, the other way around, how information about network structure can be recovered from information about ideas. We find that it is in general easier to recover ideas from the network structure than vice versa.
Martin Rosvall and Carl T. Bergstrom
PLoS ONE 6(4): e18209 (2011)
arXiv:1010.0431 Source code
To comprehend the hierarchical organization of large integrated systems, we introduce the hierarchical map equation that reveals multilevel structures in networks. In this information-theoretic approach, we exploit the duality between compression and pattern detection; by compressing a description of a random walker as a proxy for real flow on a network, we find regularities in the network that induce this system-wide flow. Finding the shortest multilevel description of the random walker therefore gives us the best hierarchical clustering of the network (the optimal number of levels and modular partition at each level) with respect to the dynamics on the network. With a novel search algorithm, we extract and illustrate the rich multilevel organization of several large social and biological networks. For example, from the global air traffic network we uncover countries and continents, and from the pattern of scientific communication we reveal more than 100 scientific fields organized in four major disciplines: life sciences, physical sciences, ecology and earth sciences, and social sciences. In general, we find shallow hierarchical structures in globally interconnected systems, such as neural networks, and rich multilevel organizations in systems with highly separated regions, such as road networks.
Carl T. Bergstrom and Martin Rosvall
Biology and Philosophy 26, 159-176 (2011)
arXiv:0810.4168 Response to commentaries on The Transmission Sense of Information
Biologists rely heavily on the language of information, coding, and transmission that is commonplace in the field of information theory developed by Claude Shannon, but there is open debate about whether such language is anything more than facile metaphor. Philosophers of biology have argued that when biologists talk about information in genes and in evolution, they are not talking about the sort of information that Shannon's theory addresses. First, philosophers have suggested that Shannon theory is only useful for developing a shallow notion of correlation, the so-called "causal sense" of information. Second, they typically argue that in genetics and evolutionary biology, information language is used in a "semantic sense," whereas semantics are deliberately omitted from Shannon theory. Neither critique is well-founded. Here we propose an alternative to the causal and semantic senses of information: a transmission sense of information, in which an object X conveys information if the function of X is to reduce, by virtue of its sequence properties, uncertainty on the part of an agent who observes X. The transmission sense not only captures much of what biologists intend when they talk about information in genes, but also brings Shannon's theory back to the fore. By taking the viewpoint of a communications engineer and focusing on the decision problem of how information is to be packaged for transport, this approach resolves several problems that have plagued the information concept in biology, and highlights a number of important features of the way that information is encoded, stored, and transmitted as genetic sequence.
Ludvig Lizana, Martin Rosvall, and Kim Sneppen
Phys. Rev. Lett. 104, 040603 (2010)
arXiv:0910.4045 Java simulation
The distribution of information is essential for living systems' ability to coordinate and adapt. Random walkers are often used to model this distribution process and, in doing so, one effectively assumes that information maintains its relevance over time. But the value of information in social and biological systems often decays and must continuously be updated. To capture the spatial dynamics of ageing information, we introduce time walkers. A time walker moves like a random walker, but interacts with traces left by other walkers, some representing older information, some newer. The traces form a navigable information landscape. We quantify the dynamical properties of time walkers moving on a two-dimensional lattice and the quality of the information landscape generated by their movements. We visualise the self-similar landscape as a river network, and show that searching in this landscape is superior to random searching and scales as the length of loop-erased random walks.
Martin Rosvall, Daniel Axelsson, and Carl T. Bergstrom
Eur. Phys. J. Special Topics 178, 13 (2009)
arXiv:0906.1405 Map generator
Many real-world networks are so large that we must simplify their structure before we can extract useful information about the systems they represent. As the tools for doing these simplifications proliferate within the network literature, researchers would benefit from some guidelines about which of the so-called community detection algorithms are most appropriate for the structures they are studying and the questions they are asking. Here we show that different methods highlight different aspects of a network's structure and that the sort of information that we seek to extract about the system must guide us in our decision. For example, many community detection algorithms, including the popular modularity maximization approach, infer module assignments from an underlying model of the network formation process. However, we are not always as interested in how a system's network structure was formed, as we are in how a network's extant structure influences the system's behavior. To see how structure influences current behavior, we will recognize that links in a network induce movement across the network and result in system-wide interdependence. In doing so, we explicitly acknowledge that most networks carry flow. To highlight and simplify the network structure with respect to this flow, we use the map equation. We present an intuitive derivation of this flow-based and information-theoretic method and provide an interactive on-line application that anyone can use to explore the mechanics of the map equation. The differences between the map equation and the modularity maximization approach are not merely conceptual. Because the map equation attends to patterns of flow on the network and the modularity maximization approach does not, the two methods can yield dramatically different results for some network structures. To illustrate this and build our understanding of each method, we partition several sample networks.
We also describe an algorithm and provide source code to efficiently decompose large weighted and directed networks based on the map equation.
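For the simplest setting (an undirected, unweighted network, where the stationary visit rate of node a is its degree divided by twice the number of links, p_a = k_a/2m), the two-level map equation can be evaluated directly. With q_i the rate at which the random walker exits module i and q = Σ_i q_i, it reads L(M) = q H(Q) + Σ_i p_i H(P_i), which expands to L(M) = plogp(q) - 2 Σ_i plogp(q_i) - Σ_a plogp(p_a) + Σ_i plogp(q_i + Σ_{a in i} p_a), with plogp(x) = x log2 x. The sketch below only scores a given partition; searching for the partition that minimizes L (what Infomap does) is not shown, and link weights, directions, and teleportation are left out. Function and variable names are illustrative.

```python
import math
from collections import defaultdict


def plogp(x):
    return x * math.log2(x) if x > 0 else 0.0


def map_equation(edges, partition):
    """Two-level map equation L(M), in bits, for an undirected, unweighted network.

    edges: list of (u, v) links; partition: dict mapping node -> module id.
    Node visit rates are p_a = k_a / 2m; the exit rate q_i of module i is the
    number of links crossing its boundary divided by 2m.
    """
    degree = defaultdict(int)
    exits = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if partition[u] != partition[v]:
            exits[partition[u]] += 1
            exits[partition[v]] += 1
    two_m = 2 * len(edges)
    p = {node: k / two_m for node, k in degree.items()}
    q = {mod: c / two_m for mod, c in exits.items()}
    module_flow = defaultdict(float)  # sum of p_a over the nodes in each module
    for node, rate in p.items():
        module_flow[partition[node]] += rate
    return (
        plogp(sum(q.values()))
        - 2 * sum(plogp(qi) for qi in q.values())
        - sum(plogp(pa) for pa in p.values())
        + sum(plogp(q.get(mod, 0.0) + f) for mod, f in module_flow.items())
    )


# Two triangles (nodes 0-2 and 3-5) joined by the single link 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
one_module = {node: 0 for node in range(6)}
two_modules = {node: 0 if node < 3 else 1 for node in range(6)}
print(f"{map_equation(edges, one_module):.3f}")   # about 2.557 bits
print(f"{map_equation(edges, two_modules):.3f}")  # about 2.321 bits
```

With a single module, L reduces to the entropy of the visit rates; splitting the network into its two triangles shortens the description, so the map equation prefers the two-module partition.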
Martin Rosvall and Carl T. Bergstrom
PLoS ONE 5(1): e8694 (2010)
arXiv:0812.1242 Source code Alluvial generator
Change is the very nature of interaction patterns in biology, technology, economics, and science itself: The interactions within and between organisms change; the air, ground, and sea traffic change; the global financial flow changes; the scientific research front changes. With increasingly available data, networks and clustering tools have become important methods used to comprehend instances of these large-scale structures. But blind to the difference between noise and trends in the data, these tools alone must fail when used to study change. Only if we can assign significance to the partition of single networks can we distinguish structural changes from fluctuations and assess how much confidence we should have in the changes. Here we show that bootstrap resampling accompanied by significance clustering provides a solution to this problem. We use the significance clustering to realize de Solla Price's vision of mapping the change in science.
Martin Rosvall and Kim Sneppen
Phys. Rev. E 79, 026111 (2009)
arXiv:0809.4803 Java simulation
To investigate the role of information flow in group formation, we introduce a model of communication and social navigation. We let agents gather information in an idealized network society, and demonstrate that heterogeneous groups can evolve without presuming that individuals have different interests. In our scenario, individuals' access to global information is constrained by local communication with the nearest neighbors on a dynamic network. The result is reinforced interests among like-minded agents in modular networks; the flow of information works as a glue that keeps individuals together. The model explains group formation in terms of limited information access and highlights global broadcasting of information as a way to counterbalance this fragmentation. To illustrate how the information constraints imposed by the communication structure affect future development of real-world systems, we extrapolate dynamics from the topology of four social networks.
Martin Rosvall and Carl T. Bergstrom
PNAS 105, 1118 (2008)
arXiv:0707.0609 Source code Map generator Interactive map
To comprehend the multipartite organization of large-scale biological and social systems, we introduce a new information-theoretic approach to reveal community structure in weighted and directed networks. The method decomposes a network into modules by optimally compressing a description of information flows on the network. The result is a map that both simplifies and highlights the regularities in the structure and their relationships to each other. We illustrate the method by making a map of scientific communication as captured in the citation patterns of more than 6000 journals. We discover a multicentric organization with fields that vary dramatically in size and degree of integration into the network of science. Along the backbone of the network — which includes physics, chemistry, molecular biology, and medicine — information flows bidirectionally, but the map reveals a directional pattern of citation from the applied fields to the basic sciences.
Martin Rosvall and Carl T. Bergstrom
PNAS 104, 7327 (2007)
physics/0612035 Source code
To understand the structure of a large-scale biological, social, or technological network, it can be helpful to decompose the network into smaller subunits or modules. In this article, we develop an information-theoretic foundation for the concept of modularity in networks. We identify the modules of which the network is composed by finding an optimal compression of its topology, capitalizing on regularities in its structure. We explain the advantages of this approach and illustrate them by partitioning a number of real-world and model networks.
Martin Rosvall and Kim Sneppen
arXiv:0708.0368 Java simulation
Social groups with widely different music tastes, political convictions, and religious beliefs emerge and disappear on scales from extreme subcultures to mainstream mass-cultures. Both the underlying social structure and the formation of opinions are dynamic, and changes in one affect the other. Several positive feedback mechanisms have been proposed to drive the diversity in social and economic systems, but little effort has been devoted to pinpointing the interplay between a dynamically changing social network and the spread and gathering of information on the network. Here we analyze this phenomenon in terms of a social network model that explicitly simulates the feedback between information assembly and the emergence of social structures: changing beliefs are coupled to changing relationships because agents self-organize a dynamic network to facilitate their hunter-gatherer behavior in information space. Our analysis demonstrates that tribal organizations and modular social networks can emerge as a result of contact-seeking agents that reinforce their beliefs among the like-minded. We also find that prestigious persons can streamline the social network into hierarchical structures around themselves.
Martin Rosvall, Ian B. Dodd, Sandeep Krishna, and Kim Sneppen
Phys. Rev. E 74, 066105 (2006)
q-bio.PE/0609031 Java simulation
Bacteria and their bacteriophages are the most abundant, widespread and diverse groups of biological entities on the planet. In an attempt to understand how the interactions between bacteria, virulent phages and temperate phages might affect the diversity of these groups, we developed a novel stochastic network model for examining the co-evolution of these ecologies. In our approach, nodes represent whole species or strains of bacteria or phages, rather than individuals, with "speciation" and extinction modelled by duplication and removal of nodes. Phage-bacteria links represent host-parasite relationships and temperate-virulent phage links denote prophage-encoded resistance. The effect of horizontal transfer of genetic information between strains was also included in the dynamic rules. The observed networks evolved in a highly dynamic fashion but the ecosystems were prone to collapse (one or more entire groups going extinct). Diversity could be stably maintained in the model only if the probability of speciation was independent of the diversity. Such an effect could be achieved in real ecosystems if the speciation rate is primarily set by the availability of ecological niches.
Martin Rosvall and Kim Sneppen
International Journal of Bifurcation and Chaos 17, 2509 (2007)
In this paper we quantify our limited information horizon, by measuring the information necessary to locate specific nodes in a network. To investigate different ways to overcome this horizon, and the interplay between communication and topology in social networks, we let agents communicate in a model society. In this way, they build a perception of the network that they can use to create strategic links to improve their standing in the network. We observe a narrow distribution of links when communication is low and a network with a broad distribution of links when communication is high.
Martin Rosvall and Kim Sneppen
Europhys. Lett. 74, 1109 (2006)
physics/0603218 Java simulation
We model self-assembly of information in networks to investigate necessary conditions for building a global perception of a system by local communication. Our approach is to let agents chat in a model system to self-organize distant communication pathways. We demonstrate that simple local rules allow agents to build a perception of the system that is robust to dynamical changes and mistakes. We find that messages are most effectively forwarded in the presence of hubs, while transmission in hub-free networks is more robust against misinformation and failures.
Martin Rosvall and Kim Sneppen
Phys. Rev. E 74, 016108 (2006)
physics/0512105 Java simulation
This paper introduces a model of self-organization between communication and topology in social networks, with feedback between different communication habits and the topology. To study this feedback, we let agents communicate to build a perception of a network and use this information to create strategic links. We observe a narrow distribution of links when the communication is low and a system with a broad distribution of links when the communication is high. We also analyze the outcome of chatting, cheating, and lying as strategies to get better access to information in the network. Chatting, although only adopted by a few agents, results in a global gain in the system. In contrast, a global loss is inevitable in a system with too many liars.
Jacob Bock Axelsen, Sebastian Bernhardsson, Martin Rosvall, Kim Sneppen, and Ala Trusina
Phys. Rev. E 74, 036119 (2006)
physics/0512075 Java simulation
We generalize the degree-organizational view of real-world networks with broad degree distributions in a landscape analogue with mountains (high-degree nodes) and valleys (low-degree nodes). For example, correlated degrees between adjacent nodes correspond to smooth landscapes (social networks), hierarchical networks to one-mountain landscapes (the Internet), and degree-disassortative networks without hierarchical features to rough landscapes with several mountains. We also generate ridge landscapes to model networks organized under constraints imposed by the space the networks are embedded in, associated with spatial localization or, in molecular networks, functional localization. To quantify the topology, we here measure the widths of the mountains and the separation between different mountains.
Martin Rosvall, Andreas Grönlund, Petter Minnhagen, and Kim Sneppen
Phys. Rev. E 72, 046117 (2005)
We investigate the searchability of complex systems in terms of their interconnectedness. Associating searchability with the number and size of branch points along the paths between the nodes, we find that scale-free networks are relatively difficult to search, and thus that the abundance of scale-free networks in nature and society may reflect an attempt to protect local areas in a highly interconnected network from nonrelated communication. In fact, starting from a random node, real-world networks with higher order organization like modular or hierarchical structure are even more difficult to navigate than random scale-free networks. The searchability at the node level opens the possibility for a generalized hierarchy measure that captures both the hierarchy in the usual terms of trees, as in military structures, and the intrinsic hierarchical nature of topological hierarchies for scale-free networks, as in the Internet.
Ala Trusina, Martin Rosvall, and Kim Sneppen
Phys. Rev. Lett. 94, 238701 (2005)
We investigate and quantify the interplay between topology and the ability to send specific signals in complex networks. We find that in a majority of investigated real-world networks, the ability to communicate is favored by the network topology at short distances, but disfavored at longer distances. We further discuss how the ability to locate specific nodes can be improved if information associated with the overall traffic in the network is available.
Martin Rosvall, Petter Minnhagen, and Kim Sneppen
Phys. Rev. E 71, 066111 (2005)
cond-mat/0412051 Java simulation
We study navigation with limited information in networks and demonstrate that many real-world networks have a structure that can be described as favoring communication at short distance at the cost of constraining communication at long distance. This feature, which is robust and more evident with limited than with complete information, reflects both topological and possibly functional design characteristics. For example, the characteristics of the networks studied derived from a city and from the Internet are manifested through modular network designs. We also observe that directed navigation in typical networks requires remarkably little information on the level of individual nodes. By studying navigation, or specific signaling, we take a complementary approach to the common studies of information transfer devoted to the broadcasting of information in studies of virus spreading and the like.
Kim Sneppen, Ala Trusina, and Martin Rosvall
Europhys. Lett. 69, 853 (2005)
Signaling pathways and networks determine the ability to communicate in systems ranging from living cells to human society. We investigate how the network structure constrains communication in social, man-made and biological networks. We find that human networks of governance and collaboration are predictable on a tête-à-tête level, reflecting well-defined pathways, but globally inefficient. In contrast, the Internet tends to have better overall communication abilities and more alternative pathways, and is therefore more robust. Between these extremes, the molecular network of Saccharomyces cerevisiae is more similar to the simpler social systems, whereas the pattern of interactions in the more complex Drosophila melanogaster resembles the robust Internet.
Kim Sneppen, Ala Trusina, and Martin Rosvall
Pramana J. Phys. 64, 1121 (2005)
Traffic and communication between different parts of a complex system are fundamental elements in maintaining its overall cooperativity. Because a complex system consists of many different parts, it matters where signals are transmitted. Thus signaling and traffic are in principle specific, with each message going from a unique sender to a specific recipient. In the current paper we review some measures of network topology that are related to its ability to direct specific communication.
Martin Rosvall, Ala Trusina, Petter Minnhagen, and Kim Sneppen
Phys. Rev. Lett. 94, 028701 (2005)
Traffic is constrained by the information involved in locating the receiver and the physical distance between sender and receiver. We here focus on the former, and investigate traffic in the perspective of information handling. We re-plot the road map of cities in terms of the information needed to locate specific addresses and create information city networks with roads mapped to nodes and intersections to links between nodes. These networks have the broad degree distribution found in many other complex networks. Mapping to an information city network makes it possible to quantify the information associated with locating specific addresses.
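The abstract does not spell out the measure. As we understand the search information used in this line of work (treat the exact form as an assumption), the information needed to locate node t from node s is S(s→t) = -log2 Σ_p (1/k_s) Π_j 1/(k_j - 1), where the sum runs over all shortest paths p from s to t, k is node degree, and j runs over the intermediate nodes of p: at each intermediate node, one of k_j - 1 onward links must be chosen. A minimal sketch for an unweighted, undirected graph follows; building the information city network itself (roads as nodes, intersections as links) is not shown, and the function names are hypothetical.

```python
import math
from collections import deque


def shortest_paths(adj, s, t):
    """All shortest paths from s to t in an unweighted, undirected graph (BFS + backtracking)."""
    dist, parents = {s: 0}, {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                parents[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                parents[v].append(u)  # another shortest route into v
    paths = []

    def backtrack(node, suffix):
        if node == s:
            paths.append([s] + suffix)
            return
        for parent in parents[node]:
            backtrack(parent, [node] + suffix)

    if t in dist:
        backtrack(t, [])
    return paths


def search_information(adj, s, t):
    """S(s -> t) = -log2 of the summed probability of following any shortest path,
    choosing among k_s links at s and among k_j - 1 onward links at each
    intermediate node j (assumed form of the measure)."""
    paths = shortest_paths(adj, s, t)
    if not paths:
        raise ValueError("t is not reachable from s")
    total = 0.0
    for path in paths:
        prob = 1.0 / len(adj[s])
        for j in path[1:-1]:
            prob /= len(adj[j]) - 1
        total += prob
    return -math.log2(total)


# Hypothetical example: a star with hub 0 and leaves 1, 2, 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(search_information(star, 1, 2))  # 1.0: one binary choice at the hub
```

On the star, going from one leaf to another costs 1 bit, because the only decision is which of the hub's two other links to take; on a simple chain, where no choices arise, the cost is 0 bits.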
Petter Minnhagen, Martin Rosvall, Kim Sneppen, and Ala Trusina
Physica A 340, 725 (2004)
cond-mat/0406752 Java simulation
We discuss merging-and-creation as a self-organizing process for scale-free topologies in networks. Three power-law classes characterized by the power-law exponents 3/2, 2 and 5/2 are identified and the process is generalized to networks. In the network context, the merging can be viewed as a consequence of optimization related to more efficient signaling.
Kim Sneppen, Martin Rosvall, Ala Trusina, and Petter Minnhagen
Europhys. Lett. 67, 349 (2004)
nlin.AO/0403005 Java simulation
We suggest a minimalistic model for directed networks and suggest an application to the injection and merging of magnetic field lines. We obtain a network of connected donor and acceptor vertices with degree distribution 1/s², and with dynamic reconnection events of size Δs occurring with a frequency that scales as 1/(Δs)³. This suggests that the model is in the same universality class as the model for self-organization in the solar atmosphere suggested by Hughes et al.
Martin Rosvall and Kim Sneppen
Phys. Rev. Lett. 91, 178701 (2003)
cond-mat/0308399 Java simulation
We propose an information-based model for network dynamics in which imperfect information leads to networks where the different vertices have a widely different number of edges to other vertices, and where the topology has hierarchical features. The possibility to observe scale-free networks is linked to a minimally connected system where hubs remain dynamic.
Martin Rosvall
Associate professor 
+46 70 239 1973

Department of Physics
Umeå University
SE-901 87 Umeå
Are you looking for a PhD or postdoc position? Does the content of this page excite you? Please drop me an email and let me know.
Short introduction to complex networks available in html format.
Complete PhD thesis available in pdf format.
Carl Bergstrom