Community partitioning algorithms

These functions offer different algorithms useful for partitioning networks into sets of communities:

node_optimal() is a problem-solving algorithm that seeks to maximise modularity over all possible partitions.
node_kernighanlin() is a greedy, iterative, deterministic partitioning algorithm that results in two equally-sized communities.
node_edge_betweenness() is a hierarchical, decomposition algorithm where edges are removed in decreasing order of the number of shortest paths passing through the edge.
node_fast_greedy() is a hierarchical, agglomerative algorithm that tries to optimise modularity in a greedy manner.
node_leading_eigen() is a top-down, hierarchical algorithm.
node_walktrap() is a hierarchical, agglomerative algorithm based on random walks.
node_infomap() is a hierarchical algorithm based on the information in random walks.
node_spinglass() is a greedy, iterative, probabilistic algorithm, based on an analogy to a model from statistical physics.
node_fluid() is a propagation-based partitioning algorithm, based on an analogy to a model from fluid dynamics.
node_louvain() is an agglomerative multilevel algorithm that seeks to maximise modularity over all possible partitions.
node_leiden() is an agglomerative multilevel algorithm that seeks to maximise the Constant Potts Model over all possible partitions.

The different algorithms offer various advantages in terms of computation time, availability on different types of networks, ability to maximise modularity, and their logic or domain of inspiration.

node_optimal(.data)

node_kernighanlin(.data)

node_edge_betweenness(.data)

node_fast_greedy(.data)

node_leading_eigen(.data)

node_walktrap(.data, times = 50)

node_infomap(.data, times = 50)

node_spinglass(.data, max_k = 200, resolution = 1)

node_fluid(.data)

node_louvain(.data, resolution = 1)

node_leiden(.data, resolution = 1)

Arguments

.data

An object of a {manynet}-consistent class:

matrix (adjacency or incidence) from {base} R
edgelist, a data frame from {base} R or tibble from {tibble}
igraph, from the {igraph} package
network, from the {network} package
tbl_graph, from the {tidygraph} package

times

Integer indicating number of simulations/walks used. By default, times=50.

max_k

Integer constant, the number of spins to use as an upper limit of communities to be found. Some sets can be empty at the end.

resolution

The Reichardt-Bornholdt “gamma” resolution parameter for modularity. By default 1, making existing and non-existing ties equally important. Smaller values make existing ties more important, and larger values make missing ties more important.

Optimal

The general idea is to calculate the modularity of all possible partitions, and choose the community structure that maximises this modularity measure. Note that this is an NP-complete problem with exponential time complexity. The guidance in the {igraph} package is that networks of fewer than 50-200 nodes are probably fine.
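To make the exhaustive search concrete, the following base-R sketch enumerates every partition of a six-node toy network and keeps the one with the highest modularity. It is an illustration only and not the routine that node_optimal() relies on; the toy network and the helper modularity_of() are assumptions introduced here.

```r
# Toy network (an assumption for illustration): two triangles, 1-2-3 and
# 4-5-6, joined by a single tie between nodes 3 and 4.
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
n <- 6
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1                # symmetrise: the network is undirected

# Hypothetical helper: modularity of a membership vector on adjacency matrix A
modularity_of <- function(memb, A) {
  k <- rowSums(A)                   # node degrees
  two_m <- sum(A)                   # twice the number of ties
  same <- outer(memb, memb, "==")   # TRUE where i and j share a community
  sum((A - tcrossprod(k) / two_m) * same) / two_m
}

# Labellings with node 1 fixed in community 1 still cover every partition
labellings <- cbind(1, as.matrix(expand.grid(rep(list(seq_len(n)), n - 1))))
scores <- apply(labellings, 1, modularity_of, A = A)
best <- unname(labellings[which.max(scores), ])
best_score <- max(scores)
```

Here best separates the two triangles, with a modularity of 5/14. Even with node 1 pinned, six nodes already require 6^5 = 7776 evaluations, which is why exhaustive search only suits small networks.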

Edge-betweenness

This is motivated by the idea that edges connecting different groups are more likely to lie on multiple shortest paths when they are the only option to go from one group to another. This method yields good results but is very slow, both because edge-betweenness calculations are computationally complex and because the betweenness scores have to be re-calculated after every edge removal. Networks of ~700 nodes and ~3500 ties are around the upper size limit of what is feasible with this approach.

Fast-greedy

Initially, each node is assigned a separate community. Communities are then merged iteratively such that each merge yields the largest increase in the current value of modularity, until no further increases to the modularity are possible. The method is fast and recommended as a first approximation because it has no parameters to tune. However, it is known to suffer from a resolution limit.
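The merging procedure described above can be sketched in base R as follows. This is a naive version that recomputes modularity from scratch for every candidate merge; it does not use the efficient data structures of Clauset et al., and it is not the code behind node_fast_greedy(). The toy network and the helper modularity_of() are assumptions made for illustration.

```r
# Toy network (an assumption for illustration): two triangles, 1-2-3 and
# 4-5-6, joined by a single tie between nodes 3 and 4.
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
n <- 6
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Hypothetical helper: modularity of a membership vector on adjacency matrix A
modularity_of <- function(memb, A) {
  k <- rowSums(A)
  two_m <- sum(A)
  same <- outer(memb, memb, "==")
  sum((A - tcrossprod(k) / two_m) * same) / two_m
}

memb <- seq_len(n)                        # each node starts in its own community
repeat {
  comms <- unique(memb)
  if (length(comms) < 2) break
  candidates <- t(combn(comms, 2))        # every pair of communities
  current <- modularity_of(memb, A)
  gains <- apply(candidates, 1, function(p) {
    trial <- memb
    trial[trial == p[2]] <- p[1]          # tentatively merge the pair
    modularity_of(trial, A) - current
  })
  if (max(gains) <= 1e-12) break          # no merge increases modularity
  winner <- candidates[which.max(gains), ]
  memb[memb == winner[2]] <- winner[1]    # commit the best merge
}
```

On this toy network the loop stops with two communities, one per triangle, because merging them across the single bridging tie would lower modularity.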

Leading eigenvector

In each step, the network is bifurcated such that modularity increases most. The splits are determined according to the leading eigenvector of the modularity matrix. A stopping condition prevents tightly connected groups from being split further. Note that due to the eigenvector calculations involved, this algorithm will perform poorly on degenerate networks, but will likely obtain a higher modularity than fast-greedy (at some cost of speed).
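A single bifurcation step can be sketched in base R as below. The sketch builds the modularity matrix for a toy network, takes its leading eigenvector, and splits nodes by the sign of their entries. It shows one split only and is not the implementation used by node_leading_eigen(); the toy network is an assumption made for illustration.

```r
# Toy network (an assumption for illustration): two triangles, 1-2-3 and
# 4-5-6, joined by a single tie between nodes 3 and 4.
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
n <- 6
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1

k <- rowSums(A)                         # node degrees
B <- A - tcrossprod(k) / sum(A)         # modularity matrix: A_ij - k_i k_j / 2m

eig <- eigen(B, symmetric = TRUE)       # eigenvalues come back in decreasing order
leading_value <- eig$values[1]
leading_vector <- eig$vectors[, 1]

# In Newman's formulation a split is only worthwhile if the leading
# eigenvalue is positive; nodes are then divided by the sign of their entry.
split <- if (leading_value > 1e-12) ifelse(leading_vector >= 0, 1L, 2L) else rep(1L, n)
```

For this network the leading eigenvalue is sqrt(3) and the signs of the leading eigenvector separate the two triangles. Which triangle receives label 1 depends on the arbitrary sign of the eigenvector returned by eigen().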

Walktrap

The general idea is that random walks on a network are more likely to stay within the same community because few edges lead outside a community. By repeating random walks of 4 steps many times, information about the hierarchical merging of communities is collected.
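The intuition can be illustrated with a base-R sketch that computes four-step transition probabilities on a toy network and compares nodes using the degree-weighted distance proposed by Pons and Latapy. Whether node_walktrap() computes exactly these quantities internally is not shown here; the toy network and the helper walk_distance() are assumptions made for illustration.

```r
# Toy network (an assumption for illustration): two triangles, 1-2-3 and
# 4-5-6, joined by a single tie between nodes 3 and 4.
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
n <- 6
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1

d <- rowSums(A)                          # node degrees
P <- A / d                               # row i divided by d[i]: transition matrix
P4 <- Reduce(`%*%`, replicate(4, P, simplify = FALSE))  # four-step probabilities

# Hypothetical helper: distance between the four-step walk profiles of i and j
walk_distance <- function(i, j) sqrt(sum((P4[i, ] - P4[j, ])^2 / d))

within  <- walk_distance(1, 2)           # two nodes in the same triangle
between <- walk_distance(1, 6)           # nodes in different triangles
```

Nodes in the same triangle end up in similar places after four steps, so within is smaller than between. Agglomerative merging then proceeds by joining the communities that are closest under this kind of distance.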

Infomap

Motivated by information theoretic principles, this algorithm tries to build a grouping that provides the shortest description length for a random walk, where the description length is measured by the expected number of bits per node required to encode the path.

Spin-glass

This is motivated by analogy to the Potts model in statistical physics. Each node can be in one of k "spin states", and ties (particle interactions) provide information about which pairs of nodes want similar or different spin states. The final community definitions are represented by the nodes' spin states after a number of updates. A different implementation than the default is used in the case of signed networks, such that nodes connected by negative ties will be more likely found in separate communities.

Fluid

The general idea is to observe how a discrete number of fluids interact, expand and contract, in a non-homogeneous environment, i.e. the network structure. Unlike the {igraph} implementation that this function wraps, this function iterates over all possible numbers of communities and returns the membership associated with the highest modularity.

Louvain

The general idea is to take a hierarchical approach to optimising the modularity criterion. Nodes begin in their own communities and are re-assigned in a local, greedy way: each node is moved to the community where it achieves the highest contribution to modularity. When no further modularity-increasing reassignments are possible, the resulting communities are considered nodes (like a reduced graph), and the process continues.

Leiden

The general idea is to optimise the Constant Potts Model, which does not suffer from the resolution limit, instead of modularity. As outlined in the {igraph} package, the Constant Potts Model objective function is:

$$\frac{1}{2m} \sum_{ij}(A_{ij}-\gamma n_i n_j)\delta(\sigma_i, \sigma_j)$$

where m is the total tie weight, $A_{ij}$ is the tie weight between i and j, $\gamma$ is the so-called resolution parameter, $n_i$ is the node weight of node i, and $\delta(\sigma_i, \sigma_j) = 1$ if and only if i and j are in the same community and 0 otherwise.
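As a worked illustration, the base-R sketch below evaluates the formula above literally, summing over all ordered pairs of nodes including i = j, for three candidate partitions of a toy network. It only scores given partitions and does not perform the Leiden optimisation; the toy network, the unit node weights and the helper cpm_quality() are assumptions made for illustration.

```r
# Toy network (an assumption for illustration): two triangles, 1-2-3 and
# 4-5-6, joined by a single tie between nodes 3 and 4.
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
n <- 6
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Hypothetical helper: Constant Potts Model quality of a membership vector.
# Node weights default to 1 (an assumption); sum(A) equals 2m for symmetric A.
cpm_quality <- function(memb, A, gamma, node_weight = rep(1, nrow(A))) {
  same <- outer(memb, memb, "==")
  sum((A - gamma * tcrossprod(node_weight)) * same) / sum(A)
}

partitions <- list(
  triangles  = c(1, 1, 1, 2, 2, 2),
  singletons = 1:6,
  one_block  = rep(1, 6)
)
q_half <- sapply(partitions, cpm_quality, A = A, gamma = 0.5)
q_one  <- sapply(partitions, cpm_quality, A = A, gamma = 1)
```

Under these assumptions, with gamma = 0.5 the two-triangle partition scores highest (3/14, against -3/14 for singletons and -4/14 for a single block). With gamma = 1 the triangles score no better than leaving every node on its own (both -6/14), which illustrates how the resolution parameter acts as a density threshold.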

References

Brandes, Ulrik, Daniel Delling, Marco Gaertler, Robert Gorke, Martin Hoefer, Zoran Nikoloski, Dorothea Wagner. 2008. "On Modularity Clustering", IEEE Transactions on Knowledge and Data Engineering 20(2):172-188.

Kernighan, Brian W., and Shen Lin. 1970. "An efficient heuristic procedure for partitioning graphs." The Bell System Technical Journal 49(2): 291-307. doi:10.1002/j.1538-7305.1970.tb01770.x

Newman, M, and M Girvan. 2004. "Finding and evaluating community structure in networks." Physical Review E 69: 026113.

Clauset, A, MEJ Newman, and C Moore. "Finding community structure in very large networks."

Newman, MEJ. 2006. "Finding community structure using the eigenvectors of matrices" Physical Review E 74:036104.

Pons, Pascal, and Matthieu Latapy "Computing communities in large networks using random walks".

Rosvall, M, and C. T. Bergstrom. 2008. "Maps of information flow reveal community structure in complex networks", PNAS 105:1118. doi:10.1073/pnas.0706851105

Rosvall, M., D. Axelsson, and C. T. Bergstrom. 2009. "The map equation", Eur. Phys. J. Special Topics 178: 13. doi:10.1140/epjst/e2010-01179-1

Reichardt, Jorg, and Stefan Bornholdt. 2006. "Statistical Mechanics of Community Detection" Physical Review E, 74(1): 016110–14. doi:10.1103/PhysRevE.74.016110

Traag, VA, and Jeroen Bruggeman. 2008. "Community detection in networks with positive and negative links".

Parés F, Gasulla DG, et al. 2018. "Fluid Communities: A Competitive, Scalable and Diverse Community Detection Algorithm". In: Complex Networks & Their Applications VI Springer, 689: 229. doi:10.1007/978-3-319-72150-7_19

Blondel, Vincent, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre. 2008. "Fast unfolding of communities in large networks", J. Stat. Mech. P10008.

Traag, V. A., L Waltman, and NJ van Eck. 2019. "From Louvain to Leiden: guaranteeing well-connected communities", Scientific Reports, 9(1):5233. doi:10.1038/s41598-019-41695-z

Examples

node_optimal(ison_adolescents)
#>   Betty   Sue Alice  Jane  Dale   Pam Carol  Tina
#> 1     1     1     2     2     2     3     3     3
node_kernighanlin(ison_adolescents)
#>   Betty   Sue Alice  Jane  Dale   Pam Carol  Tina
#> 1     1     2     2     1     1     1     2     2
node_kernighanlin(ison_southern_women)
#>   Evelyn Laura Theresa Brenda Charlotte Frances Eleanor Pearl  Ruth Verne  Myra
#> 1      2     2       2      2         2       2       2     2     2     2     2
#> # ... with 7 more values from this nodeset unprinted. Use `print(..., n = Inf)` to print all values.
#>      E1    E2    E3    E4    E5    E6    E7    E8    E9   E10   E11   E12   E13
#> 1     1     1     1     1     1     1     1     1     1     1     1     1     1
#> # ... with 1 more values from this nodeset unprinted. Use `print(..., n = Inf)` to print all values.
node_edge_betweenness(ison_adolescents)
#>   Betty   Sue Alice  Jane  Dale   Pam Carol  Tina
#> 1     1     1     1     1     1     1     2     2
node_fast_greedy(ison_adolescents)
#>   Betty   Sue Alice  Jane  Dale   Pam Carol  Tina
#> 1     3     3     2     2     2     1     1     1
node_leading_eigen(ison_adolescents)
#>   Betty   Sue Alice  Jane  Dale   Pam Carol  Tina
#> 1     1     1     3     3     3     2     2     2
node_walktrap(ison_adolescents)
#>   Betty   Sue Alice  Jane  Dale   Pam Carol  Tina
#> 1     2     2     2     2     2     3     1     1
node_infomap(ison_adolescents)
#>   Betty   Sue Alice  Jane  Dale   Pam Carol  Tina
#> 1     1     1     1     1     1     1     1     1
node_spinglass(ison_adolescents)
#>   Betty   Sue Alice  Jane  Dale   Pam Carol  Tina
#> 1     2     2     3     3     3     1     1     1
node_fluid(ison_adolescents)
#>   Betty   Sue Alice  Jane  Dale   Pam Carol  Tina
#> 1     1     2     2     3     3     2     1     1
node_louvain(ison_adolescents)
#>   Betty   Sue Alice  Jane  Dale   Pam Carol  Tina
#> 1     1     1     2     2     2     3     3     3
node_leiden(ison_adolescents)
#>   Betty   Sue Alice  Jane  Dale   Pam Carol  Tina
#> 1     1     2     3     4     5     6     7     8