Measures of network diversity

These functions measure the heterogeneity of an attribute across a network or within groups of a network, or the distribution of ties across the categories of that attribute:

net_richness() measures the number of unique categories in a network attribute.
node_richness() measures the number of unique categories of an attribute to which each node is connected.
net_diversity() measures the heterogeneity of ties across a network.
node_diversity() measures the heterogeneity of each node's local neighbourhood.
net_heterophily() measures how embedded nodes in the network are within groups of nodes with the same attribute.
node_heterophily() measures each node's embeddedness within groups of nodes with the same attribute.
net_assortativity() measures the degree assortativity in a network.
net_spatial() measures the spatial association/autocorrelation (global Moran's I) in a network.

net_richness(.data, attribute)

node_richness(.data, attribute)

net_diversity(
  .data,
  attribute,
  method = c("blau", "teachman", "variation", "gini")
)

node_diversity(
  .data,
  attribute,
  method = c("blau", "teachman", "variation", "gini")
)

net_heterophily(.data, attribute)

node_heterophily(.data, attribute)

net_homophily(.data, attribute, method = c("ie", "ei", "yule", "geary"))

node_homophily(.data, attribute, method = c("ie", "ei", "yule", "geary"))

net_assortativity(.data)

net_spatial(.data, attribute)

Arguments

.data

An object of a manynet-consistent class:

matrix (adjacency or incidence) from {base} R
edgelist, a data frame from {base} R or tibble from {tibble}
igraph, from the {igraph} package
network, from the {network} package
tbl_graph, from the {tidygraph} package

attribute

Name of a nodal attribute or membership vector to use as categories for the diversity measure.

method

Which method to use for net_diversity() and node_diversity(). Either "blau" (Blau's index) or "teachman" (Teachman's index) for categorical attributes, or "variation" (coefficient of variation) or "gini" (Gini coefficient) for numeric attributes. Default is "blau". If the chosen method is incompatible with the attribute type, a suitable alternative is used instead and a message is issued.

Richness

Richness is a simple count of the number of different categories present for a given attribute.
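As a minimal sketch of this count in base R, assuming the attribute is available as a plain vector (the `discipline` vector below is made up for illustration and is not taken from any packaged dataset):

```r
# Hypothetical nodal attribute: one value per node
discipline <- c("Sociology", "Anthropology", "Sociology",
                "Statistics", "Sociology")

# Richness: the number of distinct categories present
richness <- length(unique(discipline))
richness
#> [1] 3
```

This only illustrates the definition; it is not necessarily how net_richness() is implemented.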

Diversity

Blau's index (1977) uses a formula also known in other disciplines by other names (Gini-Simpson index, Gini impurity, Gini's diversity index, Gibbs-Martin index, and probability of interspecific encounter (PIE)): $$1 - \sum\limits_{i = 1}^k {p_i^2 }$$ where $p_i$ is the proportion of group members in the $i$th category and $k$ is the number of categories for an attribute of interest. This index can be interpreted as the probability that two members randomly selected from a group would be from different categories. This index finds its minimum value (0) when there is no variety, i.e. when all individuals are classified in the same category. The maximum value depends on the number of categories and whether nodes can be evenly distributed across categories; it is $1 - 1/k$ when all $k$ categories are equally represented.
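The formula for Blau's index can be sketched in a few lines of base R. The helper name `blau_index` is hypothetical and is not part of the package; it simply applies $1 - \sum p_i^2$ to the category proportions of a vector.

```r
# Illustrative helper (hypothetical name): Blau's index for a categorical vector
blau_index <- function(x) {
  p <- prop.table(table(x))  # proportion of members in each category
  1 - sum(p^2)
}

blau_index(c("a", "a", "a", "a"))  # no variety
#> [1] 0
blau_index(c("a", "a", "b", "b"))  # two equally sized categories
#> [1] 0.5
blau_index(c("a", "b", "c", "d"))  # four equally sized categories
#> [1] 0.75
```

The last two calls show how the attainable maximum grows with the number of categories.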

Teachman's index (1980) is based on information theory and is calculated as: $$- \sum\limits_{i = 1}^k {p_i \log(p_i)}$$ where $p_i$ is the proportion of group members in the $i$th category and $k$ is the number of categories for an attribute of interest. This index finds its minimum value (0) when there is no variety, i.e. when all individuals are classified in the same category. The maximum value depends on the number of categories and whether nodes can be evenly distributed across categories; it is $\log(k)$ when all $k$ categories are equally represented. It thus shares properties with Blau's index, but also includes a notion of richness that tends to give more weight to rare categories and thus tends to highlight imbalances more.
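A comparable base R sketch for Teachman's index follows. The helper name `teachman_index` is hypothetical, and the use of the natural logarithm is an assumption, since the formula above does not state the base.

```r
# Illustrative helper (hypothetical name): Teachman's (entropy) index.
# Assumes the natural logarithm.
teachman_index <- function(x) {
  p <- prop.table(table(x))  # proportion of members in each category
  p <- p[p > 0]              # drop empty factor levels: 0 * log(0) would be NaN
  -sum(p * log(p))
}

teachman_index(c("a", "a", "a", "a"))  # no variety: 0
teachman_index(c("a", "a", "b", "b"))  # two equally sized categories: log(2)
teachman_index(c("a", "a", "a", "b"))  # unbalanced: between 0 and log(2)
```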

The coefficient of variation (CV) is a standardised measure of dispersion of a probability distribution or frequency distribution. It is defined as the ratio of the standard deviation $\sigma$ to the mean $\mu$: $$CV = \frac{\sigma}{\mu}$$ It is often expressed as a percentage. The CV is useful because the standard deviation of data must always be understood in the context of the mean of the data. The CV is particularly useful when comparing the degree of variation from one data series to another, even if the means are drastically different from each other.
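The ratio can be sketched directly in base R. The helper name `coef_variation` and the `tie_counts` vector are made up for illustration. Note that `sd()` computes the sample standard deviation (dividing by $n - 1$), which is an assumption about which variant of $\sigma$ is intended.

```r
# Illustrative helper (hypothetical name): coefficient of variation
coef_variation <- function(x) sd(x) / mean(x)

# Hypothetical numeric attribute
tie_counts <- c(2, 4, 4, 4, 5, 5, 7, 9)

coef_variation(tie_counts)
coef_variation(tie_counts * 100)  # rescaling the data leaves the CV unchanged
```

The second call illustrates why the CV is convenient for comparing series with very different means: it is unaffected by the unit of measurement.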

The Gini coefficient is a measure of statistical dispersion that is intended to represent the income or wealth distribution of a nation's residents, and is commonly used as a measure of inequality. It is defined as a ratio with values between 0 and 1, where 0 corresponds with perfect equality (everyone has the same income) and 1 corresponds with perfect inequality (one person has all the income, and everyone else has zero income). The Gini coefficient can be calculated from the Lorenz curve, which plots the proportion of the total income of the population that is cumulatively earned by the bottom x% of the population. The Gini coefficient is defined as the area between the line of equality and the Lorenz curve, divided by the total area under the line of equality.
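The Lorenz-curve definition can be sketched in base R as follows. The helper name `gini_coefficient` is hypothetical, and the sketch assumes non-negative values with a positive sum; it approximates the area under the Lorenz curve with trapezoids.

```r
# Illustrative helper (hypothetical name): Gini coefficient via the Lorenz curve.
# Assumes non-negative values with a positive total.
gini_coefficient <- function(x) {
  x <- sort(x)
  n <- length(x)
  # Cumulative share of the total held by the bottom 0, 1, ..., n units
  lorenz <- c(0, cumsum(x) / sum(x))
  # Trapezoidal area under the Lorenz curve on [0, 1]
  area_under_lorenz <- sum((lorenz[-1] + lorenz[-(n + 1)]) / 2) / n
  # Area between the line of equality and the curve, divided by 0.5
  1 - 2 * area_under_lorenz
}

gini_coefficient(c(1, 1, 1, 1))  # perfect equality
#> [1] 0
gini_coefficient(c(0, 0, 0, 1))  # one unit holds everything
#> [1] 0.75
```

With a finite number of units $n$, the value under perfect inequality in this sketch is $(n - 1)/n$, which approaches 1 as $n$ grows.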

Homophily

Given a partition of a network into a number of mutually exclusive groups, the E-I index is the number of ties between groups (external ties) minus the number of ties within groups (internal ties), divided by the total number of ties. This value ranges from -1 to 1, where 1 indicates that all ties run between categories/groups and -1 that all ties fall within categories/groups.
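The E-I calculation can be sketched in base R on a toy edge list. The helper name `ei_index`, the `membership` vector, and the `edges` matrix are all hypothetical; the sketch assumes an undirected network whose ties are given as pairs of node indices.

```r
# Illustrative helper (hypothetical name): E-I index from an edge list.
# edges: two-column matrix of node indices; membership: one group label per node.
ei_index <- function(edges, membership) {
  same_group <- membership[edges[, 1]] == membership[edges[, 2]]
  internal <- sum(same_group)   # ties within groups
  external <- sum(!same_group)  # ties between groups
  (external - internal) / (external + internal)
}

membership <- c("A", "A", "B", "B")
edges <- rbind(c(1, 2), c(3, 4),           # two internal ties
               c(1, 3), c(2, 4), c(1, 4))  # three external ties

ei_index(edges, membership)                   # (3 - 2) / 5
#> [1] 0.2
ei_index(rbind(c(1, 2), c(3, 4)), membership) # only internal ties
#> [1] -1
```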

References

On richness

Magurran, Anne E. 1988. Ecological Diversity and Its Measurement. Princeton: Princeton University Press. doi:10.1007/978-94-015-7358-0

On diversity

Blau, Peter M. 1977. Inequality and Heterogeneity. New York: Free Press.

Teachman, Jay D. 1980. "Analysis of population diversity: Measures of qualitative variation". Sociological Methods & Research, 8: 341-362. doi:10.1177/004912418000800305

Page, Scott E. 2010. Diversity and Complexity. Princeton: Princeton University Press. doi:10.1515/9781400835140

On heterophily

Krackhardt, David, and Robert N. Stern. 1988. "Informal networks and organizational crises: an experimental simulation". Social Psychology Quarterly, 51(2): 123-140. doi:10.2307/2786835

McPherson, Miller, Lynn Smith-Lovin, and James M. Cook. 2001. "Birds of a Feather: Homophily in Social Networks". Annual Review of Sociology, 27(1): 415-444. doi:10.1146/annurev.soc.27.1.415

On assortativity

Newman, Mark E.J. 2002. "Assortative mixing in networks". Physical Review Letters, 89(20): 208701. doi:10.1103/physrevlett.89.208701

On spatial autocorrelation

Moran, Patrick Alfred Pierce. 1950. "Notes on continuous stochastic phenomena". Biometrika, 37(1): 17-23. doi:10.2307/2332142

Examples

net_richness(ison_networkers)
#> [1] 3
node_richness(ison_networkers, "Discipline")
#> ▂▃▁▅ 
#>   `Lin Freeman` `Doug White` `Ev Rogers` `Richard Alba` `Phipps Arabie`
#> 1             4            4           2              4               4
#> # ... and 27 more values from this nodeset. Use `print_all(...)` to print all values.
marvel_friends <- to_unsigned(ison_marvel_relationships, "positive")
net_diversity(marvel_friends, "Gender")
#> [1] 0.306
net_diversity(marvel_friends, "Appearances")
#> [1] 0.802
node_diversity(marvel_friends, "Gender")
#> ▃▁▁▃▂ 
#>   Abomination `Ant-Man` Apocalypse Beast `Black Panther` `Black Widow` Blade
#> 1           0      0.48          0 0.363            0.34         0.337     0
#> # ... and 46 more values from this nodeset. Use `print_all(...)` to print all values.
node_diversity(marvel_friends, "Attractive")
#> ▂▄▂▂▁▁▁▁▁ 
#>   Abomination `Ant-Man` Apocalypse Beast `Black Panther` `Black Widow` Blade
#> 1          NA     0.559        NaN 0.332           0.316         0.288     0
#> # ... and 46 more values from this nodeset. Use `print_all(...)` to print all values.
net_heterophily(marvel_friends, "Gender")
#> [1] -0.285
net_heterophily(marvel_friends, "Attractive")
#> [1] -0.632
node_heterophily(marvel_friends, "Gender")
#> ▅▂▁▂ 
#>   Abomination `Ant-Man` Apocalypse Beast `Black Panther` `Black Widow` Blade
#> 1         NaN       0.5         -1  -0.5          -0.545         0.692    -1
#> # ... and 46 more values from this nodeset. Use `print_all(...)` to print all values.
node_heterophily(marvel_friends, "Attractive")
#> ▆▂▁▁ 
#>   Abomination `Ant-Man` Apocalypse Beast `Black Panther` `Black Widow` Blade
#> 1         NaN      -0.5         -1  -0.8          -0.818        -0.846    -1
#> # ... and 46 more values from this nodeset. Use `print_all(...)` to print all values.
net_homophily(marvel_friends, "Gender")
#> [1] 0.285
net_assortativity(ison_networkers)
#> [1] -0.41
net_spatial(ison_lawfirm, "age")
#> [1] 0.126