Measures of network diversity

These functions offer ways to measure the heterogeneity of an attribute across a network, within groups of a network, or the distribution of ties across this attribute:

net_by_richness() measures the number of unique categories in a network attribute.
net_by_diversity() measures the heterogeneity of a nodal attribute across a network.

net_by_richness(.data, attribute)

net_by_diversity(
  .data,
  attribute,
  diversity = c("blau", "teachman", "variation", "gini")
)

Arguments

.data: A network object of class mnet, igraph, tbl_graph, network, or similar. For more information on the standard coercions available, see manynet::as_tidygraph().
attribute: Name of a nodal attribute, mark, measure, or membership vector.
diversity: Which method to use for *_diversity(). Either "blau" (Blau's index) or "teachman" (Teachman's index) for categorical attributes, or "variation" (coefficient of variation) or "gini" (Gini coefficient) for numeric attributes. Default is "blau". If the chosen method is incompatible with the attribute type, a suitable alternative is used instead and a message is reported.

Value

A network_measure numeric score.

Richness

Richness is a simple count of the number of different categories present for a given attribute.
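As a rough illustration, richness on a plain R vector can be sketched as below. `richness()` is a hypothetical helper, not part of the package, and dropping missing values before counting is an assumption rather than documented behaviour.

```r
# Hypothetical helper (not the package's implementation):
# count the distinct categories of an attribute vector.
richness <- function(x) {
  x <- x[!is.na(x)]  # assumption: NA is not counted as a category
  length(unique(x))
}

richness(c("a", "b", "a", "c", NA))
#> [1] 3
```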

Diversity

Blau's index (1977) uses a formula also known in other disciplines under other names (Gini-Simpson index, Gini impurity, Gini's diversity index, Gibbs-Martin index, and probability of interspecific encounter (PIE)): $$1 - \sum\limits_{i = 1}^k p_i^2$$ where $p_i$ is the proportion of group members in the $i$th category and $k$ is the number of categories for an attribute of interest. This index can be interpreted as the probability that two members randomly selected (with replacement) from a group belong to different categories. It takes its minimum value (0) when there is no variety, i.e. when all individuals are classified in the same category. Its maximum value depends on the number of categories and on whether nodes can be evenly distributed across them: when nodes are spread evenly across all $k$ categories, it reaches $1 - 1/k$.
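The formula can be sketched on an attribute vector as follows. `blau_index()` is a hypothetical name, and the package's own internals may differ, for example in how missing values are handled (here `table()` silently drops them).

```r
# Hypothetical helper: Blau's index, 1 - sum(p_i^2).
blau_index <- function(x) {
  p <- prop.table(table(x))  # proportion of members in each category
  1 - sum(p^2)
}

blau_index(c("a", "a", "a", "a"))  # no variety
#> [1] 0
blau_index(c("a", "a", "b", "b"))  # even split over k = 2, i.e. 1 - 1/2
#> [1] 0.5
blau_index(c("a", "a", "a", "b"))
#> [1] 0.375
```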

Teachman's index (1980) is based on information theory and is calculated as: $$- \sum\limits_{i = 1}^k p_i \log(p_i)$$ where $p_i$ is the proportion of group members in the $i$th category and $k$ is the number of categories for an attribute of interest. This index also takes its minimum value (0) when there is no variety, i.e. when all individuals are classified in the same category. Its maximum value likewise depends on the number of categories and on whether nodes can be evenly distributed across them: an even spread across all $k$ categories gives $\log(k)$. It thus shares properties with Blau's index, but also incorporates a notion of richness: it gives more weight to rare categories and so tends to highlight imbalances more.
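A comparable sketch for Teachman's index is below. `teachman_index()` is hypothetical, and the use of the natural logarithm is an assumption; another base would simply rescale the result.

```r
# Hypothetical helper: Teachman's index, -sum(p_i * log(p_i)).
teachman_index <- function(x) {
  p <- prop.table(table(x))
  p <- p[p > 0]     # treat 0 * log(0) as 0 (e.g. unused factor levels)
  -sum(p * log(p))  # natural logarithm assumed
}

teachman_index(c("a", "a", "a", "a"))  # no variety
#> [1] 0
teachman_index(c("a", "b", "c", "d"))  # even split over k = 4, i.e. log(4)
#> [1] 1.386294
```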

The coefficient of variation (CV) is a standardised measure of the dispersion of a probability or frequency distribution. It is defined as the ratio of the standard deviation $\sigma$ to the mean $\mu$: $$CV = \frac{\sigma}{\mu}$$ and is often expressed as a percentage. Because the standard deviation of data must always be understood in the context of their mean, the CV is particularly useful for comparing the degree of variation between data series, even when their means differ drastically.
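A minimal sketch of the CV on a numeric attribute, using a hypothetical `coef_variation()` helper. It assumes R's `sd()`, which divides by $n - 1$; whether the package uses the sample or the population standard deviation is not stated here.

```r
# Hypothetical helper: coefficient of variation, sd / mean.
coef_variation <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / mean(x)  # sd() uses the n - 1 (sample) denominator
}

coef_variation(c(2, 4, 6))
#> [1] 0.5
coef_variation(c(20, 40, 60))  # same relative spread, ten times the mean
#> [1] 0.5
```

The second call shows why the CV suits comparisons across series: multiplying every value by a constant changes the standard deviation but leaves the ratio unchanged. With a population standard deviation instead, both values would shrink by a factor of $\sqrt{(n-1)/n}$.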

The Gini coefficient is a measure of statistical dispersion originally intended to represent the income or wealth distribution of a nation's residents, and is commonly used as a measure of inequality; for a numeric node attribute, each node's value plays the role of income. It takes values between 0 and 1, where 0 corresponds to perfect equality (everyone has the same income) and 1 corresponds to perfect inequality (one person has all the income, and everyone else has none). The Gini coefficient can be calculated from the Lorenz curve, which plots the cumulative share of total income earned by the bottom x% of the population. It is defined as the area between the line of equality and the Lorenz curve, divided by the total area under the line of equality.
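For a finite set of node values, the Lorenz-curve area reduces to the mean absolute difference between all pairs of values divided by twice the mean, $G = \sum_i \sum_j |x_i - x_j| / (2 n^2 \mu)$. The sketch below uses that pairwise form; `gini_coef()` is a hypothetical helper, and the package may use a different estimator (for example one rescaled by $n/(n-1)$).

```r
# Hypothetical helper: Gini coefficient via the mean absolute difference.
gini_coef <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  # sum of |x_i - x_j| over all ordered pairs, scaled by 2 * n^2 * mean
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

gini_coef(c(5, 5, 5, 5))   # perfect equality
#> [1] 0
gini_coef(c(0, 0, 0, 10))  # one node holds everything
#> [1] 0.75
```

In the second call the value is $(n - 1)/n = 0.75$ rather than 1: with this uncorrected form, perfect inequality only approaches 1 as the number of nodes grows.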

References

On richness

Magurran, Anne E. 1988. Ecological Diversity and Its Measurement. Princeton: Princeton University Press. doi:10.1007/978-94-015-7358-0

On diversity

Blau, Peter M. 1977. Inequality and heterogeneity. New York: Free Press.

Teachman, Jay D. 1980. Analysis of population diversity: Measures of qualitative variation. Sociological Methods & Research, 8:341-362. doi:10.1177/004912418000800305

Page, Scott E. 2010. Diversity and Complexity. Princeton: Princeton University Press. doi:10.1515/9781400835140

Examples

net_by_richness(ison_networkers)
#> [1] 3
marvel_friends <- to_unsigned(to_uniplex(fict_marvel, "relationship"), "positive")
net_by_diversity(marvel_friends, "Gender")
#> [1] 0.306
net_by_diversity(marvel_friends, "Appearances")
#> [1] 0.802