Introduction
In this article, we will implement a method to compute the entropy of a network, undirected without self-loop using GHypEG.
Network can be used to denote multiple types of organization such as social ones like a club or a company, but it can also denote a market and more. Networks are part of our everyday life and omnipresent in our society, they also have the advantage to be well represented using graphs and this is how we study them, however many networks are of high complexity due to their size and the dynamic nature they tend to have (e.g. members joining and leaving an organization). Here we will see how can we characterize the potentiality of a network by using entropy as proposed in [1] — for undirected and without self-loop associated graph — i.e. its ability to attain different configurations considering existing constraints, this approach makes use of Generalised Hypergeometric Ensembles of random Graphs (GHypEG) from [2].
Problem set-up
First let's define how to represent networks with graphs, to do so the example of a social organization will be considered. A network is composed of participants that we will refer as nodes, those participants are the elements that perform interactions between them, interactions that we will refer as edges noting that multiple interactions can occur between two participants leading possibly to multiple edges between two nodes in the graph representation.
As an example let's assume a network composed of a teacher denoted T and three students denoted S1, S2, and S3. The teacher teaches 3 times a week, once for S1 and S2 where S1 and S2 interact together, once for S1 only, and once for S3 only, the graph of this network would be as follow:
We see that this graph is composed of 4 nodes and 5 edges but there exist 252 different "states", or configurations of graphs, with this number of nodes and edges — assuming this graph is underirected and does not allow self-loop.
How can we quantify the capability of our network to attain those different configurations? In other words, how can we characterize the potentiality of a network?
Solution
A network has many different states it can attain under existing constraints — that can be temporal or spatial for instance —, thus we can use entropy to characterize its potentiality, where the constraints are lowering the potentiality.
In order to proxy those constraints, we have to consider the probability distribution to find a graph given its number of nodes and edges, assuming those constraints are encoded in the number of interactions present in the current configuration of the network, we set the expected number of interactions between each pair of nodes to the observed ones. To specify this probability distribution characterizing the network ensemble, the GHypEG is used; this method uses two sets of parameters reflecting all ways by which the nodes can be linked and the preferences of nodes to be connected. From these parameters, the probability distribution of interest can be obtained using the multivariate Wallenius noncentral hypergeometric distribution though its approximation by the multinomial distribution is used for large networks, allowing a feasible way to compute the Shannon entropy.
Now the entropy of the network can be computed but for a better comparison it is important to normalize it, this can be done by dividing the entropy value by the maximum possible entropy which in our case of undirected without self-loop network is achieved by equiprobability for the distribution.
The authors of [1] present their results on 3 classic data sets, here is a table summarizing their results for the Karate Club data set from [3] and the Southern Women data set from [4]:
Network | n | m | m/n | D | H |
---|---|---|---|---|---|
Karate Club | 34 | 231 | 6.79 | 0.14 | 0.31 |
Southern Women | 18 | 322 | 17.89 | 0.91 | 0.89 |
n and m are respectively the number of nodes and edges, D is the density of the network, and H its normalized multinomial entropy.
A low value of H is representative of a low potentiality of the network, i.e. not many different configurations can be attained. As could be expected we notice that the density of a network has a correlation with its potentiality.
An implementation in python is available in [5]
References
[1] What is the Entropy of a Social Organization? [Zingg et al.]
[2] Generalised hypergeometric ensembles of random graphs: the configuration model as an urn problem [Casiraghi, Nanumyan]
[3] An information Flow Model for conflict and Fission in Small Groups [Zachary]
[4] Deep South; a Social Anthropological Study of Caste and Class [Davis, Gardner B.B., Gardner M.R.]
[5] https://github.com/AlixBernard/network_entropy