Tutorial on Stochastic Block Models

Analysis of the tree/tree data

Tree-tree binary interaction networks

We first consider the binary network in which an edge is drawn between two trees when they share at least one common fungus:

tree_tree_binary <- 1 * (fungusTreeNetwork$tree_tree != 0)

The function plotMyMatrix can be used to represent the matrix of a simple or bipartite network:

plotMyMatrix(tree_tree_binary, dimLabels = list(row = 'tree', col = 'tree'))

We look for some latent structure in the network by fitting a simple SBM with the function estimateSimpleSBM. We assume that our matrix is a realisation of the SBM: \[\begin{align*} (Z_i) \text{ i.i.d.} \qquad & Z_i \sim \mathcal{M}(1, \pi) \\ (Y_{ij}) \text{ indep.} \mid (Z_i) \qquad & (Y_{ij} \mid Z_i=k, Z_j = \ell) \sim \mathcal{B}(\alpha_{k\ell}) \end{align*}\] and infer this model with sbm (which relies on the blockmodels package). Note that "simple" SBM refers to standard networks, as opposed to bipartite ones.

mySimpleSBM <- tree_tree_binary %>%
  estimateSimpleSBM("bernoulli", 
                    dimLabels ='tree', 
                    estimOptions = list(verbosity = 2, plot=FALSE))
#> -> Estimation for 1 groups
#> -> Computation of eigen decomposition used for initalizations
#> -> Pass 1
#>     -> With ascending number of groups
#>     -> With descending number of groups
#> -> Pass 2
#>     -> With ascending number of groups
#>     -> With descending number of groups
#> -> Pass 3
#>     -> With ascending number of groups
#>     -> With descending number of groups
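
To make the generative model stated before the fit concrete, here is a minimal base R sketch that simulates an undirected Bernoulli SBM. The network size, the block proportions and the connection probabilities are illustrative assumptions; they are not estimates from the tree data.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

n_nodes <- 60                     # hypothetical network size
block_prop <- c(0.5, 0.3, 0.2)    # pi: block proportions (illustrative values)
connect_prob <- matrix(c(0.80, 0.10, 0.05,
                         0.10, 0.60, 0.20,
                         0.05, 0.20, 0.70),
                       nrow = 3, byrow = TRUE)  # alpha: symmetric connection probabilities

# Z_i ~ M(1, pi): draw one block label per node
z <- sample(seq_along(block_prop), n_nodes, replace = TRUE, prob = block_prop)

# P_ij = alpha[z_i, z_j]: connection probability of every dyad
p <- connect_prob[z, z]

# Y_ij ~ B(P_ij), drawn once per unordered pair and then symmetrised
y <- matrix(0L, n_nodes, n_nodes)
upper <- upper.tri(y)
y[upper] <- rbinom(sum(upper), size = 1, prob = p[upper])
y <- y + t(y)   # undirected network with an empty diagonal
```

A matrix such as y has the same form as tree_tree_binary, so it could be used to check whether the estimation recovers a known block structure.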


Once the model is fitted, the user can manipulate it through the fields and methods of the class SimpleSBM_fit. The most important fields and methods are listed by the show method, that is, when the object is printed:

class(mySimpleSBM)
#> [1] "SimpleSBM_fit" "SimpleSBM"     "SBM"           "R6"

For instance,

mySimpleSBM$nbBlocks
#> [1] 5
mySimpleSBM$nbNodes
#> tree 
#>   51
mySimpleSBM$nbCovariates
#> [1] 0

The plot method is available as an S3 or R6 method. By default, it represents the network data reordered according to the memberships estimated by the SBM:

plot(mySimpleSBM, type = "data")

One can also plot the expected network which, in case of the Bernoulli model, corresponds to the probability of connection between any pair of nodes in the network.

plot(mySimpleSBM, type = "expected", dimLabels = list(row = 'tree', col = 'tree'), plotOptions = list(legend = TRUE))

Finally, one can plot the mesoscopic view of the network, in which each block is drawn as a single node:

plot(mySimpleSBM, type = "meso", 
     dimLabels = list(row = 'tree', col = 'tree'),
     plotOptions = list(edge.width = 1.2))

About model selection and choice of the number of blocks

During the estimation, a range of models with different numbers of blocks is explored. By default, the best model in terms of the Integrated Classification Likelihood (ICL) is returned, but all the explored models are stored internally. The user can have a quick glance at them via the $storedModels field:

mySimpleSBM$storedModels %>% kable()

indexModel	nbParams	nbBlocks	ICL	loglik
1	1	1	-883.3334	-879.7581
2	4	2	-619.0799	-606.3880
3	8	3	-537.5179	-512.1339
4	13	4	-540.6318	-498.9806
5	19	5	-520.6645	-459.1706
6	26	6	-530.6278	-445.7159
7	34	7	-544.0191	-432.1138
8	43	8	-562.1450	-419.6710

We can then see which models are competitive in terms of model selection by checking the ICL:

mySimpleSBM$storedModels %>%  
  ggplot() + 
  aes(x = nbBlocks, y = ICL) + geom_line() + geom_point(alpha = 0.5)
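
The figures printed in the storedModels table can also be checked by hand. The sketch below copies the table into a data frame and compares the gap between loglik and ICL with a BIC-like penalty, 0.5 * ((K - 1) log n + K(K + 1)/2 log(n(n - 1)/2)), for n = 51 nodes. This penalty form is an assumption that happens to agree with the printed values; it is not taken from the package documentation.

```r
# Values copied from the storedModels table above
stored <- data.frame(
  nbBlocks = 1:8,
  nbParams = c(1, 4, 8, 13, 19, 26, 34, 43),
  ICL      = c(-883.3334, -619.0799, -537.5179, -540.6318,
               -520.6645, -530.6278, -544.0191, -562.1450),
  loglik   = c(-879.7581, -606.3880, -512.1339, -498.9806,
               -459.1706, -445.7159, -432.1138, -419.6710)
)

n_nodes <- 51                           # number of trees
n_dyads <- n_nodes * (n_nodes - 1) / 2  # unordered pairs of an undirected network

K <- stored$nbBlocks
n_connect <- K * (K + 1) / 2            # free entries of a symmetric K x K matrix
n_prop    <- K - 1                      # free block proportions

# Assumed BIC-like penalty: one term per family of parameters
penalty <- 0.5 * (n_prop * log(n_nodes) + n_connect * log(n_dyads))

# Largest discrepancy between the assumed penalty and the gap seen in the table
max_gap <- max(abs((stored$loglik - stored$ICL) - penalty))
max_gap

# Number of blocks retained by the criterion
best_K <- stored$nbBlocks[which.max(stored$ICL)]
best_K
```

The nbParams column matches n_connect + n_prop, which is why the penalty grows quickly with the number of blocks while the log-likelihood keeps increasing.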

The 4-block model could also have been a reasonable choice in place of the 5-block model. The user can switch the current SimpleSBM_fit object to another stored model with the setModel method:

mySimpleSBM$setModel(4)
mySimpleSBM$nbBlocks
#> [1] 4
mySimpleSBM$plot(type = 'expected')

Going back to the best model:

mySimpleSBM$setModel(5)

Analysis of the weighted interaction network

Instead of the binary tree-tree network, we may consider the weighted network in which the link between two trees is the number of fungi they share.

We plot the matrix with function plotMyMatrix:

tree_tree <- fungusTreeNetwork$tree_tree
plotMyMatrix(tree_tree, dimLabels = list(row = 'tree', col = 'tree'))

Here again, we look for some latent structure of the network by adjusting a simple SBM with the function estimateSimpleSBM, considering a Poisson distribution on the edges.

\[\begin{align*} (Z_i) \text{ i.i.d.} \qquad & Z_i \sim \mathcal{M}(1, \pi) \\ (Y_{ij}) \text{ indep.} \mid (Z_i) \qquad & (Y_{ij} \mid Z_i=k, Z_j = \ell) \sim \mathcal{P}(\exp(\alpha_{k\ell})) = \mathcal{P}(\lambda_{k\ell}) \end{align*}\]

mySimpleSBMPoisson <- tree_tree %>%
  estimateSimpleSBM("poisson", directed = FALSE,
                    estimOptions = list(verbosity = 0 , plot = FALSE),
                    dimLabels = c('tree'))


class(mySimpleSBMPoisson)
#> [1] "SimpleSBM_fit" "SimpleSBM"     "SBM"           "R6"
mySimpleSBMPoisson
#> Fit of a Simple Stochastic Block Model -- poisson variant
#> =====================================================================
#> Dimension = ( 51 ) - ( 6 ) blocks and no covariate(s).
#> =====================================================================
#> * Useful fields 
#>   $nbNodes, $modelName, $dimLabels, $nbBlocks, $nbCovariates, $nbDyads
#>   $blockProp, $connectParam, $covarParam, $covarList, $covarEffect 
#>   $expectation, $indMemberships, $memberships 
#> * R6 and S3 methods 
#>   $rNetwork, $rMemberships, $rEdges, plot, print, coef 
#> * Additional fields
#>   $probMemberships, $loglik, $ICL, $storedModels, 
#> * Additional methods 
#>   predict, fitted, $setModel, $reorder

For instance,

mySimpleSBMPoisson$nbBlocks
#> [1] 6
mySimpleSBMPoisson$nbNodes
#> tree 
#>   51
mySimpleSBMPoisson$nbCovariates
#> [1] 0

We now plot the matrix reordered according to the memberships estimated in the SBM

plot(mySimpleSBMPoisson, type = "data")

One can also plot the expected network which, in the case of the Poisson model, corresponds to the expected weight (here, the expected number of shared fungi) between any pair of nodes in the network.

plot(mySimpleSBMPoisson, type = "expected")
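
As a rough illustration of what the expected network contains, the sketch below builds the matrix of expected weights from hard block memberships and the Poisson means $\lambda_{k\ell} = \exp(\alpha_{k\ell})$. All values are hypothetical and the computation is a simplified assumption; the fitted object exposes its own version through the $expectation field.

```r
# Hypothetical 2-block example; none of these values come from the fitted model
alpha  <- matrix(c( 1.0, -0.5,
                   -0.5,  0.3), nrow = 2, byrow = TRUE)  # log-scale parameters
lambda <- exp(alpha)                                      # Poisson mean per block pair

memberships <- c(1, 1, 2, 2, 2)                           # block label of each node

# One-hot membership matrix Z (nodes x blocks)
Z <- outer(memberships, seq_len(nrow(lambda)), "==") * 1

# Expected weight of dyad (i, j) is lambda[z_i, z_j], i.e. Z %*% lambda %*% t(Z)
expected <- Z %*% lambda %*% t(Z)
expected
```

Two nodes in the same pair of blocks always get the same expected weight, which explains the block-constant pattern of the "expected" plot.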

The same manipulations can be made on the models as before. One can also plot the mesoscopic view of the network.

plot(mySimpleSBMPoisson, type = "meso")

The composition of the clusters/blocks is given by:

lapply(1:mySimpleSBMPoisson$nbBlocks,
       function(q){fungusTreeNetwork$tree_names[mySimpleSBMPoisson$memberships == q]})
#> [[1]]
#> [1] Abies alba                                   
#> [2] Abies grandis                                
#> [3] Cedrus spp (Cedrus atlantica, Cedrus libani) 
#> [4] Larix decidua                                
#> [5] Picea excelsa                                
#> [6] Pinus nigra laricio                          
#> [7] Pinus pinaster                               
#> [8] Pinus sylvestris                             
#> [9] Pseudotsuga menziesii                        
#> 51 Levels: Abies alba Abies grandis Abies nordmanniana ... Ulmus spp (Ulmus minor, Ulmus laevis, Ulmus glabra) 
#> 
#> [[2]]
#>  [1] Large Maples (Acer platanoides, Acer pseudoplatanus)                                                
#>  [2] Fagus silvatica                                                                                     
#>  [3] Fraxinus spp (Fraxinus angustifolia, Fraxinus excelsior)                                            
#>  [4] Juglans spp (Juglans nigra, Juglans regia)                                                          
#>  [5] Cultivated Poplars (Populus trichocarpa, P. canescens, P.alba, P.nigra and their cultivated hybrids)
#>  [6] Prunus avium                                                                                        
#>  [7] Quercus petraea                                                                                     
#>  [8] Quercus robur                                                                                       
#>  [9] Quercus rubra                                                                                       
#> [10] Sorbus torminalis                                                                                   
#> 51 Levels: Abies alba Abies grandis Abies nordmanniana ... Ulmus spp (Ulmus minor, Ulmus laevis, Ulmus glabra) 
#> 
#> [[3]]
#> [1] Abies nordmanniana Larix kaempferi    Picea sitchensis   Pinus halepensis  
#> [5] Pinus nigra nigra  Pinus strobus      Pinus uncinata    
#> 51 Levels: Abies alba Abies grandis Abies nordmanniana ... Ulmus spp (Ulmus minor, Ulmus laevis, Ulmus glabra) 
#> 
#> [[4]]
#> [1] Small Maples (Acer campestre, Acer monspessulanum, Acer negundo, Acer opalus) 
#> [2] Alnus glutinosa                                                               
#> [3] Castanea sativa                                                               
#> [4] Quercus ilex                                                                  
#> [5] Quercus pubescens                                                             
#> [6] Quercus suber                                                                 
#> [7] Sorbus aria                                                                   
#> [8] Tilia spp (Tilia platiphyllos, Tilia cordata)                                 
#> 51 Levels: Abies alba Abies grandis Abies nordmanniana ... Ulmus spp (Ulmus minor, Ulmus laevis, Ulmus glabra) 
#> 
#> [[5]]
#> [1] Cupressus sempervirens                            
#> [2] Pinus brutia (Pinus brutia, Pinus brutia eldarica)
#> [3] Pinus cembra                                      
#> [4] Pinus pinea                                       
#> [5] Pinus radiata                                     
#> [6] Pinus taeda                                       
#> [7] Platanus hybrida                                  
#> [8] Tsuga heterophylla                                
#> 51 Levels: Abies alba Abies grandis Abies nordmanniana ... Ulmus spp (Ulmus minor, Ulmus laevis, Ulmus glabra) 
#> 
#> [[6]]
#> [1] Betulus spp (Betula pendula, Betula pubescens)      
#> [2] Carpinus betulus                                    
#> [3] Populus tremula                                     
#> [4] Quercus pyrenaica                                   
#> [5] Sorbus aucuparia                                    
#> [6] Sorbus domestica                                    
#> [7] Taxus baccata                                       
#> [8] Thuja plicata                                       
#> [9] Ulmus spp (Ulmus minor, Ulmus laevis, Ulmus glabra) 
#> 51 Levels: Abies alba Abies grandis Abies nordmanniana ... Ulmus spp (Ulmus minor, Ulmus laevis, Ulmus glabra)
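
The same listing can be obtained more compactly with split(). The sketch below uses a handful of tree names and made-up memberships for illustration only; converting the factor to character is a presentation choice that avoids repeating the "Levels:" line under every block.

```r
# Hypothetical labels and memberships, for illustration only
tree_names  <- factor(c("Abies alba", "Fagus silvatica", "Pinus pinea",
                        "Quercus robur", "Taxus baccata"))
memberships <- c(1, 2, 1, 2, 3)

# split() groups the names by block; as.character() drops the factor levels
blocks <- split(as.character(tree_names), memberships)
blocks
```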

We are interested in comparing the two clusterings. To do so, we use alluvial flow plots:

listMemberships <- list(binarySBM = mySimpleSBM$memberships)
listMemberships$weightSBM <- mySimpleSBMPoisson$memberships
P <- plotAlluvial(listMemberships)

Introduction of covariates

For each pair of trees, three covariates are available: the genetic distance, the taxonomic distance and the geographic distance. Each covariate has to be supplied as a matrix: $X^k_{ij}$ is the value of the $k$-th covariate for the pair $(i,j)$.

We can also use the sbm package to estimate the parameters of the SBM with covariates.

mySimpleSBMCov <- estimateSimpleSBM(
  netMat = as.matrix(tree_tree),
  model = 'poisson',
  directed = FALSE,
  dimLabels = c('tree'),
  covariates = fungusTreeNetwork$covar_tree,
  estimOptions = list(verbosity = 0))


We select the best number of clusters (with respect to the ICL criterion):

mySimpleSBMCov$nbBlocks
#> [1] 4

We can now extract the parameters of interest, namely ($\lambda$, $\pi$) and the clustering of the nodes.

mySimpleSBMCov$connectParam
mySimpleSBMCov$blockProp
#> [1] 0.3715916 0.2159544 0.2354117 0.1770424
mySimpleSBMCov$memberships
#>  [1] 1 1 2 1 2 2 4 4 2 1 3 1 1 1 1 2 1 2 3 3 2 1 2 1 3 3 2 1 3 2 3 1 4 1 1 3 1 3
#> [39] 4 1 1 3 2 4 4 1 4 4 3 3 4
mySimpleSBMCov$covarParam
#>            [,1]
#> [1,]  0.1976220
#> [2,] -2.0550285
#> [3,] -0.3582768
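
To read the covariate coefficients, it helps to put them on a multiplicative scale. The sketch below assumes the usual log-linear form for a Poisson SBM with covariates, $Y_{ij} \mid Z_i = k, Z_j = \ell \sim \mathcal{P}(\exp(\alpha_{k\ell} + \beta^\top X_{ij}))$, where $\beta$ denotes the vector printed by covarParam. Both the symbol $\beta$ and this model form are assumptions made here for illustration.

```r
# Coefficients copied from the covarParam output above
beta <- c(0.1976220, -2.0550285, -0.3582768)

# Under the assumed log-linear model, a one-unit increase in covariate k
# multiplies the expected number of shared fungi by exp(beta_k)
multiplier <- exp(beta)
round(multiplier, 3)
```

Under this assumption, a negative coefficient means that pairs of trees further apart along that covariate are expected to share fewer fungi, all else being equal.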

The composition of the clusters/blocks is given by:

lapply(1:mySimpleSBMCov$nbBlocks,function(q){fungusTreeNetwork$tree_names[mySimpleSBMCov$memberships == q]})
#> [[1]]
#>  [1] Abies alba                                                                                          
#>  [2] Abies grandis                                                                                       
#>  [3] Large Maples (Acer platanoides, Acer pseudoplatanus)                                                
#>  [4] Cedrus spp (Cedrus atlantica, Cedrus libani)                                                        
#>  [5] Fagus silvatica                                                                                     
#>  [6] Fraxinus spp (Fraxinus angustifolia, Fraxinus excelsior)                                            
#>  [7] Juglans spp (Juglans nigra, Juglans regia)                                                          
#>  [8] Larix decidua                                                                                       
#>  [9] Picea excelsa                                                                                       
#> [10] Pinus nigra laricio                                                                                 
#> [11] Pinus pinaster                                                                                      
#> [12] Pinus sylvestris                                                                                    
#> [13] Cultivated Poplars (Populus trichocarpa, P. canescens, P.alba, P.nigra and their cultivated hybrids)
#> [14] Prunus avium                                                                                        
#> [15] Pseudotsuga menziesii                                                                               
#> [16] Quercus petraea                                                                                     
#> [17] Quercus robur                                                                                       
#> [18] Quercus rubra                                                                                       
#> [19] Sorbus torminalis                                                                                   
#> 51 Levels: Abies alba Abies grandis Abies nordmanniana ... Ulmus spp (Ulmus minor, Ulmus laevis, Ulmus glabra) 
#> 
#> [[2]]
#>  [1] Abies nordmanniana                                                            
#>  [2] Small Maples (Acer campestre, Acer monspessulanum, Acer negundo, Acer opalus) 
#>  [3] Alnus glutinosa                                                               
#>  [4] Castanea sativa                                                               
#>  [5] Larix kaempferi                                                               
#>  [6] Picea sitchensis                                                              
#>  [7] Pinus halepensis                                                              
#>  [8] Pinus nigra nigra                                                             
#>  [9] Pinus strobus                                                                 
#> [10] Pinus uncinata                                                                
#> [11] Sorbus aria                                                                   
#> 51 Levels: Abies alba Abies grandis Abies nordmanniana ... Ulmus spp (Ulmus minor, Ulmus laevis, Ulmus glabra) 
#> 
#> [[3]]
#>  [1] Cupressus sempervirens                            
#>  [2] Pinus brutia (Pinus brutia, Pinus brutia eldarica)
#>  [3] Pinus cembra                                      
#>  [4] Pinus pinea                                       
#>  [5] Pinus radiata                                     
#>  [6] Pinus taeda                                       
#>  [7] Platanus hybrida                                  
#>  [8] Quercus ilex                                      
#>  [9] Quercus pubescens                                 
#> [10] Quercus suber                                     
#> [11] Tilia spp (Tilia platiphyllos, Tilia cordata)     
#> [12] Tsuga heterophylla                                
#> 51 Levels: Abies alba Abies grandis Abies nordmanniana ... Ulmus spp (Ulmus minor, Ulmus laevis, Ulmus glabra) 
#> 
#> [[4]]
#> [1] Betulus spp (Betula pendula, Betula pubescens)      
#> [2] Carpinus betulus                                    
#> [3] Populus tremula                                     
#> [4] Quercus pyrenaica                                   
#> [5] Sorbus aucuparia                                    
#> [6] Sorbus domestica                                    
#> [7] Taxus baccata                                       
#> [8] Thuja plicata                                       
#> [9] Ulmus spp (Ulmus minor, Ulmus laevis, Ulmus glabra) 
#> 51 Levels: Abies alba Abies grandis Abies nordmanniana ... Ulmus spp (Ulmus minor, Ulmus laevis, Ulmus glabra)

We are interested in comparing the three clusterings. To do so, we use alluvial flow plots:

listMemberships <- list(binary = mySimpleSBM$memberships)
listMemberships$weighted <- mySimpleSBMPoisson$memberships
listMemberships$weightedCov <- mySimpleSBMCov$memberships
plotAlluvial(listMemberships)

#> $plotOptions
#> $plotOptions$curvy
#> [1] 0.3
#> 
#> $plotOptions$alpha
#> [1] 0.8
#> 
#> $plotOptions$gap.width
#> [1] 0.1
#> 
#> $plotOptions$col
#> [1] "darkolivegreen3"
#> 
#> $plotOptions$border
#> [1] "white"
#> 
#> 
#> $tableFreq
#>     binary weighted weightedCov Freq
#> 1        1        1           1    9
#> 6        1        2           1    1
#> 7        2        2           1    9
#> 41       1        3           2    5
#> 43       3        3           2    2
#> 46       1        4           2    1
#> 49       4        4           2    3
#> 79       4        4           3    4
#> 81       1        5           3    1
#> 83       3        5           3    5
#> 84       4        5           3    2
#> 120      5        6           4    9
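
The frequencies in $tableFreq can be collapsed into pairwise contingency tables, which gives a numerical counterpart to the alluvial plot. The sketch below retypes the printed rows into a data frame (the name table_freq is chosen here for illustration) and cross-tabulates them with xtabs().

```r
# Rows copied from the $tableFreq output above
table_freq <- data.frame(
  binary      = c(1, 1, 2, 1, 3, 1, 4, 4, 1, 3, 4, 5),
  weighted    = c(1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6),
  weightedCov = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4),
  Freq        = c(9, 1, 9, 5, 2, 1, 3, 4, 1, 5, 2, 9)
)

# Trees per (weighted block, weightedCov block) pair
weighted_vs_cov <- xtabs(Freq ~ weighted + weightedCov, data = table_freq)
weighted_vs_cov

# Trees per (binary block, weighted block) pair
binary_vs_weighted <- xtabs(Freq ~ binary + weighted, data = table_freq)
binary_vs_weighted
```

In the first table, block 4 of the weighted model is the only one split across two blocks of the model with covariates; every other weighted block falls entirely inside a single block.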


An illustration on an antagonistic tree/fungus network

team großBM

2024-02-12
