Handling zeros in multivariate count tables
A zero-inflated PLN
Mixture of PLN and Bernoulli distribution
Use two latent vectors $W_i$ and $Z_i$ to model the excess of zeros and the dependence structure:

$$Z_i = (Z_{ij})_{j=1,\dots,p} \sim \mathcal{N}(\mathbf{x}_i^\top \mathbf{B}, \boldsymbol{\Sigma}) \quad \text{(PLN latent space)}$$
$$W_i = (W_{ij})_{j=1,\dots,p} \sim \bigotimes_{j=1}^p \mathcal{B}(\pi_{ij}) \quad \text{(excess of zeros)}$$
$$Y_{ij} \mid W_{ij}, Z_{ij} \overset{\text{indep}}{\sim} W_{ij}\,\delta_0 + (1 - W_{ij})\,\mathcal{P}\big(\exp\{o_{ij} + Z_{ij}\}\big) \quad \text{(observation space)}$$
⇝ Better handling of zeros + additional interpretable parameters
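As a minimal sketch, the generative model above can be simulated directly: draw the Gaussian layer $Z_i$, the Bernoulli layer $W_i$, then the zero-inflated Poisson counts. All numerical values below ($\mathbf{B}$, $\boldsymbol\Sigma$, $\pi$, offsets) are illustrative, not taken from any dataset or library.

```python
# Sample one site i from the ZI-PLN model (toy parameter values).
import math
import random

random.seed(0)

p = 3                                   # number of species
mu = [0.5, 0.0, -0.5]                   # x_i^T B_j for each species j
sigma = [[0.5, 0.2, 0.0],
         [0.2, 0.5, 0.1],
         [0.0, 0.1, 0.5]]               # latent covariance Sigma (SPD)
pi = [0.3, 0.1, 0.6]                    # zero-inflation probabilities pi_ij
o = [0.0, 0.0, 0.0]                     # offsets o_ij

def chol(a):
    """Lower-triangular Cholesky factor of a small SPD matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / l[j][j]
    return l

def rpois(lam):
    """Poisson sampler by inversion (fine for moderate lam)."""
    u = random.random()
    k, pk = 0, math.exp(-lam)
    f = pk
    while u > f and pk > 1e-16:         # pk guard avoids a float-tail loop
        k += 1
        pk *= lam / k
        f += pk
    return k

L = chol(sigma)
eps = [random.gauss(0, 1) for _ in range(p)]
Z = [mu[j] + sum(L[j][k] * eps[k] for k in range(j + 1)) for j in range(p)]  # Z ~ N(mu, Sigma)
W = [1 if random.random() < pi[j] else 0 for j in range(p)]                  # W_ij ~ B(pi_ij)
Y = [0 if W[j] else rpois(math.exp(o[j] + Z[j])) for j in range(p)]          # ZI-Poisson emission
print(Y)
```

Whenever $W_{ij} = 1$ the count is a structural zero regardless of $Z_{ij}$, which is exactly the $W_{ij}\,\delta_0$ term of the mixture.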
Basic properties
Letting $A_{ij} \triangleq \exp(o_{ij} + \mu_{ij} + \sigma_{jj}/2)$ with $\mu_{ij} = \mathbf{x}_i^\top \mathbf{B}_j$, then

- $\mathbb{E}(Y_{ij}) = (1 - \pi_{ij})\, A_{ij} \le A_{ij}$ (PLN's mean),
- $\mathbb{V}(Y_{ij}) = (1 - \pi_{ij})\, A_{ij} + (1 - \pi_{ij})\, A_{ij}^2 \big(e^{\sigma_{jj}} - (1 - \pi_{ij})\big)$.
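The mean formula can be sanity-checked by Monte Carlo on a single entry: simulate the zero-inflated Poisson-lognormal count many times and compare the empirical mean with $(1-\pi_{ij}) A_{ij}$. The parameter values are arbitrary toy choices.

```python
# Monte Carlo check of E(Y_ij) = (1 - pi_ij) * A_ij for one entry.
import math
import random

random.seed(42)
o, mu, sjj, pi = 0.0, 0.5, 0.3, 0.3
A = math.exp(o + mu + sjj / 2)           # A_ij = exp(o + mu + sigma_jj / 2)

n = 200_000
total = 0
for _ in range(n):
    if random.random() < pi:             # W_ij = 1: structural zero
        continue
    z = random.gauss(mu, math.sqrt(sjj))
    lam = math.exp(o + z)
    u = random.random()                  # Poisson draw by inversion
    k, pk = 0, math.exp(-lam)
    f = pk
    while u > f and pk > 1e-16:
        k += 1
        pk *= lam / k
        f += pk
    total += k

mc_mean = total / n
print(mc_mean, (1 - pi) * A)             # the two values should be close
```

Note that the empirical mean also sits below $A_{ij}$, the mean of the plain PLN model, as the inequality states.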
ZI-PLN: refinements
Modeling of the pure zero component
- $\pi_{ij} = \pi \in [0,1]$ (single global parameter)
- $\pi_{ij} = \pi_j \in [0,1]$ (species dependent)
- $\pi_{ij} = \pi_i \in [0,1]$ (site dependent)
- $\pi_{ij} = \mathrm{logit}^{-1}(\mathbf{X}^0 \mathbf{B}^0)_{ij}$, $\mathbf{X}^0 \in \mathbb{R}^{n \times d_0}$, $\mathbf{B}^0 \in \mathbb{R}^{d_0 \times p}$ (site covariates)
- $\pi_{ij} = \mathrm{logit}^{-1}(\bar{\mathbf{B}}^0 \bar{\mathbf{X}}^0)_{ij}$, $\bar{\mathbf{B}}^0 \in \mathbb{R}^{n \times d_0}$, $\bar{\mathbf{X}}^0 \in \mathbb{R}^{d_0 \times p}$ (species covariates)
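The "site covariates" parameterization is just an inverse-logit applied entrywise to a matrix product; a minimal sketch with made-up toy matrices (not estimated coefficients):

```python
# pi_ij = logit^{-1}((X0 B0)_ij) for the site-covariates model.
import math

def inv_logit(t):
    return 1.0 / (1.0 + math.exp(-t))

X0 = [[1.0, 0.2],
      [1.0, -1.5]]          # n x d0 design matrix (n=2 sites, d0=2)
B0 = [[0.0, 1.0, -2.0],
      [0.5, 0.0, 1.0]]      # d0 x p coefficients (p=3 species)

n, d0, p = len(X0), len(B0), len(B0[0])
pi = [[inv_logit(sum(X0[i][k] * B0[k][j] for k in range(d0))) for j in range(p)]
      for i in range(n)]
print(pi)                   # every entry lies in (0, 1)
```

Each row of the resulting $n \times p$ matrix gives the zero-inflation probabilities of one site across all species.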
Proposition 1 (Identifiability of ZI-PLN)
- The "single global parameter" ZI-PLN model with parameter $\theta = (\boldsymbol\Omega, \boldsymbol\mu, \pi)$ and parameter space $\mathbb{S}_{++}^p \times \mathbb{R}^p \times (0,1)^p$ is identifiable (moment-based proof).
- The site-covariates zero-inflation model with parameter $\theta = (\boldsymbol\Omega, \mathbf{B}, \mathbf{B}^0)$ and parameter space $\mathbb{S}_{++}^p \times \mathcal{M}_{p,d}(\mathbb{R}) \times \mathcal{M}_{p,d_0}(\mathbb{R})$ is identifiable if and only if both the $n \times d$ and $n \times d_0$ covariate matrices $\mathbf{X}$ and $\mathbf{X}^0$ are full rank.
Standard mean-field
Variational approximation breaks all dependencies
$$p(Z_i, W_i \mid Y_i) \approx q_{\psi_i}(Z_i, W_i) \triangleq q_{\psi_i}(Z_i)\, q_{\psi_i}(W_i) = \bigotimes_{j=1}^p q_{\psi_i}(Z_{ij})\, q_{\psi_i}(W_{ij})$$

with Gaussian and Bernoulli distributions for $Z_{ij}$ and $W_{ij}$, then

$$q_{\psi_i}(Z_i, W_i) = \bigotimes_{j=1}^p \mathcal{N}(M_{ij}, S_{ij}^2)\, \mathcal{B}(\rho_{ij})$$
Variational lower bound
Let $\theta = (\mathbf{B}, \mathbf{B}^0, \boldsymbol\Sigma)$ and $\psi = (\mathbf{M}, \mathbf{S}, \mathbf{R})$, then

$$\begin{aligned}
J(\theta, \psi) &= \log p_\theta(Y) - KL\big(q_\psi(\cdot)\,\|\,p_\theta(\cdot \mid Y)\big) \\
&= \mathbb{E}_{q_\psi} \log p_\theta(Z, W, Y) - \mathbb{E}_{q_\psi} \log q_\psi(Z, W) \\
&= \mathbb{E}_{q_\psi} \log p_\theta(Y \mid Z, W) + \mathbb{E}_{q_\psi} \log p_\theta(Z) + \mathbb{E}_{q_\psi} \log p_\theta(W) - \mathbb{E}_{q_\psi} \log q_\psi(Z) - \mathbb{E}_{q_\psi} \log q_\psi(W)
\end{aligned}$$
Property: concave in each element of θ,ψ.
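Under the mean-field factorization, the two $W$ terms of the decomposition, $\mathbb{E}_{q_\psi} \log p_\theta(W) - \mathbb{E}_{q_\psi} \log q_\psi(W)$, collapse entrywise to minus a Bernoulli KL divergence, $-KL(\mathcal{B}(\rho_{ij}) \| \mathcal{B}(\pi_{ij}))$. A one-entry numerical check with toy values for $\rho$ and $\pi$:

```python
# E_q log p(W) - E_q log q(W) == -KL(B(rho) || B(pi)) for one entry.
import math

def bern_kl(rho, pi):
    """KL divergence between two Bernoulli distributions."""
    return rho * math.log(rho / pi) + (1 - rho) * math.log((1 - rho) / (1 - pi))

rho, pi = 0.7, 0.3
e_log_p = rho * math.log(pi) + (1 - rho) * math.log(1 - pi)    # E_q log p(W)
e_log_q = rho * math.log(rho) + (1 - rho) * math.log(1 - rho)  # E_q log q(W)
print(e_log_p - e_log_q, -bern_kl(rho, pi))
```

The same collapse happens for the $Z$ terms (a Gaussian KL), which is what makes the mean-field ELBO fully explicit.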
Sparse regularization
Recall that $\theta = (\mathbf{B}, \mathbf{B}^0, \boldsymbol\Omega = \boldsymbol\Sigma^{-1})$. Sparsity makes it possible to control the number of parameters:

$$\arg\min_{\theta, \psi} \; -J(\theta, \psi) + \lambda_1 \|\mathbf{B}\|_1 + \lambda_2 \|\boldsymbol\Omega\|_1 \;\; \big({}+ \lambda_1 \|\mathbf{B}^0\|_1\big)$$
Alternate optimization
- (Stochastic) gradient descent on $\mathbf{B}^0, \mathbf{M}, \mathbf{S}$
- Closed form for the posterior probabilities $\mathbf{R}$
- Inverse covariance $\boldsymbol\Omega$:
  - if $\lambda_2 = 0$: $\hat{\boldsymbol\Sigma} = n^{-1}\big[(\mathbf{M} - \mathbf{X}\mathbf{B})^\top (\mathbf{M} - \mathbf{X}\mathbf{B}) + \bar{\mathbf{S}}^2\big]$
  - if $\lambda_2 > 0$: $\ell_1$-penalized MLE (⇝ graphical lasso with $\hat{\boldsymbol\Sigma}$ as input)
- PLN regression coefficients $\mathbf{B}$:
  - if $\lambda_1 = 0$: $\hat{\mathbf{B}} = [\mathbf{X}^\top \mathbf{X}]^{-1} \mathbf{X}^\top \mathbf{M}$
  - if $\lambda_1 > 0$: vectorize and solve an $\ell_1$-penalized least-squares problem
Initialize with univariate zero-inflated Poisson regression models
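The two unpenalized closed forms ($\lambda_1 = \lambda_2 = 0$) can be checked on tiny toy matrices. This is a sketch under simplifying assumptions: $d = 1$ covariate so the $[\mathbf{X}^\top\mathbf{X}]^{-1}$ inverse is a scalar, and the $\bar{\mathbf{S}}^2$ term is read as the column sums of the variational variances placed on the diagonal.

```python
# OLS-type update for B and moment update for Sigma on toy data.
X = [[1.0], [2.0], [3.0], [4.0]]        # n x d design (n=4, d=1)
Btrue = [[0.5, -0.2]]                   # d x p (p=2)
M = [[x[0] * Btrue[0][j] for j in range(2)] for x in X]  # variational means, M = X Btrue
S2 = [[0.1, 0.1] for _ in range(4)]     # variational variances S_ij^2

n, p = len(M), len(M[0])
xtx = sum(x[0] * x[0] for x in X)                        # X^T X (scalar here)
Bhat = [[sum(X[i][0] * M[i][j] for i in range(n)) / xtx for j in range(p)]]

Res = [[M[i][j] - X[i][0] * Bhat[0][j] for j in range(p)] for i in range(n)]
Sigma_hat = [[(sum(Res[i][a] * Res[i][b] for i in range(n))
               + (sum(S2[i][a] for i in range(n)) if a == b else 0.0)) / n
              for b in range(p)] for a in range(p)]
print(Bhat, Sigma_hat)
```

Since $\mathbf{M}$ is built exactly as $\mathbf{X}\mathbf{B}_{\text{true}}$, the estimate recovers $\mathbf{B}_{\text{true}}$ and $\hat{\boldsymbol\Sigma}$ reduces to the averaged variational variances on the diagonal.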
Enhancing variational approximation (1)
Two paths of improvement, breaking fewer dependencies between the latent variables:

$$p(Z_i, W_i \mid Y_i) \approx q(Z_i, W_i) \triangleq \begin{cases} \prod_j q(W_{ij} \mid Z_{ij})\, q(Z_{ij}) \\ \prod_j q(Z_{ij} \mid W_{ij})\, q(W_{ij}) \end{cases}$$
The W|Z,Y path
One can show that
$$W_{ij} \mid Y_{ij}, Z_{ij} \sim \mathcal{B}\left(\frac{\pi_{ij}}{\pi_{ij} + (1 - \pi_{ij}) \exp(-Z_{ij})}\right) 1_{\{Y_{ij} = 0\}}$$
Sadly, the resulting ELBO involves the intractable entropy term $\tilde{\mathbb{E}}\,[\log q_\psi(W \mid Z)]$
⇝ requires computing $\tilde{\mathbb{E}}\left[\dfrac{-\log(1 + \exp(-U))}{1 + \exp(-U)}\right]$ for arbitrary univariate Gaussians $U$
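Although the entropy term is intractable, the conditional itself is easy to evaluate pointwise. The sketch below follows the slide's formula verbatim (including its $\exp(-Z_{ij})$ term); the helper name is ours:

```python
# P(W_ij = 1 | Y_ij, Z_ij) as written on the slide.
import math

def prob_w_given_yz(pi, z, y):
    if y > 0:
        return 0.0                       # a positive count rules out the zero component
    return pi / (pi + (1 - pi) * math.exp(-z))

print(prob_w_given_yz(0.3, 0.0, 0))      # probability of a structural zero at z = 0
print(prob_w_given_yz(0.3, 5.0, 2))      # any Y > 0 forces W = 0
```

The probability is increasing in $z$: the larger the latent intensity, the less plausible a sampling zero, so an observed zero is more likely structural.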
Enhancing variational approximation (2)
The Z|W,Y path
Since $W_{ij}$ only takes two values, the dependence between $Z_{ij}$ and $W_{ij}$ can be made explicit:

$$Z_{ij} \mid W_{ij}, Y_{ij} = \big(Z_{ij} \mid Y_{ij}, W_{ij} = 1\big)^{W_{ij}} \big(Z_{ij} \mid Y_{ij}, W_{ij} = 0\big)^{1 - W_{ij}}.$$

Then $p(Z_{ij} \mid Y_{ij}, W_{ij} = 1) = p(Z_{ij} \mid W_{ij} = 1) = p(Z_{ij})$ by independence of $Z_{ij}$ and $W_{ij}$ (when $W_{ij} = 1$, $Y_{ij} = 0$ almost surely, so $Y_{ij}$ carries no information about $Z_{ij}$).
⇝ Only an approximation of Zij|Yij,Wij=0 is needed.
More accurate variational approximation
$$q_{\psi_i}(Z_i, W_i) = q_{\psi_i}(Z_i \mid W_i)\, q_{\psi_i}(W_i) = \bigotimes_{j=1}^p \mathcal{N}(\mathbf{x}_i^\top \mathbf{B}_j, \Sigma_{jj})^{W_{ij}}\, \mathcal{N}(M_{ij}, S_{ij}^2)^{1 - W_{ij}}, \quad W_{ij} \overset{\text{indep}}{\sim} \mathcal{B}(\rho_{ij}).$$
Drawback
We lose the closed forms for $\hat{\mathbf{B}}$ and $\hat{\boldsymbol\Sigma}$ in the M step of the VEM algorithm for the corresponding ELBO…
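Sampling from the enhanced family above is straightforward: draw $W_{ij} \sim \mathcal{B}(\rho_{ij})$ first, then $Z_{ij}$ from the prior Gaussian $\mathcal{N}(\mathbf{x}_i^\top \mathbf{B}_j, \Sigma_{jj})$ if $W_{ij} = 1$, or from the free Gaussian $\mathcal{N}(M_{ij}, S_{ij}^2)$ if $W_{ij} = 0$. A one-coordinate sketch with toy values:

```python
# Draw (Z, W) pairs from the conditional variational family for one (i, j).
import math
import random

random.seed(1)
rho, prior_mu, prior_var, m, s2 = 0.4, 0.0, 1.0, 2.0, 0.25

def sample_zw():
    w = 1 if random.random() < rho else 0
    if w == 1:
        z = random.gauss(prior_mu, math.sqrt(prior_var))   # prior component
    else:
        z = random.gauss(m, math.sqrt(s2))                 # free variational component
    return z, w

draws = [sample_zw() for _ in range(50_000)]
z_mean = sum(z for z, _ in draws) / len(draws)
# Mixture mean: rho * prior_mu + (1 - rho) * m = 0.4 * 0 + 0.6 * 2 = 1.2
print(z_mean)
```

Marginally, $Z_{ij}$ is a two-component Gaussian mixture, which is exactly what the mean-field family of the previous slides cannot represent.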
Additional refinement
Optimization using the analytic law of $W_{ij} \mid Y_{ij}$
Proposition 2 (Distribution of $W_{ij} \mid Y_{ij}$)

$$W_{ij} \mid Y_{ij} \sim \mathcal{B}\left(\frac{\pi_{ij}}{\varphi(\mathbf{x}_i^\top \mathbf{B}_j, \Sigma_{jj})\,(1 - \pi_{ij}) + \pi_{ij}}\right) 1_{\{Y_{ij} = 0\}}$$

with $\varphi(\mu, \sigma^2) = \mathbb{E}[\exp(-X)]$, $X \sim \mathcal{LN}(\mu, \sigma^2)$.
Approximation of φ
The function $\varphi$ is intractable, but an approximation (Rojas-Nandayapa 2008) can be computed:

$$\varphi(\mu, \sigma^2) \approx \tilde\varphi(\mu, \sigma^2) = \frac{\exp\left(-\dfrac{L^2(\sigma^2 e^\mu) + 2\,L(\sigma^2 e^\mu)}{2 \sigma^2}\right)}{\sqrt{1 + L(\sigma^2 e^\mu)}},$$

where $L(\cdot)$ is the Lambert $W$ function (i.e. $z = x \exp(x) \Leftrightarrow x = L(z)$, $x, z \in \mathbb{R}$).
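A sketch of $\tilde\varphi$ in code, using a small Halley-iteration solver for the Lambert $W$ function (in practice `scipy.special.lambertw` does the same job), compared against a Monte Carlo estimate of $\varphi(\mu, \sigma^2) = \mathbb{E}[\exp(-e^{\mu + \sigma N})]$, $N \sim \mathcal{N}(0,1)$. The toy values of $\mu$ and $\sigma^2$ are ours:

```python
# Rojas-Nandayapa approximation of phi(mu, sigma^2) = E[exp(-X)], X log-normal.
import math
import random

def lambert_w(z, tol=1e-12):
    """Solve x * exp(x) = z (principal branch; here z >= 0) by Halley iteration."""
    x = math.log1p(z)                    # decent starting point for z >= 0
    for _ in range(100):
        e = math.exp(x)
        f = x * e - z
        step = f / (e * (x + 1) - f * (x + 2) / (2 * x + 2))
        x -= step
        if abs(step) < tol:
            break
    return x

def phi_tilde(mu, sig2):
    w = lambert_w(sig2 * math.exp(mu))
    return math.exp(-(w * w + 2 * w) / (2 * sig2)) / math.sqrt(1 + w)

random.seed(0)
mu, sig2 = 0.5, 0.3
mc = sum(math.exp(-math.exp(mu + math.sqrt(sig2) * random.gauss(0, 1)))
         for _ in range(100_000)) / 100_000
print(phi_tilde(mu, sig2), mc)           # the two estimates should agree closely
```

For moderate $\sigma^2$ as here, the closed-form approximation and the Monte Carlo reference agree to within a couple of percent, which is what makes the analytic law of $W_{ij} \mid Y_{ij}$ usable inside the optimization.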