Handling zeros in multivariate count tables
A zero-inflated PLN
Mixture of PLN and Bernoulli distribution
Use two latent vectors $W_i$ and $Z_i$ to model the excess of zeros and the dependence structure:

$$Z_i = (Z_{ij})_{j=1,\dots,p} \sim \mathcal{N}(\mathbf{x}_i^\top \mathbf{B}, \boldsymbol{\Sigma}) \quad \text{(PLN latent space)}$$
$$W_i = (W_{ij})_{j=1,\dots,p} \sim \bigotimes_{j=1}^p \mathcal{B}(\pi_{ij}) \quad \text{(excess of zeros)}$$
$$Y_{ij} \mid W_{ij}, Z_{ij} \overset{\text{indep}}{\sim} W_{ij}\,\delta_0 + (1 - W_{ij})\,\mathcal{P}\big(\exp\{o_{ij} + Z_{ij}\}\big) \quad \text{(observation space)}$$
⇝ Better handling of zeros + additional interpretable parameters
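As a minimal sketch, the generative model above can be simulated directly: draw the Gaussian layer $Z_i$, the Bernoulli layer $W_i$, then the zero-inflated Poisson counts. All numerical values below ($\mathbf{B}$, $\boldsymbol\Sigma$, $\pi$, offsets) are illustrative, not taken from any dataset or library.

```python
# Sample one site i from the ZI-PLN model (toy parameter values).
import math
import random

random.seed(0)

p = 3                                   # number of species
mu = [0.5, 0.0, -0.5]                   # x_i^T B_j for each species j
sigma = [[0.5, 0.2, 0.0],
         [0.2, 0.5, 0.1],
         [0.0, 0.1, 0.5]]               # latent covariance Sigma (SPD)
pi = [0.3, 0.1, 0.6]                    # zero-inflation probabilities pi_ij
o = [0.0, 0.0, 0.0]                     # offsets o_ij

def chol(a):
    """Lower-triangular Cholesky factor of a small SPD matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / l[j][j]
    return l

def rpois(lam):
    """Poisson sampler by inversion (fine for moderate lam)."""
    u = random.random()
    k, pk = 0, math.exp(-lam)
    f = pk
    while u > f and pk > 1e-16:         # pk guard avoids a float-tail loop
        k += 1
        pk *= lam / k
        f += pk
    return k

L = chol(sigma)
eps = [random.gauss(0, 1) for _ in range(p)]
Z = [mu[j] + sum(L[j][k] * eps[k] for k in range(j + 1)) for j in range(p)]  # Z ~ N(mu, Sigma)
W = [1 if random.random() < pi[j] else 0 for j in range(p)]                  # W_ij ~ B(pi_ij)
Y = [0 if W[j] else rpois(math.exp(o[j] + Z[j])) for j in range(p)]          # ZI-Poisson emission
print(Y)
```

Whenever $W_{ij} = 1$ the count is a structural zero regardless of $Z_{ij}$, which is exactly the $W_{ij}\,\delta_0$ term of the mixture.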
Basic properties
Letting $A_{ij} \triangleq \exp(o_{ij} + \mu_{ij} + \sigma_{jj}/2)$ with $\mu_{ij} = \mathbf{x}_i^\top \mathbf{B}_j$, then

- $\mathbb{E}(Y_{ij}) = (1 - \pi_{ij})\, A_{ij} \le A_{ij}$ (PLN's mean),
- $\mathbb{V}(Y_{ij}) = (1 - \pi_{ij})\, A_{ij} + (1 - \pi_{ij})\, A_{ij}^2 \big(e^{\sigma_{jj}} - (1 - \pi_{ij})\big)$.
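The mean formula can be sanity-checked by Monte Carlo on a single entry: simulate the zero-inflated Poisson-lognormal count many times and compare the empirical mean with $(1-\pi_{ij}) A_{ij}$. The parameter values are arbitrary toy choices.

```python
# Monte Carlo check of E(Y_ij) = (1 - pi_ij) * A_ij for one entry.
import math
import random

random.seed(42)
o, mu, sjj, pi = 0.0, 0.5, 0.3, 0.3
A = math.exp(o + mu + sjj / 2)           # A_ij = exp(o + mu + sigma_jj / 2)

n = 200_000
total = 0
for _ in range(n):
    if random.random() < pi:             # W_ij = 1: structural zero
        continue
    z = random.gauss(mu, math.sqrt(sjj))
    lam = math.exp(o + z)
    u = random.random()                  # Poisson draw by inversion
    k, pk = 0, math.exp(-lam)
    f = pk
    while u > f and pk > 1e-16:
        k += 1
        pk *= lam / k
        f += pk
    total += k

mc_mean = total / n
print(mc_mean, (1 - pi) * A)             # the two values should be close
```

Note that the empirical mean also sits below $A_{ij}$, the mean of the plain PLN model, as the inequality states.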
ZI-PLN: refinements
Modeling of the pure zero component
- $\pi_{ij} = \pi \in [0,1]$ (single global parameter)
- $\pi_{ij} = \pi_j \in [0,1]$ (species dependent)
- $\pi_{ij} = \pi_i \in [0,1]$ (site dependent)
- $\pi_{ij} = \mathrm{logit}^{-1}(\mathbf{X}^0 \mathbf{B}^0)_{ij}$, $\mathbf{X}^0 \in \mathbb{R}^{n \times d_0}$, $\mathbf{B}^0 \in \mathbb{R}^{d_0 \times p}$ (site covariates)
- $\pi_{ij} = \mathrm{logit}^{-1}(\bar{\mathbf{B}}^0 \bar{\mathbf{X}}^0)_{ij}$, $\bar{\mathbf{B}}^0 \in \mathbb{R}^{n \times d_0}$, $\bar{\mathbf{X}}^0 \in \mathbb{R}^{d_0 \times p}$ (species covariates)
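The "site covariates" parameterization is just an inverse-logit applied entrywise to a matrix product; a minimal sketch with made-up toy matrices (not estimated coefficients):

```python
# pi_ij = logit^{-1}((X0 B0)_ij) for the site-covariates model.
import math

def inv_logit(t):
    return 1.0 / (1.0 + math.exp(-t))

X0 = [[1.0, 0.2],
      [1.0, -1.5]]          # n x d0 design matrix (n=2 sites, d0=2)
B0 = [[0.0, 1.0, -2.0],
      [0.5, 0.0, 1.0]]      # d0 x p coefficients (p=3 species)

n, d0, p = len(X0), len(B0), len(B0[0])
pi = [[inv_logit(sum(X0[i][k] * B0[k][j] for k in range(d0))) for j in range(p)]
      for i in range(n)]
print(pi)                   # every entry lies in (0, 1)
```

Each row of the resulting $n \times p$ matrix gives the zero-inflation probabilities of one site across all species.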
Proposition 1 (Identifiability of ZI-PLN)
- The "single global parameter" ZI-PLN model with parameter $\theta = (\boldsymbol\Omega, \boldsymbol\mu, \pi)$ and parameter space $\mathbb{S}_{++}^p \times \mathbb{R}^p \times (0,1)^p$ is identifiable (moment-based proof).
- The site-covariates zero-inflation model with parameter $\theta = (\boldsymbol\Omega, \mathbf{B}, \mathbf{B}^0)$ and parameter space $\mathbb{S}_{++}^p \times \mathcal{M}_{p,d}(\mathbb{R}) \times \mathcal{M}_{p,d_0}(\mathbb{R})$ is identifiable if and only if both the $n \times d$ and $n \times d_0$ covariate matrices $\mathbf{X}$ and $\mathbf{X}^0$ are full rank.
Standard mean-field
Variational approximation breaks all dependencies
$$p(Z_i, W_i \mid Y_i) \approx q_{\psi_i}(Z_i, W_i) \triangleq q_{\psi_i}(Z_i)\, q_{\psi_i}(W_i) = \bigotimes_{j=1}^p q_{\psi_i}(Z_{ij})\, q_{\psi_i}(W_{ij})$$

with Gaussian and Bernoulli distributions for $Z_{ij}$ and $W_{ij}$, then

$$q_{\psi_i}(Z_i, W_i) = \bigotimes_{j=1}^p \mathcal{N}(M_{ij}, S_{ij}^2)\, \mathcal{B}(\rho_{ij})$$
Variational lower bound
Let $\theta = (\mathbf{B}, \mathbf{B}^0, \boldsymbol\Sigma)$ and $\psi = (\mathbf{M}, \mathbf{S}, \mathbf{R})$, then

$$\begin{aligned}
J(\theta, \psi) &= \log p_\theta(Y) - KL\big(q_\psi(\cdot)\,\|\,p_\theta(\cdot \mid Y)\big) \\
&= \mathbb{E}_{q_\psi} \log p_\theta(Z, W, Y) - \mathbb{E}_{q_\psi} \log q_\psi(Z, W) \\
&= \mathbb{E}_{q_\psi} \log p_\theta(Y \mid Z, W) + \mathbb{E}_{q_\psi} \log p_\theta(Z) + \mathbb{E}_{q_\psi} \log p_\theta(W) - \mathbb{E}_{q_\psi} \log q_\psi(Z) - \mathbb{E}_{q_\psi} \log q_\psi(W)
\end{aligned}$$
Property: concave in each element of θ,ψ.
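Under the mean-field factorization, the two $W$ terms of the decomposition, $\mathbb{E}_{q_\psi} \log p_\theta(W) - \mathbb{E}_{q_\psi} \log q_\psi(W)$, collapse entrywise to minus a Bernoulli KL divergence, $-KL(\mathcal{B}(\rho_{ij}) \| \mathcal{B}(\pi_{ij}))$. A one-entry numerical check with toy values for $\rho$ and $\pi$:

```python
# E_q log p(W) - E_q log q(W) == -KL(B(rho) || B(pi)) for one entry.
import math

def bern_kl(rho, pi):
    """KL divergence between two Bernoulli distributions."""
    return rho * math.log(rho / pi) + (1 - rho) * math.log((1 - rho) / (1 - pi))

rho, pi = 0.7, 0.3
e_log_p = rho * math.log(pi) + (1 - rho) * math.log(1 - pi)    # E_q log p(W)
e_log_q = rho * math.log(rho) + (1 - rho) * math.log(1 - rho)  # E_q log q(W)
print(e_log_p - e_log_q, -bern_kl(rho, pi))
```

The same collapse happens for the $Z$ terms (a Gaussian KL), which is what makes the mean-field ELBO fully explicit.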
Sparse regularization
Recall that $\theta = (\mathbf{B}, \mathbf{B}^0, \boldsymbol\Omega = \boldsymbol\Sigma^{-1})$. Sparsity makes it possible to control the number of parameters:

$$\arg\min_{\theta, \psi} \; -J(\theta, \psi) + \lambda_1 \|\mathbf{B}\|_1 + \lambda_2 \|\boldsymbol\Omega\|_1 \;\; \big({}+ \lambda_1 \|\mathbf{B}^0\|_1\big)$$
Alternate optimization
- (Stochastic) gradient descent on $\mathbf{B}^0, \mathbf{M}, \mathbf{S}$
- Closed form for the posterior probabilities $\mathbf{R}$
- Inverse covariance $\boldsymbol\Omega$:
  - if $\lambda_2 = 0$: $\hat{\boldsymbol\Sigma} = n^{-1}\big[(\mathbf{M} - \mathbf{X}\mathbf{B})^\top (\mathbf{M} - \mathbf{X}\mathbf{B}) + \bar{\mathbf{S}}^2\big]$
  - if $\lambda_2 > 0$: $\ell_1$-penalized MLE (⇝ graphical lasso with $\hat{\boldsymbol\Sigma}$ as input)
- PLN regression coefficients $\mathbf{B}$:
  - if $\lambda_1 = 0$: $\hat{\mathbf{B}} = [\mathbf{X}^\top \mathbf{X}]^{-1} \mathbf{X}^\top \mathbf{M}$
  - if $\lambda_1 > 0$: vectorize and solve an $\ell_1$-penalized least-squares problem
Initialize with univariate zero-inflated Poisson regression models
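The two unpenalized closed forms ($\lambda_1 = \lambda_2 = 0$) can be checked on tiny toy matrices. This is a sketch under simplifying assumptions: $d = 1$ covariate so the $[\mathbf{X}^\top\mathbf{X}]^{-1}$ inverse is a scalar, and the $\bar{\mathbf{S}}^2$ term is read as the column sums of the variational variances placed on the diagonal.

```python
# OLS-type update for B and moment update for Sigma on toy data.
X = [[1.0], [2.0], [3.0], [4.0]]        # n x d design (n=4, d=1)
Btrue = [[0.5, -0.2]]                   # d x p (p=2)
M = [[x[0] * Btrue[0][j] for j in range(2)] for x in X]  # variational means, M = X Btrue
S2 = [[0.1, 0.1] for _ in range(4)]     # variational variances S_ij^2

n, p = len(M), len(M[0])
xtx = sum(x[0] * x[0] for x in X)                        # X^T X (scalar here)
Bhat = [[sum(X[i][0] * M[i][j] for i in range(n)) / xtx for j in range(p)]]

Res = [[M[i][j] - X[i][0] * Bhat[0][j] for j in range(p)] for i in range(n)]
Sigma_hat = [[(sum(Res[i][a] * Res[i][b] for i in range(n))
               + (sum(S2[i][a] for i in range(n)) if a == b else 0.0)) / n
              for b in range(p)] for a in range(p)]
print(Bhat, Sigma_hat)
```

Since $\mathbf{M}$ is built exactly as $\mathbf{X}\mathbf{B}_{\text{true}}$, the estimate recovers $\mathbf{B}_{\text{true}}$ and $\hat{\boldsymbol\Sigma}$ reduces to the averaged variational variances on the diagonal.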
Enhancing variational approximation (1)
Two paths of improvement, breaking fewer dependencies between the latent variables:

$$p(Z_i, W_i \mid Y_i) \approx q(Z_i, W_i) \triangleq \begin{cases} \prod_j q(W_{ij} \mid Z_{ij})\, q(Z_{ij}) \\ \prod_j q(Z_{ij} \mid W_{ij})\, q(W_{ij}) \end{cases}$$
The W|Z,Y path
One can show that
$$W_{ij} \mid Y_{ij}, Z_{ij} \sim \mathcal{B}\left(\frac{\pi_{ij}}{\pi_{ij} + (1 - \pi_{ij}) \exp(-Z_{ij})}\right) 1_{\{Y_{ij} = 0\}}$$
Sadly, the resulting ELBO involves the intractable entropy term $\tilde{\mathbb{E}}\,[\log q_\psi(W \mid Z)]$
⇝ requires computing $\tilde{\mathbb{E}}\left[\dfrac{-\log(1 + \exp(-U))}{1 + \exp(-U)}\right]$ for arbitrary univariate Gaussians $U$
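Although the entropy term is intractable, the conditional itself is easy to evaluate pointwise. The sketch below follows the slide's formula verbatim (including its $\exp(-Z_{ij})$ term); the helper name is ours:

```python
# P(W_ij = 1 | Y_ij, Z_ij) as written on the slide.
import math

def prob_w_given_yz(pi, z, y):
    if y > 0:
        return 0.0                       # a positive count rules out the zero component
    return pi / (pi + (1 - pi) * math.exp(-z))

print(prob_w_given_yz(0.3, 0.0, 0))      # probability of a structural zero at z = 0
print(prob_w_given_yz(0.3, 5.0, 2))      # any Y > 0 forces W = 0
```

The probability is increasing in $z$: the larger the latent intensity, the less plausible a sampling zero, so an observed zero is more likely structural.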
Enhancing variational approximation (2)
The Z|W,Y path
Since $W_{ij}$ only takes two values, the dependence between $Z_{ij}$ and $W_{ij}$ can be made explicit:

$$Z_{ij} \mid W_{ij}, Y_{ij} = \big(Z_{ij} \mid Y_{ij}, W_{ij} = 1\big)^{W_{ij}} \big(Z_{ij} \mid Y_{ij}, W_{ij} = 0\big)^{1 - W_{ij}}.$$

Then $p(Z_{ij} \mid Y_{ij}, W_{ij} = 1) = p(Z_{ij} \mid W_{ij} = 1) = p(Z_{ij})$ by independence of $Z_{ij}$ and $W_{ij}$ (when $W_{ij} = 1$, $Y_{ij} = 0$ almost surely, so $Y_{ij}$ carries no information about $Z_{ij}$).
⇝ Only an approximation of Zij|Yij,Wij=0 is needed.
More accurate variational approximation
$$q_{\psi_i}(Z_i, W_i) = q_{\psi_i}(Z_i \mid W_i)\, q_{\psi_i}(W_i) = \bigotimes_{j=1}^p \mathcal{N}(\mathbf{x}_i^\top \mathbf{B}_j, \Sigma_{jj})^{W_{ij}}\, \mathcal{N}(M_{ij}, S_{ij}^2)^{1 - W_{ij}}, \quad W_{ij} \overset{\text{indep}}{\sim} \mathcal{B}(\rho_{ij}).$$
Drawback
We lose the closed forms for $\hat{\mathbf{B}}$ and $\hat{\boldsymbol\Sigma}$ in the M step of the VEM algorithm for the corresponding ELBO…
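Sampling from the enhanced family above is straightforward: draw $W_{ij} \sim \mathcal{B}(\rho_{ij})$ first, then $Z_{ij}$ from the prior Gaussian $\mathcal{N}(\mathbf{x}_i^\top \mathbf{B}_j, \Sigma_{jj})$ if $W_{ij} = 1$, or from the free Gaussian $\mathcal{N}(M_{ij}, S_{ij}^2)$ if $W_{ij} = 0$. A one-coordinate sketch with toy values:

```python
# Draw (Z, W) pairs from the conditional variational family for one (i, j).
import math
import random

random.seed(1)
rho, prior_mu, prior_var, m, s2 = 0.4, 0.0, 1.0, 2.0, 0.25

def sample_zw():
    w = 1 if random.random() < rho else 0
    if w == 1:
        z = random.gauss(prior_mu, math.sqrt(prior_var))   # prior component
    else:
        z = random.gauss(m, math.sqrt(s2))                 # free variational component
    return z, w

draws = [sample_zw() for _ in range(50_000)]
z_mean = sum(z for z, _ in draws) / len(draws)
# Mixture mean: rho * prior_mu + (1 - rho) * m = 0.4 * 0 + 0.6 * 2 = 1.2
print(z_mean)
```

Marginally, $Z_{ij}$ is a two-component Gaussian mixture, which is exactly what the mean-field family of the previous slides cannot represent.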
Additional refinement
Optimization using the analytic law of $W_{ij} \mid Y_{ij}$
Proposition 2 (Distribution of $W_{ij} \mid Y_{ij}$)

$$W_{ij} \mid Y_{ij} \sim \mathcal{B}\left(\frac{\pi_{ij}}{\varphi(\mathbf{x}_i^\top \mathbf{B}_j, \Sigma_{jj})\,(1 - \pi_{ij}) + \pi_{ij}}\right) 1_{\{Y_{ij} = 0\}}$$

with $\varphi(\mu, \sigma^2) = \mathbb{E}[\exp(-X)]$, $X \sim \mathcal{LN}(\mu, \sigma^2)$.
Approximation of φ
The function $\varphi$ is intractable, but an approximation (Rojas-Nandayapa 2008) can be computed:

$$\varphi(\mu, \sigma^2) \approx \tilde\varphi(\mu, \sigma^2) = \frac{\exp\left(-\dfrac{L^2(\sigma^2 e^\mu) + 2\,L(\sigma^2 e^\mu)}{2 \sigma^2}\right)}{\sqrt{1 + L(\sigma^2 e^\mu)}},$$

where $L(\cdot)$ is the Lambert $W$ function (i.e. $z = x \exp(x) \Leftrightarrow x = L(z)$, $x, z \in \mathbb{R}$).
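A sketch of $\tilde\varphi$ in code, using a small Halley-iteration solver for the Lambert $W$ function (in practice `scipy.special.lambertw` does the same job), compared against a Monte Carlo estimate of $\varphi(\mu, \sigma^2) = \mathbb{E}[\exp(-e^{\mu + \sigma N})]$, $N \sim \mathcal{N}(0,1)$. The toy values of $\mu$ and $\sigma^2$ are ours:

```python
# Rojas-Nandayapa approximation of phi(mu, sigma^2) = E[exp(-X)], X log-normal.
import math
import random

def lambert_w(z, tol=1e-12):
    """Solve x * exp(x) = z (principal branch; here z >= 0) by Halley iteration."""
    x = math.log1p(z)                    # decent starting point for z >= 0
    for _ in range(100):
        e = math.exp(x)
        f = x * e - z
        step = f / (e * (x + 1) - f * (x + 2) / (2 * x + 2))
        x -= step
        if abs(step) < tol:
            break
    return x

def phi_tilde(mu, sig2):
    w = lambert_w(sig2 * math.exp(mu))
    return math.exp(-(w * w + 2 * w) / (2 * sig2)) / math.sqrt(1 + w)

random.seed(0)
mu, sig2 = 0.5, 0.3
mc = sum(math.exp(-math.exp(mu + math.sqrt(sig2) * random.gauss(0, 1)))
         for _ in range(100_000)) / 100_000
print(phi_tilde(mu, sig2), mc)           # the two estimates should agree closely
```

For moderate $\sigma^2$ as here, the closed-form approximation and the Monte Carlo reference agree to within a couple of percent, which is what makes the analytic law of $W_{ij} \mid Y_{ij}$ usable inside the optimization.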