
Variance ordering for SMC algorithms
Adelaide University*
Joint work with Anthony Lee
at University of Bristol

$$




















\eta_0 = {\color{green}{M_0}}
~
\eta_1(\cdot) \propto \int\eta_0(\mathrm{d}x_0){\color{blue}{G_0}}(x_0){\color{green}{M_1}}(x_0, \cdot)
~
\eta_2(\cdot) \propto \int\eta_1(\mathrm{d}x_1){\color{blue}{G_1}}(x_1){\color{green}{M_2}}(x_1, \cdot)
Constructed from Markov kernel and potential

\eta_0^N = \frac{1}{N} \sum_{i=1}^N \delta_{{\color{red}{\zeta_0^i}}}
~
\eta_1^N = \frac{1}{N} \sum_{i=1}^N \delta_{{\color{red}{\zeta_1^i}}}
~
\eta_2^N = \frac{1}{N} \sum_{i=1}^N \delta_{{\color{red}{\zeta_2^i}}}
~
Approximation: Sample, weight, and resample particles
Sample initial \zeta_0^i \overset{\text{iid}}{\sim} \color{green}{M_0} then for each time p \in [1:n]
Sample ancestors A_{p-1}^i w.p.
\qquad\mathbb{P}(A_{p-1}^i=j) \propto {\color{blue}{G_{p-1}}}(\zeta_{p-1}^j) with j \in [1:N]
Sample predictive
\qquad\zeta^{i}_{p} \sim {\color{green}{M_{p}}}(\zeta^{A^{i}_{p-1}}_{p-1},\cdot)
Output: Terminal particles \zeta_n^{1:N}
Optional: Terminal weights w.r.t. {\color{blue}{G_{n}}}
Sample initial \zeta_0^i \overset{\text{iid}}{\sim} \color{green}{\mathcal{N}(\mu,\Sigma)} then for each time p \in [1:n]
1(a). Calculate normalised weights w_i \propto {\color{blue}{L(y_{p-1}}} \mid \zeta_{p-1}^i)
1(a). Resample particles according to (w_1,\ldots,w_i,\ldots,w_N)
Output: Terminal particles \zeta_n^{1:N}
Optional: Terminal weights w.r.t. {\color{blue}{L(y_{n}}} \mid \cdot)
{\color{green}{M_0}} = {\color{green}{\mathcal{N}(\mu,\Sigma)}}
{\color{green}{M_p}}(x_{p-1}, \cdot) = {\color{green}{\mathcal{N}(B}} x_{p-1}{\color{green}{+c,\Sigma)}}
{\color{blue}{G_p}}(x_p) = {\color{blue}{L( y_p}} \mid x_p)
\mathcal{M} = (M_{0:n},G_{0:n})
\Longleftrightarrow
M_0~~G_0~~ \cdots {\color{green}{~M_p~~}}{\color{blue}{G_p~}} \cdots ~M_n~~G_n
~
\cdots {\color{green}{M_p}}(x_{p-1},\mathrm{d} x_p){\color{blue}{G_p}}(x_p) \cdots
\mathcal{M} has marginal measures:
\gamma_{p+1}(\cdot) = \int \gamma_p(\mathrm{d}x_p){\color{blue}{G_p}}(x_p){\color{green}{M_{p+1}}}(x_p, \cdot) with \gamma_{0} = M_0.
\gamma_{p}(\varphi) = \mathbb{E}\left[\varphi(X_p) \prod_{t=0}^{p-1}{\color{blue}{G_t}}(X_t) \right] where X_{0:p} \sim M_0 \otimes \cdots \otimes M_p
\mathcal{M} has marginal probability measures:
\eta_{p}(\mathrm{d}x_p) = \frac{\gamma_{p}(\mathrm{d}x_p)}{\gamma_{p}(1)}
\mathcal{M} has marginal updated (probability) measures:
\hat\gamma_{p}(\mathrm{d}x_p) = \gamma_{p}(\mathrm{d}x_p) G_p(x_p) \qquad \hat\eta_{p}(\mathrm{d}x_p) = \frac{\hat\gamma_{p}(\mathrm{d}x_p)}{\hat\gamma_{p}(1)}
Particle filter yields \zeta^{1:N}_n
\eta_{n}^{N} = \frac{1}{N}\sum_{i=1}^{N}\delta_{\zeta^{i}_n}
\gamma_{n}^{N} = \left\{\prod_{t=0}^{n-1} \eta_t^{N}(G_t)\right\}\eta_{n}^{N}
Optional: non-terminal particles \zeta^{1:N}_p
As N \rightarrow \infty for suitable functions \varphi: \mathsf{X}_p \rightarrow \mathbb{R}
\gamma_p^N(\varphi) \rightarrow \gamma_p(\varphi)\\~
\eta_p^N(\varphi) \rightarrow \eta_p(\varphi)
Measure
N \,\mathrm{Var}\left\{\gamma_n^N(\varphi)/\gamma_n(1)\right\} \rightarrow \sigma^2(\varphi)
Probability measure
N \,\mathbb{E}\left[\left\{\eta_n^N(\varphi)-\eta_n(\varphi)\right\}^2 \right]\rightarrow \sigma^2(\varphi - \eta_n(\varphi))
\sigma^2(\varphi) = \sum_{p=0}^n v_{p,n}(\varphi)
v_{p,n}(\varphi) = \frac{\gamma_p(1)\gamma_p(Q_{p,n}(\varphi)^2)}{\gamma_n(1)^2} - \eta_n(\varphi)^2
Q_{p,n}(\varphi)(x_p) = \mathbb{E}\left[ \varphi(X_n) \prod_{t=p}^n G_t(X_t) \mid X_p=x_p \right]
Key idea: Improve PFs by ordering variance maps \sigma^2(\cdot)
M K(x,\cdot) = \int M(x,\mathrm{d}y)K(y,\cdot)\qquad \text{(Composition)}
M(G)(x) = \int M(x,\mathrm{d}y)G(y) \qquad \text{(Integration)}
M^G(x,\mathrm{d}y) = \frac{M(x,\mathrm{d}y)G(y)}{M(G)(x)}\qquad \text{(Twisting)}
\mathcal{M} = (M_{0:n},G_{0:n}) with {\color{green}{M_t}} = {\color{orange}{R}} {\color{purple}{K}}
~
~
\big\Updownarrow
~
~
~
~
| \cdots | {\color{green}{M_t}} | {\color{blue}{G_t}} | M_{t+1} | \cdots |
| \cdots | {\color{orange}{R}}{\color{purple}{K}} | {\color{blue}{G_t}} | M_{t+1} | \cdots |
| \cdots | {\color{orange}{R}} | {\color{purple}{K}}({\color{blue}{G_t}}) | {\color{purple}{K}}^{{\color{blue}{G_t}}}M_{t+1} | \cdots |
| \cdots | {\color{green}{M_t^\ast}} | {\color{blue}{G_t}^\ast} | M_{t+1}^\ast | \cdots |
\mathcal{M}^\ast = (M_{0:n}^\ast,G_{0:n}^\ast)
\mathcal{M}= (M_{0:n},G_{0:n}) and \mathcal{K}= (t,{\color{orange}{R}},{\color{purple}{K}}) with M_t = {\color{orange}{R}}{\color{purple}{K}}
\mathcal{M}^\ast = \mathcal{K}\ast \mathcal{M}= (M_{0:n}^\ast,G_{0:n}^\ast)
M_t(x,\cdot) defined by
R(x,\mathrm{d}y_{1:m}) = Q(x,\mathrm{d}y_1)Q(y_1,\mathrm{d}y_2) \cdots Q(y_{m-1},\mathrm{d}y_{m})
K(y_{1:m},\{z\}) = \begin{cases} \frac{1}{m} & \text{if}~z \in y_{1:m} \\ 0 & \text{otherwise} \end{cases}
\mathcal{M}^\ast is stratified waste-free SMC (Dau and Chopin 2022)
\mathcal{M}= (M_{0:n},G_{0:n}) and \mathcal{K}= (t,{\color{orange}{R}},{\color{purple}{K}}) with M_t = {\color{orange}{R}}{\color{purple}{K}}
\mathcal{M}= (M_{0:n}, G_{0:n})
\sigma^2(\varphi)
~
\geq
\mathcal{M}^\ast = \mathcal{K}\ast \mathcal{M}
\sigma^2_\ast(\varphi)
for any relevant test function \varphi and t \in [0:n-1].
A trivial t-knot for model \mathcal{M}= (M_{0:n},G_{0:n}) is
\mathcal{K}=(t, M_{t},\mathrm{Id})
\mathcal{K}\ast \mathcal{M}= \mathcal{M}
An adapted t-knot for \mathcal{M}= (M_{0:n},G_{0:n}) is
\mathcal{K}=(t,\mathrm{Id},M_t)
M_0(x_0) = \begin{cases} \frac{1}{2}, & x_0 = 0\\ \frac{1}{2}, & x_0 = 1\\ \end{cases}
M_1(x_0, x_1) = \begin{cases} 1-\delta, & x_1 = x_0\\ \delta, & x_1 = 1-x_0\\ \end{cases}
G_t(x_t) = \begin{cases} 1-\epsilon, & x_t = y_t\\ \epsilon, & x_t = 1- y_t,\\ \end{cases}, \quad t \in \{0,1\} with fixed observations y_t.
Take y_0 = 0 and y_1 = 1 and
consider a test function \varphi(x_1) = x_1
X_0\sim \mathcal{T}(\mu,\Sigma,\nu)
(X_p \mid X_{p-1}) \sim \mathcal{T}(f_p(X_{p-1}),\Sigma,\nu)~\text{for}~p\in[1:n]
Knots are a tool for algorithm design
Generalise Rao-Blackwellisation
“Fix” the counter example in [J&D, 2008]
Knots create an ordering of algorithms
Not discussed
Definition 1: Feynman–Kac knot
A knot \mathcal{K} is specified by a time t \in \mathbb{N}_{0}, and an ordered pair of composable Markov kernels R and K. We say \mathcal{K}=(t,R,K).
Note that R is a probability distribution when t=0.
Definition 2: Knot compatibility
Let \mathcal{M}=(M_{0:n},G_{0:n}) be a Feynman–Kac model.
A knot \mathcal{K}=(t,R,K) is compatible with \mathcal{M} if t < n and M_t = RK.
Definition 3: Knot operator
For knot \mathcal{K} = (t,R,K) and model \mathcal{M} = (M_{0:n},G_{0:n}),
the operation \mathcal{K} \ast \mathcal{M} = (M_{0:n}^\ast,G_{0:n}^\ast) where M_{t}^\ast = R, \quad G_t^\ast = K(G_t), \quad M_{t+1}^\ast = K^{G_t} M_{t+1}, with
K^{G_t}(y_t, \mathrm{d} x_t) = \frac{K(y_t, \mathrm{d} x_t)G_t(x_t)}{K(G_t)(y_t)}
M_{p}^\ast = M_{p} for p \notin \{t,t+1\}
G_{p}^\ast = G_{p} for p \neq t
If \mathcal{M}^\ast = \mathcal{K}\ast \mathcal{M} where \mathcal{K}= (t,R,K) then
\gamma_n^\ast(\varphi) = \gamma_n(\varphi)
\gamma_t^\ast = \hat\gamma_{t-1} R
Q_{t,n}^\ast = K Q_{t,n}
Theorem 1: Variance reduction from a knot
Consider models \mathcal{M} and {\mathcal{M}^\ast = \mathcal{K} \ast \mathcal{M}} for some knot \mathcal{K}. Let \mathcal{M} and \mathcal{M}^\ast have terminal measures \gamma_n and \gamma_n^\ast and asymptotic variance maps \sigma^2 and \sigma_\ast^2 respectively. If \varphi \in \mathcal{F}(\mathcal{M}) then the measures are equivalent \gamma_n(\varphi) = \gamma^\ast_n(\varphi) and the variances satisfy {\sigma_\ast^2(\varphi) \leq \sigma^2(\varphi)}.
Definition 4: A partial ordering
Consider two Feynman–Kac models, \mathcal{M},\mathcal{M}^\ast\in\mathscr{M}_n.
We say that {\mathcal{M}^\ast \preccurlyeq \mathcal{M}} if there exists a sequence of knots \mathcal{K}_1, \mathcal{K}_2,\ldots, \mathcal{K}_m such that \mathcal{M}^\ast = \mathcal{K}_m\ast\cdots\ast \mathcal{K}_1\ast\mathcal{M} for some m \in \mathbb{N}_{1}.
Theorem 2: Variance ordering
If {\mathcal{M}^\ast \preccurlyeq \mathcal{M}} then \sigma_\ast^2(\varphi) \leq \sigma^2(\varphi) for all \varphi where \sigma^2(\varphi) is finite.
Definition 5: Knotset
A knotset \mathcal{K}=(R_{0:n-1},K_{0:n-1}) is specified by n ordered pairs of composable kernels R_p and K_p. Note that for p=0, R_0 is a probability distribution.
Definition 6: Knotset compatibility
A knotset \mathcal{K}=(R_{0:n-1},K_{0:n-1}) is compatible with a Feynman–Kac
model \mathcal{M}=(M_{0:n},G_{0:n}) if M_p = R_p K_p for all p\in[0\,{:}\,n-1].
Definition 7: Knotset operator
For knotset \mathcal{K}= (R_{0:n-1},K_{0:n-1}) and model \mathcal{M}= (M_{0:n},G_{0:n}),
the operation \mathcal{K}\ast \mathcal{M}= (M_{0:n}^\ast,G_{0:n}^\ast) where \begin{alignat*}{3} M_0^\ast &= R_0, &\quad G_0^\ast &= K_0(G_0), & & \\ M_{p}^\ast &= K_{p-1}^\prime R_p, &\quad G_p^\ast &= K_p(G_p), & &\quad p \in [1\,{:}\,n-1] \\ M_{n}^\ast &= K_{n-1}^\prime M_{n}, &\quad G_n^\ast &= G_n & \end{alignat*} where K_p^\prime(y_p, \mathrm{d}x_p) = \frac{K_p(y_p, \mathrm{d}x_p)G_p(x_p)}{K_p(G_p)(y_p)} for p \in [0\,{:}\,n-1].
