
Variance ordering for SMC algorithms
Adelaide University*
Joint work with Anthony Lee
at University of Bristol

\eta_0(\mathrm{d}x_0) =\color{green}{M_0(\mathrm{d}x_0)}
~
\eta_1(\mathrm{d}x_1) \propto \int\eta_0(\mathrm{d}x_0)\color{blue}{G_0(x_0)}\color{green}{M_1(x_0, \mathrm{d}x_1)}
~
\eta_2(\mathrm{d}x_2) \propto \int\eta_1(\mathrm{d}x_1)\color{blue}{G_1(x_1)}\color{green}{M_2(x_1, \mathrm{d}x_2)}
Construction: Distribution/kernel, Potential

\eta_0^N(\mathrm{d}x_0)
~
\eta_1^N(\mathrm{d}x_1)
~
\eta_2^N(\mathrm{d}x_2)
~
Construction: Sample, Weight, resample
Sample initial \zeta_0^i \overset{\text{iid}}{\sim} \color{green}{M_0} then for each time p \in [1:n]
Sample ancestors A_{p-1}^i w.p.
\qquad\mathbb{P}(A_{p-1}^i=j) \propto \color{blue}{G_{p-1}}(\zeta_{p-1}^j) with j \in [1:N]
Sample predictive
\qquad\zeta^{i}_{p} \sim \color{green}{M_{p}}(\zeta^{A^{i}_{p-1}}_{p-1},\cdot)
Output: Terminal particles \zeta_n^{1:N}
Sample initial \zeta_0^i \overset{\text{iid}}{\sim} \color{green}{\mathcal{N}(\mu,\Sigma)} then for each time p \in [1:n]
Output: Terminal particles \zeta_n^{1:N}
\color{green}{M_0} = \mathcal{N}(\mu,\Sigma)
\color{green}{M_p}(x_{p-1}, \cdot) = \mathcal{N}(B x_{p-1}+c,\Sigma)
\color{blue}{G_p}(x_p) = L(y_p \mid x_p)
\color{green}{M_0(\mathrm{d}x_0)}\color{blue}{G_0(x_0)} \color{black}{\prod_{p=1}^n} \color{green}{M_p(x_{p-1},\mathrm{d} x_p)}\color{blue}{G_p(x_p)}
\Longleftrightarrow
\mathcal{M} = (\color{green}{M_{0:n}},\color{blue}{G_{0:n}})
\mathcal{M} is associated with marginal measures:
\gamma_{p+1}(\cdot) = \int \gamma_p(\mathrm{d}x_p)G_p(x_p)M_{p+1}(x_p, \cdot) with \gamma_{0} = M_0.
\gamma_{p}(\varphi) = \mathbb{E}\left[\varphi(X_p) \prod_{t=0}^{p-1}G_t(X_t) \right]
\mathcal{M} is associated with marginal probability measures:
\eta_{p}(\mathrm{d}x_p) = \frac{\gamma_{p}(\mathrm{d}x_p)}{\gamma_{p}(1)}
\mathcal{M} is associated with marginal updated (probability) measures:
\hat\gamma_{p}(\mathrm{d}x_p) = \gamma_{p}(\mathrm{d}x_p) G_p(x_p) \qquad \hat\eta_{p}(\mathrm{d}x_p) = \frac{\hat\gamma_{p}(\mathrm{d}x_p)}{\hat\gamma_{p}(1)}
Particle filter yields \zeta^{1:N}_n (and optionally \zeta^{1:N}_p)
\eta_{n}^{N} = \frac{1}{N}\sum_{i=1}^{N}\delta_{\zeta^{i}_n}
\gamma_{n}^{N} = \left\{\prod_{t=0}^{n-1} \eta_t^{N}(G_t)\right\}\eta_{n}^{N}
As N \rightarrow \infty for suitable functions \varphi: \mathsf{X}_p \rightarrow \mathbb{R}
\gamma_p^N(\varphi) \rightarrow \gamma_p(\varphi)
~
\eta_p^N(\varphi) \rightarrow \eta_p(\varphi)
Measure
N \,\mathrm{Var}\left\{\gamma_n^N(\varphi)/\gamma_n(1)\right\} \rightarrow \sigma^2(\varphi)
Probability measure
N \,\mathbb{E}\left[\left\{\eta_n^N(\varphi)-\eta_n(\varphi)\right\}^2 \right]\rightarrow \sigma^2(\varphi - \eta_n(\varphi))
\sigma^2(\varphi) = \sum_{p=0}^n v_{p,n}(\varphi)
v_{p,n}(\varphi) = \frac{\gamma_p(1)\gamma_p(Q_{p,n}(\varphi)^2)}{\gamma_n(1)^2} - \eta_n(\varphi)^2
Q_{p,n}(\varphi)(x_p) = \mathbb{E}\left[ \varphi(X_n) \prod_{t=p}^n G_t(X_t) \mid X_p=x_p \right].
Key idea: Rank PFs with asymptotic variance maps \sigma^2(\cdot)
\mathcal{M} \Longleftrightarrow M_0(\mathrm{d}x_0)G_0(x_0)\prod_{p=1}^n M_p(x_{p-1},\mathrm{d}x_p)G_p(x_p)
\mathcal{M} \Longleftrightarrow \cdots M_p(x_{p-1},\mathrm{d}x_p)G_p(x_p) \cdots
\mathcal{M} \Longleftrightarrow \cdots {\color{orange}R_p}(x_{p-1},\mathrm{d}y_p){\color{purple}K_p}(y_p,\mathrm{d}x_p)G_p(x_p) \cdots
\mathcal{M}^\ast \Longleftrightarrow \cdots {\color{orange}R_p}(x_{p-1},\mathrm{d}y_p){\color{purple}K_p}(G_p)(y_p){\color{purple}K_p}^{G_p}(y_p,\mathrm{d}x_p) \cdots
\mathcal{M}^\ast \Longleftrightarrow \cdots M_p^\ast(y_{p-1},\mathrm{d}y_p)G_p^\ast(y_p) \cdots
Feynman–Kac model: \mathcal{M}= (M_{0:n},G_{0:n})
Knot with M_t=RK: \mathcal{K}= (t,R,K)
Knot-model: \mathcal{M}^\ast = \mathcal{K}\ast \mathcal{M}
\mathcal{M}^\ast = \mathcal{K}\ast \mathcal{M}= (M_{0:n}^\ast,G_{0:n}^\ast), \qquad \mathcal{K}= (t,R,K)
\mathcal{M}^\ast = \mathcal{K}\ast \mathcal{M}, \quad \mathcal{K}= (t,R,K), \quad \mathcal{M}= (M_{0:n},G_{0:n})
\sigma^2(\cdot), \quad \sigma^2_\ast(\cdot)
If t < n, \sigma^2_\ast(\varphi) \leq \sigma^2(\varphi)
for any relevant test function \varphi.
A trivial t-knot for model \mathcal{M}= (M_{0:n},G_{0:n}) is
\mathcal{K}=(t, M_{t},\mathrm{Id})
\mathcal{K}\ast \mathcal{M}= \mathcal{M}
An adapted t-knot for \mathcal{M}= (M_{0:n},G_{0:n}) is
\mathcal{K}=(t,\mathrm{Id},M_t)
M_0(x_0) = \begin{cases} \frac{1}{2}, & x_0 = 0\\ \frac{1}{2}, & x_0 = 1\\ \end{cases}
M_1(x_0, x_1) = \begin{cases} 1-\delta, & x_1 = x_0\\ \delta, & x_1 = 1-x_0\\ \end{cases}
G_t(x_t) = \begin{cases} 1-\epsilon, & x_t = y_t\\ \epsilon, & x_t = 1-y_t \end{cases} \quad t \in \{0,1\} with fixed observations y_t.
Take y_0 = 0 and y_1 = 1 and
consider a test function \varphi(x_1) = x_1
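For this two-state example everything is computable by enumeration, so the asymptotic variance formula above can be evaluated exactly. A sketch follows, with illustrative values delta = 0.1 and eps = 0.2 (not taken from the slides); it centres the test function and sums the v_{p,n} terms to get \sigma^2(\varphi - \eta_1(\varphi)).

```python
import numpy as np

delta, eps = 0.1, 0.2                      # illustrative values
M0 = np.array([0.5, 0.5])
M1 = np.array([[1 - delta, delta],
               [delta, 1 - delta]])
y = [0, 1]
G = [np.where(np.arange(2) == y[t], 1 - eps, eps) for t in range(2)]
phi = np.array([0.0, 1.0])                 # phi(x1) = x1

# unnormalised marginals: gamma_0 = M_0, gamma_1 = int gamma_0 G_0 M_1
gamma0 = M0
gamma1 = (gamma0 * G[0]) @ M1
eta1_phi = (gamma1 * phi).sum() / gamma1.sum()

psi = phi - eta1_phi                       # centred test function, eta_1(psi) = 0
# Q_{1,1}(psi)(x1) = psi(x1) G_1(x1);  Q_{0,1}(psi)(x0) = G_0(x0) E[G_1 psi | x0]
Q11 = psi * G[1]
Q01 = G[0] * (M1 @ (G[1] * psi))

# v_{p,n} from the slides (the -eta_n(psi)^2 term vanishes after centring)
g1 = gamma1.sum()
v0 = gamma0.sum() * (gamma0 * Q01 ** 2).sum() / g1 ** 2
v1 = g1 * (gamma1 * Q11 ** 2).sum() / g1 ** 2
sigma2 = v0 + v1                           # asymptotic variance of eta_1^N(phi)
```

Repeating the calculation for a knotted model gives the exact variance comparison the example is designed to illustrate.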
X_0\sim \mathcal{T}(\mu,\Sigma,\nu)
(X_p \mid X_{p-1}) \sim \mathcal{T}(f_p(X_{p-1}),\Sigma,\nu)~\text{for}~p\in[1:n]
Knots are a tool for algorithm design
Generalise Rao-Blackwellisation
“Fix” the counterexample in [J&D, 2008]
Knots create an ordering of algorithms
Not discussed
Definition 1: Feynman–Kac knot
A knot \mathcal{K} is specified by a time t \in \mathbb{N}_{0} and an ordered pair of composable Markov kernels R and K. We say \mathcal{K}=(t,R,K).
Note that R is a probability distribution when t=0.
Definition 2: Knot compatibility
Let \mathcal{M}=(M_{0:n},G_{0:n}) be a Feynman–Kac model.
A knot \mathcal{K}=(t,R,K) is compatible with \mathcal{M} if t < n and M_t = RK.
Definition 3: Knot operator
For knot \mathcal{K} = (t,R,K) and model \mathcal{M} = (M_{0:n},G_{0:n}),
the operation \mathcal{K} \ast \mathcal{M} = (M_{0:n}^\ast,G_{0:n}^\ast) where M_{t}^\ast = R, \quad G_t^\ast = K(G_t), \quad M_{t+1}^\ast = K^{G_t} M_{t+1}, with
K^{G_t}(y_t, \mathrm{d} x_t) = \frac{K(y_t, \mathrm{d} x_t)G_t(x_t)}{K(G_t)(y_t)}
M_{p}^\ast = M_{p} for p \notin \{t,t+1\}
G_{p}^\ast = G_{p} for p \neq t
If \mathcal{M}^\ast = \mathcal{K}\ast \mathcal{M} where \mathcal{K}= (t,R,K) then
\gamma_n^\ast(\varphi) = \gamma_n(\varphi)
\gamma_t^\ast = \hat\gamma_{t-1} R
Q_{t,n}^\ast = K Q_{t,n}
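Definition 3 and the invariance \gamma_n^\ast(\varphi) = \gamma_n(\varphi) are easy to check numerically on a finite state space. A minimal sketch with random stochastic matrices (all values illustrative): factor M_1 = RK, apply the knot \mathcal{K} = (1, R, K), and compare terminal unnormalised integrals.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3

def stoch(shape):                           # random row-stochastic matrix
    A = rng.random(shape)
    return A / A.sum(axis=1, keepdims=True)

M0 = rng.random(d); M0 /= M0.sum()          # initial distribution
R, K = stoch((d, d)), stoch((d, d))
M1 = R @ K                                  # knot compatibility: M_1 = RK
M2 = stoch((d, d))
G = [rng.random(d) for _ in range(3)]       # potentials G_0, G_1, G_2
phi = rng.random(d)

def gamma_n_phi(m0, Ms, Gs, phi):
    # gamma_{p+1} = int gamma_p(dx_p) G_p(x_p) M_{p+1}(x_p, .)
    g = m0
    for M_next, G_now in zip(Ms, Gs):
        g = (g * G_now) @ M_next
    return (g * phi).sum()                  # gamma_n(phi), no terminal potential

base = gamma_n_phi(M0, [M1, M2], G[:2], phi)

# knot (1, R, K): M_1* = R, G_1* = K(G_1), M_2* = K^{G_1} M_2
KG1 = K @ G[1]
K_twist = (K * G[1][None, :]) / KG1[:, None]     # K^{G_1}(y, dx)
star = gamma_n_phi(M0, [R, K_twist @ M2], [G[0], KG1], phi)
```

Here `star` and `base` agree to machine precision, in line with \gamma_n^\ast(\varphi) = \gamma_n(\varphi).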
Theorem 1: Variance reduction from a knot
Consider models \mathcal{M} and {\mathcal{M}^\ast = \mathcal{K} \ast \mathcal{M}} for some knot \mathcal{K}. Let \mathcal{M} and \mathcal{M}^\ast have terminal measures \gamma_n and \gamma_n^\ast and asymptotic variance maps \sigma^2 and \sigma_\ast^2 respectively. If \varphi \in \mathcal{F}(\mathcal{M}) then the measures are equivalent \gamma_n(\varphi) = \gamma^\ast_n(\varphi) and the variances satisfy {\sigma_\ast^2(\varphi) \leq \sigma^2(\varphi)}.
Definition 4: A partial ordering
Consider two Feynman–Kac models, \mathcal{M},\mathcal{M}^\ast\in\mathscr{M}_n.
We say that {\mathcal{M}^\ast \preccurlyeq \mathcal{M}} if there exist m \in \mathbb{N}_{1} and knots \mathcal{K}_1, \mathcal{K}_2,\ldots, \mathcal{K}_m such that \mathcal{M}^\ast = \mathcal{K}_m\ast\cdots\ast \mathcal{K}_1\ast\mathcal{M}.
Theorem 2: Variance ordering
If {\mathcal{M}^\ast \preccurlyeq \mathcal{M}} then \sigma_\ast^2(\varphi) \leq \sigma^2(\varphi) for all \varphi where \sigma^2(\varphi) is finite.
Definition 5: Knotset
A knotset \mathcal{K}=(R_{0:n-1},K_{0:n-1}) is specified by n ordered pairs of composable kernels R_p and K_p. Note that for p=0, R_0 is a probability distribution.
Definition 6: Knotset compatibility
A knotset \mathcal{K}=(R_{0:n-1},K_{0:n-1}) is compatible with a Feynman–Kac
model \mathcal{M}=(M_{0:n},G_{0:n}) if M_p = R_p K_p for all p\in[0\,{:}\,n-1].
Definition 7: Knotset operator
For knotset \mathcal{K}= (R_{0:n-1},K_{0:n-1}) and model \mathcal{M}= (M_{0:n},G_{0:n}),
the operation \mathcal{K}\ast \mathcal{M}= (M_{0:n}^\ast,G_{0:n}^\ast) where \begin{alignat*}{3} M_0^\ast &= R_0, &\quad G_0^\ast &= K_0(G_0), & & \\ M_{p}^\ast &= K_{p-1}^\prime R_p, &\quad G_p^\ast &= K_p(G_p), & &\quad p \in [1\,{:}\,n-1] \\ M_{n}^\ast &= K_{n-1}^\prime M_{n}, &\quad G_n^\ast &= G_n & \end{alignat*} where K_p^\prime(y_p, \mathrm{d}x_p) = \frac{K_p(y_p, \mathrm{d}x_p)G_p(x_p)}{K_p(G_p)(y_p)} for p \in [0\,{:}\,n-1].
