University of Melbourne, 23rd March 2026
Adelaide University
Statistics
Privacy
Bon, Bailie, Rousseau & Robert (2026) arXiv 2601.22945
Under review at ICML 2026
\text{Dataset}~x \overset{\text{release}}{\longrightarrow} \text{statistic}~T
Example
M(x,\cdot) = \mathcal{N}(\overline{x},\sigma^2)
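A minimal sketch of such a release (names and noise scale are illustrative, not from the paper):

import numpy as np

def gaussian_mean_mechanism(x, sigma, rng):
    # Draw T ~ N(mean(x), sigma^2): release the sample mean plus Gaussian noise.
    return rng.normal(loc=np.mean(x), scale=sigma)

rng = np.random.default_rng(0)
T = gaussian_mean_mechanism(np.array([0.2, 0.7, 0.4, 0.9]), sigma=0.5, rng=rng)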
To limit information leaked about data x by statistic T,
control the sensitivity of the mechanism M(x,\cdot) w.r.t. x.
Borrow from Lipschitz continuity (Bailie and Gong 2024).
d_\text{P}[M(x,\cdot),M(x^\prime,\cdot)] \leq \epsilon~d_\text{X}[x,x^\prime]
for all x,x^\prime \in \mathcal{D}.
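For instance (a standard reading of this framing, stated here as an illustration): taking d_\text{X} to be the Hamming distance and d_\text{P} the multiplicative distance between measures,

d_\text{P}[\mu,\nu] = \sup_{S\in\mathcal{S}}\left\vert\ln\frac{\mu(S)}{\nu(S)}\right\vert, \qquad d_\text{X}[x,x^\prime] = \#\{i : x_i \neq x_i^\prime\},

the Lipschitz condition with constant \epsilon, applied to neighbouring datasets (d_\text{X}=1), is exactly the \epsilon-DP bound defined next.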
A mechanism M is \epsilon-DP if
\sup_{S\in\mathcal{S}}\left\vert\ln\frac{M(x,S)}{M(x^\prime,S)}\right\vert \leq \epsilon
for all x,x^\prime \in \mathcal{D} differing in at most one element.
Pure Differential Privacy
M(x,S) \leq \exp\{\epsilon\} M(x^\prime, S) for all x \sim x^\prime and S \in \mathcal{S}.
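A minimal sketch of the classic Laplace mechanism for the mean, assuming records lie in [0,1] so that changing one element moves the mean by at most 1/n:

import numpy as np

def laplace_mean(x, epsilon, rng):
    # Records assumed in [0,1]: the mean has sensitivity 1/n, so Laplace
    # noise with scale 1/(n * epsilon) yields an epsilon-DP release.
    n = len(x)
    return np.mean(x) + rng.laplace(scale=1.0 / (n * epsilon))

rng = np.random.default_rng(1)
T = laplace_mean(np.array([0.2, 0.7, 0.4, 0.9]), epsilon=1.0, rng=rng)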
Focusing on (pure) \epsilon-DP.
Composition: M_1 \otimes M_2 ~\text{is}~ 2\epsilon\text{-DP}
Post-processing: M_1 K ~\text{is}~ \epsilon\text{-DP}
Focusing on \epsilon-DP. Examples:
If the mean and the variance of a dataset are each released by an \epsilon-DP mechanism, then the joint release is 2\epsilon-DP.
If M constructs an \epsilon-DP histogram, then any quantile computed from the histogram is also \epsilon-DP.
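A toy sketch of both properties (the sensitivity bounds are stated as assumptions, for data in [0,1]):

import numpy as np

rng = np.random.default_rng(2)
x = np.array([0.2, 0.7, 0.4, 0.9])  # assumed to lie in [0, 1]
n, eps = len(x), 0.5

# Composition: two eps-DP releases jointly satisfy 2*eps-DP.
T_mean = np.mean(x) + rng.laplace(scale=1.0 / (n * eps))  # mean sensitivity 1/n
T_var = np.var(x) + rng.laplace(scale=1.0 / (n * eps))    # variance sensitivity <= 1/n (assumed bound)

# Post-processing: a quantile computed from an eps-DP histogram is still eps-DP.
counts, edges = np.histogram(x, bins=5, range=(0, 1))
noisy = counts + rng.laplace(scale=2.0 / eps, size=counts.size)  # L1-sensitivity 2 when one record changes
cdf = np.cumsum(np.clip(noisy, 0, None))
approx_median = edges[np.searchsorted(cdf, cdf[-1] / 2)]  # costs no extra privacy budget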
There are many variants and relaxations of DP.
Probabilistic DP (PrDP): \inf_{x\sim x^\prime}\mathbb{P}_{x}\left[ m(x,T) \leq \exp\{\epsilon\} m(x^\prime, T) \right] \geq 1 - \delta
Approximate (\epsilon,\delta)-DP: m(x,S) \leq \exp\{\epsilon\} m(x^\prime, S) + \delta for all S \in \mathcal{S}.
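A one-step check (standard; m(x,\cdot) denotes both the density and its induced measure, as above) that PrDP implies the (\epsilon,\delta) bound: with A = \{t : m(x,t) \leq \exp\{\epsilon\} m(x^\prime,t)\},

m(x,S) \leq m(x,S\cap A) + m(x,A^c) \leq \exp\{\epsilon\}\,m(x^\prime,S\cap A) + \delta \leq \exp\{\epsilon\}\,m(x^\prime,S) + \delta.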
All related to Lipschitz interpretation
Our approach is a semantics-first understanding of data privacy:
Sender
A government agency releases small-area statistics on a disease
Receiver
An insurance company raises premiums using inferred disease prevalence in a small town
Effect on data privacy
Statistic release \longrightarrow adversarial decisions \longrightarrow privacy effect
Sender’s privacy measured with a privacy function
\rho:\mathcal{D} \times \mathsf{X} \rightarrow \mathbb{R}
Receiver’s decision \quad d \in \mathcal{D}
Data value \quad x \in \mathsf{X}
\rho orders preferences of decisions for a given dataset
For some fixed \kappa>0,
\rho(d,x) = \begin{cases} 0, & \text{if } x \in d ~\text{and}~ \vert d \vert < \kappa\\ 1, & \text{otherwise}, \end{cases}
x \in \mathsf{X} = \mathbb{R}
d \in \mathcal{D} = \{[a,b]:a,b\in \mathbb{R}, a\leq b\}
\rho(d,x) = -\log d(x)
x \in \mathsf{X} = \mathbb{R}
d \in \mathcal{D} = \{\text{probability density functions on } \mathbb{R}\}
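A minimal sketch of the two example privacy functions above (illustrative names; kappa as in the first example):

import numpy as np

def rho_interval(d, x, kappa):
    # First example: rho = 0 when the interval d = (a, b) contains x and is
    # narrower than kappa (Receiver has pinned x down); rho = 1 otherwise.
    a, b = d
    return 0.0 if (a <= x <= b and (b - a) < kappa) else 1.0

def rho_logscore(d, x):
    # Second example: d is a probability density; -log d(x) is small when
    # the Receiver's density concentrates on the true value x.
    return -np.log(d(x))

def gauss_pdf(z, mu=0.0, sd=1.0):  # helper density for the usage example
    return np.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

rho_interval((0.1, 0.3), x=0.2, kappa=0.5)  # -> 0.0: x located within width kappa
rho_logscore(gauss_pdf, 0.2)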
A statistic T \in \mathsf{T} is output from a mechanism M given x
T \sim M(x,\cdot)
M can be deterministic or randomised
Privacy class: A set of mechanisms that satisfy a given privacy definition.
If \mathfrak{C}(\mathrm{D}) is the privacy class generated by a definition \mathrm{D}, then
M \in \mathfrak{C}(\mathrm{D}) \Longleftrightarrow M~\text{satisfies}~\mathrm{D}
Assumption 1: Transparency
Sender shares the mechanism M and privacy class \mathfrak{C}, for which M \in \mathfrak{C}, with Receiver. Further, the definitions of M and \mathfrak{C} do not depend on the data.
Assumption 2: Bayesian adversary
Receiver makes Bayesian decisions
\mathfrak{C} may affect Receiver’s data posterior. Here assumed not to.
From Assumption 1:
Receiver’s optimal decision
d^{Q_T} \in \arg\inf_{d \in \mathcal{D}} \mathbb{E}_{z \sim Q_T}[ \ell(d,z)]
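A sketch of approximating this argmin by Monte Carlo, assuming we can draw from the posterior Q_T and search a finite menu of decisions (all names illustrative; rho_interval is from the earlier sketch):

import numpy as np

def bayes_decision(candidates, posterior_draws, loss):
    # Approximate d in argmin_d E_{z ~ Q_T}[loss(d, z)] by replacing the
    # expectation with an average over draws z_1, ..., z_m from Q_T.
    risks = [np.mean([loss(d, z) for z in posterior_draws]) for d in candidates]
    return candidates[int(np.argmin(risks))]

rng = np.random.default_rng(3)
draws = rng.normal(0.1, 0.3, size=500)  # stand-in posterior draws
candidates = [(c - 0.2, c + 0.2) for c in np.linspace(-1, 1, 21)]
d_star = bayes_decision(candidates, draws, lambda d, z: rho_interval(d, z, kappa=0.5))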
To model Receiver’s decision:
We need assumptions on Q and \ell
Assumption 3: Adversarial loss function
Receiver has loss function \ell(d,x) = \rho(d,x)
If d^P_{\ell^\prime} is Receiver’s optimal decision under (P,\ell^\prime), then
\mathbb{E}_{x \sim P}[ \rho(d^{P}_\rho,x)] \leq \mathbb{E}_{x \sim P}[ \rho(d^P_{\ell^\prime},x)],
i.e. the adversarial loss \ell = \rho is the worst case for Sender’s privacy.
Assumption 4: Adversarial prior class
Let the data-prior Q \in \mathcal{Q}_x
These assumptions imply a privacy outcome, in terms of:
Receiver’s decision
d^{Q_T} \in \arg\inf_{d \in \mathcal{D}} \mathbb{E}_{z \sim Q_T}[ \rho(d,z)]
Privacy outcome
\inf_{Q \in \mathcal{Q}_x} \rho(d^{Q_T},x)
Privacy outcome, writing the privacy score S(Q_T,x) := \rho(d^{Q_T},x):
\inf_{Q \in \mathcal{Q}_x} S(Q_T,x)
If \rho(d,x) = -\log d(x), then S(Q_T,x) = -\log q_T(x), where q_T is the density of Q_T.
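To see this (a standard Bayes-act computation): under the log score the Receiver's optimal density is the posterior density q_T itself, since for any density d,

\mathbb{E}_{z \sim Q_T}[-\log d(z)] = \mathbb{E}_{z \sim Q_T}[-\log q_T(z)] + \mathrm{KL}(Q_T \Vert d) \geq \mathbb{E}_{z \sim Q_T}[-\log q_T(z)],

so d^{Q_T} = q_T and \rho(d^{Q_T},x) = -\log q_T(x) = S(Q_T,x).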
Without loss of generality, we focus on proper scoring rules.
Relative privacy outcome
\inf_{Q \in \mathcal{Q}_x} \left[ S(Q_T,x) - S(Q,x) \right]
But…
Let \mathcal{Q}_x \subset \mathscr{P}(\mathsf{X},\mathcal{X}) be a prior class,
M: \mathsf{X} \times \mathcal{T} \rightarrow \mathbb{R}_+ be a mechanism,
S be a privacy score, and fix constants \kappa \geq 0 and 0\leq\delta \ll 1.
Definition: Persuasive Privacy
We say M is (\mathcal{Q}_x, S, \kappa, \delta)-PP if
\inf_{x\in\mathsf{X}}\inf_{Q\in\mathcal{Q}_x}\mathbb{P}_x\left[S(Q, x) - S(Q_{T}, x) \leq \kappa \right] \geq 1 - \delta,
where \mathbb{P}_x is w.r.t. T \sim M(x,\cdot).
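A toy numerical check of this inequality for one fixed x and a single Gaussian prior Q (illustrative only: the definition takes infima over x and Q, and the conjugate Gaussian mechanism here is an assumption of the sketch):

import numpy as np

def logscore(mu, var, x):
    # S(Q, x) = -log q(x) for Q = N(mu, var).
    return 0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

def pp_check(x, mu0, tau2, sigma2, kappa, m=5000, seed=4):
    # Mechanism T ~ N(x, sigma2); prior Q = N(mu0, tau2) on the data value.
    # Conjugacy gives Q_T in closed form, so we can estimate
    # P_x[ S(Q, x) - S(Q_T, x) <= kappa ] by simulating T.
    rng = np.random.default_rng(seed)
    T = rng.normal(x, np.sqrt(sigma2), size=m)
    post_var = 1.0 / (1.0 / tau2 + 1.0 / sigma2)
    post_mean = post_var * (mu0 / tau2 + T / sigma2)
    gain = logscore(mu0, tau2, x) - logscore(post_mean, post_var, x)
    return np.mean(gain <= kappa)  # compare against 1 - delta

pp_check(x=0.3, mu0=0.0, tau2=1.0, sigma2=0.5, kappa=1.0)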
Sender chooses a mechanism to persuade Receiver to make decisions having limited impact on privacy.
Similar to Bayesian persuasion (Kamenica and Gentzkow 2011). However, we also need:
Composition
Receiver post-processing
Definition: Receiver Post-Processing
A guarantee \mathrm{D} satisfies the receiver post-processing property if M \in \mathfrak{C}(\mathrm{D}) implies that M\otimes K \in \mathfrak{C}(\mathrm{D}) for all Markov kernels K independent of the data x.
Recall the Persuasive Privacy elements (\mathcal{Q}_x, S, \kappa, \delta), and consider
\mathcal{H}_x = \{ Q \in \mathcal P_2:x^\prime \sim x, Q(\{ x , x^\prime \})= 1 \}
S(P,x) = -\log p(x)
The alternative-hypothesis prior class and log-probability score recover PrDP as an instance of Persuasive Privacy:
\inf_{x\sim x^\prime}\mathbb{P}_{x}\left[ m(x,T) \leq \exp\{\epsilon\} m(x^\prime, T) \right] \geq 1 - \delta
Example: the average mechanism M(x,\cdot) = \delta_{\bar x}(\cdot)
Use class of Gaussian distributions
\mathcal{G}_{x}^{r} = \left\{ \mathcal{N}(\mu,\Sigma): \frac{(\bar{x}-\bar{\mu})^2}{\overline{\Sigma}} \leq r_1 , c_\Phi \leq r_2 \left(1- \frac{\sigma_i^2}{\Vert\sigma \Vert_2^2} \right) \right\}
for r_1 > 0 and r_2 > 1.
Use (marginal) Dawid–Sebastiani Score
D_i(Q,x) = \log \sigma^2_i(Q) + \frac{[x_i - \mu_i(Q)]^2}{\sigma^{2}_i(Q)}
for each element of data, and consider the worst-case element for privacy.
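A sketch of the marginal score and the worst-case element, assuming Q is summarised by marginal means mu and variances sig2 (the names, and the direction of "worst case", are assumptions of this sketch):

import numpy as np

def dss_marginal(mu, sig2, x):
    # D_i(Q, x) = log sig2_i + (x_i - mu_i)^2 / sig2_i for each element i.
    return np.log(sig2) + (x - mu) ** 2 / sig2

def dss_worst_case(mu, sig2, x):
    # Worst case for privacy taken as the smallest marginal score, i.e. the
    # element the prior predicts most sharply (an assumption of this sketch).
    return np.min(dss_marginal(mu, sig2, x))

dss_worst_case(mu=np.zeros(2), sig2=np.array([1.0, 0.2]), x=np.array([0.1, 0.1]))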
Proposition
The average mechanism M(x,\cdot) = \delta_{\bar{x}} satisfies (\mathcal{G}_{x}^{r},\mathcal{I},r_1+ \log r_2,0)-PP, where \mathcal{I} denotes the worst-case marginal Dawid–Sebastiani score above.
Short term
Decomposition rule (for deterministic mechanisms)
Nonparametric prior families
Independent Commissioner Against Corruption (SA)
Medium term
Privacy-utility trade-off
Computation (SMC: sequential Monte Carlo; SBI: simulation-based inference) versus privacy approximation trade-off