# A Jackknife-bootstrap hybrid resampling method

> Luchang Jin
> 2025/12/29

$$
\def\ba#1\ea{\begin{align}#1\end{align}}
\def\nn{\nonumber}
\def\ra{\rangle}
\def\la{\langle}
\def\bra{\big\rangle}
\def\bla{\big\langle}
\def\Bra{\Big\rangle}
\def\Bla{\Big\langle}
\def\ud{\mathrm{d}}
\nn
$$

The method is proposed in the following paper by Chien-Fu Jeff Wu (吳建福).

https://projecteuclid.org/journals/annals-of-statistics/volume-14/issue-4/Jackknife-Bootstrap-and-Other-Resampling-Methods-in-Regression-Analysis/10.1214/aos/1176350142.full

```
@article{10.1214/aos/1176350142,
  author = {C. F. J. Wu},
  title = {{Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis}},
  volume = {14},
  journal = {The Annals of Statistics},
  number = {4},
  publisher = {Institute of Mathematical Statistics},
  pages = {1261 -- 1295},
  keywords = {$M$-regression, balanced residuals, bias reduction, bias-robustness, bootstrap, Fieller's interval, generalized linear models, jackknife percentile, Linear regression, Nonlinear regression, representation of the least squares estimator, variable jackknife, Weighted jackknife},
  year = {1986},
  doi = {10.1214/aos/1176350142},
  URL = {https://doi.org/10.1214/aos/1176350142}
}
```

Below, we concisely describe our implementation of the method in the context of lattice QCD calculations.

Let $C_j$ be the initial data, where $j$ is the index of the configuration. For example, $C_j$ can be a correlation function measured on configuration $j$. For a particular $j$, $C_j$ can be a single number or a set of numbers. The average of the data is
$$
\ba
C_\text{avg} = \frac{1}{N} \sum_{j} C_j ,
\\
\ea
$$
where the summation ranges over all available configurations and $N$ is the total number of available configurations for this observable $C$.

We intend to define the Jackknife-bootstrap hybrid (J-B hybrid) samples so that they fluctuate around $C_\text{avg}$ in a way similar to how $C_\text{avg}$ fluctuates around the true expectation value of $C$. The total number of J-B hybrid samples is $N_\text{rs}$. Similar to the standard bootstrap procedure, this number is adjustable. The definition of the J-B hybrid samples is
$$
\ba
\overline{C}_{i} = C_\text{avg} + \sum_{j} \frac{r_{i,j}}{\sqrt{N(N-1)}} (C_j - C_\text{avg}) ,
\\
\ea
$$
where $i$ is the resampling sample index that ranges from $1$ to $N_\text{rs}$. The random weights $r_{i,j}$ follow the standard normal distribution with
$$
\ba
\mathrm{E}(r_{i,j}) =& 0 ,
\\
\mathrm{E}(r_{i,j}^2) =& 1 .
\\
\ea
$$
The random numbers $r_{i,j}$ with different $i$ or $j$ indices are statistically independent. Note that the $j$ index should uniquely label the configuration, including both the ID of the ensemble and the trajectory number of the configuration within the ensemble.

After the J-B hybrid samples are obtained, we can estimate the central value and the statistical error of an observable $O$:
$$
\ba
O_\text{avg} =& O(C_\text{avg}) ,
\\
O_\text{err} =& \sqrt{\frac{1}{N_\text{rs}}\sum_{i=1}^{N_\text{rs}} (O(\overline{C}_i) - O_\text{avg})^2} .
\\
\ea
$$
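As an illustration, a minimal NumPy sketch of the procedure above is shown below. The function names (`jb_hybrid_samples`, `jb_estimate`) and the layout of the data as an array of shape `(N, ...)` with one row per configuration are choices made for this example only, not part of any particular library.

```
import numpy as np

def jb_hybrid_samples(data, n_rs, rng=None):
    # data: array of shape (N, ...), one entry C_j per configuration j.
    # n_rs: number of J-B hybrid samples N_rs (adjustable, as in bootstrap).
    # Returns (C_avg, samples) with samples of shape (n_rs, ...).
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=np.float64)
    n = data.shape[0]
    c_avg = data.mean(axis=0)
    # r_{i,j}: independent standard normal weights, E(r) = 0, E(r^2) = 1.
    r = rng.standard_normal(size=(n_rs, n))
    # \bar{C}_i = C_avg + sum_j r_{i,j} / sqrt(N (N - 1)) (C_j - C_avg)
    fluct = (data - c_avg) / np.sqrt(n * (n - 1))
    samples = c_avg + np.tensordot(r, fluct, axes=(1, 0))
    return c_avg, samples

def jb_estimate(observable, c_avg, samples):
    # O_avg = O(C_avg); O_err from the spread of O over the J-B hybrid samples.
    o_avg = observable(c_avg)
    o_rs = np.array([observable(s) for s in samples])
    o_err = np.sqrt(np.mean((o_rs - o_avg) ** 2, axis=0))
    return o_avg, o_err

# Toy usage: an effective-mass-like observable on a synthetic correlator C_j(t).
rng = np.random.default_rng(42)
corr = np.exp(-0.5 * np.arange(16)) * (1.0 + 0.01 * rng.standard_normal((100, 16)))
c_avg, samples = jb_hybrid_samples(corr, n_rs=1024, rng=rng)
o_avg, o_err = jb_estimate(lambda c: np.log(c[:-1] / c[1:]), c_avg, samples)
print(o_avg[0], "+/-", o_err[0])
```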
## Blocking

To deal with possible correlation between the data from different configurations, we need to introduce blocking. In the J-B hybrid resampling method, we implement the blocking procedure as follows. We introduce a blocking function acting on the J-B hybrid sample index $i$ and the configuration index $j$:
$$
\ba
b(i,j) .
\ea
$$
The function should return a unique label for the block that configuration $j$ belongs to. Note that the blocking scheme can be different for different J-B hybrid sample indices $i$. Typically, we should keep the block size the same, but we may choose different block boundaries for different J-B hybrid samples. The number of configurations within the block $b(i,j)$ is denoted as
$$
\ba
N_{b(i,j)} .
\ea
$$
With blocking, the definition of the average is the same as before,
$$
\ba
C_\text{avg} = \frac{1}{N} \sum_{j} C_j .
\\
\ea
$$
The definition of the J-B hybrid samples is slightly altered as
$$
\ba
\overline{C}_{i} = C_\text{avg} + \sum_{j} \frac{r_{i,b(i,j)}}{\sqrt{N(N-N_{b(i,j)})}} (C_j - C_\text{avg}) .
\\
\ea
$$
Note that the random weights $r_{i,b(i,j)}$ depend on the label of the block, $b(i,j)$, instead of the index of the configuration, $j$. The remaining procedures are the same as before.
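A possible NumPy sketch of the blocked procedure is shown below; it reuses `jb_estimate` from the earlier sketch. The particular choice of $b(i,j)$ used here (fixed-size blocks of consecutive configurations, with the block boundaries shifted by a random offset for each hybrid sample index $i$) is only one example of a valid blocking function, and it assumes a single ensemble with configurations ordered by trajectory.

```
import numpy as np

def jb_hybrid_samples_blocked(data, n_rs, block_size, rng=None):
    # data: array of shape (N, ...), one entry C_j per configuration j
    #       (assumed here to come from a single ensemble, ordered by trajectory).
    # block_size: number of consecutive configurations per block (must be < N).
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=np.float64)
    n = data.shape[0]
    c_avg = data.mean(axis=0)
    fluct = data - c_avg
    samples = []
    for i in range(n_rs):
        # b(i, j): fixed block size, boundaries shifted by a random offset per sample i.
        offset = rng.integers(block_size)
        block_ids = (np.arange(n) + offset) // block_size
        # N_{b(i,j)}: size of the block containing configuration j.
        # Block labels are 0 .. n_blocks-1, so counts can be indexed directly.
        _, counts = np.unique(block_ids, return_counts=True)
        n_b = counts[block_ids]
        # r_{i, b(i,j)}: one standard normal weight per block, shared within the block.
        r = rng.standard_normal(size=len(counts))[block_ids]
        # \bar{C}_i = C_avg + sum_j r_{i,b(i,j)} / sqrt(N (N - N_{b(i,j)})) (C_j - C_avg)
        w = r / np.sqrt(n * (n - n_b))
        samples.append(c_avg + np.tensordot(w, fluct, axes=(0, 0)))
    return c_avg, np.array(samples)
```

With this choice of boundaries, the first and last blocks may contain fewer than `block_size` configurations; the weight formula uses their actual sizes $N_{b(i,j)}$, so no special treatment is needed. Setting `block_size = 1` reduces the sketch to the unblocked definition above.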