# `qlat_utils.data` — Data Analysis Utilities Source: `qlat-utils/qlat_utils/data.py` > **Note:** Update this document when updating the source file. ## Outline 1. [Overview](#overview) 2. [Physical Constants](#physical-constants) 3. [Type Tuples](#type-tuples) 4. [Interpolation](#interpolation) 5. [Data Wrapper Class](#data-wrapper-class) 6. [Basic Statistics](#basic-statistics) 7. [Jackknife Resampling](#jackknife-resampling) 8. [Super-Jackknife](#super-jackknife) 9. [Randomized Jackknife-Bootstrap Hybrid](#randomized-jackknife-bootstrap-hybrid) 10. [Unified Jackknife API](#unified-jackknife-api) 11. [Value Display](#value-display) 12. [Context Managers](#context-managers) 13. [Examples](#examples) --- ## Overview The `qlat_utils.data` module provides data analysis tools for lattice QCD computations. It includes: - **Interpolation** — linear interpolation for arrays with fractional indices. - **Jackknife resampling** — standard, super-jackknife, and randomized jackknife-bootstrap (RJK) methods for error estimation. - **The `Data` class** — a wrapper that supports arithmetic on nested numeric structures (scalars, lists, NumPy arrays). - **Value display** — formatting of `(value, error)` pairs for publication. - **Context managers** — temporary override of global jackknife and display settings. ```python import qlat_utils as q import numpy as np avg, err = q.avg_err([1.0, 1.1, 0.9, 1.05]) print(q.show_val_err((avg, err))) ``` --- ## Physical Constants | Name | Value | Description | |---|---|---| | `alpha_qed` | `1 / 137.035999084` | Fine-structure constant | | `fminv_gev` | `0.197326979` | Conversion factor: hbar*c / (1 fm * 1 GeV) | ```python import qlat_utils as q print(q.alpha_qed) # 0.0072973525693... print(q.fminv_gev) # 0.197326979 ``` --- ## Type Tuples Module-level tuples used for type-checking throughout the library. Extended types (`float128`, `complex256`) are included when the platform supports them. | Name | Contents | |---|---| | `float_types` | `float`, `np.float32`, `np.float64` (plus `np.float128` if available) | | `complex_types` | `complex`, `np.complex64`, `np.complex128` (plus `np.complex256` if available) | | `int_types` | `int`, `np.int32`, `np.int64` | | `real_types` | `float_types + int_types` | | `number_types` | `real_types + complex_types` | --- ## Interpolation ### `interp_i_arr(data_x_arr, x_arr)` Return index array `i_arr` such that `q.interp(data_x_arr, i_arr)` is approximately `x_arr`. Useful for mapping x-coordinates to fractional indices. | Parameter | Type | Description | |---|---|---| | `data_x_arr` | array-like | Known x-values (must be monotonic) | | `x_arr` | float or array-like | Target x-values | ### `interp(data_arr, i_arr, axis=-1)` Return approximately `data_arr[..., i_arr]` using linear interpolation. The index `i_arr` may be non-integer (fractional indices are interpolated between adjacent elements). | Parameter | Type | Description | |---|---|---| | `data_arr` | array-like | Source data | | `i_arr` | float or 1-D array | Fractional index or indices | | `axis` | int | Axis along which to interpolate (default `-1`) | ### `interp_x(data_arr, data_x_arr, x_arr, axis=-1)` Interpolate `data_arr` at arbitrary x-values. Combines `interp_i_arr` and `interp`. | Parameter | Type | Description | |---|---|---| | `data_arr` | array-like | Source data | | `data_x_arr` | array-like | x-values for `data_arr`; shape must be `(data_arr.shape[axis],)` | | `x_arr` | float or 1-D array | Target x-values | | `axis` | int | Axis along which to interpolate (default `-1`) | ### `get_threshold_idx(arr, threshold)` Return the fractional index `x` such that `interp(arr, [x])` is approximately `threshold`. Uses binary search on a 1-D array. ### `get_threshold_i_arr(data_arr, threshold_arr, axis=-1)` Broadcast version of `get_threshold_idx` over an array. Returns an index array where each entry satisfies the threshold condition along the given axis. ### `get_threshold_x_arr(data_arr, data_x_arr, threshold_arr, axis=-1)` Like `get_threshold_i_arr`, but returns x-values instead of indices. --- ## Data Wrapper Class ### `class Data` A wrapper around numeric values that supports arithmetic operations on nested structures (scalars, lists, NumPy arrays, `LatData`). **Supported value types:** numeric scalars, `numpy.ndarray`, `q.LatData`, and `list` (element-wise operations). ```python import qlat_utils as q d1 = q.Data([1.0, 2.0, 3.0]) d2 = q.Data([0.5, 0.5, 0.5]) d3 = d1 + d2 # Data([1.5, 2.5, 3.5]) d4 = d1 * 2.0 # Data([2.0, 4.0, 6.0]) d5 = -d1 # Data([-1.0, -2.0, -3.0]) ``` | Method | Description | |---|---| | `get_val()` | Return the wrapped value | | `qnorm()` | Return the squared norm | | `glb_sum()` | MPI global sum (requires `qlat`) | | `__add__`, `__radd__` | Addition | | `__sub__`, `__rsub__` | Subtraction | | `__mul__`, `__rmul__` | Scalar or element-wise multiplication | | `__neg__`, `__pos__` | Unary negation and identity | | `__copy__`, `__deepcopy__` | Copy support | --- ## Basic Statistics ### `check_zero(x)` Return `True` if `x` is a real type and equals zero. ### `qnorm(x)` Return the squared norm of `x`. For scalars: `x*x`. For complex: `re^2 + im^2`. For arrays: `abs(vdot(x, x))`. For lists/tuples: sum of `qnorm` of each element. ```python q.qnorm(2) # 4 q.qnorm(1 + 2j) # 5 (1*1 + 2*2) ``` ### `average(data_list)` Return the arithmetic mean of `data_list`. ### `average_ignore_nan(value_arr_list)` Return element-wise average across a list of NumPy arrays, ignoring `NaN` values. Returns `NaN` for elements where all inputs are `NaN`. ### `block_data(data_list, block_size, is_overlapping=True)` Return a list of block averages. If `is_overlapping` is `True` (default), blocks overlap by `block_size - 1` entries. ### `avg_err(data_list, *, eps=1, block_size=1)` Compute `(avg, err)` of `data_list` using blocking. The error estimate is: $$\text{err} = |\text{eps}| \sqrt{\frac{\text{block\_size}}{N - \text{block\_size}}} \cdot \text{fsqrt}\big(\text{avg}\big[(d_i - \text{avg})^2\big]\big)$$ | Parameter | Type | Default | Description | |---|---|---|---| | `data_list` | list | — | Data values | | `eps` | float | `1` | Additional scaling factor for error | | `block_size` | int | `1` | Blocking size | Returns `(avg, err)` where both have the same type as the data. ### `partial_sum(x, *, is_half_last=False)` Modify `x` in-place to its cumulative (partial) sum, preserving length. If `is_half_last` is `True`, each entry becomes the average of the current and previous partial sums (trapezoidal rule). Works for 1-D and 2-D arrays. ### `fsqr(data)` / `fsqrt(data)` Component-wise square and square root. For complex types, real and imaginary parts are processed separately: `fsqr(a + bi) = a^2 + b^2 i`, `fsqrt(a + bi) = sqrt(a) + sqrt(b) i`. Supports scalars, `Data`, and NumPy arrays. ### `err_sum(*vs)` Return the quadrature sum of errors: `sqrt(sum(fsqr(v_i)))`. ```python q.err_sum(1.4, 2.1, 1.0) # 2.7147743920996454 ``` --- ## Jackknife Resampling ### `jackknife(data_list, *, eps=1)` Perform standard jackknife. Returns `jk_arr` of length `N + 1` where: - `jk_arr[0]` = average - `jk_arr[i]` = `avg - (eps / N) * (data[i] - avg)` for `i >= 1` ### `jk_avg(jk_arr)` Return the average (first element) of a jackknife array. ### `jk_err(jk_arr, *, eps=1, block_size=1)` Return the jackknife error estimate: $$\frac{1}{\text{eps}} \sqrt{ \frac{N}{N - \text{block\_size}} \sum_{i=1}^{N} (jk[i] - \text{jk\_avg})^2 }$$ The `eps` and `block_size` must match those used in the corresponding `jackknife` call. Note: `len(jk_arr) = N + 1`. ### `jk_avg_err(jk_arr, *, eps=1, block_size=1)` Return `(jk_avg, jk_err)`. --- ## Super-Jackknife ### `sjackknife(data_list, jk_idx_list, *, avg=None, ...)` Perform super-jackknife resampling. Data from different ensembles (identified by `jk_idx_list`) are combined into a single jackknife array. | Parameter | Type | Default | Description | |---|---|---|---| | `data_list` | list/ndarray | — | Original data | | `jk_idx_list` | list | — | Index for each data point (e.g., `(job_tag, traj)`) | | `avg` | any | `None` | Pre-computed average (auto-computed if `None`) | | `is_hash_jk_idx` | bool | `True` | Use hash when jk_idx not in `all_jk_idx` | | `jk_idx_hash_size` | int | `1024` | Hash table size | | `rng_state` | `RngState` | `None` | RNG state (default: `RngState("rejk")`) | | `all_jk_idx` | list | `None` | All possible indices; `all_jk_idx[0]` must be `"avg"` | | `get_all_jk_idx` | callable | `None` | Function returning `all_jk_idx` | | `jk_blocking_func` | callable | `None` | `(i, jk_idx) -> blocked_jk_idx` | | `eps` | float | `1` | Scaling factor | ### `sjk_avg(jk_arr)` / `sjk_err(jk_arr, *, eps=1)` / `sjk_avg_err(jk_arr, *, eps=1)` Average, error, and `(avg, err)` for super-jackknife arrays. The error formula differs from standard jackknife: no `N/(N-1)` factor. ### `sjk_mk_jk_val(rs_tag, val, err, *, ...)` Create a synthetic jackknife array from a central value and error using Gaussian random numbers. --- ## Randomized Jackknife-Bootstrap Hybrid ### `rjackknife(data_list, jk_idx_list, *, avg=None, ...)` Jackknife-bootstrap hybrid resampling. Returns `jk_arr` of length `1 + n_rand_sample`. The distribution of `jk_arr` approximates the distribution of the average. | Parameter | Type | Default | Description | |---|---|---|---| | `data_list` | list/ndarray | — | Original data | | `jk_idx_list` | list | — | Index for each data point | | `avg` | any | `None` | Pre-computed average | | `rng_state` | `RngState` | `None` | RNG state | | `n_rand_sample` | int | `1024` | Number of random samples | | `jk_blocking_func` | callable | `None` | `(i, jk_idx) -> blocked_jk_idx` | | `is_normalizing_rand_sample` | bool | `False` | Normalize random vectors | | `is_apply_rand_sample_jk_idx_blocking_shift` | bool | `True` | Shift blocking per sample | | `eps` | float | `1` | Scaling factor | The formula is: $$jk\_arr[i] = \text{avg} + \sum_{j=1}^{N} \frac{-\text{eps}}{\sqrt{N(N - b(i,j))}} r_{i,j} (d_j - \text{avg})$$ where $r_{i,j} \sim \mathcal{N}(0, 1)$ and $b(i,j)$ is the block size. ### `rjk_avg(jk_arr)` / `rjk_err(jk_arr, eps=1)` / `rjk_avg_err(rjk_list, eps=1)` Average, error, and `(avg, err)` for randomized jackknife arrays. ### `rjk_mk_jk_val(rs_tag, val, err, *, ...)` Create a synthetic RJK array from a central value and error. --- ## Unified Jackknife API The `g_*` functions provide a unified interface that dispatches to either super-jackknife or RJK based on global settings in `default_g_jk_kwargs`. ### `default_g_jk_kwargs` Global dictionary controlling jackknife behavior. Key settings: | Key | Default | Description | |---|---|---| | `jk_type` | `"rjk"` | `"rjk"` or `"super"` | | `eps` | `1` | Scaling factor | | `n_rand_sample` | `1024` | Number of random samples (RJK only) | | `is_normalizing_rand_sample` | `False` | Normalize random vectors (RJK only) | | `is_hash_jk_idx` | `True` | Hash unknown jk indices (super only) | | `jk_idx_hash_size` | `1024` | Hash table size (super only) | | `block_size` | `1` | Default blocking size | | `block_size_dict` | `{}` | Per-`job_tag` blocking sizes | | `rng_state` | `RngState("rejk")` | RNG state | ### `g_mk_jk(data_list, jk_idx_list, *, avg=None, ...)` Create a (randomized) super-jackknife dataset from un-jackknifed data. Dispatches to `sjackknife` or `rjackknife` based on `jk_type`. ### `g_mk_jk_val(rs_tag, val, err, *, ...)` Create a synthetic jackknife array from a value and error. Dispatches to `sjk_mk_jk_val` or `rjk_mk_jk_val`. ### `g_jk_avg(jk_arr)` / `g_jk_err(jk_arr)` / `g_jk_avg_err(jk_arr)` Unified average, error, and `(avg, err)` extraction. ### `g_jk_avg_err_arr(jk_arr)` Return an array with shape `jk_arr[0].shape + (2,)` where the last axis is `(avg, err)`. ### `g_jk_size(*, jk_type, ...)` Return the number of samples in the jackknife array (`1 + n_samples`). ### `g_jk_blocking_func(i, jk_idx)` Apply the configured blocking function. ### `g_jk_sample_size(job_tag, traj_list)` Return the number of distinct blocks for a given `job_tag` and trajectory list. ### `get_jk_state()` / `set_jk_state(state)` Save and restore the current `default_g_jk_kwargs` state (for use with `@cache_call`). --- ## Value Display ### `show_val(val, *, is_latex=True, num_float_digit=None, num_exp_digit=None, exponent=None)` Format a single numeric value for display. | Parameter | Type | Default | Description | |---|---|---|---| | `val` | int/float | — | Value to format | | `is_latex` | bool/None | `True` | Use LaTeX exponent notation | | `num_float_digit` | int/bool/None | `None` | Number of decimal digits (auto if `None`) | | `num_exp_digit` | int/bool/None | `None` | Significant digits in scientific notation | | `exponent` | int/None | `None` | Force a specific exponent | ```python q.show_val(0.00123) # "1.23 \\times 10^{-3}" q.show_val(0.00123, is_latex=False) # "1.23E-3" q.show_val(1234.0) # "1234.0" ``` ### `show_val_err(val_err, *, is_latex=True, num_float_digit=None, num_exp_digit=None, exponent=None)` Format a `(value, error)` pair. Error is shown in parentheses. If `val_err` is a single number, it is formatted as a plain value. ```python q.show_val_err((1.12e16, 12e6)) # auto scientific notation q.show_val_err((1.12e16, 12e7), exponent=10) # force exponent q.show_val_err((1.12e16, 12e7), exponent=10, is_latex=False) # "1.12000(120)E10" ``` --- ## Context Managers ### `class NewDictValues(dictionary, **kwargs)` Context manager that temporarily overrides keys in `dictionary` and restores them on exit. ### `class JkKwargs(**kwargs)` Context manager that temporarily overrides `default_g_jk_kwargs`. ```python with q.JkKwargs(n_rand_sample=2048, block_size=10): jk_arr = q.g_mk_jk(data_list, jk_idx_list) ``` ### `class ShowKwargs(**kwargs)` Context manager that temporarily overrides `default_show_val_kwargs`. ```python with q.ShowKwargs(is_latex=False, exponent=-10): print(q.show_val_err((1.23e-10, 0.05e-10))) ``` --- ## Examples ### Interpolation ```python import qlat_utils as q import numpy as np # Interpolate data at fractional indices data = np.array([10.0, 20.0, 30.0, 40.0]) result = q.interp(data, 1.5) # 25.0 (midpoint between 20 and 30) result_arr = q.interp(data, [0.5, 1.5, 2.5]) # [15.0, 25.0, 35.0] # Interpolate with explicit x-coordinates x_data = np.array([0.0, 1.0, 2.0, 3.0]) y_data = np.array([0.0, 1.0, 4.0, 9.0]) x_new = np.array([0.5, 1.5, 2.5]) y_new = q.interp_x(y_data, x_data, x_new) # interpolated y-values ``` ### Basic Error Estimation ```python import qlat_utils as q import numpy as np # Generate correlated data data = [1.0 + 0.1 * np.random.randn() for _ in range(100)] # Simple average and error avg, err = q.avg_err(data) print(f"avg = {avg:.4f}, err = {err:.4f}") # With blocking to reduce autocorrelation avg_b, err_b = q.avg_err(data, block_size=5) print(f"avg = {avg_b:.4f}, err = {err_b:.4f}") ``` ### Jackknife Resampling ```python import qlat_utils as q import numpy as np data = [1.0, 1.1, 0.9, 1.05, 0.95, 1.02, 0.98, 1.03] # Standard jackknife jk_arr = q.jackknife(data) avg = q.jk_avg(jk_arr) err = q.jk_err(jk_arr) print(f"Jackknife: avg = {avg:.4f}, err = {err:.4f}") # Unified API (uses RJK by default) jk_arr = q.g_mk_jk(data, list(range(len(data)))) avg, err = q.g_jk_avg_err(jk_arr) print(f"RJK: avg = {avg:.4f}, err = {err:.4f}") ``` ### Formatting Values ```python import qlat_utils as q # Format a single value print(q.show_val(0.00123)) # "1.23 \times 10^{-3}" print(q.show_val(0.00123, is_latex=False)) # "1.23E-3" # Format value with error print(q.show_val_err((1.12e16, 12e6))) # auto-notation print(q.show_val_err((1.12e16, 12e7), exponent=10)) # "1.1200(12) \times 10^{10}" ``` ### Context Managers ```python import qlat_utils as q # Temporarily change jackknife settings with q.JkKwargs(n_rand_sample=2048, block_size=10): jk_arr = q.g_mk_jk(data_list, jk_idx_list) avg, err = q.g_jk_avg_err(jk_arr) # Temporarily change display settings with q.ShowKwargs(is_latex=False): print(q.show_val_err((3.14, 0.01))) ``` ### Data Wrapper ```python import qlat_utils as q d1 = q.Data([1.0, 2.0, 3.0]) d2 = q.Data([0.1, 0.2, 0.3]) d3 = d1 + d2 # Data([1.1, 2.2, 3.3]) d4 = d1 * 2.0 # Data([2.0, 4.0, 6.0]) d5 = d1 - d2 # Data([0.9, 1.8, 2.7]) norm = d1.qnorm() # 14.0 (1 + 4 + 9) ```