qlat_utils.data — Data Analysis Utilities

Source: qlat-utils/qlat_utils/data.py

Note: Update this document when updating the source file.

Outline

  1. Overview

  2. Physical Constants

  3. Type Tuples

  4. Interpolation

  5. Data Wrapper Class

  6. Basic Statistics

  7. Jackknife Resampling

  8. Super-Jackknife

  9. Randomized Jackknife-Bootstrap Hybrid

  10. Unified Jackknife API

  11. Value Display

  12. Context Managers

  13. Examples


Overview

The qlat_utils.data module provides data analysis tools for lattice QCD computations. It includes:

  • Interpolation — linear interpolation for arrays with fractional indices.

  • Jackknife resampling — standard, super-jackknife, and randomized jackknife-bootstrap (RJK) methods for error estimation.

  • The Data class — a wrapper that supports arithmetic on nested numeric structures (scalars, lists, NumPy arrays).

  • Value display — formatting of (value, error) pairs for publication.

  • Context managers — temporary override of global jackknife and display settings.

import qlat_utils as q
import numpy as np

avg, err = q.avg_err([1.0, 1.1, 0.9, 1.05])
print(q.show_val_err((avg, err)))

Physical Constants

| Name | Value | Description |
|---|---|---|
| alpha_qed | 1 / 137.035999084 | Fine-structure constant |
| fminv_gev | 0.197326979 | Conversion factor: hbar*c / (1 fm * 1 GeV), i.e. 1 fm^-1 expressed in GeV |

import qlat_utils as q
print(q.alpha_qed)   # 0.0072973525693...
print(q.fminv_gev)   # 0.197326979
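Here is a worked conversion. Multiplying a quantity in fm^-1 by fminv_gev expresses it in GeV, so a lattice spacing a given in fm corresponds to 1/a = fminv_gev / a in GeV. The sketch copies the two constants from the table so it runs without qlat_utils. The lattice spacing and lattice mass are hypothetical inputs.

```python
# Constants copied from the table above (qlat_utils exposes them as q.alpha_qed, q.fminv_gev)
alpha_qed = 1 / 137.035999084
fminv_gev = 0.197326979  # 1 fm^-1 expressed in GeV

a_fm = 0.1                     # hypothetical lattice spacing in fm
a_inv_gev = fminv_gev / a_fm   # inverse lattice spacing 1/a in GeV

# A dimensionless lattice mass a*m converts to GeV by multiplying with 1/a
am = 0.05                      # hypothetical lattice mass
m_gev = am * a_inv_gev

print(f"1/a = {a_inv_gev:.6f} GeV, m = {m_gev:.6f} GeV, 1/alpha = {1 / alpha_qed:.6f}")
```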

Type Tuples

Module-level tuples used for type-checking throughout the library. Extended types (float128, complex256) are included when the platform supports them.

| Name | Contents |
|---|---|
| float_types | float, np.float32, np.float64 (plus np.float128 if available) |
| complex_types | complex, np.complex64, np.complex128 (plus np.complex256 if available) |
| int_types | int, np.int32, np.int64 |
| real_types | float_types + int_types |
| number_types | real_types + complex_types |


Interpolation

interp_i_arr(data_x_arr, x_arr)

Return index array i_arr such that q.interp(data_x_arr, i_arr) is approximately x_arr. Useful for mapping x-coordinates to fractional indices.

| Parameter | Type | Description |
|---|---|---|
| data_x_arr | array-like | Known x-values (must be monotonic) |
| x_arr | float or array-like | Target x-values |
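Here is a minimal pure-Python sketch of the idea behind interp_i_arr for a single target. It finds the pair of x-values that bracket the target and interpolates linearly between their indices. The helper name frac_index is hypothetical. The sketch assumes strictly increasing x-values and clamps out-of-range targets to the end indices, which is the default convention of numpy.interp. The library may handle those cases differently.

```python
from bisect import bisect_right

def frac_index(data_x_arr, x):
    """Hypothetical helper: fractional index at which data_x_arr reaches x.

    Assumes strictly increasing data_x_arr; out-of-range x is clamped.
    """
    n = len(data_x_arr)
    if x <= data_x_arr[0]:
        return 0.0
    if x >= data_x_arr[-1]:
        return float(n - 1)
    # j is the first index with data_x_arr[j] > x, so x lies in [x_{j-1}, x_j)
    j = bisect_right(data_x_arr, x)
    x0, x1 = data_x_arr[j - 1], data_x_arr[j]
    return (j - 1) + (x - x0) / (x1 - x0)

x_data = [0.0, 1.0, 4.0, 9.0]
print([frac_index(x_data, x) for x in (0.5, 2.5, 9.0)])  # [0.5, 1.5, 3.0]
```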

interp(data_arr, i_arr, axis=-1)

Return approximately data_arr[..., i_arr] using linear interpolation. The index i_arr may be non-integer (fractional indices are interpolated between adjacent elements).

| Parameter | Type | Description |
|---|---|---|
| data_arr | array-like | Source data |
| i_arr | float or 1-D array | Fractional index or indices |
| axis | int | Axis along which to interpolate (default -1) |
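Along one axis, interp reduces to a weighted average of the two neighbouring entries. The 1-D sketch below uses a hypothetical helper, interp_1d, and is not the library code. It assumes 0 <= i <= len(data) - 1 and at least two entries.

```python
import math

def interp_1d(data, i):
    """Hypothetical 1-D interp: value of data at fractional index i."""
    # Left neighbour, capped so that i0 + 1 stays in range when i is the last index
    i0 = min(int(math.floor(i)), len(data) - 2)
    t = i - i0  # weight of the right neighbour
    return (1 - t) * data[i0] + t * data[i0 + 1]

data = [10.0, 20.0, 30.0, 40.0]
print([interp_1d(data, i) for i in (0, 1.5, 3)])  # [10.0, 25.0, 40.0]
```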

interp_x(data_arr, data_x_arr, x_arr, axis=-1)

Interpolate data_arr at arbitrary x-values. Combines interp_i_arr and interp.

| Parameter | Type | Description |
|---|---|---|
| data_arr | array-like | Source data |
| data_x_arr | array-like | x-values for data_arr; shape must be (data_arr.shape[axis],) |
| x_arr | float or 1-D array | Target x-values |
| axis | int | Axis along which to interpolate (default -1) |

get_threshold_idx(arr, threshold)

Return the fractional index x such that interp(arr, [x]) is approximately threshold. Uses binary search on a 1-D array.
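The bisection step can be sketched in pure Python. The helper threshold_idx is hypothetical. It assumes that arr[0] and arr[-1] lie on opposite sides of threshold, for example a monotonic array that crosses it once. The sketch does not reproduce how the library handles an unbracketed threshold.

```python
def threshold_idx(arr, threshold):
    """Hypothetical sketch: fractional x where linear interpolation of arr equals threshold.

    Assumes arr[0] and arr[-1] lie on opposite sides of threshold.
    """
    i1, i2 = 0, len(arr) - 1
    below_first = arr[i1] < threshold
    assert below_first != (arr[i2] < threshold), "threshold not bracketed"
    # Bisect until the crossing is bracketed by adjacent entries
    while i2 - i1 > 1:
        mid = (i1 + i2) // 2
        if (arr[mid] < threshold) == below_first:
            i1 = mid
        else:
            i2 = mid
    v1, v2 = arr[i1], arr[i2]
    # Linear interpolation inside the final bracket
    return i1 + (threshold - v1) / (v2 - v1)

print(threshold_idx([0.0, 1.0, 4.0, 9.0], 2.5))  # 1.5
print(threshold_idx([5.0, 3.0, 1.0], 2.0))       # 1.5
```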

get_threshold_i_arr(data_arr, threshold_arr, axis=-1)

Broadcast version of get_threshold_idx. For each 1-D slice of data_arr along axis, it returns the fractional index at which that slice reaches its threshold from threshold_arr.

get_threshold_x_arr(data_arr, data_x_arr, threshold_arr, axis=-1)

Like get_threshold_i_arr, but returns x-values instead of indices.


Data Wrapper Class

class Data

A wrapper around numeric values that supports arithmetic on the wrapped value, including nested structures. Supported value types are numeric scalars, numpy.ndarray, q.LatData, and list. Lists are combined element-wise.

import qlat_utils as q

d1 = q.Data([1.0, 2.0, 3.0])
d2 = q.Data([0.5, 0.5, 0.5])
d3 = d1 + d2       # Data([1.5, 2.5, 3.5])
d4 = d1 * 2.0      # Data([2.0, 4.0, 6.0])
d5 = -d1           # Data([-1.0, -2.0, -3.0])

| Method | Description |
|---|---|
| get_val() | Return the wrapped value |
| qnorm() | Return the squared norm |
| glb_sum() | MPI global sum (requires qlat) |
| __add__, __radd__ | Addition |
| __sub__, __rsub__ | Subtraction |
| __mul__, __rmul__ | Scalar or element-wise multiplication |
| __neg__, __pos__ | Unary negation and identity |
| __copy__, __deepcopy__ | Copy support |
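The element-wise semantics in the table can be illustrated with a hypothetical stand-in class, MiniData, that handles only list and scalar values. It is a sketch built on two assumptions: lists combine element-wise, and a scalar is applied to every list element. It omits NumPy arrays, LatData, copying, and glb_sum.

```python
class MiniData:
    """Hypothetical stand-in for q.Data covering only list and scalar values."""

    def __init__(self, val):
        self.val = val

    @staticmethod
    def _combine(a, b, op):
        # Lists combine element-wise; a scalar is applied to every list element
        if isinstance(a, list) and isinstance(b, list):
            assert len(a) == len(b)
            return [op(x, y) for x, y in zip(a, b)]
        if isinstance(a, list):
            return [op(x, b) for x in a]
        if isinstance(b, list):
            return [op(a, y) for y in b]
        return op(a, b)

    @staticmethod
    def _unwrap(other):
        return other.val if isinstance(other, MiniData) else other

    def __add__(self, other):
        return MiniData(self._combine(self.val, self._unwrap(other), lambda x, y: x + y))

    __radd__ = __add__

    def __sub__(self, other):
        return MiniData(self._combine(self.val, self._unwrap(other), lambda x, y: x - y))

    def __mul__(self, other):
        return MiniData(self._combine(self.val, self._unwrap(other), lambda x, y: x * y))

    __rmul__ = __mul__

    def __neg__(self):
        return self * -1

    def qnorm(self):
        v = self.val
        return sum(x * x for x in v) if isinstance(v, list) else v * v

d1 = MiniData([1.0, 2.0, 3.0])
d2 = MiniData([0.5, 0.5, 0.5])
print((d1 + d2).val, (2.0 * d1).val, (-d1).val, d1.qnorm())
# [1.5, 2.5, 3.5] [2.0, 4.0, 6.0] [-1.0, -2.0, -3.0] 14.0
```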


Basic Statistics

check_zero(x)

Return True if x is a real type and equals zero.

qnorm(x)

Return the squared norm of x. For real scalars: x*x. For complex scalars: re^2 + im^2. For NumPy arrays: abs(vdot(x, x)). For lists/tuples: the sum of qnorm over the elements.

q.qnorm(2)          # 4
q.qnorm(1 + 2j)     # 5.0  (1*1 + 2*2)
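These dispatch rules translate directly into a recursive function. The pure-Python sketch below covers scalars, complex numbers, lists, and tuples. It leaves out the NumPy branch, abs(vdot(x, x)), so the block runs without NumPy.

```python
def qnorm_sketch(x):
    """Squared norm following the rules above (NumPy branch omitted)."""
    if isinstance(x, complex):
        return x.real * x.real + x.imag * x.imag
    if isinstance(x, (int, float)):
        return x * x
    if isinstance(x, (list, tuple)):
        # Nested containers: sum the squared norms of the elements
        return sum(qnorm_sketch(v) for v in x)
    raise TypeError(f"unsupported type: {type(x).__name__}")

print(qnorm_sketch(2), qnorm_sketch(1 + 2j), qnorm_sketch([1, 2j, (3,)]))  # 4 5.0 14.0
```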

average(data_list)

Return the arithmetic mean of data_list.

average_ignore_nan(value_arr_list)

Return element-wise average across a list of NumPy arrays, ignoring NaN values. Returns NaN for elements where all inputs are NaN.

block_data(data_list, block_size, is_overlapping=True)

Return a list of block averages. If is_overlapping is True (default), blocks overlap by block_size - 1 entries.
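Here is a sketch of block averaging that keeps the same overlapping/non-overlapping distinction. The name block_averages is hypothetical. Dropping a trailing partial block in the non-overlapping case is an assumption of this sketch.

```python
def block_averages(data_list, block_size, is_overlapping=True):
    """Hypothetical sketch: overlapping blocks advance by one entry, others by block_size."""
    assert block_size >= 1
    step = 1 if is_overlapping else block_size
    # A trailing partial block (non-overlapping case) is dropped
    return [
        sum(data_list[start:start + block_size]) / block_size
        for start in range(0, len(data_list) - block_size + 1, step)
    ]

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(block_averages(data, 2))                        # [1.5, 2.5, 3.5, 4.5, 5.5]
print(block_averages(data, 2, is_overlapping=False))  # [1.5, 3.5, 5.5]
```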

avg_err(data_list, *, eps=1, block_size=1)

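Compute (avg, err) of data_list using blocking. The error estimate is:

\[\text{err} = |\text{eps}| \sqrt{\frac{\text{block\_size}}{N - \text{block\_size}}} \cdot \text{fsqrt}\big(\text{avg}\big[(d_i - \text{avg})^2\big]\big)\]

where N = len(data_list), avg is the mean of data_list, the d_i are the block averages from block_data(data_list, block_size), and avg[...] is the mean over those blocks. With block_size = 1 the estimate reduces to the standard error of the mean.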

| Parameter | Type | Default | Description |
|---|---|---|---|
| data_list | list | | Data values |
| eps | float | 1 | Additional scaling factor for error |
| block_size | int | 1 | Blocking size |

Returns (avg, err) where both have the same type as the data.
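A short pure-Python reimplementation can check the formula for real scalar data. It assumes the block averages come from overlapping blocks, which is the default of block_data. The test compares the block_size=1 result with the standard error of the mean from the statistics module.

```python
import math
import statistics

def avg_err_sketch(data_list, *, eps=1, block_size=1):
    """Blocked (avg, err) for a list of real numbers, following the formula above."""
    n = len(data_list)
    avg = sum(data_list) / n
    # Overlapping block averages d_i (assumption: block_data's default is_overlapping=True)
    blocks = [
        sum(data_list[s:s + block_size]) / block_size
        for s in range(n - block_size + 1)
    ]
    mean_sq_dev = sum((d - avg) ** 2 for d in blocks) / len(blocks)
    err = abs(eps) * math.sqrt(block_size / (n - block_size)) * math.sqrt(mean_sq_dev)
    return avg, err

data = [1.0, 2.0, 3.0, 4.0]
avg, err = avg_err_sketch(data)
# With block_size=1 this matches the standard error of the mean
print(avg, err, statistics.stdev(data) / math.sqrt(len(data)))
```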

partial_sum(x, *, is_half_last=False)

Modify x in-place to its cumulative (partial) sum, preserving length. If is_half_last is True, each entry becomes the average of the current and previous partial sums (trapezoidal rule). Works for 1-D and 2-D arrays.
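Below is a 1-D sketch of the in-place running sum, including the is_half_last variant. The helper name is hypothetical, and the sketch does not reproduce the library's 2-D handling.

```python
def partial_sum_1d(x, *, is_half_last=False):
    """Hypothetical sketch: replace a mutable 1-D sequence x by its running sum, in place."""
    s = 0
    for i, v in enumerate(x):
        prev = s
        s += v
        # Trapezoidal variant: mean of previous and current partial sums,
        # i.e. the previous sum plus half of the last entry
        x[i] = (prev + s) / 2 if is_half_last else s

a = [1, 2, 3]
partial_sum_1d(a)
b = [1, 2, 3]
partial_sum_1d(b, is_half_last=True)
print(a, b)  # [1, 3, 6] [0.5, 2.0, 4.5]
```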

fsqr(data) / fsqrt(data)

Component-wise square and square root. For complex types, real and imaginary parts are processed separately: fsqr(a + bi) = a^2 + b^2 i, fsqrt(a + bi) = sqrt(a) + sqrt(b) i. Supports scalars, Data, and NumPy arrays.
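The component-wise treatment keeps the uncertainties of the real and imaginary parts separate, which is what a quadrature error sum such as err_sum needs. For example, the ordinary complex square of 3+4j is -7+24j, but the component-wise square is 9+16j. The scalar-only sketch below leaves out NumPy and Data support, and fsqrt_sketch assumes non-negative parts.

```python
import math

def fsqr_sketch(z):
    """Component-wise square: real and imaginary parts are squared separately."""
    if isinstance(z, complex):
        return complex(z.real ** 2, z.imag ** 2)
    return z * z

def fsqrt_sketch(z):
    """Component-wise square root; the parts are assumed non-negative."""
    if isinstance(z, complex):
        return complex(math.sqrt(z.real), math.sqrt(z.imag))
    return math.sqrt(z)

# Two independent complex errors combined in quadrature, part by part
e1, e2 = 3 + 4j, 4 + 3j
total = fsqrt_sketch(fsqr_sketch(e1) + fsqr_sketch(e2))
print(total)  # (5+5j)
```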

err_sum(*vs)

Return the quadrature sum of errors: sqrt(sum(fsqr(v_i))).

q.err_sum(1.4, 2.1, 1.0)  # 2.7147743920996454

Jackknife Resampling

jackknife(data_list, *, eps=1)

Perform standard jackknife. Returns jk_arr of length N + 1 where:

  • jk_arr[0] = average

  • jk_arr[i] = avg - (eps / N) * (data_list[i - 1] - avg) for i = 1, ..., N

jk_avg(jk_arr)

Return the average (first element) of a jackknife array.

jk_err(jk_arr, *, eps=1, block_size=1)

Return the jackknife error estimate:

\[\frac{1}{\text{eps}} \sqrt{ \frac{N}{N - \text{block\_size}} \sum_{i=1}^{N} (jk[i] - \text{jk\_avg})^2 }\]
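Substituting the jackknife definition into this formula (block_size = 1, eps > 0, jk_avg = jk[0] = avg) shows that eps cancels. The result is the usual standard error of the mean, which is why both calls must use the same eps. Here \(d_i\) denotes data_list[i - 1]:

```latex
jk[i] - \text{avg} = -\frac{\text{eps}}{N}\,(d_i - \text{avg}), \qquad i = 1, \dots, N

\frac{1}{\text{eps}} \sqrt{\frac{N}{N-1} \sum_{i=1}^{N} \big(jk[i] - \text{avg}\big)^2}
  = \frac{1}{\text{eps}} \sqrt{\frac{N}{N-1} \cdot \frac{\text{eps}^2}{N^2} \sum_{i=1}^{N} (d_i - \text{avg})^2}
  = \sqrt{\frac{1}{N(N-1)} \sum_{i=1}^{N} (d_i - \text{avg})^2}
```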

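eps must match the value passed to the jackknife call that produced jk_arr, because jk_err divides out the eps scaling. block_size enters only through the N/(N - block_size) prefactor. Note: len(jk_arr) = N + 1.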

jk_avg_err(jk_arr, *, eps=1, block_size=1)

Return (jk_avg, jk_err).
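The standard-jackknife pipeline fits in a few lines of pure Python. This sketch follows the definitions above using local reimplementations, not the library code. It propagates the error of the nonlinear quantity f(x) = x**2 by evaluating f on every jackknife sample. Setting eps=0.5 shows that the scaling cancels.

```python
import math
import statistics

def jackknife_sketch(data_list, *, eps=1):
    """jk[0] = avg; jk[i] = avg - eps / N * (data_list[i - 1] - avg) for i = 1..N."""
    n = len(data_list)
    avg = sum(data_list) / n
    return [avg] + [avg - eps / n * (d - avg) for d in data_list]

def jk_avg_err_sketch(jk_arr, *, eps=1, block_size=1):
    """(jk_arr[0], err) with err = sqrt(N / (N - block_size) * sum_i (jk[i] - avg)^2) / eps."""
    avg = jk_arr[0]
    n = len(jk_arr) - 1
    sum_sq = sum((jk - avg) ** 2 for jk in jk_arr[1:])
    return avg, math.sqrt(n / (n - block_size) * sum_sq) / eps

data = [1.0, 1.1, 0.9, 1.05, 0.95, 1.02, 0.98, 1.03]
eps = 0.5  # any positive eps works, as long as both calls agree
# Evaluate the derived quantity f(x) = x**2 on every jackknife sample
jk_sq = [x ** 2 for x in jackknife_sketch(data, eps=eps)]
avg, err = jk_avg_err_sketch(jk_sq, eps=eps)
print(f"f(avg) = {avg:.5f} +- {err:.5f}")
```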


Super-Jackknife

sjackknife(data_list, jk_idx_list, *, avg=None, ...)

Perform super-jackknife resampling. Data from different ensembles (identified by jk_idx_list) are combined into a single jackknife array.

| Parameter | Type | Default | Description |
|---|---|---|---|
| data_list | list/ndarray | | Original data |
| jk_idx_list | list | | Index for each data point (e.g., (job_tag, traj)) |
| avg | any | None | Pre-computed average (auto-computed if None) |
| is_hash_jk_idx | bool | True | Use hash when jk_idx not in all_jk_idx |
| jk_idx_hash_size | int | 1024 | Hash table size |
| rng_state | RngState | None | RNG state (default: RngState("rejk")) |
| all_jk_idx | list | None | All possible indices; all_jk_idx[0] must be "avg" |
| get_all_jk_idx | callable | None | Function returning all_jk_idx |
| jk_blocking_func | callable | None | (i, jk_idx) -> blocked_jk_idx |
| eps | float | 1 | Scaling factor |

sjk_avg(jk_arr) / sjk_err(jk_arr, *, eps=1) / sjk_avg_err(jk_arr, *, eps=1)

Average, error, and (avg, err) for super-jackknife arrays. The error formula differs from standard jackknife: no N/(N-1) factor.
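In a super-jackknife, each ensemble gets its own run of samples. During that run, every other ensemble is held at its central value, so independent ensembles share one sample axis. The sketch below combines two hypothetical ensembles, A and B, in this way. The 1/sqrt(N(N - 1)) sample scaling is an assumption. It is chosen so that a plain root-sum-of-squares (no N/(N-1) factor, as described for sjk_err) reproduces the standard error. It is not necessarily the scaling that sjackknife uses.

```python
import math
import statistics

def super_jackknife(ensembles):
    """Hypothetical sketch: ensembles maps a tag to that ensemble's measurements.

    Returns {tag: samples} on one shared sample axis: samples[0] is the
    ensemble average, followed by one sample per measurement of every
    ensemble. Only the ensemble that owns a sample is perturbed; the others
    stay at their averages. The 1 / sqrt(N (N - 1)) scale is an assumption.
    """
    avgs = {tag: sum(v) / len(v) for tag, v in ensembles.items()}
    samples = {tag: [avg] for tag, avg in avgs.items()}
    for owner, values in ensembles.items():
        n = len(values)
        scale = 1 / math.sqrt(n * (n - 1))
        for d in values:
            for tag in ensembles:
                if tag == owner:
                    samples[tag].append(avgs[tag] - scale * (d - avgs[tag]))
                else:
                    samples[tag].append(avgs[tag])
    return samples

def sjk_avg_err_sketch(jk_arr):
    """Central value and root-sum-of-squares deviation (no N / (N - 1) factor)."""
    avg = jk_arr[0]
    return avg, math.sqrt(sum((jk - avg) ** 2 for jk in jk_arr[1:]))

ens = {"A": [1.0, 1.2, 0.9, 1.1], "B": [2.0, 2.3, 1.8]}
jk = super_jackknife(ens)
# Any function of both averages is evaluated sample by sample
total = [a + b for a, b in zip(jk["A"], jk["B"])]
avg_t, err_t = sjk_avg_err_sketch(total)
print(f"A + B = {avg_t:.4f} +- {err_t:.4f}")
```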

sjk_mk_jk_val(rs_tag, val, err, *, ...)

Create a synthetic jackknife array from a central value and error using Gaussian random numbers.


Randomized Jackknife-Bootstrap Hybrid

rjackknife(data_list, jk_idx_list, *, avg=None, ...)

Jackknife-bootstrap hybrid resampling. Returns jk_arr of length 1 + n_rand_sample, with jk_arr[0] holding the average. The spread of the random samples jk_arr[1:] around jk_arr[0] approximates the sampling distribution of the average.

| Parameter | Type | Default | Description |
|---|---|---|---|
| data_list | list/ndarray | | Original data |
| jk_idx_list | list | | Index for each data point |
| avg | any | None | Pre-computed average |
| rng_state | RngState | None | RNG state |
| n_rand_sample | int | 1024 | Number of random samples |
| jk_blocking_func | callable | None | (i, jk_idx) -> blocked_jk_idx |
| is_normalizing_rand_sample | bool | False | Normalize random vectors |
| is_apply_rand_sample_jk_idx_blocking_shift | bool | True | Shift blocking per sample |
| eps | float | 1 | Scaling factor |

The formula is:

\[jk\_arr[i] = \text{avg} + \sum_{j=1}^{N} \frac{-\text{eps}}{\sqrt{N(N - b(i,j))}} r_{i,j} (d_j - \text{avg})\]

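where \(N\) is the number of data points, \(d_j\) is the \(j\)-th data point, \(r_{i,j} \sim \mathcal{N}(0, 1)\) are Gaussian random numbers, and \(b(i,j)\) is the block size for data point \(j\) in sample \(i\). The first entry, jk_arr[0], is avg.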
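Here is a pure-Python sketch of the formula for a single ensemble with block size b = 1, so every data point receives its own random number. A seeded random.Random stands in for RngState. The error estimator is an assumption consistent with the formula: the root-mean-square deviation of the random samples from jk_arr[0], divided by eps. It is not necessarily the library's exact normalization.

```python
import math
import random
import statistics

def rjackknife_sketch(data_list, *, n_rand_sample=1024, eps=1, seed=0):
    """Hypothetical RJK samples for one ensemble with block size 1.

    jk_arr[0] = avg and, for i >= 1,
    jk_arr[i] = avg + sum_j (-eps / sqrt(N (N - 1))) * r_ij * (d_j - avg).
    """
    rng = random.Random(seed)  # stands in for RngState
    n = len(data_list)
    avg = sum(data_list) / n
    coef = -eps / math.sqrt(n * (n - 1))
    jk_arr = [avg]
    for _ in range(n_rand_sample):
        jk_arr.append(avg + sum(coef * rng.gauss(0.0, 1.0) * (d - avg) for d in data_list))
    return jk_arr

def rjk_avg_err_sketch(jk_arr, *, eps=1):
    """Assumed estimator: RMS deviation of jk_arr[1:] from jk_arr[0], divided by |eps|."""
    avg = jk_arr[0]
    n_samples = len(jk_arr) - 1
    return avg, math.sqrt(sum((jk - avg) ** 2 for jk in jk_arr[1:]) / n_samples) / abs(eps)

data = [1.0 + 0.1 * math.sin(k) for k in range(50)]  # deterministic toy data
avg, err = rjk_avg_err_sketch(rjackknife_sketch(data, n_rand_sample=4000))
se = statistics.stdev(data) / math.sqrt(len(data))
print(f"avg = {avg:.5f}, rjk err = {err:.5f}, standard error = {se:.5f}")
```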

rjk_avg(jk_arr) / rjk_err(jk_arr, eps=1) / rjk_avg_err(rjk_list, eps=1)

Average, error, and (avg, err) for randomized jackknife arrays.

rjk_mk_jk_val(rs_tag, val, err, *, ...)

Create a synthetic RJK array from a central value and error.


Unified Jackknife API

The g_* functions provide a unified interface that dispatches to either super-jackknife or RJK based on global settings in default_g_jk_kwargs.

default_g_jk_kwargs

Global dictionary controlling jackknife behavior. Key settings:

| Key | Default | Description |
|---|---|---|
| jk_type | "rjk" | "rjk" or "super" |
| eps | 1 | Scaling factor |
| n_rand_sample | 1024 | Number of random samples (RJK only) |
| is_normalizing_rand_sample | False | Normalize random vectors (RJK only) |
| is_hash_jk_idx | True | Hash unknown jk indices (super only) |
| jk_idx_hash_size | 1024 | Hash table size (super only) |
| block_size | 1 | Default blocking size |
| block_size_dict | {} | Per-job_tag blocking sizes |
| rng_state | RngState("rejk") | RNG state |

g_mk_jk(data_list, jk_idx_list, *, avg=None, ...)

Create a (randomized) super-jackknife dataset from un-jackknifed data. Dispatches to sjackknife or rjackknife based on jk_type.

g_mk_jk_val(rs_tag, val, err, *, ...)

Create a synthetic jackknife array from a value and error. Dispatches to sjk_mk_jk_val or rjk_mk_jk_val.

g_jk_avg(jk_arr) / g_jk_err(jk_arr) / g_jk_avg_err(jk_arr)

Unified average, error, and (avg, err) extraction.

g_jk_avg_err_arr(jk_arr)

Return an array with shape jk_arr[0].shape + (2,) where the last axis is (avg, err).

g_jk_size(*, jk_type, ...)

Return the number of samples in the jackknife array (1 + n_samples).

g_jk_blocking_func(i, jk_idx)

Apply the configured blocking function.

g_jk_sample_size(job_tag, traj_list)

Return the number of distinct blocks for a given job_tag and trajectory list.

get_jk_state() / set_jk_state(state)

Save and restore the current default_g_jk_kwargs state (for use with @cache_call).


Value Display

show_val(val, *, is_latex=True, num_float_digit=None, num_exp_digit=None, exponent=None)

Format a single numeric value for display.

| Parameter | Type | Default | Description |
|---|---|---|---|
| val | int/float | | Value to format |
| is_latex | bool/None | True | Use LaTeX exponent notation |
| num_float_digit | int/bool/None | None | Number of decimal digits (auto if None) |
| num_exp_digit | int/bool/None | None | Significant digits in scientific notation |
| exponent | int/None | None | Force a specific exponent |

q.show_val(0.00123)                     # "1.23 \\times 10^{-3}"
q.show_val(0.00123, is_latex=False)     # "1.23E-3"
q.show_val(1234.0)                      # "1234.0"

show_val_err(val_err, *, is_latex=True, num_float_digit=None, num_exp_digit=None, exponent=None)

Format a (value, error) pair. Error is shown in parentheses. If val_err is a single number, it is formatted as a plain value.
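In parenthesis notation, the error is written as digits in units of the value's last decimal place. For example, 1.2346(12) means 1.2346 +- 0.0012. The hypothetical fixed-point sketch below applies that convention with two error digits. show_val_err chooses its digits through num_float_digit and num_exp_digit and adds exponent and LaTeX handling, none of which this sketch reproduces.

```python
import math

def fmt_val_err(val, err, n_err_digits=2):
    """Hypothetical value(error) formatter; the digit rule is an assumption."""
    if err <= 0:
        return f"{val}"
    lead = math.floor(math.log10(err))        # decimal position of the leading error digit
    n_dec = max(0, n_err_digits - 1 - lead)   # decimals needed to show n_err_digits
    err_digits = round(err * 10 ** n_dec)     # error in units of the last shown decimal
    return f"{val:.{n_dec}f}({err_digits})"

print(fmt_val_err(1.23456, 0.0012))  # 1.2346(12)
print(fmt_val_err(1234.4, 120.0))    # 1234(120)
```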

q.show_val_err((1.12e16, 12e6))                         # auto scientific notation
q.show_val_err((1.12e16, 12e7), exponent=10)             # force exponent
q.show_val_err((1.12e16, 12e7), exponent=10, is_latex=False)  # same, with plain "E" exponent notation

Context Managers

class NewDictValues(dictionary, **kwargs)

Context manager that temporarily overrides keys in dictionary and restores them on exit.
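The same save-override-restore pattern can be sketched with contextlib.contextmanager. The function new_dict_values is hypothetical. Removing keys that did not exist before entry is an assumption about the desired behavior. The finally clause restores the values even if the with-block raises.

```python
from contextlib import contextmanager

@contextmanager
def new_dict_values(dictionary, **kwargs):
    """Hypothetical sketch: override keys for the duration of a with-block."""
    missing = object()  # sentinel for keys absent before entry
    saved = {key: dictionary.get(key, missing) for key in kwargs}
    dictionary.update(kwargs)
    try:
        yield dictionary
    finally:
        # Restore old values; drop keys that were added by the override
        for key, old in saved.items():
            if old is missing:
                del dictionary[key]
            else:
                dictionary[key] = old

settings = {"n_rand_sample": 1024, "block_size": 1}
with new_dict_values(settings, n_rand_sample=2048, eps=0.5):
    print(settings)  # {'n_rand_sample': 2048, 'block_size': 1, 'eps': 0.5}
print(settings)      # {'n_rand_sample': 1024, 'block_size': 1}
```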

class JkKwargs(**kwargs)

Context manager that temporarily overrides default_g_jk_kwargs.

# assumes data_list and jk_idx_list are already defined
with q.JkKwargs(n_rand_sample=2048, block_size=10):
    jk_arr = q.g_mk_jk(data_list, jk_idx_list)

class ShowKwargs(**kwargs)

Context manager that temporarily overrides default_show_val_kwargs.

with q.ShowKwargs(is_latex=False, exponent=-10):
    print(q.show_val_err((1.23e-10, 0.05e-10)))

Examples

Interpolation

import qlat_utils as q
import numpy as np

# Interpolate data at fractional indices
data = np.array([10.0, 20.0, 30.0, 40.0])
result = q.interp(data, 1.5)          # 25.0 (midpoint between 20 and 30)
result_arr = q.interp(data, [0.5, 1.5, 2.5])  # [15.0, 25.0, 35.0]

# Interpolate with explicit x-coordinates
x_data = np.array([0.0, 1.0, 2.0, 3.0])
y_data = np.array([0.0, 1.0, 4.0, 9.0])
x_new = np.array([0.5, 1.5, 2.5])
y_new = q.interp_x(y_data, x_data, x_new)  # [0.5, 2.5, 6.5] (linear between neighbours)

Basic Error Estimation

import qlat_utils as q
import numpy as np

# Generate synthetic, uncorrelated Gaussian data (seeded for reproducibility)
rng = np.random.default_rng(seed=0)
data = list(1.0 + 0.1 * rng.standard_normal(100))

# Simple average and error
avg, err = q.avg_err(data)
print(f"avg = {avg:.4f}, err = {err:.4f}")

# With blocking, which accounts for autocorrelation between neighbouring entries
avg_b, err_b = q.avg_err(data, block_size=5)
print(f"avg = {avg_b:.4f}, err = {err_b:.4f}")

Jackknife Resampling

import qlat_utils as q
import numpy as np

data = [1.0, 1.1, 0.9, 1.05, 0.95, 1.02, 0.98, 1.03]

# Standard jackknife
jk_arr = q.jackknife(data)
avg = q.jk_avg(jk_arr)
err = q.jk_err(jk_arr)
print(f"Jackknife: avg = {avg:.4f}, err = {err:.4f}")

# Unified API (uses RJK by default)
jk_arr = q.g_mk_jk(data, list(range(len(data))))
avg, err = q.g_jk_avg_err(jk_arr)
print(f"RJK: avg = {avg:.4f}, err = {err:.4f}")

Formatting Values

import qlat_utils as q

# Format a single value
print(q.show_val(0.00123))                          # "1.23 \times 10^{-3}"
print(q.show_val(0.00123, is_latex=False))          # "1.23E-3"

# Format value with error
print(q.show_val_err((1.12e16, 12e6)))              # auto-notation
print(q.show_val_err((1.12e16, 12e7), exponent=10)) # value and error in units of 10^{10}

Context Managers

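import qlat_utils as q
import numpy as np

# Hypothetical input: 100 measurements, one per trajectory index
rng = np.random.default_rng(seed=0)
data_list = list(1.0 + 0.1 * rng.standard_normal(100))
jk_idx_list = list(range(len(data_list)))

# Temporarily change jackknife settings
with q.JkKwargs(n_rand_sample=2048, block_size=10):
    jk_arr = q.g_mk_jk(data_list, jk_idx_list)
    avg, err = q.g_jk_avg_err(jk_arr)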

# Temporarily change display settings
with q.ShowKwargs(is_latex=False):
    print(q.show_val_err((3.14, 0.01)))

Data Wrapper

import qlat_utils as q

d1 = q.Data([1.0, 2.0, 3.0])
d2 = q.Data([0.1, 0.2, 0.3])

d3 = d1 + d2        # Data([1.1, 2.2, 3.3])
d4 = d1 * 2.0       # Data([2.0, 4.0, 6.0])
d5 = d1 - d2        # Data([0.9, 1.8, 2.7])
norm = d1.qnorm()   # 14.0 (1 + 4 + 9)