Mathematical helper functions¶

Helper functions for mathematical and physical operations.

maicos.lib.math.center_cluster(ag: AtomGroup, weights: ndarray) → ndarray[source]¶

Calculate the center of the atomgroup with respect to some weights.

Parameters:

ag (MDAnalysis.core.groups.AtomGroup) – Group of atoms to calculate the center for.
weights (numpy.ndarray) – Weights in the shape of ag.

Returns:

com – The center with respect to the weights.

Return type:

numpy.ndarray

Without proper treatment of periodic boundrary conditions (PBC) most algorithms will result in wrong center calculations. As shown below without treating PBC the center of mass is located in the center of the box

+-----------+
|           |
| 1   x   2 |
|           |
+-----------+

However, the distance accross the box boundary is shorter and therefore the center with PBC should be located somwhere else. The correct way to calculate the center is described in Bai and Breen[1] where coordinates of the particles are projected on a circle and weighted by their mass in this two dimensional space. The center of mass is obtained by transforming this point back to the corresponding point in the real system. This is done seperately for each dimension.

Reasons for doing this include the analysis of clusters in periodic boundrary conditions and consistent center of mass calculation across box boundraries. This procedure results in the right center of mass as seen below

+-----------+
|           |
x 1       2 |
|           |
+-----------+

maicos.lib.math.combine_subsample_variance(n_A, n_B, mu_A, mu_B, M_A, M_B)[source]¶

Calculate the mean and variance of two datasets of arbitrary size.

Given two datasets of arbitrary size, this function calculates the mean and variance of the combined dataset using Welford’s method. This is useful for parallel processing where the datasets are processed in chunks.

Parameters:

n_A (int) – Number of elements in dataset A.
n_B (int) – Number of elements in dataset B.
mu_A (float) – Mean of dataset A.
mu_B (float) – Mean of dataset B.
M_A (float) – Sum of squares of deviations from the mean for dataset A.
M_B (float) – Sum of squares of deviations from the mean for dataset B.

Returns:

n_AB (int) – Total number of elements in the combined dataset (n_A + n_B).
mu_AB (float) – Mean of the combined dataset.
M_AB (float) – Sum of squares of deviations from the mean for the combined dataset.

maicos.lib.math.correlation(a: ndarray, b: ndarray | None = None, subtract_mean: bool = False) → ndarray[source]¶

Calculate correlation or autocorrelation.

Uses fast fourier transforms to give the correlation function of two arrays, or, if only one array is given, the autocorrelation. Setting subtract_mean=True causes the mean to be subtracted from the input data.

Parameters:

a (numpy.ndarray) – The first input array to calculate the correlation
b (numpy.ndarray) – The second input array. If None, autocorrelation of a is calculated.
subtract_mean (bool) – If True, subtract the mean from the input data.

Returns:

The correlation or autocorrelation function.

Return type:

numpy.ndarray

maicos.lib.math.correlation_time(timeseries: ndarray, method: str = 'sokal', mintime: int = 3, sokal_factor: float = 8) → float[source]¶

Compute the integrated correlation time of a time series.

The integrated correlation time (in units of the sampling interval) is given by

\[\tau = \sum\limits_{t=1}^{N_\mathrm{cut}} C(t) \left(1 - \frac{t}{N}\right)\]

where \(N_\mathrm{cut} < N\) is a subset of the time series of length \(N\) and \(C(t)\) is the discrete-time autocorrelation function. To obtain the upper limit of the sum \(N_\mathrm{cut}\) two different methods are provided:

For “chodera” [2] \(N_\mathrm{cut}\) is given by the time when \(C(t)\) crosses zero the first time.
For “sokal” [3] \(N_\mathrm{cut}\) is determined iteratively by stepwise increasing until

\[N_\mathrm{cut} \geq c \cdot \tau\]

where \(c\) is the constant sokal_factor. If the condition is never fulfilled, -1 is returned, indicating that the time series does not provide sufficient statistics to estimate a correlation time.

While both methods give the same correlation time for a smooth time series that decays to 0, “sokal” will results in a more reasonable result for actual time series that are noisy and cross zero several times.

Parameters:

timeseries (numpy.ndarray) – The time series used to calculate the correlation time from.
method ({"sokal", "chodera"}) – Method to choose summation cutoff \(N_\mathrm{cut}\).
mintime (int) – Minimum possible value for \(N_\mathrm{cut}\).
sokal_factor (float) – Cut-off factor \(c\) for the Sokal method.

Returns:

tau – Integrated correlation time \(\tau\). If -1 (only for method="sokal") the provided time series does not provide sufficient statistics to estimate a correlation time.

Return type:

float

Raises:

ValueError – If mintime is larger than the length of the timeseries.
ValueError – If method is not one of “sokal” or “chodera”.

References

maicos.lib.math.new_mean(old_mean: float, data: float, length: int) → float[source]¶

Compute the arithmetic mean of a series iteratively.

Compute the arithmetic mean of n samples based on an existing mean of n-1 and the n-th value.

Given the mean of a data series

\[\bar x_N = \frac{1}{N} \sum_{n=1}^N x_n\]

we seperate the last value

\[\bar x_N = \frac{1}{N} \sum_{n=1}^{N-1} x_n + \frac{x_N}{N}\]

and multiply 1 = (N - 1)/(N - 1)

\[\begin{split}\bar x_N = \frac{N-1}{N} \frac{1}{N-1} \\ \sum_{n=1}^{N-1} x_n + \frac{x_N}{N}\end{split}\]

The first term can be identified as the mean of the first N - 1 values and we arrive at

\[\bar x_N = \frac{N-1}{N} \bar x_{N-1} + \frac{x_N}{N}\]

Parameters:

old_mean (float) – arithmetic mean of the first n - 1 samples.
data (float) – n-th value of the series.
length (int) – Length of the updated series, here called n.

Returns:

new_mean – Updated mean of the series of n values.

Return type:

float

Examples

The mean of a data set can easily be calculated from the data points. However this requires one to keep all data points on hand until the end of the calculation.

>>> print(np.mean([1, 3, 5, 7]))
4.0

Alternatively, one can update an existing mean, this requires only knowledge of the total number of samples.

>>> print(new_mean(np.mean([1, 3, 5]), data=7, length=4))
4.0

Calculate the variance of a timeseries iteratively.

The variance of a timeseries \(x_n\) can be calculated iteratively by using the following formula:

\[S_n = S_n-1 + (n-1) * (x_n - \bar{x}_n-1)^2 / (n-1)\]

Here, \(\bar{x}_n\) is the mean of the timeseries up to the \(n\)-th value.

Floating point imprecision can lead to slight negative variances leading non defined standard deviations. Therefore a negetaive variance is set to 0.

Parameters:

old_variance (float, numpy.ndarray) – The variance of the first n-1 samples.
old_mean (float) – The mean of the first n-1 samples.
new_mean (float, numpy.ndarray) – The mean of the full n samples.
data (float, numpy.ndarray) – The n-th value of the series.
length (int) – Length of the updated series, here called n.

Returns:

new_variance – Updated variance of the series of n values.

Return type:

float

Examples

The data set [1, 5, 5, 1] has a variance of 4.0

>>> print(np.var([1, 5, 5, 1]))
4.0

Knowing the total number of data points, this operation can be performed iteratively.

>>> print(
...     new_variance(
...         old_variance=np.var([1, 5, 5]),
...         old_mean=np.mean([1, 5, 5]),
...         new_mean=np.mean([1, 5, 5, 1]),
...         data=1,
...         length=4,
...     )
... )
4.0

maicos.lib.math.scalar_prod_corr(a: ndarray, b: ndarray | None = None, subtract_mean: bool = False) → ndarray[source]¶

Give the corr. function of the scalar product of two vector timeseries.

Arguments should be given in the form a[t, i], where t is the time variable along which the correlation is calculated, and i indexes the vector components.

Parameters:

a (numpy.ndarray) – The first vector timeseries of shape (t, i).
b (numpy.ndarray) – The second vector timeseries of shape (t, i). If None, correlation with itself is calculated.
subtract_mean (bool) – If True, subtract the mean from the timeseries before calculating the correlation.

Returns:

The correlation function of the scalar product of the vector timeseries.

Return type:

numpy.ndarray

maicos.lib.math.symmetrize(m: ndarray, axis: None | int | tuple[int] = None, inplace: bool = False, is_odd: bool = False) → ndarray[source]¶

Symmeterize an array.

The shape of the array is preserved, but the elements are symmetrized with respect to the given axis.

Parameters:

m (array_like) – Input array to symmetrize
axis (int, tuple(int)) – Axis or axes along which to symmetrize over. The default, axis=None, will symmetrize over all of the axes of the input array. If axis is negative it counts from the last to the first axis. If axis is a tuple of ints, symmetrizing is performed on all of the axes specified in the tuple.
inplace (bool) – Do symmetrizations inplace. If False a new array is returned.
is_odd (bool) – The parity to use for symmetrization. If False (default), the symmetrization is done with “even” parity, meaning that the output array will be symmetric with respect to the specified axis. If True, the symmetrization is done with “odd” parity, meaning that the output array will be antisymmetric.

Returns:

out – the symmetrized array

Return type:

array_like

Notes

symmetrize uses np.flip() for flipping the indices.

Examples

>>> A = np.arange(10).astype(float)
>>> A
array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>> symmetrize(A)
array([4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5])
>>> symmetrize(A, inplace=True)
array([4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5])
>>> A
array([4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5])

Antisymmetrization can be achieved by setting is_odd=True. >>> A = np.arange(10).astype(float) >>> A array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]) >>> symmetrize(A, is_odd=True) array([-4.5, -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5])

It also works for arrays with more than 1 dimensions in a general dimension.

>>> A = np.arange(20).astype(float).reshape(2, 10).T
>>> A
array([[ 0., 10.],
       [ 1., 11.],
       [ 2., 12.],
       [ 3., 13.],
       [ 4., 14.],
       [ 5., 15.],
       [ 6., 16.],
       [ 7., 17.],
       [ 8., 18.],
       [ 9., 19.]])
>>> symmetrize(A)
array([[9.5, 9.5],
       [9.5, 9.5],
       [9.5, 9.5],
       [9.5, 9.5],
       [9.5, 9.5],
       [9.5, 9.5],
       [9.5, 9.5],
       [9.5, 9.5],
       [9.5, 9.5],
       [9.5, 9.5]])
>>> symmetrize(A, axis=0)
array([[ 4.5, 14.5],
       [ 4.5, 14.5],
       [ 4.5, 14.5],
       [ 4.5, 14.5],
       [ 4.5, 14.5],
       [ 4.5, 14.5],
       [ 4.5, 14.5],
       [ 4.5, 14.5],
       [ 4.5, 14.5],
       [ 4.5, 14.5]])

maicos.lib.math.transform_cylinder(positions: ndarray, origin: ndarray, dim: int) → ndarray[source]¶

Transform positions into cylinder coordinates.

The origin of th coordinate system is at origin, the direction of the cylinder is defined by dim.

Parameters:

positions (numpy.ndarray) – Cartesian coordinates (x,y,z)
origin (numpy.ndarray) – Origin of the new cylindrical coordinate system (x,y,z).
dim (int) – Direction of the cylinder axis (0=x, 1=y, 2=z).

Returns:

Positions in cylinder coordinates (r, phi, z)

Return type:

numpy.ndarray

maicos.lib.math.transform_sphere(positions: ndarray, origin: ndarray) → ndarray[source]¶

Transform positions into spherical coordinates.

The origin of the new coordinate system is at origin.

Parameters:

positions (numpy.ndarray) – Cartesian coordinates (x,y,z)
origin (numpy.ndarray) – Origin of the new spherical coordinate system (x,y,z).

Returns:

Positions in spherical coordinates (\(r\), phi, theta)

Return type:

numpy.ndarray