Analysis API Reference

Metrics

mdpp.analysis.metrics

Core structure and dynamics metrics computed from trajectories.

RMSDResult(time_ps, rmsd_nm, atom_indices) dataclass

RMSD time series.

time_ns property

Return frame times in nanoseconds.

rmsd_angstrom property

Return RMSD values in Angstrom.
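
As a rough illustration of how a result container like this can expose unit-converted views, here is a minimal sketch built on a frozen dataclass. The class name `RMSDResultSketch` is hypothetical and this is not the package's implementation; only the field names and the ps-to-ns and nm-to-Angstrom conversions are taken from the reference above.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class RMSDResultSketch:
    """Hypothetical stand-in mirroring the documented RMSDResult fields."""

    time_ps: np.ndarray
    rmsd_nm: np.ndarray
    atom_indices: np.ndarray

    @property
    def time_ns(self) -> np.ndarray:
        # 1 ns = 1000 ps
        return self.time_ps / 1000.0

    @property
    def rmsd_angstrom(self) -> np.ndarray:
        # 1 nm = 10 Angstrom
        return self.rmsd_nm * 10.0


result = RMSDResultSketch(
    time_ps=np.array([0.0, 500.0, 1000.0]),
    rmsd_nm=np.array([0.0, 0.12, 0.15]),
    atom_indices=np.arange(4),
)
```

Because the properties compute a new array on each access, storing the converted array in a local variable avoids repeated work in plotting loops.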

RMSFResult(rmsf_nm, atom_indices, residue_ids) dataclass

Per-atom RMSF values.

rmsf_angstrom property

Return RMSF values in Angstrom.

DeltaRMSFResult(delta_rmsf_nm, residue_ids, sem_nm) dataclass

Per-residue RMSF difference between two systems (B minus A).

Averaging is done in MSF (mean-square fluctuation) space: per-residue RMSF^2 values are averaged across replicas, then the square root is taken. The delta is computed on the resulting average RMSF values.

The SEM on each system's average RMSF is propagated through the sqrt transform, then the two independent SEMs are combined in quadrature to give the SEM on the delta.

delta_rmsf_angstrom property

Return delta-RMSF values in Angstrom.

sem_angstrom property

Return SEM on the delta-RMSF in Angstrom.

DCCMResult(correlation, atom_indices, residue_ids) dataclass

Dynamic cross-correlation matrix.

SASAResult(time_ps, values_nm2, atom_indices, mode, residue_ids) dataclass

Solvent accessible surface area.

time_ns property

Return frame times in nanoseconds.

total_nm2 property

Return summed SASA for each frame.

RadiusOfGyrationResult(time_ps, radius_gyration_nm, atom_indices) dataclass

Radius of gyration time series.

time_ns property

Return frame times in nanoseconds.

radius_gyration_angstrom property

Return radius of gyration values in Angstrom.

compute_rmsd(traj, *, atom_selection='backbone', reference_frame=0, timestep_ps=None, dtype=None)

Compute RMSD over time.

The trajectory should be aligned before calling this function (see :func:mdpp.core.trajectory.align_trajectory).

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory (pre-aligned).

required
atom_selection str

Atoms used in RMSD calculation.

'backbone'
reference_frame int

Reference frame index for RMSD.

0
timestep_ps float | None

Optional time step in ps to override trajectory time.

None
dtype DtypeArg

Output float dtype. If None, uses the package default (see :func:mdpp.set_default_dtype).

None

Returns:

Type Description
RMSDResult

RMSDResult containing time and RMSD.

compute_rmsf(traj, *, atom_selection='name CA', dtype=None)

Compute per-atom RMSF from positional fluctuations.

The trajectory should be aligned before calling this function (see :func:mdpp.core.trajectory.align_trajectory).

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory (pre-aligned).

required
atom_selection str

Atoms included in RMSF calculation.

'name CA'
dtype DtypeArg

Output float dtype. If None, uses the package default (see :func:mdpp.set_default_dtype).

None

Returns:

Type Description
RMSFResult

RMSFResult with atom and residue mapping.
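
The pre-alignment requirement for compute_rmsd and compute_rmsf matters because, without a superposition step, rigid-body drift is counted as structural change. The sketch below uses plain NumPy on an (n_frames, n_atoms, 3) coordinate array. It assumes no fitting is done inside the metric, which is an inference from the note above, and the helper names `rmsd_no_fit` and `rmsf_no_fit` are hypothetical.

```python
import numpy as np


def rmsd_no_fit(xyz: np.ndarray, reference_frame: int = 0) -> np.ndarray:
    """RMSD per frame against one frame, with no superposition step."""
    diff = xyz - xyz[reference_frame]                 # (n_frames, n_atoms, 3)
    return np.sqrt((diff**2).sum(axis=2).mean(axis=1))


def rmsf_no_fit(xyz: np.ndarray) -> np.ndarray:
    """RMSF per atom around the time-averaged position."""
    fluct = xyz - xyz.mean(axis=0)
    return np.sqrt((fluct**2).sum(axis=2).mean(axis=0))


rng = np.random.default_rng(0)
# Five identical frames of a rigid 10-atom structure ...
static = np.tile(rng.normal(size=(10, 3)), (5, 1, 1))
# ... and the same structure drifting 1 nm per frame along x.
drift = static + np.arange(5)[:, None, None] * np.array([1.0, 0.0, 0.0])

rmsd_static = rmsd_no_fit(static)
rmsd_drift = rmsd_no_fit(drift)
```

The drifting copy has no internal motion, yet its unfitted RMSD grows by exactly the translation distance per frame, which is the artefact that alignment removes.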

compute_dccm(traj, *, atom_selection='name CA', backend='numpy', dtype=None)

Compute dynamic cross-correlation matrix (DCCM).

The trajectory should be aligned before calling this function (see :func:mdpp.core.trajectory.align_trajectory).

The covariance is dispatched through a pluggable backend registry (see :mod:mdpp.analysis._backends._dccm). The default "numpy" backend uses BLAS GEMM via reshape + matmul -- this is multi-threaded out of the box, unlike np.einsum which falls back to a single-threaded contraction loop and becomes the bottleneck for any non-trivial trajectory. Other backends ("numba", "torch", "jax", "cupy") are available for users who want explicit CPU parallelism or GPU acceleration.

mdtraj stores coordinates in float32; the covariance kernel runs in the backend's native dtype and the wrapper casts to the resolved dtype (float32 by default). Float32 precision is sufficient: empirical tests show a maximum correlation error of ~4e-6 relative to float64, well below any physically meaningful threshold.
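
The reshape + matmul idea described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the backend's actual code: the function name `dccm_matmul` is hypothetical, and the normalisation shown is the usual displacement dot-product correlation.

```python
import numpy as np


def dccm_matmul(xyz: np.ndarray) -> np.ndarray:
    """Normalised cross-correlation of atomic displacements via one GEMM."""
    n_frames, n_atoms, _ = xyz.shape
    fluct = xyz - xyz.mean(axis=0)                    # (n_frames, n_atoms, 3)
    # Rows are atoms; columns are (frame, xyz component) pairs.
    flat = fluct.transpose(1, 0, 2).reshape(n_atoms, n_frames * 3)
    cov = flat @ flat.T / n_frames                    # <dr_i . dr_j>
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)


rng = np.random.default_rng(1)
xyz = rng.normal(size=(200, 6, 3)).astype(np.float32)
corr = dccm_matmul(xyz)

# Reference contraction with einsum, for comparison only.
fluct = xyz - xyz.mean(axis=0)
cov_ref = np.einsum("tic,tjc->ij", fluct, fluct) / xyz.shape[0]
norm_ref = np.sqrt(np.diag(cov_ref))
corr_ref = cov_ref / np.outer(norm_ref, norm_ref)
```

Both routes produce the same matrix; the matmul form simply hands the contraction to the BLAS library as a single matrix product.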

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory (pre-aligned).

required
atom_selection str

Atoms used in DCCM.

'name CA'
backend DCCMBackend

Compute backend. One of "numpy" (default), "numba", "cupy", "torch", or "jax".

'numpy'
dtype DtypeArg

Output float dtype. If None, uses the package default (see :func:mdpp.set_default_dtype).

None

Returns:

Type Description
DCCMResult

DCCMResult with correlation matrix and residue IDs.

compute_sasa(traj, *, atom_selection='protein', mode='residue', probe_radius=0.14, n_sphere_points=960, timestep_ps=None, dtype=None)

Compute solvent-accessible surface area via Shrake-Rupley.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
atom_selection str | None

Optional atom selection before SASA.

'protein'
mode str

Either "atom" or "residue".

'residue'
probe_radius float

Probe radius in nm.

0.14
n_sphere_points int

Number of sphere points per atom.

960
timestep_ps float | None

Optional timestep override in ps.

None
dtype DtypeArg

Output float dtype. If None, uses the package default (see :func:mdpp.set_default_dtype).

None

Returns:

Type Description
SASAResult

SASAResult containing frame-resolved values.

compute_radius_of_gyration(traj, *, atom_selection='protein', timestep_ps=None, dtype=None)

Compute radius of gyration over time.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
atom_selection str

Atom selection used to compute radius of gyration.

'protein'
timestep_ps float | None

Optional timestep override in ps.

None
dtype DtypeArg

Output float dtype. If None, uses the package default (see :func:mdpp.set_default_dtype).

None

Returns:

Type Description
RadiusOfGyrationResult

RadiusOfGyrationResult with per-frame values.

average_rmsf_with_sem(results, *, dtype=None)

Average RMSF across replicas in MSF space and propagate SEM.

Parameters:

Name Type Description Default
results list[RMSFResult]

RMSF results from each replica.

required
dtype DtypeArg

Output float dtype. If None, uses the package default (see :func:mdpp.set_default_dtype).

None

Returns:

Type Description
tuple[NDArray[floating], NDArray[floating] | None]

(avg_rmsf_nm, sem_rmsf_nm). SEM is None when fewer than 2 replicas are provided.

The SEM on MSF is propagated through the sqrt transform: sem_rmsf = sem_msf / (2 * avg_rmsf).
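
The MSF-space averaging and the sqrt propagation rule can be written out as follows. This is a sketch only: the function name `average_rmsf_msf_space` is hypothetical, the input is a plain (n_replicas, n_residues) array rather than a list of RMSFResult objects, and the use of the sample standard deviation (ddof=1) for the SEM is an assumption.

```python
import numpy as np


def average_rmsf_msf_space(rmsf_replicas: np.ndarray):
    """rmsf_replicas: (n_replicas, n_residues) RMSF values in nm."""
    n_replicas = rmsf_replicas.shape[0]
    msf = rmsf_replicas**2
    avg_rmsf = np.sqrt(msf.mean(axis=0))
    if n_replicas < 2:
        return avg_rmsf, None
    # Standard error of the mean MSF (assumed: sample std, ddof=1).
    sem_msf = msf.std(axis=0, ddof=1) / np.sqrt(n_replicas)
    # First-order propagation through y = sqrt(x): dy = dx / (2 * sqrt(x)).
    return avg_rmsf, sem_msf / (2.0 * avg_rmsf)


replicas = np.array([[0.10, 0.20], [0.14, 0.20], [0.10, 0.20]])
avg, sem = average_rmsf_msf_space(replicas)
single_avg, single_sem = average_rmsf_msf_space(replicas[:1])
```

Averaging squared values weights the more flexible replica more heavily, so the result for the first residue is slightly larger than the arithmetic mean of its three RMSF values.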

compute_delta_rmsf(results_a, results_b, *, indices_a=None, indices_b=None, residue_ids=None, dtype=None)

Compute per-residue RMSF difference between two systems.

The RMSF for each system is first averaged across replicas in MSF space (sqrt(mean(RMSF^2))), then the delta is taken as B minus A. Positive values indicate that system B is more flexible.

The SEM on each system's average RMSF is propagated through the sqrt transform (sem_rmsf = sem_msf / (2 * avg_rmsf)), then the two independent SEMs are combined in quadrature to give the SEM on the delta. At least 2 replicas per system are required for SEM; otherwise DeltaRMSFResult.sem_nm is None.

For systems with identical residue counts, indices_a and indices_b may be omitted and the comparison is element-wise.

For systems with different sequences, supply aligned index arrays so that indices_a[i] and indices_b[i] point to the same structural position in each system. The caller is responsible for generating these mappings (e.g. from a multiple sequence alignment).

Parameters:

Name Type Description Default
results_a list[RMSFResult]

RMSF results for system A (one per replica).

required
results_b list[RMSFResult]

RMSF results for system B (one per replica).

required
indices_a NDArray[int_] | None

Optional 0-based residue indices into system A's RMSF array at aligned positions. Must have the same length as indices_b.

None
indices_b NDArray[int_] | None

Optional 0-based residue indices into system B's RMSF array at aligned positions.

None
residue_ids NDArray[int_] | None

Optional residue IDs for the x-axis of the resulting delta-RMSF (e.g. a reference sequence numbering). When None and indices are not provided, residue IDs are taken from results_a[0]. When None and indices are provided, residue IDs are taken from results_a[0] at the positions given by indices_a.

None
dtype DtypeArg

Output float dtype. If None, uses the package default (see :func:mdpp.set_default_dtype).

None

Returns:

Type Description
DeltaRMSFResult

DeltaRMSFResult with the per-residue difference and SEM.

Raises:

Type Description
ValueError

If input lists are empty, replicas within a system have inconsistent lengths, index arrays differ in length, or unindexed systems have different residue counts.
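
For two systems with different residue counts, the index mapping and the quadrature step of compute_delta_rmsf amount to the following. The numbers are made up for illustration, and the arrays stand in for the per-system averages and SEMs that the function derives internally.

```python
import numpy as np

# Averaged RMSF (nm) and SEM per residue for two systems; values are made up.
avg_a = np.array([0.10, 0.12, 0.30, 0.11])     # system A, 4 residues
sem_a = np.array([0.01, 0.01, 0.03, 0.01])
avg_b = np.array([0.11, 0.25, 0.12])           # system B, 3 residues
sem_b = np.array([0.01, 0.04, 0.02])

# Aligned positions: A[0] <-> B[0], A[2] <-> B[1], A[3] <-> B[2].
indices_a = np.array([0, 2, 3])
indices_b = np.array([0, 1, 2])

# Delta is B minus A, so positive values mean B is more flexible.
delta = avg_b[indices_b] - avg_a[indices_a]
# Independent errors add in quadrature.
sem_delta = np.sqrt(sem_a[indices_a] ** 2 + sem_b[indices_b] ** 2)
```

Residue 1 of system A has no counterpart in system B, so it is simply absent from both index arrays and from the output.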

Hydrogen Bonds

mdpp.analysis.hbond

Hydrogen-bond analysis utilities.

HBondResult(time_ps, triplets, presence, count_per_frame, occupancy, method, distance_cutoff_nm, angle_cutoff_deg) dataclass

Hydrogen-bond detection results.

time_ns property

Return frame times in nanoseconds.

format_hbond_triplets(topology, triplets)

Format donor-hydrogen-acceptor triplets into readable labels.

Parameters:

Name Type Description Default
topology Topology

Trajectory topology.

required
triplets NDArray[int_]

Integer array with shape (n_hbonds, 3).

required

Returns:

Type Description
list[str]

List of labels such as "ALA1:N-H ... GLU10:OE1".

compute_hbonds(traj, *, method='baker_hubbard', exclude_water=True, periodic=True, sidechain_only=False, freq=0.1, distance_cutoff_nm=0.25, angle_cutoff_deg=120.0, timestep_ps=None, dtype=None)

Compute hydrogen bonds and per-frame counts.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
method str

Hydrogen bond method: "baker_hubbard" or "wernet_nilsson".

'baker_hubbard'
exclude_water bool

Whether to exclude hydrogen bonds that involve water molecules.

True
periodic bool

Whether to apply periodic boundary conditions.

True
sidechain_only bool

For "baker_hubbard", restrict to sidechain interactions.

False
freq float

For "baker_hubbard", minimum occupancy fraction for returned bonds.

0.1
distance_cutoff_nm float

H...A distance cutoff used for presence matrix.

0.25
angle_cutoff_deg float

D-H...A angle cutoff used for presence matrix.

120.0
timestep_ps float | None

Optional frame timestep override in ps.

None
dtype DtypeArg

Output float dtype. If None, uses the package default.

None

Returns:

Type Description
HBondResult

HBondResult containing detected bonds, occupancy, and per-frame counts.

Contacts

mdpp.analysis.contacts

Contact analysis for molecular dynamics trajectories.

ContactResult(time_ps, distances_nm, residue_pairs) dataclass

Per-frame inter-residue contact distances.

time_ns property

Return frame times in nanoseconds.

NativeContactResult(time_ps, fraction, native_pairs, cutoff_nm) dataclass

Fraction of native contacts (Q) over time.

time_ns property

Return frame times in nanoseconds.

compute_contacts(traj, *, contacts='all', scheme='closest-heavy', periodic=True, timestep_ps=None, dtype=None)

Compute inter-residue contact distances over time.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
contacts str | NDArray[int_]

Residue pairs to monitor. "all" computes all pairs; otherwise an (n_pairs, 2) integer array of residue index pairs.

'all'
scheme str

Contact scheme passed to mdtraj.compute_contacts. One of "closest-heavy", "closest", "ca", or "sidechain-heavy".

'closest-heavy'
periodic bool

Whether to apply periodic boundary conditions.

True
timestep_ps float | None

Optional frame timestep override in ps.

None
dtype DtypeArg

Output float dtype. If None, uses the package default.

None

Returns:

Type Description
ContactResult

ContactResult with per-frame distances and residue pair indices.

compute_contact_frequency(traj, *, cutoff_nm=0.45, scheme='closest-heavy', periodic=True, dtype=None)

Compute the fraction of frames each residue pair is in contact.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
cutoff_nm float

Distance threshold in nm below which a contact is counted.

0.45
scheme str

Contact scheme passed to mdtraj.compute_contacts.

'closest-heavy'
periodic bool

Whether to apply periodic boundary conditions.

True
dtype DtypeArg

Output float dtype. If None, uses the package default.

None

Returns:

Type Description
tuple[NDArray[floating], NDArray[int_]]

A tuple of (frequency, residue_pairs) where frequency has shape (n_pairs,) with values in [0, 1] and residue_pairs has shape (n_pairs, 2).

compute_native_contacts(traj, *, reference_frame=0, cutoff_nm=0.45, scheme='closest-heavy', periodic=True, timestep_ps=None, dtype=None)

Compute the fraction of native contacts Q(t) over time.

Native contacts are residue pairs that are within cutoff_nm in the reference frame. Q(t) is the fraction of those pairs that remain in contact at each frame.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
reference_frame int

Frame index defining native contacts.

0
cutoff_nm float

Distance threshold in nm for a contact.

0.45
scheme str

Contact scheme for mdtraj.compute_contacts.

'closest-heavy'
periodic bool

Whether to apply periodic boundary conditions.

True
timestep_ps float | None

Optional frame timestep override in ps.

None
dtype DtypeArg

Output float dtype. If None, uses the package default.

None

Returns:

Type Description
NativeContactResult

NativeContactResult with per-frame Q values.

Raises:

Type Description
ValueError

If reference_frame is out of range or no native contacts are found.
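
The definition of Q(t) used by compute_native_contacts reduces to two boolean comparisons once the per-frame residue-pair distances are available. The sketch below starts from a ready-made distance array; the function name `native_contact_fraction` is hypothetical, and treating a contact as "strictly below the cutoff" is an assumption.

```python
import numpy as np


def native_contact_fraction(distances_nm, reference_frame=0, cutoff_nm=0.45):
    """distances_nm: (n_frames, n_pairs) residue-pair distances."""
    native = distances_nm[reference_frame] < cutoff_nm
    if not native.any():
        raise ValueError("no native contacts found in the reference frame")
    # Fraction of native pairs still within the cutoff at each frame.
    return (distances_nm[:, native] < cutoff_nm).mean(axis=1)


distances = np.array([
    [0.30, 0.40, 0.90],   # frame 0: pairs 0 and 1 are native
    [0.35, 0.60, 0.30],   # frame 1: pair 1 broken; pair 2 is non-native
    [0.50, 0.70, 0.20],   # frame 2: both native pairs broken
])
q = native_contact_fraction(distances)
```

Pair 2 comes into contact in later frames but never contributes to Q, because only pairs in contact in the reference frame are tracked.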

Distance

mdpp.analysis.distance

Pairwise distance analysis for molecular dynamics trajectories.

Five backends are available for pairwise distance computation:

+----------+------------------+-----+----------------------------+
| Backend  | Device           | PBC | Dependency                 |
+==========+==================+=====+============================+
| mdtraj   | CPU (1 thread)   | Yes | built-in                   |
| numba    | CPU (all cores)  | No  | built-in (numba)           |
| cupy     | GPU (CUDA)       | No  | pip install cupy-cuda12x   |
| torch    | GPU (CUDA) / CPU | No  | pip install torch          |
| jax      | GPU / TPU / CPU  | No  | pip install jax[cuda12]    |
+----------+------------------+-----+----------------------------+

The GPU backends (cupy, torch, jax) use vectorised fancy-index differencing. This materialises an intermediate array of shape (n_frames, n_pairs, 3) on the device, so GPU memory must be sufficient. The Numba backend computes element-by-element with no intermediate allocation, making it the fastest at small-to-medium scales and competitive even at large scales.

Benchmark results (24-core CPU, NVIDIA GPU)::

1K frames x 100 atoms (4,950 pairs)
  numba  0.005s  2.6x    mdtraj 0.013s  1.0x

3K frames x 200 atoms (19,900 pairs)
  numba  0.021s  7.9x    mdtraj 0.166s  1.0x

3K frames x 400 atoms (79,800 pairs)
  numba  0.059s 10.4x    mdtraj 0.617s  1.0x

GPU backends approach Numba at higher pair counts where device parallelism offsets transfer overhead. Use backend="numba" as the default for non-periodic featurisation workloads.
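
The vectorised fancy-index differencing used by the GPU backends, and the intermediate array it creates, can be reproduced on the CPU with NumPy. This sketch does not apply periodic boundary conditions and is not the package's kernel; it only shows the shape and size of the temporary array for the smallest benchmark system (100 atoms, 4,950 pairs).

```python
from itertools import combinations

import numpy as np

n_frames, n_atoms = 8, 100
rng = np.random.default_rng(2)
xyz = rng.random((n_frames, n_atoms, 3), dtype=np.float32)

# All unique atom pairs: N * (N - 1) / 2 of them.
pairs = np.array(list(combinations(range(n_atoms), 2)))

# Fancy indexing materialises an (n_frames, n_pairs, 3) difference array.
diff = xyz[:, pairs[:, 0]] - xyz[:, pairs[:, 1]]
distances = np.linalg.norm(diff, axis=2)         # (n_frames, n_pairs)

# n_frames * n_pairs * 3 components * 4 bytes per float32
intermediate_bytes = diff.nbytes
```

The temporary grows linearly with both frame count and pair count, so scaling the same calculation to thousands of frames and tens of thousands of pairs is what drives the device-memory requirement mentioned above.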

DistanceResult(time_ps, distances_nm, atom_pairs) dataclass

Per-frame pairwise distances.

time_ns property

Return frame times in nanoseconds.

distances_angstrom property

Return distances in Angstrom.

compute_distances(traj, *, atom_pairs, periodic=True, backend='mdtraj', timestep_ps=None, dtype=None)

Compute pairwise distances between atom pairs over time.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
atom_pairs ArrayLike

Array of shape (n_pairs, 2) with atom index pairs.

required
periodic bool

Whether to apply periodic boundary conditions.

True
backend DistanceBackend

Distance computation backend. "mdtraj" (default, PBC-capable), "numba" (CPU-parallel), "cupy"/"torch"/"jax" (GPU-accelerated). Non-mdtraj backends do not support periodic boundary conditions.

'mdtraj'
timestep_ps float | None

Optional frame timestep override in ps.

None
dtype DtypeArg

Output float dtype. If None, uses the package default.

None

Returns:

Type Description
DistanceResult

DistanceResult with per-frame distances for each pair.

compute_minimum_distance(traj, *, group1, group2, periodic=True, backend='mdtraj', timestep_ps=None, dtype=None)

Compute the minimum distance between two atom groups per frame.

All pairwise distances between group1 and group2 atoms are computed, and the minimum per frame is returned.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
group1 str

MDTraj selection string for the first group.

required
group2 str

MDTraj selection string for the second group.

required
periodic bool

Whether to apply periodic boundary conditions.

True
backend DistanceBackend

Distance computation backend. "mdtraj" (default, PBC-capable), "numba" (CPU-parallel), "cupy"/"torch"/"jax" (GPU-accelerated). Non-mdtraj backends do not support periodic boundary conditions.

'mdtraj'
timestep_ps float | None

Optional frame timestep override in ps.

None
dtype DtypeArg

Output float dtype. If None, uses the package default.

None

Returns:

Type Description
DistanceResult

DistanceResult where distances_nm has shape (n_frames, 1) and atom_pairs contains the closest pair at frame 0.

DSSP

mdpp.analysis.dssp

Secondary structure assignment via DSSP.

DSSPResult(assignments, residue_ids, frequency, categories) dataclass

Per-frame secondary structure assignments.

Attributes:

Name Type Description
assignments NDArray[str_]

Character array of shape (n_frames, n_residues) with DSSP codes ("H", "E", "C" when simplified, or full 8-state codes otherwise).

residue_ids NDArray[int_]

Residue sequence IDs corresponding to columns.

frequency NDArray[floating]

Array of shape (n_residues, n_categories) giving the fraction of frames each residue spends in each secondary structure category.

categories list[str]

List of unique category labels matching the last axis of frequency.

compute_dssp(traj, *, simplified=True, dtype=None)

Compute per-residue secondary structure assignments across frames.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
simplified bool

If True, use 3-state classification (H=helix, E=sheet, C=coil). Otherwise use the full 8-state DSSP codes.

True
dtype DtypeArg

Output float dtype for frequency array. If None, uses the package default.

None

Returns:

Type Description
DSSPResult

DSSPResult with per-frame assignments and per-residue frequencies.
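
The relationship between the assignments, categories, and frequency fields of DSSPResult can be illustrated with a small hand-written assignment array. The alphabetical ordering of categories is an assumption made for this sketch.

```python
import numpy as np

# (n_frames, n_residues) simplified DSSP codes; values are illustrative.
assignments = np.array([
    ["H", "H", "C"],
    ["H", "E", "C"],
    ["H", "E", "C"],
    ["C", "E", "C"],
])

# Unique labels, sorted alphabetically (assumed ordering).
categories = sorted(np.unique(assignments).tolist())

# Fraction of frames each residue spends in each category:
# one column per category, shape (n_residues, n_categories).
frequency = np.stack(
    [(assignments == code).mean(axis=0) for code in categories], axis=1
)
```

Each row of frequency sums to 1, since every residue has exactly one assignment per frame.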

Decomposition

mdpp.analysis.decomposition

Dimensionality reduction and feature engineering helpers.

DistanceFeatures(values, pairs, atom_indices) dataclass

Pairwise distance features (e.g. CA-CA distances).

TorsionFeatures(values, labels) dataclass

Backbone torsion features.

PCAResult(projections, components, explained_variance_ratio, feature_mean, feature_scale, model) dataclass

Principal component analysis outputs.

TICAResult(projections, lagtime, model) dataclass

Time-lagged independent component analysis outputs.

featurize_backbone_torsions(traj, *, atom_selection='protein', sincos_embedding=True, dtype=None)

Featurize backbone phi/psi torsions.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
atom_selection str | None

Optional atom selection before featurization.

'protein'
sincos_embedding bool

If True (default), return [cos(angle), sin(angle)] columns instead of raw radian angles. This embedding handles the discontinuity at +/-pi correctly and is required for downstream PCA / TICA on circular variables. Set to False to keep raw radian angles (e.g. for Ramachandran plots). Note: this is unrelated to mdtraj's periodic argument for minimum image convention.

True
dtype DtypeArg

Output float dtype. If None, uses the package default (see :func:mdpp.set_default_dtype).

None

Returns:

Type Description
TorsionFeatures

TorsionFeatures with values and labels.

Raises:

Type Description
ValueError

If no phi/psi torsions are available.
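
The reason the sincos_embedding option matters for PCA and TICA can be seen with two torsion angles that sit on either side of the +/-pi branch cut. The helper `sincos` below is a hypothetical illustration of the [cos(angle), sin(angle)] columns described above.

```python
import numpy as np


def sincos(angle: float) -> np.ndarray:
    """Map an angle in radians to a point on the unit circle."""
    return np.array([np.cos(angle), np.sin(angle)])


# Two nearly identical conformations, 0.02 rad apart across the branch cut.
a = np.pi - 0.01
b = -np.pi + 0.01

raw_gap = abs(a - b)                                   # close to 2 * pi
embedded_gap = np.linalg.norm(sincos(a) - sincos(b))   # close to 0.02
```

In raw radians the two angles look maximally different, which would inflate variance along that feature; in the embedded space their distance matches the true angular separation.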

featurize_ca_distances(traj, *, atom_selection='name CA', backend='mdtraj', periodic=False, dtype=None)

Featurize all pairwise distances between selected atoms.

Computes the N*(N-1)/2 pairwise distances for the selected atoms at each frame, producing a feature matrix suitable for PCA or TICA.

Five backends are available (see :func:mdpp.analysis.distance._compute_pairwise_distances): "mdtraj" (default, PBC-capable, single-threaded), "numba" (CPU-parallel), and "cupy"/"torch"/"jax" (GPU-accelerated). Non-mdtraj backends do not support periodic boundary conditions.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
atom_selection str

MDTraj selection string for the atoms to include. Defaults to "name CA" for alpha-carbon distances.

'name CA'
backend DistanceBackend

Distance computation backend. Defaults to "mdtraj" for API consistency; switch to "numba" (CPU-parallel) or "cupy"/"torch"/"jax" (GPU-accelerated) explicitly when performance matters.

'mdtraj'
periodic bool

Whether to apply minimum image convention. Only effective with backend="mdtraj".

False
dtype DtypeArg

Output float dtype. If None, uses the package default (see :func:mdpp.set_default_dtype).

None

Returns:

Type Description
DistanceFeatures

DistanceFeatures with values, atom pairs, and atom indices.

Raises:

Type Description
ValueError

If the selection matches fewer than 2 atoms, or an unknown backend is requested.

compute_pca(features, *, n_components=2, standardize=True, dtype=None)

Compute PCA projection from feature vectors.

Sklearn PCA (>= 1.8) preserves input dtype (float32 or float64).

Parameters:

Name Type Description Default
features ArrayLike

Input feature matrix (n_samples, n_features).

required
n_components int

Number of principal components.

2
standardize bool

Whether to z-score features before PCA.

True
dtype DtypeArg

Output float dtype. If None, uses the package default (see :func:mdpp.set_default_dtype).

None

Returns:

Type Description
PCAResult

PCAResult containing projections and explained variance ratio.

project_pca(features, *, fitted, dtype=None)

Project new features using a previously fitted PCA.

The features are standardized using the mean and scale from the fitted PCA, then transformed using its model. This is the correct way to project a second dataset (e.g. a different system) onto the same principal component axes for direct comparison.

Parameters:

Name Type Description Default
features ArrayLike

Input feature matrix (n_samples, n_features). Must have the same number of features as the fitted PCA.

required
fitted PCAResult

PCAResult from a previous compute_pca call whose principal component axes will be used.

required
dtype DtypeArg

Output float dtype. If None, uses the package default (see :func:mdpp.set_default_dtype).

None

Returns:

Type Description
PCAResult

PCAResult with projections onto the fitted PCA axes. The components, explained_variance_ratio, feature_mean, feature_scale, and model are shared from fitted.

Raises:

Type Description
ValueError

If the feature dimension does not match the fitted PCA.
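
The projection step performed by project_pca, reusing the stored mean and scale rather than re-standardising the new data, can be sketched with a plain NumPy SVD. This stands in for the scikit-learn model that the package wraps and is not its implementation; the variable names mirror the PCAResult fields.

```python
import numpy as np

rng = np.random.default_rng(3)
train = rng.normal(size=(100, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
other = rng.normal(size=(40, 5))

# "Fit": store the training mean/scale and the principal axes.
feature_mean = train.mean(axis=0)
feature_scale = train.std(axis=0)
z_train = (train - feature_mean) / feature_scale
_, _, vt = np.linalg.svd(z_train, full_matrices=False)
components = vt[:2]                               # (n_components, n_features)

# "Project": reuse the stored mean/scale; do not re-standardise on `other`.
z_other = (other - feature_mean) / feature_scale
projections = z_other @ components.T              # (n_samples, n_components)
```

Standardising `other` with its own statistics would place it in a different coordinate system, so the two sets of projections would no longer be directly comparable.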

compute_tica(features, *, lagtime, n_components=2, dtype=None)

Compute TICA projection from feature vectors.

Deeptime upcasts to float64 internally for covariance estimation, so the input dtype does not affect numerical accuracy. The dtype parameter controls the dtype of the output arrays only.

Parameters:

Name Type Description Default
features ArrayLike

Input feature matrix (n_samples, n_features).

required
lagtime int

Lag time in frames.

required
n_components int

Number of independent components.

2
dtype DtypeArg

Output float dtype. If None, uses the package default (see :func:mdpp.set_default_dtype).

None

Returns:

Type Description
TICAResult

TICAResult containing projected coordinates and fitted model.

Free Energy Surface

mdpp.analysis.fes

Free-energy surface computation utilities.

FES2DResult(free_energy_kj_mol, probability_density, x_edges, y_edges, observed_mask, temperature_k) dataclass

2D free-energy surface derived from a histogram.

x_centers property

Return x-axis bin centers.

y_centers property

Return y-axis bin centers.

compute_fes_2d(x_values, y_values, *, bins=100, value_range=None, temperature_k=DEFAULT_TEMPERATURE_K, min_probability=1e-12, mask_unsampled=True, dtype=None)

Compute a 2D free-energy surface from two collective variables.

np.histogram2d returns float64 probability density regardless of input dtype, so the log/energy arithmetic naturally runs in float64. Output arrays are cast to dtype.

Parameters:

Name Type Description Default
x_values ArrayLike

Samples for CV1.

required
y_values ArrayLike

Samples for CV2.

required
bins BinsType

Histogram bin count.

100
value_range RangeType

Optional ((x_min, x_max), (y_min, y_max)) range.

None
temperature_k float

Temperature in Kelvin.

DEFAULT_TEMPERATURE_K
min_probability float

Lower bound to avoid log(0).

1e-12
mask_unsampled bool

If True, unsampled bins are set to NaN.

True
dtype DtypeArg

Output float dtype. If None, uses the package default.

None

Returns:

Type Description
FES2DResult

FES2DResult with free energy shifted so its minimum is 0.
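
The histogram-to-free-energy conversion follows F = -kT ln P. The sketch below is one way to write it with np.histogram2d; the function name `fes_2d`, the 300 K temperature, and the use of density=True are assumptions for illustration, since the value of DEFAULT_TEMPERATURE_K is not stated here.

```python
import numpy as np

KB_KJ_MOL_K = 8.314462618e-3        # molar gas constant in kJ/(mol K)


def fes_2d(x, y, bins=20, temperature_k=300.0, min_probability=1e-12):
    density, x_edges, y_edges = np.histogram2d(x, y, bins=bins, density=True)
    observed = density > 0
    # F = -kT ln P, with P clipped from below to avoid log(0).
    free_energy = -KB_KJ_MOL_K * temperature_k * np.log(
        np.clip(density, min_probability, None)
    )
    free_energy -= free_energy[observed].min()     # global minimum at 0
    free_energy[~observed] = np.nan                # mask unsampled bins
    return free_energy, x_edges, y_edges, observed


rng = np.random.default_rng(4)
x, y = rng.normal(size=(2, 5000))
fes, x_edges, y_edges, observed = fes_2d(x, y)
x_centers = 0.5 * (x_edges[:-1] + x_edges[1:])     # midpoints of the bin edges
```

Masking unsampled bins with NaN keeps them out of contour plots, whereas leaving the clipped value in place would draw an artificial plateau at -kT ln(min_probability).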

compute_fes_from_projection(projection, *, x_index=0, y_index=1, bins=100, value_range=None, temperature_k=DEFAULT_TEMPERATURE_K, min_probability=1e-12, mask_unsampled=True, dtype=None)

Compute a 2D FES from a projection matrix.

Parameters:

Name Type Description Default
projection ArrayLike

Matrix (n_samples, n_components).

required
x_index int

Component index for x-axis.

0
y_index int

Component index for y-axis.

1
bins BinsType

Histogram bin count.

100
value_range RangeType

Optional histogram range.

None
temperature_k float

Temperature in Kelvin.

DEFAULT_TEMPERATURE_K
min_probability float

Lower bound to avoid log(0).

1e-12
mask_unsampled bool

If True, unsampled bins are set to NaN.

True
dtype DtypeArg

Output float dtype. If None, uses the package default.

None

Returns:

Type Description
FES2DResult

FES2DResult computed from selected projection components.

Clustering

mdpp.analysis.clustering

Conformational clustering from RMSD matrices.

Each clustering algorithm is a frozen dataclass configured at construction time and invoked as a callable::

result = Gromos(cutoff_nm=0.2)(rmsd_matrix)
result = DBSCAN(eps=0.15, min_samples=5)(rmsd_matrix)
result = KMeans(n_clusters=10)(pca.projections)

RMSDMatrixResult(rmsd_matrix_nm, atom_indices) dataclass

Pairwise RMSD matrix between trajectory frames.

rmsd_matrix_angstrom property

Return the RMSD matrix in Angstrom.

Note

Each access allocates a new (n_frames, n_frames) array. Cache the result in a local variable if you need it more than once -- at 120k frames a float32 matrix is ~58 GB (~54 GiB) per call.

ClusteringResult(labels, n_clusters, medoid_frames) dataclass

Conformational clustering output.

FeatureClusteringResult(labels, n_clusters, cluster_centers, medoid_frames, inertia) dataclass

Clustering result from feature-vector-based methods.

Gromos(cutoff_nm=0.15) dataclass

GROMOS clustering (Daura et al. 1999).

Greedy largest-cluster-first assignment via Numba-JIT kernels. O(n) auxiliary memory -- no copies of the RMSD matrix.

Parameters:

Name Type Description Default
cutoff_nm float

Neighbour cutoff in nm.

0.15

Example::

result = Gromos(cutoff_nm=0.2)(rmsd_matrix)

__call__(rmsd_matrix)

Cluster rmsd_matrix and return a :class:ClusteringResult.
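
The greedy largest-cluster-first scheme can be sketched in NumPy as below. Unlike the Numba kernels described above, this version builds a full boolean neighbour mask, so it does not have the O(n) memory property. The function name `gromos_sketch` and the strict "less than cutoff" comparison are assumptions.

```python
import numpy as np


def gromos_sketch(rmsd: np.ndarray, cutoff_nm: float):
    """Greedy largest-cluster-first assignment on a square RMSD matrix."""
    n = rmsd.shape[0]
    labels = np.full(n, -1)
    unassigned = np.ones(n, dtype=bool)
    neighbours = rmsd < cutoff_nm        # O(n^2) mask: fine for a sketch only
    medoids = []
    while unassigned.any():
        # Count the still-unassigned neighbours of each unassigned frame.
        counts = (neighbours & unassigned).sum(axis=1)
        counts[~unassigned] = -1
        centre = int(counts.argmax())
        members = neighbours[centre] & unassigned
        labels[members] = len(medoids)
        medoids.append(centre)
        unassigned &= ~members
    return labels, medoids


# Toy 1D "conformations": pairwise distance stands in for RMSD.
positions = np.array([0.0, 0.05, 0.1, 1.0, 1.05])
rmsd = np.abs(positions[:, None] - positions[None, :])
labels, medoids = gromos_sketch(rmsd, cutoff_nm=0.12)
```

Clusters come out ordered by size, and every frame is assigned because each frame is always its own neighbour.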

Hierarchical(linkage_method='average', distance_threshold=0.15, n_clusters=None) dataclass

Agglomerative hierarchical clustering (scipy).

Uses distance_threshold by default. Set n_clusters to use a fixed cluster count instead (overrides distance_threshold).

Note

Scipy builds an O(n^2) float64 condensed distance matrix internally. At 120k frames this is ~57 GB.

Parameters:

Name Type Description Default
linkage_method str

"average", "complete", or "single". "ward" is not valid for RMSD matrices.

'average'
distance_threshold float

Distance cutoff in nm.

0.15
n_clusters int | None

Fixed cluster count (overrides distance_threshold).

None

Example::

result = Hierarchical(linkage_method="average", distance_threshold=0.2)(rmsd_matrix)

__call__(rmsd_matrix)

Cluster rmsd_matrix and return a :class:ClusteringResult.

DBSCAN(eps=0.15, min_samples=5, backend='numba') dataclass

DBSCAN density-based clustering.

Two backends:

  • "numba" (default) -- custom Numba-JIT kernel. Reuses the parallel neighbour-count kernel from GROMOS and a sequential BFS for label assignment. O(n) auxiliary memory, no copies.
  • "sklearn" -- official scikit-learn DBSCAN with metric="precomputed".

Noise frames receive label -1.

Parameters:

Name Type Description Default
eps float

Neighbourhood radius in nm.

0.15
min_samples int

Minimum neighbours (including self) for a core point.

5
backend Literal['numba', 'sklearn']

"numba" or "sklearn".

'numba'

Example::

result = DBSCAN(eps=0.15, min_samples=5)(rmsd_matrix)
result = DBSCAN(eps=0.15, backend="sklearn")(rmsd_matrix)

__call__(rmsd_matrix)

Cluster rmsd_matrix and return a :class:ClusteringResult.
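
A compact version of DBSCAN on a precomputed distance matrix, with a neighbour count followed by breadth-first expansion, might look like the sketch below. It is written in plain Python and NumPy rather than Numba, the function name `dbscan_precomputed` is hypothetical, and the inclusive "distance <= eps" comparison follows the common convention rather than anything stated above.

```python
from collections import deque

import numpy as np


def dbscan_precomputed(dist: np.ndarray, eps: float, min_samples: int) -> np.ndarray:
    n = dist.shape[0]
    # Neighbour counts include the point itself (dist[i, i] == 0).
    is_core = (dist <= eps).sum(axis=1) >= min_samples
    labels = np.full(n, -1)                  # -1 marks noise
    cluster = 0
    for seed in range(n):
        if labels[seed] != -1 or not is_core[seed]:
            continue
        labels[seed] = cluster
        queue = deque([seed])
        while queue:                         # breadth-first expansion
            point = queue.popleft()
            for nb in np.flatnonzero(dist[point] <= eps):
                if labels[nb] == -1:
                    labels[nb] = cluster
                    if is_core[nb]:          # only core points keep expanding
                        queue.append(nb)
        cluster += 1
    return labels


positions = np.array([0.0, 0.1, 0.2, 5.0, 9.0, 9.1, 9.2])
dist = np.abs(positions[:, None] - positions[None, :])
labels = dbscan_precomputed(dist, eps=0.15, min_samples=2)
```

The isolated point at 5.0 has only itself within eps, so it is never a core point and keeps the noise label -1.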

HDBSCAN(min_cluster_size=5, min_samples=5) dataclass

HDBSCAN hierarchical density-based clustering (sklearn >= 1.3).

Noise frames receive label -1.

Parameters:

Name Type Description Default
min_cluster_size int

Minimum number of frames in a cluster.

5
min_samples int

Number of neighbours for core-point estimation.

5

Example::

result = HDBSCAN(min_cluster_size=50, min_samples=5)(rmsd_matrix)

__call__(rmsd_matrix)

Cluster rmsd_matrix and return a :class:ClusteringResult.

KMeans(n_clusters=10, random_state=42, dtype=None) dataclass

K-Means clustering (scikit-learn).

Parameters:

Name Type Description Default
n_clusters int

Number of clusters.

10
random_state int | None

Seed for centroid initialisation. Defaults to 42 for reproducible runs across sessions. Pass None to let scikit-learn pick a non-deterministic seed.

42
dtype DtypeArg

Output float dtype for cluster_centers.

None

Example::

result = KMeans(n_clusters=10)(pca.projections)
result = KMeans(n_clusters=10, random_state=None)(pca.projections)

__call__(features)

Cluster features and return a :class:FeatureClusteringResult.

MiniBatchKMeans(n_clusters=10, batch_size=1024, random_state=42, dtype=None) dataclass

Mini-Batch K-Means clustering (scikit-learn).

Parameters:

Name Type Description Default
n_clusters int

Number of clusters.

10
batch_size int

Mini-batch size.

1024
random_state int | None

Seed for centroid initialisation and mini-batch sampling. Defaults to 42 for reproducible runs across sessions. Pass None to let scikit-learn pick a non-deterministic seed.

42
dtype DtypeArg

Output float dtype for cluster_centers.

None

Example::

result = MiniBatchKMeans(n_clusters=10, batch_size=1024)(pca.projections)
result = MiniBatchKMeans(n_clusters=10, random_state=None)(pca.projections)

__call__(features)

Cluster features and return a :class:FeatureClusteringResult.

RegularSpace(dmin=0.5, dtype=None) dataclass

Regular-space clustering (deeptime).

The number of clusters is determined by dmin, not specified upfront.

Parameters:

Name Type Description Default
dmin float

Minimum distance between cluster centres.

0.5
dtype DtypeArg

Output float dtype for cluster_centers.

None

Example::

result = RegularSpace(dmin=0.5)(pca.projections)

__call__(features)

Cluster features and return a :class:FeatureClusteringResult.
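
The way dmin determines the number of clusters can be seen from a direct implementation of the regular-space rule: a point becomes a new centre only if it lies farther than dmin from every existing centre. This is a sketch of the general algorithm, not deeptime's code, and the function name `regular_space_centres` is hypothetical.

```python
import numpy as np


def regular_space_centres(features: np.ndarray, dmin: float) -> np.ndarray:
    """Keep a point as a new centre if it is farther than dmin from all centres."""
    centres = [features[0]]
    for point in features[1:]:
        if np.linalg.norm(np.asarray(centres) - point, axis=1).min() > dmin:
            centres.append(point)
    return np.asarray(centres)


features = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.2, 0.1], [0.0, 2.0]])
centres = regular_space_centres(features, dmin=0.5)

# Assign every sample to its nearest centre.
labels = np.linalg.norm(features[:, None] - centres[None], axis=2).argmin(axis=1)
```

Lowering dmin yields more centres and a finer discretisation, so it is usually tuned against the scale of the projected coordinates.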

compute_rmsd_matrix(traj, *, atom_selection='backbone', backend='mdtraj', dtype=None)

Compute an all-vs-all RMSD matrix between trajectory frames.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
atom_selection str

Atoms used for RMSD calculation.

'backbone'
backend RMSDBackend

Computation backend. Defaults to "mdtraj" for API consistency with other analysis functions; switch to a faster backend explicitly when performance matters.

  • "mdtraj" (default) -- mdtraj precentered RMSD loop (CPU).
  • "numba" -- Numba-parallel QCP kernel (CPU, 50-200x faster).
  • "torch" -- PyTorch einsum + QCP (CUDA/CPU, float32).
  • "jax" -- JAX einsum + QCP (GPU/TPU/CPU, float32).
  • "cupy" -- CuPy einsum + QCP (CUDA, float32).

'mdtraj'
dtype DtypeArg

Output float dtype. If None, uses the package default.

None

Returns:

Type Description
RMSDMatrixResult

RMSDMatrixResult with a symmetric (n_frames, n_frames) matrix.

Raises:

Type Description
ValueError

If an unsupported backend is specified.

ImportError

If the requested backend package is not installed.

Memory note

Every backend returns its native float32 output matrix (the numba kernel uses float64 accumulators internally but stores float32 in the result buffer; GPU kernels compute in float32 end-to-end). This wrapper casts with copy=False so when the resolved dtype is float32 (the package default) there is no second copy of the (n_frames, n_frames) matrix. For a 120k-frame trajectory this saves ~115 GB of peak RAM versus the old "cast to float64 for the Protocol contract, then cast back" path. Passing dtype=np.float64 still forces a one-time upcast.
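
The memory figures quoted for 120k-frame trajectories in this section can be checked with plain arithmetic. The snippet below prints each size in both decimal gigabytes and binary gibibytes, since a full float32 matrix is 57.6 GB, which is the same quantity as 53.6 GiB.

```python
n_frames = 120_000

full_f32 = n_frames**2 * 4                           # square float32 matrix
full_f64 = n_frames**2 * 8                           # square float64 matrix
condensed_f64 = n_frames * (n_frames - 1) // 2 * 8   # scipy condensed form

for name, nbytes in [
    ("full float32", full_f32),
    ("full float64", full_f64),
    ("condensed float64", condensed_f64),
]:
    print(f"{name:>18}: {nbytes / 1e9:6.1f} GB  ({nbytes / 2**30:6.1f} GiB)")
```

The float64 square matrix is twice the float32 one, which is why avoiding a round trip through float64 roughly halves the peak footprint of the RMSD matrix.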