Analysis API Reference¶

Metrics¶

`mdpp.analysis.metrics` ¶

Core structure and dynamics metrics computed from trajectories.

`RMSDResult(time_ps, rmsd_nm, atom_indices)` `dataclass` ¶

RMSD time series.

`time_ns` `property` ¶

Return frame times in nanoseconds.

`rmsd_angstrom` `property` ¶

Return RMSD values in Angstrom.

`RMSFResult(rmsf_nm, atom_indices, residue_ids)` `dataclass` ¶

Per-atom RMSF values.

`rmsf_angstrom` `property` ¶

Return RMSF values in Angstrom.

`DeltaRMSFResult(delta_rmsf_nm, residue_ids, sem_nm)` `dataclass` ¶

Per-residue RMSF difference between two systems (B minus A).

Averaging is done in MSF (mean-square fluctuation) space: per-residue RMSF^2 values are averaged across replicas, then the square root is taken. The delta is computed on the resulting average RMSF values.

The SEM on each system's average RMSF is propagated through the sqrt transform, then the two independent SEMs are combined in quadrature to give the SEM on the delta.

`delta_rmsf_angstrom` `property` ¶

Return delta-RMSF values in Angstrom.

`sem_angstrom` `property` ¶

Return SEM on the delta-RMSF in Angstrom.

`DCCMResult(correlation, atom_indices, residue_ids)` `dataclass` ¶

Dynamic cross-correlation matrix.

`SASAResult(time_ps, values_nm2, atom_indices, mode, residue_ids)` `dataclass` ¶

Solvent accessible surface area.

`time_ns` `property` ¶

Return frame times in nanoseconds.

`total_nm2` `property` ¶

Return summed SASA for each frame.

`RadiusOfGyrationResult(time_ps, radius_gyration_nm, atom_indices)` `dataclass` ¶

Radius of gyration time series.

`time_ns` `property` ¶

Return frame times in nanoseconds.

`radius_gyration_angstrom` `property` ¶

Return radius of gyration values in Angstrom.

`compute_rmsd(traj, *, atom_selection='backbone', reference_frame=0, timestep_ps=None, dtype=None)` ¶

Compute RMSD over time.

The trajectory should be aligned before calling this function (see `mdpp.core.trajectory.align_trajectory`).

Parameters:

Name	Type	Description	Default
`traj`	`Trajectory`	Input trajectory (pre-aligned).	required
`atom_selection`	`str`	Atoms used in RMSD calculation.	`'backbone'`
`reference_frame`	`int`	Reference frame index for RMSD.	`0`
`timestep_ps`	`float \| None`	Optional time step in ps to override trajectory time.	`None`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default (see `mdpp.set_default_dtype`).	`None`

Returns:

Type	Description
`RMSDResult`	RMSDResult containing time and RMSD.

`compute_rmsf(traj, *, atom_selection='name CA', dtype=None)` ¶

Compute per-atom RMSF from positional fluctuations.

The trajectory should be aligned before calling this function (see `mdpp.core.trajectory.align_trajectory`).

Parameters:

Name	Type	Description	Default
`traj`	`Trajectory`	Input trajectory (pre-aligned).	required
`atom_selection`	`str`	Atoms included in RMSF calculation.	`'name CA'`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default (see `mdpp.set_default_dtype`).	`None`

Returns:

Type	Description
`RMSFResult`	RMSFResult with atom and residue mapping.

`compute_dccm(traj, *, atom_selection='name CA', backend='numpy', dtype=None)` ¶

Compute dynamic cross-correlation matrix (DCCM).

The trajectory should be aligned before calling this function (see `mdpp.core.trajectory.align_trajectory`).

The covariance is dispatched through a pluggable backend registry (see `mdpp.analysis._backends._dccm`). The default `"numpy"` backend uses BLAS GEMM via reshape + matmul, which is multi-threaded out of the box. `np.einsum`, by contrast, falls back to a single-threaded contraction loop and becomes the bottleneck for any non-trivial trajectory. Other backends (`"numba"`, `"torch"`, `"jax"`, `"cupy"`) are available for users who want explicit CPU parallelism or GPU acceleration.

mdtraj stores coordinates in float32; the covariance kernel runs in the backend's native dtype and the wrapper casts to the resolved dtype (float32 by default). Float32 precision is sufficient: empirical tests show a maximum correlation error of ~4e-6 relative to float64, well below any physically meaningful threshold.
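As an illustration of what the matrix contains, the sketch below evaluates the standard DCCM definition, C_ij = <dr_i . dr_j> / sqrt(<dr_i^2> <dr_j^2>), with plain Python lists. It is not the library's kernel: it assumes mdpp follows this standard definition, and the helper name `dccm` and the toy coordinates are hypothetical. The real backends obtain the same covariance through reshape + matmul.

```python
import math

def dccm(coords):
    """Return the cross-correlation matrix for pre-aligned coordinates.

    coords[f][i] is the (x, y, z) position of atom i in frame f.
    """
    n_frames, n_atoms = len(coords), len(coords[0])
    # Mean position of every atom over the trajectory.
    mean = [
        [sum(coords[f][i][k] for f in range(n_frames)) / n_frames for k in range(3)]
        for i in range(n_atoms)
    ]
    # Displacement of every atom from its mean position, per frame.
    disp = [
        [[coords[f][i][k] - mean[i][k] for k in range(3)] for i in range(n_atoms)]
        for f in range(n_frames)
    ]

    def cov(i, j):
        # Time average of the dot product of the two displacement vectors.
        return sum(
            sum(a * b for a, b in zip(disp[f][i], disp[f][j]))
            for f in range(n_frames)
        ) / n_frames

    var = [cov(i, i) for i in range(n_atoms)]
    return [
        [cov(i, j) / math.sqrt(var[i] * var[j]) for j in range(n_atoms)]
        for i in range(n_atoms)
    ]

# Atoms 0 and 1 move together along x; atom 2 moves the opposite way.
frames = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (2.0, 1.0, 0.0)],
    [(1.0, 0.0, 0.0), (6.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
    [(2.0, 0.0, 0.0), (7.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
]
c = dccm(frames)
```

In this toy input `c[0][1]` is close to +1 (correlated motion) and `c[0][2]` is close to -1 (anti-correlated motion). An atom that never moves would have zero variance and needs separate handling, which the sketch omits.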

Parameters:

Name	Type	Description	Default
`traj`	`Trajectory`	Input trajectory (pre-aligned).	required
`atom_selection`	`str`	Atoms used in DCCM.	`'name CA'`
`backend`	`DCCMBackend`	Compute backend. One of `"numpy"` (default), `"numba"`, `"cupy"`, `"torch"`, or `"jax"`.	`'numpy'`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default (see `mdpp.set_default_dtype`).	`None`

Returns:

Type	Description
`DCCMResult`	DCCMResult with correlation matrix and residue IDs.

`compute_sasa(traj, *, atom_selection='protein', mode='residue', probe_radius=0.14, n_sphere_points=960, timestep_ps=None, dtype=None)` ¶

Compute solvent-accessible surface area via Shrake-Rupley.

Parameters:

Name	Type	Description	Default
`traj`	`Trajectory`	Input trajectory.	required
`atom_selection`	`str \| None`	Optional atom selection before SASA.	`'protein'`
`mode`	`str`	Either `"atom"` or `"residue"`.	`'residue'`
`probe_radius`	`float`	Probe radius in nm.	`0.14`
`n_sphere_points`	`int`	Number of sphere points per atom.	`960`
`timestep_ps`	`float \| None`	Optional timestep override in ps.	`None`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default (see `mdpp.set_default_dtype`).	`None`

Returns:

Type	Description
`SASAResult`	SASAResult containing frame-resolved values.

`compute_radius_of_gyration(traj, *, atom_selection='protein', timestep_ps=None, dtype=None)` ¶

Compute radius of gyration over time.

Parameters:

Name	Type	Description	Default
`traj`	`Trajectory`	Input trajectory.	required
`atom_selection`	`str`	Atom selection used to compute radius of gyration.	`'protein'`
`timestep_ps`	`float \| None`	Optional timestep override in ps.	`None`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default (see `mdpp.set_default_dtype`).	`None`

Returns:

Type	Description
`RadiusOfGyrationResult`	RadiusOfGyrationResult with per-frame values.

`average_rmsf_with_sem(results, *, dtype=None)` ¶

Average RMSF across replicas in MSF space and propagate SEM.

Parameters:

Name	Type	Description	Default
`results`	`list[RMSFResult]`	RMSF results from each replica.	required
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default (see `mdpp.set_default_dtype`).	`None`

Returns:

Type	Description
`tuple[NDArray[floating], NDArray[floating] \| None]`	A tuple of `(avg_rmsf_nm, sem_rmsf_nm)`. The SEM is `None` when fewer than 2 replicas are provided.

The SEM on MSF is propagated through the sqrt transform: sem_rmsf = sem_msf / (2 * avg_rmsf).
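The averaging and error propagation described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the helper name `average_rmsf_msf_space` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) for the SEM on MSF is an assumption.

```python
import math
import statistics

def average_rmsf_msf_space(rmsf_per_replica):
    """Average per-residue RMSF across replicas in MSF space (illustrative).

    rmsf_per_replica: list of equal-length lists (nm), one list per replica.
    Returns (avg_rmsf, sem_rmsf); sem_rmsf is None with fewer than 2 replicas.
    """
    n_rep = len(rmsf_per_replica)
    n_res = len(rmsf_per_replica[0])
    avg_rmsf, sem_rmsf = [], []
    for i in range(n_res):
        msf = [rep[i] ** 2 for rep in rmsf_per_replica]  # RMSF^2 per replica
        avg = math.sqrt(statistics.fmean(msf))           # sqrt(mean(RMSF^2))
        avg_rmsf.append(avg)
        if n_rep >= 2:
            # Assumption: SEM uses the sample standard deviation (n - 1).
            sem_msf = statistics.stdev(msf) / math.sqrt(n_rep)
            # d sqrt(x) / dx = 1 / (2 sqrt(x)), hence sem_msf / (2 * avg_rmsf).
            sem_rmsf.append(sem_msf / (2.0 * avg))
    return avg_rmsf, (sem_rmsf if n_rep >= 2 else None)

# Two replicas, two residues (values in nm).
avg, sem = average_rmsf_msf_space([[0.10, 0.20], [0.14, 0.20]])
single_avg, single_sem = average_rmsf_msf_space([[0.10, 0.20]])
```

Averaging in MSF space weights large fluctuations more heavily than a plain mean of RMSF values would, so `avg[0]` here is about 0.1217 nm rather than 0.12 nm.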

`compute_delta_rmsf(results_a, results_b, *, indices_a=None, indices_b=None, residue_ids=None, dtype=None)` ¶

Compute per-residue RMSF difference between two systems.

The RMSF for each system is first averaged across replicas in MSF space (sqrt(mean(RMSF^2))), then the delta is taken as B minus A. Positive values indicate that system B is more flexible.

The SEM on each system's average RMSF is propagated through the sqrt transform (sem_rmsf = sem_msf / (2 * avg_rmsf)), then the two independent SEMs are combined in quadrature to give the SEM on the delta. At least 2 replicas per system are required for SEM; otherwise DeltaRMSFResult.sem_nm is None.

For systems with identical residue counts, indices_a and indices_b may be omitted and the comparison is element-wise.

For systems with different sequences, supply aligned index arrays so that indices_a[i] and indices_b[i] point to the same structural position in each system. The caller is responsible for generating these mappings (e.g. from a multiple sequence alignment).
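A small sketch of the aligned-index comparison and the quadrature step, assuming the per-system averages and SEMs have already been computed. The helper name `delta_rmsf` and its error handling are hypothetical and only mirror the behaviour described above; requiring both index arrays together is an assumption.

```python
import math

def delta_rmsf(avg_a, sem_a, avg_b, sem_b, indices_a=None, indices_b=None):
    """Return (delta, sem) for B minus A at aligned positions (illustrative)."""
    if indices_a is None and indices_b is None:
        if len(avg_a) != len(avg_b):
            raise ValueError("unindexed systems must have the same residue count")
        indices_a = indices_b = range(len(avg_a))
    elif indices_a is None or indices_b is None or len(indices_a) != len(indices_b):
        raise ValueError("indices_a and indices_b must both be given, with equal length")
    pairs = list(zip(indices_a, indices_b))
    # Positive values mean system B is more flexible at that position.
    delta = [avg_b[j] - avg_a[i] for i, j in pairs]
    # Independent errors add in quadrature: sqrt(sem_a^2 + sem_b^2).
    sem = [math.hypot(sem_a[i], sem_b[j]) for i, j in pairs]
    return delta, sem

# System B has one extra residue (index 1) with no counterpart in system A.
delta, sem = delta_rmsf(
    avg_a=[0.10, 0.20, 0.30], sem_a=[0.03, 0.00, 0.00],
    avg_b=[0.15, 0.90, 0.20, 0.25], sem_b=[0.04, 0.00, 0.00, 0.00],
    indices_a=[0, 1, 2], indices_b=[0, 2, 3],
)
```

The inserted residue in system B is skipped, so the result has three aligned positions.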

Parameters:

Name	Type	Description	Default
`results_a`	`list[RMSFResult]`	RMSF results for system A (one per replica).	required
`results_b`	`list[RMSFResult]`	RMSF results for system B (one per replica).	required
`indices_a`	`NDArray[int_] \| None`	Optional 0-based residue indices into system A's RMSF array at aligned positions. Must have the same length as `indices_b`.	`None`
`indices_b`	`NDArray[int_] \| None`	Optional 0-based residue indices into system B's RMSF array at aligned positions.	`None`
`residue_ids`	`NDArray[int_] \| None`	Optional residue IDs for the x-axis of the resulting delta-RMSF (e.g. a reference sequence numbering). When `None` and indices are not provided, residue IDs are taken from `results_a[0]`. When `None` and indices are provided, residue IDs are taken from `results_a[0]` at the positions given by `indices_a`.	`None`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default (see `mdpp.set_default_dtype`).	`None`

Returns:

Type	Description
`DeltaRMSFResult`	DeltaRMSFResult with the per-residue difference and SEM.

Raises:

Type	Description
`ValueError`	If input lists are empty, replicas within a system have inconsistent lengths, index arrays differ in length, or unindexed systems have different residue counts.

Hydrogen Bonds¶

`mdpp.analysis.hbond` ¶

Hydrogen-bond analysis utilities.

`HBondResult(time_ps, triplets, presence, count_per_frame, occupancy, method, distance_cutoff_nm, angle_cutoff_deg)` `dataclass` ¶

Hydrogen-bond detection results.

`time_ns` `property` ¶

Return frame times in nanoseconds.

`format_hbond_triplets(topology, triplets)` ¶

Format donor-hydrogen-acceptor triplets into readable labels.

Parameters:

Name	Type	Description	Default
`topology`	`Topology`	Trajectory topology.	required
`triplets`	`NDArray[int_]`	Integer array with shape `(n_hbonds, 3)`.	required

Returns:

Type	Description
`list[str]`	List of labels such as `"ALA1:N-H ... GLU10:OE1"`.

`compute_hbonds(traj, *, method='baker_hubbard', exclude_water=True, periodic=True, sidechain_only=False, freq=0.1, distance_cutoff_nm=0.25, angle_cutoff_deg=120.0, timestep_ps=None, dtype=None)` ¶

Compute hydrogen bonds and per-frame counts.

Parameters:

Name	Type	Description	Default
`traj`	`Trajectory`	Input trajectory.	required
`method`	`str`	Hydrogen bond method: `"baker_hubbard"` or `"wernet_nilsson"`.	`'baker_hubbard'`
`exclude_water`	`bool`	Whether to ignore water-mediated hydrogen bonds.	`True`
`periodic`	`bool`	Whether to apply periodic boundary conditions.	`True`
`sidechain_only`	`bool`	For `"baker_hubbard"`, restrict to sidechain interactions.	`False`
`freq`	`float`	For `"baker_hubbard"`, minimum occupancy fraction for returned bonds.	`0.1`
`distance_cutoff_nm`	`float`	H...A distance cutoff used for presence matrix.	`0.25`
`angle_cutoff_deg`	`float`	D-H...A angle cutoff used for presence matrix.	`120.0`
`timestep_ps`	`float \| None`	Optional frame timestep override in ps.	`None`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default.	`None`

Returns:

Type	Description
`HBondResult`	HBondResult containing detected bonds, occupancy, and per-frame counts.

Contacts¶

`mdpp.analysis.contacts` ¶

Contact analysis for molecular dynamics trajectories.

`ContactResult(time_ps, distances_nm, residue_pairs)` `dataclass` ¶

Per-frame inter-residue contact distances.

`time_ns` `property` ¶

Return frame times in nanoseconds.

`NativeContactResult(time_ps, fraction, native_pairs, cutoff_nm)` `dataclass` ¶

Fraction of native contacts (Q) over time.

`time_ns` `property` ¶

Return frame times in nanoseconds.

`compute_contacts(traj, *, contacts='all', scheme='closest-heavy', periodic=True, timestep_ps=None, dtype=None)` ¶

Compute inter-residue contact distances over time.

Parameters:

Name	Type	Description	Default
`traj`	`Trajectory`	Input trajectory.	required
`contacts`	`str \| NDArray[int_]`	Residue pairs to monitor. `"all"` computes all pairs; otherwise an `(n_pairs, 2)` integer array of residue index pairs.	`'all'`
`scheme`	`str`	Contact scheme passed to `mdtraj.compute_contacts`. One of `"closest-heavy"`, `"closest"`, `"ca"`, or `"sidechain-heavy"`.	`'closest-heavy'`
`periodic`	`bool`	Whether to apply periodic boundary conditions.	`True`
`timestep_ps`	`float \| None`	Optional frame timestep override in ps.	`None`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default.	`None`

Returns:

Type	Description
`ContactResult`	ContactResult with per-frame distances and residue pair indices.

`compute_contact_frequency(traj, *, cutoff_nm=0.45, scheme='closest-heavy', periodic=True, dtype=None)` ¶

Compute the fraction of frames each residue pair is in contact.

Parameters:

Name	Type	Description	Default
`traj`	`Trajectory`	Input trajectory.	required
`cutoff_nm`	`float`	Distance threshold in nm below which a contact is counted.	`0.45`
`scheme`	`str`	Contact scheme passed to `mdtraj.compute_contacts`.	`'closest-heavy'`
`periodic`	`bool`	Whether to apply periodic boundary conditions.	`True`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default.	`None`

Returns:

Type	Description
`tuple[NDArray[floating], NDArray[int_]]`	A tuple of `(frequency, residue_pairs)` where `frequency` has shape `(n_pairs,)` with values in `[0, 1]` and `residue_pairs` has shape `(n_pairs, 2)`.

`compute_native_contacts(traj, *, reference_frame=0, cutoff_nm=0.45, scheme='closest-heavy', periodic=True, timestep_ps=None, dtype=None)` ¶

Compute the fraction of native contacts Q(t) over time.

Native contacts are residue pairs that are within cutoff_nm in the reference frame. Q(t) is the fraction of those pairs that remain in contact at each frame.
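The definition of Q(t) can be sketched on a precomputed distance table. The helper name `native_contact_fraction` is hypothetical, and treating "within" as a strict `<` comparison is an assumption; in the library the distances would come from `mdtraj.compute_contacts`.

```python
def native_contact_fraction(distances, reference_frame=0, cutoff_nm=0.45):
    """Return (q, native_pairs) from distances[f][p] in nm (illustrative)."""
    if not 0 <= reference_frame < len(distances):
        raise ValueError("reference_frame is out of range")
    # Native contacts: pairs closer than the cutoff in the reference frame.
    native = [p for p, d in enumerate(distances[reference_frame]) if d < cutoff_nm]
    if not native:
        raise ValueError("no native contacts found")
    # Q(t): fraction of native pairs still in contact at each frame.
    q = [sum(frame[p] < cutoff_nm for p in native) / len(native) for frame in distances]
    return q, native

# Three frames, three residue pairs (nm).
distances = [
    [0.30, 0.40, 0.90],
    [0.30, 0.60, 0.20],
    [0.80, 0.70, 0.10],
]
q, native_pairs = native_contact_fraction(distances)
```

Pair 2 forms a contact in later frames but is not native, so it never contributes to Q.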

Parameters:

Name	Type	Description	Default
`traj`	`Trajectory`	Input trajectory.	required
`reference_frame`	`int`	Frame index defining native contacts.	`0`
`cutoff_nm`	`float`	Distance threshold in nm for a contact.	`0.45`
`scheme`	`str`	Contact scheme for `mdtraj.compute_contacts`.	`'closest-heavy'`
`periodic`	`bool`	Whether to apply periodic boundary conditions.	`True`
`timestep_ps`	`float \| None`	Optional frame timestep override in ps.	`None`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default.	`None`

Returns:

Type	Description
`NativeContactResult`	NativeContactResult with per-frame Q values.

Raises:

Type	Description
`ValueError`	If `reference_frame` is out of range or no native contacts are found.

Distance¶

`mdpp.analysis.distance` ¶

Pairwise distance analysis for molecular dynamics trajectories.

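Five backends are available for pairwise distance computation:

"mdtraj" (default) -- PBC-capable, single-threaded.
"numba" -- CPU-parallel.
"cupy", "torch", "jax" -- GPU-accelerated.

Non-mdtraj backends do not support periodic boundary conditions.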

The GPU backends (cupy, torch, jax) use vectorised fancy-index differencing. This materialises an intermediate array of shape (n_frames, n_pairs, 3) on the device, so GPU memory must be sufficient. The Numba backend computes element-by-element with no intermediate allocation, making it the fastest at small-to-medium scales and competitive even at large scales.

Benchmark results (24-core CPU, NVIDIA GPU):

Workload	Pairs	numba	mdtraj	numba speedup vs mdtraj
1K frames x 100 atoms	4,950	0.005 s	0.013 s	2.6x
3K frames x 200 atoms	19,900	0.021 s	0.166 s	7.9x
3K frames x 400 atoms	79,800	0.059 s	0.617 s	10.4x

GPU backends approach Numba at higher pair counts where device parallelism offsets transfer overhead. Use backend="numba" as the default for non-periodic featurisation workloads.
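To judge whether a GPU backend fits in device memory, the size of the `(n_frames, n_pairs, 3)` intermediate can be estimated up front. The sketch below assumes float32 coordinates (4 bytes per value) and all-vs-all pairs; the helper name `intermediate_bytes` is hypothetical.

```python
def intermediate_bytes(n_frames, n_atoms, itemsize=4):
    """Bytes needed for the (n_frames, n_pairs, 3) difference array."""
    n_pairs = n_atoms * (n_atoms - 1) // 2  # all unique atom pairs
    return n_frames * n_pairs * 3 * itemsize

# The three benchmark workloads from this section.
for n_frames, n_atoms in [(1_000, 100), (3_000, 200), (3_000, 400)]:
    size_gb = intermediate_bytes(n_frames, n_atoms) / 1e9
    print(f"{n_frames} frames x {n_atoms} atoms: {size_gb:.2f} GB")
```

The largest benchmark case already needs close to 3 GB for the intermediate alone, before the output array is counted.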

`DistanceResult(time_ps, distances_nm, atom_pairs)` `dataclass` ¶

Per-frame pairwise distances.

`time_ns` `property` ¶

Return frame times in nanoseconds.

`distances_angstrom` `property` ¶

Return distances in Angstrom.

`compute_distances(traj, *, atom_pairs, periodic=True, backend='mdtraj', timestep_ps=None, dtype=None)` ¶

Compute pairwise distances between atom pairs over time.

Parameters:

Name	Type	Description	Default
`traj`	`Trajectory`	Input trajectory.	required
`atom_pairs`	`ArrayLike`	Array of shape `(n_pairs, 2)` with atom index pairs.	required
`periodic`	`bool`	Whether to apply periodic boundary conditions.	`True`
`backend`	`DistanceBackend`	Distance computation backend. `"mdtraj"` (default, PBC-capable), `"numba"` (CPU-parallel), `"cupy"`/ `"torch"`/`"jax"` (GPU-accelerated). Non-mdtraj backends do not support periodic boundary conditions.	`'mdtraj'`
`timestep_ps`	`float \| None`	Optional frame timestep override in ps.	`None`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default.	`None`

Returns:

Type	Description
`DistanceResult`	DistanceResult with per-frame distances for each pair.

`compute_minimum_distance(traj, *, group1, group2, periodic=True, backend='mdtraj', timestep_ps=None, dtype=None)` ¶

Compute the minimum distance between two atom groups per frame.

All pairwise distances between group1 and group2 atoms are computed, and the minimum per frame is returned.
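A minimal non-periodic sketch of that reduction, using plain coordinate tuples and atom index lists in place of a trajectory and selection strings. The helper name `minimum_distance` is hypothetical; it also reports the closest pair for every frame, which is more than the documented result stores.

```python
import math
from itertools import product

def minimum_distance(frames, group1, group2):
    """Return [(distance, (i, j)), ...] with the closest pair per frame."""
    result = []
    for xyz in frames:
        # Evaluate every group1 x group2 pair and keep the smallest distance.
        d, i, j = min(
            (math.dist(xyz[i], xyz[j]), i, j) for i, j in product(group1, group2)
        )
        result.append((d, (i, j)))
    return result

frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
]
per_frame = minimum_distance(frames, group1=[0], group2=[1, 2])
```

The closest pair changes from `(0, 1)` in the first frame to `(0, 2)` in the second, which is why a single stored pair can only describe one frame.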

Parameters:

Name	Type	Description	Default
`traj`	`Trajectory`	Input trajectory.	required
`group1`	`str`	MDTraj selection string for the first group.	required
`group2`	`str`	MDTraj selection string for the second group.	required
`periodic`	`bool`	Whether to apply periodic boundary conditions.	`True`
`backend`	`DistanceBackend`	Distance computation backend. `"mdtraj"` (default, PBC-capable), `"numba"` (CPU-parallel), `"cupy"`/ `"torch"`/`"jax"` (GPU-accelerated). Non-mdtraj backends do not support periodic boundary conditions.	`'mdtraj'`
`timestep_ps`	`float \| None`	Optional frame timestep override in ps.	`None`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default.	`None`

Returns:

Type	Description
`DistanceResult`	DistanceResult where `distances_nm` has shape `(n_frames, 1)` and `atom_pairs` contains the closest pair at frame 0.

DSSP¶

`mdpp.analysis.dssp` ¶

Secondary structure assignment via DSSP.

`DSSPResult(assignments, residue_ids, frequency, categories)` `dataclass` ¶

Per-frame secondary structure assignments.

Attributes:

Name	Type	Description
`assignments`	`NDArray[str_]`	Character array of shape `(n_frames, n_residues)` with DSSP codes (`"H"`, `"E"`, `"C"` when simplified, or full 8-state codes otherwise).
`residue_ids`	`NDArray[int_]`	Residue sequence IDs corresponding to columns.
`frequency`	`NDArray[floating]`	Array of shape `(n_residues, n_categories)` giving the fraction of frames each residue spends in each secondary structure category.
`categories`	`list[str]`	List of unique category labels matching the last axis of `frequency`.
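The relationship between `assignments`, `frequency`, and `categories` can be sketched as follows. The helper name `dssp_frequency` is hypothetical, and sorting the category labels alphabetically is an assumption about ordering.

```python
def dssp_frequency(assignments):
    """Return (frequency, categories) from per-frame DSSP codes.

    assignments[f][r] is the one-letter code of residue r in frame f.
    """
    # Assumption: categories are the unique codes in sorted order.
    categories = sorted({code for frame in assignments for code in frame})
    n_frames = len(assignments)
    n_residues = len(assignments[0])
    frequency = [
        [sum(frame[r] == c for frame in assignments) / n_frames for c in categories]
        for r in range(n_residues)
    ]
    return frequency, categories

# Four frames, three residues, simplified 3-state codes.
frequency, categories = dssp_frequency(["HHC", "HEC", "CEC", "HEC"])
```

Each row of `frequency` sums to 1 because every residue has exactly one code per frame.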

`compute_dssp(traj, *, simplified=True, dtype=None)` ¶

Compute per-residue secondary structure assignments across frames.

Parameters:

Name	Type	Description	Default
`traj`	`Trajectory`	Input trajectory.	required
`simplified`	`bool`	If `True`, use 3-state classification (H=helix, E=sheet, C=coil). Otherwise use the full 8-state DSSP codes.	`True`
`dtype`	`DtypeArg`	Output float dtype for frequency array. If `None`, uses the package default.	`None`

Returns:

Type	Description
`DSSPResult`	DSSPResult with per-frame assignments and per-residue frequencies.

Decomposition¶

`mdpp.analysis.decomposition` ¶

Dimensionality reduction and feature engineering helpers.

`DistanceFeatures(values, pairs, atom_indices)` `dataclass` ¶

Pairwise distance features (e.g. CA-CA distances).

`TorsionFeatures(values, labels)` `dataclass` ¶

Backbone torsion features.

`PCAResult(projections, components, explained_variance_ratio, feature_mean, feature_scale, model)` `dataclass` ¶

Principal component analysis outputs.

`TICAResult(projections, lagtime, model)` `dataclass` ¶

Time-lagged independent component analysis outputs.

`featurize_backbone_torsions(traj, *, atom_selection='protein', sincos_embedding=True, dtype=None)` ¶

Featurize backbone phi/psi torsions.

Parameters:

Name	Type	Description	Default
`traj`	`Trajectory`	Input trajectory.	required
`atom_selection`	`str \| None`	Optional atom selection before featurization.	`'protein'`
`sincos_embedding`	`bool`	If True (default), return `[cos(angle), sin(angle)]` columns instead of raw radian angles. This embedding handles the discontinuity at +/-pi correctly and is required for downstream PCA / TICA on circular variables. Set to False to keep raw radian angles (e.g. for Ramachandran plots). Note: this is unrelated to mdtraj's `periodic` argument for minimum image convention.	`True`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default (see `mdpp.set_default_dtype`).	`None`

Returns:

Type	Description
`TorsionFeatures`	TorsionFeatures with values and labels.

Raises:

Type	Description
`ValueError`	If no phi/psi torsions are available.
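The reason for the sin/cos embedding can be seen with two nearly identical torsion angles on either side of the +/-pi boundary. The helper name `sincos_embed` is hypothetical, and the interleaved `[cos, sin]` column order is an assumption about layout.

```python
import math

def sincos_embed(angles):
    """Map each angle (radians) in each row to a [cos, sin] pair."""
    return [[f(a) for a in row for f in (math.cos, math.sin)] for row in angles]

# Two conformations whose torsion differs by only 0.02 rad across the boundary.
a, b = math.pi - 0.01, -math.pi + 0.01
raw_gap = abs(a - b)                        # close to 2*pi in raw radians
emb = sincos_embed([[a], [b]])
embedded_gap = math.dist(emb[0], emb[1])    # small again after embedding
```

In raw radians the two frames look maximally different, which would distort the variance seen by PCA or TICA. After embedding they are neighbours.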

`featurize_ca_distances(traj, *, atom_selection='name CA', backend='mdtraj', periodic=False, dtype=None)` ¶

Featurize all pairwise distances between selected atoms.

Computes the N*(N-1)/2 pairwise distances for the selected atoms at each frame, producing a feature matrix suitable for PCA or TICA.

Five backends are available (see `mdpp.analysis.distance._compute_pairwise_distances`): "mdtraj" (default, PBC-capable, single-threaded), "numba" (CPU-parallel), and "cupy"/"torch"/"jax" (GPU-accelerated). Non-mdtraj backends do not support periodic boundary conditions.

Parameters:

Name	Type	Description	Default
`traj`	`Trajectory`	Input trajectory.	required
`atom_selection`	`str`	MDTraj selection string for the atoms to include. Defaults to `"name CA"` for alpha-carbon distances.	`'name CA'`
`backend`	`DistanceBackend`	Distance computation backend. Defaults to `"mdtraj"` for API consistency; switch to `"numba"` (CPU-parallel) or `"cupy"`/`"torch"`/`"jax"` (GPU-accelerated) explicitly when performance matters.	`'mdtraj'`
`periodic`	`bool`	Whether to apply minimum image convention. Only effective with `backend="mdtraj"`.	`False`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default (see `mdpp.set_default_dtype`).	`None`

Returns:

Type	Description
`DistanceFeatures`	DistanceFeatures with values, atom pairs, and atom indices.

Raises:

Type	Description
`ValueError`	If the selection matches fewer than 2 atoms, or an unknown backend is requested.

`compute_pca(features, *, n_components=2, standardize=True, dtype=None)` ¶

Compute PCA projection from feature vectors.

Sklearn PCA (>= 1.8) preserves input dtype (float32 or float64).

Parameters:

Name	Type	Description	Default
`features`	`ArrayLike`	Input feature matrix `(n_samples, n_features)`.	required
`n_components`	`int`	Number of principal components.	`2`
`standardize`	`bool`	Whether to z-score features before PCA.	`True`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default (see `mdpp.set_default_dtype`).	`None`

Returns:

Type	Description
`PCAResult`	PCAResult containing projections and explained variance ratio.

`project_pca(features, *, fitted, dtype=None)` ¶

Project new features using a previously fitted PCA.

The features are standardized using the mean and scale from the fitted PCA, then transformed using its model. This is the correct way to project a second dataset (e.g. a different system) onto the same principal component axes for direct comparison.
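A sketch of that two-step projection with plain lists. The helper name `project` is hypothetical, and the code assumes the fitted model applies no further centring of its own, which holds when it was fitted on already standardized data.

```python
def project(features, feature_mean, feature_scale, components):
    """Standardize with the fitted statistics, then project onto fitted axes."""
    if any(len(row) != len(feature_mean) for row in features):
        raise ValueError("feature dimension does not match the fitted PCA")
    projections = []
    for row in features:
        # Reuse the mean and scale of the original fit, not of the new data.
        z = [(x - m) / s for x, m, s in zip(row, feature_mean, feature_scale)]
        projections.append([sum(c * v for c, v in zip(axis, z)) for axis in components])
    return projections

# One fitted component in a 2-feature space.
proj = project(
    features=[[3.0, 6.0], [1.0, 2.0]],
    feature_mean=[1.0, 2.0],
    feature_scale=[2.0, 4.0],
    components=[[0.6, 0.8]],
)
```

Standardizing the second dataset with its own mean and scale would shift and stretch it relative to the first, so the two projections would no longer share a coordinate system.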

Parameters:

Name	Type	Description	Default
`features`	`ArrayLike`	Input feature matrix `(n_samples, n_features)`. Must have the same number of features as the fitted PCA.	required
`fitted`	`PCAResult`	PCAResult from a previous `compute_pca` call whose principal component axes will be used.	required
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default (see `mdpp.set_default_dtype`).	`None`

Returns:

Type	Description
`PCAResult`	PCAResult with projections onto the fitted PCA axes. The `components`, `explained_variance_ratio`, `feature_mean`, `feature_scale`, and `model` are shared from `fitted`.

Raises:

Type	Description
`ValueError`	If the feature dimension does not match the fitted PCA.

`compute_tica(features, *, lagtime, n_components=2, dtype=None)` ¶

Compute TICA projection from feature vectors.

Deeptime upcasts to float64 internally for covariance estimation, so the input dtype does not affect numerical accuracy. The dtype parameter controls the dtype of the output arrays only.

Parameters:

Name	Type	Description	Default
`features`	`ArrayLike`	Input feature matrix `(n_samples, n_features)`.	required
`lagtime`	`int`	Lag time in frames.	required
`n_components`	`int`	Number of independent components.	`2`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default (see `mdpp.set_default_dtype`).	`None`

Returns:

Type	Description
`TICAResult`	TICAResult containing projected coordinates and fitted model.

Free Energy Surface¶

`mdpp.analysis.fes` ¶

Free-energy surface computation utilities.

`FES2DResult(free_energy_kj_mol, probability_density, x_edges, y_edges, observed_mask, temperature_k)` `dataclass` ¶

2D free-energy surface derived from a histogram.

`x_centers` `property` ¶

Return x-axis bin centers.

`y_centers` `property` ¶

Return y-axis bin centers.

`compute_fes_2d(x_values, y_values, *, bins=100, value_range=None, temperature_k=DEFAULT_TEMPERATURE_K, min_probability=1e-12, mask_unsampled=True, dtype=None)` ¶

Compute a 2D free-energy surface from two collective variables.

np.histogram2d returns float64 probability density regardless of input dtype, so the log/energy arithmetic naturally runs in float64. Output arrays are cast to dtype.

Parameters:

Name	Type	Description	Default
`x_values`	`ArrayLike`	Samples for CV1.	required
`y_values`	`ArrayLike`	Samples for CV2.	required
`bins`	`BinsType`	Histogram bin count.	`100`
`value_range`	`RangeType`	Optional `((x_min, x_max), (y_min, y_max))` range.	`None`
`temperature_k`	`float`	Temperature in Kelvin.	`DEFAULT_TEMPERATURE_K`
`min_probability`	`float`	Lower bound to avoid `log(0)`.	`1e-12`
`mask_unsampled`	`bool`	If True, unsampled bins are set to `NaN`.	`True`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default.	`None`

Returns:

Type	Description
`FES2DResult`	FES2DResult with free energy shifted so its minimum is 0.
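The conversion from a histogram to free energy, F = -kT ln p shifted so the minimum is 0, can be sketched on a small table of bin counts. The helper name `free_energy_from_counts` is hypothetical, the 300 K default is a placeholder rather than the value of `DEFAULT_TEMPERATURE_K`, and the sketch uses bin probabilities rather than a density; with equal-sized bins the two differ only by a constant that the shift removes.

```python
import math

KB_KJ_MOL_K = 0.0083144626  # molar gas constant in kJ/(mol K)

def free_energy_from_counts(counts, temperature_k=300.0, min_probability=1e-12):
    """Return a free-energy table (kJ/mol) with unsampled bins set to NaN."""
    total = sum(sum(row) for row in counts)
    kt = KB_KJ_MOL_K * temperature_k
    fes = [
        [
            -kt * math.log(max(c / total, min_probability)) if c > 0 else math.nan
            for c in row
        ]
        for row in counts
    ]
    # Shift so that the most populated bin sits at 0 kJ/mol.
    f_min = min(v for row in fes for v in row if not math.isnan(v))
    return [[v - f_min for v in row] for row in fes]

fes = free_energy_from_counts([[8, 4], [0, 2]])
kt = KB_KJ_MOL_K * 300.0
```

A bin with half the population of the minimum sits `kt * ln(2)` higher, about 1.73 kJ/mol at 300 K.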

`compute_fes_from_projection(projection, *, x_index=0, y_index=1, bins=100, value_range=None, temperature_k=DEFAULT_TEMPERATURE_K, min_probability=1e-12, mask_unsampled=True, dtype=None)` ¶

Compute a 2D FES from a projection matrix.

Parameters:

Name	Type	Description	Default
`projection`	`ArrayLike`	Matrix `(n_samples, n_components)`.	required
`x_index`	`int`	Component index for x-axis.	`0`
`y_index`	`int`	Component index for y-axis.	`1`
`bins`	`BinsType`	Histogram bin count.	`100`
`value_range`	`RangeType`	Optional histogram range.	`None`
`temperature_k`	`float`	Temperature in Kelvin.	`DEFAULT_TEMPERATURE_K`
`min_probability`	`float`	Lower bound to avoid `log(0)`.	`1e-12`
`mask_unsampled`	`bool`	If True, unsampled bins are set to `NaN`.	`True`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default.	`None`

Returns:

Type	Description
`FES2DResult`	FES2DResult computed from selected projection components.

Clustering¶

`mdpp.analysis.clustering` ¶

Conformational clustering from RMSD matrices.

Each clustering algorithm is a frozen dataclass configured at construction time and invoked as a callable::

result = Gromos(cutoff_nm=0.2)(rmsd_matrix)
result = DBSCAN(eps=0.15, min_samples=5)(rmsd_matrix)
result = KMeans(n_clusters=10)(pca.projections)

`RMSDMatrixResult(rmsd_matrix_nm, atom_indices)` `dataclass` ¶

Pairwise RMSD matrix between trajectory frames.

`rmsd_matrix_angstrom` `property` ¶

Return the RMSD matrix in Angstrom.

Note

Each access allocates a new (n_frames, n_frames) array. Cache the result in a local variable if you need it more than once -- at 120k frames a float32 matrix is ~54 GiB (~58 GB) per call.

`ClusteringResult(labels, n_clusters, medoid_frames)` `dataclass` ¶

Conformational clustering output.

`FeatureClusteringResult(labels, n_clusters, cluster_centers, medoid_frames, inertia)` `dataclass` ¶

Clustering result from feature-vector-based methods.

`Gromos(cutoff_nm=0.15)` `dataclass` ¶

GROMOS clustering (Daura et al. 1999).

Greedy largest-cluster-first assignment via Numba-JIT kernels. O(n) auxiliary memory -- no copies of the RMSD matrix.
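The greedy largest-cluster-first idea can be written in a few lines of plain Python. This is an illustrative sketch of the published algorithm, not the Numba kernel: the helper name `gromos` is hypothetical, the `<=` comparison and the lowest-index tie-break are assumptions, and neighbour lists are rebuilt each round for clarity rather than speed.

```python
def gromos(rmsd, cutoff):
    """Return (labels, medoids) for a symmetric RMSD matrix (illustrative)."""
    n = len(rmsd)
    labels = [-1] * n
    medoids = []
    remaining = set(range(n))
    while remaining:
        best, best_members = None, []
        for i in sorted(remaining):
            # Neighbours among unassigned frames; a frame neighbours itself.
            members = [j for j in remaining if rmsd[i][j] <= cutoff]
            if len(members) > len(best_members):
                best, best_members = i, members
        # The frame with the most neighbours becomes the next cluster centre.
        for j in best_members:
            labels[j] = len(medoids)
        medoids.append(best)
        remaining -= set(best_members)
    return labels, medoids

# Toy "RMSD" matrix from 1D positions: two groups of frames.
x = [0.0, 0.1, 0.2, 1.0, 1.1]
rmsd_matrix = [[abs(a - b) for b in x] for a in x]
labels, medoids = gromos(rmsd_matrix, cutoff=0.15)
```

Clusters come out ordered by size, so label 0 is always the most populated one.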

Parameters:

Name	Type	Description	Default
`cutoff_nm`	`float`	Neighbour cutoff in nm.	`0.15`

Example::

result = Gromos(cutoff_nm=0.2)(rmsd_matrix)

`__call__(rmsd_matrix)` ¶

Cluster `rmsd_matrix` and return a `ClusteringResult`.

`Hierarchical(linkage_method='average', distance_threshold=0.15, n_clusters=None)` `dataclass` ¶

Agglomerative hierarchical clustering (scipy).

Uses distance_threshold by default. Set n_clusters to use a fixed cluster count instead (overrides distance_threshold).

Note

Scipy builds an O(n^2) float64 condensed distance matrix internally. At 120k frames this is ~57 GB.

Parameters:

Name	Type	Description	Default
`linkage_method`	`str`	`"average"`, `"complete"`, or `"single"`. `"ward"` is not valid for RMSD matrices.	`'average'`
`distance_threshold`	`float`	Distance cutoff in nm.	`0.15`
`n_clusters`	`int \| None`	Fixed cluster count (overrides distance_threshold).	`None`

Example::

result = Hierarchical(linkage_method="average", distance_threshold=0.2)(rmsd_matrix)

`__call__(rmsd_matrix)` ¶

Cluster `rmsd_matrix` and return a `ClusteringResult`.

`DBSCAN(eps=0.15, min_samples=5, backend='numba')` `dataclass` ¶

DBSCAN density-based clustering.

Two backends:

"numba" (default) -- custom Numba-JIT kernel. Reuses the parallel neighbour-count kernel from GROMOS and a sequential BFS for label assignment. O(n) auxiliary memory, no copies.
"sklearn" -- official scikit-learn DBSCAN with metric="precomputed".

Noise frames receive label -1.
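The two stages named above, neighbour counting followed by a breadth-first expansion from core points, can be sketched on a precomputed distance matrix. The helper name `dbscan_precomputed` is hypothetical and the `<=` comparison is an assumption. Unlike the Numba backend, this sketch stores full neighbour lists, so its memory use is not O(n).

```python
from collections import deque

def dbscan_precomputed(dist, eps, min_samples):
    """Return DBSCAN labels for a symmetric distance matrix (illustrative)."""
    n = len(dist)
    # Stage 1: neighbourhoods (each point counts itself) and core points.
    neighbours = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    core = [len(nb) >= min_samples for nb in neighbours]
    labels = [-1] * n  # -1 marks noise until a cluster claims the point
    cluster = 0
    # Stage 2: breadth-first expansion from every unvisited core point.
    for seed in range(n):
        if not core[seed] or labels[seed] != -1:
            continue
        labels[seed] = cluster
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in neighbours[i]:
                if labels[j] == -1:
                    labels[j] = cluster
                    if core[j]:  # only core points keep expanding
                        queue.append(j)
        cluster += 1
    return labels

x = [0.0, 0.1, 0.2, 1.0, 1.1, 5.0]
dist = [[abs(a - b) for b in x] for a in x]
labels = dbscan_precomputed(dist, eps=0.15, min_samples=2)
```

The isolated frame at position 5.0 has no neighbour within `eps` and keeps the noise label -1.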

Parameters:

Name	Type	Description	Default
`eps`	`float`	Neighbourhood radius in nm.	`0.15`
`min_samples`	`int`	Minimum neighbours (including self) for a core point.	`5`
`backend`	`Literal['numba', 'sklearn']`	`"numba"` or `"sklearn"`.	`'numba'`

Example::

result = DBSCAN(eps=0.15, min_samples=5)(rmsd_matrix)
result = DBSCAN(eps=0.15, backend="sklearn")(rmsd_matrix)

`__call__(rmsd_matrix)` ¶

Cluster `rmsd_matrix` and return a `ClusteringResult`.

`HDBSCAN(min_cluster_size=5, min_samples=5)` `dataclass` ¶

HDBSCAN hierarchical density-based clustering (sklearn >= 1.3).

Noise frames receive label -1.

Parameters:

Name	Type	Description	Default
`min_cluster_size`	`int`	Minimum number of frames in a cluster.	`5`
`min_samples`	`int`	Number of neighbours for core-point estimation.	`5`

Example::

result = HDBSCAN(min_cluster_size=50, min_samples=5)(rmsd_matrix)

`__call__(rmsd_matrix)` ¶

Cluster `rmsd_matrix` and return a `ClusteringResult`.

`KMeans(n_clusters=10, random_state=42, dtype=None)` `dataclass` ¶

K-Means clustering (scikit-learn).

Parameters:

Name	Type	Description	Default
`n_clusters`	`int`	Number of clusters.	`10`
`random_state`	`int \| None`	Seed for centroid initialisation. Defaults to 42 for reproducible runs across sessions. Pass `None` to let scikit-learn pick a non-deterministic seed.	`42`
`dtype`	`DtypeArg`	Output float dtype for cluster_centers.	`None`

Example::

result = KMeans(n_clusters=10)(pca.projections)
result = KMeans(n_clusters=10, random_state=None)(pca.projections)

`__call__(features)` ¶

Cluster `features` and return a `FeatureClusteringResult`.

`MiniBatchKMeans(n_clusters=10, batch_size=1024, random_state=42, dtype=None)` `dataclass` ¶

Mini-Batch K-Means clustering (scikit-learn).

Parameters:

Name	Type	Description	Default
`n_clusters`	`int`	Number of clusters.	`10`
`batch_size`	`int`	Mini-batch size.	`1024`
`random_state`	`int \| None`	Seed for centroid initialisation and mini-batch sampling. Defaults to 42 for reproducible runs across sessions. Pass `None` to let scikit-learn pick a non-deterministic seed.	`42`
`dtype`	`DtypeArg`	Output float dtype for cluster_centers.	`None`

Example::

result = MiniBatchKMeans(n_clusters=10, batch_size=1024)(pca.projections)
result = MiniBatchKMeans(n_clusters=10, random_state=None)(pca.projections)

`__call__(features)` ¶

Cluster `features` and return a `FeatureClusteringResult`.

`RegularSpace(dmin=0.5, dtype=None)` `dataclass` ¶

Regular-space clustering (deeptime).

The number of clusters is determined by dmin, not specified upfront.
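How `dmin` determines the number of clusters can be seen in a short sketch of the usual regular-space procedure: walk through the points in order and open a new centre whenever a point is farther than `dmin` from every existing centre. The helper name `regular_space` is hypothetical, and the strict `>` comparison is an assumption about the deeptime implementation.

```python
import math

def regular_space(points, dmin):
    """Return (centers, labels) for a list of coordinate tuples (illustrative)."""
    centers = []
    for p in points:
        # A point far from all current centres becomes a new centre.
        if all(math.dist(p, c) > dmin for c in centers):
            centers.append(p)
    # Assign every point to its nearest centre.
    labels = [
        min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for p in points
    ]
    return centers, labels

points = [(0.0, 0.0), (0.2, 0.0), (1.0, 0.0), (1.3, 0.0), (2.2, 0.0)]
centers, labels = regular_space(points, dmin=0.5)
```

A smaller `dmin` yields more centres for the same data, so it is worth checking the resulting cluster count before building anything on top of it.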

Parameters:

Name	Type	Description	Default
`dmin`	`float`	Minimum distance between cluster centres.	`0.5`
`dtype`	`DtypeArg`	Output float dtype for `cluster_centers`.	`None`

Example:

    result = RegularSpace(dmin=0.5)(pca.projections)

`__call__(features)` ¶

Cluster `features` and return a `FeatureClusteringResult`.
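The way `dmin` controls the number of clusters can be seen in a small pure-Python sketch of the regular-space idea: a point becomes a new centre only if it lies farther than `dmin` from every centre chosen so far. This is an illustrative reimplementation, not the deeptime code that `RegularSpace` wraps; the helper names `regular_space_centers` and `assign` are hypothetical.

```python
import math


def regular_space_centers(points, dmin):
    """Pick centres so that no two are within dmin of each other (sketch)."""
    centers = []
    for point in points:
        # Accept the point only if it is farther than dmin from all centres.
        if all(math.dist(point, c) > dmin for c in centers):
            centers.append(point)
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest centre."""
    return [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]


points = [(0.0, 0.0), (0.1, 0.0), (0.6, 0.0), (0.7, 0.1), (1.5, 0.0)]

coarse = regular_space_centers(points, dmin=1.0)  # larger dmin, fewer centres
fine = regular_space_centers(points, dmin=0.5)    # smaller dmin, more centres
labels = assign(points, fine)

print(len(coarse), len(fine), labels)
```

Since the centre count falls out of the data and `dmin`, it is usually worth scanning a few `dmin` values and checking the resulting number of clusters before settling on one.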

`compute_rmsd_matrix(traj, *, atom_selection='backbone', backend='mdtraj', dtype=None)` ¶

Compute an all-vs-all RMSD matrix between trajectory frames.

Parameters:

Name	Type	Description	Default
`traj`	`Trajectory`	Input trajectory.	required
`atom_selection`	`str`	Atoms used for RMSD calculation.	`'backbone'`
`backend`	`RMSDBackend`	Computation backend. Defaults to `"mdtraj"` for API consistency with other analysis functions; switch to a faster backend explicitly when performance matters. Options: `"mdtraj"` (default): mdtraj precentered RMSD loop (CPU); `"numba"`: Numba-parallel QCP kernel (CPU, 50-200x faster); `"torch"`: PyTorch einsum + QCP (CUDA/CPU, float32); `"jax"`: JAX einsum + QCP (GPU/TPU/CPU, float32); `"cupy"`: CuPy einsum + QCP (CUDA, float32).	`'mdtraj'`
`dtype`	`DtypeArg`	Output float dtype. If `None`, uses the package default.	`None`

Returns:

Type	Description
`RMSDMatrixResult`	Result containing a symmetric `(n_frames, n_frames)` RMSD matrix.

Raises:

Type	Description
`ValueError`	If an unsupported backend is specified.
`ImportError`	If the requested backend package is not installed.
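Since a missing backend package surfaces as an `ImportError`, calling code may prefer to check availability first and fall back to the default. The sketch below uses only the standard library; the helper `pick_backend` is hypothetical, and it assumes that each optional backend name matches the top-level package it requires.

```python
from importlib.util import find_spec

# Assumption: each optional backend name equals the package it depends on.
OPTIONAL_BACKENDS = ("numba", "torch", "jax", "cupy")


def pick_backend(preferred=OPTIONAL_BACKENDS, fallback="mdtraj"):
    """Return the first preferred backend whose package is importable."""
    for name in preferred:
        # find_spec returns None when the top-level package is not installed.
        if find_spec(name) is not None:
            return name
    return fallback


backend = pick_backend()
print(backend)

# The chosen name would then be passed on, for example:
# result = compute_rmsd_matrix(traj, backend=backend)
```

An installed package does not guarantee usable hardware (a CUDA build without a visible GPU, for instance), so a production check would likely need to probe the device as well.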

Memory note

Every backend returns its native float32 output matrix: the numba kernel uses float64 accumulators internally but stores float32 in the result buffer, and the GPU kernels compute in float32 end-to-end. This wrapper casts with `copy=False`, so when the resolved dtype is float32 (the package default) no second copy of the `(n_frames, n_frames)` matrix is made. For a 120k-frame trajectory this saves ~115 GB of peak RAM compared with the old path, which cast to float64 for the Protocol contract and then cast back. Passing `dtype=np.float64` still forces a one-time upcast.
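The figures in this note can be checked with a few lines of NumPy. The sketch below works out the matrix size at both precisions and shows, on a small stand-in array, how `ndarray.astype(..., copy=False)` behaves; that the wrapper's cast is equivalent to this NumPy call is an assumption here.

```python
import numpy as np

n_frames = 120_000
n_elements = n_frames * n_frames

# Size of the square matrix at each precision, in decimal gigabytes.
float32_gb = n_elements * np.dtype(np.float32).itemsize / 1e9
float64_gb = n_elements * np.dtype(np.float64).itemsize / 1e9
print(f"float32: {float32_gb:.1f} GB, float64: {float64_gb:.1f} GB")

# Small stand-in matrix: astype(..., copy=False) hands back the same array
# when the dtype already matches and allocates only when it differs.
small = np.zeros((4, 4), dtype=np.float32)
same_dtype = small.astype(np.float32, copy=False)
upcast = small.astype(np.float64, copy=False)
print(same_dtype is small, upcast is small)
```

The float64 figure (about 115 GB) is the temporary copy that the float32 path no longer allocates; the float32 result itself still occupies about 58 GB.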
