Analysis API Reference
Metrics
mdpp.analysis.metrics
Core structure and dynamics metrics computed from trajectories.
RMSDResult(time_ps, rmsd_nm, atom_indices)
dataclass
RMSFResult(rmsf_nm, atom_indices, residue_ids)
dataclass
Per-atom RMSF values.
rmsf_angstrom
property
Return RMSF values in Angstrom.
DeltaRMSFResult(delta_rmsf_nm, residue_ids, sem_nm)
dataclass
Per-residue RMSF difference between two systems (B minus A).
Averaging is done in MSF (mean-square fluctuation) space: per-residue RMSF^2 values are averaged across replicas, then the square root is taken. The delta is computed on the resulting average RMSF values.
The SEM on each system's average RMSF is propagated through the sqrt transform, then the two independent SEMs are combined in quadrature to give the SEM on the delta.
DCCMResult(correlation, atom_indices, residue_ids)
dataclass
Dynamic cross-correlation matrix.
SASAResult(time_ps, values_nm2, atom_indices, mode, residue_ids)
dataclass
RadiusOfGyrationResult(time_ps, radius_gyration_nm, atom_indices)
dataclass
compute_rmsd(traj, *, atom_selection='backbone', reference_frame=0, timestep_ps=None, dtype=None)
Compute RMSD over time.
The trajectory should be aligned before calling this function
(see :func:~mdpp.core.trajectory.align_trajectory).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory (pre-aligned). | required |
| atom_selection | str | Atoms used in RMSD calculation. | 'backbone' |
| reference_frame | int | Reference frame index for RMSD. | 0 |
| timestep_ps | float \| None | Optional time step in ps to override trajectory time. | None |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| RMSDResult | RMSDResult containing time and RMSD. |
compute_rmsf(traj, *, atom_selection='name CA', dtype=None)
Compute per-atom RMSF from positional fluctuations.
The trajectory should be aligned before calling this function
(see :func:~mdpp.core.trajectory.align_trajectory).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory (pre-aligned). | required |
| atom_selection | str | Atoms included in RMSF calculation. | 'name CA' |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| RMSFResult | RMSFResult with atom and residue mapping. |
compute_dccm(traj, *, atom_selection='name CA', backend='numpy', dtype=None)
Compute dynamic cross-correlation matrix (DCCM).
The trajectory should be aligned before calling this function
(see :func:~mdpp.core.trajectory.align_trajectory).
The covariance is dispatched through a pluggable backend registry
(see :mod:mdpp.analysis._backends._dccm). The default
"numpy" backend uses BLAS GEMM via reshape + matmul -- this is
multi-threaded out of the box, unlike np.einsum which falls
back to a single-threaded contraction loop and becomes the bottleneck
for any non-trivial trajectory. Other backends ("numba",
"torch", "jax", "cupy") are available for users who
want explicit CPU parallelism or GPU acceleration.
mdtraj stores coordinates in float32; the covariance kernel runs in the backend's native dtype and the wrapper casts to the resolved dtype (float32 by default). Float32 precision is sufficient: empirical tests show a maximum correlation error of ~4e-6 relative to float64, well below any physically meaningful threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory (pre-aligned). | required |
| atom_selection | str | Atoms used in DCCM. | 'name CA' |
| backend | DCCMBackend | Compute backend. One of | 'numpy' |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| DCCMResult | DCCMResult with correlation matrix and residue IDs. |
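The reshape + matmul formulation mentioned for the default backend can be sketched in plain NumPy as follows. This is a minimal illustration under stated assumptions, not the library's kernel: the function name `dccm_matmul` is hypothetical, and the input is assumed to be an already aligned coordinate array of shape (n_frames, n_atoms, 3).

```python
import numpy as np


def dccm_matmul(xyz):
    """Hypothetical sketch: DCCM from aligned coordinates via a single matmul.

    xyz has shape (n_frames, n_atoms, 3) and is assumed to be pre-aligned.
    """
    xyz = np.asarray(xyz)
    n_frames, n_atoms, _ = xyz.shape
    # Displacement of every atom from its mean position.
    disp = xyz - xyz.mean(axis=0, keepdims=True)
    # Flatten frames and xyz components into one axis: (n_atoms, n_frames * 3).
    flat = disp.transpose(1, 0, 2).reshape(n_atoms, n_frames * 3)
    # One GEMM gives cov[i, j] = <dr_i . dr_j> averaged over frames.
    cov = flat @ flat.T / n_frames
    # Normalise by the per-atom fluctuation amplitude to get correlations.
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)
```

The matrix product sums over frames and Cartesian components in one call, which is what lets a BLAS library parallelise the work. An atom with zero fluctuation would cause a division by zero in this sketch.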
compute_sasa(traj, *, atom_selection='protein', mode='residue', probe_radius=0.14, n_sphere_points=960, timestep_ps=None, dtype=None)
Compute solvent-accessible surface area via Shrake-Rupley.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory. | required |
| atom_selection | str \| None | Optional atom selection before SASA. | 'protein' |
| mode | str | Either | 'residue' |
| probe_radius | float | Probe radius in nm. | 0.14 |
| n_sphere_points | int | Number of sphere points per atom. | 960 |
| timestep_ps | float \| None | Optional timestep override in ps. | None |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| SASAResult | SASAResult containing frame-resolved values. |
compute_radius_of_gyration(traj, *, atom_selection='protein', timestep_ps=None, dtype=None)
Compute radius of gyration over time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory. | required |
| atom_selection | str | Atom selection used to compute radius of gyration. | 'protein' |
| timestep_ps | float \| None | Optional timestep override in ps. | None |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| RadiusOfGyrationResult | RadiusOfGyrationResult with per-frame values. |
average_rmsf_with_sem(results, *, dtype=None)
Average RMSF across replicas in MSF space and propagate SEM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| results | list[RMSFResult] | RMSF results from each replica. | required |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| NDArray[floating] | (avg_rmsf_nm, sem_rmsf_nm). SEM is |
| NDArray[floating] \| None | replicas are provided. |
The SEM on MSF is propagated through the sqrt transform:
sem_rmsf = sem_msf / (2 * avg_rmsf).
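The propagation rule follows from first-order error propagation through the square root, since d(sqrt(x))/dx = 1 / (2 * sqrt(x)). A minimal NumPy sketch of the averaging is given below. The function name `average_rmsf_sketch` is hypothetical, and the use of the sample standard deviation (ddof=1) for the SEM on MSF is an assumption rather than something the reference states.

```python
import numpy as np


def average_rmsf_sketch(rmsf_replicas):
    """Hypothetical sketch: average RMSF in MSF space and propagate the SEM.

    rmsf_replicas has shape (n_replicas, n_residues), values in nm.
    """
    rmsf = np.asarray(rmsf_replicas, dtype=np.float64)
    msf = rmsf ** 2                          # work in mean-square-fluctuation space
    avg_rmsf = np.sqrt(msf.mean(axis=0))     # sqrt(mean(RMSF^2)) per residue
    n_replicas = rmsf.shape[0]
    if n_replicas < 2:
        return avg_rmsf, None                # SEM is undefined for a single replica
    # Assumption: SEM on MSF from the sample standard deviation (ddof=1).
    sem_msf = msf.std(axis=0, ddof=1) / np.sqrt(n_replicas)
    # First-order propagation through the sqrt transform.
    sem_rmsf = sem_msf / (2.0 * avg_rmsf)
    return avg_rmsf, sem_rmsf
```

A residue whose average RMSF is exactly zero would divide by zero here; a production implementation would need to guard against that case.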
compute_delta_rmsf(results_a, results_b, *, indices_a=None, indices_b=None, residue_ids=None, dtype=None)
Compute per-residue RMSF difference between two systems.
The RMSF for each system is first averaged across replicas in MSF space
(sqrt(mean(RMSF^2))), then the delta is taken as B minus A.
Positive values indicate that system B is more flexible.
The SEM on each system's average RMSF is propagated through the sqrt
transform (sem_rmsf = sem_msf / (2 * avg_rmsf)), then the two
independent SEMs are combined in quadrature to give the SEM on the
delta. At least 2 replicas per system are required for SEM; otherwise
DeltaRMSFResult.sem_nm is None.
For systems with identical residue counts, indices_a and
indices_b may be omitted and the comparison is element-wise.
For systems with different sequences, supply aligned index arrays
so that indices_a[i] and indices_b[i] point to the same
structural position in each system. The caller is responsible for
generating these mappings (e.g. from a multiple sequence alignment).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| results_a | list[RMSFResult] | RMSF results for system A (one per replica). | required |
| results_b | list[RMSFResult] | RMSF results for system B (one per replica). | required |
| indices_a | NDArray[int_] \| None | Optional 0-based residue indices into system A's RMSF array at aligned positions. Must have the same length as | None |
| indices_b | NDArray[int_] \| None | Optional 0-based residue indices into system B's RMSF array at aligned positions. | None |
| residue_ids | NDArray[int_] \| None | Optional residue IDs for the x-axis of the resulting delta-RMSF (e.g. a reference sequence numbering). When | None |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| DeltaRMSFResult | DeltaRMSFResult with the per-residue difference and SEM. |
Raises:
| Type | Description |
|---|---|
| ValueError | If input lists are empty, replicas within a system have inconsistent lengths, index arrays differ in length, or unindexed systems have different residue counts. |
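The delta and its quadrature-combined SEM can be sketched as below, starting from per-system averages that have already been computed in MSF space. The function name `delta_rmsf_sketch` and its argument layout are hypothetical and chosen only for illustration. The index arrays play the role of `indices_a` and `indices_b`.

```python
import numpy as np


def delta_rmsf_sketch(avg_a, avg_b, sem_a=None, sem_b=None,
                      indices_a=None, indices_b=None):
    """Hypothetical sketch: B-minus-A RMSF difference with SEM in quadrature."""
    avg_a = np.asarray(avg_a, dtype=np.float64)
    avg_b = np.asarray(avg_b, dtype=np.float64)
    if indices_a is not None and indices_b is not None:
        pick_a = np.asarray(indices_a, dtype=np.intp)
        pick_b = np.asarray(indices_b, dtype=np.intp)
        if pick_a.shape != pick_b.shape:
            raise ValueError("index arrays differ in length")
    else:
        if avg_a.shape != avg_b.shape:
            raise ValueError("unindexed systems have different residue counts")
        pick_a = pick_b = np.arange(avg_a.size)
    # Positive values mean system B is more flexible at that position.
    delta = avg_b[pick_b] - avg_a[pick_a]
    if sem_a is None or sem_b is None:
        return delta, None
    # Independent errors add in quadrature: sqrt(sem_a^2 + sem_b^2).
    sem = np.hypot(np.asarray(sem_a, dtype=np.float64)[pick_a],
                   np.asarray(sem_b, dtype=np.float64)[pick_b])
    return delta, sem
```

For example, with position 0 of A aligned to position 0 of B and position 2 of A aligned to position 1 of B, one would pass `indices_a=[0, 2]` and `indices_b=[0, 1]`.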
Hydrogen Bonds
mdpp.analysis.hbond
Hydrogen-bond analysis utilities.
HBondResult(time_ps, triplets, presence, count_per_frame, occupancy, method, distance_cutoff_nm, angle_cutoff_deg)
dataclass
Hydrogen-bond detection results.
time_ns
property
Return frame times in nanoseconds.
format_hbond_triplets(topology, triplets)
Format donor-hydrogen-acceptor triplets into readable labels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| topology | Topology | Trajectory topology. | required |
| triplets | NDArray[int_] | Integer array with shape | required |
Returns:
| Type | Description |
|---|---|
| list[str] | List of labels such as |
compute_hbonds(traj, *, method='baker_hubbard', exclude_water=True, periodic=True, sidechain_only=False, freq=0.1, distance_cutoff_nm=0.25, angle_cutoff_deg=120.0, timestep_ps=None, dtype=None)
Compute hydrogen bonds and per-frame counts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory. | required |
| method | str | Hydrogen bond method: | 'baker_hubbard' |
| exclude_water | bool | Whether to ignore water-mediated hydrogen bonds. | True |
| periodic | bool | Whether to apply periodic boundary conditions. | True |
| sidechain_only | bool | For | False |
| freq | float | For | 0.1 |
| distance_cutoff_nm | float | H...A distance cutoff used for presence matrix. | 0.25 |
| angle_cutoff_deg | float | D-H...A angle cutoff used for presence matrix. | 120.0 |
| timestep_ps | float \| None | Optional frame timestep override in ps. | None |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| HBondResult | HBondResult containing detected bonds, occupancy, and per-frame counts. |
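How a presence matrix, per-frame counts, and occupancy relate to the two cutoffs can be sketched as follows. This is an illustration under assumptions, not the library's code: the function name `presence_from_geometry` is hypothetical, the H...A distances and D-H...A angles are assumed to be precomputed per frame and per triplet, and the strictness of the two comparisons is a guess.

```python
import numpy as np


def presence_from_geometry(dist_nm, angle_deg,
                           distance_cutoff_nm=0.25, angle_cutoff_deg=120.0):
    """Hypothetical sketch: boolean presence matrix from H...A distances and angles.

    Both inputs have shape (n_frames, n_triplets).
    """
    dist_nm = np.asarray(dist_nm, dtype=np.float64)
    angle_deg = np.asarray(angle_deg, dtype=np.float64)
    # Assumption: a bond is present when it is short enough AND straight enough.
    presence = (dist_nm < distance_cutoff_nm) & (angle_deg > angle_cutoff_deg)
    count_per_frame = presence.sum(axis=1)   # number of bonds in each frame
    occupancy = presence.mean(axis=0)        # fraction of frames each bond exists
    return presence, count_per_frame, occupancy
```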
Contacts
mdpp.analysis.contacts
Contact analysis for molecular dynamics trajectories.
ContactResult(time_ps, distances_nm, residue_pairs)
dataclass
Per-frame inter-residue contact distances.
time_ns
property
Return frame times in nanoseconds.
NativeContactResult(time_ps, fraction, native_pairs, cutoff_nm)
dataclass
Fraction of native contacts (Q) over time.
time_ns
property
Return frame times in nanoseconds.
compute_contacts(traj, *, contacts='all', scheme='closest-heavy', periodic=True, timestep_ps=None, dtype=None)
Compute inter-residue contact distances over time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory. | required |
| contacts | str \| NDArray[int_] | Residue pairs to monitor. | 'all' |
| scheme | str | Contact scheme passed to | 'closest-heavy' |
| periodic | bool | Whether to apply periodic boundary conditions. | True |
| timestep_ps | float \| None | Optional frame timestep override in ps. | None |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| ContactResult | ContactResult with per-frame distances and residue pair indices. |
compute_contact_frequency(traj, *, cutoff_nm=0.45, scheme='closest-heavy', periodic=True, dtype=None)
Compute the fraction of frames each residue pair is in contact.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory. | required |
| cutoff_nm | float | Distance threshold in nm below which a contact is counted. | 0.45 |
| scheme | str | Contact scheme passed to | 'closest-heavy' |
| periodic | bool | Whether to apply periodic boundary conditions. | True |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| NDArray[floating] | A tuple of |
| NDArray[int_] | shape |
| tuple[NDArray[floating], NDArray[int_]] | has shape |
compute_native_contacts(traj, *, reference_frame=0, cutoff_nm=0.45, scheme='closest-heavy', periodic=True, timestep_ps=None, dtype=None)
Compute the fraction of native contacts Q(t) over time.
Native contacts are residue pairs that are within cutoff_nm in
the reference frame. Q(t) is the fraction of those pairs that remain
in contact at each frame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory. | required |
| reference_frame | int | Frame index defining native contacts. | 0 |
| cutoff_nm | float | Distance threshold in nm for a contact. | 0.45 |
| scheme | str | Contact scheme for | 'closest-heavy' |
| periodic | bool | Whether to apply periodic boundary conditions. | True |
| timestep_ps | float \| None | Optional frame timestep override in ps. | None |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| NativeContactResult | NativeContactResult with per-frame Q values. |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
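The definition of Q(t) given for compute_native_contacts can be written in a few lines of NumPy, starting from a precomputed distance array. The function name `native_contact_fraction` is hypothetical. Raising when the reference frame contains no contacts is a choice made in this sketch to avoid dividing by zero; it is not a statement about when the library raises.

```python
import numpy as np


def native_contact_fraction(distances_nm, reference_frame=0, cutoff_nm=0.45):
    """Hypothetical sketch: fraction of native contacts Q(t).

    distances_nm has shape (n_frames, n_pairs).
    """
    d = np.asarray(distances_nm, dtype=np.float64)
    # Native contacts: pairs within the cutoff in the reference frame.
    native = d[reference_frame] < cutoff_nm
    if not native.any():
        # Sketch-only guard; Q is undefined without any native contact.
        raise ValueError("no native contacts in the reference frame")
    # Q(t): fraction of native pairs still within the cutoff at each frame.
    return (d[:, native] < cutoff_nm).mean(axis=1)
```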
Distance
mdpp.analysis.distance
Pairwise distance analysis for molecular dynamics trajectories.
Five backends are available for pairwise distance computation:
+----------+------------------+-----+----------------------------+
| Backend  | Device           | PBC | Dependency                 |
+==========+==================+=====+============================+
| mdtraj   | CPU (1 thread)   | Yes | built-in                   |
| numba    | CPU (all cores)  | No  | built-in (numba)           |
| cupy     | GPU (CUDA)       | No  | pip install cupy-cuda12x   |
| torch    | GPU (CUDA) / CPU | No  | pip install torch          |
| jax      | GPU / TPU / CPU  | No  | pip install jax[cuda12]    |
+----------+------------------+-----+----------------------------+
The GPU backends (cupy, torch, jax) use vectorised fancy-index
differencing. This materialises an intermediate array of shape
(n_frames, n_pairs, 3) on the device, so GPU memory must be
sufficient. The Numba backend computes element-by-element with no
intermediate allocation, making it the fastest at small-to-medium
scales and competitive even at large scales.
Benchmark results (24-core CPU, NVIDIA GPU)::
    1K frames x 100 atoms (4,950 pairs)
      numba   0.005s   2.6x
      mdtraj  0.013s   1.0x
    3K frames x 200 atoms (19,900 pairs)
      numba   0.021s   7.9x
      mdtraj  0.166s   1.0x
    3K frames x 400 atoms (79,800 pairs)
      numba   0.059s  10.4x
      mdtraj  0.617s   1.0x
GPU backends approach Numba at higher pair counts where device
parallelism offsets transfer overhead. Use backend="numba"
as the default for non-periodic featurisation workloads.
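The fancy-index differencing used by the GPU backends, and the size of the intermediate array it materialises, can be sketched in plain NumPy. NumPy is used here as a CPU stand-in for the cupy, torch, and jax backends; the function names are hypothetical, periodic boundary conditions are ignored, and "3K frames" is taken to mean 3,000 frames.

```python
import numpy as np


def pairwise_distances_vectorised(xyz, atom_pairs):
    """Hypothetical sketch: non-periodic pair distances via fancy indexing.

    xyz has shape (n_frames, n_atoms, 3); atom_pairs has shape (n_pairs, 2).
    """
    xyz = np.asarray(xyz)
    pairs = np.asarray(atom_pairs, dtype=np.intp)
    # Intermediate array of shape (n_frames, n_pairs, 3) is materialised here.
    diff = xyz[:, pairs[:, 0], :] - xyz[:, pairs[:, 1], :]
    return np.sqrt((diff * diff).sum(axis=-1))


def intermediate_bytes(n_frames, n_pairs, itemsize=4):
    """Size in bytes of the (n_frames, n_pairs, 3) difference array."""
    return n_frames * n_pairs * 3 * itemsize


# Largest benchmark case above, assuming 3,000 frames and float32 coordinates.
largest_case = intermediate_bytes(3000, 79800)  # 2_872_800_000 bytes, about 2.9 GB
```

The memory estimate shows why device memory matters for the vectorised approach, while an element-by-element kernel needs only the output array.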
DistanceResult(time_ps, distances_nm, atom_pairs)
dataclass
compute_distances(traj, *, atom_pairs, periodic=True, backend='mdtraj', timestep_ps=None, dtype=None)
Compute pairwise distances between atom pairs over time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory. | required |
| atom_pairs | ArrayLike | Array of shape | required |
| periodic | bool | Whether to apply periodic boundary conditions. | True |
| backend | DistanceBackend | Distance computation backend. | 'mdtraj' |
| timestep_ps | float \| None | Optional frame timestep override in ps. | None |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| DistanceResult | DistanceResult with per-frame distances for each pair. |
compute_minimum_distance(traj, *, group1, group2, periodic=True, backend='mdtraj', timestep_ps=None, dtype=None)
Compute the minimum distance between two atom groups per frame.
All pairwise distances between group1 and group2 atoms are
computed, and the minimum per frame is returned.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory. | required |
| group1 | str | MDTraj selection string for the first group. | required |
| group2 | str | MDTraj selection string for the second group. | required |
| periodic | bool | Whether to apply periodic boundary conditions. | True |
| backend | DistanceBackend | Distance computation backend. | 'mdtraj' |
| timestep_ps | float \| None | Optional frame timestep override in ps. | None |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| DistanceResult | DistanceResult where |
| DistanceResult | and |
DSSP
mdpp.analysis.dssp
Secondary structure assignment via DSSP.
DSSPResult(assignments, residue_ids, frequency, categories)
dataclass
Per-frame secondary structure assignments.
Attributes:
| Name | Type | Description |
|---|---|---|
| assignments | NDArray[str_] | Character array of shape |
| residue_ids | NDArray[int_] | Residue sequence IDs corresponding to columns. |
| frequency | NDArray[floating] | Array of shape |
| categories | list[str] | List of unique category labels matching the last axis of |
compute_dssp(traj, *, simplified=True, dtype=None)
Compute per-residue secondary structure assignments across frames.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory. | required |
| simplified | bool | If | True |
| dtype | DtypeArg | Output float dtype for frequency array. If | None |
Returns:
| Type | Description |
|---|---|
| DSSPResult | DSSPResult with per-frame assignments and per-residue frequencies. |
Decomposition
mdpp.analysis.decomposition
Dimensionality reduction and feature engineering helpers.
DistanceFeatures(values, pairs, atom_indices)
dataclass
Pairwise distance features (e.g. CA-CA distances).
TorsionFeatures(values, labels)
dataclass
Backbone torsion features.
PCAResult(projections, components, explained_variance_ratio, feature_mean, feature_scale, model)
dataclass
Principal component analysis outputs.
TICAResult(projections, lagtime, model)
dataclass
Time-lagged independent component analysis outputs.
featurize_backbone_torsions(traj, *, atom_selection='protein', sincos_embedding=True, dtype=None)
Featurize backbone phi/psi torsions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory. | required |
| atom_selection | str \| None | Optional atom selection before featurization. | 'protein' |
| sincos_embedding | bool | If True (default), return | True |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| TorsionFeatures | TorsionFeatures with values and labels. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no phi/psi torsions are available. |
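A sine/cosine embedding removes the discontinuity of angles at +/- pi, so that linear methods such as PCA or TICA see nearby conformations as nearby points. The sketch below shows the general idea only: the function name `sincos_embed` is hypothetical, and the column layout (all sines followed by all cosines) is an assumption, since the layout the library returns is not visible in this reference.

```python
import numpy as np


def sincos_embed(angles_rad):
    """Hypothetical sketch: map torsions (n_frames, n_torsions) to sin/cos features."""
    a = np.asarray(angles_rad, dtype=np.float64)
    # Assumed layout: [sin(t_1..t_n), cos(t_1..t_n)], giving 2 * n_torsions columns.
    return np.concatenate([np.sin(a), np.cos(a)], axis=1)
```

With this embedding, torsions of -pi and +pi map to the same feature vector, whereas the raw angles would differ by 2 * pi.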
featurize_ca_distances(traj, *, atom_selection='name CA', backend='mdtraj', periodic=False, dtype=None)
Featurize all pairwise distances between selected atoms.
Computes the N*(N-1)/2 pairwise distances for the selected atoms
at each frame, producing a feature matrix suitable for PCA or TICA.
Five backends are available (see
:func:mdpp.analysis.distance._compute_pairwise_distances):
"mdtraj" (default, PBC-capable, single-threaded), "numba"
(CPU-parallel), and "cupy"/"torch"/"jax" (GPU-accelerated).
Non-mdtraj backends do not support periodic boundary conditions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory. | required |
| atom_selection | str | MDTraj selection string for the atoms to include. Defaults to | 'name CA' |
| backend | DistanceBackend | Distance computation backend. Defaults to | 'mdtraj' |
| periodic | bool | Whether to apply minimum image convention. Only effective with | False |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| DistanceFeatures | DistanceFeatures with values, atom pairs, and atom indices. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the selection matches fewer than 2 atoms, or an unknown backend is requested. |
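The N*(N-1)/2 unique pairs can be enumerated with NumPy's upper-triangle indices, as in the sketch below. The helper name `all_pairs` is hypothetical; how the library builds its pair list internally is not shown in this reference.

```python
import numpy as np


def all_pairs(n_atoms):
    """Hypothetical sketch: all unique (i, j) index pairs with i < j."""
    if n_atoms < 2:
        raise ValueError("need at least 2 atoms to form a pair")
    # k=1 skips the diagonal, so each unordered pair appears exactly once.
    i, j = np.triu_indices(n_atoms, k=1)
    return np.column_stack([i, j])
```

The pair counts grow quadratically: 100 atoms give 4,950 pairs, 200 give 19,900, and 400 give 79,800, so the feature matrix width should be considered before featurizing large selections.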
compute_pca(features, *, n_components=2, standardize=True, dtype=None)
Compute PCA projection from feature vectors.
Sklearn PCA (>= 1.8) preserves input dtype (float32 or float64).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | ArrayLike | Input feature matrix | required |
| n_components | int | Number of principal components. | 2 |
| standardize | bool | Whether to z-score features before PCA. | True |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| PCAResult | PCAResult containing projections and explained variance ratio. |
project_pca(features, *, fitted, dtype=None)
Project new features using a previously fitted PCA.
The features are standardized using the mean and scale from the fitted PCA, then transformed using its model. This is the correct way to project a second dataset (e.g. a different system) onto the same principal component axes for direct comparison.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | ArrayLike | Input feature matrix | required |
| fitted | PCAResult | PCAResult from a previous | required |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| PCAResult | PCAResult with projections onto the fitted PCA axes. The |
| PCAResult | |
| PCAResult | |
Raises:
| Type | Description |
|---|---|
| ValueError | If the feature dimension does not match the fitted PCA. |
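Why the stored mean and scale must be reused can be seen in a small NumPy sketch. It uses a plain SVD instead of scikit-learn's PCA, and the function names and dictionary keys are hypothetical stand-ins for the fields of PCAResult.

```python
import numpy as np


def fit_pca_sketch(features, n_components=2):
    """Hypothetical sketch: z-score the features, then PCA via SVD."""
    x = np.asarray(features, dtype=np.float64)
    mean = x.mean(axis=0)
    scale = x.std(axis=0)
    scale[scale == 0] = 1.0                  # leave constant features unscaled
    z = (x - mean) / scale
    # Rows of vt are the principal axes of the standardized data.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    components = vt[:n_components]
    return {"mean": mean, "scale": scale, "components": components,
            "projections": z @ components.T}


def project_pca_sketch(features, fitted):
    """Project new data with the mean, scale, and axes of the fitted PCA."""
    x = np.asarray(features, dtype=np.float64)
    if x.ndim != 2 or x.shape[1] != fitted["mean"].shape[0]:
        raise ValueError("feature dimension does not match the fitted PCA")
    # Reuse the FITTED statistics; re-standardizing would shift the axes.
    z = (x - fitted["mean"]) / fitted["scale"]
    return z @ fitted["components"].T
```

Standardizing the second dataset with its own mean and scale would place it in a different coordinate system, and the two projections would no longer be comparable.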
compute_tica(features, *, lagtime, n_components=2, dtype=None)
Compute TICA projection from feature vectors.
Deeptime upcasts to float64 internally for covariance estimation, so the input dtype does not affect numerical accuracy. The dtype parameter controls the dtype of the output arrays only.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | ArrayLike | Input feature matrix | required |
| lagtime | int | Lag time in frames. | required |
| n_components | int | Number of independent components. | 2 |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| TICAResult | TICAResult containing projected coordinates and fitted model. |
Free Energy Surface
mdpp.analysis.fes
Free-energy surface computation utilities.
FES2DResult(free_energy_kj_mol, probability_density, x_edges, y_edges, observed_mask, temperature_k)
dataclass
compute_fes_2d(x_values, y_values, *, bins=100, value_range=None, temperature_k=DEFAULT_TEMPERATURE_K, min_probability=1e-12, mask_unsampled=True, dtype=None)
Compute a 2D free-energy surface from two collective variables.
np.histogram2d returns float64 probability density regardless of
input dtype, so the log/energy arithmetic naturally runs in float64.
Output arrays are cast to dtype.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x_values | ArrayLike | Samples for CV1. | required |
| y_values | ArrayLike | Samples for CV2. | required |
| bins | BinsType | Histogram bin count. | 100 |
| value_range | RangeType | Optional | None |
| temperature_k | float | Temperature in Kelvin. | DEFAULT_TEMPERATURE_K |
| min_probability | float | Lower bound to avoid | 1e-12 |
| mask_unsampled | bool | If True, unsampled bins are set to | True |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| FES2DResult | FES2DResult with free energy shifted so its minimum is 0. |
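The usual Boltzmann inversion behind a 2D free-energy surface, F = -k_B T ln P shifted so that the minimum is zero, can be sketched with np.histogram2d. Several details are assumptions made for illustration: the function name `fes_2d_sketch`, a temperature of 300 K in place of DEFAULT_TEMPERATURE_K (whose value is not given here), and NaN as the marker for unsampled bins.

```python
import numpy as np

# Molar gas constant in kJ/(mol K), i.e. k_B times Avogadro's number.
KB_KJ_MOL_K = 0.0083144626


def fes_2d_sketch(x, y, bins=100, temperature_k=300.0,
                  min_probability=1e-12, mask_unsampled=True):
    """Hypothetical sketch: 2D free energy from samples of two CVs."""
    density, x_edges, y_edges = np.histogram2d(x, y, bins=bins, density=True)
    observed = density > 0
    # Clamp the density so empty bins do not produce log(0).
    clamped = np.maximum(density, min_probability)
    free_energy = -KB_KJ_MOL_K * temperature_k * np.log(clamped)
    # Shift so the most populated sampled bin sits at zero.
    free_energy -= free_energy[observed].min()
    if mask_unsampled:
        free_energy[~observed] = np.nan      # assumed marker for unsampled bins
    return free_energy, x_edges, y_edges, observed
```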
compute_fes_from_projection(projection, *, x_index=0, y_index=1, bins=100, value_range=None, temperature_k=DEFAULT_TEMPERATURE_K, min_probability=1e-12, mask_unsampled=True, dtype=None)
Compute a 2D FES from a projection matrix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| projection | ArrayLike | Matrix | required |
| x_index | int | Component index for x-axis. | 0 |
| y_index | int | Component index for y-axis. | 1 |
| bins | BinsType | Histogram bin count. | 100 |
| value_range | RangeType | Optional histogram range. | None |
| temperature_k | float | Temperature in Kelvin. | DEFAULT_TEMPERATURE_K |
| min_probability | float | Lower bound to avoid | 1e-12 |
| mask_unsampled | bool | If True, unsampled bins are set to | True |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| FES2DResult | FES2DResult computed from selected projection components. |
Clustering
mdpp.analysis.clustering
Conformational clustering from RMSD matrices.
Each clustering algorithm is a frozen dataclass configured at construction time and invoked as a callable::
    result = Gromos(cutoff_nm=0.2)(rmsd_matrix)
    result = DBSCAN(eps=0.15, min_samples=5)(rmsd_matrix)
    result = KMeans(n_clusters=10)(pca.projections)
RMSDMatrixResult(rmsd_matrix_nm, atom_indices)
dataclass
Pairwise RMSD matrix between trajectory frames.
rmsd_matrix_angstrom
property
Return the RMSD matrix in Angstrom.
Note
Each access allocates a new (n_frames, n_frames) array.
Cache the result in a local variable if you need it more
than once -- at 120k frames a float32 matrix of this shape is ~54 GiB (about 58 GB) per call.
ClusteringResult(labels, n_clusters, medoid_frames)
dataclass
Conformational clustering output.
FeatureClusteringResult(labels, n_clusters, cluster_centers, medoid_frames, inertia)
dataclass
Clustering result from feature-vector-based methods.
Gromos(cutoff_nm=0.15)
dataclass
GROMOS clustering (Daura et al. 1999).
Greedy largest-cluster-first assignment via Numba-JIT kernels. O(n) auxiliary memory -- no copies of the RMSD matrix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cutoff_nm | float | Neighbour cutoff in nm. | 0.15 |
Example::
    result = Gromos(cutoff_nm=0.2)(rmsd_matrix)
__call__(rmsd_matrix)
Cluster rmsd_matrix and return a :class:ClusteringResult.
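The greedy largest-cluster-first assignment of the GROMOS method can be sketched in NumPy as below. This is a readable illustration, not the Numba kernel: the function name `gromos_sketch` is hypothetical, treating a distance equal to the cutoff as a neighbour is an assumption, and the sketch allocates an (n, n) boolean matrix, so it does not have the O(n) auxiliary memory of the library implementation.

```python
import numpy as np


def gromos_sketch(rmsd_matrix, cutoff_nm=0.15):
    """Hypothetical sketch: greedy GROMOS clustering on a square RMSD matrix."""
    d = np.asarray(rmsd_matrix)
    n = d.shape[0]
    neighbours = d <= cutoff_nm              # assumption: cutoff is inclusive
    labels = np.full(n, -1, dtype=np.int64)
    unassigned = np.ones(n, dtype=bool)
    medoids = []
    while unassigned.any():
        # Count, for every frame, its neighbours among still-unassigned frames.
        counts = (neighbours & unassigned).sum(axis=1)
        counts[~unassigned] = -1             # assigned frames cannot be centres
        centre = int(counts.argmax())
        # The centre and all its unassigned neighbours form the next cluster.
        members = neighbours[centre] & unassigned
        labels[members] = len(medoids)
        medoids.append(centre)
        unassigned &= ~members
    return labels, medoids
```

Because the largest remaining neighbourhood is always taken first, cluster 0 is the most populated cluster, and each centre frame serves as that cluster's representative structure.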
Hierarchical(linkage_method='average', distance_threshold=0.15, n_clusters=None)
dataclass
Agglomerative hierarchical clustering (scipy).
Uses distance_threshold by default. Set n_clusters to use a
fixed cluster count instead (overrides distance_threshold).
Note
Scipy builds an O(n^2) float64 condensed distance matrix
internally. At 120k frames this is ~57 GB.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| linkage_method | str | | 'average' |
| distance_threshold | float | Distance cutoff in nm. | 0.15 |
| n_clusters | int \| None | Fixed cluster count (overrides distance_threshold). | None |
Example::
    result = Hierarchical(linkage_method="average", distance_threshold=0.2)(rmsd_matrix)
__call__(rmsd_matrix)
Cluster rmsd_matrix and return a :class:ClusteringResult.
DBSCAN(eps=0.15, min_samples=5, backend='numba')
dataclass
DBSCAN density-based clustering.
Two backends:
- "numba" (default) -- custom Numba-JIT kernel. Reuses the parallel neighbour-count kernel from GROMOS and a sequential BFS for label assignment. O(n) auxiliary memory, no copies.
- "sklearn" -- official scikit-learn DBSCAN with metric="precomputed".
Noise frames receive label -1.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| eps | float | Neighbourhood radius in nm. | 0.15 |
| min_samples | int | Minimum neighbours (including self) for a core point. | 5 |
| backend | Literal['numba', 'sklearn'] | | 'numba' |
Example::
    result = DBSCAN(eps=0.15, min_samples=5)(rmsd_matrix)
    result = DBSCAN(eps=0.15, backend="sklearn")(rmsd_matrix)
__call__(rmsd_matrix)
Cluster rmsd_matrix and return a :class:ClusteringResult.
HDBSCAN(min_cluster_size=5, min_samples=5)
dataclass
HDBSCAN hierarchical density-based clustering (sklearn >= 1.3).
Noise frames receive label -1.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| min_cluster_size | int | Minimum number of frames in a cluster. | 5 |
| min_samples | int | Number of neighbours for core-point estimation. | 5 |
Example::
    result = HDBSCAN(min_cluster_size=50, min_samples=5)(rmsd_matrix)
__call__(rmsd_matrix)
Cluster rmsd_matrix and return a :class:ClusteringResult.
KMeans(n_clusters=10, random_state=42, dtype=None)
dataclass
K-Means clustering (scikit-learn).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_clusters | int | Number of clusters. | 10 |
| random_state | int \| None | Seed for centroid initialisation. Defaults to 42 for reproducible runs across sessions. Pass | 42 |
| dtype | DtypeArg | Output float dtype for cluster_centers. | None |
Example::
    result = KMeans(n_clusters=10)(pca.projections)
    result = KMeans(n_clusters=10, random_state=None)(pca.projections)
__call__(features)
Cluster features and return a :class:FeatureClusteringResult.
MiniBatchKMeans(n_clusters=10, batch_size=1024, random_state=42, dtype=None)
dataclass
Mini-Batch K-Means clustering (scikit-learn).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_clusters | int | Number of clusters. | 10 |
| batch_size | int | Mini-batch size. | 1024 |
| random_state | int \| None | Seed for centroid initialisation and mini-batch sampling. Defaults to 42 for reproducible runs across sessions. Pass | 42 |
| dtype | DtypeArg | Output float dtype for cluster_centers. | None |
Example::
    result = MiniBatchKMeans(n_clusters=10, batch_size=1024)(pca.projections)
    result = MiniBatchKMeans(n_clusters=10, random_state=None)(pca.projections)
__call__(features)
Cluster features and return a :class:FeatureClusteringResult.
RegularSpace(dmin=0.5, dtype=None)
dataclass
Regular-space clustering (deeptime).
The number of clusters is determined by dmin, not specified
upfront.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dmin | float | Minimum distance between cluster centres. | 0.5 |
| dtype | DtypeArg | Output float dtype for cluster_centers. | None |
Example::
    result = RegularSpace(dmin=0.5)(pca.projections)
__call__(features)
Cluster features and return a :class:FeatureClusteringResult.
compute_rmsd_matrix(traj, *, atom_selection='backbone', backend='mdtraj', dtype=None)
Compute an all-vs-all RMSD matrix between trajectory frames.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory. | required |
| atom_selection | str | Atoms used for RMSD calculation. | 'backbone' |
| backend | RMSDBackend | Computation backend. Defaults to | 'mdtraj' |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| RMSDMatrixResult | RMSDMatrixResult with a symmetric |
Raises:
| Type | Description |
|---|---|
| ValueError | If an unsupported backend is specified. |
| ImportError | If the requested backend package is not installed. |
Memory note
Every backend returns its native float32 output matrix
(the numba kernel uses float64 accumulators internally but
stores float32 in the result buffer; GPU kernels compute in
float32 end-to-end). This wrapper casts with copy=False
so when the resolved dtype is float32 (the package default)
there is no second copy of the (n_frames, n_frames)
matrix. For a 120k-frame trajectory this saves ~115 GB of
peak RAM versus the old "cast to float64 for the Protocol
contract, then cast back" path. Passing dtype=np.float64
still forces a one-time upcast.
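The memory figures quoted for 120k frames can be checked with a few lines of arithmetic. The helper names below are hypothetical; the sketch only multiplies element counts by item sizes.

```python
def full_matrix_bytes(n_frames, itemsize):
    """Bytes needed for a dense (n_frames, n_frames) matrix."""
    return n_frames * n_frames * itemsize


def condensed_matrix_bytes(n_frames, itemsize):
    """Bytes needed for a condensed matrix holding n*(n-1)/2 entries."""
    return n_frames * (n_frames - 1) // 2 * itemsize


N_FRAMES = 120_000
float32_full = full_matrix_bytes(N_FRAMES, 4)            # 57.6e9 bytes, about 53.6 GiB
float64_full = full_matrix_bytes(N_FRAMES, 8)            # 115.2e9 bytes
float64_condensed = condensed_matrix_bytes(N_FRAMES, 8)  # about 57.6e9 bytes
float32_full_gib = float32_full / 2**30
```

A dense float64 copy is therefore about 115 GB, which is the amount the copy=False cast avoids, while the float32 matrix itself is about 58 GB (roughly 54 GiB).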