Core API Reference
Trajectory
mdpp.core.trajectory
Trajectory loading and selection helpers based on MDTraj.
select_atom_indices(topology, selection)
Return atom indices selected by an MDTraj DSL selection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| topology | Topology | Trajectory topology. | required |
| selection | str | MDTraj selection string (for example, | required |
Returns:
| Type | Description |
|---|---|
| NDArray[int_] | Atom indices matching the selection. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the selection matches no atoms. |
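The behaviour described above can be sketched roughly as follows, assuming the helper delegates to MDTraj's `Topology.select` and then checks for an empty result. The name `select_or_raise` is hypothetical and this is not necessarily how mdpp implements it; any object exposing a `select(selection)` method will do, which keeps the sketch runnable without a structure file.

```python
def select_or_raise(topology, selection):
    """Return atom indices for `selection`, raising if nothing matches.

    Assumption: `topology` exposes MDTraj's `Topology.select(str)` method,
    which returns an array of matching atom indices.
    """
    indices = topology.select(selection)
    if len(indices) == 0:
        raise ValueError(f"Selection {selection!r} matched no atoms.")
    return indices
```

With MDTraj installed, the `topology` attribute of a loaded trajectory could be passed in directly, for example with a selection such as `"name CA"`.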
residue_ids_from_indices(topology, atom_indices)
Map atom indices to residue sequence IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| topology | Topology | Trajectory topology. | required |
| atom_indices | NDArray[int_] | Atom indices to map. | required |
Returns:
| Type | Description |
|---|---|
| NDArray[int_] | Residue IDs for each atom index. |
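A minimal sketch of such a mapping, assuming "residue sequence ID" corresponds to MDTraj's `Residue.resSeq` attribute (rather than the zero-based `Residue.index`). The function name `residue_ids` is hypothetical, and it returns a plain list instead of an array to stay dependency-free.

```python
def residue_ids(topology, atom_indices):
    """Look up the residue sequence number (resSeq) for each atom index.

    Assumption: `topology` follows MDTraj's interface, where
    `topology.atom(i).residue.resSeq` is the residue sequence number.
    """
    return [topology.atom(int(i)).residue.resSeq for i in atom_indices]
```

Atoms that belong to the same residue map to the same ID, so the output has one entry per input atom index, not one per residue.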
trajectory_time_ps(traj, *, timestep_ps=None, dtype=None)
Return per-frame time values in picoseconds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory. | required |
| timestep_ps | float \| None | Optional fixed timestep to enforce. If provided, generated time values are | None |
| dtype | DtypeArg | Output float dtype. If | None |
Returns:
| Type | Description |
|---|---|
| NDArray[floating] | Time array in picoseconds. |
load_trajectory(trajectory_path, *, topology_path=None, start=0, stop=None, stride=1, atom_selection=None)
Load a single trajectory with optional frame and atom selection.
Frame selection follows Python's range(start, stop, stride)
convention: start is included, stop is excluded, and stride
controls the step size. All three refer to raw frame indices in
the trajectory file.
When atom_selection is provided, the selected atom indices are passed directly to the underlying mdtraj reader so that only those atoms are read from disk. This avoids loading the full-atom trajectory into memory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| trajectory_path | PathLike | Path to trajectory file (for example, | required |
| topology_path | PathLike \| None | Optional topology path (for example, | None |
| start | int | First raw frame index to load (inclusive). Default is 0. | 0 |
| stop | int \| None | Raw frame index at which to stop loading (exclusive). If | None |
| stride | int | Frame stride (step size). Default is 1. | 1 |
| atom_selection | str \| None | Optional MDTraj atom selection string. Matching atoms are loaded directly from disk (no post-load slicing). | None |
Returns:
| Type | Description |
|---|---|
| Trajectory | Loaded trajectory containing only the selected frames and atoms. |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
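To make the frame-selection convention concrete, the sketch below lists which raw frame indices a given `start`/`stop`/`stride` combination would pick. The helper `selected_frames` is hypothetical and only mirrors Python's `range` semantics; it assumes that `stop=None` means "read to the last frame", which the reference text does not spell out here.

```python
def selected_frames(n_frames, start=0, stop=None, stride=1):
    """List the raw frame indices chosen by range(start, stop, stride).

    Assumption: stop=None means "up to the last frame in the file".
    """
    end = n_frames if stop is None else min(stop, n_frames)
    return list(range(start, end, stride))


# A 10-frame file: start is included, stop is excluded.
print(selected_frames(10, start=2, stop=9, stride=3))  # [2, 5, 8]
print(selected_frames(10, stride=4))                   # [0, 4, 8]
```

Note that the indices refer to frames in the file, not to positions in the already-strided result.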
load_trajectories(trajectory_paths, *, topology_paths=None, start=0, stop=None, stride=1, atom_selection=None, max_workers=None)
Load a list of trajectories with a shared interface.
Frame selection follows Python's range(start, stop, stride)
convention: start is included, stop is excluded, and stride
controls the step size. All three refer to raw frame indices.
When max_workers is set, trajectories are loaded in parallel using
multiprocessing.Pool (process-based parallelism).
Why processes instead of threads
mdtraj's C-level XTC/TRR parsers hold the GIL during frame decoding, so threads cannot run concurrently on the CPU-bound parsing step. Benchmarks on 6 replicas (stride=10, 1000 frames each, ~5000 atoms) show:
| Method | Time | Speedup | RSS delta |
|---|---|---|---|
| Sequential | 9.7 s | 1.0x | +16.8 MB |
| Threads (6) | 4.5 s | 2.2x | +7.7 MB |
| mp.Pool (6) | 0.9 s | 11.2x | +0.0 MB |
Processes win on both speed and memory. Worker processes allocate trajectory data in their own address space; when the pool closes that memory is fully released to the OS, leaving zero RSS growth in the parent. Threads allocate within the parent and rely on Python's allocator to (possibly) return pages.
Why multiprocessing.Pool instead of ProcessPoolExecutor:
Both perform identically in benchmarks for this workload. Pool
is chosen for its simpler API (map returns results directly)
and maxtasksperchild support, which can guard against memory
leaks from large trajectory allocations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| trajectory_paths | Sequence[PathLike] | Trajectory paths. | required |
| topology_paths | Sequence[PathLike \| None] \| None | Optional topology paths. If provided, must match | None |
| start | int | First raw frame index to load (inclusive). Default is 0. | 0 |
| stop | int \| None | Raw frame index at which to stop (exclusive). If | None |
| stride | int | Frame stride (step size). Default is 1. | 1 |
| atom_selection | str \| None | Optional atom selection for slicing. | None |
| max_workers | int \| None | If set, load trajectories in parallel using processes. The value controls the maximum number of concurrent worker processes. If | None |
Returns:
| Type | Description |
|---|---|
| list[Trajectory] | Loaded trajectories in the same order as |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
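The process-pool pattern described above might look roughly like this. It is a sketch, not mdpp's implementation: `fake_load` is a hypothetical stand-in for the real per-file loader, and falling back to a sequential loop when `max_workers` is `None` is an assumption.

```python
from multiprocessing import Pool


def fake_load(path):
    """Hypothetical stand-in for an expensive per-file trajectory load."""
    return (path, len(path))


def load_all(paths, max_workers=None):
    """Load every path, optionally in parallel worker processes."""
    # Assumption: no worker count (or nothing to parallelise) means a
    # plain sequential loop in the calling process.
    if max_workers is None or len(paths) <= 1:
        return [fake_load(p) for p in paths]

    n_procs = min(max_workers, len(paths))
    # maxtasksperchild=1 replaces each worker after one task, so memory a
    # worker allocated is handed back to the OS when that worker exits.
    with Pool(processes=n_procs, maxtasksperchild=1) as pool:
        # map() returns results in input order, whichever worker finishes first.
        return pool.map(fake_load, paths, chunksize=1)
```

On platforms that start workers with the spawn method, a call such as `load_all(paths, max_workers=4)` has to sit under an `if __name__ == "__main__":` guard, and the worker function must be defined at module level so it can be pickled.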
align_trajectory(traj, *, atom_selection='name CA', reference_frame=0, inplace=False)
Align a trajectory to a reference frame.
md.Trajectory.superpose modifies coordinates in place. When
inplace=False (the default), only the xyz array is copied;
topology and time are shared with the original trajectory. This
avoids the expensive deepcopy(topology) that traj[:] performs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| traj | Trajectory | Input trajectory. | required |
| atom_selection | str | Atoms used for alignment. | 'name CA' |
| reference_frame | int | Reference frame index. | 0 |
| inplace | bool | If | False |
Returns:
| Type | Description |
|---|---|
| Trajectory | The aligned trajectory. |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
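The copy strategy described above (copy the coordinates, share topology and time) can be illustrated without MDTraj. In this sketch `SimpleTraj` and `copy_coordinates_only` are hypothetical stand-ins, and the in-place edit merely mimics what a superposition step would do to the coordinates.

```python
import copy


class SimpleTraj:
    """Hypothetical stand-in for a trajectory: coordinates, topology, time."""

    def __init__(self, xyz, topology, time):
        self.xyz = xyz
        self.topology = topology
        self.time = time


def copy_coordinates_only(traj):
    """Return a new object with independent xyz but shared topology and time."""
    new = copy.copy(traj)              # shallow copy: attributes still shared
    new.xyz = copy.deepcopy(traj.xyz)  # only the coordinates are duplicated
    return new


original = SimpleTraj(xyz=[[0.0, 0.0, 0.0]], topology={"n_atoms": 1}, time=[0.0])
working = copy_coordinates_only(original)
working.xyz[0][0] = 1.5  # stands in for an in-place superposition

print(original.xyz[0][0])                     # 0.0, the input is untouched
print(working.topology is original.topology)  # True, topology is shared
```

Because topology and time are shared, callers should treat them as read-only on the returned object.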
Parsers
mdpp.core.parsers
Thin wrappers around external parsers for MD engine output files.
read_xvg(path, *, dtype=None)
Read a GROMACS XVG file into a DataFrame.
Parses metadata lines (lines starting with @) to extract column labels
from legend entries. Data lines are read with NumPy for performance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | StrPath | Path to a | required |
| dtype | DtypeArg | Float dtype for the data. If | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame whose first column is typically time and remaining columns are labeled from the XVG legend entries (or etc. when legends are absent). |
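The legend-parsing step can be sketched with the standard library alone. This is an illustration under assumptions, not the function's actual code: `parse_xvg_text` is a hypothetical name, it returns plain lists rather than a DataFrame, and the fallback labels (`time`, `column_1`, ...) are placeholders chosen for this sketch, not necessarily the names mdpp uses.

```python
import re

# GROMACS writes legend entries as lines like:  @ s0 legend "Potential"
LEGEND_RE = re.compile(r'^@\s*s(\d+)\s+legend\s+"(.*)"\s*$')


def parse_xvg_text(text):
    """Split XVG text into column labels and rows of floats."""
    legends = {}
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "&")):
            continue  # comments, blank lines, dataset separators
        if line.startswith("@"):
            match = LEGEND_RE.match(line)
            if match:
                legends[int(match.group(1))] = match.group(2)
            continue  # other metadata (title, axis labels) is ignored here
        rows.append([float(value) for value in line.split()])

    if not rows:
        return [], rows
    n_series = len(rows[0]) - 1  # first column is treated as the x axis
    labels = ["time"] + [legends.get(i, f"column_{i + 1}") for i in range(n_series)]
    return labels, rows


sample = """# produced by gmx energy
@    title "Energy"
@ s0 legend "Potential"
@ s1 legend "Kinetic En."
0.0 -10.5 3.2
1.0 -10.7 3.1
"""
labels, rows = parse_xvg_text(sample)
print(labels)  # ['time', 'Potential', 'Kinetic En.']
```

Legend index `s0` maps to the second data column because the first column holds the x values.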
read_edr(path)
Read a GROMACS EDR energy file into a DataFrame.
Uses panedr internally. Install it with pip install panedr.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | StrPath | Path to a | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with a |
Raises:
| Type | Description |
|---|---|
| ImportError | If |
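An optional dependency such as panedr is commonly imported lazily so that the rest of the package works without it. The sketch below shows one way to do that; `require_optional` is a hypothetical helper and the error wording is an assumption, not the message mdpp actually raises.

```python
import importlib


def require_optional(module_name, pip_name=None):
    """Import an optional dependency or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        hint = pip_name or module_name
        raise ImportError(
            f"{module_name} is required for this feature. "
            f"Install it with: pip install {hint}"
        ) from exc


# Usage sketch (needs panedr installed; the file name is a placeholder):
# panedr = require_optional("panedr")
# df = panedr.edr_to_df("ener.edr")
```

Deferring the import to call time means users who never read EDR files do not need panedr at all.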