Core API Reference

Trajectory

mdpp.core.trajectory

Trajectory loading and selection helpers based on MDTraj.

select_atom_indices(topology, selection)

Return atom indices selected by an MDTraj DSL selection.

Parameters:

Name Type Description Default
topology Topology

Trajectory topology.

required
selection str

MDTraj selection string (for example, "name CA").

required

Returns:

Type Description
NDArray[int_]

Atom indices matching the selection.

Raises:

Type Description
ValueError

If the selection matches no atoms.

residue_ids_from_indices(topology, atom_indices)

Map atom indices to residue sequence IDs.

Parameters:

Name Type Description Default
topology Topology

Trajectory topology.

required
atom_indices NDArray[int_]

Atom indices to map.

required

Returns:

Type Description
NDArray[int_]

Residue IDs for each atom index.

trajectory_time_ps(traj, *, timestep_ps=None, dtype=None)

Return per-frame time values in picoseconds.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
timestep_ps float | None

Optional fixed timestep to enforce. If provided, generated time values are np.arange(n_frames) * timestep_ps.

None
dtype DtypeArg

Output float dtype. If None, uses the package default (see mdpp.set_default_dtype).

None

Returns:

Type Description
NDArray[floating]

Time array in picoseconds.
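
The fixed-timestep rule described for timestep_ps can be illustrated with plain NumPy. This is a minimal sketch, not the mdpp implementation: the helper name fixed_timestep_times is hypothetical, and float32 is only a placeholder for whatever the package default dtype is.

```python
import numpy as np


def fixed_timestep_times(n_frames, timestep_ps, dtype=np.float32):
    """Return n_frames time values spaced timestep_ps apart, starting at 0."""
    # Same formula as in the parameter description: np.arange(n_frames) * timestep_ps
    return (np.arange(n_frames) * timestep_ps).astype(dtype)


times = fixed_timestep_times(5, 2.0)
print(times)  # [0. 2. 4. 6. 8.]
```

Because the values are generated from the frame count alone, any time stamps stored in the trajectory file play no part in this formula.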

load_trajectory(trajectory_path, *, topology_path=None, start=0, stop=None, stride=1, atom_selection=None)

Load a single trajectory with optional frame and atom selection.

Frame selection follows Python's range(start, stop, stride) convention: start is included, stop is excluded, and stride controls the step size. All three refer to raw frame indices in the trajectory file.

When atom_selection is provided, the selected atom indices are passed directly to the underlying mdtraj reader so that only those atoms are read from disk. This avoids loading the full-atom trajectory into memory.

Parameters:

Name Type Description Default
trajectory_path PathLike

Path to trajectory file (for example, .xtc).

required
topology_path PathLike | None

Optional topology path (for example, .pdb).

None
start int

First raw frame index to load (inclusive). Default is 0.

0
stop int | None

Raw frame index at which to stop loading (exclusive). If None, read to the end of the file.

None
stride int

Frame stride (step size). Default is 1.

1
atom_selection str | None

Optional MDTraj atom selection string. Matching atoms are loaded directly from disk (no post-load slicing).

None

Returns:

Type Description
Trajectory

Loaded trajectory containing only the selected frames and atoms.

Raises:

Type Description
ValueError

If stride is less than 1, start is negative, or stop is not greater than start.
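
To make the range(start, stop, stride) convention and the documented error conditions concrete, the sketch below computes which raw frame indices would be kept for a file with a known number of frames. The helper selected_frame_indices is hypothetical and only mirrors the rules stated above; it is not how load_trajectory reads files.

```python
def selected_frame_indices(n_frames, start=0, stop=None, stride=1):
    """List the raw frame indices kept under range(start, stop, stride)."""
    # Mirror the documented ValueError conditions.
    if stride < 1:
        raise ValueError("stride must be at least 1")
    if start < 0:
        raise ValueError("start must not be negative")
    if stop is not None and stop <= start:
        raise ValueError("stop must be greater than start")
    # stop=None means "read to the end of the file".
    end = n_frames if stop is None else min(stop, n_frames)
    return list(range(start, end, stride))


print(selected_frame_indices(10, start=2, stop=9, stride=3))  # [2, 5, 8]
```

Frame 2 is included because start is inclusive, and frame 9 would be excluded even if the stride landed on it because stop is exclusive.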

load_trajectories(trajectory_paths, *, topology_paths=None, start=0, stop=None, stride=1, atom_selection=None, max_workers=None)

Load a list of trajectories with a shared interface.

Frame selection follows Python's range(start, stop, stride) convention: start is included, stop is excluded, and stride controls the step size. All three refer to raw frame indices.

When max_workers is set, trajectories are loaded in parallel using multiprocessing.Pool (process-based parallelism).

Why processes instead of threads

MDTraj's C-level XTC/TRR parsers hold the GIL during frame decoding, so threads cannot run concurrently on the CPU-bound parsing step. Benchmarks on 6 replicas (stride=10, 1000 frames each, ~5000 atoms) show:

============ ====== ========= ===========
Method       Time   Speedup   RSS delta
============ ====== ========= ===========
Sequential   9.7 s  1.0x      +16.8 MB
Threads (6)  4.5 s  2.2x      +7.7 MB
mp.Pool (6)  0.9 s  11.2x     +0.0 MB
============ ====== ========= ===========

Processes win on both speed and memory. Worker processes allocate trajectory data in their own address space; when the pool closes that memory is fully released to the OS, leaving zero RSS growth in the parent. Threads allocate within the parent and rely on Python's allocator to (possibly) return pages.

Why multiprocessing.Pool instead of ProcessPoolExecutor

Both perform identically in benchmarks for this workload. Pool is chosen for its simpler API (map returns results directly) and maxtasksperchild support, which can guard against memory leaks from large trajectory allocations.
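
The process-based pattern discussed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the mdpp source: _load_one is a hypothetical stand-in for the real per-file loader, load_all is a hypothetical name, and maxtasksperchild=1 is just one possible setting.

```python
from multiprocessing import Pool


def _load_one(path):
    """Hypothetical stand-in for a per-file loader; returns a tagged string."""
    return f"loaded:{path}"


def load_all(paths, max_workers=None):
    """Load every path, sequentially or in a pool of worker processes."""
    if max_workers is None:
        # Sequential fallback, as described for max_workers=None.
        return [_load_one(p) for p in paths]
    # maxtasksperchild=1 recycles each worker after one task (an assumed
    # setting), so memory allocated while loading is freed when it exits.
    with Pool(processes=max_workers, maxtasksperchild=1) as pool:
        # Pool.map returns results in the same order as the inputs.
        return pool.map(_load_one, paths)


results = load_all(["rep1.xtc", "rep2.xtc"])
print(results)  # ['loaded:rep1.xtc', 'loaded:rep2.xtc']
```

When calling such a function with max_workers set from a script, the call should sit under an `if __name__ == "__main__":` guard, since some multiprocessing start methods re-import the main module in each worker.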

Parameters:

Name Type Description Default
trajectory_paths Sequence[PathLike]

Trajectory paths.

required
topology_paths Sequence[PathLike | None] | None

Optional topology paths. If provided, must match trajectory_paths length.

None
start int

First raw frame index to load (inclusive). Default is 0.

0
stop int | None

Raw frame index at which to stop (exclusive). If None, read to the end of each file.

None
stride int

Frame stride (step size). Default is 1.

1
atom_selection str | None

Optional MDTraj atom selection string, applied to every trajectory.

None
max_workers int | None

If set, load trajectories in parallel using processes. The value controls the maximum number of concurrent worker processes. If None, trajectories are loaded sequentially.

None

Returns:

Type Description
list[Trajectory]

Loaded trajectories in the same order as trajectory_paths.

Raises:

Type Description
ValueError

If the length of topology_paths does not match the length of trajectory_paths.

align_trajectory(traj, *, atom_selection='name CA', reference_frame=0, inplace=False)

Align a trajectory to a reference frame.

md.Trajectory.superpose modifies coordinates in place. When inplace=False (the default), only the xyz array is copied; topology and time are shared with the original trajectory. This avoids the expensive deepcopy(topology) that traj[:] performs.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
atom_selection str

Atoms used for alignment.

'name CA'
reference_frame int

Reference frame index.

0
inplace bool

If True, align traj in place and return it. If False (default), return a new trajectory that shares topology and time but has its own aligned coordinates.

False

Returns:

Type Description
Trajectory

The aligned trajectory.

Raises:

Type Description
ValueError

If reference_frame is out of range.
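
The inplace=False behaviour (copy only the coordinates, share everything else) can be pictured with a toy container. This is a sketch under loud assumptions: ToyTrajectory and copy_coordinates_only are hypothetical stand-ins, no MDTraj objects are involved, and the `+= 1.0` merely stands in for the in-place superposition step.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ToyTrajectory:
    """Hypothetical stand-in for a trajectory: coordinates plus shared metadata."""

    xyz: np.ndarray  # shape (n_frames, n_atoms, 3)
    topology: object
    time: np.ndarray


def copy_coordinates_only(traj):
    """Return a new container with its own xyz but the same topology and time."""
    return ToyTrajectory(xyz=traj.xyz.copy(), topology=traj.topology, time=traj.time)


original = ToyTrajectory(
    xyz=np.zeros((2, 3, 3)), topology={"n_atoms": 3}, time=np.arange(2.0)
)
aligned = copy_coordinates_only(original)
aligned.xyz += 1.0  # stands in for the in-place alignment of the copy

print(original.xyz.sum())                     # 0.0
print(aligned.topology is original.topology)  # True
```

The original coordinates stay untouched while the topology object is reused rather than deep-copied, which is the saving the description refers to.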

Parsers

mdpp.core.parsers

Thin wrappers around external parsers for MD engine output files.

read_xvg(path, *, dtype=None)

Read a GROMACS XVG file into a DataFrame.

Parses metadata lines (lines starting with @) to extract column labels from legend entries. Data lines are read with NumPy for performance.

Parameters:

Name Type Description Default
path StrPath

Path to a .xvg file.

required
dtype DtypeArg

Float dtype for the data. If None, uses the package default.

None

Returns:

Type Description
DataFrame

DataFrame whose first column is typically time and remaining columns are labeled from the XVG legend entries (or "col_0", "col_1", etc. when legends are absent).
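
The two-pass idea described above (legend labels from @ lines, numeric rows via NumPy) can be sketched as follows. This is a simplified illustration, not the read_xvg source: parse_xvg_text and the regular expression are assumptions, it returns labels and an array rather than a DataFrame, and it does not decide how the first column is named.

```python
import io
import re

import numpy as np

# Matches legend entries such as: @ s0 legend "Potential"
_LEGEND_RE = re.compile(r'^@\s*s(\d+)\s+legend\s+"(.*)"')


def parse_xvg_text(text):
    """Split XVG text into legend labels and a 2-D float array."""
    legends = {}
    data_lines = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("@"):
            match = _LEGEND_RE.match(line)
            if match:
                legends[int(match.group(1))] = match.group(2)
            continue  # other metadata lines are ignored in this sketch
        data_lines.append(line)
    # Numeric rows are handed to NumPy in one call.
    data = np.loadtxt(io.StringIO("\n".join(data_lines)), ndmin=2)
    labels = [legends[i] for i in sorted(legends)]
    return labels, data


sample = """\
# GROMACS-style comment
@    title "Energies"
@ s0 legend "Potential"
@ s1 legend "Kinetic"
0.0  -100.0  50.0
1.0  -101.0  51.0
"""
labels, data = parse_xvg_text(sample)
print(labels)      # ['Potential', 'Kinetic']
print(data.shape)  # (2, 3)
```

Note that the data has one more column than there are legends: the extra leading column is the x axis, which is typically time.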

read_edr(path)

Read a GROMACS EDR energy file into a DataFrame.

Uses panedr internally. Install it with pip install panedr.

Parameters:

Name Type Description Default
path StrPath

Path to a .edr file.

required

Returns:

Type Description
DataFrame

DataFrame with a Time column and one column per energy term.

Raises:

Type Description
ImportError

If panedr is not installed.
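
An optional dependency such as panedr is commonly handled by importing it lazily and re-raising with an install hint. The sketch below shows that general pattern with the standard library; import_optional is a hypothetical helper, and whether read_edr is written this way is an assumption.

```python
import importlib


def import_optional(module_name, install_hint):
    """Import a module by name, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this function. "
            f"Install it with {install_hint}."
        ) from exc


# Demonstrated with a standard-library module so the example always runs.
# For the case documented above, the call would look like:
#     import_optional("panedr", "pip install panedr")
json_module = import_optional("json", "pip install json")
print(json_module.dumps({"ok": True}))  # {"ok": true}
```

Deferring the import to call time keeps the rest of the package usable when the optional dependency is absent.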