
Preparation API Reference

Protein

mdpp.prep.protein

Protein structure preparation and manipulation utilities.

PropkaResidue(residue_type, res_num, chain_id, pka, model_pka) dataclass

PROPKA pKa prediction for a single titratable residue.

Attributes:

Name Type Description
residue_type str

Group label (e.g. ASP, HIS, N+, C-).

res_num int

Residue sequence number.

chain_id str

PDB chain identifier.

pka float

PROPKA-predicted pKa value.

model_pka float

Reference model pKa value.

label property

Formatted residue label matching PROPKA output style.

is_protonated_at(pH)

Whether PROPKA predicts the residue to be protonated at the given pH.

is_default_protonated_at(pH)

Whether the model pKa predicts the residue to be protonated at the given pH.

PropkaResult(residues) dataclass

PROPKA pKa prediction results for all titratable residues.

Attributes:

Name Type Description
residues tuple[PropkaResidue, ...]

pKa predictions for each titratable residue.

get_nonstandard(pH)

Return residues where PROPKA and model pKa disagree on protonation state.

A residue is "non-standard" when pka > pH and model_pka <= pH (or vice versa), meaning PDBFixer would assign a different protonation state than the one PROPKA predicts.

Parameters:

Name Type Description Default
pH float

pH value for protonation state comparison.

required

Returns:

Type Description
tuple[PropkaResidue, ...]

Residues with non-standard predicted protonation.
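The comparison rule above can be sketched in a few lines. This is a minimal illustration, not the mdpp implementation: ResiduePka is a hypothetical stand-in for PropkaResidue, and the pKa values are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResiduePka:
    """Hypothetical stand-in for PropkaResidue (not the mdpp class)."""

    residue_type: str
    res_num: int
    chain_id: str
    pka: float
    model_pka: float


def is_nonstandard(res: ResiduePka, pH: float) -> bool:
    # A group counts as protonated when its pKa is above the pH.
    # It is non-standard when the predicted and model pKa disagree on that.
    return (res.pka > pH) != (res.model_pka > pH)


# Illustrative values only; they do not come from a real PROPKA run.
residues = (
    ResiduePka("ASP", 25, "A", pka=8.1, model_pka=3.8),
    ResiduePka("GLU", 40, "A", pka=4.2, model_pka=4.5),
    ResiduePka("HIS", 57, "A", pka=7.6, model_pka=6.5),
)
nonstandard = tuple(r for r in residues if is_nonstandard(r, 7.0))
```

At pH 7.0 the aspartate and the histidine are flagged because their predicted pKa lies above the pH while the model pKa lies below it; the glutamate is not flagged because both values lie below the pH.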

ChainSelect(chain_ids)

Bases: Select

Biopython Select subclass that accepts only specified PDB chains.

Parameters:

Name Type Description Default
chain_ids str | list[str]

One or more chain identifiers to keep.

required

Example:

from Bio.PDB import PDBIO, PDBParser
from mdpp.prep import ChainSelect

parser = PDBParser(QUIET=True)
structure = parser.get_structure("complex", "complex.pdb")
io = PDBIO()
io.set_structure(structure)
io.save("protein.pdb", ChainSelect("A"))

Initialize the ChainSelect object.

Parameters:

Name Type Description Default
chain_ids str | list[str]

The chain IDs to keep.

required

accept_chain(chain)

Return 1 if the chain should be kept, 0 otherwise.

run_propka(pdb_path)

Run PROPKA to predict pKa values for titratable protein residues.

Parameters:

Name Type Description Default
pdb_path StrPath

Path to the input PDB file.

required

Returns:

Type Description
PropkaResult

pKa predictions for all titratable residues found.

fix_pdb(pdb_path, fixed_pdb_path, pH=7.0, *, protonation='model')

Fix a PDB file by adding missing residues, atoms, and hydrogens.

Removes heterogens (excluding water by default), identifies missing residues and atoms, then adds them back along with hydrogens at the specified pH.

Runs PROPKA to check for residues whose environment-shifted pKa predicts a different protonation state than the model-pKa default used by PDBFixer, and logs a warning for each such residue.

Parameters:

Name Type Description Default
pdb_path StrPath

Path to the input PDB file.

required
fixed_pdb_path StrPath

Path where the fixed PDB will be written.

required
pH float

pH value for hydrogen placement.

7.0
protonation Literal['model', 'propka']

Protonation policy. "model" (default) uses PDBFixer's built-in model pKa values. "propka" keeps the model default for most residues but overrides the residues where PROPKA disagrees (PropkaResult.get_nonstandard) with PROPKA's predicted state, applied via OpenMM Modeller variants. Supported overrides are ASP/GLU/LYS/HIS/CYS (a neutral histidine uses the HIE tautomer); unsupported residue types (e.g. termini) keep the default and are logged.

'model'
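As a rough illustration of the "propka" policy, the sketch below maps a predicted pKa to a residue variant name. The variant names are the ones OpenMM's Modeller.addHydrogens accepts; which name mdpp actually selects in each case is an assumption here, apart from the neutral-histidine HIE choice stated above. VARIANTS and choose_variant are hypothetical names.

```python
# (protonated form, deprotonated form) for each supported residue type.
# The pairing is an assumption of this sketch, not taken from mdpp's source.
VARIANTS = {
    "ASP": ("ASH", "ASP"),
    "GLU": ("GLH", "GLU"),
    "LYS": ("LYS", "LYN"),
    "HIS": ("HIP", "HIE"),
    "CYS": ("CYS", "CYX"),
}


def choose_variant(residue_type, pka, pH):
    """Return a variant name, or None for unsupported residue types."""
    pair = VARIANTS.get(residue_type)
    if pair is None:
        # e.g. termini (N+, C-): keep the default protonation state.
        return None
    protonated, deprotonated = pair
    return protonated if pka > pH else deprotonated
```

A caller would apply this only to the residues returned by get_nonstandard and leave every other position as None, so that the default state is kept there.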

strip_solvent(traj, *, keep_ions=False)

Remove solvent molecules from a trajectory.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
keep_ions bool

If True, retain common ions (Na+, Cl-, K+, etc.) while still removing water.

False

Returns:

Type Description
Trajectory

A new trajectory with solvent removed.

extract_chain(traj, chain_id)

Extract a single chain from a trajectory.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
chain_id int

Zero-based chain index to extract.

required

Returns:

Type Description
Trajectory

A new trajectory containing only the specified chain.

Raises:

Type Description
ValueError

If chain_id is out of range.

Ligand

mdpp.prep.ligand

Ligand parameterization and topology assignment utilities.

assign_topology(mol, template_mol)

Assign bond orders and hydrogens from a template molecule to a ligand.

Uses the template (typically from SMILES) without hydrogens to assign bond orders to the ligand's heavy-atom coordinates, then adds hydrogens with 3D coordinates.

Parameters:

Name Type Description Default
mol Mol

The ligand molecule (usually from a PDB/MOL2 with no bond orders).

required
template_mol Mol

The reference molecule with correct bond orders.

required

Returns:

Type Description
Mol

A new molecule with assigned bond orders and added hydrogens.

constraint_minimization(mol, *, max_iters=5000)

Minimize hydrogen positions while keeping heavy atoms fixed.

Uses the Universal Force Field (UFF) with fixed-point constraints on all non-hydrogen atoms.

Parameters:

Name Type Description Default
mol Mol

Input molecule with 3D coordinates (conformer 0).

required
max_iters int

Maximum number of minimization iterations.

5000

Returns:

Type Description
Mol

The molecule with optimized hydrogen positions.

Topology

mdpp.prep.topology

Trajectory manipulation utilities for system preparation.

merge_trajectories(trajectories)

Concatenate multiple trajectories along the time axis.

All trajectories must share the same topology and number of atoms.

Parameters:

Name Type Description Default
trajectories Sequence[Trajectory]

Sequence of trajectories to concatenate.

required

Returns:

Type Description
Trajectory

A single trajectory containing all frames in order.

Raises:

Type Description
ValueError

If fewer than two trajectories are provided or topologies do not match.

slice_trajectory(traj, *, start=None, stop=None, stride=None)

Slice a trajectory by frame range with validation.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
start int | None

Starting frame index (inclusive). Defaults to 0.

None
stop int | None

Stopping frame index (exclusive). Defaults to n_frames.

None
stride int | None

Frame stride. Defaults to 1.

None

Returns:

Type Description
Trajectory

A new trajectory with the selected frames.
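The default handling can be sketched on plain frame indices. The validation rules shown (positive stride, 0 <= start < stop <= n_frames) are assumptions, since the reference does not spell out what is checked; resolve_slice is a hypothetical helper.

```python
def resolve_slice(n_frames, start=None, stop=None, stride=None):
    """Return the frame indices a (start, stop, stride) request selects."""
    # Compare against None explicitly so that an explicit 0 is respected.
    start = 0 if start is None else start
    stop = n_frames if stop is None else stop
    stride = 1 if stride is None else stride
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    if not 0 <= start < stop <= n_frames:
        raise ValueError("frame range is empty or out of bounds")
    return list(range(start, stop, stride))
```

For a 10-frame trajectory, resolve_slice(10, start=2, stride=3) selects frames 2, 5, and 8.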

subsample_trajectory(traj, n_frames)

Evenly subsample a trajectory to a target number of frames.

Parameters:

Name Type Description Default
traj Trajectory

Input trajectory.

required
n_frames int

Desired number of output frames.

required

Returns:

Type Description
Trajectory

A new trajectory with approximately n_frames evenly spaced frames.

Raises:

Type Description
ValueError

If n_frames is less than 1 or exceeds the trajectory length.
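One way to pick evenly spaced frames is shown below. This is a sketch of the idea only: the reference says the result has approximately n_frames frames, so the real function may choose indices differently (for example with a fixed stride). even_indices is a hypothetical helper.

```python
import math


def even_indices(total_frames, n_frames):
    """Pick n_frames indices spread evenly over range(total_frames)."""
    if n_frames < 1 or n_frames > total_frames:
        raise ValueError("n_frames must be between 1 and the trajectory length")
    if n_frames == 1:
        return [0]
    # Spacing that places the first pick on frame 0 and the last on the final frame.
    step = (total_frames - 1) / (n_frames - 1)
    # Round half up; because step >= 1 the indices are strictly increasing.
    return [math.floor(i * step + 0.5) for i in range(n_frames)]
```

The two error conditions mirror the ValueError cases listed above.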

APBS

mdpp.prep.apbs

APBS Poisson-Boltzmann input generation and log parsing.

Helpers for driving APBS (Adaptive Poisson-Boltzmann Solver) from Python.

Two pure functions are exposed:

  • write_apbs_input generates a multigrid APBS .in file from an existing .pqr by deriving the grid bounding box from the radius-inflated atom coordinates and rounding dime up to the nearest c * 2**n + 1 value required by APBS multigrid.
  • infer_debye_length parses a Debye length (in Angstrom) out of an APBS log; it is used to bootstrap downstream BrownDye input from the same APBS run.

Both functions are pure Python with no subprocess calls; they only read/write text files. Run APBS itself by calling the apbs CLI separately.

write_apbs_input(stem, work_dir, *, ionic_strength_m=DEFAULT_IONIC_STRENGTH_M, solute_dielectric=DEFAULT_SOLUTE_DIELECTRIC, solvent_dielectric=DEFAULT_SOLVENT_DIELECTRIC, solvent_radius_a=DEFAULT_SOLVENT_RADIUS_A, temperature_k=DEFAULT_TEMPERATURE_K, fine_spacing_a=DEFAULT_FINE_SPACING_A, fine_padding_a=DEFAULT_FINE_PADDING_A, coarse_padding_a=DEFAULT_COARSE_PADDING_A)

Write an APBS multigrid input file {stem}.in for {stem}.pqr.

Physics defaults mirror pdb2pqr --apbs-input canonical defaults (lpbe / bcfl sdh / srfm smol / chgm spl2 / pdie 2.0 / sdie 78.54 / srad 1.4 / sdens 10 / swin 0.30 / temp 298.15) with two intentional overrides:

  • Explicit Na+/Cl- ion lines at ionic_strength_m with Pauling radii (1.875 / 1.815 A). The pdb2pqr canonical input omits ions, which sets the Debye length to infinity. Explicit ions are required when the resulting .dx feeds a BrownDye2 simulation that needs a finite Debye length for far-field electrostatics.
  • Larger grid padding (fine_padding_a / coarse_padding_a) than pdb2pqr's fadd=20 / cfac=1.7 defaults so the outer grid comfortably exceeds the BrownDye b-radius. dime is rounded up to the nearest c * 2**n + 1 value required by APBS multigrid.
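The dime rounding can be sketched as follows. The exponent is not stated above; n_levels=5 (grid counts of the form 32c + 1, such as 65, 97, 129) is an assumption of this sketch, and both function names are hypothetical.

```python
import math


def round_up_dime(n_points, n_levels=5):
    """Smallest c * 2**n_levels + 1 (c >= 1) that is >= n_points."""
    base = 2 ** n_levels
    c = max(1, math.ceil((n_points - 1) / base))
    return c * base + 1


def grid_dime(length_a, spacing_a, n_levels=5):
    """Grid points along one axis for a box length and a target spacing."""
    # A length covered at the target spacing needs ceil(L / h) + 1 points.
    needed = math.ceil(length_a / spacing_a) + 1
    return round_up_dime(needed, n_levels)
```

For example, a 60 A axis at 0.5 A spacing needs 121 points, which rounds up to 129. Because the point count only ever grows, the actual spacing ends up at or below the target.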

Parameters:

Name Type Description Default
stem str

PQR file stem (without extension). work_dir / "{stem}.pqr" must already exist; the input file is written next to it at work_dir / "{stem}.in".

required
work_dir StrPath

Directory containing {stem}.pqr.

required
ionic_strength_m float

Ion concentration in mol/L (default 0.150 M).

DEFAULT_IONIC_STRENGTH_M
solute_dielectric float

Interior (solute) dielectric constant.

DEFAULT_SOLUTE_DIELECTRIC
solvent_dielectric float

Bulk solvent dielectric constant.

DEFAULT_SOLVENT_DIELECTRIC
solvent_radius_a float

Solvent probe radius in Angstrom.

DEFAULT_SOLVENT_RADIUS_A
temperature_k float

Simulation temperature in Kelvin.

DEFAULT_TEMPERATURE_K
fine_spacing_a float

Target fine grid spacing in Angstrom.

DEFAULT_FINE_SPACING_A
fine_padding_a float

Fine grid padding added to the radius-inflated atom bounding box (per axis).

DEFAULT_FINE_PADDING_A
coarse_padding_a float

Coarse grid padding added to the radius-inflated atom bounding box (per axis); the coarse grid never shrinks below the fine grid.

DEFAULT_COARSE_PADDING_A

Returns:

Type Description
Path

Path to the written {stem}.in file.

Raises:

Type Description
ValueError

If {stem}.pqr has no ATOM/HETATM records.

infer_debye_length(*apbs_logs)

Return the first Debye length (Angstrom) parsed from any APBS log.

Scans logs in argument order; returns as soon as a Debye length is found in any one of them. Missing log files are skipped silently.

Parameters:

Name Type Description Default
*apbs_logs StrPath

Paths to one or more APBS log files.

()

Returns:

Type Description
float

Debye length in Angstrom.

Raises:

Type Description
RuntimeError

If no log contains a recognisable Debye length entry.
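When checking a parsed value, it can help to compare it with the textbook Debye length for a 1:1 electrolyte. The sketch below is that independent estimate, not what infer_debye_length does internally; the function name is hypothetical.

```python
import math

EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m
K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C
N_A = 6.02214076e23         # Avogadro constant, 1/mol


def debye_length_angstrom(ionic_strength_m, solvent_dielectric, temperature_k):
    """Debye length in Angstrom for a monovalent salt solution."""
    if ionic_strength_m <= 0:
        # No mobile ions means no screening: the Debye length is infinite.
        return math.inf
    # mol/L -> ions of each sign per cubic metre.
    ions_per_m3 = ionic_strength_m * 1000.0 * N_A
    kappa_sq = (2.0 * ions_per_m3 * E_CHARGE**2) / (
        EPS0 * solvent_dielectric * K_B * temperature_k
    )
    return 1e10 / math.sqrt(kappa_sq)
```

With 0.150 M salt, a solvent dielectric of 78.54, and 298.15 K, this gives roughly 7.9 A, so a parsed value far from that for those settings is worth a second look.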

BrownDye2

mdpp.prep.browndye

BrownDye2 input.xml and contact_types.xml generation.

Helpers for building the XML inputs consumed by BrownDye2's bd_top:

  • write_contact_types writes contact_types.xml with one entry per unique (atom_name, residue_name) heavy-atom pair per body.
  • build_input_xml and write_input_xml produce the top-level input.xml from two BrownDyeBody descriptors and a shared BrownDyeSolvent configuration.

All helpers are pure Python: no subprocess calls, no XML schema validation. Run BrownDye's own tools (pqr2xml, make_rxn_pairs, make_rxn_file, bd_top, nam_simulation) separately.

The Debye length feeding BrownDyeSolvent is typically obtained from an APBS run via mdpp.prep.apbs.infer_debye_length.

BrownDyeBody(name, atoms_xml, grid_dx, is_protein=True, dielectric=DEFAULT_BODY_DIELECTRIC, all_in_surface=False) dataclass

Configuration for one BrownDye core/body block in input.xml.

Attributes:

Name Type Description
name str

BrownDye body name. Also used as the <core><name> tag.

atoms_xml str

Relative path to the atoms.xml produced by pqr2xml, as it should appear inside input.xml (typically just "{name}_atoms.xml" when running BrownDye from the same directory).

grid_dx str

Relative path to the APBS .dx grid for this body.

is_protein bool

Maps to the <is_protein> tag (lowercase true/false in the serialised XML).

dielectric float

Interior dielectric for this body.

all_in_surface bool

Maps to the <all_in_surface> tag.
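The boolean serialisation can be illustrated with the standard library. The sketch uses only the tag names mentioned above; the tag order, the remaining fields of a real core block, and the lowercase form of all_in_surface are assumptions, and core_stub is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def core_stub(name, is_protein=True, all_in_surface=False):
    """Serialise a partial <core> block; real blocks carry more fields."""
    core = ET.Element("core")
    ET.SubElement(core, "name").text = name
    # Python's str(True) is "True"; lowercase it to get true/false in the XML.
    ET.SubElement(core, "is_protein").text = str(is_protein).lower()
    ET.SubElement(core, "all_in_surface").text = str(all_in_surface).lower()
    return ET.tostring(core, encoding="unicode")
```

The point of the explicit lowercasing is that a naive str() of a Python bool would write True or False into the file.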

BrownDyeSolvent(debye_length_a, dielectric=DEFAULT_BD_SOLVENT_DIELECTRIC, relative_viscosity=DEFAULT_RELATIVE_VISCOSITY, kT=DEFAULT_KT, desolvation_parameter=DEFAULT_DESOLVATION_PARAMETER, solvent_radius_a=DEFAULT_SOLVENT_RADIUS_A) dataclass

Solvent block parameters shared by all bodies in a BrownDye system.

BrownDye uses kT-units internally, so dielectric is the BrownDye solvent dielectric (typically 78.0) and may differ from the APBS sdie value used to compute the electrostatic grid.

Attributes:

Name Type Description
debye_length_a float

Debye length in Angstrom (usually obtained from the APBS log via mdpp.prep.apbs.infer_debye_length).

dielectric float

BrownDye solvent dielectric (kT-units).

relative_viscosity float

Relative solvent viscosity.

kT float

Thermal energy unit (BrownDye uses kT = 1).

desolvation_parameter float

BrownDye desolvation scale factor.

solvent_radius_a float

Probe solvent radius in Angstrom.

write_contact_types(mol0_pqr, mol1_pqr, out_path)

Write a BrownDye contact_types.xml from two PQR files.

Lists every unique heavy-atom (atom_name, residue_name) per body. The output is consumed by make_rxn_pairs to enumerate candidate contact pairs between the two bodies.

Parameters:

Name Type Description Default
mol0_pqr StrPath

PQR file for the first body (writes <molecule0> block).

required
mol1_pqr StrPath

PQR file for the second body (writes <molecule1> block).

required
out_path StrPath

Destination contact_types.xml path.

required

Returns:

Type Description
Path

out_path as a Path, for chaining.
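Collecting the unique heavy-atom pairs from a PQR file might look like the sketch below. It assumes whitespace-delimited PQR records and uses a name-based hydrogen heuristic, since PQR has no element column; neither is confirmed to match what write_contact_types does, and heavy_atom_types is a hypothetical helper.

```python
def heavy_atom_types(pqr_text):
    """Unique (atom_name, residue_name) pairs for non-hydrogen atoms, in file order."""
    seen = []
    for line in pqr_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] not in ("ATOM", "HETATM"):
            continue
        atom_name, residue_name = fields[2], fields[3]
        # Heuristic: hydrogen names start with H, optionally after digits (e.g. 1HB).
        if atom_name.lstrip("0123456789").startswith("H"):
            continue
        pair = (atom_name, residue_name)
        if pair not in seen:
            seen.append(pair)
    return seen


# Made-up coordinates, charges, and radii, for illustration only.
SAMPLE_PQR = """\
ATOM      1  N   ALA A   1       0.000   0.000   0.000 -0.3000 1.8240
ATOM      2  H   ALA A   1       0.000   1.000   0.000  0.2700 0.6000
ATOM      3  CA  ALA A   1       1.400   0.000   0.000  0.0300 1.9080
ATOM      4  N   ALA A   2       2.800   0.000   0.000 -0.4200 1.8240
ATOM      5  1HB ALA A   2       3.500   1.000   0.000  0.0600 1.4870
"""
pairs = heavy_atom_types(SAMPLE_PQR)
```

The second alanine contributes nothing new, because its backbone nitrogen has the same atom and residue name as the first.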

build_input_xml(body0, body1, *, solvent, reaction_file=DEFAULT_REACTION_FILE, n_threads=DEFAULT_N_THREADS, seed=DEFAULT_SEED, n_trajectories=DEFAULT_N_TRAJECTORIES, n_trajectories_per_output=DEFAULT_N_TRAJECTORIES_PER_OUTPUT, max_n_steps=DEFAULT_MAX_N_STEPS, n_steps_per_output=DEFAULT_N_STEPS_PER_OUTPUT, results_file=DEFAULT_RESULTS_FILE, trajectory_file=DEFAULT_TRAJECTORY_FILE)

Build the BrownDye top-level input.xml as a string.

The minimum core dt tolerances are hardcoded to 0.0 (BrownDye's own defaults); the time step is determined dynamically. Override after the fact if you need non-default tolerances.

Parameters:

Name Type Description Default
body0 BrownDyeBody

First body descriptor.

required
body1 BrownDyeBody

Second body descriptor.

required
solvent BrownDyeSolvent

Shared solvent parameters (including Debye length).

required
reaction_file str

Filename of the BrownDye reaction definition XML.

DEFAULT_REACTION_FILE
n_threads int

Number of BrownDye worker threads.

DEFAULT_N_THREADS
seed int

Random seed for trajectory propagation.

DEFAULT_SEED
n_trajectories int

Total number of trajectories to launch.

DEFAULT_N_TRAJECTORIES
n_trajectories_per_output int

Trajectories per results.xml flush.

DEFAULT_N_TRAJECTORIES_PER_OUTPUT
max_n_steps int

Maximum BrownDye steps per trajectory.

DEFAULT_MAX_N_STEPS
n_steps_per_output int

Stride between trajectory frames written to trajectory{N}.xml. Set to 1 to record every step.

DEFAULT_N_STEPS_PER_OUTPUT
results_file str

Filename for cumulative results.

DEFAULT_RESULTS_FILE
trajectory_file str

Base name for per-thread trajectory XML dumps (BrownDye writes {trajectory_file}{thread}.xml plus a matching .index.xml).

DEFAULT_TRAJECTORY_FILE

Returns:

Type Description
str

The full input.xml content as a UTF-8 string.

write_input_xml(out_path, body0, body1, *, solvent, reaction_file=DEFAULT_REACTION_FILE, n_threads=DEFAULT_N_THREADS, seed=DEFAULT_SEED, n_trajectories=DEFAULT_N_TRAJECTORIES, n_trajectories_per_output=DEFAULT_N_TRAJECTORIES_PER_OUTPUT, max_n_steps=DEFAULT_MAX_N_STEPS, n_steps_per_output=DEFAULT_N_STEPS_PER_OUTPUT, results_file=DEFAULT_RESULTS_FILE, trajectory_file=DEFAULT_TRAJECTORY_FILE)

Write the BrownDye top-level input.xml to out_path.

Thin filesystem wrapper around build_input_xml; see that function for parameter semantics.

Returns:

Type Description
Path

out_path as a Path, for chaining.