Working with molecules¶
Introduction¶
Molecule objects¶
The class molmod.molecules.Molecule
is a fundamental part of the MolMod
package. Instances of this class hold all information to describe the geometry
and the topology of a molecular structure. In practice the following attributes
can be stored in a Molecule object:
numbers
– An array of atomic numbers with N elements. This is the only mandatory attribute. All other attributes are optional.coordinates
– An array with atomic coordinates (in Bohr) with shape (N,3).title
– A short description of the molecule.masses
– An array with atomic masses (in \(m_e\))graph
– AMolecularGraph
instance describing the connectivity of the atoms.symbols
– A tuple with atomic symbols. These may be the names of the elements or some force-field atom types.unit_cell
– AUnitCell
instance describing the periodic boundary conditions.
One can create Molecule instances based on several conventional file formats as follows:
mol = Molecule.from_file("foo.xyz")
The extension of the file is used to detect the file format. The following
formats are currently supported: *.cml
, *.fchk
, *.pdb
, *.sdf
,
*.xyz
. It is relatively trivial to implement new file formats, but there is
no real urgent need for that because the babel program can convert most files
into one of these formats.
A Molecule object can be written to file as follows:
mol.write_to_file("bar.xyz")
The currently supported formats for writing files are: *.xyz
, *.cml
Derived quantities¶
Some derived quantities are accessible as attributes. The contents of these attributes are only computed when they are accessed for the first time. Once a derived quantity is computed it is stored internally (in the molecule object) in case it would be requested a second time. This mechanism is called caching. It imposes good scripting practices when using the MolMod package and also improves the efficiency of the scripts. These are a few convenient derived quantities:
chemical_formula
– A string containing the chemical formula.com
– The center of mass of the molecule.distance_matrix
– A square matrix with all inter-atomic distances in a molecule (ignoring periodic boundary conditions for the moment).inertia_tensor
– The inertia tensor of the molecule computed with the center of mass as the origin.mass
– The total mass of the system.size
– The number of atoms.
An up-to-date list can be found in the reference documentation:
molmod.molecules.Molecule
. Most derived quantities depend on optional
attributes. If these optional attributes are not present, an error is raised
when a derived quantity is requested.
Modifying molecule objects¶
The caching mechanism has one important side-effect. When the underlying data
of the derived quantities change, the cached results are no longer correct.
This problem is called cache inconsistency. There are typically two ways to
impose cache consistency: (i) all underlying quantities are kept constant, or
(ii) cached results are deleted when the underlying data change. The first
mechanism is used throughout the MolMod package. When you need to modify a
Molecule
object, just create a new one with modified attributes. This is
facilitated by the copy_with
method:
mol1 = Molecule.from_file("foo.xyz")
mol2 = mol1.copy_with(title="bar")
Any of the above attributes can be modified through copy_with
. Cached
derived quantities are never discarded with this approach.
The concept of derived quantities and read-only attributes is extensively used in the following classes:
Molecule
, seemolmod.molecules.Molecule
.Graph
, seemolmod.graphs.Graph
.MolecularGraph
, seemolmod.molecular_graphs.MolecularGraph
.UnitCell
, seemolmod.unit_cells.UnitCell
.Translation
, seemolmod.transformations.Translation
.Rotation
, seemolmod.transformations.Rotation
.Complete
, seemolmod.transformations.Complete
.
Examples¶
Conversion of files¶
Although the conversion of files with molecular systems from one format to the other is easily done with the babel program, the example is still useful for didactic purposes.
File: molmod/examples/001_molecules/a_convert.py
#!/usr/bin/env python
from molmod import *
mol = Molecule.from_file("ibuprofen.sdf")
mol.write_to_file("ibuprofen.xyz")
Center of mass to origin¶
This example shows how a modified molecule object is made.
File: molmod/examples/001_molecules/b_com.py
#!/usr/bin/env python
from molmod import *
# 0) Load the molecule from file
mol1 = Molecule.from_file("ibuprofen.sdf")
# 1) The atomic masses are not included in the sdf file. We need them below to
# compute the center of mass, so we assign standard masses.
mol1.set_default_masses()
print(mol1.masses/amu)
# 2) Create modified coordinates. The attribute coordinates is an N by 3 array,
# while the com attribute is an array with three components. The following
# subtraction is performed elementwise at the level of the rows. Each row
# of the coordinates array corresponds to one atom. From each atomic coordinate
# the center of mass is subtracted to construct a new coordinates array.
new_coordinates = mol1.coordinates - mol1.com
# 3) Make a copy of mol1 with updated coordinates and write it to a file.
mol2 = mol1.copy_with(coordinates=new_coordinates)
mol2.write_to_file("ibuprofen_com.xyz")
print("Written file ibuprofen_com.xyz")
All carbons¶
The following program prints the X-, Y- and Z-coordinates of all Carbon atoms in the ibuprofen molecule.
File: molmod/examples/001_molecules/c_carbon.py
#!/usr/bin/env python
from molmod import *
# 0) Load the molecule
mol = Molecule.from_file("ibuprofen.sdf")
# 1) Print out all carbon positions
for i in range(mol.size):
if mol.numbers[i] == 6:
print(mol.coordinates[i]/angstrom)
Size of a molecule¶
One can define the size of a molecule in several ways. The following program prints out the largest inter-atomic distance in Ångström.
File: molmod/examples/001_molecules/d_size.py
#!/usr/bin/env python
from molmod import *
# 0) Load the molecule.
mol = Molecule.from_file("ibuprofen.sdf")
# 1) Print the largest element in the distance matrix.
print("Largest interatomic distance [A]:")
print(mol.distance_matrix.max()/angstrom)
# Some comments:
# - One can just write code that assumes the attribute distance_matrix is
# present, but in fact it is only computed once the first time it is used.
# - The method max is a method of the numpy array. It returns the largest value
# in an array.
Shape of a molecule¶
One can define the shape of a molecule based on the covariance of the atomic coordinates with respect to their geometric center. Some shape categories were introduced by Zingg. The following program computes the shape category for the ibuprofen molecule
File: molmod/examples/001_molecules/e_shape.py
#!/usr/bin/env python
from __future__ import print_function
from molmod import *
from numpy import dot, sqrt
from numpy.linalg import eigvalsh
# 0) Load the molecule.
mol = Molecule.from_file("ibuprofen.sdf")
# 1) Compute the arithmetic center of the atomic coordinates.
center = mol.coordinates.mean(axis=0)
# Without the axis=0 argument, the average over all X, Y and Z components would
# be computed. Now it just computes the averages for each column. axis=1 would
# refer to averaging over rows.
# 2) Move the arithmetic center of the coordinates to the origin. The same
# comments from the b_com.py apply.
centered = mol.coordinates - mol.coordinates.mean(axis=0)
# 3) Compute the covariance matrix of the centered coordinates.
covar = dot(centered.transpose(), centered)/mol.size
# 4) Compute the eigenvalues of the symmetric covariance matrix.
evals = eigvalsh(covar)
# 5) The spread along the three eigenvectors is computed as the standard deviation
# of the atomic coordinates along that direction:
c, b, a = sqrt(evals)
print("Spread along the long axis [A]:", a/angstrom)
print("Spread along the intermediate axis [A]:", b/angstrom)
print("Spread along the short axis [A]:", c/angstrom)
# 6) Test in which category this shape belongs. The factor R is set to 1.5.
R = 1.5
if b < R*c:
if a < R*b:
shape = 'equant' # sphere-like
else:
shape = 'prolate' # sigar-like
else:
if a < R*b:
shape = 'oblate' # pancake-like
else:
shape = 'bladed' # keyboard-like
print("The shape category:", shape)
Problems¶
Write a program that makes a list of all (unique) pairs of Carbon atoms that have an inter-atomic distance below 2.0 Ångström.
Try to find for each shape category (equant, prolate, oblate and bladed) some molecules from PubChem. Just download a bunch of SDF files and use the glob module from Python to loop over all these files:
from glob import glob for fn in glob("*.sdf"): mol = Molecule.from_file(fn) # do something here
If you feel lucky, extend to program so that it downloads the first 100 molecules from the PubChem database.