Working with molecular graphs

Introduction

Molecular graphs are objects that describe the connectivity of atoms in a molecule. In graph speech, every atom is a vertex and every bond is an edge. Some background on Graph Theory may be helpful. The molmod.molecular_graphs.MolecularGraph class in MolMod can store atomic numbers associated with vertices and bond orders associated with edges. The class molmod.graphs.Graph is the base class that implements all the abstract graph theory without refering to atoms and bonds.

Examples

Edges

The first example shows how one creates a molecular graph.. The most important attribute of a graph object is edges. This is (in most cases) a list of bonds in the molecule, but more edges may be included if that would be useful.

File: molmod/examples/002_graphs/a_graphs.py

#!/usr/bin/env python

from __future__ import print_function

from molmod import *

# 0) Create a molecule object based on the XYZ file 'caffeine.xyz'.
mol = Molecule.from_file("caffeine.xyz")

# 1) Newly created molecule objects do not have a graph, unless bond information
# is present in the data file. XYZ files only contain atomic elements and
# the corresponding coordinates. Therefore, the following line will not raise
# an error:
assert(mol.graph is None)

# 2) A molecular graph object can be derived from the geometry using a few rules
# of thumb and a database of well-known bond lengths. Such a routine is
# implemented in MolMod. It works as follows:
mol.set_default_graph()
# There are also other ways to define graphs with more control over the rules of
# thumb that detect the bonded atom pairs, e.g.
#
#   mol.graph = MolecularGraph.from_geometry(scaling=1.5)
#
# will also detect the breaking bond in a transition state.

# 3) Print all edges, i.e. bonds in the graph. The edges list is ordered and
# each item is a frozenset with two elements to stress the undericted nature of
# the molecular graphs.
print("All edges (or bonds)")
print(mol.graph.edges)
print()
# Print the third bond.
print("The third bond:", mol.graph.edges[2])
# It is not possible to access only one of the two atom indexes of an edge. The
# following won't work because a frozenset is like unordered list.
#
#   print graph.edges[2][0]
#
# One can get both indexes of an edge at the same time:
i, j = mol.graph.edges[2]
print("The indexes of the second bond:", i, "and", j)
# It is not possible to know a priori which number is assigned to i and which to
# j.

# 4) Print all the atom indexes of all C-H bonds on screen. Note that counting
# starts from zero.
# The loop relies on the 'unpack' feature in Python. It is equivalent to
#
#   for edge in mol.graph.edges:
#       i, j = edge
#
# but is a little more compact.
for i, j in mol.graph.edges:
    if mol.numbers[i] == 6 and mol.numbers[j] == 1:
        print("C-H bond:", i, j)
    elif mol.numbers[j] == 6 and mol.numbers[i] == 1:
        print("C-H bond:", j, i)

From the edges many related properties can be derived. They are accessible as attributes of a graph object and are only constructed the first time they are used. A few examples, e.g. neighbors, distances, or symmetries.

Neighbors

The neighbors attribute is a dictionary that relates each vertex, i.e atom, in the graph with its direct neighbors. It can be used to get the direct chemical environment of an atom, or just to count the number of bonds.

File: molmod/examples/002_graphs/b_neighbors.py

#!/usr/bin/env python

from __future__ import print_function

from molmod import *

# 0) Create a molecule object based on the XYZ file 'caffeine.xyz'. Also
# initialize the graph.
mol = Molecule.from_file("caffeine.xyz")
mol.set_default_graph()
# The neighbors attribute is a dictionary with atom indexes as keys and a set
# of neighboring atom indexes as corresponding values. The dictionary will be
# constructed as soon as it is first accessed, e.g.
print("All neighbors:")
print(mol.graph.neighbors)
print()

# 1) Print the atomic number of the third atom and the atomic numbers of its
# neighbors.
print("Symbol of third atom (should be oxygen):", mol.symbols[2])
print("The number of bonds to the third atom:", len(mol.graph.neighbors[2]))
print("Indexes of neigbors of third atom:", mol.graph.neighbors[2])
print("Symbols of neigbors of third atom:", [mol.symbols[i] for i in mol.graph.neighbors[2]])

# 2) Look up all methyl groups.
for i, ns in mol.graph.neighbors.items():
    if mol.numbers[i] == 6 and len(ns) == 4:
        # Get the indexes of the hydrogen atoms.
        h_indexes = [n for n in ns if mol.numbers[n]==1]
        # We must have three hydrogens
        if len(h_indexes) != 3:
            continue
        # Print stuff
        print("C=%i H=%i H=%i H=%i" % (i, h_indexes[0], h_indexes[1], h_indexes[2]))

Distances

The distances attribute is a square matrix with a row and a column for every vertex. Each element contains the (minimal) number of bonds between two atoms. The matrix is constructed with the Floyd-Warshall algorithm.

File: molmod/examples/002_graphs/c_distances.py

#!/usr/bin/env python

from __future__ import print_function

from molmod import *

# 0) Create a molecule object based on the XYZ file 'caffeine.xyz'. Also
# initialize the graph.
mol = Molecule.from_file("caffeine.xyz")
mol.set_default_graph()
# The distances attribute is a square matrix of integers with a row and column
# for every atom.
print("Number of atoms:", mol.size)
print("Shape of the distances array:", mol.graph.distances.shape)

# 1) The distances array can be used to get the 'minimal' number of bonds
# between two atoms. E.g. for two atoms in a five_membered ring, this is at most
# two:
print("Distance between atom 5 and 6 (part of a 5-ring):", mol.graph.distances[5,6])

# 2) The matrix can also be used to construct a mask of atom_pairs that are
# separated by at most three bonds.
mask = (mol.graph.distances <= 3) & (mol.graph.distances != 0)
# For a nice print output, the mask is converted to integers.
print("Mask of atom pairs that are typically involved in valence interactions:")
print(mask.astype(int))

Symmetries

All isomorphisms of a molecular graph can be requested. For molecular graphs, only those isomorphisms are considered that do not alter the atomic numbers.

File: molmod/examples/002_graphs/d_symmetries.py

#!/usr/bin/env python

from __future__ import print_function

from molmod import *

# 0) Create a molecule object based on the XYZ file 'ethanol.xyz'. Also
# initialize the graph.
mol = Molecule.from_file("ethanol.xyz")
mol.set_default_graph()

# 1) Print out all the isomorphisms. For the ethanol molecule, this comes down
# to all possible ways to rotate/mirror the methyl moieties, and combination
# thereof.
print("Isomorphisms in the form of one-to-one mappings")
for symmetry in mol.graph.symmetries:
    print(symmetry)

# 2) One can also request isomorphisms in the form of permutations, which is
# often more convenient:
print("Isomorphisms in the form of permutations")
for cycle in mol.graph.symmetry_cycles:
    print(cycle)

Problems

  • Write a program that prints the atom indexes of all the C-O bonds in the caffeine molecule.
  • Write a program that prints the atom indexes of all the Nitrogen atoms that are not bonded to a methyl group.
  • Write a program that finds all Nitrogen atoms in the caffeine that are at least three bonds away from an Oxygen atom.
  • Download the benzene molecule from Pubchem and write a program that prints out all the graph isomorphisms as permutations. The SDF files from pubchem already contain the bond graph, so there is no need for mol.set_default_graph() after loading the molecule. How is the number of isomorphisms related to the rotational symmetry number of benzene?